Attention
MHA
MHA (Multi-Head Attention) is the standard multi-head attention mechanism, and the baseline that the variants below modify.
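A minimal PyTorch sketch of standard MHA, included for comparison with the variants below; the dimension names (`d_model`, `n_heads`) are illustrative assumptions, not taken from any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHA(nn.Module):
    """Standard multi-head attention: every head has its own Q, K, V projections."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, _ = x.shape
        # split into heads: (batch, n_heads, seq, d_head)
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention per head
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```

During autoregressive decoding, every head's K and V must be cached, so the KV Cache costs 2 × n_heads × d_head values per token per layer; the variants below all attack this term.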
MQA
MQA (Multi-Query Attention) is a very simple first attempt at reducing the KV Cache: all query heads share a single key head and a single value head.
Fast Transformer Decoding: One Write-Head is All You Need
https://kexue.fm/archives/4765
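A minimal sketch of the idea, reusing the illustrative dimension names from the MHA example above: K and V are projected to a single head that is broadcast across all query heads, shrinking the per-token cache from 2 × n_heads × d_head to 2 × d_head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MQA(nn.Module):
    """Multi-Query Attention: many Q heads, but only one shared K/V head."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # K and V are projected to a single head's width
        self.k_proj = nn.Linear(d_model, self.d_head)
        self.v_proj = nn.Linear(d_model, self.d_head)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # single K/V head, broadcast over all query heads: (batch, 1, seq, d_head)
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```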
GQA
GQA (Grouped-Query Attention) interpolates between MHA and MQA: the query heads are split into groups, and each group shares one key/value head.
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
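A minimal sketch, with `n_kv_heads` as an illustrative parameter name: setting `n_kv_heads == n_heads` recovers MHA, and `n_kv_heads == 1` recovers MQA.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQA(nn.Module):
    """Grouped-Query Attention: n_heads query heads share n_kv_heads K/V heads."""
    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_kv_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_kv_heads, self.d_head).transpose(1, 2)
        # repeat each K/V head so every query head in its group attends to the same K/V
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```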
MLA
MLA (Multi-head Latent Attention) compresses the keys and values of all heads into a low-dimensional latent vector, so only that latent needs to be cached.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
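A simplified sketch of the core idea, assuming an illustrative `d_latent` and omitting the decoupled RoPE key branch and low-rank query compression used in DeepSeek-V2: per-head K and V are reconstructed on the fly from a shared latent `c_kv`, and only `c_kv` goes into the cache, so cache size depends on `d_latent` rather than `n_heads * d_head`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedMLA(nn.Module):
    """Sketch of Multi-head Latent Attention: K/V are reconstructed from a small
    shared latent vector, and only that latent is cached per token.
    (The decoupled RoPE key branch of the full MLA design is omitted.)"""
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress input to latent c_kv
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent to per-head K
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent to per-head V
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        c_kv = self.kv_down(x)                         # (batch, seq, d_latent): this is the KV Cache
        k = self.k_up(c_kv).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```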