Attention
MHA
MHA (Multi-Head Attention) is the standard multi-head attention mechanism, and the baseline that the variants below modify.
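A minimal PyTorch sketch of standard MHA, included for comparison with the variants below; the dimension names (`d_model`, `n_heads`) are illustrative assumptions, not taken from any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHA(nn.Module):
    """Standard multi-head attention: every head has its own Q, K, V projections."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, _ = x.shape
        # split into heads: (batch, n_heads, seq, d_head)
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention per head
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```

During autoregressive decoding, every head's K and V must be cached, so the KV Cache costs 2 × n_heads × d_head values per token per layer; the variants below all attack this term.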
MQA
MQA (Multi-Query Attention) is a very simple first attempt at reducing the KV Cache: all query heads share a single key head and a single value head.
Fast Transformer Decoding: One Write-Head is All You Need
https://kexue.fm/archives/4765
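A minimal sketch of the idea, reusing the illustrative dimension names from the MHA example above: K and V are projected to a single head that is broadcast across all query heads, shrinking the per-token cache from 2 × n_heads × d_head to 2 × d_head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MQA(nn.Module):
    """Multi-Query Attention: many Q heads, but only one shared K/V head."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # K and V are projected to a single head's width
        self.k_proj = nn.Linear(d_model, self.d_head)
        self.v_proj = nn.Linear(d_model, self.d_head)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # single K/V head, broadcast over all query heads: (batch, 1, seq, d_head)
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```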
GQA
GQA (Grouped-Query Attention) interpolates between MHA and MQA: the query heads are split into groups, and each group shares one key/value head.
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
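A minimal sketch, with `n_kv_heads` as an illustrative parameter name: setting `n_kv_heads == n_heads` recovers MHA, and `n_kv_heads == 1` recovers MQA.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQA(nn.Module):
    """Grouped-Query Attention: n_heads query heads share n_kv_heads K/V heads."""
    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.n_kv_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.n_kv_heads, self.d_head).transpose(1, 2)
        # repeat each K/V head so every query head in its group attends to the same K/V
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```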
MLA
MLA (Multi-head Latent Attention) compresses the keys and values of all heads into a low-dimensional latent vector, so only that latent needs to be cached.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
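A simplified sketch of the core idea, assuming an illustrative `d_latent` and omitting the decoupled RoPE key branch and low-rank query compression used in DeepSeek-V2: per-head K and V are reconstructed on the fly from a shared latent `c_kv`, and only `c_kv` goes into the cache, so cache size depends on `d_latent` rather than `n_heads * d_head`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedMLA(nn.Module):
    """Sketch of Multi-head Latent Attention: K/V are reconstructed from a small
    shared latent vector, and only that latent is cached per token.
    (The decoupled RoPE key branch of the full MLA design is omitted.)"""
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress input to latent c_kv
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent to per-head K
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent to per-head V
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        c_kv = self.kv_down(x)                         # (batch, seq, d_latent): this is the KV Cache
        k = self.k_up(c_kv).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```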