
Attention

MHA

MHA (Multi-Head Attention) is the standard multi-head attention mechanism introduced by the original Transformer.

Attention Is All You Need
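
A minimal sketch of standard multi-head attention, assuming PyTorch; the class and parameter names here are illustrative, not a reference implementation. Each head has its own query, key and value projections, so the KV cache grows with the number of heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHA(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        # Every head gets its own Q, K and V projection slices.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # (b, t, d_model) -> (b, num_heads, t, head_dim)
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```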

MQA

MQA (Multi-Query Attention) is a very simple attempt at reducing the KV cache: all query heads share a single set of keys and values.

Fast Transformer Decoding: One Write-Head is All You Need

https://kexue.fm/archives/4765
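
A minimal sketch of MQA, assuming PyTorch and illustrative names: only one key head and one value head are projected, and they are broadcast across all query heads, so the KV cache shrinks by a factor of num_heads compared with MHA.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MQA(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # A single K/V head shared by every query head.
        self.k_proj = nn.Linear(d_model, self.head_dim)
        self.v_proj = nn.Linear(d_model, self.head_dim)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # (b, 1, t, head_dim): broadcast over the query-head dimension.
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```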

GQA

GQA (Grouped-Query Attention) sits between MHA and MQA: the query heads are divided into groups, and each group shares one set of keys and values.

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
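A minimal sketch of GQA, assuming PyTorch and illustrative names: num_kv_heads controls the number of shared K/V heads, recovering MHA when num_kv_heads == num_heads and MQA when num_kv_heads == 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQA(nn.Module):
    def __init__(self, d_model: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert d_model % num_heads == 0 and num_heads % num_kv_heads == 0
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Only num_kv_heads key/value heads are projected and cached.
        self.k_proj = nn.Linear(d_model, num_kv_heads * self.head_dim)
        self.v_proj = nn.Linear(d_model, num_kv_heads * self.head_dim)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each K/V head so every query head in a group attends to the same K/V.
        g = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(g, dim=1)
        v = v.repeat_interleave(g, dim=1)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```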

MLA

MLA (Multi-head Latent Attention) compresses keys and values into a low-rank latent vector, shrinking the KV cache further; it was introduced in DeepSeek-V2.

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
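A highly simplified sketch of MLA's core idea, assuming PyTorch: keys and values are reconstructed from a small shared latent vector c_kv, so only c_kv (kv_latent_dim values per token) needs to be cached. The decoupled RoPE branch and the query compression used in DeepSeek-V2 are omitted, and all dimension names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedMLA(nn.Module):
    def __init__(self, d_model: int, num_heads: int, kv_latent_dim: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project the hidden state to a small latent; this is what gets cached.
        self.kv_down = nn.Linear(d_model, kv_latent_dim)
        # Up-project the latent back to per-head keys and values at compute time.
        self.k_up = nn.Linear(kv_latent_dim, d_model)
        self.v_up = nn.Linear(kv_latent_dim, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        c_kv = self.kv_down(x)  # (b, t, kv_latent_dim): the only KV state to cache
        k = self.k_up(c_kv).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(c_kv).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```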