Prompt
"Explain how transformer attention mechanisms work in large language models."
Response A
Transformers use attention to weigh the importance of each word against all others in a sequence. The attention mechanism projects the token embeddings into Query (Q), Key (K), and Value (V) matrices. The dot products of the queries with the keys give attention scores, which are softmax-normalized into weights that are applied to V.
Response B
Think of attention like a spotlight: every word in a sentence "looks at" every other word and decides how much to focus on it. Technically, the model creates three matrices (Query, Key, and Value) and uses them to compute attention weights, so contextually relevant tokens carry more influence and the model can capture long-range dependencies.
Evaluator Verdict
Response B wins. It maintains technical depth while using an accessible analogy, achieving higher clarity and engagement without sacrificing accuracy, which is the hallmark of a well-calibrated explanation.
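For reference, the mechanism both responses describe (project the embeddings to Query, Key, and Value, take dot products of queries with keys, softmax-normalize, apply the weights to the values) can be sketched in plain Python. This is a minimal single-head sketch under stated assumptions, not any particular model's implementation: the function names, the toy input, and the identity projection matrices are illustrative, and the division by the square root of the key dimension comes from the standard scaled dot-product formulation rather than from either response.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Nested-list matrix product: (n x k) times (k x m) gives (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def attention(x, w_q, w_k, w_v):
    # Project token embeddings into queries, keys, and values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    # Score every query against every key with a dot product.
    scores = matmul(q, transpose(k))
    # Assumption: scale by sqrt(d_k), as in standard scaled dot-product attention.
    d_k = len(k[0])
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Softmax each row so one token's weights over all tokens sum to 1.
    weights = [softmax(row) for row in scaled]
    # Each output row is a weighted average of the value vectors.
    return weights, matmul(weights, v)

# Hypothetical toy input: 3 tokens with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity projections keep the example easy to follow; real models learn these.
identity = [[1.0, 0.0], [0.0, 1.0]]

weights, output = attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. In this toy input the first token attends equally to itself and to the third token, and less to the second, because its query has a larger dot product with those two keys.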