The transformer was introduced in 2017 and has largely replaced older sequence architectures such as recurrent neural networks (RNNs), because it processes all tokens of a text in parallel rather than one after another. Its core is the self-attention mechanism, which weighs how strongly each word relates to every other word in context. This lets models capture long-range relationships and train efficiently on large amounts of data. Practically all well-known LLMs, such as GPT, Claude, and Gemini, build on this architecture.
Transformer
The transformer is the neural network architecture on which nearly all modern language models are based. Its attention mechanism relates every position in the input text to every other position in a single step, rather than reading the text word by word.
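To make the attention idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not how any particular model implements it: each vector serves as its own query, key, and value (real transformers first apply learned projections and use several attention heads), and the function names and the three toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Simplification: every vector is used directly as query, key, and value.
    Returns the mixed output vectors and the attention weights per position.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this position against every position, including itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: weighted average of all value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional embeddings for three tokens.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and says how much one token draws on every other token. In this toy input the first token attends more to the similar second token than to the dissimilar third. Because no row depends on another row's result, all positions can be computed independently, which is what allows parallel processing.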
