TL;DR
I am reading Build a Large Language Model from Scratch and, to deepen my understanding of what I read, I am writing blog posts about the questions that accompany the book. This post covers the second question from Chapter 1.
Primary Function
What is the primary function of a large language model (LLM)?
The primary function of an LLM is to understand, generate, and respond to human-like text.
I asked ChatGPT this prompt:
How can an LLM understand, generate, and respond to human-like text?
1️⃣ What does “understand” actually mean for an LLM?
An LLM does not understand in a human sense.
Instead, it:
- Learns relationships between words, phrases, and concepts
- Encodes those relationships as numbers (weights)
- Uses them to make accurate predictions
So when we say “understand,” we really mean:
It can model language well enough to behave as if it understands.
2️⃣ Training objective: next-token prediction
During training, the model repeatedly sees text like:
"The capital of France is"
And learns that the most likely next token is:
"Paris"
Formally, it learns:
P(next_token | previous_tokens)
This is the only objective.
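To make that objective concrete, here is a minimal sketch in Python. It shortens the conditioning to a single previous token (a bigram model) and estimates probabilities by counting over a toy corpus; a real LLM conditions on the whole context window with a neural network, but the objective has the same shape.

```python
from collections import Counter, defaultdict

# Toy illustration of P(next_token | previous_tokens), shortened here to
# P(next_token | previous_token). Real LLMs condition on the full context.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # count how often `nxt` follows `prev`

def next_token_probs(prev):
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

print(next_token_probs("is"))  # {'paris': 0.5, 'rome': 0.5}
```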
3️⃣ Why next-token prediction is enough
Language contains structure:
- Grammar
- Facts
- Logic
- Tone
- Style
To predict the next token correctly, the model must implicitly learn all of these.
Example:
"If it rains, the ground will be"
Correct prediction requires:
- Grammar
- World knowledge
- Cause and effect
I hate grammar in any language. An LLM doesn't need to like or hate it; it picks up grammar through statistics and pattern recognition.
4️⃣ How text becomes something a model can work with
Step 1: Tokenization
Text is split into tokens (words or word pieces).
"understanding language" → ["understand", "ing", " language"]
Step 2: Embeddings
Each token is converted into a vector of numbers.
- Similar words → similar vectors
- Relationships emerge automatically
Example:
king − man + woman ≈ queen
This is learned, not programmed.
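Here is a small sketch of that arithmetic using hand-made toy vectors along two made-up axes, (royalty, gender). These hypothetical values are for illustration only; real embeddings are learned and have hundreds of dimensions.

```python
import numpy as np

# Hypothetical 2-D embeddings: (royalty, gender)
vectors = {
    "king":  np.array([0.9,  0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1,  0.9]),
    "woman": np.array([0.1, -0.9]),
}

result = vectors["king"] - vectors["man"] + vectors["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The nearest word to king - man + woman should be "queen"
closest = max(vectors, key=lambda w: cosine(vectors[w], result))
print(closest)  # queen
```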
5️⃣ How context is handled: attention
Transformers (the architecture behind LLMs) use self-attention, a mechanism introduced in the paper "Attention Is All You Need", which gave birth to modern LLMs.
Self-attention allows the model to:
- Look at all previous tokens
- Decide which ones matter most
- Weigh them dynamically
Example: "The trophy doesn't fit in the suitcase because it is too big."
“It” refers to the trophy, not the suitcase.
Attention learns this.
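Below is a minimal NumPy sketch of single-head scaled dot-product attention with toy sizes, omitting masking, multiple heads, and everything else a full Transformer adds:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (single head, no masking)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to each other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8): one context-aware vector per token
```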
6️⃣ How generation works
At inference time:
- You provide a prompt
- The model predicts the next token
- That token is appended
- The process repeats
Prompt → predict → append → predict → append → ...
This creates fluent text.
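The loop itself is simple. In this sketch, `predict_next` is a hypothetical stand-in for the trained model, and the toy lambda just repeats the last token to show the loop's shape; a real model would also sample from the probability distribution rather than picking one token deterministically:

```python
def generate(prompt_tokens, predict_next, max_new_tokens=20):
    """Greedy decoding loop: predict -> append -> repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(predict_next(tokens))  # append the predicted token and go again
    return tokens

# Toy "model" that just repeats the last token
print(generate(["hello"], predict_next=lambda toks: toks[-1], max_new_tokens=3))
# ['hello', 'hello', 'hello', 'hello']
```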
7️⃣ Why responses feel “intentional”
Because the training data contains:
- Questions and answers
- Dialogues
- Explanations
- Instructions
The model learns patterns like:
- Question → explanation
- Problem → solution
- Greeting → greeting
So when you ask a question, the most likely continuation looks like an answer.
8️⃣ An important mental shift
LLMs do not generate text because they “know things.”
They generate text because that is the most statistically likely continuation.
Understanding is an emergent property, not an explicit goal.
Conclusion
With this prompt and ChatGPT's answer, we dipped our toes into the famous Transformer architecture behind LLMs (no pictures, just words). This will help us better understand Chapter 2. Before that, however, we still have a few more questions from Chapter 1 to cover.