TL;DR
I am reading Build a Large Language Model from Scratch and, to deepen my understanding of what I read, I am writing blog posts about the questions that accompany the book. This post covers the second question from Chapter 1.
Primary Function
What is the primary function of a large language model (LLM)?
The primary function of an LLM is to understand, generate, and respond to human-like text.
I asked ChatGPT this prompt:
How can an LLM understand, generate, and respond to human-like text?
1️⃣ What does “understand” actually mean for an LLM?
An LLM does not understand in a human sense.
Instead, it:
- Learns relationships between words, phrases, and concepts
- Encodes those relationships as numbers (weights)
- Uses them to make accurate predictions
So when we say “understand,” we really mean:
It can model language well enough to behave as if it understands.
2️⃣ Training objective: next-token prediction
During training, the model repeatedly sees text like:
"The capital of France is"
And learns that the most likely next token is:
"Paris"
Formally, it learns:
P(next_token | previous_tokens)
This is the only objective.
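To make that objective concrete, here is a minimal sketch in Python. It shortens the conditioning to a single previous token (a bigram model) and estimates probabilities by counting over a toy corpus; a real LLM conditions on the whole context window with a neural network, but the objective has the same shape.

```python
from collections import Counter, defaultdict

# Toy illustration of P(next_token | previous_tokens), shortened here to
# P(next_token | previous_token). Real LLMs condition on the full context.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # count how often `nxt` follows `prev`

def next_token_probs(prev):
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

print(next_token_probs("is"))  # {'paris': 0.5, 'rome': 0.5}
```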
3️⃣ Why next-token prediction is enough
Language contains structure:
- Grammar
- Facts
- Logic
- Tone
- Style
To predict the next token correctly, the model must implicitly learn all of these.
Example:
"If it rains, the ground will be"
Correct prediction requires:
- Grammar
- World knowledge
- Cause and effect
I hate grammar in any language. An LLM doesn't need to like or hate it; it picks up grammar through statistics and pattern recognition.
4️⃣ How text becomes something a model can work with
Step 1: Tokenization
Text is split into tokens (words or word pieces).
"understanding language" → ["understand", "ing", " language"]
Step 2: Embeddings
Each token is converted into a vector of numbers.
- Similar words → similar vectors
- Relationships emerge automatically
Example:
king − man + woman ≈ queen
This is learned, not programmed.
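Here is a small sketch of that arithmetic using hand-made toy vectors along two made-up axes, (royalty, gender). These hypothetical values are for illustration only; real embeddings are learned and have hundreds of dimensions.

```python
import numpy as np

# Hypothetical 2-D embeddings: (royalty, gender)
vectors = {
    "king":  np.array([0.9,  0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1,  0.9]),
    "woman": np.array([0.1, -0.9]),
}

result = vectors["king"] - vectors["man"] + vectors["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The nearest word to king - man + woman should be "queen"
closest = max(vectors, key=lambda w: cosine(vectors[w], result))
print(closest)  # queen
```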
5️⃣ How context is handled: attention
Transformers (the architecture behind LLMs) use self-attention, a mechanism introduced in the paper "Attention Is All You Need", which gave birth to modern LLMs.
Self-attention allows the model to:
- Look at all previous tokens
- Decide which ones matter most
- Weigh them dynamically
Example: "The trophy doesn't fit in the suitcase because it is too big."
“It” refers to the trophy, not the suitcase.
Attention learns this.
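Below is a minimal NumPy sketch of single-head scaled dot-product attention with toy sizes, omitting masking, multiple heads, and everything else a full Transformer adds:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (single head, no masking)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to each other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8): one context-aware vector per token
```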
6️⃣ How generation works
At inference time:
- You provide a prompt
- The model predicts the next token
- That token is appended
- The process repeats
Prompt → predict → append → predict → append → ...
This creates fluent text.
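The loop itself is simple. In this sketch, `predict_next` is a hypothetical stand-in for the trained model, and the toy lambda just repeats the last token to show the loop's shape; a real model would also sample from the probability distribution rather than picking one token deterministically:

```python
def generate(prompt_tokens, predict_next, max_new_tokens=20):
    """Greedy decoding loop: predict -> append -> repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(predict_next(tokens))  # append the predicted token and go again
    return tokens

# Toy "model" that just repeats the last token
print(generate(["hello"], predict_next=lambda toks: toks[-1], max_new_tokens=3))
# ['hello', 'hello', 'hello', 'hello']
```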
7️⃣ Why responses feel “intentional”
Because the training data contains:
- Questions and answers
- Dialogues
- Explanations
- Instructions
The model learns patterns like:
- Question → explanation
- Problem → solution
- Greeting → greeting
So when you ask a question, the most likely continuation looks like an answer.
8️⃣ An important mental shift
LLMs do not generate text because they “know things.”
They generate text because that is the most statistically likely continuation.
Understanding is an emergent property, not an explicit goal.
Conclusion
With this prompt and ChatGPT's answer, we dipped our toes into the famous Transformer architecture behind LLMs (no pictures, just words). This will help us better understand Chapter 2. Before that, however, we still have a few more questions from Chapter 1 to cover.