📖 beginner ⏱️ 90 min

Bloque I — Fundamentos Block I — Foundations

Objetivos de aprendizaje

Explicar qué es un LLM en términos de tokens y autoregresión.
Distinguir LLM, VLM, embedding model, diffusion, etc.
Decidir entre API cloud vs modelo local con criterios objetivos.
Conocer el panorama de modelos frontier 2026 (US + China + EU).

Learning objectives

Explain what an LLM is in terms of tokens and autoregression.
Distinguish LLM, VLM, embedding model, diffusion, etc.
Choose between cloud API and local model with objective criteria.
Know the 2026 frontier model landscape (US + China + EU).

1.0 ¿Cómo funciona realmente un LLM? 1.0 How does an LLM really work?

Intuición técnica suficiente para tomar decisiones informadas, sin matemáticas formales. Enough technical intuition for informed decisions, without formal math.

Antes de entrar a prompt engineering, conviene entender la maquinaria. No para derivar las ecuaciones — para eso lee Vaswani 2017 — sino para que las decisiones de diseño que tomas en producción tengan base mecánica, no superstición.

De caracteres a tokens

Un LLM no procesa letras ni palabras: procesa tokens, sub-palabras estadísticamente útiles aprendidas durante el entrenamiento. El tokenizador (BPE byte-level en Claude/GPT/Llama, SentencePiece en Gemma/Qwen) descompone el texto en piezas que el modelo conoce.

"prompt engineering" → ["prompt", " engineering"]    (2 tokens)
"prompt engineering."→ ["prompt", " engineering", "."] (3 tokens)
"propmt engineering" → ["pro", "p", "mt", " engineering"] (4 tokens, typo más caro)
"反对prompt engineering" → ["反", "对", "prompt", " engineering"] (4 tokens, CJK 1 char/token)
"🚀 prompt"          → ["🚀", " prompt"]           (5 bytes encoded, ~3 tokens)

Implicación práctica: idiomas con alfabeto latino son ~4 chars/token, código ~2.5 chars/token, CJK 1.5-2 chars/token, emojis y caracteres raros pueden costar 3-5 tokens cada uno. Tu factura paga el coste real, no el coste "intuitivo".

Embedding: del token al vector

Cada token se convierte en un vector denso de dimensión d (típicamente 4096 en modelos 7-13B, 8192-12288 en modelos 70-405B). Ese vector codifica el "significado" del token aprendido durante el pre-training: tokens con uso semántico similar (rey/reina, ejecutar/correr) acaban cerca en ese espacio vectorial.

Tu prompt completo se convierte en una matriz de (longitud_tokens × d) floats antes de que pase nada más. Ese tensor es lo que el transformer manipula.

El transformer: atención + FFN repetidos N veces

Un transformer decoder (la arquitectura de TODOS los LLMs modernos, 2017-2026) repite el mismo bloque N veces (32 capas en un 7B, 80+ en un 70B). Cada bloque hace dos cosas:

1. Self-attention

Para cada token, calcula a qué OTROS tokens del contexto debería "atender" para producir su siguiente representación. Es esencialmente una suma ponderada: "este token toma el 30% de contexto del token 5, 25% del token 12, 10% del 47...". Los pesos se calculan con tres matrices aprendidas (Q, K, V).

Por qué importa: es lo que permite que un modelo "razone" sobre relaciones entre conceptos lejanos en el texto. Es también lo que cuesta caro: el coste computacional crece O(n²) con la longitud de contexto. Un prompt de 2× tokens cuesta 4× compute en attention. Por eso los modelos con 1M+ contexto (Gemini, Kimi) usan tricks como sliding window attention o sparse attention.

2. Feed-forward network (FFN / MLP)

Una red neuronal pequeña (típicamente 4× más ancha que el embedding) que se aplica a cada token independientemente. Aquí es donde se almacena la mayoría del "conocimiento" del modelo — los hechos, las asociaciones, los patrones aprendidos de internet.

Por qué importa: es por qué los modelos pueden completar "La capital de Francia es ___" sin haber visto exactamente esa frase: el FFN tiene aprendido que "Francia" + "capital" → distribución que pone "París" en top-1.

Repite ambas operaciones N veces. Tras la última capa, una proyección final convierte el vector del último token en una distribución de probabilidad sobre el vocabulario entero (típicamente 32K-256K tokens). El sampler elige el siguiente token según esa distribución.

Las tres fases del entrenamiento

Fase	Objetivo	Datos	Coste	Qué se aprende
Pre-training	Predecir siguiente token	10-30 trillones de tokens (web, libros, código)	$10M-$500M+	Gramática, hechos, razonamiento básico, patrones
Post-training (SFT)	Seguir instrucciones	~1M conversaciones humano-asistente	$10K-$1M	Cómo responder a preguntas, formato útil
RLHF / DPO / Constitutional AI	Alineación con preferencias	~100K-1M juicios humanos sobre pares de respuestas	$100K-$10M	Ser útil, honesto, inofensivo

💡 Por qué importa para prompt engineering: el pre-training enseña qué sabe el modelo. El post-training enseña cómo responde. Si pides al modelo algo que no está en su pre-training, no hay prompt que lo arregle — necesitas RAG o fine-tuning. Si lo que pides está en pre-training pero el modelo no lo accede bien, prompt engineering puede sacarlo. Distinguir entre los dos casos es la habilidad clave del profesional 2026.

Sampling: del logits al token

Tras la última capa, el modelo emite un vector de logits (uno por cada token del vocabulario). El sampler los convierte en una decisión:

Softmax con temperature: divide logits por T y aplica softmax → probabilidades.
Filtrado top_p / top_k: recorta tokens improbables.
Muestreo: elige aleatoriamente según las probabilidades restantes (o el más probable si T=0).

Esa elección del siguiente token se concatena al input y vuelta a empezar. Por eso es autoregresivo: cada nueva predicción condiciona la siguiente.

Thinking / reasoning tokens (2024+)

Los modelos modernos (Claude Opus, GPT-5, Gemini 2.5, DeepSeek R2) tienen una fase EXTRA antes de la respuesta visible: generan tokens de "razonamiento interno" que el usuario no ve. Esos tokens son texto normal generado autoregresivamente — el modelo se está hablando a sí mismo, planificando, verificando.

Mecánicamente no es magia: es el mismo transformer generando tokens, solo que están envueltos en tags especiales (<thinking>…</thinking> en Claude) que el SDK filtra antes de devolver al cliente.

Por qué funciona: el modelo escribe su razonamiento intermedio, lo "lee" en attention en pasos siguientes, y eso le permite razonar multi-step de forma robusta. Es Chain-of-Thought hecho explícito y entrenado.

Coste: esos tokens internos se facturan. Con effort=high un Claude Opus puede consumir 10-30K tokens de thinking por respuesta. Activar reasoning sube calidad 5-30% en problemas complejos a cambio de 5-50× latencia y coste.

Lo que NO es un LLM

Para tomar buenas decisiones, conviene saber qué un LLM no es:

No es una base de datos. No "consulta" información — la regenera. Por eso alucina: si la red no tiene un patrón claro para tu pregunta, inventa uno plausible.
No tiene estado entre llamadas. Cada API call es independiente. La "memoria" en un chatbot es el historial que tú reenvías cada vez.
No razona simbólicamente. Cuando "razona", lo hace generando texto que parece razonamiento. A veces falla en aritmética básica que cualquier calculadora resuelve trivialmente.
No tiene intuición espacial nativa. Trabaja con secuencias, no con imágenes mentales. Cuando un VLM "ve" una imagen, la convierte en tokens visuales y los procesa con los mismos transformers.
No es determinista por defecto. Mismo input → distinto output salvo que fijes temperature=0 + seed + misma versión del modelo.
No "entiende" en el sentido fuerte. Es un simulador estadístico extremadamente sofisticado. Si eso es "entender" o no, es debate filosófico abierto. En la práctica: trátalo como herramienta probabilística, no como persona.

Before diving into prompt engineering, it pays to understand the machinery. Not to derive the equations — read Vaswani 2017 for that — but so that the design decisions you make in production are mechanically grounded, not superstition.

From characters to tokens

An LLM doesn't process letters or words: it processes tokens, statistically useful sub-words learned during training. The tokenizer (byte-level BPE in Claude/GPT/Llama, SentencePiece in Gemma/Qwen) breaks text into pieces the model knows.

"prompt engineering" → ["prompt", " engineering"]    (2 tokens)
"prompt engineering."→ ["prompt", " engineering", "."] (3 tokens)
"propmt engineering" → ["pro", "p", "mt", " engineering"] (4 tokens, typo costs more)
"反对prompt engineering" → ["反", "对", "prompt", " engineering"] (4 tokens, CJK 1 char/token)
"🚀 prompt"          → ["🚀", " prompt"]           (5 encoded bytes, ~3 tokens)

Practical implication: Latin-alphabet languages run ~4 chars/token, code ~2.5 chars/token, CJK 1.5-2 chars/token, emojis and rare characters can cost 3-5 tokens each. Your bill pays the real cost, not the "intuitive" cost.

Embedding: from token to vector

Each token becomes a dense vector of dimension d (typically 4096 in 7-13B models, 8192-12288 in 70-405B models). That vector encodes the token's "meaning" learned during pre-training: tokens with similar semantic use (king/queen, run/jog) end up close in that vector space.

Your full prompt becomes a (token_count × d) matrix of floats before anything else happens. That tensor is what the transformer manipulates.

The transformer: attention + FFN repeated N times

A decoder transformer (the architecture behind EVERY modern LLM, 2017-2026) repeats the same block N times (32 layers in a 7B, 80+ in a 70B). Each block does two things:

1. Self-attention

For each token, it computes which OTHER tokens in the context it should "attend" to in producing its next representation. It's essentially a weighted sum: "this token takes 30% context from token 5, 25% from token 12, 10% from token 47...". The weights are computed with three learned matrices (Q, K, V).

Why it matters: this is what lets a model "reason" about relationships between distant concepts in the text. It's also what costs a lot: compute scales O(n²) with context length. A 2× longer prompt costs 4× attention compute. That's why 1M+ context models (Gemini, Kimi) use tricks like sliding window or sparse attention.

2. Feed-forward network (FFN / MLP)

A small neural network (typically 4× wider than the embedding) applied to each token independently. This is where most of the model's "knowledge" is stored — facts, associations, patterns learned from the internet.

Why it matters: it's why models can complete "The capital of France is ___" without having seen exactly that sentence: the FFN learned that "France" + "capital" → distribution that puts "Paris" at top-1.

Repeat both operations N times. After the last layer, a final projection turns the last token's vector into a probability distribution over the entire vocabulary (typically 32K-256K tokens). The sampler picks the next token according to that distribution.

The three training phases

Phase	Goal	Data	Cost	What's learned
Pre-training	Predict next token	10-30 trillion tokens (web, books, code)	$10M-$500M+	Grammar, facts, basic reasoning, patterns
Post-training (SFT)	Follow instructions	~1M human-assistant conversations	$10K-$1M	How to answer questions, useful format
RLHF / DPO / Constitutional AI	Preference alignment	~100K-1M human judgements over response pairs	$100K-$10M	Be helpful, honest, harmless

💡 Why this matters for prompt engineering: pre-training teaches what the model knows. Post-training teaches how it answers. If you ask the model something that's not in pre-training, no prompt fixes it — you need RAG or fine-tuning. If what you ask IS in pre-training but the model doesn't access it well, prompt engineering can extract it. Distinguishing those two cases is the key 2026 skill.

Sampling: from logits to token

After the last layer, the model emits a logits vector (one entry per vocabulary token). The sampler turns those into a decision:

Softmax with temperature: divides logits by T and applies softmax → probabilities.
Top_p / top_k filtering: trims unlikely tokens.
Sampling: draws randomly per the remaining probabilities (or picks the most likely if T=0).

That next-token choice is concatenated to the input and the loop restarts. That's why it's autoregressive: each new prediction conditions the next.

Thinking / reasoning tokens (2024+)

Modern models (Claude Opus, GPT-5, Gemini 2.5, DeepSeek R2) have an EXTRA phase before the visible answer: they generate "internal reasoning" tokens the user doesn't see. Those tokens are normal text generated autoregressively — the model is talking to itself, planning, verifying.

Mechanically it's not magic: same transformer generating tokens, just wrapped in special tags (<thinking>…</thinking> in Claude) that the SDK strips before returning to the client.

Why it works: the model writes its intermediate reasoning, "reads" it via attention in subsequent steps, and that lets it reason multi-step robustly. It's explicit, trained Chain-of-Thought.

Cost: those internal tokens are billed. With effort=high a Claude Opus can burn 10-30K thinking tokens per answer. Enabling reasoning lifts quality 5-30% on hard problems in exchange for 5-50× latency and cost.

What an LLM is NOT

To make good decisions, it's worth knowing what an LLM isn't:

It's not a database. It doesn't "look up" information — it regenerates it. That's why it hallucinates: if the network has no clear pattern for your question, it invents a plausible one.
It has no state between calls. Each API call is independent. The "memory" in a chatbot is the history you re-send every time.
It doesn't reason symbolically. When it "reasons", it does so by generating text that looks like reasoning. It sometimes fails on basic arithmetic that any calculator solves trivially.
It has no native spatial intuition. It works on sequences, not mental images. When a VLM "sees" an image, it converts it to visual tokens and processes them with the same transformers.
It's not deterministic by default. Same input → different output unless you fix temperature=0 + seed + same model version.
It doesn't "understand" in the strong sense. It's an extremely sophisticated statistical simulator. Whether that's "understanding" is open philosophical debate. In practice: treat it as a probabilistic tool, not a person.

1.1 ¿Qué es un LLM?

Large Language Models — El cerebro detrás de la IA generativa

🔬 ¿Cómo funciona realmente?

Un LLM recibe una secuencia de tokens (sub-palabras) y predice el siguiente token más probable. Genera texto de forma autoregresiva, token por token. No "piensa" como un humano — es un simulador estadístico entrenado con cantidades masivas de texto.

En 2026, los modelos han evolucionado para incorporar razonamiento interno (thinking tokens), uso de herramientas, y visión.

Modelos Frontier (2026)

El curso cubre el ecosistema completo: cloud propietario (US + China), open-weights y modelos locales. Precios orientativos a inicios de 2026, varían por región y volumen.

Modelo	Empresa	Contexto	Precio (1M tok in)	Fortaleza
Claude Opus 4.7	Anthropic 🇺🇸	200K	$15	Razonamiento profundo, agentes
GPT-5.5	OpenAI 🇺🇸	256K	$3.75	Generalista, ecosistema
Gemini 2.5 Pro	Google 🇺🇸	1M+	$1.25	Contexto masivo, multimodal
DeepSeek V4 Pro	DeepSeek 🇨🇳	128K	~$0.50	30× más barato, coding veloz
Kimi K2	Moonshot 🇨🇳	2M	~$0.60	Contexto enorme, agentes nativos
MiniMax M2	MiniMax 🇨🇳	1M	~$0.30	Multimodal eficiente, vídeo
Qwen3-Max	Alibaba 🇨🇳	262K	~$0.80	Multilingüe, open-weights primos
GLM-5	Zhipu 🇨🇳	128K	~$0.40	Agentes web, coding, function-calling
Grok 4	xAI 🇺🇸	256K	$3	Razonamiento, datos en tiempo real (X)
Mistral Large 3	Mistral 🇪🇺	256K	$2	Eficiencia europea, multilingüe

Modelos Open-Weights / Locales (2026)

Familia	Tamaños	Empresa	Notas
Llama 4	8B / 70B / 405B	Meta	Open weights, multimodal nativo
Qwen 3	0.5B / 7B / 14B / 32B / 72B	Alibaba	Apache 2.0, top en multilingüe + coding
Gemma 3	1B / 4B / 12B / 27B	Google	On-device first, eficiente, <1.5 GB Q4
Mistral / Codestral	7B / 22B / Large	Mistral	Apache 2.0, fuerte en coding
DeepSeek V3 / Coder V4	16B-MoE / 671B-MoE	DeepSeek	MIT, MoE eficiente, coding SOTA OSS
Phi-4	3.8B / 14B	Microsoft	Pequeño + razonamiento, edge devices
Yi-Lightning / Yi-2	9B / 34B	01.AI	Multilingüe ZH/EN, contexto largo
Command R+	104B	Cohere	RAG nativo, function calling robusto

Parámetros Clave de Configuración

Cada API usa un nombre ligeramente distinto, pero los conceptos son universales. Esta es la lista práctica de parámetros que tocas en el 95% de los casos.

🌡️ temperature — aleatoriedad del muestreo

Controla cuánta aleatoriedad introduce el sampler al elegir el siguiente token. Internamente divide los logits por T antes del softmax: T baja agudiza la distribución (el token top siempre gana), T alta la aplana (todos los tokens viables tienen oportunidad).

Rango típico: 0.0 – 2.0 (la mayoría de APIs limitan a 2.0; valores >1.5 producen alucinación e incoherencia).

Cuándo usar qué:

0.0 — Determinista. Mismo input → mismo output. Code generation, extracción de datos, clasificación, evals reproducibles.
0.2 – 0.4 — Casi determinista con un poco de variedad. Refactoring, traducción, summarization fiel.
0.5 – 0.7 — Equilibrio. Conversación general, asistente, tutorial. Es el default sensato.
0.8 – 1.0 — Creativo. Brainstorming, copywriting, ficción, generación de variantes.
1.0 – 1.5 — Muy aleatorio. Solo casos donde quieres explorar el espacio de salida (p.ej. self-consistency CoT).

🎯 top_p — nucleus sampling

Recorta la cola de la distribución: en cada paso solo se consideran los tokens cuyas probabilidades acumuladas suman p. Si top_p=0.9, descartamos el 10% menos probable de cada decisión. Es complementario a temperature — Anthropic recomienda tocar uno u otro, no ambos a la vez.

Rango típico: 0.7 – 1.0. Por debajo de 0.5 el modelo se vuelve repetitivo.

Defaults sensatos: top_p=1.0 y subir temperature, o temperature=1.0 y bajar top_p.

🔝 top_k — corte por número de candidatos

Solo considera los k tokens más probables en cada paso. Más burdo que top_p (no se adapta a la forma de la distribución). Útil cuando quieres garantizar diversidad acotada. Algunos APIs (Google, DeepSeek) lo exponen; Anthropic no.

Rango: 1 – 50. k=1 es greedy (equivalente a temperature=0).

📏 max_tokens — tope de salida

Número máximo de tokens que el modelo puede generar antes de detenerse forzosamente. Es un cap duro, no una sugerencia. Si el modelo iba a producir más, su respuesta se trunca abruptamente (a menudo dejando JSON malformado).

Reglas prácticas:

Respuesta corta esperada: 256 – 1024.
Respuesta media (artículo, función completa): 2048 – 4096.
Respuesta larga (código grande, ensayo): 8192 – 16384.
Tareas con thinking activo: subir a 32k – 64k (los thinking tokens cuentan).

Cap por modelo: Claude 4.x 64k, GPT-5 16k, Gemini 2.5 Pro 64k, DeepSeek 8k.

🛑 stop_sequences / stop — paradas explícitas

Lista de strings que, si aparecen en la salida, hacen que el modelo se detenga inmediatamente. Útil cuando estás haciendo few-shot con un delimitador propio (###, END, </answer>) y no quieres que el modelo continúe alucinando un siguiente ejemplo.

Hasta 4 secuencias en la mayoría de APIs. La secuencia consumida NO aparece en la respuesta final.

🔁 frequency_penalty y presence_penalty (OpenAI / DeepSeek)

Reducen la probabilidad de que el modelo repita tokens.

frequency_penalty — penaliza tokens proporcionalmente a cuántas veces ya han aparecido (más apariciones, más penalty). Combate bucles literales.
presence_penalty — penaliza cualquier token que ya haya aparecido al menos una vez (penalty plana). Empuja a explorar vocabulario nuevo.

Rango: -2.0 – 2.0. Default 0.0. Subir a 0.3 – 0.7 si ves repeticiones; pasar de 1.0 introduce ruido. Anthropic no los expone (Claude tiene anti-repetición interna).

🎲 seed — reproducibilidad

Fija el RNG del sampler. Mismo seed + mismos parámetros + misma versión del modelo → misma salida (con asterisco: la mayoría de proveedores no garantizan determinismo bit-exact entre versiones de infraestructura).

Crítico para evals automatizados, debugging de prompts y reproducibilidad de papers. OpenAI y DeepSeek lo exponen; Anthropic no.

🧠 thinking / reasoning_effort — razonamiento interno

Activa los tokens de razonamiento internos antes de la respuesta visible. El modelo razona "para sí mismo" durante segundos o minutos, y luego entrega la respuesta final más sólida. Estos tokens se facturan.

API	Parámetro	Valores
Claude	`thinking={"type":"adaptive"}` + `output_config.effort`	`low / medium / high / xhigh / max`
OpenAI (o-series, GPT-5)	`reasoning_effort`	`minimal / low / medium / high`
Gemini 2.5	`thinking_config.thinking_budget`	`0 – 24576` (tokens) o `-1` (auto)
DeepSeek R / V4	`thinking={"type":"enabled"}`	`on / off`
Qwen 3	`enable_thinking`	`true / false`

Activar thinking sube calidad 5-30% en tareas multi-step (matemáticas, debugging, planificación) pero multiplica latencia 5-50× y coste por la misma proporción.

📦 response_format — formato estructurado

Fuerza al modelo a emitir output que cumpla un esquema determinado.

{"type": "text"} — Default. Texto libre.
{"type": "json_object"} — Garantiza JSON válido en la salida (OpenAI, DeepSeek, Mistral, GLM). Formato libre.
{"type": "json_schema", "schema": {...}} — JSON estricto que cumple el schema (OpenAI, Gemini "controlled generation"). El modelo NO puede desviarse.
Anthropic no expone response_format directamente; usa tool_use con input_schema para el mismo efecto.

🛠️ tools y tool_choice — function calling

Lista de herramientas que el modelo puede invocar (definidas con nombre, descripción y schema de parámetros).

tool_choice="auto" — El modelo decide si llama a alguna o responde directamente (default).
tool_choice="required" / "any" — Forzar llamada a alguna tool.
tool_choice={"type":"tool","name":"X"} — Forzar tool específica.
tool_choice="none" — Prohibir tools, responder con texto solo.

💾 cache_control — prompt caching (Anthropic)

Marca bloques del prompt como cacheables. La primera llamada paga 1.25× el precio normal de input para escribirlos en caché; las siguientes pagan 0.10× para leerlos. TTL típico 5 min (extensible a 1 h con "ttl": "1h"). Critical para system prompts largos y RAG con docs repetidos.

🌐 system / mensaje system — instrucciones globales

El system prompt tiene prioridad máxima sobre los mensajes del usuario. Es donde defines rol, reglas, formato esperado, restricciones de seguridad. NO va dentro de messages[] en Anthropic (es un parámetro top-level system); en OpenAI/DeepSeek va como primer mensaje con role: "system".

⚠️ Defaults son tu enemigo. Si dejas todos los parámetros sin especificar, cada modelo elige unos diferentes (Claude usa temperature=1.0, OpenAI 1.0, DeepSeek 1.0, Gemini 1.0 pero con top_p=0.95 y top_k=64). Para producción, especifica explícitamente temperature, max_tokens y al menos uno de top_p/top_k.

🔬 How does it actually work?

An LLM receives a sequence of tokens (sub-words) and predicts the next most likely token. It generates text autoregressively, token by token. It does not "think" like a human — it's a statistical simulator trained on massive amounts of text.

By 2026, models have evolved to incorporate internal reasoning (thinking tokens), tool use, and vision.

Frontier Models (2026)

The course covers the full ecosystem: proprietary cloud (US + China), open-weights, and local models. Prices indicative as of early 2026 and vary by region and volume.

Model	Company	Context	Price (1M tok in)	Strength
Claude Opus 4.7	Anthropic 🇺🇸	200K	$15	Deep reasoning, agents
GPT-5.5	OpenAI 🇺🇸	256K	$3.75	Generalist, ecosystem
Gemini 2.5 Pro	Google 🇺🇸	1M+	$1.25	Massive context, multimodal
DeepSeek V4 Pro	DeepSeek 🇨🇳	128K	~$0.50	30× cheaper, fast coding
Kimi K2	Moonshot 🇨🇳	2M	~$0.60	Massive context, native agents
MiniMax M2	MiniMax 🇨🇳	1M	~$0.30	Efficient multimodal, video
Qwen3-Max	Alibaba 🇨🇳	262K	~$0.80	Multilingual, open-weights cousins
GLM-5	Zhipu 🇨🇳	128K	~$0.40	Web agents, coding, function-calling
Grok 4	xAI 🇺🇸	256K	$3	Reasoning, real-time data (X)
Mistral Large 3	Mistral 🇪🇺	256K	$2	European efficiency, multilingual

Open-Weights / Local Models (2026)

Family	Sizes	Company	Notes
Llama 4	8B / 70B / 405B	Meta	Open weights, native multimodal
Qwen 3	0.5B / 7B / 14B / 32B / 72B	Alibaba	Apache 2.0, top in multilingual + coding
Gemma 3	1B / 4B / 12B / 27B	Google	On-device first, efficient, <1.5 GB Q4
Mistral / Codestral	7B / 22B / Large	Mistral	Apache 2.0, strong at coding
DeepSeek V3 / Coder V4	16B-MoE / 671B-MoE	DeepSeek	MIT, efficient MoE, OSS coding SOTA
Phi-4	3.8B / 14B	Microsoft	Small + reasoning, edge devices
Yi-Lightning / Yi-2	9B / 34B	01.AI	ZH/EN multilingual, long context
Command R+	104B	Cohere	Native RAG, robust function calling

Key Configuration Parameters

Each API uses slightly different names, but the concepts are universal. This is the practical list of parameters you'll touch in 95% of cases.

🌡️ temperature — sampling randomness

Controls how much randomness the sampler injects when picking the next token. Internally it divides the logits by T before softmax: low T sharpens the distribution (top token always wins), high T flattens it (every viable token gets a chance).

Typical range: 0.0 – 2.0 (most APIs cap at 2.0; values >1.5 produce hallucination and incoherence).

When to use what:

0.0 — Deterministic. Same input → same output. Code generation, data extraction, classification, reproducible evals.
0.2 – 0.4 — Almost deterministic with a touch of variety. Refactoring, faithful translation, summarization.
0.5 – 0.7 — Balanced. General conversation, assistant, tutorial. The sensible default.
0.8 – 1.0 — Creative. Brainstorming, copywriting, fiction, generating variants.
1.0 – 1.5 — Very random. Only when you want to explore the output space (e.g. self-consistency CoT).

🎯 top_p — nucleus sampling

Trims the tail of the distribution: at each step only tokens whose cumulative probabilities sum to p are considered. With top_p=0.9, you discard the 10% least likely tokens at every decision. It's complementary to temperature — Anthropic recommends tuning one or the other, not both at once.

Typical range: 0.7 – 1.0. Below 0.5 the model becomes repetitive.

Sensible defaults: top_p=1.0 and raise temperature, or temperature=1.0 and lower top_p.

🔝 top_k — candidate count cutoff

Only considers the k most likely tokens at each step. Cruder than top_p (it doesn't adapt to the shape of the distribution). Useful when you want bounded diversity guaranteed. Some APIs (Google, DeepSeek) expose it; Anthropic doesn't.

Range: 1 – 50. k=1 is greedy (equivalent to temperature=0).

📏 max_tokens — output cap

Maximum number of tokens the model can generate before being forcibly stopped. It's a hard cap, not a hint. If the model intended to produce more, the response truncates abruptly (often leaving malformed JSON).

Practical rules:

Short response expected: 256 – 1024.
Medium response (article, full function): 2048 – 4096.
Long response (large code, essay): 8192 – 16384.
Tasks with thinking enabled: raise to 32k – 64k (thinking tokens count).

Per-model caps: Claude 4.x 64k, GPT-5 16k, Gemini 2.5 Pro 64k, DeepSeek 8k.

🛑 stop_sequences / stop — explicit halts

List of strings that, if they appear in the output, make the model stop immediately. Useful when you're doing few-shot with your own delimiter (###, END, </answer>) and don't want the model to keep hallucinating a next example.

Up to 4 sequences in most APIs. The consumed sequence is NOT included in the final response.

🔁 frequency_penalty and presence_penalty (OpenAI / DeepSeek)

Reduce the probability of the model repeating tokens.

frequency_penalty — penalizes tokens proportionally to how many times they've appeared (more occurrences, more penalty). Fights literal loops.
presence_penalty — penalizes any token that appeared at least once (flat penalty). Pushes the model to explore new vocabulary.

Range: -2.0 – 2.0. Default 0.0. Raise to 0.3 – 0.7 if you see repetitions; above 1.0 introduces noise. Anthropic doesn't expose them (Claude has internal anti-repetition).

🎲 seed — reproducibility

Fixes the sampler RNG. Same seed + same parameters + same model version → same output (asterisk: most providers don't guarantee bit-exact determinism across infra versions).

Critical for automated evals, prompt debugging, and paper reproducibility. OpenAI and DeepSeek expose it; Anthropic doesn't.

🧠 thinking / reasoning_effort — internal reasoning

Enables internal reasoning tokens before the visible answer. The model thinks "to itself" for seconds or minutes, then delivers a stronger final answer. These tokens are billed.

API	Parameter	Values
Claude	`thinking={"type":"adaptive"}` + `output_config.effort`	`low / medium / high / xhigh / max`
OpenAI (o-series, GPT-5)	`reasoning_effort`	`minimal / low / medium / high`
Gemini 2.5	`thinking_config.thinking_budget`	`0 – 24576` (tokens) or `-1` (auto)
DeepSeek R / V4	`thinking={"type":"enabled"}`	`on / off`
Qwen 3	`enable_thinking`	`true / false`

Enabling thinking lifts quality 5-30% on multi-step tasks (math, debugging, planning) but multiplies latency 5-50× and cost proportionally.

📦 response_format — structured format

Forces the model to emit output that conforms to a given schema.

{"type": "text"} — Default. Free text.
{"type": "json_object"} — Guarantees valid JSON in the output (OpenAI, DeepSeek, Mistral, GLM). Free form.
{"type": "json_schema", "schema": {...}} — Strict JSON conforming to the schema (OpenAI, Gemini "controlled generation"). The model CANNOT deviate.
Anthropic doesn't expose response_format directly; use tool_use with input_schema for the same effect.

🛠️ tools and tool_choice — function calling

List of tools the model can invoke (defined with name, description, and parameter schema).

tool_choice="auto" — The model decides whether to call any tool or answer directly (default).
tool_choice="required" / "any" — Force a tool call.
tool_choice={"type":"tool","name":"X"} — Force a specific tool.
tool_choice="none" — Forbid tools, text-only response.

💾 cache_control — prompt caching (Anthropic)

Marks blocks of the prompt as cacheable. The first call pays 1.25× the normal input price to write to cache; subsequent calls pay 0.10× to read it. Typical TTL 5 min (extensible to 1 h with "ttl": "1h"). Critical for long system prompts and RAG with repeated docs.

🌐 system / system message — global instructions

The system prompt has top priority over user messages. It's where you define role, rules, expected format, safety constraints. It does NOT go inside messages[] in Anthropic (it's a top-level system parameter); in OpenAI/DeepSeek it goes as the first message with role: "system".

⚠️ Defaults are your enemy. If you leave all parameters unspecified, each model picks different ones (Claude uses temperature=1.0, OpenAI 1.0, DeepSeek 1.0, Gemini 1.0 but with top_p=0.95 and top_k=64). For production, explicitly set temperature, max_tokens and at least one of top_p/top_k.

12.1 Tipos de IA y Modelos

No toda la IA es igual — conoce las categorías y qué modelo usar para cada tarea

🧬 Taxonomía de Modelos de IA (2026)

Categoría	Qué hace	Ejemplos
LLM (Large Language Model)	Texto → Texto. Genera, resume, traduce, razona.	Claude Opus, DeepSeek V4, GPT-5
VLM (Vision Language Model)	Imagen + Texto → Texto. Describe, analiza imágenes.	Claude Vision, GPT-5V, Gemini 2.5
Embedding Model	Texto → Vector numérico. Búsqueda semántica.	text-embedding-3, Cohere Embed v4, BGE-M3
Diffusion Model	Texto → Imagen. Genera imágenes desde descripciones.	DALL-E 4, Stable Diffusion 3.5, Midjourney V7
Audio/Speech Model	Audio ↔ Texto. Transcribe, sintetiza voz.	Whisper V4, ElevenLabs, OpenAI TTS
Code Model	Especializado en generación de código.	Claude Code, Codex, DeepSeek Coder V4
Reward Model	Evalúa calidad de respuestas (para RLHF).	Interno de Anthropic/OpenAI

Niveles de IA (Conceptual)

Nivel	Descripción	Estado actual
ANI (Narrow AI)	Especializada en una tarea concreta	✅ Actualidad — todos los LLMs
AGI (General AI)	Igual o superior al humano en cualquier tarea intelectual	🔬 En investigación — debate abierto
ASI (Super AI)	Supera a la humanidad en todos los dominios	📚 Teórico — no existe

💡 En la práctica: Aunque los LLMs modernos parecen "inteligentes", siguen siendo ANI — modelos estadísticos especializados en lenguaje, no entidades conscientes.

🧬 AI Model Taxonomy (2026)

Category	What it does	Examples
LLM (Large Language Model)	Text → Text. Generate, summarize, translate, reason.	Claude Opus, DeepSeek V4, GPT-5
VLM (Vision Language Model)	Image + Text → Text. Describe, analyze images.	Claude Vision, GPT-5V, Gemini 2.5
Embedding Model	Text → Numeric vector. Semantic search.	text-embedding-3, Cohere Embed v4, BGE-M3
Diffusion Model	Text → Image. Generate images from descriptions.	DALL-E 4, Stable Diffusion 3.5, Midjourney V7
Audio/Speech Model	Audio ↔ Text. Transcribe, synthesize voice.	Whisper V4, ElevenLabs, OpenAI TTS
Code Model	Specialized in code generation.	Claude Code, Codex, DeepSeek Coder V4
Reward Model	Scores response quality (for RLHF).	Internal at Anthropic/OpenAI

AI Levels (Conceptual)

Level	Description	Current state
ANI (Narrow AI)	Specialized in one specific task	✅ Today — all current LLMs
AGI (General AI)	Equal or better than humans on any intellectual task	🔬 Under research — open debate
ASI (Super AI)	Surpasses humanity in all domains	📚 Theoretical — does not exist

💡 In practice: Although modern LLMs feel "intelligent," they're still ANI — statistical models specialized in language, not conscious entities.

12.2 Online (API) vs Local LLMs

La decisión más importante de arquitectura: ¿nube o local?

☁️ API / Cloud

✅ Sin GPU: No necesitas hardware
✅ Modelos top: Acceso a Claude Opus, GPT-5
✅ Escala automática: El proveedor gestiona todo
✅ Siempre actualizado: Última versión del modelo
❌ Coste por token
❌ Latencia de red
❌ Datos salen de tu infraestructura
❌ Dependencia del proveedor

💻 Local / Self-hosted

✅ Privacidad total: Datos nunca salen
✅ Sin coste por token: Solo electricidad + HW
✅ Sin límites de rate: Tu hardware, tus reglas
✅ Personalizable: Fine-tuning sin restricciones
❌ Necesitas GPU(s) potentes
❌ Modelos más pequeños (7B-70B params)
❌ Tú gestionas actualizaciones
❌ Mayor latencia en hardware modesto

Herramientas para LLMs Locales

Herramienta	Descripción	Ideal para
Ollama	Ejecuta LLMs localmente con un comando	Desarrollo, prototyping
LM Studio	Interfaz gráfica para descargar y ejecutar modelos	No-técnicos, experimentación
llama.cpp	Inferencia en CPU/GPU con cuantización	Máxima eficiencia, edge devices
vLLM	Serving de alto rendimiento para producción	APIs, producción
Text Generation Inference	Serving optimizado de Hugging Face	Producción con ecosistema HF
SGLang	Serving con structured outputs y constrained decoding	JSON Schema, function calling local
MLX (Apple)	Framework nativo para chips M-series con Unified Memory	Mac M2/M3/M4 (alta eficiencia memoria)
llamafile (Mozilla)	Modelo + runtime en un único archivo ejecutable	Distribución sencilla, sin instalación

Modelos Locales Recomendados por Hardware (2026)

Hardware	Modelos top que caben	Notas
CPU + 8 GB RAM	Gemma 3 1B/4B Q4, Phi-4-mini Q4, Llama 3.2 3B Q4	1-3 tok/s, ok para tareas simples
CPU + 16 GB RAM	Gemma 3 12B Q4, Qwen3-7B Q5, Mistral 7B Q5	3-6 tok/s en DDR4-3200
CPU + 32 GB RAM	Gemma 3 27B Q4, Qwen3-32B Q4, Llama 4 70B Q3	2-5 tok/s, calidad seria
GPU 8-12 GB (RTX 3060/4060/5060)	Qwen3-7B/14B Q4, Phi-4 14B Q5, Mistral 7B	30-50 tok/s
GPU 16-24 GB (RTX 4080/4090/5080/5090)	Qwen3-32B Q5, Llama 4 70B Q4, Codestral 22B Q5	20-40 tok/s, calidad cloud-equivalente
GPU 48 GB+ (RTX A6000, dual 4090)	Llama 4 70B Q6, Qwen3-72B Q5, DeepSeek V3 Q4 (parcial)	15-30 tok/s, frontier OSS local
Mac M2/M3/M4 Pro/Max (32-128 GB UMA)	Gemma 3 27B / Qwen3-72B / Llama 4 70B (Q4-Q5 con MLX)	10-30 tok/s, todo en VRAM unificada
Servidor 4× A100 80GB / H100	Llama 4 405B FP16, DeepSeek V3 671B-MoE, Qwen3-72B FP16	50-200 tok/s con vLLM, frontier OSS al máximo

💡 Punto dulce 2026 para uso personal: RTX 4090/5090 (24-32 GB VRAM) + Qwen3-32B Q5 o Llama 4 70B Q4 via Ollama. Calidad ~85% de Claude Sonnet en la mayoría de tareas, latencia local, privacidad total. Coste hardware amortizado en ~6-12 meses si sustituye API.

☁️ API / Cloud

✅ No GPU: No hardware needed
✅ Top models: Access to Claude Opus, GPT-5
✅ Auto-scaling: Provider handles everything
✅ Always up to date: Latest model version
❌ Cost per token
❌ Network latency
❌ Data leaves your infrastructure
❌ Vendor dependency

💻 Local / Self-hosted

✅ Total privacy: Data never leaves
✅ No per-token cost: Just electricity + HW
✅ No rate limits: Your hardware, your rules
✅ Customizable: Fine-tuning without restrictions
❌ Need powerful GPU(s)
❌ Smaller models (7B-70B params)
❌ You manage updates
❌ Higher latency on modest hardware

Tools for Local LLMs

Tool	Description	Ideal for
Ollama	Run LLMs locally with one command	Development, prototyping
LM Studio	GUI to download and run models	Non-technical users, experimentation
llama.cpp	CPU/GPU inference with quantization	Maximum efficiency, edge devices
vLLM	High-performance serving for production	APIs, production
Text Generation Inference	Hugging Face optimized serving	Production with HF ecosystem
SGLang	Serving with structured outputs and constrained decoding	JSON Schema, local function calling
MLX (Apple)	Native framework for M-series chips with Unified Memory	Mac M2/M3/M4 (high memory efficiency)
llamafile (Mozilla)	Model + runtime in a single executable file	Easy distribution, no install

Recommended Local Models by Hardware (2026)

Hardware	Top models that fit	Notes
CPU + 8 GB RAM	Gemma 3 1B/4B Q4, Phi-4-mini Q4, Llama 3.2 3B Q4	1-3 tok/s, OK for simple tasks
CPU + 16 GB RAM	Gemma 3 12B Q4, Qwen3-7B Q5, Mistral 7B Q5	3-6 tok/s on DDR4-3200
CPU + 32 GB RAM	Gemma 3 27B Q4, Qwen3-32B Q4, Llama 4 70B Q3	2-5 tok/s, serious quality
GPU 8-12 GB (RTX 3060/4060/5060)	Qwen3-7B/14B Q4, Phi-4 14B Q5, Mistral 7B	30-50 tok/s
GPU 16-24 GB (RTX 4080/4090/5080/5090)	Qwen3-32B Q5, Llama 4 70B Q4, Codestral 22B Q5	20-40 tok/s, cloud-equivalent quality
GPU 48 GB+ (RTX A6000, dual 4090)	Llama 4 70B Q6, Qwen3-72B Q5, DeepSeek V3 Q4 (partial)	15-30 tok/s, local frontier OSS
Mac M2/M3/M4 Pro/Max (32-128 GB UMA)	Gemma 3 27B / Qwen3-72B / Llama 4 70B (Q4-Q5 via MLX)	10-30 tok/s, all in unified VRAM
Server 4× A100 80GB / H100	Llama 4 405B FP16, DeepSeek V3 671B-MoE, Qwen3-72B FP16	50-200 tok/s with vLLM, frontier OSS maxed

💡 2026 sweet spot for personal use: RTX 4090/5090 (24-32 GB VRAM) + Qwen3-32B Q5 or Llama 4 70B Q4 via Ollama. Quality ~85% of Claude Sonnet on most tasks, local latency, full privacy. Hardware cost amortized in ~6-12 months if it replaces API spend.

12.3 Especializaciones de Modelos

No todos los LLMs son generalistas — modelos entrenados para dominios específicos

Especialidad	Modelos	Características
Código	DeepSeek Coder V4, Claude Code, Codex, Qwen Coder	Fine-tuned en repositorios GitHub, entienden arquitectura de proyectos
Matemáticas	DeepSeek Math, Qwen Math, Llemma	Razonamiento formal, demostraciones, cálculo simbólico
Medicina	Med-PaLM 3, Claude (con system prompt médico)	Diagnóstico diferencial, literatura médica, terminología clínica
Legal	Claude (con role prompting legal), Harvey AI	Contratos, jurisprudencia, regulación, compliance
Finanzas	BloombergGPT, FinGPT	Análisis financiero, reporting, SEC filings
Multilingüe	DeepSeek V4, Aya Expanse, Tower	100+ idiomas, traducción, comprensión cross-lingual
Pequeños/Edge	Phi-4, Gemma 3, Llama 3.2 1B/3B	Ejecutan en móviles, IoT, sin conexión

💡 La mayoría de tareas no necesitan un modelo especializado. Un buen system prompt + Claude Opus 4.7 o DeepSeek V4 Pro cubre el 95% de casos. Usa especializados solo cuando necesites consistencia extrema en un dominio.

Specialty	Models	Characteristics
Code	DeepSeek Coder V4, Claude Code, Codex, Qwen Coder	Fine-tuned on GitHub repos, understand project architecture
Math	DeepSeek Math, Qwen Math, Llemma	Formal reasoning, proofs, symbolic computation
Medicine	Med-PaLM 3, Claude (with medical system prompt)	Differential diagnosis, medical literature, clinical terminology
Legal	Claude (with legal role prompting), Harvey AI	Contracts, case law, regulation, compliance
Finance	BloombergGPT, FinGPT	Financial analysis, reporting, SEC filings
Multilingual	DeepSeek V4, Aya Expanse, Tower	100+ languages, translation, cross-lingual understanding
Small/Edge	Phi-4, Gemma 3, Llama 3.2 1B/3B	Run on phones, IoT, offline

💡 Most tasks don't need a specialized model. A good system prompt + Claude Opus 4.7 or DeepSeek V4 Pro covers 95% of cases. Use specialized models only when you need extreme consistency in a domain.

12.4 Open Source vs Propietario

🔓 Open-Weights / Open Source

DeepSeek V3 / Coder V4 — MIT (DeepSeek)
Llama 4 — 8B/70B/405B (Meta, Llama Community License)
Qwen 3 — 0.5B-72B Apache 2.0 (Alibaba)
Qwen3-Coder / Qwen3-Math — Variantes especializadas
Gemma 3 — 1B-27B (Google, Gemma License)
Mistral / Codestral / Pixtral — Apache 2.0 (Mistral)
Phi-4 / Phi-4-mini — MIT (Microsoft)
GLM-4 — 9B-32B (Zhipu)
Yi-2 / Yi-Lightning — Apache 2.0 (01.AI)
Command R+ — CC-BY-NC (Cohere, comercial vía API)
OLMo 2 — Apache 2.0 + datos + recipe (AI2, único 100% OSS)
✅ Lo ejecutas donde quieras
✅ Fine-tuning sin restricciones
✅ Sin dependencia de un vendor
❌ Necesitas GPU para los grandes
❌ Frontier OSS sigue 5-15% detrás del top propietario

🔒 Propietario (Solo API)

Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 — Anthropic
GPT-5.5 / GPT-5 / GPT-5 Mini — OpenAI
Gemini 2.5 Pro / Flash — Google
Grok 4 — xAI
Kimi K2 — Moonshot (cloud only)
MiniMax M2 — MiniMax (cloud only)
Qwen3-Max — Alibaba (top tier solo cloud)
GLM-5 — Zhipu (top tier solo cloud)
✅ Rendimiento top-tier
✅ Sin gestión de infraestructura
✅ Actualizaciones automáticas
✅ Multimodal nativo (Gemini, GPT-5, Claude)
❌ Vendor lock-in
❌ Coste por token
❌ No puedes fine-tunear (salvo excepciones)
❌ Datos salen de tu infraestructura

⚠️ "Open Source" en IA: La mayoría de modelos "open source" solo publican los pesos (open weights), no el código de entrenamiento ni los datos. Pocos son verdaderamente open source — OLMo de AI2 publica pesos + datos + recipe + scripts (única familia 100% reproducible). Llama, Qwen, Gemma, Mistral, DeepSeek caen en "open weights" — pesos descargables, training data privado.

📊 OSS vs Propietario en 2026 — el gap se cierra

En benchmarks 2026, los frontier OSS top (DeepSeek V3 671B-MoE, Llama 4 405B, Qwen3-72B) están a ~5-15% del frontier propietario (Claude Opus, GPT-5, Gemini Pro). Para muchos casos prácticos OSS basta — la diferencia se nota más en tareas extremas (razonamiento profundo multi-step, agentes largos, ARC-AGI).

Reglas de pulgar 2026:

Si el caso es "uso interno + datos sensibles" → OSS gana siempre (privacidad).
Si el caso es "producto público, ROI > coste API" → propietario gana (calidad + actualizaciones).
Si el caso es "alto volumen, calidad media" → OSS self-hosted (Qwen3 / Llama 4) o cloud chino barato (DeepSeek / GLM / Kimi).

🔓 Open-Weights / Open Source

DeepSeek V3 / Coder V4 — MIT (DeepSeek)
Llama 4 — 8B/70B/405B (Meta, Llama Community License)
Qwen 3 — 0.5B-72B Apache 2.0 (Alibaba)
Qwen3-Coder / Qwen3-Math — Specialized variants
Gemma 3 — 1B-27B (Google, Gemma License)
Mistral / Codestral / Pixtral — Apache 2.0 (Mistral)
Phi-4 / Phi-4-mini — MIT (Microsoft)
GLM-4 — 9B-32B (Zhipu)
Yi-2 / Yi-Lightning — Apache 2.0 (01.AI)
Command R+ — CC-BY-NC (Cohere, commercial via API)
OLMo 2 — Apache 2.0 + data + recipe (AI2, the only 100% OSS)
✅ Run anywhere
✅ Fine-tuning without restrictions
✅ No vendor dependency
❌ Need a GPU for the big ones
❌ Frontier OSS still 5-15% behind top proprietary

🔒 Proprietary (API only)

Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 — Anthropic
GPT-5.5 / GPT-5 / GPT-5 Mini — OpenAI
Gemini 2.5 Pro / Flash — Google
Grok 4 — xAI
Kimi K2 — Moonshot (cloud only)
MiniMax M2 — MiniMax (cloud only)
Qwen3-Max — Alibaba (top tier cloud only)
GLM-5 — Zhipu (top tier cloud only)
✅ Top-tier performance
✅ No infrastructure management
✅ Automatic updates
✅ Native multimodal (Gemini, GPT-5, Claude)
❌ Vendor lock-in
❌ Cost per token
❌ No fine-tuning (with rare exceptions)
❌ Data leaves your infrastructure

⚠️ "Open Source" in AI: Most "open source" models only publish the weights (open weights), not the training code or data. Few are truly open source — AI2's OLMo publishes weights + data + recipe + scripts (only fully reproducible family). Llama, Qwen, Gemma, Mistral, DeepSeek fall under "open weights" — downloadable weights, private training data.

📊 OSS vs Proprietary in 2026 — the gap is closing

On 2026 benchmarks, top frontier OSS (DeepSeek V3 671B-MoE, Llama 4 405B, Qwen3-72B) sit ~5-15% behind frontier proprietary (Claude Opus, GPT-5, Gemini Pro). For many practical cases OSS is enough — the gap shows mostly on extreme tasks (deep multi-step reasoning, long-running agents, ARC-AGI).

2026 rules of thumb:

If the case is "internal use + sensitive data" → OSS always wins (privacy).
If the case is "public product, ROI > API cost" → proprietary wins (quality + updates).
If the case is "high volume, medium quality" → OSS self-hosted (Qwen3 / Llama 4) or cheap Chinese cloud (DeepSeek / GLM / Kimi).

📋 Knowledge check 📋 Knowledge check

Cinco preguntas para verificar tu comprensión del bloque. Cada respuesta se guarda en localStorage; puedes volver más tarde y recordar tu progreso.

Five questions to verify your understanding of the block. Each answer is saved in localStorage; you can return later and recall your progress.

Referencias

Documentación oficial y papers consultados durante este módulo. Revísalos para profundizar.