Posts
All the articles I've posted.
-
llm-concepts · 7 min read
Tool Use, Function Calling, and MCP: How a Chatbot Became an Agent
Tools turn a chatbot into an agent. What function calling actually is, why MCP changed the rules, and the loop that makes a model do work.
-
llm-concepts · 8 min read
Prompting and RAG: The Two Levers You Actually Pull
Most teams will never train a model, but they will spend a lot of time on prompts and retrieval. Here is what the practical 2026 stack actually looks like.
-
llm-concepts · 7 min read
Interpretability: What's Actually Inside
We can train a 70B model and watch it work. We mostly cannot explain why it works. Interpretability is the science trying to fix that.
-
llm-concepts · 7 min read
Benchmarks: How Labs Measure Intelligence (and the Games They Play)
Every model launch comes with a chart. The numbers look big. What benchmarks actually measure, what they miss, and how labs game them.
-
ai · 2 min read
AI Digest W18: A Frontier Launch Week
GPT-5.5 lands, DeepSeek V4 ships open weights at near-frontier quality, Gemini 3 Flash goes default, and ChatGPT grows arms.
-
llm-concepts · 7 min read
Hallucinations and Jailbreaks: The Two Ways LLMs Fail
LLMs produce confident wrong answers and can be tricked into ignoring safety rules. What is actually happening and why both failures are hard to fix.
-
llm-concepts · 8 min read
The 2026 Model Lineup: Who Ships What
A field guide to the 2026 frontier and open-weight model landscape, and a practical way to decide which model to actually pick.
-
llm-concepts · 7 min read
Multimodality: Teaching Models to See and Hear
A multimodal model is not many models in a trench coat. It is one transformer trained to treat pixels, audio, and text as the same kind of thing.
-
llm-concepts · 7 min read
Reasoning Models: Chain-of-Thought and Test-Time Compute
Reasoning models do not have a new architecture. They have a new training recipe and permission to think for longer before answering.