The previous article ended on a tease. Prompting and retrieval are two levers, and a third one was waiting. That third lever is tool use, the ability for the model to call functions, query databases, send messages, and run actions in the world.
This is the lever that turns a clever conversation into something that does work. Without it the model is a brilliant intern who can only talk. With it, the same model can finally pick up the phone, read your calendar, and write back with a draft.

WHAT TOOL USE ACTUALLY IS
In its simplest form, tool use is structured output dressed up as a verb.
You hand the model a list of tools at the start of the conversation. Each tool is a JSON description: a name, a one-line purpose, and the shape of its arguments. Think of it like a menu in a restaurant where the customer can only order dishes the kitchen actually makes.
When the user asks something like what is the weather in Berlin, the model does not invent the answer. It picks the matching tool from the list, fills in the arguments (city: Berlin, units: metric), and returns a structured object instead of free text. Your code reads that object, calls the actual API, and feeds the response back into the next turn.
That is the whole trick. The model writes the request, your code executes it, and the result re-enters the conversation. The model is no longer the only voice in the loop. It is one player among several.
THE 2023 TO 2026 EVOLUTION
This pattern did not exist at the start of the GPT era.
The first version, in mid-2023, was OpenAI’s function calling. You described your functions in JSON Schema, and a fine-tuned variant of GPT learned to emit function calls instead of prose when appropriate. It worked, but it was vendor-specific, brittle on edge cases, and limited to one function call per turn. Anyone who shipped a product on top of it remembers the glue code more vividly than the demo.
By 2024 the picture had broadened. Tool use replaced the older function calling term, parallel calls became the default (a single reasoning step might fan out to four lookups at once), and most major models, like Claude, GPT, Gemini, and the Llama family, supported it natively. The skill stopped being something OpenAI alone offered and became table stakes.
By 2026 most frontier labs train their models from scratch with tool use in the loop. The models do not just emit calls on demand. They decide when to call, plan multi-step tool sequences, and recover when a call fails. The tool layer feels less like an add-on and more like another form of language the model speaks fluently.

ENTER MCP
For two years, every team that wanted to give a model real tools had to write the glue twice. Once for OpenAI’s schema, once for Anthropic’s, sometimes a third time for Google. The situation rhymed with the early days of mobile apps, where the same product had to ship as separate iOS and Android codebases.
Each integration had its own auth, its own error shape, its own retry semantics. A small team building a chatbot with five tools ended up maintaining three forks of the same code, which is, to put it mildly, a pain.
Anthropic shipped the Model Context Protocol in late 2024 to settle this. MCP is, in one sentence, an open protocol that defines how a model talks to tools: what a tool description looks like, how arguments flow, how results return, how authentication works, and how a tool can stream progress back. It is roughly what HTTP did for documents, a boring layer of agreement that makes everything above it possible.
The point is not novelty. The point is that MCP is not Anthropic-specific. The spec is open, the SDKs are open source, and other model providers and tools have started to adopt it. A tool you write once becomes a tool any MCP-aware client can use, the same way a website written once works in any browser.
HOW MCP DIFFERS FROM PLUGINS
It is fair to ask, didn’t ChatGPT plugins try this in 2023?
They did, but they were one company’s storefront for one company’s chatbot, with one company’s approval queue. MCP flips that. The model is the client, the tools are servers, and anyone can run a server, on their laptop, behind their VPN, on a public host.
There is no marketplace gatekeeper. The discovery story is closer to paste a URL than submit for review.
That difference matters more than it looks. It is the difference between a walled mobile app store and the open web. One you visit. The other you wire up.
PUTTING IT TOGETHER
A modern agent is the three pieces compressed into one loop.
The model receives a prompt and the available tools. It reasons about what to do, decides whether to call a tool, calls it, reads the result, then decides if it wants more steps and continues until it can answer.
Reasoning models shine here, because the planning step is exactly what they got better at. Tool use plus chain-of-thought plus a server full of capabilities is not a chatbot anymore. It is an agent with hands.

WHERE THIS BREAKS
The failure modes are interesting, and most of them are not the model’s fault.
The model can invent a tool call that does not match the schema. Modern models do this rarely, but when it happens the application has to recover gracefully, retry, fall back. A tool can timeout in the middle of a long agent run, and the orchestration code must decide whether to tell the model and let it adapt or fail the whole turn.
A tool can return data the model has no way to interpret, a 500-page PDF for example, and now you need a smaller summarizer in the loop. None of these are exotic. They are, mostly, the same operational problems an old microservice architecture has, dressed in new vocabulary.
Then there is security. A tool that can write to a calendar will write the wrong thing if a clever prompt tricks the model. Prompt injection that hides in retrieved documents, which we covered last article, becomes more dangerous the moment the model can act.
Real systems put guardrails between the model and the tools: confirmation prompts for destructive actions, allow-listed domains, audit logs for every call. Treating tool use as a trust boundary is the price of admission.
THE PRACTICAL TAKEAWAY
If prompting is what to do, and retrieval is what you know, then tool use is what you can touch.
Most useful 2026 applications use all three. A coding assistant prompts the model to behave like a pair programmer, retrieves the right code from your repo, and uses tools to actually run tests, edit files, and open pull requests. A customer-support agent prompts for tone, retrieves your knowledge base, and uses tools to look up the order status and issue a refund.
The interesting product work in 2026 is not picking a single lever. It is wiring the three together so that each one does what it is good at and nothing else.
MCP is the wiring standard for that third lever. If you are building anything beyond a chat window in 2026, learning it pays back fast. Have you noticed how quickly the conversation in your team shifted from which model to which tools?
The next article changes the question from how do you give a model new behavior at runtime to how do you give it new behavior permanently, with fine-tuning and LoRA.
T.
References
- Function Calling and Tool Use (Anthropic Documentation) - Practical reference for how Claude structures tool use, useful for a working sense of the API contract.
- Model Context Protocol Specification (Anthropic, 2024) - The open spec, with examples of servers and clients in TypeScript and Python.
- OpenAI Function Calling Guide - The 2023 introduction that started the pattern, still useful for comparison and historical context.
- Awesome MCP Servers (GitHub) - A curated index of community-built MCP servers, helpful for seeing the breadth of what builders have wired up since the spec dropped.
- The Promise of Agentic AI (a16z) - A vendor-adjacent but readable framing of why tool use plus reasoning is a different category from the chatbot era.