You stand at a hotel concierge desk. You want a table at the restaurant downstairs, a reservation at the spa, theater tickets, and a car to the airport. You do not want the concierge to do these things. You want the concierge to route your requests to the people who can, collect the confirmations, and hand you the results. The concierge is not the doer. The concierge is the coordinator. The concierge has a list of specialists and knows how to route to each one. The concierge does not cook dinner, massage shoulders, sell tickets, or drive cars. The concierge orchestrates.
Tool calling in AI agents works the same way. When a model has tool calling enabled, it does not have those capabilities built in. It has a list of available tools and the ability to request that they be invoked. The model outputs a structured request, the system executes the tool, the result comes back, and the model continues. The model is the concierge. The tools are the specialists. The model decides which tool to call; the system executes the tool; the model processes the result.
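This request/execute/continue loop can be sketched in a few lines. Everything here is illustrative: `fake_model` is a stand-in for a real model (a real one emits the structured request itself), and `get_weather` is a hypothetical tool. The point is the division of labor: the model only produces a request; the system does the executing.

```python
import json

# Registry of specialists: the system executes these, never the model.
TOOLS = {
    "get_weather": lambda city: {"city": city, "forecast": "sunny"},
}

def fake_model(messages):
    """Stand-in for a real model: first requests a tool call,
    then summarizes the tool result on the next turn."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "Oslo"}}}
    result = json.loads(messages[-1]["content"])
    return {"content": f"Weather in {result['city']}: {result['forecast']}"}

def run_turn(user_input):
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = fake_model(messages)
        if "tool_call" not in reply:
            return reply["content"]                        # model answered directly
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])  # system executes the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
```

The loop terminates when the model stops requesting tools and produces a plain response, which is the shape most real tool-calling runtimes share regardless of vendor.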
Tool calling is not the model performing an action. It is the model requesting an action. The distinction matters for debugging. When a tool returns unexpected results, the failure is in the tool or the tool’s interface, not in the model’s reasoning. The model asked for the right thing. The tool did the wrong thing. Separate those failure modes before you start hunting for model bugs. The concierge asked for a table at 7pm; the restaurant said they are closed. That is not a concierge failure.
A team building a calendar agent spent two days trying to fix a “model reasoning failure” where the agent was scheduling double bookings. The model was correctly identifying conflicts. The tool was returning stale availability data. Fixing the data source, not the model, solved it. If they had recognized the concierge/specialist separation earlier, they would have isolated the problem in the first hour. The concierge cannot fix a broken specialist, but knowing the concierge is not the problem saves two days of debugging.
Tool calling is also not interchangeable with “function calling.” Some vendors use “function calling” to mean model-native tool use; MCP uses “tool calling” for its protocol-level version. The concept is similar, but the implementation surfaces differ. When you are debugging why a tool call failed, knowing which interface you are using matters for where you look for the problem. The terminology is overloaded; the separation of concerns is not.
Tool calling is also not the same as agent behavior. Tool calling is a mechanism for extending model capabilities; agents are systems that use tools to accomplish goals autonomously. A system can use tool calling without being an agent. An agent uses tool calling as part of its operation, but the agent architecture includes additional components: goal representation, planning, memory, and feedback loops. Tool calling is a feature; agents are an architecture.
The Concierge Problem
Concierge desks add a step. That step takes time. If the restaurant is closed, the concierge finds out and tells you. If the spa does not have availability, the concierge offers alternatives. This extra loop is where tool calling adds value and where it adds latency. Each tool invocation is a round trip. Chain multiple tools together and you accumulate latency before the user gets a response. The concierge is valuable precisely because it handles complexity; but the handling takes time.
The latency compounds in non-obvious ways. A tool call has the model call overhead, the network to the tool, the tool processing, the network back, and then the model processing the response. If your tools are slow, your agent is slow. If your tools are unreliable, your agent is unreliable in ways the model cannot recover from. The concierge can work around a closed restaurant by suggesting the one next door. A tool-calling agent can only work with what the tool returns. The agent’s ceiling is set by the tools’ floor.
Parallel tool invocation is possible when tools are independent. If the agent needs information from three sources that do not depend on each other, invoke all three simultaneously rather than sequentially. This reduces the total latency from the sum of individual latencies to the maximum of the latencies. Not all tool dependencies are obvious; design your tool schemas to make independence explicit. The concierge does not wait to call the spa before calling the restaurant if the two requests are independent.
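A minimal sketch of the sum-versus-max point, using `asyncio.gather` to fan out three hypothetical, independent tool calls. Each placeholder tool just sleeps briefly; with real tools the same structure applies.

```python
import asyncio

async def fetch_calendar():          # hypothetical independent tools,
    await asyncio.sleep(0.1)         # each simulated with a short delay
    return "calendar-data"

async def fetch_weather():
    await asyncio.sleep(0.1)
    return "weather-data"

async def fetch_traffic():
    await asyncio.sleep(0.1)
    return "traffic-data"

async def gather_independent():
    # All three run concurrently: total latency is roughly the
    # maximum of the three delays, not their sum.
    return await asyncio.gather(fetch_calendar(), fetch_weather(), fetch_traffic())

results = asyncio.run(gather_independent())
```

Sequential awaits would take about 0.3 seconds here; the gathered version takes about 0.1.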
There is also the error surface. If the tool schema is wrong, the model calls it incorrectly. If the tool returns structured data the model does not expect, the model mishandles the response. If the tool is unavailable, the call fails. Tool calling is only as reliable as the tools it coordinates. A flaky calendar integration will make your calendar agent flaky. The concierge cannot compensate for a broken specialist. The quality of the tool is the quality of the capability.
Timeout and retry policies matter for production tool-calling systems. A tool that takes too long should fail gracefully rather than hanging the agent. A tool that fails transiently should be retryable. Define timeouts and retry budgets at the system level, not per tool. A tool that times out without a retry policy leaves the agent hanging. A tool that retries indefinitely can make the agent unresponsive.
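One way to express a system-level policy is a single wrapper that every tool call passes through. This is a sketch under simplifying assumptions: the timeout is checked after the tool returns rather than enforced with cancellation, which a production runtime would need.

```python
import time

def call_with_policy(tool, *, timeout_s=2.0, retries=2, backoff_s=0.05):
    """Invoke a tool with a per-call timeout check and a bounded retry
    budget. Illustrative only: a real system would enforce the timeout
    with async cancellation or a worker pool, not by trusting the tool."""
    last_err = None
    for attempt in range(retries + 1):
        start = time.monotonic()
        try:
            result = tool()
            if time.monotonic() - start > timeout_s:
                raise TimeoutError("tool exceeded timeout")
            return {"ok": True, "result": result}
        except Exception as err:
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))   # exponential backoff
    # Retry budget exhausted: fail gracefully instead of hanging the agent.
    return {"ok": False, "error": str(last_err)}
```

Because the policy lives in one place, changing the timeout or the retry budget changes it for every tool at once.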
Tool Schema Design Matters
The interface between the concierge and the specialist is the tool schema: the description of what the tool does, what parameters it accepts, and what it returns. Schema design is underappreciated. A vague tool description leads to vague tool calls. A precise description with examples leads to precise calls. The schema is the API contract; treat it with the same care you would treat any API contract.
When a tool has multiple optional parameters, models often omit ones that would improve the result. If your calendar tool can accept a “priority” parameter but the model does not know priority matters, it never sends it. The tool schema needs to communicate not just what the tool accepts, but which combinations produce meaningfully different results. If optional parameters exist for a reason, say so in the description. “Include priority if this meeting conflicts with an existing high-priority meeting” is better than just listing the parameter.
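Here is what that guidance might look like in practice. This is a hypothetical `create_event` schema in a JSON-Schema-style shape that resembles common function-calling formats; adapt the exact field names to your vendor's API. Note that the `priority` description says when to send it, not just that it exists.

```python
# Hypothetical calendar-tool schema; the shape mirrors common
# function-calling formats but is not tied to any specific vendor.
CREATE_EVENT_SCHEMA = {
    "name": "create_event",
    "description": (
        "Create a calendar event. Include 'priority' whenever the new "
        "event conflicts with an existing high-priority meeting, so the "
        "scheduler can decide which one to move."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Event title."},
            "start": {"type": "string", "description": "ISO 8601 start time."},
            "end": {"type": "string", "description": "ISO 8601 end time."},
            "priority": {
                "type": "string",
                "enum": ["low", "normal", "high"],
                "description": (
                    "Only meaningful when the event conflicts with another "
                    "meeting; treated as 'normal' if omitted."
                ),
            },
        },
        "required": ["title", "start", "end"],
    },
}
```

The model reads these descriptions at selection time, so every sentence in them is effectively part of your prompt.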
Return shape matters equally. A tool that returns a long unstructured blob forces the model to parse it. A tool that returns a well-structured JSON response with clear field names gives the model something to work with cleanly. If your tool returns a raw HTML page when the model needs structured data, the agent has to add a parsing step that may fail. Better to have the tool do the parsing and return what the model actually needs. The specialist should hand the concierge a clean confirmation, not a pile of paperwork.
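The contrast is easy to see side by side. Both functions below are hypothetical versions of the same reservation tool, with the confirmation code hardcoded purely for illustration; the second returns exactly the fields the model needs.

```python
def reserve_table_raw(party_size, time):
    """Anti-pattern: a blob the model must parse, and may parse wrong."""
    return (
        f"<html><body>Reserved table for {party_size} at {time}. "
        f"Conf #4821</body></html>"
    )

def reserve_table(party_size, time):
    """Better: the tool does the parsing and hands back a clean
    confirmation with clear field names."""
    return {
        "confirmed": True,
        "party_size": party_size,
        "time": time,
        "confirmation_code": "4821",   # hardcoded for illustration only
    }
```

The structured version also makes downstream steps testable: the agent can check `confirmed` instead of grepping HTML.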
Error return shape is part of the schema. Tools that return errors in unexpected formats cause model confusion. If a tool returns a 500 error with an HTML error page, the model may try to parse that as the expected response. Define consistent error shapes across all tools: error code, error message, and any error details in structured format. The concierge should get a clear “we are closed” not a cryptic error page.
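A single error envelope shared by every tool keeps the model's view consistent. The helper and the `get_availability` tool below are hypothetical; the envelope shape (code, message, details) is the part that generalizes.

```python
def error_envelope(code, message, details=None):
    """One error shape for every tool: the model always sees the same
    fields, never a raw HTML error page."""
    return {
        "ok": False,
        "error": {"code": code, "message": message, "details": details or {}},
    }

def get_availability(date):
    # Hypothetical tool: this restaurant is closed on Mondays.
    if date == "2024-07-01":   # a Monday
        return error_envelope("CLOSED", "We are closed on Mondays.")
    return {"ok": True, "slots": ["18:00", "19:30"]}
```

Success and failure share the top-level `ok` field, so the model (or the routing layer) can branch on one check instead of guessing from the payload shape.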
Tool Selection and Routing
An agent with many tools must decide which tool to call. The model makes this decision based on the tool descriptions. If descriptions are similar, the model may choose incorrectly. Distinct, specific tool descriptions improve selection accuracy. The concierge needs clear descriptions of what each specialist does.
Tool grouping can help. Instead of twelve individual search tools, have two: searchDocuments and searchWeb. The model chooses between the groups, then the system routes within the group. This reduces the selection problem without losing functionality. The concierge does not need to know every specialist’s extension, just the departments.
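A sketch of the two-level split, with hypothetical names throughout: the model sees only the two group names; `pick_doc_source` is system-level routing the model never sees.

```python
def pick_doc_source(query):
    # System-level routing the model never sees.
    return "wiki" if "policy" in query else "drive"

def search_documents(query, source):   # hypothetical group members
    return f"doc-results:{source}:{query}"

def search_web(query, engine):
    return f"web-results:{engine}:{query}"

# The model chooses between two groups; the system routes within each.
GROUPS = {
    "searchDocuments": lambda query: search_documents(query, source=pick_doc_source(query)),
    "searchWeb": lambda query: search_web(query, engine="default"),
}

def route(tool_name, query):
    return GROUPS[tool_name](query)
```

Adding a thirteenth backend changes `pick_doc_source`, not the model-facing schema, so selection accuracy is unaffected.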
A routing layer between the model and tools can add logic that the model should not handle. If a tool requires authentication, the routing layer injects credentials. If a tool has rate limits, the routing layer queues requests. If a tool call matches a cache, the routing layer returns cached results without invoking the tool. This keeps the tool schema clean while adding system-level handling. The concierge desk handles the logistics so the specialist does not have to.
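A minimal version of such a layer might look like this. `ToolRouter` is a hypothetical class showing two of the behaviors named above, credential injection and caching; rate limiting would slot into the same `call` method.

```python
class ToolRouter:
    """Hypothetical routing layer between model and tools: injects
    credentials and caches repeated calls, keeping both concerns out
    of the tool schema the model sees."""

    def __init__(self, tools, api_key):
        self._tools = tools
        self._api_key = api_key
        self._cache = {}

    def call(self, name, **args):
        key = (name, tuple(sorted(args.items())))
        if key in self._cache:          # cache hit: skip the tool entirely
            return self._cache[key]
        # Inject credentials the model never handles.
        result = self._tools[name](api_key=self._api_key, **args)
        self._cache[key] = result
        return result
```

Because the cache key is the tool name plus its arguments, identical requests within a session are answered without a second round trip.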
When to Use a Concierge
Not every task needs a concierge. If you want a direct answer from the model and the model can produce it, do not add tool calling overhead. The concierge is worth it when the task requires something the model alone cannot provide: current information, external systems, computation, or persistent state. If the answer is in the model’s training, do not call a specialist.
A common mistake is reaching for tool calling when a simpler approach suffices. If you want a model to write a haiku, it can do that without tools. If you want a model to tell you the current weather, it needs a weather tool. The test is whether the task’s value depends on something outside the model’s training. If yes, tool calling earns its place. If no, it adds latency and failure modes for no benefit.
There is also a middle ground. Some tasks benefit from tool calling even when the model could approximate the answer. A model that computes a tip with tool-based arithmetic is more reliable than one that does it from training. The tool removes a class of error even if the model could have gotten close. The cost is the round trip. Whether that trade-off is worth it depends on how wrong the model’s unaided answer would be and how important the difference is. Sometimes the specialist earns their fee even when the concierge could guess.
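The tip example is almost trivially small, which is the point: the tool is one line, and it still removes the approximation error class entirely. `calculate_tip` is a hypothetical name for illustration.

```python
def calculate_tip(bill, rate):
    """Exact arithmetic the model would otherwise approximate from
    training. One round trip buys the removal of an error class."""
    return round(bill * rate, 2)
```

Whether the round trip is worth it is the trade-off the paragraph above describes; for billing, it usually is.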
Use tool calling when the task requires external data or systems the model cannot access directly; when you are building an agent that coordinates multiple operations; when reliability matters and you want explicit invocation rather than hoping the model generates correct commands; when you need audit trails of what actions were requested; when the model’s unaided output would be meaningfully worse than the tool-augmented output; and when accuracy on the task matters more than speed.
Do not add tool calling when a single model response suffices without external data, when the task is primarily generative or creative, when latency is critical and the tool round trips would be prohibitive, when the added complexity is not justified by the reliability or accuracy difference, and when you are adding tools because it feels more agentic, not because the task needs them. The concierge desk makes sense when you have things to coordinate. If you just need a direct answer, walk past the desk.