Explainable AI Agents: Capture LLM Tool Call Reasoning with Spring AI
Last updated: March 7, 2026
1. Overview
When we build AI agents with tool-calling capabilities, we often see what tool the LLM selected, but we don’t see why it made that decision. This lack of insight makes debugging harder, reduces observability, and limits trust in AI-driven systems. For production-grade agents, understanding the model’s reasoning is not optional. Explainable AI agents solve this problem by capturing additional context from the LLM during tool selection.
In this article, we’ll review the Tool Argument Augmenter in a practical example. We’ll see how to capture LLM reasoning during tool calls and how to use that data inside a Spring AI application.
2. Tool Calling Problem
We use tool calling when the model cannot answer reliably using only its training data. For example, we use it when the LLM needs real-time data, such as current prices or user-specific information, when it requires access to external systems like databases or internal services, or when it must trigger actions such as creating a record or sending a notification.
In Spring AI, the model delegates work to application code through tool calling, while the LLM focuses on understanding the user request and generating the final response. Let’s assume our application exposes two tools:
@Tool(description = "Get patient health status")
public String retrievePatientHealthStatus(String patientId) {
    return HEALTH_DATA.get(patientId).status();
}

@Tool(description = "Get when patient health status was updated")
public LocalDate retrievePatientHealthStatusChangeDate(String patientId) {
    return HEALTH_DATA.get(patientId).changeDate();
}
Using @Tool, we mark a method as available for LLM tool calling. Suppose we ask our application, "Is the patient stable?" Here is what happens behind the scenes:
- Spring AI sends both tool definitions, including their input schemas, to the LLM.
- The LLM analyzes the request and evaluates the available tools.
- It decides to call retrievePatientHealthStatus.
- The LLM returns a tool call request with the required arguments.
- Tool management dispatches and invokes the selected tool.
- The tool returns its result to the LLM, which generates the final response.
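The dispatch step in the flow above can be sketched in plain Java. Here, a toy in-memory registry stands in for Spring AI's tool management layer; the registry, hard-coded results, and the class name ToolDispatchSketch are illustrative, not part of the framework's API:

```java
import java.util.Map;
import java.util.function.Function;

public class ToolDispatchSketch {
    public static void main(String[] args) {
        // Hypothetical tool registry standing in for Spring AI's tool
        // management layer; results are hard-coded for illustration.
        Map<String, Function<String, String>> tools = Map.of(
            "retrievePatientHealthStatus", id -> "Healthy",
            "retrievePatientHealthStatusChangeDate", id -> "2025-04-10");

        // The LLM's tool call request: a tool name plus its arguments.
        String requestedTool = "retrievePatientHealthStatus";
        String argument = "P001";

        // The application dispatches the call; the result is then fed
        // back to the LLM so it can generate the final response.
        String result = tools.get(requestedTool).apply(argument);
        System.out.println(result);
    }
}
```

The key point is that the model never executes anything itself: it only names a tool and supplies arguments, while our application performs the actual invocation.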
From the application’s perspective, we only see that the tool was selected. The problem is that we do not see the reasoning behind that choice. This lack of reasoning limits observability and makes debugging harder. We can confirm which tool was called, but we cannot explain why the LLM chose it. The Spring AI Tool Argument Augmenter is designed specifically to address this limitation.
3. The Tool Argument Augmenter
The Tool Argument Augmenter adds an explainability layer on top of standard tool calling. We dynamically extend the tool’s JSON Schema with extra arguments. These arguments capture metadata that the application needs, such as reasoning, insights, or confidence. The tool itself remains unchanged and unaware of this augmentation. Using @ToolParam, we describe individual method parameters so the model understands what inputs it must provide:
@ToolParam(description = """
    Your step-by-step reasoning for why you're calling this tool and what you expect.
    Add evidence for why you chose this specific tool to call.
    """, required = true)
String innerThought
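Conceptually, the augmenter rewrites the tool's input schema before it reaches the model. A simplified sketch of what the augmented schema for retrievePatientHealthStatus might look like (the exact shape depends on the Spring AI version; this fragment is illustrative):

```json
{
  "type": "object",
  "properties": {
    "patientId": { "type": "string" },
    "innerThought": {
      "type": "string",
      "description": "Your step-by-step reasoning for why you're calling this tool and what you expect."
    }
  },
  "required": ["patientId", "innerThought"]
}
```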
After we enable the Tool Argument Augmenter, the tool-calling flow changes as follows:
- We ask Is the patient stable?
- Spring AI sends the tool definitions for retrievePatientHealthStatus() and retrievePatientHealthStatusChangeDate() to the Tool Call Advisor.
- The Tool Argument Augmenter intercepts both tool definitions.
- The augmenter extends each tool’s JSON Schema with the innerThought argument.
- Spring AI sends the augmented tool schemas to the LLM.
- The LLM decides to call retrievePatientHealthStatus() and returns a tool call request that includes the original argument and an augmented argument, innerThought, explaining why this tool was selected.
- The augmenter extracts innerThought and forwards it to a consumer for logging, memory storage, or analysis.
- Spring AI invokes retrievePatientHealthStatus() using only the expected argument.
- The LLM generates the final response using the tool result.
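The extract-and-strip steps above can be illustrated with plain Java. This is a toy sketch of the mechanism, not the augmenter's actual implementation: the argument map and the consumer are stand-ins for what Spring AI handles internally:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class ArgumentStrippingSketch {
    public static void main(String[] args) {
        // The LLM's tool call arguments, including the augmented one.
        Map<String, Object> llmArguments = new HashMap<>(Map.of(
            "patientId", "P002",
            "innerThought", "The user asked about health status, so I chose the status tool."));

        // A consumer receives the reasoning (here we just print it;
        // a real consumer might log it or persist it).
        Consumer<Object> reasoningConsumer = thought ->
            System.out.println("LLM Reasoning: " + thought);

        // Extract the augmented argument and strip it from the map
        // before the tool is invoked.
        reasoningConsumer.accept(llmArguments.remove("innerThought"));

        // Only the original argument reaches the tool.
        System.out.println(llmArguments);
    }
}
```

Because the extra argument is removed before invocation, the tool method's signature never has to change.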
This approach captures why the LLM selected a specific tool. We can log the reasoning, store it as long-term memory, or use it for debugging and analytics. At the same time, we keep tools clean and reusable while gaining explainability and trust in agent behavior without changing existing tool contracts.
4. Patient Health Status Checker Example
Let’s implement a simple Patient Health Status Checker application. In this case, we’ll have a few tools that provide different types of patient health status information. Then, the user will call our checker with questions about different patients. Based on that, the LLM will decide which tool to call to provide the required information.
4.1. Dependencies
We start by adding the spring-ai-starter-model-openai dependency:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
    <version>${spring-ai.version}</version>
</dependency>
This dependency already includes Spring AI classes under the hood. In addition, it provides the OpenAI model integration that we’ll use in this application.
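If we prefer to manage the version centrally, we can import the Spring AI BOM in dependencyManagement and drop explicit versions from individual starters (a common Maven setup):

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```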
4.2. Tools Specification
Let’s create a PatientHealthInformationTools. This class will expose tool methods that our AI agent can call to retrieve patient health information. It acts as a bridge between the LLM and our internal health data source:
public class PatientHealthInformationTools {

    public static final Map<String, HealthStatus> HEALTH_DATA = Map.of(
        "P001", new HealthStatus("Healthy", LocalDate.ofYearDay(2025, 100)),
        "P002", new HealthStatus("Has cough", LocalDate.ofYearDay(2025, 200)),
        "P003", new HealthStatus("Healthy", LocalDate.ofYearDay(2025, 300)),
        "P004", new HealthStatus("Has increased blood pressure", LocalDate.ofYearDay(2025, 350)),
        "P005", new HealthStatus("Healthy", LocalDate.ofYearDay(2026, 10)));

    @Tool(description = "Get patient health status")
    public String retrievePatientHealthStatus(String patientId) {
        return HEALTH_DATA.get(patientId).status();
    }

    @Tool(description = "Get when patient health status was updated")
    public LocalDate retrievePatientHealthStatusChangeDate(String patientId) {
        return HEALTH_DATA.get(patientId).changeDate();
    }
}
In this class, we've defined two tools. In retrievePatientHealthStatus(), we return the patient's status by patientId, and in retrievePatientHealthStatusChangeDate(), we return the date when that status was last updated.
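Note that the test data uses LocalDate.ofYearDay(), which resolves a day-of-year number to a calendar date. For instance, day 200 of 2025 is July 19, 2025, which is the date we'll later expect in our assertions. A quick standalone check:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class YearDayCheck {
    public static void main(String[] args) {
        // Day 200 of the non-leap year 2025 resolves to July 19.
        LocalDate changeDate = LocalDate.ofYearDay(2025, 200);
        String formatted = changeDate.format(
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH));
        System.out.println(formatted);
    }
}
```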
4.3. Agent Thinking DTO
Now we introduce the AgentThinking DTO. We use this object to capture the model’s reasoning during tool selection. It helps us make tool-calling decisions more transparent and easier to analyze:
public record AgentThinking(
    @ToolParam(description = """
        Your step-by-step reasoning for why you're calling this tool and what you expect.
        Add evidence for why you chose this specific tool to call.
        """, required = true)
    String innerThought,

    @ToolParam(description = "Confidence level (low, medium, high) in this tool choice", required = true)
    String confidence) {
}
In this DTO, first, we’ve added two thinking parameters. Specifically, in innerThought, the LLM explains why it calls a specific tool. Additionally, in confidence, we capture how confident the LLM was in its choice of the tool.
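Since confidence arrives as free text from the model, downstream code that aggregates or filters these events may want to normalize it. Here is a hypothetical sketch of such normalization; the Confidence enum and parse() helper are our own additions, not part of Spring AI:

```java
import java.util.Locale;

public class ConfidenceSketch {
    // Hypothetical enum for normalizing the model's free-text confidence.
    enum Confidence { LOW, MEDIUM, HIGH, UNKNOWN }

    static Confidence parse(String raw) {
        try {
            return Confidence.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // The model returned something outside the expected set.
            return Confidence.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("high"));
        System.out.println(parse("maybe"));
    }
}
```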
4.4. PatientHealthStatusService
Let’s create the PatientHealthStatusService. This service coordinates the LLM call and integrates our augmented tool logic:
@Service
public class PatientHealthStatusService {

    private static final Logger log = LoggerFactory.getLogger(PatientHealthStatusService.class);

    private final ChatClient chatClient;

    @Autowired
    public PatientHealthStatusService(OpenAiChatModel model) {
        AugmentedToolCallbackProvider<AgentThinking> provider = AugmentedToolCallbackProvider
          .<AgentThinking>builder()
          .toolObject(new PatientHealthInformationTools())
          .argumentType(AgentThinking.class)
          .argumentConsumer(event -> {
              AgentThinking thinking = event.arguments();
              log.info("Chosen tool: {}\n LLM Reasoning: {}\n Confidence: {}",
                event.toolDefinition().name(), thinking.innerThought(), thinking.confidence());
          })
          .build();

        chatClient = ChatClient.builder(model)
          .defaultToolCallbacks(provider)
          .build();
    }

    public String getPatientStatusInformation(String prompt) {
        log.info("Input request: {}", prompt);
        return chatClient.prompt(prompt)
          .call()
          .content();
    }
}
We’ve created an AugmentedToolCallbackProvider instance with our tools attached. Additionally, we’ve injected the AgentThinking DTO and added logging logic to print the reasoning details for each specific call. In the getPatientStatusInformation() method, first, we call the chatClient with an input prompt. At the same time, the attached AugmentedToolCallbackProvider automatically applies all the required logic. As a result, we don’t need to handle this behavior manually.
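Instead of only logging, the consumer could persist each reasoning event for later audit or analytics. The sketch below shows the idea with plain Java; the ReasoningEvent record is a hypothetical stand-in for the event type the augmenter delivers to the consumer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ReasoningCollectorSketch {
    // Hypothetical event type mirroring what the consumer receives.
    record ReasoningEvent(String toolName, String innerThought, String confidence) {}

    public static void main(String[] args) {
        List<ReasoningEvent> audit = new ArrayList<>();

        // The consumer appends each event to an audit trail instead of
        // (or in addition to) logging it.
        Consumer<ReasoningEvent> consumer = audit::add;

        consumer.accept(new ReasoningEvent(
            "retrievePatientHealthStatus", "User asked about status", "high"));

        System.out.println(audit.size());
        System.out.println(audit.get(0).toolName());
    }
}
```

In a real application, the same consumer could forward events to a metrics system or a database rather than an in-memory list.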
4.5. Call the PatientHealthStatusService for Different Information
Finally, let’s test the PatientHealthStatusService. We want to verify that tool selection works correctly and that the reasoning metadata is properly captured:
@Test
void givenPatientHealthStatusService_whenAskingPatientHealthStatusAndChangeDate_thenResponseShouldContainExpectedInformation() {
    String healthStatusResponse = statusService
      .getPatientStatusInformation("What is the health status of the patient P002?");

    assertThat(healthStatusResponse)
      .contains("cough");

    String healthStatusChangeDateResponse = statusService
      .getPatientStatusInformation("When the patient P002 health status was changed?");

    assertThat(healthStatusChangeDateResponse)
      .contains("July 19, 2025");
}
We've called getPatientStatusInformation() twice: first asking for the patient's health status, and then for the health status change date. We verified that both responses contain the expected information. Here is the log output we got:
[2026-02-02 09:34:46] [INFO] [c.b.s.e.PatientHealthStatusService] - Input request: What is the health status of the patient P002?
[2026-02-02 09:34:48] [INFO] [c.b.s.e.PatientHealthStatusService] - Chosen tool: retrievePatientHealthStatus
LLM Reasoning: I am calling this tool to get the current health status of the patient with ID P002, as it is essential to know their health condition.
Confidence: high
[2026-02-02 09:34:50] [INFO] [c.b.s.e.PatientHealthStatusService] - Input request: When the patient P002 health status was changed?
[2026-02-02 09:34:53] [INFO] [c.b.s.e.PatientHealthStatusService] - Chosen tool: retrievePatientHealthStatusChangeDate
LLM Reasoning: I need to find out when the health status for patient P002 was last updated to understand their current health situation and any recent changes that may affect their treatment or care. This tool is specifically designed to retrieve the date of the last health status change for a patient.
Confidence: high
Here, we can see which tool was called, the model’s reasoning for choosing it, and the level of confidence in its use.
5. Chain of Tool Calls Example
Let’s review another use case where we need to call a chain of tools. In this case, we need to obtain the patient’s health status by the patient’s name. To begin with, we add a new tool definition:
public class PatientHealthInformationTools {

    private static final Map<String, String> PATIENTS_IDS = Map.of(
        "John Snow", "P001",
        "Emily Carter", "P002",
        "Michael Brown", "P003",
        "Sophia Williams", "P004",
        "Daniel Johnson", "P005"
    );

    @Tool(description = "Get patient id for patient name")
    public String retrievePatientId(String patientName) {
        return PATIENTS_IDS.get(patientName);
    }
}
Here, the retrievePatientId() tool returns the patient ID for a given patient name. Next, we call our service again, this time asking for the patient's health status by name:
@Test
void givenPatientHealthStatusService_whenAskingPatientHealthStatusByPatientName_thenResponseShouldContainExpectedInformation() {
    String healthStatusResponse = statusService
      .getPatientStatusInformation("What is the health status of the patient. Patient name: John Snow?");

    assertThat(healthStatusResponse)
      .containsIgnoringCase("healthy");
}
As expected, we obtained the health status. Now, let’s review the logs:
[2026-02-02 09:44:50] [INFO] [c.b.s.e.PatientHealthStatusService] - Input request: What is the health status of the patient. Patient name: John Snow?
[2026-02-02 09:44:52] [INFO] [c.b.s.e.PatientHealthStatusService] - Chosen tool: retrievePatientHealthStatus
LLM Reasoning: I need to find out the health status of the patient named John Snow. This tool is specifically designed to retrieve the health status of a patient based on their name, which is why I chose it.
Confidence: high
[2026-02-02 09:44:55] [INFO] [c.b.s.e.PatientHealthStatusService] - Chosen tool: retrievePatientId
LLM Reasoning: Since I encountered an issue retrieving the health status directly, I'm going to first get the patient ID for John Snow. Once I have the patient ID, I can then retrieve the health status using that ID. This is a necessary step because the health status tool requires a valid patient ID to work properly.
Confidence: high
[2026-02-02 09:44:57] [INFO] [c.b.s.e.PatientHealthStatusService] - Chosen tool: retrievePatientHealthStatus
LLM Reasoning: Now that I have the patient ID for John Snow, I can use it to retrieve the health status. This tool will provide the health information associated with the patient ID I obtained earlier.
Confidence: high
We can see the entire chain of reasoning the model used to decide which tool to call. We can even use this information as feedback to make our request prompt more specific and avoid unnecessary tool calls.
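The chain the model discovered boils down to two dependent lookups. A plain-Java sketch of the data flow the LLM orchestrated, using toy maps mirroring the article's data:

```java
import java.util.Map;

public class ToolChainSketch {
    public static void main(String[] args) {
        // Toy data mirroring the article's two lookup steps.
        Map<String, String> patientIds = Map.of("John Snow", "P001");
        Map<String, String> healthData = Map.of("P001", "Healthy");

        // Step 1: resolve the patient ID by name (retrievePatientId).
        String patientId = patientIds.get("John Snow");

        // Step 2: use the ID to fetch the status (retrievePatientHealthStatus).
        String status = healthData.get(patientId);

        System.out.println(status);
    }
}
```

The LLM had to discover this dependency on its own after the first direct call failed, and the captured innerThought entries document exactly that discovery.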
6. Conclusion
In this article, we’ve reviewed how to make our AI integrations explainable using the Tool Argument Augmenter. We captured the model’s reasoning during tool selection without modifying the tool implementations themselves.
With this approach, we achieve higher observability of LLM tool-calling decisions and gather valuable feedback for prompt improvements. Additionally, we can enrich tool calls with extra metadata such as risk levels, fallback strategies, or decision categories. Finally, reasoning data can be routed to monitoring systems, persisted for audits, or analyzed to optimize agent behavior over time.
As always, the code is available over on GitHub.