Series: build-agent-from-scratch (2/2)

Deconstructing the Agent Loop Line by Line: Why a Single Bash Tool is Enough

Full code: github.com/geyuxu/yuxu-java-agent

In the previous article, we discussed that an Agent is essentially a while loop. Today, we'll unpack this loop, examining each design decision line by line. Then, we'll answer a common question: Why is giving the Agent just one bash tool sufficient?


Complete Code Overview

First, let's present our complete Agent loop, which spans 56 lines (lines 84-139). We'll break it down section by section later.

static void agentLoop(List<Map<String, Object>> messages, ModelClient modelClient)
        throws IOException {

    while (true) {

        // 1. Call the model
        Map<String, Object> choice = modelClient.call(messages);

        String finishReason = (String) choice.get("finish_reason");
        Map<String, Object> assistantMsg =
                (Map<String, Object>) choice.get("message");

        // 2. Append the reply
        messages.add(assistantMsg);

        // 3. Check for exit
        if (!"tool_calls".equals(finishReason)) {
            String content = (String) assistantMsg.get("content");
            if (content != null) {
                System.out.println("\nAssistant: " + content);
            }
            return;
        }

        // 4. Execute tools
        List<Map<String, Object>> toolCalls =
                (List<Map<String, Object>>) assistantMsg.get("tool_calls");

        for (Map<String, Object> toolCall : toolCalls) {
            String callId = (String) toolCall.get("id");
            Map<String, Object> function =
                    (Map<String, Object>) toolCall.get("function");
            String arguments = (String) function.get("arguments");

            Map<String, String> args = gson.fromJson(arguments, Map.class);
            String command = args.get("command");

            System.out.println("\n> bash: " + command);
            String output = executeBash(command);
            System.out.println(output);

            // 5. Append results
            messages.add(Map.of(
                    "role", "tool",
                    "tool_call_id", callId,
                    "content", output
            ));
        }
    }
}

Now, let's break it down section by section.


Section 1: while (true)

while (true) {

This is an infinite loop, where the exit condition is determined by the model. When the model says, "I still need to call a tool," the loop continues; when the model says, "I'm done," the loop ends.

If you look at other Agent projects, you'll find that everyone does it this way.

OpenAI's Codex CLI core loop written in Rust (codex.rs line 5877):

loop {
    // ...prepare model input...
    let sampling_request_input = sess.clone_history().await;
    
    match run_sampling_request(...).await {
        Ok(output) => {
            if !needs_follow_up { break; }  // Model says it's done, exit
        }
        Err(...) => handle_error_or_return,
    }
}

Princeton's mini-swe-agent loop written in Python (default.py line 85):

while True:
    try:
        self.step()          # step() = query() + execute_actions()
    except InterruptAgentFlow as e:
        self.add_messages(*e.messages)
    finally:
        self.save(self.config.output_path)
    if self.messages[-1].get("role") == "exit":
        break                # Exit when the last message has role "exit"

Section 1.5: Telling the Model "What Tools You Have"

How does the model know it can call tools? In the createOpenAIClient method (lines 144-176):

static ModelClient createOpenAIClient(String apiKey) {
    return messages -> {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", MODEL);
        body.put("messages", messages);
        body.put("tools", TOOLS);        // ← Key: Always include tool definitions with each request
        // ...send HTTP request...
    };
}

TOOLS is a constant that defines the list of tools we provide to the model (lines 29-47):

static final List<Map<String, Object>> TOOLS = List.of(
    Map.of(
        "type", "function",
        "function", Map.of(
            "name", "bash",
            "description", "Execute a bash shell command and return the output.",
            "parameters", Map.of(
                "type", "object",
                "properties", Map.of(
                    "command", Map.of(
                        "type", "string",
                        "description", "The bash command to execute"
                    )
                ),
                "required", List.of("command")
            )
        )
    )
);

This is OpenAI's function calling protocol: you include a tools array in the request body, and each tool is described using JSON Schema, detailing its name, purpose, and parameter format. After seeing these definitions, the model knows which tools it can call and what parameters each tool requires.

If you don't pass tools, the model will never return finish_reason: "tool_calls"—it simply won't know that tools are available. Therefore, the tools parameter is a prerequisite for the entire Agent loop to function.

This format is not exclusive to OpenAI. Services compatible with OpenAI, such as DeepSeek and Groq, can directly reuse it. Anthropic (Claude) and Google (Gemini) have their own tool definition formats, but the core idea is the same: use a structured schema to tell the model "what tools you have and what their parameters look like."
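For comparison, the same bash tool in Anthropic's shape might look like the sketch below: no `type`/`function` wrapper, and the schema field is named `input_schema` instead of `parameters`.

```java
import java.util.List;
import java.util.Map;

// Sketch: the article's bash tool expressed in Anthropic's tool format.
public class AnthropicToolSketch {
    static final Map<String, Object> BASH_TOOL = Map.of(
            "name", "bash",
            "description", "Execute a bash shell command and return the output.",
            "input_schema", Map.of(          // "input_schema", not "parameters"
                    "type", "object",
                    "properties", Map.of(
                            "command", Map.of(
                                    "type", "string",
                                    "description", "The bash command to execute")),
                    "required", List.of("command")));

    public static void main(String[] args) {
        System.out.println(BASH_TOOL.containsKey("input_schema")); // true
    }
}
```

The JSON Schema inside is identical; only the envelope differs, which is why adapters between the two formats are thin.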


Section 2: Calling the Model

Map<String, Object> choice = modelClient.call(messages);

String finishReason = (String) choice.get("finish_reason");
Map<String, Object> assistantMsg =
        (Map<String, Object>) choice.get("message");

We send the complete message history to the model and receive two things back:

  • finishReason: Why did the model stop? "stop" means the task is complete, "tool_calls" means it wants to call a tool.
  • assistantMsg: The model's reply content, which could be text or include tool call requests.

Note that we pass the complete messages list, not just the latest one. The model is stateless—it doesn't remember what was said in the previous turn. Each call requires sending all historical messages again for it to make coherent decisions.

This is also why the message array grows larger and eventually fills the context window.
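A crude way to watch this growth is a character-count heuristic, sketched below. The roughly-4-characters-per-token ratio is a rule of thumb for English text, not real tokenizer output, and any limit you compare against would be model-specific.

```java
import java.util.List;
import java.util.Map;

// Sketch: rough size estimate of the ever-growing message history.
public class ContextBudget {
    static long estimateTokens(List<Map<String, Object>> messages) {
        long chars = 0;
        for (Map<String, Object> m : messages) {
            Object content = m.get("content");
            if (content != null) chars += content.toString().length();
        }
        return chars / 4; // ~4 chars per token: a heuristic, not a tokenizer
    }

    public static void main(String[] args) {
        List<Map<String, Object>> messages = List.of(
                Map.of("role", "user", "content", "a".repeat(400)),
                Map.of("role", "assistant", "content", "b".repeat(400)));
        System.out.println(estimateTokens(messages)); // 200
    }
}
```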

Section 3: Appending the Reply

messages.add(assistantMsg);

The model's reply must be appended to the message history: if the model called a tool this turn, then on the next turn it needs to see its own "I want to call a tool" message in order to correlate the tool's execution result with its request.

If you remove this line, the Agent will immediately become confused. The model would see the tool result but wouldn't know who called it or why.
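That pairing can be checked mechanically. Below is a sketch of a validator (the helper and its name are illustrative, not part of the project) that flags any tool result whose tool_call_id was never announced by a preceding assistant message, which is exactly the broken state you get by dropping the messages.add line.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: every role:"tool" message must answer a tool_call_id announced
// by an earlier assistant message in the same history.
public class HistoryCheck {
    @SuppressWarnings("unchecked")
    static boolean toolResultsMatch(List<Map<String, Object>> messages) {
        Set<String> announced = new HashSet<>();
        for (Map<String, Object> m : messages) {
            if ("assistant".equals(m.get("role")) && m.get("tool_calls") != null) {
                for (Map<String, Object> call :
                        (List<Map<String, Object>>) m.get("tool_calls")) {
                    announced.add((String) call.get("id"));
                }
            }
            if ("tool".equals(m.get("role"))
                    && !announced.contains(m.get("tool_call_id"))) {
                return false; // orphaned tool result: the model can't place it
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> ok = List.of(
                Map.of("role", "assistant",
                        "tool_calls", List.of(Map.of("id", "call_1"))),
                Map.of("role", "tool", "tool_call_id", "call_1", "content", "hi"));
        List<Map<String, Object>> broken = List.of( // assistant message removed
                Map.of("role", "tool", "tool_call_id", "call_1", "content", "hi"));
        System.out.println(toolResultsMatch(ok) + " " + toolResultsMatch(broken));
        // true false
    }
}
```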

Section 4: Checking Exit Condition

if (!"tool_calls".equals(finishReason)) {
    String content = (String) assistantMsg.get("content");
    if (content != null) {
        System.out.println("\nAssistant: " + content);
    }
    return;
}

This is the only exit point for the entire loop.

finishReason can have several possible values:

| finish_reason | Meaning | What the Agent should do |
| --- | --- | --- |
| stop | Model ended on its own; task complete | Output the final answer, exit the loop |
| tool_calls | Model wants to call a tool | Execute the tool, continue the loop |
| length | Output hit the max_tokens limit | Truncated; may require handling |
| content_filter | Content was blocked by the safety system | Needs to inform the user |

We only care about one thing: whether finishReason is tool_calls. If it is, continue working; otherwise, exit.

We don't distinguish between stop and length. A more robust implementation should do something when length occurs, such as prompting the user that "output was truncated."
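A sketch of what distinguishing those cases could look like, as a small dispatch on finishReason (the wording of each action is illustrative):

```java
// Sketch: one step beyond the article's loop, treating "length" and
// "content_filter" explicitly instead of lumping them in with "stop".
public class FinishReasonSketch {
    static String describe(String finishReason) {
        return switch (finishReason) {
            case "tool_calls" -> "continue: execute tools and loop";
            case "stop" -> "done: print final answer and exit";
            case "length" -> "warn: output truncated at max_tokens";
            case "content_filter" -> "warn: reply blocked by safety filter";
            default -> "unknown finish_reason: " + finishReason;
        };
    }

    public static void main(String[] args) {
        System.out.println(describe("length"));
        // warn: output truncated at max_tokens
    }
}
```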

Section 5: Executing Tools

List<Map<String, Object>> toolCalls =
        (List<Map<String, Object>>) assistantMsg.get("tool_calls");

for (Map<String, Object> toolCall : toolCalls) {
    String callId = (String) toolCall.get("id");
    Map<String, Object> function =
            (Map<String, Object>) toolCall.get("function");
    String arguments = (String) function.get("arguments");

    Map<String, String> args = gson.fromJson(arguments, Map.class);
    String command = args.get("command");

    System.out.println("\n> bash: " + command);
    String output = executeBash(command);
    System.out.println(output);

Note that toolCalls is a list. The model can request to call multiple tools in a single response. For example, it might want to run both ls and cat README.md simultaneously.

Each tool call contains three key pieces of information:

  • id: A unique identifier for the call, used to correlate the result with the request.
  • function.name: Which tool to call (we only have bash, so no dispatch is needed).
  • function.arguments: The JSON string of the tool's parameters.

arguments is a JSON string rather than an object; this is by design in the OpenAI API. The JSON generated by the model might have formatting issues (e.g., extra commas, missing quotes), so production-grade code typically requires more robust parsing.
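One possible defensive fallback is sketched below: if strict JSON parsing of arguments fails, pull the command value out with a regex and, failing that, report a parse error back to the model instead of crashing the loop. The helper is hypothetical (not part of the project), and the regex is a last resort for illustration, not a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: recover the "command" argument from slightly malformed JSON.
public class ArgRecovery {
    private static final Pattern COMMAND =
            Pattern.compile("\"command\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    static String extractCommand(String arguments) {
        Matcher m = COMMAND.matcher(arguments);
        if (m.find()) {
            // Undo the two escapes bash commands most often contain.
            return m.group(1).replace("\\\"", "\"").replace("\\\\", "\\");
        }
        return null; // caller should return a parse-error string to the model
    }

    public static void main(String[] args) {
        // The trailing comma would break a strict parser; the regex recovers.
        System.out.println(extractCommand("{\"command\": \"ls -la\",}"));
        // ls -la
    }
}
```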

Section 6: Appending Tool Results

    messages.add(Map.of(
            "role", "tool",
            "tool_call_id", callId,
            "content", output
    ));
}

Append the tool execution result to the message history. Note the three fields:

  • role: "tool": One of OpenAI's message roles, specifically for tool results.
  • tool_call_id: Must correspond to the id in the request, otherwise the API will return an error.
  • content: The tool's output text.

The tool_call_id exists because the model might call multiple tools simultaneously. Each result must clearly correspond to a specific call for the model to interpret it correctly.

After appending all tool results, the loop returns to step 1, calling the model again with the longer message history. The model will see its previous requests and the tool's execution results, then decide the next step.


Why Only One Bash Tool?

This is the most frequently asked question. The answer is simple: bash is a universal tool.

Think about what you can do in a terminal:

# Read a file
cat src/main/java/Agent.java

# Write a file
echo "hello" > test.txt

# Search code
grep -r "while (true)" --include="*.java"

# Find files
find . -name "*.java" -type f

# Run tests
mvn test

# Check git status
git status && git diff

# Install dependencies
pip install requests

# Send HTTP requests
curl -s https://api.example.com/data

Reading files, writing files, searching, executing, network requests—all can be done. A single bash tool is equivalent to a collection of countless specialized tools.

Princeton's mini-swe-agent project achieved 74% accuracy on SWE-bench with approximately 100 lines of Python, and it also used only the shell as its tool.

So Why Do Production-Grade Agents Use Dozens of Tools?

If bash is universal, why does Claude Code have 40+ tools instead of just one bash tool?

Three reasons:

1. Precise Control

FileEditTool performs precise string replacement. It ensures that only the target lines are modified, without affecting other content. In contrast, sed or echo for writing files can be error-prone—escape characters, newlines, nested quotes, each is a potential pitfall.

2. Security Boundaries

BashTool can execute any command, including rm -rf /. By breaking it down into specialized tools, you can set precise permissions for each tool. Reading a file might not require confirmation, writing a file might need confirmation, and deleting a file might require double confirmation. A universal tool cannot provide this level of fine-grained control.

3. Reduce Model Error Rate

When tool parameters are structured (e.g., {"file_path": "...", "old_string": "...", "new_string": "..."}), the probability of the model making a mistake is much lower than when it has to write a complete sed command. Structured input helps reduce the model's degrees of freedom; the fewer the degrees of freedom, the less prone to errors it becomes.
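As an illustration, such a structured edit tool could be declared in the same Map.of style as the article's TOOLS constant. The name edit_file and its fields are hypothetical, not taken from any particular product.

```java
import java.util.List;
import java.util.Map;

// Sketch: a hypothetical structured edit tool alongside bash.
public class EditToolSketch {
    static final Map<String, Object> EDIT_TOOL = Map.of(
            "type", "function",
            "function", Map.of(
                    "name", "edit_file",
                    "description",
                    "Replace old_string with new_string in file_path.",
                    "parameters", Map.of(
                            "type", "object",
                            "properties", Map.of(
                                    "file_path", Map.of("type", "string"),
                                    "old_string", Map.of("type", "string"),
                                    "new_string", Map.of("type", "string")),
                            "required",
                            List.of("file_path", "old_string", "new_string"))));

    public static void main(String[] args) {
        System.out.println(EDIT_TOOL.containsKey("function")); // true
    }
}
```

Compared with asking the model to emit a correct sed invocation, each field here is a plain string with no escaping rules to get wrong.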

Progressive Strategy

Therefore, the best strategy is progressive:

One bash tool          →   Verify the Agent loop works
Split out file ops     →   Read/write/edit/search, precise and controllable
Split out execution    →   Bash limited to only command execution
Add specialized tools  →   LSP/Git/Web, each with its specific function

First, use one tool to prove the loop is effective, then split out specialized tools to address precision and security issues.

Our project will also evolve along this path.
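Once tools are split out, the loop gains a dispatch step: switch on function.name instead of assuming bash. A minimal sketch (the tool names other than bash are hypothetical):

```java
// Sketch: tool dispatch for a multi-tool agent. Note the default branch
// returns an error string to the model rather than throwing.
public class ToolDispatch {
    static String dispatch(String name, String arguments) {
        return switch (name) {
            case "bash" -> "would run executeBash(arguments)";
            case "read_file" -> "would read the file named in arguments"; // hypothetical
            case "edit_file" -> "would apply a structured edit";          // hypothetical
            default -> "Error: unknown tool '" + name + "'";
        };
    }

    public static void main(String[] args) {
        System.out.println(dispatch("nope", "{}"));
        // Error: unknown tool 'nope'
    }
}
```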


executeBash: The Only Tool Execution Function

Finally, let's look at the actual tool execution:

static String executeBash(String command) {
    try {
        Process process = new ProcessBuilder("sh", "-c", command)
                .redirectErrorStream(true)
                .start();

        // Read on a background thread: readAllBytes() blocks until the
        // stream closes, so reading inline would let a hanging command
        // defeat the timeout below.
        CompletableFuture<String> output = CompletableFuture.supplyAsync(() -> {
            try {
                return new String(process.getInputStream().readAllBytes());
            } catch (IOException e) {
                return "Error: " + e.getMessage();
            }
        });

        boolean finished = process.waitFor(120, TimeUnit.SECONDS);
        if (!finished) {
            process.destroyForcibly();
            return "Error: Timeout (120s)";
        }
        String result = output.get(5, TimeUnit.SECONDS);
        return result.isBlank() ? "(no output)" : result.trim();
    } catch (Exception e) {
        return "Error: " + e.getMessage();
    }
}

Several key design choices:

| Design | Why |
| --- | --- |
| sh -c | Allows shell features like pipes and redirection |
| redirectErrorStream(true) | Merges stdout and stderr so the model can see error messages |
| waitFor(120, SECONDS) | Timeout protection to keep commands from hanging forever |
| destroyForcibly() | Forcibly kills the process after a timeout |
| Empty output returns "(no output)" | Tells the model the command succeeded with no output, instead of a confusing empty string |
| Exceptions return "Error: ..." | Never throws; always returns a string to the model |

The last point is especially important: tool functions should never throw exceptions. Regardless of what happens, return a string for the model to see. Models are good at recovering from error messages: if it sees "Error: Permission denied," it might try a different approach. But if you throw an exception, the loop breaks directly, and the model doesn't even get a chance to retry.


Three Project Loop Comparisons

Let's compare our loop with two production-grade Agents:

| | yuxu-java-agent | mini-swe-agent | Claude Code | Codex CLI |
| --- | --- | --- | --- | --- |
| Language | Java | Python | TypeScript | Rust |
| Loop | while (true) | while True (line 85) | while (true) (line 307) | loop {} (line 5877) |
| Exit condition | finishReason != "tool_calls" | role == "exit" | !needsFollowUp | !needs_follow_up |
| Tool execution | Synchronous, sequential | Synchronous, sequential | Streaming + concurrent | Async + FuturesOrdered concurrency |
| API call | Synchronous, full response | Synchronous, full response | Streaming, token-by-token | Streaming, event-driven |
| Error recovery | None | Exception handling + message appending | prompt-too-long auto-compression, max-tokens auto-extension | Hook system + retries |
| Turn limit | None | Yes | Yes, to prevent infinite loops | Yes |
| Lines of code | ~56 | ~100 | ~1,400 | ~7,600 |

The skeleton is the same; the thousands of additional lines are dedicated to handling real-world edge cases: streaming responses, concurrent tool execution, error recovery, and turn limits.


Summary

  1. while (true): The exit is determined by the model, not by you.
  2. Pass complete history when calling the model: The model is stateless; you must resend all messages each time.
  3. Append assistant messages: The model needs to see what it said in the previous turn.
  4. finishReason is the sole exit signal.
  5. tool_call_id correlates requests and results one-to-one.
  6. Tool functions should never throw exceptions: Return errors as strings to the model.

Bash, as the sole tool, is sufficient because it is itself a universal interface. First, use it to get the loop working, then split it as needed—this is the natural evolution path for every Agent project.


Full code for this article: github.com/geyuxu/yuxu-java-agent
