
My AI Agent Journey: From Confusion to Clarity

Updated May 1, 2026

Hey everyone, Emma here from agent101.net!

I don’t know about you, but lately, it feels like every other day there’s a new “AI agent” popping up, promising to do everything from scheduling your life to writing your next novel. And if you’re anything like I was a few months ago, you might be nodding along, thinking, “Yeah, that sounds cool… but what is it, really? And how do I even start playing with one without needing a PhD in computer science?”

Well, you’re in the right place. Today, we’re diving headfirst into one of the most accessible and, frankly, mind-bendingly useful types of AI agents for beginners: the local, file-system-aware agent. Specifically, we’re going to build a super simple one using a local large language model (LLM) and a bit of Python that can read and write files on your computer. Think of it as giving an LLM a tiny, supervised set of hands to interact with your local environment. This isn’t about building a fully autonomous super-agent that takes over your PC (not yet, anyway!), but about understanding the fundamental mechanics of how an agent can “see” and “act” beyond just generating text in a chat window.

Why this specific angle? Because while cloud-based agents are fantastic, the ability to run an agent locally, with access to your own files, unlocks a whole new level of practical utility and privacy. Plus, it’s a fantastic way to grasp the core concepts of agentic behavior without getting bogged down in API keys, cloud credits, or complex deployment. It’s like learning to drive in your own backyard before hitting the highway.

My “Aha!” Moment with Local Agents

I remember trying to organize a mountain of scattered research notes for a project. I had PDFs, text files, markdown files, all over the place. I tried various desktop search tools, but none really understood the *context* of what I was looking for. I wished I could just tell an AI, “Find all notes related to ‘project X’ that mention ‘neural networks’ and summarize them into a new markdown file.”

At the time, I was playing around with local LLMs and realized: what if I could give the LLM not just my prompt, but also the ability to *look at* my files? And then, based on what it found, *create* a new file? That’s when the lightbulb went off. This wasn’t just about chatting with an AI; it was about an AI *doing things* for me on my own machine. It felt incredibly powerful, like having a tiny, super-smart intern who could actually access my digital filing cabinet.

So, today, we’re going to recreate a simplified version of that “intern.”

The Core Idea: LLM + Tools + Loop

At its heart, any AI agent, especially one that interacts with its environment, follows a simple loop:

  1. Perceive: The agent “sees” its environment (e.g., reads a file, gets a user prompt, receives tool output).
  2. Think: The agent (our LLM) processes this information, decides what to do next, and plans its action.
  3. Act: The agent performs an action (e.g., writes a file, executes a tool, asks for more information).
  4. Repeat: The loop continues until a goal is achieved or a stop condition is met.

For our local file agent, “Perceive” means reading file contents or receiving our instructions. “Think” is our LLM deciding if it needs to read a file, write a file, or if it’s done. “Act” means using specific “tools” we give it – simple Python functions to read or write files.
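To make the loop concrete before any LLM is involved, here is a minimal sketch where a hard-coded function stands in for the “Think” step. The names scripted_think and run_loop, and the pretend file system, are made up for this illustration and are not part of the agent we build below.

```python
from typing import Callable, Dict, List


def scripted_think(observations: List[str]) -> Dict[str, str]:
    """Stand-in for the LLM: picks the next action from what has been seen so far."""
    if len(observations) == 1:          # only the task so far: go look at the notes
        return {"action": "read", "target": "notes"}
    if len(observations) == 2:          # notes have been read: save a result
        return {"action": "write", "target": "summary"}
    return {"action": "stop"}           # both steps done


def run_loop(task: str, think: Callable[[List[str]], Dict[str, str]],
             max_iterations: int = 5) -> List[str]:
    """Runs Perceive -> Think -> Act until 'stop' or the iteration cap."""
    fake_files = {"notes": "agents use tools"}    # pretend file system
    observations = [task]                         # Perceive: start with the task
    for _ in range(max_iterations):               # Repeat
        decision = think(observations)            # Think
        if decision["action"] == "stop":
            break
        if decision["action"] == "read":          # Act: read
            result = fake_files[decision["target"]]
        else:                                     # Act: write
            fake_files[decision["target"]] = observations[-1].upper()
            result = f"wrote {decision['target']}"
        observations.append(result)               # Perceive the result next round
    return observations


log = run_loop("summarize my notes", scripted_think)
print(log)
```

Swapping scripted_think for a call to a real model is, roughly speaking, all that separates this toy from the agent in the rest of the post.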

Setting Up Our Local Lab (The Prerequisites)

Before we jump into the code, you’ll need a couple of things. Don’t worry, it’s pretty straightforward:

  • Python 3.8+ installed: If you don’t have it, head over to python.org.
  • A local LLM: This is the brain of our agent. The easiest way to get started is with Ollama. Download it, install it, and then in your terminal, run ollama run llama3 to download and start the Llama 3 model (or any other model you prefer, like Mistral or Gemma). This will give us a local API endpoint to talk to.
  • Basic familiarity with your terminal/command line: We’ll be running a few commands there.

Once Ollama is running, you should be able to send requests to http://localhost:11434/api/generate. This is what our Python script will do.
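If you want to confirm the endpoint answers before building anything on top of it, a small check like the one below can help. It is only a sketch: it uses Python’s standard library instead of requests so it runs without extra installs, the helper names build_payload and ask_ollama are my own, and it assumes Ollama is listening on its default port with the llama3 model pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address


def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Builds the JSON body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """POSTs the payload and returns the generated text (requires Ollama running)."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


# Not called automatically, since it needs the Ollama server to be up:
# print(ask_ollama("Reply with the single word: ready"))
```

If the commented-out call raises a connection error, Ollama most likely isn’t running; an HTTP 404 usually means the model name doesn’t match one you have downloaded.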

Building Our Tiny File Agent: Step-by-Step

Let’s create a new folder for our project, say my_local_agent. Inside it, create a Python file named agent.py.

Step 1: The Tools (Our Agent’s Hands)

First, we need to define the actions our agent can take. For now, it’ll be just two: reading a file and writing a file. We’ll present these to the LLM in a structured way so it understands what they do and how to use them.


import json
import os
import requests

# --- Tools our agent can use ---
def read_file_tool(filename: str) -> str:
    """Reads the content of a specified file.

    Args:
        filename (str): The path to the file to read.
    Returns:
        str: The content of the file, or an error message if not found.
    """
    try:
        with open(filename, 'r') as f:
            return f.read()
    except FileNotFoundError:
        return f"Error: File '{filename}' not found."
    except Exception as e:
        return f"Error reading file '{filename}': {e}"

def write_file_tool(filename: str, content: str) -> str:
    """Writes content to a specified file. If the file exists, it will be overwritten.

    Args:
        filename (str): The path to the file to write.
        content (str): The content to write into the file.
    Returns:
        str: A success message or an error message.
    """
    try:
        with open(filename, 'w') as f:
            f.write(content)
        return f"Successfully wrote to file '{filename}'."
    except Exception as e:
        return f"Error writing to file '{filename}': {e}"

# Map tool names to their functions
available_tools = {
    "read_file": read_file_tool,
    "write_file": write_file_tool,
}

# --- Tool descriptions for the LLM ---
tool_descriptions = [
    {
        "name": "read_file",
        "description": "Reads the content of a specified file. Useful for gathering information from existing files.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "The path to the file to read."}
            },
            "required": ["filename"]
        }
    },
    {
        "name": "write_file",
        "description": "Writes content to a specified file. If the file exists, it will be overwritten. Useful for creating new files or modifying existing ones.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "The path to the file to write."},
                "content": {"type": "string", "description": "The content to write into the file."}
            },
            "required": ["filename", "content"]
        }
    }
]

In this snippet, we’ve defined two Python functions, read_file_tool and write_file_tool. These are the actual “hands” of our agent. Crucially, we also have tool_descriptions, which is a list of dictionaries. This is how we’ll tell our LLM what tools are available, what they do, and what arguments they expect. This structured format is common in many agent frameworks and LLM APIs.
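One nice side effect of describing tools this way is that the same description can be used to sanity-check what the LLM asks for before any function runs. The sketch below is an optional extra rather than part of the agent as written: check_tool_args is a hypothetical helper, it only understands the “required” list and string types, and it repeats a trimmed copy of one tool description so it can run on its own.

```python
from typing import Any, Dict, List

# A trimmed copy of one entry from tool_descriptions, repeated so this runs alone.
read_file_description = {
    "name": "read_file",
    "parameters": {
        "type": "object",
        "properties": {"filename": {"type": "string"}},
        "required": ["filename"],
    },
}


def check_tool_args(description: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Returns a list of problems with args; an empty list means they look fine."""
    params = description["parameters"]
    problems = []
    for name in params.get("required", []):
        if name not in args:
            problems.append(f"missing required argument '{name}'")
    for name, value in args.items():
        if name not in params["properties"]:
            problems.append(f"unexpected argument '{name}'")
        elif params["properties"][name].get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument '{name}' should be a string")
    return problems


good = check_tool_args(read_file_description, {"filename": "notes.txt"})
bad = check_tool_args(read_file_description, {"path": "notes.txt"})
print(good)
print(bad)
```

Feeding the list of problems back to the LLM as a message tends to be more useful than letting Python raise a TypeError on a bad keyword argument.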

Step 2: The Agent’s Brain (Interacting with the LLM)

Now, let’s connect to our local LLM (Ollama) and set up the agent’s main loop. We’ll use a specific “system prompt” to instruct the LLM on its role and how to use the tools.


OLLAMA_API_BASE_URL = "http://localhost:11434/api/generate"  # Or your Ollama URL

def call_ollama(prompt, model="llama3", system_message="", temperature=0.7):
    """Sends a request to the Ollama API and returns the generated text."""
    # /api/generate takes a plain prompt string plus an optional system prompt.
    # It has no notion of tools, so the tool descriptions and the JSON calling
    # convention live in the system prompt, and agent_loop flattens the
    # conversation so far into the prompt string.
    # More advanced tool calling APIs (like OpenAI's or specialized frameworks)
    # handle this more directly.
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature}
    }
    if system_message:
        data["system"] = system_message

    # Local models can be slow, so allow a generous timeout instead of waiting forever.
    response = requests.post(OLLAMA_API_BASE_URL, json=data, timeout=300)
    response.raise_for_status()  # Raise an exception for HTTP errors
    return response.json()['response']  # With stream=False, the full text is in the 'response' key

def extract_tool_call(text: str) -> dict:
    """Parses a tool call from the LLM's reply, tolerating Markdown fences or extra prose."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise json.JSONDecodeError("No JSON object found", text, 0)
    return json.loads(text[start:end + 1])

def agent_loop(task: str, max_iterations: int = 5):
    """The main agent loop."""
    print(f"Agent starting with task: '{task}'\n")

    # The system prompt is crucial. It tells the LLM its role and how to interact with tools.
    system_prompt = f"""
    You are an AI assistant capable of interacting with the local file system using provided tools.
    Your goal is to complete the user's request by intelligently using the available tools.
    You can read files and write files.

    Available tools: {json.dumps(tool_descriptions, indent=2)}

    To use a tool, respond with a JSON object in the following format:
    ```json
    {{
      "tool_name": "name_of_the_tool",
      "tool_args": {{
        "arg1": "value1",
        "arg2": "value2"
      }}
    }}
    ```
    After using a tool, you will be given the tool's output. Incorporate this output into your next thought process.
    If you have completed the task or cannot proceed, respond with a final answer starting with "FINAL ANSWER: ".
    Do NOT include any tool calls after "FINAL ANSWER: ".
    Always think step by step before making a decision.
    """

    # The system prompt is sent separately on every call, so the history starts with the task.
    messages = [{"role": "user", "content": task}]

    for i in range(max_iterations):
        print(f"\n--- Iteration {i+1}/{max_iterations} ---")

        # Flatten the conversation into a single prompt string for /api/generate.
        # In more advanced setups, you'd pass a list of messages directly.
        combined_prompt = "\n".join(f"{msg['role']}: {msg['content']}" for msg in messages)

        print(f"Agent thinking with prompt:\n{combined_prompt}\n")

        # Get LLM's response
        llm_response = call_ollama(combined_prompt, system_message=system_prompt)
        print(f"LLM Response:\n{llm_response}\n")

        if llm_response.strip().startswith("FINAL ANSWER:"):
            print("Agent completed task.")
            return llm_response.replace("FINAL ANSWER:", "", 1).strip()

        # Record the LLM's own reply so it can see what it already tried.
        messages.append({"role": "assistant", "content": llm_response})

        try:
            # Try to parse the LLM's response as a tool call
            tool_call_data = extract_tool_call(llm_response)
            tool_name = tool_call_data.get("tool_name")
            tool_args = tool_call_data.get("tool_args", {})

            if tool_name in available_tools:
                print(f"Agent calling tool: {tool_name} with args: {tool_args}")
                tool_output = available_tools[tool_name](**tool_args)
                print(f"Tool output: {tool_output}")

                # Add tool output back to the conversation for the LLM to consider
                messages.append({"role": "tool_output", "content": tool_output})
                messages.append({"role": "user", "content": "Tool output received. What's next?"})  # Prompt for next thought
            else:
                print(f"Error: LLM requested unknown tool: {tool_name}. Trying to guide it back.")
                messages.append({"role": "user", "content": f"The tool '{tool_name}' does not exist. Please use one of the available tools or provide a FINAL ANSWER."})

        except json.JSONDecodeError:
            print("LLM did not respond with a valid tool call JSON. Assuming it's trying to talk or finish.")
            # Not a tool call: ask the LLM to either use a tool or provide a final answer.
            messages.append({"role": "user", "content": "Please either call a tool or provide your FINAL ANSWER."})
        except Exception as e:
            print(f"An unexpected error occurred during tool call or parsing: {e}")
            messages.append({"role": "user", "content": f"An error occurred: {e}. Please try again or provide a FINAL ANSWER."})

    print("\nAgent finished without reaching a final answer within max iterations.")
    return "Agent could not complete the task within the given iterations."

if __name__ == "__main__":
    # Example Usage:
    # 1. Create a test file for the agent to read
    with open("test_input.txt", "w") as f:
        f.write("This is some important data about AI agents and their local capabilities.")

    # 2. Run the agent with a task
    # task1 = ("Read the content of 'test_input.txt' and then write a new file called 'summary.txt' "
    #          "containing a 1-sentence summary of 'test_input.txt'.")
    # final_result = agent_loop(task1)
    # print(f"\nFinal Agent Result: {final_result}")

    # You can uncomment and try different tasks!
    # task2 = "List all files in the current directory and then write them to a file named 'directory_listing.txt'."
    # Note: Our current tools don't have 'list files', so this will fail and demonstrate limitations.
    # This is intentional to show how tools define capabilities.
    # final_result = agent_loop(task2)
    # print(f"\nFinal Agent Result: {final_result}")

    task3 = "Read 'test_input.txt', identify any mention of 'AI agents', and then create a new file named 'ai_agent_mentions.txt' containing only the sentence(s) that mention 'AI agents'."
    final_result = agent_loop(task3)
    print(f"\nFinal Agent Result: {final_result}")

    # Clean up test files
    # os.remove("test_input.txt")
    # if os.path.exists("summary.txt"):
    #     os.remove("summary.txt")
    # if os.path.exists("ai_agent_mentions.txt"):
    #     os.remove("ai_agent_mentions.txt")

A few things to note here:

  • system_prompt: This is the secret sauce! It tells the LLM its role, explains how to use the tools (by outputting specific JSON), and defines the “FINAL ANSWER” format. Without a clear system prompt, the LLM will just chat aimlessly.
  • agent_loop: This is where the magic happens. It takes a task, sends it to the LLM, parses the LLM’s response, executes a tool if one is called, and feeds the tool’s output back to the LLM for the next step. This is the “Perceive-Think-Act” loop in action.
  • Ollama API interaction: The call_ollama function is a simple wrapper around /api/generate. The important part is how we construct the prompt: the conversation is flattened into a single prompt string, the tool descriptions travel in the system prompt, and we rely on the LLM to respond with JSON that our script can parse. This is a common pattern when an LLM API doesn’t have native “tool calling” built in, or when you want to see the mechanics for yourself. (Recent Ollama releases also offer native tool calling through the /api/chat endpoint for models that support it.)

How to Run It (And What to Expect)

  1. Save the code above as agent.py in your my_local_agent folder.
  2. Make sure Ollama is running and Llama 3 is downloaded (ollama run llama3).
  3. Open your terminal, navigate to the my_local_agent folder.
  4. Run python agent.py.

You’ll see a lot of output! The script will first create test_input.txt. Then, the agent will start its loop:

  1. It will “think” (generate a response from the LLM based on the task and available tools).
  2. It should output a JSON object calling the read_file tool with test_input.txt as the argument.
  3. Our Python script will execute read_file_tool("test_input.txt").
  4. The output of that tool (the file content) is then fed back into the LLM as part of the next prompt.
  5. The LLM will “think” again, now with the file content in its context, and decide to call the write_file tool to create ai_agent_mentions.txt containing the sentence(s) that mention “AI agents”.
  6. Finally, it should output “FINAL ANSWER:” indicating it’s done.

After it runs, you should find two new files in your directory: test_input.txt and ai_agent_mentions.txt. Open ai_agent_mentions.txt and see if the agent did its job!

Pushing the Boundaries (and Acknowledging Limitations)

This simple agent, while powerful in its demonstration, has some limitations:

  • Error Handling: If the LLM produces malformed JSON, our script simply asks it to try again, so it can burn through all of its iterations without making progress. Robust agents have more sophisticated parsing and recovery mechanisms.
  • Tool Discovery: We manually defined our tools. Real-world agents might need to dynamically discover tools or even generate new ones.
  • Complex Reasoning: Our LLM is just given the raw tool output. For more complex tasks, you might need to give the LLM intermediate prompts or chain tools together in more sophisticated ways.
  • No Directory Listing: Notice how I mentioned a task to “list files”? Our agent doesn’t have a tool for that. This highlights that an agent is only as capable as the tools you give it. Add a list_directory_tool if you want it to browse!
  • Context Window Limits: For very large files or many iterations, the LLM’s context window can fill up, causing it to “forget” earlier parts of the conversation.
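For the context window problem in particular, one simple mitigation is to cap how much history gets flattened into the prompt. The sketch below is just one possible approach under loud assumptions: trim_messages is a hypothetical helper, it counts characters as a rough stand-in for tokens, and it always keeps the first message on the assumption that it holds the original task.

```python
from typing import Dict, List


def trim_messages(messages: List[Dict[str, str]], max_chars: int) -> List[Dict[str, str]]:
    """Keeps the first message (the task) plus as many recent messages as fit."""
    if not messages:
        return []
    first, rest = messages[0], messages[1:]
    budget = max_chars - len(first["content"])
    kept: List[Dict[str, str]] = []
    for message in reversed(rest):               # walk from newest to oldest
        cost = len(message["content"])
        if cost > budget:
            break                                # older messages no longer fit
        kept.append(message)
        budget -= cost
    return [first] + list(reversed(kept))        # restore chronological order


history = [
    {"role": "user", "content": "task"},
    {"role": "tool_output", "content": "x" * 50},
    {"role": "assistant", "content": "y" * 10},
    {"role": "user", "content": "z" * 10},
]
trimmed = trim_messages(history, max_chars=30)
print([m["role"] for m in trimmed])
```

The trade-off is that dropping an old tool output makes the agent forget that file’s content, so for long files a summarizing step may serve better than plain truncation.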

Despite these, this setup is a fantastic starting point. You’ve just built an AI agent that can *perceive* its local environment (by reading files) and *act* on it (by writing files), all powered by a local LLM!

Actionable Takeaways

Alright, you’ve got the code and seen it in action. What next?

  1. Experiment with the System Prompt: This is where you really define your agent’s personality and capabilities. Try making it a “creative writer agent” or a “code reviewer agent.”
  2. Add More Tools:
    • list_directory(path: str) -> str: Returns a string with filenames in a directory.
    • search_web(query: str) -> str: (Requires an external API like SerpAPI or Google Search API for real web access, but you could simulate it with a mock function for local testing).
    • run_command(command: str) -> str: (Use with EXTREME CAUTION!) This is powerful but dangerous. An LLM running arbitrary shell commands could delete files or execute malicious code. Only implement this if you know exactly what you’re doing and perhaps in a sandboxed environment.
  3. Explore Agent Frameworks: Once you understand the core loop, look into frameworks like LangChain, AutoGen, or Rasa. These provide more structured ways to build agents, manage tools, and handle conversational flows. They abstract away a lot of the boilerplate we wrote here.
  4. Think About Specific Problems: Instead of general tasks, think about a specific, repetitive task on your computer. Can you break it down into steps that an LLM with file R/W access could solve? Maybe organizing downloads, generating boilerplate code, or summarizing meeting notes.
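As a starting point for the first suggested tool, here is a rough sketch of a directory-listing function. The base_dir parameter and the resolve_inside helper are my own additions, not something the agent above requires: they keep the tool from wandering outside a folder you choose, which is a cheap safety net once an LLM is picking the paths.

```python
import os
import tempfile


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Returns the absolute path if it stays inside base_dir, else raises ValueError."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"'{user_path}' is outside the allowed folder")
    return target


def list_directory_tool(path: str = ".", base_dir: str = ".") -> str:
    """Lists the entries of a directory, one per line, or returns an error message."""
    try:
        target = resolve_inside(base_dir, path)
        return "\n".join(sorted(os.listdir(target)))
    except (ValueError, OSError) as e:
        return f"Error listing '{path}': {e}"


# Small demo in a throwaway folder so nothing on your machine is touched.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write("demo")
    inside = list_directory_tool(".", base_dir=workdir)
    outside = list_directory_tool("..", base_dir=workdir)

print(inside)
print(outside)
```

To plug it in, you would add an entry to available_tools and a matching description to tool_descriptions; the same resolve_inside check could also wrap the read and write tools.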

This is just the tip of the iceberg when it comes to AI agents, but by building this local, file-aware agent, you’ve taken a massive step beyond simply chatting with an LLM. You’ve given it a way to interact with the world – your world, right on your machine. That, my friends, is truly exciting!

Let me know in the comments what kind of local agents you’re dreaming up!

Happy hacking,

Emma

agent101.net


🎓
Written by Jake Chen

AI educator passionate about making complex agent technology accessible. Created online courses reaching 10,000+ students.
