AI_devs 4: S01E01 — Getting Vi off the ground

AI_devs 4 is a course where, “by building production-ready solutions, we’ll discover what generative AI can and can’t do”.

I’m building Vi: a personal assistant to learn the course content with (or instead of?) me and tackle every lesson’s challenges. I may be a bit late to the party, but who cares?

This note recaps my struggles to get the workflow running and to capture the first flag in an absolutely overengineered manner.

Vi fetching the first lesson and outlining the approach

For the tech stack I decided to use:

  • NanoClaw as the agent runtime (more on that below)
  • Telegram as the chat interface
  • Langfuse for tracing
  • Docker Compose for dev and prod deployments
  • OpenTofu on a Hetzner VPS for infrastructure

Why NanoClaw

I wanted to give OpenClaw a solid test. But it’s a huge thing with tons of code, and their PR is rather poor. So I looked at the various available forks, and NanoClaw caught my attention. Its philosophy is something I could adopt in my own projects:

Small enough to understand. One process, a few source files and no microservices. If you want to understand the full NanoClaw codebase, just ask Claude Code to walk you through it.

Secure by isolation. Agents run in Linux containers and they can only see what’s explicitly mounted.

Customization = code changes. No configuration sprawl. Want different behavior? Modify the code. The codebase is small enough that it’s safe to make changes.

In practice, this meant:

  • Container isolation — each agent runs in its own Docker container.
  • Credential proxy — agents can never see API keys or secrets.
  • Fork and modify — no plugin system, no config layers.
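In Compose terms, the first two points boil down to explicit mounts and a proxy that holds the secrets. A rough sketch of the idea, with illustrative service names, images, and paths rather than NanoClaw’s actual compose file:

```yaml
services:
  agent:
    image: nanoclaw-agent            # illustrative image name
    volumes:
      - ./vi/work:/workspace         # the agent sees ONLY what is mounted
    environment:
      - LLM_PROXY_URL=http://proxy:8080  # agent calls the proxy, never the API directly

  proxy:
    image: nanoclaw-proxy            # illustrative; holds the real key
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}  # never injected into the agent container
```

The key point: the API key lives only in the proxy service’s environment, so even a fully compromised agent container has nothing to leak.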

The feedback loop

To make development work, I run two versions of the bot — production on the VPS, development locally. Claude Code can send messages directly to the dev bot through a Telegram CLI client. It builds a tool, sends a message to dev Vi, sees what breaks, fixes it, and iterates — all without touching production. This way I can focus on writing specifications and guiding direction instead of acting as glue between Claude Code iterations.
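The actual dev/send.js isn’t shown here, but conceptually it’s a thin wrapper around the Telegram Bot API’s sendMessage method. A minimal sketch, assuming the token and chat id live in env vars (the env var names are my guess):

```javascript
// Sketch of a send.js-style helper (assumed shape, not the real file).
// Builds a Telegram Bot API sendMessage request for the dev bot.
function buildSendRequest(token, chatId, text) {
  return {
    url: `https://api.telegram.org/bot${token}/sendMessage`,
    body: { chat_id: chatId, text },
  };
}

// Fires the request; Claude Code shells out to this between iterations.
async function send(text) {
  const { url, body } = buildSendRequest(
    process.env.DEV_BOT_TOKEN, // assumed env var names
    process.env.DEV_CHAT_ID,
    text,
  );
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}
```

With something like this, the whole loop is just `node dev/send.js "try the new tool"` followed by reading the dev bot’s reply.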

I came up with the following repo structure:

ai-devs-4/
├── vi/
│   ├── CLAUDE.md                # Agent identity, workflow instructions
│   ├── tools/                   # Fetch lessons, call LLMs, etc.
│   ├── knowledge/lessons/       # Knowledge part of the lessons
│   ├── missions/                # Lesson challenges
│   └── work/                    # Workspace for Vi
├── dev/
│   ├── send.js                  # Send message to Telegram
│   └── traces.mjs               # Wrappers around langfuse CLI
├── deploy/                      # Docker Compose (dev + prod)
│   ├── docker-compose.yml       # Production
│   └── docker-compose.dev.yml   # Development
├── infra/                       # OpenTofu (Hetzner VPS)
└── openspec/                    # Spec-driven changes

Vi drives a headless Chromium browser through a browserless container: it opens the Circle.so course page and fetches the lesson content, then splits the lesson into knowledge and mission parts.
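The splitting itself can be as simple as finding where the challenge section begins. A minimal sketch of the idea; the heading marker here is a guess, not the real lesson markup:

```javascript
// Splits raw lesson text into a knowledge part and a mission part.
// Assumes the mission section starts at a heading containing "Mission"
// or "Zadanie" -- a guessed convention, not the actual lesson format.
function splitLesson(text) {
  const lines = text.split("\n");
  const idx = lines.findIndex((l) => /^#+\s*(Mission|Zadanie)/i.test(l));
  if (idx === -1) return { knowledge: text.trim(), mission: "" };
  return {
    knowledge: lines.slice(0, idx).join("\n").trim(),
    mission: lines.slice(idx).join("\n").trim(),
  };
}
```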

It outputs the plan, consults with me, and proceeds to execution if approved.

Once the final challenge is solved, the hub client verifies the answer and auto-extracts the {FLG:...} pattern from the response.
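The extraction itself is a one-liner; a sketch of the pattern match (the function name is mine):

```javascript
// Pulls the first {FLG:...} token out of a response body, or null if absent.
// The non-greedy (.+?) stops at the first closing brace.
function extractFlag(text) {
  const m = text.match(/\{FLG:(.+?)\}/);
  return m ? m[1] : null;
}
```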

I left the final celebratory flag submission to myself.

Mission S01E01 completed — flag extracted

What hurt

A lot, mostly in the initial setup. The most important issues:

Dev/prod drift – I hadn’t used NanoClaw before, so obvious things caught me off guard: stale containers still running in production even after I had deployed a new version, and sessions leaking between dev and production. At one point Claude decided it was a great idea to copy the dev session content and push it to production as a “fix” for one of the problems.

Keeping Claude Code away from solving challenges – when I asked it to build tools for Vi to solve a mission, Claude would sometimes just solve the mission directly. Instead of creating a tool that lists files in a directory, it would list the files itself and hand that output to Vi.

Free/cheap models — I wanted to use the cheapest models possible; however, I realised that NanoClaw can’t work out of the box with non-Anthropic models. For now I’m defaulting to Haiku 4.5 as the main model.

Where I am now

The dev loop now works in a (kinda) deterministic and useful way, and Claude Code finally follows the workflow as I intended – adding tools and instructions for Vi instead of solving the challenges itself.

Langfuse turned out to be quite useful already for debugging production Vi problems, which actually came as a surprise to me. I expected it to come in handy a bit later in the course.

And with that — first flag captured.

First flag
