Untitled

Xiaomi’s MiMo Code Claims to Beat Claude Code on Long Tasks — Let’s Look at That

Xiaomi released MiMo Code on June 10 under the MIT license. It’s a terminal-native AI coding assistant, forked from OpenCode, and it comes with something most coding agents don’t: a serious attempt at solving the context-loss problem for long-horizon tasks.

The benchmark claim — that it outperforms Claude Code on 200+ step tasks — is plausible for a specific reason that’s worth understanding. Not because Xiaomi has a better underlying model, but because they built a different memory architecture around it.


The Context Loss Problem

Here’s what happens when you run a coding agent on a genuinely long task — something with 100+ steps, spanning multiple files, with context that needs to carry forward:

The agent runs, fills its context window, and starts forgetting. Early decisions, architectural choices, the reason a particular pattern was chosen three hours ago — gone. The agent either makes inconsistent choices from that point, or you intervene manually, feeding it back the context it lost.

This isn’t a model intelligence problem. It’s an architecture problem. The model can’t remember what it can’t read. Claude Code added nested sub-agents in 2.1.172 to help with task decomposition, but even with multiple agents each managing their own context, the cross-session state problem remains: what does the agent know when it starts a new session tomorrow?

MiMo Code’s answer is a four-layer persistent memory system.


The Four Layers

MEMORY.md is the top-level project knowledge file. Architectural decisions, coding patterns, project rules, things that never change — they live here and are injected into every new session. Think of it as the agent’s permanent briefing document.

checkpoint.md is managed by a dedicated checkpoint-writer subagent that runs in parallel with the main agent. When the main agent’s context window approaches its limit, the checkpoint-writer snapshots the current state — what’s been done, what’s in progress, what decisions were made — into checkpoint.md. On resume, this file reconstructs the working context without manual intervention.

notes.md is scratch space. Temporary reasoning, intermediate steps, things the agent is working through but doesn’t need to persist. Kept separate from permanent memory to avoid contaminating MEMORY.md with noise.

tasks//progress.md is a hierarchical task execution log. Tasks are structured as T1, T1.1, T1.2 — a tree structure that tracks which sub-tasks exist, which are complete, and what state they’re in. This integrates with the checkpoint system so task progress survives session breaks.

All four layers use SQLite FTS5 for full-text search, which means retrieval isn’t just sequential — the agent can search across its own memory for relevant context rather than linearly reading all of it.


The /dream and /distill Commands

Two command additions stand out.

/dream compresses recent session traces, extracts persistent knowledge into MEMORY.md, and removes outdated entries. It’s a deliberate memory consolidation step — run it at the end of a long session to make sure what matters survives.

/distill does something more interesting: it identifies repeated manual workflows the agent has observed across sessions and packages high-confidence patterns into reusable skills, subagents, or commands. It’s an attempt at self-improvement through usage observation. If you always handle a specific type of error the same way, distill notices that and can formalize it.

The checkpoint-writer subagent deserves extra attention because it runs in parallel with the main agent — it doesn’t block the main task. It makes autonomous decisions about when to checkpoint based on context window utilization, not on a timer or manual trigger.


Is It Worth Testing?

For marketing and content automation workflows — the kind of multi-step research, drafting, and distribution pipelines this site covers — MiMo Code’s memory architecture solves a real problem if you’re running tasks that span multiple days or sessions.

The limitations to be honest about: MiMo Code doesn’t bundle a frontier model. You’re configuring your own. The performance claims about 200+ step tasks aren’t backed by published benchmarks in the README — “outperforms Claude Code on long tasks” is an assertion, not a documented evaluation. The architecture is sound; the benchmark is a marketing claim.

The MIT license means you can run it, modify it, and integrate it freely. As an open-source framework for long-horizon agent workflows, it’s worth adding to your toolkit — especially if context loss on multi-session tasks has been a concrete problem.


Building agents that need to remember across sessions? The AI Agent Lab is working on exactly this class of problems → https://bit.ly/aiagentslab

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Post