research

February 17, 2026

Agent Lightning

project

Microsoft Research · 2025

Decouples agent optimization from agent execution via a sidecar that collects traces non-intrusively — the same separation of concerns that Unix process boundaries already enforce between a program and its observer
The sidecar pattern is tee for training data: watch what the agent does, capture the interaction traces, optimize separately — no modification to the agent’s own code or workflow
Demonstrates that optimization is orthogonal to orchestration; if your agent is a shell loop, the traces are already in your filesystem (stdout logs, exit codes, tool call records) and the training pipeline is just another downstream consumer
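
The "downstream consumer" idea can be sketched in a few lines, assuming the agent loop already appends one JSON object per step to a log. The field names (`episode`, `tool`, `exit_code`) and the reward rule are hypothetical illustrations, not Agent Lightning's actual trace format or API.

```python
import json
from collections import defaultdict


def load_episodes(lines):
    """Group JSON-lines trace records by episode id, preserving step order."""
    episodes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        episodes[record["episode"]].append(record)
    return dict(episodes)


def score_episode(steps):
    """Hypothetical reward: fraction of steps whose tool call exited with 0."""
    if not steps:
        return 0.0
    ok = sum(1 for step in steps if step["exit_code"] == 0)
    return ok / len(steps)


# Stand-in for a log file the agent wrote on its own; the agent code is untouched.
trace_log = """
{"episode": "a", "tool": "ls", "exit_code": 0}
{"episode": "a", "tool": "grep", "exit_code": 1}
{"episode": "b", "tool": "cat", "exit_code": 0}
"""

episodes = load_episodes(trace_log.splitlines())
rewards = {name: score_episode(steps) for name, steps in episodes.items()}
print(rewards)
```

The consumer only reads what the agent already emitted, so swapping the scoring rule or the optimizer never touches the agent loop itself.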

project

sigoden · 2024

Defines LLM tools as plain Bash functions with structured comments — the tool schema is generated automatically from the script itself, no SDK or serialization layer needed
Agents are composed from tools + prompts + documents, assembled at the filesystem level; adding a capability means dropping a shell script into a directory
Demonstrates that function calling does not require a framework: a shell function, a naming convention, and a comment block are sufficient for an LLM to discover and invoke a tool
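
To make "the schema is generated from the script itself" concrete, here is a rough sketch that reads comment tags out of a Bash script and emits a tool description. The `@describe` and `@option` tags, the script, and the output shape are assumptions chosen for illustration; the project's real comment syntax and generator may differ.

```python
import json
import re

# Hypothetical tool script; only its comment block is read, it is never executed.
SCRIPT = '''\
#!/usr/bin/env bash
# @describe Get the current weather for a city
# @option --city The city to look up
# @option --units Temperature units, e.g. metric

main() {
    echo "weather for $city in $units"
}
'''


def schema_from_script(name, source):
    """Build a JSON-schema-like tool description from comment tags alone."""
    description = ""
    properties = {}
    for line in source.splitlines():
        match = re.match(r"#\s*@(\w+)\s+(.*)", line)
        if not match:
            continue  # ordinary code and the shebang line are ignored
        tag, rest = match.groups()
        if tag == "describe":
            description = rest
        elif tag == "option":
            flag, _, help_text = rest.partition(" ")
            properties[flag.lstrip("-")] = {"type": "string", "description": help_text}
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties},
    }


schema = schema_from_script("get_weather", SCRIPT)
print(json.dumps(schema, indent=2))
```

Because the description lives in the script's comments, the tool and its schema cannot drift apart without someone editing the same file.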

February 16, 2026

project

GitHub / Vercel · 2026

Browser automation CLI that reduces context usage by 93% through a “snapshot + refs” system — elements get short labels (@e1, @e2) instead of dumping the full accessibility tree into the LLM
Three-layer architecture (Rust CLI → Node.js daemon → Playwright) that looks like any other Unix tool from the agent’s perspective: commands in, structured output out
Same thesis as Playwright CLI but pushed further — the agent controls a browser through shell commands, reinforcing that tool use is just command execution with minimal data passing
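
A minimal sketch of the "snapshot + refs" idea: walk a page tree, give each interactive element a short label, and keep the full node on the tool side. The tree shape, the role set, and the `selector` field are hypothetical; this is not the tool's actual snapshot format.

```python
# Assumed set of roles worth exposing to the model.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox"}


def snapshot(tree):
    """Walk a nested accessibility-like tree; return (compact text, ref table)."""
    refs = {}
    lines = []

    def visit(node):
        if node.get("role") in INTERACTIVE_ROLES:
            ref = f"@e{len(refs) + 1}"
            refs[ref] = node  # kept on the tool side, never sent to the model
            lines.append(f'{ref} {node["role"]} "{node.get("name", "")}"')
        for child in node.get("children", []):
            visit(child)

    visit(tree)
    return "\n".join(lines), refs


page = {
    "role": "document",
    "children": [
        {"role": "heading", "name": "Sign in"},
        {"role": "textbox", "name": "Email", "selector": "#email"},
        {"role": "group", "children": [
            {"role": "button", "name": "Continue", "selector": "#submit"},
        ]},
    ],
}

text, refs = snapshot(page)
print(text)
```

The model sees only the two short lines in `text`; when it answers with `@e2`, the tool resolves that label back to the stored node and its selector.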

project

GitHub / Microsoft · 2025

Browser automation as a CLI instead of MCP — agents discover commands from help output rather than tool schemas, proving that shell conventions are sufficient for tool integration
Deliberately “token-efficient” by not forcing page data into the LLM context, which is the Unix philosophy applied to agents: do one thing, pass minimal data between steps
Validates the thesis that agent tool use reduces to command execution — browser automation is just another program the model shells out to
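
Discovery from help output can be sketched as plain text parsing. The help text below and the tool name `browsectl` are made up for illustration; a real agent would capture the text by running the tool with its help flag, and real help layouts vary.

```python
import re

# Hypothetical help text; a real agent would capture this from the tool itself.
HELP_TEXT = """\
Usage: browsectl <command> [args]

Commands:
  open <url>        Navigate to a page
  click <ref>       Click an element
  screenshot        Save a screenshot

Options:
  --help            Show this message
"""


def parse_commands(help_text):
    """Extract {command: description} from the 'Commands:' section."""
    commands = {}
    in_section = False
    for line in help_text.splitlines():
        if line.strip() == "Commands:":
            in_section = True
            continue
        if in_section:
            if not line.strip():
                break  # a blank line ends the section
            match = re.match(r"\s+(\S+)(?:\s<\w+>)*\s{2,}(.+)", line)
            if match:
                commands[match.group(1)] = match.group(2)
    return commands


commands = parse_commands(HELP_TEXT)
print(commands)
```

Nothing here is specific to browsers: any program that prints a conventional help screen becomes discoverable the same way.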