its-just-shell

Agent Lightning ↗

project

Microsoft Research · 2025

Decouples agent optimization from agent execution via a sidecar that collects traces non-intrusively — the same separation of concerns that Unix process boundaries already enforce between a program and its observer
The sidecar pattern is tee for training data: watch what the agent does, capture the interaction traces, optimize separately — no modification to the agent’s own code or workflow
Proves that optimization is orthogonal to orchestration; if your agent is a shell loop, the traces are already in your filesystem (stdout logs, exit codes, tool call records) and the training pipeline is just another downstream consumer

AgentFS — The Missing Abstraction for AI Agents ↗

post

Turso · November 2025

Everything an agent does — files, state, tool calls — lives in a single SQLite database exposed as a POSIX filesystem; the abstraction is not a new API, it is the filesystem itself
FUSE support lets agents use git, grep, and standard Unix tools directly against their state store with zero integration code; the trust boundary is the mount point, not a permission model in application code
Makes agent state portable (one file), auditable (SQL queries over history), and composable (multiple agents share a filesystem with conflict resolution) — the same properties Unix gives processes via /tmp and pipes

Bash One-Liners for LLMs ↗

post

Justine Tunney · December 2023

Treats LLMs as standard Unix filters: pipe data in via stdin, get structured output on stdout, chain with sed, curl, and links — the model is just another composable process
Uses –temp 0 to make LLM output deterministic, turning a stochastic model into a reproducible Unix tool suitable for scripting and automation
Demonstrates that llamafile turns an LLM into a single-file executable callable from bash — no Python, no framework, no daemon; the filesystem is the package manager

llm-functions ↗

project

sigoden · 2024

Defines LLM tools as plain Bash functions with structured comments — the tool schema is generated automatically from the script itself, no SDK or serialization layer needed
Agents are composed from tools + prompts + documents, assembled at the filesystem level; adding a capability means dropping a shell script into a directory
Proves that function calling does not require a framework: a shell function, a naming convention, and a comment block are sufficient for an LLM to discover and invoke a tool

From Commands to Prompts: LLM-based Semantic File System for AIOS ↗

paper

Shi, Mei, Zhang et al. · 2025

Proposes replacing shell commands with natural-language prompts that compile down to the same POSIX file operations — the filesystem API is the stable interface, whether the caller is a human or an LLM
Demonstrates 15%+ retrieval accuracy gains and 2.1x speed improvement over traditional file systems by adding a semantic index layer, while preserving full POSIX semantics underneath
Includes safety mechanisms (confirmation before destructive ops, rollback) that map exactly to the trust-gradient argument: the OS already has the permission model, the agent just needs to respect it

smolagents ↗

framework

Hugging Face · December 2024

The entire library is roughly 1,000 lines of code — a deliberate rejection of the sprawling framework approach; minimal abstraction means you can read the whole agent runtime in one sitting
Code agents outperform JSON tool-calling agents by ~30% fewer steps because code is inherently composable: you can nest calls, define variables, and loop — the same properties that make shell scripts powerful
The core design insight maps directly to the shell thesis: agents should write executable actions (code), not describe desired actions (JSON) — the agent is a script

The Unreasonable Effectiveness of an LLM Agent Loop with Tool Use ↗

post

sketch.dev · May 2025

The entire agent pattern reduces to a 9-line while loop: read input, call tool, feed output back — this is a read-eval-print loop, the same pattern shells have used for fifty years
With just one general-purpose tool — bash — current models can solve many problems in a single shot; the agent does not need a framework, it needs a shell
Argues custom agent loops will replace tasks “too specific for general tools and too unstable to automate traditionally” — the exact niche shell scripts have always filled

agent-browser ↗

project

GitHub / Vercel · 2026

Browser automation CLI that reduces context usage by 93% through a “snapshot + refs” system — elements get short labels (@e1, @e2) instead of dumping the full accessibility tree into the LLM
Three-layer architecture (Rust CLI → Node.js daemon → Playwright) that looks like any other Unix tool from the agent’s perspective: commands in, structured output out
Same thesis as Playwright CLI but pushed further — the agent controls a browser through shell commands, reinforcing that tool use is just command execution with minimal data passing

Building Effective Agents ↗

post

Anthropic · December 2024

Argues the most effective agent architectures are augmented LLMs with simple tool loops, not multi-agent frameworks
Distinguishes “workflows” (predetermined tool orchestration) from “agents” (model-directed tool use) — both reduce to tool loops at different autonomy levels
Recommends starting with the simplest implementation and adding complexity only when measurably needed

Playwright CLI ↗

project

GitHub / Microsoft · 2025

Browser automation as a CLI instead of MCP — agents discover commands from help output rather than tool schemas, proving that shell conventions are sufficient for tool integration
Deliberately “token-efficient” by not forcing page data into the LLM context, which is the Unix philosophy applied to agents: do one thing, pass minimal data between steps
Validates the thesis that agent tool use reduces to command execution — browser automation is just another program the model shells out to

Taste Is Not a Moat ↗

post

sshh.io · 2026

Argues that taste is “alpha” (a decaying edge) not a “moat” — as AI baselines improve every few months, individual judgment only matters relative to what the tools do by default
Reframes the human role as “taste extractor”: articulating tacit preferences so tool loops can operationalize them, which is exactly the shell pattern of encoding intent into composable commands
Proposes concrete extraction techniques (A/B interviews, ghost writing, external reviews) that all reduce to the same structure — a human-in-the-loop refining outputs through iterative feedback cycles

Model Context Protocol (MCP) ↗

protocol

Anthropic · November 2024

Open protocol for connecting AI assistants to external data sources and tools through a standardized JSON-RPC interface
Servers expose tools, resources, and prompts; clients (LLMs) discover and invoke them — the AI equivalent of USB-C for context
Keeps tool integration composable: each server is a single-purpose process, orchestrated by the model’s own tool loop

ReAct: Synergizing Reasoning and Acting in Language Models ↗

paper

arXiv · October 2022

Interleaves chain-of-thought reasoning traces with concrete actions in an observe-think-act loop
Outperforms pure reasoning (chain-of-thought) and pure acting (action-only) on knowledge-intensive tasks by grounding thoughts in tool outputs
Foundational pattern behind most modern agent frameworks — the shell-like “read, eval, print” loop applied to LLMs

Toolformer: Language Models Can Teach Themselves to Use Tools ↗

paper

arXiv · February 2023

Demonstrates that language models can learn when and how to call external tools (calculator, search, calendar) through self-supervised training
The model inserts API calls into its own text generation when doing so reduces perplexity — tool use emerges from utility, not instruction
Shows that tool augmentation is a natural extension of next-token prediction, not a bolted-on capability

LangChain ↗

framework

GitHub · October 2022

Framework for composing LLM calls with tools, memory, and retrieval into multi-step chains and agents
Popularized the “chain” abstraction — sequential LLM calls where each step’s output feeds the next — and the “agent” pattern with dynamic tool selection
Useful as a reference for what complexity emerges when tool loops scale; argues for the shell thesis by showing what happens without simplicity constraints

Anthropic Tool Use Documentation ↗

docs

Anthropic Docs · 2024

Reference for Claude’s native tool-use interface: define tools as JSON schemas, the model emits structured tool_use blocks, you execute and return results
The interaction pattern is a synchronous tool loop — exactly the shell paradigm of prompt → command → output → prompt
Supports forced tool use, parallel tool calls, and streaming, showing how the simple loop extends without changing its fundamental shape

research