Agentic Engineering Patterns: What Simon Willison Is Documenting
Executive Summary
Agentic engineering is maturing from ad-hoc prompting into an operational discipline. The strongest emerging pattern language centers on red/green loops, tests-first session startup, role separation, and explicit closure states. Independent guidance from practitioners and model vendors converges on the same thesis: workflow governance now determines quality more than raw code-generation capability.
Introduction & Background
The discourse around coding agents often fixates on model output quality. But in production settings, teams are discovering that output quality is only one variable: reproducibility, verification cadence, and review structure dominate outcomes. Simon Willison’s documentation effort matters because it captures this transition from clever prompting to operator-grade methods.
Methodology
This report synthesizes primary pattern documentation and corroborating practitioner/operator references. Sources were selected for practical relevance to engineering operations. Claims were included only when they provided actionable mechanism-level insight (how teams should run work), not just conceptual framing.
Key Findings
- Code generation is commoditizing; verification is not. The bottleneck is trust in outcomes, not volume of output.
- Red/green is becoming the default reliability loop. It provides a compact but durable control on agent output quality.
- Session bootstrap quality strongly predicts drift. A “first, run tests” discipline reduces context drift early.
- Linear walkthroughs create auditable system understanding. They improve onboarding and review quality in unfamiliar codebases.
- Parallel agent throughput requires governance. Without bounded review, parallelism amplifies defects.
- Vendor guidance converges on decomposition plus evaluators. Anthropic and OpenAI operator guidance align with practitioner patterns.
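The red/green loop named above can be made concrete as a small control function. This is an illustrative sketch, not Willison’s or any vendor’s actual implementation: `run_tests`, `propose_patch`, and `apply_patch` are hypothetical callbacks a team would wire to its own test runner and agent.

```python
def red_green_loop(run_tests, propose_patch, apply_patch, max_attempts=3):
    """Generic red/green control loop for agent-produced changes (illustrative).

    run_tests() -> bool        True when the suite is green.
    propose_patch(n) -> patch  Ask the agent for a candidate change.
    apply_patch(patch) -> None Apply the candidate to the working tree.
    """
    # Red phase: require a failing suite before any implementation.
    # This catches vacuous tests that could never fail.
    if run_tests():
        raise RuntimeError("expected a failing (red) suite before implementation")
    for attempt in range(1, max_attempts + 1):
        apply_patch(propose_patch(attempt))
        # Green phase: a passing suite is the acceptance gate.
        if run_tests():
            return attempt
    # Explicit failure state instead of silently shipping red output.
    raise RuntimeError("blocked: suite still red after max_attempts")


# Toy harness: a "codebase" whose test passes only once patched.
state = {"patched": False}
attempts = red_green_loop(
    run_tests=lambda: state["patched"],
    propose_patch=lambda n: {"fix": True},
    apply_patch=lambda patch: state.update(patched=patch["fix"]),
)
```

The red-phase guard is the part teams most often skip: starting from a passing suite means the loop cannot distinguish a real fix from a test that never exercised the change.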
Analysis & Discussion
The common misconception is that better prompts alone solve reliability. The evidence says otherwise: prompts matter most when embedded in enforced loops. This is the same historical pattern seen in software delivery more broadly—teams win by institutionalizing quality constraints, not by depending on individual heroics. Agentic engineering appears to be replaying this arc faster.
Recommendations & Conclusion
Adopt a minimum operating standard: tests-first startup, red/green verification for non-trivial changes, planner/implementer/verifier role separation, and explicit terminal states (done/blocked/cancelled). For orgs scaling agent usage, treat this as process infrastructure, not optional craftsmanship. The strategic upside is not just fewer failures—it is faster trustworthy iteration.
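The explicit terminal states recommended above (done/blocked/cancelled) can be enforced in an orchestration layer so that no agent session ends ambiguously. A minimal sketch, assuming a Python orchestrator; `TerminalState` and `AgentSession` are illustrative names, not from any specific tool.

```python
from enum import Enum


class TerminalState(Enum):
    DONE = "done"
    BLOCKED = "blocked"
    CANCELLED = "cancelled"


class AgentSession:
    """A unit of agent work that must end in exactly one explicit terminal state."""

    def __init__(self, task: str):
        self.task = task
        self.state = None  # None means the session is still open
        self.note = ""

    def close(self, state: TerminalState, note: str = "") -> None:
        # Closure is one-shot: a session cannot be re-closed or re-opened,
        # which keeps audit logs unambiguous.
        if self.state is not None:
            raise RuntimeError(f"session already closed as {self.state.value}")
        self.state = state
        self.note = note

    @property
    def is_open(self) -> bool:
        return self.state is None


session = AgentSession("migrate config loader")
session.close(TerminalState.BLOCKED, "missing credentials for staging")
```

Recording a reason alongside the state turns "blocked" from a dead end into a reviewable handoff.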