An Opinionated Agentic Engineering Workflow
Cover image by Homa Appliances


By Andrich van Wyk

Introduction

Agentic engineering refers to creating code in a structured way using AI coding agents and harnesses. It stands apart from ‘vibe coding’ due to its formal, disciplined process, prioritizing software design, security, and quality. Essential steps include review, analysis, and iteration.

Below, I present my current agentic engineering workflow and a consolidated set of guidance from various sources. I’ll aim to keep this post up to date as I continue to learn and experiment with new approaches.

A few caveats apply to this workflow; see the Caveats section at the end of this post. AI was not used in the writing of this post.

Overview

At a high level, I use the following workflow, implemented as a custom automation skill:

  1. Context: Work is captured in a file artefact (I use ISSUES.md, others use prd.json). Additional research may be added to items.
  2. Plan: Using Claude Code and custom subagents, the user iteratively designs the change and creates an implementation plan.
  3. Implementation: The root agent executes the implementation plan, including adding tests.
  4. Test: Automated tests are reviewed, and the agent requests manual testing and feedback from the user.
  5. Review: The code and UX are reviewed by custom subagents and by a human reviewer. This includes identifying refactoring opportunities.
  6. Refactor: The agent addresses review feedback.
  7. Validation: Final validation is done (manually) alongside pre-commit checks (primarily static analysis).
  8. Commit: ISSUES.md is updated, and a commit message is prepared. The agent commits upon approval.
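As an illustration, an item in my ISSUES.md might look like the following. The format and fields are my own convention, not a standard, and the contents are hypothetical:

```markdown
## 42: Product listing page
Status: open
Story: As a shop manager, I want a paginated list of all products,
so that I can review the catalogue at a glance.
Research: https://hexdocs.pm/phoenix_live_view/ (pagination via streams)
Notes: reuse the table component from the orders page.
```

Keeping each item small and self-describing is what makes the rest of the workflow loopable.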

I describe each phase in more detail below.

1. Context

Configuring and managing the agent’s context is critical to successful agentic engineering. Effectively setting an agent’s context requires knowledge of best practices and official technical documentation (such as vendor-supplied model cards), as well as a predictive intuition, developed through experience, for how the model behaves given specific input in the context window. Claude Code’s best practices have some great guidance on developing this intuition.

I rely on four techniques to effectively configure and manage context: dedicated instruction files (primarily CLAUDE.md), granular task decomposition, explicit context seeding, and Claude’s persistent memory architecture.

CLAUDE.md

CLAUDE.md forms the foundation of the agent’s context for a given task. The intent is to give the agent project-specific instructions, guidance, and knowledge on how you’d like to implement the project. My CLAUDE.md generally has the following structure:

  • Project guidelines: personalized guidelines specific to the project and the detailed tech stack.
  • Architecture guidelines: high-level architecture guidance and expectations.
  • Domain guidance: specific domain knowledge and rules related to the project.
  • Implementation guidance: specific implementation patterns and rules to follow for general domains (e.g., Authentication, Database).
  • Common errors and solutions: known complications and problems within the domain or tech stack, and solutions to those. May include code snippets.
  • Anti-patterns: things the agent should never do.
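A skeletal CLAUDE.md following this structure might look like the one below. The contents are illustrative placeholders for a hypothetical Phoenix project, not recommendations:

```markdown
# Project Guidelines
- Elixir/Phoenix; run `mix test` before declaring any work done.

# Architecture Guidelines
- LiveView-first UI; business logic lives in context modules, not views.

# Domain Guidance
- Prices are stored in cents as integers; NEVER use floats for money.

# Implementation Guidance
- Authentication: use the generated `mix phx.gen.auth` modules; do not
  roll your own.

# Common Errors and Solutions
- Ecto stale entry errors: reload the struct before update.

# Anti-patterns
- NEVER bypass context modules with direct Repo calls from LiveViews.
```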
Tip

My preferred tech stack generates an AGENTS.md. To enable interoperability with Claude Code, I symlink CLAUDE.md to AGENTS.md (ln -s AGENTS.md CLAUDE.md).

A comprehensive CLAUDE.md is important, but keep it as terse as possible, as it will always be part of the context window. I use common techniques, such as CRITICAL- or NEVER-styled emphasis, throughout CLAUDE.md. Additionally, it’s important to continually update CLAUDE.md during development as guidelines are refined and problems are identified. When guidelines are general enough, and visibility is important, I’ll update the project’s README and refer to it in CLAUDE.md.

Decomposing Tasks

Breaking up work into small, focused chunks is important for managing context and preventing rabbit-holing. The goal here is to fit each change into a single context window. What’s worked well for me is to use user stories as the unit of work. The way user stories are typically written also gives the agent a natural way to verify the work. I’m pedantic about story scope: a feature to list all products would be separate from creating a new product on a product page. With 1M context windows becoming commonplace, I may relax this in the future.
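To make the scoping concrete, the product example above would be captured as two separate stories, each small enough for a single context window (wording illustrative):

```markdown
## Story A: List products
As a shop manager, I want a paginated list of all products,
so that I can review the catalogue.

## Story B: Create a product
As a shop manager, I want to create a new product from the product page,
so that I can grow the catalogue without developer help.
```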

Seeding Context

Another useful technique is to seed the context with pertinent information to both guide and constrain the agent. I often give the agent specific links, guides, or even deep research documents before planning starts.

I also like two specific pieces of advice from Simon Willison on context seeding:

  • First run the tests refers to running the test suite before anything else. This serves two purposes: it gives the agent an overview of what’s in the codebase, and it makes the agent aware that there is a test suite to update.
  • Review the last N commits is especially useful when work spans multiple commits and the agent needs context of what’s been implemented so far.
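In practice, these two pieces of advice translate into a short preamble I prepend to the task prompt. The wording below is illustrative, and the commands assume an Elixir project:

```
Before planning:
1. Run the test suite (`mix test`) to survey the codebase and its tests.
2. Review the last 5 commits (`git log -5 --stat`) for context on the
   work completed so far.
```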

What About Memory?

Throughout 2025, I implemented workflow steps to capture knowledge throughout the development process. Typically, I’d capture new knowledge in LEARNINGS.md or DESIGN.md, and then directly refer to these files from CLAUDE.md. In the last couple of months, I’ve found this unnecessary, as Claude Code’s memory system seems much better at remembering tricky parts of the development or key decisions. My subagent specifications all include guidance on creating effective memories, and I will often force the root agent to remember something specific by instructing it to do so.

2. Plan

Planning is a critical part of the workflow, and I rarely skip it, even for small tasks. The planning phase is where I ensure the agent is aware of relevant code in the existing codebase and that the correct design decisions are made. Commonly, I will also give the agent specific research, URLs, libraries, and code examples to guide and constrain the plan. I use Claude Code’s plan mode to prevent the agent from making any changes during the planning phase.

The key to a good plan is iteration and attention to detail. It helps to be particular and nitpicky, as even small deviations from the intended implementation can spiral out of control later on.

Tip

A particular technique that’s worked well for me is to ask the agent for a MINIMAL implementation plan. This technique tends to avoid over-engineering and saves context during implementation.

It’s useful to instruct the agent to ask you questions during the planning phase. The questions will both improve the plan and give you an idea of what the agent has in mind (which sometimes differs greatly from what you’d expect).

Claude Code will launch several exploration agents during planning to understand the existing codebase, but referring to specific code modules in the task instructions can save time and tokens.

I set up two subagents for planning (and later, review): an architect and a UX designer (for web applications). It’s important for subagents to maintain separate context windows, as they often conduct extensive web research during implementation. Additionally, the subagents’ memories help prevent duplication of specific guidance or general implementation requests.

3. Implementation

Implementation proceeds by switching to Claude’s ‘accept edits’ mode (I’ve also been evaluating auto mode). I periodically audit the agent’s code-generation process, focusing on monitoring the utilization of the context window.

I prefer keeping the implementation within the root context for a couple of reasons. First, it affords me better visibility into what the agent is doing. Second, it allows me to tailor the context (through planning and review) to the primary goal: creating high-quality, functional code. Third, it allows direct interaction and iteration with the agent. And finally, it leverages Claude Code’s system and coding instructions, which I’ve generally found to be more robust than custom implementation agents I’ve created myself.

During the implementation phase, my main goal is to avoid over-engineering. You’ll often be able to catch the agent in the midst of an engineering rabbit-hole, in which case you can intervene and get the implementation back on track.

Compaction and the Context Window

As mentioned above, you should strive to contain a unit of work within a single context window. Even with the 1M context window, benchmarks show a decrease in both retrieval and reasoning performance as context increases, so keeping the context window small remains important.

Of course, there will be situations where you run out of context window (especially if the agent struggles). Claude Code’s default action here is to run compaction. Compaction is detrimental to the agent’s performance: the agent might lose important instructions and guidance, or even lose track of its progress in the workflow. Generally, I use one of the following options when I’m out of context:

  • Formally pause the work, that is, commit the current changes as WIP, then start a new task with updated instructions, but seed the context with the WIP commits and then replan the implementation.
  • Allow compaction to occur, but reseed the context with highly pertinent information after completion. Custom compaction instructions help with this.
  • Stop work well before the compaction threshold (say 80% of the window) and reduce the scope of the remaining work (e.g., skipping UI or test implementation). This work then shifts to the next task.
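For the second option, Claude Code’s /compact command accepts inline instructions; something along these lines works (the wording is illustrative):

```
/compact Preserve the implementation plan, the list of files modified so
far, and any unresolved review feedback. Discard exploratory tool output.
```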

4. Tests

The project’s automated test suite serves two purposes. The first is familiar: it’s one of the primary quality gates for the project, and the usual guidance on creating a good automated test suite (testing what vs. how, isolation and repeatability, etc.) still applies.

However, specifically for agentic engineering, the test suite also plays an important role in conveying context to the agent, both in terms of the functionality and feature set and of the actual code design and implementation. As such, maintaining a good test suite is paramount to having a high-quality codebase. Further, tests help ensure the agent doesn’t write code that is never run or simply doesn’t work.

In my experience, I’ve rarely needed to instruct the agent to write tests; the need to do so seems to be fundamental to Anthropic’s models and Claude Code’s system prompt (and other instructions). However, my CLAUDE.md and README always emphasize the need for the test suite to pass before a commit, and I generally include guidance on what good tests look like.

Models also understand Test Driven Development well, and many practitioners recommend this approach with agents. I agree that this tends to improve code quality with agentic engineering, but it comes with significant overhead and costs that are sometimes unnecessary.

Tests and Context

Test implementation is a natural way to split up development, and therefore can be used as a mechanism to break down work. In fact, before 1M context windows my general advice was to never implement tests and code in the same task (this no longer applies). That said, test implementation and running the test suite can consume a lot of the context window, so one technique is to implement a Testing subagent with a separate context that focuses specifically on running and implementing tests.

Functional Testing

The workflow also instructs the agent to test changes via the browser. Simon Willison has a good guide on this, but with Claude’s Chrome integration, little instruction or setup is required to enable automation of functional testing.

5. Review

I use a three-pronged approach to review: static analysis, agentic review with tooling and subagents, and manual review.

Static Analysis

I’ve found static code analysis to be far more useful for agentic engineering than when writing code by hand. Coding agents readily absorb the annoyances and overhead of static analyzers, and it’s a great way of making sure the agent sticks to the rules. In particular, the static analyzer should support a strict mode (that fails the build or precommit) and should check for refactoring opportunities in addition to simple errors and complexity checks. For my preferred tech stack, I use Credo.
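As a sketch, the strict pre-commit gate for my Elixir stack boils down to a hook like the one below. The `mix` and Credo flags are real; the hook itself is my own convention, a minimal sketch rather than a complete setup:

```shell
#!/bin/sh
# Pre-commit hook: fail the commit on formatting, lint, or test failures.
set -e
mix format --check-formatted   # formatting must be clean
mix credo --strict             # static analysis in strict mode
mix test                       # the suite must pass before any commit
```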

Agentic Review

Agentic review (having the agent review its own code and implementation) works well, even if it’s exactly the same model. I’ve found two additional techniques critical to improving the feedback:

  • Give the agent a way to check its own work. Primarily, I leverage Claude’s Chrome integration for this (the Playwright CLI and an associated skill will do the same), and when needed, database-specific tools. Claude Code also has bash tools that work well in many situations.
  • Subagents. I use dedicated, project-specific code review subagents to leverage specific models (Opus) and to provide review-specific instructions. The review subagent also tends to accumulate specific memories that can complicate the implementation context, so it’s useful to have it separate.

Manual Review

Finally, I manually review all changes made by the agent. This is still a critical part of the process for building scalable, secure, production-grade software.

Security Review

The code review subagent is instructed to review security alongside other objectives (guided by CLAUDE.md and its own instructions). For sensitive areas, I’ll additionally use Claude Code’s /security-review command.

6. Refactor

The refactor step typically works the same way as the main implementation step (accept edits), except that the review feedback serves as the plan.

It’s likely you’ll run out of context if the refactor requires a major code rework. In such cases, it’s better to log the refactoring as a separate task and start with a fresh (but seeded) context.

I adhere to the Rule of Three for code duplication (either detected through static analysis or manual review). I’ve found that modern coding agents are good at creating the correct abstraction when there are three or more duplications.

7. Validation

During the validation phase, I perform functional acceptance testing and final review. This is the step where I make sure I’m happy with what’s been done and can take responsibility for the code. The emphasis here is on confirming that the code works and that review feedback has been incorporated: I sanity-check the changes made in response to review feedback and perform final (manual) testing of the functionality. At this stage, you should have a good idea of what has changed and how, so you can focus on last-mile polish and edge cases. Code is cheap now, so it pays to focus on even small details, as they have a way of producing cumulative errors later on.

8. Commit

The final phase is to commit the changes. I ensure the issue tracking (ISSUES.md, prd.json, etc.) is updated and that the agent generates an accurate, well-formatted commit message. A final check worth doing is to ensure any minor decisions, gotchas, or higher-level guidance are captured in the agent’s memory tooling.

Tip

If you’ve gone through compaction, it’s common for the agent to ‘forget’ initial implementation work and leave it out of the commit message. Always review your commit messages.

Summary

Agentic engineering is rapidly evolving into a deep and sophisticated engineering discipline under the software engineering umbrella. Workflows such as the one I outline above (and many other, far more sophisticated ones) are rapidly being developed, analyzed, and refined as the field collectively establishes best practices. Many tout the goal as full-scale automation of software engineering, and to a degree, that is true, but it doesn’t imply a lack of human involvement. The engineering focus is shifting from the code to what I’m terming the software factory: the setup, configuration, fine-tuning, specialization, and maintenance of the workflows, skills, tools, and harnesses used to run coding agents and produce software. In the same way that industrial engineers optimize production processes, software engineers are increasingly taking responsibility for automated processes that produce working software rather than producing software themselves.

In the appendix below, I provide additional detail on some aspects of the workflow.

Caveats

The following caveats are applicable:

  • This is simply my workflow, and is definitely not the most sophisticated or best out there. Simplicity is a goal and also a moving target. But it does work. I’ve built complete, production-grade, scalable applications of varying complexity in the 50k-100k LOC range using this workflow and earlier variants.
  • The workflow is specifically developed for Claude Code. I’ve been a Claude Code user since release and have the most experience with it. That being said, I expect the workflow to be portable to tools like OpenCode; I just haven’t tried it myself yet.
  • The workflow is not cost-optimal. I’ve mainly used Claude Code through a Max subscription, which carries a fixed cost. Cost optimization is a priority for me, and I will look into it in the near future.
  • Disciplined agentic engineering is hard. It often forces frequent context switching, mandates deep expertise, and requires strong critical thinking and reasoning skills. It’s different from coding, but it requires similar levels of sustained focus. I do not recommend approaches like these if you are still learning the field.

Support

Tip

Found this post useful? Consider subscribing for updates, or supporting my work.

Further Reading

Appendix

Subagents

I currently have three subagents in the core workflow and am experimenting with a fourth.

Architect

The architect’s primary role is to analyze a user story (or instruction) and create a clear design and implementation plan. The system prompt encapsulates the following:

  • The core design philosophy and principles
  • The design process, with a step-by-step breakdown of how to analyze requirements and produce a design
  • Detailed knowledge about the foundational tech stack
  • Instructions and guidelines for memorization (what to save, and what not to save)
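Claude Code subagents are defined as markdown files with YAML frontmatter. Below is a trimmed-down sketch of what the architect definition looks like; the frontmatter fields (name, description, model) are real, but the body is illustrative:

```markdown
---
name: architect
description: Analyzes a user story and produces a minimal design and
  implementation plan. Use during the planning phase.
model: opus
---
You are the project architect. Given a user story, analyze the existing
codebase, then produce a MINIMAL implementation plan: affected modules,
data changes, and test cases. Save only durable design decisions to memory.
```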

Code Reviewer

The code reviewer’s primary role is the review of modified code before commits. The system prompt encapsulates:

  • Guidance for reviews, with a focus on pragmatism, functional code, and security
  • Review checklist
  • Feedback prioritization (Security > Correctness > Best Practices > Performance)
  • Guidelines for memorization

UX Reviewer

The UX reviewer’s primary role is to conduct user-centric reviews of UI elements. The system prompt encapsulates:

  • UX principles and the design philosophy (e.g., mobile-first, visual hierarchy, color and spacing, etc.)
  • Tech stack specific guidance
  • Review checklist
  • Guidelines for memorization

Tester (Experimental)

The tester agent is experimental, as I’m still sceptical of the value of the agent vs. automated e2e tests. The role of the agent is to test the application as a user might. The system prompt encapsulates:

  • Guidance and skills on how to access the application (Chrome integration/Playwright)
  • Test prioritization
  • Domain knowledge and memorization guidelines

Model Selection

  • Opus 4.6: Opus has been the primary model driving the workflow and subagents in the initial part of 2026. I’ve found Opus to be more consistent in delegating to subagents (as compared to Sonnet), and have often been astonished at its ability to find issues during review, including niche domain-specific issues.
  • Sonnet 4.6: After the release of Sonnet 4.6, I’ve switched over all subagents except for the UX reviewer. I still prefer Opus for the root agent (due to the subagent delegation mentioned above and greater Chrome capabilities), but Sonnet easily handles the rest of the work.
  • Gemini 3.1 Pro: By installing the gemini-cli, I’ve experimented with giving Claude Code the ability to use Gemini models. Gemini 3.1 Pro excels at visual reasoning and understanding, which I’ve found useful for UI analysis.

Skills

I’ve found the built-in Claude Code skills sufficient for the majority of projects I’ve worked on. In addition to some tech/domain-specific skills I create on a per-project basis, I usually include the following general skills.

  • /implement: the implement skill encapsulates the entire workflow detailed above, step by step, with specific instructions. It’s not agent-invocable, and it’s designed to be executed in a loop.
  • /security-review (not Claude’s): a custom security review workflow that’s specific to my default tech stack (Elixir/Phoenix/Digital Ocean).
  • playwright-cli: the Playwright CLI skill for browser automation. Recently, I’ve switched to Claude’s Chrome integration.
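For reference, a custom skill like /implement is a SKILL.md with frontmatter followed by instructions. Below is a heavily abridged sketch: the frontmatter fields are real, and the steps are a condensed, illustrative version of the workflow above:

```markdown
---
name: implement
description: Runs the issue-to-commit workflow for the next open item in
  ISSUES.md. Invoked explicitly by the user, not by the agent.
---
1. Read the next open item in ISSUES.md; enter plan mode.
2. Delegate design to the architect subagent; iterate with the user.
3. Implement the plan, including tests; request manual testing.
4. Delegate review to the code-review (and UX) subagents; refactor.
5. Update ISSUES.md, draft a commit message, and commit on approval.
```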

MCP

I use MCP servers very judiciously as they can easily fill up the context. I’ve found that effective MCP usage is project-specific (e.g., Supabase, Github), and the only MCP server I’ve used consistently is Context7.

Debugging

I have not spent any time automating and optimizing debugging in an agentic way. My usual workflow is to ask the root agent to keep trying if an approach fails and to give it guidance on what I think is wrong. In my experience, I rarely have to ask more than twice, but I fall back to manual, iterative debugging in such cases. One thing to watch out for is overengineering when the agent has to work around an issue that it can’t fix. Future work will include a specialized debugging agent and the development of skills.

Tech Stack

It’s worth mentioning my default tech stack (for web applications): Phoenix/Elixir. I’ve found agents to be highly successful with Phoenix applications, even without (I’m assuming) an abundance of data in the training corpus. I have no clear evidence as to why agents tend to perform well within the framework, but hypothesize it’s due to: the consistency and conventions of the framework itself, code generators, natural context boundaries, Ecto instead of an ORM, functional programming paradigms, the monolithic architecture (with LiveViews), and a single language for front- and backend.