Agentic Squad: modeling the SDLC with agents, contracts, and continuous learning
A practical way to organize AI agents around the real stages of software development — without pretending autonomy replaces engineering.
- TypeScript
- Markdown
- YAML
- GitHub
- Claude Code
- Codex
- LLMs
There is a big difference between using AI to speed up tasks and redesigning the engineering flow to work better with AI.
The first is common: asking a model to write tests, explain code, generate a README, or suggest a refactor. The second is harder: building a system where agents, context, technical decisions, team rules, project history, and human review work together without turning into chaos.
It's in that second space that the idea of the Agentic Squad lives.
The Agentic Squad isn't an attempt to replace an engineering team. It also isn't a loose collection of prompts. The proposal is to model a way of working in which agents help with specific stages of the SDLC, with clear contracts, versioned context, explicit roles, validations, and continuous learning.
The problem: agents without a system become fragile automation
The most common way to use agents today is still improvised.
Someone opens a tool, pastes a big task, adds a few files, asks for an implementation, and hopes for a good result. Sometimes it works. Often it seems to work. In real systems, though, the problems show up fast:
- the agent doesn't know the previous decisions;
- the context grows large, expensive, and inconsistent;
- important team rules aren't remembered;
- local patterns get mixed with generic ones;
- the code looks correct but breaks a convention;
- human review turns into a hunt for side effects;
- learnings from one task don't improve the next.
The mistake isn't using agents. The mistake is using agents without a working architecture.
What Agentic Squad is
Agentic Squad is a proposed architecture for coordinating AI agents across the software development lifecycle.
It organizes:
- the stages of the SDLC;
- the capabilities triggered at each stage;
- context contracts;
- rules and playbooks;
- reusable skills;
- automated validations;
- human review;
- continuous learning;
- the separation between global knowledge, team knowledge, and the repository's local knowledge.
Instead of thinking "I want a PM agent, a Tech Lead agent, a QA agent, and a Dev agent," the core idea is different:
model the SDLC by stages, and trigger capabilities as needed.
This avoids a theater of artificial job titles and keeps the system close to the real engineering flow.
The most important decision: stages before roles
A common trap in agentic systems is starting from the roles.
It feels natural to design something like:
| Role | Expected responsibility |
|---|---|
| PM Agent | Understand the problem, prioritize |
| Architect Agent | Define the solution |
| Developer Agent | Implement |
| QA Agent | Test |
| Reviewer Agent | Review |
It's seductive, but it can become theatrical. On real teams, work doesn't happen in such clean boxes. A Tech Lead takes part in discovery. A developer spots an architecture risk. QA helps clarify a requirement. Security shows up during refinement. SRE influences deployment decisions.
That's why the Agentic Squad models the stages first.
| Stage | Central question | Capabilities triggered |
|---|---|---|
| Discovery | What problem are we solving? | Product, domain, data, UX, risk |
| Refinement | What must be clear before we build? | PM, Tech Lead, QA, Security |
| Specification | What contract will we follow? | Spec, API, use cases |
| Architecture | Which structural decisions sustain this? | Architecture, security, observability |
| Planning | How do we break it into safe steps? | Sequencing, dependencies, rollback |
| Implementation | How do we build with the least risk? | Code, tests, integration |
| Quality | How do we prove it works? | Tests, lint, build, a11y, performance |
| Review | What needs to be questioned? | Code review, trade-offs, maintainability |
| Merge/Deploy | How do we ship without surprises? | CI/CD, release, feature flags |
| Retrospective/Learning | What did we learn for next time? | Memory, patterns, playbooks |
The guiding phrase is:
SDLC first; roles as capabilities.
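To make "roles as capabilities" concrete, here is a minimal sketch. It assumes a plain lookup table keyed by stage; the names `stageCapabilities` and `stagesUsing` and the lowercase capability labels are illustrative, not part of any existing tool. Only four of the ten stages are included to keep it short.

```typescript
// Illustrative only: a subset of the stage table, with capabilities as plain labels.
const stageCapabilities: Record<string, string[]> = {
  discovery: ["product", "domain", "data", "ux", "risk"],
  refinement: ["pm", "tech-lead", "qa", "security"],
  architecture: ["architecture", "security", "observability"],
  quality: ["tests", "lint", "build", "a11y", "performance"],
};

// A capability is not tied to one box: it is looked up per stage.
function stagesUsing(capability: string): string[] {
  return Object.entries(stageCapabilities)
    .filter(([, capabilities]) => capabilities.includes(capability))
    .map(([stage]) => stage);
}

const securityStages = stagesUsing("security");
console.log(securityStages); // [ 'refinement', 'architecture' ]
```

The point of the shape is that "security" has no agent of its own here; it is simply triggered wherever the stage needs it.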
A layered model
The Agentic Squad needs to separate what is generic, what belongs to the team, and what is local.
A practical way to organize this is in layers:
agentic-squad-reference/
├─ patterns/
├─ skills/
├─ rules/
├─ playbooks/
├─ checklists/
└─ examples/
agentic-squad-team-core/
├─ domain/
├─ architecture-decisions/
├─ team-rules/
├─ delivery-playbooks/
└─ quality-gates/
consumer-repo/
├─ agentic-squad.yaml
├─ .agentic/
│ ├─ context.md
│ ├─ local-rules.md
│ └─ lessons.md
└─ src/
The reference layer holds reusable patterns. The team layer holds decisions specific to that context. The consumer repository loads only what it needs to operate well locally.
This separation reduces two risks:
- turning every repository into a dumping ground of duplicated prompts;
- putting too many local rules into a global template.
The contract: agentic-squad.yaml
An Agentic Squad needs a contract.
Without a contract, every run becomes a new conversation. With a contract, the agent understands which capabilities exist, which files are the source of truth, which commands validate the delivery, and which limits must not be crossed.
A simplified example:
version: "0.1"
project:
  name: "booking-backoffice"
  domain: "post-sale travel operations"
  defaultLocale: "pt-BR"
sources:
  localContext:
    - ".agentic/context.md"
    - ".agentic/local-rules.md"
  teamCore:
    - "../agentic-squad-team-core/team-rules/"
    - "../agentic-squad-team-core/architecture-decisions/"
  reference:
    - "../agentic-squad-reference/playbooks/"
    - "../agentic-squad-reference/checklists/"
stages:
  discovery:
    requiredOutputs:
      - problem-statement
      - constraints
      - open-questions
  architecture:
    requiredOutputs:
      - tradeoffs
      - risks
      - decision-record
  implementation:
    qualityGates:
      - pnpm lint
      - pnpm typecheck
      - pnpm test
      - pnpm build
  review:
    requiredChecks:
      - "Does this preserve existing behavior?"
      - "Are errors observable?"
      - "Is rollback clear?"
policies:
  humanInTheLoop: true
  neverCommitWithoutApproval: true
  neverExposeSecrets: true
  preferSmallDiffs: true
This file doesn't need to be big at first. The value is in declaring the minimum necessary to avoid improvisation.
How an agent should use this contract
An agent shouldn't just receive a task. It should receive a task within a stage.
A bad example:
Implement the search screen.
A better example:
We are in the Specification stage.
Goal:
define the contract for search before implementing.
Use:
- agentic-squad.yaml
- local rules
- the team's architecture decisions
- the UX handoff
Deliver:
- use cases
- states
- events
- acceptance criteria
- risks
- open questions
Do not implement code yet.
The difference seems small, but it changes the agent's behavior. It stops rushing to code and starts respecting the stage.
A CLI as setup and context compiler
An important idea in the Agentic Squad is having a CLI that prepares the right context for each stage.
For example:
agentic-squad init
agentic-squad run discovery --issue 123 --repo booking-backoffice
agentic-squad run architecture --input .agentic/work/123-discovery.md
agentic-squad run implementation --plan .agentic/work/123-plan.md
agentic-squad learn --from-pr 456
The CLI doesn't need to "be smart" at first. It can simply validate the
agentic-squad.yaml, resolve paths, assemble context, select playbooks, generate
prompts, run quality gates, save outputs, and record learnings.
The value is less in magical automation and more in operational consistency.
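As one illustration of that consistency, the "run quality gates" step could be sketched as below. This is a hedged sketch, not the CLI's actual implementation: `GateRunner`, `GateReport`, and `runQualityGates` are hypothetical names, and the runner is injected so the example stays free of process spawning. A real CLI would presumably wrap a child-process call behind the same function type.

```typescript
// Hypothetical: a runner takes a shell command and returns its exit code.
type GateRunner = (command: string) => number;

interface GateResult {
  command: string;
  exitCode: number;
}

interface GateReport {
  passed: boolean;
  results: GateResult[];
}

// Run gates in the declared order and stop at the first failure,
// so the report shows exactly which gate blocked the delivery.
function runQualityGates(gates: string[], run: GateRunner): GateReport {
  const results: GateResult[] = [];
  for (const command of gates) {
    const exitCode = run(command);
    results.push({ command, exitCode });
    if (exitCode !== 0) {
      return { passed: false, results };
    }
  }
  return { passed: true, results };
}

// Usage with a fake runner that simulates a failing typecheck.
const fakeRunner: GateRunner = (command) => (command === "pnpm typecheck" ? 1 : 0);
const gateReport = runQualityGates(
  ["pnpm lint", "pnpm typecheck", "pnpm test", "pnpm build"],
  fakeRunner,
);
console.log(gateReport.passed, gateReport.results.map((r) => r.command));
```

Failing fast is a design choice, not a requirement: a team that prefers a full report could collect every result and compute `passed` at the end.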
A context-compilation example
A command like agentic-squad context could produce a temporary bundle:
{
  "stage": "implementation",
  "repo": "booking-backoffice",
  "task": "add newsletter subscribe endpoint",
  "contextFiles": [
    ".agentic/context.md",
    ".agentic/local-rules.md",
    "../agentic-squad-team-core/team-rules/api-style.md",
    "../agentic-squad-reference/checklists/implementation.md"
  ],
  "qualityGates": ["pnpm lint", "pnpm typecheck", "pnpm test", "pnpm build"],
  "constraints": [
    "do not commit without approval",
    "do not add dependencies without justification",
    "preserve i18n",
    "preserve data-theme"
  ]
}
That bundle can become a prompt for Claude Code, Codex, or another tool.
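One way such a bundle might be assembled from the contract is sketched below. Everything here is an assumption made for illustration: `ContractView` is a trimmed-down view of agentic-squad.yaml, `compileContext` is a hypothetical function name, and the sketch passes source entries through unchanged instead of expanding directories into individual files. It also derives constraints from policy names only; free-text constraints like those above would need another source.

```typescript
// Hypothetical, trimmed-down view of the contract (not the full schema).
interface StageConfig {
  requiredOutputs?: string[];
  qualityGates?: string[];
  requiredChecks?: string[];
}

interface ContractView {
  project: { name: string };
  sources: { localContext: string[]; teamCore: string[]; reference: string[] };
  stages: Record<string, StageConfig>;
  policies: Record<string, boolean>;
}

interface ContextBundle {
  stage: string;
  repo: string;
  task: string;
  contextFiles: string[];
  qualityGates: string[];
  constraints: string[];
}

function compileContext(contract: ContractView, stage: string, task: string): ContextBundle {
  const stageConfig = contract.stages[stage];
  if (!stageConfig) {
    throw new Error(`Stage "${stage}" is not declared in the contract`);
  }
  const { localContext, teamCore, reference } = contract.sources;
  return {
    stage,
    repo: contract.project.name,
    task,
    // Assumed ordering: local context first, then team core, then reference.
    contextFiles: [...localContext, ...teamCore, ...reference],
    qualityGates: stageConfig.qualityGates ?? [],
    // Only policies that are switched on become constraints.
    constraints: Object.entries(contract.policies)
      .filter(([, enabled]) => enabled)
      .map(([name]) => name),
  };
}

const sampleContract: ContractView = {
  project: { name: "booking-backoffice" },
  sources: {
    localContext: [".agentic/context.md"],
    teamCore: ["../agentic-squad-team-core/team-rules/"],
    reference: ["../agentic-squad-reference/checklists/"],
  },
  stages: { implementation: { qualityGates: ["pnpm lint", "pnpm test"] } },
  policies: { humanInTheLoop: true, preferSmallDiffs: false, neverExposeSecrets: true },
};

const bundle = compileContext(sampleContract, "implementation", "add newsletter subscribe endpoint");
console.log(JSON.stringify(bundle, null, 2));
```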
A teaching implementation example
A loader skeleton in TypeScript could start like this:
// Teaching skeleton: load and validate the contract (Zod) before any execution.
import { readFile } from "node:fs/promises";
import path from "node:path";
import { z } from "zod";

// Every stage field is optional: a stage declares only what it needs.
const StageSchema = z.object({
  requiredOutputs: z.array(z.string()).optional(),
  qualityGates: z.array(z.string()).optional(),
  requiredChecks: z.array(z.string()).optional(),
});

const AgenticSquadConfigSchema = z.object({
  version: z.string(),
  project: z.object({
    name: z.string(),
    domain: z.string().optional(),
    defaultLocale: z.string().default("pt-BR"),
  }),
  sources: z.object({
    localContext: z.array(z.string()).default([]),
    teamCore: z.array(z.string()).default([]),
    reference: z.array(z.string()).default([]),
  }),
  // Explicit key schema: the two-argument form works in both Zod 3 and Zod 4.
  stages: z.record(z.string(), StageSchema),
  policies: z.object({
    humanInTheLoop: z.boolean().default(true),
    neverCommitWithoutApproval: z.boolean().default(true),
    neverExposeSecrets: z.boolean().default(true),
    preferSmallDiffs: z.boolean().default(true),
  }),
});

export type AgenticSquadConfig = z.infer<typeof AgenticSquadConfigSchema>;

export async function loadAgenticSquadConfig(
  cwd: string,
): Promise<AgenticSquadConfig> {
  const configPath = path.join(cwd, "agentic-squad.yaml");
  const raw = await readFile(configPath, "utf8");
  const parsed = parseYaml(raw);
  // parse() throws a ZodError when the contract does not match the schema.
  return AgenticSquadConfigSchema.parse(parsed);
}

// Placeholder on purpose: swap in a real YAML parser before using the loader.
function parseYaml(_raw: string): unknown {
  throw new Error(
    "This is a teaching skeleton: wire a real YAML parser here before parsing the agentic-squad.yaml contract.",
  );
}
This code doesn't solve the whole problem. It only shows a direction: contract first, validation next, execution by stage.
The learning loop
Without learning, the Agentic Squad becomes just a prompt generator.
Classifying learning into layers
Learning has to be classified into layers.
| Type of learning | Where it lives | Example |
|---|---|---|
| Local | Consumer repo | "In this repo, don't use barrel exports" |
| Workspace/team-core | The team core | "Public APIs must have an OpenAPI contract" |
| Reference | The base repo | "Generic PR review checklist" |
Not every learning deserves to become a global rule.
This point is critical. Bad teams turn any incident into a universal rule. Good teams ask:
- is this local or recurring?
- is this a rule, an example, or a smell?
- should this block delivery?
- does this improve the next PR?
- does this age quickly?
- is this true in every context?
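The table and the first and last questions above can be combined into a small triage helper. This is only one possible reading, sketched under assumptions: `LearningScope`, `Triage`, `suggestScope`, and the destination strings are hypothetical, and the output is a suggestion for a human curator, not a decision.

```typescript
// The three layers from the table above.
type LearningScope = "local" | "team-core" | "reference";

// Hypothetical answers to two of the triage questions.
interface Triage {
  recurring: boolean;          // "is this local or recurring?"
  trueInEveryContext: boolean; // "is this true in every context?"
}

// Assumed destinations, named after the layered model described earlier.
const scopeDestination: Record<LearningScope, string> = {
  local: "consumer-repo/.agentic/lessons.md",
  "team-core": "agentic-squad-team-core/",
  reference: "agentic-squad-reference/",
};

// Default to the narrowest scope; widen only when the answers justify it.
function suggestScope(triage: Triage): LearningScope {
  if (!triage.recurring) return "local";
  return triage.trueInEveryContext ? "reference" : "team-core";
}

const suggested = suggestScope({ recurring: true, trueInEveryContext: false });
console.log(suggested, "->", scopeDestination[suggested]);
```

Defaulting to the narrowest scope mirrors the warning above: a single incident should not become a universal rule by accident.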
A learning-record example
# Learning Record
## Context
PR #456 introduced a regression in the search page after changing locale-aware URLs.
## What happened
The implementation updated visible links but missed canonical URLs and sitemap entries.
## Lesson
When changing public URL structure, update:
- route helpers
- canonical metadata
- hreflang
- sitemap
- RSS
- language switcher
- redirect tests
## Scope
team-core
## Suggested artifact
Add this to the URL migration checklist.
From record to checklist
That record can later become a checklist the agent can actually use:
# URL Migration Checklist
Before implementation:
- [ ] Identify route helpers
- [ ] Identify canonical generation
- [ ] Identify hreflang generation
- [ ] Identify sitemap generation
- [ ] Identify RSS generation
- [ ] Identify language switcher behavior
- [ ] Identify redirects
During implementation:
- [ ] Update public route mapping
- [ ] Preserve internal locale model
- [ ] Add redirects from old URLs
- [ ] Update tests
- [ ] Update screenshots
Before PR:
- [ ] Validate canonical URLs
- [ ] Validate sitemap output
- [ ] Validate RSS output
- [ ] Validate language switcher
- [ ] Validate old URLs redirect
This kind of artifact is far more useful than a generic prompt that says "be careful."
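The step from a learning record to a checklist can be partly mechanical. The sketch below assumes records follow the "## Section" layout shown earlier; `sectionBullets` and `draftChecklist` are hypothetical helpers, and the output is only a first draft that a human would still edit into something like the checklist above.

```typescript
// Collect the "- item" bullets that appear under a "## <heading>" section.
function sectionBullets(markdown: string, heading: string): string[] {
  const lines = markdown.split("\n").map((line) => line.trim());
  const start = lines.indexOf(`## ${heading}`);
  if (start === -1) return [];
  const items: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // the next section begins
    const match = line.match(/^- (.+)$/);
    if (match) items.push(match[1]);
  }
  return items;
}

// Turn lesson items into a two-phase draft: identify first, validate before the PR.
function draftChecklist(title: string, items: string[]): string {
  return [
    `# ${title}`,
    "Before implementation:",
    ...items.map((item) => `- [ ] Identify ${item}`),
    "Before PR:",
    ...items.map((item) => `- [ ] Validate ${item}`),
  ].join("\n");
}

const sampleRecord = [
  "# Learning Record",
  "## Lesson",
  "When changing public URL structure, update:",
  "- route helpers",
  "- sitemap",
  "## Scope",
  "team-core",
].join("\n");

const lessonItems = sectionBullets(sampleRecord, "Lesson");
const checklistDraft = draftChecklist("URL Migration Checklist", lessonItems);
console.log(checklistDraft);
```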
What Agentic Squad should not be
The Agentic Squad shouldn't turn into a fantasy of total autonomy.
It must not be:
- a fake team of avatars;
- an attempt to remove human review;
- a pile of duplicated prompts;
- a bureaucratic layer over development;
- a system that approves its own work;
- an excuse to accept code you don't understand;
- a complex orchestrator before a real problem exists.
Trade-offs
| Decision | Benefit | Cost |
|---|---|---|
| Stage-based SDLC | Reduces chaos and rush to code | Requires discipline |
| YAML contract | Provides predictability | Can become bureaucracy |
| CLI | Standardizes context | One more tool to maintain |
| Skills/playbooks | Reuse across projects | Risk of getting too generic |
| Continuous learning | Improves with use | Needs human curation |
| Human-in-the-loop | Reduces risk | Less apparent "autonomy" |
| Quality gates | Prevents regressions | Can increase feedback time |
One possible workflow
1. Discovery
↓
2. Refinement
↓
3. Specification
↓
4. Architecture
↓
5. Planning
↓
6. Implementation
↓
7. Quality
↓
8. Review
↓
9. Merge/Deploy
↓
10. Retrospective/Learning
Each stage can produce small, versionable, reviewable artifacts. For example:
.agentic/work/123-discovery.md
.agentic/work/123-specification.md
.agentic/work/123-architecture.md
.agentic/work/123-plan.md
.agentic/work/123-review.md
.agentic/work/123-learning.md
This creates traceability without needing a heavy platform.
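The naming convention above is simple enough to encode, which also makes gaps in the trail easy to spot. A minimal sketch, assuming the `<issue>-<artifact>.md` pattern shown in the listing; `artifactSlugs`, `artifactPath`, and `missingArtifacts` are hypothetical names, and only the six artifacts listed above are covered.

```typescript
// Stage -> artifact file slug, limited to the six files listed above.
const artifactSlugs = {
  discovery: "discovery",
  specification: "specification",
  architecture: "architecture",
  planning: "plan",
  review: "review",
  learning: "learning",
} as const;

type ArtifactStage = keyof typeof artifactSlugs;

function artifactPath(issue: number, stage: ArtifactStage): string {
  return `.agentic/work/${issue}-${artifactSlugs[stage]}.md`;
}

// Given the paths that already exist, report which expected artifacts are absent.
function missingArtifacts(issue: number, existingPaths: string[]): string[] {
  const existing = new Set(existingPaths);
  return (Object.keys(artifactSlugs) as ArtifactStage[])
    .map((stage) => artifactPath(issue, stage))
    .filter((expected) => !existing.has(expected));
}

const stillMissing = missingArtifacts(123, [
  ".agentic/work/123-discovery.md",
  ".agentic/work/123-plan.md",
]);
console.log(stillMissing);
```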
How this talks to tools like Claude Code and Codex
The Agentic Squad doesn't need to replace existing tools. It can work as a layer of context and governance.
Claude Code, Codex, GitHub Copilot, local agents, and internal scripts can all be consumers of the same contract. The logic would be:
agentic-squad.yaml
↓
context compiler
↓
prompt/stage pack
↓
Claude Code / Codex / local agent
↓
diff + validation
↓
human review
↓
learning record
The gain is in not depending on the implicit memory of a conversation.
Where to start small
The minimal version doesn't need real multi-agent orchestration. A good first
version could have just an agentic-squad.yaml, an .agentic/ directory, three
playbooks, three checklists, a script to assemble context, and a manual learning
flow.
For example:
mkdir -p .agentic/work .agentic/lessons
touch agentic-squad.yaml
touch .agentic/context.md
touch .agentic/local-rules.md
touch .agentic/lessons.md
And three commands:
agentic-squad context --stage implementation
agentic-squad check --stage review
agentic-squad learn --from-pr 123
How to tell whether it's working
The Agentic Squad only has value if it improves engineering.
Good signs:
- smaller PRs;
- less rework;
- more explicit decisions;
- fewer reinvented prompts;
- faster onboarding;
- more consistent validations;
- fewer regressions from forgetting;
- better checklists;
- human review more focused on trade-offs.
Bad signs:
- a flood of files no one reads;
- agents producing decorative documentation;
- conflicting rules;
- huge, fragile prompts;
- low confidence in the diffs;
- humans reviewing more, not less;
- quality that looks good only because the text is pretty.
A prompt example per stage
# Stage: Architecture
You are helping with the architecture stage of a software change.
Use the configured Agentic Squad context.
Your job is not to implement code yet.
Produce:
1. problem summary
2. constraints
3. relevant existing patterns
4. proposed architecture
5. alternatives considered
6. trade-offs
7. risks
8. observability concerns
9. security concerns
10. quality gates
11. questions before implementation
Rules:
- Do not change files.
- Do not assume missing requirements.
- Prefer small, reversible decisions.
- Be explicit about uncertainty.
This prompt is simple, but it keeps the agent in the right role for that moment.
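A stage prompt like this does not have to be hand-written each time; it could be rendered from data. The sketch below is an assumption about how that might look: `StagePrompt` and `renderStagePrompt` are hypothetical names, and the field layout simply mirrors the example prompt.

```typescript
// Hypothetical data shape mirroring the sections of the example prompt.
interface StagePrompt {
  stage: string;
  intro: string[];
  produce: string[];
  rules: string[];
}

function renderStagePrompt(prompt: StagePrompt): string {
  return [
    `# Stage: ${prompt.stage}`,
    ...prompt.intro,
    "Produce:",
    // Numbered outputs make it easy to check the answer item by item.
    ...prompt.produce.map((item, index) => `${index + 1}. ${item}`),
    "Rules:",
    ...prompt.rules.map((rule) => `- ${rule}`),
  ].join("\n");
}

const architecturePrompt = renderStagePrompt({
  stage: "Architecture",
  intro: [
    "You are helping with the architecture stage of a software change.",
    "Your job is not to implement code yet.",
  ],
  produce: ["problem summary", "constraints", "trade-offs"],
  rules: ["Do not change files.", "Be explicit about uncertainty."],
});
console.log(architecturePrompt);
```

Keeping the prompt as data means the same stage definition can feed Claude Code, Codex, or a local agent without copy-pasting text between tools.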
The human part stays central
The Agentic Squad doesn't remove judgment. It depends on it.
Humans still decide:
- whether the problem is worth solving;
- whether the design is correct;
- whether the trade-off makes sense;
- whether the solution is simple enough;
- whether a rule should become a standard;
- whether the agent is hallucinating with confidence;
- whether the delivery actually improves the system.
AI can speed up the production of options. Engineering decides which options survive.
Conclusion
Agentic Squad is less about creating brilliant agents and more about creating an environment where ordinary agents can work better.
The core of the idea is simple: clear stages, the right context, versioned contracts, explicit rules, automated validation, human review, and continuous learning.
It isn't a promise of total autonomy. It's a proposal for technical governance of AI applied to software development.
If it works, the biggest gain won't be writing code faster. It will be reducing the cost of keeping coherence while speed increases.
Next steps
- Turn the agentic-squad.yaml contract into a versioned schema.
- Build a minimal CLI to compile context.
- Write playbooks per SDLC stage.
- Run the Agentic Squad in a small real repository before scaling.
If you'd like to follow the evolution, keep an eye on the Writing page.