Jul 2, 2026 · 14 min read

Agentic Squad: modeling the SDLC with agents, contracts, and continuous learning

A practical way to organize AI agents around the real stages of software development — without pretending autonomy replaces engineering.

Expert

TypeScript
Markdown
YAML
GitHub
Claude Code
Codex
LLMs

There is a big difference between using AI to speed up tasks and redesigning the engineering flow to work better with AI.

The first is common: asking a model to write tests, explain code, generate a README, or suggest a refactor. The second is harder: building a system where agents, context, technical decisions, team rules, project history, and human review work together without turning into chaos.

The Agentic Squad lives in that second space.

The Agentic Squad isn't an attempt to replace an engineering team. It also isn't a loose collection of prompts. The proposal is to model a way of working in which agents help with specific stages of the SDLC, with clear contracts, versioned context, explicit roles, validations, and continuous learning.

The problem: agents without a system become fragile automation

The most common way to use agents today is still improvised.

Someone opens a tool, pastes a big task, adds a few files, asks for an implementation, and hopes for a good result. Sometimes it works. Often it only seems to work. In real systems, though, the problems show up fast:

the agent doesn't know the previous decisions;
the context grows large, expensive, and inconsistent;
important team rules aren't remembered;
local patterns get mixed with generic ones;
the code looks correct but breaks a convention;
human review turns into a hunt for side effects;
learnings from one task don't improve the next.

The mistake isn't using agents. The mistake is using agents without a working architecture.

What Agentic Squad is

Agentic Squad is a proposed architecture for coordinating AI agents across the software development lifecycle.

It organizes:

the stages of the SDLC;
the capabilities triggered at each stage;
context contracts;
rules and playbooks;
reusable skills;
automated validations;
human review;
continuous learning;
the separation between global knowledge, team knowledge, and the repository's local knowledge.

Instead of thinking "I want a PM agent, a Tech Lead agent, a QA agent, and a Dev agent," the core idea is different:

model the SDLC by stages, and trigger capabilities as needed.

This avoids a theater of artificial job titles and keeps the system close to the real engineering flow.

The most important decision: stages before roles

A common trap in agentic systems is starting from the roles.

It feels natural to design something like:

Role	Expected responsibility
PM Agent	Understand the problem, prioritize
Architect Agent	Define the solution
Developer Agent	Implement
QA Agent	Test
Reviewer Agent	Review

It's seductive, but it can become theatrical. On real teams, work doesn't happen in such clean boxes. A Tech Lead takes part in discovery. A developer spots an architecture risk. QA helps clarify a requirement. Security shows up during refinement. SRE influences deployment decisions.

That's why the Agentic Squad models the stages first.

Stage	Central question	Capabilities triggered
Discovery	What problem are we solving?	Product, domain, data, UX, risk
Refinement	What must be clear before we build?	PM, Tech Lead, QA, Security
Specification	What contract will we follow?	Spec, API, use cases
Architecture	Which structural decisions sustain this?	Architecture, security, observability
Planning	How do we break it into safe steps?	Sequencing, dependencies, rollback
Implementation	How do we build with the least risk?	Code, tests, integration
Quality	How do we prove it works?	Tests, lint, build, a11y, performance
Review	What needs to be questioned?	Code review, trade-offs, maintainability
Merge/Deploy	How do we ship without surprises?	CI/CD, release, feature flags
Retrospective/Learning	What did we learn for next time?	Memory, patterns, playbooks

The guiding phrase is:

SDLC first; roles as capabilities.

A layered model

The Agentic Squad needs to separate what is generic, what belongs to the team, and what is local.

A practical way to organize this is in layers:

text

agentic-squad-reference/
  ├─ patterns/
  ├─ skills/
  ├─ rules/
  ├─ playbooks/
  ├─ checklists/
  └─ examples/
agentic-squad-team-core/
  ├─ domain/
  ├─ architecture-decisions/
  ├─ team-rules/
  ├─ delivery-playbooks/
  └─ quality-gates/
consumer-repo/
  ├─ agentic-squad.yaml
  ├─ .agentic/
  │   ├─ context.md
  │   ├─ local-rules.md
  │   └─ lessons.md
  └─ src/

The reference layer holds reusable patterns. The team layer holds decisions specific to that context. The consumer repository loads only what it needs to operate well locally.

This separation reduces two risks:

turning every repository into a dumping ground of duplicated prompts;
putting too many local rules into a global template.

The contract: `agentic-squad.yaml`

An Agentic Squad needs a contract.

Without a contract, every run becomes a new conversation. With a contract, the agent understands which capabilities exist, which files are the source of truth, which commands validate the delivery, and which limits must not be crossed.

A simplified example:

yaml

version: "0.1"
project:
  name: "booking-backoffice"
  domain: "post-sale travel operations"
  defaultLocale: "pt-BR"
sources:
  localContext:
    - ".agentic/context.md"
    - ".agentic/local-rules.md"
  teamCore:
    - "../agentic-squad-team-core/team-rules/"
    - "../agentic-squad-team-core/architecture-decisions/"
  reference:
    - "../agentic-squad-reference/playbooks/"
    - "../agentic-squad-reference/checklists/"
stages:
  discovery:
    requiredOutputs:
      - problem-statement
      - constraints
      - open-questions
  architecture:
    requiredOutputs:
      - tradeoffs
      - risks
      - decision-record
  implementation:
    qualityGates:
      - pnpm lint
      - pnpm typecheck
      - pnpm test
      - pnpm build
  review:
    requiredChecks:
      - "Does this preserve existing behavior?"
      - "Are errors observable?"
      - "Is rollback clear?"
policies:
  humanInTheLoop: true
  neverCommitWithoutApproval: true
  neverExposeSecrets: true
  preferSmallDiffs: true

This file doesn't need to be big at first. The value is in declaring the minimum necessary to avoid improvisation.

How an agent should use this contract

An agent shouldn't just receive a task. It should receive a task within a stage.

A bad example:

text

Implement the search screen.

A better example:

text

We are in the Specification stage.
Goal:
define the contract for search before implementing.
Use:
- agentic-squad.yaml
- local rules
- the team's architecture decisions
- the UX handoff
Deliver:
- use cases
- states
- events
- acceptance criteria
- risks
- open questions
Do not implement code yet.

The difference seems small, but it changes the agent's behavior. It stops rushing to code and starts respecting the stage.
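
To make the stage-scoped prompt repeatable rather than hand-typed, it can be rendered from a small data structure. The sketch below is only an illustration: the `StageTask` shape, its field names, and the `buildStagePrompt` function are hypothetical and not part of any existing tool.

```typescript
// Hypothetical shape for a stage-scoped task; field names are illustrative.
interface StageTask {
  stage: string;
  goal: string;
  use: string[];
  deliver: string[];
  allowCode: boolean;
}

// Render the task as plain text, mirroring the layout of the better example above.
function buildStagePrompt(task: StageTask): string {
  const lines = [
    `We are in the ${task.stage} stage.`,
    "Goal:",
    task.goal,
    "Use:",
    ...task.use.map((item) => `- ${item}`),
    "Deliver:",
    ...task.deliver.map((item) => `- ${item}`),
  ];
  // Stages before Implementation explicitly forbid writing code.
  if (!task.allowCode) {
    lines.push("Do not implement code yet.");
  }
  return lines.join("\n");
}

const specificationPrompt = buildStagePrompt({
  stage: "Specification",
  goal: "define the contract for search before implementing.",
  use: ["agentic-squad.yaml", "local rules", "the UX handoff"],
  deliver: ["use cases", "acceptance criteria", "open questions"],
  allowCode: false,
});
console.log(specificationPrompt);
```

Keeping the stage, inputs, and expected outputs as data means the same structure can later be filled from the contract instead of from memory.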

A CLI as setup and context compiler

An important idea in the Agentic Squad is having a CLI that prepares the right context for each stage.

For example:

bash

agentic-squad init
agentic-squad run discovery --issue 123 --repo booking-backoffice
agentic-squad run architecture --input .agentic/work/123-discovery.md
agentic-squad run implementation --plan .agentic/work/123-plan.md
agentic-squad learn --from-pr 456

The CLI doesn't need to "be smart" at first. It can simply validate the agentic-squad.yaml, resolve paths, assemble context, select playbooks, generate prompts, run quality gates, save outputs, and record learnings.

The value is less in magical automation and more in operational consistency.
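
One of the simpler CLI duties listed above, running quality gates, could be sketched as follows. This is a minimal sketch under stated assumptions: `GateRunner` and `runQualityGates` are hypothetical names, the command runner is injected so the example stays self-contained, and stopping at the first failing gate is a design choice of this sketch, not something the contract prescribes.

```typescript
// Result of one gate; `ok` is true when the command exited with code 0.
interface GateResult {
  command: string;
  ok: boolean;
}

// A runner executes one shell command and resolves to its exit code.
// It is injected here; a real CLI might wrap a child process instead.
type GateRunner = (command: string) => Promise<number>;

// Run gates in the declared order and stop at the first failure (assumed policy).
async function runQualityGates(
  gates: string[],
  run: GateRunner,
): Promise<GateResult[]> {
  const results: GateResult[] = [];
  for (const command of gates) {
    const exitCode = await run(command);
    results.push({ command, ok: exitCode === 0 });
    if (exitCode !== 0) break;
  }
  return results;
}

// Example with a fake runner that pretends `pnpm test` fails.
const fakeRunner: GateRunner = async (command) =>
  command === "pnpm test" ? 1 : 0;

runQualityGates(
  ["pnpm lint", "pnpm typecheck", "pnpm test", "pnpm build"],
  fakeRunner,
).then((results) => console.log(results));
```

With the fake runner, the build gate is never started because the test gate fails first. Whether to fail fast or collect every result is a trade-off between feedback time and completeness.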

A context-compilation example

A command like agentic-squad context could produce a temporary bundle:

json

{
  "stage": "implementation",
  "repo": "booking-backoffice",
  "task": "add newsletter subscribe endpoint",
  "contextFiles": [
    ".agentic/context.md",
    ".agentic/local-rules.md",
    "../agentic-squad-team-core/team-rules/api-style.md",
    "../agentic-squad-reference/checklists/implementation.md"
  ],
  "qualityGates": ["pnpm lint", "pnpm typecheck", "pnpm test", "pnpm build"],
  "constraints": [
    "do not commit without approval",
    "do not add dependencies without justification",
    "preserve i18n",
    "preserve data-theme"
  ]
}

That bundle can become a prompt for Claude Code, Codex, or another tool.
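
A context compiler that produces a bundle of this shape could start as a pure function over the parsed contract. In this sketch, `SquadConfig`, `ContextBundle`, and `compileContext` are hypothetical names. The wording of the constraints derived from policies is an assumption, and source paths are passed through as declared rather than expanded into individual files.

```typescript
interface StageConfig {
  requiredOutputs?: string[];
  qualityGates?: string[];
  requiredChecks?: string[];
}

// Only the parts of the contract this sketch needs.
interface SquadConfig {
  project: { name: string };
  sources: { localContext: string[]; teamCore: string[]; reference: string[] };
  stages: Record<string, StageConfig>;
  policies: { neverCommitWithoutApproval: boolean; preferSmallDiffs: boolean };
}

interface ContextBundle {
  stage: string;
  repo: string;
  task: string;
  contextFiles: string[];
  qualityGates: string[];
  constraints: string[];
}

function compileContext(
  config: SquadConfig,
  stage: string,
  task: string,
): ContextBundle {
  const stageConfig = config.stages[stage];
  if (!stageConfig) {
    throw new Error(`Stage "${stage}" is not declared in the contract`);
  }
  // Translate boolean policies into plain-language constraints (assumed wording).
  const constraints: string[] = [];
  if (config.policies.neverCommitWithoutApproval) {
    constraints.push("do not commit without approval");
  }
  if (config.policies.preferSmallDiffs) {
    constraints.push("prefer small diffs");
  }
  return {
    stage,
    repo: config.project.name,
    task,
    // Local context first, then team core, then reference material.
    contextFiles: [
      ...config.sources.localContext,
      ...config.sources.teamCore,
      ...config.sources.reference,
    ],
    qualityGates: stageConfig.qualityGates ?? [],
    constraints,
  };
}

const exampleConfig: SquadConfig = {
  project: { name: "booking-backoffice" },
  sources: {
    localContext: [".agentic/context.md", ".agentic/local-rules.md"],
    teamCore: ["../agentic-squad-team-core/team-rules/"],
    reference: ["../agentic-squad-reference/checklists/"],
  },
  stages: { implementation: { qualityGates: ["pnpm lint", "pnpm test"] } },
  policies: { neverCommitWithoutApproval: true, preferSmallDiffs: true },
};

console.log(
  JSON.stringify(
    compileContext(exampleConfig, "implementation", "add newsletter subscribe endpoint"),
    null,
    2,
  ),
);
```

Rejecting an undeclared stage up front keeps the agent from running with an empty or accidental context.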

A teaching implementation example

A loader skeleton in TypeScript could start like this:

// Teaching skeleton: load and validate the contract (Zod) before any execution.
// Assumes the `zod` and `yaml` npm packages are installed; `yaml` is one option,
// and any parser that returns plain objects would work here.
import { readFile } from "node:fs/promises";
import path from "node:path";
import { parse as parseYaml } from "yaml";
import { z } from "zod";

// Each stage may declare outputs, quality-gate commands, and review checks.
const StageSchema = z.object({
  requiredOutputs: z.array(z.string()).optional(),
  qualityGates: z.array(z.string()).optional(),
  requiredChecks: z.array(z.string()).optional(),
});

const AgenticSquadConfigSchema = z.object({
  version: z.string(),
  project: z.object({
    name: z.string(),
    domain: z.string().optional(),
    defaultLocale: z.string().default("pt-BR"),
  }),
  sources: z.object({
    localContext: z.array(z.string()).default([]),
    teamCore: z.array(z.string()).default([]),
    reference: z.array(z.string()).default([]),
  }),
  // Explicit key schema: the two-argument form works in both Zod 3 and Zod 4.
  stages: z.record(z.string(), StageSchema),
  policies: z.object({
    humanInTheLoop: z.boolean().default(true),
    neverCommitWithoutApproval: z.boolean().default(true),
    neverExposeSecrets: z.boolean().default(true),
    preferSmallDiffs: z.boolean().default(true),
  }),
});

export type AgenticSquadConfig = z.infer<typeof AgenticSquadConfigSchema>;

export async function loadAgenticSquadConfig(
  cwd: string,
): Promise<AgenticSquadConfig> {
  const configPath = path.join(cwd, "agentic-squad.yaml");
  const raw = await readFile(configPath, "utf8");
  // `parse` throws a ZodError listing every field that violates the contract.
  return AgenticSquadConfigSchema.parse(parseYaml(raw));
}

This code doesn't solve the whole problem. It only shows a direction: contract first, validation next, execution by stage.

The learning loop

Without learning, the Agentic Squad becomes just a prompt generator.

Classifying learning into layers

Learning has to be classified into layers.

Type of learning	Where it lives	Example
Local	Consumer repo	"In this repo, don't use barrel exports"
Workspace/team-core	The team core	"Public APIs must have an OpenAPI contract"
Reference	The base repo	"Generic PR review checklist"

Not every learning deserves to become a global rule.

This point is critical. Bad teams turn any incident into a universal rule. Good teams ask:

is this local or recurring?
is this a rule, an example, or a smell?
should this block delivery?
does this improve the next PR?
does this age quickly?
is this true in every context?
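
Some of these questions can be turned into a first-pass triage that suggests a layer before a human confirms it. The heuristic below is an assumption for illustration, not a rule from the model itself: `LearningTriage`, `suggestScope`, and the destination paths are hypothetical, and only three of the questions are encoded.

```typescript
type LearningScope = "local" | "team-core" | "reference";

// Answers to three of the triage questions; field names are illustrative.
interface LearningTriage {
  recurring: boolean;
  agesQuickly: boolean;
  trueInEveryContext: boolean;
}

// Assumed heuristic: promote a lesson only as far as the evidence supports.
function suggestScope(answers: LearningTriage): LearningScope {
  // One-off or short-lived lessons stay in the consumer repo.
  if (!answers.recurring || answers.agesQuickly) return "local";
  // Recurring lessons go global only when they hold in every context.
  return answers.trueInEveryContext ? "reference" : "team-core";
}

// Where each scope might be recorded; the exact paths are assumptions.
const SCOPE_DESTINATION: Record<LearningScope, string> = {
  local: ".agentic/lessons.md",
  "team-core": "../agentic-squad-team-core/team-rules/",
  reference: "../agentic-squad-reference/checklists/",
};

const scope = suggestScope({
  recurring: true,
  agesQuickly: false,
  trueInEveryContext: false,
});
console.log(scope, "->", SCOPE_DESTINATION[scope]);
```

The output is only a suggestion. Questions such as whether a lesson should block delivery still need human judgment.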

A learning-record example

# Learning Record
 
## Context
 
PR #456 introduced a regression in the search page after changing locale-aware URLs.
 
## What happened
 
The implementation updated visible links but missed canonical URLs and sitemap entries.
 
## Lesson
 
When changing public URL structure, update:
 
- route helpers
- canonical metadata
- hreflang
- sitemap
- RSS
- language switcher
- redirect tests
 
## Scope
 
team-core
 
## Suggested artifact
 
Add this to the URL migration checklist.

From record to checklist

That record can later become a checklist the agent can actually use:

# URL Migration Checklist
 
Before implementation:
 
- [ ] Identify route helpers
- [ ] Identify canonical generation
- [ ] Identify hreflang generation
- [ ] Identify sitemap generation
- [ ] Identify RSS generation
- [ ] Identify language switcher behavior
- [ ] Identify redirects
 
During implementation:
 
- [ ] Update public route mapping
- [ ] Preserve internal locale model
- [ ] Add redirects from old URLs
- [ ] Update tests
- [ ] Update screenshots
 
Before PR:
 
- [ ] Validate canonical URLs
- [ ] Validate sitemap output
- [ ] Validate RSS output
- [ ] Validate language switcher
- [ ] Validate old URLs redirect

This kind of artifact is far more useful than a generic prompt that says "be careful."
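
Because the checklist is plain Markdown, a CLI or agent can read it back and report what is still open. The sketch below assumes the layout shown above: section lines ending in a colon and `- [ ]` or `- [x]` items. The `parseChecklist` function and `ChecklistItem` type are hypothetical names.

```typescript
interface ChecklistItem {
  section: string;
  text: string;
  done: boolean;
}

const OPEN_PREFIX = "- [ ] ";
const DONE_PREFIX = "- [x] ";

// Collect checklist items, remembering the most recent "Section:" line.
function parseChecklist(markdown: string): ChecklistItem[] {
  const items: ChecklistItem[] = [];
  let section = "";
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith(OPEN_PREFIX)) {
      items.push({ section, text: line.slice(OPEN_PREFIX.length), done: false });
    } else if (line.startsWith(DONE_PREFIX)) {
      items.push({ section, text: line.slice(DONE_PREFIX.length), done: true });
    } else if (line.endsWith(":")) {
      section = line.slice(0, -1);
    }
  }
  return items;
}

const checklist = [
  "# URL Migration Checklist",
  "",
  "Before PR:",
  "",
  "- [x] Validate canonical URLs",
  "- [ ] Validate sitemap output",
].join("\n");

const openItems = parseChecklist(checklist).filter((item) => !item.done);
console.log(openItems);
```

A review step could refuse to proceed while `openItems` is non-empty, which turns the checklist from advice into a gate.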

What Agentic Squad should not be

The Agentic Squad shouldn't turn into a fantasy of total autonomy.

It must not be:

a fake team of avatars;
an attempt to remove human review;
a pile of duplicated prompts;
a bureaucratic layer over development;
a system that approves its own work;
an excuse to accept code you don't understand;
a complex orchestrator before a real problem exists.

Trade-offs

Decision	Benefit	Cost
Stage-based SDLC	Reduces chaos and rush to code	Requires discipline
YAML contract	Provides predictability	Can become bureaucracy
CLI	Standardizes context	One more tool to maintain
Skills/playbooks	Reuse across projects	Risk of getting too generic
Continuous learning	Improves with use	Needs human curation
Human-in-the-loop	Reduces risk	Less apparent "autonomy"
Quality gates	Prevents regressions	Can increase feedback time

One possible workflow

text

1. Discovery
   ↓
2. Refinement
   ↓
3. Specification
   ↓
4. Architecture
   ↓
5. Planning
   ↓
6. Implementation
   ↓
7. Quality
   ↓
8. Review
   ↓
9. Merge/Deploy
   ↓
10. Retrospective/Learning

Each stage can produce small, versionable, reviewable artifacts. For example:

text

.agentic/work/123-discovery.md
.agentic/work/123-specification.md
.agentic/work/123-architecture.md
.agentic/work/123-plan.md
.agentic/work/123-review.md
.agentic/work/123-learning.md

This creates traceability without needing a heavy platform.
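
The naming convention for these artifacts is simple enough to encode once and reuse. A minimal sketch, assuming the `<issue>-<kind>.md` pattern shown above; `artifactPath` and `missingArtifacts` are hypothetical helpers.

```typescript
// Artifact kinds taken from the example file list above.
const ARTIFACT_KINDS = [
  "discovery",
  "specification",
  "architecture",
  "plan",
  "review",
  "learning",
] as const;

type ArtifactKind = (typeof ARTIFACT_KINDS)[number];

// Build the conventional path for one artifact of one issue.
function artifactPath(issue: number, kind: ArtifactKind): string {
  if (!Number.isInteger(issue) || issue <= 0) {
    throw new Error(`Invalid issue number: ${issue}`);
  }
  return `.agentic/work/${issue}-${kind}.md`;
}

// List which expected artifacts are absent from a set of existing paths.
function missingArtifacts(issue: number, existing: Set<string>): string[] {
  return ARTIFACT_KINDS.map((kind) => artifactPath(issue, kind)).filter(
    (candidate) => !existing.has(candidate),
  );
}

console.log(artifactPath(123, "discovery")); // prints ".agentic/work/123-discovery.md"
console.log(missingArtifacts(123, new Set([".agentic/work/123-discovery.md"])));
```

A check like `missingArtifacts` gives a cheap traceability report without any platform beyond the repository itself.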

How this talks to tools like Claude Code and Codex

The Agentic Squad doesn't need to replace existing tools. It can work as a layer of context and governance.

Claude Code, Codex, GitHub Copilot, local agents, and internal scripts can all be consumers of the same contract. The logic would be:

text

agentic-squad.yaml
        ↓
context compiler
        ↓
prompt/stage pack
        ↓
Claude Code / Codex / local agent
        ↓
diff + validation
        ↓
human review
        ↓
learning record

The gain is that nothing depends on the implicit memory of a single conversation.

Where to start small

The minimal version doesn't need real multi-agent orchestration. A good first version could have just an agentic-squad.yaml, an .agentic/ directory, three playbooks, three checklists, a script to assemble context, and a manual learning flow.

For example:

bash

mkdir -p .agentic/work
touch agentic-squad.yaml
touch .agentic/context.md
touch .agentic/local-rules.md
touch .agentic/lessons.md

And three commands:

bash

agentic-squad context --stage implementation
agentic-squad check --stage review
agentic-squad learn --from-pr 123
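
The first version of such a CLI can be little more than an argument parser in front of a few functions. The sketch below handles only the `command --flag value` shape used in the examples above. `parseCommand` is a hypothetical helper, and a real tool would more likely use an established argument-parsing library.

```typescript
interface ParsedCommand {
  command: string;
  options: Record<string, string>;
}

// Parse `command --flag value ...` into a command name and an options map.
function parseCommand(argv: string[]): ParsedCommand {
  const [command, ...rest] = argv;
  if (!command) {
    throw new Error("Expected a command such as context, check, or learn");
  }
  const options: Record<string, string> = {};
  for (let i = 0; i < rest.length; i += 2) {
    const flag = rest[i];
    const value = rest[i + 1];
    if (!flag || !flag.startsWith("--") || value === undefined) {
      throw new Error(`Expected "--flag value" pairs, got: ${rest.slice(i).join(" ")}`);
    }
    // Strip the leading dashes so `--stage` becomes the key `stage`.
    options[flag.slice(2)] = value;
  }
  return { command, options };
}

console.log(parseCommand(["context", "--stage", "implementation"]));
console.log(parseCommand(["learn", "--from-pr", "123"]));
```

Each parsed command can then be routed to one small function, which keeps the minimal version easy to replace later.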

How to tell whether it's working

The Agentic Squad only has value if it improves engineering.

Good signs:

smaller PRs;
less rework;
more explicit decisions;
fewer reinvented prompts;
faster onboarding;
more consistent validations;
fewer regressions caused by forgotten rules or decisions;
better checklists;
human review more focused on trade-offs.

Bad signs:

a flood of files no one reads;
agents producing decorative documentation;
conflicting rules;
huge, fragile prompts;
low confidence in the diffs;
humans reviewing more, not less;
quality that looks good only because the text is pretty.

A prompt example per stage

# Stage: Architecture
 
You are helping with the architecture stage of a software change.
Use the configured Agentic Squad context.
Your job is not to implement code yet.
 
Produce:
 
1. problem summary
2. constraints
3. relevant existing patterns
4. proposed architecture
5. alternatives considered
6. trade-offs
7. risks
8. observability concerns
9. security concerns
10. quality gates
11. questions before implementation
 
Rules:
 
- Do not change files.
- Do not assume missing requirements.
- Prefer small, reversible decisions.
- Be explicit about uncertainty.

This prompt is simple, but it keeps the agent in the right role for that moment.

The human part stays central

The Agentic Squad doesn't remove judgment. It depends on it.

Humans still decide:

whether the problem is worth solving;
whether the design is correct;
whether the trade-off makes sense;
whether the solution is simple enough;
whether a rule should become a standard;
whether the agent is confidently hallucinating;
whether the delivery actually improves the system.

AI can speed up the production of options. Engineering decides which options survive.

Conclusion

Agentic Squad is less about creating brilliant agents and more about creating an environment where ordinary agents can work better.

The core of the idea is simple: clear stages, the right context, versioned contracts, explicit rules, automated validation, human review, and continuous learning.

It isn't a promise of total autonomy. It's a proposal for technical governance of AI applied to software development.

If it works, the biggest gain won't be writing code faster. It will be reducing the cost of staying coherent as speed increases.
