Introduction
Three Commands, One Pipeline
You describe what to build. The system plans it, builds it with tests,
reviews it adversarially, and produces a mergeable PR — while you get coffee.
/plan_to_build → specs/*.md
Investigates the codebase, explores design options with you, and writes an exhaustive implementation spec.
/build → working code
Reads the spec, dispatches specialist agents, enforces TDD, validates every task, and rolls back on failure.
/bug_to_pr → merged PR
Triages the bug, routes it to a specialist, fixes it with the full pipeline, then runs adversarial review and a merge gate.
For the unfamiliar
Think of this as a self-managing engineering team. You play the role of product owner
— you describe what needs to happen and make final decisions. The system handles
investigation, planning, coding, testing, reviewing, and creating the PR.
Each command produces a real artifact you can inspect and modify.
For the architect
This is a spec-driven, stateless multi-agent orchestration system. The spec file is
the coordination artifact — it compensates for the absence of native TaskCreate/resume
primitives in Copilot. Quality gates are enforced at three levels: prompt instructions
(soft), postToolUse hooks (system), and agentStop hooks (system). The engineering
philosophy is embedded directly in the prompts, where it cannot be skipped, rather than
living in lazy-loaded skill files.
The Full Architecture
End-to-end system overview
flowchart LR
DEV([👤 You]) -->|describe feature| PTB[/plan_to_build/]
PTB -->|explores codebase\nasks approach + team| SPEC[(specs/*.md)]
SPEC -->|spec is airtight| BLD[/build/]
BLD -->|TDD + validators\nper task| CODE[working code\n+ tests passing]
DEV -->|describe bug| BTP[/bug_to_pr/]
BTP -->|triage → route\nplan → build| PR[GitHub PR\n+ artifacts]
PR -->|adversarial\nreview| GATE{both\napprove?}
GATE -->|yes + you confirm| MERGE[merged to main]
GATE -->|no| FIX[fix cycle\nmax 2]
FIX --> PR
subgraph HOOKS ["🪝 Always-on guardrails"]
H1[sessionStart\nauto-install deps]
H2[postToolUse\nruff · tsc · sections]
H3[agentStop\nspec completeness]
end
style DEV fill:#faf8f3,stroke:#1a1a1a,color:#1a1a1a
style PTB fill:#f7ece8,stroke:#c84b2f,color:#5a1f12
style SPEC fill:#fdf6e3,stroke:#b08a2e,color:#1a1a1a
style BLD fill:#e8f4ed,stroke:#2d7a4f,color:#1a4a2e
style CODE fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style BTP fill:#e8eef8,stroke:#2a5fa5,color:#1a3060
style PR fill:#edf2fb,stroke:#2a5fa5,color:#1a1a1a
style GATE fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style MERGE fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style FIX fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style HOOKS fill:#faf8f3,stroke:#d4cfc2,color:#1a1a1a
Command 01
Plan to Build
/plan_to_build
Turns a requirement into an airtight implementation spec that stateless
agents can execute without asking questions.
The most important insight in this system: the spec is the contract.
Everything downstream — builders, validators, reviewers — works from the spec and
nothing else. If the spec is vague, the build will be wrong. If the spec is exhaustive,
the build will be right. /plan_to_build exists to make specs exhaustive.
"The intelligence is in the spec. The build prompt is a mechanical dispatcher."
Design principle
What It Does
01 Explore Before Planning
For features and enhancements, asks you one multiple-choice question about the technical approach
before writing anything, which forces alternatives to be considered. Skipped for bug fixes, chores,
and refactors.
02 Team Composition
Asks how work should be split: a single builder, two builders by layer, or three builders by area.
Your answer shapes the task breakdown and determines which workstreams run independently.
03 Codebase Investigation
Reads relevant source files directly to understand existing patterns before designing anything.
Agents that don't read the code produce plans that don't fit the codebase.
04 Exhaustive Spec Writing
Each task gets a description of at least 50 words covering what to do (step by step), the files to
modify (exact paths), the code patterns to follow, acceptance criteria (specific and verifiable), and
a validation command. One-line task descriptions are forbidden.
05 Self-Audit + Mandatory Verify
Before saving, counts builder tasks, validator tasks, descriptions, and assertions. Then runs
ls -la specs/name.md and uses grep -c to confirm that all 7 required sections are present. It cannot
report done until both checks pass.
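The mandatory verify step can be approximated in a few lines of Python. This is a minimal sketch, not the system's actual hook: it assumes sections are Markdown "## " headings, and the seven names in REQUIRED_SECTIONS are HYPOTHETICAL placeholders, since this page does not list the real ones.

```python
from pathlib import Path

# HYPOTHETICAL section names: the real spec template defines its own seven.
REQUIRED_SECTIONS = (
    "Objective", "Context", "Approach", "Team",
    "Tasks", "Acceptance Criteria", "Validation",
)


def missing_sections(text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required section names that have no matching '## ' heading."""
    headings = {
        line[3:].strip()
        for line in text.splitlines()
        if line.startswith("## ")
    }
    return [name for name in required if name not in headings]


def verify_spec(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    if not path.is_file():  # the existence check that `ls -la` performs
        return [f"spec not found: {path}"]
    missing = missing_sections(path.read_text(encoding="utf-8"))
    return [f"missing section: {name}" for name in missing]
```

Matching section names rather than counting headings also catches a spec that duplicates one section and omits another, which a bare count of 7 would pass.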
Flow Diagram
plan_to_build internal flow
flowchart TD
IN([User types /plan_to_build]) --> CHK
CHK{feature or\nenhancement?}
CHK -->|yes| BRAIN["Prerequisite 1\nExplore Before Planning\n— one question, multiple choice\n— wait for answer"]
CHK -->|no — bug/chore| SKIP["skip brainstorming\ndocument why in Notes"]
BRAIN --> TEAM["Prerequisite 2\nTeam Composition\n(a) single builder\n(b) two builders by layer\n(c) three builders by area"]
SKIP --> TEAM
TEAM --> READ["Read codebase\nexisting patterns\narchitecture\nrelevant files"]
READ --> DESIGN["Design solution\narchitecture decisions\ntask breakdown\ndependency graph"]
DESIGN --> WRITE["Write spec\nto specs/name.md\n\nEach task:\n- min 50 word description\n- exact file paths\n- code patterns\n- acceptance criteria\n- validation command"]
WRITE --> AUDIT["Self-audit\ncount builders/validators\ncheck descriptions\ncheck assertions"]
AUDIT --> VERIFY["Mandatory verify\nls -la specs/name.md\ngrep -c sections == 7"]
VERIFY -->|fail| FIX_SPEC["fix spec\nand re-verify"]
FIX_SPEC --> VERIFY
VERIFY -->|pass| HOOK["postToolUse hook\nvalidates sections\nvalidator frequency\nblocks if incomplete"]
HOOK -->|pass| REPORT["Report\nfile path · tasks · team\nExecute with /build"]
REPORT --> EXEC_DIRECTIVE["EXECUTION DIRECTIVE\nin spec file:\nFORBIDDEN: direct implementation\nREQUIRED: use /build"]
style IN fill:#f7ece8,stroke:#c84b2f,color:#5a1f12
style CHK fill:#fdf6e3,stroke:#b08a2e,color:#1a1a1a
style BRAIN fill:#edf2fb,stroke:#2a5fa5,color:#1a1a1a
style TEAM fill:#edf2fb,stroke:#2a5fa5,color:#1a1a1a
style SKIP fill:#f5f2eb,stroke:#d4cfc2,color:#7a7468
style READ fill:#f5f2eb,stroke:#6a6860,color:#1a1a1a
style DESIGN fill:#f5f2eb,stroke:#6a6860,color:#1a1a1a
style WRITE fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style AUDIT fill:#f5f2eb,stroke:#6a6860,color:#1a1a1a
style VERIFY fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style FIX_SPEC fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style HOOK fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style REPORT fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style EXEC_DIRECTIVE fill:#f7ece8,stroke:#c84b2f,color:#5a1f12
Task Quality Rules
These rules are embedded directly in the prompt rather than in a skill file that might not load, so the
model always sees them; several are also backed by system hooks, as the Enforced by column shows.
| Rule | What it prevents | Enforced by |
|---|---|---|
| min 50 word descriptions | Vague tasks that builders can't execute without asking questions | system hook blocks spec write |
| 2–5 min task size | Tasks too large for a stateless agent to complete reliably | prompt self-audit step |
| Design assertions | Weak acceptance criteria like "it works" instead of verifiable checks | prompt task format template |
| Intermediate validators | Regressions in earlier work only caught at the end | system hook: >5 builders → validators required |
| 7 required sections | Incomplete specs that confuse the build orchestrator | system postToolUse blocks write |
| EXECUTION DIRECTIVE | Main agent implementing code directly instead of delegating | prompt embedded in every spec |
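Two of these rules are mechanical enough to check in code: the 50-word minimum, and the validator frequency that the enforcement table later on this page gives as (builders//5)+1 validators once a spec has more than 5 builders. A minimal sketch follows. The Task record and its "kind" field are HYPOTHETICAL stand-ins for whatever the real hook parses out of the spec.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    kind: str         # HYPOTHETICAL field: "builder" or "validator"
    description: str


def required_validators(builders: int) -> int:
    """(builders // 5) + 1 once a spec has more than 5 builders.

    The page states no minimum for 5 or fewer builders, so this returns 0.
    """
    return builders // 5 + 1 if builders > 5 else 0


def audit_tasks(tasks: list[Task], min_words: int = 50) -> list[str]:
    """Return rule violations; an empty list means the task list passes."""
    problems = []
    for task in tasks:
        words = len(task.description.split())
        if words < min_words:
            problems.append(f"{task.name}: {words} words (< {min_words})")
    builders = sum(t.kind == "builder" for t in tasks)
    validators = sum(t.kind == "validator" for t in tasks)
    needed = required_validators(builders)
    if validators < needed:
        problems.append(
            f"{builders} builders need {needed} validators, found {validators}")
    return problems
```

For example, a spec with 6 builders needs 6 // 5 + 1 = 2 validators, and one with 10 builders needs 3.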
Command 02
Build
/build
A mechanical dispatcher. Reads the spec, executes tasks in dependency order,
enforces TDD on every builder, validates every task, rolls back on failure.
The build prompt makes no decisions. It reads what the spec says and executes it.
This is intentional — all intelligence was front-loaded into the spec by
/plan_to_build. The build prompt is simple enough to be audited in
five minutes and trusted completely.
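"Executes tasks in dependency order" is the one piece of scheduling the dispatcher needs. One way to compute that order, assumed here rather than taken from the build prompt, is Python's standard-library graphlib (3.9+). The blocked_by mapping is a HYPOTHETICAL shape for what the parser extracts from the spec.

```python
from graphlib import TopologicalSorter


def execution_order(blocked_by: dict[str, set[str]]) -> list[str]:
    """Return task names so that every task comes after the tasks it depends on.

    `blocked_by` maps each task to the tasks that must finish first.
    graphlib raises CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(blocked_by).static_order())
```

For example, execution_order({"schema": set(), "api": {"schema"}, "ui": {"api"}}) always places "schema" before "api" and "api" before "ui". A cycle is a spec defect that /plan_to_build should have caught, so failing loudly fits a dispatcher that makes no decisions of its own.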
Execution Loop
build orchestration loop
flowchart TD
START([/build\nspecs/plan.md]) --> PARSE["Parse spec\nextract tasks\nbuild todo list"]
PARSE --> LOOP
subgraph LOOP ["Task loop — sequential, dependency-ordered"]
TASK["Next unblocked task"] --> BUILDER
subgraph BUILDER_PHASE ["Builder dispatch"]
BUILDER["runSubagent('builder')\n+ TDD preamble:\n 1. Write failing test RED\n 2. Implement GREEN\n 3. Refactor\n+ task description verbatim"]
end
BUILDER --> VAL["runSubagent('validator')\nrun commands\nshow actual output\nPASS or FAIL"]
VAL -->|"✅ PASS"| MARK["mark completed\nnext task"]
MARK --> TASK
VAL -->|"❌ FAIL cycle 1"| DEBUG["runSubagent('builder')\nDEBUG protocol:\n1. Reproduce\n2. Isolate\n3. Root cause\n4. Fix\n(no random changes)"]
DEBUG --> VAL2["re-validate"]
VAL2 -->|"✅ PASS"| MARK
VAL2 -->|"❌ FAIL cycle 2"| ROLLBACK
ROLLBACK["runSubagent('builder')\ngit checkout -- files\nverify with git diff\nlog + continue"]
ROLLBACK --> MARK
end
LOOP --> CKPT["Checkpoint every 3 tasks\nreport to user\ncourse-correct?"]
CKPT --> LOOP
LOOP --> FINAL["validate-all\nrunSubagent('validator')\nall commands + criteria\nactual output required\nnever say done without proof"]
FINAL -->|pass| REPORT["Build Complete\ntask table\nactual command output\nfiles changed"]
style START fill:#e8f4ed,stroke:#2d7a4f,color:#1a4a2e
style PARSE fill:#f5f2eb,stroke:#6a6860,color:#1a1a1a
style BUILDER fill:#e8eef8,stroke:#2a5fa5,color:#1a3060
style VAL fill:#e8f4ed,stroke:#2d7a4f,color:#1a4a2e
style MARK fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style TASK fill:#f5f2eb,stroke:#6a6860,color:#1a1a1a
style DEBUG fill:#f7ece8,stroke:#c84b2f,color:#5a1f12
style VAL2 fill:#e8f4ed,stroke:#2d7a4f,color:#1a4a2e
style ROLLBACK fill:#f7ece8,stroke:#c84b2f,color:#5a1f12
style CKPT fill:#fdf6e3,stroke:#b08a2e,color:#1a1a1a
style FINAL fill:#e8f4ed,stroke:#2d7a4f,color:#1a4a2e
style REPORT fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
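The per-task loop in the diagram can be sketched with the subagent dispatches passed in as plain callables, which stand in for runSubagent('builder') and runSubagent('validator'). This is a minimal sketch; the function names and outcome strings are HYPOTHETICAL.

```python
from typing import Callable

Step = Callable[[str], None]    # stand-in for a builder dispatch
Check = Callable[[str], bool]   # stand-in for a validator dispatch: PASS -> True


def run_task(task: str, build: Step, validate: Check,
             debug: Step, rollback: Step) -> str:
    """One pass through the loop in the diagram; returns the task's outcome."""
    build(task)                 # builder with the TDD preamble
    if validate(task):          # first validation
        return "passed"
    debug(task)                 # fix cycle: builder with the DEBUG protocol
    if validate(task):          # re-validate
        return "passed after fix"
    rollback(task)              # cycles exhausted: restore files, log, continue
    return "rolled back"


def run_build(tasks: list[str], build: Step, validate: Check, debug: Step,
              rollback: Step, checkpoint: Callable[[list[str]], None],
              every: int = 3) -> dict[str, str]:
    """Run tasks in the given order, pausing for a checkpoint every `every` tasks."""
    outcomes: dict[str, str] = {}
    for i, task in enumerate(tasks, start=1):
        outcomes[task] = run_task(task, build, validate, debug, rollback)
        if i % every == 0:      # report to the user so they can course-correct
            checkpoint(list(outcomes))
    return outcomes
```

Keeping the orchestrator this thin is the point of the design: every branch is visible, and none of them writes code itself.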
The Rules the Build Prompt Never Breaks
| Rule | Why it exists |
|---|---|
| NEVER implement code yourself | The orchestrator has no write tools. All code goes through builder subagents. No exceptions. |
| NEVER skip validation | Every builder task is followed by a validator dispatch. No exceptions. |
| NEVER say done without proof | The final report must include actual command output: not "tests passed", but the output itself. |
| NEVER run validation yourself | If the validator can't run commands, report the failure. Don't take over validation. The orchestrator running commands is a pipeline integrity failure. |
| Max 2 fix cycles per task | Prevents infinite loops. After 2 failures, roll back and continue; never leave broken code. |
| Rollback on exhausted cycles | git checkout -- files on failure. Verify with git diff. Broken code never reaches the next task. |
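The rollback rule maps onto two git commands. Here is a minimal sketch; rollback_commands and rollback are HYPOTHETICAL helper names. With --exit-code, git diff exits 1 when differences remain and 0 when the paths are clean.

```python
import subprocess


def rollback_commands(files: list[str]) -> tuple[list[str], list[str]]:
    """The restore command and the verification command for one task's files."""
    restore = ["git", "checkout", "--", *files]
    # --exit-code: git diff exits 1 if differences remain, 0 if clean.
    verify = ["git", "diff", "--exit-code", "--", *files]
    return restore, verify


def rollback(files: list[str], cwd: str = ".") -> bool:
    """Restore tracked files from the index; True if git diff then shows them clean."""
    restore, verify = rollback_commands(files)
    subprocess.run(restore, cwd=cwd, check=True)
    return subprocess.run(verify, cwd=cwd).returncode == 0
```

One caveat: git checkout -- <paths> restores files from the index, so it reverts only unstaged edits to tracked files. Changes the builder staged survive it, and git diff compares the working tree to that same index, so it would report them as clean. A path for a brand-new untracked file makes the checkout fail outright.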
The Builder's TDD Contract
Every builder dispatch includes this preamble — the model cannot skip it:
TDD preamble — injected into every builder prompt
1. Write a FAILING test that covers the acceptance criteria. Run it. Confirm RED.
2. Write the MINIMAL implementation to make it pass. Run it. Confirm GREEN.
3. Refactor if needed. Run again. Confirm still GREEN.
4. For commands >30 seconds, note estimated duration.
5. Never make random changes hoping to fix issues. Understand WHY before changing code.
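Because the preamble is fixed text, the orchestrator can prepend it to every dispatch with plain string assembly, so its presence never depends on the builder. A minimal sketch; builder_prompt and the "Task:" separator are HYPOTHETICAL, while the preamble lines are copied from above.

```python
TDD_PREAMBLE = """\
1. Write a FAILING test that covers the acceptance criteria. Run it. Confirm RED.
2. Write the MINIMAL implementation to make it pass. Run it. Confirm GREEN.
3. Refactor if needed. Run again. Confirm still GREEN.
4. For commands >30 seconds, note estimated duration.
5. Never make random changes hoping to fix issues. Understand WHY before changing code."""


def builder_prompt(task_description: str) -> str:
    """Prepend the fixed TDD preamble; the task text is passed through verbatim."""
    return f"{TDD_PREAMBLE}\n\nTask:\n{task_description}"
```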
Command 03
Bug to PR
/bug_to_pr
A complete local async pipeline. Describe a bug; receive a reviewed,
adversarially-approved, mergeable GitHub PR — with full audit trail attached.
This command is designed to outperform the GitHub async Copilot workflow by running
everything locally with your full hooks, TDD enforcement, and nested orchestration,
while still producing a proper GitHub PR with attached artifacts. You walk away.
The pipeline does the work. You come back to a decision: merge or reject.
The Six Phases
P0 Setup
Generates a BUG-NNN ID, creates the bugs/BUG-NNN/ directory and the fix/bug-nnn git branch, and
initialises a pipeline state file for crash recovery.
P1 Triage
bug-creator investigates the codebase, reproduces the bug, and writes a JIRA-format report with all 8
required sections (hook-enforced). bug-router reads the report and the module registry, then routes
the bug to the correct specialist fixer.
P2 Fix (nested orchestration)
Phase 2a: the specialist fixer reads the bug report and creates a fix spec (the plan_to_build equivalent).
Phase 2b: the orchestrator runs the full build protocol inline: TDD, validators, fix cycles, rollback.
Phase 2c: captures test evidence to bugs/BUG-NNN/test-results.md.
P3 PR Creation
Commits and pushes the fix branch, opens a PR with a structured body, and posts the full bug report as
a PR comment, so the PR becomes the audit hub.
P4 Adversarial Review
reviewer-alpha runs independently, and its verdict is held in memory. reviewer-beta then runs
independently; alpha's file does not exist yet (structural isolation). Only after both complete are
the review files written to disk and posted as formal PR reviews.
P5 Merge Gate
Both reviewers must APPROVE, and you confirm. The pipeline then merges and deletes the branch. If
either reviewer rejects, the pipeline presents the reasons and offers a retry fix cycle (max 2 total).
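The P0 naming scheme can be sketched as below. This page does not say how IDs are allocated, so scanning for the highest existing BUG-NNN and adding one is an ASSUMPTION, as are the helper names next_bug_id and branch_for.

```python
import re
from typing import Iterable

# "BUG-" followed by at least three digits, e.g. BUG-007 or BUG-1234.
BUG_DIR = re.compile(r"BUG-(\d{3,})")


def next_bug_id(existing: Iterable[str]) -> str:
    """Next BUG-NNN id given the names of existing bugs/ subdirectories."""
    numbers = [int(m.group(1)) for name in existing
               if (m := BUG_DIR.fullmatch(name))]
    return f"BUG-{max(numbers, default=0) + 1:03d}"


def branch_for(bug_id: str) -> str:
    """Branch name in the fix/bug-nnn form, e.g. BUG-007 -> fix/bug-007."""
    return f"fix/{bug_id.lower()}"
```

A caller would pass the directory names, for example p.name for each directory under bugs/. Scan-and-increment has one weakness when several /bug_to_pr sessions start at the same moment: two of them can pick the same number, so parallel runs need the directory creation itself (for example, mkdir without exist_ok) to act as the claim.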
Full Pipeline Diagram
bug_to_pr — all six phases
flowchart TD
IN(["/bug_to_pr\ndescribe the bug"]) --> SETUP
SETUP["P0: Setup\nGenerate BUG-NNN\ncreate branch fix/bug-nnn\nwrite pipeline-state.json"]
SETUP --> CREATOR
CREATOR["P1: Triage\nbug-creator investigates\nreproduces bug\nwrites JIRA report\n8 sections enforced by hook"]
CREATOR --> ROUTER
ROUTER["P1: Route\nbug-router reads report\nreturns module + fixer\nhigh or medium confidence"]
ROUTER --> FIXER
FIXER["P2a: Fix Plan\nbug-fixer-module\nreads report, investigates\nwrites specs/fix-bug-nnn.md"]
FIXER --> BUILD
BUILD["P2b: Build\norchestrator runs build inline\nTDD + validators per task\nfix cycles + rollback on failure"]
BUILD --> EVIDENCE
EVIDENCE["P2c: Test Evidence\nrun module test command\ncapture to test-results.md\nverify actual output"]
EVIDENCE --> PR
PR["P3: PR + Artifacts\ngh pr create\nbug report posted as PR comment\npipeline-state updated"]
PR --> ALPHA
ALPHA["P4: Review Alpha\nindependent 5-point review\nverdict held in memory\nbeta file not yet written"]
ALPHA --> BETA
BETA["P4: Review Beta\nindependent 5-point review\nverdict held in memory\nalpha file not yet written"]
BETA --> WRITE
WRITE["Write both review files\npost as gh pr review\napprove or request-changes"]
WRITE --> GATE
GATE{both\napprove?}
GATE -->|yes| CONFIRM["User confirms\ngh pr merge\nbranch deleted"]
GATE -->|no, retry| RETRY["Re-enter fix phase\nwith rejection feedback\nmax 2 cycles"]
GATE -->|no, exhausted| STOP["Report all reasons\nstop pipeline"]
RETRY --> FIXER
style IN fill:#e8eef8,stroke:#2a5fa5,color:#1a3060
style SETUP fill:#f5f2eb,stroke:#6a6860,color:#1a1a1a
style CREATOR fill:#f7ece8,stroke:#c84b2f,color:#5a1f12
style ROUTER fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style FIXER fill:#e8eef8,stroke:#2a5fa5,color:#1a3060
style BUILD fill:#e8f4ed,stroke:#2d7a4f,color:#1a4a2e
style EVIDENCE fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style PR fill:#edf2fb,stroke:#2a5fa5,color:#1a1a1a
style ALPHA fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style BETA fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style WRITE fill:#fdf6e3,stroke:#b08a2e,color:#1a1a1a
style GATE fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style CONFIRM fill:#eef7f1,stroke:#2d7a4f,color:#1a1a1a
style RETRY fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style STOP fill:#ede9df,stroke:#c84b2f,color:#7a7468
Adversarial Review Isolation
Reviewer isolation is enforced structurally rather than by a hook.
The orchestrator holds both verdicts in memory and writes nothing to disk
until both reviewers have completed:
Isolation protocol
Step 1: Dispatch reviewer-alpha. Its verdict is returned in the response text. Do NOT write it to disk.
Step 2: Dispatch reviewer-beta. bugs/BUG-NNN/reviews/alpha.md does not exist yet, so beta cannot read it even if it tries.
Step 3: Only after both verdicts are held in orchestrator memory, write both files together, then post them as gh pr review.
What's genuinely lost: no system-level enforcement prevents the orchestrator from
accidentally writing alpha's file before dispatching beta. This is protocol discipline, not a hard
guarantee.
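The isolation protocol comes down to the order of operations inside the orchestrator. Here is a minimal sketch, ASSUMING each reviewer dispatch can be modeled as a function that takes the PR reference and returns verdict text (a stand-in for the real subagent call). run_isolated_review and Reviewer are HYPOTHETICAL names.

```python
from pathlib import Path
from typing import Callable

# Stand-in for a reviewer subagent: takes the PR reference, returns verdict text.
Reviewer = Callable[[str], str]


def run_isolated_review(pr: str, alpha: Reviewer, beta: Reviewer,
                        reviews_dir: Path) -> dict[str, str]:
    """Dispatch both reviewers before writing either verdict to disk."""
    verdicts = {"alpha": alpha(pr)}        # Step 1: held in memory only
    verdicts["beta"] = beta(pr)            # Step 2: alpha.md does not exist yet
    reviews_dir.mkdir(parents=True, exist_ok=True)
    for name, text in verdicts.items():    # Step 3: write both files afterwards
        (reviews_dir / f"{name}.md").write_text(text, encoding="utf-8")
    return verdicts
```

As the paragraph above says, this only holds if nothing else in the orchestrator writes alpha.md early. The sketch makes the ordering explicit; it does not enforce it.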
PR Artifacts
Every bug_to_pr run produces this trail on the GitHub PR:
| Artifact | Where | Posted via |
|---|---|---|
| Bug report | bugs/BUG-NNN/report.md | gh pr comment --body-file |
| Fix spec | specs/fix-bug-nnn.md | committed to PR branch |
| Test evidence | bugs/BUG-NNN/test-results.md | committed to PR branch |
| Alpha verdict | bugs/BUG-NNN/reviews/alpha.md | gh pr review --approve/--request-changes |
| Beta verdict | bugs/BUG-NNN/reviews/beta.md | gh pr review --approve/--request-changes |
| Merge decision | bugs/BUG-NNN/verdict.json | committed to PR branch |
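The posted artifacts correspond to a handful of GitHub CLI calls. This sketch only builds the argument lists without running them. The verdict string "APPROVE" and the function names are HYPOTHETICAL. gh pr comment and gh pr review both accept --body-file.

```python
def review_flag(verdict: str) -> str:
    """Map a reviewer verdict to the matching gh pr review flag."""
    return "--approve" if verdict == "APPROVE" else "--request-changes"


def artifact_commands(pr: str, bug_id: str,
                      verdicts: dict[str, str]) -> list[list[str]]:
    """gh argument lists for the posted artifacts; nothing is executed here."""
    base = f"bugs/{bug_id}"
    commands = [["gh", "pr", "comment", pr, "--body-file", f"{base}/report.md"]]
    for reviewer in ("alpha", "beta"):
        commands.append(["gh", "pr", "review", pr, review_flag(verdicts[reviewer]),
                         "--body-file", f"{base}/reviews/{reviewer}.md"])
    return commands
```

Each list can be passed to subprocess.run(..., check=True). One thing to verify in your setup: GitHub rejects approvals and change requests from a pull request's own author. If the same gh account opens the PR and posts both reviews, those two calls may fail, and posting the verdicts with --comment is one workaround.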
Crash Recovery
Sessions can crash. Pipeline state is written to disk after every phase, recording the last phase
completed. On restart, the orchestrator reads bugs/BUG-NNN/pipeline-state.json
and resumes from the next phase, so completed work is never re-run.
| State in file | Resumes from |
|---|---|
| "setup" | Phase 1: Triage |
| "triage" | Phase 2a: Fix Planning |
| "fix" | Phase 3: PR Creation |
| "pr" | Phase 4: Adversarial Review |
| "review" | Phase 5: Merge Gate |
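The resume mapping above fits in a dictionary. A minimal sketch follows. The JSON key "phase" is an ASSUMPTION, as is treating a missing state file as "start at setup". The write-then-rename in save_phase keeps a crash mid-write from leaving a half-written state file, which matters for a file whose whole purpose is crash recovery.

```python
import json
import os
from pathlib import Path

# Last completed phase recorded in the state file -> where to pick up.
RESUME_FROM = {
    "setup": "Phase 1: Triage",
    "triage": "Phase 2a: Fix Planning",
    "fix": "Phase 3: PR Creation",
    "pr": "Phase 4: Adversarial Review",
    "review": "Phase 5: Merge Gate",
}


def save_phase(state_file: Path, phase: str) -> None:
    """Record a completed phase; write-then-rename avoids a half-written file."""
    tmp = state_file.with_suffix(".tmp")
    # ASSUMPTION: the state is stored under a "phase" key.
    tmp.write_text(json.dumps({"phase": phase}), encoding="utf-8")
    os.replace(tmp, state_file)  # atomic replace on the same filesystem


def resume_phase(state_file: Path) -> str:
    """Phase a restarted session should run next."""
    if not state_file.is_file():
        return "Phase 0: Setup"  # ASSUMPTION: no state file means nothing ran yet
    phase = json.loads(state_file.read_text(encoding="utf-8"))["phase"]
    return RESUME_FROM[phase]    # an unknown state raises KeyError instead of guessing
```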
Infrastructure
Guardrails
Three layers of enforcement: hooks that fire on platform events, rules embedded directly in
prompts where the model always reads them, and structural design that removes the opportunity
for a violation (beta cannot read a review file that has not been written yet).
Hook System
hooks.json — three events
flowchart LR
subgraph SESSION ["sessionStart"]
S["setup.sh / setup.ps1\n- pip install -r requirements.txt\n- npm install if no node_modules\n- runs before every session\n- no manual setup ever needed"]
end
subgraph POST ["postToolUse — fires on every file write"]
P1[".py written\n→ ruff check\nblocks on lint failure"]
P2[".ts/.tsx written\n→ tsc --noEmit\nblocks on type error"]
P3["specs/*.md written\n→ 7 required sections\n→ validator frequency\nblocks if incomplete"]
P4["bugs/*/report.md written\n→ 8 required sections\nblocks if incomplete"]
end
subgraph STOP ["agentStop — fires when agent finishes"]
A["validate_spec.py\nfinal gate:\nall 7 sections present\nblocks agent from stopping\nif spec incomplete"]
end
style SESSION fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style POST fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style STOP fill:#f7f0dc,stroke:#b08a2e,color:#5a4010
style S fill:#fdf6e3,stroke:#b08a2e,color:#1a1a1a
style P1 fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style P2 fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style P3 fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style P4 fill:#fdf0ec,stroke:#c84b2f,color:#1a1a1a
style A fill:#fdf6e3,stroke:#b08a2e,color:#1a1a1a
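The postToolUse branch of the diagram is essentially a routing table from the written file's path to a check. This page does not show the hooks.json schema or the payload the hook receives, so the sketch below covers only the routing. It ASSUMES repo-relative paths, and the rule list and select_check are HYPOTHETICAL names.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (glob on a repo-relative path, check to run); first match wins.
# In fnmatch, "*" also matches "/", so "*.py" covers files in any directory.
POST_TOOL_USE_RULES = [
    ("*.py", "ruff check"),
    ("*.ts", "tsc --noEmit"),
    ("*.tsx", "tsc --noEmit"),
    ("specs/*.md", "spec: 7 sections + validator frequency"),
    ("bugs/*/report.md", "bug report: 8 sections"),
]


def select_check(path: str) -> Optional[str]:
    """Which postToolUse check applies to a written file, or None."""
    normalized = path.replace("\\", "/")   # accept Windows-style separators
    for pattern, check in POST_TOOL_USE_RULES:
        if fnmatchcase(normalized, pattern):
            return check
    return None
```

fnmatchcase is used instead of fnmatch so matching does not change with the operating system's case rules.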
Enforcement Levels
| Concern | Level | Mechanism |
|---|---|---|
| Spec 7 required sections | system | postToolUse blocks write if any missing |
| Bug report 8 required sections | system | postToolUse blocks write if any missing |
| Python lint on every .py write | system | ruff check — blocks with exact error |
| TypeScript types on every .ts write | system | tsc --noEmit — blocks with exact error |
| Validator frequency (>5 builders) | system | postToolUse: (builders//5)+1 validators required |
| Dependency auto-install | system | sessionStart hook: pip + npm before every session |
| TDD on every builder task | prompt | TDD preamble injected into every builder dispatch |
| Systematic debugging | prompt | reproduce→isolate→root cause→fix — embedded in build |
| No direct implementation | prompt | EXECUTION DIRECTIVE in every spec file |
| Review isolation (alpha/beta) | structural | alpha's file doesn't exist when beta runs |
| Read-only agents (router/reviewer) | prompt | no disallowedTools in Copilot — instructions only |
| Merge requires both approvals | prompt | orchestrator rule + ask_questions confirmation |
Design Philosophy
Why It Works This Way
The Spec Is the Contract
The build prompt makes no decisions — it reads what the spec says and executes it.
All intelligence is front-loaded into the spec by /plan_to_build.
This constraint is intentional: a mechanical executor is predictable, auditable,
and trustworthy in a way that a decision-making orchestrator is not.
The spec must be airtight because the build prompt is mechanical. This is not a
limitation — it is a design choice that produces better specs. When the
executor cannot compensate for vagueness, the planner has no choice but to be precise.
"The spec is the contract. Everything downstream — builders, validators,
reviewers — works from the spec and nothing else."
Design principle
Skills vs Embedded Philosophy
The system has skill files in .github/skills/ covering TDD,
systematic debugging, verification before completion, safe rollback, and more.
These skills are loaded on relevance — the model decides whether to pull them.
That's probabilistic. Under pressure, the model might not load the right skill.
The solution: embed the critical rules directly in the prompts, where the model
reads them on every run. Skills remain the source of truth; prompts are the
enforcement layer.
| Skill | Embedded in | Core rule |
|---|---|---|
| brainstorming | plan_to_build → Prerequisite 1 | One question at a time, multiple-choice preferred |
| team-composition | plan_to_build → Prerequisite 2 | Single / two / three builders based on workstream independence |
| writing-plans | plan_to_build → Task Quality Rules | ≥50 word descriptions, 2–5 min task size, design assertions required |
| test-driven-development | build → builder preamble | RED-GREEN-REFACTOR. Test first is mandatory. |
| systematic-debugging | build → fix cycle dispatch | Reproduce → isolate → root cause → fix. No random changes. |
| verification-before-completion | build → validator dispatch | Never say PASS without actual command output |
| safe-rollback | build → exhausted fix cycle | git checkout on 2 failed cycles. Verify with git diff. |
| executing-plans | build → batch checkpoints | Pause every 3 tasks, giving the user a chance to course-correct |
| plan-reviewer | plan_to_build → Self-Audit | Count builders/validators before saving. Fix if wrong. |
Local vs Async — The Comparison
The local pipeline is not a compromise version of async — it is the superior workflow
for work that demands quality. You can launch multiple /bug_to_pr sessions
simultaneously, one per terminal, each on its own branch, each running the full pipeline
autonomously. Go to lunch. Come back to multiple PRs — each with a JIRA bug report,
test evidence, and two independent adversarial review verdicts attached. That is
genuinely parallel async work, done locally, with no compromise on quality.
The GitHub async Copilot workflow produces a PR. The local pipeline produces a PR
plus everything the async workflow cannot: mandatory TDD on every task,
hooks that block bad code at write time, adversarial review by two isolated agents,
semantic merge conflict resolution, and a full audit trail from triage through merge gate.
The only thing the async workflow adds is that no terminal needs to be running — a
marginal advantage when your terminal is already open.
Use the local pipeline for work you care about. Use the async GitHub
Copilot workflow only for pure delegation — simple, low-risk tasks you would assign
to a junior developer and review later, where quality gates can be lighter.
For everything else, the local pipeline wins.
One more advantage worth stating plainly: the local pipeline is portable. It runs in any IDE,
and planning, building, and validation need only git, so they work against any git-compatible
repository: GitHub, GitLab, Bitbucket, self-hosted Gitea, whatever your organisation uses. The PR,
review, and merge steps of /bug_to_pr as described above call the GitHub CLI (gh pr create,
gh pr review, gh pr merge), so on other hosts those steps need the host's equivalent CLI or API.
The GitHub async Copilot workflow is locked to GitHub.com: if your team is on GitLab or your
enterprise runs an internal git server, it is simply unavailable.
| Dimension | Local pipeline | GitHub async Copilot |
|---|---|---|
| Quality gate | Hooks at write time + adversarial review | CI at PR time + human review |
| TDD enforcement | Mandatory preamble in every builder | Copilot's own defaults |
| Review | Two isolated adversarial agents | Human code review |
| Merge conflict | Orchestrator resolves semantically | You resolve manually |
| Audit trail on PR | Bug report + test evidence + two verdicts | PR diff + CI logs |
| Parallel execution | Multiple sessions simultaneously | Multiple issues assigned to @copilot |
| Crash recovery | pipeline-state.json — resume any phase | Stateless — each session fresh |
| Terminal required | Yes — any terminal or IDE | No — runs in GitHub Actions |
| Best for | Work you care about — quality is non-negotiable | Pure delegation — simple, low-risk tasks |
| Portability | Any IDE · any git host · GitHub, GitLab, Bitbucket, self-hosted | GitHub.com only |