Skills Reference

Skills are internal sub-agent instruction files stored in skills/. Users never invoke them directly; commands load them as system prompts when spawning sub-agents. Each skill defines a narrow role, the files it reads, and the files it writes.

Pipeline skills

`orchestrator.md`

Read by: every slash command, before anything else.

Defines the execution rules that all agents must follow: how to run notebooks, the filesystem-as-state model, immutability rules, referee isolation protocol, and error handling.

Key rules enforced:

- Agents communicate exclusively through files. No in-memory state passing.
- data/paper_spec.json is read-only after stage 1.
- code_run/ is written by nbconvert only.
- paper/review_history/ is append-only.
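One way to check the read-only rule for data/paper_spec.json is to record a digest once stage 1 finishes and compare it before later stages. This is a minimal sketch, not the orchestrator's actual mechanism; `file_digest` and `assert_unchanged` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, expected_digest: str) -> None:
    """Raise if the file no longer matches the digest recorded earlier.

    Assumption: the caller stored `expected_digest` right after stage 1.
    """
    if file_digest(path) != expected_digest:
        raise RuntimeError(f"{path} was modified after it became read-only")
```

A later stage would call `assert_unchanged(spec_path, stored_digest)` before reading the spec, so an accidental write surfaces as an error instead of silently changing downstream results.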

`notebook_runner.md`

Read by: /run, /recast, /recast-cf, /stage.

Executes a single notebook:

jupyter nbconvert --to notebook --execute --inplace \
  --ExecutePreprocessor.timeout=1800 \
  --output-dir <project>/code_run/ \
  <project>/code_build/0N_*.ipynb

Notebooks must be executed with cwd = <project>/code_build/ because each notebook runs `from paths import *` to resolve all file paths. On failure, the runner reports the failing cell number and the error traceback.
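The working-directory requirement can be made explicit by launching nbconvert from Python. The sketch below copies the flags from the command above verbatim and sets `cwd` to code_build/. The function names and the notebook file name in the usage note are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(project: Path, notebook_name: str) -> list[str]:
    """Assemble the nbconvert invocation for one notebook in code_build/."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        "--ExecutePreprocessor.timeout=1800",
        # Absolute path, so the output location does not depend on cwd.
        "--output-dir", str((project / "code_run").resolve()),
        notebook_name,
    ]


def run_notebook(project: Path, notebook_name: str) -> subprocess.CompletedProcess:
    """Execute the notebook with cwd = <project>/code_build/.

    Running from code_build/ lets `from paths import *` inside the
    notebook find paths.py. The caller inspects returncode and stderr.
    """
    return subprocess.run(
        build_nbconvert_command(project, notebook_name),
        cwd=project / "code_build",
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, `run_notebook(Path("my_project"), "01_example.ipynb")` (a hypothetical file name) returns a `CompletedProcess` whose `returncode` is non-zero if a cell fails and whose `stderr` carries the traceback printed by nbconvert.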

Gate

`advisor_gate.md`

Read by: /recast, /recast-cf (after stage 6).

Spawns three independent checks in sequence. Each check reads specific files and returns PASS or FAIL with a reason.

| Check | Files read | Validates |
| --- | --- | --- |
| Code Auditor | All `code_run/` notebooks | No execution errors; all expected outputs exist |
| Identification Checker | `paper_spec.json`, `replication_check.json` | Internally consistent identification strategy |
| Data Validator | `diagnostics_flags.json` | No critical data quality flags |

All three checks must return PASS. The orchestrator stops the pipeline on the first FAIL.
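The stop-on-first-FAIL behaviour can be sketched as a short loop. This is an illustration under assumptions: each check is modelled as a callable returning a `(passed, reason)` pair, and `CheckResult`, `run_gate`, and `gate_passed` are hypothetical names, not part of the skill file.

```python
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool
    reason: str


Check = Callable[[], tuple[bool, str]]


def run_gate(checks: list[tuple[str, Check]]) -> list[CheckResult]:
    """Run checks in order and stop at the first FAIL."""
    results: list[CheckResult] = []
    for name, check in checks:
        passed, reason = check()
        results.append(CheckResult(name, passed, reason))
        if not passed:
            break  # later checks are never started
    return results


def gate_passed(results: list[CheckResult], expected_count: int) -> bool:
    """The gate passes only if every expected check ran and returned PASS."""
    return len(results) == expected_count and all(r.passed for r in results)
```

Comparing against `expected_count` matters because a run that stopped early has fewer results than checks, so it cannot be mistaken for a pass.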

Review skills

`review_loop.md`

Read by: /review.

Defines the loop logic: state detection (count existing round_*/ dirs), per-round execution (3 referees → synthesis → revision), exit conditions, and progress reporting format.
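State detection by counting round_*/ directories might look like the following sketch. It assumes the round directories live directly under paper/review_history/ and that the next round number is the count plus one; the skill file may apply further rules, for example for a partially completed round.

```python
from pathlib import Path


def count_round_dirs(review_history: Path) -> int:
    """Count existing round_*/ directories under review_history/."""
    if not review_history.is_dir():
        return 0
    return sum(1 for p in review_history.glob("round_*") if p.is_dir())


def next_round_number(review_history: Path) -> int:
    """Assumption: rounds are numbered from 1 without gaps."""
    return count_round_dirs(review_history) + 1
```

Because the count is derived from the directory listing each time, a restarted /review picks up where the previous run stopped without any saved in-memory state.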

`revision_agent.md`

The only agent that writes to paper/paper.tex and code_build/ notebooks.

Reads the synthesis report and implements changes:

| Issue severity | Action |
| --- | --- |
| Blocking | Edits `code_build/0N_*.ipynb` (appends `## Revision Round N` section) + signals `RERUN_NEEDED: yes` |
| Major | Direct edit to `paper/paper.tex` |
| Minor | Direct edit to `paper/paper.tex` |

After implementing all changes, writes paper/review_history/round_N/changelog_N.md documenting every change made and every issue deferred, with reasons.

Key rule: the agent never overwrites prior cells in notebooks; it always appends. It also never overwrites prior files in review_history/.
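The append-only notebook rule can be illustrated by editing the notebook JSON directly. The sketch below uses only the standard library and adds a `## Revision Round N` markdown cell followed by one code cell, leaving existing cells untouched. The function name and the choice of a single code cell are assumptions.

```python
import json
import uuid
from pathlib import Path


def append_revision_section(notebook_path: Path, round_number: int, code: str) -> None:
    """Append a revision heading and one code cell to a notebook."""
    nb = json.loads(notebook_path.read_text(encoding="utf-8"))
    new_cells = [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": f"## Revision Round {round_number}",
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": code,
        },
    ]
    # Notebook format 4.5 and later expects a unique id on every cell.
    if nb.get("nbformat") == 4 and nb.get("nbformat_minor", 0) >= 5:
        for cell in new_cells:
            cell["id"] = uuid.uuid4().hex[:8]
    nb["cells"].extend(new_cells)  # existing cells are never modified
    notebook_path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
```

Appending rather than editing keeps the earlier analysis visible in the notebook, so a reader can compare what changed in each round.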

`synthesis_referee.md`

Read by: review loop, after all three referee reports are written.

The first agent in the chain that sees all three referee reports. Deduplicates overlapping concerns, classifies each unique issue as blocking / major / minor, and checks prior changelog_*.md files to avoid re-raising resolved issues.

`final_referee.md`

Read by: /final.

Reads all rounds’ ref*.md, synthesis.md, and changelog_*.md files, plus the final paper/paper.tex. Writes paper/review_history/final_report.md as a human-readable summary.

Referees

Each referee is isolated: it receives only its own skill file as a system prompt and reads only the files listed below. No referee has access to any other referee’s output. The synthesis referee is the first agent that sees all three reports.

`referee_1_identification.md`

Mandate: Causal identification strategy only.

Reads: paper_spec.json, data/results/replication_results.json, paper/paper.tex

Evaluates: instrument validity, exclusion restriction, first-stage strength, LATE vs. ATE interpretation, external validity.

`referee_2_dml_methods.md`

Mandate: DoubleML / Causal Forest implementation and methodological choices.

Reads: paper_spec.json, data/results/dml_results.json and/or causal_forest_results.json, hte_results.json, code_build/04_*.ipynb, paper/paper.tex

Evaluates: cross-fitting procedure, learner selection, nuisance R², GATE/CATE validity. For Causal Forest: honesty, tree count, SE plausibility (check 21: blocking if the ATE CI is at least 10x narrower than the individual CATE CIs), feature importance interpretation.
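The SE plausibility rule in check 21 can be read as a ratio of confidence-interval widths. The sketch below assumes the individual CATE intervals are summarised by their median width, which the source does not specify, and uses a hypothetical function name.

```python
from statistics import median

Interval = tuple[float, float]  # (lower, upper)


def ate_ci_implausibly_narrow(
    ate_ci: Interval,
    cate_cis: list[Interval],
    ratio: float = 10.0,
) -> bool:
    """Return True if the ATE CI is at least `ratio` times narrower
    than the median individual CATE CI (the median is an assumption)."""
    ate_width = ate_ci[1] - ate_ci[0]
    if ate_width <= 0:
        raise ValueError("ATE confidence interval must have positive width")
    typical_cate_width = median(upper - lower for lower, upper in cate_cis)
    return typical_cate_width / ate_width >= ratio
```

With an ATE interval of width 0.1 and CATE intervals of median width 2.0, the ratio is 20, so the function returns `True` and the issue would be classified as blocking.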

`referee_3_robustness.md`

Mandate: Replication fidelity and robustness.

Reads: paper_spec.json, data/results/replication_check.json, data/results/diagnostics_flags.json, paper/paper.tex

Evaluates: numerical match to published tables, sample coverage, sensitivity to specification, data quality flags from stage 5.
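The per-referee read lists above can be expressed as an allowlist that a sub-agent wrapper consults before opening a file. This is only a sketch of the isolation idea: `REFEREE_READS` and `may_read` are hypothetical names, paper_spec.json is assumed to be data/paper_spec.json, and referee 2 is omitted because the source does not give a directory for every file it reads.

```python
from fnmatch import fnmatchcase

# Read lists copied from the referee entries above (paths relative to the project).
REFEREE_READS: dict[str, list[str]] = {
    "referee_1_identification.md": [
        "data/paper_spec.json",
        "data/results/replication_results.json",
        "paper/paper.tex",
    ],
    "referee_3_robustness.md": [
        "data/paper_spec.json",
        "data/results/replication_check.json",
        "data/results/diagnostics_flags.json",
        "paper/paper.tex",
    ],
}


def may_read(referee: str, relative_path: str) -> bool:
    """Return True only if the path matches an entry in the referee's read list."""
    patterns = REFEREE_READS.get(referee, [])
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)
```

Under this scheme a request such as `may_read("referee_1_identification.md", "paper/review_history/round_1/ref2.md")` is refused, which matches the rule that no referee sees another referee's output.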