Pipeline Architecture
Pipeline stages:

1. Paper Intelligence Agent: reads the PDF; extracts the identification strategy, variables, and specifications.
   → paper_spec.json (spec mismatch → fix & re-run)
2. Data Agent: merges raw .dta/.csv files, constructs variables, applies sample filters.
   → dataset.parquet
3. Replication Agent: OLS / IV / 2SLS; reproduces the original paper's main table.
   → replication_check.json → table_replication.tex
4. Extension (one of two paths):
   · DML Extension: DoubleML PLIV / PLR · RF + Lasso learners · GATE heterogeneity
   · Causal Forest: CausalForestDML / CausalIVForest · individual CATEs · feature importances
5. Diagnostics Agent: replication tolerance · nuisance R² · CI coverage · CATE plausibility · SE sanity check.
   → diagnostics_flags.json (critical flag → hard fail)
6. Report Agent: compiles the LaTeX paper with tables, figures, and interpretation.
   → paper.tex → paper.pdf

Review loop:

· Advisor Gate (3 independent checks): code audit · identification validity · data quality; all must pass (hard fail → stop)
· Referee 1 (identification): estimand validity · instrument strength
· Referee 2 (DML / CF mechanics): learner choice · SE plausibility
· Referee 3 (robustness): table replication · sensitivity checks
· Synthesis Referee: deduplicates · ranks severity · flags disagreements → synthesis.md
· Revision Agent: edits paper.tex directly · patches notebooks if blocking · writes changelog → changelog_N.md (RERUN_NEEDED → /stage 4)
· Final Referee (human-readable report): remaining issues · severity table · key-numbers comparison · recommendation → final_report.md

Published output:

· Quarto report: /publish → website
· Referee reader report: final_report.md
This diagram shows the full RECAST pipeline from paper ingestion to published output. Left-path annotations (red) mark failure modes; right-path annotations mark feedback loops and re-run triggers.
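The Diagnostics Agent's replication-tolerance check can be sketched as below. This is a minimal illustration, not the pipeline's actual code: the function name, the 1% tolerance, the flag schema, and the coefficient names are all invented for the example.

```python
import json

def check_replication(replicated, original, tol=0.01):
    """Flag coefficients whose replicated value deviates from the
    original by more than `tol` in relative terms."""
    flags = []
    for name, orig_val in original.items():
        rel_err = abs(replicated[name] - orig_val) / max(abs(orig_val), 1e-12)
        if rel_err > tol:
            flags.append({"coef": name,
                          "rel_err": round(rel_err, 4),
                          "severity": "critical"})
    # the pipeline would write this dict to diagnostics_flags.json
    return {"pass": not flags, "flags": flags}

# illustrative numbers only: within tolerance, so no flags are raised
result = check_replication(
    replicated={"beta_iv": 0.1486, "beta_ols": 0.072},
    original={"beta_iv": 0.149, "beta_ols": 0.072},
)
print(json.dumps(result, indent=2))
```

A critical flag in the real pipeline hard-fails the run rather than merely reporting it.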
The pipeline runs one of two extension methods depending on the command:
· /recast → DoubleML (PLIV / PLR) with learner comparison and GATE analysis
· /recast-cf → Causal Forest with individual CATEs and feature importances
Both paths share the same replication stages (1–3), diagnostics, and review loop.
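The /recast path uses the DoubleML package; the cross-fitting it performs for a partially linear regression can be sketched with scikit-learn alone. This is a schematic of the estimator's mechanics under stated assumptions, not the pipeline's implementation: the function `dml_plr`, the learner settings, and the toy data are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def dml_plr(y, d, X, ml_l, ml_m, n_folds=5, seed=0):
    """Cross-fitted partially linear regression.

    ml_l learns E[y|X], ml_m learns E[d|X]; each fold is predicted
    by a model fit on the other folds, then the treatment effect is
    the OLS slope of the y-residuals on the d-residuals.
    """
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        y_res[test] = y[test] - ml_l.fit(X[train], y[train]).predict(X[test])
        d_res[test] = d[test] - ml_m.fit(X[train], d[train]).predict(X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)  # residual-on-residual slope
    psi = d_res * (y_res - theta * d_res)      # orthogonal score
    se = np.sqrt((psi @ psi) / (d_res @ d_res) ** 2)
    return theta, se

# toy data with a known treatment effect of 2.0 and one confounder
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
d = X[:, 0] + rng.normal(size=500)
y = 2.0 * d + X[:, 0] + rng.normal(size=500)
theta, se = dml_plr(
    y, d, X,
    ml_l=RandomForestRegressor(n_estimators=100, random_state=0),
    ml_m=LassoCV(),
)
```

The /recast-cf path swaps this stage for a causal-forest estimator (per the diagram, CausalForestDML) to recover individual CATEs rather than a single coefficient.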