Pipeline Architecture
Pipeline stages:

1. Paper Intelligence Agent: reads the PDF; extracts the identification strategy, variables, and specifications.
   → paper_spec.json (spec mismatch → fix & re-run)
2. Data Agent: merges raw .dta/.csv files, constructs variables, applies sample filters.
   → dataset.parquet
3. Replication Agent: OLS / IV / 2SLS; reproduces the original paper's main table.
   → replication_check.json → table_replication.tex
4. Extension (one of two paths):
   · DML Extension: DoubleML PLIV / PLR · RF + Lasso learners · GATE heterogeneity
   · Causal Forest: CausalForestDML / CausalIVForest · individual CATEs · feature importances
5. Diagnostics Agent: replication tolerance · nuisance R² · CI coverage · CATE plausibility · SE sanity check.
   → diagnostics_flags.json (critical flag → hard fail)
6. Report Agent: compiles the LaTeX paper with tables, figures, and interpretation.
   → paper.tex → paper.pdf

Review loop:

· Advisor Gate (3 independent checks): code audit · identification validity · data quality; all must pass (hard fail → stop)
· Referee 1 (identification): estimand validity · instrument strength
· Referee 2 (DML / CF mechanics): learner choice · SE plausibility
· Referee 3 (robustness): table replication · sensitivity checks
· Synthesis Referee: deduplicates · ranks severity · flags disagreements → synthesis.md
· Revision Agent: edits paper.tex directly · patches notebooks if blocking · writes changelog → changelog_N.md (RERUN_NEEDED → /stage 4)
· Final Referee (human-readable report): remaining issues · severity table · key-numbers comparison · recommendation → final_report.md

Published output:

· Quarto report: /publish → website
· Referee reader report: final_report.md
This diagram shows the full RECAST pipeline from paper ingestion to published output. Left-path annotations (red) mark failure modes; right-path annotations mark feedback loops and re-run triggers.
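The Diagnostics Agent's replication-tolerance check can be sketched as below. This is a minimal illustration, not the pipeline's actual code: the function name, the 1% tolerance, the flag schema, and the coefficient names are all invented for the example.

```python
import json

def check_replication(replicated, original, tol=0.01):
    """Flag coefficients whose replicated value deviates from the
    original by more than `tol` in relative terms."""
    flags = []
    for name, orig_val in original.items():
        rel_err = abs(replicated[name] - orig_val) / max(abs(orig_val), 1e-12)
        if rel_err > tol:
            flags.append({"coef": name,
                          "rel_err": round(rel_err, 4),
                          "severity": "critical"})
    # the pipeline would write this dict to diagnostics_flags.json
    return {"pass": not flags, "flags": flags}

# illustrative numbers only: within tolerance, so no flags are raised
result = check_replication(
    replicated={"beta_iv": 0.1486, "beta_ols": 0.072},
    original={"beta_iv": 0.149, "beta_ols": 0.072},
)
print(json.dumps(result, indent=2))
```

A critical flag in the real pipeline hard-fails the run rather than merely reporting it.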
The pipeline runs one of two extension methods depending on the command:
· /recast → DoubleML (PLIV / PLR) with learner comparison and GATE analysis
· /recast-cf → Causal Forest with individual CATEs and feature importances
Both paths share the same replication stages (1–3), diagnostics, and review loop.
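The /recast path uses the DoubleML package; the cross-fitting it performs for a partially linear regression can be sketched with scikit-learn alone. This is a schematic of the estimator's mechanics under stated assumptions, not the pipeline's implementation: the function `dml_plr`, the learner settings, and the toy data are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def dml_plr(y, d, X, ml_l, ml_m, n_folds=5, seed=0):
    """Cross-fitted partially linear regression.

    ml_l learns E[y|X], ml_m learns E[d|X]; each fold is predicted
    by a model fit on the other folds, then the treatment effect is
    the OLS slope of the y-residuals on the d-residuals.
    """
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        y_res[test] = y[test] - ml_l.fit(X[train], y[train]).predict(X[test])
        d_res[test] = d[test] - ml_m.fit(X[train], d[train]).predict(X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)  # residual-on-residual slope
    psi = d_res * (y_res - theta * d_res)      # orthogonal score
    se = np.sqrt((psi @ psi) / (d_res @ d_res) ** 2)
    return theta, se

# toy data with a known treatment effect of 2.0 and one confounder
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
d = X[:, 0] + rng.normal(size=500)
y = 2.0 * d + X[:, 0] + rng.normal(size=500)
theta, se = dml_plr(
    y, d, X,
    ml_l=RandomForestRegressor(n_estimators=100, random_state=0),
    ml_m=LassoCV(),
)
```

The /recast-cf path swaps this stage for a causal-forest estimator (per the diagram, CausalForestDML) to recover individual CATEs rather than a single coefficient.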