# Quick Start

## Prerequisites
- Claude Code CLI installed
- Python ≥ 3.10 with Jupyter and nbconvert
- The following Python packages (installed once per machine):

```bash
pip install doubleml econml scikit-learn pandas pyarrow statsmodels linearmodels
pip install nbconvert jupyter pyyaml
```

For Stata `.dta` files: `pip install pyreadstat`.
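Before scaffolding a project, it can help to confirm that the dependencies above are actually importable. The helper below is a small sketch, not part of the pipeline; note that some pip names differ from their import names.

```python
import importlib.util

def missing_packages(modules):
    """Return the modules that cannot be imported on this machine."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Import names for the dependencies above ("pyyaml" imports as "yaml",
# "scikit-learn" as "sklearn")
required = ["doubleml", "econml", "sklearn", "pandas", "pyarrow",
            "statsmodels", "linearmodels", "nbconvert", "yaml"]
print("missing:", missing_packages(required) or "none")
```

If anything is listed as missing, re-run the `pip install` lines above before launching the pipeline.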
## 1 · Scaffold a new project
```bash
/init ~/papers/acemoglu2001
```

This creates the standard project folder layout with a `config.yaml` template and empty `raw_data/`, `code_build/`, `code_run/`, `data/`, and `paper/` directories.
## 2 · Add your files
Drop the paper PDF and replication data into `raw_data/`:

```bash
cp acemoglu2001.pdf ~/papers/acemoglu2001/raw_data/paper.pdf
cp colonial_origins_data.dta ~/papers/acemoglu2001/raw_data/
```

The pipeline expects exactly one `paper.pdf`. Data files can be `.dta`, `.csv`, `.xlsx`, or `.parquet`.
## 3 · Edit `config.yaml`
Open `~/papers/acemoglu2001/config.yaml` and fill in the key fields:

```yaml
paper:
  title: "Colonial Origins of Comparative Development"
  authors: "Acemoglu, Johnson, Robinson"
  year: 2001
  journal: "American Economic Review"

analysis:
  outcome_var: log_gdp           # column name in the data
  treatment_var: institutions    # endogenous variable
  instrument_var: settler_mort   # instrument (if IV)
  controls: [latitude, africa, asia, other_cont]
  ml_learners: [lasso, random_forest, xgboost]

review:
  max_rounds: 3
```

The full schema is documented in `paper/paper_spec_schema.json` after stage 1 runs.
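A quick way to catch typos before launching is to load the config with PyYAML and check for the keys shown in the template. This is an illustrative sketch only: the required-key list below is inferred from the template, and the authoritative schema is `paper/paper_spec_schema.json`.

```python
import yaml

# Keys inferred from the template above (illustrative, not the real schema)
REQUIRED = {
    "paper": ["title", "authors", "year"],
    "analysis": ["outcome_var", "treatment_var", "controls"],
}

def check_config(cfg):
    """Return a list of missing 'section.key' entries."""
    problems = []
    for section, keys in REQUIRED.items():
        block = cfg.get(section) or {}
        problems += [f"{section}.{k}" for k in keys if k not in block]
    return problems

# Demo on an inline fragment; in practice, safe_load the config.yaml file
cfg = yaml.safe_load("""
paper:
  title: "Colonial Origins of Comparative Development"
  authors: "Acemoglu, Johnson, Robinson"
  year: 2001
analysis:
  outcome_var: log_gdp
  treatment_var: institutions
  controls: [latitude, africa, asia, other_cont]
""")
print(check_config(cfg) or "required keys present")
```

An empty result means every checked key is present; otherwise the missing entries are listed as `section.key`.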
## 4 · Launch the pipeline
Choose your extension method:

```bash
/recast ~/papers/acemoglu2001      # DoubleML extension
/recast-cf ~/papers/acemoglu2001   # Causal Forest extension
```
The full pipeline runs unattended:
| Step | Time (approx.) |
|---|---|
| Stages 1–6 | 5–20 min depending on data size |
| Advisor Gate | 2–3 min |
| Review loop (1–3 rounds) | 10–30 min |
| Final report | 1–2 min |
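To see what the DoubleML extension estimates, here is a minimal, self-contained sketch of its core idea, cross-fitted partialling-out in a partially linear model, using only scikit-learn. The simulated data and lasso learners are illustrative; the pipeline's actual learners come from `ml_learners` in `config.yaml`.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, theta = 2000, 0.5
X = rng.normal(size=(n, 5))
d = X[:, 0] + rng.normal(size=n)              # treatment confounded by X
y = theta * d + X[:, 0] + rng.normal(size=n)  # outcome with true effect 0.5

# Out-of-fold nuisance predictions: the "cross-fitting" in DML
y_hat = cross_val_predict(LassoCV(), X, y, cv=5)
d_hat = cross_val_predict(LassoCV(), X, d, cv=5)

# Partialling-out: regress outcome residuals on treatment residuals
ry, rd = y - y_hat, d - d_hat
theta_hat = float(rd @ ry / (rd @ rd))
print(f"theta_hat = {theta_hat:.2f}")  # close to the simulated effect of 0.5
```

With 2,000 observations the estimate lands near the simulated effect despite the confounding through `X`; the `doubleml` package adds valid standard errors and a wider menu of learners on top of this recipe.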
## 5 · Read the outputs
```
~/papers/acemoglu2001/
├── paper/paper.pdf                        ← compiled paper
├── paper/figures/forest_plot.pdf          ← replication + DML comparison
└── paper/review_history/final_report.md   ← START HERE
```
`final_report.md` summarises what replicated, what changed under DML, which referee issues were resolved, and which remain open.
## Re-running from a specific stage
If you fix a data issue after stage 2, resume from stage 3 without re-running the paper intelligence notebook:

```bash
/stage 3 ~/papers/acemoglu2001
```
If a referee flags a blocking issue in the DML extension, re-run from stage 4:

```bash
/stage 4 ~/papers/acemoglu2001
```
## Running only the review loop
If you have already run the analysis notebooks and want to re-run the peer review on a revised `paper.tex`:

```bash
/review ~/papers/acemoglu2001
```