Quick Start

Prerequisites

  • Claude Code CLI installed
  • Python ≥ 3.10 with Jupyter and nbconvert
  • The following Python packages (installed once per machine):

    pip install doubleml econml scikit-learn pandas pyarrow statsmodels linearmodels
    pip install nbconvert jupyter pyyaml

For Stata .dta files: pip install pyreadstat
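Before scaffolding anything, it can help to confirm the packages above are importable. A stdlib-only sketch (this helper is hypothetical, not part of the pipeline):

```python
import importlib.util

# Top-level module names for the packages listed above
# (scikit-learn imports as "sklearn", pyyaml as "yaml").
REQUIRED = [
    "doubleml", "econml", "sklearn", "pandas", "pyarrow",
    "statsmodels", "linearmodels", "nbconvert", "jupyter", "yaml",
]

def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    gaps = missing_packages()
    print("All set" if not gaps else f"Missing: {', '.join(gaps)}")
```

Run it once per machine; anything it reports as missing goes back into the pip install commands above.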

1 · Scaffold a new project

/init ~/papers/acemoglu2001

This creates the standard project folder layout with a config.yaml template and empty raw_data/, code_build/, code_run/, data/, and paper/ directories.

2 · Add your files

Drop the paper PDF and replication data into raw_data/:

cp acemoglu2001.pdf ~/papers/acemoglu2001/raw_data/paper.pdf
cp colonial_origins_data.dta ~/papers/acemoglu2001/raw_data/

The pipeline expects exactly one paper.pdf. Data files can be .dta, .csv, .xlsx, or .parquet.

3 · Edit config.yaml

Open ~/papers/acemoglu2001/config.yaml and fill in the key fields:

paper:
  title: "The Colonial Origins of Comparative Development: An Empirical Investigation"
  authors: "Acemoglu, Johnson, Robinson"
  year: 2001
  journal: "American Economic Review"

analysis:
  outcome_var: log_gdp           # column name in the data
  treatment_var: institutions    # endogenous variable
  instrument_var: settler_mort   # instrument (if IV)
  controls: [latitude, africa, asia, other_cont]
  ml_learners: [lasso, random_forest, xgboost]

review:
  max_rounds: 3

The full schema is documented in paper/paper_spec_schema.json after stage 1 runs.
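Before launching, it can save a failed run to sanity-check the config yourself. A hedged sketch with PyYAML (the field names come from the template above; the helper and the exact set of required keys are assumptions):

```python
import yaml

# Fields the analysis section needs, per the template above (assumed minimum).
REQUIRED_ANALYSIS_KEYS = {"outcome_var", "treatment_var", "controls"}

def check_config(text: str) -> dict:
    """Parse config.yaml text and verify the analysis keys are present."""
    cfg = yaml.safe_load(text)
    missing = REQUIRED_ANALYSIS_KEYS - set(cfg.get("analysis") or {})
    if missing:
        raise KeyError(f"config.yaml analysis section missing: {sorted(missing)}")
    return cfg

sample = """
paper:
  year: 2001
analysis:
  outcome_var: log_gdp
  treatment_var: institutions
  controls: [latitude, africa]
"""
cfg = check_config(sample)
```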

4 · Launch the pipeline

Choose your extension method:

/recast ~/papers/acemoglu2001       # DoubleML extension
/recast-cf ~/papers/acemoglu2001    # Causal Forest extension
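To see what the DoubleML extension estimates, here is a self-contained illustration of cross-fitted partialling-out on simulated data. This is a sketch of the general technique only: it uses plain linear learners instead of the lasso/forest/xgboost learners configured above, and it omits the IV step the pipeline would use when instrument_var is set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, theta = 2000, 0.5

# Partially linear model: treatment d depends on x; outcome y on d and x.
x = rng.normal(size=(n, 3))
d = x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y = theta * d + x @ np.array([0.3, 0.7, -0.4]) + rng.normal(size=n)

# Cross-fitting: predict y and d from x on held-out folds, keep residuals.
res_y, res_d = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    res_y[test] = y[test] - LinearRegression().fit(x[train], y[train]).predict(x[test])
    res_d[test] = d[test] - LinearRegression().fit(x[train], d[train]).predict(x[test])

# Orthogonal estimate: regress y-residuals on d-residuals.
theta_hat = (res_d @ res_y) / (res_d @ res_d)
```

With flexible (ML) learners in place of the linear fits, this is the mechanism the DoubleML extension relies on; theta_hat recovers the treatment effect after partialling the controls out of both outcome and treatment.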

The full pipeline runs unattended:

Step                        Time (approx.)
Stages 1–6                  5–20 min, depending on data size
Advisor Gate                2–3 min
Review loop (1–3 rounds)    10–30 min
Final report                1–2 min

5 · Read the outputs

~/papers/acemoglu2001/
├── paper/paper.pdf                          ← compiled paper
├── paper/figures/forest_plot.pdf            ← replication + DML comparison
└── paper/review_history/final_report.md     ← START HERE

final_report.md summarises what replicated, what changed under DML, which referee issues were resolved, and which remain open.

Re-running from a specific stage

If you fix a data issue after stage 2, resume from stage 3 without re-running the paper intelligence notebook:

/stage 3 ~/papers/acemoglu2001

If a referee flags a blocking issue in the DML extension, re-run from stage 4:

/stage 4 ~/papers/acemoglu2001

Running only the review loop

If you have already run the analysis notebooks and want to re-run the peer review on a revised paper.tex:

/review ~/papers/acemoglu2001