Project Structure
Each paper lives in its own folder, created by `/init`. The `.dml_project` marker file identifies a valid project — do not delete it.
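If you want to check for the marker programmatically before touching a folder, a minimal sketch (the helper name is ours, not part of the pipeline):

```python
from pathlib import Path

def is_dml_project(folder: str) -> bool:
    """Return True if `folder` contains the .dml_project marker file."""
    return (Path(folder) / ".dml_project").is_file()
```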
Folder layout
```
~/papers/<slug>/
├── .dml_project              ← project marker (do not delete)
├── config.yaml               ← edit before running
│
├── raw_data/                 ← DROP YOUR FILES HERE
│   ├── paper.pdf             ← the paper to replicate (required)
│   └── *.{dta,csv,xlsx}      ← replication data
│
├── data/                     ← generated by the pipeline
│   ├── paper_spec.json       ← ⚠ READ-ONLY after stage 1
│   ├── dataset.parquet
│   └── results/
│       ├── descriptives.json
│       ├── data_report.md
│       ├── replication_check.json
│       ├── replication_results.json
│       ├── dml_results.json
│       ├── diagnostics_flags.json
│       └── diagnostics_report.md
│
├── code_build/               ← SOURCE notebooks — edit here if needed
│   ├── paths.py              ← imported by every notebook
│   ├── 01_paper_intelligence.ipynb
│   ├── 02_data.ipynb
│   ├── 03_replication.ipynb
│   ├── 04_dml_extension.ipynb
│   ├── 05_diagnostics.ipynb
│   └── 06_report.ipynb
│
├── code_run/                 ← EXECUTED notebooks (never edit)
│   └── *.ipynb               ← output of nbconvert --execute
│
└── paper/                    ← FINAL DELIVERABLES
    ├── paper.tex
    ├── paper.pdf
    ├── paper_template.tex
    ├── paper_spec_schema.json
    ├── tables/
    │   ├── table_replication.tex
    │   └── table_dml.tex
    ├── figures/
    │   ├── forest_plot.pdf
    │   └── forest_plot.png
    ├── prompts/              ← per-project copies of referee skill files
    └── review_history/       ← APPEND-ONLY
        ├── round_1/
        │   ├── ref1.md
        │   ├── ref2.md
        │   ├── ref3.md
        │   ├── synthesis.md
        │   └── changelog_1.md
        ├── round_2/          ← if needed
        ├── round_3/          ← if needed
        └── final_report.md   ← ★ START HERE
```
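The role of `paths.py` — giving every notebook one canonical map of the folders above — can be sketched as follows. This is an illustrative stand-in, not the file's actual contents:

```python
from pathlib import Path

def project_paths(root: str) -> dict:
    """Map the standard project folders to absolute paths under `root`.

    Illustrative only: the real paths.py may expose these differently,
    e.g. as module-level constants derived from its own location.
    """
    root_p = Path(root).resolve()
    return {
        "raw_data": root_p / "raw_data",
        "data": root_p / "data",
        "results": root_p / "data" / "results",
        "code_build": root_p / "code_build",
        "code_run": root_p / "code_run",
        "paper": root_p / "paper",
    }
```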
Immutability rules
| Path | Rule |
|---|---|
| `data/paper_spec.json` | Read-only after stage 1 completes |
| `code_run/` | Written by nbconvert only; never edit manually |
| `paper/review_history/` | Append-only; revision agent adds new files, never overwrites |
| `code_build/` notebooks | Revisions are appended as `## Revision Round N` sections; existing cells are never modified |
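One way a stage could enforce the read-only rule is to drop the file's write bits after writing it. This is an illustrative sketch of the idea, not necessarily how the pipeline implements it:

```python
import stat
from pathlib import Path

def lock_read_only(path: str) -> None:
    """Remove all write permissions so later stages cannot modify the file.

    Illustrative enforcement of the 'read-only after stage 1' rule;
    the actual pipeline may enforce it differently (e.g. by convention).
    """
    p = Path(path)
    mode = p.stat().st_mode
    p.chmod(mode & ~stat.S_IWUSR & ~stat.S_IWGRP & ~stat.S_IWOTH)
```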
What to read after the pipeline finishes
Read these in order:
1. `paper/review_history/final_report.md` — human-readable summary: what was solved, what remains open, severity table, key numbers comparison (original → replicated → DML).
2. `paper/paper.pdf` — the compiled paper.
3. `paper/figures/forest_plot.pdf` — coefficient forest plot showing original, replication, and DML estimates side by side.
4. `paper/review_history/round_N/` — full referee reports and changelogs for each round, if you want the detailed review history.
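If you want to pull the headline numbers yourself rather than read the report, a sketch assuming each results JSON carries an `estimate` field (the real schema may differ):

```python
import json
from pathlib import Path

def coefficient_comparison(results_dir: str) -> dict:
    """Collect the headline estimate from each stage's JSON output.

    The field name 'estimate' is an assumption for illustration;
    inspect the actual files to confirm the schema.
    """
    out = {}
    for stage, fname in [("replication", "replication_results.json"),
                         ("dml", "dml_results.json")]:
        f = Path(results_dir) / fname
        if f.exists():
            out[stage] = json.loads(f.read_text()).get("estimate")
    return out
```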
config.yaml reference
```yaml
paper:
  title: ""
  authors: ""
  year:
  journal: ""
analysis:
  outcome_var: ""      # column name in the data
  treatment_var: ""    # endogenous variable (for DML)
  instrument_var: ""   # instrument (leave blank for OLS-only papers)
  controls: []         # list of control variable column names
  ml_learners:         # learners to use in DML first stage
    - lasso
    - random_forest
    - xgboost
  n_folds: 5           # cross-fitting folds (default 5)
review:
  max_rounds: 3        # maximum review rounds (default 3)
```
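Since blank required fields are the most common way a run fails early, a quick sanity check of the parsed config can help. The keys below come from the reference above; the checks themselves are illustrative, not part of the pipeline:

```python
def validate_config(cfg: dict) -> list:
    """Return a list of problems found in a parsed config.yaml dict.

    Illustrative checks: required analysis fields are non-empty and
    n_folds is a sensible cross-fitting fold count.
    """
    problems = []
    analysis = cfg.get("analysis", {})
    for key in ("outcome_var", "treatment_var"):
        if not analysis.get(key):
            problems.append(f"analysis.{key} must be set")
    if not analysis.get("ml_learners"):
        problems.append("analysis.ml_learners must list at least one learner")
    n_folds = analysis.get("n_folds", 5)
    if not isinstance(n_folds, int) or n_folds < 2:
        problems.append("analysis.n_folds must be an integer >= 2")
    return problems
```

Run it on the output of your YAML parser before launching the pipeline; an empty list means the required fields are filled in.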