Project Structure
Each paper lives in its own folder, created by `/init`. The `.dml_project` marker file identifies a valid project — do not delete it.
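If you want to check for the marker programmatically before touching a folder, a minimal sketch (the helper name is ours, not part of the pipeline):

```python
from pathlib import Path

def is_dml_project(folder: str) -> bool:
    """Return True if `folder` contains the .dml_project marker file."""
    return (Path(folder) / ".dml_project").is_file()
```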
Folder layout
```
~/papers/<slug>/
├── .dml_project              ← project marker (do not delete)
├── config.yaml               ← edit before running
│
├── raw_data/                 ← DROP YOUR FILES HERE
│   ├── paper.pdf             ← the paper to replicate (required)
│   └── *.{dta,csv,xlsx}      ← replication data
│
├── data/                     ← generated by the pipeline
│   ├── paper_spec.json       ← ⚠ READ-ONLY after stage 1
│   ├── dataset.parquet
│   └── results/
│       ├── descriptives.json
│       ├── data_report.md
│       ├── replication_check.json
│       ├── replication_results.json
│       ├── dml_results.json
│       ├── diagnostics_flags.json
│       └── diagnostics_report.md
│
├── code_build/               ← SOURCE notebooks — edit here if needed
│   ├── paths.py              ← imported by every notebook
│   ├── 01_paper_intelligence.ipynb
│   ├── 02_data.ipynb
│   ├── 03_replication.ipynb
│   ├── 04_dml_extension.ipynb
│   ├── 05_diagnostics.ipynb
│   └── 06_report.ipynb
│
├── code_run/                 ← EXECUTED notebooks (never edit)
│   └── *.ipynb               ← output of nbconvert --execute
│
└── paper/                    ← FINAL DELIVERABLES
    ├── paper.tex
    ├── paper.pdf
    ├── paper_template.tex
    ├── paper_spec_schema.json
    ├── tables/
    │   ├── table_replication.tex
    │   └── table_dml.tex
    ├── figures/
    │   ├── forest_plot.pdf
    │   └── forest_plot.png
    ├── prompts/              ← per-project copies of referee skill files
    └── review_history/       ← APPEND-ONLY
        ├── round_1/
        │   ├── ref1.md
        │   ├── ref2.md
        │   ├── ref3.md
        │   ├── synthesis.md
        │   └── changelog_1.md
        ├── round_2/          ← if needed
        ├── round_3/          ← if needed
        └── final_report.md   ← ★ START HERE
```
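The role of `paths.py` — giving every notebook one canonical map of the folders above — can be sketched as follows. This is an illustrative stand-in, not the file's actual contents:

```python
from pathlib import Path

def project_paths(root: str) -> dict:
    """Map the standard project folders to absolute paths under `root`.

    Illustrative only: the real paths.py may expose these differently,
    e.g. as module-level constants derived from its own location.
    """
    root_p = Path(root).resolve()
    return {
        "raw_data": root_p / "raw_data",
        "data": root_p / "data",
        "results": root_p / "data" / "results",
        "code_build": root_p / "code_build",
        "code_run": root_p / "code_run",
        "paper": root_p / "paper",
    }
```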
Immutability rules
| Path | Rule |
|---|---|
| `data/paper_spec.json` | Read-only after stage 1 completes |
| `code_run/` | Written by nbconvert only; never edit manually |
| `paper/review_history/` | Append-only; revision agent adds new files, never overwrites |
| `code_build/` notebooks | Revisions are appended as `## Revision Round N` sections; existing cells are never modified |
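One way a stage could enforce the read-only rule is to drop the file's write bits after writing it. This is an illustrative sketch of the idea, not necessarily how the pipeline implements it:

```python
import stat
from pathlib import Path

def lock_read_only(path: str) -> None:
    """Remove all write permissions so later stages cannot modify the file.

    Illustrative enforcement of the 'read-only after stage 1' rule;
    the actual pipeline may enforce it differently (e.g. by convention).
    """
    p = Path(path)
    mode = p.stat().st_mode
    p.chmod(mode & ~stat.S_IWUSR & ~stat.S_IWGRP & ~stat.S_IWOTH)
```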
What to read after the pipeline finishes
Read these in order:
1. `paper/review_history/final_report.md` — human-readable summary: what was solved, what remains open, severity table, key numbers comparison (original → replicated → DML).
2. `paper/paper.pdf` — the compiled paper.
3. `paper/figures/forest_plot.pdf` — coefficient forest plot showing original, replication, and DML estimates side by side.
4. `paper/review_history/round_N/` — full referee reports and changelogs for each round, if you want the detailed review history.
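If you want to pull the headline numbers yourself rather than read the report, a sketch assuming each results JSON carries an `estimate` field (the real schema may differ):

```python
import json
from pathlib import Path

def coefficient_comparison(results_dir: str) -> dict:
    """Collect the headline estimate from each stage's JSON output.

    The field name 'estimate' is an assumption for illustration;
    inspect the actual files to confirm the schema.
    """
    out = {}
    for stage, fname in [("replication", "replication_results.json"),
                         ("dml", "dml_results.json")]:
        f = Path(results_dir) / fname
        if f.exists():
            out[stage] = json.loads(f.read_text()).get("estimate")
    return out
```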
config.yaml reference
```yaml
paper:
  title: ""
  authors: ""
  year:
  journal: ""
analysis:
  outcome_var: ""      # column name in the data
  treatment_var: ""    # endogenous variable (for DML)
  instrument_var: ""   # instrument (leave blank for OLS-only papers)
  controls: []         # list of control variable column names
  ml_learners:         # learners to use in DML first stage
    - lasso
    - random_forest
    - xgboost
  n_folds: 5           # cross-fitting folds (default 5)
review:
  max_rounds: 3        # maximum review rounds (default 3)
```
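Since blank required fields are the most common way a run fails early, a quick sanity check of the parsed config can help. The keys below come from the reference above; the checks themselves are illustrative, not part of the pipeline:

```python
def validate_config(cfg: dict) -> list:
    """Return a list of problems found in a parsed config.yaml dict.

    Illustrative checks: required analysis fields are non-empty and
    n_folds is a sensible cross-fitting fold count.
    """
    problems = []
    analysis = cfg.get("analysis", {})
    for key in ("outcome_var", "treatment_var"):
        if not analysis.get(key):
            problems.append(f"analysis.{key} must be set")
    if not analysis.get("ml_learners"):
        problems.append("analysis.ml_learners must list at least one learner")
    n_folds = analysis.get("n_folds", 5)
    if not isinstance(n_folds, int) or n_folds < 2:
        problems.append("analysis.n_folds must be an integer >= 2")
    return problems
```

Run it on the output of your YAML parser before launching the pipeline; an empty list means the required fields are filled in.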