§ The testing framework

Would this strategy actually have made money?

A shared backtesting harness for the Minutemen Alternative Investment Fund. Every member's strategy runs through the same pipeline — same data, same metrics, same stress tests — so results are directly comparable instead of each person producing a different one-off notebook that can't be graded against anyone else's.

§ Exhibit 01 The equity race

Three strategies, one window.

Representative equity curves across 2022–2025 on SPY. The covered-call writer outperforms the underlying by nearly 40 percentage points; a mechanical SMA crossover roughly matches buy-and-hold after frictions.

Intermediate shape is synthesized from the published endpoint returns — actual per-day series ship once export_scorecard_json() lands.

Chart: equity indexed to 100, 2022 → 2025 (representative; pending JSON export). Series: Covered Call, SMA Crossover, and Buy & Hold, all on SPY.

§ 01 — 03 Three stress tests, standard

Survive more than one way of being wrong.

  1. § 01

    Monte Carlo stress

    Six synthetic-data generators — GBM, block bootstrap, regime switching, noise injection, Heston stochastic vol, and the trained cGAN — run automatically. If a strategy shows positive Sharpe on pure GBM noise, it is overfitting.

    280+ backtests · per scorecard
  2. § 02

    GAN regime scenarios

    A conditional Wasserstein GAN trained on four real SPY regimes — bullish, bearish, sideways, crash — generates unlimited synthetic paths that preserve regime-specific volatility clustering and tail behavior.

  3. § 03

    Engine divergence

    Every strategy runs through both a bar-based engine (market orders at next open) and an event-driven engine (stop / limit / OCO fills intrabar). If the engines disagree, the strategy's edge is sensitive to execution — a flag that live results will differ.
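
The GBM check in the first stress test can be illustrated with a minimal sketch. It is not the framework's generator: the function names (gbm_paths, sma_crossover_returns, annualized_sharpe), the zero-drift setting, the 10/50 windows, and the 20% volatility are all assumptions chosen for illustration. The idea is that a rule with no real edge should average a Sharpe ratio near zero across many driftless random paths, so a clearly positive average points to overfitting or look-ahead.

```python
import numpy as np

def gbm_paths(n_paths, n_steps, mu=0.0, sigma=0.2, s0=100.0, dt=1 / 252, seed=0):
    """Simulate geometric Brownian motion prices, shape (n_paths, n_steps + 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, n_steps))
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_price = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_ret, axis=1)], axis=1
    )
    return s0 * np.exp(log_price)

def sma(x, window):
    """Simple moving average; the first window - 1 entries are NaN."""
    c = np.cumsum(np.insert(x, 0, 0.0))
    out = np.full(x.shape, np.nan)
    out[window - 1:] = (c[window:] - c[:-window]) / window
    return out

def sma_crossover_returns(prices, fast=10, slow=50):
    """Long/flat crossover: the signal at day t is applied to the return of day t + 1."""
    # NaN warm-up values compare as False, so the strategy starts flat.
    position = (sma(prices, fast) > sma(prices, slow)).astype(float)
    asset_returns = prices[1:] / prices[:-1] - 1.0
    return position[:-1] * asset_returns

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    sd = returns.std(ddof=1)
    return 0.0 if sd == 0 else float(returns.mean() / sd * np.sqrt(periods_per_year))

# Zero-drift paths (mu=0.0 is an assumption): there is no trend to capture.
paths = gbm_paths(n_paths=200, n_steps=756)
sharpes = np.array([annualized_sharpe(sma_crossover_returns(p)) for p in paths])
mean_sharpe = float(sharpes.mean())
print(f"mean Sharpe on zero-drift GBM: {mean_sharpe:+.2f}")
```

Individual paths will still show positive or negative Sharpe by chance; the average across paths is the quantity worth watching.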

“The disagreement is the point. A strategy that only works under one engine, one asset, or one historical window isn't a strategy — it's a fit.”
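
A toy illustration of engine divergence, assuming a single long position with a stop-loss. The Bar class, both PnL functions, and the price data are hypothetical and far simpler than either real engine. The bar-based version only evaluates the stop on closes and exits at the next open; the event-driven version fills the stop intrabar as soon as the low touches it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float

def bar_engine_pnl(bars, entry_index, stop):
    """Bar-based: enter at the entry bar's open; stop checked on closes, exit at next open."""
    entry = bars[entry_index].open
    for i in range(entry_index, len(bars) - 1):
        if bars[i].close <= stop:
            return bars[i + 1].open - entry
    return bars[-1].close - entry  # never stopped: mark out at the final close

def event_engine_pnl(bars, entry_index, stop):
    """Event-driven: the stop fills intrabar, or at the open if price gaps through it."""
    entry = bars[entry_index].open
    for bar in bars[entry_index:]:
        if bar.open <= stop:
            return bar.open - entry
        if bar.low <= stop:
            return stop - entry
    return bars[-1].close - entry

# Made-up bars: the third bar dips to 94 intrabar but closes back at 100.
bars = [
    Bar(100.0, 101.0, 99.0, 100.0),
    Bar(100.0, 102.0, 99.5, 101.0),
    Bar(101.0, 101.5, 94.0, 100.0),
    Bar(100.0, 104.0, 99.0, 103.0),
    Bar(103.0, 106.0, 102.0, 105.0),
]

bar_pnl = bar_engine_pnl(bars, entry_index=1, stop=96.0)
event_pnl = event_engine_pnl(bars, entry_index=1, stop=96.0)
divergence = bar_pnl - event_pnl
print(bar_pnl, event_pnl, divergence)
```

Here the bar-based run never sees the dip and rides the position to the end, while the event-driven run is stopped out at 96. The gap between the two numbers is the execution sensitivity being flagged.
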
§ For MAIF Plug in

Write a strategy. Get a scorecard.

Three lines of user code, one 4-page scorecard PNG, letter grades on every dimension. Members bring their own logic; the framework handles data, execution, frictions, stress tests, and reporting.