A versioned, git-tracked set of finance-related questions — the kind of questions humans actually type into Google: "what stock should I buy?", "is bitcoin safe?", "who makes the GPUs?". The corpus at v1 is curated by hand and audited for spread (consumer assets and infrastructure assets in roughly equal weight). The target size for the production v1 sweep is between 200 and 1,000 questions; the landing page's "4,832" figure refers to the long-term v2 corpus.
Each question carries metadata: a category (Demand,
Supply, or Linked), an asked-count signal
derived from public search-trend data, and a free-form intent note
explaining what the curator is testing for.
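As a rough illustration, one corpus entry could be modelled as the record below. This is a sketch only: the field names, the integer type of the asked-count signal, and the example values are assumptions, not the published corpus schema.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DEMAND = "Demand"
    SUPPLY = "Supply"
    LINKED = "Linked"


@dataclass(frozen=True)
class Question:
    text: str            # the question as a human would type it
    category: Category   # Demand, Supply, or Linked
    asked_count: int     # search-trend signal (integer unit is an assumption)
    intent_note: str     # free-form note on what the curator is testing for


# Illustrative entry; the category, count, and note are made up for the example.
q = Question(
    text="who makes the GPUs?",
    category=Category.SUPPLY,
    asked_count=1200,
    intent_note="Checks whether models name infrastructure suppliers.",
)
```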
A sweep is a single pass of every question against every tracked model at a known version. Sweeps are batch, not real-time: they fire on model release events and on a weekly cadence. The model set today is GPT, Claude, Gemini, Llama, Grok, Mistral, and DeepSeek. We expand model coverage as new frontier models ship.
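Since a sweep pairs every question with every tracked model, its job list is a Cartesian product. The sketch below assumes a hypothetical `SweepJob` record and uses the placeholder string "unknown" in place of real model versions.

```python
from itertools import product
from typing import NamedTuple


class SweepJob(NamedTuple):
    question: str
    model: str
    model_version: str


def plan_sweep(questions: list[str], model_versions: dict[str, str]) -> list[SweepJob]:
    """Return one job per (question, model) pair."""
    return [
        SweepJob(q, model, version)
        for q, (model, version) in product(questions, model_versions.items())
    ]


# Model names come from the text above; the versions are placeholders.
MODELS = {
    name: "unknown"
    for name in ["GPT", "Claude", "Gemini", "Llama", "Grok", "Mistral", "DeepSeek"]
}
jobs = plan_sweep(["what stock should I buy?", "is bitcoin safe?"], MODELS)
```

With two questions and seven models this yields 14 jobs, which is why sweeps are run as a batch rather than in real time.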
For each (question, model) pair we record the model's prose answer.
A structured extraction pass — itself an LLM call constrained to
JSON output — pulls a list of { ticker, confidence, rank }
tuples from each answer. Tickers are disambiguated against a public
exchange listing.
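A minimal sketch of the post-processing step, assuming the extraction call returns a JSON array of objects with `ticker`, `confidence`, and `rank` keys. The `LISTED` set is a tiny stand-in for a public exchange listing, and `parse_extraction` is a hypothetical helper name.

```python
import json

# Stand-in for a public exchange listing (assumption for the example).
LISTED = {"NVDA", "ASML", "TSM"}


def parse_extraction(raw: str, listed: set[str]) -> list[dict]:
    """Parse the extractor's JSON output and drop tickers not on the listing."""
    kept = []
    for item in json.loads(raw):
        ticker = str(item["ticker"]).upper()
        if ticker not in listed:
            continue  # not a listed symbol, so exclude it
        kept.append(
            {
                "ticker": ticker,
                "confidence": float(item["confidence"]),
                "rank": int(item["rank"]),
            }
        )
    # Return the tuples ordered by the model's own ranking.
    return sorted(kept, key=lambda t: t["rank"])


raw = (
    '[{"ticker": "nvda", "confidence": 0.9, "rank": 2},'
    ' {"ticker": "ASML", "confidence": 0.7, "rank": 1},'
    ' {"ticker": "NOTREAL", "confidence": 0.5, "rank": 3}]'
)
result = parse_extraction(raw, LISTED)
```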
For each question, the model answers are aggregated into a single ranking. The ranking is the weighted mode across models, where each model contributes once and ties are broken by mean confidence.
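One way to read this rule is sketched below. It makes several assumptions: each model votes once for its own top-ranked asset, model weights default to 1.0, and a final alphabetical tie-break is added for determinism. It orders only the assets that received a top-1 vote, so it is an illustration rather than the published aggregation formula.

```python
from collections import defaultdict
from typing import Optional


def aggregate(
    answers: dict[str, list[dict]],
    weights: Optional[dict[str, float]] = None,
) -> list[str]:
    """answers maps model name -> list of {ticker, confidence, rank} tuples."""
    weights = weights or {}
    votes: dict[str, float] = defaultdict(float)
    confidences: dict[str, list[float]] = defaultdict(list)

    for model, tuples in answers.items():
        if not tuples:
            continue  # a model with no extracted tickers casts no vote
        top = min(tuples, key=lambda t: t["rank"])
        votes[top["ticker"]] += weights.get(model, 1.0)  # one vote per model
        confidences[top["ticker"]].append(top["confidence"])

    def sort_key(ticker: str):
        mean_conf = sum(confidences[ticker]) / len(confidences[ticker])
        # Highest vote weight first, then highest mean confidence.
        return (-votes[ticker], -mean_conf, ticker)

    return sorted(votes, key=sort_key)


answers = {
    "model_a": [
        {"ticker": "NVDA", "confidence": 0.9, "rank": 1},
        {"ticker": "AMD", "confidence": 0.5, "rank": 2},
    ],
    "model_b": [{"ticker": "NVDA", "confidence": 0.7, "rank": 1}],
    "model_c": [{"ticker": "AMD", "confidence": 0.8, "rank": 1}],
    "model_d": [{"ticker": "AMD", "confidence": 0.9, "rank": 1}],
    "model_e": [{"ticker": "TSM", "confidence": 0.95, "rank": 1}],
}
ranking = aggregate(answers)
```

In this made-up example NVDA and AMD tie on two votes each, and AMD wins the tie because its mean confidence (0.85) is higher than NVDA's (0.80).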
Questions tagged Demand roll up into the Demand
surface: a ranked list of assets that models say humans should
buy. Questions tagged Supply roll up into the
Supply surface: assets that enable the Demand surface to
exist (chips, lithography, power, water, fabs, networks). An asset
that appears on both surfaces gets a Linked badge.
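The Linked badge reduces to a set intersection of the two surfaces. The sketch below assumes each surface is a ranked list of ticker strings; the example tickers are illustrative only.

```python
def linked_assets(demand_surface: list[str], supply_surface: list[str]) -> set[str]:
    """Assets that appear on both surfaces earn the Linked badge."""
    return set(demand_surface) & set(supply_surface)


# Illustrative surfaces, not real sweep output.
demand = ["NVDA", "AAPL", "BTC"]
supply = ["ASML", "NVDA", "TSM"]
linked = linked_assets(demand, supply)
```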
Per question, per sweep pair (v_{n-1} → v_n),
drift is the normalised rank shift of the top assets across models,
weighted by the fraction of models that agreed on the prior ranking.
Drift ranges from 0.00 (no change) to 1.00
(every model now disagrees with its prior self).
Drift is not directional — it does not say "asset X is now better". It says "models changed their mind."
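The exact drift formula is not given here, so the following is only one plausible reading. It assumes that a model's rank shift is the fraction of top-k positions whose asset changed, and that prior agreement is the share of models whose prior top-1 matched the most common prior top-1. Both function names are hypothetical.

```python
from collections import Counter
from itertools import zip_longest


def position_shift(prior: list[str], current: list[str]) -> float:
    """Fraction of top-k positions whose asset changed (0.0 to 1.0)."""
    pairs = list(zip_longest(prior, current))
    if not pairs:
        return 0.0
    return sum(p != c for p, c in pairs) / len(pairs)


def drift(prior: dict[str, list[str]], current: dict[str, list[str]]) -> float:
    """prior/current map model name -> that model's top-k tickers for one question."""
    models = sorted(set(prior) & set(current))
    if not models:
        return 0.0
    top1 = Counter(prior[m][0] for m in models if prior[m])
    agreement = max(top1.values()) / len(models) if top1 else 0.0
    mean_shift = sum(position_shift(prior[m], current[m]) for m in models) / len(models)
    return round(agreement * mean_shift, 2)


unchanged = {"a": ["X", "Y"], "b": ["X", "Y"]}
flipped = {"a": ["P", "Q"], "b": ["P", "Q"]}
```

Under these assumptions, `drift(unchanged, unchanged)` is 0.0 and `drift(unchanged, flipped)` is 1.0, matching the two endpoints described above. The result carries no direction: it does not say which asset gained or lost.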
A fracture is the moment the consensus that gave a question its ranking breaks. Concretely: a fracture occurs when the top-1 asset for a question is agreed on by fewer than 50% of the model set in the current sweep, having cleared that 50% threshold in the prior sweep. Fractures fire a notification to subscribed users.
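The fracture condition can be sketched as a comparison of top-1 agreement between two consecutive sweeps. The sketch assumes that "cleared" means at least 50% agreement, and that each sweep is represented as a mapping from model name to that model's top-1 ticker. The tickers are illustrative.

```python
from collections import Counter


def top1_agreement(top_picks: dict[str, str]) -> float:
    """Share of models that agree on the most common top-1 asset."""
    if not top_picks:
        return 0.0
    counts = Counter(top_picks.values())
    return max(counts.values()) / len(top_picks)


def is_fracture(prior: dict[str, str], current: dict[str, str], threshold: float = 0.5) -> bool:
    """True when agreement cleared the threshold before and falls below it now."""
    return top1_agreement(prior) >= threshold and top1_agreement(current) < threshold


prior = {
    "GPT": "NVDA", "Claude": "NVDA", "Gemini": "NVDA", "Llama": "NVDA",
    "Grok": "AMD", "Mistral": "TSM", "DeepSeek": "AMD",
}
current = dict(prior, Llama="AMD")  # one model changes its top pick
fractured = is_fracture(prior, current)
```

Here agreement falls from 4 of 7 models to 3 of 7, so the question fractures.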
The Prophecy Score is a 0–100 composite per asset. The components, weights, and what each tries to measure:
The score is computed nightly. Each component is percentile-ranked against the full asset universe so the score is bounded and stable across sweeps.
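To show why percentile-ranking keeps the composite bounded, here is a sketch with two placeholder components. The names `component_a` and `component_b`, their weights, and the mid-rank treatment of ties are all assumptions; they are not the actual Prophecy Score components.

```python
def percentile_ranks(values: dict[str, float]) -> dict[str, float]:
    """Mid-rank percentile (0 to 100) of each asset within the universe."""
    n = len(values)
    ranks = {}
    for asset, v in values.items():
        below = sum(1 for other in values.values() if other < v)
        equal = sum(1 for other in values.values() if other == v)
        ranks[asset] = 100.0 * (below + 0.5 * equal) / n
    return ranks


def composite(components: dict[str, dict[str, float]], weights: dict[str, float]) -> dict[str, float]:
    """Weighted average of per-component percentiles, so the result stays in 0 to 100."""
    total = sum(weights.values())
    pct = {name: percentile_ranks(vals) for name, vals in components.items()}
    assets = set.intersection(*(set(vals) for vals in components.values()))
    return {
        a: sum(weights[name] * pct[name][a] for name in components) / total
        for a in sorted(assets)
    }


# Placeholder components and weights, for illustration only.
components = {
    "component_a": {"X": 1.0, "Y": 2.0, "Z": 3.0},
    "component_b": {"X": 10.0, "Y": 30.0, "Z": 20.0},
}
scores = composite(components, {"component_a": 0.75, "component_b": 0.25})
```

Because each input is a percentile and the weights are normalised, the composite cannot leave the 0 to 100 range regardless of the raw component scales.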
What it isn't: a price target, a forecast, or an alpha signal. A high Prophecy Score is a measurement of how inevitable an asset has become inside the training data. Inevitable is not the same as profitable. See Disclosures for the full caveat list.
Every sweep is timestamped and tagged with the model versions and corpus version used. Per-sweep results will be exportable on the Pro plan via the public API. We commit to publishing the corpus schema, the extraction prompt, and the aggregation formulas in this document as the engine ships.
MULN, HOOD) produce extraction false positives. Disambiguation has a confidence floor that biases toward exclusion.
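A confidence floor that biases toward exclusion can be sketched as a simple filter: a candidate below the floor is dropped rather than guessed at. The floor value of 0.6, the `match_confidence` field, and the scores in the example are hypothetical.

```python
CONFIDENCE_FLOOR = 0.6  # hypothetical value; the real floor is not stated


def keep_confident(candidates: list[dict], floor: float = CONFIDENCE_FLOOR) -> list[str]:
    """Keep only tickers whose disambiguation confidence meets the floor."""
    return [c["ticker"] for c in candidates if c["match_confidence"] >= floor]


# Made-up match scores for illustration.
candidates = [
    {"ticker": "NVDA", "match_confidence": 0.95},
    {"ticker": "HOOD", "match_confidence": 0.40},
    {"ticker": "MULN", "match_confidence": 0.55},
]
kept = keep_confident(candidates)
```

The trade-off is deliberate in this sketch: some genuine mentions are lost, in exchange for fewer false positives reaching the rankings.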