How it works

Methodology · the corpus, the sweep, the score

Last updated: 2026-05-16 · Version 0.1 (planning, not shipped)
Forward-looking. This page describes the methodology as designed. The current site renders fixture data while the sweep engine is built. Numbers shown on the home page and inside the app are illustrative, not measured. We are publishing the methodology before the engine ships so you can audit the math before you trust the output.

1. The corpus

A versioned, git-tracked set of finance-related questions — the kind of questions humans actually type into Google: "what stock should I buy?", "is bitcoin safe?", "who makes the GPUs?". The corpus at v1 is curated by hand and audited for spread (consumer assets and infrastructure assets in roughly equal weight). The target size for the production v1 sweep is between 200 and 1,000 questions; the landing page's "4,832" figure refers to the long-term v2 corpus.

Each question carries metadata: a category (Demand, Supply, or Linked), an asked-count signal derived from public search-trend data, and a free-form intent note explaining what the curator is testing for.
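As a rough illustration, the metadata above could be held in a record like the one below. This is a sketch only: the class and field names (`Question`, `asked_count`, `intent_note`), the integer type for the asked-count signal, and the example values are assumptions, not the published corpus schema.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DEMAND = "Demand"
    SUPPLY = "Supply"
    LINKED = "Linked"


@dataclass(frozen=True)
class Question:
    text: str            # the question as a human would type it
    category: Category   # Demand, Supply, or Linked
    asked_count: int     # signal derived from search-trend data (integer type is assumed)
    intent_note: str     # free-form note on what the curator is testing for


# Illustrative entry; the category and count are made up for the example.
q = Question(
    text="who makes the GPUs?",
    category=Category("Supply"),
    asked_count=1200,
    intent_note="Checks whether models name infrastructure suppliers.",
)
```

Freezing the record keeps a corpus entry immutable once loaded, which fits a versioned, git-tracked corpus.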

2. The sweep

A sweep is a single pass of every question against every tracked model at a known version. Sweeps are batch, not real-time: they fire on model release events and on a weekly cadence. The model set today is GPT, Claude, Gemini, Llama, Grok, Mistral, and DeepSeek. We expand model coverage as new frontier models ship.
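A single pass of every question against every model is a Cartesian product, which could be planned roughly as follows. The function name `plan_sweep`, the job dictionary layout, and the model names and version strings are placeholders for illustration, not the engine's actual interface.

```python
from itertools import product


def plan_sweep(questions, model_versions):
    """Return one job per (question, model) pair, tagged with the model version."""
    return [
        {"question": q, "model": model, "model_version": version}
        for q, (model, version) in product(questions, model_versions.items())
    ]


jobs = plan_sweep(
    ["is bitcoin safe?", "who makes the GPUs?"],
    {"model_a": "version-x", "model_b": "version-y"},  # placeholder names and versions
)
```

Two questions against two models yields four jobs; the job count always equals questions times models.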

For each (question, model) pair we record the model's prose answer. A structured extraction pass — itself an LLM call constrained to JSON output — pulls a list of { ticker, confidence, rank } tuples from each answer. Tickers are disambiguated against a public exchange listing.
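The step after the extraction call, parsing its JSON and disambiguating tickers, might look like the sketch below. It assumes the extractor returns a JSON array of objects with `ticker`, `confidence`, and `rank` keys, that confidence lies in [0, 1], and that unlisted symbols are simply dropped; none of these details are confirmed by the methodology yet.

```python
import json


def parse_extraction(raw_json, listing):
    """Parse the extractor's JSON and keep only tickers present in the listing."""
    picks = []
    for row in json.loads(raw_json):
        ticker = str(row["ticker"]).strip().upper()
        if ticker not in listing:
            continue  # unlisted symbol: dropped (assumed policy)
        confidence = float(row["confidence"])
        if not 0.0 <= confidence <= 1.0:
            continue  # out-of-range confidence: dropped (assumed policy)
        picks.append({"ticker": ticker, "confidence": confidence, "rank": int(row["rank"])})
    return sorted(picks, key=lambda p: p["rank"])


# Illustrative input; tickers and numbers are not measured output.
raw = (
    '[{"ticker": "nvda", "confidence": 0.9, "rank": 1},'
    ' {"ticker": "ZZZZ", "confidence": 0.5, "rank": 2},'
    ' {"ticker": "TSM", "confidence": 0.7, "rank": 3}]'
)
picks = parse_extraction(raw, listing={"NVDA", "TSM", "ASML"})
```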

3. The surfaces

For each question, the model answers are aggregated into a single ranking. The ranking is the weighted mode across models, where each model contributes once and ties are broken by mean confidence.
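One possible reading of this aggregation rule is sketched below: each model casts at most one vote per asset, votes are summed with an optional per-model weight, and ties are broken by mean confidence. The uniform default weight, the `aggregate` function, and the final alphabetical tie-break are assumptions; the published formulas may differ.

```python
from collections import defaultdict


def aggregate(answers, weights=None):
    """Rank assets by weighted vote count, breaking ties by mean confidence.

    answers: {model: [(ticker, confidence), ...]}
    weights: optional {model: weight}; every model counts as 1.0 by default.
    """
    votes = defaultdict(float)
    confidences = defaultdict(list)
    for model, picks in answers.items():
        weight = 1.0 if weights is None else weights.get(model, 1.0)
        seen = set()
        for ticker, confidence in picks:
            if ticker in seen:
                continue  # each model contributes once per asset
            seen.add(ticker)
            votes[ticker] += weight
            confidences[ticker].append(confidence)

    def sort_key(ticker):
        mean_conf = sum(confidences[ticker]) / len(confidences[ticker])
        return (-votes[ticker], -mean_conf, ticker)

    return sorted(votes, key=sort_key)


# Illustrative answers; every asset gets two votes, so confidence decides.
answers = {
    "model_a": [("NVDA", 0.9), ("AMD", 0.6)],
    "model_b": [("NVDA", 0.8), ("TSM", 0.7)],
    "model_c": [("AMD", 0.4), ("TSM", 0.9)],
}
ranking = aggregate(answers)
```

Here all three assets tie on votes, so the order follows mean confidence: NVDA (0.85), TSM (0.80), AMD (0.50).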

Questions tagged Demand roll up into the Demand surface: a ranked list of assets that models say humans should buy. Questions tagged Supply roll up into the Supply surface: assets that enable the Demand surface to exist (chips, lithography, power, water, fabs, networks). An asset that appears on both surfaces gets a Linked badge.
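The Linked badge reduces to a set intersection of the two surfaces. A minimal sketch, assuming each surface is a ranked list of tickers and using a hypothetical helper name:

```python
def badge_linked(demand_surface, supply_surface):
    """Return the assets that appear on both surfaces, in Demand-surface order."""
    supply = set(supply_surface)
    return [asset for asset in demand_surface if asset in supply]


demand = ["NVDA", "AAPL", "TSM"]  # illustrative tickers, not measured output
supply = ["ASML", "TSM", "NVDA"]
linked = badge_linked(demand, supply)
```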

4. Drift

Per question, per sweep pair (v_{n-1} → v_n), drift is the normalised rank shift of the top assets across models, weighted by the fraction of models that agreed on the prior ranking. Drift ranges from 0.00 (no change) to 1.00 (every model now disagrees with its prior self).

Drift is not directional — it does not say "asset X is now better". It says "models changed their mind."
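The exact drift formula is not published yet, so the following is only one way the description could be made concrete. The assumptions: "top assets" means each model's top k; an asset that leaves the top k counts as a full shift of 1.0; a move inside the top k counts as the position change divided by k; and "agreed on the prior ranking" means sharing the most common prior top-1 asset.

```python
from collections import Counter


def model_shift(old, new, k):
    """Normalised rank shift of one model's prior top-k assets, in [0, 1]."""
    old_top, new_top = old[:k], new[:k]
    if not old_top:
        return 0.0
    shifts = []
    for i, asset in enumerate(old_top):
        if asset in new_top:
            shifts.append(abs(i - new_top.index(asset)) / k)
        else:
            shifts.append(1.0)  # asset dropped out of the top k entirely
    return sum(shifts) / len(shifts)


def drift(prior, current, k=3):
    """Mean per-model shift, scaled by the prior top-1 agreement fraction.

    prior, current: {model: [ticker, ...]} ranked lists for one question.
    """
    models = [m for m in prior if m in current]
    if not models:
        return 0.0
    mean_shift = sum(model_shift(prior[m], current[m], k) for m in models) / len(models)
    top1 = Counter(prior[m][0] for m in models if prior[m])
    agreement = top1.most_common(1)[0][1] / len(models) if top1 else 0.0
    return mean_shift * agreement


prior = {"model_a": ["NVDA", "AMD", "TSM"], "model_b": ["NVDA", "TSM", "AMD"]}
changed = {"model_a": ["X", "Y", "Z"], "model_b": ["X", "Y", "Z"]}
```

Under these assumptions the result is undirected, like the definition: it reports how far rankings moved, not which asset gained. It reaches 1.00 only when the prior sweep had full top-1 agreement and every model replaced its whole top k.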

5. Fracture

A fracture is the moment the consensus that gave a question its ranking breaks. Concretely: a fracture occurs when fewer than 50% of the model set agree on the top-1 asset for a question in the current sweep, after at least 50% agreed in the prior sweep. Each fracture fires a notification to subscribed users.
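The fracture condition can be sketched as a comparison of top-1 agreement across two sweeps. This assumes agreement is measured as the share of models naming the most common top-1 asset, and that "cleared" means at least 50%; the function names are hypothetical.

```python
from collections import Counter


def top1_share(top1_by_model):
    """Share of models that name the most common top-1 asset."""
    if not top1_by_model:
        return 0.0
    counts = Counter(top1_by_model.values())
    return counts.most_common(1)[0][1] / len(top1_by_model)


def is_fracture(prior, current, threshold=0.5):
    """True when agreement met the threshold last sweep and falls below it now."""
    return top1_share(prior) >= threshold and top1_share(current) < threshold


# Illustrative seven-model set: 4 of 7 agree before, 3 of 7 after.
prior = dict(zip("abcdefg", ["NVDA", "NVDA", "NVDA", "NVDA", "AMD", "AMD", "TSM"]))
current = dict(zip("abcdefg", ["NVDA", "NVDA", "NVDA", "AMD", "AMD", "TSM", "TSM"]))
fired = is_fracture(prior, current)
```

A question that was already below the threshold in the prior sweep does not fire again, since the condition requires a crossing.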

6. The Prophecy Score

The Prophecy Score is a 0–100 composite per asset. The components, weights, and what each tries to measure:

The score is computed nightly. Each component is percentile-ranked against the full asset universe so the score is bounded and stable across sweeps.
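To show why percentile-ranking bounds the composite, here is a sketch with placeholder inputs. The component names (`component_a`, `component_b`), the equal weights, and the percentile convention (share of other assets with a strictly lower value) are all assumptions for illustration; they are not the score's actual components or weights.

```python
def percentile_ranks(values):
    """Map each asset to a 0-100 percentile within the universe."""
    n = len(values)
    if n < 2:
        return {asset: 100.0 for asset in values}
    return {
        asset: 100.0 * sum(1 for other in values.values() if other < value) / (n - 1)
        for asset, value in values.items()
    }


def composite_score(components, weights):
    """Weighted average of per-component percentiles, bounded to 0-100."""
    ranks = {name: percentile_ranks(vals) for name, vals in components.items()}
    total_weight = sum(weights[name] for name in components)
    assets = set.intersection(*(set(vals) for vals in components.values()))
    return {
        asset: sum(weights[name] * ranks[name][asset] for name in components) / total_weight
        for asset in assets
    }


# Placeholder raw values; raw scales differ but percentiles make them comparable.
components = {
    "component_a": {"NVDA": 3.0, "AMD": 2.0, "TSM": 1.0},
    "component_b": {"NVDA": 10.0, "AMD": 30.0, "TSM": 20.0},
}
scores = composite_score(components, {"component_a": 0.5, "component_b": 0.5})
```

Because every component is mapped to 0-100 before weighting, the weighted average cannot leave that range, whatever the raw scales are.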

What it isn't: a price target, a forecast, or an alpha signal. A high Prophecy Score is a measurement of how inevitable an asset has become inside the training data. Inevitable is not the same as profitable. See Disclosures for the full caveat list.

7. What we measure vs. what we don't

8. Reproducibility

Every sweep is timestamped and tagged with the model versions and corpus version used. Per-sweep results will be exportable on the Pro plan via the public API. We commit to publishing the corpus schema, the extraction prompt, and the aggregation formulas in this document as the engine ships.
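A sweep's reproducibility tags could travel with its results as a small manifest. The sketch below is an assumption about shape only: the `SweepManifest` fields, the sweep id format, and the version strings are hypothetical and do not describe the public API.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SweepManifest:
    sweep_id: str
    started_at: str       # ISO 8601 timestamp in UTC
    corpus_version: str
    model_versions: dict  # {model name: version string}


def new_manifest(sweep_id, corpus_version, model_versions, now=None):
    """Build a manifest, stamping the current UTC time unless one is given."""
    now = now or datetime.now(timezone.utc)
    return SweepManifest(sweep_id, now.isoformat(), corpus_version, dict(model_versions))


manifest = new_manifest(
    "sweep-0001",               # placeholder id
    "v1",
    {"model_a": "version-x"},   # placeholder model and version
    now=datetime(2026, 5, 16, tzinfo=timezone.utc),
)
exported = json.dumps(asdict(manifest), sort_keys=True)
```

Storing the timestamp and versions next to the results means an exported sweep can be matched to the exact corpus and model set that produced it.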

9. Open problems

© 2026 thesignals labs, inc.