Method

How it works.

A forecasting model, pre-committed against the prediction markets and graded the way a trading desk grades itself — on whether the price moves toward it, and on whether its probabilities are calibrated. Not a tip sheet. A track record built in the open.
The model

Each team carries a strength rating from a World Football Elo built on ~49,000 international matches, blended with squad market value and corrected for confederation-level bias (weaker regions inflate against weak regional opposition; an empirical-Bayes shrinkage re-levels them). That rating difference feeds a goal model — Poisson scoring rates with a Dixon-Coles low-score correction — which yields a full scoreline distribution for every fixture, and from it the win / draw / loss probabilities.

The tournament itself is a joint Monte-Carlo simulation: tens of thousands of simulated tournaments, coherent by construction (the advance probabilities sum to the right number of teams, the bracket reconciles). That's where the advance, group-winner, reach-round and champion numbers come from.

Predictions are pre-committed

Every forecast is timestamped to an append-only ledger before the match is played — nothing is fit in hindsight. That's the difference between a forecast and a backtest: when the grade comes, it's honest, because the call can't be edited after the fact.

It learns from the tournament

The model is an updating one. As each group game resolves, the simulation conditions on the actual result — that game is fixed, only the rest is simulated, on ratings that have absorbed what happened. Once the group stage is done, the knockout simulation runs on the real bracket and updated ratings. It's Bayesian-style evidence updating, not a one-shot prediction.

How it's graded

Closing-Line Value (CLV). The cleanest test of skill that doesn't wait for the result: did the market price drift toward the model after the forecast was logged? Positive CLV is a skill signal independent of who eventually wins — the same metric sharp bettors and desks live by. It needs scale to be conclusive, though: the established bar is a few hundred resolved forecasts before consistent positive CLV is more than suggestive, and the forecasts logged here are correlated across markets — so the early numbers are a signal forming, not a verdict.

Calibration. When the model says 30%, does it happen about 30% of the time? Graded with proper scoring rules (Brier, log-loss) against real outcomes as markets resolve. A model can be confident and wrong; calibration is the check that it isn't.

The honesty principle

The premise is pro-market: these markets are very hard to beat, so the model is held to the price, not graded against a strawman. Wins and misses are reported the same way. A mediocre result reported straight is worth more than a cherry-picked one — the rigor is the point, and it's the only thing that makes a public track record mean anything.

Full methodology, code, and pre-registration: github.com/Thesavagecoder7784/xResidual. Paper only — no real-money trading.