Architectural excerpt

Uncertainty, weight classes & layoffs

The honest answer to "how confident is the model?" is not always "very." Some fights sit in uncertain territory: debut fights at a new weight class, returns from long layoffs, matchups between fighters with sparse data at overlapping quality levels. A model that outputs false precision in those situations is not being rigorous. It is hiding uncertainty from the reader. I would rather label the doubt than let it lie in wait.

Why uncertainty is a design choice rather than a limitation

Antifragile systems react to volatility honestly rather than suppressing it to maintain the appearance of stability. The model is built on the same principle. Wide confidence intervals on uncertain fights are the correct output, not a sign that something has gone wrong or that the model is inferior.

A narrow interval on a debut fight would mean the model is making up precision it does not have. That is far more dishonest and carries far worse consequences than admitting the uncertainty.

Bands are 90% intervals for the reasons I stated in the overview article. I won't repeat my tirade here. The site labels which method built the confidence interval for the fight page you are on (bootstrap resampling, Cauchy fallback, weight-class debut, or a combination) so you know which regime you are in.

CI methods with honest triggers

When sufficient reference data exists, the model uses bootstrap intervals. The point forecast is a single regression fit on all eligible UFC history. That fit is useful, but it is not the only fit history could have produced. Separating the point estimate from the confidence interval lets the model distinguish inherent fight volatility from genuine ignorance about the matchup when choosing a finish method (Hüllermeier & Waegeman (2021)).

At its most basic, bootstrapping asks a question about the historical data: how much would this forecast move if the training sample had looked slightly different? If we removed a single outlier, would our estimate of finish probability change drastically? Bootstrapping captures uncertainty in the sampled dataset. We don't assume anything about the world at large, or fit a lazy distribution choice like the normal or Student's t to make the model "cleaner."

A wide bootstrap band means small changes in which fights entered the model-wide training set would have moved the forecast a lot. A narrow band means the coefficients are comparatively stable for this region of the feature space (Efron (1979)).
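
To make the resampling idea concrete, here is a minimal sketch of a percentile bootstrap. It is not the site's code. As a simplifying assumption it bootstraps a plain finish rate instead of refitting the regression on each resample, and the data, function name, and resample count are all hypothetical.

```python
import random
import statistics


def bootstrap_interval(outcomes, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the mean of `outcomes`.

    Simplified stand-in: a fuller version would refit the regression on
    every resample instead of taking a simple mean.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_resamples):
        # Resample fights with replacement, same size as the original sample.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistics.fmean(resample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lo = estimates[int(tail * (n_resamples - 1))]
    hi = estimates[int((1.0 - tail) * (n_resamples - 1))]
    return lo, hi


# Hypothetical data: 1 = fight ended in a finish, 0 = went to decision.
small_sample = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
large_sample = small_sample * 25  # same finish rate, 300 fights

small_lo, small_hi = bootstrap_interval(small_sample)
large_lo, large_hi = bootstrap_interval(large_sample)
print(f"12 fights:  [{small_lo:.2f}, {small_hi:.2f}]")
print(f"300 fights: [{large_lo:.2f}, {large_hi:.2f}]")
```

Both samples have the same finish rate, but the 12-fight band comes out much wider than the 300-fight band. No distribution was assumed anywhere. The width comes only from how much the estimate moves when the sample is reshuffled.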

As a final clarification, the bootstrap router does not use idle days. Layoff width is handled separately (see below). A bootstrap label does not mean ring rust was accounted for. That distinction matters, and I am calling it out here so it is not misread later.

Why Cauchy is right and Lazy Bayesian is wrong

When the effective training sample is too small (below 20 equivalent fights after weighting), the model falls back to Cauchy intervals because that is the most uncertainty the model can express. The Cauchy distribution has tails so heavy that its mean and variance are undefined. It does not pretend everything is well-behaved when the data cannot support that assumption. In sparse situations, the relationships driving the forecast are themselves unreliable. Heavy tails reflect that reality rather than pretending it away. Applying conventional thin-tailed assumptions where there is no empirical basis is exactly the kind of overconfidence Nassim Nicholas Taleb warns about (Taleb (2020)).
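
A rough sketch of what heavy tails do to a 90% band, assuming the interval is centred on the point probability and clipped to [0, 1]. The scale value is hypothetical, and the normal interval is there only as the thin-tailed comparison, not as anything the model uses.

```python
import math


def cauchy_interval(p, scale, level=0.90):
    """Central `level` interval of a Cauchy centred on `p`, clipped to [0, 1]."""
    # Cauchy quantile: x0 + scale * tan(pi * (q - 1/2)). For a central
    # interval, q = 1/2 +/- level/2, so the half-width is
    # scale * tan(pi * level / 2).
    half_width = scale * math.tan(math.pi * level / 2.0)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def normal_interval(p, scale, z=1.645):
    """Thin-tailed comparison: 90% normal interval with the same scale."""
    half_width = scale * z
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical forecast: 60% with a scale of 0.05 on the probability.
c_lo, c_hi = cauchy_interval(0.60, 0.05)
n_lo, n_hi = normal_interval(0.60, 0.05)
print(f"Cauchy 90%: [{c_lo:.2f}, {c_hi:.2f}]")
print(f"Normal 90%: [{n_lo:.2f}, {n_hi:.2f}]")
```

With the same centre and scale, the Cauchy band is several times wider than the normal one. That gap is the cost of refusing to assume thin tails when the data cannot back them up.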

If either corner has zero prior UFC bouts in the exact weight class before fight night, the model forces a debut-specific Cauchy path and skips bootstrap entirely. Bootstrap measures uncertainty in a well-sampled division with historical comparables. It cannot fix "never seen this fighter at this weight", and we shouldn't pretend that it can. There is no honest basis for tight confidence borrowed from fights at another division.
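
The routing described above can be sketched as a small function. The threshold of 20 and the debut rule come from the text. Everything else is an assumption: in particular, I use the Kish formula for "equivalent fights after weighting" purely for illustration, and the function and label names are hypothetical.

```python
MIN_EFFECTIVE_FIGHTS = 20  # threshold stated in the text


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    Assumption: the text does not say how equivalent fights are computed.
    """
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return (total * total) / sum_sq if sum_sq > 0 else 0.0


def choose_ci_method(weights, prior_bouts_a, prior_bouts_b):
    """Pick the interval method for one fight.

    prior_bouts_a / prior_bouts_b: prior UFC bouts for each corner in this
    exact weight class before fight night.
    """
    # A debut at this weight for either corner skips bootstrap entirely.
    if prior_bouts_a == 0 or prior_bouts_b == 0:
        return "weight-class debut (Cauchy)"
    # Too little effective data: heavy-tailed fallback.
    if effective_sample_size(weights) < MIN_EFFECTIVE_FIGHTS:
        return "Cauchy fallback"
    return "bootstrap"


print(choose_ci_method([1.0] * 50, prior_bouts_a=3, prior_bouts_b=0))
print(choose_ci_method([1.0] * 10, prior_bouts_a=3, prior_bouts_b=2))
print(choose_ci_method([1.0] * 50, prior_bouts_a=3, prior_bouts_b=2))
```

The debut check runs first on purpose. A well-sampled division does not help if one of the two fighters has never been observed in it.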

Why Kalman variance grows during idle time rather than holding steady

A stale 12-month-old rating is a worse prior than tonight's cage result. Uncertainty in the rating grows each day a fighter is idle, so the return fight applies a larger rating update than a short-turnaround bout would (Kalman (1960)).

The rating itself does not decay toward 1500 during idle time. Inactivity increases doubt about the number; it does not automatically erode reputation. Certainly a 100-year layoff should be expected to erode the fighter's Elo because they would no longer be breathing, but we choose to separate the known from the unknown in this way. Model features are based on the known. Uncertainty is a more natural fit for the times when fighters are out of sight.
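
In Kalman terms this is the predict step under a random-walk assumption: the mean stays put and the variance grows. A minimal sketch, assuming linear variance growth per idle day. The noise constant and class name are hypothetical, not the site's values.

```python
from dataclasses import dataclass

# Hypothetical process noise, in rating points squared per idle day.
PROCESS_NOISE_PER_DAY = 25.0


@dataclass(frozen=True)
class RatingState:
    rating: float    # point estimate of skill
    variance: float  # doubt about that estimate


def age_rating(state, days_idle):
    """Kalman predict step for a random walk: same rating, more variance."""
    return RatingState(
        rating=state.rating,  # no decay toward 1500
        variance=state.variance + PROCESS_NOISE_PER_DAY * days_idle,
    )


before = RatingState(rating=1620.0, variance=2500.0)
after = age_rating(before, days_idle=365)
print(before)
print(after)
```

The rating is identical before and after the layoff. Only the variance moved, which is the separation of known and unknown the section describes.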

Why the Kalman update is larger on return, not smaller

Classical Elo computes a full rating move from the result: the "raw" step. Damping that step after a layoff would mean shrinking it to protect legacy and retain reputations through inactivity. That is the fragile choice, one that lives in the past and not in the world of today. We reject it and amplify instead.

A stale rating is a worse prior than tonight's cage result, so fresh evidence should weigh heavier. New evidence always tells us more about the world of today than slavishly retained priors do. This runs against human nature, but it is an antifragile response to black-swan events. Former champions who return and look poor should fall fast. Rising fighters who break through after a layoff should be credited immediately.

Illustratively, a fighter on a normal turnaround might take only about half of the raw Elo move on the next fight. After roughly three months idle, that fraction rises to about two-thirds. After a year, closer to four-fifths. The longer the gap, the less we trust the old number, so the return bout moves the rating more aggressively than a quick turnaround would.
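
Those fractions fall out of the scalar Kalman gain, prior variance divided by prior variance plus observation variance. The sketch below uses hypothetical constants that I picked so the output lands near the illustrative numbers above. They are not the site's parameters, and treating a normal turnaround as zero extra idle days is my simplification.

```python
def kalman_gain(prior_variance, observation_variance):
    """Fraction of the raw step to apply: P / (P + R)."""
    return prior_variance / (prior_variance + observation_variance)


def applied_fraction(days_idle, base_variance=1.0,
                     observation_variance=1.0, days_per_unit=90.0):
    """Gain after a layoff, with prior variance growing linearly in idle days.

    All constants are hypothetical and chosen only for illustration.
    """
    prior_variance = base_variance + days_idle / days_per_unit
    return kalman_gain(prior_variance, observation_variance)


def updated_rating(rating, raw_step, days_idle):
    """Apply the gain-scaled share of the raw Elo step."""
    return rating + applied_fraction(days_idle) * raw_step


for days in (0, 90, 365):
    print(days, round(applied_fraction(days), 2))
```

This prints fractions of roughly 0.50, 0.67, and 0.83. The gain only ever rises with idle time, which is the opposite of damping.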

Why layoff also widens confidence intervals

Separate from rating updates, long idle periods widen the forecast bands by perturbing the headline rating gap. Independent shocks apply per corner, scaled by time off.

The widening grows with the square root of years idle, not a single cliff at 12 months. No layoff produces the narrowest band. Roughly three months off is moderate. One year is noticeably wider. Two years is wider still, capped so extremes do not dominate. At the top of each page you can see the percentile of the days off for easy comparison between the two fighters to form your own opinion.
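
A minimal sketch of that widening, under stated assumptions: the per-corner shocks are Gaussian, the win probability uses the standard 400-point logistic Elo curve, and the scale and cap constants are hypothetical. The text only commits to square-root growth, independent per-corner shocks, and a cap.

```python
import math
import random


def layoff_sigma(days_idle, points_per_sqrt_year=60.0, cap=80.0):
    """Shock size in rating points: grows with sqrt(years idle), capped."""
    years = max(days_idle, 0) / 365.0
    return min(cap, points_per_sqrt_year * math.sqrt(years))


def elo_win_probability(gap):
    """Standard logistic Elo expectation for a rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


def layoff_band(rating_a, rating_b, idle_a, idle_b,
                n_draws=5000, level=0.90, seed=0):
    """Band on corner A's win probability from perturbing the rating gap."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_draws):
        # Independent shocks per corner, each scaled by that corner's time off.
        shock_a = rng.gauss(0.0, layoff_sigma(idle_a))
        shock_b = rng.gauss(0.0, layoff_sigma(idle_b))
        gap = (rating_a + shock_a) - (rating_b + shock_b)
        probs.append(elo_win_probability(gap))
    probs.sort()
    tail = (1.0 - level) / 2.0
    return probs[int(tail * (n_draws - 1))], probs[int((1.0 - tail) * (n_draws - 1))]


for days in (0, 90, 365, 730):
    lo, hi = layoff_band(1600.0, 1550.0, idle_a=days, idle_b=0)
    print(f"{days:>3} days idle: [{lo:.3f}, {hi:.3f}]")
```

With no layoff on either side the band collapses to the point forecast, and it widens smoothly as the idle time grows, with no cliff at 12 months.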

When both bootstrap and layoff widening apply, the site combines them. Weight-class debuts skip layoff sampling and use pure Cauchy on probabilities instead.

Why global and division idle clocks are tracked separately

Rating uncertainty and layoff widening use days since the fighter's last fight in any division. A fight at another weight is still an observation of the athlete. Ring rust, as traditionally conceived, is not division-specific.

Per-division last-fight dates are stored for division ratings and UI labels. A fighter active at lightweight returning to welterweight has not been globally idle even if they are debuting on the welterweight ladder. Those clocks answer different questions and are not interchangeable. Conflating them would mislead, so the site keeps them separate.
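
One way to keep the two clocks apart is to store them as separate fields and never derive one from the other. This is a hypothetical data structure for illustration, not the site's schema, and the dates are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class FighterActivity:
    # Global clock: feeds rating uncertainty and layoff widening.
    last_fight_any_division: Optional[date] = None
    # Division clocks: feed division ratings and UI labels.
    last_fight_by_division: Dict[str, date] = field(default_factory=dict)

    def record_fight(self, division, fight_date):
        """Update both clocks from a single bout."""
        prev = self.last_fight_any_division
        if prev is None or fight_date > prev:
            self.last_fight_any_division = fight_date
        prev_div = self.last_fight_by_division.get(division)
        if prev_div is None or fight_date > prev_div:
            self.last_fight_by_division[division] = fight_date

    def global_idle_days(self, as_of):
        if self.last_fight_any_division is None:
            return None
        return (as_of - self.last_fight_any_division).days

    def division_idle_days(self, division, as_of):
        """None means a debut in this division."""
        last = self.last_fight_by_division.get(division)
        return None if last is None else (as_of - last).days


fighter = FighterActivity()
fighter.record_fight("lightweight", date(2024, 3, 2))
today = date(2024, 6, 1)
print(fighter.global_idle_days(today))                    # globally active
print(fighter.division_idle_days("welterweight", today))  # None: debut
```

The lightweight who moves up has a short global clock and no welterweight clock at all. The first answers "how stale is our view of this athlete", the second answers "have we seen them here".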

Why percentile framing is used in outputs

"99th percentile striking matchup" is more legible than an internal model weight. Reference grids compare each input to every percentile from 0 to 100 in the historical training cohort.

A percentile estimated from hundreds of historical fights is a more reliable label than one estimated from a dozen. The band on the percentile itself tells you how confident the model is in calling a matchup historically extreme, not just whether it looks extreme by the point estimate.
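
The point about cohort size can be checked with a short sketch. It computes a percentile rank against a reference cohort and then bootstraps the cohort to put a band on the rank itself. The cohorts are synthetic and the function names are hypothetical.

```python
import random


def percentile_rank(cohort, value):
    """Share of the cohort at or below `value`, on a 0-100 scale."""
    return 100.0 * sum(1 for x in cohort if x <= value) / len(cohort)


def percentile_band(cohort, value, n_resamples=500, level=0.90, seed=0):
    """Bootstrap band on the percentile rank of `value`."""
    rng = random.Random(seed)
    n = len(cohort)
    ranks = []
    for _ in range(n_resamples):
        # Resample the reference cohort with replacement and re-rank.
        resample = [cohort[rng.randrange(n)] for _ in range(n)]
        ranks.append(percentile_rank(resample, value))
    ranks.sort()
    tail = (1.0 - level) / 2.0
    return ranks[int(tail * (n_resamples - 1))], ranks[int((1.0 - tail) * (n_resamples - 1))]


# Synthetic cohorts covering a similar range: a dozen fights vs. hundreds.
small_cohort = [float(i) for i in range(12)]
large_cohort = [i / 50.0 for i in range(600)]
value = 8.5

for name, cohort in (("12 fights", small_cohort), ("600 fights", large_cohort)):
    lo, hi = percentile_band(cohort, value)
    print(f"{name}: rank {percentile_rank(cohort, value):.0f}, band [{lo:.0f}, {hi:.0f}]")
```

The same input sits at a similar percentile in both cohorts, but the band on that percentile is far wider when only a dozen fights stand behind it.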

Why abstention lives outside the model

The model outputs a probability distribution on every fight. Whether to act on that distribution is a separate question involving the market line, your own edge assessment, and staking rules. Those layers must never be conflated. WE ARE NOT PROVIDING YOU, THE READER, GAMBLING ADVICE ON THIS SITE, SINCE WE DON'T KNOW YOUR PERSONAL RISK TOLERANCE AND THE ODDS YOU ARE GETTING.

Planned stake filtering (if we did have opening and closing lines integrated) would suggest abstaining unless the best-priced outcome clears breakeven by a sufficient margin, typically single-digit percents above even money, factoring in lines holdout. If you are looking for guidance, fractional Kelly (1/4 or 1/2 of the full Kelly stake) is the standard framework for sizing a bet once an edge is identified, BUT THAT DEPENDS ON YOUR OWN FINANCIAL SITUATION AND RISK TOLERANCE.
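
For readers who want the arithmetic, here is a sketch of the abstain-then-size logic, kept outside the model as described. The Kelly formula is the standard one for a single binary bet at decimal odds. The 3-point minimum edge and the function names are hypothetical, and nothing here is a recommendation.

```python
def kelly_fraction(p_win, decimal_odds):
    """Full Kelly stake as a fraction of bankroll: (b*p - q) / b."""
    b = decimal_odds - 1.0  # net payout per unit staked
    q = 1.0 - p_win
    return (b * p_win - q) / b


def edge_over_breakeven(p_win, decimal_odds):
    """Model probability minus the breakeven probability implied by the price."""
    return p_win - 1.0 / decimal_odds


def suggested_stake(p_win, decimal_odds, kelly_multiplier=0.25, min_edge=0.03):
    """Fractional Kelly stake, or 0.0 when the edge does not clear the margin.

    min_edge is a hypothetical filter threshold, not a site parameter.
    """
    if edge_over_breakeven(p_win, decimal_odds) < min_edge:
        return 0.0  # abstain
    return max(0.0, kelly_multiplier * kelly_fraction(p_win, decimal_odds))


# Hypothetical: model says 55%, price is even money (decimal odds 2.0).
print(round(kelly_fraction(0.55, 2.0), 3))   # full Kelly
print(round(suggested_stake(0.55, 2.0), 3))  # quarter Kelly
print(suggested_stake(0.51, 2.0))            # edge too thin: abstain
```

The model's probability enters only as an input. The threshold and the multiplier belong to the reader, which is why this layer lives outside the model.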
