About the model

Most statistical models have two problematic assumptions: stationarity and independence.

No process is ever truly stationary in the real world. Heraclitus identified this millennia ago: "No man ever steps in the same river twice, for it's not the same river and he's not the same man." No action is truly independent of any other action either. Human agents are reflexive and self-aware: they modify their behavior in response to the actions of others and to their own past experience.

MMA is a particularly sharp case of both problems. No effective model could pretend that the early UFC was the same as the UFC of today, that a rematch is simply a replay of the first fight, or that style matchups do not demonstrably change outcomes. This site is built around those realities.

I set out to design a model and a site where the downsides are all stated upfront in the methodology, and where honesty and humility take center stage over the smoke, mirrors, and promises that so often dominate public-facing sports analytics. In these articles I hope to explain my decisions in a way that communicates the model's value to an inquisitive user and lays out the known unknowns, so that any user can adjust the model's results based on their own prior intuitions.

The forecasting pipeline and model methodology are all authored by Andrew Erbs.

What the model actually does

I realized while looking at fight odds that predicting a winner or loser was not that impressive. In every fight someone wins and someone loses, and MMA is a highly variable sport, so the floor for mere guesswork is pretty high. As a result, I set out to build a six-way probability breakdown: TKO win, Submission win, Decision win, Decision loss, Submission loss, and TKO loss. I have yet to find any other model that takes this approach. If you know of one, please send me a message; I'd love to look at its methodology.

Because even a single piece of in-fight data can dramatically swing the likely outcome, we chose to limit our analysis to publicly available pre-fight information. All of our data comes from the public record on either UFCSTATS.com or the ESPN MMA Fight Center. These forecasts are meant to be actionable before the event, not a post-fight explanation engine.

This public data feeds twelve matchup inputs that in turn feed a readable linear model over the six possible outcomes. The summary bars on the site (total win, finish win, go to decision) are simple sums from the same breakdown.
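As a sketch of that structure (not the site's actual code), a linear model over six outcomes can be written as a softmax over six scores, each an intercept plus a weighted sum of the twelve matchup inputs. This assumes a multinomial (softmax) logistic regression, one common way to build an additive logistic model over more than two outcomes; the names `OUTCOMES`, `predict_six_way`, and `summary_bars` and any weight values are hypothetical.

```python
import math

# Six outcomes in the order the site lists them.
OUTCOMES = ["TKO win", "Submission win", "Decision win",
            "Decision loss", "Submission loss", "TKO loss"]


def predict_six_way(inputs, weights, intercepts):
    """Softmax over six linear scores (hypothetical parameters).

    inputs:     the 12 matchup input values for one fight.
    weights:    6 rows of 12 coefficients, one row per outcome.
    intercepts: 6 baseline scores, one per outcome.
    """
    scores = [b + sum(w * x for w, x in zip(row, inputs))
              for row, b in zip(weights, intercepts)]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return dict(zip(OUTCOMES, (e / total for e in exps)))


def summary_bars(probs):
    """Headline bars as plain sums of the six-way breakdown."""
    return {
        "total win": probs["TKO win"] + probs["Submission win"] + probs["Decision win"],
        "finish win": probs["TKO win"] + probs["Submission win"],
        "go to decision": probs["Decision win"] + probs["Decision loss"],
    }
```

With all weights and intercepts at zero, every outcome gets 1/6, so the total-win bar is 0.5 and the finish-win and decision bars are each 1/3. Because the bars are derived rather than modeled separately, they can never disagree with the six-way breakdown.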

We trained and judged the model on six-way outcome calibration. When it gives an outcome (e.g. a decision win or a submission win) a 60% chance, outcomes priced that way behave like sixty-cent chances: across the historical fights in the training set, they happened about 60% of the time. Headline pick accuracy (overall W/L) is tracked, but it is not what we optimized for. Because the win probability is simply the sum of the three win outcomes, W/L accuracy comes along as a by-product of optimizing the more granular outcomes.
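One standard way to check a calibration claim like this is a reliability table: collect every (forecast probability, did-it-happen) pair, bin by forecast, and compare the mean forecast in each bin with the observed frequency. The sketch below is a generic version of that check, not the site's evaluation code; the function name `reliability_table` and the ten-bin default are assumptions. The pairs can be pooled across all six outcomes or checked one outcome at a time.

```python
def reliability_table(forecasts, outcomes, n_bins=10):
    """Compare mean forecast with observed frequency inside probability bins.

    forecasts: probabilities in [0, 1], e.g. one entry per outcome per fight.
    outcomes:  matching 0/1 flags, 1 if that outcome actually happened.
    Returns (mean_forecast, observed_rate, count) for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        bins[idx].append((p, y))
    table = []
    for contents in bins:
        if contents:
            mean_p = sum(p for p, _ in contents) / len(contents)
            rate = sum(y for _, y in contents) / len(contents)
            table.append((mean_p, rate, len(contents)))
    return table
```

For a well-calibrated model, each row's observed rate sits close to its mean forecast: ten 60% forecasts of which six came true give a row of roughly (0.6, 0.6, 10).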

Why interpretability is non-negotiable

The exact reason behind any prediction must break apart into readable pieces, or there is no way to LEARN from the model. Without interpretability, every bit of "learning" is merely a retroactive narrative fit to whatever subset of the data you choose to look at. Additive logistic regression keeps the smoke and mirrors out of our model. Every result is decomposable, and its constant coefficients can be read as underlying rules that apply always and everywhere. Each of the 12 matchup inputs adds a fixed, inspectable contribution to each outcome. There is no XGBoost-style black box, and no SHAP workaround presenting a facade of interpretability.
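Because each outcome's score is an intercept plus a sum of coefficient × input terms, any single forecast can be broken into the contribution of each input. A minimal sketch, assuming that additive structure; the input names, coefficient values, and the `contributions` helper are hypothetical, and only three of the twelve inputs are shown for brevity.

```python
def contributions(inputs, coefs, intercept):
    """Split one outcome's linear score into per-input pieces.

    inputs, coefs: dicts keyed by (hypothetical) input name.
    Returns the pieces sorted by absolute size, plus the total score.
    """
    pieces = {name: coefs[name] * value for name, value in inputs.items()}
    total = intercept + sum(pieces.values())
    ranked = sorted(pieces.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked, total


# Hypothetical values for the "TKO win" score of one fight.
inputs = {"striking_edge": 1.2, "grappling_edge": -0.5, "reach_edge": 0.8}
coefs = {"striking_edge": 0.40, "grappling_edge": 0.30, "reach_edge": 0.10}
ranked, total = contributions(inputs, coefs, intercept=-1.0)
for name, piece in ranked:
    print(f"{name:>15}: {piece:+.2f}")
print(f"{'score':>15}: {total:+.2f}")
```

The pieces add back up to the score exactly, with no approximation step, which is what makes the breakdown a reading of the model rather than an explanation layered on top of it.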

My problem with Shapley-style attributions is that they are not consistent, causally interpretable factors. They are a post-hoc rule applied to a model that was never built to be explained that way. Black-box models have interaction effects all over the place, and a Shapley attribution folds each interaction into per-input credit, splitting it among the inputs involved, so the interaction itself never shows up in the explanation.
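A two-input toy case shows the point. Take a model whose output is purely an interaction, f(a, b) = a × b, and compute exact Shapley values against a baseline of zero. The interaction's credit is split evenly between the two inputs, so the explanation reports two independent-looking effects. The function below is a brute-force exact Shapley computation written for illustration (baseline replacement, averaged over all orderings), not a call into the SHAP library.

```python
from itertools import permutations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Inputs not yet 'added' are held at their baseline values.
    Only practical for a handful of inputs (n! orderings).
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]
            now = f(current)
            phi[i] += now - prev
            prev = now
    return [v / factorial(n) for v in phi]


def interaction_only(z):
    # The output depends only on the product of the two inputs.
    return z[0] * z[1]


phi = exact_shapley(interaction_only, x=[2.0, 3.0], baseline=[0.0, 0.0])
print(phi)  # [3.0, 3.0]: each input gets half of the 6.0 interaction
```

For an additive model the same computation just returns each input's own term (coefficient × change from baseline), so nothing is hidden; the distortion only appears when there are interactions to smear.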

MMA outcomes do not work that way. A striker's edge and strategy depend on who they are fighting. Finishing threat depends on chin vulnerability. Pretending every input acts in isolation is not honesty; it is a convenient fiction.

This model builds those interactions in explicitly and does not pretend that an opponent's chin doesn't affect a striker's game plan. We choose to build in interpretability from the start (Rudin, 2019).
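In an additive model, an interaction can be made explicit by adding the product of two inputs as its own input with its own coefficient. The sketch below builds one such term; the names `striking_edge`, `opp_chin_vulnerability`, and `striking_x_chin`, the choice of a plain product, and all coefficient values are hypothetical.

```python
def matchup_inputs(striking_edge, opp_chin_vulnerability):
    """Return two base inputs plus one explicit interaction term.

    Hypothetical: the interaction is a plain product, so the extra
    finishing threat from a striking edge scales with how vulnerable
    the opponent's chin is, under its own readable coefficient.
    """
    return {
        "striking_edge": striking_edge,
        "opp_chin_vulnerability": opp_chin_vulnerability,
        "striking_x_chin": striking_edge * opp_chin_vulnerability,
    }


def tko_win_score(features, coefs, intercept):
    # Still additive: every term, including the interaction, is coef * value.
    return intercept + sum(coefs[k] * v for k, v in features.items())


coefs = {"striking_edge": 0.2, "opp_chin_vulnerability": 0.1, "striking_x_chin": 0.5}
sturdy = tko_win_score(matchup_inputs(1.0, 0.0), coefs, intercept=-1.5)
fragile = tko_win_score(matchup_inputs(1.0, 1.0), coefs, intercept=-1.5)
```

The same striking edge earns a higher TKO-win score against the fragile chin, and the reason is visible as a named term rather than buried inside a tree ensemble.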

Why uncertainty is treated as a feature, not a problem

Some fights are hard to call, whether because of a long layoff, a debut, a hard weight cut, or a fighter's first time facing a bad stylistic matchup. When we are uncertain, the confidence intervals widen rather than faking precision (Fauriat, 2025). A wide band on a debut fight is not a failure; it is the model admitting it has not seen this fighter in this context before. I would rather show that uncertainty than imply a false sense of certainty.

We choose a 90% confidence interval because I reject the orthodox 95% convention, and because I want to make the point that whatever level you pick changes your end results. No confidence interval should make anyone completely confident.
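To see how much the level alone moves the result: under a normal approximation, the half-width of a central interval is z × σ, and z depends only on the chosen level. Python's standard-library `statistics.NormalDist` gives the quantiles directly. The normal approximation and the helper name `z_for_level` are assumptions made for illustration.

```python
from statistics import NormalDist


def z_for_level(level):
    """Two-sided z multiplier for a central interval at the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


z90 = z_for_level(0.90)  # about 1.645
z95 = z_for_level(0.95)  # about 1.960
print(f"90%: z = {z90:.3f}, 95%: z = {z95:.3f}, width ratio = {z90 / z95:.3f}")
```

Same data, same model, same σ: the 90% band comes out about 16% narrower than the 95% band purely because of the convention chosen.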

The best-guess probabilities and the bands are separate outputs. A long layoff can widen the band without moving the center forecast: bands express uncertainty, while point estimates are based on known factors.
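A minimal sketch of keeping the point estimate and the band separate, assuming the band is built on the log-odds scale from an uncertainty term σ that grows for debuts or long layoffs. The log-odds construction, the σ values, and the function names are assumptions, not the site's published method.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def band(p_center, sigma, level=0.90):
    """Interval around a fixed point probability, widened by sigma on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = logit(p_center)
    return inv_logit(center - z * sigma), inv_logit(center + z * sigma)


p = 0.60                      # best-guess probability, identical in both cases
settled = band(p, sigma=0.3)  # hypothetical sigma for an active, well-observed fighter
layoff = band(p, sigma=0.8)   # hypothetical sigma after a long layoff
```

Working in log-odds keeps both ends of the band inside (0, 1) however large σ gets, and the center forecast is an input to the band rather than something the band can shift.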

What the model deliberately does not do

As we mentioned, the prediction path does not use gradient-boosted trees, neural nets, or other opaque learners.

We choose to avoid a global cross-division Elo. There is no pound-for-pound ranking in the traditional sense. Strength is always tracked per fighter and per weight class, and a debut in a new weight class is treated as a new fighter. If you hold strong priors about a fighter's "natural" weight class, you should adjust expectations accordingly.
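Keying ratings by (fighter, weight class) rather than by fighter alone is enough to produce both behaviors: there is no cross-division number to rank on, and a first fight in a new division starts from the default. The sketch uses the standard Elo expected-score and update formulas. It assumes 1500 is the starting rating (the value ratings are said not to decay toward during layoffs); the K-factor of 32, the fighter names, and the function names are hypothetical.

```python
from collections import defaultdict

START_RATING = 1500.0  # assumed default rating for a fighter new to a division
K = 32.0               # hypothetical K-factor

# Ratings keyed by (fighter, weight_class): a debut in a new division gets a fresh entry.
ratings = defaultdict(lambda: START_RATING)


def expected_score(r_a, r_b):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def record_fight(winner, loser, weight_class):
    """Update both fighters' ratings within a single weight class only."""
    key_w, key_l = (winner, weight_class), (loser, weight_class)
    e_w = expected_score(ratings[key_w], ratings[key_l])
    ratings[key_w] += K * (1.0 - e_w)  # winner scored 1, expected e_w
    ratings[key_l] -= K * (1.0 - e_w)  # loser scored 0, expected 1 - e_w


record_fight("Fighter A", "Fighter B", "Lightweight")
print(ratings[("Fighter A", "Lightweight")])   # 1516.0
print(ratings[("Fighter A", "Welterweight")])  # 1500.0: new division, fresh rating
```

Nothing in this structure links a fighter's Lightweight and Welterweight entries, which is exactly why any belief about a "natural" division has to come from the reader's own priors.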

Ratings do not automatically decay toward 1500 during layoffs. Uncertainty grows; the number itself is not quietly eroded. We have no idea how good Khabib would be today, or how good Jon Jones would be back at LHW, so the correct decision is to keep the last known rating as the prior and widen the confidence interval around it.
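Glicko, a published extension of Elo, treats inactivity the same way: the rating stays put while its rating deviation (RD) grows with time away, up to a cap. The sketch below follows Glicko-1's inactivity step, RD_new = min(sqrt(RD² + c²·t), RD_max), where t counts rating periods without a fight. Glicko is swapped in here purely as an illustration, not as the site's method; the value of c, the period length, and the function names are hypothetical.

```python
import math

C = 35.0        # hypothetical RD growth constant per rating period
RD_MAX = 350.0  # Glicko-1 caps RD at the uncertainty of an unrated player


def after_layoff(rating, rd, periods_inactive):
    """Return (rating, rd) after inactivity: the rating is unchanged, RD grows."""
    new_rd = min(math.sqrt(rd ** 2 + C ** 2 * periods_inactive), RD_MAX)
    return rating, new_rd


def interval(rating, rd, z=1.645):
    """Approximate 90% interval on the rating scale (normal assumption)."""
    return rating - z * rd, rating + z * rd


active = after_layoff(1700.0, 60.0, periods_inactive=0)
returning = after_layoff(1700.0, 60.0, periods_inactive=12)
```

Both fighters keep a rating of 1700; only the returning fighter's interval widens, which is the "uncertainty grows, the number is not eroded" behavior described above.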

Explore further

Data distributions show how Elo and matchup inputs spread across large samples of booked fights.

Model decisions walk through Elo per division, how matchup axes are built, and how uncertainty interacts with weight classes and layoffs.