Architectural excerpt

ELO as a method

The foundational question in any combat sports rating system is what a win actually means. Raw records ignore who you beat, when you beat them, and how definitively you beat them. A fighter who goes 5-0 against regional prospects is not the same as a fighter who goes 5-0 inside the UFC against ranked opponents, but a win-loss column treats them identically. ELO fixes that by making the value of a win a function of who you beat.

Why ELO over raw records

ELO learns from who you beat, not just that you won. Strength of schedule is baked into the rating rather than ignored (Elo rating system). A fighter who has been consistently active against elite opposition will rate higher than one who padded a record, even when the win totals look similar on paper.

A wide rating gap makes a win by the favorite expected, so it moves ratings very little; an upset across that gap is far more informative and moves them a lot. Production tuning sets how steeply expectation rises with the gap and how large each update step is.
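As a rough illustration of how expectation and update size interact, here is a minimal sketch of the textbook Elo formulas. The scale of 400 and the step size of 32 are the conventional defaults and are assumptions here; the production values are tuned and not stated in this excerpt.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score for A against B under the standard Elo logistic curve.

    `scale` controls how steeply expectation rises with the rating gap
    (400 is the textbook default, assumed here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0, scale: float = 400.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one fight.

    `score_a` is 1.0 if A won and 0.0 if A lost. `k` is the update step
    size (32 is an assumed placeholder, not the tuned value).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b, scale))
    return rating_a + delta, rating_b - delta


# A favorite winning moves ratings far less than an underdog winning.
favorite_after, _ = update(1900.0, 1500.0, score_a=1.0)
underdog_after, _ = update(1500.0, 1900.0, score_a=1.0)
favorite_gain = favorite_after - 1900.0
underdog_gain = underdog_after - 1500.0
```

The update is zero-sum: whatever the winner gains, the loser gives up.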

Why per-division ELO rather than global

A fighter's quality in one weight class carries very little information about performance in another. Especially in the modern UFC, where fighters can cut massive amounts of weight, we simply don't know whether a fighter will sit above or below the typical fight-night weight of a new weight class. The assumption that quality transfers across a weight cut is not tested, so the model does not make it.

Separate division ratings are cleaner than one global number. Each fighter-weight-class pair maintains its own uncertainty tracker. New entrants start at 1500 with wide initial doubt.
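One way to picture the per-division bookkeeping is a store keyed by fighter and weight class. This is a sketch only: the 1500 starting rating comes from the text, while the field names, the fighter identifier, and the initial uncertainty value of 350 are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

INITIAL_RATING = 1500.0       # from the text
INITIAL_UNCERTAINTY = 350.0   # hypothetical value standing in for "wide initial doubt"


@dataclass
class DivisionRating:
    """Rating state for one fighter in one weight class."""
    rating: float = INITIAL_RATING
    uncertainty: float = INITIAL_UNCERTAINTY
    fights: int = 0


# Keyed by (fighter_id, weight_class); unseen pairs start as new entrants.
ratings: defaultdict[tuple[str, str], DivisionRating] = defaultdict(DivisionRating)

# Moving up a division creates a fresh entry instead of inheriting the old one.
ratings[("fighter_a", "lightweight")].rating = 1720.0
new_division = ratings[("fighter_a", "welterweight")]
```

Because each pair has its own entry, a change of weight class never carries an established rating across with it.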

Why the K-factor scales with result type

Production tuning encodes how much each result type should move ratings relative to a unanimous decision. KO/TKO and submission wins move ratings at 1.5× that baseline. A unanimous decision is the 1.0× reference. Split or majority decisions move at 0.5× because one dissenting judge makes the result ambiguous either way. Draws, no-contests, and DQs move nothing; clocks still advance, ratings do not.

Any casual fight observer has seen split decisions that should have been unanimous, and vice versa. As a result, I considered sidestepping the issue of judging quality entirely and giving every fight that goes the distance the same weight. Ultimately, I decided that the split/majority delineation carries meaningful information and kept it. This decision is symmetrical for both fighters: split and majority losses also count as decision losses. One dissenting judge either way carries the same signal strength.

A dominant finish is a fundamentally stronger signal than a split decision win. Treating finishes and disputed decisions as equivalent updates is exactly the kind of outcome-frequency confusion Nassim Nicholas Taleb identifies as a core failure mode of naive empiricism (Taleb (2020)).

Doctor stoppages count as KO/TKO since they can't be easily distinguished in the box scores. In a doctor stoppage, the finish is credited to the winner on the same scale as other stoppages.
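The result-type scaling described in this section can be sketched as a lookup table applied to a base step size. The multipliers are the ones stated above; the method labels, the base K of 32, and the scale of 400 are assumptions for illustration.

```python
# Multipliers relative to a unanimous decision (values from the text).
# The string labels are hypothetical; real box-score labels may differ.
METHOD_MULTIPLIER = {
    "KO/TKO": 1.5,
    "DOCTOR_STOPPAGE": 1.5,   # treated as KO/TKO
    "SUBMISSION": 1.5,
    "UNANIMOUS_DECISION": 1.0,
    "SPLIT_DECISION": 0.5,
    "MAJORITY_DECISION": 0.5,
    "DRAW": 0.0,
    "NO_CONTEST": 0.0,
    "DQ": 0.0,
}


def winner_gain(winner_rating: float, loser_rating: float, method: str,
                base_k: float = 32.0, scale: float = 400.0) -> float:
    """Rating points transferred from loser to winner for one result.

    `base_k` and `scale` are assumed placeholders, not the tuned values.
    """
    expected = 1.0 / (1.0 + 10.0 ** ((loser_rating - winner_rating) / scale))
    return base_k * METHOD_MULTIPLIER[method] * (1.0 - expected)


# Evenly matched fighters: a finish moves three times what a split decision does.
ko_gain = winner_gain(1500.0, 1500.0, "KO/TKO")
split_gain = winner_gain(1500.0, 1500.0, "SPLIT_DECISION")
```

The same transfer is subtracted from the loser, which keeps the treatment symmetrical for both fighters.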

Why ELO does double duty in this model

ELO is not just a number on a leaderboard. It quality-weights all downstream feature construction. Beating a strong opponent counts more than beating a weak one when building finish rates, striking tendencies, and grappling profiles. Dominating the striking against a great fighter should tell us more than dominating against a complete novice.
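To make the idea of quality-weighting concrete, here is a hypothetical sketch of an opponent-weighted average. The excerpt does not say how the weights are derived from ELO, so the logistic weight function, its baseline of 1500, and its scale of 400 are all assumptions.

```python
def quality_weight(opponent_rating: float, baseline: float = 1500.0,
                   scale: float = 400.0) -> float:
    """Hypothetical weight in (0, 2): 1.0 at the baseline, higher for stronger opponents."""
    return 2.0 / (1.0 + 10.0 ** ((baseline - opponent_rating) / scale))


def weighted_rate(values: list[float], opponent_ratings: list[float]) -> float:
    """Average of per-fight values, weighted by opponent quality."""
    weights = [quality_weight(r) for r in opponent_ratings]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * v for w, v in zip(weights, values)) / total


# One finish against a 1900-rated opponent, one non-finish against a 1100-rated one.
# The raw finish rate is 0.5; the weighted rate leans toward the stronger opponent.
finish_rate = weighted_rate([1.0, 0.0], [1900.0, 1100.0])
```

The same weighting could be applied to striking or grappling statistics in place of the finish indicator.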

The rating gap between the two fighters also feeds directly into the matchup model. That captures any residual effect that overall "eliteness" has on the outcome, separate from the style axes.

Why the regression is UFC Tier-1 only

The six-way forecast model trains exclusively on UFC fights with full per-fight statistics. Strike, grappling, and control tables need to be consistently available. That is the reference population for the regression, and it is a scope choice I am stating plainly. All Tier-1 fight-level statistics come from ESPN MMA Fight Center. Upcoming cards are kept separate from training data.

I would consider expanding the analysis down the road, but I would need strong assurances about the quality of all ingested data, and that simply isn't present for a Joe Schmoe fight card in central Saskatoon reported on Sherdog. Cross-promotion discounts exist in the training stack (major promotions at 85% strength, regional outcomes at 65%) in case those records are loaded later. As stated before, the shipped site model does not carry pre-UFC regional records into the regression features.
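The discounts mentioned above could be expressed as a small config table. The 85% and 65% figures come from the text; the tier names, the 100% value for the UFC, and the choice to apply the discount to the update step size are assumptions, since the excerpt does not say where in the stack the discount is applied.

```python
# Strength by promotion tier. Tier names are hypothetical labels.
PROMOTION_STRENGTH = {
    "ufc": 1.00,       # assumed full strength
    "major": 0.85,     # from the text
    "regional": 0.65,  # from the text
}


def discounted_k(base_k: float, tier: str) -> float:
    """Scale the update step by promotion strength (assumed application point)."""
    return base_k * PROMOTION_STRENGTH[tier]


regional_k = discounted_k(32.0, "regional")
```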

Why the modern era only for regression training

The sport has changed structurally since its inception. Athlete quality, fight-camp preparation, judging criteria, and finish-rate norms are different from the early UFC. Training on data from a different version of the sport means fitting to something that no longer exists, so I limit the training set to the modern era.

The production regression floor is UFC fights from 2005 onward. That calendar cutoff was tuned against holdout performance: several start years (including 2005, 2010, 2013, and 2015) were compared, and 2005 won. It is not a fixed "2013" rule of thumb.
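The comparison of start years amounts to a small search over candidates scored on the holdout. A minimal sketch follows, assuming the score is a loss where lower is better; the function names are hypothetical and the actual metric is not stated in this excerpt.

```python
from typing import Callable, Iterable


def pick_start_year(candidates: Iterable[int],
                    holdout_loss: Callable[[int], float]) -> int:
    """Return the candidate start year with the lowest holdout loss.

    `holdout_loss(year)` is assumed to refit the regression on fights from
    `year` onward and return a loss on the held-out fights.
    """
    return min(candidates, key=holdout_loss)
```

In use, `holdout_loss` would wrap the full fit-and-score pipeline, and the candidate list would include the years named above.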

ELO still uses all available UFC history regardless of era. Pre-2005 fights inform ratings but never enter the regression training set. Pooling those rows into the forecast fit would mix fight eras whose rules, athletic norms, and stat quality are not interchangeable. The sport is not stationary across its full history, and the regression is fit to predict today's UFC, not a pooled average across every era.

During development, fights from 2023 onward were held out of the training fit for honest out-of-sample scoring. The shipped model refits the frozen winner configuration on all eligible UFC rows through the latest snapshot date.
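The era and holdout rules above can be summarized as date filters over one fight table. This is a sketch under assumptions: each fight is a dict with a "date" field, and "2005 onward" and "2023 onward" are read as starting on January 1 of those years.

```python
from datetime import date

REGRESSION_FLOOR = date(2005, 1, 1)  # assumed reading of "2005 onward"
HOLDOUT_START = date(2023, 1, 1)     # assumed reading of "2023 onward"


def partition(fights: list[dict]) -> dict[str, list[dict]]:
    """Split UFC fights into the sets each modeling stage uses."""
    ordered = sorted(fights, key=lambda f: f["date"])
    eligible = [f for f in ordered if f["date"] >= REGRESSION_FLOOR]
    return {
        "elo": ordered,  # ratings use all available history
        "dev_train": [f for f in eligible if f["date"] < HOLDOUT_START],
        "holdout": [f for f in eligible if f["date"] >= HOLDOUT_START],
        "final_train": eligible,  # shipped refit uses every eligible row
    }


sample = [
    {"id": 3, "date": date(2023, 6, 10)},
    {"id": 1, "date": date(1999, 3, 5)},
    {"id": 2, "date": date(2010, 8, 7)},
]
sets = partition(sample)
```

The pre-2005 fight appears only in the "elo" set, which matches the rule that early fights inform ratings but never enter the regression fit.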
