Notes on team rankings


compiled by Jon Bruschke of CSU, Fullerton

NOTE: These are my own thoughts and do not represent a community consensus.  There are many critics of this ranking system, and many of the points they make are insightful and valid.  This discussion serves as an argument for the system; its validity is certainly not above debate, and I welcome discussion, critique, and challenge as ways to improve it.  In the end, I believe the system adds interest, especially in the JV and novice divisions, and I will defend it as a useful approach that is not, at present, a substitute for district or NDT rankings.

 

A QUICK NOTE ABOUT "OBJECTIVITY":  Imagine team A has won 3 tournaments (Wake, Northwestern, and Kentucky) and team B hasn't won any, but team B is 3-0 in head-to-head match-ups against team A.  Who's better?  The question reduces no further than this: it depends on whether you think winning tournaments or head-to-head results mean more.  You can assign a numerical weight to each factor, but ultimately it is a SUBJECTIVE decision about which factor means the most.  Team rankings are valuable and give a very good measure of relative strength, but it is probably the case that no single system can "objectively" identify which team is better.

There are rankings of schools with NDT points and CEDA points, but no rankings of individual teams, except for the pre-bid rankings to the NDT.  The team rankings I offer are a supplement to those ranking systems.  With all tournament results entered into a single database, there are a number of calculations that can be made that would heretofore have been too labor intensive to try.  I offer them not as a definitively superior system, but as a starting point so that we as a community can start imagining new and different ways of measuring success.  One kind of cool thing is that it allows the comparison of Novice and JV teams in a way that has almost never been possible before.

This document contains explanations of the Bruschke rankings, the Hanson rankings, and the Adjusted Elim rankings.

 

THE BRUSCHKE RANKING SYSTEM

 

In 2002-2003, this system correctly predicted 15 of 16 first-round bids.  The team on this list that didn't get a bid was NYU GG, who won CEDA Nationals and reached the octafinals of the NDT, where they were the 15th seed.

There are 2 basic calculations:  One is a raw score that awards points based on a team's finish at a tournament and the quality of that tournament (called the "raw Bruschke score").  The second is a broader score that uses the raw scores and other information, like elim win percentages, record against the top 25, etc.  These latter scores I'm calling the debate RPI.

 

THE RAW SCORE: HOW IS IT CALCULATED?

 

The basic assumption of the system is that the best measure of tournament quality is tournament size.  The more teams, the better the tournament.  While this may not be a perfectly valid assumption, I will mention that (a) teams, including the good ones, go to tournaments that are well run, (b) tournaments with good competition tend to attract good competition, and (c) I don't believe that anyone can seriously name a time they went to a 200-team tournament that really sucked.  More on this in the next section, but for now I will mention that the system is only as good as this assumption.

The first calculation is to compute a percentile finish for each team at each tournament.  If there are 121 teams at a tournament, they are ranked 1-121, and assigned a percentile finish.  Obviously, first and second place are determined by who won the final round, third and fourth are determined by ranking the semi-finalists by normal criteria (prelim wins first, then adjusted points, then total points), fifth through eighth by ranking the quarter-finalists, and so on, finally ranking all the teams that didn't clear.  The team in first gets a percentile rank of 100%, and every other team's percentile is the share of the field it finished ahead of, i.e., the number of teams ranked below it divided by the total number of teams (team 17 at a 121-team tournament, for example, gets an 86% score).  This is, by the way, how the SAT is scored.
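
For the programmatically inclined, here is a minimal sketch of that percentile calculation in Python (my own illustration, not the site's actual code; the function name and the convention that the champion gets exactly 100% are assumptions on my part):

def percentile_finish(seed, field_size):
    # seed 1 is the tournament champion; field_size is the total number of teams entered.
    if seed == 1:
        return 100.0
    # Everyone else gets the share of the field that finished below them.
    return 100.0 * (field_size - seed) / field_size

print(percentile_finish(17, 121))   # ~85.95, the "86% score" from the example above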

 

The next step is to weight the tournaments.

 

For the total points score, each tournament is given a weighting.  The best-attended tournament of the year receives a weighting of 1.00, and every other tournament is weighted by dividing the number of teams at that tournament by the number at the best-attended tournament.  For example, if the best-attended tournament had 211 teams at it, a tournament with 113 teams would have a weighting of 113 divided by 211, or .54.  Points are calculated by summing, across tournaments, each percentile rank multiplied by the weighting for that tournament.  Here's an example:

 
Percentile Finish   Tournament Weight   Score
84.23               1.00                84.23
90                   .73                65.7
81.31                .65                52.85
97.3                 .93                90.49
                    Total:              293.27
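
As a sketch of the arithmetic in that table (my own illustration; the function names are made up):

def tournament_weight(field_size, largest_field):
    # A tournament's weight is its size relative to the best-attended tournament (max 1.0).
    return field_size / largest_field

def raw_bruschke_score(results):
    # results is a list of (percentile_finish, tournament_weight) pairs for one team.
    return sum(pct * weight for pct, weight in results)

print(round(tournament_weight(113, 211), 2))   # 0.54, the example from the text
print(round(raw_bruschke_score([(84.23, 1.00), (90, .73), (81.31, .65), (97.3, .93)]), 2))   # 293.27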

IS TOURNAMENT SIZE THE BEST MEASURE OF TOURNAMENT QUALITY?

 

Well, maybe not, but I think it's the best measure that we have.  See points A, B, and C above.  The system breaks down if a tournament has 100 crappy teams attend, but I again assert that, as an empirical matter, that doesn't really happen.  The point is an empirical one: teams start to attend well-run tournaments, which raises the quality of competition, and then more teams start to attend to debate against the good competition.  So if you hit a tournament with a size of about 60 teams, you have one with excellent regional competition and a good national draw, and if you hit over 100 teams, then everybody who's anybody is there.

 

WHAT ABOUT ROUND ROBINS?

 

I hate round robins.  The best teams get together to debate each other and get better, and nobody else can get as good as those teams are getting because they can't get a bunch of consecutive rounds against good teams.  And invitations to round robins always leave out deserving teams, so it's hard to justify letting a deserving team that wasn't invited miss out on the points it would have earned had it been invited.  All the same teams go to all the same major invitationals anyway and should be debating each other from octos on if they really are the best teams in the nation.  In my view, little is lost by not including round robins in ranking systems, and much is to be gained in community inclusiveness by not having them at all.

 

In the current ranking system, round robins count the same as any other small tournament.

 

I will begrudgingly say that there IS a way that they can be incorporated into the current system.  Simply make the weighting for each tournament depend on two things instead of one:  The number of teams at the tournament AND the average point totals for the teams attending the tournament.  Each factor could count equally or some unequal weighting could be generated (tournament weights could, for example, depend 75% on the average points of teams in attendance plus 25% based on size).  This is not done here due mostly to my basic dislike for round robins, but analytically it poses no problem.  With the addition of the Win Quality Index (see below), wins at round robins count just like wins against any other good team.
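
For what it's worth, that blended weighting could be sketched like this (purely hypothetical; the 75/25 split and the normalization against the best tournament on each dimension are my illustrative choices):

def blended_weight(field_size, avg_points, largest_field, best_avg_points,
                   quality_share=0.75, size_share=0.25):
    # Weight a tournament 75% on the average points of its entrants and 25% on its size,
    # each expressed relative to the best tournament on that dimension.
    return quality_share * (avg_points / best_avg_points) + size_share * (field_size / largest_field)

# Hypothetical 16-team round robin whose entrants average 180 points, in a year where
# the biggest field is 211 teams and the highest tournament average is 200 points.
print(round(blended_weight(16, 180, 211, 200), 2))   # about 0.69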

 

WHAT ABOUT THE NDT?

 

It's true that the NDT is NOT one of the 5-6 largest tournaments of the year, and a system that weights tournaments solely by size may not weight it as heavily as its importance would suggest.  However, there are three points to be made.

 

First, it doesn't really matter what the rankings are AFTER the NDT.  Much as nobody cares about the coaches' poll when the NCAA basketball tournament is over (because you know who the national champion is and no longer have to rely on polls) you don't really need rankings AFTER the NDT because you know who the champion is.

 

Second, the NDT is NOT a small tournament.  Its weighting would be fairly substantial, although not determinative in end-of-the-year rankings.

 

Third, there are at least two ways the system could be altered to incorporate the NDT (neither is used in the current system).  (a) The NDT could be assigned a weight equal to or even larger than that of the largest tournament of the year.  Since the top weight is always 1.0, the NDT could be given a weight of 1.0, or even a weight as large as 2.0 (making it count twice as much as the largest tournament).  (b) If the quality-of-competition weightings, as described in the round robin section above, were adopted, the NDT weighting would shoot right up there.

 

CONCLUDING THOUGHTS

 

Undoubtedly, this project will strike some as elitist.  I guess in some ways it is.  My only defense is that we are pretty much an elitist activity at our core -- virtually every tournament declares a champion and calls one team better than all the rest.  We have at least 2 major tournaments to crown national champions.  Part (but only part) of the value of our activity comes from its competitive nature, and part of competition is that when someone wins someone loses.  Whether there is a Bruschke ranking system or not, there will be attempts to figure out who the best teams are at various points in the season, if only for invitations to the vice-ridden round robins.

 

Right now, those decisions are made in ways that involve either politics, gut instincts, or judgment calls.  What this system introduces is an attempt to find an objective way to rank the teams, one that doesn't rely on who you know, who you drink with, how you did last year, or what high school camp you went to.  It depends exclusively on how you've finished at the tournaments you've attended.  It isn't perfect, and it won't correct the other imbalances in our community, but it represents an attempt to provide an equitable way to rank the teams during the course of the season.  I hope it will stimulate discussion of what our community is and what we should be about.

 

 

THE BCS OF DEBATE: The Debate RPI score

 

The raw Bruschke Points are an attempt to rank teams (rather than schools) based on tournament finishes and nothing else.  Basically, teams get points based on (a) their finish at tournaments, and (b) the tournament size.  The best you could do was win the largest tournament of the year.  I think they remain a useful index of overall team quality.

 

It is, however, possible to do quite well at a tournament while missing the bulk of the competition.  The move to use "opponent wins" as a tie-breaker for seeding has revealed that there is often a vast difference in the quality of competition that teams with equal finishes faced.  The debate RPI (called the Bruschke Points 2 in 2002) compensates for this and adds some measures of opponent strength.

 

The funky new addition is a "Win Quality Index," which is a measure of the strength of your best wins.  It begins by selecting a number of wins to count based on a sliding scale depending on the number of debates that have happened at a given point in the season.  It finds the team with the most wins and makes 40% of their wins the number counted.  For example, if the team with the most wins in the country has 50 wins, 40% of that is 20 wins, and so the WQI counts everybody’s best 20 wins.  Opponent strength is measured by the original Bruschke Points (see the accompanying explanation for how those are calculated).  The function would sum the Bruschke Points for the 20 opponents with the most Bruschke Points.  In one sentence, the WQI is: "The summed opponent Bruschke Points for the best 40% of their wins."  (Slightly inaccurate – it’s 40% of the wins that the team with the most wins has, not the team in question).
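
A minimal sketch of the WQI as described above (assuming each win is stored as the raw Bruschke Points of the opponent who was beaten; the function and variable names are mine):

def win_quality_index(opponent_points_beaten, most_wins_in_country, share=0.40):
    # The number of wins counted is 40% of the win total of the winningest team in the country.
    n_counted = int(round(most_wins_in_country * share))
    # Sum the opponent Bruschke Points for this team's best n_counted wins.
    return sum(sorted(opponent_points_beaten, reverse=True)[:n_counted])

# If the winningest team in the country has 50 wins, everybody's best 20 wins are counted.
print(int(round(50 * 0.40)))   # 20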

 

Like the original Bruschke Points, the WQI favors teams that debate more but does so in a less direct way.  If the WQI is counting the top 20 wins, a team with 50 wins will have a considerable advantage over a team with only 22 wins.  Nonetheless, if a team has only 22 wins but beat 18 of the top 20 teams in the country they have a shot.  I believe that this is as it should be – if you are debating less often, you need more quality wins to prove you belong at the top.

 

However, relying exclusively on a measure of opponent win strength disadvantages teams at the very top of the bracket, since they will have the best records and they can’t beat themselves.  In addition, since they are winning the most, they are bumping other teams out of tournaments and lowering those teams' prelim records, creating fewer opponent wins in traditional terms and a lower WQI in Bruschke Points 2 nomenclature.  What is needed is a measure that accounts BOTH for finishes at tournaments AND opponent strength, or at least for how a team fares against really good opponents.

 

With the WQI in hand, the rest of the Bruschke 2 points are calculated fairly easily.  A team is given a percentile score on each of five different measures (which standardizes the units across measures), and those scores are weighted as follows:

 

Original Bruschke Points: 35%

Win Quality Index: 35%

Elim round record: 10%

Record against the Top 25 Bruschke Point teams: 10%

Overall win percentage: 10%

 

The result is a scale where all teams are ranked between 0 and 1, with 1 being the highest score.
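
As a sketch, the final combination is just a weighted average of the five percentile scores (my own illustration; the dictionary keys are labels I made up):

RPI_WEIGHTS = {
    "raw_bruschke": 0.35,   # original Bruschke Points
    "wqi":          0.35,   # Win Quality Index
    "elim_record":  0.10,   # elim round record
    "top25_record": 0.10,   # record against the top 25 Bruschke Point teams
    "win_pct":      0.10,   # overall win percentage
}

def debate_rpi(percentiles):
    # percentiles maps each measure to a 0-1 percentile score for one team.
    return sum(RPI_WEIGHTS[m] * percentiles[m] for m in RPI_WEIGHTS)

# Hypothetical team that is near the top on every measure:
print(round(debate_rpi({"raw_bruschke": 0.98, "wqi": 0.95, "elim_record": 0.90,
                        "top25_record": 0.85, "win_pct": 0.92}), 4))   # 0.9425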

THE HANSON POINTS

The Hanson Ranking System ranks teams based on three key premises: 1) How well teams do at tournaments, accounting for how difficult those tournaments are; 2) How many "good" wins a team has and how good those wins are; and 3) How many "bad" losses a team has and how bad those losses are.

 Devising a system to measure these premises is not easy but we believe the below ranking system is one effective method for doing so.

Step One: Calculating Proportionate Tournament Size (PTS)

Rationale for Step 1

Initially we believe that the larger a tournament is, the more likely it is to provide an accurate representation of a team’s ability, both because more teams are at the tournament, and because, typically, more good teams are at larger tournaments.

 PTS calculates the initial weight of a tournament on the basis of its size. PTS is calculated by dividing the size of a tournament by the size of the largest tournament of the year.

 EXAMPLE: Say that Northwestern was the largest tournament of the year with 160 teams. If Fullerton had 80 teams attend, its PTS score would be .5. So, initially, Fullerton tournament results would be weighted at half the weight given to Northwestern results.

Step Two: Calculating a team’s PTS Rating (PTS Rating)

Rationale for Step 2

In this step, we examine each team’s success based on how well they did at tournaments, weighting their results by the proportionate tournament size (PTS).

 A Team’s PTS Rating is an initial rating of a team’s success based solely on a team’s success at tournaments, weighted by the PTS (calculated in Step 1). The PTS rating is a rough gauge of a team’s success over the course of the year showing how well a team performed at tournaments, weighted by the size of the tournaments they attended.

 The PTS Rating is calculated in two parts. Part One: multiply a team’s percentile finish at each tournament by the PTS of that tournament to give a Raw PTS. Part Two: add up that team’s Raw PTS and then divide by the total PTS they could have earned at the tournaments they attended. This gives a sense of how well that team did versus what they possibly could have achieved.

 EXAMPLE: WOOCESTER DZ

 Part 1 of Step Two: Calculate a Team’s Raw PTS Rating

 

Tournament     PTS of Tournament   Woocester DZ's Percentile Finish   Raw PTS (PTS x Percentile)
Northwestern   1                   0.8568                             0.8568
Fullerton      .5                  0.7456                             0.3728

Sum of Woocester DZ's Raw PTS: 1.2296

 Part 2 of Step Two: Calculate a Team’s PTS Rating

 

Sum of Possible PTS   Woocester DZ's Raw PTS   Woocester DZ's PTS Rating (Raw PTS / Possible PTS)
1.5                   1.2296                   0.819733

 NOTE THIS CAVEAT: Because the PTS Rating potentially inflates the rating of teams that exclusively attended small, easier tournaments, we have this stipulation: teams that have not attended any tournament with 20 or more teams shall not receive a PTS Rating higher than .4. We realize this is somewhat arbitrary, but otherwise a team attending and winning exclusively small, weak tournaments would receive an inflated PTS Rating of 1. We believe that very few if any teams would attend only extremely difficult, small round robins (such teams also almost always attend large national tournaments). Such teams will be able to show their strength in head-to-heads.
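
A sketch of Steps One and Two in code, including the small-tournament cap from the caveat (function names and the exact form of the cap are my reading of the text, not the official implementation):

def pts(field_size, largest_field):
    # Step One: Proportionate Tournament Size.
    return field_size / largest_field

def pts_rating(finishes, attended_20_plus=True):
    # finishes is a list of (percentile_finish, tournament_pts) pairs for one team.
    raw = sum(pct * tpts for pct, tpts in finishes)       # Part One: Raw PTS
    possible = sum(tpts for _, tpts in finishes)          # the PTS the team could have earned
    rating = raw / possible                               # Part Two
    if not attended_20_plus:                              # caveat: cap teams that only hit tiny tournaments
        rating = min(rating, 0.4)
    return rating

# Woocester DZ from the example: Northwestern (PTS 1.0) and Fullerton (PTS 0.5).
print(round(pts_rating([(0.8568, 1.0), (0.7456, 0.5)]), 6))   # 0.819733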

Step Three: Tournament Difficulty (TD)

Rationale for Step 3

We believe that four factors indicate the difficulty of a tournament. First, the size of the tournament is indicative of how representative the tournament is. We believe the larger the tournament, the more likely it is to rank a team accurately. Second, the number of top teams attending the tournament is a good indicator of the strength of the tournament. Tournaments that attract more of the nation’s top teams are, from our observations, more difficult tournaments. The third and fourth factors, the proportion of the pool that consists of top teams and the average rating of the teams at that tournament, help show how well distributed the competitiveness of the pool is. We weight each of these four factors equally.

 Tournament Difficulty (TD) calculates how difficult a tournament is based on the size of the tournament and the quality of the teams attending the tournament. TD is based on four equally weighted factors: First, the Proportionate Tournament Size (PTS) for the tournament; Second, the percentage of teams in the top 20% of teams in the country attending the tournament based on their PTS rating (e.g. 6 of the 40 best teams in the country are attending); Third, the proportion of teams at the tournament who are in the top 20% of teams in the country based on their PTS rating (e.g. 6 of the best teams in the country are attending out of the 82 teams at the tournament); and Fourth, the average PTS rating of all teams at the tournament.

 EXAMPLE: TD (Tournament Difficulty) for a variety of tournaments

 

Tournament       PTS       Top-20% Attending   Top-20% Share   Avg PTS Rating   Sum of 4 Factors   TD
Wake             0.875     0.833333            0.357143        0.45             2.515476           0.952231
Kentucky         0.8125    0.75                0.346154        0.42             2.328654           0.881509
Fullerton        0.5       0.5                 0.375           0.45             1.825              0.690852
Harvard          0.3875    0.583333            0.564516        0.6              2.135349           0.808334
Berkeley         0.375     0.333333            0.333333        0.4              1.441667           0.545741
Gonzaga          0.25      0.2                 0.285714        0.38             1.115714           0.422352
KY RR            0.05625   0.15                1               0.9              2.10625            0.797319
Pepperdine       0.28125   0.083333            0.111111        0.32             0.795694           0.301209
GSU              0.875     0.5                 0.214286        0.35             1.939286           0.734114
WGA              0.4375    0.333333            0.285714        0.4              1.456548           0.551374
Weaker Tournie   0.1375    0                   0               0.15             0.2875             0.108833
Northwestern     1         0.866667            0.325           0.45             2.641667           1

("Top-20% Attending" is the percentage of the nation's top 20% of teams attending the tournament; "Top-20% Share" is the proportion of the tournament's field made up of those teams; TD is each tournament's Sum of 4 Factors divided by the best tournament's sum, Northwestern's 2.641667.)
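
In code, the TD calculation amounts to summing the four factors and scaling by the toughest tournament (a sketch assuming the four factors have already been computed; the names are mine):

def tournament_difficulty(factor_table):
    # factor_table maps a tournament name to its four equally weighted factors:
    # (PTS, share of the nation's top-20% teams attending, top-20% share of the field, average PTS rating).
    sums = {name: sum(factors) for name, factors in factor_table.items()}
    best = max(sums.values())                  # the toughest tournament anchors the scale at 1.0
    return {name: s / best for name, s in sums.items()}

factors = {
    "Wake":         (0.875,   0.833333, 0.357143, 0.45),
    "KY RR":        (0.05625, 0.15,     1.0,      0.9),
    "Northwestern": (1.0,     0.866667, 0.325,    0.45),
}
print({k: round(v, 6) for k, v in tournament_difficulty(factors).items()})
# approximately {'Wake': 0.952231, 'KY RR': 0.797319, 'Northwestern': 1.0}, matching the table above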

 

Step Four: Tournament Success Rating (TSR)

Rationale for Step 4

We believe that a team’s rating should depend on the difficulty of the tournaments they attended since defeating tougher opponents indicates that a team is better. In this step, we measure a team based on their success at tournaments in relation to the difficulty of the tournaments and size of the tournaments. Thus, a team’s Tournament Success Rating (TSR) is assessed based on a team’s seeding at tournaments and those tournaments’ difficulty as determined by the average of each tournament’s 1) size and 2) difficulty.

 Tournament Success Rating (TSR) is the first rating that determines a team’s final rating. It is based solely on a team’s success at tournaments weighted by tournament difficulty (TD).

 TSR is calculated by doing this:

First, multiply the tournament's TD-Size Average (the average of the tournament's TD and its size divided by 100) by 100, subtract the team's final seed, and add 1.

For example, a team places 2nd at a tournament with 40 teams and a tournament difficulty of .42.  The TD-Size Average is (.40 + .42)/2 = .41, so the calculation is (.41*100) - 2 + 1, which yields 40.

Second, divide the result of step 1 by 100 times the TD-Size Average (this gives a percentile), then multiply by the TD of the tournament to give the number of points earned for that tournament.

In our example: 40/41 = .976; .976*.42 = .409 points earned at that tournament.

Third, the sum of all the points earned at tournaments is divided by the sum of the TDs of all the tournaments attended to calculate the team's TSR.

In our example below, the sum of points earned at tournaments is 4.8534077; divide that by the sum of the TDs at those tournaments, 5.47, to give a TSR of .8872775.

 EXAMPLE: Whitman RS TSR

 

Tournament     Size   TD     TD-Size Avg   Final Seed   Percentile   Points Earned
Gonzaga        40     0.42   0.41          2            0.97560976   0.4097561
KY             130    0.88   1.09          18           0.8440367    0.7427523
Wake           140    0.95   1.175         13           0.89787234   0.8529787
Pepperdine     45     0.3    0.375         1            1            0.3
USC            80     0.69   0.745         14           0.82550336   0.5695973
Fullerton      80     0.69   0.745         12           0.85234899   0.5881208
Berkeley       60     0.54   0.57          16           0.73684211   0.3978947
Northwestern   160    1      1.3           2            0.99230769   0.9923077

Sum of TD: 5.47     Sum of Points Earned: 4.8534077     TSR (Points / TD): 0.8872775

(TD-Size Avg is the average of the tournament's TD and its size divided by 100; Percentile is (100*TD-Size Avg - Seed + 1) divided by (100*TD-Size Avg); Points Earned is Percentile times TD.)

 NOTE THIS CAVEAT: If a team does not attend a tournament with a .7 or higher TD, then you take their TSR and subtract this amount: .7 minus their highest tournament TD. So, for example, if a team’s highest TD is a .52 and they have a TSR of .82, then you subtract (.7 minus .52) from their TSR of .82 to get a TSR of .64. We do this because teams that have not attended large/difficult tournaments will have higher TSR scores than they should have (for example, a team winning 4 small, weak tournaments would have a TSR of 1 without this caveat). When coupled with the head-to-head steps below, we have found this gives a fairly accurate assessment of such teams’ strength.
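
A minimal sketch of the TSR arithmetic, including the caveat above (my own rendering of Step Four, not the site's actual code; tsr and its arguments are made-up names):

def tsr(results):
    # results is a list of (field_size, td, final_seed) tuples, one per tournament attended.
    earned, td_sum, highest_td = 0.0, 0.0, 0.0
    for field_size, td, seed in results:
        avg = (field_size / 100 + td) / 2            # TD-Size Average (0-1 scale)
        percentile = (avg * 100 - seed + 1) / (avg * 100)
        earned += percentile * td                    # points earned at this tournament
        td_sum += td
        highest_td = max(highest_td, td)
    rating = earned / td_sum
    if highest_td < 0.7:                             # caveat for teams that skip tough tournaments
        rating -= 0.7 - highest_td
    return rating

# Whitman RS from the example table above:
whitman = [(40, 0.42, 2), (130, 0.88, 18), (140, 0.95, 13), (45, 0.30, 1),
           (80, 0.69, 14), (80, 0.69, 12), (60, 0.54, 16), (160, 1.00, 2)]
print(round(tsr(whitman), 7))   # ~0.8872775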

Step Five: Quality of Wins (QOW)

Rationale for Step 5

A vital indicator of a team’s ability is the quality of their wins. Thus, we look at the top 20% of a team’s wins. 20% was chosen because we found that when more than 20% was used, QOW was disproportionately deflated for teams who tended to hit less difficult teams but still had many quality wins.

 Quality of Wins (QOW) is the second category that will affect a team’s final rating. QOW attempts to measure the quality of a team’s best wins. It also attempts to mitigate the effects of a team that has been successful but has failed to defeat quality teams by demanding that teams with more wins show proportionately more quality wins.

 QOW is calculated by multiplying the total number of a team’s wins by .2 in order to determine the number of wins that will be considered in the QOW. QOW is the average TSR (Tournament Success Rating) of the top 20% of a team’s wins.

 EXAMPLE: Whitman RS.

 Whitman RS had 54 Wins.

 54 * .2 = 10.8

 So, we look at the average of Whitman RS’s top 11 wins: 

Win Over     Opponent TSR
Georgia CR   0.983
Kansas BJ    0.93
Cal BS       0.925
ISU YM       0.895
USC IS       0.889
USC IS       0.889
USC IS       0.889
UMKC GF      0.869
UMKC GF      0.869
NW CS        0.863
MSU DR       0.796

QOW (average of the above): 0.890636
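
In code, the QOW step might look like this (a sketch assuming each win is stored as the defeated opponent's TSR; rounding 10.8 up to 11 wins follows the example above):

import math

def quality_of_wins(win_tsrs, share=0.2):
    # win_tsrs holds the TSR of every opponent this team has beaten, one entry per win.
    n = math.ceil(len(win_tsrs) * share)            # 54 wins * .2 = 10.8, so count the top 11
    best = sorted(win_tsrs, reverse=True)[:n]
    return sum(best) / len(best)                    # QOW is the average TSR of those wins

whitman_top_wins = [0.983, 0.93, 0.925, 0.895, 0.889, 0.889,
                    0.889, 0.869, 0.869, 0.863, 0.796]
print(round(sum(whitman_top_wins) / len(whitman_top_wins), 6))   # 0.890636, matching the table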

 

Step Six: Quality of Bad Losses (QOBL)

Rationale for Step 6

 Quality of Bad Losses (QOBL) is the average (TSR) Tournament Success Rating of a team’s "bad losses," defined as any loss to a team with a lower TSR.

 EXAMPLE: WHITMAN RS

 The below is the average of Whitman RS’s losses to teams with lower TSR’s: 

Loss To     Opponent TSR
NW CS       0.863
Emory BC    0.635
Texas TW    0.745
Gonz BS     0.600
Loyola MA   0.595
MSU AB      0.746

QOBL (average of the above): 0.697333

Step Seven: Calculating the Initial Hanson Rating (IHR)

Rationale for Step 7

This calculation weights a team’s Tournament Success Rating (TSR), QOW (Quality of Wins), and QOBL (Quality of Bad Losses). We chose weightings that we believe best match the experience of first-round rankings, under the assumption that first-round rankings are representative of the way the community weighs tournament success and head-to-heads.

The weighting ultimately depends on the number of quality wins a team has and the number of "bad losses" it has, with the remainder based on TSR. Quality of Wins tends to account for 15-45% of a team’s final rating; the better a team’s record, the higher the percentage.  The weight of Quality of Bad Losses depends on how many bad losses a team has, but this tends to range from 5-15%. This means that a team’s Tournament Success Rating accounts for 40-80% of a team’s Initial Hanson Rating (IHR). Teams with a higher winning percentage rely less on TSR and more on QOW in assessing the IHR.

 The Initial Hanson Rating is calculated by weighting TSR, QOW, and QOBL on the basis of a team’s record.

 First, the number of quality wins is divided by the total number of rounds.

EXAMPLE: Whitman RS

54 * .2 = 10.8

10.8/ 77 (total rounds)

= .14

 That number is then multiplied by 300 to determine the percentage of a team’s score that will be derived from the QOW

 EXAMPLE: Whitman RS

.14 * 300 = 42% (QOW Weight)

 Second, the number of bad losses is divided by the total number of rounds.

 EXAMPLE: Whitman RS

5 / 77 = .064

 That number is multiplied by 100 to determine the percentage of a team’s score that will be derived from the QOBL.

  .064 * 100 = 6.4% (QOBL Weight)

 Third, the percentage of the score that will be derived from the TSR is calculated by subtracting from 100 the percent that QOW and QOBL will weight.

 EXAMPLE: Whitman RS

100-6.4-42 = 51.6% (TSR Weight)

 Fourth, multiply the team's rating in each category by the weighting previously determined, and add the results together.

 Ex: Whitman RS

 

        Rating   Weight (%)   Weighted Contribution
TSR     0.887    51.6         45.7692
QOW     0.89     42           37.38
QOBL    0.697    6.4          4.4608

Initial Hanson Rating: 87.61
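
A sketch of the Step Seven weighting using the Whitman RS numbers (the function is my own illustration of the steps above):

def initial_hanson_rating(tsr, qow, qobl, quality_wins, bad_losses, total_rounds):
    qow_weight = (quality_wins / total_rounds) * 300   # e.g. 10.8 / 77 * 300 = about 42
    qobl_weight = (bad_losses / total_rounds) * 100    # e.g. 5 / 77 * 100 = about 6.4
    tsr_weight = 100 - qow_weight - qobl_weight        # the remainder goes to TSR
    return tsr * tsr_weight + qow * qow_weight + qobl * qobl_weight

# Using the rounded weights from the table above (51.6, 42, and 6.4) reproduces the 87.61:
print(round(0.887 * 51.6 + 0.89 * 42 + 0.697 * 6.4, 2))   # 87.61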

  Step 8: Victory Point Bonus

Rationale for Step 8

We believe that, because the community recognizes that winning a tournament is of greater significance than the statistical difference between seed "1" and seed "2," the winner of a tournament should receive a small bonus to distinguish their success.

 The champion of a tournament receives a "victory point bonus" equal to the strength of the tournament they won. For example, if a team wins a tournament with a TD of .8, then .8 is added to their final ranking.

 Example:

80.5 + .8 = 81.3

  Step Nine: Calculate Final Hanson Rating

Rationale for Step 9

 Our observations indicate that some teams’ strength is not accurately assessed until checking for quality of wins and losses. By recalculating tournament strength as well as head-to-heads, we believe we ensure a more accurate assessment of teams’ ratings.

 The Initial Hanson Rating calculated in Step Seven is recycled back into Steps Three through Seven, in order to readjust the tournament difficulty and head-to-heads. We think this is particularly important for head-to-heads, since some teams’ final rankings are significantly different from their Tournament Success Rating (TSR) after the consideration of head-to-heads.

Note: Debate results currently does not perform step 9 (JB, 10-28-08)

 

ADJUSTED ELIM HEAD-TO-HEAD

This scheme starts from the premise that the best measure of a team's quality is its elim record.  Only two calculations are performed.  First, teams get credit for each elim win they have, weighted by the size of the tournament, and the teams are then ranked on that score (with a lower rank being better).  Second, each team's head-to-head records against all the teams ranked above and below it are calculated.  If a team's head-to-head record against the teams ranked above it is better than 50%, it gets bumped up.  If its record against teams ranked below it is worse than 50%, it gets bumped down.

This system is probably more useful for assessing the very top teams, and not as useful for comparing mid-range teams that don't have a lot of tournaments in common.
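
One possible reading of that procedure in code (the text above doesn't spell out exactly how far a team gets bumped, so this sketch moves a team one spot at a time and should be treated as an illustration rather than the site's actual algorithm):

def adjusted_elim_ranking(elim_credit, head_to_head):
    # elim_credit maps a team to its size-weighted elim-win credit.
    # head_to_head maps (team_a, team_b) to team_a's win percentage against team_b.
    ranking = sorted(elim_credit, key=elim_credit.get, reverse=True)
    for team in list(ranking):
        i = ranking.index(team)
        above = [head_to_head.get((team, t), 0.5) for t in ranking[:i]]
        below = [head_to_head.get((team, t), 0.5) for t in ranking[i + 1:]]
        if above and sum(above) / len(above) > 0.5:
            ranking.insert(i - 1, ranking.pop(i))      # bump up one spot
        elif below and sum(below) / len(below) < 0.5:
            ranking.insert(i + 1, ranking.pop(i))      # bump down one spot
    return ranking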