W/kg Cats Fail 2: The Light Rider’s Curse

The Heavy Rider’s Disadvantage

It could be debated if you can call it unfair, but heavy riders do suffer a disadvantage in Zwift races. The lighter riders have an easier time uphill and it is so hard to match the Watts needed to keep level with the lighter rider’s W/kg there.

The above is a very common complaint on various Zwift forums. But is it true?

I like to question those self-evident truths we all take for granted. If they are indeed truths then there is no harm in validating them. Sometimes, however, they turn out not to be true after all, once you actually take a serious look at them. So what about rider weight and race results in Zwift? Let’s take one of those serious looks for a change instead of just passing on what some other guy said in a one-liner on the forum.

The Light Rider’s Advantage

Without any prior knowledge about the impact of weight in Zwift racing we could assume three things:

1. There could be advantages to being light

2. There could also be disadvantages to being light

3. If there are both advantages and disadvantages to being light, perhaps depending on scenarios, then you could compare those advantages and disadvantages, weigh them against each other, and come to some kind of conclusion regarding the net effect of being light – is it more good than bad to be light, or is it the other way around?

So let’s start by looking at possible advantages to being light, since people say there are such advantages. There are no obvious advantages on the flat, and everyone seems to agree (we will get into details on this further down). What heavier riders say instead is that they have a hard time against light riders in climbs. 

On the flat the main resistance is air drag, which depends far less on body weight than gravity does, so pure Watts is king and heavier riders can usually (although not necessarily) push higher Watts than a lighter rider with a smaller frame and less muscle volume. But in a climb W/kg is king. Body weight comes into play, and maybe it is easier for a light rider to attain a better ratio between Watts and body weight than it is for a heavier rider, especially a heavier rider with a few surplus kilos.

The above is reasoning taken from riding outdoors and in a different setting than Zwift racing with its unique and uniquely stupid rules. But it is actually a completely flawed argument and you need to fully understand why. The explanation is two-pronged. We start off with some physics. 

Question: A rider at 90 kg is time trialing against a rider at 70 kg up the Alpe du Zwift climb. Both are keeping the exact same lines and both are able to keep dead steady, ERG-like Watts. Both are doing exactly 3.19 W/kg. Who will win?

Answer: The lighter rider will win. By a few seconds. But it has nothing to do with Watts or weight. The lighter rider will win because he has an ever so slight advantage in drag, having a smaller frontal area. 

It’s similar to choosing between bikes in your garage before a Zwift climb. One frame will be ever so slightly faster than the other. However, when was the last time you saw a race up AdZ and only that? There are no such races. The closest you can get to that scenario in a Zwift race is a race on Road to Sky, a route which has quite the approach to the mountain, and the approach is flattish. So if we staged an iTT on the Road to Sky course, then this advantage in drag for the light rider, mere seconds, is more than offset by the heavier rider’s advantage on the flattish approach to the mountain. On Road to Sky, or even Ven-Top with its very short approach, the heavier rider will win!

Also, what you need to understand is that if it wasn’t for the small difference in drag between the two riders, if they both raced in vacuum, then if both started at the same time at the foot of the climb, both riders would arrive at the finish exactly simultaneously. Because if we ignore the drag issue, then 3.19 W/kg is 3.19 W/kg. It doesn’t matter what you weigh. You will travel up the mountain at exactly the same speed. That’s what the measure W/kg implies, it’s its purpose, to equalize riders to make a comparison possible.

A heavier rider could in theory have a hard time producing high enough Watts to be able to match the W/kg of a lighter rider in a climb. But in our example we assumed that both climbed at exactly 3.19 W/kg, so the heavier rider already compensated his higher weight with higher Watts. And thus they are both traveling at the exact same speed up the mountain.
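
To make the vacuum argument concrete, here is a minimal sketch of steady-state climbing speed with air drag switched off. It is not Zwift's actual physics engine. The 8.5% gradient, the rolling resistance value and the function name are assumptions chosen for illustration, and bike mass is set to zero to match the simplification in the text.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def vacuum_climb_speed(w_per_kg, rider_kg, gradient=0.085, crr=0.004, bike_kg=0.0):
    """Steady-state speed (m/s) on a constant gradient with no air drag.

    gradient, crr and bike_kg are illustrative assumptions, not Zwift values.
    """
    power = w_per_kg * rider_kg              # Watts the rider produces
    total_kg = rider_kg + bike_kg            # mass that has to be lifted
    angle = math.atan(gradient)
    # Resisting force per kg of mass: gravity along the slope plus rolling resistance
    resist_per_kg = G * (math.sin(angle) + crr * math.cos(angle))
    return power / (total_kg * resist_per_kg)


heavy = vacuum_climb_speed(3.19, 90)
light = vacuum_climb_speed(3.19, 70)
print(f"90 kg: {heavy * 3.6:.2f} km/h, 70 kg: {light * 3.6:.2f} km/h")
```

With drag and bike mass removed, rider weight cancels out and both riders get the same speed. If you pass a non-zero `bike_kg`, the heavier rider comes out slightly ahead in this sketch, because the bike is a smaller share of his total mass.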

Now here comes the second prong of the argument. Put the above in relation to the W/kg cat system, with the performance ceilings in cat B-D. To be competitive in any cat B-D, you typically need to be able to put out W/kg at or very close to the performance ceiling, be it 2.5 W/kg, 3.2 W/kg or 4.0 W/kg. So to win a race on any course in, say, cat C, you need to be able to hold 3.19 W/kg, or someone else could come and do the 3.19 W/kg and beat you (there are plenty of such riders). Agreed?

So to win a race up AdZ you thus need to be able to hold 3.19 W/kg. Assume you are a contender, someone who could actually win in cat C. Then you will be able to race Road to Sky at 3.19 W/kg. If you are indeed one of those riders who could, then as we just concluded your weight doesn’t matter at all. And we already know that there are heavy riders who can do 3.19 W/kg up AdZ, and there are light riders who can do the same. Both kinds race up the climb at almost the exact same speed, bar the minuscule difference in drag. In fact, given that you are a contender, you are advantaged by being heavy on Road to Sky since you will be naturally faster in the approach and might thus either get a head start or save some energy before the climb.

GET THIS:

There is no advantage to being light in Zwift racing!

And this is because of the W/kg cat system. Without it things would be different. With a results-based categorization, a race on Road to Sky would favor lighter riders, whereas the heavies would still reign on Tempus Fugit. You would have to specialize and play to your unique advantages, just like in real cycling.

So if there are no advantages to being light in Zwift, could there still be disadvantages?

The Light Rider’s Disadvantages

This post is named the Light Rider’s Curse, which refers to a tendency in Zwift racing. Many light riders have first-hand experience of improving fitness to the point where they reach the top of their current race category. Or rather what should have been the top of the race category. Only it isn’t.

You would think that being able to average e.g. 2.5 W/kg in cat D would make you competitive there. But that is not necessarily the case. First, you have to beat the cruisers. But even if we take the cruisers out of the picture, it can still be surprisingly hard for a light rider to get anywhere near a podium in the average Zwift race.

So they do what anyone would do in that situation. They try to improve fitness further still. Shouldn’t that help getting to a podium then? No, that’s just that final push that tips them over to the bottom of cat C. They got upgraded before they even saw a podium. 

Why is this? Is this real or just some bad excuse from failed light racers? It all seems so counter-intuitive. As a light rider you should have an advantage against the heavies in the hills, said a guy in a one-liner on the forum. And being able to do 2.5 W/kg you should have no problem getting a decent shot at the podium, right? So why don’t you win?

It’s because of this:

Someone doing 300W on the flat is going faster than someone doing 275W.

Yeah, of course he is! So what?

Well, what if it’s a semi-flat cat C race and the guy doing 300W weighs 94 kg? That’s 3.19 W/kg, within ZP’s cat limits. And what if the guy doing 275W weighs 77 kg? That’s 3.57 W/kg, way over limit. See the problem?
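
The arithmetic for these two riders can be checked in a few lines. This is only a sketch: the ceiling constant and the function name are my own, and treating exactly 3.2 W/kg as over the limit is an assumption.

```python
CAT_C_CEILING = 3.2  # W/kg, the cat C upper limit quoted in this post


def w_per_kg(watts, kg):
    """Power-to-weight ratio in W/kg."""
    return watts / kg


# (Watts, body weight in kg) for the two riders in the example
riders = {"heavy": (300, 94), "light": (275, 77)}

for name, (watts, kg) in riders.items():
    ratio = w_per_kg(watts, kg)
    # Assumption: a rider must stay strictly below the ceiling
    status = "within limit" if ratio < CAT_C_CEILING else "over limit"
    print(f"{name}: {watts} W at {kg} kg = {ratio:.2f} W/kg ({status})")
```

The rider putting out fewer Watts, and therefore going slower on the flat, is the one flagged as over the limit.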

The heavy guy wins the race and the light guy, being slower, isn’t anywhere near a podium but is still a disgusting sandbagger who deserves a DQ. But this never happens in real-world cycling, only in Zwift. And it’s because of the W/kg cat system that no other sport uses. 

Specifically, it’s because of the W/kg ceiling of the lower cats in combination with ZP disqualifying racers afterwards, racers that they themselves allowed into the race. But you can’t have a performance ceiling in sports. And you should never have to disqualify a contestant for being “too good” in sports.

Most races consist of mainly flattish stretches and then some shorter climbs. At the W/kg ceiling of a cat, i.e. in that front group with the riders that actually have a chance to win the race, a light rider can in theory never match the speed of a heavier rider without going over limits and getting a DQ or even an upgrade, not unless the heavier rider is a cruiser. It’s simple maths.

If it’s simple maths in theory, then it should show in data too. So does it? Let’s find out!

Weight Study 1 – A Mix of Races

I grabbed some fresh data from ZwiftPower, a sample of 50 consecutive cat C races of all sorts (distances, elevation, etc). I only skipped races where

i) weight data was missing

ii) there were fewer than 6 cat C finishers according to ZP

iii) the race type didn’t lend itself to this test (like e.g. Hare & Hounds, age category or TTT races).

Then I compared the average weight of the 3 riders on the podium to the average weight of the other riders in the race (hence why I wanted at least 6 finishers).
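
For readers who want to try the comparison on their own data, here is a rough sketch of the idea. The weights below are invented toy numbers, not the ZwiftPower sample. The per-race pairing and the exact sign-flip test are one possible way to test significance, an assumption on my part and not necessarily the test behind the p-values in this post.

```python
from itertools import product
from statistics import mean


def podium_vs_rest(finisher_weights):
    """Mean weight of the top 3 and of the remaining finishers in one race.

    finisher_weights must be listed in finishing order.
    """
    if len(finisher_weights) < 6:
        raise ValueError("need at least 6 finishers, as in the study")
    return mean(finisher_weights[:3]), mean(finisher_weights[3:])


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip test on per-race (podium - rest) differences.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so we enumerate every sign pattern.
    """
    observed = abs(sum(diffs))
    hits = sum(
        1
        for signs in product((1, -1), repeat=len(diffs))
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
    )
    return hits / 2 ** len(diffs)


# Invented toy data: rider weights in kg, in finishing order, one list per race
races = [
    [84, 80, 82, 76, 75, 79, 74],
    [79, 88, 81, 77, 80, 72],
    [90, 78, 83, 81, 76, 78, 77, 75],
    [77, 85, 80, 79, 74, 76],
]

diffs = []
for race in races:
    podium, rest = podium_vs_rest(race)
    diffs.append(podium - rest)

print("mean podium minus rest:", round(mean(diffs), 1), "kg")
print("sign-flip p-value:", sign_flip_p_value(diffs))
```

Enumerating every sign pattern only works for a handful of races. With 50 races you would sample random sign patterns instead, or use a standard paired test.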

Results

The podiums in the races had an average weight of 81.3 kg.
The remaining riders in the races had an average weight of 77.5 kg.

This nearly 4 kg difference between the average podium winner and the average loser turns out to be highly statistically significant, even at the 1% level (p = 0.00118). For those who aren’t into statistics, this means that a difference this large would be extremely unlikely to show up by pure chance if podium riders and the rest of the field actually weighed the same on average. And thus we can’t dismiss the difference in average weight between winners and losers as random. Winners are somewhat heavier on average. It is not bad to be heavy in Zwift races, quite the opposite. It is bad to be light in Zwift races. The results prove it.

The W/kg cat system screws light riders. I will give a more detailed example than the simple theoretical one above. Let’s work through this.

Assume the following:

- You are racing in the front group in cat C (for some reason there are no sandbaggers this time…)
- The group keeps a steady pace and you are at least 20 min from finish
- You weigh 75 kg
- You are on the wheel of a bigger guy @ 85 kg
- You are both in draft
- The big guy is able to hold a 20 min average of 286W, i.e. 3.2 W/kg according to ZP (286 x 0.95 ≈ 272, and 272/85 = 3.2)

The only way you can stay on his wheel is by matching his 286W. This would put you at (286 x 0.95)/75 = 3.6 W/kg. Keep at it for 20 min (if you can) and ZP will give you a DQ. People might even call you a sandbagger! You simply can’t win this race as a light rider and get away with it on ZP. It’s not just hard. It’s impossible.
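
The same example can be turned around to ask how many Watts each rider may hold for 20 minutes before crossing the limit. A minimal sketch, using the 95% factor and the 3.2 W/kg ceiling from the example above (the function names are my own):

```python
ZP_FACTOR = 0.95  # 95% of the 20 min average power, as in the example above


def zp_w_per_kg(avg_20min_watts, kg):
    """W/kg as calculated in the example: 95% of 20 min power over body weight."""
    return avg_20min_watts * ZP_FACTOR / kg


def max_20min_watts(kg, ceiling=3.2):
    """Highest 20 min average power that still lands on the W/kg ceiling."""
    return ceiling * kg / ZP_FACTOR


for kg in (85, 75):
    print(f"{kg} kg: 286 W for 20 min = {zp_w_per_kg(286, kg):.2f} W/kg, "
          f"ceiling reached at {max_20min_watts(kg):.0f} W")
```

The 75 kg rider hits the ceiling at roughly 253W while the 85 kg rider can go to roughly 286W, and on the flat it is the Watts that decide who holds the wheel.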

Guys weighing 75 kg with a 1 hr FTP of 272W according to ZP will already have been upgraded to cat B. They will have seen very few podiums back in cat C if they were up against heavier riders. Which they were. And data supports our simple maths theory and the existence of a Light Rider’s Curse.

The Objection

But wait a minute! “Assume you are both in draft…” Granted, draft in Zwift doesn’t give quite as much help as outdoors but it is certainly a factor. What if these heavier winners are just better at drafting? It seems unlikely. Why wouldn’t drafting skills be evenly spread out over riders of all weights and sizes? But it’s a good idea to eliminate draft when you are doing a study like this. So how could we eliminate it? By studying only individual time trials instead. On a TT bike you can’t draft.

Weight Study 2 – Only TT Races, No Draft

So instead I scraped 40 consecutive iTT races in cat C from ZP. What were the average weights for the podium vs the rest of the field? Was there a difference? And was it statistically significant (i.e. not random)?

Results for iTT’s in Cat C

Podium avg weight: 83.9 kg
Losers avg weight: 78.1 kg
Difference: 5.8 kg
Statistical significance: p=0.00004 (probability of a random sample/event resulting in such a difference)

Conclusion: The difference is not random. In fact, a pharma company doing a study on a new promising medication would do wheelies and open up the champagne if getting results of this magnitude. So heavier riders do have an advantage in cat C, even in iTT’s where there is no draft.

“Ok, but maybe this is exclusive to cat C. I don’t care about the fat noobs in cat C anyway. I race in B.”

So let’s look at cat B too.

Results for iTT’s in Cat B

Podium avg weight: 77.7 kg
Losers avg weight: 73.0 kg
Difference: 4.7 kg
Statistical significance: p=0.00007

Conclusion: The difference is not random. We can see that people weigh less in cat B, just as I predicted in an older blog post, but there is still a clear advantage for the relatively heavier rider, even without draft.

“Uh-oh… and you mean the reason for this is that both cat C and cat B have a performance ceiling (3.2 W/kg and 4.0 W/kg) that will weed out lighter riders trying to match the speed of heavier riders?”

Exactly!

“A-ha! Gotcha! But cat A doesn’t have a performance ceiling! So if their iTT winners are heavier than the losers too, then your argument implodes!”

Yes, that’s right. It would. We’d have to come up with some other explanation for the differences. Not that I can think of any. But let’s worry about that later. First let’s look at cat A the same way. If we see the same difference, then I’m in trouble. However, if we don’t see the same difference… then the W/kg cat system is in trouble. If I lose, I’ll go jump off a bridge. If the W/kg cat system loses then… it can go jump off a bridge.

Results for iTT’s in Cat A

Podium avg weight: 68.8 kg
Losers avg weight: 69.9 kg
Difference: -1.1 kg
Statistical significance: p=0.18

Conclusion: There is a small difference, but it is pointing in the other direction (better to be light) and it is quite possibly just random. We would get a difference like this in almost 1 out of every 5 samples from the ZP database. So we conclude that there is no difference in weights between podiums and losers in cat A iTT’s. There is no disadvantage to being light in cat A, where there is no W/kg ceiling stopping you from doing your best.

GET THIS:

There is no advantage to being light in regular Zwift racing, but there are clear disadvantages. Hence the net effect of being light is negative. Or to spell it out: It sucks to be light in Zwift. Heavies have the upper hand. Always, on any course.

The Light Rider’s Curse is a reality. Now where’s that bridge? I have a cat system to escort there.

And don’t you ever come to the forums and complain about being heavy again. You are wrong. Don’t spread misinformation. Either you are not able to perform at the performance ceiling in your category, and then it doesn’t matter if you are light or heavy. You will get dropped in a climb either way. Or you are a real contender and can touch the upper W/kg limit, and then you are not disadvantaged at all by being heavy. In fact, you have the upper hand.

The Takeaway

So what is your takeaway from this post? That you should go buy some cake, french fries and some jars of peanut butter and start gaining weight? No, it’s not weight per se that gives an advantage but the part of it that is muscle volume. And we are talking muscle volume in absolute terms, not relative terms where you factor in body fat.

If you are a little chubby with a nice dad bod but stay on top of your category in terms of W/kg, then you still more than likely have higher absolute muscle volume than a rider 30 kg lighter than you. This translates into higher Watts. And that still gives you an advantage, even if that lighter rider can produce the same W/kg as you.

Of course you only stand to gain from losing excess body fat. It will improve your W/kg if nothing else. Just watch it so you don’t get that dreaded upgrade. You can always do what I do. Cruise!


Cruiser Sunday Studies – Part 3

We turn again to our investigations of ZwiftPower race data. In the second of the recent Cruiser Sunday posts I discussed briefly whether the spotted difference between cat A and cat C with regards to relative effort levels among top contenders was statistically significant. Now we will try to analyze race data properly, with a third approach.

An Explanatory Sidetrack

We will start with a little loop before we get back on track. Imagine you have kids and that you recently moved to a new area. There are two nearby schools to put your kids in and you have the choice between either and want to choose the one where the students have the highest grades. Is there a difference at all, and if there is, can we somehow determine whether that difference is not just random?

Or let’s make it really simple. You and a friend throw dice. You roll a die 100 times each. The objective is to score the highest total. If the dice are fair, then there should be no difference between your results, right? Or rather, there will be a difference but only a small one. Either of you had a streak of luck resulting in a slightly higher total. Do it all again and it might be reversed. 

But if it turns out your friend’s total is 516 and yours is only 321, is that just luck? Well, in theory it could be. It’s just not very likely that you will see such a large difference. He would have to have rolled a large number of 6’s to get to that total score. It could happen once in a blue moon, sure, but at the same time it wouldn’t be unreasonable to suspect a loaded die. Or?

A better approach here would be to not begin with trying to decide whether the difference is random or not, because right now we don’t know, but rather to start with determining how likely such an extreme random difference would be. Maybe the difference isn’t that big after all when it comes to probabilities?
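
For the dice, this likelihood can even be computed exactly by counting every possible way 100 rolls can add up. A small sketch (the function names are my own):

```python
def dice_total_counts(n_rolls, sides=6):
    """Number of ways to reach each total with n_rolls fair dice."""
    counts = {0: 1}
    for _ in range(n_rolls):
        nxt = {}
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return counts


def prob_total_at_least(threshold, n_rolls=100, sides=6):
    """Probability that the total of n_rolls fair dice is >= threshold."""
    counts = dice_total_counts(n_rolls, sides)
    favourable = sum(ways for total, ways in counts.items() if total >= threshold)
    return favourable / sides ** n_rolls


print("P(total >= 516 in 100 rolls):", prob_total_at_least(516))
```

The expected total for 100 fair rolls is 350, so a total of 516 sits far out in the tail and the printed probability is vanishingly small.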

Fortunately, there are ways to determine this likelihood for various scenarios. In the case of the schools or the dice you can use a fairly simple statistical test called the Mann-Whitney U-test. If the test score is high enough, it indicates that the probability that the difference in dice totals is just random is very low.

You typically set a limit beforehand as a decision rule. In smaller studies where the results aren’t life critical, a 5% limit, a so-called 5% significance level, is standard. So if we were to do the 100 dice rolls over and over and you would see differences of the magnitude of 516 vs 321 only in less than 5% of the trials, then we have decided that it is so unlikely that we are better off looking for other explanations than just chance. I.e. we would rather suspect that your friend is cheating.

We will use this same method when looking at the race results on ZwiftPower next.

Method

We will look at HR distribution graphs on Zwift.com among the top 3 in 100 consecutive races in the recent past, in both cat A and cat C.

If a rider spends the best part of his time in the race in a higher HR zone than the other two, visibly so, then that rider has worked harder. The HR graphs aren’t a perfect description of everyone’s effort, especially when HR zones aren’t tuned to an individual, but on average they should be close enough, and we are looking at 300 riders in each category. It will likely average out.

If the winner of a race has worked harder than the rest of the podium, then we will score that race as 0, meaning nobody worked harder than him. If one of the other two guys has worked harder than the winner, then we will score the race as 1, meaning one guy worked harder than the winner. If both of the other riders worked harder than the winner, then we will score the race as 2, meaning two others worked harder than the winner.

If there is no HR data available for someone on the podium, we will skip that rider and instead look at the next guy on the results list. It is not uncommon that HR data is missing and the typical reason is that the rider’s Zwift profile is set to private. So if the winner has no HR data, then we will compare the no 2 guy to the no 3 and no 4 guy instead. And if the no 3 guy has no HR data, we will compare the winner to the no 2 and no 4 guy instead. The reason we do this is that the display of all recent races on ZwiftPower is somewhat limited and we need to make sure we get a sample size big enough, 100 races. And it should really make no difference when it comes to our assumptions, or our hypothesis in this study. More about that below.
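
The scoring rule, including the fallback for missing HR data, can be sketched like this. In the study the effort is judged by eye from the HR graphs. The numeric effort value used here (think of it as the share of race time spent in the upper HR zones) is a hypothetical stand-in, and counting a tie as "not harder" is my assumption.

```python
def score_race(efforts):
    """Score one race as 0, 1 or 2.

    efforts: one effort value per rider in finishing order, or None when
    the rider has no HR data. The first three riders with data are used,
    and the best placed of them is the reference rider.
    """
    available = [e for e in efforts if e is not None][:3]
    if len(available) < 3:
        raise ValueError("need three riders with HR data")
    reference, *others = available
    # Count how many of the two following riders worked harder than the reference
    return sum(1 for effort in others if effort > reference)


print(score_race([0.9, 0.8, 0.7]))        # winner worked hardest
print(score_race([0.7, 0.9, 0.6]))        # one rider worked harder than the winner
print(score_race([None, 0.7, 0.9, 0.8]))  # winner has no HR data, no 2 becomes the reference
```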

Once we have scored 100 races in cat A and cat C, we will then compare the results using the Mann-Whitney U-test. If there is a difference big enough to be statistically significant (remember the 5% rule here), then and only then will we draw uncomfortable conclusions.

Hypothesis

Assume we are with the ZP team and we LOVE the W/kg category system. We firmly believe it is fair and reasonable. Every sport should be categorized with W/kg, we think. There is no better option. We just need to get rid of those pesky sandbaggers first somehow…

Then what do we expect in a race with regards to relative effort levels among the top contenders? Perhaps there are two possibilities here. We could for example assume that the strength and prowess among the top contenders is roughly the same. So why does someone come out on top? Because he works harder than the others. All else equal, on average, someone working harder than the others will win. So we expect the winner to have worked the hardest (score 0).

Or we could assume that winning a race isn’t just about working hard, even if you are as fit as other top contenders. It is also about random events in the race, such as splits and breakaways and powerups and whatnot. Maybe those random events, a.k.a. luck, play such a large part in a race that we can’t separate the podium places with differences in effort levels. So instead we assume that the relative effort among the top 3 will be roughly the same. Obviously, the top 3 will be more fit and potentially also work harder than the ones coming in last in a big race, but among the top 3, we assume that the effort of each respective rider will be about the same, if not in every race then at least on average in 100 races. Thus what we will not see is a tendency for score 2 in a lot of races. Rather, races will converge around score 1. 

And what do we expect when comparing cat A with cat C? We expect to see no difference in relative efforts in the two categories. Cat A riders might be used to working harder but when comparing the top 3 in a cat A race, there should be no greater differences among them than among the top 3 in a cat C race. Overall effort levels may or may not differ between cat A and cat C, but the pattern of relative effort within the podium should look the same in both categories.

Possibly, since we make no distinction between A and A+ riders, and since it is not uncommon that a cat A race is won by an A+, followed by two A riders, we might find a slight tendency for cat A winners to work a little less hard than the rest of the podium. We do not, however, expect to see this in cat C. Because cat C is fair and the W/kg system is appropriate in Zwift, or so we claim.

The “Oh Shit!” Scenario

Now, if we were to find that there is a tendency for cat C winners to work less hard than the rest of the podium, and that there is less of that tendency in cat A, then that would scare us. Because it is unintuitive. Why should races be won by people who work less hard than others, especially when there is an upper limit to performance (W/kg) in a category? We wouldn’t like that. It goes against the nature and ethics of the sport and would distance us from outdoor cycling too.

And it may also indicate that the phenomenon of cruising is a real issue in the lower categories, i.e. that some riders exploit the W/kg system on ZwiftPower by staying behind in a category they are too strong for, making sure they don’t go over W/kg limits, and thus get an unfair advantage in races over riders who couldn’t go over limits due to fitness and who would have to (and will) work extremely hard to finish anywhere near the top.

Results

100 races were sampled starting Fri 7 Aug 2020 and forward in cat A and cat C. According to the scoring method described above, cat A got a total score of 80 whereas cat C got a total score of 106. 

In 43 races in cat A, the top 1 guy worked harder than the following two. In 34 races in cat A, one following rider worked harder than the top 1 guy. In 23 races in cat A, both following riders worked harder than the top 1 guy.

In 29 races in cat C, the top 1 guy worked harder than the following two. In 36 races in cat C, one following rider worked harder than the top 1 guy. In 35 races in cat C, both following riders worked harder than the top 1 guy.

The Mann-Whitney U-test gives a test score of -2.15, which translates into a probability, a p-value, of 0.032 (3.2%) for a random occurrence. This is lower than the 5% limit we set. There is indeed a difference between the categories, and it goes in a direction we did not expect. We expected either no statistically significant difference between the two categories or, if there was one, a tendency for winners in cat A to work less hard relative to the rest of the podium than winners in cat C. Instead it is the cat C winners who more often work less hard than the riders behind them. Hence we have to draw the conclusion that we cannot refute the “Oh shit!” scenario.
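
For anyone who wants to check the numbers, here is a sketch that recomputes the totals and the test score from the race counts above, using the normal approximation of the Mann-Whitney U-test. The function names are my own. The post does not say which variant of the test was used. Leaving out the correction for ties is what reproduces the -2.15, so that is the default here.

```python
import math


def total_score(counts):
    """Total score from counts of races scored 0, 1 and 2."""
    return sum(score * n for score, n in enumerate(counts))


def mann_whitney_z(counts_a, counts_b, tie_correction=False):
    """z-score of the Mann-Whitney U-test (normal approximation).

    counts_a[i] and counts_b[i] are the number of races with score i.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0
    first_rank = 1  # first rank in the current block of tied scores
    for c_a, c_b in zip(counts_a, counts_b):
        tied = c_a + c_b
        if tied == 0:
            continue
        mid_rank = first_rank + (tied - 1) / 2  # tied scores share the average rank
        rank_sum_a += c_a * mid_rank
        tie_term += tied ** 3 - tied
        first_rank += tied
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b * (n + 1) / 12
    if tie_correction:
        var_u -= n_a * n_b * tie_term / (12 * n * (n - 1))
    return (u_a - mean_u) / math.sqrt(var_u)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


cat_a = [43, 34, 23]  # races scored 0, 1, 2 in cat A
cat_c = [29, 36, 35]  # races scored 0, 1, 2 in cat C

z = mann_whitney_z(cat_a, cat_c)
z_tied = mann_whitney_z(cat_a, cat_c, tie_correction=True)
print("totals:", total_score(cat_a), total_score(cat_c))
print(f"z = {z:.2f}, p = {two_sided_p(z):.3f}")
print(f"with tie correction: z = {z_tied:.2f}, p = {two_sided_p(z_tied):.3f}")
```

Without the tie correction this gives the totals of 80 and 106, a test score of about -2.15 and a p-value of about 0.03. With the tie correction the test score gets larger in magnitude, so the conclusion stays the same either way.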

Conclusions

The “Oh shit!” scenario is real. We do not live in the best of all Watopias. We live in a Watopia where it pays off to work hard in cat A but apparently not so much so in cat C. We live in a Watopia where the category system makes us behave weirdly in races in the lower categories B-D. We live in a Watopia where you can get away with cruising, even on ZwiftPower.

Now we have a choice. We can either accept that racing is inherently unfair in the lower categories and just live with it. Or we can, inspired by other working and efficient category systems in real-life sports, find a new category system that would prevent not only sandbagging but also weird discrepancies such as the one we just looked at, a system that would also unchain racers in all categories and prevent cruising.

Your choice. I have made up my mind already.
