Race Report: 15 SEP 2020, 3R Sundown Race Series, Watopia Figure 8, 1 Lap

With a shiny new license to cheat from ZwiftPower and with strong assurances from Zwift that the license is not going to get revoked, I quote, “anytime soon”, I am of course eager to get started. I have been waiting for this moment for some time.

As Fortune would have it, however, I ended up in an unusually tricky first race for a ZP approved cruiser. As I clicked Save after the race I had literally no idea of how I was doing. Well, you never do as a cruiser, or even as a legit racer in the lower cats, thanks to the Zwift and ZwiftPower race rules, but this was worse than usual. I could see four different scenarios before me. 

It might have been a DQ

It might have been gold

It might have been silver

It might have been sixth

With the downgrade to “D almost C” and with two weeks left before a nasty 2.64 W/kg in my 90 day average on ZP is washed away, I was scared I would go over limits slightly already in my first race, which I couldn’t afford just yet. Later, at my current 75 kg, I will be able to do 195W over 20 min and get away with it, but I had decided to stick with the straight 1-hr 2.5 W/kg of 187W as the upper limit for my average. I did monitor my average the whole time of course. But you know, when you do the crime, thoughts on having to do the time do pop into your mind. Doing another 90 days behind the bars in cat C didn’t seem all that appealing now that I was finally free. It shouldn’t have been a DQ. I had watched my Wahoo app power meter closely. But what if still somehow…?

Best case scenario was gold or silver. It all depended, firstly, on the guys in front of me. Some of them seemed to be sandbagging. Some of them were no sandbaggers but had very high power variability and thus looked like the quintessential cruisers, possibly on the cusp of getting DQ’d, but I wasn’t sure at all. I decided to drop from them after a while as I was pressed on time and had less than two minutes to get my average down 5W to be on the safe side in the first 20 min block. 

And then, secondly, there was the guy I spent the most time with in the race. He looked potentially legit but wasn’t house broken and just wouldn’t let me exploit him. Since he wasn’t playing ball (cruiser ball) and since he was obviously heavier than me, it was too dangerous to keep up with him so I ended up letting him go eventually. If he turned out to be Mr Gold later on ZP, then I’d happily settle for a silver and call it a day.

But it could also have been sixth. I don’t know why that number popped up in my head. But it represented a scenario where I was way behind in the field. It was a possibility too.

It turned out I actually got a sixth!

I should perhaps point out that I am not disappointed at all. I’m happy to have gotten away without a DQ in my first ZP approved cruiser race and have made a mental note to research my races a little more next time. But I would like to discuss the race anyway as it touches on some interesting points I have been discussing on the blog recently. 

So why the meager result? It turns out there are explanations and that this race was actually unbeatable for a cruiser. Let’s look at the race report.

First thing to note is that – surprise, surprise – twice as many riders were sandbaggers and/or unregistered as were cleared by ZP. Seven riders got removed from the ZP results ahead of me, all above 2.5 W/kg. All these people make it harder for legit riders and particularly for cruisers like me to read the race.

The next thing to note, which isn’t visible from the edited report, is that there is a team in the race. Although they get split up (in front of me and behind me) it probably had some kind of impact on the race, e.g. that the pace at start was exceptionally slow and only picked up in the first climb. Nothing wrong with running a team in the race, but it should be noted there was one.

So let’s go over the podium. In 1st and 2nd place are two children. Neither of them crosses 200W in their averages and so they get away with W/kg that would make them competitive in cat C and even cat B. They will never be upgraded by ZP since they technically never go over limits and can go on and win races in cat D indefinitely. You can’t win against them as a cruiser, it’s impossible. Even if not holding back (and thus getting DQ’d), I just might have had a chance against young Mr Silver last summer but certainly not today. And young Mr Gold I couldn’t beat even in my wildest long-term goal fantasies.

Then we have Mr Bronze. He actually goes with Mr 4th as they are part of a team, although they are not obviously “colluding”, which again would have been perfectly fine, at least they weren’t toward the finish. It must have taken more than a sprint for Mr Bronze to gain 15 sec on Mr 4th. He dropped his buddy.

However, look at the weights of those two guys. They are true heavy-weighters. No wonder I had a hard time getting my average W down the first 20 min! They are doing 66W more than me! If you don’t understand why this is of huge importance, then you need to read my recent post on the Light Rider’s Curse. Someone like me can’t beat these guys without going over limits. Unknown to me at race start, as I didn’t research the race at all beforehand, the podium was entirely out of reach already from the beginning, even if I hadn’t been cruising and went over limit. Huge weight differences (both up and down) and an idiotic race cat system are the reasons.

Were the heavies cruising? Both have unusually many wins in their race history but I think those can mainly be attributed to the weight advantage, at least in the case of Mr 4th. While not unusual at all, Mr Bronze clearly had spare capacity in this race, and without having had a look at his previous race efforts, I would assume that he could advance to cat C quickly if he committed to it. Like in three races or so. Mr 4th’s race effort is higher. In fact, it is similar to mine as mine was driven up more than expected by this rather strange race. But of course, seeing as I know I had spare capacity too, although not at all as much as Mr Bronze, I believe Mr 4th has it in him to advance to cat C too, should he want to.

Then we have Mr 5th, the guy who wouldn’t let me suck his wheel (and didn’t even seem interested in taking turns with little me!). Since it was just the two of us completely isolated for quite some time, during which I couldn’t draft and get my average down more efficiently, I had ample time to study him. It was clear that he was heavier than me since i) he blasted past me from behind on descents making him hard to lose on this somewhat hilly course, and since ii) my W/kg average got dangerously high trying to catch him. I never quite realized at the time that the weight difference was that large though. A tough adversary for a fairly light cruiser, even under different circumstances, since he was not quite racing flat out (he was definitely not on a recovery ride though, don’t get me wrong). Another day in another race I would have settled for silver behind him. The heavies are a real menace to us not-so-honest not-so-hardworking cruisers…


W/kg Cats Fail 2: The Light Rider’s Curse

The Heavy Rider’s Disadvantage

It could be debated if you can call it unfair, but heavy riders do suffer a disadvantage in Zwift races. The lighter riders have an easier time uphill and it is so hard to match the Watts needed to keep level with the lighter rider’s W/kg there.

The above is a very common complaint on various Zwift forums. But is it true?

I like to question those self-evident truths we all take for granted. If they are indeed truths then there is no harm in validating them. Sometimes, however, they turn out not to be true after all, once you actually take a serious look at them. So what about rider weight and race results in Zwift? Let’s take one of those serious looks for a change instead of just passing on what some other guy said in a one-liner on the forum.

The Light Rider’s Advantage

Without any prior knowledge about the impact of weight in Zwift racing we could assume three things:

1. There could be advantages to being light

2. There could also be disadvantages to being light

3. If there are both advantages and disadvantages to being light, perhaps depending on scenarios, then you could compare those advantages and disadvantages, weigh them against each other, and come to some kind of conclusion regarding the net effect of being light – is it more good than bad to be light, or is it the other way around?

So let’s start by looking at possible advantages to being light, since people say there are such advantages. There are no obvious advantages on the flat, and everyone seems to agree (we will get into details on this further down). What heavier riders say instead is that they have a hard time against light riders in climbs. 

On the flat speed is mainly maintained by momentum, so pure Watts is king and heavier riders can usually (although not necessarily) push higher Watts than a lighter rider with a smaller frame and less muscle volume. But in a climb W/kg is king. Body weight comes into play, and maybe it is easier for a light rider to attain a better ratio between Watts and body weight than it is for a heavier rider, especially a heavier rider with a few surplus kilos.

The above is reasoning taken from riding outdoors and in a different setting than Zwift racing with its unique and uniquely stupid rules. But it is actually a completely flawed argument and you need to fully understand why. The explanation is two-pronged. We start off with some physics. 

Question: A rider at 90 kg is time trialing against a rider at 70 kg up the Alpe du Zwift climb. Both are keeping the exact same lines and both are able to keep dead steady, ERG-like Watts. Both are doing exactly 3.19 W/kg. Who will win?

Answer: The lighter rider will win. By a few seconds. But it has nothing to do with Watts or weight. The lighter rider will win because he has an ever so slight advantage in drag, having a smaller frontal area. 

It’s similar to choosing between bikes in your garage before a Zwift climb. One frame will be ever so slightly faster than the other. However, when was the last time you saw a race up AdZ and only that? There are no such races. The closest you can get to that scenario in a Zwift race is a race on Road to Sky, a route which has quite the approach to the mountain, and the approach is flattish. So if we staged an iTT on the Road to Sky course, then this advantage in drag for the light rider, mere seconds, is more than offset by the heavier rider’s advantage on the flattish approach to the mountain. On Road to Sky, or even Ven-Top with its very short approach, the heavier rider will win!

Also, what you need to understand is that if it wasn’t for the small difference in drag between the two riders, if they both raced in vacuum, then if both started at the same time at the foot of the climb, both riders would arrive at the finish exactly simultaneously. Because if we ignore the drag issue, then 3.19 W/kg is 3.19 W/kg. It doesn’t matter what you weigh. You will travel up the mountain at exactly the same speed. That’s what the measure W/kg implies, it’s its purpose, to equalize riders to make a comparison possible.

A heavier rider could in theory have a hard time producing high enough Watts to be able to match the W/kg of a lighter rider in a climb. But in our example we assumed that both climbed at exactly 3.19 W/kg, so the heavier rider already compensated his higher weight with higher Watts. And thus they are both traveling at the exact same speed up the mountain.
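If you want to check the vacuum argument with numbers, here is a minimal sketch. It assumes all power goes into lifting the rider up the slope: no drag, no rolling resistance, no bike weight. The function name and the 8.5% gradient are my own illustrative choices and not anything taken from Zwift's physics engine.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def climb_speed_kmh(watts: float, rider_kg: float, grade: float) -> float:
    """Steady climbing speed when all power goes into lifting the rider.

    Simplification (an assumption, not Zwift's physics): no air drag,
    no rolling resistance and no bike mass.
    """
    slope_angle = math.atan(grade)                 # grade 0.085 means 8.5 %
    force = rider_kg * G * math.sin(slope_angle)   # gravity component along the road
    return watts / force * 3.6                     # m/s -> km/h

GRADE = 0.085  # assumed gradient, for illustration only

# Both riders hold exactly 3.19 W/kg, so Watts scale with body weight.
light = climb_speed_kmh(3.19 * 70, 70, GRADE)
heavy = climb_speed_kmh(3.19 * 90, 90, GRADE)
print(f"70 kg rider: {light:.2f} km/h")
print(f"90 kg rider: {heavy:.2f} km/h")
```

The weight cancels out of the formula, which is why both lines print the same speed. Any real difference has to come from the things the sketch leaves out, such as drag.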

Now here comes the second prong of the argument. Put the above in relation to the W/kg cat system, with the performance ceilings in cat B-D. To be competitive in any cat B-D, you typically need to be able to put out W/kg at or very close to the performance ceiling, be it 2.5 W/kg, 3.2 W/kg or 4.0 W/kg. So to win a race on any course in, say, cat C, you need to be able to hold 3.19 W/kg, or someone else could come and do the 3.19 W/kg and beat you (there’s plenty of such riders). Agreed?

So to win a race up AdZ you thus need to be able to hold 3.19 W/kg. Assume you are a contender, someone who could actually win in cat C. Then you will be able to race Road to Sky at 3.19 W/kg. If you are indeed one of those riders who could, then as we just concluded your weight doesn’t matter at all. And we already know that there are heavy riders who can do 3.19 W/kg up AdZ, and there are light riders who can do the same. Both kinds race up the climb at almost the exact same speed, bar the minuscule difference in drag. In fact, given that you are a contender, you are advantaged by being heavy on Road to Sky since you will be naturally faster in the approach and might thus either get a head start or save some energy before the climb.

GET THIS:

There is no advantage to being light in Zwift racing!

And this is because of the W/kg cat system. Without it things would be different. With a results-based categorization, a race on Road to Sky would favor lighter riders, whereas the heavies would still reign on Tempus Fugit. You would have to specialize and play to your unique advantages, just like in real cycling.

So if there are no advantages to being light in Zwift, could there still be disadvantages?

The light rider’s disadvantages

This post is named the Light Rider’s Curse, which refers to a tendency in Zwift racing. Many light riders have first-hand experience of improving fitness to the point where they reach the top of their current race category. Or rather what should have been the top of the race category. Only it isn’t.

You would think that being able to average e.g. 2.5 W/kg in cat D would make you competitive there. But that is not necessarily the case. First, you have to beat the cruisers. But even if we take the cruisers out of the picture, it can still be surprisingly hard for a light rider to get anywhere near a podium in the average Zwift race.

So they do what anyone would do in that situation. They try to improve fitness further still. Shouldn’t that help getting to a podium then? No, that’s just that final push that tips them over to the bottom of cat C. They got upgraded before they even saw a podium. 

Why is this? Is this real or just some bad excuse from failed light racers? It all seems so counter-intuitive. As a light rider you should have an advantage against the heavies in the hills, said a guy in a one-liner on the forum. And being able to do 2.5 W/kg you should have no problem getting a decent shot at the podium, right? So why don’t you win?

It’s because of this:

Someone doing 300W on the flat is going faster than someone doing 275W.

Yeah, of course he is! So what?

Well, what if it’s a semi-flat cat C race and the guy doing 300W weighs 94 kg? That’s 3.19 W/kg, within ZP’s cat limits. And what if the guy doing 275W weighs 77 kg? That’s 3.57 W/kg, way over limit. See the problem?

The heavy guy wins the race and the light guy, being slower, isn’t anywhere near a podium but is still a disgusting sandbagger who deserves a DQ. But this never happens in real-world cycling, only in Zwift. And it’s because of the W/kg cat system that no other sport uses. 
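For those who like to see the numbers spelled out, here is the same example as a few lines of code. The 3.2 W/kg ceiling is the cat C limit used throughout this post; the helper name is just something I made up for the illustration.

```python
CAT_C_CEILING = 3.2  # W/kg, the cat C limit as used in this post

def w_per_kg(watts: float, weight_kg: float) -> float:
    """Plain power-to-weight ratio."""
    return watts / weight_kg

# (Watts, weight in kg) for the two riders in the example above
riders = {"heavy guy": (300, 94), "light guy": (275, 77)}

for name, (watts, kg) in riders.items():
    ratio = w_per_kg(watts, kg)
    verdict = "within limits" if ratio <= CAT_C_CEILING else "over limits"
    print(f"{name}: {watts} W at {kg} kg = {ratio:.2f} W/kg ({verdict})")
```

The rider pushing fewer Watts, and therefore going slower on the flat, is the one who ends up over the limit.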

Specifically, it’s because of the W/kg ceiling of the lower cats in combination with ZP disqualifying racers afterwards, racers that they themselves allowed into the race. But you can’t have a performance ceiling in sports. And you should never have to disqualify a contestant for being “too good” in sports.

Most races consist of mainly flattish stretches and then some shorter climbs. At the W/kg ceiling of a cat, i.e. in that front group with the riders that actually have a chance to win the race, a light rider can in theory never match the speed of a heavier rider without going over limits and getting a DQ or even an upgrade, not unless the heavier rider is a cruiser. It’s simple maths.

If it’s simple maths in theory, then it should show in data too. So does it? Let’s find out!

Weight Study 1 – A Mix of Races

I grabbed some fresh data from ZwiftPower, a sample of 50 consecutive cat C races of all sorts (distances, elevation, etc). I only skipped races where

i) weight data was missing

ii) there were fewer than 6 cat C finishers according to ZP

iii) the race type didn’t lend itself to this test (like e.g. Hare & Hounds, age category or TTT races).

Then I compared the average weight of the 3 riders on the podium to the average weight of the other riders in the race (hence why I wanted at least 6 finishers).

Results

The podiums in the races had an average weight of 81.3 kg.
The remaining riders in the races had an average weight of 77.5 kg.

This nearly 4 kg difference between the average podium winner and the average loser turns out to be highly statistically significant, even at the 1% level (p = 0.00118). For those who aren’t into statistics, this means that it is extremely unlikely that this difference wouldn’t appear again and again if we picked some other random set of 50 races from the ZP database. And thus we can’t refute that there is indeed a difference in average weight between winners and losers. Winners are somewhat heavier on average. It is not bad to be heavy in Zwift races, quite the opposite. It is bad to be light in Zwift races. The results prove it.
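If you want to try this kind of comparison yourself, here is a rough sketch of how it could be done. Two loud caveats: the weights below are invented for illustration and are NOT the ZwiftPower sample, and the test shown (an exact paired sign-flip permutation test) is only one plausible choice, not necessarily the test behind the p-value above.

```python
from itertools import product
from statistics import mean

# Hypothetical finisher weights in kg per race, in finishing order.
# Invented for illustration only, NOT the ZwiftPower data.
races = [
    [84, 80, 79, 75, 72, 77],
    [90, 78, 82, 74, 76, 70, 81],
    [76, 85, 83, 79, 73, 75],
    [88, 81, 77, 80, 69, 74, 72],
    [79, 86, 80, 71, 78, 76],
    [82, 75, 84, 77, 73, 70],
]

def podium_gap(weights):
    """Mean weight of the top 3 minus mean weight of everyone else."""
    return mean(weights[:3]) - mean(weights[3:])

def sign_flip_p_value(gaps):
    """Exact one-sided paired permutation test.

    If weight had nothing to do with results, each race's gap would be
    as likely positive as negative, so we try every sign combination and
    count how often the mean gap is at least as large as the observed one.
    """
    observed = mean(gaps)
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(gaps)):
        total += 1
        if mean(s * g for s, g in zip(signs, gaps)) >= observed:
            hits += 1
    return hits / total

gaps = [podium_gap(race) for race in races]
print(f"mean podium-vs-rest gap: {mean(gaps):.1f} kg")
print(f"one-sided p-value: {sign_flip_p_value(gaps):.4f}")
```

With only six made-up races the p-value can never get very small. A real sample of 50 races has far more sign combinations, so enumerating them all is impractical and you would sample them at random instead.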

The W/kg cat system screws light riders. I will give a more detailed example than the simple theoretical one above. Let’s work through this.

Assume the following:

-You are racing in the front group in cat C (for some reason there are no sandbaggers this time…)
-The group keeps a steady pace and you are at least 20 min from finish
-You weigh 75 kg
-You are on the wheel of a bigger guy @ 85 kg
-You are both in draft
-The big guy is able to hold a 20 min average of 286W, i.e. 3.2 W/kg according to ZP (286 x 0.95 = 272. 272/85 = 3.2)

The only way you can stay on his wheel is by matching his 286W. This would put you at (286 x 0.95)/75 = 3.6 W/kg. Keep at it for 20 min (if you can) and ZP will give you a DQ. People might even call you a sandbagger! You simply can’t win this race as a light rider and get away with it on ZP. It’s not just hard. It’s impossible.
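Here is the same calculation as a small sketch, plus the reverse question: how many Watts can each rider hold for 20 min before hitting the ceiling? It assumes, as in the example, that ZP takes 95% of your 20 min power and divides by weight. The function names are mine.

```python
FTP_FACTOR = 0.95    # 1 hr estimate = 95 % of 20 min power, as in the example
CAT_C_CEILING = 3.2  # W/kg

def zp_w_per_kg(watts_20min: float, weight_kg: float) -> float:
    """W/kg as estimated from a 20 min average, per the example above."""
    return watts_20min * FTP_FACTOR / weight_kg

def max_watts_20min(weight_kg: float, ceiling: float = CAT_C_CEILING) -> float:
    """Highest 20 min average that still lands on the ceiling."""
    return ceiling * weight_kg / FTP_FACTOR

for kg in (85, 75):
    print(f"{kg} kg holding 286 W: {zp_w_per_kg(286, kg):.2f} W/kg, "
          f"allowed up to {max_watts_20min(kg):.0f} W")
```

Under these assumptions the 75 kg rider runs out of legal Watts more than 30W before the 85 kg rider does, and on the flat it is the Watts that decide who stays on whose wheel.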

Guys weighing 75 kg with a 1 hr FTP of 272W according to ZP will already have been upgraded to cat B. They will have seen very few podiums back in cat C if they were up against heavier riders. Which they were. And data supports our simple maths theory and the existence of a Light Rider’s Curse.

The Objection

But wait a minute! “Assume you are both in draft…” Granted, draft in Zwift doesn’t give quite as much help as outdoors but it is certainly a factor. What if these heavier winners are just better at drafting? It seems unlikely. Why wouldn’t drafting skills be evenly spread out over riders of all weights and sizes? But it’s a good idea to eliminate draft when you are doing a study like this. So how could we eliminate it? By studying only individual time trials instead. On a TT bike you can’t draft.

Weight Study 2 – Only TT Races, No Draft

So instead I scraped 40 consecutive iTT races in cat C from ZP. What were the average weights for the podium vs the rest of the field? Was there a difference? And was it statistically significant (i.e. not random)?

Results for iTT’s in Cat C

Podium avg weight: 83.9 kg
Losers avg weight: 78.1 kg
Difference: 5.8 kg
Statistical significance: p=0.00004 (probability of a random sample/event resulting in such a difference)

Conclusion: The difference is not random. In fact, a pharma company doing a study on a new promising medication would do wheelies and open up the champagne if getting results of this magnitude. So heavier riders do have an advantage in cat C, even in iTT’s where there is no draft.

“Ok, but maybe this is exclusive to cat C. I don’t care about the fat noobs in cat C anyway. I race in B.”

So let’s look at cat B too.

Results for iTT’s in Cat B

Podium avg weight: 77.7 kg
Losers avg weight: 73.0 kg
Difference: 4.7 kg
Statistical significance: p=0.00007

Conclusion: The difference is not random. We can see that people weigh less in cat B, just as I predicted in an older blog post, but there is still a clear advantage for the relatively heavier rider, even without draft.

“Uh-oh… and you mean the reason for this is that both cat C and cat B have a performance ceiling (3.2 W/kg and 4.0 W/kg) that will weed out lighter riders trying to match the speed of heavier riders?”

Exactly!

“A-ha! Gotcha! But cat A doesn’t have a performance ceiling! So if their iTT winners are heavier than the losers too, then your argument implodes!”

Yes, that’s right. It would. We’d have to come up with some other explanation for the differences. Not that I can think of any. But let’s worry about that later. First let’s look at cat A the same way. If we see the same difference, then I’m in trouble. However, if we don’t see the same difference… then the W/kg cat system is in trouble. If I lose, I’ll go jump off a bridge. If the W/kg cat system loses then… it can go jump off a bridge.

Results for iTT’s in Cat A

Podium avg weight: 68.8 kg
Losers avg weight: 69.9 kg
Difference: -1.1 kg
Statistical significance: p=0.18

Conclusion: There is a small difference, but it is pointing in the other direction (better to be light) and it is quite possibly just random. We would get a difference like this almost every 1 in 5 samples from the ZP database. So we conclude that there is no difference in weights between podiums and losers in cat A iTT’s. There is no disadvantage to being light in cat A, where there is no W/kg ceiling stopping you from doing your best.

GET THIS:

There is no advantage to being light in regular Zwift racing, but there are clear disadvantages. Hence the net effect of being light is negative. Or to spell it out: It sucks to be light in Zwift. Heavies have the upper hand. Always, on any course.

The Light Rider’s Curse is a reality. Now where’s that bridge? I have a cat system to escort there.

And don’t you ever come to the forums and complain about being heavy again. You are wrong. Don’t spread misinformation. Either you are not able to perform at the performance ceiling in your category, and then it doesn’t matter if you are light or heavy. You will get dropped in a climb either way. Or you are a real contender and can touch the upper W/kg limit, and then you are not disadvantaged at all by being heavy. In fact, you have the upper hand.

The Takeaway

So what is your takeaway from this post? That you should go buy some cake, french fries and some jars of peanut butter and start gaining weight? No, it’s not weight per se that gives an advantage but the part of it that is muscle volume. And we are talking muscle volume in absolute terms, not relative terms where you factor in body fat.

If you are a little chubby with a nice dad bod but stay on top of your category in terms of W/kg, then you still more than likely have higher absolute muscle volume than a rider 30 kg lighter than you. This translates into higher Watts. And that still gives you an advantage, even if that lighter rider can produce the same W/kg as you.

Of course you only stand to gain from losing excess body fat. It will improve your W/kg if nothing else. Just watch it so you don’t get that dreaded upgrade. You can always do what I do. Cruise!


W/kg Cats Fail 1: The Sprint Race Catapult

The Zwift W/kg category system needs to go. We have talked about it before. A few times. But it is important to understand that the reason why the W/kg cat system is so terribad is not just that it allows for and incites cheating in the forms of sandbagging and cruising. It also does a lot of other stupid things to Zwift racing.

In this and the next blog post we are going to discuss two of those things. I have dubbed them the Sprint Race Catapult and the Light Rider’s Curse. First up is the Sprint Race Catapult. No, it’s not an instruction on how to get yourself catapulted over the finish line in race sprints. It’s a complaint over how sprint races tend to catapult you into a category where you don’t really belong because of how the W/kg cat system works.

A few posts ago I discussed the power curve. Let’s go over it quickly again. A power curve looks something like this.

No two riders’ power curves are exactly the same but all power curves are more or less the same in that it is always roughly the same downward slope with roughly the same shape. 

Your power curve is a continuous mapping of what kind of Watts you can produce over different time frames. You can only keep really high Watts over a sprint for a few seconds, Watts that you couldn’t possibly keep up for 20 min. And your 20 min performance won’t last you a full hour but the difference is not that big. In fact, Zwift reckons you could do 95% of your 20 min power over a full hour, and that is how it arrives at your 1 hr FTP from a 20 min test.

If we wanted to, knowing your weight, we could also plot a corresponding curve for your W/kg over a time scale. Most riders in cat D can actually race with cat A and keep similar W/kg. For a minute or so… But the longer the effort, the more the average W/kg is going to drop. And here is yet another example of why the W/kg cat system fails.

Most races in Zwift are roughly 20-30 km. Some go above 40 km. Longer races than that are rare. Then there are also races shorter than 20 km. The crits in the lower categories tend to be in “the teens” length wise. There are also sprint races that go on for less than 10 km.

Racing a sprint race is very different from a standard 30’ish km race. In real life endurance sports, with cat systems that don’t suck, races over different distances are treated differently. In US road racing and MTB you get upgraded from your cat by racing actively and by collecting race points over a season. Winning a short race does not award you as many points as a longer race. It’s not that the shorter race is easier to win, it’s just a different beast, but the categories would get screwed up if the system didn’t take race distance into account somehow. 

As a different approach, in cross-country skiing your rank upgrade from a race depends on the time gap between you and the winner. At one point in the calculation your rank gets multiplied by that time gap seen as a percentage of the total race time. So if your finish time is 1 min slower than the winner in a 10 min sprint race, then your finish time is actually 110% of the winner’s and your rank gets multiplied by that number (you want a low rank score in skiing). But if it’s a 1 hr race and you are 1 min slower than the winner, then that extra minute is just a +1/60th of the winner’s finish time, so your rank is multiplied by a mere 1.017, which is worse for your rank than winning the race but still far better than losing by a minute in a shorter race. And in that sense race distance is taken into account also in skiing, by finish time differences as a proxy.
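The two skiing examples boil down to one ratio. The sketch below covers only that single step, your finish time divided by the winner’s. The real ranking formula has more to it, and the function name is made up for the illustration.

```python
def time_gap_multiplier(winner_minutes: float, gap_minutes: float) -> float:
    """Your finish time as a fraction of the winner's finish time."""
    return (winner_minutes + gap_minutes) / winner_minutes

sprint = time_gap_multiplier(10, 1)     # 1 min behind in a 10 min sprint
long_race = time_gap_multiplier(60, 1)  # 1 min behind in a 1 hr race

print(f"10 min race: x{sprint:.3f}")
print(f"60 min race: x{long_race:.3f}")
```

The same one-minute gap costs you much more in the short race, which is how race length sneaks into the ranking.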

The Zwift W/kg cats don’t take race distance into account at all. A sprint race is valued and treated the same as a 30 km race. And the performance ceiling for each cat (2.5 W/kg, 3.2 W/kg and so on) is the same regardless of race length.

You might already have first-hand experience of this obvious flaw, but if not then let’s imagine you have been racing mainly 30 km races and not only the flat ones. Your typical finish times will depend on your fitness and what category you are in, but if you are racing in a low cat it might be something like 40-50 min.

Now remember your power curve. If you are on top of your category, i.e. your W/kg is close to the ceiling in your cat, then your average race Watts in a 30 km race will be fairly close to your 1 hr FTP, because that far to the right in the power curve diagram your power curve doesn’t drop that fast anymore.

So you’re fairly comfortable at the top of your cat for the time being. You’re not cruising (let’s assume you aren’t). And then one night you get the stupid impulse to join a sprint race. Now, since you aren’t cruising you are not guarding your Watts. No, you do your best instead trying to beat the three cruisers in the race once the five sandbaggers are just specks on the Watopia horizon.

But remember the power curve. You can do a much higher sprint race W/kg effort than you can hold over 30 km. Let’s say you race in cat C. And so you go over limits. Oops! You get a DQ. If your last 30 km race was a 3.2 W/kg, then all it might take is one more sprint race over limits and your 90 day average is above 3.2 + 0.1 W/kg and ZP boots you to the next category. 

Well, isn’t that fair? Isn’t that working as intended? With the logics of ZP it is. If you are a sprint racer. But note here that your power curve has not changed one bit. You have not become stronger. You have just moved between different parts of the power curve in choosing races of different lengths. You are still a 3.2 W/kg racer in a 30 km race. So in effect, if your preference is to race primarily 30 km races, then you get booted to cat B by ZP while still being below the W/kg span of cat B. You get branded a cat B while still being a cat C. And a poor rider in the bottom half of cat B might not be able to compete with top cat C riders in a longer race even!

Lesson learned: Beware of the Sprint Race Catapult! And be wary of how race distance might affect your categorization in general.

The W/kg cat system is just all too stupid. There would be nothing for you to miss if Zwift came to their senses and replaced it. Absolutely nothing. Except perhaps that cozy feeling of familiarity. 

But what do I know? Maybe you would be too insecure without that cozy feeling. So perhaps you shouldn’t buy a new bike either. A new bike might not be as familiar to you as your old one on the first few rides. It might feel… different somehow… like… better. Scary! Nah, stick to your old bike and keep hugging your old system. We all need our security blankets now that we are grownups and mommy is not around anymore, isn’t that so? We might wet our bibs otherwise.


Why You Should Cheat Too

Sick again. Can’t train, can’t cheat. And it makes no difference whatsoever. In my absence thousands of others are keeping up the struggle, cheating indiscriminately in race categories B through D, whatever their underlying intent is (it absolutely doesn’t matter anyway). And you should be one of them. 

This is a call to arms. I just have to sell you the story first…

Why on earth would you want to be a superhero like me, gloriously crushing cat D or whatever helpless category you feel confident that you can hammer to smithereens? Wouldn’t you rather be the villain if you did? I insist, you wouldn’t. 

Would you agree that being right and being legal isn’t necessarily the same thing? You don’t have to make up your mind. Because here you have the one-time opportunity to be both right and technically legal at the same time, while still fighting a villainous system by its own means. Remember for example that time when Batman…

No, seriously, if you have come to agree with me at this point that some form of results-based categorization in Zwift is mandatory to make racing more fair and fun and more akin to any other sport in the history of mankind, then you will probably also feel that such a change couldn’t come too soon. Why wait?

Zwift are well aware of the issues. They aren’t stupid, although I do think they might weigh in counter-arguments regarding overall subscriber happiness that may border on stupidity, arguments that create hesitation. But awareness is not the problem. The problem is that nothing happens. 

So we could use a little urgency. And that’s where you come into the picture.

Making the cheating in Zwift an even more pressing matter should speed up the process, one would think. And if that doesn’t help, then nothing else will and we might as well give up on racing altogether. So what have you got to lose? Most races you sign up for will be ripe with cheaters anyway. You cannot get fair racing in Zwift as of today, so you are not giving something up for yourself by passing up on the opportunity to race fair (in vain). And you are also not really ruining anything for other subscribers either, because it was already ruined before you lined up at the start. The races are already messed up by sandbaggers and cruisers daily as is and the races don’t flow the way they should.

The last resort now – since nothing happens – is to reach some sort of critical mass. You and I know what is going on beneath the surface not just in Zwift but also on ZwiftPower. The racing isn’t fair and just. But there are still quite a few subscribers who haven’t fully realized it yet. They’re sniffing the stench but they can’t yet tell where it is coming from. Well, the sandbaggers they get. But not the cruising. And they still believe the W/kg system can be saved, e.g. by introducing even more oppressive mechanisms to racing, driving Zwift even further away from any other sport or from any ambition to establish Zwift as a credible e-sports platform to sell to the rest of the world.

These people will come to their senses sooner or later, one way or the other. I’m not worried. And no one will ever regret a move to a results-based system. No one is ever going to say “it was better before”, especially not these people who typically prefer to have the opinions everyone else seems to be having. The day Zwift says “here’s another system for you, it’s better” they will agree and never look back. Trust me.

So how to reach critical mass? The answer is massive cheating. Make the system implode. Make racing obviously pointless. It already is pointless, it is just not yet so ridiculously obvious as to make even the slow react, not when it comes to cruising.

So imagine for a second that you would. How exactly should you help yourself and others by contributing to critical mass? To speed up change?

No Weight Cheats

Don’t cheat with weight. There is massive weight cheating. I have stated before that some other types of cheating are the most common, but it’s likely not true. Weight cheating should be the most common form by far.

There is the blatant overnight super diet of course, people suddenly dropping double digit kg’s. They are easy to spot. But there is also the semi-conscious cheating, the constant slight underestimation of weight that even the “fair” racers contribute to. You know, the happy weights. The dating site weights. The “I better not get on the scales after this barbecue party, I might not like what I find” weights. The “Oops, I gained 4 kg over Christmas but umm… I’ll lose it soon anyway, what difference does it make?” weights. 

You may want to round down rather than up, because everyone else will. But you should be fair. Please don’t cheat with weight, because there is no realistic way to enforce accurate weight reporting around the corner. We need your weight for the simulation and we are not yet at the point where Zwift can demand of every racer that they buy Bluetooth scales that upload the weight online (there are such scales). The day may come – it would be a good thing – but it is too early for the bulk of the subscribers. So we are forced to rely on ourselves rather than a system to keep racing fair. It’s a weak solution but it’s all we have.

No Height Cheats 

The argument is the same as for weight cheating. There is no way to enforce accurate height reporting for the bulk of the subscribers. And to solve it with peripheral hardware is even harder than with weights and scales. There isn’t even any such hardware to my knowledge. And Zwift couldn’t really demand that all subscribers submit healthcare certificates testifying height, nor handle the administration. Long-term I’m sure there could be a solution, but again, we are not there yet. So just don’t.

No Hacking

Don’t tamper with hardware or software. It’s hard to enforce. It can also be hard to spot and you will want to be visibly cheating. Keep your smart-trainer calibrated too. You need to stay fit through all of this and would only do yourself a disservice by having it “out of tune”.

No Doping

This goes without saying.

Sandbagging?

Sandbagging does help bring racing closer to the inevitable collapse (that we might as well speed up), and you do stay within the confines of Zwift race rules when sandbagging. But you break ZwiftPower rules. And you also tend to draw attention away from the real problem with Zwift when sandbagging.

You don’t stop sandbagging by convincing subscribers to stop doing it, because it won’t succeed. You stop sandbagging by making it impossible, i.e. by enforced categories, something we still don’t have. You are highly visible when sandbagging, which is a good thing in this context, but there is the risk that people will think that the only problem with Zwift racing is you or the lack of enforced categories. Once they have that (and you disappear as a consequence) you might lose their attention. A results-based system needs enforced categories too, but enforced W/kg categories, while better than what we have, are not enough. So in a nutshell, you could sandbag. But there is a better option…

Cruise!

You should cruise. With ever more abundant cruising in our Zwift races, and with increased awareness over time among the subscribers, nothing drags out the flaws of the W/kg cat system into the light better than cruising. You just need critical mass. And critical mass needs you.

By cruising you also adhere to ZwiftPower rules. Technically speaking, you are doing nothing wrong. It is definitely cheating but it can’t really be held against you. And you will slip through. With sandbagging it is so easy to blame the offender rather than the system, which would be infinitely more constructive. With cruising it is so obvious that the problem isn’t you and that the proper remedy is not to change you, the subscriber, but to change the system. 

The W/kg system will have to fall. Will fall. And the sooner that happens, the better. And you can help speed up that process by cruising, ideally with full disclosure. People will hate you for it now. But they will thank you later. 

Well, they won’t actually thank you. They will pretend that they always thought Zwift should have moved to a results-based categorization and ignore you afterwards since you will be a painful reminder of their previous blindness. But they will be grateful for a better system. So you are helping them, and you are actively helping yourself. 

Trying to race fair today, while sandbaggers and cruisers are pounding you into the pixel tarmac over and over, is not helping yourself. It’s not really helping anyone as things stand, even though it may seem that way on the surface since you want to be one of the good guys. 

Unfortunately, the truly good guys are going to have to break a few eggs now. And unfortunately, by racing fair today you are not really defending moral standards at all. You are effectively just defending the cheaters by marginalizing them. As long as you can still sandbag and cruise within the system at all, and as long as the cheating can still with some effort be perceived as just a margin phenomenon, fighting the cheaters will only slow down change. 

Once the wider implications of the W/kg become apparent even to the ignorant (they are being cheated on too, you know), it will also become apparent that people are not the problem here. The villain is the W/kg system. And it seems it’s going to take quite a few superheroes to throw it into Arkham Asylum where it always belonged. So strap on the spandex and get cracking!

If nothing else, you should at least try it out, cruising once or twice. You don’t have to make a bigger commitment than that right now. But you really should try that at least. Because things will drop into place for you if you do, take it from me. Only then will you truly see the W/kg system for what it really is. On an intellectual level, you might already agree with one or a few points in my previous posts. But the big and shocking revelation only comes when you actually see the things I see with your own eyes. Brace yourself, you’ll need it.

Oh, and should you feel lost, then you might be helped by a Cruising School coming up shortly on the blog. That should sort you out and get you started.

Borg Charting a Cheater

In the wake of my previous studies, proving that winners in cat B-D make a lesser effort than the rest of the podium, as opposed to cat A where winners make a harder effort, a question kept resurfacing in the discussions on the Zwift forum: Is it really reasonable to assume that you can detect cheating (cruising) from just looking at a HR distribution chart?

Coming from the outside it may indeed seem like a fair question. I would, however, like to argue that it is not, that you are missing the point. The point is that cruising is the HR distribution graph. You can’t really detect it any other way, not even in theory. In fact, you can’t really define it any other way. I will try to explain. But first one of those mandatory detours that come with this blog.

I thought we would start off with discussing dead celebrities. Let’s leave the boring Club 27 out of the picture for a change. But do you know who Borg was?

No no, not that Borg. I am referring to Gunnar Borg, PhD MD and former Swedish professor in psychology. 

I saw him in person a few times while he was still active since he was working at the same campus I was studying at for some years. He and his colleagues used to hang by themselves in this creepy brick building that looked more like a crematory than an academic faculty. Psychophysics. Supposedly, the house made for a good lab environment, whether they actually incinerated failed students in there or not. We weren’t sure.

Anyway, Borg, who died early this year (from old age, I would presume, after a long and productive life) is a world celebrity in our game. No, he was not a cyclist, but he was and remains the go-to guy when you need to put a measure on your physical efforts but lack data on Watt, heart rate, max heart rate, lactate levels, etc. Or when you want to match physiological measures to a person’s perceptions of what is going on in his body, regardless of whether this person is an elite athlete or someone with a possible heart condition visiting a hospital lab. 

Borg is famous for the so-called Borg Chart, widely spread in both sports physiology and medicine. You have surely seen it before. If not in this exact form then at least its elements will be familiar to you.

Along with the Borg Chart there is the Borg Scale in which you estimate your physical exertion from 6 to 20, where 20 would be the point of failure e.g. at the end of a ramp test, one where you don’t hold back. The rest should be familiar too. If you look higher up in the chart above you can find the “can talk”, a familiar cue from your recovery or fat burning rides, and so on. Yes, there is a corresponding scale in Strava that you can use when you don’t have a power meter or a heart rate monitor. And it all started with Borg.

On the right you can see the rough percentage of your maximum heart rate that each level of exertion corresponds to. Even though how your working heart maps to your perceived effort can vary a little from individual to individual, there is still a pretty hard correlation between the two. For example, it is very hard to talk at VO2Max (above 90% max HR) for anyone, and it is not something you can get used to or learn. It is just the way our bodies work. Nor can you go beyond 20. There is no “you can always dig deeper, what doesn’t kill you…” when you are at a perceived 20. Max is max, and your legs just stop working.

Obviously, the Borg Chart is relevant when we once more turn to cruising.

I thought I would show you some examples of HR distribution graphs from Zwift again. The other day I posted a race report. The effort in this race can be summed up as follows:

The green part is the spindown and can be ignored. But look at the rest. Was I cruising this time or not? Couldn’t this be a fairly normal, legit race?

We need a point of reference, something to compare with. Here is another race from last year when I was more fit but also had a max HR that seemed to be a couple of beats lower than today. It’s a 3.2 W/kg effort that still left me well outside the podium in cat C on ZP:

Do you notice any difference between the two graphs? 

Returning to Borg, what was the perceived effort in those two races? Let’s start with the second graph. A large part of it was spent above 160 BPM, as you can see. In my case, with a max HR of 173 at the time, this meant 92% of max HR. If you refer to the Borg Chart above this should mean that I perceived a large part of the race as “Very Hard” or worse.

Did I? It checks out. I can attest to that. Or to put the perceived effort in my own words: It was something of a OH-GOD-PLEASE-MAKE-IT-STOP-I-CAN’T-TAKE-IT-ANYMORE-I-WILL-SELL-MY-BIKE-TOMORROW kind of effort (and the day after you are none the wiser).

So what about the first graph? First, I was actively cruising. I had signed up for a D race. I am not as fit today as in the other race, which should have pushed my bars in the graph to the right compared to if I had cruised this race a few days after the first one last year. And this push to the right would also translate into a somewhat higher perceived effort. Even so my perceived effort of the cruiser race was that it was quite easy.

Let’s repeat this AND look closely at the first graph again:

  1. I signed up to a lower category 
  2. I consciously cruised 
  3. It felt easy

Now let’s look at another rider in a race that I participated in a few days ago. The winner in cat C, according to ZP, looked like this:

It should be noted that this rider is very young, a teenager, so he should normally have a max HR in the 200’s. He has won about half his 30-some races on ZP [sic!]. In this particular race he was followed by a podium that looked like the second of my graphs, the “Very Hard” effort according to the Borg Chart.

You are the jury here. What is the verdict? Make ample use of the Borg Chart if in doubt. Did he cruise? Or does he just have a serious heart condition capping his HR, a condition that somehow still lets him win half his races? (I bet you can beat his win-% easily.) Or was there perhaps just a glitch? Maybe Martians sent some rays that affected the graph? Or maybe he has Martian DNA himself and that this is what a typical low cat winner’s HR graph looks like on Mars?

You are the jury here. What is the verdict? Is it at all possible to separate at least some cruisers from legit racers by merely looking at HR distribution graphs?

You are the jury here. What is the verdict? Refer to the Borg Chart again. Is it reasonable that someone can win half his races while talking to a friend without too much difficulty (70% HR), while other contenders can hardly breathe (90% HR) and all of them, winner included, are at or close to the performance ceiling in the category and would get a DQ if they went any harder? Are the W/kg categories appropriate for a sport?

Race Report: 29 Aug 2020, 3R Volcano Climb Race

I should lay low, waiting for the cat downgrade by next month’s end, now that I already have a nice triplet of sub-2.5 races in the ZP race records. But I just couldn’t help myself. I had to cheat a little more today and picked this shortish race. It seemed ideal. I chalked it up as honing my cheating skills.

The start was hard. A few D’s joined up with some C’s in a D front group that I decided quickly not to try to go with. I went with the second group instead the initial km’s, but we actually caught up with the front group in the underwater tunnel and stuck with them. 

I was monitoring my Watts in the Wahoo Fitness app closely of course, and some 12 min into the race the group was still pushing 3.0-3.2 and my average W was by then dangerously high, well above 200W. At 68 kg plus another 7 kg of unflattering belly fat, I had only 8 min to get the average down to 185W, my mark to be on the safe side.
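The arithmetic behind that 8 minute deadline can be sketched roughly as below. The 205 W figure is an assumption for illustration (I only know the average was "well above 200W"), and the function name is made up for this example.

```python
def required_average(target_avg_w, window_min, elapsed_min, avg_so_far_w):
    """Average power (W) to hold for the rest of the window so that the
    average over the whole window lands exactly on target_avg_w."""
    remaining_min = window_min - elapsed_min
    if remaining_min <= 0:
        raise ValueError("the window has already elapsed")
    budget = target_avg_w * window_min   # watt-minutes allowed in the window
    spent = avg_so_far_w * elapsed_min   # watt-minutes already used
    return (budget - spent) / remaining_min


# Assumed numbers: 205 W average after 12 min, aiming for 185 W over 20 min.
needed = required_average(185, 20, 12, 205)
print(f"Hold about {needed:.0f} W for the remaining 8 min")  # about 155 W
```

With those assumed numbers the remaining 8 min would have to average around 155 W, which is why just spinning the legs for a few minutes does the trick.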

I dropped and just spun the legs for a few minutes. Some C’s went past. Approaching the foot of the volcano a mixed C/D group caught up with me. I had monitored them for a while and they weren’t going much slower than the group I had dropped from, so I quickly decided to let them just pass. 

It actually seemed to take a while for the possibly legit D’s to reach me, so I then opted for a semi-slow pace of 2.0-2.2 to postpone getting caught as much as possible. I wanted to give them reason to work hard to wear them down. It turned out I timed it well. By the time the first seemingly legit D rider caught up with me, I was barely below the 185W mark. 

I stopped the Wahoo clock and restarted it after around 24 min and by then the average was still dropping. The first 20 min are always the most dangerous, the time frame where you are most likely to go over limits as a cruiser. Since I can’t actually measure a rolling 20 min window in the Wahoo app, and since the race would last around two flips of a 20 min hourglass, it made sense to restart the measuring just to be sure I wouldn’t go over limits on the second half. It seemed unlikely but you never know what comes from behind.
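For the curious, a rolling best 20 min average is not hard to compute afterwards if you have the power data. Here is a minimal sketch, assuming one power reading per second in a plain list. The function name and the example ride are made up, and this is not a claim about how ZP actually calculates its 20 min figure.

```python
def best_rolling_average(power_w, window_s=20 * 60):
    """Highest mean power over any contiguous window of window_s samples.

    power_w is assumed to hold one power reading (W) per second.
    Returns None if the ride is shorter than the window.
    """
    if len(power_w) < window_s:
        return None
    window_sum = sum(power_w[:window_s])
    best = window_sum
    # Slide the window one second at a time: add the new sample, drop the oldest.
    for i in range(window_s, len(power_w)):
        window_sum += power_w[i] - power_w[i - window_s]
        best = max(best, window_sum)
    return best / window_s


# Made-up 42 min ride: 12 min at 210 W, 12 min at 150 W, 18 min at 180 W.
ride = [210] * (12 * 60) + [150] * (12 * 60) + [180] * (18 * 60)
print(best_rolling_average(ride))  # 186.0, set by the hard first 20 min
```

Note how the worst window in this invented ride is the very first one, which matches the experience that the first 20 min are where a cruiser is most likely to go over limits.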

The D rider that eventually caught up with me, an Englishman, came together with a C rider that we both let go once the climb started. The Englishman, who later got the silver on ZP, seemed well aware of the situation in the race and quite possibly what I was up to as well. I let him do all the work of course. Not that we were going over limits, but I wanted generous wiggle room in my average for the final stretch and it seemed like I would be getting plenty of wiggle.

Towards the latter part of the climb a Norwegian D caught up with us. I had decided to let it happen. After some struggle around who would pull or set pace, the Norwegian decided to up the pace. The Englishman seemed to make a quick decision to let him go. To me, though, it was a tough choice. On the one hand the Norwegian was pushing 2.7 and it was still a little early to let my average rise again. (Also, he could have been going quite hard solo to get to us, meaning he might even already be DQ’d.) On the other hand, this was a climb, which could justify a high tempo, since the Norwegian would most likely slack off a bit on the descent.

I reluctantly let the Norwegian go. Then I changed my mind and dropped the Englishman to bridge to my neighbor (I’m Swedish as you know) by the arch. As I suspected the pace dropped significantly on the descent and I think I hit a low of 169W on my second cumulative average. That meant I would be able to go flat out for quite some time toward the finish. The question was only when to drop the hammer.

During the final part of the climb I noticed that guys from the front D group were once again visible in the list on the right. They had slowed down considerably. In fact, they were only some 17 sec away. Potentially, they could pose a threat. And potentially, I could bridge to them if they kept the pace for yet some time. But they went so hard the first 20 min… I decided against trying to catch up with them, which would have meant an early hammer, a real effort in fact.

Remembering my last cruise on Champs Elysées, where I had brought out the hammer from the back pocket at the overpass, some 1.5 km away from the finish, I decided to go a little later this time. The Norwegian was most likely heavier than me and with more muscle mass (the average rider always is), so I couldn’t risk a sprint against him. I would have to drop him well outside sprint range. 

By the 900m mark I decided it was hammer time. The Norwegian didn’t fight back. With a safe time gap I coasted the last meters to the finish.

Obviously, I got a UPG on ZP for this but stayed safely within limits for an average of 2.4 W/kg. The Norwegian either got a DQ or wasn’t registered. And, as mentioned above, the Englishman took the silver 40 sec behind me. Come October and I will steal that silver.

So who won then? According to ZP a kid took it down. Exploiting the sub-200W limit of cat D at 38 kg and 149 cm, he pushed a legitimate average of 3.8 W/kg. And won of course. 

I wouldn’t want to deny a young boy the pleasure of winning a race or two in Zwift. Really. Because it will do him far more good than for us grumpy old men. But still… what can you say? Well played, ZP! Hope you’re doing alright so far down in the bunker.

My effort for the race was as follows:

Ignore the green blob. That’s the spindown. Consider the rest. Don’t think for one second that this is what a reasonable race effort over 42 min may look like, that it is somehow within the acceptable range. The racing was a complete joke. That’s what cruising is about after all, a slice of roflcopter with some WTFPWN! sprinkled on top. I could have gone MUCH harder. I should be going MUCH harder. I would have gone MUCH harder in my true category. And any race participant in Zwift with their FTP set correctly would have to work MUCH harder doing e.g. the McCarthy WO or Zwift’s 30/15’s GWO.

Think about that the next time you see a HR graph like this from a race podium.

Der Untergang

Can you hear the rumbling, people? It may have seemed distant before, but it is creeping closer and closer with every day. Bad omens in the sky. Seals being broken one by one.

And on the crumbling tarmac on top of the ZwiftPower bunker an armada of belligerent racers roll in, complaining over massive cheating in Off the MAAP Tour and elsewhere, riders racing fair attacking cheaters in the race chat.

Let’s face it, the W/kg categories are falling apart. This is not the end of the beginning. This is the beginning of the end. The End of Days, the Untergang.

And from the ashes a Phoenix will rise.

While the Watopia PD Looked the Other Way

Summer vacation drew to a close and now I’m in the city again with days that grow shorter and over 30 min to get to roads even remotely worth riding. And so I’m back in Zwift on weekdays. Time for some cheating!

I noticed that my last little screwup in a race was 1 Jul. Well, it wasn’t even a race but an official group ride and ZP picked it up. I got caught on the Watopia PD speed camera doing 2.8 W/kg. But I will have served my sentence by 1 Oct. Or should have. We’ll see, because my current 90 day top 3 average doesn’t actually correspond to activities I have participated in. But anyway, I thought I’d prepare for an autumn of intense, relentless cheating. And for that I need a “legitimate” downgrade. Thus I needed two more races staying within cat D limits.

Filler Race No 1

First was the Namibian Race League of 23 Aug. I had just recovered from an infection with fever the day before. Not covid this time but bad enough to call in sick. Safe to say, I was in pretty bad shape right after and so the race actually turned out to be fairly tough although I did hold back somewhat. 

Since I was set on monitoring my average Watts closely, not to let it slip beyond 187W at 75 kg, I decided to go “easy” at the start (meaning hard instead of the standard insanely hard). So I let a front group or two of sandbaggers go straight away. 

There is always the risk that there is a piggy-backing legit rider on the wheels of the front group frequent flyers. Or a cruiser. If it’s a cruiser, then that’s a risky strategy. He is then banking on the group slowing down considerably mid-race, or he runs the risk of getting a DQ or even screwing up his categorization. It can pay off though, and if you let a guy like that go, you won’t catch him. But a legit rider you may have to let go. The hard part is telling which is which. In this case, though, there was no doubt about it. I would let them go.

After the start I decided to drop two more times ending up in chasing groups. You start with an initially very high average Watt. It will drop over the course of the race (I have no means to measure a semi-rolling best 20 min, so I have to play it by ear partly, relying entirely on the total average). It just has to drop enough over time and it is not always easy to predict whether the drop rate will be enough.

In the final climb I decided to start sliding down immediately, although I ended up also catching up with a few riders who went too hard at the foot of the hill. I was already at the target 2.5 W/kg I had set in my mind and just tried to stay there.

This so-called effort, if I had been downgraded already, would have sufficed for a bronze. It would have been hard to improve on the result further, though, at least on a day like that. The winner was a heavyweight with a private Zwift profile (and thus no HR data) and a 45 sec lead on the runner-up. Go figure.

Filler Race No 2

The second race, The KISS Underdog Series of 23 Aug, 3 laps around Champs Elysées, was a bit funny in that it turned out so soft and mellow. I had opted for a 2.3 W/kg to pull the 90 day top 3 down a bit to get some leeway in October, so I was nowhere near VO2Max at start. I went really slow and just stepped on it briefly from time to time to bridge early gaps forming. I also decided to race as a C for a change, so I was only really cheating in my imagination. 

Some D riders slipped away one by one during the first third of the race, looking fairly legit. But I had an average to protect, a low one at that, and had already accepted a placing way down the imaginary D field. 

Soon enough a nice group of maybe 8 or so riders formed, mostly D but with some other C rider too taking it slow. The group was going well below cat D limits and I was sure we were quite far down the field by then.

The group kept together and I kept my Watts low and even, staying in the draft. During the first part of the third lap the group became a little antsy. No surprise there. And also no surprise that things calmed down a few km before the finish, as the group was preparing for the final dash.

Since my average Watts had dropped so low there was a little room to play, so I had decided well before the overpass on the final lap that I’d hammer the climb and then just keep stepping on it until the finish. This was my first time riding the course but I had noted the distance to the finish from the overpass already on the second lap.

Appearing as a blue dot in a clutter of yellow dots on the minimap, there was no incentive to chase me down of course, but I’m pretty sure I would have pulled it off even as a yellow dot. I dropped the group hard and kept at it almost all the way to the finish. Doing so I passed several D stragglers ahead of the group I broke off from. On the final stretch I took a quick breather behind one of them before beating him in a sprint that he initiated. This was, surprisingly enough, the ZP legit silver guy. So come October and this could have been a “legit” silver. All while the Watopia PD looked the other way. But the Law of W/kg is just! Right?

The winner in cat D? Well… have a look at the HR graph yourself. I won’t comment on it. But I can’t refrain from commenting on the winner of cat C in the same race. Even though I have stated before that you will find that cruising is very common once you start looking, you will nevertheless have a hard time finding a more ridiculous display of cruising than that. Absolutely priceless! And the Watopia PD just looked the other way.

The Perceived Effort of Race 2 and a Comparison

So how was the perceived effort in this second race? Well, I’d call it light exercise by my Zwift frame of reference. Remember my How to Spot a Cruiser post? Remember the example HR distribution graph, taken from a notorious cruiser? This is what my own graph looked like. A Zone 3 effort. This level of effort is piss easy. It is racing most foul.

I refuse to be beaten by a “legit” Zone 3 guy while pushing a high Zone 4 in a race in the lower categories. And so should you.

Cruiser Sunday Studies – Part 3

We turn again to our investigations of ZwiftPower race data. In the second of the recent Cruiser Sunday posts I discussed briefly whether the spotted difference between cat A and cat C with regards to relative effort levels among top contenders was statistically significant. Now we will try to analyze race data properly, with a third approach.

An Explanatory Sidetrack

We will start with a little loop before we get back on track. Imagine you have kids and that you recently moved to a new area. There are two nearby schools to put your kids in and you have the choice between either and want to choose the one where the students have the highest grades. Is there a difference at all, and if there is, can we somehow determine whether that difference is not just random?

Or let’s make it really simple. You and a friend throw dice. You roll a die 100 times each. The objective is to score the highest total. If the dice are fair, then there should be no difference between your results, right? Or rather, there will be a difference but only a small one. Either of you had a streak of luck resulting in a slightly higher total. Do it all again and it might be reversed. 

But if it turns out your friend’s total is 516 and yours is only 321, is that just luck? Well, in theory it could be. It’s just not very likely that you will see such a large difference. He would have to have rolled a large number of 6’s to get to that total score. It could happen once in a blue moon, sure, but at the same time it wouldn’t be unreasonable to suspect a loaded die. Or?

A better approach here would be to not begin with trying to decide whether the difference is random or not, because right now we don’t know, but rather to start with determining how likely such an extreme random difference would be. Maybe the difference isn’t that big after all when it comes to probabilities?
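If you would rather see this than take my word for it, a brute-force simulation gives a feel for how rare a gap like 516 vs 321 is with fair dice. This is only a sketch: the function name, the number of trials and the seed are arbitrary choices of mine.

```python
import random


def share_of_large_gaps(gap, rolls=100, trials=5_000, seed=1):
    """Fraction of simulated games in which two players, each rolling a
    fair die `rolls` times, end up at least `gap` points apart."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        mine = sum(rng.randint(1, 6) for _ in range(rolls))
        friends = sum(rng.randint(1, 6) for _ in range(rolls))
        if abs(mine - friends) >= gap:
            hits += 1
    return hits / trials


# The gap in the example is 516 - 321 = 195 points.
print(share_of_large_gaps(195))
# A modest gap of 25 points, for comparison.
print(share_of_large_gaps(25))
```

The point of running both lines is the contrast: a gap of a couple of dozen points shows up all the time with fair dice, whereas a gap of 195 should practically never appear in a simulation of this size.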

Fortunately, there are ways to determine this likelihood for various scenarios. In the case of the schools or the dice you can use a fairly simple statistical test called the Mann-Whitney U-test. If the test statistic is extreme enough, it indicates that the probability that the difference in dice totals is just random is very low.

You typically set a limit beforehand as a decision rule. In smaller studies where the results aren’t life critical, a 5% limit, a so-called 5% significance level, is standard. So if we were to do the 100 dice rolls over and over and you would see differences of the magnitude of 516 vs 321 only in less than 5% of the trials, then we have decided that it is so unlikely that we are better off looking for other explanations than just chance. I.e. we would rather suspect that your friend is cheating.
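For those who want to try the Mann-Whitney U-test on the dice themselves, here is a minimal pure-Python sketch. It uses the normal approximation with a tie correction, which is an assumption that is reasonable for samples of around 100 rolls but not for tiny ones (statistics packages offer exact versions). The function name and the loaded-die weights are made up for the illustration.

```python
import math
import random
from collections import Counter


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U-test via the normal approximation
    (tie-corrected, no continuity correction).
    Returns (U statistic for sample a, approximate p-value)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(list(a) + list(b))
    # Tied values all get the mean of the ranks they occupy.
    rank_of, start = {}, 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = start + (c - 1) / 2
        start += c
    u1 = sum(rank_of[x] for x in a) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence either way
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(7)
fair = [rng.randint(1, 6) for _ in range(100)]
# Hypothetical loaded die: a 6 is five times as likely as any other face.
loaded = rng.choices([1, 2, 3, 4, 5, 6], weights=[1, 1, 1, 1, 1, 5], k=100)
u, p = mann_whitney_u(fair, loaded)
print(f"U = {u:.1f}, p = {p:.4f}")  # p below 0.05 would meet the 5% rule
```

The test works on ranks rather than on the totals themselves, so it asks whether one player's rolls tend to be higher than the other's, roll for roll.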

We will use this same method when looking at the race results on ZwiftPower next.

Method

We will look at HR distribution graphs on Zwift.com among the top 3 in 100 consecutive races in the recent past, in both cat A and cat C.

If a rider spends the best part of his time in the race in a higher HR zone than the other two, visibly so, then that rider has worked harder. The HR graphs aren’t a perfect description of everyone’s fitness, especially when HR zones aren’t tuned to an individual, but on average they will be and we are looking at 300 riders in each category. It will likely average out.

If the winner of a race has worked harder than the rest of the podium, then we will score that race as 0, meaning nobody worked harder than him. If exactly one of the other two has worked harder than the winner, then we will score the race as 1, meaning one guy worked harder than the winner. If both of the other riders worked harder than the winner, then we will score the race as 2, meaning two others worked harder than the winner.

If there is no HR data available for someone on the podium, we will skip that rider and instead look at the next guy on the results list. It is not uncommon that HR data is missing and the typical reason is that the rider’s Zwift profile is set to private. So if the winner has no HR data, then we will compare the no 2 guy to the no 3 and no 4 guy instead. And if the no 3 guy has no HR data, we will compare the winner to the no 2 and no 4 guy instead. The reason we do this is that the display of all recent races on ZwiftPower is somewhat limited and we need to make sure we get a sample size big enough, 100 races. And it should really make no difference when it comes to our assumptions, or our hypothesis in this study. More about that below.
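The scoring rule, including the skipping of riders without HR data, could be written down roughly like this. Note the big assumption: in the study I compare the HR graphs by eye, so the numeric "effort" value per rider here is a hypothetical stand-in (think share of race time spent in the highest HR zones), and the function name is made up.

```python
def score_race(efforts):
    """Score one race as 0, 1 or 2.

    efforts: effort values in finishing order (winner first). None marks a
    rider without HR data, who is skipped in favor of the next finisher.
    The numeric effort is a hypothetical stand-in for comparing HR graphs
    visually. Returns None if fewer than three riders have HR data.
    """
    podium = [e for e in efforts if e is not None][:3]
    if len(podium) < 3:
        return None
    winner, *others = podium
    # Count how many of the other two worked harder than the (effective) winner.
    return sum(1 for e in others if e > winner)


print(score_race([0.8, 0.5, 0.6]))        # 0: the winner worked hardest
print(score_race([0.3, 0.5, 0.2]))        # 1: one rider worked harder
print(score_race([None, 0.3, 0.6, 0.7]))  # 2: winner lacks HR data, so no 2 is compared to no 3 and no 4
```

Equal efforts count as "not harder" in this sketch, which is a design choice of the example rather than something the method spells out.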

Once we have scored 100 races in cat A and cat C, we will then compare the results using the Mann-Whitney U-test. If there is a difference big enough to be statistically significant (remember the 5% rule here), then and only then will we draw uncomfortable conclusions.

Hypothesis

Assume we are with the ZP team and we LOVE the W/kg category system. We firmly believe it is fair and reasonable. Every sport should be categorized with W/kg, we think. There is no better option. We just need to get rid of those pesky sandbaggers first somehow…

Then what do we expect in a race with regards to relative effort levels among the top contenders? Perhaps there are two possibilities here. We could for example assume that the strength and prowess among the top contenders is roughly the same. So why does someone come out on top? Because he works harder than the others. All else equal, on average, someone working harder than the others will win. So we expect the winner to have worked the hardest (score 0).

Or we could assume that winning a race isn’t just about working hard, even if you are as fit as other top contenders. It is also about random events in the race, such as splits and breakaways and powerups and whatnot. Maybe those random events, a.k.a. luck, play such a large part in a race that we can’t separate the podium places with differences in effort levels. So instead we assume that the relative effort among the top 3 will be roughly the same. Obviously, the top 3 will be more fit and potentially also work harder than the ones coming in last in a big race, but among the top 3, we assume that the effort of each respective rider will be about the same, if not in every race then at least on average in 100 races. Thus what we will not see is a tendency for score 2 in a lot of races. Rather, races will converge around score 1. 

And what do we expect when comparing cat A with cat C? We expect to see no difference in relative efforts in the two categories. Cat A riders might be used to working harder but when comparing the top 3 in a cat A race, there should be no greater differences among them than among the top 3 in a cat C race. There may or may not be a difference in overall effort between cat A and cat C, but the pattern of who works hardest within the podium should look the same in both categories.

Possibly, since we make no distinction between A and A+ riders, and since it is not uncommon that a cat A race is won by an A+, followed by two A riders, we might find a slight tendency for cat A winners to work a little less hard than the rest of the podium. We do not, however, expect to see this in cat C. Because cat C is fair and the W/kg system is appropriate in Zwift, or so we claim.

The “Oh Shit!” Scenario

Now, if we were to find that there is a tendency for cat C winners to work less hard than the rest of the podium, and that there is less of that tendency in cat A, then that would scare us. Because it is unintuitive. Why should races be won by people who work less hard than others, especially when there is an upper limit to performance (W/kg) in a category? We wouldn’t like that. It goes against the nature and ethics of the sport and would distance us from outdoor cycling too.

And it may also indicate that the phenomenon of cruising is a real issue in the lower categories, i.e. that some riders exploit the W/kg system on ZwiftPower by staying behind in a category they are too strong for, making sure they don’t go over W/kg limits, and thus get an unfair advantage in races over riders who couldn’t go over limits due to fitness and who would have to (and will) work extremely hard to finish anywhere near the top.

Results

100 races were sampled starting Fri 7 Aug 2020 and forward in cat A and cat C. According to the scoring method described above, cat A got a total score of 80 whereas cat C got a total score of 106. 

In 43 races in cat A, the top 1 guy worked harder than the following two. In 34 races in cat A, one following rider worked harder than the top 1 guy. In 23 races in cat A, both following riders worked harder than the top 1 guy.

In 29 races in cat C, the top 1 guy worked harder than the following two. In 36 races in cat C, one following rider worked harder than the top 1 guy. In 35 races in cat C, both following riders worked harder than the top 1 guy.
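
As a quick sanity check, the totals follow directly from these counts. The variable and function names below are just for illustration.

```python
# Number of races per score (0, 1 or 2), taken from the results above.
cat_a = {0: 43, 1: 34, 2: 23}
cat_c = {0: 29, 1: 36, 2: 35}


def total_score(counts):
    """Sum of all race scores in a category."""
    return sum(score * races for score, races in counts.items())


def race_count(counts):
    """Number of races scored in a category."""
    return sum(counts.values())


for name, counts in (("A", cat_a), ("C", cat_c)):
    total = total_score(counts)
    print(f"cat {name}: {race_count(counts)} races, total {total}, "
          f"mean score {total / race_count(counts):.2f}")
```

So on average 0.80 riders per race worked harder than the winner in cat A, against 1.06 in cat C.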

The Mann-Whitney U-test gives a test score of -2.15, which translates into a probability, a p-value, of 0.032 (3.2%) for a random occurrence. This is lower than the 5% limit we set. There is indeed a difference between the categories and it goes in a direction we did not expect. We expected either no statistically significant difference between the two categories or, if there was one, that it would lean the other way, towards a tendency for winners in cat A to work less hard relative to the rest of the podium than winners in cat C. Hence we have to draw the conclusion that we cannot refute the “Oh shit!” scenario.
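
For anyone who wants to check the numbers, here is a minimal sketch of the test using only the standard library. It uses the plain normal approximation without tie correction or continuity correction, which lands close to the figures reported above. Whether the tool used for the study did exactly this is an assumption on my part. The function name and the `tie_correction` switch are made up for this sketch.

```python
import math

# Number of races per score (0, 1 or 2), taken from the results above.
cat_a = {0: 43, 1: 34, 2: 23}
cat_c = {0: 29, 1: 36, 2: 35}


def mann_whitney_z(counts_x, counts_y, tie_correction=False):
    """Return (U for x, z, two-sided p) using the normal approximation."""
    n_x = sum(counts_x.values())
    n_y = sum(counts_y.values())
    n = n_x + n_y
    rank_sum_x = 0.0
    tie_term = 0.0
    ranked_so_far = 0
    for score in sorted(set(counts_x) | set(counts_y)):
        tied = counts_x.get(score, 0) + counts_y.get(score, 0)
        # All races with the same score share the average of the ranks they occupy.
        midrank = ranked_so_far + (tied + 1) / 2
        rank_sum_x += counts_x.get(score, 0) * midrank
        tie_term += tied ** 3 - tied
        ranked_so_far += tied
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    spread = n + 1
    if tie_correction:
        # Many tied scores shrink the variance of U.
        spread -= tie_term / (n * (n - 1))
    sd_u = math.sqrt(n_x * n_y * spread / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, z, p_two_sided


u, z, p = mann_whitney_z(cat_a, cat_c)
u_tc, z_tc, p_tc = mann_whitney_z(cat_a, cat_c, tie_correction=True)
print(f"U = {u:.0f}, z = {z:.2f}, p = {p:.3f}")
print(f"with tie correction: z = {z_tc:.2f}, p = {p_tc:.3f}")
```

Without tie correction this gives a z of about -2.15 and a p-value a little above 0.03. Since there are only three possible scores, there are lots of ties, and correcting for them makes the z a bit more extreme and the p-value a bit smaller, so the conclusion stays the same either way.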

Conclusions

The “Oh shit!” scenario is real. We do not live in the best of all Watopias. We live in a Watopia where it pays off to work hard in cat A but apparently not so much so in cat C. We live in a Watopia where the category system makes us behave weirdly in races in the lower categories B-D. We live in a Watopia where you can get away with cruising, even on ZwiftPower.

Now we have a choice. We can either accept that racing is inherently unfair in the lower categories and just live with it. Or we can, inspired by other working and efficient category systems in real-life sports, find a new category system that would prevent not only sandbagging but also weird discrepancies such as the one we just looked at, a system that would also unchain racers in all categories and prevent cruising.

Your choice. I have made up my mind already.
