W/kg Cats Fail 2: The Light Rider’s Curse

The Heavy Rider’s Disadvantage

It could be debated if you can call it unfair, but heavy riders do suffer a disadvantage in Zwift races. The lighter riders have an easier time uphill and it is so hard to match the Watts needed to keep level with the lighter rider’s W/kg there.

The above is a very common complaint on various Zwift forums. But is it true?

I like to question those self-evident truths we all take for granted. If they are indeed truths then there is no harm in validating them. Sometimes, however, they turn out not to be true after all, once you actually take a serious look at them. So what about rider weight and race results in Zwift? Let’s take one of those serious looks for a change instead of just passing on what some other guy said in a one-liner on the forum.

The Light Rider’s Advantage

Without any prior knowledge about the impact of weight in Zwift racing we could assume three things:

1. There could be advantages to being light

2. There could also be disadvantages to being light

3. If there are both advantages and disadvantages to being light, perhaps depending on scenarios, then you could compare those advantages and disadvantages, weigh them against each other, and come to some kind of conclusion regarding the net effect of being light – is it more good than bad to be light, or is it the other way around?

So let’s start by looking at possible advantages to being light, since people say there are such advantages. There are no obvious advantages on the flat, and everyone seems to agree (we will get into details on this further down). What heavier riders say instead is that they have a hard time against light riders in climbs. 

On the flat, speed is mainly maintained by momentum, so pure Watts is king and heavier riders can usually (although not necessarily) push higher Watts than a lighter rider with a smaller frame and less muscle volume. But in a climb W/kg is king. Body weight comes into play, and maybe it is easier for a light rider to attain a better ratio between Watts and body weight than it is for a heavier rider, especially a heavier rider with a few surplus kilos.

The above is reasoning taken from riding outdoors and in a different setting than Zwift racing with its unique and uniquely stupid rules. But it is actually a completely flawed argument and you need to fully understand why. The explanation is two-pronged. We start off with some physics. 

Question: A rider at 90 kg is time trialing against a rider at 70 kg up the Alpe du Zwift climb. Both are keeping the exact same lines and both are able to keep dead steady, ERG-like Watts. Both are doing exactly 3.19 W/kg. Who will win?

Answer: The lighter rider will win. By a few seconds. But it has nothing to do with Watts or weight. The lighter rider will win because he has an ever so slight advantage in drag, having a smaller frontal area. 

It’s similar to choosing between bikes in your garage before a Zwift climb. One frame will be ever so slightly faster than the other. However, when was the last time you saw a race up AdZ and only that? There are no such races. The closest you can get to that scenario in a Zwift race is a race on Road to Sky, a route which has quite the approach to the mountain, and the approach is flattish. So if we staged an iTT on the Road to Sky course, then this advantage in drag for the light rider, mere seconds, is more than offset by the heavier rider’s advantage on the flattish approach to the mountain. On Road to Sky, or even Ven-Top with its very short approach, the heavier rider will win!

Also, what you need to understand is that if it wasn’t for the small difference in drag between the two riders, if they both raced in a vacuum and started at the same time at the foot of the climb, both riders would arrive at the finish exactly simultaneously. Because if we ignore the drag issue, then 3.19 W/kg is 3.19 W/kg. It doesn’t matter what you weigh. You will travel up the mountain at exactly the same speed. That is the whole purpose of the W/kg measure: to equalize riders so that a comparison is possible.

A heavier rider could in theory have a hard time producing high enough Watts to be able to match the W/kg of a lighter rider in a climb. But in our example we assumed that both climbed at exactly 3.19 W/kg, so the heavier rider already compensated his higher weight with higher Watts. And thus they are both traveling at the exact same speed up the mountain.
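The vacuum argument can be checked with a few lines of arithmetic. The sketch below is a simplification under stated assumptions, not how Zwift computes speed: it ignores air drag, rolling resistance and bike weight, and the gradient and length are round figures assumed for an AdZ-like climb. The function name `climb_speed` is made up for the illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def climb_speed(watts: float, weight_kg: float, gradient: float) -> float:
    """Steady climbing speed in m/s when all power goes into lifting the rider.

    Assumption: no air drag, no rolling resistance, no bike weight.
    Then v = P / (m * g * sin(theta)), so only the ratio P / m matters.
    """
    sin_theta = math.sin(math.atan(gradient))
    return watts / (weight_kg * G * sin_theta)

# Assumed round figures for an Alpe du Zwift-like climb (not measured values).
GRADIENT = 0.085
LENGTH_M = 12_200

for weight_kg in (70, 90):
    watts = 3.19 * weight_kg  # both riders hold exactly 3.19 W/kg
    speed = climb_speed(watts, weight_kg, GRADIENT)
    minutes = LENGTH_M / speed / 60
    print(f"{weight_kg} kg at {watts:.0f} W: {speed * 3.6:.2f} km/h, {minutes:.1f} min")
```

Both riders come out with the same speed and the same finish time, because the weight cancels out of the formula. Any real difference has to come from the terms left out here, such as drag.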

Now here comes the second prong of the argument. Put the above in relation to the W/kg cat system, with the performance ceilings in cat B-D. To be competitive in any cat B-D, you typically need to be able to put out W/kg at or very close to the performance ceiling, be it 2.5 W/kg, 3.2 W/kg or 4.0 W/kg. So to win a race on any course in, say, cat C, you need to be able to hold 3.19 W/kg, or someone else could come and do the 3.19 W/kg and beat you (there are plenty of such riders). Agreed?

So to win a race up AdZ you thus need to be able to hold 3.19 W/kg. Assume you are a contender, someone who could actually win in cat C. Then you will be able to race Road to Sky at 3.19 W/kg. If you are indeed one of those riders who could, then as we just concluded your weight doesn’t matter at all. And we already know that there are heavy riders who can do 3.19 W/kg up AdZ, and there are light riders who can do the same. Both kinds race up the climb at almost the exact same speed, bar the minuscule difference in drag. In fact, given that you are a contender, you are advantaged by being heavy on Road to Sky since you will be naturally faster in the approach and might thus either get a head start or save some energy before the climb.

GET THIS:

There is no advantage to being light in Zwift racing!

And this is because of the W/kg cat system. Without it things would be different. With a results-based categorization, a race on Road to Sky would favor lighter riders, whereas the heavies would still reign on Tempus Fugit. You would have to specialize and play to your unique advantages, just like in real cycling.

So if there are no advantages to being light in Zwift, could there still be disadvantages?

The Light Rider’s Disadvantages

This post is named the Light Rider’s Curse, which refers to a tendency in Zwift racing. Many light riders have first-hand experience of improving fitness to the point where they reach the top of their current race category. Or rather what should have been the top of the race category. Only it isn’t.

You would think that being able to average e.g. 2.5 W/kg in cat D would make you competitive there. But that is not necessarily the case. First, you have to beat the cruisers. But even if we take the cruisers out of the picture, it can still be surprisingly hard for a light rider to get anywhere near a podium in the average Zwift race.

So they do what anyone would do in that situation. They try to improve fitness further still. Shouldn’t that help getting to a podium then? No, that’s just that final push that tips them over to the bottom of cat C. They got upgraded before they even saw a podium. 

Why is this? Is this real or just some bad excuse from failed light racers? It all seems so counter-intuitive. As a light rider you should have an advantage against the heavies in the hills, said a guy in a one-liner on the forum. And being able to do 2.5 W/kg you should have no problem getting a decent shot at the podium, right? So why don’t you win?

It’s because of this:

Someone doing 300W on the flat is going faster than someone doing 275W.

Yeah, of course he is! So what?

Well, what if it’s a semi-flat cat C race and the guy doing 300W weighs 94 kg? That’s 3.19 W/kg, within ZP’s cat limits. And what if the guy doing 275W weighs 77 kg? That’s 3.57 W/kg, way over limit. See the problem?

The heavy guy wins the race and the light guy, being slower, isn’t anywhere near a podium but is still a disgusting sandbagger who deserves a DQ. But this never happens in real-world cycling, only in Zwift. And it’s because of the W/kg cat system that no other sport uses. 

Specifically, it’s because of the W/kg ceiling of the lower cats in combination with ZP disqualifying racers afterwards, racers that they themselves allowed into the race. But you can’t have a performance ceiling in sports. And you should never have to disqualify a contestant for being “too good” in sports.

Most races consist of mainly flattish stretches and then some shorter climbs. At the W/kg ceiling of a cat, i.e. in that front group with the riders that actually have a chance to win the race, a light rider can in theory never match the speed of a heavier rider without going over limits and getting a DQ or even an upgrade, not unless the heavier rider is a cruiser. It’s simple maths.
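The simple maths can be spelled out as a small sketch. The ceilings are the ones quoted in this post. The helper names `w_per_kg` and `max_legal_watts` are made up for the illustration, and the sketch deliberately leaves out any averaging window or correction factor that ZP might apply.

```python
# Category ceilings in W/kg as quoted in the post.
CAT_CEILINGS = {"D": 2.5, "C": 3.2, "B": 4.0}

def w_per_kg(watts: float, weight_kg: float) -> float:
    """Plain power-to-weight ratio."""
    return watts / weight_kg

def max_legal_watts(weight_kg: float, cat: str) -> float:
    """Highest Watts a rider of this weight can hold without exceeding the ceiling."""
    return CAT_CEILINGS[cat] * weight_kg

for name, watts, weight_kg in (("heavy", 300, 94), ("light", 275, 77)):
    ratio = w_per_kg(watts, weight_kg)
    verdict = "within limits" if ratio <= CAT_CEILINGS["C"] else "over limits"
    print(f"{name}: {watts} W at {weight_kg} kg = {ratio:.2f} W/kg ({verdict})")

# The gap in absolute Watts that the ceiling allows each rider in cat C.
for weight_kg in (94, 77):
    print(f"{weight_kg} kg may hold up to {max_legal_watts(weight_kg, 'C'):.0f} W in cat C")
```

The ceiling scales with body weight, so the heavier rider is simply allowed more Watts, and on the flat it is Watts that decide the speed.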

If it’s simple maths in theory, then it should show in data too. So does it? Let’s find out!

Weight Study 1 – A Mix of Races

I grabbed some fresh data from ZwiftPower, a sample of 50 consecutive cat C races of all sorts (distances, elevation, etc). I only skipped races where

i) weight data was missing

ii) there were fewer than 6 cat C finishers according to ZP

iii) the race type didn’t lend itself to this test (like e.g. Hare & Hounds, age category or TTT races).

Then I compared the average weight of the 3 riders on the podium to the average weight of the other riders in the race (hence why I wanted at least 6 finishers).

Results

The podiums in the races had an average weight of 81.3 kg.
The remaining riders in the races had an average weight of 77.5 kg.

This nearly 4 kg difference between the average podium winner and the average loser turns out to be highly statistically significant, even at the 1% level (p = 0.00118). For those who aren’t into statistics, this means that it is extremely unlikely that a difference this large would show up by pure chance, so we should expect it to appear again and again if we picked some other random set of 50 races from the ZP database. And thus we can’t refute that there is indeed a difference in average weight between winners and losers. Winners are somewhat heavier on average. It is not bad to be heavy in Zwift races, quite the opposite. It is bad to be light in Zwift races. The results prove it.
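For readers who want to try this kind of comparison themselves, here is a minimal sketch. The post does not say which test produced the p-value, so a one-sided sign-flip permutation test is swapped in here as one reasonable option. The weights below are made-up numbers for illustration only, not data from ZwiftPower, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean

def race_difference(weights_in_finish_order):
    """Average podium weight minus average weight of the remaining finishers."""
    podium = weights_in_finish_order[:3]
    rest = weights_in_finish_order[3:]
    return mean(podium) - mean(rest)

def sign_flip_p_value(differences):
    """One-sided exact sign-flip test on per-race differences.

    If podium and non-podium riders were interchangeable, each race's
    difference would be as likely negative as positive. We count how many
    of the 2^n sign patterns give a total at least as large as observed.
    Only practical for a handful of races; sample patterns for larger sets.
    """
    observed = sum(differences)
    hits = 0
    patterns = 0
    for signs in product((1, -1), repeat=len(differences)):
        patterns += 1
        if sum(s * d for s, d in zip(signs, differences)) >= observed - 1e-12:
            hits += 1
    return hits / patterns

# Made-up races (rider weights in kg, in finishing order), illustration only.
races = [
    [84, 80, 79, 75, 77, 72, 81],
    [78, 86, 82, 74, 79, 70],
    [76, 75, 83, 80, 73, 77, 71],
    [88, 79, 81, 76, 78, 75],
]
diffs = [race_difference(r) for r in races]
print("per-race differences:", [round(d, 1) for d in diffs])
print("one-sided p-value:", sign_flip_p_value(diffs))
```

With only four races the smallest possible p-value is 1/16, which is one reason why a sample of 50 races is needed before any claim about significance can be made.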

The W/kg cat system screws light riders. I will give a more detailed example than the simple theoretical one above. Let’s work through this.

Assume the following:

-You are racing in the front group in cat C (for some reason there are no sandbaggers this time…)
-The group keeps a steady pace and you are at least 20 min from finish
-You weigh 75 kg
-You are on the wheel of a bigger guy @ 85 kg
-You are both in draft
-The big guy is able to hold a 20 min average of 286W, i.e. 3.2 W/kg according to ZP (286 x 0.95 = 272. 272/85 = 3.2)

The only way you can stay on his wheel is by matching his 286W. This would put you at (286 x 0.95)/75 = 3.6 W/kg. Keep at it for 20 min (if you can) and ZP will give you a DQ. People might even call you a sandbagger! You simply can’t win this race as a light rider and get away with it on ZP. It’s not just hard. It’s impossible.
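The same worked example as a sketch, for anyone who wants to plug in their own numbers. The 0.95 factor and the 3.2 W/kg ceiling are taken from the example above. The function names are made up, and the sketch assumes that the 20 min average is the only thing ZP looks at, which is a simplification.

```python
def zp_w_per_kg(twenty_min_watts: float, weight_kg: float, factor: float = 0.95) -> float:
    """W/kg as in the example: 95% of the 20 min average, divided by body weight."""
    return twenty_min_watts * factor / weight_kg

def max_legal_20min_watts(weight_kg: float, ceiling: float, factor: float = 0.95) -> float:
    """Highest 20 min average a rider can hold and still land on the ceiling."""
    return ceiling * weight_kg / factor

GROUP_WATTS = 286  # the pace set by the 85 kg rider

for weight_kg in (85, 75):
    ratio = zp_w_per_kg(GROUP_WATTS, weight_kg)
    print(f"{weight_kg} kg holding {GROUP_WATTS} W for 20 min: {ratio:.1f} W/kg")

# What the lighter rider is allowed to hold before going over the cat C limit.
print(f"75 kg limit: {max_legal_20min_watts(75, 3.2):.0f} W for 20 min")
```

Under these assumptions the 75 kg rider has to stay more than 30 W below the pace of the wheel he is trying to hold.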

Guys weighing 75 kg with a 1 hr FTP of 272W according to ZP will already have been upgraded to cat B. They will have seen very few podiums back in cat C if they were up against heavier riders. Which they were. And data supports our simple maths theory and the existence of a Light Rider’s Curse.

The Objection

But wait a minute! “Assume you are both in draft…” Granted, draft in Zwift doesn’t give quite as much help as outdoors but it is certainly a factor. What if these heavier winners are just better at drafting? It seems unlikely. Why wouldn’t drafting skills be evenly spread out over riders of all weights and sizes? But it’s a good idea to eliminate draft when you are doing a study like this. So how could we eliminate it? By studying only individual time trials instead. On a TT bike you can’t draft.

Weight Study 2 – Only TT Races, No Draft

So instead I scraped 40 consecutive iTT races in cat C from ZP. What were the average weights for the podium vs the rest of the field? Was there a difference? And was it statistically significant (i.e. not random)?

Results for iTT’s in Cat C

Podium avg weight: 83.9 kg
Losers avg weight: 78.1 kg
Difference: 5.8 kg
Statistical significance: p=0.00004 (probability of a random sample/event resulting in such a difference)

Conclusion: The difference is not random. In fact, a pharma company doing a study on a new promising medication would do wheelies and open up the champagne if getting results of this magnitude. So heavier riders do have an advantage in cat C, even in iTT’s where there is no draft.

“Ok, but maybe this is exclusive to cat C. I don’t care about the fat noobs in cat C anyway. I race in B.”

So let’s look at cat B too.

Results for iTT’s in Cat B

Podium avg weight: 77.7 kg
Losers avg weight: 73.0 kg
Difference: 4.7 kg
Statistical significance: p=0.00007

Conclusion: The difference is not random. We can see that people weigh less in cat B, just as I predicted in an older blog post, but there is still a clear advantage for the relatively heavier rider, even without draft.

“Uh-oh… and you mean the reason for this is that both cat C and cat B have a performance ceiling (3.2 W/kg and 4.0 W/kg) that will weed out lighter riders trying to match the speed of heavier riders?”

Exactly!

“A-ha! Gotcha! But cat A doesn’t have a performance ceiling! So if their iTT winners are heavier than the losers too, then your argument implodes!”

Yes, that’s right. It would. We’d have to come up with some other explanation for the differences. Not that I can think of any. But let’s worry about that later. First let’s look at cat A the same way. If we see the same difference, then I’m in trouble. However, if we don’t see the same difference… then the W/kg cat system is in trouble. If I lose, I’ll go jump off a bridge. If the W/kg cat system loses then… it can go jump off a bridge.

Results for iTT’s in Cat A

Podium avg weight: 68.8 kg
Losers avg weight: 69.9 kg
Difference: -1.1 kg
Statistical significance: p=0.18

Conclusion: There is a small difference, but it is pointing in the other direction (better to be light) and it is quite possibly just random. We would get a difference like this in almost 1 in 5 samples from the ZP database. So we conclude that there is no difference in weights between podiums and losers in cat A iTT’s. There is no disadvantage to being light in cat A, where there is no W/kg ceiling stopping you from doing your best.

GET THIS:

There is no advantage to being light in regular Zwift racing, but there are clear disadvantages. Hence the net effect of being light is negative. Or to spell it out: It sucks to be light in Zwift. Heavies have the upper hand. Always, on any course.

The Light Rider’s Curse is a reality. Now where’s that bridge? I have a cat system to escort there.

And don’t you ever come to the forums and complain about being heavy again. You are wrong. Don’t spread misinformation. Either you are not able to perform at the performance ceiling in your category, and then it doesn’t matter if you are light or heavy. You will get dropped in a climb either way. Or you are a real contender and can touch the upper W/kg limit, and then you are not disadvantaged at all by being heavy. In fact, you have the upper hand.

The Takeaway

So what is your takeaway from this post? That you should go buy some cake, french fries and some jars of peanut butter and start gaining weight? No, it’s not weight per se that gives an advantage but the part of it that is muscle volume. And we are talking muscle volume in absolute terms, not relative terms where you factor in body fat.

If you are a little chubby with a nice dad bod but stay on top of your category in terms of W/kg, then you still more than likely have higher absolute muscle volume than a rider 30 kg lighter than you. This translates into higher Watts. And that still gives you an advantage, even if that lighter rider can produce the same W/kg as you.

Of course you only stand to gain from losing excess body fat. It will improve your W/kg if nothing else. Just watch it so you don’t get that dreaded upgrade. You can always do what I do. Cruise!


W/kg Cats Fail 1: The Sprint Race Catapult

The Zwift W/kg category system needs to go. We have talked about it before. A few times. But it is important to understand that the reason why the W/kg cat system is so terribad is not just that it allows for and incites cheating in the forms of sandbagging and cruising. It also does a lot of other stupid things to Zwift racing.

In this and the next blog post we are going to discuss two of those things. I have dubbed them the Sprint Race Catapult and the Light Rider’s Curse. First up is the Sprint Race Catapult. No, it’s not an instruction on how to get yourself catapulted over the finish line in race sprints. It’s a complaint over how sprint races tend to catapult you into a category where you don’t really belong because of how the W/kg cat system works.

A few posts ago I discussed the power curve. Let’s go over it quickly again. A power curve looks something like this.

No two riders’ power curves are exactly the same but all power curves are more or less the same in that it is always roughly the same downward slope with roughly the same shape. 

Your power curve is a continuous mapping of what kind of Watts you can produce over different time frames. You can only keep really high Watts over a sprint for a few seconds, Watts that you couldn’t possibly keep up for 20 min. And your 20 min performance won’t last you a full hour but the difference is not that big. In fact, Zwift reckons you could do 95% of your 20 min power over a full hour, and that is how it arrives at your 1 hr FTP from a 20 min test.

If we wanted to, knowing your weight, we could also plot a corresponding curve for your W/kg over a time scale. Most riders in cat D can actually race with cat A and keep similar W/kg. For a minute or so… But the longer the effort, the more the average W/kg is going to drop. And here is yet another example of why the W/kg cat system fails.

Most races in Zwift are roughly 20-30 km. Some go above 40 km. Longer races than that are rare. Then there are also races shorter than 20 km. The crits in the lower categories tend to be in “the teens” length wise. There are also sprint races that go on for less than 10 km.

Racing a sprint race is very different from a standard 30’ish km race. In real life endurance sports, with cat systems that don’t suck, races over different distances are treated differently. In US road racing and MTB you get upgraded from your cat by racing actively and by collecting race points over a season. Winning a short race does not award you as many points as a longer race. It’s not that the shorter race is easier to win, it’s just a different beast, but the categories would get screwed up if the system didn’t take race distance into account somehow. 

As a different approach, in cross-country skiing your rank upgrade from a race depends on the time gap between you and the winner. At one point in the calculation your rank gets multiplied by that time gap seen as a percentage of the total race time. So if your finish time is 1 min slower than the winner in a 10 min sprint race, then your finish time is actually 110% of the winner’s and your rank gets multiplied by that number (you want a low rank score in skiing). But if it’s a 1 hr race and you are 1 min slower than the winner, then that extra minute is just a +1/60th of the winner’s finish time, so your rank is multiplied by a mere 1.017, which is worse for your rank than winning the race but still far better than losing by a minute in a shorter race. And in that sense race distance is taken into account also in skiing, by finish time differences as a proxy.
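The multiplier in the skiing example is easy to reproduce. This sketch only covers the one step described above, your finish time relative to the winner’s. The real ranking formula has more parts that are not covered here, and `rank_multiplier` is a made-up name.

```python
def rank_multiplier(winner_time_s: float, gap_s: float) -> float:
    """Your finish time as a fraction of the winner's finish time."""
    return (winner_time_s + gap_s) / winner_time_s

# Losing by one minute in a 10 min sprint versus in a 1 hr race.
for label, winner_time_s in (("10 min sprint", 10 * 60), ("1 hr race", 60 * 60)):
    print(f"{label}: multiplier {rank_multiplier(winner_time_s, 60):.3f}")
```

The same one-minute gap costs far more in the short race, which is how race length sneaks into the ranking without being an explicit input.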

The Zwift W/kg cats don’t take race distance into account at all. A sprint race is valued and treated the same as a 30 km race. And the performance ceiling for each cat (2.5 W/kg, 3.2 W/kg and so on) is the same regardless of race length.

You might already have first-hand experience of this obvious flaw, but if not then let’s imagine you have been racing mainly 30 km races and not only the flat ones. Your typical finish times will depend on your fitness and what category you are in, but if you are racing in a low cat it might be something like 40-50 min.

Now remember your power curve. If you are on top of your category, i.e. your W/kg is close to the ceiling in your cat, then your average race Watts in a 30 km race will be fairly close to your 1 hr FTP, because that far to the right in the power curve diagram your power curve doesn’t drop that fast anymore.

So you’re fairly comfortable at the top of your cat for the time being. You’re not cruising (let’s assume you aren’t). And then one night you get the stupid impulse to join a sprint race. Now, since you aren’t cruising you are not guarding your Watts. No, you do your best instead trying to beat the three cruisers in the race once the five sandbaggers are just specks on the Watopia horizon.

But remember the power curve. You can do a much higher sprint race W/kg effort than you can hold over 30 km. Let’s say you race in cat C. And so you go over limits. Oops! You get a DQ. If your last 30 km race was a 3.2 W/kg, then all it might take is one more sprint race over limits and your 90 day average is above 3.2 + 0.1 W/kg and ZP boots you to the next category. 

Well, isn’t that fair? Isn’t that working as intended? With the logic of ZP it is. If you are a sprint racer. But note here that your power curve has not changed one bit. You have not become stronger. You have just moved between different parts of the power curve in choosing races of different lengths. You are still a 3.2 W/kg racer in a 30 km race. So in effect, if your preference is to race primarily 30 km races, then you get booted to cat B by ZP while still being below the W/kg span of cat B. You get branded a cat B while still being a cat C. And a poor rider in the bottom half of cat B might not be able to compete with top cat C riders in a longer race even!
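A toy version of the catapult, to make the mechanism concrete. The exact way ZP builds its 90 day average is not described in this post, so the sketch assumes a plain mean over the races you feed it. Only the 3.2 W/kg ceiling and the 0.1 W/kg margin come from the text above, and the 3.5 W/kg sprint result is a made-up number.

```python
def gets_upgraded(race_w_per_kg, ceiling: float, margin: float = 0.1) -> bool:
    """True if the average of the given race results exceeds ceiling + margin.

    Assumption: a plain mean stands in for ZP's 90 day average.
    """
    average = sum(race_w_per_kg) / len(race_w_per_kg)
    return average > ceiling + margin

CAT_C_CEILING = 3.2

# Two 30 km races right at the top of cat C: nothing happens.
print(gets_upgraded([3.2, 3.2], CAT_C_CEILING))

# Swap one of them for a short sprint race ridden at a hypothetical 3.5 W/kg.
print(gets_upgraded([3.2, 3.5], CAT_C_CEILING))
```

Same rider, same power curve. The only thing that changed between the two lines is the length of one race.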

Lesson learned: Beware of the Sprint Race Catapult! And be wary of how race distance might affect your categorization in general.

The W/kg cat system is just all too stupid. There would be nothing for you to miss if Zwift came to their senses and replaced it. Absolutely nothing. Except perhaps that cozy feeling of familiarity. 

But what do I know? Maybe you would be too insecure without that cozy feeling. So perhaps you shouldn’t buy a new bike either. A new bike might not be as familiar to you as your old one on the first few rides. It might feel… different somehow… like… better. Scary! Nah, stick to your old bike and keep hugging your old system. We all need our security blankets now that we are grownups and mommy is not around anymore, isn’t that so? We might wet our bibs otherwise.


Borg Charting a Cheater

In the wake of my previous studies, proving that winners in cat B-D make a lesser effort than the rest of the podium, as opposed to cat A where winners make a harder effort, a question kept resurfacing in the discussions on the Zwift forum: Is it really reasonable to assume that you can detect cheating (cruising) from just looking at a HR distribution chart?

Coming from the outside it may indeed seem like a fair question. I would, however, like to argue that it is not, that you are missing the point. The point is that cruising is the HR distribution graph. You can’t really detect it any other way, not even in theory. In fact, you can’t really define it any other way. I will try to explain. But first one of those mandatory detours that come with this blog.

I thought we would start off with discussing dead celebrities. Let’s leave the boring Club 27 out of the picture for a change. But do you know who Borg was?

No no, not that Borg. I am referring to Gunnar Borg, PhD MD and former Swedish professor in psychology. 

I saw him in person a few times while he was still active since he was working at the same campus I was studying at for some years. He and his colleagues used to hang by themselves in this creepy brick building that looked more like a crematory than an academic faculty. Psychophysics. Supposedly, the house made for a good lab environment, whether they actually incinerated failed students in there or not. We weren’t sure.

Anyway, Borg, who died early this year (from old age, I would presume, after a long and productive life) is a world celebrity in our game. No, he was not a cyclist, but he was and remains the go-to guy when you need to put a measure on your physical efforts but lack data on Watts, heart rate, max heart rate, lactate levels, etc. Or when you want to match physiological measures to a person’s perceptions of what is going on in his body, regardless of whether this person is an elite athlete or someone with a possible heart condition visiting a hospital lab. 

Borg is famous for the so-called Borg Chart, widely spread in both sports physiology and medicine. You have surely seen it before. If not in this exact form then at least its elements will be familiar to you.

Along with the Borg Chart there is the Borg Scale in which you estimate your physical exertion from 6 to 20, where 20 would be the point of failure e.g. at the end of a ramp test, one where you don’t hold back. The rest should be familiar too. If you look higher up in the chart above you can find the “can talk“, a familiar cue from your recovery or fat burning rides, and so on. Yes, there is a corresponding scale in Strava that you can use when you don’t have a power meter or a heart rate monitor. And it all started with Borg.

On the right you can see the rough percentage of your maximum heart rate that each level of exertion corresponds to. Even though how your working heart maps to your perceived effort can vary a little from individual to individual, there is still a pretty hard correlation between the two. For example, it is very hard to talk at VO2Max (above 90% max HR) for anyone, and it is not something you can get used to or learn. It is just the way our bodies work. Nor can you go beyond 20. There is no “you can always dig deeper, what doesn’t kill you…” when you are at a perceived 20. Max is max, and your legs just stop working.

Obviously, the Borg Chart is relevant when we once more turn to cruising.

I thought I would show you some examples of HR distribution graphs from Zwift again. The other day I posted a race report. The effort in this race can be summed up as follows:

The green part is the spindown and can be ignored. But look at the rest. Was I cruising this time or not? Couldn’t this be a fairly normal, legit race?

We need a point of reference, something to compare with. Here is another race from last year when I was more fit but also had a max HR that seemed to be a couple of beats lower than today. It’s a 3.2 W/kg effort that still left me well outside the podium in cat C on ZP:

Do you notice any difference between the two graphs? 

Returning to Borg, what was the perceived effort in those two races? Let’s start with the second graph. A large part of it was spent above 160 BPM, as you can see. In my case, with a max HR of 173 at the time, this meant 92% of max HR. If you refer to the Borg Chart above this should mean that I perceived a large part of the race as “Very Hard” or worse.

Did I? It checks out. I can attest to that. Or to put the perceived effort in my own words: It was something of a OH-GOD-PLEASE-MAKE-IT-STOP-I-CAN’T-TAKE-IT-ANYMORE-I-WILL-SELL-MY-BIKE-TOMORROW kind of effort (and the day after you are none the wiser).

So what about the first graph? First, I was actively cruising. I had signed up for a D race. I am not as fit today as in the other race, which should have pushed my bars in the graph to the right compared to if I had cruised this race a few days after the first one last year. And this push to the right would also translate into a somewhat higher perceived effort. Even so my perceived effort of the cruiser race was that it was quite easy.

Let’s repeat this AND look closely at the first graph again:

  1. I signed up to a lower category 
  2. I consciously cruised 
  3. It felt easy

Now let’s look at another rider in a race that I participated in a few days ago. The winner in cat C, according to ZP, looked like this:

It should be noted that this rider is very young, a teenager, so he should normally have a max HR in the 200’s. He has won about half his 30-some races on ZP [sic!]. In this particular race he was followed by a podium that looked like the second of my graphs, the “Very Hard” effort according to the Borg Chart.

You are the jury here. What is the verdict? Make ample use of the Borg Chart if in doubt. Did he cruise? Or does he just have a serious heart condition capping his HR, a condition that somehow still lets him win half his races? (I bet you can beat his win-% easily.) Or was there perhaps just a glitch? Maybe Martians sent some rays that affected the graph? Or maybe he has Martian DNA himself and this is what a typical low cat winner’s HR graph looks like on Mars?

You are the jury here. What is the verdict? Is it at all possible to separate at least some cruisers from legit racers by merely looking at HR distribution graphs?

You are the jury here. What is the verdict? Refer to the Borg Chart again. Is it reasonable that someone can win half his races while talking to a friend without too much difficulty (70% HR), while other contenders can hardly breathe (90% HR) and all of them, winner included, are at or close to the performance ceiling in the category and would get a DQ if they went any harder? Are the W/kg categories appropriate for a sport?


Cruiser Sunday Studies – Part 3

We turn again to our investigations of ZwiftPower race data. In the second of the recent Cruiser Sunday posts I discussed briefly whether the spotted difference between cat A and cat C with regards to relative effort levels among top contenders was statistically significant. Now we will try to analyze race data properly, with a third approach.

An Explanatory Sidetrack

We will start with a little loop before we get back on track. Imagine you have kids and that you recently moved to a new area. There are two nearby schools to put your kids in and you have the choice between either and want to choose the one where the students have the highest grades. Is there a difference at all, and if there is, can we somehow determine whether that difference is not just random?

Or let’s make it really simple. You and a friend throw dice. You roll a die 100 times each. The objective is to score the highest total. If the dice are fair, then there should be no difference between your results, right? Or rather, there will be a difference but only a small one. Either of you had a streak of luck resulting in a slightly higher total. Do it all again and it might be reversed. 

But if it turns out your friend’s total is 516 and yours is only 321, is that just luck? Well, in theory it could be. It’s just not very likely that you will see such a large difference. He would have to have rolled a large number of 6’s to get to that total score. It could happen once in a blue moon, sure, but at the same time it wouldn’t be unreasonable to suspect a loaded die. Or?

A better approach here would be to not begin with trying to decide whether the difference is random or not, because right now we don’t know, but rather to start with determining how likely such an extreme random difference would be. Maybe the difference isn’t that big after all when it comes to probabilities?

Fortunately, there are ways to determine this likelihood for various scenarios. In the case of the schools or the dice you can use a fairly simple statistical test called the Mann-Whitney U-test. If the test score is high enough, it indicates that the probability that the differences in dice total is just random is very low. 

You typically set a limit beforehand as a decision rule. In smaller studies where the results aren’t life critical, a 5% limit, a so-called 5% significance level, is standard. So if we were to do the 100 dice rolls over and over and would see differences of the magnitude of 516 vs 321 in less than 5% of the trials, then we have decided that it is so unlikely that we are better off looking for other explanations than just chance. I.e. we would rather suspect that your friend is cheating.
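To make the "over and over" idea concrete, here is a minimal simulation sketch. It assumes two fair six-sided dice, 100 rolls each, and simply counts how often the two totals end up at least 195 points apart (516 minus 321). The function name, the number of trials and the seed are illustrative choices, not part of the original study.

```python
import random

def simulate_gap_frequency(trials=2000, rolls=100, gap=516 - 321, seed=1):
    """Return the share of trials where two fair totals differ by at least `gap`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = 0
    for _ in range(trials):
        mine = sum(rng.randint(1, 6) for _ in range(rolls))
        friends = sum(rng.randint(1, 6) for _ in range(rolls))
        if abs(mine - friends) >= gap:
            extreme += 1
    return extreme / trials

share = simulate_gap_frequency()
print(f"Share of trials with a gap of 195 or more: {share:.4f}")
print("Below the 5% limit" if share < 0.05 else "Not below the 5% limit")
```

With fair dice a gap that large is many standard deviations out, so the share should land far below the 5% limit, which is what would make us suspect the die rather than luck.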

We will use this same method when looking at the race results on ZwiftPower next.

Method

We will look at HR distribution graphs on Zwift.com among the top 3 in 100 consecutive races in the recent past, in both cat A and cat C.

If a rider spends the best part of his time in the race in a higher HR zone than the other two, visibly so, then that rider has worked harder. The HR graphs aren’t a perfect description of everyone’s fitness, especially when HR zones aren’t tuned to the individual, but on average they should be close enough, and we are looking at 300 riders in each category. Individual quirks will likely average out.

If the winner of a race has worked harder than the rest of the podium, then we will score that race as 0, meaning nobody worked harder than him. If exactly one of the other guys has worked harder than the winner, then we will score the race as 1, meaning one guy worked harder than the winner. If both of the other riders worked harder than the winner, then we will score the race as 2, meaning two others worked harder than the winner.

If there is no HR data available for someone on the podium, we will skip that rider and instead look at the next guy on the results list. It is not uncommon that HR data is missing and the typical reason is that the rider’s Zwift profile is set to private. So if the winner has no HR data, then we will compare the no 2 guy to the no 3 and no 4 guy instead. And if the no 3 guy has no HR data, we will compare the winner to the no 2 and no 4 guy instead. The reason we do this is that the display of all recent races on ZwiftPower is somewhat limited and we need to make sure we get a sample size big enough, 100 races. And it should really make no difference when it comes to our assumptions, or our hypothesis in this study. More about that below.
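The scoring rule, including the fallback for missing HR data, can be sketched in a few lines. Note the big assumption: in the study effort was judged by eye from the HR distribution graphs, whereas this sketch pretends each rider has a single numeric effort value (higher means worked harder). The function name and that effort number are hypothetical.

```python
from typing import Optional, Sequence

def score_race(efforts: Sequence[Optional[float]]) -> Optional[int]:
    """Score one race as 0, 1 or 2.

    `efforts` lists riders in finishing order. Each entry is a hypothetical
    numeric effort value (higher = worked harder), or None when the rider
    has no HR data available.
    """
    # Skip riders without HR data and keep the first three who have it.
    podium = [e for e in efforts if e is not None][:3]
    if len(podium) < 3:
        return None  # too few riders with HR data to score this race
    winner, others = podium[0], podium[1:]
    # Count how many of the two others worked harder than the winner.
    return sum(1 for e in others if e > winner)

print(score_race([0.8, 0.6, 0.7]))        # winner worked hardest
print(score_race([0.5, None, 0.7, 0.4]))  # no 2 has no HR data, no 4 steps in
print(score_race([None, 0.5, 0.6, 0.9]))  # winner has no HR data
```

Equal effort values count as "not harder" here, which mirrors the requirement that a difference has to be visible to count.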

Once we have scored 100 races in cat A and cat C, we will then compare the results using the Mann-Whitney U-test. If there is a difference big enough to be statistically significant (remember the 5% rule here), then and only then will we draw uncomfortable conclusions.

Hypothesis

Assume we are with the ZP team and we LOVE the W/kg category system. We firmly believe it is fair and reasonable. Every sport should be categorized with W/kg, we think. There is no better option. We just need to get rid of those pesky sandbaggers first somehow…

Then what do we expect in a race with regards to relative effort levels among the top contenders? Perhaps there are two possibilities here. We could for example assume that the strength and prowess among the top contenders is roughly the same. So why does someone come out on top? Because he works harder than the others. All else equal, on average, someone working harder than the others will win. So we expect the winner to have worked the hardest (score 0).

Or we could assume that winning a race isn’t just about working hard, even if you are as fit as other top contenders. It is also about random events in the race, such as splits and breakaways and powerups and whatnot. Maybe those random events, a.k.a. luck, play such a large part in a race that we can’t separate the podium places with differences in effort levels. So instead we assume that the relative effort among the top 3 will be roughly the same. Obviously, the top 3 will be more fit and potentially also work harder than the ones coming in last in a big race, but among the top 3, we assume that the effort of each respective rider will be about the same, if not in every race then at least on average in 100 races. Thus what we will not see is a tendency for score 2 in a lot of races. Rather, races will converge around score 1. 

And what do we expect when comparing cat A with cat C? We expect to see no difference in relative efforts in the two categories. Cat A riders might be used to working harder but when comparing the top 3 in a cat A race, there should be no greater differences among them than among the top 3 in a cat C race. There may or may not be a difference in overall relative effort between cat A and cat C, but the spread in effort within the top 3 should look the same in both categories.

Possibly, since we make no distinction between A and A+ riders, and since it is not uncommon that a cat A race is won by an A+, followed by two A riders, we might find a slight tendency for cat A winners to work a little less hard than the rest of the podium. We do not, however, expect to see this in cat C. Because cat C is fair and the W/kg system is appropriate in Zwift, or so we claim.

The “Oh Shit!” Scenario

Now, if we were to find that there is a tendency for cat C winners to work less hard than the rest of the podium, and that there is less of that tendency in cat A, then that would scare us. Because it is unintuitive. Why should races be won by people who work less hard than others, especially when there is an upper limit to performance (W/kg) in a category? We wouldn’t like that. It goes against the nature and ethics of the sport and would distance us from outdoor cycling too.

And it may also indicate that the phenomenon of cruising is a real issue in the lower categories, i.e. that some riders exploit the W/kg system on ZwiftPower by staying behind in a category they are too strong for, making sure they don’t go over W/kg limits, and thus get an unfair advantage in races over riders who couldn’t go over limits due to fitness and who would have to (and will) work extremely hard to finish anywhere near the top.

Results

100 races were sampled starting Fri 7 Aug 2020 and forward in cat A and cat C. According to the scoring method described above, cat A got a total score of 80 whereas cat C got a total score of 106. 

In 43 races in cat A, the top 1 guy worked harder than the following two. In 34 races in cat A, one following rider worked harder than the top 1 guy. In 23 races in cat A, both following riders worked harder than the top 1 guy.

In 29 races in cat C, the top 1 guy worked harder than the following two. In 36 races in cat C, one following rider worked harder than the top 1 guy. In 35 races in cat C, both following riders worked harder than the top 1 guy.

The Mann-Whitney U-test gives a test score of -2.15, which translates into a probability, a p-value, of 0.032 (3.2%) for a random occurrence. This is lower than the 5% limit we set. There is indeed a difference between the categories, and it goes in a direction we did not expect. We expected either no statistically significant difference between the two categories or, if there was one, that it would lean in the other direction, with winners in cat A working less hard relative to the other two on the podium than winners in cat C. Hence we have to draw the conclusion that we cannot refute the “Oh shit!” scenario.
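For anyone who wants to check the numbers, here is a sketch of the test computed from the score counts listed above. It assumes the normal approximation of the Mann-Whitney U-test without a correction for ties and a two-sided p-value. The post does not say which variant or tool was used, so treat that as a guess. The function name is illustrative.

```python
import math

def mann_whitney_z(counts_a, counts_b, tie_correction=False):
    """Normal-approximation z score and two-sided p-value.

    counts_a[k] and counts_b[k] are the number of races with score k.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0.0
    start = 1  # first rank in the current group of tied scores
    for a, b in zip(counts_a, counts_b):
        t = a + b
        if t == 0:
            continue
        mid_rank = start + (t - 1) / 2  # average rank shared by the tied scores
        rank_sum_a += a * mid_rank
        tie_term += t**3 - t
        start += t
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b * (n + 1) / 12
    if tie_correction:
        var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

cat_a = [43, 34, 23]  # races scored 0, 1, 2 in cat A
cat_c = [29, 36, 35]  # races scored 0, 1, 2 in cat C

# Sanity check against the totals reported above (80 and 106).
assert sum(k * c for k, c in enumerate(cat_a)) == 80
assert sum(k * c for k, c in enumerate(cat_c)) == 106

z, p = mann_whitney_z(cat_a, cat_c)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the score comes out at about -2.15 with a p-value a little above 0.03, in line with the figures quoted. Switching on the tie correction, which many statistics packages apply by default, gives a slightly more extreme score, so the conclusion does not hinge on that choice.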

Conclusions

The “Oh shit!” scenario is real. We do not live in the best of all Watopias. We live in a Watopia where it pays off to work hard in cat A but apparently not so much so in cat C. We live in a Watopia where the category system makes us behave weirdly in races in the lower categories B-D. We live in a Watopia where you can get away with cruising, even on ZwiftPower.

Now we have a choice. We can either accept that racing is inherently unfair in the lower categories and just live with it. Or we can, inspired by other working and efficient category systems in real-life sports, find a new category system that would prevent not only sandbagging but also weird discrepancies such as the one we just looked at, a system that would also unchain racers in all categories and prevent cruising.

Your choice. I have made up my mind already.


Cruiser Sunday Studies – Part 2

In the last blog post I tried to show that the majority of races in Zwift and on ZwiftPower seem to be won by riders making a smaller effort than riders coming in behind. As you may have had objections to the methodology, I made a new little study which I think you will find more methodologically sound.

Method

I went through all races in cat C starting from the strike of midnight between the 16th and the 17th of Aug 2020, working my way backwards until I had had a look at 100 eligible races. Again, a lot of races had to be discarded due to low attendance or due to a missing link on ZP to the Zwift rider profile page for the race in question.

This time I chose to look at the winner in comparison to the no 4 guy, the guy who didn’t quite make it to the podium. Did any of these riders, winners vs 1st losers, on average, seem to make less of an effort than the others? Effort here is defined as a higher workload in terms of HR distribution over the race. A rider who spends more time in higher HR zones than another rider is considered to have worked harder, i.e. made a higher effort.

What is to be expected here? Either we could argue that, all else equal, the winners would make more of an effort on average. If two physically equal riders compete (and they will be equal, on average, with large numbers), then the rider who makes the highest effort would win. 

Or we could argue that there should be no difference. Chance, tactics, random occurrences, interference by other riders, and powerups may be what decides a race among equals. Everybody should be working roughly equally hard, at least at the top end of the race.

Either of the two scenarios above, or both, is to be considered the baseline, or the null hypothesis, as a scientist would say. If the actual results deviate from this, then it indicates that the null hypothesis isn’t true and that something strange is going on. 

What we don’t expect to see here is for the winner to make less effort than the no 4 guy, because that doesn’t make sense. Or, as I would like to argue, it indicates the presence of cruising, i.e. that some riders stay behind in a category, even though they would meet the requirements of a higher category, just to be able to keep winning. By staying within W/kg limits during races they have an advantage over riders who can only reach W/kg limits by giving it their all. The advantage lies in being able to drop people by having reserves and by not riding at VO2Max.

Anyway, I checked the HR distribution graphs of the winner and the no 4 rider in 100 races in cat C and made notes in a table. If the winner made less effort than the no 4, then the race got a ‘1’ in one column, the ‘Oh shit!’ column. If instead the no 4 rider made less effort than the winner OR if there was no clear difference between the respective HR chart, then the race got a ‘1’ in another column, the ‘As expected…’ column.

Results in Cat C

Out of 100 random, consecutive races in cat C, 61 ended up in the ‘Oh shit!’ column, i.e. the winner made less effort than the no 4 guy. Only 39 races showed a no 4 working harder than the winner or no difference between the two of them.

A Comparison

Before we come to any discussion of the results, a comparison with cat A was needed. If there is indeed cruising going on in cat C, then the same should not be true of cat A. Why? Because the hypothesis is that it is the upper performance limit of the categories B-D that creates the incentive to cruise, whereas in cat A there is no upper limit to performance. The harder you go, the better your chances of winning. There is no downside to going too hard as you don’t risk getting a DQ or an upgrade (unless you present superhuman Watts of course).

Scrounging up races in cat A proved to be significantly harder. Firstly, there are fewer cat A riders, although they are arguably more active on Zwift than the C guys. And in both the cat C study and the cat A study there had to be at least 4 valid participants (according to ZP) in order to do the comparison between the winner and the no 4 guy of course. So a lot of races had to be discarded for this very reason.

Secondly, it is far more common among cat A riders to do a spindown or even to keep riding hard after a race as a prolongation of the race as a training session. And while finish times are not affected if you keep riding after the finish line, your HR distribution graph on Zwift.com is. This made comparisons difficult quite often and led to more discarding of races.

Results in Cat A

During the same time period of the 100 races in cat C, only 52 eligible cat A races were found. Of those only 25 races had a winner making less of an effort than the no 4 guy. 27 races showed no difference or a harder working no 4.

We should keep in mind here that there is actually some room for completely legit cruising in A. I have made no distinction between A and A+. Quite often a race is won by an A+ rider who doesn’t have to go flat out to win. Not only do you not go any harder than you can, you also go no harder than needed – if you are already in the lead, then there is no need to push. Still, over half of the races in cat A showed no such difference.

Conclusions

To me this is yet another piece of evidence showing the presence of cruising in Zwift – whether the cruisers are aware of it or not. And it does seem counter-intuitive that you should be at an advantage making less effort than other contenders. This happens because of the upper performance limit in cat B-D. 

You are not allowed to go too hard in cat B-D. It is not forbidden to be too strong though. So as long as you are too strong for your category but manage your performance so as to stay within cat limits, then you are a favorite in the race. You don’t always win, but you will win more than your fair share, and you can keep winning indefinitely. ZwiftPower will not upgrade you.

This does not sit well with a sport in my opinion. We should move to a results-based category system, like in real-life sports. Be as strong as you can. Race as hard as you like. Win any race where you are the strongest. But if you keep getting great results in your category time and again over a season, then it’s time for you to get an upgrade. But not because you went too hard but because you did too well too often. 

Thus a sandbagger, going well over the current cat limits, will win legitimately but will get an upgrade soon enough into a category where he is no longer that superior and dominant, and you won’t have to face him anymore. And thus a cruiser can still cruise if he likes, i.e. he can still choose to not go too hard in a race, but he can no longer make less effort than you and still win over and over. If he does go for wins, then he will be upgraded, just like the sandbagger, and he will no longer suck your wheel in your races.

A Zwift with results-based categories is a healthier Zwift. And a more fun Zwift. Fun is Fast. And Fun is Fair!

Footnote

So there was a difference between cat C and cat A but was it just random or was it large enough to be statistically significant, i.e. so large that it is unlikely that it was caused by chance?

We only had 52 races in cat A. Comparing the first 52 races in cat C with the entire sample of cat A with the Mann-Whitney U-test, we get a p-value of 0.088. So it’s not statistically significant at a 5% significance level (although it is at the 10% level). I will come back with a larger sample, e.g. 100 races in each category, as I am convinced that the difference will stand and will then be statistically significant.


Cruiser Sunday Studies – Part 1

As a follow-up to the last couple of posts about cruising and the weird effort limits the W/kg category system imposes on us, I decided to do a little pseudo-scientific study of the racing in Zwift.

Previously I have claimed that the W/kg system favors making less effort than competitors in a race, if the objective is to win. If you haven’t read the last couple of posts (you should read all of them), then you might ask yourself, how does that make sense? That couldn’t possibly be right, could it? All else equal, sports are won by people making more effort than their competitors, isn’t that so?

And the awful truth is that, yes, in all sports except Zwift this is indeed so. But Zwift is different, I have claimed, since it has a uniquely weird categorization that imposes an upper limit to your power output in categories B-D – regardless of your perceived effort, I might add. This would then mean, if I am right, that ideally, if you are set on winning races in Zwift, you would race in a category you are too strong for but still make sure to stay within the category’s upper W/kg limits. You would then not get disqualified by ZwiftPower but still be able to beat competitors in climbs, sprints, surges, what have you. In other words, you would likely win. (Unless you are up against several other guys like you, i.e. several other cruisers, whether they cruise the race intentionally or not.) And not only would you likely win, you could also repeat this indefinitely. You would keep winning over and over and still be allowed to stay in your category.

So let’s put these claims to the test. I took a random day (today, Sun Aug 17) and went through all the races from midnight to midnight to see if winners did indeed make less effort than the others.

The idea here is that the occurrence of a winner making less of an effort compared to others becomes apparent by studying HR distribution graphs on the Zwift website. If e.g. someone wins a race spending most of it in HR Zone 3, yellow, and does so against a runner-up who spent most of the race in Zone 4, orange, and both are at or close to the W/kg limits of the category, then that indicates that the winner could go harder still, just like the runner-up. Only going harder might push the winner above limits and result in a DQ or even a category upgrade. So the winner wins by being the strongest and by making less effort.

Method

I went through over 80 races in those 24 hours and studied cat B. (We all feel we know cheating is abundant in cat D, right, so what about B?) Of those 80 races about half of them had to be discarded. I set a lower limit of at least 5 eligible participants in cat B (according to ZP) because I wanted to make sure there had been at least some kind of dynamics during the race. By far the most common reason for a race to get discarded was indeed lack of participants. But there were also some age category races and some other special cases that did not lend themselves to a comparison between cat B riders.

For a race to qualify as having been won by someone making less effort than others I looked exclusively at the podium, even though I have claimed before that cruisers are over-represented not just among winners but the entire podium. This is because there are often more cruisers than one in a race, I have claimed. So we should actually look beyond the podium, but I had to simplify a bit.

So, anyway, the heuristic here was that if the winner made significantly less effort than either of the no 2 or no 3 guy, then the race would qualify as having been won through less effort.

Doesn’t it distort things comparing one guy to two? Wouldn’t on average at least one other guy have made more effort than the winner just by random chance? Well, is that your experience from other sports in the categories below the top one? Also, you need to consider how the comparison was made. To qualify as less effort, it had to be significantly, visibly so. I looked at which zone(s) most of the race time was spent in and whether there was an obvious difference compared to the rest of the podium.

It is, after all, rather conspicuous and intriguing if the winner sits mainly in Zone 3 if the no 3 guy races on the threshold, don’t you think? How would you explain that? All on the podium are at the top of the category but the winner is not the one having a near-death experience? (Keep in mind the upper W/kg limit here, which does not exist in other sports where categories are based on past results rather than, weirdly enough, past power outputs.) I would then say it clearly supports my claims.

It should be added that in cases where a rider did not use an HR monitor, I have counted that as less effort by default, regardless of whether it was the winner or any of the other two. So in one case there is a race where the winner, although he did seem to slouch around in Zone 3 mainly, was up against two others with no HR monitors and thus that particular race was not deemed as having been won by someone making less effort.

Results

Of 43 eligible races in cat B on this date, 30 were won by someone making less effort than either or both of the no 2 and no 3. In 13 races the difference in effort was not significant or the winner made more of an effort than the others.

Conclusions

If you want to win a Zwift race on ZwiftPower, the odds speak strongly in favor of making less effort than others in the front group. Don’t push yourself too hard! Relax. Cruise. And you will likely win.


A Clarification: Cruising Cat B

The other day I posted a reply to a thread on the Zwift forum. If you didn’t get the point of the previous blog post, the one about Ethics in Zwift, then maybe my reply can serve as a clarification. So I thought I would repost it here.

As usual, I don’t want to expose names. (Dig yourself if you must.) Names are not interesting, at all. Nothing of the problems with Zwift racing that I write about has anything to do with individual subscribers anyway. Rather, I cover a system that is falling apart because it was flawed to begin with. And it is this system that creates cheating by promoting it. And if this system creates a whole bunch of cheaters, it also creates ten times as many weird and unfair situations in races on a daily basis even though it can’t be classified as cheating.

Anyway, a guy in cat B despairs after a race and decides to seek advice from the forum. And I would guess this isn’t the first time he despairs. Nor the last. It was probably just the average race and thus a representative one. Our rider is on top of cat B in terms of his sustainable W/kg, but he is light and weighs in at only 67 kg. 

Now, the problem was that even though our rider put out an impressive average effort of 4.1 W/kg over the race, barely admissible by ZP standards, he still came in outside the top 10. On ZP. And so he asks the forum, what’s wrong with his race tactics? 

Many of us can sympathize with that because we have been in exactly the same situation. You fight your way up to what you think should be the top of your category only to find that you’re not there still. And after a while you begin to suspect that you never will be, and that you will sooner get moved to the next category than get a podium placing (the light rider’s curse). And you know what? You’re right. You are screwed. You were screwed even before you started racing in Zwift, and you only realized it now.

With that backdrop, here was my reply:

I’m not racing in B but then again B, C and D are all the same, while A is an entirely different beast. I think responder X leaves an important clue above here.

Like he says, most of those above you in the results list have at least a sliver of green in their NP bar, meaning they have higher variability in their power output than you. Like you say, meaning they can. Whereas you – I’m guessing here – are more in an all-out effort and not able to match them in e.g. small climbs since, like you say, you have little to spare. They on the other hand…

Plus you are already at a disadvantage being on the lighter side among the top X, since you need to push higher W/kg to stick with the heavies on flattish roads. And being light alone doesn’t benefit you as much as one might think even in a climb.

The next thing you should look at now is the other riders’ race profiles on the Zwift website, quickly accessed through the little green bar diagram symbols on the far left on the ZP race report. (ZP is down right now or I’d look myself.)

Take a look at their HR distribution diagrams. Are they different from yours? Do they have more time spent in Zone 3 or lower Zone 4 than you? If so, then there is your explanation to the disappointing result. If they are not working as hard as you, no wonder they have juice to spare at critical moments. And so you get dropped.

I point out two important things above:

1. At some point you as a lighter rider will have to go over W/kg cat limits to stay with a heavier guy who is on the limit. (Sums up the entire race, doesn’t it?) So as a lighter rider you are basically already screwed. You can’t really both win and stay in cat (unless you race in A). The race favors a heavier rider, given equal W/kg capabilities. Any race does, except the rare race including a very long climb. Granted, at some point weight turns into overweight and body fat doesn’t help you race. But there is a sweetspot in cat B-D. And whatever it is (it’s dependent on other race participants’ weights), it’s higher than your 67 kg for sure since your weight is below the race average. Is this cycling physics? No, it’s Zwift race rules and just that. See below.

2. Given that you both respect W/kg cat limits but barely so, you will always be at a disadvantage against someone who is making less effort than you. Yes, that’s correct! Zwift actually favors cruising a race.
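Point 1 above can be illustrated with some quick arithmetic. This is only a sketch under a loud simplification: it assumes that speed on a flat road depends on raw Watts alone, ignoring drag differences and drafting. The 67 kg weight is our rider's, the 4.0 W/kg figure is the cat B limit, and the 80 kg rival is a made-up example.

```python
def watts(weight_kg: float, w_per_kg: float) -> float:
    """Absolute power for a given body weight and W/kg."""
    return weight_kg * w_per_kg

light_kg = 67.0   # our rider
heavy_kg = 80.0   # hypothetical heavier rival
cat_limit = 4.0   # cat B upper limit in W/kg

heavy_watts = watts(heavy_kg, cat_limit)           # heavy rider sitting on the limit
light_watts_at_limit = watts(light_kg, cat_limit)  # light rider sitting on the limit
needed_wkg = heavy_watts / light_kg                # W/kg the light rider needs to match

print(f"Heavy rider on the limit: {heavy_watts:.0f} W")
print(f"Light rider on the limit: {light_watts_at_limit:.0f} W")
print(f"Light rider needs {needed_wkg:.2f} W/kg to match the heavy rider's Watts")
```

With these made-up numbers the lighter rider would have to ride well above the category limit just to produce the same Watts as a heavier rider who stays on it.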

Sandbagging is not the most common form of cheating in Zwift. Cruising is. It’s just not as visible, unless you start digging in data on Zwift and ZP. How to win a race in cat B-D in Zwift and ZP is you get fitter so as to outgrow your cat but still stick around. You never pull, always draft. You always monitor your avg. W/kg so as not to go over limits although you could. You leave a little room to spare in that average. And then in climbs or similar you bring down the hammer briefly. If you don’t do this, then someone else will. In basically any and every race. You need to get really lucky to sign up to a race with no cruisers in it.

Cruising as a form of cheating is real. Then on top of that there is a huge grey area where people aren’t exactly (or consciously) cheating but their levels of effort still differ significantly in a race. And who is to say how much you are supposed to suffer in a race? Shouldn’t you be allowed to race any way you like, it’s your body after all? And the answer is yes of course. But then also, should someone who doesn’t want to go too hard have the upper hand in a race? I don’t know. Occasionally maybe? But in every race?! Because that is what we have, a race system that will always favor riders who don’t want to go too hard. Zwift Velominati rule #1: STFU – Soften The F… Up, kinda.

It all boils down to the W/kg system in Zwift, as promoted by ZP, being utterly inappropriate as a race categorization. And there is nothing like it in any RL sport. It is unique and uniquely inappropriate. We can never get away from Watts and kg because both are needed for accurate and fair simulations on a smart trainer in cat A. But they won’t do as a way to split up riders into categories to make racing interesting for all.
What is needed is instead a race categorization based on past results, just like in US cycling, World Cup skiing etc etc, a proven concept. It works. And it would work for Zwift too and make racing more intuitive and interesting.

You enter a race and don’t feel like going too hard, you just wanted to participate for fun and fitness. Ok fine, but you don’t win. Agreed? Fair deal?

You enter another race and don’t feel like going too hard (except at crucial moments like small climbs) and it turns out you still outperform the other riders because they are weaker even though they go flat out. Ok fine, you win this time. Kudos to you for being so strong!

You enter yet another race and don’t feel like going too hard, and it turns out you still outperform the other riders. Not fine. Because now you have already been on the podium in many races in your category and it’s time you get moved up to the next category where you obviously and rightfully belong. And a results-based categorization does exactly that. It’s self-sanitizing.

You can’t put upper limits on people’s efforts in a single competition or race. (You have to do that between races.) “Go hard! But not too hard!” That goes against reason and the nature of sports. Time for a change.

At the time ZP was down. Since then I have had the opportunity to have a look at the HR distribution graphs of the riders in that race. Our rider looks like this:

Our rider’s graph looks a little odd in that so much time is spent in Zone 5. It turns out he has a max HR that goes at least 10 bpm higher than the average for his age. And so without adjusting HR zones, what looks like a Zone 5 effort is rather an upper Zone 4. But even so, as I suspected he is on the threshold most of the race. For a few seconds here and there he gets to coast a bit, but you know what it’s like doing 30/15’s. Those 15 seconds are not enough to bring your HR down. And thus we see less variability at the upper end. He can push it a little when he is forced to but not by much. There is little to spare.

Some of the guys higher up in the placings are working hard too but there are also some guys like this:

He finishes the race a few seconds ahead of our rider. With his higher variability of power output he probably sprinted the crap out of our rider and a few others. But how do you beat a guy like this one? You can’t! He will still have a whole pocket full of matches as you strike your last one.

These two riders are competing in the same category with the same artificial upper effort limit. They are not allowed to go any harder than 4.0 (+0.1) W/kg according to ZP. Who would you rather be hitting 4.0 if the task is to win the race? (That’s usually what races are about.)

This is not to say that a rider like ours above wouldn’t look the same if he was somewhere in the middle of a category in a proper and sound results-based race categorization. He would. And he still wouldn’t win. The difference, though, is that the fights for the podium in any such category would be hard, fair and equal for anyone except for those just passing through the category. Let them pass through. And let the real racing begin.

I say it again: 

You cannot put an upper limit to rider effort in a race.

It’s a no-brainer, it really is. Stop hugging an idiotic system just because it feels familiar to you. Once upon a time you were a neophyte zwifter and everything was new and unfamiliar. It’s time to go out in deep waters again. Knee-deep. I know, it’s scary. But you will be fine, I promise.


The Ethics of Zwift Racing

If you think about sports and games and their rule sets, you can always discern some sort of ethics and ideals in them. 

Take chess for example. There is nothing random about chess, except maybe picking sides. In theory, you could foresee just about everything in a chess game, as permutations or possibilities, and thus you have a fair chance to avoid disasters and to push your opponent into a disadvantageous position. The side who wins has done the calculating ahead better, and this is also the ethics of the game. The player who is the best at calculating and visualizing position deserves to win. If they didn’t deserve to win, then there would be some handicap rule that would kick in once a player becomes too dominant on the board. No, no holds barred. The smartest guy deserves to win, every time.

In football (either kind), the team that scores the most goals wins. They deserve to win because they scored the most goals. It makes them better, more ideal. Scoring more goals than the opponent is going to take both individual and team skill and efforts. The winning team was better at those skills and in those efforts, we say, and so they deserved to win. Sometimes, if it’s not our favorite team winning, we say that the opponents were just lucky or that they played unfairly or we blame it all on the referee. But all else equal, we’d say that the team that scored the most goals wins according to the rules and that they also deserved to win. The rules are aligned with our sense of ethics.

In cycling, real-life cycling, the rider who crosses the finish line first deserves to win, unless he doped or raced unfairly or dangerously. Why? Well… he was the fastest and umm… fast is good, isn’t it? Why was he the fastest? Because he was better? Better in what way? Well, he was either more talented or fitter, or he worked harder than the opponents. So talented and fit is good? Yes, that is good. The ideal rider is talented and fit. What about working hard then? Yes, that is also good. The ideal rider works hard in a race. If he is up against an equally talented and fit opponent, then he will win if he works harder and that is an ideal rider, one willing to dig deeper than the others regardless of the pain and misery it puts him in. And getting fit takes a lot of pain and misery to begin with, so someone did their homework. OK…

Talent is needed at the top level but talent alone won’t usually win a race, because a rider might be up against an equal talent, and then it all comes down to, all else equal, who dug the deepest. And we like that. We admire that. When people go through a lot of suffering without giving up to achieve a goal. A person like that deserves to win, we think.

Now, look at Zwift racing. What are the ethics there? We bring our tarmac values to the OLED screen, but if we consider the rule set of Zwift racing, what it actually implies, we arrive at different ideals, ones that conflict with our tarmac values.

There isn’t one single rule set in Zwift racing. There are at least two broad main versions and then some subsets that the organizer can choose between. We have the Zwift Proper rules and then the ZwiftPower rules, and they are not quite the same. 

The Zwift Proper rules say that the fastest rider deserves to win, period. They do not make any assumptions whatsoever about the rider and why he was the fastest, regardless of category. The Beta Crit City races are a little different and approach the ZwiftPower ethics, but they are experimental and a separate case.

The ZwiftPower rules for cat A are similar to the Zwift Proper rules. The rule set for cat B-D also says that the fastest rider deserves to win, but it makes some assumptions about the rider and stipulates some preconditions. More specifically, it says that the fastest rider up to a certain point deserves to win. You can’t be too fast. Sometimes you also can’t go too hard. Then you don’t deserve to win. You have to nail it just right, depending on your capabilities but in a way also regardless of your capabilities.

The rider that pinpoints this ideal speed the best, which translates into pinpointing ideal estimates of physical dimensions (matching Watts, weight and height in some ideal combination, with draft and powerups added to that), is also the ideal rider, the one we should admire.

The ZwiftPower rules do not, however, make any assumptions about subjective effort. If a rider barely wins the sprint in a race after digging horribly deep, but not so deep as to go over cat limits, whereas the runner-up cruised his way through the race, also respecting cat limits, then the winner deserves to win, but not because he worked harder than the other rider. He deserves to win because he optimized certain physical estimates slightly better, thus getting closer to the category ideal than the runner-up. Correspondingly, if the cruiser had won the sprint, then he would be the deserving winner.

So the winner was the smartest and therefore deserved to win, like the winner in chess who calculated the whole game better than the opponent? No, the rules don’t put a value on smarts. This optimizing of physics estimates could have happened for any reason – careful calculation or just dumb luck going flat out – it doesn’t matter. It’s not why you got it just right that makes you admirable and deserving, it’s simply that you did.

This is Zwift racing ethics, like it or not. It’s a quirky sport, isn’t it? 

As a side note, did you for one second think that someone sat at a drawing board and thought it all up just like this, the way it became? 

“Hey, I know, I’m going to design this cool new sport which is about making guesstimates about physical estimates like, you know, when you have to guess how many marbles are in a jar at a market, only you have to put the marbles in there yourself and they’re heavy-like and umm…”

If you, like me, don’t think that Zwift racing, as it turned out to be, was ever truly planned, then ask yourself this: Why exactly do you keep hugging the W/kg system? Why exactly would it be better than a results-based system that promotes doing your best rather than hitting a target W/kg? Is it, perhaps, just because you’re a hopeless conservative conformist in general, one who dislikes change just because change makes you feel uneasy, and who wouldn’t know ‘improvement’ even if it hit you in the face?


Why ZwiftPower Must Go

Think about W/kg categories for a second. It’s the bane of fair Zwift racing. We have talked at length about that already. But let’s think in broader terms. What could W/kg possibly be good for at all? How did Zwift come up with these categories to begin with? Let’s do some guess work.

Zwift gives us an understanding of our own personal physiology that surpasses even that of the most expensive sports watches. It’s all these numbers and zones and whatnot. Confusing at first but they tell us how we work on a bike and given some time we start to get it. What we can and cannot do, what we might be able to improve, how to approach certain types of efforts and challenges.

At the center of all this sports science is our functional threshold, arguably the most important of all the numbers. We arrive at our functional threshold power through an FTP test or ramp test in Zwift or sometimes by just going hard yet somewhat consistently in a race.

What do we need the FTP or the W/kg for? It will tell us our max sustainable effort for an hour of work on the bike, which incidentally happens to be a very common time span in the activities on offer in Zwift on a daily basis and for good reason. It will also help us pick a suitable group ride in Zwift. 

Often the organizers of group rides will be quite specific. Such and such a ride will aim for an average of, say, 1.8-2.0 W/kg, given the ride leader’s weight of so and so many kg. And then, since you know your FTP and your W/kg, you can quite easily decide if the ride is for you or not. You will have at least a rough idea of how the ride will feel in your body, whether you can cope and whether the ride fits into your training regimen if you have one.
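To make that decision concrete, here is a minimal sketch in Python. The function names and the example numbers (a 75 kg ride leader, a rider with a 3.0 W/kg FTP) are made up for illustration, and the sketch assumes you would ride at the advertised W/kg, which ignores draft and any weight difference between you and the leader.

```python
def leader_watts(ride_wkg: float, leader_weight_kg: float) -> float:
    """Convert an advertised W/kg pace into raw Watts for the ride leader."""
    return ride_wkg * leader_weight_kg


def ride_intensity(ride_wkg: float, rider_ftp_wkg: float) -> float:
    """Return the ride pace as a fraction of the rider's FTP (both in W/kg)."""
    return ride_wkg / rider_ftp_wkg


# Hypothetical numbers: a 1.8-2.0 W/kg ride, a 75 kg leader,
# and a rider whose FTP is 3.0 W/kg.
low, high = 1.8, 2.0
print("leader Watts:", round(leader_watts(low, 75)), "to", round(leader_watts(high, 75)))
print("share of FTP:", round(ride_intensity(low, 3.0), 2), "to", round(ride_intensity(high, 3.0), 2))
```

A ride that sits well below your FTP should feel comfortable for an hour. One that creeps up towards 100 % of it is closer to a race effort.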

These things were at the core of Zwift early on. This was likely what Zwift had in mind when introducing the categories. Racing was underdeveloped but caught on more and more as time went by because… well, racing is fun! The community wanted it, more than Zwift could foresee. And Zwift provided the means to race but did not meddle too much with how races were organized. They left that to the community. ZwiftPower did the meddling instead.

Have you ever participated in a Zwift race with non-standard race categories? One common example would be masters/veteran races with age interval categories. The organizer uses the A-D categories for convenience but the meaning of those categories is not the standard W/kg one. And the Zwift race mechanics are crude and flexible enough to let you do that. It works just fine. You could organize a race where cat A was male riders on US virtual bikes and cat D female riders on European bikes. Or rider height-based categories. It’s up to your imagination, more or less. At least there are no technical limits to what you can write in your race presentation about what the categories are supposed to mean.

In other words, ZwiftPower could have worked for any type of race category system. They were never tied to W/kg. And, in fact, an embryo of something different can be found on the far right of the ZwiftPower race reports or rider profiles, a kind of rank score that could have been developed further into a results-based categorization.

Moving towards a results-based race category system would have required ZwiftPower to get the clubs and other race organizers onboard. Not necessarily an easy thing. But it would have been possible. And then we wouldn’t have ended up in the W/kg mess we are in right now. 

ZwiftPower is as much a culprit as Zwift, I dare say even more so, when it comes to unintentionally promoting by far the most common forms of cheating in Zwift – sandbagging and cruising. Then they set out to chase down the cheaters they have created themselves by building racing on an unsound foundation, through ever more complicated means of catching sandbaggers. Even Zwift have started to help out with that lately. But the cruisers are untouched so far. You can cruise all you want. And I argue you should.

This is all so backwards if you think about it. The road to hell is paved with good intentions, they say. And Zwift racing went straight to hell, I say, as an addendum to that. You cannot have an influential third party working against reason in a platform of yours. ZwiftPower must go. Well played, Zwift, and I mean it. This is potentially a new beginning. Not a day too soon.

This sounds very harsh, I know. But we just have to kill a few darlings now. The subscribers will benefit in the long run.


Are the World Tour Pros Cheating in Other Ways in the Virtual Tour de France?

I concluded in the previous post, on cruising in the Virtual Tour de France, that Zwift races are necessarily brutal in their very nature.

That said, the men’s stage 5 up the Ven-Top route, the virtual Mont Ventoux, was indeed brutal. Perhaps a little too brutal… 

I am not saying that there necessarily had to be something fishy about the front trio, but those W/kg numbers they produced were very high. 

The break-away effort, which lasted all the way to the finish line for all three of them (although they got separated amongst themselves towards the end), didn’t last a full hour. Let’s keep that in mind.

Rather, the attack came 18 min into the race and they got to the finish line in about 45 min. That’s a 27 min breakaway. Roughly. At least it went on for more than 20 min. Let’s keep that in mind too. 

And the pace of the break-away trio never settled to match that of the chasing peloton since the three of them were duking it out amongst themselves all the way to the finish. Let’s also note here that a large part of the break-away was spent pushing pedals at 7+ W/kg. Impressive!

Now over to something completely different. I don’t know if you have thought about this, especially now with the Olympics being postponed, but aren’t the track and field world records coming in more rarely these days? How come?

Sports science and medicine have an explanation. Top athletes in sports that have you push Watts, whether cycling or running or skiing, and whether sprinting or going long distances, are close to or even at the physiological limits of the human body. That’s the official explanation. Hence we are not to expect the 100m dash world record to be beaten easily, and if it is, then it won’t be by much.

I’m sure you have seen a power curve. If you haven’t, then here is one I st0led from the interwebz:

It’s someone’s curve, I hope they don’t mind. They look like this. A little different from person to person, some individual weaknesses and strengths, but you will always see this downward-sloping curve with a bit of a hockey stick tendency.

On the Y-axis there is the Watt output of the rider. On the X-axis is a logarithmic time scale. The curve is like a scaling FTP report. Look at any time frame and you can find the maximum Watts that rider can produce over that time frame. The Watts for a 1h time frame is what we normally call a person’s FTP.

In Zwift the 5 min power is also very important, and you can find that too. On the far left is the peak power when sprinting, and it drops off quite fast over time. And if you remember, the way to calculate your FTP is by doing a 20 min max power test in e.g. Zwift and then multiplying that number by 0.95 to get the max sustainable 1 hr power. It doesn’t always hold true, though, but at any rate your 1 hr max power will be lower than your 20 min max power by some factor.
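As a rough sketch of that rule of thumb, the arithmetic looks like this in Python. The function names and the example rider (300 W over 20 min, 75 kg) are made up for illustration. Only the 0.95 factor comes from the text above, and as noted it does not hold for everyone.

```python
def estimate_ftp(twenty_min_watts: float, factor: float = 0.95) -> float:
    """Estimate 1 hr sustainable power from a 20 min max-effort average."""
    return twenty_min_watts * factor


def watts_per_kg(watts: float, weight_kg: float) -> float:
    """Express a power figure relative to body weight."""
    return watts / weight_kg


# Hypothetical rider: 300 W average over a 20 min test, 75 kg body weight.
ftp = estimate_ftp(300)
print(round(ftp, 1), "W")
print(round(watts_per_kg(ftp, 75), 2), "W/kg")
```

The same two steps work for any test result: scale the 20 min number down first, then divide by body weight.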

Now, by looking at world records in sports such as track and field or cycling (mainly track cycling), what sports science says is that you could infer an ideal power curve for mankind. The human body can only move so fast over a 100m dash. Likewise, the human body can only go so fast over the 1 hr track cycling world record attempts. That is, unless we alter our genetics. Are top athletes at this limit already? As stated above, sports science claims that if they aren’t, then they are at least very close. Which could explain why e.g. Pantani’s record up Alpe d’Huez has stood since 1997 – a both fantastic and terrible year for cycling.

If you want to read more about what science says about cycling on the topic of max performance or perhaps about the physics of cycling in general, the physics Zwift’s computerized model most likely is based on, then I recommend this e-book. You can get it from Amazon or similar. It’s a really interesting read. A wee bit of maths in it – you can’t get away from that in physics – but explained in the simplest possible of terms.

Anyway, sports science claims that the upper human limit for the 1 hr FTP is 6.4 W/kg for men and 5.7 W/kg for women. It seems to check out. It does for track and field. And no 1 hr world record attempt in cycling has ever crossed that line. Check for yourself! Anything above those numbers and you could fairly become suspicious. It would warrant a closer look. And an explanation of some kind.

All WT pros are genetic freaks. You cannot get to that level with determination alone, or you would run up against someone with the same determination as you but a better genetic disposition and you’d lose. WT pros have both the determination and the genetic underpinnings for performance at that level. But couldn’t it be possible that there are freak-of-freaks too? Guys that stand out even among the best due to some extremely unique genetics, one in a billion? It’s not impossible. It’s just much more likely there is another explanation for the results of such an individual, one such guy that really stands out. Or three.

Let’s return to the break-away in stage 5 in VTdF. I’m not sure what the ideal power curve would have to say about a 27 min effort, but it should at least not be higher than a 20 min effort. Turning the standard calculation to arrive at the 1 hr FTP from a 20 min test on its head, we could say that no rider should push higher Watts than 6.4 / 0.95 = 6.7 W/kg during a 20 min effort. I don’t have data on the average W/kg output for the trio during those 27 min. I sure would like to see it though. Maybe WADA and ZADA should take a look too. Just to be sure.
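The back-of-the-envelope numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic in this post. The function names are made up, the 6.4 and 5.7 W/kg limits and the 0.95 factor are the figures quoted here, and the women’s 20 min ceiling is simply the same inversion applied to the women’s limit.

```python
# 20 min -> 1 hr conversion factor from the standard FTP test described above.
FTP_TEST_FACTOR = 0.95

# Claimed upper human limits for 1 hr power, in W/kg, as quoted in this post.
ONE_HOUR_LIMIT_WKG = {"men": 6.4, "women": 5.7}


def twenty_min_ceiling(one_hour_limit_wkg: float, factor: float = FTP_TEST_FACTOR) -> float:
    """Invert the FTP rule of thumb: 1 hr limit / 0.95 gives a 20 min ceiling."""
    return one_hour_limit_wkg / factor


def breakaway_minutes(attack_min: float, finish_min: float) -> float:
    """Length of an effort that starts at attack_min and ends at finish_min."""
    return finish_min - attack_min


for group, limit in ONE_HOUR_LIMIT_WKG.items():
    print(group, round(twenty_min_ceiling(limit), 2), "W/kg over 20 min")

print("break-away length:", breakaway_minutes(18, 45), "min")
```

For men this comes out at roughly 6.7 W/kg, matching the figure above. The 27 min break-away is longer than 20 min, so by this reasoning its average should sit at or below that ceiling.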

There are many ways to cheat in Zwift, like we have discussed in an earlier post. But there is also something to be said for Zwift racing. It brings visibility! You can easily hide EPO shots in coke cans in the fridge (in reference to a certain notorious rider in the past), but you cannot easily hide what EPO or blood bags or whatever bring you. Not in Zwift. 

Well, not unless you cruise WT Zwift races…

UPDATE: During the later broadcast of stage 6 it was mentioned that the winner’s W/kg average during the last 30 min of stage 5 was 6.592. 
