Tour de France and multidimensional scaling

Thu, 2015-07-30 19:09

I've had the following idea for a long time and finally took the time to try it out.

In the Tour de France, every rider receives a rank (i.e., placing) on each stage. In the 2015 tour, there were 20 stages (excluding the team time trial, which works differently). Each rider who finishes the tour (i.e., finishes all stages) then has a 20-dimensional vector of their placings. For example, Rohan Dennis had the vector


We can thus embed the set of finishing riders into a 20-dimensional space. What is this embedded set "like"?
Twenty dimensions is too many to hope to have an easy way to, say, visualize the set.
Fortunately, there is an amusing technique called multidimensional scaling (MDS). The idea of MDS is to take a high-dimensional data set and model it in a space of lower dimension. The key idea is to maintain, as much as possible, some measure of proximity: if two points are "close" in the original space, we want them to be "close" in the lower-dimensional space. How we measure "close" is up to us, of course.

I applied this technique to the rider data and made a 2-d model for easy viewing. The goodness of fit (GOF) is not great, about 0.548, but the visualization is nice (click on it to go to Flickr for a somewhat larger version, or see this pdf):

2015 Tour de France MDS

Some things we can see in this plot:

  • The top GC riders and climbers all cluster in the upper right, with Froome (the winner) in the most extreme position.
  • In the bottom left we find all the sprinters (in particular the cluster of Greipel, Kristoff, Cavendish, Degenkolb, Coquard, and Boasson Hagen).
  • I don't know what to say about the cluster of nine riders above the main sprinter cluster.
  • Most noticeably, we see Peter Sagan on his own at the bottom, right of center. Sagan achieved high placings on many stages that pure sprinters could not, and fared very well on the sprint stages, too. On the other hand, he didn't get high places on some mountain stages.

I pulled the stage data off the web and processed the files into a csv file with names and placings with a bit of Perl. I then processed the csv file with R like this:

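# ranks only, no header row
ranks <- read.csv(file.choose(new=FALSE), header=FALSE, sep=",")
# the same data with rider names and a header row
ranksplusnames <- read.csv(file.choose(new=FALSE), header=TRUE, sep=",")
# concave transform that compresses differences among large placings
myfun <- function(x) {return(x^0.3)}
# Euclidean distances between riders, then classical MDS down to 2 dimensions
ll <- cmdscale(dist(apply(ranks, MARGIN=c(1,2), myfun), method="minkowski", p=2), k=2, eig=FALSE)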

I actually used two different csv files, one with nothing but ranks, and the other with names and headers. I'm sure you could get by with one.

I defined myfun to "adjust" the rankings. This is based on the idea that nobody really cares whether they finish 120th versus 140th, but they care a lot whether they finish 3rd versus 23rd. I thought applying a "concave down" function to the rankings should make comparisons more "realistic" (other functions would give similar results, including, say, 1/x, as it groups larger rankings closer together). However, I was surprised to find that this did not seem to make a great difference. Hmmm.
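To make the effect of the adjustment concrete, here is a small sketch that applies the same x^0.3 transform to the two pairs of placings mentioned above. The variable names front_gap and back_gap are just for illustration.

```r
# Compare rank gaps before and after the concave transform x^0.3.
myfun <- function(x) x^0.3

front_gap <- myfun(23) - myfun(3)     # 3rd versus 23rd
back_gap  <- myfun(140) - myfun(120)  # 120th versus 140th

# Both raw gaps are 20 places, but after the transform the gap near the
# front of the field is several times larger than the gap near the back.
print(c(front = front_gap, back = back_gap))
print(front_gap / back_gap)
```

So the transform does stretch differences among the top placings relative to the bottom ones, even if the resulting plot did not change much.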

The cmdscale command does the real work. The first step is to calculate a matrix of distances between each pair of riders (using the dist function). There are lots of options for calculating the distance; here, I used Minkowski distance with p=2 (i.e., plain old Euclidean distance). This distance matrix gets handed to cmdscale, which finds a set of points in a space of the specified dimension (k=2) that best captures this distance information. With eig=FALSE, you just get a set of points; setting it to TRUE gives more information.
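As a rough illustration of what eig=TRUE adds, the sketch below runs the same pipeline on made-up data (random placings for 30 hypothetical riders over 20 stages, not the real Tour results). With eig=TRUE, cmdscale returns a list whose points component holds the coordinates and whose GOF component holds two goodness-of-fit values.

```r
set.seed(1)
# Hypothetical stand-in for the real data: 30 riders, 20 stages,
# each column a random permutation of the placings 1..30.
ranks <- sapply(1:20, function(stage) sample(30))

adjusted <- ranks^0.3                      # same transform as myfun
d <- dist(adjusted, method = "euclidean")  # same as minkowski with p = 2

fit <- cmdscale(d, k = 2, eig = TRUE)

dim(fit$points)  # one row per rider, one column per output dimension
fit$GOF          # two goodness-of-fit values, each at most 1
```

Random placings have no real structure, so the fit here will be poor; the point is only to show where the coordinates and the GOF numbers live in the result.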

Then this set of points is given to textplot (which is part of the wordcloud package) which plots the points using the names of the riders. Textplot makes sure that the names do not overlap, which is useful here, since the upper left is so crowded. There is a bit of grep in there to extract just the last name of each rider, which also helps with the crowding.
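The plotting step is not shown above, so here is a minimal sketch of how it might look. The rider names and coordinates are hypothetical placeholders, and the last-name rule (keep whatever follows the final space, done with sub rather than grep) is an assumption, not necessarily what I actually ran.

```r
# Hypothetical names and coordinates standing in for the real riders.
riders <- c("Chris Froome", "Peter Sagan", "Rohan Dennis")
coords <- matrix(c( 2.0,  1.5,
                    0.5, -2.0,
                   -1.0,  0.3), ncol = 2, byrow = TRUE)

# Keep only the text after the last space (an assumed rule; it would
# shorten a two-word surname such as "Boasson Hagen" to "Hagen").
last_names <- sub("^.* ", "", riders)

# textplot() comes from the wordcloud package; skip plotting if it is missing.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::textplot(coords[, 1], coords[, 2], last_names)
}
```

With the real data, the coordinates would be the two columns of the cmdscale result and the names would come from the csv file with headers.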

riding, 2012

Mon, 2012-12-31 18:14

2012 was a good year for riding. The highlight has to be the completion of my first 200 km brevet. I am a randonneur.

I rode a little less, in kilometers, than 2011, but my average ride was a little longer: 100 rides, 7208.86 km.

For 2013, I plan to do at least two 200 km brevets, and to do one before June. In fact, I plan to do the one in March run by SIR. That would really kick my year off well. There are longer brevets (300, 400, 600, even 1200!) but I think I should do a bunch more 200s before thinking about trying a 300.

Here is an animated gif of histograms of my rides over the last few years. Geek out!

broken tooth

Mon, 2012-03-12 09:30

After a bit less than three years and around 22,000 km, I broke a tooth on my triplizer middle chainring. I've already replaced it with another triplizer.

triplizer middle chainring with broken tooth

updated cycling data

Mon, 2011-06-06 23:29

I have updated my cycling data page with 2011 Giro info and other things.

The 2011 Giro was advertised as the hardest one in recent memory. But the data doesn't suggest that it was particularly hard, with one exception. The lanterne rouge time is higher than in any other Giro going back to at least 2001, but just barely. On the other hand, the percentage of riders finishing is actually on the high side, and it was the fastest Giro I have data for.

fenders back on!

Fri, 2010-09-17 21:51

Arghh, it's been a very rainy September here in Seattle, so I'm bowing to the weather and putting the fenders back on a few weeks earlier than I usually do.

The last Husky Cycling ride of the "summer" is tomorrow; next week starts the fall trying-to-attract-more-members rides.

In other news, did I tell you I got a cable release for my camera so I can do bulb shots?

fenders back on

tour de france stats

Tue, 2010-07-27 10:45

I've been keeping track of certain statistics in the grand tours for a while now.

I just updated the table with the 2010 results.

A few observations:

  • The correlation between prologue rank and final gc rank was lower than any year I have
    calculated it for, with the exception of 1997. This might be a result of the limited time trials and/or the rather intense mountain stages.
  • The percentage finishing, 87%, was the same as last year, and ties the record among the
    years I've calculated. I am a bit surprised at this, since it seemed like there was a particularly large number of crashes. Perhaps there was less illness?
  • The winning margin and the 10th place final gc gap were both rather small. Does this indicate an "easy" tour, or a hotly contested tour?
  • Petacchi won the green jersey, with the lowest GC ranking since at least 1994.
  • Charteau's KOM win makes three years in a row with the KOM winner not winning a stage.

3000 km

Sun, 2010-05-30 00:13

Having a great cycling year so far. Just passed 3000 km ytd today (5/29/10). This is the earliest ever (other dates include 6/16/09, 8/7/08, 7/3/07). Plus, Thursday I did 147.5 km - my longest ride since the 1980s, so that's a good thing.

ride 542 Mt. Baker Hill Climb ride report

Wed, 2009-09-16 23:05

On Sunday (September 13), I rode the Mt. Baker Hill Climb (Ride 542) with my friend Les and about 800 other people. It was a good time.

On Saturday, Les, his wife, his daughter, Jenni, and I drove up to Glacier (Washington) in the afternoon. Les and his wife have a friend who owns a "cabin" (i.e., house) in Glacier, a short ride's distance from the start line. On the way to the cabin, we stopped for dinner at Il Caffe in Deming. I was shocked at the goodness of this little restaurant. Out in the middle of nowhere, on route 542, it had a nicely sophisticated menu and was a happy relief for me, as I was expecting the best we'd do for food was a Denny's type of place. I recommend Il Caffe to anyone heading to this area.

The cabin was great, with plenty of room for us. I didn't sleep very well, though: I was nervous about the climb. I probably got 5-7 hours of sleep; not bad, but I prefer 8 to 10.

In the morning, I got up at 6 and took a pill I have to take 30 minutes before breakfast, and then snoozed for half an hour, finally rising for good at 6:30. I breakfasted on my usual two packets of oatmeal, scoop of whey powder, walnuts and a banana. Then it was time to get dressed and go. Les and I were both wearing our Husky Cycling team kits (lots of purple). It was pretty chilly riding to the start, probably in the low 50s. I was wearing shorts and a jersey, to which I added arm warmers and a thin vest. I took the vest off just before the start, but kept the arm warmers on all the way to the top, though I pushed them down to my wrists when it got a bit warm a few kms from the top.

The ride to the start was pleasant. We left at 7:15. Les's start time was 8:00, but he needed to be at the start area by 7:45. It took 15 or 20 minutes to get to the start area in what seemed to be "downtown" Glacier: a few buildings on either side of route 542.

There were many, many cyclists at the start when we arrived, and Les soon took his place for the pre-ride announcements (which primarily were about letting us know that the road was not really closed, due to the inability to control the many little forest roads). Les's group (the "recreational" group) then moved a quarter mile up the road (in the opposite direction of the ride) to the official start line, and then they were off.

Fifteen minutes later, my group ("recreational fast") repeated the drill: pre-ride announcement, move to start line, go. The ride started pretty fast, with a short downhill out of the start area, but soon there was an easy climb for a few km that slowed everyone down nicely. This was followed by a fairly screaming downhill, at which I must have hit my maximum speed for the day (58 km/h). I found that I was riding faster on the downhills than many, but it was hard to stay on anyone's wheel, so I was working a little harder than I probably should have been.

Soon enough, we got to what's known as Powerhouse Hill (or just "the Powerhouse"), an early bit of climbing which was surprisingly steep and long: the course profile says 3.2 km at 6.6%. This knocked me down a few pegs, and I lowered my effort a bit to make sure I didn't blow up and just peter out.

I frankly don't remember much from the top of the Powerhouse to the DOT station about 22 km in, where the real climb starts. I guess I must have been moving pretty well if I can't remember it. I do know that I covered the first 20 km in 50 minutes, which I was quite happy with. The whole ride is 40 km, and I figured if I could do the first 20 km in 1 hour, and then the second in 1.5 hours, I'd finish in 2.5 hours which was my rough target time. (The "fast recreational" category was for those who expected times between 1:45 and 2:45, so I really wanted to do under 2:45, and figured I'd shoot for 2:30 to be on the safe side of that.)

Round about the DOT station, I started getting into a rhythm; every five minutes I'd get out of the saddle and do a quick stretch and drink a swig of water. Every 20 minutes I took a shot from my homemade gel in my gel flask: the "gel" is actually about 45% brown rice syrup, 45% honey, 10% molasses, with some water and a tiny bit of vanilla flavoring. I've been using this on longer rides recently, and it seems to work well. This is all I ate on the ride and it did the job.

I mostly just plugged away from the DOT on. I was surprised that my speed was so low, though: I was right around 11.7 to 12 km/h for a long chunk of the climb. Since the grade is listed on the profile as 5.7%, I figured I'd be able to maintain more like 14 or 15 km/h. I'm not sure what the problem was: too hard at the base? fear of blowing up? altitude? In any case, I maintained that speed really constantly for a long way, until about 3 km from the finish, when I knew I was going to make it, and make it with a better time than I'd expected. The last 1.5 km or so was terrific: I felt really good, and did a lot out of the saddle, passing a bunch of people (though they were mostly rec or "summit" riders who looked like they were having a hard time of it, but still...). The views near the finish are amazing, and there were people cheering riders on for the last few hundred meters, which was great.

My official time was 2:19:18. All things considered, I think I could shoot for 2:10 next time. I think I may have ridden too cautiously, considering how good I felt near the finish.

It was nice to not be passed by any of the "competitive" riders, who started finishing not long after I did (they started an hour after I did). If I had ridden in the competitive wave, and ridden the same time, I would have finished dead last, but only just.

Les finished a while after I did. His time was 3:20:40, but he had his saddle pain problem which required him to take a break on the climb. I talked to Kent (who also rides with the Husky team) at the summit: he finished in an amazing 1:41:40. Ben, also from the team, rode a 1:32:56 (his father rode, too: 1:45:19!). So many fast people!

I'm a little surprised that I'm so slow. I worked really hard this year, not specifically for this climb, but in general riding, and I think I'm stronger than I have ever been, or at least stronger than I've been in at least ten years. I think I must be limited by my weird aortic valve, and, perhaps more significantly, by my cardiac medications, which keep my heart rate down. I think this must be limiting my aerobic capacity. But, I guess I just have to work with what I've got, and I certainly had a good time, and felt very happy with my result.

I rode my Independent Fabrication Club Racer. I have a triple crank, with a very small 24-tooth inner chainring, and I was on that exclusively from the DOT station on. I managed to spin the whole way, cadence around 90 rpm, often higher, except for a few attempts at acceleration near the top. (Thinking ahead to future climbs, I could lighten my bike a bit: I could definitely get lighter wheels, and I carry a lot in my saddle bag, and I could work on losing five pounds or so of body weight: it might be interesting to see the effect of such improvements.)

We didn't ride back down; instead, Jenni and Les's wife and daughter drove up and picked us up. Les has difficulty on long descents due to some issues with his wrists and braking. I'd like to descend some time: it surely looks like a fun one.

I'm hesitant to say I want to do it next year. Except for trying to beat my time, I'm not sure there is much attraction there for me. I mostly feel like it's a ride I've checked off, and although the view from the summit is fantastic, a lot of the climb is on a road with dense forest on both sides, so there is little view, and there aren't many switchbacks to make things interesting. I'll have to see how I feel next summer.