Blogs

Tour de France and multidimensional scaling

Thu, 2015-07-30 19:09

I've had the following idea for a long time and finally took the time to try it out.

In the Tour de France, every rider receives a rank (i.e., placing) on each stage. In the 2015 tour, there were 20 stages (excluding the team time trial, which works differently). Each rider who finishes the tour (i.e., finishes all stages) then has a 20-dimensional vector of placings. For example, Rohan Dennis had the vector

<1,88,122,88,107,156,183,142,139,101,77,106,101,108,106,98,85,65,80,151>.

We can thus embed the set of finishing riders into a 20-dimensional space. What is this embedded set "like"?
Twenty dimensions is too many to hope for an easy way to, say, visualize the set.
Fortunately, there is an amusing technique called multidimensional scaling (MDS). The idea of MDS is to take a high-dimensional data set and model it in a space of lower dimension. The key idea is to maintain, as much as possible, some measure of proximity: if two points are "close" in the original space, we want them to be "close" in the lower-dimensional space. How we measure close is up to us, of course.
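
As a toy illustration of that idea (not the rider data), here is a minimal R sketch. The five points are made up; they sit in 3-dimensional space but all lie on one plane, so a 2-dimensional model should be able to reproduce their pairwise distances essentially exactly.

```r
# Made-up points in 3-d that all lie in the plane z = x + y
pts <- cbind(x = c(0, 1, 0, 2, 3),
             y = c(0, 0, 1, 2, 1))
pts <- cbind(pts, z = pts[, "x"] + pts[, "y"])

d_high <- dist(pts)                # pairwise Euclidean distances in 3-d
model  <- cmdscale(d_high, k = 2)  # classical MDS into 2 dimensions
d_low  <- dist(model)              # pairwise distances in the 2-d model

# Largest discrepancy between original and modelled distances;
# it should be zero up to rounding error
max_err <- max(abs(d_high - d_low))
print(max_err)
```

With real data the points will not lie on a plane, so some distances get distorted; that distortion is what a goodness-of-fit figure summarizes.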

I applied this technique to the rider data and made a 2-d model for easy viewing. The goodness of fit (GOF) is not great, about 0.548, but the visualization is nice (click on it to go to Flickr for a somewhat larger version, or see this pdf):

2015 Tour de France MDS

Some things we can see in this plot:

  • The top GC riders and climbers all cluster in the upper right, with Froome (the winner) in the most extreme position.
  • In the bottom left we find all the sprinters (in particular the cluster of Greipel, Kristoff, Cavendish, Degenkolb, Coquard, and Boasson Hagen).
  • I don't know what to say about the cluster of nine riders above the main sprinter cluster.
  • Most noticeably, we see Peter Sagan on his own at the bottom, right of center. Sagan achieved high placings on many stages where pure sprinters could not, and fared very well on the sprint stages, too. On the other hand, he didn't get high places on some mountain stages.

I pulled the stage data off the web and processed the files into a csv file with names and placings with a bit of Perl. I then processed the csv file with R like this:

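library(wordcloud)  # provides textplot()
# first file: ranks only, no header; second file: header row plus a "rider" column of names
ranks <- read.csv(file.choose(new=FALSE),header=FALSE,sep=",")
ranksplusnames <- read.csv(file.choose(new=FALSE),header=TRUE,sep=",")
# concave-down "adjustment" applied to every placing
myfun <- function(x) {return(x^0.3)}
# Euclidean distances between riders, then classical MDS into 2 dimensions
ll <- cmdscale(dist(apply(ranks,MARGIN=c(1,2),myfun),method="minkowski",p=2),k=2,eig=FALSE)
# label each point with the rider's name minus its first word
textplot(ll[,1],ll[,2],gsub("(^[^\\s]+\\s{1})","",ranksplusnames$rider,perl=TRUE),xlim=c(-4,7),cex=0.6)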

I actually used two different csv files, one with nothing but ranks, and the other with names and headers. I'm sure you could get by with one.
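
A minimal sketch of the one-file version, assuming a csv with a header row, a `rider` column of names, and one column of placings per stage. The column names `s1` to `s3`, the two extra riders, and their placings are made up for illustration, and the file is read from inline text so the example is self-contained.

```r
# Hypothetical csv contents: a header row, a name column, then stage placings
csv_text <- "rider,s1,s2,s3
Rohan Dennis,1,88,122
Example Rider,50,3,10
Another Rider,120,140,7"

both <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

rider_names <- both$rider                                  # character vector of names
ranks <- as.matrix(both[, setdiff(names(both), "rider")])  # numeric matrix of placings

print(rider_names)
print(dim(ranks))
```

With a real file, `read.csv(file.choose(), header = TRUE)` would take the place of the `text =` argument.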

I defined myfun to "adjust" the rankings. This is based on the idea that riders don't really care whether they finish 120th versus 140th, but care a lot whether they finish 3rd versus 23rd. I thought applying a "concave down" function to the rankings should make comparisons more "realistic" (other functions that group the larger rankings closer together, such as 1/x, should give similar results). However, I was surprised to find that this did not seem to make a great difference. Hmmm.
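
To see what the adjustment does to individual gaps, here is a small sketch using the same x^0.3 function. The two pairs of placings are just the ones from the paragraph above; the variable names are mine.

```r
myfun <- function(x) x^0.3

# Gap between 120th and 140th place, before and after the adjustment
raw_back <- 140 - 120
adj_back <- myfun(140) - myfun(120)

# Gap between 3rd and 23rd place, before and after the adjustment
raw_front <- 23 - 3
adj_front <- myfun(23) - myfun(3)

print(c(raw_back = raw_back, raw_front = raw_front))
print(c(adj_back = adj_back, adj_front = adj_front))
```

Both raw gaps are 20 places, but after the adjustment the gap near the front comes out several times larger than the gap near the back.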

The cmdscale command does the real work. The first thing is to calculate a matrix of distances between each pair of riders (using the dist function). There are lots of options for calculating the distance; here, I used Minkowski distance with p=2 (i.e., plain-old-euclidean distance). This distance array gets handed to cmdscale, which finds a set of points in the specified dimensional space (k=2) that best captures this distance information. With eig=FALSE, you just get a set of points; setting it to TRUE returns a list that also includes the eigenvalues and the goodness of fit (GOF).
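
Here is a hedged sketch of those two steps with eig=TRUE, on a small made-up matrix of placings (6 riders by 4 stages; only the first row borrows real numbers, from the start of Rohan Dennis's vector). It is meant to show the shape of what comes back, not to reproduce the plot.

```r
# Made-up placings: 6 riders (rows) by 4 stages (columns)
ranks <- matrix(c(  1,  88, 122,  88,
                    2,   5,   1,   3,
                  150, 160, 140, 155,
                    3,   2,   4,   1,
                  100,  90, 110,  95,
                   60,  10, 150,  20),
                nrow = 6, byrow = TRUE)

d   <- dist(ranks, method = "euclidean")  # same distances as minkowski with p = 2
fit <- cmdscale(d, k = 2, eig = TRUE)     # a list rather than a bare matrix

print(dim(fit$points))  # one row per rider, one column per dimension
print(fit$GOF)          # two goodness-of-fit values
print(fit$eig)          # eigenvalues behind the fit
```

The coordinates that eig=FALSE would return directly are in `fit$points`.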

Then this set of points is given to textplot (part of the wordcloud package), which plots the points using the names of the riders. textplot makes sure that the names do not overlap, which is useful here, since the upper left is so crowded. There is a bit of regular expression in there (the gsub call) to strip each rider's first name and leave just the last name, which also helps with the crowding.
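
The name trimming can be tried on its own. This sketch applies the same gsub pattern to three rider names written out in full; it assumes names are stored as "First Last", as they appear to be in my csv.

```r
riders <- c("Rohan Dennis", "Peter Sagan", "Edvald Boasson Hagen")

# Remove the first run of non-whitespace characters plus the single
# whitespace character after it, i.e. drop the first name
labels <- gsub("(^[^\\s]+\\s{1})", "", riders, perl = TRUE)
print(labels)
# "Dennis" "Sagan" "Boasson Hagen"
```

Only the first word is removed, so two-word surnames such as Boasson Hagen stay intact.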

figure drawing in pencil

Sat, 2015-07-25 14:21

Working on my figure drawing precision. Pencil (2B) for a change from charcoal.

short pose figure drawing

Laurell anagram

Mon, 2015-06-15 16:09

distance function video 33

Fri, 2015-06-12 22:21

Another distance function video. Made with processing.org.

Click to embiggen and get more information on Vimeo.

distance function video 33 from Matthew Conroy on Vimeo.

gravity 34-1

Fri, 2015-06-05 22:23

Another gravity image. Click to embiggen.

gravity 34-1

gravity 32-1

Tue, 2015-05-05 18:08

Another gravity image. Click to embiggen.

gravity 32-1

moving circles animation

Thu, 2015-03-12 12:23

curvature animation

Sun, 2015-03-08 21:18

The "circles of curvature" of the curve with polar equation r = sin θ - 0.5 cos 3 θ.

RPM 2015

Tue, 2015-03-03 12:36

February is over, and I made six tracks, a whisker under 31 minutes, of sound for RPM 2015.

You can listen to them on Soundcloud.

You can also download the whole thing in one zip file, so you can add it to your music library and listen over and over again.

Here are descriptions of the six tracks.

1. Chimes

There is a series of 18 tones here, each a small-denominator rational multiple of the last, that plays in a cycle, with the tones given differing lengths so as to cause a variety of harmonies.

This was made with a Perl-script-generated Csound score. I used aleatoric vibrato, which I think works pretty well, and was certainly easy to implement.

2. Eleanor

A piece of Perl code modified a recording of one of Eleanor Roosevelt's speeches. A stretching kind of thing.

3. Simple

A bunch of oscillators interacting, plus a bunch of filters interacting with the output of the oscillators to make a nice swimming kind of feeling with occasional sharper concentrations of frequencies. Just Csound.

4. HDHDH

A large bank of rather simple oscillators start at arbitrary pitches, converge to some chord at about mid-track, then converge on a single tone at the end. Lots of beat frequencies. Csound.

5. Modular

Noise. I have a game I play where I try to write the simplest piece of code that generates (what I think is) interesting sound. This one was made with a few lines of Perl code, all modular arithmetic and comparisons, no trig or random number generators. I like a lot of the timbres.

6. Choose

Choose between random nouns. Variable delays, ring modulation, plus filters. I find it amusing that manipulating the synthetic voice ultimately makes it sound more natural.

more figure drawing

Tue, 2015-02-03 00:15

Back on the drawing horse.

Charcoal, short pose. Click to embiggen.

short pose figure drawing