An old-school image created with modern tools. Click to embiggen on Flickr.

# Matthew Conroy's blog

## curves from lines

Thu, 2015-10-15 10:31

## book sale

Sun, 2015-09-13 16:55

The Friends of the Seattle Public Library had another giant book sale. I got these books:

- Harmonograph: A Visual Guide to the Mathematics of Music, by Anthony Ashton
- Elements of Applied Stochastic Processes, by U. Narayan Bhat
- Mathematical Modeling Techniques, by Rutherford Aris
- No More Secondhand Art, by Peter London
- Existentialism is a Humanism, by Jean-Paul Sartre
- Futurism, by Caroline Tisdall and Angelo Bozzolla
- Artists on Art, by Robert Goldwater and Marco Treves
- Book of Hours, by George A. Walker
- The Fractal Geometry of Nature, by Benoit B. Mandelbrot
- Henry Moore: My Ideas, Inspiration and Life as an Artist, by Henry Moore and John Hedgecoe
- From El Greco to Pollock: Early and Late Works by European and American Artists, by Gertrude Rosenthal

Good stuff.

## finishing Dickens

Tue, 2015-09-08 20:42

I just finished reading Nicholas Nickleby, and so now I've read all of the novels of Charles Dickens.

NN was pretty good. A reasonably fast-moving story, some decent characters. However, it lacked the character eccentricity of some other novels, and the overarching story is not really all that interesting. And, it does use the "magic-inheritance-solves-all-problems" device so often used by Dickens.

## long pose figure drawing

Sat, 2015-08-29 14:32

## long pose figure drawing

Tue, 2015-08-18 12:52

## Tour de France and multidimensional scaling

Thu, 2015-07-30 19:09

I've had the following idea for a long time and finally took the time to try it out.

In the Tour de France, every rider receives a rank (i.e., placing) on each stage. In the 2015 tour, there were 20 stages (excluding the team time trial, which works differently). Each rider who finishes the tour (i.e., finishes all stages) then has a 20-dimensional vector of placings. For example, Rohan Dennis had the vector

<1,88,122,88,107,156,183,142,139,101,77,106,101,108,106,98,85,65,80,151>.

We can thus embed the set of finishing riders into a 20-dimensional space. What is this embedded set "like"?

Twenty dimensions is too many to hope for an easy way to visualize the set.

Fortunately, there is an amusing technique called multidimensional scaling (mds). The idea of mds is to take a high dimensional data set and model it in a space of lower dimension. The key idea is to maintain, as much as possible, some measure of proximity: if two points are "close" in the original space, we want them to be "close" in the lower dimensional space. How we measure close is up to us, of course.
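To make the idea a bit more concrete, here is a minimal sketch of classical MDS done by hand in R, on a tiny made-up data set (the points are random and have nothing to do with the Tour). It assumes Euclidean distance as the measure of "close": square the distances, double-center them, and use the top eigenvectors as coordinates. As far as I know this is the same recipe R's built-in cmdscale follows, so the two should agree up to the sign of each axis.

```r
# Classical MDS by hand on made-up data (not the rider data).
set.seed(1)
x <- matrix(rnorm(6 * 4), nrow = 6)   # 6 hypothetical points in 4 dimensions
d <- as.matrix(dist(x))               # pairwise Euclidean distances

n <- nrow(d)
J <- diag(n) - matrix(1 / n, n, n)    # centering matrix
B <- -0.5 * J %*% (d^2) %*% J         # double-centered squared distances

e <- eigen(B, symmetric = TRUE)       # eigenvalues come back in decreasing order
k <- 2
coords <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

# Compare with the built-in; axes may be flipped, so compare absolute values.
ref <- cmdscale(dist(x), k = 2)
max_diff <- max(abs(abs(coords) - abs(ref)))
```

Here max_diff should be essentially zero, and coords is a 6-by-2 matrix that can be plotted directly.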

I applied this technique to the rider data and made a 2-d model for easy viewing. The goodness of fit (GOF) is not great, about 0.548, but the visualization is nice (click on it to go to Flickr for a somewhat larger version, or see this pdf):

Some things we can see in this plot:

- The top GC riders and climbers all cluster in the upper right, with Froome (the winner) in the most extreme position.
- In the bottom left we find all the sprinters (in particular the cluster of Greipel, Kristoff, Cavendish, Degenkolb, Coquard, and Boasson Hagen).
- I don't know what to say about the cluster of nine riders above the main sprinter cluster.
- Most noticeably, we see Peter Sagan on his own at the bottom, right of center. Sagan achieved high placings on many stages that pure sprinters could not, and fared very well on the sprint stages, too. On the other hand, he didn't get high places on some mountain stages.

I pulled the stage data off the web and processed the files into a csv file with names and placings with a bit of Perl. I then processed the csv file with R like this:

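```r
library(wordcloud)  # provides textplot
# Two files: ranks only (no header), and names plus ranks (with header).
ranks <- read.csv(file.choose(new=FALSE),head=FALSE,sep=",")
ranksplusnames <- read.csv(file.choose(new=FALSE),head=TRUE,sep=",")
# Transform that compresses differences between large placings.
myfun <- function(x) {return(x^0.3)}
# Euclidean distances between transformed rank vectors, then MDS into 2 dimensions.
ll <- cmdscale(dist((apply(ranks,MARGIN=c(1,2),myfun)),method="minkowski",p=2),k=2,eig=FALSE)
# Label each point with the rider's name minus its first word.
textplot(ll[,1],ll[,2],gsub("(^[^\\s]+\\s{1})","",ranksplusnames$rider,perl=TRUE),xlim=c(-4,7),cex=0.6)
```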

I actually used two different csv files, one with nothing but ranks, and the other with names and headers. I'm sure you could get by with one.
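Here is a rough sketch of how the one-file version might look. The layout is an assumption on my part: a header row, a name column called rider, and one column per stage (only three stages here to keep it short). The first row uses the start of Rohan Dennis's vector from above; the other two rows are made up.

```r
# Hypothetical single-file layout: "rider" column, then one column per stage.
csv_text <- "rider,s1,s2,s3
Rohan Dennis,1,88,122
Made Up Rider,5,3,150
Another Fake Rider,90,100,2"

riders <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Drop the name column to get the numeric matrix that dist() needs.
ranks <- as.matrix(riders[, -1])
rownames(ranks) <- riders$rider
```

With a real file you would swap the text argument for the file name; ranks then plays the role of the ranks-only file, and riders$rider supplies the labels.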

I defined myfun to "adjust" the rankings. This is based on the idea that nobody really cares whether they finish 120th versus 140th, but they care a lot whether they finish 3rd versus 23rd. I thought applying a "concave down" function to the rankings should make comparisons more "realistic" (other functions would give similar results, including, say, 1/x, as it groups larger rankings closer together). However, I was surprised to find that this did not seem to make a great difference. Hmmm.
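A quick check of what the x^0.3 transform does to those two 20-place gaps, using the same myfun as above (the variable names gap_back and gap_front are just for this illustration):

```r
myfun <- function(x) {return(x^0.3)}

# A 20-place gap deep in the field versus a 20-place gap near the front.
gap_back  <- myfun(140) - myfun(120)   # about 0.2
gap_front <- myfun(23)  - myfun(3)     # about 1.2

ratio <- gap_front / gap_back          # front gap counts several times more
```

So after the transform, the gap near the front is worth roughly six times the gap at the back, whereas on raw placings the two gaps are identical.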

The cmdscale command does the real work. The first thing is to calculate a matrix of distances between each pair of riders (using the dist function). There are lots of options for calculating the distance; here, I used Minkowski distance with p=2 (i.e., plain-old Euclidean distance). This distance matrix gets handed to cmdscale, which finds a set of points in a space of the specified dimension (k=2) that best captures this distance information. With eig=FALSE, you just get a set of points; setting this to TRUE gives more information.
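For the curious, here is a small sketch of what that extra information looks like, run on made-up placings (random numbers, not the actual stage results). It also checks that Minkowski distance with p=2 matches dist's default Euclidean method. Note that x^0.3 is applied to the whole matrix at once, which should be equivalent to the apply call.

```r
set.seed(42)
# Made-up data: 10 hypothetical riders, 20 stages, placings between 1 and 160.
fake_ranks <- matrix(sample(1:160, 10 * 20, replace = TRUE), nrow = 10)
adjusted <- fake_ranks^0.3

d_mink <- dist(adjusted, method = "minkowski", p = 2)
d_eucl <- dist(adjusted)               # default method is "euclidean"
same <- isTRUE(all.equal(as.vector(d_mink), as.vector(d_eucl)))

# With eig = TRUE, cmdscale returns a list rather than a bare matrix.
fit <- cmdscale(d_mink, k = 2, eig = TRUE)
names(fit)     # includes "points", "eig" and "GOF"
fit$GOF        # two goodness-of-fit values, each between 0 and 1
```

The coordinates to plot are then in fit$points rather than in the return value itself.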

Then this set of points is given to textplot (part of the wordcloud package), which plots the points using the names of the riders. textplot makes sure that the names do not overlap, which is useful here, since the upper left is so crowded. There is a bit of regex in there (the gsub call) to extract just the last name of each rider, which also helps with the crowding.
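To see what the regex does, here it is applied to a few names, assuming they are stored as "First Last" (that format is my assumption for this illustration). It strips everything up to and including the first space, so a two-word surname survives intact.

```r
riders <- c("Rohan Dennis", "Peter Sagan", "Edvald Boasson Hagen")

# Remove the first run of non-space characters plus the space that follows it.
last_names <- gsub("(^[^\\s]+\\s{1})", "", riders, perl = TRUE)
last_names
# "Dennis" "Sagan" "Boasson Hagen"
```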