My favorite. This cop is in shock. Or is it awe?

I just discovered that the version of mpg123 on the latest Redhat, which is what I use to decode mpegs into .wav, is actually not mpg123 but a link to mpg321, which silently ignores the options I give it telling it to downsample by 2 and convert to mono. So all the pfiles (and hence the .htk files) are twice as big as they should be.
The annoying thing is that I'm not quite sure whether the anchor nets were trained on these files, or on files that were correctly downsampled. It looks OK: the nets are dated July 31 2002, which means they were trained at NEC. I looked there, and those pfiles (dated from that July as well) are OK, i.e. half the size of the ones I have now on blush.
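One way to catch this kind of silent failure is to read the header of each decoded .wav and confirm the channel count and sample rate. A minimal Python sketch, assuming the decoder writes standard PCM WAV files; the function names and the expected rate of 22050 Hz (44.1 kHz downsampled by 2) are assumptions for illustration:

```python
import wave

def wav_format(path):
    """Return (channels, sample_rate_hz) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate()

def is_downsampled_mono(path, expected_rate=22050):
    """True if the file is mono at the expected (assumed) sample rate."""
    channels, rate = wav_format(path)
    return channels == 1 and rate == expected_rate
```

Running is_downsampled_mono over a handful of decoded files before building pfiles would flag a decoder that ignores its options.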
I implemented a rank-agreement score to compare the ranking-by-audio-similarity to the ranking-by-AOTM-cooccurrence. Say we have N items, and two rankings of those items, where a ranking is just a permutation of the items. One of the rankings, say A, is the reference ranking, and the other, say B, is the ranking to be evaluated. For example, for each artist X, A is a list of the artists in decreasing order of conditional co-occurrence probability given artist X. B is the artists ordered by similarity to X under whatever metric we're testing. Note that B doesn't have to be a complete ordering; we could take the top 10 hits or something.
The rank-agreement score R is the weighted sum of ranks, R = w'*r, where r(i) is the rank that B assigns to the i-th item of A.
The weights are calculated according to the exponential:
w = exp(-(log(2)/halflife)*(0:(length(A)-1)));
I used halflife=20 for now.
The optimal value of this metric occurs when A and B agree completely, i.e. are equal, so that r=1:N. For 414 artists and halflife=20, opt_R=29.4. So for the evaluation, I find the rank-agreement score R conditioned on each of the 414 artists, take opt_R/R for each, and average. So perfection is 1.
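A minimal Python sketch of the score (Python here rather than the MATLAB of the weights line), assuming R is the weighted sum w'*r of the ranks that the candidate ranking gives to the reference items. Two things are assumptions on my part: the function names, and normalizing the weights to sum to 1, which is the choice that reproduces the 29.4 figure for 414 items and halflife=20. The sketch only handles complete orderings.

```python
import math

def halflife_weights(n, halflife=20.0):
    """Exponentially decaying weights, normalized to sum to 1 (assumption)."""
    raw = [math.exp(-(math.log(2) / halflife) * k) for k in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def rank_agreement(reference, candidate, halflife=20.0):
    """Weighted sum of the 1-based ranks that `candidate` gives to the
    items of `reference`, taken in reference order. Lower is better."""
    position = {item: i + 1 for i, item in enumerate(candidate)}
    weights = halflife_weights(len(reference), halflife)
    return sum(w * position[item] for w, item in zip(weights, reference))

def normalized_score(reference, candidate, halflife=20.0):
    """opt_R / R, so that perfect agreement scores 1."""
    optimal = rank_agreement(reference, reference, halflife)
    return optimal / rank_agreement(reference, candidate, halflife)

artists = list(range(414))               # stand-in item labels
opt_R = rank_agreement(artists, artists)  # about 29.4 with normalized weights
```

Because the weights decrease and the ideal ranks increase, any disagreement between the two orderings can only raise R, which is why opt_R/R tops out at 1.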
To do a statistical significance test (I hope I passed my midterm), we need to know what this score will do under a null hypothesis that the ranking to be evaluated (B) is a random permutation. So I created 10,000 random permutations r of 1:414 and computed w'*r for each. The histogram of that, with the mean in red, is shown here. It looks roughly normal (but a little right-tailed), so I use a basic one-sided significance test with the Normal distribution (I probably should use a t-test since both mean and variance are unknown, but close enough for now). The 95% significance point for this test is .1590, but the mean score under the ALA-based SIM metric that I'm using now was .143, way below 95% significance. In fact, the p-value is about .54, which is completely insignificant.
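The null-hypothesis simulation can be sketched in Python as follows. This is an illustration under stated assumptions, not the original code: the weights are normalized to sum to 1, the quantity collected is opt_R/R for each random permutation, and the trial count defaults to 2,000 rather than 10,000 just to keep it quick. Alongside the Normal-approximation test it also reports an empirical p-value (the fraction of random scores at least as large as the observed one), which does not rely on the histogram being normal.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def null_scores(n_items=414, n_trials=2000, halflife=20.0, seed=0):
    """opt_R / R for random permutations of 1..n_items."""
    rng = random.Random(seed)
    raw = [math.exp(-(math.log(2) / halflife) * k) for k in range(n_items)]
    total = sum(raw)
    weights = [w / total for w in raw]      # normalization is an assumption
    optimal = sum(w * (k + 1) for k, w in enumerate(weights))
    ranks = list(range(1, n_items + 1))
    scores = []
    for _ in range(n_trials):
        rng.shuffle(ranks)
        scores.append(optimal / sum(w * r for w, r in zip(weights, ranks)))
    return scores

def one_sided_test(observed, scores, level=0.95):
    """Normal-approximation threshold and p-value, plus an empirical p-value."""
    fitted = NormalDist(mean(scores), stdev(scores))
    threshold = fitted.inv_cdf(level)
    p_normal = 1.0 - fitted.cdf(observed)
    p_empirical = sum(s >= observed for s in scores) / len(scores)
    return threshold, p_normal, p_empirical

scores = null_scores()
threshold, p_normal, p_empirical = one_sided_test(0.143, scores)
```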
I also tried normalizing the conditional probability by the prior, to adjust for popularity effects. The test also failed, with p-value .52, no real improvement over the non-normalized case.
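For concreteness, here is one way the normalization could look in Python. It is a sketch under assumptions: each playlist is reduced to the set of artists it contains, P(b | a) is estimated from playlist counts, and "normalizing by the prior" is taken to mean dividing by P(b). The function names and the toy playlists are made up for illustration.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_stats(playlists):
    """Per-artist playlist counts and ordered-pair co-occurrence counts."""
    artist_counts = Counter()
    pair_counts = Counter()
    for playlist in playlists:
        artists = set(playlist)            # count each artist once per list
        artist_counts.update(artists)
        pair_counts.update(permutations(artists, 2))
    return artist_counts, pair_counts

def conditional_prob(b, a, artist_counts, pair_counts):
    """Empirical P(b in list | a in list)."""
    return pair_counts[(a, b)] / artist_counts[a]

def normalized_prob(b, a, artist_counts, pair_counts, n_lists):
    """P(b | a) / P(b): above 1 means b appears with a more often than
    b's overall popularity alone would predict."""
    prior = artist_counts[b] / n_lists
    return conditional_prob(b, a, artist_counts, pair_counts) / prior

# Toy data (invented): radiohead is on every list, so it always co-occurs.
playlists = [
    ["radiohead", "coldplay"],
    ["radiohead", "aguilera"],
    ["radiohead", "abdul", "aguilera"],
    ["coldplay", "radiohead"],
]
artist_counts, pair_counts = cooccurrence_stats(playlists)
```

In the toy data, radiohead has conditional probability 1 given aguilera but a normalized value of only 1, while abdul goes from 0.5 to 2, which is the popularity correction at work.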
Next:
On this day of protest in New York City, I'd like to point out an alternative for those who didn't march, whether out of support for the aims of the war, out of ambivalence, or because you just couldn't get here. The problems with our government can be traced back, in large part, to the influence of corporate money in politics. We won't have a truly representative democracy until candidates without multimillion dollar corporate-backed campaigns are put on even footing with those who do have them. The message below describes the problem, and proposed legislation that addresses it.
The PIRG (public interest research group) has a website that makes it easy to email your senators
in support of the bill. Go for it.
http://pirg.org/alerts/route.asp?id=22&id4=ES
Also the website for the "Free Air Time" campaign:
http://freeairtime.org/
Dear U.S. PIRG supporter,
In 2002, candidates, parties and issue groups spent approximately $1 billion on TV and radio ads. Broadcasters make billions of dollars in profits every year from publicly owned airwaves. In exchange for this free use of a public good, broadcasters are required to act as trustees of our airwaves and operate them in the public interest. Unfortunately, the amount of substantive campaign coverage has been dwindling each decade and is now pitifully low.
Senators McCain, Feingold and Durbin plan to introduce a bill that will require broadcasters to provide substantive coverage of candidates and campaigns during election season and will provide qualified candidates with earned airtime vouchers that can be used to "purchase" TV or radio time.
Please take a moment to urge your senators to cosponsor the McCain-Feingold-Durbin free airtime legislation. Follow the link below to go to a web page where you can e-mail your senators.
http://pirg.org/alerts/route.asp?id=22&id4=ES
BACKGROUND
More than 90% of the candidates who raised and spent the most money won their 2002 Congressional elections. Candidates are raising the vast majority of this money from a small pool of wealthy donors. Specifically, 83% of all itemized individual contributions to candidates, parties and PACs came from 1/9 of 1% of the population who contributed at least $1,000 in aggregate. This means that without personal wealth or access to a network of wealthy donors, grassroots candidates are locked out of contention for federal office.
Research shows that candidates are spending a large percentage of the money they raise on TV and radio ads. In 2002, candidates, parties and issue groups spent approximately $1 billion on ads. Broadcasters make billions of dollars in profits every year off of our publicly owned airwaves. In exchange for this free use of a public good, broadcasters are required to act as trustees of our airwaves and operate them in the public interest. Unfortunately, the amount of substantive campaign coverage has been dwindling each decade and is now pitifully low.
Senators McCain, Feingold and Durbin plan to introduce a bill that can help grassroots candidates be more competitive and require broadcasters to fulfill their obligations to the public. The legislation will require broadcasters to provide substantive coverage of candidates and campaigns during election season as a condition of their licenses. It will also provide qualified candidates with earned airtime vouchers that can be used to "purchase" TV or radio time. And it will close loopholes in a provision in existing law intended to prevent broadcasters from charging candidates exorbitant advertising rates.
Please take a moment to urge your senators to cosponsor the McCain-Feingold-Durbin free airtime legislation. Follow the link below to go to a web page where you can e-mail your senators.
http://pirg.org/alerts/route.asp?id=22&id4=ES
Sincerely,
Gene Karpinski
U.S. PIRG Executive Director
GeneK@uspirg.org
http://www.USPIRG.org
Yay. I implemented Vasconcelos' Asymptotic Likelihood Approximation to the KL-divergence between GMMs. Then I created a distance matrix for the 414 "tuna.artists" (really the artists in the playola DB after re-ripping), using the ALA on GMMs fit to the anchorspace points. One caveat is that I only trained the models with 20 EM iterations because I was impatient, so I'll have to do it again with longer training later.
Then I compared the distance metric to the AOTM data by plotting the conditional co-occurrence densities (conditioned on each artist), but sorted according to the audio-based ALA distance.
Some example results:
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-aguilera.jpg
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-coldplay.jpg
http://blush.ee.columbia.edu/~madadam/tmp/cd-sorted-ala-abdul.jpg
In the plots, the list of artists in the upper right corner is the top 5 artists sorted by ALA-distance. The top 5 sorted by co-occurrence probability are also labeled.
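The data behind each plot can be sketched in Python like this, assuming that for one conditioning artist we have a dict of distances and a dict of conditional co-occurrence probabilities keyed by artist name. The function names and that data layout are assumptions for illustration.

```python
def sort_by_distance(distances, cond_probs):
    """Artists nearest-first, paired with their conditional probabilities
    (0.0 for artists that never co-occur)."""
    order = sorted(distances, key=distances.get)
    return [(artist, cond_probs.get(artist, 0.0)) for artist in order]

def top_k(scores, k=5, largest=False):
    """The k artists with the smallest (or largest) score."""
    return sorted(scores, key=scores.get, reverse=largest)[:k]
```

Here top_k(distances) gives the nearest artists by distance, and top_k(cond_probs, largest=True) gives the most probable co-occurring artists.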
I think I need to normalize these to get rid of popularity effects. If we really want to compare the distance-based ranking with the co-occurrence-based ranking, which is what this plot essentially does, then I should normalize the probabilities by popularity. Otherwise, e.g., radiohead often has a high probability, which doesn't necessarily mean that radiohead is similar to the conditioned artist.
next:
- musicseer eval on ALA
- try to quantify this AOTM eval. Not sure about the fitting-the-exponential idea I had before; there's really not much reason to believe it should behave exponentially. What I really want to do, I guess, is a ranking comparison between the distance-based ranking and the conditional density ranking, but normalized as I mentioned.
- retrain the GMMs with longer EM iterations
I matched the regularized Art of the Mix lists to the 414-artist playola DB
to see what kind of overlap we have. The results:
16% of the songs are by playola artists
7% of the songs are in our DB
35% of the lists have two or more songs in our DB.
346/417 "new playola" artists are represented.
This is good news! I think with these numbers, we have enough data to
explore the relationship between the audio-based sim metric and the AOTM
lists. Here's what I'm planning on doing:
Let's assume that songs that co-occur in a playlist are similar, i.e., the
probability of co-occurrence is some function of similarity. So I'd like to
see a plot of similarity vs. (empirical) conditional probability. I'm
hoping it looks like an exponential density - probability of seeing
something very similar is high, and it quickly falls off as dissimilarity
(distance) increases. The question is, how to use this as a quantitative
measure of how good the similarity metric is? perhaps we fit an exponential
to the plot, and look at the rate of decay - a faster rate means that
cooccurrence probability falls off faster with similarity, so the similarity
metric is better.
anyway, it's something to try.
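A minimal Python sketch of the fitting idea, assuming the model p = a * exp(-rate * d) and fitting it by ordinary least squares on log(p). The function name is made up, and the log-linear fit is just one convenient choice: it skips zero probabilities and gives small probabilities a lot of influence.

```python
import math

def fit_exponential_decay(distances, probabilities):
    """Least-squares fit of log(p) = log(a) - rate * d.
    Returns (a, rate). Pairs with p <= 0 are skipped (log undefined)."""
    pairs = [(d, math.log(p)) for d, p in zip(distances, probabilities) if p > 0]
    n = len(pairs)
    mean_d = sum(d for d, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pairs)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    return math.exp(intercept), -slope

# Synthetic check: data generated with rate 0.5 should give back about 0.5.
ds = [0.0, 1.0, 2.0, 3.0, 4.0]
ps = [0.3 * math.exp(-0.5 * d) for d in ds]
a, rate = fit_exponential_decay(ds, ps)
```

Under this reading, two similarity metrics would be compared by their fitted rate, with a larger rate meaning co-occurrence falls off faster with distance.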
I was just going through Steve's code that does the musicseer evaluation and I discovered something bad: the results I had that showed that randomly-trained anchor models do as badly as a random sim metric are wrong. It turns out that Steve's code looks for the word "rand" in the name of a SIM file, and creates its own random SIM metric if it finds it. So my file SIM_ankrand12 was hitting this case and the actual contents of the file were being overridden with random numbers. So obviously this metric did as badly as random: it was random.
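The pitfall is easy to reproduce. The Python below is a hypothetical stand-in, not the actual evaluation code: the first function mimics a substring check on the file name, and the second shows a stricter alternative that only matches an exact reserved name (SIM_rand is an invented name for illustration).

```python
def uses_builtin_random(sim_name):
    """Substring check: any name containing "rand" triggers the
    built-in random metric."""
    return "rand" in sim_name

def uses_builtin_random_strict(sim_name, reserved=("SIM_rand",)):
    """Stricter variant: only exact, reserved names trigger it."""
    return sim_name in reserved
```

With the substring check, SIM_ankrand12 is treated as the random baseline; with the exact match it is read as an ordinary SIM file.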
When I fix it, the real results aren't so good. The random anchors do almost as well as the "true" genre-based anchors, in some cases:
Mode                                        ank14v1Centroid  ankrand12Centroid  erdos   rand
Survey, all (6102 resp, 8.97 av.choices)    3.9736           4.3296             3.8270  5.4193
Survey, known (4739 resp, 3.59 av.choices)  4.4577           4.7374             4.0704  5.4425
Game, all (7124 resp, 11.10 av.choices)     4.4532           4.6012             4.4940  5.4964
Game, known (6244 resp, 4.72 av.choices)    4.8654           4.9094             4.8661  5.4522
What does this mean? Where does the improvement over randomness come from, when you train anchor models on random training labels? Some thoughts:
- Perhaps the neural nets are learning something useful, even though they were given random training labels. I trained the random anchors by selecting several random artists and giving each net several songs by that artist. Perhaps there is a bias in this process, and some anchors actually learn characteristics of a "dominant" artist in their training set.
- It may be an effect of the centroid. These results use the highly sophisticated "centroid" method of comparing distributions in anchor space, if you recall. Even on MFCC features that would probably do better than random. That should really be the experiment: modeling the distribution in MFCC space the same way as I model the anchor space distribution and comparing those. Which, actually, is basically what we decided to do anyway, i.e. comparing to Beth's method.
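For reference, a centroid comparison can be sketched in a few lines of Python. This assumes each artist is represented by a list of equal-length feature vectors and that centroids are compared with Euclidean distance; the choice of distance and the function names are assumptions. The same code applies unchanged to anchor-space points or MFCC frames, which is what makes the comparison suggested above straightforward.

```python
import math

def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def centroid_distance(points_a, points_b):
    """Euclidean distance between the two centroids (metric is assumed)."""
    return math.dist(centroid(points_a), centroid(points_b))
```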
Back when this all started (for me), that is of course 9/11, I carried a sign into Union Square that asked "What is the root cause of terrorism?". I thought it was important to ask a question rather than shout a polemic, because the attack had shown me how we were, as a nation, both blind to world politics yet inextricably immersed in it. It seemed clear that if twenty men were eager to give their lives to make a statement about our nation's wrongdoing, then no amount of military power would protect us. The only road to safety was understanding.
Apparently this did not seem clear to our leaders. They chose to shout instead of listen, and then to fight instead of heal. The story is a familiar one, and by now the chorus of dissent is so loud that I don't need to retell it. Instead, I want to ask that first question again: What is the root cause of terrorism? I believe that suffering is the root cause of terrorism, and only that which reduces suffering can make us safer in the long run. If that is so, can military power alone make us safer? In a passive and ambiguous sort of way, I live on the front lines of the war on terror. The rifles in the subway remind me. I checked the wind speed and direction this morning, to know which way I would need to run to avoid a cloud of radiological dust.
Tonight I am thinking about power and its forms. There is military power: essentially the strength to execute one's will despite the contradictory will of one's enemy. There is also the power of ideas, which is the power to shape the will of your enemy rather than force it aside. The pen is mightier than the sword, some say; others say actions speak louder than words. I believe that we have come to the point in history where military power alone, while perhaps irrefutably effective in the short term, cannot secure the peace in the long term.
The administration has chosen to use military power to secure the peace. Essentially this is a choice to bowl over the will of the enemy rather than change their minds. This antique approach won't work today, and it will become even less effective tomorrow. As technology marches on, weapons get smaller, more destructive, easier to obtain, and harder to control. Fears of a nuclear, biological, chemical, and someday a nanotechnological attack will not be assuaged by destroying nation states or forcing "regime change". Instead, we've got to go deeper behind enemy lines: deep into the hearts and minds of the populations from which our enemies are springing. We've got to change their minds about us. Not by empty propaganda and promises, either; this can be done only by allowing them to participate in world politics on even footing.
Some object that by heeding complaints and making concessions, we are giving in to the terrorists. Yes, the terrorists will have won, in some sense, if we change our policies in response to their violence. But some terrorists must be viewed as extreme expressions of common sentiments, not as evil aberrations that spring from nowhere and will be blasted back there by a Tomahawk or a MOAB. The truth is, in a world where technology and economics increasingly bind together the fates of all peoples, everyone holds everyone hostage. But as the cold war has shown, it is possible to find calm in a balanced arrangement of mutual threat. Terrorism has sprung to life because the balance of power has been tipped; terrorists hold us at gunpoint because they have no other power to hold us with.
But the right way for us to hold each other hostage is not at gunpoint but with consensual government. A global democracy, as utopian as it sounds, is necessary for future peace. But as the whole world knows now, this administration is not interested in levelling the playing field, but believes that our own mightiness exempts us from having to care about what people think.