I implemented a rank-agreement score to compare the ranking-by-audio-similarity to the ranking-by-AOTM-cooccurrence. Say we have N items and two rankings of them, where a ranking is just a permutation of the items. One of the rankings, A, is the reference ranking, and the other, B, is the ranking to be evaluated. For example, for each artist X, A is the list of artists in decreasing order of conditional co-occurrence probability given artist X, and B is the list of artists ordered by similarity to X under whatever metric we're testing. Note that B doesn't have to be a complete ordering; we could take just the top 10 hits or so.
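For concreteness, here is a minimal Python sketch of how A and B might be built for one artist X. The helper `rank_by_score` and the dictionaries `cooccur_given_x` and `similarity_to_x` are hypothetical, and the numbers are made up purely for illustration.

```python
def rank_by_score(scores, top_k=None):
    """Return the keys of `scores` in decreasing order of score.

    If top_k is given, keep only the first top_k items (a partial ranking).
    """
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical P(Y | X) from playlist co-occurrence, for one query artist X.
cooccur_given_x = {"Y1": 0.30, "Y2": 0.05, "Y3": 0.15}
# Hypothetical similarity of each artist to X under the metric being tested.
similarity_to_x = {"Y1": 0.2, "Y2": 0.9, "Y3": 0.4}

A = rank_by_score(cooccur_given_x)           # reference ranking
B = rank_by_score(similarity_to_x, top_k=2)  # evaluated ranking, top 2 only
print(A)  # ['Y1', 'Y3', 'Y2']
print(B)  # ['Y2', 'Y3']
```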
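The rank-agreement score R is the weighted mean rank

$$R = \frac{\sum_{i=1}^{N} w_i r_i}{\sum_{i=1}^{N} w_i},$$

where r_i is the position in B of the i-th item of A.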
The weights decay exponentially with position in A:
w = exp(-(log(2)/halflife)*(0:(length(A)-1)));
I used halflife=20 for now, so the weight halves every 20 positions down the list.
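A Python sketch of the weights and the score. It assumes R is the weight-normalized mean rank, sum(w*r)/sum(w), with r[i] the 1-based position in B of A[i]; that normalization is an assumption, chosen because it reproduces the optimum of 29.4 quoted below. It also assumes B is a full permutation of A. `exp_weights` and `rank_agreement` are hypothetical names.

```python
import math


def exp_weights(n, halflife=20.0):
    """w[k] = exp(-(log(2)/halflife) * k) for k = 0..n-1, mirroring the w = exp(...) line.

    The weight halves every `halflife` positions.
    """
    decay = math.log(2) / halflife
    return [math.exp(-decay * k) for k in range(n)]


def rank_agreement(A, B, halflife=20.0):
    """Assumed form of R: weight-normalized mean rank, sum(w*r) / sum(w).

    r[i] is the 1-based position in B of A[i]. Assumes B is a full
    permutation of A. Lower R means better agreement.
    """
    position_in_B = {item: pos for pos, item in enumerate(B, start=1)}
    r = [position_in_B[item] for item in A]
    w = exp_weights(len(A), halflife)
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(w)


A = [f"artist{i}" for i in range(414)]
print(round(rank_agreement(A, A), 1))  # 29.4: B identical to A
```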
The metric is optimal (smallest) when A and B agree completely, i.e. are identical, so that r=1:N. For 414 artists and halflife=20, optimal_R=29.4. For the evaluation, I compute R for each of the 414 artists in turn (with A and B both conditioned on that artist), take opt_R/R for each, and average the ratios. A perfect score is therefore 1.
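A sketch of the optimum and the per-artist averaging, under the same assumed normalized form R = sum(w*r)/sum(w). With r = 1:N the optimum tends to 1/(1 - 2^(-1/halflife)) as N grows, about 29.36 for halflife = 20, which agrees with the 29.4 above. `optimal_R`, `mean_score`, and the list of per-artist R values are hypothetical names.

```python
import math


def optimal_R(n, halflife=20.0):
    """R when B equals A, so r = 1..n (assumes R = sum(w*r) / sum(w))."""
    w = [math.exp(-math.log(2) / halflife * k) for k in range(n)]
    return sum(wi * (k + 1) for k, wi in enumerate(w)) / sum(w)


def mean_score(R_per_artist, opt_R):
    """Average of opt_R / R over artists; 1.0 means perfect agreement for every artist."""
    return sum(opt_R / R for R in R_per_artist) / len(R_per_artist)


opt = optimal_R(414)
print(round(opt, 1))                       # 29.4
print(round(1 / (1 - 2 ** (-1 / 20)), 2))  # 29.36, the large-N limit
```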
To do a statistical significance test (I hope I passed my midterm), we need to know how this score behaves under the null hypothesis that the evaluated ranking B is a random permutation. So I generated 10,000 random permutations r of 1:414 and computed w'*r for each. The histogram of these values, with the mean in red, is shown here. It looks roughly normal (though slightly right-skewed), so I use a basic one-sided significance test with the Normal distribution. (Strictly I should probably use a t-test, since both the mean and the variance are estimated, but with 10,000 samples the difference is negligible, so this is close enough for now.) The 95% significance point for this test is .1590, but the mean score under the ALA-based SIM metric that I'm using now was .143, well short of 95% significance. In fact, the p-value is about .54, which is completely insignificant.
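A Python sketch of the null simulation and the one-sided Normal test. It assumes the same normalized R as above and that higher opt_R/R scores mean better agreement; the seed, trial count, and function names are arbitrary, hypothetical choices. `statistics.NormalDist` (Python 3.8+) supplies the fitted Normal's `cdf` and `inv_cdf`.

```python
import math
import random
import statistics


def null_scores(n=414, halflife=20.0, trials=10_000, seed=0):
    """Scores opt_R / R for random permutations r of 1..n (assumed normalized R)."""
    rng = random.Random(seed)
    w = [math.exp(-math.log(2) / halflife * k) for k in range(n)]
    w_sum = sum(w)
    ranks = list(range(1, n + 1))
    opt_R = sum(wi * ri for wi, ri in zip(w, ranks)) / w_sum  # r = 1..n
    scores = []
    for _ in range(trials):
        rng.shuffle(ranks)  # a random permutation r of 1..n
        R = sum(wi * ri for wi, ri in zip(w, ranks)) / w_sum
        scores.append(opt_R / R)
    return scores


def one_sided_normal_test(observed, scores, alpha=0.05):
    """Fit a Normal to the null scores; return (significance point, p-value)."""
    null = statistics.NormalDist(statistics.fmean(scores), statistics.stdev(scores))
    threshold = null.inv_cdf(1 - alpha)  # the 95% point when alpha = 0.05
    p_value = 1 - null.cdf(observed)     # P(null score >= observed)
    return threshold, p_value
```

For example, `one_sided_normal_test(0.143, null_scores())` returns the significance point and p-value for an observed mean score of .143; the exact figures depend on the random draws.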
I also tried normalizing the conditional co-occurrence probability by the prior (i.e. ranking by P(Y|X)/P(Y)) to adjust for popularity effects. This test also failed, with a p-value of .52, essentially no different from the non-normalized case.
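A minimal sketch of the two reference rankings, assuming playlists are given as sets of artist names and probabilities are estimated as playlist fractions; the input format and the name `cooccurrence_ranking` are hypothetical. Dividing P(Y|X) by P(Y) pushes down artists that show up everywhere simply because they are popular.

```python
from collections import Counter


def cooccurrence_ranking(playlists, x, normalize_by_prior=False):
    """Rank other artists by P(Y | X), or by P(Y | X) / P(Y) if normalize_by_prior.

    playlists: list of sets of artist names (hypothetical input format).
    Artists that never co-occur with x are left out (they would all score 0).
    """
    n = len(playlists)
    count = Counter(a for pl in playlists for a in pl)        # playlists containing each artist
    with_x = [pl for pl in playlists if x in pl]
    joint = Counter(a for pl in with_x for a in pl if a != x)  # co-occurrences with x
    scores = {}
    for y, c in joint.items():
        p_y_given_x = c / len(with_x)
        scores[y] = p_y_given_x / (count[y] / n) if normalize_by_prior else p_y_given_x
    return sorted(scores, key=lambda y: scores[y], reverse=True)


playlists = [{"X", "Pop"}, {"X", "Pop"}, {"X", "Niche"}, {"Pop"}, {"Pop"}, {"Pop"}]
print(cooccurrence_ranking(playlists, "X"))                           # ['Pop', 'Niche']
print(cooccurrence_ranking(playlists, "X", normalize_by_prior=True))  # ['Niche', 'Pop']
```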
Next: