March 20, 2003

AOTM & audio

I matched the regularized Art of the Mix (AOTM) lists to the 414-artist playola DB
to see what kind of overlap we have. The results:

16% of the songs are by playola artists
7% of the songs are in our DB
35% of the lists have two or more songs in our DB.
346/417 "new playola" artists are represented.

this is good news! i think with these numbers, we have enough data to
explore the relationship between the audio-based sim metric and the AOTM
lists. Here's what I'm planning on doing:

Let's assume that songs that co-occur in a playlist are similar, i.e., the
probability of co-occurrence is some function of similarity. So I'd like to
see a plot of similarity vs. (empirical) conditional probability. I'm
hoping it looks like an exponential density - probability of seeing
something very similar is high, and it quickly falls off as dissimilarity
(distance) increases. The question is how to use this as a quantitative
measure of how good the similarity metric is. Perhaps we fit an exponential
to the plot and look at the rate of decay: a faster rate means that
co-occurrence probability falls off faster with distance, so the similarity
metric is better.
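
Here is a rough sketch of that plan in Python, just to pin down what would be computed. Everything named here is an assumption for illustration: `playlists` is a list of song-ID lists, `distance` is a stand-in for whatever audio-based metric is being tested, and the function names and binning are made up, not existing code. It bins song pairs by distance, takes the fraction of pairs in each bin that share at least one playlist, and fits a line to log-probability vs. distance to get a decay rate.

```python
import math
from itertools import combinations

def cooccurrence_by_distance(playlists, distance, n_bins=5):
    """Return (bin_centers, probs): for each distance bin, the fraction of
    song pairs in that bin that co-occur in at least one playlist."""
    songs = sorted({s for pl in playlists for s in pl})
    # Every unordered pair (a < b) that shares at least one playlist.
    together = set()
    for pl in playlists:
        for a, b in combinations(sorted(set(pl)), 2):
            together.add((a, b))
    # Distance and co-occurrence flag for every pair of known songs.
    pairs = [(distance(a, b), (a, b) in together)
             for a, b in combinations(songs, 2)]
    d_max = max(d for d, _ in pairs)
    width = d_max / n_bins if d_max > 0 else 1.0
    counts = [0] * n_bins
    hits = [0] * n_bins
    for d, hit in pairs:
        i = min(int(d / width), n_bins - 1)  # clamp d == d_max into last bin
        counts[i] += 1
        hits[i] += hit
    centers, probs = [], []
    for i in range(n_bins):
        if counts[i]:  # skip empty bins
            centers.append((i + 0.5) * width)
            probs.append(hits[i] / counts[i])
    return centers, probs

def fit_decay_rate(centers, probs):
    """Least-squares line through (d, log p). Returns lam in the model
    p ~ A * exp(-lam * d). Bins with p == 0 are skipped, since log(0)
    is undefined."""
    pts = [(d, math.log(p)) for d, p in zip(centers, probs) if p > 0]
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    return -sxy / sxx
```

One caveat with this sketch: the decay rate depends on the units of the distance, so comparing rates between two metrics probably only makes sense after putting their distances on a common scale (ranks or percentiles, say). Fitting in log space also weights the sparse, low-probability bins heavily, so a direct nonlinear fit might behave better on real data.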

anyway, it's something to try.

Posted by madadam at March 20, 2003 06:47 PM