July 18, 2006

small variances

These plots were constructed to examine further what's happening with the smallest-variance mixture components. The first figure shows scatterplots of several statistics of the 3 smallest-variance (smallest-determinant) components from every song in uspop1000, as computed by the EMD-KL kernel on unnormed data. The determinant is plotted against hubness, prior, distance from the global mixture mean, and distance from the origin. The distribution appears to be multimodal, which is especially visible in the following 3-d plot of hubness, determinant, and distance from the origin.

I was also wondering whether simply enforcing a minimum variance floor would help things. I computed a PPK kernel over uspop37 with the floor in place, and there doesn't appear to be any difference: R-precision is still .394 and the max hubness is still 128.
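For reference, here is a minimal sketch of one way a variance floor can be enforced on a single component's covariance matrix. It is not necessarily what was run for this experiment: the function name `floor_covariance`, the floor value `min_var`, and the choice to clip eigenvalues (rather than, say, only the diagonal) are all assumptions made for illustration.

```python
import numpy as np

def floor_covariance(cov, min_var=1e-3):
    """Return a copy of a symmetric covariance matrix whose eigenvalues
    are clipped from below at min_var (a hypothetical floor value)."""
    # eigh is the eigendecomposition for symmetric matrices: cov = V diag(w) V^T
    eigvals, eigvecs = np.linalg.eigh(cov)
    clipped = np.maximum(eigvals, min_var)
    # Rebuild V diag(clipped) V^T; scaling the columns of V avoids forming diag()
    return (eigvecs * clipped) @ eigvecs.T

# Toy component with one very small variance direction
cov = np.diag([1e-6, 0.5, 2.0])
floored = floor_covariance(cov, min_var=1e-3)
```

For diagonal covariances this reduces to taking the elementwise maximum of each variance and the floor. Either way the determinant of every component is bounded below, which is the property one would hope affects hubness.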

Returning to the three questions about small-variance components, there doesn't seem to be much difference between the statistics for minimum-variance components and those for randomly chosen components. The following table shows the means of various statistics computed over the 3 minimum-variance components from each song (top row) and 3 randomly chosen components from each song (bottom row). The priors are close, and min-variance components seem to be closer to the mixture mean and farther from the origin.
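The comparison for a single song can be sketched roughly as follows. This is only an illustration under several assumptions: the mixture is given as arrays `weights`, `means`, and `covs` (hypothetical names), distances are Euclidean, the mixture mean is taken from the song's own mixture rather than a global one, and the log-determinant is used in place of the raw determinant for numerical stability. Hubness is left out because it depends on the full kernel matrix.

```python
import numpy as np

def component_stats(weights, means, covs, idx):
    """Average a few statistics over the components selected by idx."""
    mixture_mean = weights @ means  # prior-weighted mean of the component means
    _, logdets = np.linalg.slogdet(covs[idx])
    return {
        "prior": weights[idx].mean(),
        "log_det": logdets.mean(),
        "dist_to_mixture_mean": np.linalg.norm(means[idx] - mixture_mean, axis=1).mean(),
        "dist_to_origin": np.linalg.norm(means[idx], axis=1).mean(),
    }

def min_variance_vs_random(weights, means, covs, k=3, rng=None):
    """Stats for the k smallest-determinant components and for k random ones."""
    rng = np.random.default_rng() if rng is None else rng
    _, logdets = np.linalg.slogdet(covs)
    smallest = np.argsort(logdets)[:k]
    random_idx = rng.choice(len(weights), size=k, replace=False)
    return (component_stats(weights, means, covs, smallest),
            component_stats(weights, means, covs, random_idx))

# Toy 3-component, 2-d mixture (made-up numbers, not from uspop)
weights = np.array([0.2, 0.3, 0.5])
means = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
covs = np.stack([np.eye(2) * s for s in (0.01, 1.0, 4.0)])
min_stats, rand_stats = min_variance_vs_random(
    weights, means, covs, k=1, rng=np.random.default_rng(0))
```

Averaging these per-song dictionaries over all songs would give one row of means for the min-variance components and one for the random ones.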

So the remaining question to be tackled is the most interesting: what kind of MFCC frames do these small-variance components model?

Posted by madadam at July 18, 2006 12:21 PM