July 17, 2006

GMM visualization, part 2

Here's a similar set of figures to the last post's, but this time the kernel and models were computed from normalized data (the MFCCs were globally normalized to zero mean and unit variance in every dimension before modelling).
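A minimal sketch of the global normalization described above. It assumes each song's MFCCs are a NumPy array of shape (frames, coefficients), and it reads "globally" as pooling the statistics over the frames of all songs, so every song is shifted and scaled by the same mean and standard deviation rather than by its own. The function name `global_normalize` is hypothetical.

```python
import numpy as np

def global_normalize(mfccs_per_song):
    """Normalize every MFCC dimension to zero mean and unit variance.

    The mean and standard deviation are pooled over the frames of *all*
    songs (global normalization), not computed per song.
    """
    all_frames = np.vstack(mfccs_per_song)   # (total_frames, n_coeffs)
    mean = all_frames.mean(axis=0)
    std = all_frames.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant dimensions
    normalized = [(m - mean) / std for m in mfccs_per_song]
    return normalized, mean, std
```

Returning `mean` and `std` lets the same transform be applied to songs added to the collection later, so that all models live in one shared feature space.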

Looking at the sizes of the determinants in this post and the last, and at the plots of minimum determinant vs. hubness, one of the few obvious differences between the models for strong hubs and those for normal songs seems to be this: normal songs have one or two mixture components whose covariance matrices are much smaller (have much smaller determinants) than the rest, while hubs have a larger minimum determinant. Interestingly, the spread of the determinants of the remaining components in the normal songs is about the same as the spread of the hubs' components.

There are two ways to think about what this means:

(I) The "normal" songs have a few tight components that do strange things when compared with other models. The songs without this strange property are actually behaving as they ought to, but since they're the only ones doing so, they appear to be hubs.

(II) Very tight components are very specific, while broader components are more forgiving. Therefore, a model without any tight components is a more likely match to an arbitrary model than a model with a very tight covariance. In other words, the low likelihood that a tight component assigns to everything outside its small region dominates the comparison: two models that both have tight components generally look more dissimilar than a pair in which one of the models has none.
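The quantities in this comparison can be sketched as follows; this is illustrative, not the code behind the plots. `min_log_det` works with log-determinants, which are numerically safer than raw determinants for high-dimensional covariances. `hubness` uses the k-occurrence (how often a song appears among the k nearest neighbours of the others), an assumed definition since the post doesn't say how hubness is measured; its distance matrix would come from whatever GMM similarity the kernel computes. `expected_loglik` is the closed-form expected log-likelihood of data drawn from one Gaussian under another, a simple way to see interpretation (II) at work. All names are hypothetical.

```python
import numpy as np

def min_log_det(covs):
    """Smallest log-determinant among a GMM's covariances, shape (K, d, d)."""
    signs, logdets = np.linalg.slogdet(covs)   # works on stacked matrices
    if np.any(signs <= 0):
        raise ValueError("covariances must be positive definite")
    return logdets.min()

def hubness(dist, k=10):
    """k-occurrence: how often each song is among the k nearest neighbours
    of the other songs, given a square matrix of pairwise distances."""
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)                # a song is not its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=len(d))

def expected_loglik(mu_data, cov_data, mu_model, cov_model):
    """E[log N(x; mu_model, cov_model)] for x ~ N(mu_data, cov_data)."""
    dim = len(mu_data)
    inv = np.linalg.inv(cov_model)
    diff = np.asarray(mu_data) - np.asarray(mu_model)
    _, logdet = np.linalg.slogdet(cov_model)
    return -0.5 * (dim * np.log(2 * np.pi) + logdet
                   + np.trace(inv @ cov_data) + diff @ inv @ diff)

z, I = np.zeros(2), np.eye(2)
print(expected_loglik(z, I, z, I))           # broad data, broad model: about -2.84
print(expected_loglik(z, I, z, 0.01 * I))    # broad data, tight model: about -97.23
print(expected_loglik(z, 0.01 * I, z, I))    # tight data, broad model: about -1.85
```

In this toy 2-D case the asymmetry is stark: a tight component is heavily penalized when asked to explain broad data, whereas a broad component barely notices tight data. That is the intuition behind (II), though whether it carries over to the actual kernel depends on how that kernel compares mixtures.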

Questions to be answered:

  • Where are these components with respect to the rest? On the fringe? In the center?
  • Are these components important to the mixture, i.e. do they have high priors?
  • What type of MFCC frames are being modelled by these components? Is it silence? Something else that should be filtered out?
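One way to start on these questions, sketched under several assumptions: a GMM is given as priors, means and full covariances; a component counts as "tight" when its log-determinant is more than a hypothetical margin (2 nats here) below the mixture's median; and "fringe vs. center" is measured as the Mahalanobis distance of a component's mean from the mixture's overall, moment-matched mean. `frames_owned_by` returns the frames a given component is most responsible for, so they can be inspected directly, and `drop_tight` removes the flagged components and renormalizes the remaining priors, which is one simple version of "just discarding them". All function names are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of each row of x under N(mean, cov)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ni->n", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def tight_mask(covs, margin=2.0):
    """True for components whose log|cov| is more than `margin` nats below
    the median log-determinant (a hypothetical threshold)."""
    _, logdets = np.linalg.slogdet(covs)
    return logdets < np.median(logdets) - margin

def describe_components(weights, means, covs, margin=2.0):
    """Prior, log-det, distance from the mixture center, and tightness per component."""
    weights, means, covs = (np.asarray(a, dtype=float) for a in (weights, means, covs))
    mix_mean = weights @ means
    centered = means - mix_mean
    # Overall mixture covariance: within-component plus between-component spread.
    mix_cov = (np.einsum("k,kij->ij", weights, covs)
               + np.einsum("k,ki,kj->ij", weights, centered, centered))
    dist = np.sqrt(np.einsum("ki,ij,kj->k", centered, np.linalg.inv(mix_cov), centered))
    _, logdets = np.linalg.slogdet(covs)
    return [dict(prior=float(w), logdet=float(ld), dist=float(r), tight=bool(t))
            for w, ld, r, t in zip(weights, logdets, dist, tight_mask(covs, margin))]

def frames_owned_by(frames, weights, means, covs, k):
    """Boolean mask of frames for which component k has the highest responsibility."""
    scores = np.column_stack([np.log(w) + log_gauss(frames, m, c)
                              for w, m, c in zip(weights, means, covs)])
    return scores.argmax(axis=1) == k

def drop_tight(weights, means, covs, margin=2.0):
    """Remove tight components and renormalize the remaining priors."""
    weights, means, covs = (np.asarray(a, dtype=float) for a in (weights, means, covs))
    # The largest log-det is never below the median, so at least one component survives.
    keep = ~tight_mask(covs, margin)
    return weights[keep] / weights[keep].sum(), means[keep], covs[keep]
```

If the frames owned by a tight component turn out to be silence, they should sit at the low end of whichever coefficient tracks frame energy (commonly the 0th MFCC, if it was kept; that is an assumption about this feature set), which would make them easy to filter before training.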

And assuming that these tight-covariance components are indeed the culprit, what is the appropriate response? If they're modeling silence, that's easy enough to handle. More generally, what happens if we just discard them? And how does this relate to Aucouturier's observations about "homogenizing" the models?

Posted by madadam at July 17, 2006 05:20 PM