June 23, 2006

global normalization

OK, I have most of the machinery in place to do the experiment on covariance size and hubness: measure the size of each song's covariance matrix and plot it against that song's "hubness". The two hypotheses predict different results. If hubs are caused by universal background radiation, hub models will have very broad covariances, because they resemble the background. If instead hubs are caused by very localized, tight gaussians fit to common repetitive frames, their covariances will be small.
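A minimal sketch of the measurement, under some assumptions the post does not spell out. Covariance "size" is taken here as the log-determinant (one choice among several matrix norms). Hubness is taken as k-occurrence: how often a song shows up in the other songs' k-nearest-neighbour lists, given a precomputed song-to-song distance matrix. The function names are hypothetical.

```python
import numpy as np

def covariance_size(cov):
    """Log-determinant of a covariance matrix: a scalar 'size'
    (the log volume of the Gaussian's ellipsoid, up to a constant)."""
    sign, logdet = np.linalg.slogdet(np.asarray(cov, dtype=float))
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return logdet

def hubness(dist, k):
    """k-occurrence: number of times each song appears among the k nearest
    neighbours of the other songs, given a square distance matrix."""
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)               # a song is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # k closest songs per row
    return np.bincount(neighbours.ravel(), minlength=d.shape[0])

def size_vs_hubness(covs, dist, k=5):
    """Pair each song's covariance size with its hubness and report the
    Pearson correlation between the two."""
    sizes = np.array([covariance_size(c) for c in covs])
    hubs = hubness(dist, k)
    r = np.corrcoef(sizes, hubs)[0, 1]
    return sizes, hubs, r
```

Under the background-radiation hypothesis one would expect a positive correlation; under the tight-local-gaussian hypothesis, a negative one.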

But to compare the norms of covariance matrices across models, the data has to be globally normalized to unit variance in every dimension. Otherwise, gaussians that stretch along dimensions with a naturally larger scale will look bigger than gaussians that stretch along tighter dimensions, even when they are equally broad relative to the data.
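A small illustration of the scaling problem. If every dimension is divided by its global standard deviation s, a covariance C becomes D⁻¹ C D⁻¹ with D = diag(s), i.e. entry (i, j) is divided by s_i s_j. The helper name and the toy numbers are made up for the example.

```python
import numpy as np

def normalize_covariance(cov, global_std):
    """Re-express a covariance in globally normalized units:
    scaling x_i -> x_i / s_i turns C_ij into C_ij / (s_i * s_j)."""
    inv = 1.0 / np.asarray(global_std, dtype=float)
    return np.asarray(cov, dtype=float) * np.outer(inv, inv)

# Toy data: dimension 0 naturally has 10x the spread of dimension 1.
s = np.array([10.0, 1.0])
A = np.diag([100.0, 1.0])  # broad only along the naturally wide dimension
B = np.diag([1.0, 4.0])    # broad along the naturally tight dimension

print(np.trace(A), np.trace(B))  # raw: A looks much bigger
print(np.trace(normalize_covariance(A, s)),
      np.trace(normalize_covariance(B, s)))  # normalized: B is the broader one
```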

Since there are a lot of songs, it would be unwieldy to create one giant (15G) pfile and run qnnorm on it. Instead, I took a sampling approach: I computed qnnorm over each file, then averaged the results into a global norm file. It's a bit rough and ready; ideally I would compensate for the different lengths (in frames) of different songs, but that probably won't make a big difference.
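A sketch of how per-file statistics can be pooled into one global norm, assuming each per-file result gives a per-dimension mean and (population) variance. This is not globalnorm.pl itself, and the function name is hypothetical. With equal weights it approximates the averaging described above. Passing frame counts gives the length-compensated version, which matches what running qnnorm on the one concatenated file would produce. Pooling E[x²] rather than averaging the variances directly also keeps the spread between the per-file means, which a plain average of variances would drop.

```python
import numpy as np

def pool_norms(means, variances, counts=None):
    """Combine per-file, per-dimension means/variances into a global norm.

    means, variances: arrays of shape (n_files, n_dims).
    counts=None weights every file equally; passing per-file frame counts
    weights each file by its length (exact pooling of population stats).
    Returns (global_mean, global_std).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = np.ones(len(means)) if counts is None else np.asarray(counts, dtype=float)
    w = w / w.sum()
    mu = w @ means
    # Per file, E[x^2] = var + mean^2; pool that, then subtract the global mean^2.
    var = w @ (variances + means ** 2) - mu ** 2
    return mu, np.sqrt(var)
```

With frame counts supplied, the result is the same as computing the statistics over all frames at once, without ever building the 15G file.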

The command was:

~/work/globalnorm.pl -nsamples 500 -output uspop-global.norm < libraries/uspop_trans.list

Posted by madadam at June 23, 2006 01:25 PM