July 20, 2006

Hubness and R-precision

What is the relationship between hubness and R-precision? Intuitively, it seems that extreme hubs should hurt R-precision, because they are frequent false positives.

Here's an interesting visualization of the relationship. The black-and-white matrix represents near neighbors (in each row, the top 20 neighbors are white). Plotted above it are hubness@20 (for each item, the sum of its column) and the local R-precision value (the fraction of that item's cluster that actually appears among its neighbors). [This is a fragment, zoomed in so you can see the detail, so the rows may not all add up to 20, as they would on the entire matrix.]

Some correlation between hubness and local R-precision is visible; the correlation coefficient turns out to be .25 for this kernel (negligible p-value). So superficially it seems to be the opposite of what we'd expect: hubs help precision. But I think this is a misleading correlation, because hubs may have high precision for themselves, but at the expense of precision for many other items.

Posted by madadam at July 20, 2006 12:03 AM
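For readers who want to try this on their own data, here is a minimal sketch of the quantities described in the post, assuming you have a square similarity matrix `sim` and a cluster label per item. The function names, the toy matrix, and the tie-breaking rule are all my own illustration, not the code behind the figure, and local R-precision here follows the wording of the post (fraction of an item's cluster-mates found among its top-k neighbors) rather than any standard library definition.

```python
def neighbor_matrix(sim, k):
    """Binary matrix N with N[i][j] == 1 if j is one of i's top-k neighbors."""
    n = len(sim)
    N = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]   # an item is not its own neighbor
        others.sort(key=lambda j: -sim[i][j])      # stable sort: ties keep index order
        for j in others[:k]:
            N[i][j] = 1
    return N


def hubness(N):
    """Hubness@k of each item: how many rows list it as a neighbor (column sums)."""
    return [sum(col) for col in zip(*N)]


def local_r_precision(N, labels):
    """Fraction of each item's cluster-mates that appear among its neighbors."""
    n = len(N)
    scores = []
    for i in range(n):
        mates = [j for j in range(n) if j != i and labels[j] == labels[i]]
        hits = sum(N[i][j] for j in mates)
        scores.append(hits / len(mates) if mates else 0.0)
    return scores


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Toy data (hypothetical): four items in two clusters, one neighbor each (k=1).
sim = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
labels = [0, 0, 1, 1]

N = neighbor_matrix(sim, k=1)
h = hubness(N)
r = local_r_precision(N, labels)
print("hubness@1:        ", h)
print("local R-precision:", r)
print("correlation:      ", pearson(h, r))
```

In this toy case item 0 is the hub: it scores perfectly itself, but it also takes item 2's only neighbor slot, so item 2 (which belongs to the other cluster) scores zero. That is the same pattern suspected above, where a hub's own precision can look fine while it costs other items theirs.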