I implemented it in C++ using Torch, which had most of the machinery I needed, and it takes 150 ms to compute one pair, which is two orders of magnitude faster. A 350x350 kernel will take about 2 hours. And that's without any of the obvious optimizations I can think of, like not resampling from each model for every kernel row.
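For a rough sanity check on that estimate, here is a back-of-the-envelope sketch in C++. It assumes the kernel is symmetric, so only the upper triangle (including the diagonal) needs computing, and that every pair costs the same 150 ms. The function names are made up for illustration and are not part of Torch or of the actual implementation.

```cpp
#include <cstddef>

// Number of kernel entries that must be evaluated for an n x n kernel.
// Assumption: if the kernel is symmetric, only the upper triangle
// (including the diagonal) is computed; otherwise all n * n entries are.
constexpr std::size_t pairs_to_compute(std::size_t n, bool symmetric) {
    return symmetric ? n * (n + 1) / 2 : n * n;
}

// Estimated wall-clock time in hours, assuming a fixed cost per pair
// given in milliseconds and no parallelism.
constexpr double estimated_hours(std::size_t n, double ms_per_pair, bool symmetric) {
    return pairs_to_compute(n, symmetric) * ms_per_pair / 1000.0 / 3600.0;
}
```

Calling `estimated_hours(350, 150.0, true)` gives roughly 2.6 hours (61,425 pairs), while `estimated_hours(350, 150.0, false)` gives roughly 5.1 hours (122,500 pairs), so the figure above is in the right range only if symmetry is exploited.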
Results coming soon...
Posted by madadam at July 21, 2006 10:55 AM