Good news everybody!

With a slight change to the data and to the bond order settings, 2,737 out of 5,739 now match exactly. For 1,114 of the rest, the right one is in the top 10. That still leaves 1,888, almost 2,000, but some of these are clearly poor experimental data.

Comments

What similarity measure are you using to compare the spectra?
gilleain said…
I'm using a stripped down version of Stefan's code from the NMRshiftDB, so the score is based on the number of shift peak matches.
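
To give a rough idea of what a peak-match count looks like, here is a minimal Python sketch. It is not Stefan's NMRshiftDB code: the function name `peak_match_score`, the greedy nearest-peak pairing, and the 1.0 ppm `tolerance` are all assumptions made for illustration.

```python
def peak_match_score(query, reference, tolerance=1.0):
    """Count query shifts that pair with a distinct reference shift.

    Hypothetical illustration only: 'tolerance' (in ppm) and the greedy
    pairing strategy are assumptions, not the NMRshiftDB algorithm.
    """
    unused = sorted(reference)
    matches = 0
    for shift in sorted(query):
        # Find the closest reference shift that has not been paired yet.
        best = None
        for i, ref in enumerate(unused):
            distance = abs(ref - shift)
            if distance <= tolerance and (
                best is None or distance < abs(unused[best] - shift)
            ):
                best = i
        if best is not None:
            # Each reference peak may be matched at most once.
            unused.pop(best)
            matches += 1
    return matches
```

A score like this depends on a hard cutoff: a peak just inside the tolerance counts fully, and one just outside it counts for nothing.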
I have rather good experience with the weighted cross-correlation (WCC) for peak-like spectra where small shifts are not significant. It does not use binning, so you won't see the artifacts that binning introduces. There is code to calculate the WCC measure in the seneca bioclipse1 plugin.
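
For readers who have not met the WCC, here is a minimal Python sketch of one common formulation for peak lists, using a triangular weight function. It is not the seneca plugin code: the names `wcc`, `triangle`, and `weighted_overlap`, the `(shift, intensity)` tuple layout, and the default `width` of 2.0 ppm are assumptions made for illustration.

```python
import math


def triangle(delta, width):
    """Triangular weight: 1 at zero offset, falling linearly to 0 at 'width'."""
    return max(0.0, 1.0 - abs(delta) / width)


def weighted_overlap(a, b, width):
    """Sum of intensity products over all peak pairs, weighted by peak distance."""
    return sum(
        ia * ib * triangle(xa - xb, width)
        for xa, ia in a
        for xb, ib in b
    )


def wcc(a, b, width=2.0):
    """Weighted cross-correlation of two peak lists of (shift, intensity) tuples.

    Hypothetical sketch: works directly on peak positions, so no binning is
    involved. 'width' (assumed, in ppm) sets how far apart two peaks can be
    and still contribute to the similarity.
    """
    self_a = weighted_overlap(a, a, width)
    self_b = weighted_overlap(b, b, width)
    if self_a == 0 or self_b == 0:
        return 0.0
    # Normalise by the self-overlaps so identical spectra score 1.0.
    return weighted_overlap(a, b, width) / math.sqrt(self_a * self_b)
```

With these assumptions, two identical peak lists score 1.0, and the score falls off smoothly as peaks drift apart instead of dropping to zero at a bin edge or tolerance cutoff.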