A comment on my previous post reminded me of the "EquivalentClassPartitioner" already in the CDK, written back in 2003 by Junfeng Hao and based on this article by Chang-Yu Hu and Lu Xu. After some testing, it seems that it and the AtomPartitionRefiner give the same results on various molecules. If you only want the equivalence classes, though, the Hu/Xu method is much, much faster: something like 10-100 times faster.
The test molecules I used for comparing speeds are a library of fullerenes that range in size from 20 carbons up to 720. Naturally, I started with the smaller ones, but even there the difference was clear. For instance, here is a table of numbers from the C40 run:
The left-hand column is just the name of the cc1 file, the next two columns are the times for the AtomPartitionRefiner and EquivalentClassPartitioner, and the last two are the order of the automorphism group and the number of equivalence classes. Times are in milliseconds, so the HuXu method is clearly far faster, at only one or two ms rather than 30-70 ms. There is presumably some VM warm-up going on that accounts for the apparent increase in speed over the first few examples.
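To take the VM warm-up out of timings like these, one option is to run each method a few times untimed before measuring. Here is a minimal sketch of that idea. It is not the harness used for the numbers above; the class name `WarmupTiming`, the method `meanMillis` and the stand-in workload are all hypothetical, and in a real test the `Runnable` would wrap a call to whichever partitioner is being measured.

```java
class WarmupTiming {
    // Written to by the stand-in workload so the JIT cannot discard the loop.
    static volatile long sink;

    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * mean wall-clock time in milliseconds over timedRuns further runs.
     */
    static double meanMillis(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // untimed: gives the VM a chance to compile hot code
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; replace with the code under test.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            sink = sum;
        };
        System.out.printf("mean: %.3f ms%n", meanMillis(task, 20, 50));
    }
}
```

This is still crude compared to a proper benchmarking framework, but it should at least stop the first few molecules in a run from looking artificially slow.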
It's a similar picture for larger fullerenes: for a typical C92, the PartitionRefiner takes 700 ms and HuXu only 7 ms. The PartitionRefiner does get faster for more symmetric molecules: the larger the value of |Aut| (the group order), the faster it is. This is to be expected, as it uses the automorphisms to prune the search.
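The link between the two outputs is that the equivalence classes are the orbits of the atoms under the automorphism group. As an illustration only (this is neither the CDK code nor the Hu/Xu algorithm), here is a small sketch that merges atoms into orbits given a set of automorphism generators written as permutation arrays. The names `OrbitSketch` and `orbits` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrbitSketch {
    // Union-find lookup with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Returns the orbits of 0..n-1 under the group generated by the given
     * permutations, where generators[k][i] is the image of atom i.
     */
    static List<List<Integer>> orbits(int n, int[][] generators) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int[] g : generators) {
            for (int i = 0; i < n; i++) {
                int a = find(parent, i);
                int b = find(parent, g[i]);
                if (a != b) {
                    // Keep the smallest atom index as the class representative.
                    parent[Math.max(a, b)] = Math.min(a, b);
                }
            }
        }
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // A three-atom chain 0-1-2: the only non-trivial automorphism swaps the ends.
        int[][] generators = { {2, 1, 0} };
        System.out.println(orbits(3, generators)); // prints [[0, 2], [1]]
    }
}
```

For the chain, |Aut| is 2 and there are two equivalence classes: the two end atoms together, and the middle atom on its own.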
The tests, and some output, are in this GitHub repo (fullerene library not included). Finally, here is another image of a fullerene (one of the test cases from the EquivalentClassPartitioner):
because why not. Colourful, is it not?
Comments
I really should try getting the Schlegel layout stuff working on fullerenes. I got close over the summer, but the optimisation (annealing) part was broken.