
Posts

Showing posts from November, 2012

Comparing the EquivalentClassPartitioner and PartitionRefiner : Fullerenes

A comment on my previous post reminded me of the "EquivalentClassPartitioner" already in the CDK, written back in 2003 by Junfeng Hao and based on this article by Chang-Yu Hu and Lu Xu. After some testing, it seems the two give the same results on various molecules, although if you only want the equivalence classes, the Hu/Xu method is much, much faster: something like 10-100 times faster. The test molecules I used for comparing speeds are a library of fullerenes that range in size from 20 carbons up to 720. Naturally, I started with the smaller ones, but even there the difference was clear. For instance, here is a table of numbers from the C40 run: The left-hand column is just the name of the cc1 file, the next two columns are the times for the AtomPartitionRefiner and EquivalentClassPartitioner, and the last two are the order of the automorphism group and the number of equivalence classes. Times are in milliseconds, so clearly the Hu/Xu method is far fast...

Using the CDK's group module

There is a new module and package for the CDK that is currently under review. This is a short guide to how to use it - to help both reviewers and users. The basic idea is that molecules can be considered as a kind of graph, and that one useful thing to calculate about such graphs is the automorphism group that preserves element labels and/or bond labels. To put it another way, this means calculating the symmetries of the molecule - although I should point out that it's not quite the same as the crystallographic symmetry groups. As a simple example, consider these two molecules (1,4-cyclohexadiene and 4H-pyran): They are numbered from 0-5 for programming convenience; on the right each molecule has a table of automorphisms written as permutations in cycle notation. It should be fairly obvious that - for example - the H-Flip sends atom 0 to atom 4, 1 to 3, and fixes 2 and 5. Apart from the identity, only the H-Flip is an automorphism for 4H-pyran, due to the oxygen atom. The code to do this i...
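To make the idea of a label-preserving automorphism concrete, here is a minimal brute-force sketch in plain Java. It does not use the CDK group module or its API; the class and method names are made up for illustration, and the atom numbering is an assumption chosen to agree with the H-Flip described above (double bonds on 0-1 and 3-4, with the oxygen of 4H-pyran placed at atom 5). Hydrogens are left implicit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the CDK: brute-force automorphism search.
class AutomorphismSketch {

    /** Builds a symmetric bond-order matrix for a ring; ringBondOrders[i] is the bond i -- (i+1) mod n. */
    static int[][] ring(int[] ringBondOrders) {
        int n = ringBondOrders.length;
        int[][] orders = new int[n][n];
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;
            orders[i][j] = ringBondOrders[i];
            orders[j][i] = ringBondOrders[i];
        }
        return orders;
    }

    /** Returns every permutation p (atom i maps to p[i]) that preserves element labels and bond orders. */
    static List<int[]> automorphisms(String[] elements, int[][] bondOrders) {
        List<int[]> found = new ArrayList<>();
        int n = elements.length;
        extend(new int[n], new boolean[n], 0, elements, bondOrders, found);
        return found;
    }

    // Backtracking: choose an image for atom 'pos', pruning as soon as a label or bond mismatches.
    private static void extend(int[] p, boolean[] used, int pos,
                               String[] elements, int[][] bondOrders, List<int[]> found) {
        int n = p.length;
        if (pos == n) {
            found.add(p.clone());
            return;
        }
        for (int target = 0; target < n; target++) {
            if (used[target] || !elements[pos].equals(elements[target])) continue;
            boolean consistent = true;
            for (int prev = 0; prev < pos && consistent; prev++) {
                consistent = bondOrders[prev][pos] == bondOrders[p[prev]][target];
            }
            if (!consistent) continue;
            used[target] = true;
            p[pos] = target;
            extend(p, used, pos + 1, elements, bondOrders, found);
            used[target] = false;
        }
    }

    /** Writes a permutation in cycle notation, including fixed points. */
    static String cycles(int[] p) {
        StringBuilder sb = new StringBuilder();
        boolean[] seen = new boolean[p.length];
        for (int start = 0; start < p.length; start++) {
            if (seen[start]) continue;
            sb.append('(').append(start);
            seen[start] = true;
            for (int next = p[start]; next != start; next = p[next]) {
                sb.append(' ').append(next);
                seen[next] = true;
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] bonds = ring(new int[] {2, 1, 1, 2, 1, 1}); // assumed numbering
        String[] cyclohexadiene = {"C", "C", "C", "C", "C", "C"};
        String[] pyran = {"C", "C", "C", "C", "C", "O"};
        for (int[] p : automorphisms(cyclohexadiene, bonds)) System.out.println("C6: " + cycles(p));
        for (int[] p : automorphisms(pyran, bonds)) System.out.println("pyran: " + cycles(p));
    }
}
```

Under these assumptions the search should report four automorphisms for 1,4-cyclohexadiene and two for 4H-pyran (the identity and the H-Flip). Trying all label-compatible permutations is only sensible for tiny graphs like these; it is meant to show what is being computed, not how a refinement-based method computes it.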

Graphs of Trees of Graphs (ok, just timings again)

Line-graphs of search-trees of molecular-graphs, that is. I should also point out that I am not using the cutting-edge development version of OMG, but rather the commit 30b08250efa4.... - sadly, I don't have Java 7 on this machine, so I can't run the latest versions. Anyway, it doesn't make much difference for the alkanes and alkenes. Here are the times in milliseconds, and log(t in ms), for CnH2n and CnH2n+2. Click for bigger, as always: Clearly AMG (in blue) is significantly slower than OMG (in red), by roughly a factor of 10. On the other hand, the picture is surprisingly different if we add in a few oxygens: Weird, but I suspect that this kind of problem has been fixed in more recent versions of OMG. User "mmajid" seems to have been doing some interesting experiments with using bliss instead of nauty, multithreading, and semicanonical checking.

Overcoming the Problem of Crown Ethers in Generating

Due to an overly-optimistic algorithm, AMG was not generating crown ethers properly. In fact it was missing out on a whole bunch of structures, but these particular ones are a good example: The familiar structure is on the left, but the one on the right is the key to the problem that AMG had. Carbons are connected only to oxygens, and oxygens to carbons - in other words the molecule graph is elementally bipartite. Well, ok that's not really a phrase anyone uses - but it should be clear from the image. The flaw in AMG's algorithm was (again) in failing to try disconnected structures. It is obviously not possible to reach a (CO)n structure from a C-C bond, only from C.C - that is, from two disconnected carbons. Surprisingly, after a fruitless attempt to use partition refinement on the connected parts and then 'add back' the disconnected atoms it turned out that refinement works just great on disconnected structures. So (for now) results for C3H6O3 and simila...
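For anyone who wants the "elementally bipartite" property spelled out, here is a small sketch in plain Java. It is not AMG's code: the class name, method name, and bond-list representation are all hypothetical, and hydrogens are ignored. It simply checks that no bond joins two atoms of the same element, which is the reason such a structure can never be grown from a C-C bond.

```java
// Hypothetical helper, NOT taken from AMG or the CDK.
class ElementalBipartiteSketch {

    /**
     * True if every bond joins atoms of different elements.
     * Each entry of bonds is a pair {i, j} of atom indices into elements.
     */
    static boolean isElementallyBipartite(String[] elements, int[][] bonds) {
        for (int[] bond : bonds) {
            if (elements[bond[0]].equals(elements[bond[1]])) {
                return false; // e.g. a C-C bond rules out a (CO)n structure
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] sixRing = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};
        // Heavy atoms of a (CO)3 ring (a C3H6O3 isomer): alternating C and O.
        String[] alternating = {"C", "O", "C", "O", "C", "O"};
        // Heavy atoms of a ring containing C-C bonds, as in 1,4-dioxane.
        String[] withCarbonCarbon = {"C", "C", "O", "C", "C", "O"};
        System.out.println(isElementallyBipartite(alternating, sixRing));
        System.out.println(isElementallyBipartite(withCarbonCarbon, sixRing));
    }
}
```

The alternating ring passes the check and the ring with C-C bonds fails it, so a generator that always starts by bonding two carbons can only ever reach the second kind.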