

Showing posts from June, 2010

Formula Debugger

(Please click for bigger - there is quite a bit of detail :)

The left panel shows the search tree of structures. Each circle is a structure; the red ones at the tips of some branches are fully connected and saturated. The blue branch has been selected and is shown in the middle panel, with the structure at the tip of the branch at the top.
This solution is also highlighted (in red - colors are not very consistent) and shown in detail on the right. The upper right panel is a traditional molecule layout, but with numbers in place of atom symbols. The lower right panel is the spanning tree of the atom highlighted in red in the upper right panel.
Clearly, a lot of unnecessary structures are still generated, since most branches are dead ends. However, this version does at least avoid duplicates - sadly, it also misses a few structures :( Evidently, refining partitions based on element symbols alone doesn't totally work...
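To make "refining partitions based on element symbols" a little more concrete, here is a minimal sketch in Java. It is not the code behind the debugger; the class `SymbolPartition` and its methods are hypothetical names, and the scheme is a generic neighbour-based refinement chosen for illustration. It starts with one cell per element symbol, then repeatedly splits cells according to the cells of each atom's neighbours, and stops when no cell splits any further.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not the code used in the debugger.
class SymbolPartition {

    // Returns one cell label per atom; atoms with equal labels were not told apart.
    static int[] refine(String[] symbols, int[][] bonds) {
        int n = symbols.length;
        List<List<Integer>> adj = new ArrayList<List<Integer>>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<Integer>());
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        int[] cell = rank(symbols);   // initial partition: one cell per element symbol
        int count = countCells(cell);
        while (true) {
            String[] keys = new String[n];
            for (int i = 0; i < n; i++) {
                // Key = own cell plus the sorted cells of the neighbours.
                List<Integer> seen = new ArrayList<Integer>();
                for (int j : adj.get(i)) seen.add(cell[j]);
                Collections.sort(seen);
                keys[i] = cell[i] + ":" + seen;
            }
            int[] next = rank(keys);
            int nextCount = countCells(next);
            // Cells can only split, never merge, so an unchanged count means stable.
            if (nextCount == count) return cell;
            cell = next;
            count = nextCount;
        }
    }

    // Replaces each key by its position among the sorted distinct keys.
    static int[] rank(String[] keys) {
        List<String> order = new ArrayList<String>(new TreeSet<String>(Arrays.asList(keys)));
        int[] r = new int[keys.length];
        for (int i = 0; i < keys.length; i++) r[i] = order.indexOf(keys[i]);
        return r;
    }

    static int countCells(int[] cell) {
        return (int) Arrays.stream(cell).distinct().count();
    }

    public static void main(String[] args) {
        // Heavy atoms of propane, C-C-C: the two end carbons end up in the same cell.
        int[] cells = refine(new String[] {"C", "C", "C"}, new int[][] {{0, 1}, {1, 2}});
        System.out.println(Arrays.toString(cells));
    }
}
```

A refinement of this kind has a known blind spot: if every atom has the same element and the same number of neighbours of each kind at every round, nothing ever splits, so all atoms stay in one cell even when they are not truly equivalent.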

CDK Signature implementation now in review

What is it?
I have written about signatures quite a bit, but here is a quick summary: signatures are a little like SMILES, but also somewhat like HOSE codes. They are a description of the connectivity of a molecule, or of an atom in the molecule. A more detailed description can be found in these papers by Faulon et al: [1], [3] or in this blog post by me.
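As a rough illustration of "a description of the connectivity of an atom", here is a small Java sketch of a tree-style atom string. It is deliberately simplified and is not the CDK implementation or its API: the class `SimpleSignature` and its methods are hypothetical, bond orders and ring-closure labels are ignored, and the canonical labelling step of the real algorithm is replaced by simply sorting the child strings. The `height` parameter is the number of bonds explored outwards from the root atom.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified sketch; not the CDK signature classes.
class SimpleSignature {
    private final String[] symbols;
    private final List<List<Integer>> neighbours = new ArrayList<List<Integer>>();

    SimpleSignature(String... symbols) {
        this.symbols = symbols;
        for (int i = 0; i < symbols.length; i++) {
            neighbours.add(new ArrayList<Integer>());
        }
    }

    void bond(int a, int b) {
        neighbours.get(a).add(b);
        neighbours.get(b).add(a);
    }

    // String for one atom, exploring at most `height` bonds away from it.
    String atomSignature(int atom, int height) {
        return build(atom, -1, height);
    }

    private String build(int atom, int parent, int height) {
        String label = "[" + symbols[atom] + "]";
        if (height == 0) return label;
        List<String> children = new ArrayList<String>();
        for (int n : neighbours.get(atom)) {
            // Only the immediate parent is skipped; the height limit ends the walk.
            if (n != parent) children.add(build(n, atom, height - 1));
        }
        if (children.isEmpty()) return label;
        Collections.sort(children);  // makes the result independent of atom numbering
        StringBuilder sb = new StringBuilder(label).append('(');
        for (String c : children) sb.append(c);
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol: C0-C1-O2.
        SimpleSignature ethanol = new SimpleSignature("C", "C", "O");
        ethanol.bond(0, 1);
        ethanol.bond(1, 2);
        System.out.println(ethanol.atomSignature(1, 1));
        System.out.println(ethanol.atomSignature(0, 2));
    }
}
```

Because the child strings are sorted before being joined, renumbering the atoms of the same molecule gives the same string for the corresponding atom, which is the property that makes such strings usable as canonical descriptions.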
The Java implementation of this algorithm is a collaboration between Lars Carlsson (who wrote a C++ version) and me (who ported that version to Java). However, I was also influenced by my previous attempt at a port of the C implementation by Faulon's group. There is an online service for using their program, called "sscan", here. It also deals with stereochemistry.
What is it used for?
So, what can be done with all this new code? Here are some possibilities: SMILES-like canonical strings that represent molecules. Note that signatures are considerably longer than SMILES, but are guaranteed to work for cuneane, and indeed a broad rang…