Exploring the wild beasts of the layout jungle

There was a bug submitted to the CDK SourceForge tracker (bug number 2783741) with a list of molecules that are laid out badly. I had a look at some of them with the help of Bioclipse. For example, this calix[4]arene:

or this:

which is a clearer case of something going wrong. More difficult are structures that are fully 3D, like:


Can you guess what it is? :) Try the 3D version (also made with Bioclipse, using the CDK 3D layout):

It's a paracyclophane! The phenyl rings are lost in the 2D layout because there is a bigger 'ring'. Perhaps a chemist would look at the 3D structure and think that those chains are linkers, not parts of a ring, but the algorithm doesn't know this.

I think that it is difficult to have general rules for this. Of course, any fully 3D structure will be difficult to lay out in 2D (if it is not embeddable in the plane then it is impossible) so things like this:


are truly awful.

Comments

These larger ring systems are a problem indeed. I recently added templates to cover some of the larger ring systems, giving them the more common 120-degree angles, but more are needed...

Or, an algorithm to lay out ring systems of 10 bonds and larger with 120-degree angles... sort of like a multi-phenyl system, but without the inner atoms, just the edge...

This is when I found that the template matcher takes bond orders into account, so these templates only match the cyclo-foo-ane, and not the variants with -[di|tri|etc]-enes, -ynes, etc...
The current standard in structure diagram generation (SDG) is set by the guys from CCG (http://dx.doi.org/10.1021/ci050550m). Those large rings, btw, can easily be laid out with a honeycomb embedding algorithm and need no templates.
Yes, indeed, no honeycomb templates needed once we have an implementation of that algorithm.
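
To make the "just the edge of a multi-phenyl system" idea from the comments concrete, here is a minimal sketch in Python. It is not the CDK implementation and not the algorithm from the CCG paper. It only handles the special case where the ring size has the form 4k + 2 (10, 14, 18, ...), by walking the outer perimeter of a linear chain of k fused hexagons so that every angle is 120 or 240 degrees. The function name `acene_outline` and its parameters are made up for illustration.

```python
import math


def acene_outline(ring_size, bond_length=1.0):
    """Place a ring on the perimeter of a linear chain of fused hexagons.

    Hypothetical helper: only ring sizes of the form 4k + 2 are supported,
    because that is the perimeter length of k hexagons fused in a row.
    """
    if ring_size < 6 or (ring_size - 2) % 4 != 0:
        raise ValueError("this sketch only handles ring sizes of the form 4k + 2")
    k = (ring_size - 2) // 4  # number of hexagons in the chain

    # Turns are in units of 60 degrees: +1 is a left turn (120-degree
    # interior angle), -1 is a right turn (240-degree interior angle).
    # Along each long side the path zigzags between the vertices shared by
    # two hexagons (right turn) and the apex of the next hexagon (left turn).
    zigzag = [-1, +1] * (k - 1)
    turns = [+1] + zigzag + [+1, +1, +1] + zigzag + [+1, +1]

    coords = []
    x, y = 0.0, 0.0
    heading = 0  # current direction, in units of 60 degrees
    for turn in turns:
        coords.append((x, y))  # record the atom, then turn and step
        heading = (heading + turn) % 6
        angle = math.radians(60 * heading)
        x += bond_length * math.cos(angle)
        y += bond_length * math.sin(angle)
    return coords
```

Calling `acene_outline(10)` gives cyclodecane the outline of a naphthalene-like shape, and `acene_outline(14)` the outline of an anthracene-like one. Other even ring sizes would need outlines of differently shaped hexagon clusters, and odd sizes cannot be closed on a honeycomb lattice at all, so a general method would need more than this.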