Symmetric Generations

I've been trying to implement a structure generator algorithm due to Faulon (it's a paper in this list somewhere). One problem I'm having difficulty with is reducing the number of isomers generated. For example:

It may be hard to read, but the idea is that the full tree of possible ways to attach {Br, Cl, F, I} to c1ccc1 has 4 * 3 * 2 * 1 = 24 leaves (you can attach bromine to any of the four carbons, leaving 3 places to attach chlorine, and so on).

Of course c1(Br)c(Cl)cc1 is the same as c1(Cl)c(Br)cc1 [never mind that C1(Br)=C(Cl)C=C1 is not the same as C1=C(Br)C(Cl)=C1]. Or, in other words, there is a high degree of symmetry in the tree.
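A minimal Python sketch of that symmetry, and not Faulon's algorithm: it treats the ring as a plain 4-cycle (ignoring where the double bonds sit, as above), enumerates all 24 leaves of the tree, and maps each one to a canonical representative under the ring's rotations and reflections. The names `ring_symmetries`, `canonical` and `distinct_labelings` are made up for this illustration.

```python
from itertools import permutations

SUBSTITUENTS = ("Br", "Cl", "F", "I")


def ring_symmetries(n):
    """Return the 2n position permutations (rotations and reflections) of an n-cycle."""
    syms = []
    for shift in range(n):
        syms.append(tuple((i + shift) % n for i in range(n)))  # rotation by `shift`
        syms.append(tuple((shift - i) % n for i in range(n)))  # reflection
    return syms


def canonical(labeling, syms):
    """Pick the lexicographically smallest image of a labeling under all symmetries."""
    return min(tuple(labeling[p] for p in sym) for sym in syms)


def distinct_labelings(substituents=SUBSTITUENTS):
    """Return every leaf of the attachment tree and the symmetry-distinct ones."""
    syms = ring_symmetries(len(substituents))
    all_labelings = list(permutations(substituents))
    unique = sorted({canonical(lab, syms) for lab in all_labelings})
    return all_labelings, unique


all_labelings, unique = distinct_labelings()
print(len(all_labelings), "labelings in the full tree")
print(len(unique), "distinct up to ring symmetry")
for lab in unique:
    print(lab)
```

Under these assumptions the 24 leaves collapse to 3 distinct structures, one for each choice of substituent sitting opposite the bromine. This brute-force version only filters duplicates after the whole tree has been generated; a real generator would want to prune symmetric branches while building it.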

There is a way to solve this, perhaps even described in the paper - if I could just understand it...


Symmetry is indeed the main blocker in structure generators. The only way to get performance is to detect symmetry as early as possible. The deterministic generator we had in CDK 1.0 did this: there is a trick to detect symmetry in the connectivity matrix, but the matrix needs to be normalized. I never had (or took) the time to grasp the math behind it well enough to isolate the bugs in the implementation. But please do look up the literature; it's a must-read for your work.
gilleain said…
Finding the eigenvalues of the matrix, perhaps?

Reading the literature is one thing, understanding it is another. I was reading "Isomorphism, Automorphism Partitioning, and Canonical Labeling Can Be Solved in Polynomial-Time for Molecular Graphs" (Faulon, 1998) and it's heavy going :(