A couple of people have asked how the structure generation stuff is going, and the short answer is that I am stuck. This post will give a short summary of the problem, and the next will give a much more detailed description.
So the problem is this: given an elemental formula (like C6H12), or a list of fragments plus a formula (like {2 * CH, 2 * CH2, 2 * CH3 : C6H12}), return all possible connected structures.
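To make the two kinds of input concrete, here is a minimal sketch in Python of how a formula and a fragment list might be represented, and how to check that they agree. The names `parse_formula` and `fragments_match` and the `(count, fragment)` tuple layout are assumptions made for illustration, not part of any existing code.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Turn a simple formula like 'C6H12' into element counts.

    Only handles plain element/count pairs (no brackets or charges).
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def fragments_match(fragments, formula):
    """Check that a list of (count, fragment) pairs adds up to the formula."""
    total = Counter()
    for count, fragment in fragments:
        for element, number in parse_formula(fragment).items():
            total[element] += count * number
    return total == parse_formula(formula)

# The fragment example from the text: {2 * CH, 2 * CH2, 2 * CH3 : C6H12}
fragments = [(2, "CH"), (2, "CH2"), (2, "CH3")]
print(fragments_match(fragments, "C6H12"))
```

For the example above the check succeeds, since the fragments contribute 6 carbons and 2 + 4 + 6 = 12 hydrogens.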
There are simple ways to do this, such as trying every possible way of connecting the atoms to each other, and then removing duplicates. The downside is that this takes forever, because the procedure makes many, many isomorphic copies of each solution. At the final filtering step, an all-v-all comparison would have to be done on all of these copies.
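A rough sketch of the brute-force approach shows how quickly the copies pile up. It is restricted to simple graphs (no elements, no bond orders), and the function names are made up for illustration. For brevity it removes duplicates by computing a brute-force canonical form for each graph, rather than doing a literal all-v-all comparison; the amount of wasted work is the same either way.

```python
from itertools import combinations, permutations

def is_connected(n, edges):
    """Depth-first search from vertex 0; connected if every vertex is reached."""
    adjacent = {v: set() for v in range(n)}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adjacent[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def canonical_form(n, edges):
    """Smallest sorted edge list over all n! relabellings (slow but simple)."""
    return min(
        tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        for perm in permutations(range(n))
    )

def brute_force(n):
    """Return (number of labelled connected graphs, number of distinct ones)."""
    possible = list(combinations(range(n), 2))
    labelled = [
        edges
        for k in range(len(possible) + 1)
        for edges in combinations(possible, k)
        if is_connected(n, edges)
    ]
    unique = {canonical_form(n, edges) for edges in labelled}
    return len(labelled), len(unique)

print(brute_force(4))  # (38, 6)
```

Even with just 4 vertices, 38 connected graphs are generated to find 6 distinct ones, and the ratio gets rapidly worse as the number of atoms grows.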
A better solution is to check each structure each time a bond is made, to see if it is canonical. Although I know how to do this in theory, it turns out to be more difficult in practice. For simple graphs, I have a solution that seems to work. Chemicals are not simple graphs, however, as they have elements and bond orders.
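For the simple graph case, the idea can be sketched as a textbook-style "orderly generation": add edges in a fixed order, and only carry on from a graph if its edge list is the smallest among all relabellings of itself. This is a sketch under those assumptions, not necessarily the solution mentioned above, and it does not deal with elements or bond orders. The canonicity test here tries all n! permutations, which is only sensible for tiny graphs. The names are hypothetical.

```python
from itertools import combinations, permutations

def relabel(edges, perm):
    """Apply a vertex permutation and return the sorted edge tuple."""
    return tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))

def is_canonical(n, edges):
    """True if no relabelling gives a smaller edge tuple than this one."""
    return all(relabel(edges, perm) >= edges for perm in permutations(range(n)))

def is_connected(n, edges):
    """Depth-first search from vertex 0; connected if every vertex is reached."""
    adjacent = {v: set() for v in range(n)}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adjacent[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def orderly_generate(n):
    """Generate one representative of every simple graph on n vertices."""
    pairs = list(combinations(range(n), 2))
    results = []

    def extend(edges, start):
        results.append(edges)  # every graph reaching here passed the check
        for i in range(start, len(pairs)):
            # Only add edges later in the ordering than the last one added,
            # so the edge tuple stays sorted.
            candidate = edges + (pairs[i],)
            if is_canonical(n, candidate):
                extend(candidate, i + 1)

    extend((), 0)
    return results

graphs = orderly_generate(4)
connected = [g for g in graphs if is_connected(n=4, edges=g)]
print(len(graphs), len(connected))  # 11 6
```

Non-canonical graphs are thrown away as soon as they appear, so no isomorphic copies survive to the end and no final filtering step is needed. Intermediate graphs can be disconnected, which is why connectivity is only tested on the results.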
This is not merely code optimisation: without checking for canonical graphs, the running times for even quite simple problems are far too long, and for more realistic problems the code would be too slow to be useful.