
Posts

Showing posts from August, 2010

Combinations and Filters

So there is now the beginning of a possible rewrite of the DBST (DeduceBondSystemTool) that uses basically the same approach but is a bit more flexible. The code is here, but it's still a bit rough. The original idea seems to have been to encode arrangements of double bonds for different ring sizes as a kind of 'library'. For each ring, a particular arrangement is picked, and this is repeated until all possible combinations have been generated. As a concrete example, consider this naphthalene skeleton: Here, the arrangements (1, 2) are applied to each ring (A, B), and then these are combined. Of the four combinations (A1B1, A2B1, A1B2, A2B2), only three are valid. The A1B2 combination has two atoms, highlighted in red, that have two double bonds and one single bond. So one way to filter the combinations is to try to type the atoms and reject any structure that has untypeable atoms. Another possible filter rejects structures that don't have atoms that are SP2 hybridized. Both of these are from the original cod
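A minimal Python sketch of the combine-and-filter idea follows. It is not the code linked above. Here the ring 'library' is assumed to be every placement of pairwise non-adjacent double bonds in a ring (any number of them), and the two filters are crude stand-ins for atom typing and the SP2 check: reject any atom with two double bonds, and reject any atom left with none. The names `ring_library` and `deduce` are hypothetical.

```python
from itertools import combinations, product


def ring_bonds(cycle):
    """Bonds of a ring given as an ordered list of atom indices."""
    n = len(cycle)
    return [frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)]


def ring_library(cycle):
    """Hypothetical 'library': all placements of 0..n/2 non-adjacent double bonds in one ring."""
    bonds = ring_bonds(cycle)
    library = []
    for k in range(len(cycle) // 2 + 1):
        for chosen in combinations(bonds, k):
            atoms = [a for b in chosen for a in b]
            if len(atoms) == len(set(atoms)):  # no atom gets two double bonds within this ring
                library.append(frozenset(chosen))
    return library


def double_bond_count(double_bonds, atom):
    return sum(1 for b in double_bonds if atom in b)


def deduce(rings, atoms):
    """Combine one arrangement per ring, then filter out chemically bad structures."""
    libraries = [ring_library(r) for r in rings]
    solutions = set()  # different combinations can give the same structure
    for combo in product(*libraries):
        double_bonds = frozenset().union(*combo)
        counts = [double_bond_count(double_bonds, a) for a in atoms]
        if any(c > 1 for c in counts):   # stand-in filter 1 (cf. the red atoms in A1B2)
            continue
        if any(c == 0 for c in counts):  # stand-in filter 2: every atom needs a double bond
            continue
        solutions.add(double_bonds)
    return solutions


if __name__ == "__main__":
    ring_a = [0, 1, 2, 3, 4, 5]
    ring_b = [4, 6, 7, 8, 9, 5]  # shares the 4-5 bond with ring A
    for s in deduce([ring_a, ring_b], range(10)):
        print(sorted(sorted(b) for b in s))
```

Because this library includes partial placements, many combinations collapse onto the same structure, which is why solutions are collected in a set; for the naphthalene skeleton the survivors are its three Kekulé structures.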

1,4-Benzoquinone and the DeduceBondSystemsTool

Once upon a time, there was a DeduceBondSystemTool, and... Er, anyway. Following a patch to the tool (patch ID: 3040138), there is a failing test for 1,4-benzoquinone: the tool generates A, and the test wants B. Now, the problem is not that the tool fails to try B as a possibility, but that it generates A first and the final step neither removes A nor ranks B as better than A. Understanding this requires an understanding of the algorithm, which is (roughly):

1. For each ring, generate a list of possible positions for all numbers of double bonds.
2. Generate a set of molecules by combining these positions.
3. Remove 'bad' solutions and pick the solution with the fewest 'bad' N/S atoms.

Here, 'bad' is defined by chemical rules such as atom types. Now, neither A nor B is a bad solution, and neither contains N or S atoms, so both have a rank of zero, and the first one generated will be returned. So, there is really no part
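The tie can be seen in a few lines. This hedged Python sketch models only the final step; `pick_solution`, `is_bad` and `bad_ns_count` are hypothetical names, not the CDK API. Python's `min()` returns the first of several equally ranked items, so the result depends purely on generation order.

```python
def pick_solution(candidates, is_bad, bad_ns_count):
    """Drop 'bad' candidates, then return the one with the fewest bad N/S atoms.

    min() returns the first of several equally good candidates, so ties
    are broken purely by the order in which candidates were generated.
    """
    acceptable = [c for c in candidates if not is_bad(c)]
    if not acceptable:
        return None
    return min(acceptable, key=bad_ns_count)


def never_bad(candidate):
    return False  # neither A nor B breaks the chemical rules


def no_bad_ns(candidate):
    return 0  # neither structure contains N or S, so both score zero


print(pick_solution(["A", "B"], never_bad, no_bad_ns))  # A
print(pick_solution(["B", "A"], never_bad, no_bad_ns))  # B
```

Nothing in the score distinguishes the two structures, so swapping the input order swaps the answer.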

Line Graphs and Double Bonding Systems

After looking at a CDK tool for fixing bond orders in aromatic systems (DeduceBondSystemTool in the smiles package), I wondered whether there was a more general approach. That is, the problem is to take a molecular graph with no double bonds and generate all possible double-bond systems. One possibility might be to first convert the graph (G) into a form known as a line graph (lg(G)), where every vertex in lg(G) represents an edge in G, and two vertices of lg(G) are adjacent when the corresponding bonds share an atom. If these vertices are labelled to represent the bond order, then an aromatic system has a particular labelled line graph. For example, here is benzene: The dashed lines show the construction of the line graph, and the labels '-' and '=' mean single and double. Now obviously, the two resulting graphs are essentially the same, so it would be nice to remove this redundancy. An example of two different bonding systems comes from phenanthrene: This is great, but how can we generate all non-redundant colorings of the line graphs? Since a line graph is just a graph,
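As a rough illustration, the sketch below builds lg(G), enumerates '-'/'=' labellings in which no two adjacent lg(G) vertices are both '=' (no atom with two double bonds) and every atom touches a '=' bond, then removes redundant labellings by mapping them through the automorphisms of G. This brute-force approach is chosen for illustration and is not necessarily the method intended here; the automorphism search tries every atom permutation, so it only suits very small graphs such as benzene. All function names are hypothetical.

```python
from itertools import combinations, permutations, product


def line_graph(edges):
    """Vertices of lg(G) are the edges of G; two are adjacent if they share an atom."""
    verts = [frozenset(e) for e in edges]
    adj = {v: set() for v in verts}
    for u, v in combinations(verts, 2):
        if u & v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bond_labellings(lg, atoms):
    """Return the '=' vertex sets of all valid '-'/'=' labellings of lg(G)."""
    verts = sorted(lg, key=sorted)
    result = []
    for labels in product("-=", repeat=len(verts)):
        lab = dict(zip(verts, labels))
        # two adjacent '=' vertices in lg(G) would mean an atom with two double bonds
        if any(lab[u] == "=" and lab[v] == "=" for u in verts for v in lg[u]):
            continue
        doubles = [v for v in verts if lab[v] == "="]
        # every atom must take part in some double bond
        if all(any(a in v for v in doubles) for a in atoms):
            result.append(frozenset(doubles))
    return result


def automorphisms(atoms, edges):
    """Brute force: atom permutations that map every bond of G onto a bond of G."""
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(atoms):
        mapping = dict(zip(atoms, perm))
        if all(frozenset(mapping[a] for a in e) in edge_set for e in edge_set):
            yield mapping


def non_redundant(labellings, autos):
    """Keep one labelling from each class of labellings related by an automorphism."""
    seen, unique = set(), []
    for doubles in labellings:
        if doubles in seen:
            continue
        unique.append(doubles)
        for m in autos:
            seen.add(frozenset(frozenset(m[a] for a in b) for b in doubles))
    return unique


if __name__ == "__main__":
    atoms = list(range(6))
    edges = [(i, (i + 1) % 6) for i in range(6)]  # benzene ring
    kekule = bond_labellings(line_graph(edges), atoms)
    autos = list(automorphisms(atoms, edges))
    print(len(kekule), len(non_redundant(kekule, autos)))  # 2 1
```

For benzene the two labellings differ only by a rotation of the ring, so a single non-redundant labelling remains.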