## Posts

Showing posts from February, 2016

### Equitable Partition Refinement with List Invariants

So the bug with C19H14 and C10H16 formulae seems to have been due to the partition refinement not correctly labelling structures with particular arrangements of multiple bonds. The underlying problem is in the equitable partition refinement process. This is a short note about the problem.

Equitable refinement of a partition for a graph is the formation of a vertex partition where each element of each block of the partition has an equal number of neighbours in the other blocks. This is a little difficult to imagine, but it is - roughly - a generalisation of the Morgan number algorithm which attempts to find labels for sets of vertices which are stable with respect to splitting them by the labels of the neighbours.

For example:

This image shows a cub-2-ene like molecule (or a cube graph with two of the edges colored). Clearly the orange and green vertices are in 'different' sets in some sense. Precisely, they are in different blocks of the equitable partition ([0,2,5,7|1,3,4,6])…

### From Seed To Leaf

So the previous post pointed out the problem with a simple extension from a seed : you miss some. In detail, two of the problems are:
Firstly, A shows the - slightly obvious - idea that for some (seed, leaf) pairs you can only get from one to the other by adding edges and not vertices. This problem is easy enough.

As for B, I show here a detailed (if made up) example of the main problem : augmentation of a seed is not necessarily canonical. Or, to put it another way, the canonical deletion can lead to a sibling of the seed, rather than the seed itself.
I think the way round this might be to restrict the candidate atoms (or bonds, even) for canonical deletion to those outside the original seed. In other words, canonically label the augmentation to give the ordering of atoms/bonds then choose the largest labelled one that is not in the seed.

### Seeds and Weeds : Good/Bad Lists in Structure Generation

With the recent revival of the moleculegen (AMG) project, I've started to properly think beyond just simple generation of spaces from a formula. For example, there are 4 trillion or so C30H62 structures - which might take ... a while.

In any case, one useful feature would be to have good/bad lists of substructures (or 'nice' vs 'naughty' as I've been thinking of them. One simple approach I thought might work is to just start with the largest good substructure and generate from there. This can work :

C9H16 from 6-cycle
(By-the-way : all images done with John May's new Depict utility! It's wonderful to use :)
Which is great! Except that it doesn't always work. The problem is that leaves on the tree with a particular substructure might not have that substructure as a common parent in the tree. Consider these C6H8 structures generated from a 4-cycle:
These are not all of them. Now we filter out using subgraph isomorphism (Asad's SMSD code) from the …

### Pictures of Very Wide Trees

Visualising canonical augmentation trees of molecules is not quite as good as I thought:

Or perhaps I should use a different tree layout. Probably a circular one.

Edit. Like this: