Skip to main content

Posts

Showing posts from 2011

McKay's canonical augmentation method explained for simple graphs

The previous post talked about generating one type of combinatorial object (chessboards) using a method similar to that outlined by Brendan McKay in a paper called " Isomorph-free exhaustive generation " (J Algorithms, 26 (1998) 306-324.). This one will focus instead on simple graphs, which requires both parts of the method. The canonical construction (or canonical augmentation ) method has two components. Firstly, only one 'expansion' of a graph is tried at each step from the set of equivalent expansions. Secondly, the expansions are checked to see if they are the inverse of a 'canonical deletion' for that graph. For an example of the first rule, consider this set of expansions of a 4-vertex graph on the left: Each of the 5-vertex graphs on the right are shown with the newly added vertex and edges in red; the arrows are labelled by the added edge set - so {1:4, 3:4} means edges added from 1 to 4 and 3 to 4. The sets of vertices to add to - {{0}, {1}, {1,3}}

Generating Chessboards With K-Independant Vertex Sets

After looking at Julio Peironcely's poster on generating chemical structures, where he describes using Canonical Path Augmentation (I think due to Brendan McKay ) I went looking for more about it. One thing I found was this talk/slideshow by Mathieu Dutour Sikirić - incidentally coauthor of a nice book on Chemical Graphs . Anyway; that's the context. Now : about chessboards? Well one of the examples given for augmentation (or 'orderly') schemes of independent vertex sets . I'm not sure what made me think of chessboards for this, but I think it's a fairly standard simple toy example. Let me show an example for 3x3 boards: So, these are all the three by three boards where no two black squares share an edge. In other words, if the board is considered as a grid-shaped graph, then these are the k-independent vertex sets. So the question is : how to generate these? The simple way, of course, is just to fill in every square and eliminate those boards that have pairs

Ring Plate Visualisation

One diagram I've often wanted to make was filled-in rings in molecules : Mainly for the purposes of highlighting rings without highlighting the atoms involved. This image was made using the CDK renderbasic module, and a small toy AWTRenderingVisitor that fills in paths. I'm not sure if the current one does this... The gist for the main drawing method is here but is really just stuff seen before. The custom generator for rings is probably more interesting - if very simple - and is here . Note that it doesn't look too nice with inner-ring double bonds: and would look nicer if the double bonds were symmetric. EDIT : Coloring by ring equivalence classes didn't do what I expected... Shouldn't all the outer rings be in the same class? Steran is how I expect, though:

External Symmetry Numbers and Graph Automorphism Groups

So, there was a question on BioStar about calculating the 'external symmetry number' of a molecule - something I hadn't heard of, but turns out to be something like the subgroup of rotations and reflections of the automorphism group of a graph. Since I have some code to calculate the automorphism group, I naïvely thought it would be simple... The questioner - Nick Vandewiele - kindly provided some test cases, which ended up as this code . Although many of these tests now pass, they only do so because I commented out the hydrogen adding! :) On the one hand, there are some recent improvements that try to handle vertex and edge 'colors' - in other words, element symbols and bond orders. For example, consider the improbable molecule C1OCO1 : These are the three permutations that leave the carbons and the oxygens in the same positions; when you include the identity, that makes 4. Cyclobutane (without hydrogens!) has a symmetry group of order 8. Similarly, cyclobutadien

Colored Tree Paths to Represent Bond Order Assignements

A couple of different sources pointed me to a paper by Dehof et al (Bioinformatics, 2011; doi: 10.1093/bioinformatics/btq718) about bond order assignment by various methods. One of them is the A*-algorithm , which finds paths. Not in the molecule graph, but in a kind of tree of possible bond combinations. Precisely, the tree has a layer for each decision, and a leaf for each combination of decisions. For example, here is the usual example of a square four-cycle (G): with the tree shown below. If we pick a particular path through the tree - say to leaf 5 - we get the sequence of colors in the box in the upper center of the image. This corresponds to the assignment on the upper right. Clearly, it has too many double bonds to be carbon skeleton, but still. So, it's a nice, simple way to represent the set of solutions. Well, it could be even simpler but representing it as a tree lets the paper authors assign weights to the edges based on the chemical rules outlined in work they refere

Final post on Combinatorial Maps of Cuneane

EDIT : Updated with new image. Well, unless I can make a more horrifying diagram than this: ..but I think that's unlikely. An explanation - if such a thing is possible - is that the red/purple arrows correspond to a kind of inside-out operation. So the triangular map (M3) at the bottom can be converted to the square (M4) by choosing the purple darts (2, 14, 16, 5) as the boundary, and putting everything else on the inside. The cycle (1, 4, 9, 20, 22) is red because it is the boundary of the pentagonal map (M5). So, it seems like ϕ(M4) is α(ϕ(M5)) - so the cycle (0,8,23,21,5) in ϕ(M4) is (1,9,22,20,4) in ϕ(M5). That is, because (α[0] = 1, α[8] = 9, α[23] = 22 ...). However ϕ(M3) = ϕ(M5), which is confusing. Oh well So I have updated the image so that all three 'maps' have the same cycles. Which means they have the same ϕ and therefore the same σ. I guess this means I misunderstood what a CM actually is : you can get 'different' embeddings with the same σ. Here is a s

Cuneane Maps

So, to continue about combinatorial maps, here are some more intricate diagrams. Firstly, an embedding of cuneane with a 4-cycle as the outer face: The permutations below are just for reference. Anyway, with this embedding, the cycles of Ï• are (0,8,23,21,5)(1,2,11,7)(3,4,17,15)(6,12,9)(10,14,18,22,13)(16,20,19) which does indeed have one for each face, including the boundary. If you use a different embedding, naturally you get a different map: Which is quite different, and has Ï• of (0,6,10,3)(1,4,20,22,9)(2,14,16,5)(7,8,13)(11,12,23,19,15)(17,18,21) - again, 6 cycles for the 6 rings. And one to rule them all, and in the darkness bind them, of course. The cycles of darts are shown in this composite image: Some of the triangular faces are missing the dart labels, as they were getting too crowded. Well, the whole thing is too crowded, but still. Highlighted in red in each are the darts corresponding to the outer face of the other embedding. Not sure what it means, though. Finally, some

Squashing Molecules Flat and Combinatorial Maps

So the traditional way to say it is probably 'planar embedding', but I've gone for the alternate terminology of 'squashing flat'. For many molecules, this is a fairly easy process : some decisions have to be made about rotatable bonds, but quite a few drugs are just things sticking off a benzene ring or two. For fully 3-dimensional molecular graphs ( for example ) it is trickier. I don't yet know a simple algorithm for choosing different possible embeddings ... er ... squashings. Perhaps they are all complicated; they seem to involve tree data structures with names like "SPQR Tree" (see "Optimizing over all combinatorial embeddings of a planar graph"). Far easier, though, is the intermediate data structure between a graph and some 2D coordinates - known as a combinatorial map . These mathematical objects store the detail of the embedding by recording the order of 'half bonds' called flags (or darts?) around a vertex. Maybe a flag is t

Further work on PDB hetdict

Previously... After using setFormalCharge instead of setCharge, some of the null atom types disappeared. Specifically, the quaternary (SP3?) nitrogens, that have to have a +1 formal charge. What is left? Mostly FeS clusters, and other metals . Turns out that the height-1 signature is clearer for 'clustering' the atoms into types. This is because a signature of this height records only the immediate neighbours of the atom. One fairly frequent example is "[Co]([N][N][N][N])", but fortunately there is a patch for this based on a bug report. The atom is found in cobalamine and other cobalt-haemes. Still, there are quite a few 'odd' ligands - either due to unusual elements, like Iridium ; or unusual coordinations, like this iron with 6 oxygen neighbours . Hmmm, PDBeChem's image (at the bottom of the page) doesn't show all six bonds. It's clearer in Jmol from PDBSum , where the C3 symmetry of the iron ligands makes it seem (to me) that it is 6-coordi

Null Atomtypes in PDB Het Dictionary now in some sort of RDF thing

This blog post won "Best Title" in the award for "Random Combinations of Words" category. Anyway, the atomtypes that were not found by the CDKAtomMatcher are now down to just 1,400. There were various errors on my part (such as using element symbols like "CL" rather than "Cl') and CIF files are trickier to parse than I thought (an atom name can be "HOH'" or 'P"' - look closely!). Just to make life even harder, and because Egon suggested it, I tried to get the output in some kind of RDF using Jena . In practice, this seems to mean a sort of vague tree of nodes, some of which are data and some are types. Well, its in the same repository as before, with source code. Although looking at the N3 file it still looks like it would be impossible to work out which atom belongs to which hetgroup...

Atom-typing the Hetgroup Dictionary

So I posted this to CDK-devel, but probably this is the better place... I've been trying to make a map between the atom IDs used in the HET dictionary (which is in CIF format) and atom types of some sort. To see what this looks like, here is a tail of the file: ZZZ.O6A:O.sp2 ZZZ.H7C1:H ZZZ.H7C2:H ZZZ.H8:H ZZZ.H2N1:H ZZZ.H2N2:H ZZZ.H3:H ZZZ.H5:H ZZZ.H6:H ZZZ.H6A:H 'Zzzz', you may be thinking, but although many atom ids are quite obvious (like H8 is a hydrogen), some are probably not. One annoying aspect of this process was that the CIF file format is not especially friendly, and particularly, the file has 'loop_'s that don't terminate in octothorpes ('#'), as I thought they would. Probably the parser (an IteratingCIFReader) could be much better written - in fact, it will probably only parse this one CIF! So my initial estimates of 13,000 typing failures is now down to only 3,508. What are the atoms that fail? There are some that are bound to like TBR , w