Some Stuff

Link experiment - never mind me

2023-04-16T22:32:00.006+01:00

Just a trial to see if linking from here to my personal site (https://gilleain-torrance.net/) affects how Google Search Console indexes it.

Nothing else to see here :)

Tailor : Descriptions

2016-12-28T12:14:00.000+00:00

Tailor (https://github.com/gilleain/tailor) is a project that grew out of my attempts to search for ~~catmats~~ niches with Prof. Milner-White. The goal of the project was to allow users to define protein structural patterns (called 'descriptions') along with a set of associated measures. More on measures later, but first what is a description?

Here is a very simple example:

The lines don't have arrowheads, but this is implicitly a tree/DAG rather than a general graph. There is a root ProteinDescription and the leaves are AtomDescriptions - the DistanceCondition references the two atoms. Basically, this just defines a pattern of two amino acids (GLY, ALA) with a distance of less than 3Å between the N and O atoms.
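To make the tree a little more concrete, here is a minimal Python sketch of such a description and a naive matcher. The class names mirror the node names in the picture, but `GroupDescription` is a hypothetical name for the middle layer and none of this is Tailor's actual (Java) API. It also assumes, as a simplification, that the groups must be consecutive residues (one of the open questions below) and that the condition is between the N of GLY and the O of ALA.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

# Hypothetical classes mirroring the node names in the figure;
# NOT Tailor's real (Java) API.

@dataclass
class AtomDescription:
    name: str                     # atom name, e.g. "N" or "O"

@dataclass
class GroupDescription:
    residue: str                  # residue name, e.g. "GLY"
    atoms: list = field(default_factory=list)   # AtomDescriptions that must be present

@dataclass
class DistanceCondition:
    first: tuple                  # (group index, atom name)
    second: tuple                 # (group index, atom name)
    max_distance: float           # exclusive upper bound, in Angstroms

@dataclass
class ProteinDescription:
    groups: list                  # GroupDescriptions in chain order
    conditions: list              # DistanceConditions between their atoms

def find_matches(desc: ProteinDescription, chain: list) -> list:
    """Return the start indices where the groups match consecutive residues.

    `chain` is a list of (residue name, {atom name: (x, y, z)}) pairs.
    Assumption: matched groups must be adjacent along the chain.
    """
    k = len(desc.groups)
    hits = []
    for start in range(len(chain) - k + 1):
        window = chain[start:start + k]
        names_ok = all(
            res_name == g.residue and all(a.name in atoms for a in g.atoms)
            for g, (res_name, atoms) in zip(desc.groups, window)
        )
        if not names_ok:
            continue
        if all(
            math.dist(window[c.first[0]][1][c.first[1]],
                      window[c.second[0]][1][c.second[1]]) < c.max_distance
            for c in desc.conditions
        ):
            hits.append(start)
    return hits

# The GLY-ALA example: N of GLY closer than 3 Angstroms to O of ALA.
gly_ala = ProteinDescription(
    groups=[GroupDescription("GLY", [AtomDescription("N")]),
            GroupDescription("ALA", [AtomDescription("O")])],
    conditions=[DistanceCondition((0, "N"), (1, "O"), 3.0)],
)
chain = [("GLY", {"N": (0.0, 0.0, 0.0)}), ("ALA", {"O": (2.5, 0.0, 0.0)})]
print(find_matches(gly_ala, chain))  # [0]
```

The matcher simply slides a window along the chain; anything smarter (gaps between groups, repeats) is exactly what the questions below are about.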

There are still a lot of details to be worked out here. Can the groups be separated along the chain? If they can, should that require the description to be explicit as to the relationship between sibling nodes? How do we define any number of matches (as with a helix, or strand)?

After some messing about with some ideas from regular expressions, I've abandoned for now the idea of having metacharacters like ".*" or ".+" as it would be hellishly difficult to match. Consider this description:

If you wanted to have the middle two groups repeated any number of times - the equivalent of "A.*G" as a sequence regular expression - how would that work? You would have to test the torsion condition for any number of repeated groups, but that requires the 'capping' residues to be present.

In any case, the code is slowly being revived - with better tests! - and hopefully should be more usable in the New Year...

WTF is a Number Bond?

2016-06-15T21:21:00.002+01:00

Not chemistry, as it happens. I was searching for similar images to one of my line drawings (always fun) and came across these 'number bond things' :

The one on the left - hilariously - is just "1 + 1 = 2". OK, so that's a deliberately jokey example; real ones have larger numbers and one of the three numbers is for the student to fill in. On the right is a more complex example, drawn as a DAG (directed acyclic graph), although at least one of the examples I saw had a node at the bottom with three parents!

In any case, what these things really represent are partitions of numbers, which are usually drawn as Ferrers diagrams (or Young diagrams) and which I'll refer to as "Ferrers-Young diagrams". These have a superior feature, as shown here:

So one FY-diagram can represent two different number bonds. Note that I've made the crazy leap of making number bonds with more than two parts (or 'addends'). Clearly 1+1+3+4 = 9 = 1+2+2+4 as these partitions are a transposed pair.
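Transposing an FY-diagram is easy to compute: part j of the transposed (conjugate) partition counts how many parts of the original are at least j. A minimal Python sketch (the function name is just illustrative):

```python
from __future__ import annotations

def conjugate(partition: list[int]) -> list[int]:
    """Transpose a partition's Ferrers diagram.

    Part j (1-based) of the result is the number of parts >= j in the input,
    i.e. the length of column j of the diagram.
    """
    parts = sorted(partition, reverse=True)
    if not parts:
        return []
    return [sum(1 for p in parts if p >= j) for j in range(1, parts[0] + 1)]

print(conjugate([1, 1, 3, 4]))  # [4, 2, 2, 1], i.e. 1+2+2+4
```

Applying it twice gives back the original partition (in non-increasing order), which is why the two number bonds come as a pair.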

One more thing occurs to me - what happens if you do this on a graph, not just a tree? You can label the leaves with - say - an increasing sequence of numbers, and then move to the parents, summing as you go. To work, this algorithm has to do something like add in the largest number from the previous round to get unique numbers:

Here we are labelling the leaves with the numbers {1, 2, 3} and then their parents with {1+3, 2+3+3} and then the final one with {5+8+8}. Of course this labelling is not canonical - you would have to try all permutations of leaf labels. However it's quite nice to see a connection between something as simple as number bonds and something more complex like vertex labelling!

Submultisets and Graphs

2016-05-24T22:37:00.001+01:00

The previous post mentioned the restricted weak composition (RWC), but didn't expand on it at all! Basically, I found this excellent paper: "Generalized Algorithm for Restricted Weak Composition Generation" by Daniel R. Page. It even gave some Java code in an appendix - good stuff :)

Anyway, an RWC is a composition (a partition where order matters) that is weak - it may contain zeros - and whose parts are restricted to a given set. So [1, 0, 1, 1] is a weak composition of 3 into 4 parts, and let's say we have restricted the parts to {0, 1, 2}. Here is an overview of the scheme:

where we take a degree sequence, convert to a multiset and use a RWC to get a particular sub-multiset. This allows us to take some count of some subset of elements from the multiset.
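A small Python sketch of both pieces. The generator is a plain recursive one, not the successor-based algorithm from Page's paper, and `sub_multisets` is a hypothetical helper showing how one composition picks how many copies of each distinct element to take (for instance, which remaining degrees a vertex of degree 3 connects to).

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator

def restricted_weak_compositions(n: int, k: int, allowed: Iterable[int]) -> Iterator[tuple[int, ...]]:
    """Yield every k-tuple of values from `allowed` that sums to n.

    Order matters (composition), zeros are fine if 0 is allowed (weak),
    and every part must come from `allowed` (restricted).
    """
    values = sorted(set(allowed))

    def rec(total: int, slots: int) -> Iterator[tuple[int, ...]]:
        if slots == 0:
            if total == 0:
                yield ()
            return
        for part in values:
            if part > total:
                break
            for rest in rec(total - part, slots - 1):
                yield (part,) + rest

    yield from rec(n, k)

def sub_multisets(multiset: Iterable[int], size: int) -> list[list[int]]:
    """All sub-multisets of the given size, via one RWC per selection."""
    counts = Counter(multiset)
    if not counts:
        return [[]] if size == 0 else []
    distinct = sorted(counts, reverse=True)
    result = []
    for comp in restricted_weak_compositions(size, len(distinct), range(max(counts.values()) + 1)):
        # part i = how many copies of distinct[i] to take; it cannot exceed the supply
        if all(c <= counts[v] for c, v in zip(comp, distinct)):
            result.append([v for c, v in zip(comp, distinct) for _ in range(c)])
    return result

comps = list(restricted_weak_compositions(3, 4, [0, 1, 2]))
print(len(comps), (1, 0, 1, 1) in comps)   # 16 True
print(sub_multisets([2, 2, 1, 1], 3))      # [[2, 1, 1], [2, 2, 1]]
```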

Doing this for all sub-multisets at each round should then allow us to list graphs - although not without redundant examples:

This shows all starting points for 3 -> [3, 2, 2, 1, 1] and the eventual graphs that are formed. Note that a) and e) are isomorphic. Still, this would surely remove a lot of redundant solutions.

Restricted Weak Compositions, Labelled Partitions, and Trees

2016-05-18T22:56:00.000+01:00

So in the last post about listing trees I outlined a slightly cumbersome method to list trees from degree sequences. Thinking about it a bit more, it would probably be far easier to just list all trees on some number of vertices and filter them by degree sequence. I talked a little about the WROM algorithm in this old post; it is a constant (amortized) time generator of 'free' (unlabelled) trees.

Anyway, that's boring, so I was trying out the more complicated approach. It looks like generating a single tree from a degree sequence is as simple as the Havel-Hakimi method: connect the vertex with the largest degree (d_n) to the d_n vertices with the next largest degrees. Also maintain a list of vertices that have already been connected to, and at each later step connect only to vertices not already connected to. So, for [3, 3, 2, 1, 1, 1, 1] we get:

You might notice that trees a) and c) are isomorphic. Below the trees labelled by degree are the same trees labelled by DFS discovery order, and below that the 'layout' of the tree as described in the WROM paper.
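One way to read that construction as code is the minimal Python sketch below: it processes vertices in order of non-increasing degree and only ever attaches to vertices not yet in the tree. That rule matters, since plain Havel-Hakimi on [2, 2, 2, 1, 1] can produce a triangle plus a separate edge. This builds one tree per sequence; it does not enumerate alternatives like a) to c).

```python
def tree_from_degrees(degrees):
    """Build one tree realising a degree sequence (a sketch of the rule above).

    Vertices are renumbered in order of non-increasing degree. Vertex 0 is joined
    to the next d0 vertices; every later vertex spends its remaining degree (one
    edge already goes to its parent) on the next vertices not yet in the tree,
    so no cycle can ever be closed.
    """
    n = len(degrees)
    if n < 2 or min(degrees) < 1 or sum(degrees) != 2 * (n - 1):
        raise ValueError("not a degree sequence of a tree on 2 or more vertices")
    d = sorted(degrees, reverse=True)
    edges = []
    next_free = 1                     # lowest-numbered vertex not yet connected
    for v in range(n):
        remaining = d[v] if v == 0 else d[v] - 1
        for _ in range(remaining):
            edges.append((v, next_free))
            next_free += 1
    return edges

print(tree_from_degrees([3, 3, 2, 1, 1, 1, 1]))
# [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6)]
```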

As it happens, I tried to use restricted weak compositions and what I call 'labelled partitions' to do this efficiently, but it doesn't work so well yet. It seems like this could all be done far more easily using just successor functions...

Listing Degree Restricted Trees

2016-05-10T13:12:00.001+01:00

Although Stack Overflow is generally just an endless source of questions along the lines of "HALP plz give CODES!? ... NOT homeWORK!! - don't close :(", occasionally you get more interesting ones. For example, this one asks about degree-restricted trees. There's also some stuff about vertex labelling, but I think I've slightly missed something there.

In any case, let's look at the simpler problem: listing non-isomorphic trees with max degree 3. It's a nice small example of a general approach that I've been thinking about. The idea is to:

1) Given N vertices, partition 2(N - 1) into N parts, each between 1 and 3, giving a set of degree sequences D = {d_0, d_1, ...}.
2) For each d_i in D, connect the degrees in all possible ways that make trees.
3) Filter out duplicates within each set generated by some d_i.
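Step 1 is easy to sketch in Python: since every vertex of a tree on two or more vertices has degree at least 1, the candidates are the partitions of 2(N - 1) into exactly N parts between 1 and the maximum degree (a parameter here, defaulting to 3).

```python
def tree_degree_sequences(n, max_degree=3):
    """Partitions of 2(n - 1) into exactly n parts, each between 1 and max_degree,
    listed as non-increasing degree sequences."""
    def parts(total, count, largest):
        if count == 0:
            if total == 0:
                yield []
            return
        # leave at least 1 for each of the other count - 1 parts
        for p in range(min(largest, total - (count - 1)), 0, -1):
            for rest in parts(total - p, count - 1, p):
                yield [p] + rest
    return list(parts(2 * (n - 1), n, max_degree))

print(tree_degree_sequences(7))
# [[3, 3, 2, 1, 1, 1, 1], [3, 2, 2, 2, 1, 1, 1], [2, 2, 2, 2, 2, 1, 1]]
```

Each of these sequences would then be handed to step 2.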

Hmm. Sure would be nice to have maths formatting on Blogger...

Anyway, look at this example for partitioning 12 into 7 parts:

At the top are the partitions, in the middle the trees (colored by degree) and at the bottom the desired output of "lattice-trees" (a kind of polyomino, apparently). I should really have a consistent degree color scheme...

Anyway, it's probably not the neatest approach for this particular problem, but I think it would work. Since the number of trees generated from each degree sequence is only a fraction of the space, it seems reasonable to do all-v-all checking for isomorphism in this case.

Equitable Partition Refinement with List Invariants

2016-02-29T13:17:00.001+00:00

So the bug with C19H14 and C10H16 formulae seems to have been due to the partition refinement not correctly labelling structures with particular arrangements of multiple bonds. The underlying problem is in the equitable partition refinement process. This is a short note about the problem.

Equitable refinement of a partition for a graph is the formation of a vertex partition in which, for any two blocks, every vertex in the first block has the same number of neighbours in the second. This is a little difficult to imagine, but it is - roughly - a generalisation of the Morgan number algorithm, which attempts to find labels for sets of vertices that are stable with respect to splitting them by the labels of the neighbours.

For example:

This image shows a cub-2-ene like molecule (or a cube graph with two of the edges colored). Clearly the orange and green vertices are in 'different' sets in some sense. Precisely, they are in different blocks of the equitable partition ([0,2,5,7|1,3,4,6]) as they have different sets of bonds to neighbours.

In fact, all I have changed is the invariants for the refinement procedure: from numbers (the neighbour count) to lists of numbers (the ordered list of counts of neighbours connected by an edge of each color). This seems to work for the purpose I need it for (structure generation), but it may not work for some more subtle use case.
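A minimal Python sketch of refinement with list-valued invariants. It is plain iterated colour refinement from the unit partition (no individualisation), using each vertex's multiset of (neighbour block, edge colour) pairs as its invariant; that is close to, but not necessarily identical with, the invariant described above. The cube numbering (vertices as 3-bit strings, double bonds on 0-1 and 6-7) is an assumption and does not match the labels in the figure.

```python
from __future__ import annotations

from collections import Counter

def equitable_partition(n: int, edges: dict[tuple[int, int], int]) -> list[list[int]]:
    """Refine the unit partition of an edge-coloured graph until it is equitable.

    edges maps (u, v) -> colour (e.g. 1 = single bond, 2 = double bond).
    """
    adj = [[] for _ in range(n)]
    for (u, v), colour in edges.items():
        adj[u].append((v, colour))
        adj[v].append((u, colour))
    block = [0] * n
    while True:
        # invariant: own block + counts of neighbours per (block, edge colour)
        invariants = [
            (block[v], tuple(sorted(Counter((block[w], c) for w, c in adj[v]).items())))
            for v in range(n)
        ]
        index = {inv: i for i, inv in enumerate(sorted(set(invariants)))}
        new_block = [index[inv] for inv in invariants]
        # blocks can only split, so an unchanged count means a stable partition
        if len(set(new_block)) == len(set(block)):
            break
        block = new_block
    cells = {}
    for v, b in enumerate(new_block):
        cells.setdefault(b, []).append(v)
    return sorted(cells.values())

def cube(double_bonds=()):
    """Cube graph on 3-bit vertex labels; listed edges get colour 2."""
    edges = {}
    for v in range(8):
        for bit in (1, 2, 4):
            w = v ^ bit
            if v < w:
                edges[(v, w)] = 2 if (v, w) in double_bonds else 1
    return edges

print(equitable_partition(8, cube()))                  # [[0, 1, 2, 3, 4, 5, 6, 7]]
print(equitable_partition(8, cube({(0, 1), (6, 7)})))  # [[0, 1, 6, 7], [2, 3, 4, 5]]
```

With plain neighbour counts both examples would give a single block; it is the edge colour in the invariant that separates the two sets.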

From Seed To Leaf

2016-02-19T22:28:00.001+00:00

So the previous post pointed out the problem with a simple extension from a seed: you miss some structures. In detail, two of the problems are:

Difficulties with growth from a seed

Firstly, A shows the - slightly obvious - idea that for some (seed, leaf) pairs you can only get from one to the other by adding edges and not vertices. This problem is easy enough to handle.

As for B, I show here a detailed (if made up) example of the main problem : augmentation of a seed is not necessarily canonical. Or, to put it another way, the canonical deletion can lead to a sibling of the seed, rather than the seed itself.

I think the way round this might be to restrict the candidate atoms (or bonds, even) for canonical deletion to those outside the original seed. In other words, canonically label the augmentation to give the ordering of atoms/bonds then choose the largest labelled one that is not in the seed.
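The proposed rule is simple to state in code. A minimal sketch, assuming some canonical labelling routine (not shown, hypothetical here) has already ranked every atom of the augmented structure; a real check would compare orbits under the automorphism group rather than raw atom indices.

```python
def accept_augmentation(canonical_rank: dict, seed: set, added: int) -> bool:
    """Keep an augmentation only if the atom just added is the highest-ranked
    atom outside the seed, so canonical deletion never removes a seed atom.

    canonical_rank: atom index -> rank from a canonical labelling (assumed given).
    """
    outside = [atom for atom in canonical_rank if atom not in seed]
    if not outside:
        return False
    return max(outside, key=canonical_rank.__getitem__) == added

ranks = {0: 3, 1: 1, 2: 0, 3: 2}   # atom 0 is top-ranked overall but is in the seed
print(accept_augmentation(ranks, seed={0, 1}, added=3))  # True
print(accept_augmentation(ranks, seed={0, 1}, added=2))  # False
```

Without the seed restriction, atom 0 would be the one chosen for deletion, which is exactly the situation in B.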

Seeds and Weeds : Good/Bad Lists in Structure Generation

2016-02-18T20:20:00.002+00:00

With the recent revival of the moleculegen (AMG) project, I've started to think properly beyond simple generation of spaces from a formula. For example, there are 4 billion or so C30H62 structures - which might take ... a while.

In any case, one useful feature would be to have good/bad lists of substructures (or 'nice' vs 'naughty' as I've been thinking of them). One simple approach I thought might work is to just start with the largest good substructure and generate from there. This can work:

C9H16 from 6-cycle

(By the way: all images were made with John May's new Depict utility! It's wonderful to use :)

Which is great! Except that it doesn't always work. The problem is that a leaf of the generation tree can contain a particular substructure without having that substructure as an ancestor in the tree. Consider these C6H10 structures generated from a 4-cycle:

C6H10 from 4-cycle

These are not all of them. Now we filter the whole space using subgraph isomorphism (Asad's SMSD code):

Whole C6H10 space filtered by 4-cycle

So we get half of them by growing from a 'seed'. Now to think about filtering out the weeds...

Pictures of Very Wide Trees

2016-02-09T00:30:00.001+00:00

Visualising canonical augmentation trees of molecules is not quite as good as I thought:

Or perhaps I should use a different tree layout. Probably a circular one.

Edit. Like this:

Biconnectivity and Degree Sequence

2015-06-27T17:49:00.001+01:00

Very occasionally there is a question on math stack exchange that I can actually give some kind of useful answer to. This question caught my eye; the questioner asks whether graphs with the same degree sequence have the same biconnectivity property. Or to put it another way, are there any pairs of graphs with the same degree sequence but one is biconnected and the other is not?

I thought of an example of such a pair, and another user supplied a much simpler one, but here are both pairs:

The vertices are coloured by degree (4 = orange, 3 = blue, 2 = green); the red arrows show a transformation of the top graph into its non-biconnected partner. Obviously there may be more than one such partner; I've no idea what fraction of a set of isodegree graphs might be expected to be biconnected.

For the record, here is the transformation of the bottom pair:

The red lines bisecting edges indicate a kind of 'bond breaking' although it's not meant to represent an actual chemical reaction!
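Checking pairs like these mechanically only needs a biconnectivity test. A minimal Python sketch using Tarjan's low-link DFS to find cut vertices, assuming simple undirected graphs. The example pair is a smaller one than in the figures: two triangles joined by a bridge versus a hexagon with one chord, both with degree sequence [3, 3, 2, 2, 2, 2].

```python
from __future__ import annotations

def cut_vertices(n: int, edges: list[tuple[int, int]]) -> tuple[bool, set[int]]:
    """Return (connected, cut vertices) using Tarjan's low-link DFS."""
    if n == 0:
        return True, set()
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n      # discovery time, -1 = unvisited
    low = [0] * n        # earliest discovery time reachable from the subtree
    cuts = set()
    timer = 0

    def dfs(u: int, parent: int) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for w in adj[u]:
            if disc[w] == -1:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # a non-root u separates w's subtree if nothing there reaches above u
                if parent != -1 and low[w] >= disc[u]:
                    cuts.add(u)
            elif w != parent:
                low[u] = min(low[u], disc[w])
        # the DFS root is a cut vertex iff it has more than one child
        if parent == -1 and children > 1:
            cuts.add(u)

    dfs(0, -1)
    return all(d != -1 for d in disc), cuts

def is_biconnected(n: int, edges: list[tuple[int, int]]) -> bool:
    connected, cuts = cut_vertices(n, edges)
    return n >= 3 and connected and not cuts

def degree_sequence(n: int, edges: list[tuple[int, int]]) -> list[int]:
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg, reverse=True)

bridged = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
chorded = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
print(degree_sequence(6, bridged) == degree_sequence(6, chorded))  # True
print(is_biconnected(6, bridged), is_biconnected(6, chorded))      # False True
```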

Multigraph Misery

2015-06-24T12:52:00.001+01:00

So another lesson reluctantly learned...

It seems the naïve approach to augmenting molecules by sets of bonds (strictly, colored edges rather than multiple edges) did not work. For example, this pair of C9H16 graphs:

Thicker lines represent double bonds, and the thinner are singles. The parents of these are non-isomorphic, which is easy to see if you just remove the 6:8 edge from both. However, they cannot be distinguished as augmentations by my current method, as the canonical labelling of each one gives a graph where the last bond added is the 'natural' canonical choice.

The alternative might be to use the canonical labelling method for multigraphs suggested by the nauty manual, which involves transforming the multigraph into a layered simple graph. This is illustrated in this example:

The transformation converts single edges into an edge in the first layer, and multiple edges into an edge in the second layer (and so on). Highlighted in purple is an example augmentation of one single edge and one multiple edge. This augmentation is transformed in the process of making the simple graph, then ordered. Finally, the augmentation in the canonically labelled, layer transformed graph has to be checked to see if it is the chosen augmentation.
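A minimal Python sketch of just the layering step, assuming bond orders are used as edge colours. It appears to match the binary layering scheme in the nauty User's Guide for edge-coloured graphs: an edge of colour c appears in layer i when bit i of c is set (so singles go to the first layer, doubles to the second, triples to both), and the copies of each vertex are joined through the layers. The canonical labelling itself (for example by nauty, with the layers as the initial vertex partition) is not shown.

```python
from __future__ import annotations

def layered_graph(n: int, bonds: dict[tuple[int, int], int]):
    """Turn a graph with bond orders into a simple graph on stacked layers.

    The copy of vertex v in layer i becomes vertex i * n + v. Returns
    (edge set, layer partition); the partition keeps layers apart when labelling.
    """
    layers = max(bonds.values(), default=1).bit_length()
    edges = set()
    # vertical edges: join the copies of each vertex through consecutive layers
    for v in range(n):
        for i in range(layers - 1):
            edges.add((i * n + v, (i + 1) * n + v))
    # horizontal edges: bond order c contributes to layer i when bit i of c is set
    for (u, v), order in bonds.items():
        for i in range(layers):
            if (order >> i) & 1:
                a, b = sorted((i * n + u, i * n + v))
                edges.add((a, b))
    partition = [[i * n + v for v in range(n)] for i in range(layers)]
    return edges, partition

# Triangle with one double bond (0=1) and two single bonds.
edges, cells = layered_graph(3, {(0, 1): 2, (1, 2): 1, (0, 2): 1})
print(sorted(edges))  # [(0, 2), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4)]
print(cells)          # [[0, 1, 2], [3, 4, 5]]
```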

Phew! All that remains is implementing it properly...

Millions of Graphs : Slow Yet Correct Generation

2015-05-13T22:35:00.000+01:00

My newest version of canonical path augmentation code for generating graphs has reached a new high point - generating the 11,716,571 connected graphs on ten vertices. Of course, it also gets the number on nine vertices (261,080) and on eight (11,117) correct as well ... which is great, but I'm cautious about declaring it 'correct', especially given that the last version did not get the sevens and eights right. See, for example, these past failures:

So how does it get the right answer? Well, it now properly uses the method mentioned in this post to only pick canonical deletions that are not cut-vertices. That turns out only to be necessary for graphs on 8 vertices, but you still have to check this for all augmentations, which seems expensive. However, there was a more fundamental problem; consider the example below (basically nicked from Derrick Stolee's blog post):

Obviously A and B are isomorphic, yet how do we properly distinguish them? Well, the key is the set of vertices added to - on the image, these are the labels on the edges between graphs : {0}, {1, 3}, etc. When a new graph is created, a vertex is chosen - using canonical labelling, in my case - and the vertices attached to it must be the ones we used to make that augmented graph. I was checking the set of augmented vertices in the automorphism group of the parent, not the child.

So, the canonical checking is now better. I seem to have written a thousand of these methods, but this one (I think!) finally does it right. What I was getting wrong was checking the orbit of the canonical deletion vertex, and not the orbit of the set of vertices it was being connected to. Great! Now what? How long does it take? See this, where the purple line is the new code, and the others are older attempts:

Clearly the problem now is that of verifying the results - it's quite slow to generate these large datasets, and storing them (uncompressed) takes a lot of space. The nines took minutes and megabytes of space, while the tens took hours and over a gigabyte. At this rate, the 11s would take days and tens of gigabytes. In any case - where do you stop?
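For small sizes, one independent check is brute force. A minimal Python sketch (not canonical path augmentation): enumerate every labelled graph on n vertices, keep the connected ones, and reduce each to a canonical form by minimising its sorted edge list over all vertex permutations. It is hopeless beyond about 6 or 7 vertices, but it reproduces the known counts of connected graphs 1, 1, 2, 6, 21 for n = 1 to 5.

```python
from itertools import combinations, permutations

def is_connected(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def count_connected_graphs(n):
    """Count connected graphs on n unlabelled vertices by exhaustive search."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    canonical_forms = set()
    for mask in range(1 << len(pairs)):
        edges = [pair for i, pair in enumerate(pairs) if (mask >> i) & 1]
        if not is_connected(n, edges):
            continue
        # the smallest relabelled edge list over all permutations is a canonical form
        canonical_forms.add(min(
            tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
            for p in perms))
    return len(canonical_forms)

print([count_connected_graphs(n) for n in range(1, 6)])  # [1, 1, 2, 6, 21]
```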

Generating Dungeons With BSP Trees or Sliceable Rectangles

2014-09-29T20:32:00.001+01:00

So, I admit that the original reason for looking at sliceable rectangles was this gaming stackoverflow question about generating dungeon maps. The approach described there uses something called a binary space partitioning tree (BSP tree), which is more usually used in the context of 3D - notably in the rendering engine of the game Doom. Here is a BSP tree, as an example:

In the image, we have a sliced rectangle on the left, with the final rectangles labelled with letters (A-E) and the slices with numbers (1-4). The corresponding tree is on the right, with the slices as internal nodes labelled with 'h' for horizontal and 'v' for vertical. Naturally, only the leaves correspond to rectangles, and each internal node has two children - it's a binary tree.

So what is the connection between such trees and the sliceable dual graphs? Well, the rectangles are related in exactly the expected way:

Here, the same BSP tree is on the left (without some labels), and the sliceable dual is on the right (without corner assignments). The red and blue edges are just the horizontal and vertical associations between rectangles.
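A minimal Python sketch of both halves: a BSP node that slices its rectangle, and a brute-force pass over the leaf rectangles that recovers the horizontal/vertical adjacencies of the dual. Integer coordinates and the 'v'/'h' tags are assumptions for illustration; which of red or blue corresponds to which tag is not taken from the figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    x: int
    y: int
    w: int
    h: int
    cut: Optional[str] = None        # 'v' (vertical slice) or 'h' (horizontal slice)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def split(node: Node, cut: str, at: int) -> tuple[Node, Node]:
    """Slice the node's rectangle at offset `at`; the halves become its children."""
    if cut == "v":   # vertical line: left part and right part
        node.left = Node(node.x, node.y, at, node.h)
        node.right = Node(node.x + at, node.y, node.w - at, node.h)
    else:            # horizontal line: lower part and upper part
        node.left = Node(node.x, node.y, node.w, at)
        node.right = Node(node.x, node.y + at, node.w, node.h - at)
    node.cut = cut
    return node.left, node.right

def leaves(node: Node) -> list[Node]:
    """Only the leaves of the BSP tree are actual rectangles (rooms)."""
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)

def dual_edges(rects: list[Node]) -> dict[tuple[int, int], str]:
    """'v' if two rectangles share part of a vertical side, 'h' for a horizontal side."""
    result = {}
    for i, a in enumerate(rects):
        for j in range(i + 1, len(rects)):
            b = rects[j]
            overlap_y = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
            overlap_x = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
            if overlap_y > 0 and (a.x + a.w == b.x or b.x + b.w == a.x):
                result[(i, j)] = "v"
            elif overlap_x > 0 and (a.y + a.h == b.y or b.y + b.h == a.y):
                result[(i, j)] = "h"
    return result

root = Node(0, 0, 4, 4)
_, right = split(root, "v", 2)       # left half | right half
split(right, "h", 2)                 # right half -> lower and upper rooms
print(dual_edges(leaves(root)))      # {(0, 1): 'v', (0, 2): 'v', (1, 2): 'h'}
```

Rectangles that only touch at a corner have zero overlap and are correctly left unconnected.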

Colorful Expanding Triangulations and Sliceable Rectangular Graphs

2014-09-10T21:14:00.001+01:00

There is a whole area of study on visualisations called cartograms - most appealing are the ones that make countries look like inflated or deflated balloons. The rectangular versions of these are less pretty, but more interesting to me from a graph theory perspective.

I came across this subject via an impressive master's thesis by Vincent Kusters, 'Characterizing Graphs with a Sliceable Rectangular Dual', which is a title that will take some explaining. Firstly, what is a 'rectangular dual' when it's at home? Well, check this out:

Clearly the thing on the left is a graph, and on the right is its rectangular dual - in fact, this is the smallest 'sliceable' dual. By sliceable, I mean that the white rectangles can be made by recursively slicing up a rectangle. For example, the first slice might be [{0, 3, 4, 5, 6}, {1, 2}], splitting off rectangles 1 and 2 on the right from all the rest on the left. The next could be [{0}, {3, 4, 5, 6}], and so on.

The colors of the graph indicate a top/bottom cut in red, and a left/right cut in blue. So rectangles 0 and 1 share a left-right (blue) boundary, while 0 and 4 share a top-bottom (red) boundary. The square nodes [T, R, B, L] are the 'corners', and serve to anchor the dual. There's a lot of detail that I'm skipping here, but this is the broad picture.

Interestingly - for me - Kusters' work makes use of a program called Plantri, made by none other than Gunnar Brinkmann and Brendan McKay. It generates planar triangulations - which the graphs of rectangular duals are examples of - and then colors them to make proper duals. The way Plantri works is fairly familiar: canonical path augmentation (CPA), but with a restricted set of operations to add vertices and edges:

Starting from K4 - the complete graph on 4 vertices - these 'expansions' are applied to graphs while rejecting duplicates using CPA. Now the thing that occurs to me is the possibility of expanding while maintaining the colorings of a rectangular dual. For example:

These are just examples of E5 from the picture before, but starting from particular colorings, and expanding only to particular colorings. As can be seen from the rectangular slices to the side of each graph, these expansions are 'compatible' in some sense with changes in the dual. Whether this is a meaningful operation or not, I'm not sure. There are a number of possible such expansions, but not a huge number. Here are a couple more:

Note that B and C are the same, but expand to different possible colorings. Note also that the outer cycle colors are preserved, along with some of the internal edges. That is no coincidence, since the expansions were chosen specifically to preserve as many of the edge colors as possible.

Interesting, but not yet conclusive in any way.

Misunderstanding Embeddings, and Whitney Flips

2014-09-04T21:17:00.000+01:00

So I should correct something that I posted a while ago about embedding 3-connected graphs. The most obvious examples of this class of graph are polyhedra - tetrahedra, cubes, etc - and maybe it's obvious that there is only one way (up to mirror image) to embed these in the plane. So in that sense, 'enumerating' the embeddings for these graphs is quite easy ... there's just one to count.

Of course, this embedding can be drawn with any of its faces as the outer face; this is what gave rise to the different looking drawings. I guess the way to think about it is that the embedding is on the surface of a sphere - where there is no 'privileged' face to call the outer face - and that the drawing on the plane just picks one of the faces to 'squash flat' as I put it back in (wow) 2011.

Anyway - on to Whitney flips! There is a class of graphs that can be embedded in the plane in different ways (that is, the combinatorial map is different for the same outer face). These are subject to a Whitney flip, named after the mathematician who laid the foundation for matroids, among other things. As an example, see the image above.

The only graphs that have this flip are ones with a 2-vertex cutset - the top and bottom vertices in the image. Of course, finding these is a whole other problem, which I may or may not get around to describing...
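Finding them by brute force is at least easy to sketch: delete each pair of vertices and test whether what remains is still connected. A minimal Python version, intended for small 2-connected graphs (in a graph with a cut vertex, every pair containing it would also be reported); for real use there are linear-time methods based on Hopcroft and Tarjan's triconnected components.

```python
from itertools import combinations

def separation_pairs(n, edges):
    """Brute force: all vertex pairs (a, b) whose removal disconnects the rest."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    pairs = []
    for a, b in combinations(range(n), 2):
        rest = [v for v in range(n) if v != a and v != b]
        if not rest:
            continue
        # depth-first search over the graph with a and b deleted
        seen, stack = {rest[0]}, [rest[0]]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen and w != a and w != b:
                    seen.add(w)
                    stack.append(w)
        if len(seen) < len(rest):
            pairs.append((a, b))
    return pairs

c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = list(combinations(range(4), 2))
print(separation_pairs(4, c4))  # [(0, 2), (1, 3)]
print(separation_pairs(4, k4))  # []
```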

InChI and InChIKey Metadata in Cambridge DSpace Repository (WWMM)

2014-01-13T13:50:00.002+00:00

At the end of last year, I updated the metadata on some 175,000 items in the Cambridge DSpace repository. These were molecules that made up a copy of the 'WWMM' (the World Wide Molecular Matrix), and they had old 'IChI' identifiers rather than the newer InChI and InChIKey identifiers.

So now - after this update - you can use a search engine to ~~google~~ … er, search for compounds by their InChIKey. For example:

YMSFBKYTOUKHOI-UHFFFAOYSA-N

gives just two results, one from PubChem and the other from the Cambridge Repository. Hilariously, the image from PubChem is this:

when the formula is C32H60N4O4. I assume that the connectors in this cycle are just alkane chains, but the layout fails for this kind of 'cyclic lipid peptide' (or whatever it is called!).

Festive Chemical Structure Generation : Necklaces and Trees!

2013-12-19T15:53:00.003+00:00

So, a student asked me about a homework question that is a sub-problem of the structure generation problem. Basically, it was to count the number of chemical structures with exactly one cycle, given the elemental formula. Of course, the best solution here is probably to use the Pólya Enumeration Theorem, since all that was asked for was a count (enumeration) of the structures.

Naturally, I have a different way to do this - especially since I don't really understand the mathematics of PET enough to implement it. So:

The image shows a rough overview of how I might list all of the structures with a single cycle. It takes a number of necklaces (one shown), and a number of trees, and glues the one to the other in all possible ways. The word 'necklace' here is specifically the combinatorial object; so the cyclic sequence CNCNO is the same as CNOCN since you can rotate one to get the other.
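Necklace equivalence is easy to test by comparing the lexicographically smallest rotation of each sequence. A brute-force Python sketch, fine for rings of a handful of atoms. Note that a ring in a molecule can also be read in either direction, so in practice reflections should probably be identified too (bracelets rather than necklaces); this sketch only handles rotations, as in the definition above.

```python
from itertools import permutations

def canonical_necklace(beads: str) -> str:
    """Smallest rotation: two strings are the same necklace iff these agree."""
    if not beads:
        return beads
    return min(beads[i:] + beads[:i] for i in range(len(beads)))

def necklaces(atoms: str) -> list:
    """All distinct necklaces that use exactly the given multiset of atoms."""
    return sorted({canonical_necklace("".join(p)) for p in permutations(atoms)})

print(canonical_necklace("CNCNO") == canonical_necklace("CNOCN"))  # True
print(len(necklaces("CCNNO")))                                     # 6
```

The count of 6 is what you would expect: 30 distinct linear arrangements of CCNNO, each necklace of prime length 5 covering exactly 5 of them.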

One tricky decision here is whether to add multiple bonds to the necklace before or after adding the trees. It seems like this would make a difference to how fast the algorithm is - if you add the bonds afterwards, you might reject many of the possible attachments. Hard to say.

The other aspect to consider is the connection of the parts. If we consider necklaces (or 'cycles') and trees as types of block, then the problem is connecting together blocks into a tree. This is essentially the reverse of the approach detailed in this post - using a block decomposition tree to guide the assembly of the blocks:

Although, now I come to look at it, it seems like the attachment points on the necklace would drive the underlying block-tree. So perhaps this is only relevant for graphs that contain multiple cycles - which starts to become a much more difficult problem!

Happy Holidays, anyway...

Comparing Kiraly (Exhaustive) Graph Generation with nAUTy Output

2013-11-29T17:48:00.001+00:00

So recently I was asked about Király's method for generating all graphs from a degree sequence. While refactoring some of the code that I wrote to do this, I also made some tests. Specifically, coverage tests to check that the generation was actually exhaustive. I know it's redundant, but I have good tools to remove duplicate graphs - or I thought that I did…

Here is a rough flowchart of the procedure, starting with a number ('n') that is passed to Dreadnaut (the interface to nAUTy) to generate graphs:

These graphs are grouped by degree sequence, and these degree sequences are fed into the KirályHHGenerator to reconstruct the set of graphs. I think that compare arrow is wrong, but never mind. The point is that the sets should be the same size.

They are for n=5,6,7 but not for 8. Oddly enough, however, there are more in the Király set than in the nAUTy set. The obvious conclusion would be that my duplicate detection is failing - in other words, I am failing to spot an isomorphism between two graphs. For example this pair:

However, two of my methods give different answers for this pair. The signatures method says they are different, while the partition refinement method says they are the same. Odd - and more investigation is needed before I am certain that geng has missed some graphs here...

Centrality as a Vertex Invariant (or 'Atom Descriptor')

2013-10-11T08:51:00.001+01:00

EDIT: After some more tests, I now realise that this is not really as great a vertex label/descriptor as I thought it was. For example, see these four graphs on 7 vertices, where it fails to distinguish vertices properly:

The first one should have a central vertex in a different class than the other blue vertices. The green class in the second graph should be split, and same for the third graph. And so on.

So, in the last post I talked about the ideas of Randić et al for calculating the 'centrality' of vertices in a graph. Interestingly, the numbers calculated for each vertex act as a kind of equivalence class label or vertex invariant. This is similar in many ways to Morgan numbers (sorry, Egon's post doesn't actually explain them, but they are the sum of degrees across extended neighbourhoods).

For example, here is one of the examples from the previous post:

The centrality matrix is in the middle, and to its right is the 'label' made by sorting each row's elements in descending order. Finally, these labels are converted to more easily read alphabetic ones - classes in some sense ('a' = {0, 5} and 'b' = {1, 2, 3, 4}). These classes make sense, given that the middle vertices are in the same class, with the rest in another class.

Compare this to the graph with the same ORS of [6, 6, 5, 5, 5, 5]:

The graph here is nearly the same - with only the edge 1:3 missing - yet the labels don't distinguish between vertices {1, 3} and vertices {2, 4}. One possibility mentioned in the paper is to use combinations of descriptors (fairly common practice in cheminformatics, I suspect). The simplest one that occurs to me is just to add the degree of a vertex to the start of the label. That makes the label for {1, 3} into "1320000", distinguishing them from the one for {2, 4} which is "2320000".

Anyway, here is a picture of a number of pairs of graphs with the same ORS, colored by the label just described:

Note how some (but not all!) of these have the 'same' equivalence classes in different arrangements. Be aware that the colors may not be totally meaningful when compared between graphs. Code for this is here.

Common Vertex Matrices of Graphs

2013-10-05T11:59:00.001+01:00

There is an interesting set of papers out this year by Milan Randic et al (sorry about the accents - blogger seems to have a problem with accented 'c'...). I've looked at his work before here.

[1] Common vertex matrix: A novel characterization of molecular graphs by counting
[2] On the centrality of vertices of molecular graphs

and one still in publication to do with fullerenes. The central idea here (ho ho) is a graph descriptor a bit like path lengths called 'centrality'. Briefly, it is the count of neighbourhood intersections between pairs of vertices. Roughly this is illustrated here:

For the selected pair of vertices, the common vertices are those at the same distance from each - one at a distance of two and one at a distance of three. The matrix element for this pair will be the sum - 2 - and this is repeated for all pairs in the graph. Naturally, this is symmetric:

At the right of the matrix is the row sum (∑) which can be ordered to provide a graph invariant. In the case of 2-methylpentane it is [6, 5, 5, 3, 3, 2] - this is referred to as the ORS (ordered row sum) in the papers. Naturally, such a simple property to calculate is not a complete graph invariant, although it does have quite a high discriminatory power for acyclic graphs.

Some examples of pairs of graphs with identical ORS are these two pairs:

Where it is clear that they have quite similar structures - which is also true for ORS close together in lex order. There is some code to test all this here. However the implementation I've made is quite inefficient, I suspect. My algorithm was:

1) Calculate a distance matrix (with Floyd-Warshall).
2) Use this matrix to get lists of vertices arranged by neighbourhood.
3) For each pair of vertices, sum the sizes of the intersections of the neighbourhoods.
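A minimal Python sketch of the same three steps. It swaps Floyd-Warshall for a breadth-first search from every vertex (cheaper on sparse molecular graphs), and folds steps 2 and 3 together by counting, for each pair (u, v), the vertices w with d(u, w) = d(v, w). It assumes a connected graph.

```python
from collections import deque

def distances(n, edges):
    """All-pairs shortest path lengths by BFS from each vertex."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    result = []
    for s in range(n):
        d = [-1] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[w] == -1:
                    d[w] = d[u] + 1
                    queue.append(w)
        result.append(d)
    return result

def common_vertex_matrix(n, edges):
    """Entry (u, v) counts the vertices equidistant from u and v."""
    d = distances(n, edges)
    m = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            m[u][v] = m[v][u] = sum(1 for w in range(n) if d[u][w] == d[v][w])
    return m

def ordered_row_sums(matrix):
    return sorted((sum(row) for row in matrix), reverse=True)

# 3-methylpentane: C1-C2-C3(-CH3)-C4-C5, with the methyl carbon as vertex 5
three_methyl = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
print(ordered_row_sums(common_vertex_matrix(6, three_methyl)))  # [6, 5, 5, 3, 3, 2]
```

One caveat: under this reading of the definition, 2-methylpentane (edges (0, 1), (1, 2), (2, 3), (3, 4), (1, 5)) comes out as [7, 7, 5, 3, 1, 1], and it is 3-methylpentane that gives [6, 5, 5, 3, 3, 2]; so either the example molecule is 3-methylpentane or the papers' definition differs in some detail from this sketch.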

It really seems like this could be done in fewer steps, perhaps in some way similar to the first step?

Tutte's Twist Operation on Cubic Graphs

2013-06-09T17:19:00.001+01:00

There is an interesting book by W. T. Tutte called 'Graph Theory as I have known it' which is a cross between a normal mathematical text and a biography. So it's a description of the areas he was interested in, and his theorems. One thing that interested me was the use of a 'twist' operation on cubic graphs like so:

Where for the edge between vertices x and y labelled 'A' we reconnect the surrounding edges to form the arrangement on the right hand side. So detach edge D from y and connect it to x, and vice versa with edge C. The lower part of the picture shows what happens for a loop-edge - it transforms to a multi-edge.

This operation is used on a family of 'base' graphs looking like this:

where the first in the list is a vertexless loop graph - that is, it has no vertices and a single edge. From these base graphs, the twist operation can form any cubic graph. Note that all of U_n are cubic with 2n vertices.

For example, from U₃ we can get to both of the (simple) cubic graphs with 6 vertices by the following sequence:

In this diagram, the twist is being applied to the red edge, then the blue, then the green, etc. The final step converts the prism (G₆) to K_3,3 (G₇) while the other steps involve non-simple graphs with loops and multiple edges.
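A minimal Python sketch of the twist on an edge list (repeated pairs and loops are allowed, so multigraphs are fine). Which neighbouring edges play the roles of C and D is passed in explicitly; the prism numbering and edge indices in the example are assumptions, chosen so that one twist on a 'rung' of the prism gives K_3,3, matching the last step of the sequence.

```python
def twist(edges, a, c, d):
    """Tutte-style twist on edge a = (x, y).

    Edge c (incident to x) is detached from x and attached to y, and edge d
    (incident to y) is detached from y and attached to x. Edges are (u, v)
    pairs; a, c and d are indices into the list.
    """
    assert a not in (c, d)
    x, y = edges[a]

    def move(edge, old, new):
        u, v = edge
        if u == old:
            return (new, v)
        if v == old:
            return (u, new)
        raise ValueError(f"edge {edge} is not incident to {old}")

    result = list(edges)
    result[c] = move(edges[c], x, y)
    result[d] = move(edges[d], y, x)
    return result

# Prism: triangles 0-1-2 and 3-4-5 joined by rungs 0-3, 1-4, 2-5.
prism = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 3), (1, 4), (2, 5)]
k33 = twist(prism, a=6, c=0, d=3)    # twist rung 0-3, moving edges 0-1 and 3-4
print(sorted(tuple(sorted(e)) for e in k33))
# [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5)]
```

Every edge in the result joins {0, 1, 5} to {2, 3, 4}, so it is K_3,3.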

One of Tutte's uses for these transformations was to show that the number of 1-factors (perfect matchings) J of a graph can be calculated by J(G) + J(G_A) = J(H) + J(H_A), where G_A is a graph with the edge A deleted. So, start with a base graph U, which has J = 1 (except for U₀, where J = 2), then use that value to determine the number of 1-factors in the next graph in the sequence, and so on.

It does make me wonder if there is a way to generate cubic graphs from these base examples, by these twists. From a few simple examples it is clear that there would be a lot of redundancy at the leaves of the generated tree, but possibly that could be handled with canonical path augmentation in some way.

Signatures with user-defined edge colors

2013-04-16T18:16:00.000+01:00

A bug in the CDK implementation of my signature library turned out to be due to the fact that the bond colors were hard-coded to recognise only the labels {"-", "=", "#"}. The relevant code section even had an XXX above it!

Poor show, but it's finally fixed now. So that means I can handle user-defined edge colors/labels - consider the complete graph (K5) below:

So the red/blue colors here are simply those of a chessboard imposed on top of the adjacency matrix - shown here on the right. You might expect there to be at least two vertex signature classes here : {0, 2, 4} and {1, 3}, where the first class has vertices with two blue and two red edges, and the second has three blue and one red.
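That expectation comes from simply counting each vertex's blue and red edges. This sketch assumes blue marks the adjacency-matrix cells (i, j) where i + j is odd, which reproduces the split described above. Note that these counts are only a first-level invariant; signatures look at the whole graph around each vertex.

```python
from collections import defaultdict


def chessboard_colour(i, j):
    """Colour of edge (i, j): assumed blue where i + j is odd, red where it is even."""
    return "blue" if (i + j) % 2 else "red"


def colour_profile(n, v):
    """(blue, red) edge counts at vertex v of the complete graph K_n."""
    colours = [chessboard_colour(v, u) for u in range(n) if u != v]
    return colours.count("blue"), colours.count("red")


def group_by_profile(n):
    """Group the vertices of K_n by their (blue, red) counts."""
    groups = defaultdict(list)
    for v in range(n):
        groups[colour_profile(n, v)].append(v)
    return dict(groups)


print(group_by_profile(5))  # {(2, 2): [0, 2, 4], (3, 1): [1, 3]}
```

Each vertex of K5 has degree 4, so the two profiles are (2, 2) and (3, 1).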

Indeed, here's what happens for K4 to K7:

Clearly even-numbered complete graphs have just one vertex class, while odd-numbered ones have two (at least?). There is a similar situation for complete bipartite graphs:
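Signature classes are intended to match the orbits of the colour-preserving automorphism group, so a brute-force check over all n! vertex permutations (fine up to n = 7) makes a useful cross-check. With the same assumed chessboard rule (blue where i + j is odd), it gives one orbit for K4 and K6 and exactly two for K5 and K7. For even n the shift i → i + 1 (mod n) preserves every colour, while for odd n the even- and odd-numbered vertices have different colour counts and so can never be swapped. The function names are hypothetical.

```python
from itertools import permutations


def colour(i, j):
    """Chessboard colour of edge (i, j): 1 for blue (i + j odd), 0 for red (assumption)."""
    return (i + j) % 2


def is_colour_preserving(perm, n):
    # Every permutation of K_n preserves adjacency, so only the colours need checking.
    return all(colour(perm[i], perm[j]) == colour(i, j)
               for i in range(n) for j in range(i + 1, n))


def vertex_orbits(n):
    """Orbits of the colour-preserving automorphism group of K_n, by brute force."""
    images = {v: set() for v in range(n)}
    for perm in permutations(range(n)):
        if is_colour_preserving(perm, n):
            for v in range(n):
                images[v].add(perm[v])
    return sorted({tuple(sorted(s)) for s in images.values()})


for n in range(4, 8):
    print(n, vertex_orbits(n))
```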

Although I haven't explored any more of these. Next is to put this fixed version into the CDK...

Visualising Ring Equivalence Classes in Jmol

2013-01-13T21:24:00.001+00:00

As promised (in the previous post) I've now made Jmol scripts to show the atom/ring equivalence classes. I still think that the ring ones are clearer, but I suppose it depends on what aspect of the symmetry of the structure is needed. As an example:

Shown here is a C70 structure, with coloured circular plates at the centre of each face. It should be clear that there is an axis of symmetry running through the middle, from one blue plate to the other. Around the blue is a ring of green, and 5 rings in between.

The slight difficulty in all this was working out the ring equivalence classes. There is an existing CDK method to do this - in the SSSR ring finder - but it seems to give too many classes. The way I did it was to first find atom equivalence classes (or 'orbits') using signatures. Then each ring is a circular list of the orbit indices : which I'm going to call a 'ring code'. See this image for illustration:

These two rings (A and B) have the same ring code, written as the smallest concatenated string formed from their orbit indices. The orbit indices come from the signatures: the signature of each atom in the ring is converted to a number, namely that signature's index in a list of all the distinct signatures for all the atoms. Obviously other atom-equivalence class methods could be used to find the initial orbits; the rest of the procedure would be the same.
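A minimal sketch of the ring-code idea, assuming each atom already has a signature string (the strings and rings below are a made-up toy, not a real molecule). It tries every starting atom and, as an extra assumption, both directions around the ring, since a ring has no preferred orientation. It compares tuples rather than concatenated strings, because with multi-digit indices "1" + "12" and "11" + "2" would concatenate to the same string. The function names are hypothetical.

```python
def orbit_indices(signatures):
    """Replace each atom's signature by its index in the sorted list of distinct signatures."""
    index = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
    return [index[s] for s in signatures]


def ring_code(ring, orbits):
    """Smallest sequence of orbit indices read around `ring` (a list of atom indices).

    Every starting atom and both directions are tried, so the code does not
    depend on where the ring was entered or which way it was traversed.
    """
    seq = [orbits[a] for a in ring]
    candidates = []
    for s in (seq, seq[::-1]):
        for start in range(len(s)):
            candidates.append(tuple(s[start:] + s[:start]))
    return min(candidates)


# Hypothetical signature strings for eight atoms; real ones come from the signature library.
signatures = ["p", "q", "q", "p", "r", "r", "q", "p"]
orbits = orbit_indices(signatures)  # [0, 1, 1, 0, 2, 2, 1, 0]
ring_a = [0, 1, 2, 3, 4]
ring_b = [5, 3, 6, 1, 7]
ring_c = [0, 4, 5, 6, 7]
for name, ring in (("A", ring_a), ("B", ring_b), ("C", ring_c)):
    print(name, ring_code(ring, orbits))
```

Rings A and B read as rotations of each other and both get the code (0, 1, 1, 0, 2), so they fall into the same ring class; ring C gets (0, 0, 1, 2, 2) and forms a class of its own.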

I do wonder if it would be quicker to just find the orbits of the dual of the embedding. However, that involves making that embedding first, so probably not...

Blowing Carbon Bubbles : Expanding 2D Fullerene Layouts to 3D

2013-01-11T23:27:00.001+00:00

The concentric face layout code is working well enough now to handle the larger fullerenes - such as that old favourite, C60. Since coloring the vertices by equivalence class is not always terribly informative, here is a view of the ring equivalence classes :

C60 is on the left, and a more colourful C70-D5h is on the right. One difficulty, however, is to understand the symmetries of these structures when they are distorted like this. The further away from the center of the layout, the more stretched the rings become.

So, an obvious next step was to 'blow up' these 2D layouts into 3D. It turns out that this is possible, with a combination of inverse stereographic projection and Jmol's minimize command. The first step is necessary because minimizing the 2D coordinates (with a z-coordinate of zero) just shrinks the diagram down in the plane. Here are before and after shots of these steps:
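The inverse stereographic projection step can be sketched as follows. Projecting from the north pole, points inside the unit circle land on the southern hemisphere and points outside it on the northern one, so a layout lying entirely inside the circle only ever fills half the sphere. Here the layout is centred and scaled so that its median atom sits on the unit circle; the median, the final `radius` value and the toy two-hexagon layout are all assumptions for illustration, and the Jmol minimization would still be run afterwards on the lifted coordinates.

```python
import math
import statistics


def inverse_stereographic(x, y):
    """Lift a plane point onto the unit sphere, projecting from the north pole (0, 0, 1).

    Points inside the unit circle land in the southern hemisphere (z < 0),
    points outside it in the northern one (z > 0).
    """
    r2 = x * x + y * y
    return (2 * x / (r2 + 1), 2 * y / (r2 + 1), (r2 - 1) / (r2 + 1))


def scale_to_unit_median(layout):
    """Centre a 2D layout and scale it so the median atom lies on the unit circle.

    Using the median is a hypothetical choice: it puts roughly half of the
    atoms on each hemisphere after the lift.
    """
    cx = statistics.fmean(x for x, _ in layout.values())
    cy = statistics.fmean(y for _, y in layout.values())
    s = statistics.median(math.hypot(x - cx, y - cy) for x, y in layout.values())
    return {atom: ((x - cx) / s, (y - cy) / s) for atom, (x, y) in layout.items()}


def lift(layout, radius=1.0):
    """Scale the layout, lift every atom onto the sphere, then stretch it to `radius`."""
    flat = scale_to_unit_median(layout)
    return {atom: tuple(radius * c for c in inverse_stereographic(x, y))
            for atom, (x, y) in flat.items()}


# Toy layout (not a real fullerene): two concentric hexagons of radius 1 and 3.
layout = {}
for k in range(6):
    angle = math.pi * k / 3
    layout[f"in{k}"] = (math.cos(angle), math.sin(angle))
    layout[f"out{k}"] = (3 * math.cos(angle), 3 * math.sin(angle))

points = lift(layout, radius=3.5)  # 3.5 is an arbitrary example radius
```

Every lifted point lies on the sphere of the chosen radius, with the inner hexagon below the equator and the outer one above it.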

Clearly the inverse-projection does not give very good 3D positions for the atoms, but they are on the surface of a sphere, and the bonds don't cross. The minimization gives a much better looking version, although there are still dents and a lot of asymmetry.

Next step is to write out the equivalence class colours (vertex/face?) into a Jmol script, so that they can be visualised in 3D. Oh, and one last thing - it was necessary to scale the flat diagram down so that its radius was closer to that of the unit circle. If the diagram was contained entirely inside the circle, the expansion only covered a hemisphere. Also, the points were expanded from the centre outwards by a small factor, to improve the bond distances.
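Writing out the vertex classes could look something like this sketch, which emits one Jmol `select` and `color atoms` pair per class. It assumes atoms are numbered in file order, so that the k-th atom (counting from 0 here) has Jmol `atomno` k + 1; the palette is arbitrary and simply wraps round if there are more classes than colours. The function name is hypothetical.

```python
from collections import defaultdict

PALETTE = ("red", "green", "blue", "yellow", "orange", "purple", "cyan", "magenta")


def jmol_colour_script(atom_classes, palette=PALETTE):
    """Build Jmol commands that colour atoms by equivalence class.

    atom_classes[k] is the class index of the k-th atom (0-based here); Jmol's
    atomno is taken to be k + 1, i.e. atoms numbered in file order (assumption).
    """
    members = defaultdict(list)
    for k, cls in enumerate(atom_classes):
        members[cls].append(k + 1)
    lines = []
    for cls in sorted(members):
        selection = " or ".join(f"atomno={n}" for n in members[cls])
        lines.append(f"select {selection}")
        lines.append(f"color atoms {palette[cls % len(palette)]}")
    return "\n".join(lines)


print(jmol_colour_script([0, 1, 0, 2]))
```

For four atoms in classes [0, 1, 0, 2] this selects atoms 1 and 3 and colours them red, then atom 2 green and atom 4 blue. Face classes would need a different Jmol object (such as the plates in the C70 picture) rather than atom colours.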