tag:blogger.com,1999:blog-1233136933883846632018-06-19T13:57:36.851+01:00Some StuffAn Online Research Notebookgilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.comBlogger181125tag:blogger.com,1999:blog-123313693388384663.post-84610874053971246162016-12-28T12:14:00.000+00:002016-12-28T12:14:03.231+00:00Tailor : DescriptionsTailor (https://github.com/gilleain/tailor) is a project that grew out of my attempts to search for <strike>catmats</strike> <a href="https://en.wikipedia.org/wiki/Niche_(protein_structural_motif)">niches</a> with Prof. Milner-White. The goal of the project was to allow users to define protein structural patterns (called 'descriptions') along with a set of associated <i>measures</i>. More on measures later, but first what is a description?<br /><br />Here is a very simple example:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-bHHEnaJYtvM/WGOmUuOvOKI/AAAAAAAABGg/mJsljdUv_nkxkRcHptabiZik1FxTVIDdgCLcB/s1600/distance_condition.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="272" src="https://2.bp.blogspot.com/-bHHEnaJYtvM/WGOmUuOvOKI/AAAAAAAABGg/mJsljdUv_nkxkRcHptabiZik1FxTVIDdgCLcB/s400/distance_condition.png" width="400" /></a></div><br />The lines don't have arrowheads, but this is implicitly a tree/DAG rather than a graph. There is a root <i>ProteinDescription</i> and the leaves are <i>AtomDescriptions</i> - the <i>DistanceCondition</i> is referencing the two atoms. Basically, this just defines a pattern of two amino acids (GLY, ALA) with a distance of less than 3Å between the N and O atoms.<br /><br />There are still a lot of details to be worked out here. Can the groups be separated along the chain? If they can, should that require the description to be explicit as to the relationship between sibling nodes? How do we define any number of matches (as with a helix, or strand)?<br /><br />After some messing about with some ideas from regular expressions, I've abandoned for now the idea of having metacharacters like ".*" or ".+" as it would be hellishly difficult to match. Consider this description:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-jaz8DgHDKAI/WGOrKqeNkoI/AAAAAAAABG8/ZTVi1pnPl_Q_qhUUAfPE1mMNoEmUi_4pwCLcB/s1600/torsion_condition.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="278" src="https://4.bp.blogspot.com/-jaz8DgHDKAI/WGOrKqeNkoI/AAAAAAAABG8/ZTVi1pnPl_Q_qhUUAfPE1mMNoEmUi_4pwCLcB/s400/torsion_condition.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"></div><br />If you wanted to have the middle two groups repeated any number of times - the equivalent of "A.*G" as a sequence regular expression - how would that work? You would have to test the torsion condition for any number of repeated groups, but that requires the 'capping' residues to be present.<br /><br />In any case, the code is slowly being revived - with better tests! - and hopefully should be more usable in the New Year...gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-62095172133555411792016-06-15T21:21:00.002+01:002016-06-15T21:21:49.260+01:00WTF is a Number Bond?Not chemistry, as it happens. I was searching for similar images to one of my line drawings (always fun) and came across these 'number bond things' :<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-kSsZ8tJl6Wk/V2GpdZZXnoI/AAAAAAAABEo/I9nm4MYEsIIXSJ5frxmxnF-ujATVYa9hgCLcB/s1600/number_bonds.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="223" src="https://4.bp.blogspot.com/-kSsZ8tJl6Wk/V2GpdZZXnoI/AAAAAAAABEo/I9nm4MYEsIIXSJ5frxmxnF-ujATVYa9hgCLcB/s400/number_bonds.png" width="400" /></a></div>The one on the left - hilariously - is just "1 + 1 = 2". Ok, so that's a deliberately jokey example; real ones have larger numbers and one of the three numbers is for the student to fill in. On the right is a more complex example, drawn as <a href="http://mathworld.wolfram.com/AcyclicDigraph.html">a DAG</a> (directed acyclic graph) although at least one of the example I saw had a node at the bottom with three parents!<br /><br />In any case, what these things really are representing is partitions of numbers - which are usually drawn as Ferrer's diagrams (or Young tableaux) which I'll refer to as "Ferrer's-Young diagrams". These have a superior feature as shown here:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-aPkiTi6Yny0/V2Gz1dNj6ZI/AAAAAAAABE4/8UpfH5TcKp8klIVSHVumdEML6FWiH8kGQCLcB/s1600/ferrers_young_vs_number_bonds.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="266" src="https://2.bp.blogspot.com/-aPkiTi6Yny0/V2Gz1dNj6ZI/AAAAAAAABE4/8UpfH5TcKp8klIVSHVumdEML6FWiH8kGQCLcB/s320/ferrers_young_vs_number_bonds.png" width="320" /></a></div>So one FY-diagram can represent <i>two</i> different number bonds. Note that I've made the crazy leap of making number bonds with more than two parts (or 'addends'). Clearly 1+1+3+4 = 9 = 1+2+2+4 as these partitions are a transposed pair.<br /><br />One more thing occurs to me - what happens if you do this on a graph, not just a tree? You can label the leaves with - say - an increasing sequence of numbers, and then move to the parents, summing as you go. To work, this algorithm has to do something like add in the largest number from the previous round to get unique numbers:<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-bvwzMGQ2IM0/V2G38CPRZhI/AAAAAAAABFM/xDSXvn8RXvAZiznGxx7cgostmMKX_uMBQCLcB/s1600/number_bond_labelling.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="363" src="https://2.bp.blogspot.com/-bvwzMGQ2IM0/V2G38CPRZhI/AAAAAAAABFM/xDSXvn8RXvAZiznGxx7cgostmMKX_uMBQCLcB/s400/number_bond_labelling.png" width="400" /></a></div>Here we are labelling the leaves with the numbers {1, 2, 3} and then their parents with {1+3, 2+3+3} and then the final one with {5+8+8}. Of course this labelling is not canonical - you would have to try all permutations of leaf labels. However it's quite nice to see a connection between something as simple as number bonds and something more complex like vertex labelling!gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-67342828449023712942016-05-24T22:37:00.001+01:002016-05-24T22:37:39.826+01:00Submultisets and GraphsThe previous post mentioned the restricted weak composition (RWC), but didn't expand on it at all! Basically, I found this excellent paper : "<span style="font-family: NimbusRomNo9L;"><a href="http://link.springer.com/article/10.1007%2Fs10852-012-9194-4">Generalized Algorithm for</a></span><br /> <div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <span style="font-family: NimbusRomNo9L;"><a href="http://link.springer.com/article/10.1007%2Fs10852-012-9194-4">Restricted Weak Composition Generation</a>" by Daniel R. Page. It even gave some java code in an appendix - good stuff :)</span><br /><span style="font-family: NimbusRomNo9L;"><br /></span><span style="font-family: NimbusRomNo9L;">Anyway, a RWC is a <i><a href="https://en.wikipedia.org/wiki/Composition_(combinatorics)">composition</a></i> (which is a partition where order matters) that is <i>weak</i> - has zeros in it - and the parts are <i>restricted</i>. So [1, 0, 1, 1] is a weak composition of 3 into 4 parts and lets say we have restricted the parts to {0, 1, 2}. Here is an overview of the scheme:</span><br /><span style="font-family: NimbusRomNo9L;"><br /></span><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-OJXlipYzVQk/V0TIcdwh1YI/AAAAAAAABEM/DdWxQ-_Va0A2LgpYLacV63XxQ4fgM3OsQCLcB/s1600/making_degree_multisets.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="201" src="https://1.bp.blogspot.com/-OJXlipYzVQk/V0TIcdwh1YI/AAAAAAAABEM/DdWxQ-_Va0A2LgpYLacV63XxQ4fgM3OsQCLcB/s400/making_degree_multisets.png" width="400" /></a></div><span style="font-family: NimbusRomNo9L;">where we take a degree sequence, convert to a multiset and use a RWC to get a particular sub-multiset. This allows us to take some count of some subset of elements from the multiset.</span><br /><span style="font-family: NimbusRomNo9L;"><br /></span><span style="font-family: NimbusRomNo9L;">Doing this for all sub-multisets at each round should then allow us to list graphs - although not without redundant examples:</span><br /><span style="font-family: NimbusRomNo9L;"><br /></span><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-6cbLkHTEjZ8/V0TJNEtTA-I/AAAAAAAABEQ/4P-_hXkcXOQOEiMks2OBCRFFe1rjdM0MQCLcB/s1600/tree_saturations.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="335" src="https://3.bp.blogspot.com/-6cbLkHTEjZ8/V0TJNEtTA-I/AAAAAAAABEQ/4P-_hXkcXOQOEiMks2OBCRFFe1rjdM0MQCLcB/s400/tree_saturations.png" width="400" /></a></div><span style="font-family: NimbusRomNo9L;"> This shows all starting points for 3 -> [3, 2, 2, 1, 1] and the eventual graphs that are formed. Note that a) and e) are isomorphic. Still, this would surely remove a lot of redundant solutions.</span><br /> </div></div></div>gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-46834074099473657882016-05-18T22:56:00.000+01:002016-05-18T22:56:04.010+01:00Restricted Weak Compositions, Labelled Partitions, and TreesSo in the last post about <a href="http://gilleain.blogspot.co.uk/2016/05/listing-degree-restricted-trees.html">listing trees</a> I outlined a slightly cumbersome method to list trees from degree sequences. Thinking about it a bit more, it would probably be far easier to just list <b>all</b> trees on some number of vertices and filter out by degree sequence. I talked a little about the WROM algorithm <a href="http://gilleain.blogspot.co.uk/2012/04/generating-trees.html">in this old post</a> which is a constant time generator of 'free' (unlabelled) trees.<br /><br />Anyway, that's boring so I was trying out the more complicated approach. It looks like generating a <i>single</i> tree from a degree sequence is as simple as the Havel-Hakimi method. Connect the largest degree (dn) to the dn next largest degrees. Also maintain a list of vertices that have already been connected to, and then at the next step connect only to those not already connected to. So, for [3, 3, 2, 1, 1, 1, 1] we get:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-ejgaCkQj0Ks/VzzjzM6_tiI/AAAAAAAABD8/FZ3CHPnFdegB9dIhY9p7XDkz0iepa20kACLcB/s1600/all_trees_on_3321111.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://2.bp.blogspot.com/-ejgaCkQj0Ks/VzzjzM6_tiI/AAAAAAAABD8/FZ3CHPnFdegB9dIhY9p7XDkz0iepa20kACLcB/s400/all_trees_on_3321111.png" width="400" /></a></div><br />You might notice that trees a) and c) are isomorphic. Below the trees labelled by degree are the same trees labelled by DFS discovery order, and below that the 'layout' of the tree as described in the WROM paper.<br /><br />As it happens, I tried to use restricted weak compositions and what I call 'labelled partitions' to do this efficiently but it doesn't work so well yet. It seems like this could all be done far easier using just successor functions...gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-30216196013563518792016-05-10T13:12:00.001+01:002016-05-10T13:13:24.171+01:00Listing Degree Restricted TreesAlthough stack overflow is generally just an endless source of questions on the lines of "HALP plz give CODES!? ... NOT homeWORK!! - don't close :(" occasionally you get more interesting ones. For example <a href="http://math.stackexchange.com/questions/1774091/how-to-generate-recursively-all-non-isomorphic-trees-with-2-types-of-vertex-l">this one</a> that asks about degree-restricted trees. Also there's some stuff about vertex labelling, but I think I've slightly missed something there.<br /><br />In any case, lets look at the simpler problem : listing non-isomorphic trees with max degree 3. It's a nice small example of a general approach that I've been thinking about. The idea is to:<br /><ol><li>Given <i>N</i> vertices, partition <i>2(N - 1)</i> into <i>N</i> parts of at most 3 -> D = {d0, d1, ... }</li><li>For each <i>d_i</i> in D, connect the degrees in all possible ways that make trees.</li><li>Filter out duplicates within each set generated by some d_i.</li></ol>Hmm. Sure would be nice to have maths formatting on blogger....<br /><br />Anyway, look at this example for partitioning 12 into 7 parts:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-22RmBx7vay0/VzHOBhCK6EI/AAAAAAAABDo/RRSRBDBpcOULTuomIY_mTR7NFUYgO4FMACLcB/s1600/animal_trees.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="523" src="https://3.bp.blogspot.com/-22RmBx7vay0/VzHOBhCK6EI/AAAAAAAABDo/RRSRBDBpcOULTuomIY_mTR7NFUYgO4FMACLcB/s640/animal_trees.png" width="640" /></a></div>At the top are the partitions, in the middle the trees (colored by degree) and at the bottom the desired output of "lattice-trees" (a kind of polyomino, apparently). I should really have a consistent degree color scheme...<br /><br />Anyway, it's probably not the neatest approach for this particular problem, but I think it would work. Since the number of trees generated from each degree sequence is only a fraction of the space, it seems reasonable to do all-v-all checking for isomorphism in this case.gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-86464473529848464882016-02-29T13:17:00.001+00:002016-02-29T13:17:40.095+00:00Equitable Partition Refinement with List InvariantsSo the bug with C19H14 and C10H16 formulae seems to have been due to the partition refinement not correctly labelling structures with particular arrangements of multiple bonds. The underlying problem is in the <b>equitable</b> partition refinement process. This is a short note about the problem.<br /><br />Equitable refinement of a partition for a graph is the formation of a vertex partition where each element of each block of the partition has an equal number of neighbours in the other blocks. This is a little difficult to imagine, but it is - roughly - a generalisation of the Morgan number algorithm which attempts to find labels for sets of vertices which are stable with respect to splitting them by the labels of the neighbours.<br /><br />For example:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-x4a8VrITzO8/VtRDwSjtl5I/AAAAAAAABDU/mFK45azrFII/s1600/cubene_partition.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="151" src="https://1.bp.blogspot.com/-x4a8VrITzO8/VtRDwSjtl5I/AAAAAAAABDU/mFK45azrFII/s400/cubene_partition.png" width="400" /></a></div><br />This image shows a cub-2-ene like molecule (or a cube graph with two of the edges colored). Clearly the orange and green vertices are in 'different' sets in some sense. Precisely, they are in different blocks of the equitable partition ([0,2,5,7|1,3,4,6]) as they have different sets of bonds to neighbours.<br /><br />In fact, all I have changed is to turn the invariants for the refinement procedure from numbers (the neighbour count) to lists of numbers : the ordered list of counts of neighbours connected by an edge of a particular color. This seems to work for the purpose I need it (structure generation) but it may be that it doesn't work for some more subtle use case.<br /> gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-52945757362695070102016-02-19T22:28:00.001+00:002016-02-19T22:28:41.142+00:00From Seed To LeafSo the <a href="http://gilleain.blogspot.com/2016/02/seeds-and-weeds-goodbad-lists-in.html">previous post</a> pointed out the problem with a simple extension from a seed : you miss some. In detail, two of the problems are:<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-n5kkdF0rFM8/VseN5nm1iqI/AAAAAAAABCw/n3ss4PFzYDY/s1600/seed_growth_problems.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="296" src="https://1.bp.blogspot.com/-n5kkdF0rFM8/VseN5nm1iqI/AAAAAAAABCw/n3ss4PFzYDY/s320/seed_growth_problems.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Difficulties with growth from a seed</td></tr></tbody></table>Firstly, <b>A</b> shows the - slightly obvious - idea that for some (seed, leaf) pairs you can only get from one to the other by adding edges and not vertices. This problem is easy enough.<br /><br />As for <b>B</b>, I show here a detailed (if made up) example of the main problem : augmentation of a seed is not necessarily canonical. Or, to put it another way, the canonical deletion can lead to a sibling of the seed, rather than the seed itself.<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-lLLh5Viy9fU/VseVQ3o-f-I/AAAAAAAABDA/YB9WOy3POWk/s1600/subtrees_sibling_seeds.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="205" src="https://3.bp.blogspot.com/-lLLh5Viy9fU/VseVQ3o-f-I/AAAAAAAABDA/YB9WOy3POWk/s400/subtrees_sibling_seeds.png" width="400" /></a></div>I think the way round this might be to restrict the candidate atoms (or bonds, even) for canonical deletion to those outside the original seed. In other words, canonically label the augmentation to give the ordering of atoms/bonds then choose the largest labelled one that is not in the seed.gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-4798048091292277372016-02-18T20:20:00.002+00:002016-02-18T20:21:49.534+00:00Seeds and Weeds : Good/Bad Lists in Structure GenerationWith the recent revival of the moleculegen (AMG) project, I've started to properly think beyond just simple generation of spaces from a formula. For example, there are 4 trillion or so C30H62 structures - which might take ... a while.<br /><br />In any case, one useful feature would be to have good/bad lists of substructures (or 'nice' vs 'naughty' as I've been thinking of them. One simple approach I thought might work is to just start with the largest good substructure and generate from there. This <i>can</i> work :<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-Jv1x7_3xsJ0/VsYlQAIeSwI/AAAAAAAABCI/RJsKO66fGaM/s1600/six_seed.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://3.bp.blogspot.com/-Jv1x7_3xsJ0/VsYlQAIeSwI/AAAAAAAABCI/RJsKO66fGaM/s320/six_seed.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;">C9H16 from 6-cycle</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">(By-the-way : all images done with John May's new Depict utility! It's wonderful to use :)</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Which is great! Except that it doesn't <i>always</i> work. The problem is that leaves on the tree with a particular substructure might not have that substructure as a common parent in the tree. Consider these C6H8 structures generated from a 4-cycle:</div><div class="separator" style="clear: both; text-align: left;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-YJl6_Sq0uJs/VsYmPFJYyQI/AAAAAAAABCU/DK3MfUeDH8g/s1600/fourring_to_c6h10.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="144" src="https://1.bp.blogspot.com/-YJl6_Sq0uJs/VsYmPFJYyQI/AAAAAAAABCU/DK3MfUeDH8g/s320/fourring_to_c6h10.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">C6H10 from 4-cycle</td></tr></tbody></table><div class="separator" style="clear: both; text-align: left;">These are not all of them. Now we filter out using subgraph isomorphism (Asad's SMSD code) from the whole space:</div><div class="separator" style="clear: both; text-align: left;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-QydsmlTJxms/VsYnGHYSQ6I/AAAAAAAABCg/R8UVQIDTuJE/s1600/filtered_c6h10.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="201" src="https://3.bp.blogspot.com/-QydsmlTJxms/VsYnGHYSQ6I/AAAAAAAABCg/R8UVQIDTuJE/s320/filtered_c6h10.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Whole C6H10 space filtered by 4-cycle </td></tr></tbody></table><div class="separator" style="clear: both; text-align: left;">So we get half of them by growing from a 'seed'. Now to think about filtering <b>out</b> the weeds...</div>gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-59911844001990183972016-02-09T00:30:00.001+00:002016-02-09T23:44:06.245+00:00Pictures of Very Wide TreesVisualising canonical augmentation trees of molecules is not quite as good as I thought:<br /><br /><a href="http://3.bp.blogspot.com/-rUMycP832sM/Vrky73ZcJLI/AAAAAAAABBk/noJuUyvwf4Q/s1600/test.png" imageanchor="1"><img border="0" height="32" src="https://3.bp.blogspot.com/-rUMycP832sM/Vrky73ZcJLI/AAAAAAAABBk/noJuUyvwf4Q/s640/test.png" width="640" /></a><br /><br />Or perhaps I should use a different tree layout. Probably a circular one.<br /><br />Edit. Like this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-Qfz4kOQF-rE/Vrp5nEtX4CI/AAAAAAAABB0/Zb_Dmqdv9-o/s1600/test.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://4.bp.blogspot.com/-Qfz4kOQF-rE/Vrp5nEtX4CI/AAAAAAAABB0/Zb_Dmqdv9-o/s320/test.png" width="320" /></a></div>gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-23802392402066956962015-06-27T17:49:00.001+01:002015-06-27T17:49:24.053+01:00Biconnectivity and Degree SequenceVery occasionally there is a question on math stack exchange that I can actually give some kind of useful answer to. <a href="http://math.stackexchange.com/questions/1328539/determining-graph-biconnection-from-degree-sequence?answertab=votes#tab-top">This question</a> caught my eye; the questioner asks whether graphs with the same degree sequence have the same <a href="https://en.wikipedia.org/wiki/Biconnected_graph">biconnectivity</a> property. Or to put it another way, are there any pairs of graphs with the same degree sequence but one is biconnected and the other is not?<br /><br />I thought of an example of such a pair, and another user supplied a much simpler one, but here are both pairs:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-kiNjwITGJoU/VY7S50kDzzI/AAAAAAAABAY/2C_lbWxRWPM/s1600/biconnected_isodegree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="291" src="http://1.bp.blogspot.com/-kiNjwITGJoU/VY7S50kDzzI/AAAAAAAABAY/2C_lbWxRWPM/s400/biconnected_isodegree.png" width="400" /></a></div>The vertices are coloured by degree (4 = orange, 3 = blue, 2 = green); the red arrows show a transformation of the top graph into its non-biconnected partner. Obviously there may be more than one, I've no idea how many members of a set of isodegree graph might be expected to be biconnected.<br /><br />For the record, here is the transformation of the bottom pair:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-WlTthRw07Wo/VY7T1e2g7tI/AAAAAAAABAg/CADDVlqOyLI/s1600/wheel_transform.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="251" src="http://3.bp.blogspot.com/-WlTthRw07Wo/VY7T1e2g7tI/AAAAAAAABAg/CADDVlqOyLI/s320/wheel_transform.png" width="320" /></a></div><br />The red lines bisecting edges indicate a kind of 'bond breaking' although it's not meant to represent an actual chemical reaction!gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com1tag:blogger.com,1999:blog-123313693388384663.post-39155712570364676462015-06-24T12:52:00.001+01:002015-06-24T12:52:15.218+01:00Multigraph MiserySo another lesson reluctantly learned...<br /><br />It seems the naïve approach to augmenting molecules by sets of bonds (colored edges, strictly rather than multiple edges) did not work. For example this pair of C9H16 graphs:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-_ZWoBC0Hvuo/VWzDLNwPPLI/AAAAAAAAA_k/lt44IG1UuSI/s1600/failing_nines.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="243" src="http://4.bp.blogspot.com/-_ZWoBC0Hvuo/VWzDLNwPPLI/AAAAAAAAA_k/lt44IG1UuSI/s320/failing_nines.png" width="320" /></a></div><div><br /></div><div>Thicker lines represent double bonds, and the thinner are singles. The parents of these are non-isomorphic - which is easy to see if you just remove the 6:8 edge from both. However, they cannot be distinguished as augmentations by my current method as the canonical labelling each one gives a graph where the last bond added is the 'natural' canonical choice.<br /><br />The alternative might be to use the canonical labelling method for multigraphs suggested by the <i>nasty</i> manual which involved transforming the multiedge graph into a layered simple graph. This is illustrated in this example:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-anyGiFA9d7g/VYqYakglbSI/AAAAAAAABAE/vjCAzgJsvh0/s1600/can_lab_lay_tran.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="153" src="http://4.bp.blogspot.com/-anyGiFA9d7g/VYqYakglbSI/AAAAAAAABAE/vjCAzgJsvh0/s400/can_lab_lay_tran.png" width="400" /></a></div>The transformation converts single edges into an edge in the first layer, and multiple edges into an edge in the second layer (and so on). Highlighted in purple is an example augmentation of one single edge and one multiple edge. This augmentation is transformed in the process of making the simple graph, then ordered. Finally, the augmentation in the canonically labelled, layer transformed graph has to be checked to see if it is the chosen augmentation.<br /><br />Phew! All that remains is implementing it properly...</div>gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-58480172370039749712015-05-13T22:35:00.000+01:002015-05-13T22:35:01.873+01:00Millions of Graphs : Slow Yet Correct GenerationMy newest version of canonical path augmentation code for generating graphs has reached a new high point - generating 11,716,571 graphs on ten vertices. Of course, it also gets the number of nines (261,080) and the number of eights (11,117) correct as well ... which is great, but I'm cautious about declaring it 'correct'. Especially given the <a href="http://gilleain.blogspot.co.uk/2012/10/timing-four-augmentation-algorithms.html">last version</a> did not get the sevens and eights right. See, for example these past failures:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-WEsToX6zfr0/VTuHoKn563I/AAAAAAAAA-s/wg5WUMuIyO4/s1600/failed_nines.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="237" src="http://3.bp.blogspot.com/-WEsToX6zfr0/VTuHoKn563I/AAAAAAAAA-s/wg5WUMuIyO4/s1600/failed_nines.png" width="320" /></a></div><br /><br />So how does it get the right answer? Well, it now properly uses the method mentioned in <a href="http://gilleain.blogspot.co.uk/2012/09/two-different-ways-to-handle.html">this post</a> to only pick canonical deletions that are not <a href="https://proofwiki.org/wiki/Definition:Cut-Vertex">cut-vertices</a>. That turns out only to be necessary for graphs on 8 vertices, but you still have to check this for all augmentations, which seems expensive. However, there was a more fundamental problem; consider the example below (basically nicked from <a href="https://computationalcombinatorics.wordpress.com/2012/08/13/canonical-deletion/">Derick Stolee</a>'s blog post):<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-YFybp8D-_Bs/VVO_4ar0fgI/AAAAAAAAA_E/bW_8GnvvDg0/s1600/dual_path.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="230" src="http://2.bp.blogspot.com/-YFybp8D-_Bs/VVO_4ar0fgI/AAAAAAAAA_E/bW_8GnvvDg0/s400/dual_path.png" width="400" /></a></div><br /><br />Obviously A and B are isomorphic, yet how do we properly distinguish them? Well, the key is the set of vertices added to - on the image, these are the labels on the edges between graphs : {0}, {1, 3}, etc. When a new graph is created, a vertex is chosen - using canonical labelling, in my case - and the vertices attached to it must be the ones we used to make that augmented graph. I was checking the set of augmented vertices in the automorphism group of the parent, not the child.<br /><br />So, the canonical checking is now better. I seem to have written a thousand of these methods, but this one (I think!) finally does it right. What I was getting wrong was checking the orbit of the canonical deletion vertex, and not the orbit of the set of vertices it was being connected to. Great! Now what? How long does it take? See this, where the purple line is the new code, and the others are older attempts:<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-h0uwk3D2v_E/VVPC8N1J8LI/AAAAAAAAA_Q/0B6-MWh0xXU/s1600/timings.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="344" src="http://2.bp.blogspot.com/-h0uwk3D2v_E/VVPC8N1J8LI/AAAAAAAAA_Q/0B6-MWh0xXU/s640/timings.png" width="640" /></a></div><br /><br />Clearly the problem now is that of verifying the results - it's quite slow to generate these large datasets, and storing them (uncompressed) takes a lot of space. The nines took minutes and megabytes of space, while the tens took hours and over a gigabyte. At this rate, the 11s would take days and 10s of gigabytes. In any case - where do you stop?gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com1tag:blogger.com,1999:blog-123313693388384663.post-16465971389180499302014-09-29T20:32:00.001+01:002014-09-29T20:32:42.890+01:00Generating Dungeons With BSP Trees or Sliceable RectanglesSo, I admit that the original reason for looking at <a href="http://gilleain.blogspot.com/2014/09/colorful-expanding-triangulations-and.html">sliceable rectangles</a> was because of <a href="http://gamedev.stackexchange.com/questions/82059/algorithm-for-procedureral-2d-map-with-connected-paths">this gaming stackoverflow</a> question about generating dungeon maps. The approach described there uses something called a binary split partition tree (<a href="http://en.wikipedia.org/wiki/Binary_space_partitioning">BSP Tree</a>) that's usually used in the context of 3D - notably in the rendering engine of <a href="http://doom.wikia.com/wiki/Doom_rendering_engine">the game Doom</a>. Here is a BSP tree, as an example:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-rGsEy_zRrvU/VCmtzbxkNaI/AAAAAAAAA9Y/ZlAVcDPySmk/s1600/bsp_tree_vs_slices.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-rGsEy_zRrvU/VCmtzbxkNaI/AAAAAAAAA9Y/ZlAVcDPySmk/s1600/bsp_tree_vs_slices.png" height="253" width="400" /></a></div><br /><br />In the image, we have a sliced rectangle on the left, with the final rectangles labelled with letters (A-E) and the slices with numbers (1-4). The corresponding tree is on the right, with the slices as internal nodes labelled with 'h' for horizontal and 'v' for vertical. Naturally, only the leaves correspond to rectangles, and each internal node has two children - it's a binary tree.<br /><br />So what is the connection between such trees and the sliceable dual graphs? Well, the rectangles are related in exactly the expected way:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-3paEXoSEbQE/VCmzK_S24AI/AAAAAAAAA9o/oePG8lLVJUg/s1600/bsp_tree_vs_graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-3paEXoSEbQE/VCmzK_S24AI/AAAAAAAAA9o/oePG8lLVJUg/s1600/bsp_tree_vs_graph.png" height="257" width="400" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div>Here, the same BSP tree is on the left (without some labels), and the sliceable dual is on the right (without corner assignments). The red and blue edges are just the horizontal and vertical associations between rectangles.<br /><br /><br />gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-86637200390553670272014-09-10T21:14:00.001+01:002014-09-10T21:46:05.971+01:00Colorful Expanding Triangulations and Sliceable Rectangular GraphsThere is a whole area of study on visualisations called cartograms - most appealing are the ones that make countries look <a href="http://www.worldmapper.org/">like inflated or deflated balloons</a>. The rectangular versions of these are less pretty, but more interesting to me from a graph theory perspective.<br /><br />I came across this subject via an impressive <a href="http://vincentkusters.org/pdf/Talk_MastersThesis45.pdf">masters thesis</a> by Vincent Kusters : '<br />Characterizing Graphs with a Sliceable Rectangular Dual' … which is a title that will take some explaining. Firstly, what is a 'rectangular dual' when it's at home? Well check this out:<br /><div><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-B_6qKOEOYbo/VBC4eW8QBUI/AAAAAAAAA9I/K9-zA9LCGLg/s1600/rectangular_slice_example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-B_6qKOEOYbo/VBC4eW8QBUI/AAAAAAAAA9I/K9-zA9LCGLg/s1600/rectangular_slice_example.png" height="288" width="400" /></a></div><br /></div><div class="separator" style="clear: both; text-align: center;"></div><div>Clearly the thing on the left is a graph, and on the right is its rectangular dual - in fact, this is the smallest 'sliceable' dual. By sliceable, I mean that the white rectangles can be made by recursively slicing up a rectangle. For example, if a slice is like [{0, 3, 4, 5, 6}, {1, 2}] for making the first split into the areas of 1 and 2 on the right, and all the rest on the left. The next could be [{0}, {3, 4, 5, 6}] and so on.</div><div><br /></div><div>The colors of the graph indicate a top/bottom cut in red, and a left/right cut in blue. So rectangles 0 and 1 share a left-right (blue) boundary, while 0 and 4 share a top-bottom (red) boundary. The square nodes [T, R, B, L] are the 'corners', and serve to anchor the dual. There's a lot of detail that I'm skipping here, but this is the broad picture.</div><div><br /></div><div>Interestingly - for me - Kuster's work makes use of a program called <a href="http://cs.anu.edu.au/~bdm/plantri/">Plantri</a> made by none other than Gunnar Brinkmann and Brendan McKay. It generates planar triangulations - which rectangular duals are examples of - and then colors them to make proper duals. The way Plantri works is fairly familiar; <a href="http://gilleain.blogspot.com/2011/08/mckays-canonical-augmentation-method.html">canonical path augmentation</a> but with a restricted set of operations to add vertices and edges:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-VRiwVyD9ldU/VBCvIT-eIcI/AAAAAAAAA8o/c1z_a-m0dK4/s1600/triangulation_expansions.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-VRiwVyD9ldU/VBCvIT-eIcI/AAAAAAAAA8o/c1z_a-m0dK4/s1600/triangulation_expansions.png" height="202" width="320" /></a></div><br />Starting from K4 - the complete graph on 4 vertices - these 'expansions' are applied to graphs while rejecting duplicates using CPA. Now the thing that occurs to me is the possibility of expanding while maintaining the colorings of a rectangular dual. For example:<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-rIrMQafu7pk/VBCv6jjEYJI/AAAAAAAAA84/BVw0VStModk/s1600/slicable_expansions_detail.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-rIrMQafu7pk/VBCv6jjEYJI/AAAAAAAAA84/BVw0VStModk/s1600/slicable_expansions_detail.png" height="268" width="400" /></a></div><br />These are just examples of E5 from the picture before, but starting from particular colorings, and expanding only to particular colorings. As can be seen from the rectangular slices to the side of each graph, these expansions are 'compatible' in some sense with changes in the dual. Whether this is a meaningful operation or not, I'm not sure. There are a number of possible such expansions, but not a huge number. Here are a couple more:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-BSW2HKxAMkI/VBCv4oYQg5I/AAAAAAAAA80/PB7AKGWm6nQ/s1600/sliceable_expansions_colors.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-BSW2HKxAMkI/VBCv4oYQg5I/AAAAAAAAA80/PB7AKGWm6nQ/s1600/sliceable_expansions_colors.png" height="231" width="400" /></a></div><br />Note that B and C are the same, but expand to different possible colorings. Also that the outer cycle colors are preserved, along with the some of the internal edges. That is no particular coincidence, since they were chosen specifically to preserve as many of the edges colors as possible.<br /><br />Interesting, but not yet conclusive in any way.</div>gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-84520495689252056652014-09-04T21:17:00.000+01:002014-09-04T21:17:28.549+01:00Misunderstanding Embeddings, and Whitney FlipsSo I should correct something that I posted <a href="http://gilleain.blogspot.co.uk/2011/03/squashing-molecules-flat-and.html">a while ago</a> about embedding 3-connected graphs. The most obvious examples of this class of graph are <a href="http://en.wikipedia.org/wiki/Polyhedron">polyhedra</a> - tetrahedra, cubes, etc - and maybe it's obvious that there is only <b>one</b> way to embed these in the plane. So in that sense, 'enumerating' the embeddings for these graphs is quite easy ... there's just one to count.<br /><br />Of course, this embedding can be drawn with any of its cycles as an outer face; this is what gave rise to the different looking drawings. I guess the way to think about it is that the <i>embedding</i> is on the surface of a sphere - where there is no 'privileged' face to call the <i>outer</i> face - and that the drawing on the plane just picks one of the faces to 'squash flat' as I put it back in (wow) 2011.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-0255KFim1HY/VAjIW66nHhI/AAAAAAAAA8M/CDlvnboZk-4/s1600/whitney_flip.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-0255KFim1HY/VAjIW66nHhI/AAAAAAAAA8M/CDlvnboZk-4/s1600/whitney_flip.png" height="181" width="400" /></a></div><br /><br />Anyway - on to Whitney flips! There are a class of graphs that can be embedded in the plane in different ways (that is, the combinatorial map is different for the same outer face). These are subject to a Whitney flip, named after the <a href="http://en.wikipedia.org/wiki/Hassler_Whitney">mathematician</a> who laid the foundation for matroids among other things. As an example see the image above.<br /><br />The only graphs that have this flip are ones with a 2-vertex cutset - the top and bottom vertices in the image. Of course, finding these is a whole other problem, which I may or may not get around to describing...<br /><br /><br />gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-69362586227821481352014-01-13T13:50:00.002+00:002014-01-13T13:50:47.274+00:00InChI and InChIKey Metadata in Cambridge DSpace Repository (WWMM)At the end of last year, I updated the metadata on some 175,000 or so items in the <a href="https://www.repository.cam.ac.uk/">Cambridge DSpace repository</a>. These were molecules that made up a copy of the '<a href="https://www.repository.cam.ac.uk/handle/1810/724">WWMM</a>' (the <u>W</u>orld <u>W</u>ide <u>M</u>olecular <u>M</u>atrix) and they had old 'IChI' identifiers rather than the newer <a href="http://www.iupac.org/home/publications/e-resources/inchi.html">InChI</a> and InChIKey identifiers.<div><br /></div><div>So now - after this update - you can use a search engine to <strike>google</strike> … er, <i>search for</i> compounds by their InChIKey. For example:</div><div><br /></div><blockquote class="tr_bq">YMSFBKYTOUKHOI-UHFFFAOYSA-N</blockquote>gives just two results, one of which is from <a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=343606">PubChem</a>, and the other is Cambridge Repository. Hilariously, the image from PubChem is this:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-JukwFjUsOug/UtPupQZnFaI/AAAAAAAAA6M/YKCzfuKZ7Gk/s1600/imagefly.cgi.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-JukwFjUsOug/UtPupQZnFaI/AAAAAAAAA6M/YKCzfuKZ7Gk/s320/imagefly.cgi.png" width="320" /></a></div><br />when the formula is C32H60N4O4. I assume that the connectors in this cycle are just alkane chains, but the layout fails for this kind of 'cyclic lipid peptide' (or whatever it is called!).gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-70670798572140908742013-12-19T15:53:00.003+00:002013-12-19T15:54:57.577+00:00Festive Chemical Structure Generation : Necklaces and Trees!So, a student asked me about a homework question that is a sub-problem of the <a href="http://gilleain.blogspot.co.uk/2012/10/alternative-molecule-generation.html">structure generation problem</a>. Basically, it was to <i>count</i> the number of chemical structures with <i>exactly one</i> cycle given the elemental formula. Of course, the best solution here is probably to use the <a href="http://en.wikipedia.org/wiki/P%C3%B3lya_enumeration_theorem">Polyá Enumeration Theorem</a> since all that was asked for was a count (enumeration) of the structures.<br /><br />Naturally, I have a different way to do this - especially since I don't really understand the mathematics of PET enough to implement it. So:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-WMgZ_RSyIYs/UrMOvgEr6sI/AAAAAAAAA5o/f9ez6K0bSNA/s1600/festive_method_overview.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="250" src="http://1.bp.blogspot.com/-WMgZ_RSyIYs/UrMOvgEr6sI/AAAAAAAAA5o/f9ez6K0bSNA/s400/festive_method_overview.png" width="400" /></a></div><br />The image shows a rough overview of how I might <i>list</i> all of the structures with a single cycle. It takes a number of necklaces (one shown), and a number of trees, and glues the one to the other in all possible ways. The word '<a href="http://en.wikipedia.org/wiki/Necklace_(combinatorics)">necklace</a>' here is specifically the combinatorial object; so the cyclic sequence CNCNO is the same as CNOCN since you can rotate one to get the other.<br /><br />One tricky decision here is whether to add multiple bonds to the necklace before or after adding the trees. It seems like this would make a difference to how fast the algorithm was - if you add the bonds afterwards, you might reject many of the possible attachments. Hard to say.<br /><br />The other aspect to consider is the connection of the parts. If we consider necklaces (or 'cycles') and trees as types of <i>block</i>, then the problem is connecting together blocks into a tree. This is essentially the reverse of the approach detailed in <a href="http://gilleain.blogspot.co.uk/2012/05/general-graph-layout-putting-parts.html">this post</a> - using a block decomposition tree to guide the assembly of the blocks:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-M3w2EVXg9Bs/UrMWD7tIXII/AAAAAAAAA54/w0-861O3BAA/s1600/tree_block_graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="216" src="http://1.bp.blogspot.com/-M3w2EVXg9Bs/UrMWD7tIXII/AAAAAAAAA54/w0-861O3BAA/s640/tree_block_graph.png" width="640" /></a></div><br />Although, now I come to look at it, it seems like the attachment points on the necklace would drive the underlying block-tree. So perhaps this is only relevant for graphs that contain <b>multiple</b> cycles - which starts to become a much more difficult problem!<br /><br />Happy Holidays, anyway...gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com1tag:blogger.com,1999:blog-123313693388384663.post-54608004872902295782013-11-29T17:48:00.001+00:002013-11-29T17:48:08.567+00:00Comparing Kiraly (Exhaustive) Graph Generation with nAUTy OutputSo recently I was asked about Király's method for generating <a href="http://gilleain.blogspot.co.uk/2012/07/kiralys-method-for-generating-all.html">all graphs from a degree sequence</a>. While refactoring some of the code that I wrote to do this, I also made some tests. Specifically, coverage tests to check that the generation was actually exhaustive. I know it's redundant, but I have good tools to remove duplicate graphs - or I thought that I did…<br /><br />Here a rough flowchart of the procedure here, starting with a number ('n') that is passed to Dreadnaut (the interface to <a href="http://cs.anu.edu.au/~bdm/nauty/">nAUTy</a>) to generate graphs:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-QRF5q1h53Cc/UpjReJXzaKI/AAAAAAAAA5M/UCGXy4GXRKQ/s1600/nauty_kiraly_compare.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="172" src="http://3.bp.blogspot.com/-QRF5q1h53Cc/UpjReJXzaKI/AAAAAAAAA5M/UCGXy4GXRKQ/s400/nauty_kiraly_compare.png" width="400" /></a></div><br />These graphs are grouped by degree sequence, and these degree sequences are fed into the KirályHHGenerator to reconstruct the set of graphs. I think that compare arrow is wrong, but never mind. The point is that the sets should be the same size.<br /><br />They are for n=5,6,7 but not for 8. Oddly enough, however, there are <i>more</i> in the Király set than in the nAUTy set. The obvious conclusion would be that my duplicate detection is failing - in other words, I am failing to spot an isomorphism between two graphs. For example this pair:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-Nj47KgxQe-M/UpjSaur4enI/AAAAAAAAA5U/KUyrXWYcSRY/s1600/nauty_possible_misses.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="http://3.bp.blogspot.com/-Nj47KgxQe-M/UpjSaur4enI/AAAAAAAAA5U/KUyrXWYcSRY/s400/nauty_possible_misses.png" width="400" /></a></div><br />However, two of my methods give different answers for this pair. The s<a href="http://gilleain.blogspot.co.uk/2010/06/cdk-signature-implementation-now-in.html">ignatures</a> method says they are different, while the <a href="http://gilleain.blogspot.co.uk/2010/05/stuck-detailed-description.html">partition refinement</a> method says they are the same. Odd - and more investigation is needed before I am certain that <i>geng</i> has missed some graphs here...gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-6295446916625704552013-10-11T08:51:00.001+01:002013-10-11T18:00:55.418+01:00Centrality as a Vertex Invariant (or 'Atom Descriptor')<b>EDIT</b>: After some more tests, I now realise that this is not really as great a vertex label/descriptor as I thought it was. For example, see these four graphs on 7 vertices that fail to distinguish vertices properly:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-28wKeYjIHdw/UlguNXcN-aI/AAAAAAAAA3w/41wn7PaAUQg/s1600/seven_fails.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="164" src="http://4.bp.blogspot.com/-28wKeYjIHdw/UlguNXcN-aI/AAAAAAAAA3w/41wn7PaAUQg/s640/seven_fails.png" width="640" /></a></div><br /><br />The first one should have a central vertex in a different class than the other blue vertices. The green class in the second graph should be split, and same for the third graph. And so on.<br /><br /><hr /><br />So, in the <a href="http://gilleain.blogspot.co.uk/2013/10/common-vertex-matrices-of-graphs.html">last post</a> I talked about the ideas of Randić et al for calculating the 'centrality' of vertices in a graph. Interestingly, the numbers calculated for each vertex act as a kind of equivalence class label or vertex invariant. This is similar in many ways to <a href="http://chem-bla-ics.blogspot.co.uk/2011/08/speeding-up-cdk-morgan-numbers.html">Morgan numbers</a> (sorry, Egon's post doesn't actually explain them, but they are the sum of degrees across extended neighbourhoods).<br /><br />For example, here is one of the examples from the previous post:<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-_zeV4GeyaRU/UlerLq8agkI/AAAAAAAAA3U/HZKYOMjOnjI/s1600/bicycle_centrality.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="256" src="http://4.bp.blogspot.com/-_zeV4GeyaRU/UlerLq8agkI/AAAAAAAAA3U/HZKYOMjOnjI/s640/bicycle_centrality.png" width="640" /></a></div><br />With the centrality matrix in the middle, and the 'label' made by sorting the row elements in descending order to the right of that. Finally, these labels are converted to more easily read alphabetic ones - classes in some sense ('a' = {0, 5} and 'b' = {1, 2, 3, 4}). These classes make sense, given that the middle vertices are in the same class, with the rest in another class.<br /><br />Compare this to the graph with the same ORS of [6, 6, 5, 5, 5, 5]:<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-Lvs30oNZ7Ng/Uleqmf530jI/AAAAAAAAA3M/ffqQOk3bEAw/s1600/rabbitsquare_centrality.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="244" src="http://4.bp.blogspot.com/-Lvs30oNZ7Ng/Uleqmf530jI/AAAAAAAAA3M/ffqQOk3bEAw/s640/rabbitsquare_centrality.png" width="640" /></a></div><br />The graph here is nearly the same - with only the edge 1:3 missing - yet the labels don't distinguish between vertices {1, 3} and vertices {2, 4}. One possibility mentioned in the paper is to use combinations of descriptors (fairly common practice in cheminformatics, I suspect). The simplest one that occurs to me is just to add the degree of a vertex to the start of the label. That makes the label for {1, 3} into "1320000", distinguishing them from the one for {2, 4} which is "2320000".<br /><br />Anyway, here is a picture of a number of pairs of graphs with the same ORS, colored by the label just described:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-kIO0MWfH0iA/UletXKEvnYI/AAAAAAAAA3g/KQPn8DyjvCU/s1600/ors_pairs.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="496" src="http://4.bp.blogspot.com/-kIO0MWfH0iA/UletXKEvnYI/AAAAAAAAA3g/KQPn8DyjvCU/s640/ors_pairs.png" width="640" /></a></div><br />Note how some (but not all!) of these have the 'same' equivalence classes in different arrangements. Be aware that the colors may not be totally meaningful when compared between graphs. <a href="https://github.com/gilleain/gendraw/blob/master/src/gendraw/ORSEquivalenceClassDraw.java">Code for this is here</a>.gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-18390382348512711382013-10-05T11:59:00.001+01:002013-10-05T11:59:55.563+01:00Common Vertex Matrices of GraphsThere is an interesting set of papers out this year by <a href="http://en.wikipedia.org/wiki/Milan_Randi%C4%87">Milan Randic</a> et al (sorry about the accents - blogger seems to have a problem with accented 'c'...). I've looked at his work before <a href="http://gilleain.blogspot.co.uk/2008/10/on-canonical-numberings.html">here</a>.<br /><br />[1] <a href="http://onlinelibrary.wiley.com/doi/10.1002/jcc.23300/full">Common vertex matrix: A novel characterization of molecular graphs by counting</a><br />[2] <a href="http://onlinelibrary.wiley.com/doi/10.1002/jcc.23413/full">On the centrality of vertices of molecular graphs</a><br /><br />and one still in publication to do with fullerenes. The central idea here (ho ho) is a graph descriptor a bit like path lengths called 'centrality'. Briefly, it is the count of neighbourhood intersections between pairs of vertices. Roughly this is illustrated here:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-TTskm_8EEBc/Uk_qn22ZlOI/AAAAAAAAA2A/MIRiAZcEDvk/s1600/overlapping_neighbourhoods.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="187" src="http://4.bp.blogspot.com/-TTskm_8EEBc/Uk_qn22ZlOI/AAAAAAAAA2A/MIRiAZcEDvk/s400/overlapping_neighbourhoods.png" width="400" /></a></div><br />For the selected pair of vertices, the common vertices are those at the same distance from each - one at a distance of two and one at a distance of three. The matrix element for this pair will be the sum - 2 - and this is repeated for all pairs in the graph. Naturally, this is symmetric:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-f5bjSPo_LcY/Uk_tI3R8kpI/AAAAAAAAA2Y/sziudVC3WgM/s1600/cvp_2_me_pent.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="177" src="http://3.bp.blogspot.com/-f5bjSPo_LcY/Uk_tI3R8kpI/AAAAAAAAA2Y/sziudVC3WgM/s400/cvp_2_me_pent.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div>At the right of the matrix is the row sum (∑) which can be ordered to provide a graph invariant. In the case of 2-methylpentane it is [6, 5, 5, 3, 3, 2] - this is referred to as the ORS (ordered row sum) in the papers. Naturally, such a simple property to calculate is not a complete graph invariant, although it does have quite a high discriminatory power for acyclic graphs.<br /><br />Some examples of pairs of graphs with identical ORS are these two pairs:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-3h38W1PYNaw/Uk_vCxZH86I/AAAAAAAAA2k/zX8O4XMr5jc/s1600/cvp_isomers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="177" src="http://2.bp.blogspot.com/-3h38W1PYNaw/Uk_vCxZH86I/AAAAAAAAA2k/zX8O4XMr5jc/s400/cvp_isomers.png" width="400" /></a></div><br />Where it is clear that they have quite similar structures - which is also true for ORS close together in lex order. There is some code to test all this <a href="https://github.com/gilleain/generate/blob/HEAD/src/test/distance/CentralityCalculatorTest.java">here</a>. However the implementation I've made is quite inefficient, I suspect. My algorithm was:<br /><br />1) Calculate a distance matrix (with <a href="http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm">Floyd-Warshall</a>).<br />2) Use this matrix to get lists of vertices arranged by neighbourhood.<br />3) For each pair of vertices, sum the sizes of the intersections of the neighbourhoods.<br /><br />It really seems like this could be done in less steps, perhaps in some way similar to the first step?gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-79884622778664877012013-06-09T17:19:00.001+01:002013-06-09T17:19:37.304+01:00Tutte's Twist Operation on Cubic GraphsThere is an interesting book by <a href="https://en.wikipedia.org/wiki/Tutte">W. T. Tutte</a> called '<a href="http://ukcatalogue.oup.com/product/9780198502517.do#.UbSjxODTTVI">Graph Theory as I have known it</a>' which is a cross between a normal mathematical text and a biography. So it's a description of the areas he was interested in, and his theorems. One thing that interested me was the use of a 'twist' operation on cubic graphs like so:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-eftv3crXA24/UbSjTDUNCnI/AAAAAAAAA0s/a_SnrrsQg9w/s1600/link_twistings_cubic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="250" src="http://3.bp.blogspot.com/-eftv3crXA24/UbSjTDUNCnI/AAAAAAAAA0s/a_SnrrsQg9w/s320/link_twistings_cubic.png" width="320" /></a></div><br />Where for the edge between vertices x and y labelled 'A' we reconnect the surrounding edges to form the arrangement on the right hand side. So detach edge D from y and connect it to x, and vice versa with edge C. The lower part of the picture shows what happens for a loop-edge - it transforms to a multi-edge.<br /><br />This operation is used on a family of 'base' graphs looking like this:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-OpoX-Zkb-v8/UbSmEvqhYDI/AAAAAAAAA08/PGuG1ElWxnw/s1600/base_graph_family.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://1.bp.blogspot.com/-OpoX-Zkb-v8/UbSmEvqhYDI/AAAAAAAAA08/PGuG1ElWxnw/s320/base_graph_family.png" width="310" /></a></div>with the first in the list is a vertexless loop graph - that is, it has no vertices and a single edge. From these base graphs, the twist operation can form any cubic graph. Note that all of U<sub>n</sub> are cubic with 2n vertices.<br /><br />For example, from U<sub>3</sub> we can get to both of the (simple) graphs with 6 vertices by the following sequence:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-KnkhmqTMZxI/UbSnY56MNZI/AAAAAAAAA1M/QjdHGv-8cj8/s1600/tutte_twists_prism.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="245" src="http://2.bp.blogspot.com/-KnkhmqTMZxI/UbSnY56MNZI/AAAAAAAAA1M/QjdHGv-8cj8/s320/tutte_twists_prism.png" width="320" /></a></div><br />In this diagram, the twist is being applied to the red edge, then the blue, then the green, etc. The final step converts the prism (G<sub>6</sub>) to K<sub>3,3</sub> (G<sub>7</sub>) while the other steps involve non-simple graphs with loops and multiple edges.<br /><br />One of Tutte's uses for these transformations was to show that the number of <a href="https://en.wikipedia.org/wiki/1-factor">1-factors</a> (perfect matching) J of a graph can be calculated by J(G) + J(G<sub>A</sub>) = J(H) + J(H<sub>A</sub>) where G<sub>A</sub> is a graph with the edge A deleted. So, starting with a base graph U - which has J = 1 except for U<span class="Apple-style-span" style="font-size: x-small;">0</span> where J = 2. Then use that value to determine the number of 1-factors in the next graph in the sequence, and so on.<br /><br />It does make me wonder if there is a way to generate cubic graphs from these base examples, by these twists. From a few simple examples it is clear that there would be a lot of redundancy at the leaves of the generated tree, but possibly that could be handled with canonical path augmentation in some way.gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com1tag:blogger.com,1999:blog-123313693388384663.post-28291176577540466682013-04-16T18:16:00.000+01:002013-04-16T18:16:19.185+01:00Signatures with user-defined edge colorsA bug in the <a href="http://gilleain.blogspot.com/2010/06/cdk-signature-implementation-now-in.html">CDK implementation</a> of my <a href="https://github.com/gilleain/signatures">signature library</a> turned out to be due to the fact that the bond colors were hard coded to just recognise the labels {"-", "=", "#" }. The relevant code section even had an <a href="https://github.com/gilleain/signatures/commit/8485990be9434d78a65b6329bd332c6287c90fb3">XXX above it</a>!<br /><br />Poor show, but it's finally fixed now. So that means I can handle user-defined edge colors/labels - consider the complete graph (K5) below:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-FMUgDdVLDOY/UW2FnN9_v6I/AAAAAAAAAz8/TcxzQGmcN7M/s1600/k5_chess.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="206" src="http://2.bp.blogspot.com/-FMUgDdVLDOY/UW2FnN9_v6I/AAAAAAAAAz8/TcxzQGmcN7M/s400/k5_chess.png" width="400" /></a></div>So the red/blue colors here are simply those of a chessboard imposed on top of the adjacency matrix - shown here on the right. You might expect there to be at least two vertex signature classes here : {0, 2, 4} and {1, 3} where the first class has vertices with two blue and two red edges, and the second has three blue and two red.<br /><br />Indeed, here's what happens for K4 to K7:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-MxxOgkiNe70/UW2FkoMaAoI/AAAAAAAAAz0/_oyjh-vE5HE/s1600/complete_bicolored.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="397" src="http://3.bp.blogspot.com/-MxxOgkiNe70/UW2FkoMaAoI/AAAAAAAAAz0/_oyjh-vE5HE/s400/complete_bicolored.png" width="400" /></a></div><br />Clearly even-numbered complete graphs have just one vertex class, while odd-numbered ones have two (at least?). There is a similar situation for complete bipartite graphs:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-xiigfS6_1gc/UW2HC3IpJOI/AAAAAAAAA0E/_T5ZNFd5N94/s1600/complete_bipartite_colored.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="138" src="http://2.bp.blogspot.com/-xiigfS6_1gc/UW2HC3IpJOI/AAAAAAAAA0E/_T5ZNFd5N94/s400/complete_bipartite_colored.png" width="400" /></a></div>Although I haven't explored any more of these. Next is to put this fixed version into the CDK...gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-9194677661648945992013-01-13T21:24:00.001+00:002013-01-13T21:24:20.957+00:00Visualising Ring Equivalence Classes in JmolAs promised (in the <a href="http://gilleain.blogspot.com/2013/01/blowing-carbon-bubbles-expanding-2d.html">previous post</a>) I've now made Jmol scripts to show the atom/ring equivalence classes. I still think that the ring ones are more clear, but I suppose it depends on what aspect of the symmetry of the structure is needed. As an example:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-KbaTmDC_nfU/UPMkAozO1VI/AAAAAAAAAzU/6esZCIuE21g/s1600/c70d5h_ringEqCl.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="392" src="http://1.bp.blogspot.com/-KbaTmDC_nfU/UPMkAozO1VI/AAAAAAAAAzU/6esZCIuE21g/s640/c70d5h_ringEqCl.png" width="640" /></a></div> Shown here is a C70 structure, with coloured circular plates at the centre of each face. It should be clear that there is an axis of symmetry running through the middle, from one blue plate to the other. Around the blue is a ring of green, and 5 rings in between.<br /><br />The slight difficulty in all this was working out the ring equivalence classes. There is an existing CDK method to do this - in the SSSR ring finder - but it seems to give too many classes. The way I did it was to first find atom equivalence classes (or 'orbits') using signatures. Then each ring is a circular list of the orbit indices : which I'm going to call a 'ring code'. See this image for illustration:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-Oh7Az0ePt8A/UPMkCsumjlI/AAAAAAAAAzc/3CrFF_HkcfI/s1600/ringCodes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="179" src="http://1.bp.blogspot.com/-Oh7Az0ePt8A/UPMkCsumjlI/AAAAAAAAAzc/3CrFF_HkcfI/s320/ringCodes.png" width="320" /></a></div>These two rings (A and B) have the same ring code, written as the smallest concatenated string formed from their orbit indices. In other words, the signatures of each atom in the ring is converted to a number based on that signatures index in a list of all the signatures for all the atoms. Obviously other atom-equivalence class methods could be used to find the initial orbits; the rest of the procedure would be the same.<br /><br />I do wonder if it would be quicker to just find the orbits of the dual of the embedding. However, that involves making that embedding first, so probably not...gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com1tag:blogger.com,1999:blog-123313693388384663.post-35479890799454779402013-01-11T23:27:00.001+00:002013-01-11T23:27:12.054+00:00Blowing Carbon Bubbles : Expanding 2D Fullerene Layouts to 3DThe concentric face layout code is working well enough now to handle the larger fullerenes - such as that <a href="http://gilleain.blogspot.co.uk/2010/05/c60-double-bonding-networks.html">old favourite</a>, C60. Since <a href="http://gilleain.blogspot.co.uk/2012/11/comparing-equivalentclasspartitioner.html">coloring the vertices</a> by equivalence class is not always terribly informative, here is a view of the <i>ring</i> equivalence classes :<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-B3Nbiey9Zxs/UPCcotOe42I/AAAAAAAAAyI/3cHWQPrC9vE/s1600/C60_C72_schlegel.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="336" src="http://2.bp.blogspot.com/-B3Nbiey9Zxs/UPCcotOe42I/AAAAAAAAAyI/3cHWQPrC9vE/s640/C60_C72_schlegel.png" width="640" /></a></div>Where C60 is on the left, and a more colourful <a href="http://en.wikipedia.org/wiki/C70_fullerene">C70-D5h</a> is on the right. One difficulty, however, is to understand the symmetries of these structures when they are distorted like this. The further away from the center of the layout, the more stretched the rings become.<br /><br />So, an obvious next step was to 'blow up' these 2D layouts into 3D. It turns out that is possible, with a combination of <a href="http://en.wikipedia.org/wiki/Stereographic_projection">inverse stereographic projection</a> and <a href="http://jmol.sourceforge.net/">Jmol</a>'s minimize command. The first step is necessary since minimizing the 2D coordinates (with a z-coord of zero) just shrinks the diagram down in the plane. Here are <i>before</i> and <i>after</i> shots of these steps:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-tH1VR8aST3c/UPCesbild0I/AAAAAAAAAyg/bmAMHu-ZIPk/s1600/c60_bubbled.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="323" src="http://2.bp.blogspot.com/-tH1VR8aST3c/UPCesbild0I/AAAAAAAAAyg/bmAMHu-ZIPk/s640/c60_bubbled.png" width="640" /></a></div>Clearly the inverse-projection does not give very good 3D positions for the atoms, but they are on the surface of a sphere, and the bonds don't cross. The minimization gives a much better looking version, although there are still dents and a lot of asymmetry.<br /><br />Next step is to write out the equivalence class colours (vertex/face?) into a Jmol script, so that they can be visualised in 3D. Oh, and one last thing - it was necessary to scale the flat diagram down so that the radius was closer to a unit circle. If it was contained inside the circle, then the expansion was only a hemisphere. Also, the points were expanded from the centre outwards by a small factor, to improve the bond distances.gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0tag:blogger.com,1999:blog-123313693388384663.post-54915071208437011142013-01-03T19:31:00.000+00:002013-01-03T19:31:00.248+00:00Fullerene Layout with Spokes and ArchesHaving tried (and failed) to layout fullerene structures using various optimisation methods, I thought I would try direct positioning of the atoms. In other words, 'logical' placement rather than 'physics' based layout. For example:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-vseWTNLLVKI/UOXVkmkV2EI/AAAAAAAAAwM/6YpKF3tyDhA/s1600/icosahedrane_c30_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="http://2.bp.blogspot.com/-vseWTNLLVKI/UOXVkmkV2EI/AAAAAAAAAwM/6YpKF3tyDhA/s400/icosahedrane_c30_1.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"></div><br />These are two regular fullerenes that work very well. The algorithm is simple in principle:<br /><br />1) Given a planar embedding <b>G</b>, calculate the <a href="http://gilleain.blogspot.com/2012/04/duals-and-inner-duals-of-planar-graphs.html">inner dual</a> <b>id(G)</b> and the 'face layers'.<br />2) The innermost layer is the 'core' which is one of: a single vertex, a connected pair, or a cycle.<br />3) Layout the core, and then each layer outwards, by spoke and arch.<br /><br />So, to explain some of this; a 'face layer' is a set of faces all at the same distance from the outer cycle, measured by graph distance on <b>id(G)</b>. So the faces adjacent to the outer cycle are the first layer, and the second layer is adjacent to that, and so on. This is roughly illustrated here:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-CEiVBR_xlWE/UOXZZGwkFSI/AAAAAAAAAwk/PvV5-mVkguY/s1600/face_layers_spokes_arches.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="166" src="http://2.bp.blogspot.com/-CEiVBR_xlWE/UOXZZGwkFSI/AAAAAAAAAwk/PvV5-mVkguY/s320/face_layers_spokes_arches.png" width="320" /></a></div><br />The concentric circles represent the layers of faces, with the innermost being the core. On the right is a cartoon of two spokes on an outer path, connected by a dashed line to show the arch. The outer path starts off as the edges around the core, and is replaced by the arches of the next level.<br /><br />This method does work - it creates diagrams that are not too horrible - but does not give as good results for the less-symmetric examples. Something like Bojan Mohar's <a href="http://www.sciencedirect.com/science/article/pii/0012365X9390340Y">circle-packing method </a>would probably be better, or even a properly implemented spring layout...<br /><br />Here are some final examples, all with paired-cores:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-tCWc2ebQcDk/UOXcCN3l1QI/AAAAAAAAAw8/7gTVdxIoae4/s1600/less_regular_fullerenes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="212" src="http://4.bp.blogspot.com/-tCWc2ebQcDk/UOXcCN3l1QI/AAAAAAAAAw8/7gTVdxIoae4/s640/less_regular_fullerenes.png" width="640" /></a></div>(The colors are signature classes, by the way. All images made using CDK's new classes. <a href="https://github.com/gilleain/test_group">Code here</a>).gilleainhttp://www.blogger.com/profile/14491887080861584059noreply@blogger.com0