Algorithms through the lens of symbolic pattern matching

pattern matching
graph theory
Author

Ben Ruijl

Published

August 14, 2024

Support Symbolica

This blog post uses the computational framework Symbolica. Symbolica is free for hobbyists and a single-core instance is free for non-commercial use. Would you like to have new features added? Your organization can purchase a license and support the Symbolica project!

Introduction

Pattern matching is all around us. It is what you do when you solve puzzles, and it lies at the core of some programming languages. The evolution of a cell in Conway’s game of life is completely determined by the pattern of the 8 surrounding cells:

Game of life

Another common use of pattern matching is regular expressions (regex). This is a language that describes patterns in text. Below you see a regex to parse an e-mail address:

^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*@[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*$

The text needs to start (^) with an alpha-numerical character ([a-zA-Z0-9]) and must contain @ followed by at least one alpha-numerical character. This pattern will actually not match all valid e-mail addresses! For that you need this horrendous regex. Yikes.

Pattern matching is also ubiquitous in mathematics. For example, in the recursive definition of the factorial function: \[ \begin{align} f(0) &= 1\\ f(n) &= n f(n - 1);\quad n > 0 \end{align} \]

Let’s see how we can code the above relation using the Symbolica library in Python. We need to define a function f and we need a wildcard that can match any argument of the function f. Wildcards are variables that end with an underscore (_). We will use x_.

from symbolica import *
f, x_ = Expression.symbols('f', 'x_')
e = f(10)
e = e.replace_all(f(x_), x_*f(x_-1), x_.req_gt(0), repeat=True)
e = e.replace_all(f(0), 1)
print(e)
3628800

That indeed equals 10 factorial. Note that we use the restriction x_.req_gt(0), which means that the pattern will only match if x_ is greater than 0.

Great, now let’s try to design some algorithms using only pattern matching and replacements on mathematical expressions. These algorithms will not be the most efficient, but they could be elegant and could unlock a different way of thinking.

The first challenge is to represent a problem as a mathematical expression. This will feel a bit alien at first, but remember that LLMs receive an array of integers as input and they function just fine ;)

Graphs

Let’s write a graph as a mathematical expression. Graphs have edges and nodes/vertices. One possible way of writing a graph is by defining a function v that represents a node/vertex. The first argument is the label of the vertex and the remainder are labels of the edges that connect to that vertex. We can then multiply these vertices together. For example:

v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)

represents the following graph:

flowchart LR
  A[A] -- 2 --- B(B)
  A[A] -- 3 --- B(B)
  B -- 4 --- C
  C(C) -- 1 --- A

In Symbolica, it looks like:

from symbolica import *
v, A, B, C = Expression.symbols('v', 'A', 'B', 'C')
g = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)

To represent graphs with directional edges, one could introduce an edge function that specifies the direction. For example, e(2,B,A) indicates that the edge 2 flows from vertex B to A. Again, this e(...) can be multiplied into the term.

Now that we have a basic mapping of a graph as a term (an expression with multiplications and no additions at the top level) we can manipulate it with pattern matching. For example, we can add two graphs:

from symbolica import *
v, A, B, C = Expression.symbols('v', 'A', 'B', 'C')
g1 = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
g2 = v(A,1,2)*v(B,1,2)
g = 3*g1 + 2*g2
print(g)
2*v(A,1,2)*v(B,1,2)+3*v(C,4,1)*v(A,1,2,3)*v(B,2,3,4)

What sense does this make? It depends on the context. In particle physics one can express probabilities of collision events in terms of a weighted sum of Feynman graphs, so this g could be something that can be measured in a detector!

Testing connectivity

Now let us try to extract some basic properties of our graph using pattern matching. For example, we can answer the question whether a graph is connected or if the input consists of two or more connected components (disjoint graphs).

How to go about this? Well, we realize that the number of connected components does not change if we shrink an edge; any edge connects two nodes of the graph, which we can fuse into one. For example, in the example graph above, we can shrink the edge 4, which fuses B and C. We get:

flowchart LR
  A -- 2 --- BC
  A -- 3 --- BC
  BC -- 4 --- BC
  BC -- 1 --- A

Now we can pick any other edge and shrink that too. For example, edge 2:

flowchart LR
  ABC -- 1 --- ABC
  ABC -- 2 --- ABC
  ABC -- 3 --- ABC
  ABC -- 4 --- ABC

We are now left with a single vertex that contains all our edges. We shrank the entire graph to a point, and therefore it is one component! To conclude, if we are left with \(n\) vertices after shrinking every edge, we have \(n\) connected components.

Let’s write the algorithm in Symbolica. We need to be able to match any edge in a vertex. To achieve that, we define ranged wildcards l1___, r1___,l2___ and r2___ that can match any number of arguments (see here for more information).

from symbolica import *
v, A, B, C, x_, l1___, l2___, r1___, r2___ = Expression.symbols('v', 'A', 'B', 'C', 'x_',
                                                'l1___', 'l2___', 'r1___', 'r2___')
g = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)

g = g.replace_all(v(l1___,x_,r1___)*v(l2___,x_,r2___),
        v(l1___,x_,x_,r1___,l2___,r2___), repeat=True)
print(g)
v(A,B,C,1,1,2,2,3,3,4,4)

We are left with one vertex and therefore the graph is connected. Thus, in just a single line we have written an algorithm that tests if a graph is connected. Not bad!

Counting cycles

One may wonder how many cycles/loops a graph has. Our example has two loops, as can be clearly seen by eye. But how many loops does the following graph have?

flowchart LR
  A -- 1 --- B
  B -- 2 --- C
  B -- 8 --- E
  C -- 3 --- D
  C -- 9 --- F
  D -- 10 --- E
  D -- 4 --- G
  G -- 5 --- H
  G -- 6 --- F
  E -- 9 --- F

The non-planarity (crossings of edges) makes it hard to count! The representation in v-notation is sadly not more illuminating:

v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)

Fortunately, Euler derived a formula to determine the number of cycles/loops:

\[ L = E - V + C \] where \(L\) is the number of loops, \(E\) is the number of edges, \(V\) the number of vertices and \(C\) the number of connected components. For now, we will always work with one connected component, so \(C=1\). What remains is to count the number of vertices and the number of edges. Since each edge appears twice in our representation — once in the vertex of its left end and once in the vertex of its right end — we can count the total number of arguments of all the vertex functions and divide by two!

We only want to use pattern matching and replacements, so we will find a way to represent a counter for the number of vertices in the mathematical expression. We define the variable nv whose power will determine the number of vertices and ne whose power will determine the total number of edges.

Then if we replace all v(...) by v(...)*nv, the power of nv will represent the number of vertices:

from symbolica import *
v, A, B, C, D, E, F, G, H, nv, x___ = Expression.symbols('v', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'nv', 'x___')
g = v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
g = g.replace_all(v(x___), v(x___)*nv)
print(g)
nv^8*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)

So we have 8 vertices. Note that this match-and-replace works just as well if we have a sum of graphs: each graph will be multiplied by the correct number of nvs.

For the counting of edges, we use the nargs function that counts the number of arguments:

g = g.replace_all(v(x___), v(x___)*nv*ne**((x___.transform().nargs() - 1) / 2))
print(g)
nv^8*ne^10*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)

By using transform(), the function nargs() is applied to the expression x__ after it has been replaced by its matched content. We subtract 1 because the first argument of the function is the vertex id and not an edge.

With this one-liner we get the number of vertices (8) and the number of edges (10).

Now we do the final transformation to obtain the number of loops:

g = g.replace_all(nv, 1/ne) * ne

which will subtract the power of nv from the power of ne (computing \(E-V\)). Finally we add 1 (the number of connected components) to it by multiplying ne.

Putting it all together:

from symbolica import *
v, A, B, C, D, E, F, G, H, nv, ne, x___ = Expression.symbols('v', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'nv', 'ne', 'x___')
g = v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
g = g.replace_all(v(x___), v(x___)*nv*ne**((x___.transform().nargs() - 1) / 2))
g = g.replace_all(nv, ne**-1) * ne
print(g)
ne^3*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)

We find that graph has three loops!

Counting cycles II

A different way to obtain the number of cycles is realizing that the shrinking algorithm we saw before is constructing a spanning tree: it is visiting every vertex exactly once.

If we leave out the traversed edge x_ on the right-hand side of the shrinking algorithm:

g = g.replace_all(v(l1___,x_,r1___)*v(l2___,x_,r2___),
        v(l1___,r1___,l2___,r2___), repeat=True)

the remaining edges are the edges that do not belong to the spanning tree and form a fundamental cycle basis (also called a loop momentum basis in particle physics). Since the edges will appear twice, we just have to divide the argument length by 2 to get the number of cycles!

Graph isomorphisms

One may wonder whether two graphs that look different are actually the same, once one relabels the vertices and edges. We can achieve this using pattern matching! But how do we construct a pattern that has the same connectivity as the graph at hand?

We first need to make a modification to our graph definition. For simplicity, we drop the vertex labels. We will also make each vertex symmetric in its arguments, since there is no preferred edge ordering.

This is our setup and test graph:

v = Expression.symbol('vs', is_symmetric=True)
e_ = Expression.symbols(*['e{}_'.format(i) for i in range(7)])
graph = v(0, 1, 4)*v(1, 2, 5, 6)*v(0, 2, 3)*v(3, 4, 5)

Now comes the crux: we replace each edge identifier by its associated wildcard by indexing into the e_ array.

graph_pat = graph
for edge, edge_pat in enumerate(e_):
    graph_pat = graph_pat.replace_all(
        edge, edge_pat, allow_new_wildcards_on_rhs=True)

Normally Symbolica will not allow new wildcards on the right-hand side of a replacement, but this can be explicitly enabled using allow_new_wildcards_on_rhs=True.

We get the graph pattern:

v(e0_, e1_, e4_)*v(e1_, e2_, e5_, e6_)*v(e0_, e2_, e3_)*v(e3_, e4_, e5_)

Let us try to test if the graph in the input is the same as the following graph:

k = Expression.symbols(*['k{}'.format(i) for i in range(7)])
graph2 = v(k[0], k[2], k[3])*v(k[3], k[4], k[5], k[6]) * \
    v(k[0], k[1], k[4])*v(k[1], k[2], k[6])
for x in graph2.match(graph_pat):
    for ee in e_:
        print('{}={}'.format(ee, x[ee]), end=' ')
    print()

We get the following possible mappings:

e0_=k0 e1_=k4 e2_=k3 e3_=k2 e4_=k1 e5_=k6 e6_=k5
e0_=k1 e1_=k4 e2_=k6 e3_=k2 e4_=k0 e5_=k3 e6_=k5
e0_=k0 e1_=k3 e2_=k4 e3_=k1 e4_=k2 e5_=k6 e6_=k5
e0_=k2 e1_=k3 e2_=k6 e3_=k1 e4_=k0 e5_=k4 e6_=k5
e0_=k1 e1_=k6 e2_=k4 e3_=k0 e4_=k2 e5_=k3 e6_=k5
e0_=k2 e1_=k6 e2_=k3 e3_=k0 e4_=k1 e5_=k4 e6_=k5

Thus the graphs are isomorphic!

Conclusion

Sometimes recasting your problem in a different language can lead to surprisingly elegant algorithms (though not necessarily fast ones). In this blog post we saw how reinterpreting a graph as a mathematical expression allowed for creative algorithms by making use of the pattern matching capabilities of Symbolica.

Want to learn more? Join the discussion on the Symbolica Discord or Zulip!

Support Symbolica

Symbolica is free for hobbyists and a single-core instance is free for non-commercial use. Would you like to use additional features or have unrestricted access? Your organization can purchase a license and support the Symbolica project!