```
from symbolica import *
= Expression.symbols('f', 'x_')
f, x_ = f(10)
e = e.replace_all(f(x_), x_*f(x_-1), x_.req_gt(0), repeat=True)
e = e.replace_all(f(0), 1)
e print(e)
```

`3628800`

pattern matching

graph theory

Author

Ben Ruijl

Published

August 14, 2024

Support Symbolica

This blog post uses the computational framework Symbolica. Symbolica is **free** for hobbyists and a single-core instance is **free** for non-commercial use. Would you like to have new features added? Your organization can purchase a license and support the Symbolica project!

Pattern matching is all around us. It is what you do when you solve puzzles, and it lies at the core of some programming languages. The evolution of a cell in Conway’s game of life is completely determined by the pattern of the 8 surrounding cells:

Another common use of pattern matching is *regular expressions* (regex). This is a language that describes patterns in text. Below you see a regex to parse an e-mail address:

`^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*@[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*$`

The text needs to start (`^`

) with an alpha-numerical character (`[a-zA-Z0-9]`

) and must contain `@`

followed by at least one alpha-numerical character. This pattern will actually not match all valid e-mail addresses! For that you need this horrendous regex. Yikes.

Pattern matching is also ubiquitous in mathematics. For example, in the recursive definition of the factorial function: \[ \begin{align} f(0) &= 1\\ f(n) &= n f(n - 1);\quad n > 0 \end{align} \]

Let’s see how we can code the above relation using the Symbolica library in Python. We need to define a function `f`

and we need a *wildcard* that can match any argument of the function `f`

. Wildcards are variables that end with an underscore (`_`

). We will use `x_`

.

```
from symbolica import *
f, x_ = Expression.symbols('f', 'x_')
e = f(10)
e = e.replace_all(f(x_), x_*f(x_-1), x_.req_gt(0), repeat=True)
e = e.replace_all(f(0), 1)
print(e)
```

`3628800`

That indeed equals 10 factorial. Note that we use the restriction `x_.req_gt(0)`

, which means that the pattern will only match if `x_`

is greater than 0.

Great, now let’s try to design some algorithms using only pattern matching and replacements on mathematical expressions. These algorithms will not be the most efficient, but they could be elegant and could unlock a different way of thinking.

The first challenge is to represent a problem as a mathematical expression. This will feel a bit alien at first, but remember that LLMs receive an array of integers as input and they function just fine ;)

Let’s write a graph as a mathematical expression. Graphs have edges and nodes/vertices. One possible way of writing a graph is by defining a function `v`

that represents a node/vertex. The first argument is the label of the vertex and the remainder are labels of the edges that connect to that vertex. We can then *multiply* these vertices together. For example:

`v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)`

represents the following graph:

In Symbolica, it looks like:

```
from symbolica import *
v, A, B, C = Expression.symbols('v', 'A', 'B', 'C')
g = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
```

To represent graphs with directional edges, one could introduce an edge function that specifies the direction. For example, `e(2,B,A)`

indicates that the edge `2`

flows from vertex `B`

to `A`

. Again, this `e(...)`

can be *multiplied* into the term.

Now that we have a basic mapping of a graph as a term (an expression with multiplications and no additions at the top level) we can manipulate it with pattern matching. For example, we can add two graphs:

```
from symbolica import *
v, A, B, C = Expression.symbols('v', 'A', 'B', 'C')
g1 = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
g2 = v(A,1,2)*v(B,1,2)
g = 3*g1 + 2*g2
print(g)
```

`2*v(A,1,2)*v(B,1,2)+3*v(C,4,1)*v(A,1,2,3)*v(B,2,3,4)`

What sense does this make? It depends on the context. In particle physics one can express probabilities of collision events in terms of a weighted sum of Feynman graphs, so this `g`

could be something that can be measured in a detector!

Now let us try to extract some basic properties of our graph using pattern matching. For example, we can answer the question whether a graph is connected or if the input consists of two or more connected components (disjoint graphs).

How to go about this? Well, we realize that the number of connected components does not change if we *shrink* an edge; any edge connects two nodes of the graph, which we can fuse into one. For example, in the example graph above, we can shrink the edge `4`

, which fuses `B`

and `C`

. We get:

Now we can pick any other edge and shrink that too. For example, edge `2`

:

We are now left with a single vertex that contains all our edges. We shrank the entire graph to a point, and therefore it is one component! To conclude, if we are left with \(n\) vertices after shrinking every edge, we have \(n\) connected components.

Let’s write the algorithm in Symbolica. We need to be able to match any edge in a vertex. To achieve that, we define ranged wildcards `l1___`

, `r1___`

,`l2___`

and `r2___`

that can match *any* number of arguments (see here for more information).

```
from symbolica import *
v, A, B, C, x_, l1___, l2___, r1___, r2___ = Expression.symbols('v', 'A', 'B', 'C', 'x_',
'l1___', 'l2___', 'r1___', 'r2___')
g = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
g = g.replace_all(v(l1___,x_,r1___)*v(l2___,x_,r2___),
v(l1___,x_,x_,r1___,l2___,r2___), repeat=True)
print(g)
```

`v(A,B,C,1,1,2,2,3,3,4,4)`

We are left with one vertex and therefore the graph is connected. Thus, in just a single line we have written an algorithm that tests if a graph is connected. Not bad!

One may wonder how many cycles/loops a graph has. Our example has two loops, as can be clearly seen by eye. But how many loops does the following graph have?

The non-planarity (crossings of edges) makes it hard to count! The representation in `v`

-notation is sadly not more illuminating:

`v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)`

Fortunately, Euler derived a formula to determine the number of cycles/loops:

\[ L = E - V + C \] where \(L\) is the number of loops, \(E\) is the number of edges, \(V\) the number of vertices and \(C\) the number of connected components. For now, we will always work with one connected component, so \(C=1\). What remains is to count the number of vertices and the number of edges. Since each edge appears twice in our representation — once in the vertex of its left end and once in the vertex of its right end — we can count the total number of arguments of all the vertex functions and divide by two!

We only want to use pattern matching and replacements, so we will find a way to represent a counter for the number of vertices in the mathematical expression. We define the variable `nv`

whose *power* will determine the number of vertices and `ne`

whose power will determine the total number of edges.

Then if we replace all `v(...)`

by `v(...)*nv`

, the power of `nv`

will represent the number of vertices:

```
from symbolica import *
v, A, B, C, D, E, F, G, H, nv, x___ = Expression.symbols('v', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'nv', 'x___')
g = v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
g = g.replace_all(v(x___), v(x___)*nv)
print(g)
```

`nv^8*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)`

So we have 8 vertices. Note that this match-and-replace works just as well if we have a sum of graphs: each graph will be multiplied by the correct number of `nv`

s.

For the counting of edges, we use the `nargs`

function that counts the number of arguments:

`nv^8*ne^10*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)`

By using `transform()`

, the function `nargs()`

is applied to the expression `x__`

*after* it has been replaced by its matched content. We subtract `1`

because the first argument of the function is the vertex id and not an edge.

With this one-liner we get the number of vertices (`8`

) and the number of edges (`10`

).

Now we do the final transformation to obtain the number of loops:

which will subtract the power of `nv`

from the power of `ne`

(computing \(E-V\)). Finally we add 1 (the number of connected components) to it by multiplying `ne`

.

Putting it all together:

```
from symbolica import *
v, A, B, C, D, E, F, G, H, nv, ne, x___ = Expression.symbols('v', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'nv', 'ne', 'x___')
g = v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
g = g.replace_all(v(x___), v(x___)*nv*ne**((x___.transform().nargs() - 1) / 2))
g = g.replace_all(nv, ne**-1) * ne
print(g)
```

`ne^3*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)`

We find that graph has three loops!

A different way to obtain the number of cycles is realizing that the shrinking algorithm we saw before is constructing a *spanning tree*: it is visiting every vertex exactly once.

If we leave out the traversed edge `x_`

on the right-hand side of the shrinking algorithm:

the remaining edges are the edges that do not belong to the spanning tree and form a fundamental cycle basis (also called a loop momentum basis in particle physics). Since the edges will appear twice, we just have to divide the argument length by 2 to get the number of cycles!

One may wonder whether two graphs that look different are actually the same, once one relabels the vertices and edges. We can achieve this using pattern matching! But how do we construct a pattern that has the same connectivity as the graph at hand?

We first need to make a modification to our graph definition. For simplicity, we drop the vertex labels. We will also make each vertex symmetric in its arguments, since there is no preferred edge ordering.

This is our setup and test graph:

```
v = Expression.symbol('vs', is_symmetric=True)
e_ = Expression.symbols(*['e{}_'.format(i) for i in range(7)])
graph = v(0, 1, 4)*v(1, 2, 5, 6)*v(0, 2, 3)*v(3, 4, 5)
```

Now comes the crux: we replace each edge identifier by its associated wildcard by indexing into the `e_`

array.

```
graph_pat = graph
for edge, edge_pat in enumerate(e_):
graph_pat = graph_pat.replace_all(
edge, edge_pat, allow_new_wildcards_on_rhs=True)
```

Normally Symbolica will not allow new wildcards on the right-hand side of a replacement, but this can be explicitly enabled using `allow_new_wildcards_on_rhs=True`

.

We get the graph pattern:

`v(e0_, e1_, e4_)*v(e1_, e2_, e5_, e6_)*v(e0_, e2_, e3_)*v(e3_, e4_, e5_)`

Let us try to test if the graph in the input is the same as the following graph:

```
k = Expression.symbols(*['k{}'.format(i) for i in range(7)])
graph2 = v(k[0], k[2], k[3])*v(k[3], k[4], k[5], k[6]) * \
v(k[0], k[1], k[4])*v(k[1], k[2], k[6])
```

We get the following possible mappings:

```
e0_=k0 e1_=k4 e2_=k3 e3_=k2 e4_=k1 e5_=k6 e6_=k5
e0_=k1 e1_=k4 e2_=k6 e3_=k2 e4_=k0 e5_=k3 e6_=k5
e0_=k0 e1_=k3 e2_=k4 e3_=k1 e4_=k2 e5_=k6 e6_=k5
e0_=k2 e1_=k3 e2_=k6 e3_=k1 e4_=k0 e5_=k4 e6_=k5
e0_=k1 e1_=k6 e2_=k4 e3_=k0 e4_=k2 e5_=k3 e6_=k5
e0_=k2 e1_=k6 e2_=k3 e3_=k0 e4_=k1 e5_=k4 e6_=k5
```

Thus the graphs are isomorphic!

Sometimes recasting your problem in a different language can lead to surprisingly elegant algorithms (though not necessarily fast ones). In this blog post we saw how reinterpreting a graph as a mathematical expression allowed for creative algorithms by making use of the pattern matching capabilities of Symbolica.

Want to learn more? Join the discussion on the Symbolica Discord or Zulip!

Symbolica is **free** for hobbyists and a single-core instance is **free** for non-commercial use. Would you like to use additional features or have unrestricted access? Your organization can purchase a license and support the Symbolica project!