from symbolica import *
= Expression.symbols('f', 'x_')
f, x_ = f(10)
e = e.replace_all(f(x_), x_*f(x_-1), x_.req_gt(0), repeat=True)
e = e.replace_all(f(0), 1)
e print(e)
3628800
Ben Ruijl
August 14, 2024
This blog post uses the computational framework Symbolica. Symbolica is free for hobbyists and a single-core instance is free for non-commercial use. Would you like to have new features added? Your organization can purchase a license and support the Symbolica project!
Pattern matching is all around us. It is what you do when you solve puzzles, and it lies at the core of some programming languages. The evolution of a cell in Conway’s game of life is completely determined by the pattern of the 8 surrounding cells:
Another common use of pattern matching is regular expressions (regex). This is a language that describes patterns in text. Below you see a regex to parse an e-mail address:
^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*@[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*$
The text needs to start (^
) with an alpha-numerical character ([a-zA-Z0-9]
) and must contain @
followed by at least one alpha-numerical character. This pattern will actually not match all valid e-mail addresses! For that you need this horrendous regex. Yikes.
Pattern matching is also ubiquitous in mathematics. For example, in the recursive definition of the factorial function:
Let’s see how we can code the above relation using the Symbolica library in Python. We need to define a function f
and we need a wildcard that can match any argument of the function f
. Wildcards are variables that end with an underscore (_
). We will use x_
.
from symbolica import *
f, x_ = Expression.symbols('f', 'x_')
e = f(10)
e = e.replace_all(f(x_), x_*f(x_-1), x_.req_gt(0), repeat=True)
e = e.replace_all(f(0), 1)
print(e)
3628800
That indeed equals 10 factorial. Note that we use the restriction x_.req_gt(0)
, which means that the pattern will only match if x_
is greater than 0.
Great, now let’s try to design some algorithms using only pattern matching and replacements on mathematical expressions. These algorithms will not be the most efficient, but they could be elegant and could unlock a different way of thinking.
The first challenge is to represent a problem as a mathematical expression. This will feel a bit alien at first, but remember that LLMs receive an array of integers as input and they function just fine ;)
Let’s write a graph as a mathematical expression. Graphs have edges and nodes/vertices. One possible way of writing a graph is by defining a function v
that represents a node/vertex. The first argument is the label of the vertex and the remainder are labels of the edges that connect to that vertex. We can then multiply these vertices together. For example:
v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
represents the following graph:
In Symbolica, it looks like:
from symbolica import *
v, A, B, C = Expression.symbols('v', 'A', 'B', 'C')
g = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
To represent graphs with directional edges, one could introduce an edge function that specifies the direction. For example, e(2,B,A)
indicates that the edge 2
flows from vertex B
to A
. Again, this e(...)
can be multiplied into the term.
Now that we have a basic mapping of a graph as a term (an expression with multiplications and no additions at the top level) we can manipulate it with pattern matching. For example, we can add two graphs:
from symbolica import *
v, A, B, C = Expression.symbols('v', 'A', 'B', 'C')
g1 = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
g2 = v(A,1,2)*v(B,1,2)
g = 3*g1 + 2*g2
print(g)
2*v(A,1,2)*v(B,1,2)+3*v(C,4,1)*v(A,1,2,3)*v(B,2,3,4)
What sense does this make? It depends on the context. In particle physics one can express probabilities of collision events in terms of a weighted sum of Feynman graphs, so this g
could be something that can be measured in a detector!
Now let us try to extract some basic properties of our graph using pattern matching. For example, we can answer the question whether a graph is connected or if the input consists of two or more connected components (disjoint graphs).
How to go about this? Well, we realize that the number of connected components does not change if we shrink an edge; any edge connects two nodes of the graph, which we can fuse into one. For example, in the example graph above, we can shrink the edge 4
, which fuses B
and C
. We get:
Now we can pick any other edge and shrink that too. For example, edge 2
:
We are now left with a single vertex that contains all our edges. We shrank the entire graph to a point, and therefore it is one component! To conclude, if we are left with
Let’s write the algorithm in Symbolica. We need to be able to match any edge in a vertex. To achieve that, we define ranged wildcards l1___
, r1___
,l2___
and r2___
that can match any number of arguments (see here for more information).
from symbolica import *
v, A, B, C, x_, l1___, l2___, r1___, r2___ = Expression.symbols('v', 'A', 'B', 'C', 'x_',
'l1___', 'l2___', 'r1___', 'r2___')
g = v(A,1,2,3)*v(B,2,3,4)*v(C,4,1)
g = g.replace_all(v(l1___,x_,r1___)*v(l2___,x_,r2___),
v(l1___,x_,x_,r1___,l2___,r2___), repeat=True)
print(g)
v(A,B,C,1,1,2,2,3,3,4,4)
We are left with one vertex and therefore the graph is connected. Thus, in just a single line we have written an algorithm that tests if a graph is connected. Not bad!
One may wonder how many cycles/loops a graph has. Our example has two loops, as can be clearly seen by eye. But how many loops does the following graph have?
The non-planarity (crossings of edges) makes it hard to count! The representation in v
-notation is sadly not more illuminating:
v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
Fortunately, Euler derived a formula to determine the number of cycles/loops:
We only want to use pattern matching and replacements, so we will find a way to represent a counter for the number of vertices in the mathematical expression. We define the variable nv
whose power will determine the number of vertices and ne
whose power will determine the total number of edges.
Then if we replace all v(...)
by v(...)*nv
, the power of nv
will represent the number of vertices:
from symbolica import *
v, A, B, C, D, E, F, G, H, nv, x___ = Expression.symbols('v', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'nv', 'x___')
g = v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
g = g.replace_all(v(x___), v(x___)*nv)
print(g)
nv^8*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)
So we have 8 vertices. Note that this match-and-replace works just as well if we have a sum of graphs: each graph will be multiplied by the correct number of nv
s.
For the counting of edges, we use the nargs
function that counts the number of arguments:
nv^8*ne^10*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)
By using transform()
, the function nargs()
is applied to the expression x__
after it has been replaced by its matched content. We subtract 1
because the first argument of the function is the vertex id and not an edge.
With this one-liner we get the number of vertices (8
) and the number of edges (10
).
Now we do the final transformation to obtain the number of loops:
which will subtract the power of nv
from the power of ne
(computing ne
.
Putting it all together:
from symbolica import *
v, A, B, C, D, E, F, G, H, nv, ne, x___ = Expression.symbols('v', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'nv', 'ne', 'x___')
g = v(A,1)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(G,4,5,6)*v(F,6,7,9)*v(E,8,9,10)*v(H,5)
g = g.replace_all(v(x___), v(x___)*nv*ne**((x___.transform().nargs() - 1) / 2))
g = g.replace_all(nv, ne**-1) * ne
print(g)
ne^3*v(A,1)*v(H,5)*v(B,1,2,8)*v(C,2,3,9)*v(D,3,4,10)*v(E,8,9,10)*v(F,6,7,9)*v(G,4,5,6)
We find that graph has three loops!
A different way to obtain the number of cycles is realizing that the shrinking algorithm we saw before is constructing a spanning tree: it is visiting every vertex exactly once.
If we leave out the traversed edge x_
on the right-hand side of the shrinking algorithm:
the remaining edges are the edges that do not belong to the spanning tree and form a fundamental cycle basis (also called a loop momentum basis in particle physics). Since the edges will appear twice, we just have to divide the argument length by 2 to get the number of cycles!
One may wonder whether two graphs that look different are actually the same, once one relabels the vertices and edges. We can achieve this using pattern matching! But how do we construct a pattern that has the same connectivity as the graph at hand?
We first need to make a modification to our graph definition. For simplicity, we drop the vertex labels. We will also make each vertex symmetric in its arguments, since there is no preferred edge ordering.
This is our setup and test graph:
v = Expression.symbol('vs', is_symmetric=True)
e_ = Expression.symbols(*['e{}_'.format(i) for i in range(7)])
graph = v(0, 1, 4)*v(1, 2, 5, 6)*v(0, 2, 3)*v(3, 4, 5)
Now comes the crux: we replace each edge identifier by its associated wildcard by indexing into the e_
array.
graph_pat = graph
for edge, edge_pat in enumerate(e_):
graph_pat = graph_pat.replace_all(
edge, edge_pat, allow_new_wildcards_on_rhs=True)
Normally Symbolica will not allow new wildcards on the right-hand side of a replacement, but this can be explicitly enabled using allow_new_wildcards_on_rhs=True
.
We get the graph pattern:
v(e0_, e1_, e4_)*v(e1_, e2_, e5_, e6_)*v(e0_, e2_, e3_)*v(e3_, e4_, e5_)
Let us try to test if the graph in the input is the same as the following graph:
k = Expression.symbols(*['k{}'.format(i) for i in range(7)])
graph2 = v(k[0], k[2], k[3])*v(k[3], k[4], k[5], k[6]) * \
v(k[0], k[1], k[4])*v(k[1], k[2], k[6])
We get the following possible mappings:
e0_=k0 e1_=k4 e2_=k3 e3_=k2 e4_=k1 e5_=k6 e6_=k5
e0_=k1 e1_=k4 e2_=k6 e3_=k2 e4_=k0 e5_=k3 e6_=k5
e0_=k0 e1_=k3 e2_=k4 e3_=k1 e4_=k2 e5_=k6 e6_=k5
e0_=k2 e1_=k3 e2_=k6 e3_=k1 e4_=k0 e5_=k4 e6_=k5
e0_=k1 e1_=k6 e2_=k4 e3_=k0 e4_=k2 e5_=k3 e6_=k5
e0_=k2 e1_=k6 e2_=k3 e3_=k0 e4_=k1 e5_=k4 e6_=k5
Thus the graphs are isomorphic!
Sometimes recasting your problem in a different language can lead to surprisingly elegant algorithms (though not necessarily fast ones). In this blog post we saw how reinterpreting a graph as a mathematical expression allowed for creative algorithms by making use of the pattern matching capabilities of Symbolica.
Want to learn more? Join the discussion on the Symbolica Discord or Zulip!
Symbolica is free for hobbyists and a single-core instance is free for non-commercial use. Would you like to use additional features or have unrestricted access? Your organization can purchase a license and support the Symbolica project!