Term streaming

In many scenarios an expression contains many terms, where each term is relatively small. Also often, symbolic operations can be performed term by term. The list of common operations that cannot be done term per term is short:

Factorization
GCD computation
Horner scheme construction
Pattern match \(a+b\)
…

The following operations can be done term per term:

Most pattern matches, such as \(y f(x)\)
Expansion
Differentiation
Integration
…

When using some global memory, operations such as finding the maximum term, or counting the number of terms that satisfy a certain condition can also be done term-by-term.

With this in mind, we realize that for many operations the expression does not need to be in memory as a whole at the same time: only one term needs to be in memory. Once it is processed, it can be written to disk. We call this term streaming and it allows for huge expressions to be processed.

Symbolica can convert an expression into a stream and it provides an easy way to perform operations on terms in parallel by using a map. The input of the map is one term and the output should be an expression. In the example below a derivative is applied to each term, in parallel.

Python
Rust

from symbolica import *

x, x_, f = Expression.symbol('x', 'x_', 'f')
e = Expression.parse('x+ f(x) + 2*f(y) + 7*f(z)')

s = TermStreamer(n_cores=2)
s.push(e)
e = s.map(Transformer().replace_all(f(x_), f(x) + x))
print(s.to_expression())

More about Transformers can be read here.

    let input = parse!("x+ f(x) + 2*f(y) + 7*f(z)").unwrap();
    let pattern = Pattern::parse("f(x_)").unwrap();
    let rhs = Pattern::parse("f(x) + x").unwrap();

    let mut stream = TermStreamer::<CompressorWriter<_>>::new(TermStreamerConfig {
        n_cores: 4,
        path: ".".to_owned(),
        max_mem_bytes: 40,
    });
    stream.push(input);

    // map every term in the expression
    stream = stream.map(|x| pattern.replace_all(x.as_view(), &rhs, None, None).expand());

    let res = stream.to_expression();
    println!("\t+ {}", res);

Normalization

Since every term is processed independently, there is no merging of terms (for example x + 2x = 3x). To force the normalization of the expression, the user can call normalize on the term stream. This will sort the entire expression using a merge sort (potentially disk-based).

Global information

Information that involves comparison between terms can be collected using appropriate synchronized structures, like atomics or mutexes. Currently, this is only possible in Rust. Note that due to Rust’s borrow checker, there can be no memory problems such as race conditions.

In the example below, the maximum numerical argument of the function f over all terms is determined (in parallel).

Python
Rust

Not available yet.

Obtain the maximal function argument using an atomic integer.

use std::sync::atomic::{AtomicI64, Ordering};

use symbolica::{
    coefficient::CoefficientView,
    id::{Condition, Match, MatchSettings, Pattern},
    atom::{Atom, AtomCore, AtomView},
    streaming::TermStreamer,
};

fn main() {
    let x_ = symbol!("x_");
    let input = parse!("f(1)+f(2)+f(3)+f(4)").unwrap();
    let pattern = Pattern::parse("f(x_)").unwrap();

    let mut stream = TermStreamer::default();
    stream.push(input);

    let max_argument = AtomicI64::new(0);

    // map every term in the expression
    stream.map(|_, term| {
        if let Some((_, _, _, m)) = pattern
            .pattern_match(
                term.as_view(),
                &Condition::default(),
                &MatchSettings::default(),
            )
            .next()
        {
            if let Some(Match::Single(n)) = m.get(x_) {
                if let AtomView::Num(nn) = n {
                    if let CoefficientView::Natural(a, 1) = nn.get_coeff_view() {
                        max_argument.fetch_max(a, Ordering::SeqCst);
                    }
                }
            }
        }

        term
    });

    println!("Max argument: {}", max_argument.load(Ordering::SeqCst));
}