Code generation

Alkahest can compile symbolic expressions to fast native or GPU code. Compiled code bypasses Python entirely during evaluation.

The compilation pipeline

Expressions lower through multiple IR levels:

ExprPool (hash-consed DAG)
    ↓  e-graph extraction + canonicalization
Canonical expression form
    ↓  alkahest MLIR dialect
High-level MLIR (math-aware ops: horner, poly_eval, interval_eval)
    ↓  lowering passes
Standard MLIR (arith, math, linalg, gpu)
    ↓
LLVM IR / PTX / StableHLO (depending on target)
    ↓
Native machine code / GPU kernel / XLA

The custom alkahest MLIR dialect is where math-aware optimizations happen: Horner’s method for polynomials, fused multiply-add emission, numerically stable rearrangements via StabilityCost.
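The first stage of the pipeline, ExprPool, is described as a hash-consed DAG: structurally identical subexpressions are stored once and shared. The sketch below shows that idea in isolation. HashConsPool and its methods are hypothetical names for illustration, not Alkahest's ExprPool API.

```python
# A minimal hash-consing sketch (hypothetical; not Alkahest's ExprPool).
class HashConsPool:
    """Interns expression nodes so structurally equal nodes share one id."""

    def __init__(self):
        self._ids = {}    # (op, args) -> node id
        self.nodes = []   # node id -> (op, args)

    def intern(self, op, *args):
        key = (op, args)
        if key not in self._ids:          # only allocate genuinely new nodes
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self._ids[key]

    def symbol(self, name):
        return self.intern("sym", name)


pool = HashConsPool()
x = pool.symbol("x")
sq1 = pool.intern("pow", x, 2)
sq2 = pool.intern("pow", x, 2)   # same structure -> same id, no new node
expr = pool.intern("add", sq1, sq2)
print(sq1 == sq2, len(pool.nodes))   # True 3
```

Because equal subtrees collapse to one id, equality checks become integer comparisons and later passes (extraction, caching) can key on node identity.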

compile_expr

compile_expr produces a callable from a symbolic expression and a list of input variables:

from alkahest import ExprPool, compile_expr, sin, cos

pool = ExprPool()
x = pool.symbol("x")
y = pool.symbol("y")

f = compile_expr(x**2 + sin(y), [x, y])
print(f([3.0, 0.0]))   # 9.0

The callable takes a list of floats (one per variable) and returns a float. For batch evaluation see numpy_eval below.

When Alkahest is built without --features jit, compile_expr uses a fast Rust tree-walking interpreter instead of LLVM. The API is identical.
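A tree-walking interpreter evaluates an expression by recursing over its nodes on every call. The Python sketch below illustrates the technique on a hypothetical tuple encoding of x**2 + sin(y); it is not Alkahest's Rust interpreter, only the same general approach.

```python
import math

# Hypothetical node encoding: ("sym", name), ("const", value), or (op, *children).
def interpret(node, env):
    """Evaluate an expression tree by recursive descent (tree walking)."""
    kind = node[0]
    if kind == "sym":
        return env[node[1]]
    if kind == "const":
        return node[1]
    args = [interpret(child, env) for child in node[1:]]   # evaluate children first
    if kind == "add":
        return sum(args)
    if kind == "mul":
        return math.prod(args)
    if kind == "pow":
        return args[0] ** args[1]
    if kind == "sin":
        return math.sin(args[0])
    if kind == "cos":
        return math.cos(args[0])
    raise ValueError(f"unknown op: {kind}")


# x**2 + sin(y)
tree = ("add", ("pow", ("sym", "x"), ("const", 2)), ("sin", ("sym", "y")))
print(interpret(tree, {"x": 3.0, "y": 0.0}))   # 9.0
```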

eval_expr

For one-off evaluation without compiling:

from alkahest import eval_expr

result = eval_expr(x**2 + sin(y), {x: 3.0, y: 0.0})
print(result)   # 9.0

eval_expr is slower than a compiled function for repeated evaluation but has no compilation overhead.

numpy_eval

numpy_eval vectorises a compiled function over NumPy arrays via the batch path:

import numpy as np
from alkahest import compile_expr, numpy_eval, sin, cos

f = compile_expr(sin(x) * cos(x), [x])
xs = np.linspace(0, 2 * np.pi, 1_000_000)
ys = numpy_eval(f, xs)   # vectorised, zero-copy

numpy_eval also accepts PyTorch CPU tensors and JAX arrays via DLPack.
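A pure-NumPy reference is a convenient baseline for checking batch results. The sketch below uses only NumPy (no Alkahest): it evaluates sin(x)*cos(x) over the same grid with ufuncs and cross-checks it against the identity sin(x)cos(x) = sin(2x)/2. The commented-out comparison assumes numpy_eval returns an array of the same shape as xs.

```python
import numpy as np

xs = np.linspace(0.0, 2.0 * np.pi, 1_000_000)

# Reference: evaluate sin(x)*cos(x) directly with NumPy ufuncs.
reference = np.sin(xs) * np.cos(xs)

# Cross-check against the identity sin(x)cos(x) = sin(2x)/2.
identity = 0.5 * np.sin(2.0 * xs)
print(np.allclose(reference, identity))   # True

# With Alkahest installed (assumption: same shape as xs):
# ys = numpy_eval(f, xs)
# assert np.allclose(ys, reference)
```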

Horner-form emission

horner rewrites a polynomial expression into Horner’s form, which is numerically better conditioned and faster to evaluate:

from alkahest import horner

# x^3 + 2x^2 + 3x + 4 → x*(x*(x + 2) + 3) + 4
h = horner(x**3 + pool.integer(2)*x**2 + pool.integer(3)*x + pool.integer(4), x)
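Numerically, Horner's form evaluates a degree-n polynomial with n multiplications and n additions, and each step has the shape acc*x + c that a fused multiply-add can execute in one instruction. The plain-Python sketch below shows the evaluation scheme itself (not the alkahest horner function, which rewrites the symbolic expression); horner_eval and naive_eval are illustrative names.

```python
def horner_eval(coeffs, x):
    """Evaluate a polynomial by Horner's method.

    coeffs run from the highest-degree term down to the constant,
    so [1, 2, 3, 4] means x^3 + 2x^2 + 3x + 4.
    """
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c   # one multiply and one add per coefficient
    return acc


def naive_eval(coeffs, x):
    """Term-by-term evaluation with explicit powers, for comparison."""
    n = len(coeffs) - 1
    return sum(c * x ** (n - i) for i, c in enumerate(coeffs))


print(horner_eval([1, 2, 3, 4], 2.0))   # 26.0
print(naive_eval([1, 2, 3, 4], 2.0))    # 26.0
```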

emit_c emits a C function string for embedding in other projects:

from alkahest import emit_c

expr = x**2 + sin(y)
c_code = emit_c(expr, [x, y], fn_name="f")
# → "double f(double x, double y) { return ...; }"
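To make the idea behind a C emitter concrete, the sketch below renders a tuple-encoded expression tree as a C function string of the same shape. It is a hypothetical illustration (emit_c_sketch and the tuple encoding are not Alkahest's API), and Alkahest's actual output text may differ, for example in parenthesization or how it spells powers.

```python
# Hypothetical node encoding: ("sym", name), ("const", value), or (op, *children).
C_UNARY = {"sin", "cos", "exp", "log", "sqrt"}   # names that map 1:1 to <math.h>


def to_c(node):
    """Render an expression tree as a C expression string."""
    kind = node[0]
    if kind == "sym":
        return node[1]
    if kind == "const":
        return repr(float(node[1]))   # e.g. 2 -> "2.0", a double literal
    args = [to_c(child) for child in node[1:]]
    if kind == "add":
        return "(" + " + ".join(args) + ")"
    if kind == "mul":
        return "(" + " * ".join(args) + ")"
    if kind == "pow":
        return f"pow({args[0]}, {args[1]})"
    if kind in C_UNARY:
        return f"{kind}({args[0]})"
    raise ValueError(f"unknown op: {kind}")


def emit_c_sketch(node, params, fn_name="f"):
    """Wrap the expression in a C function taking one double per parameter."""
    signature = ", ".join(f"double {p}" for p in params)
    return f"double {fn_name}({signature}) {{ return {to_c(node)}; }}"


tree = ("add", ("pow", ("sym", "x"), ("const", 2)), ("sin", ("sym", "y")))
print(emit_c_sketch(tree, ["x", "y"]))
# double f(double x, double y) { return (pow(x, 2.0) + sin(y)); }
```

The emitted string depends on pow and sin, so the embedding project needs #include <math.h> and typically links with -lm.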

MLIR dialect

The alkahest-mlir crate exposes the custom MLIR dialect. The dialect ops are:

Op	Description
`alkahest.sym`	Symbolic variable reference
`alkahest.const`	Constant value
`alkahest.add`, `alkahest.mul`	Arithmetic
`alkahest.pow`	Exponentiation
`alkahest.horner`	Horner polynomial evaluation
`alkahest.poly_eval`	Generic polynomial evaluation
`alkahest.series_taylor`	Taylor series evaluation
`alkahest.interval_eval`	Ball arithmetic evaluation
`alkahest.rational_fn`	Rational function evaluation
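alkahest.interval_eval evaluates with ball arithmetic, where each value is a midpoint plus a radius that bounds the error. The sketch below shows the basic enclosure rules for addition and multiplication. It is an illustration only: it ignores floating-point rounding (production ball arithmetic rounds the radius outward) and is not the dialect op's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    """A real number known to lie in [mid - rad, mid + rad]."""
    mid: float
    rad: float

    def __add__(self, other):
        return Ball(self.mid + other.mid, self.rad + other.rad)

    def __mul__(self, other):
        # For x in ball 1 and y in ball 2:
        # |x*y - m1*m2| <= |m1|*r2 + |m2|*r1 + r1*r2
        return Ball(
            self.mid * other.mid,
            abs(self.mid) * other.rad + abs(other.mid) * self.rad + self.rad * other.rad,
        )

    def contains(self, value):
        return self.mid - self.rad <= value <= self.mid + self.rad


a = Ball(3.0, 0.25)   # some x in [2.75, 3.25]
b = Ball(2.0, 0.5)    # some y in [1.5, 2.5]
c = a * b + a         # enclosure of x*y + x
print(c)              # Ball(mid=9.0, rad=2.375)
```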

Three lowering targets are available:

ArithMath — lowers to arith + math MLIR dialects; uses math.fma for Horner chains
StableHlo — lowers to StableHLO ops for XLA/JAX integration
Llvm — lowers to llvm dialect for LLVM IR / PTX emission
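The ArithMath target is described as using math.fma for Horner chains. The sketch below generates MLIR-style text for that pattern: one arith.constant per coefficient, then one math.fma (computing a*b + c) per Horner step. arith.constant and math.fma are upstream MLIR ops, but the function and the exact text layout are hypothetical; this is not the output of Alkahest's lowering pass and has not been checked with mlir-opt.

```python
def horner_fma_mlir(coeffs, arg="%x"):
    """Emit MLIR-style ops evaluating a polynomial as an FMA chain.

    coeffs run from the highest-degree term down to the constant term.
    Each Horner step acc = acc * x + c becomes one math.fma op.
    """
    lines = [f"%c{i} = arith.constant {float(c)!r} : f64" for i, c in enumerate(coeffs)]
    acc = "%c0"   # start from the leading coefficient
    for i in range(1, len(coeffs)):
        lines.append(f"%acc{i} = math.fma {acc}, {arg}, %c{i} : f64")
        acc = f"%acc{i}"
    return "\n".join(lines), acc


body, result = horner_fma_mlir([1, 2, 3, 4])   # x^3 + 2x^2 + 3x + 4
print(body)
print("result in", result)
```

For a cubic this yields three math.fma ops chained through %acc1, %acc2 and %acc3, matching the x*(x*(x + 2) + 3) + 4 nesting of the Horner example.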

from alkahest import to_stablehlo

# Emit textual MLIR in the StableHLO dialect
mlir_text = to_stablehlo(expr, [x, y], fn_name="my_fn")
print(mlir_text)  # valid input to mlir-opt / XLA

GPU codegen (NVPTX)

With --features cuda and an LLVM installation with NVPTX support:

from alkahest import compile_cuda

f_gpu = compile_cuda(expr, [x, y])
result = f_gpu.call_batch(inputs)   # runs on the first CUDA device

The GPU compiler:

Lowers the expression through inkwell to NVPTX LLVM IR for sm_86 (Ampere)
Links libdevice.10.bc for transcendental functions (__nv_sin, etc.)
Emits PTX via LLVM’s target machine
Loads the PTX via the CUDA driver (cudarc)

The benchmark nvptx/nvptx_polynomial_1M shows a 16.2× speedup over the CPU JIT for a 1M-point polynomial evaluation on an RTX 3090.

Upcoming (v1.1): AMD ROCm / amdgcn target (hardware-blocked pending RDNA3 availability).

Compilation results are cached, keyed by the canonical hash of the expression DAG, so compiling the same expression twice returns the cached result. The persistent ExprPool (V1-14) extends this cache across sessions.

Small expressions below a complexity threshold skip LLVM entirely and run through the Rust interpreter, which has lower overhead for trivial expressions.
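The dispatch rule can be sketched as a size check before choosing a backend. The threshold value and the use of node count as the complexity measure are assumptions for illustration; this page does not state Alkahest's actual cut-off or metric.

```python
# Hypothetical threshold; Alkahest's real cut-off is not documented here.
JIT_THRESHOLD = 8


def node_count(node):
    """Count nodes in a tuple-encoded expression tree (assumed complexity measure)."""
    if node[0] in ("sym", "const"):
        return 1
    return 1 + sum(node_count(child) for child in node[1:])


def choose_backend(node, threshold=JIT_THRESHOLD):
    """Send small expressions to the interpreter, larger ones to the JIT."""
    return "interpreter" if node_count(node) < threshold else "llvm-jit"


small = ("add", ("sym", "x"), ("const", 1))                                 # 3 nodes
large = ("add", *[("mul", ("sym", "x"), ("const", k)) for k in range(4)])   # 13 nodes
print(choose_backend(small), choose_backend(large))   # interpreter llvm-jit
```

The trade-off is that JIT compilation has a fixed up-front cost, which only pays off once the expression is large enough or evaluated often enough.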

Alkahest — User Guide