Code generation

Alkahest can compile symbolic expressions to fast native or GPU code. Compiled code bypasses Python entirely during evaluation.

The compilation pipeline

Expressions lower through multiple IR levels:

ExprPool (hash-consed DAG)
    ↓  e-graph extraction + canonicalization
Canonical expression form
    ↓  alkahest MLIR dialect
High-level MLIR (math-aware ops: horner, poly_eval, interval_eval)
    ↓  lowering passes
Standard MLIR (arith, math, linalg, gpu)
    ↓
LLVM IR / PTX / StableHLO (depending on target)
    ↓
Native machine code / GPU kernel / XLA

The custom alkahest MLIR dialect is where math-aware optimizations happen: Horner’s method for polynomials, fused multiply-add emission, numerically stable rearrangements via StabilityCost.
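A plain-Python sketch (not the MLIR pipeline itself) of why the Horner rewrite pays off: evaluating c0 + c1·x + … + cn·xⁿ term by term repeatedly re-computes powers of x, while Horner form needs only n multiplies and n adds, each of which the lowering passes can fuse into a single FMA.

```python
def eval_naive(coeffs, x):
    """coeffs[i] is the coefficient of x**i; recomputes powers each term."""
    return sum(c * x**i for i, c in enumerate(coeffs))

def eval_horner(coeffs, x):
    """Horner form: one multiply-add per coefficient, FMA-friendly."""
    acc = 0.0
    for c in reversed(coeffs):   # highest degree first
        acc = acc * x + c
    return acc

coeffs = [4.0, 3.0, 2.0, 1.0]    # x^3 + 2x^2 + 3x + 4
assert eval_naive(coeffs, 2.0) == eval_horner(coeffs, 2.0) == 26.0
```

The two forms agree exactly here; on ill-conditioned polynomials Horner form also accumulates less rounding error, which is what StabilityCost scores during extraction.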

compile_expr

compile_expr produces a callable from a symbolic expression and a list of input variables:

from alkahest import ExprPool, compile_expr, sin, cos

pool = ExprPool()
x = pool.symbol("x")
y = pool.symbol("y")

f = compile_expr(x**2 + sin(y), [x, y])
print(f([3.0, 0.0]))   # 9.0

The callable takes a list of floats (one per variable) and returns a float. For batch evaluation see numpy_eval below.

Without --features jit, a fast Rust tree-walking interpreter is used instead of LLVM. The API is identical.
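To make the fallback path concrete, here is an illustrative Python sketch of a tree-walking interpreter over tuple-encoded expressions. The real fallback is implemented in Rust, and the node tags used here are hypothetical, not Alkahest's internal representation.

```python
import math

def walk(node, env):
    """Recursively evaluate a tuple-encoded expression tree."""
    tag = node[0]
    if tag == "sym":   return env[node[1]]
    if tag == "const": return node[1]
    if tag == "add":   return walk(node[1], env) + walk(node[2], env)
    if tag == "mul":   return walk(node[1], env) * walk(node[2], env)
    if tag == "pow":   return walk(node[1], env) ** walk(node[2], env)
    if tag == "sin":   return math.sin(walk(node[1], env))
    raise ValueError(f"unknown node: {tag}")

# x**2 + sin(y), matching the compile_expr example above
expr = ("add", ("pow", ("sym", "x"), ("const", 2)), ("sin", ("sym", "y")))
assert walk(expr, {"x": 3.0, "y": 0.0}) == 9.0
```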

eval_expr

For one-off evaluation without compiling:

from alkahest import eval_expr

result = eval_expr(x**2 + sin(y), {x: 3.0, y: 0.0})
print(result)   # 9.0

eval_expr is slower than a compiled function for repeated evaluation but has no compilation overhead.

numpy_eval

numpy_eval vectorises a compiled function over NumPy arrays via the batch path:

import numpy as np
from alkahest import numpy_eval

f = compile_expr(sin(x) * cos(x), [x])
xs = np.linspace(0, 2 * np.pi, 1_000_000)
ys = numpy_eval(f, xs)   # vectorised, zero-copy

Also accepts PyTorch CPU tensors and JAX arrays via DLPack.
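The DLPack exchange that makes this possible can be demonstrated with NumPy alone: np.from_dlpack consumes any DLPack-capable array (a PyTorch CPU tensor, a JAX array, or another NumPy array) without copying the buffer. A NumPy-to-NumPy round trip keeps the sketch self-contained.

```python
import numpy as np

# np.from_dlpack wraps the producer's buffer rather than copying it;
# the same mechanism lets numpy_eval accept foreign arrays zero-copy.
a = np.linspace(0.0, 1.0, 4)
b = np.from_dlpack(a)

assert np.shares_memory(a, b)   # same underlying buffer, no copy made
```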

Horner-form emission

horner rewrites a polynomial expression into Horner’s form, which is numerically better conditioned and faster to evaluate:

from alkahest import horner

# x^3 + 2x^2 + 3x + 4 → x*(x*(x + 2) + 3) + 4
h = horner(x**3 + pool.integer(2)*x**2 + pool.integer(3)*x + pool.integer(4), x)

C emission

emit_c emits a C function string for embedding in other projects:

from alkahest import emit_c

c_code = emit_c(expr, [x, y], fn_name="f")
# → "double f(double x, double y) { return ...; }"
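The idea behind such an emitter can be sketched in a few lines of Python: walk an expression tree and print C source. The tuple encoding and the helper name emit_c_sketch are illustrative, not Alkahest's internals.

```python
def emit(node):
    """Render one tuple-encoded expression node as a C expression."""
    tag = node[0]
    if tag == "sym":   return node[1]
    if tag == "const": return repr(float(node[1]))
    if tag == "add":   return f"({emit(node[1])} + {emit(node[2])})"
    if tag == "mul":   return f"({emit(node[1])} * {emit(node[2])})"
    if tag == "sin":   return f"sin({emit(node[1])})"
    raise ValueError(tag)

def emit_c_sketch(node, args, fn_name="f"):
    """Wrap the rendered expression in a C function definition."""
    params = ", ".join(f"double {a}" for a in args)
    return f"double {fn_name}({params}) {{ return {emit(node)}; }}"

expr = ("add", ("mul", ("sym", "x"), ("sym", "x")), ("sin", ("sym", "y")))
print(emit_c_sketch(expr, ["x", "y"]))
# → double f(double x, double y) { return ((x * x) + sin(y)); }
```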

MLIR dialect

The alkahest-mlir crate exposes the custom MLIR dialect. The dialect ops are:

Op                          Description
alkahest.sym                Symbolic variable reference
alkahest.const              Constant value
alkahest.add, alkahest.mul  Arithmetic
alkahest.pow                Exponentiation
alkahest.horner             Horner polynomial evaluation
alkahest.poly_eval          Generic polynomial evaluation
alkahest.series_taylor      Taylor series evaluation
alkahest.interval_eval      Ball arithmetic evaluation
alkahest.rational_fn        Rational function evaluation

Three lowering targets are available:

  • ArithMath — lowers to arith + math MLIR dialects; uses math.fma for Horner chains
  • StableHlo — lowers to StableHLO ops for XLA/JAX integration
  • Llvm — lowers to llvm dialect for LLVM IR / PTX emission
For example, to emit StableHLO from Python:

from alkahest import to_stablehlo

# Emit textual MLIR in the StableHLO dialect
mlir_text = to_stablehlo(expr, [x, y], fn_name="my_fn")
print(mlir_text)  # valid input to mlir-opt / XLA
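For orientation, the emitted module for an expression like x*x + sin(y) has roughly the following shape: one function with scalar-tensor arguments and one SSA value per subexpression. This is an illustrative sketch; the exact op spellings and type syntax depend on the StableHLO version in use.

```mlir
func.func @my_fn(%arg0: tensor<f64>, %arg1: tensor<f64>) -> tensor<f64> {
  %0 = stablehlo.multiply %arg0, %arg0 : tensor<f64>
  %1 = stablehlo.sine %arg1 : tensor<f64>
  %2 = stablehlo.add %0, %1 : tensor<f64>
  return %2 : tensor<f64>
}
```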

GPU codegen (NVPTX)

With --features cuda and an LLVM installation with NVPTX support:

from alkahest import compile_cuda

f_gpu = compile_cuda(expr, [x, y])
result = f_gpu.call_batch(inputs)   # runs on the first CUDA device

The GPU compiler:

  1. Lowers the expression through inkwell to NVPTX LLVM IR for sm_86 (Ampere)
  2. Links libdevice.10.bc for transcendental functions (__nv_sin, etc.)
  3. Emits PTX via LLVM’s target machine
  4. Loads the PTX via the CUDA driver (cudarc)

The benchmark nvptx/nvptx_polynomial_1M shows 16.2× speedup over the CPU JIT on a 1M-point polynomial evaluation on an RTX 3090.

Upcoming (v1.1): AMD ROCm / amdgcn target (hardware-blocked pending RDNA3 availability).

Caching

Compilation results are cached keyed by the canonical hash of the expression DAG. Compiling the same expression twice returns the cached result. The persistent ExprPool (V1-14) extends this cache across sessions.

Small expressions below a complexity threshold skip LLVM entirely and run through the Rust interpreter, which has lower overhead for trivial expressions.
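The combined cache-and-dispatch policy can be sketched in Python. This is illustrative only: the real cache lives in Rust and keys on the canonical DAG hash, and the names THRESHOLD, node_count, and compile_or_interpret are hypothetical.

```python
THRESHOLD = 16   # hypothetical complexity cutoff (node count)
_cache = {}      # stand-in for the canonical-hash-keyed compile cache

def node_count(expr):
    """Count nodes in a tuple-encoded expression tree."""
    return 1 + sum(node_count(c) for c in expr[1:] if isinstance(c, tuple))

def compile_or_interpret(expr):
    key = hash(expr)              # stand-in for the canonical DAG hash
    if key in _cache:
        return _cache[key]        # recompiling returns the cached result
    if node_count(expr) < THRESHOLD:
        backend = ("interp", expr)   # small: skip LLVM, interpret directly
    else:
        backend = ("jit", expr)      # large: worth the LLVM compile cost
    _cache[key] = backend
    return backend

e = ("add", ("sym", "x"), ("const", 1))
assert compile_or_interpret(e) is compile_or_interpret(e)   # cache hit
assert compile_or_interpret(e)[0] == "interp"               # below threshold
```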