A vector-first language for CPU SIMD
Rake is named for its execution model: data rakes through tine patterns.
Each tine is a horizontal barrier—the teeth of a rake—that filters lanes. Data flows downward through tine declarations. Results are swept up at the end.
Auto-vectorization often fails on divergent code. Traditional if/else branches cannot map efficiently to SIMD because different lanes need different code paths.
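Rake inverts the model: lanes are partitioned by named masks (tines), each code path runs as a masked region (through), and the results are collected at the end (sweep).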
This is SIMD semantics made explicit in the language.
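
The mask-and-blend idea can be illustrated outside Rake. What follows is a minimal Python sketch, assuming plain lists stand in for SIMD lanes. The names `scalar_branch`, `select`, and `masked_blend` are hypothetical and are not part of Rake; the point is only that a per-lane branch can be rewritten as "build a mask, evaluate both paths, blend".

```python
# Hypothetical illustration: Python lists stand in for SIMD lanes.

def scalar_branch(xs):
    """Per-element if/else: each element may take a different code path."""
    out = []
    for x in xs:
        if x < 0.0:
            out.append(-x)
        else:
            out.append(x * 2.0)
    return out

def select(mask, if_true, if_false):
    """Lane-wise blend: take if_true where the mask is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def masked_blend(xs):
    """Mask-and-blend form: build a mask, evaluate both paths on all lanes, blend."""
    neg = [x < 0.0 for x in xs]        # mask: which lanes are negative
    path_a = [-x for x in xs]          # path for the negative lanes
    path_b = [x * 2.0 for x in xs]     # path for the remaining lanes
    return select(neg, path_a, path_b)

lanes = [-1.0, 2.0, -3.5, 0.0]
result = masked_blend(lanes)           # same values as scalar_branch(lanes)
```

Both functions produce the same values, but the second contains no data-dependent branch, which is the shape that maps onto vector hardware.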
- <name> broadcasts a scalar to all lanes, preventing accidental confusion between scalars and racks
- stack (SoA) vs single (scalars) is visible

| Term | Meaning |
|---|---|
| rack | Vector value (one per SIMD lane) |
| tine | Named mask declaration |
| through | Masked computation region |
| sweep | Collect results from tines |
| crunch | Pure function (all lanes same logic) |
| rake | Divergent function (lanes may differ) |
| stack | Structure-of-Arrays type |
| single | All-scalar configuration struct |
| pack | Collection of stack chunks |
| over | Iterate over pack in SIMD-width chunks |
A rack is a vector of values—one per SIMD lane. Arithmetic is lane-wise. Scalars use angle brackets to broadcast.

    let positions : float rack = load_positions()
    let new_pos = positions + velocities * <dt>

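
To make the broadcast rule concrete, here is a small Python sketch of the same update. It assumes a rack can be modelled as a fixed-width list; `WIDTH`, `broadcast`, `add`, and `mul` are hypothetical helpers chosen for illustration, not Rake functions.

```python
# Hypothetical model: a rack is a fixed-width list, one entry per lane.
WIDTH = 4  # assumed lane count, chosen only for this example

def broadcast(scalar, width=WIDTH):
    """Copy one scalar into every lane (the role <dt> plays in the Rake snippet)."""
    return [scalar] * width

def add(a, b):
    """Lane-wise addition of two racks."""
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    """Lane-wise multiplication of two racks."""
    return [x * y for x, y in zip(a, b)]

positions = [0.0, 1.0, 2.0, 3.0]
velocities = [1.0, 1.0, 2.0, 2.0]
dt = 0.5

# positions + velocities * <dt>
new_pos = add(positions, mul(velocities, broadcast(dt)))
# new_pos == [0.5, 1.5, 3.0, 4.0]
```

Making the broadcast an explicit call here mirrors what the angle brackets do in Rake: the scalar-to-vector promotion is written down rather than implied.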
A stack is a struct in Structure-of-Arrays format. Each field is a rack.

    stack Ray {
      ox: float rack, oy: float rack, oz: float rack,
      dx: float rack, dy: float rack, dz: float rack
    }

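
For readers less familiar with Structure-of-Arrays, the layout can be sketched in Python. This assumes each field is stored as its own list, with index i holding lane i. `RayStack` and `to_stack` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RayStack:
    """Structure-of-Arrays: one list per field; index i across fields is ray i."""
    ox: List[float]
    oy: List[float]
    oz: List[float]
    dx: List[float]
    dy: List[float]
    dz: List[float]

def to_stack(rays):
    """Transpose a non-empty list of (ox, oy, oz, dx, dy, dz) tuples into SoA form."""
    columns = [list(col) for col in zip(*rays)]
    return RayStack(*columns)

# Array-of-Structures input: one tuple per ray.
aos = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 1.0),
    (1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
]
stack = to_stack(aos)
# stack.ox == [0.0, 1.0] and stack.dz == [1.0, 0.0]
```

With this layout, all `ox` values sit next to each other, so a single vector load can fill one rack.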
A crunch is a pure function where all lanes execute identical logic. Crunches are always inlined.

    crunch dot (ax, ay, az, bx, by, bz) -> d:
      let d = ax*bx + ay*by + az*bz
      d

Tines are named boolean masks that partition lanes. The # prefix evokes grid lines.

    | #miss  := (disc < <0.0>)
    | #maybe := (!#miss)

A rake function handles divergent control flow with tines, through blocks, and sweep.

    rake intersect (ray_ox, ray_oy, ray_oz, ray_dx, ray_dy, ray_dz,
                    <sphere_cx>, <sphere_cy>, <sphere_cz>, <sphere_r>) -> t_result:
      let disc = b * b - <4.0> * a * c

      | #miss  := (disc < <0.0>)
      | #maybe := (!#miss)

      through #maybe:
        let sqrt_disc = sqrt(disc)
        let t = (- b - sqrt_disc) / (<2.0> * a)
        t
      -> t_value

      through #miss:
        < -1.0>
      -> miss_result

      sweep:
        | #miss  -> miss_result
        | #maybe -> t_value
      -> t_result

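
The flow of tines, through blocks, and sweep can be emulated lane by lane in Python. This is a sketch under stated assumptions, not Rake's implementation. The snippet above does not show how `a`, `b`, and `c` are computed, so the standard ray-sphere quadratic coefficients are assumed here. The Python function and its list-based lanes are hypothetical stand-ins.

```python
import math

def intersect(ox, oy, oz, dx, dy, dz, cx, cy, cz, r):
    """Lane-wise ray-sphere test: nearer root t per lane, or -1.0 on a miss."""
    n = len(ox)
    # Assumption: a, b, c are the usual quadratic coefficients (elided in the source).
    lx = [ox[i] - cx for i in range(n)]
    ly = [oy[i] - cy for i in range(n)]
    lz = [oz[i] - cz for i in range(n)]
    a = [dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i] for i in range(n)]
    b = [2.0 * (lx[i] * dx[i] + ly[i] * dy[i] + lz[i] * dz[i]) for i in range(n)]
    c = [lx[i] * lx[i] + ly[i] * ly[i] + lz[i] * lz[i] - r * r for i in range(n)]
    disc = [b[i] * b[i] - 4.0 * a[i] * c[i] for i in range(n)]

    # Tines: two complementary masks over the lanes.
    miss = [d < 0.0 for d in disc]
    maybe = [not m for m in miss]

    # through #maybe: only active lanes compute; inactive lanes hold a placeholder.
    # (math.sqrt raises on negative input, so inactive lanes are skipped here.)
    t_value = [
        (-b[i] - math.sqrt(disc[i])) / (2.0 * a[i]) if maybe[i] else 0.0
        for i in range(n)
    ]
    # through #miss: a broadcast constant.
    miss_result = [-1.0] * n

    # sweep: pick each lane's result from the tine that owns it.
    return [miss_result[i] if miss[i] else t_value[i] for i in range(n)]

# Lane 0 hits a unit sphere at the origin; lane 1 passes above it.
t = intersect([0.0, 0.0], [0.0, 5.0], [-5.0, -5.0],
              [0.0, 0.0], [0.0, 0.0], [1.0, 1.0],
              0.0, 0.0, 0.0, 1.0)
# t == [4.0, -1.0]
```

Because the two masks are complementary, every lane is claimed by exactly one tine, so the sweep step never has to resolve a conflict.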
Note: the over construct is still in flux.
A pack is a collection of stack data. over iterates in SIMD-width chunks.

    run render_all (rays : Ray pack, <count> : int64) -> result:
      over rays, <count> |> ray:
        let t = intersect(ray.ox, ray.oy, ray.oz, ray.dx, ray.dy, ray.dz, ...)
        t

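
Iterating in SIMD-width chunks can be sketched in Python as follows. How Rake treats a final, partially filled chunk is not stated here, so the padding and the per-chunk mask below are assumptions. `WIDTH` and this `over` generator are hypothetical and only model the idea.

```python
WIDTH = 4  # assumed SIMD width, chosen only for this example

def over(values, count, width=WIDTH):
    """Yield (chunk, mask) pairs; the mask marks lanes that hold real data."""
    for start in range(0, count, width):
        active = min(width, count - start)
        chunk = values[start:start + active]
        # Assumption: pad the last partial chunk so every chunk has `width` lanes.
        chunk = chunk + [0.0] * (width - active)
        mask = [lane < active for lane in range(width)]
        yield chunk, mask

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chunks = list(over(data, len(data)))
# chunks[0] == ([1.0, 2.0, 3.0, 4.0], [True, True, True, True])
# chunks[1] == ([5.0, 6.0, 0.0, 0.0], [True, True, False, False])
```

Carrying a mask with each chunk is one common way to handle element counts that are not a multiple of the lane width; other designs fall back to a scalar loop for the tail.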
Prerequisites: OCaml 5.0+ with dune, LLVM/MLIR 17+