Platform: Apple M1 MacBook Pro — 8GB RAM
Engine: Sail 0.6.0 (Rust-native, Apache Arrow + DataFusion)
Runtime: Docker container, --memory=6g
Date: 15 April 2026
Overview
Sail is a Rust-native, JVM-free compute engine compatible with the Apache Spark Connect protocol. It implements the Spark SQL and DataFrame API with no code rewrites required — existing PySpark code connects to Sail over sc://localhost:{port} unchanged.
This report documents a series of progressively complex workload tests run entirely inside a Docker container on a constrained 8GB M1 machine, using pysail and pyspark-client as the only dependencies.
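As a minimal sketch of what "connects unchanged" means in practice, the snippet below builds the sc:// URL and opens a Spark Connect session. The helper names `sail_remote_url` and `connect` are hypothetical, and the port in the usage comment is an assumption; the only PySpark call used is the standard `SparkSession.builder.remote(...)`.

```python
def sail_remote_url(port: int, host: str = "localhost") -> str:
    """Build the Spark Connect URL in the sc://host:port form used in this report."""
    return f"sc://{host}:{port}"


def connect(port: int):
    """Open a Spark Connect session against an already-running Sail server."""
    # Imported lazily so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(sail_remote_url(port)).getOrCreate()


# Usage (requires a running Sail server and the pyspark-client package):
#   spark = connect(50051)   # 50051 is an assumed port, not taken from this report
#   spark.range(5).show()
```

Nothing in the client code refers to Sail itself; only the remote URL decides which server executes the plan.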
Environment
| Item | Detail |
|---|---|
| Machine | Apple MacBook Pro M1 |
| RAM | 8GB unified memory |
| Docker memory limit | 6GB |
| Python | 3.14 |
| PySail version | 0.6.0 |
| PySpark client | pyspark-client (Spark 4.x) |
| Storage | Local /tmp inside container, mounted volume for Parquet output |
Test Results
Test 1 — Simple Aggregation (5M rows)
Workload: Generate 5M rows, join to 2 dimension tables, run 6 aggregations.
| Metric | Value |
|---|---|
| Rows | 5,000,000 |
| Joins | 2 (star schema) |
| Aggregations | 6 (count, sum, avg, max, min, stddev) |
| Total time | 0.59s |
Test 2 — Heavy Aggregation with Derived Metrics (10M rows)
Workload: 10M rows, 2 joins, 7 derived computed columns (sqrt, log, pow, abs), 13 aggregations, 2 post-aggregation ratio columns.
| Metric | Value |
|---|---|
| Rows | 10,000,000 |
| Joins | 2 |
| Derived columns | 7 (sqrt, log, pow, abs, discount, tax, gross profit) |
| Aggregations | 13 |
| Post-agg columns | 2 (profit margin %, tax ratio) |
| Total time | 1.39s |
Test 3 — Window Functions (8M rows)
Workload: 8M rows, 2 joins, 3 partition-only window functions (avg over tier, max over category, avg over region), 9 final aggregations.
| Metric | Value |
|---|---|
| Rows | 8,000,000 |
| Joins | 2 |
| Window functions | 3 (partition-only, no unbounded sort) |
| Aggregations | 9 |
| Total time | 4.27s |
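Window functions require a full pass over each partition and are inherently more memory-intensive than plain aggregations. Per row, Test 3 costs roughly 4x to 4.5x as much as the plain aggregations in Tests 1 and 2 (about 0.53 µs/row versus 0.12 to 0.14 µs/row). This is expected behaviour, not a Sail limitation.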
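The per-row comparison can be checked directly from the reported timings. The sketch below uses only the row counts and total times from the tables for Tests 1 to 3; the dictionary keys are labels chosen for illustration.

```python
# (rows, seconds) taken from the result tables for Tests 1-3.
timings = {
    "simple_agg": (5_000_000, 0.59),
    "heavy_agg": (10_000_000, 1.39),
    "window": (8_000_000, 4.27),
}

# Cost per row in microseconds.
us_per_row = {name: secs / rows * 1e6 for name, (rows, secs) in timings.items()}

# How many times more expensive a window-test row is than a plain aggregation row.
ratio_vs_simple = us_per_row["window"] / us_per_row["simple_agg"]
ratio_vs_heavy = us_per_row["window"] / us_per_row["heavy_agg"]

for name, cost in us_per_row.items():
    print(f"{name:<11} {cost:.3f} us/row")
print(f"window vs simple_agg: {ratio_vs_simple:.1f}x")
print(f"window vs heavy_agg:  {ratio_vs_heavy:.1f}x")
```

On these figures the window test comes out at about 4.5x the per-row cost of Test 1 and about 3.8x that of Test 2.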
Test 4 — Parquet Read/Write + Query (10M rows)
Workload: Generate 10M rows, write to partitioned Parquet (by tier + category), read back, apply filter + 6 aggregations.
| Step | Time |
|---|---|
| Write (partitioned Parquet) | 1.61s |
| Read + row count | 0.01s |
| Filter + 6 aggregations on Parquet | 0.11s |
| Total wall time | 1.74s |
The filtered aggregation over 10M rows read from disk took 0.11s. Partition pruning on tier + category reduces the effective scan significantly.
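To illustrate why partitioning by tier + category helps, here is a small pure-Python sketch of directory-level pruning over a Hive-style layout. The partition values, the `out/` prefix, and the helper names are all hypothetical, and the sketch assumes the query filter is on the partition columns; it models the idea, not Sail's actual planner.

```python
from itertools import product

# Hypothetical partition values; the report does not list the actual tiers or categories.
tiers = ["bronze", "silver", "gold"]
categories = ["auto", "home", "life", "travel"]

# Hive-style layout: one directory per (tier, category) combination.
files = [f"out/tier={t}/category={c}/part-0.parquet" for t, c in product(tiers, categories)]


def partition_values(path: str) -> dict:
    """Extract key=value pairs from the directory components of a path."""
    return dict(part.split("=", 1) for part in path.split("/") if "=" in part)


def prune(paths, **wanted):
    """Keep only files whose partition values match every requested key."""
    return [p for p in paths if all(partition_values(p).get(k) == v for k, v in wanted.items())]


selected = prune(files, tier="gold", category="auto")
print(f"scanning {len(selected)} of {len(files)} files")
```

With an equality filter on both partition columns, only one of the twelve directories needs to be opened; the rest are skipped without reading any data.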
Test 5 — Parallel SCD Type 2 (5 dimensions, 4M rows total)
Workload: 5 dimension tables processed concurrently via ThreadPoolExecutor(max_workers=5). Each dimension undergoes a full SCD Type 2 merge — row hash comparison, expiry of changed records, insertion of new versions.
| Dimension | Existing | Updates | Final | Expired | Time |
|---|---|---|---|---|---|
| dim_agent | 750,000 | 250,000 | 1,000,000 | 250,000 | 2.46s |
| dim_branch | 250,000 | 100,000 | 350,000 | 100,000 | 1.38s |
| dim_customer | 1,000,000 | 300,000 | 1,300,000 | 300,000 | 2.73s |
| dim_policy | 1,500,000 | 500,000 | 2,000,000 | 500,000 | 2.92s |
| dim_product | 500,000 | 200,000 | 700,000 | 200,000 | 2.05s |
| TOTAL | 4,000,000 | 1,350,000 | 5,350,000 | 1,350,000 | 2.92s wall |
All 5 SCD2 jobs completed in 2.92s wall time, bounded by the largest dimension (dim_policy). This demonstrates Sail's ability to handle concurrent Spark Connect sessions from a single server process.
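The concurrency pattern can be sketched with `ThreadPoolExecutor(max_workers=5)` as described in the workload. In this illustration each job is a stand-in that sleeps for a scaled-down version of its reported time rather than submitting work to Sail; `run_scd2` and `SCALE` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Per-dimension times from the Test 5 table, scaled down so the sketch runs quickly.
SCALE = 0.02
job_seconds = {
    "dim_agent": 2.46,
    "dim_branch": 1.38,
    "dim_customer": 2.73,
    "dim_policy": 2.92,
    "dim_product": 2.05,
}


def run_scd2(name: str) -> str:
    """Stand-in for one SCD2 merge; sleeps instead of submitting work to Sail."""
    time.sleep(job_seconds[name] * SCALE)
    return name


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() preserves input order, so results line up with the dimension names.
    finished = list(pool.map(run_scd2, job_seconds))
wall = time.perf_counter() - start

serial = sum(job_seconds.values()) * SCALE
longest = max(job_seconds.values()) * SCALE
print(f"wall {wall:.3f}s, longest job {longest:.3f}s, serial sum {serial:.3f}s")
```

With one worker per dimension, wall time tracks the longest job rather than the sum of all five, which is the behaviour the table reports.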
Test 6 — End-to-End Daily SCD2 Pipeline with Parquet I/O (5M rows)
Workload: Full simulated daily pipeline — write T-1 snapshot to Parquet, read back, apply 30% change batch, run SCD2 merge, write T snapshot to Parquet, validate.
| Step | Detail | Time |
|---|---|---|
| Write T-1 snapshot | 5M rows → Parquet | 0.64s |
| Read T-1 from Parquet | 5M rows | 0.01s |
| Generate change batch | 1.5M records (30% churn) | — |
| SCD2 merge | Join + expire + new versions | 0.00s |
| Write T snapshot | 6.5M rows → Parquet | 5.04s |
| Validate output | count + filter checks | 0.18s |
| Total wall time | | 5.91s |
Output validation:
| Metric | Value |
|---|---|
| T-1 rows | 5,000,000 |
| T total rows | 6,500,000 |
| Current records | 5,000,000 |
| Expired records | 1,500,000 |
| Net new versions | 1,500,000 |
Parquet files were written to the local volume and verified as readable after the container exited.
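The validation figures are internally consistent, which can be confirmed with a few invariant checks. This sketch uses only the numbers from the Test 6 tables; the check descriptions assume the change batch touches existing keys only (no brand-new keys), which matches the reported counts.

```python
# Figures from the Test 6 tables.
t_minus_1_rows = 5_000_000
churn_rate = 0.30
t_total_rows = 6_500_000
current_records = 5_000_000
expired_records = 1_500_000

# Number of records in the change batch (30% of the T-1 snapshot).
changed = round(t_minus_1_rows * churn_rate)

checks = {
    "every change expires exactly one old version": expired_records == changed,
    "one current record per natural key": current_records == t_minus_1_rows,
    "total = current + expired": t_total_rows == current_records + expired_records,
}

for description, ok in checks.items():
    print(f"{'PASS' if ok else 'FAIL'}  {description}")
```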
Summary
| Test | Rows | Key Operations | Time |
|---|---|---|---|
| Simple aggregation | 5M | 2 joins, 6 aggs | 0.59s |
| Heavy aggregation | 10M | 2 joins, 13 aggs, 7 derived cols | 1.39s |
| Window functions | 8M | 2 joins, 3 windows, 9 aggs | 4.27s |
| Parquet query | 10M | filter + 6 aggs off disk | 0.11s |
| Parallel SCD2 (5 dims) | 4M total | 5 concurrent merges | 2.92s wall |
| Daily SCD2 + Parquet I/O | 5M + 1.5M delta | full pipeline | 5.91s |
Key Observations
No JVM overhead. Sail starts in under 2 seconds and consumes a small memory footprint at idle — no warmup, no GC pauses, no heap tuning required.
Parquet performance is exceptional. Reading the 5M-row snapshot back is recorded at 0.01s, and the filtered aggregation on 10M rows of partitioned Parquet takes 0.11s. That compares favourably with in-memory engines on similar hardware, although no head-to-head comparison was run for this report.
Concurrent session support works. Five independent SCD2 pipelines ran in parallel against a single Sail server process with no contention issues. Wall time was bounded by the largest job, not the sum of all jobs.
Memory ceiling on constrained hardware. The practical limit on a 6GB Docker allocation is approximately 5–8M rows depending on operation complexity. Window functions with unbounded sorts and large Parquet writes are the most memory-intensive operations. On a 32GB+ cloud VM, the 10M+ workloads would complete comfortably.
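SCD Type 2 logic is fast. The merge step (join, filter, union) is recorded at 0.00s for 5M rows, but because DataFrame transformations are lazy, that figure most likely covers plan construction only; the join and union execute as part of the T snapshot write (5.04s). The combined merge-and-write step is the bottleneck in this pipeline, and the timings here cannot separate transformation cost from Parquet I/O.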
Architecture Notes
The SCD2 implementation used in these tests follows a hash-based change detection pattern:
- Compute MD5(concat_ws("|", tracked_cols)) as row_hash on both existing and incoming records
- Left join existing dim to incoming updates on natural key
- Split into three sets: unchanged (no match or same hash), expired (hash changed → set is_current=False, close expiry_date), new versions (hash changed → insert with new effective_date, expiry_date=9999-12-31, is_current=True)
- Union all three sets as final output
All columns in concat_ws must be cast to string before hashing — Sail enforces strict type checking on concat_ws and does not auto-coerce integer types.
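The pattern above can be sketched in plain Python to make the three-way split concrete. This is an illustration of the technique on in-memory dictionaries, not the PySpark code used in the tests: the function names, the `id` and `premium` columns, and the dates are hypothetical. It also assumes that already-expired history rows pass through untouched and, like the listed steps, it does not insert brand-new natural keys.

```python
import hashlib
from datetime import date

OPEN_END = date(9999, 12, 31)


def row_hash(row: dict, tracked_cols: list) -> str:
    # Cast every tracked value to str before joining, mirroring the
    # explicit string cast needed ahead of concat_ws.
    joined = "|".join(str(row[c]) for c in tracked_cols)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def scd2_merge(existing, updates, key, tracked_cols, load_date):
    """Return unchanged + expired + new-version rows as one list."""
    incoming = {u[key]: u for u in updates}
    unchanged, expired, new_versions = [], [], []
    for row in existing:
        upd = incoming.get(row[key])  # left join on the natural key
        changed = (
            row["is_current"]
            and upd is not None
            and row_hash(row, tracked_cols) != row_hash(upd, tracked_cols)
        )
        if not changed:
            unchanged.append(row)  # no match, same hash, or old history row
            continue
        # Close the old version and open a new one effective from load_date.
        expired.append({**row, "is_current": False, "expiry_date": load_date})
        new_versions.append(
            {**upd, "effective_date": load_date, "expiry_date": OPEN_END, "is_current": True}
        )
    return unchanged + expired + new_versions


existing = [
    {"id": 1, "premium": 100, "effective_date": date(2026, 1, 1), "expiry_date": OPEN_END, "is_current": True},
    {"id": 2, "premium": 200, "effective_date": date(2026, 1, 1), "expiry_date": OPEN_END, "is_current": True},
]
updates = [{"id": 1, "premium": 150}]
result = scd2_merge(existing, updates, "id", ["premium"], date(2026, 4, 15))
```

Here `result` holds three rows: the untouched row for id 2, the expired version of id 1, and its new current version. The row count grows by exactly the number of changed records, matching the Existing + Updates = Final relationship in the Test 5 table.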
Conclusion
Sail delivers credible Spark-compatible performance on severely constrained hardware. On production-grade infrastructure (cloud VMs, Kubernetes), the throughput numbers suggest it is a viable replacement for Spark in batch ETL, dimension processing, and analytical query workloads — with significantly lower operational overhead and no JVM dependency.