Benchmarks
Stormify-native (Kotlin/Native bound to libpq, libmariadb, ODPI-C, FreeTDS,
libsqlite3) measured against plain Hibernate 6 backed by pgjdbc / mysql-
connector-j / ojdbc11 / mssql-jdbc / org.xerial sqlite-jdbc, all running
against the same Docker-hosted database instances. The bench is co-located
with the source: benchmarks/runner/run.sh orchestrates DDL prepare →
inserts → reads across the matrix of databases × implementations and writes
benchmarks/results/results.csv.
The data tables on this page are populated from that CSV by
benchmarks/report/generate-benchmarks-md.py. After a fresh bench run we
look at the regenerated tables, decide whether the prose still matches
reality, and edit it directly.
Methodology
- Schema: two tables.
bench_parent is small (default 1000 rows) and
acts as the lookup target for the JOIN and N+1 scenarios. bench_child
is the workhorse (default 10 000 rows) where every other scenario runs.
- Iterations: 500 per read scenario, lower for heavier write paths
(50 for
update_bulk_* and tx_rollback, 1000 separate transactions for
single_insert_1000).
- Statistic: per-iteration
p50 (median). 5 warm-up iterations per
scenario run before measurement begins (override with BENCH_WARM):
same SQL, different parameter window per iter. The point is to put
the database in the steady state a real production workload would
already be in — buffer pool primed, query plan compiled, server-side
prepared statement ready, driver JIT/AOT settled. No result-set data
is cached on the ORM side because every iteration's params are new.
A separate JIT pass against the dedicated bench_warmup table runs
even earlier, so the measurement tables stay untouched until the
first measured iter of the first scenario.
- Setup: every iteration runs in its own short transaction only when
the equivalent JPA path also opens one (writes and
tx_rollback).
Read-only scenarios are not wrapped in a transaction on either side.
- Connection: a single persistent JDBC / KDBC connection per process;
no pool churn included in the per-iter timing.
- Data sizes: defaults above; override via
BENCH_PARENTS and
BENCH_CHILDREN. The two insert-throughput phases (insert_1000,
insert_10000) clear bench_child between runs and refill at the
named row count; the final phase leaves BENCH_CHILDREN rows in
place for the read scenarios.
A standalone HTML report with the same data plus chart-style visualisation
sits at benchmarks/results/report.html after each bench run.
The bench process logs both its time-to-first-query and resident-set-size
on every run. These don't measure database performance — they measure the
cost of spinning up the runtime that drives it.
Cold start — process spawn to first query ready
|
SQLite |
MySQL |
PostgreSQL |
Oracle |
SQL Server |
| Stormify-native |
440μs |
3.1ms |
25.6ms |
25.0ms |
8.9ms |
| Hibernate (JPA) |
2531ms |
2446ms |
2103ms |
2357ms |
2194ms |
| Native is |
5753× faster |
794× faster |
82× faster |
94× faster |
246× faster |
Resident set size at startup
|
SQLite |
MySQL |
PostgreSQL |
Oracle |
SQL Server |
| Stormify-native |
7.0MB |
11.0MB |
13.7MB |
44.6MB |
15.0MB |
| Hibernate (JPA) |
314MB |
336MB |
327MB |
356MB |
356MB |
| Native is |
45× lighter |
31× lighter |
24× lighter |
8× lighter |
24× lighter |
JPA startup is dominated by JVM bootstrap + Hibernate metadata loading:
classpath scanning, entity-manager-factory construction, dialect detection,
all before the first query. Stormify-native ships as a static binary with
the entity descriptors already baked in by the annproc-generated
GeneratedEntities, so the same path collapses to "open the libpq /
libmariadb / ODPI-C / FreeTDS / sqlite3 connection, fire the query."
The RSS comparison reflects the same thing: a JVM heap settles at a few
hundred megabytes regardless of workload size; the native binary holds
just the C-library state plus live result-set buffers.
Read scenarios
Per-iteration durations in microseconds. Lower is better. The "speedup"
row is jpa / native — values above 1.0 mean Stormify-native is
faster (a 5.0× speedup means native finished in one fifth of the JPA
time). Each scenario exposes p50 / p95 / p99 as separate tabs.
findById
PK lookup, one-row entity hydration.
SELECT * FROM bench_child WHERE id = ?
list_window_1000
1000-row windowed SELECT into a typed list.
SELECT * FROM bench_child WHERE id BETWEEN ? AND ?
join_child_parent
500-row JOIN projected to a typed DTO.
SELECT c.id AS cid, c.value, p.name FROM bench_child c
JOIN bench_parent p ON c.parent_id = p.id
WHERE c.id BETWEEN ? AND ?
n_plus_1_50
50-row child page + bulk-fetch parents (the JPA-friendly N+1 size).
-- Stormify (2 round-trips):
SELECT * FROM bench_child WHERE id BETWEEN ? AND ? -- 50 rows
SELECT * FROM bench_parent WHERE id IN (?, ?, …)
-- JPA via lazy @ManyToOne (51 round-trips):
SELECT * FROM bench_child WHERE id BETWEEN ? AND ? -- 50 rows
SELECT * FROM bench_parent WHERE id = ? -- ×50
n_plus_1_1000
1000-row child N+1 — extreme case; JPA fires 1001 round-trips.
-- Stormify (2 round-trips):
SELECT * FROM bench_child WHERE id BETWEEN ? AND ? -- 1000 rows
SELECT * FROM bench_parent WHERE id IN (?, ?, …)
-- JPA via lazy @ManyToOne (1001 round-trips):
SELECT * FROM bench_child WHERE id BETWEEN ? AND ? -- 1000 rows
SELECT * FROM bench_parent WHERE id = ? -- ×1000
paged_scan
5000-row server-side cursor stream (fetchSize = 50).
SELECT * FROM bench_child WHERE id BETWEEN ? AND ?
complex_filter
Three-predicate filter on bench_child.
SELECT * FROM bench_child
WHERE status = ? AND value BETWEEN ? AND ?
AND id BETWEEN ? AND ?
update_bulk_sql
Single UPDATE touching ~100 rows, raw SQL.
BEGIN;
UPDATE bench_child SET status = ? WHERE id BETWEEN ? AND ?;
COMMIT;
update_bulk_orm
ORM batch update — load 100 children, mutate, flush.
BEGIN;
SELECT * FROM bench_child WHERE id BETWEEN ? AND ?;
-- driver-batched per-row UPDATE (mutate, flush)
UPDATE bench_child SET status = ? WHERE id = ?; -- ×100
COMMIT;
tx_rollback
BEGIN + UPDATE + ROLLBACK round-trip mechanics.
BEGIN;
UPDATE bench_child SET status = ? WHERE id = ?;
ROLLBACK;
Insert scenarios
Two flavours: bulk inserts measure a single transaction with many rows
(total wall-clock is reported as one total row in the CSV), single
inserts measure 1000 separate one-row transactions and report the per-iter
p50.
insert_1000
1000 children inserted in one transaction.
BEGIN;
INSERT INTO bench_child (id, parent_id, status, value, payload, created_at)
VALUES (?, ?, ?, ?, ?, ?); -- ×1000, driver-batched
COMMIT;
|
SQLite |
MySQL |
PostgreSQL |
Oracle |
SQL Server |
| Stormify-native |
3489μs |
42.4ms |
15.8ms |
33.9ms |
60.9ms |
| Hibernate (JPA) |
110.2ms |
438.5ms |
208.1ms |
281.7ms |
300.1ms |
| Speedup (higher=better) |
31.58x ✓ |
10.33x ✓ |
13.19x ✓ |
8.32x ✓ |
4.93x ✓ |
insert_10000
10 000 children inserted in one transaction, chunked at 5000.
BEGIN;
INSERT INTO bench_child (id, parent_id, status, value, payload, created_at)
VALUES (?, ?, ?, ?, ?, ?); -- ×10 000, batched
COMMIT;
|
SQLite |
MySQL |
PostgreSQL |
Oracle |
SQL Server |
| Stormify-native |
36.2ms |
677.5ms |
122.0ms |
61.5ms |
593.1ms |
| Hibernate (JPA) |
233.6ms |
1758.9ms |
897.1ms |
3105.5ms |
1345.7ms |
| Speedup (higher=better) |
6.44x ✓ |
2.60x ✓ |
7.35x ✓ |
50.49x ✓ |
2.27x ✓ |
single_insert_1000
1000 separate 1-row transactions (per-iter p50).
BEGIN;
INSERT INTO bench_child (id, parent_id, status, value, payload, created_at)
VALUES (?, ?, ?, ?, ?, ?);
COMMIT;
Caveats and why native can lose to JDBC
Stormify-native ships ahead on most scenarios, but not all. The points
below cover both the measurement caveats and the structural reasons the
native side can fall behind a JIT-warmed JDBC driver. Specific scenarios
that visibly hit one or more of these in the latest run are flagged with
✗ in the tables above.
- Single-run variance. The bench runs on a normal desktop, not
isolated hardware. Run-to-run variance is typically ~10% on tight
scenarios and can spike to 50%+ on heavier write paths under disk
contention. On scenarios that finish in tens of microseconds, a
single OS jitter spike is enough to flip the speedup, so single-run
reports below 1.0× in those rows usually recover above 1.0× under
multi-run aggregation — see the
p95 tab and the HTML report.
- JPA tuning held neutral. Hibernate 6 runs with no second-level
cache, no query cache, and
useLocalSessionState /
useLocalTransactionState left at mysql-connector-j defaults. Tuning
either side further would skew the comparison.
- Per-cell hydration cost. Every column read crosses the
Kotlin/Native ↔ C boundary (≈ 50 ns) and
toKString() copies bytes
for every text column; JDBC drivers stay inside the JVM and reuse
the receive buffer between rows. Both costs scale linearly with row
count and column count, so the heaviest reads are where the gap is
most visible.
- AOT vs JIT compilation. Kotlin/Native is LLVM-AOT, no profile-
guided inlining. JDBC drivers benefit from runtime specialisation
once the JVM has warmed up; the AOT output for
when blocks and
setter dispatch can be slower than the equivalent JIT'd code.
- No protocol-level pipelining in some C client libraries. Several
request/response C client libraries don't expose a way to pack
multiple commands per TCP write. The matching JDBC drivers speak the
wire protocol directly and can — a structural advantage we can't
match without bypassing the C library.