A Cross-Language Comparison
/usr/bin/time -vAll benchmarks are reproducible. The scripts live in /bench at the project root, with one file per language so the comparisons are apples-to-apples.
Pure CPU work, no I/O, no concurrency.
| benchmark | Duso | Node | Python | Ruby |
|---|---|---|---|---|
| fib(30) x 10,000 (iterative) | 280.7 ms | 6 ms | 14.3 ms | 51 ms |
| nested loop 1000x1000 sum | 447.4 ms | 10 ms | 213.9 ms | 92 ms |
| sort 10,000 random floats | 3.1 ms | 10 ms | 1.6 ms | 4 ms |
Duso is the slowest on hand-written numeric loops, sometimes by an order of magnitude. The single exception is sort, where the work happens inside a native Go function rather than in script. There Duso is competitive with Python and ahead of Node and Ruby.
This pattern – slow when running script-level loops, native-speed when calling Go builtins – is the lead indicator for everything that follows.
500 workers, each making 5 HTTPS requests to httpbin.org/delay/1. The endpoint sleeps about one second server-side; most wall-clock time is real network latency. The test measures how well each runtime manages concurrent I/O on a small box.
| runtime | wall time | outcome |
|---|---|---|
| Node | 23.6 s | clean completion via async event loop |
| Duso | 28.3 s | clean completion via 500 goroutines |
| Python | 201.4 s | thrashed – ThreadPoolExecutor became the bottleneck |
| Ruby | OOM-killed | exceeded 1 GB RAM via per-thread stack allocation |
At 100 workers (the level at which the struggling runtimes can complete):
| runtime | wall time | peak RSS | per-worker |
|---|---|---|---|
| Ruby | 13.9 s | 163 MB | 135 ms |
| Node | 21.0 s | 66 MB | 201 ms |
| Duso | 25.5 s | 24 MB | 254 ms |
| Python | 42.7 s | 214 MB | 301 ms |
At small worker counts, Ruby is the fastest on raw I/O speed. As soon as the worker count grows past what fits in the memory budget, Ruby falls off a cliff and Python becomes effectively unusable. Duso and Node both scale to 500-way concurrency on the 1 GB box without sweating.
Peak RSS at 100 concurrent fetch workers:
| runtime | peak RSS | vs Duso |
|---|---|---|
| Duso | 24 MB | 1.0x |
| Node | 66 MB | 2.8x |
| Ruby | 163 MB | 6.8x |
| Python | 214 MB | 8.9x |
Translated into practical headroom on a 1 GB VM:
| runtime | simultaneous processes (approx) |
|---|---|
| Duso | 40 |
| Node | 14 |
| Ruby | 6 |
| Python | 4 |
For deployments where the hardware is small and fixed (the entire indie / SaaS / internal-tool tier), per-process memory is the practical scaling limit – not raw throughput.
The above results are on a 1 vCPU box. On a multi-core VM the gap widens in Duso’s favor, because Duso uses every available core without configuration while the alternatives don’t.
Project measurements at 1000 concurrent workers on a multi-core box show Duso completing the same fetch benchmark approximately 3.7x faster than Node. The reasons are structural rather than incidental:
cluster mode, which forks separate Node processes that don’t share state. Each fork carries its own V8 instance (50-100 MB) and requires a process supervisor, plus often an external store (Redis or similar) for shared application state.gunicorn -w N or multiprocessing, both of which fork separate Python interpreters (about 200 MB each).Even on an “I/O-bound” benchmark, multi-core scaling matters disproportionately for Duso because what looks like I/O has substantial hidden CPU work: TLS handshakes, HTTP header and body parsing, JSON construction, response handling. That CPU work distributes across cores naturally in Duso and concentrates on one core in the alternatives.
The same mechanics apply to inbound web and API workloads. Per request, a server does:
That CPU work parallelizes across cores in Duso. On a 4 vCPU box, a single Duso process handles concurrent requests across all 4 cores. Node, Python, and Ruby need cluster / worker / fork configurations to do the same – each layering operational complexity (process supervisors, port-sharing, shared-state stores) on top of the application.
For typical web / API workloads where script-level logic is small and most cost lies in network, database, or datastore primitives, Duso has the per-VM capacity advantage on the same hardware – plus a substantially simpler deployment.
Duso is architecturally a tree-walking AST interpreter – the textbook-slowest class of interpreter design. The conventional performance hierarchy:
| class | typical relative speed |
|---|---|
| Optimizing JIT | ~native |
| Bytecode VM | 10x-100x slower than JIT |
| Tree-walking AST | 10x-100x slower than bytecode VM |
By that math, a tree-walking interpreter should run roughly 1000x slower than V8’s TurboFan. Duso runs about 28x slower on the worst-case microbenchmark (fib recursion). That is a remarkable amount of distance closed.
For comparison, here is what the other runtimes ship under the hood:
| runtime | execution model |
|---|---|
| V8 (Node) | four-tier optimizing JIT (Ignition -> Sparkplug -> Maglev -> TurboFan), inline caches, hidden classes, speculative type optimization |
| CPython | bytecode VM with peephole optimization; experimental JIT in 3.13+ |
| MRI Ruby | YARV bytecode VM plus YJIT (LLVM-based copy-and-patch JIT) |
| Lua (reference) | minimal but heavily-optimized bytecode VM |
| LuaJIT | trace JIT; sometimes outperforms hand-written C |
| Duso | tree-walking AST interpreter, no bytecode, no JIT |
Duso skips all of the heavy machinery above. The reasons it remains competitive anyway:
Sort runs in Go’s sort.Slice. JSON parses through encoding/json. Regex uses Go’s RE2. The HTTP server is net/http. The datastore is hand-written Go with proper mutex discipline. Templates render through Go-native string operations.
Every meaningful primitive – the ones a real application spends time in – executes at native Go speed. The AST interpreter is only running between primitives, orchestrating which one to call next. That orchestration code is a small fraction of total runtime in any realistic application.
The 28x gap on the worst microbenchmark only manifests if a developer writes a tight numeric loop in script. Real applications almost never do that – they call a Go-native builtin to do the work.
Most non-mainstream language projects build their own scheduler, often poorly. Duso uses Go’s goroutine scheduler – the one Google paid hundreds of engineers to refine over a decade. Duso’s concurrency primitives (spawn, parallel, datastore wait/cond) sit on top of go func() and sync.Cond, getting M:N scheduling, work-stealing, and multi-core distribution as inherited infrastructure.
This is why Duso scales cleanly past 500 concurrent workers on a 1 GB box while Ruby OOM-kills and Python thrashes – the scheduler underneath is already production-grade.
The Duso codebase shows the optimizations that matter for an AST walker: compound-assignment fast paths, builtin-lookup short-circuits, environment caching, lock-free read paths for hot builtins. But no JIT, no bytecode compiler. The team did the small optimizations that close the worst gaps and stopped – accepting the residual perf cost in exchange for an interpreter that is small, simple, and maintainable.
The point that conventional language-design discourse often misses: most application code is glue between expensive primitives. If your primitives are fast – and Duso’s are, because they are Go – the speed of the glue rarely matters.
Pushing every meaningful primitive into a Go builtin is not a workaround for interpreter slowness. It is the design. The AST interpreter is fine for glue, and that is exactly what the interpreter is asked to do. The result is a runtime that competes on the workloads users actually run, while remaining small and maintainable enough that a focused team can keep the whole thing in their head.
| dimension | result |
|---|---|
| Tight numeric loops in script | Duso loses, sometimes 10x-50x – rarely matters in practice |
| Native-primitive work (sort, regex, JSON, HTTP, KV) | Duso operates at Go speed |
| I/O concurrency on a small VM | Duso and Node scale cleanly to 500; Ruby and Python do not |
| Per-process memory footprint | Duso uses 2.8x-8.9x less than alternatives at the same workload |
| Multi-core scaling | Duso uses every core natively; alternatives need cluster/fork config |
The runtime does not need to win every benchmark. It needs to be competitive on the workloads users run and dominant on the package: single binary, multi-core for free, low memory, zero operational overhead, batteries included. The data supports that exactly.
For the “single binary running a real web/API server on a small cloud VM” target – which is the design center – Duso is the lean, simple choice that holds up under measurement.