Performance Benchmarking of Node.js, Rust, Go, and Python in Web Applications
by Igor Khokhriakov aka Ingvord
Disclaimer
In this talk I will only scratch the surface of software system design, i.e. only the part connected with performance benchmarking. The whole topic is huge and probably won't fit even into a 45-minute talk.
So if you are interested in continuing the discussion after this session, please feel free to connect on LinkedIn: https://www.linkedin.com/in/ikhokhryakov/
Preamble
I want to share some of my knowledge that I think might be useful in the context of the SciCat@DESY project.
We want our project to be:
I want to show you some tools/tricks that may help us achieve our goals.
Preamble
Our goal: Make SciCat useful @DESY
This meeting’s goal:
Share some of my knowledge which I think might be useful in the context of OUR GOAL.
Usefulness has many faces: functional and non-functional
Theory and definitions
Node – physical OR logical unit of a cluster
Instance – a running process of a software component
RPS = Requests per Second.
RPS is one of the key components/metrics when doing “System design”.
Capturing non-functional requirements
Is required for …
Performance benchmarking is required for all of the above, both prior to deployment and during runtime (observability)
Reasoning about performance requires understanding the expected (desired) RPS
Software System Design
System design is analogous to planning a scientific experiment.
It involves analyzing (and DOCUMENTING!!!)* requirements, outlining the architecture, and detailing component interactions—similar to defining experiment parameters, setting up equipment, and ensuring methods interact effectively.
Key elements include:
Here is where RPS comes into play
Why Requests Per Second (RPS)?
Researching RPS of your system under different conditions is required for:
Stage 1: Capacity planning - how many software components and how much hardware are required
Stage 2: Liveness metric - if the observed RPS deviates from the designed value, it indicates issues (requires observability tools*)
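The capacity planning stage can be sketched as simple arithmetic. The snippet below is a minimal illustration in Python, not something taken from the talk's repositories: the function name `instances_needed`, the `headroom` parameter and the example numbers are assumptions made for the sake of the example.

```python
import math


def instances_needed(target_rps: float, per_instance_rps: float,
                     headroom: float = 0.3) -> int:
    """Estimate how many instances are needed to serve `target_rps`.

    `per_instance_rps` is the measured max RPS of one instance.
    `headroom` (hypothetical parameter) is the fraction of capacity kept
    in reserve, so each instance is planned at (1 - headroom) of its max.
    """
    if target_rps < 0 or per_instance_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity planning input")
    usable_rps = per_instance_rps * (1 - headroom)
    # Always plan for at least one instance, even for a zero target.
    return max(1, math.ceil(target_rps / usable_rps))


if __name__ == "__main__":
    # Illustrative numbers only: 12_000 RPS target, 5_000 RPS per instance.
    print(instances_needed(12_000, 5_000))
```

The same function can be re-run whenever a new benchmark changes the per-instance figure, which keeps the capacity estimate tied to measured data rather than guesses.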
Example: System Design of a Social Network
Simplified System Design example
Demonstrated RPS
How to map business metrics to RPS
Estimate load = estimate cost
Model = Session (how many Requests expected)
A < B!!!
The goal of System Design: A < B!!!
SciCat example: 1 user action = many requests
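Mapping business metrics to RPS can be sketched with a session model. The Python snippet below is only an illustration under stated assumptions: it takes A to be the expected load in RPS (the slides define B as the Node's max RPS, but A's definition is inferred here), and the function names, the `peak_factor` parameter and all numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def expected_rps(active_users: int, sessions_per_user_per_day: float,
                 requests_per_session: float, peak_factor: float = 1.0) -> float:
    """Estimate load (A) in requests per second from business metrics.

    The average is spread evenly over a day and then multiplied by
    `peak_factor` (hypothetical) to account for busy hours.
    """
    requests_per_day = (active_users * sessions_per_user_per_day
                        * requests_per_session)
    return requests_per_day / SECONDS_PER_DAY * peak_factor


def has_capacity(a_expected_rps: float, b_max_rps: float) -> bool:
    """The design goal from the slide: A must be strictly below B."""
    return a_expected_rps < b_max_rps


if __name__ == "__main__":
    # Illustrative numbers only.
    a = expected_rps(active_users=86_400, sessions_per_user_per_day=1,
                     requests_per_session=10, peak_factor=5.0)
    print(a, has_capacity(a, b_max_rps=5_000))
```

Since one user action may trigger many requests, `requests_per_session` should count backend requests, not clicks.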
Performance Benchmarking
How to measure your Node’s max RPS aka B?!
Two most common scenarios for capturing RPS:
IO load and CPU load
IO - Input/Output (networking, writing to disk etc.), e.g. ingesting data into MongoDB
CPU - intensive computations, e.g. calculating hashes or string operations such as routing
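The difference between the two kinds of load can be sketched on a single-threaded event loop. The snippet below uses Python's asyncio as a stand-in (Node.js likewise runs JavaScript on one event-loop thread); it is not the talk's benchmark app, and the names `burn_cpu`, `handle` and `measure` are hypothetical.

```python
import asyncio
import time


def burn_cpu(ms: float) -> None:
    """Busy-loop for roughly `ms` milliseconds; this blocks the event loop."""
    deadline = time.perf_counter() + ms / 1000
    while time.perf_counter() < deadline:
        pass


async def handle(cpu_ms: float, sleep_ms: float) -> None:
    """Hypothetical request handler: CPU work, then a simulated IO wait."""
    burn_cpu(cpu_ms)
    await asyncio.sleep(sleep_ms / 1000)  # yields to other requests


async def run_batch(n: int, cpu_ms: float, sleep_ms: float) -> float:
    """Serve `n` concurrent requests and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handle(cpu_ms, sleep_ms) for _ in range(n)))
    return time.perf_counter() - start


def measure(n: int, cpu_ms: float, sleep_ms: float) -> float:
    return asyncio.run(run_batch(n, cpu_ms, sleep_ms))


if __name__ == "__main__":
    print("IO-bound :", measure(5, cpu_ms=0, sleep_ms=50))
    print("CPU-bound:", measure(5, cpu_ms=50, sleep_ms=0))
```

IO waits overlap, so the IO-bound batch takes about as long as one request, while CPU work is serialized on the single thread and the batch time grows with the number of requests.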
Let’s explore NodeJs+Express performance
SciCat = NestJs = NodeJs + Express
https://github.com/Ingvord/shiny-guide
Prerequisites
All tests were performed on a typical single-instance virtual machine with 8 (4*) CPU cores and 12 GB RAM
wrk2 was used to simulate requests*
wrk -R{1000..10000} -t10 -c1000 -d30
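-R – target rate in requests per second (here varied from 1000 to 10000)
-t – number of threads
-c – number of open connections
-d – duration of the test (seconds)
The above simulates 1_000 concurrent connections (clients) sending requests for 30 s at various rates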
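The rate sweep can be scripted. The Python sketch below only builds the wrk2 command lines; the target URL `http://localhost:3000/` and the step of 1000 between rates are assumptions, since the slide gives neither, and `wrk2_commands` is a hypothetical helper name. The flags themselves are the ones listed above.

```python
import shlex
from typing import Iterable, List


def wrk2_commands(url: str, rates: Iterable[int], threads: int = 10,
                  connections: int = 1000, duration_s: int = 30) -> List[List[str]]:
    """Build one wrk2 invocation per target rate (wrk2 installs as `wrk`)."""
    return [
        ["wrk", f"-R{rate}", f"-t{threads}", f"-c{connections}",
         f"-d{duration_s}", url]
        for rate in rates
    ]


if __name__ == "__main__":
    # Assumed URL and step; adjust to the service under test.
    for cmd in wrk2_commands("http://localhost:3000/", range(1000, 10001, 1000)):
        print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` to actually execute the sweep, one rate at a time.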
https://github.com/giltene/wrk2
No Magic involved
NodeJs+Express app:
Entry points:
(cpuTime, sleepTime)
Baseline: 5_000 RPS
NodeJs IO load (single instance*, 8 instances + nginx)
[Charts: single instance; 8 instances + nginx. Higher is better.]
* instance here and below means a process running on the VM
NodeJs CPU load (single instance, 8 instances + nginx)
[Charts: single instance; 8 instances + nginx. Higher is better.]
https://github.com/Ingvord/shiny-guide
Bonus baseline (No Express, just plain http): 20_000 RPS
Bonus: Avoiding generic solutions boosts performance
Also makes code smaller, cleaner and much more maintainable :)
Rust CPU load (single instance, 4 instances + nginx)
https://github.com/Ingvord/redesigned-fishstick
Higher is better
Rust IO load (single instance, 4 instances + nginx)
https://github.com/Ingvord/redesigned-fishstick
Higher is better
DevHands.io
Kirill Filimonov
Artem Karpov
Python (FastAPI) – single instance
https://github.com/kirillF/devhands-bootcamp
https://humble-cart-ee9.notion.site/Module-2-Benchmarks-dd5062d1b15149b49b7c553fce92265a?pvs=4
[Charts: IO - RPS vs Rate (higher is better); CPU - Latency vs Rate (lower is better)]
Go (echo)
https://gitlab.com/devhands/devhands (In Russian)
[Charts: 25 ms CPU + 5 ms IO; 1 ms CPU + 19 ms IO. Lower is better.]
BONUS:
check out this great resource ->
Conclusions
Documenting, documenting and documenting!!!
A single instance won’t stand a chance against even a moderate load
Multiple instances require orchestration (e.g. 2 instances + nginx per beamline)
Make your software Rusty
Thanks!
Questions?