The goal is to squeeze the maximum possible performance out of a single physical CPU core. Every stall, every OS call, every cache miss, and every thread context switch adds latency. This project builds a system that eliminates all of those at the source.
Nischal Khanal
Systems & Performance Engineer
Software Engineer exploring systems, market infrastructure, and performance engineering
Featured Projects
What I'm up to now?
View what I'm focused on right now →
Recent Writing
Decoupled Vector-Map Data Layout for Allocation-Free Limit Order Book
An architectural guide to a 3-layer C++ order book layout using a flat vector memory pool and shallow map to achieve O(1) FIFO queue operations.
Python GIL Trap in Low-Latency Async Pipelines
We stopped event loop freezes during market volatility by micro-batching Pydantic payloads into a single GIL-efficient thread handoff for flatline reliability.
Stabilizing a High-Frequency Trading Gateway: How We Reclaimed Our Event Loop Under Extreme Market Volatility
Fixed message drops in a trading pipeline by replacing blocking writes and GIL-heavy validation with an async micro-batching architecture.