Tiny-vLLM: LLM Inference in C++ and CUDA
Tiny-vLLM rebuilds vLLM's core inference algorithms in pure C++ and CUDA — no Python required. Here is what self-hosters and inference engineers can learn from reading 3,000 lines of clean, annotated code.
4 articles tagged #self-hosted
SQLite is enough for durable workflows when you run a single node and stay under ~5,000 state transitions per second. This deep dive compares SQLite, Postgres-backed DBOS, and Temporal so you can pick the right tool for your self-hosted setup.
13 million NXDomains in a year. How to run Technitium DNS in a homelab for ad blocking, split DNS, and LDAP service discovery — with real numbers.
Grafana 13 for self-hosted: Git Sync is GA, React 19 breaks community plugins, Unified Storage auto-migrates. Here is the checklist to review before you change the image tag.