LLM articles

DEEP DIVE

Tiny-vLLM: LLM Inference in C++ and CUDA

May 30, 2026 · 5 min read · AIOps & Observability

Tiny-vLLM rebuilds vLLM's core inference algorithms in pure C++ and CUDA — no Python required. Here is what self-hosters and inference engineers can learn from reading 3,000 lines of clean, annotated code.

DEEP DIVE

OTel for AI agents: same tool, new conventions

May 29, 2026 · 5 min read · AIOps & Observability

OpenTelemetry graduated at CNCF on 21 May 2026. It now has semantic conventions, still in development, for LLM spans and agent spans. If you already instrument microservices with OTel, here is what changes when you add AI to the stack.

HOT TAKE

Forge: guardrails that make small models reliable

May 23, 2026 · AIOps & Observability

Forge adds guardrails to local LLM tool-calling in Python. It lifts an 8B model from ~32% to 84% on its eval suite with no model swap, just a reliability wrapper.

HOT TAKE

arXiv Bans Authors for AI-Hallucinated Citations

May 15, 2026 · DevOps Culture

arXiv now bans authors for one year if their paper contains AI-hallucinated citations. After the ban, every submission requires prior peer review. The model is not responsible. You are.
