LLM

llama.cpp Quickstart with CLI and Server

How to Install, Configure, and Use the OpenCode

I keep coming back to llama.cpp for local inference: it gives you control that Ollama and similar tools abstract away, and it just works. It makes it easy to run GGUF models interactively with llama-cli or to expose an OpenAI-compatible HTTP API with llama-server.

Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production

End-to-end observability strategy for LLM inference and LLM applications

LLM systems fail in ways that traditional API monitoring cannot surface — queues fill silently, GPU memory saturates long before CPU looks busy, and latency blows up at the batching layer rather than the application layer. This guide covers an end-to-end observability strategy for LLM inference and LLM applications: what to measure, how to instrument it with Prometheus, OpenTelemetry, and Grafana, and how to deploy the telemetry pipeline at scale.