Llama-Server Router Mode - Dynamic Model Switching Without Restarts
Serve and swap LLMs without restarts.
For a long time, llama.cpp had a glaring limitation:
you could only serve one model per process, and switching meant a restart.
Plugins first. Skills naming in brief.
This article is about OpenClaw plugins — native gateway packages that add channels, model providers, tools, speech, memory, media, web search, and other runtime surfaces.
Hermes Agent install and quickstart for devs
Hermes Agent is a self-hosted, model-agnostic AI assistant that runs on a local machine or low-cost VPS, works through terminal and messaging interfaces, and improves over time by turning repeated tasks into reusable skills.
Remote Ollama access without public ports
Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.
Compose-first Ollama server with GPU and persistence.
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
HTTPS Ollama without breaking streaming responses.
Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
Stateful streaming, checkpoints, K8s, PyFlink, Go.
Apache Flink is a framework for stateful computations over unbounded and bounded data streams.
Graphs, Cypher, vectors, and ops hardening.
Neo4j is what you reach for when the relationships are the data. If your domain looks like a whiteboard of circles and arrows, forcing it into tables is painful.
Push URL updates to search engines after deploy.
Static sites and blogs change whenever you deploy. Search engines that support IndexNow can learn about those changes without waiting for the next blind crawl.
Serve open models fast with SGLang.
SGLang is a high-performance serving framework for large language models and multimodal models, built to deliver low-latency and high-throughput inference across everything from a single GPU to distributed clusters.
Install Kafka 4.2 and stream events in minutes.
Apache Kafka 4.2.0 is the current supported release line, and it’s the best baseline for a modern quickstart because Kafka 4.x is fully ZooKeeper-free and built around KRaft by default.
Hot-swap local LLMs without changing clients.
Soon you are juggling vLLM, llama.cpp, and more, each stack on its own port. Everything downstream still wants one /v1 base URL; otherwise you keep shuffling ports, profiles, and one-off scripts. llama-swap is the /v1 proxy that sits in front of those stacks.
Developing software involves Git for version control, Docker for containerization, bash for automation, PostgreSQL for databases, and VS Code for editing — along with countless other tools that make or break your productivity. This page collects the essential cheatsheets, workflows, and comparisons you need to work efficiently across the full development stack.
Self-host OpenAI-compatible APIs with LocalAI in minutes.
LocalAI is a self-hosted, local-first inference server designed to behave like a drop-in OpenAI API for running AI workloads on your own hardware (laptop, workstation, or on-prem server).
How to Install, Configure, and Use the OpenCode
I keep coming back to llama.cpp for local inference: it gives you control that Ollama and others abstract away, and it just works. It is easy to run GGUF models interactively with llama-cli or to expose an OpenAI-compatible HTTP API with llama-server.
Artificial Intelligence is reshaping how software is written, reviewed, deployed, and maintained. From AI coding assistants to GitOps automation and DevOps workflows, developers now rely on AI-powered tools across the entire software lifecycle.