OpenClaw Rise and Fall — Timeline and Real Reasons Behind the Collapse
OpenClaw rose fast. Then vanished faster.
OpenClaw did not fail as a product. It lost its fuel.
Serve and swap LLMs without restarts.
For a long time, llama.cpp had a glaring limitation:
you could only serve one model per process, and switching meant a restart.
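For context, a minimal sketch of the classic single-model launch that this limitation describes; the model path is illustrative:

```bash
# One llama-server process serves exactly one GGUF model;
# switching models meant stopping this process and starting another.
# -ngl 99 offloads all layers to the GPU.
llama-server -m ./models/qwen2.5-7b-instruct-q4_k_m.gguf --port 8080 -ngl 99
```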
Build Claude Skills that survive real work
Most teams misuse Claude Skills in one of two ways. They either turn SKILL.md into a dumping ground, or they never graduate from giant copy-pasted prompts.
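For contrast with the dumping-ground failure mode, a rough sketch of a lean SKILL.md; the frontmatter fields follow Anthropic's published Agent Skills format, while the skill itself is an invented example:

```markdown
---
name: release-notes
description: Draft release notes from merged PR titles. Use when the user asks to summarize a release.
---

# Release notes

1. Collect the merged PR titles for the tagged range.
2. Group them into Features, Fixes, and Internal.
3. Keep each bullet to one line and link the PR number.
```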
Profile-first Hermes setups for serious workloads
Hermes AI assistant, officially documented as Hermes Agent, is not positioned as a simple chat wrapper.
The skills worth keeping, and the ones to skip
OpenClaw has two extension stories, and they are easy to mix up.
Plugins extend the runtime. Skills extend the agent’s behavior.
Plugins first. Skills naming in brief.
This article is about OpenClaw plugins — native gateway packages that add channels, model providers, tools, speech, memory, media, web search, and other runtime surfaces.
How real OpenClaw systems are actually structured
OpenClaw looks simple in demos. In production, it becomes a system.
Claude subscriptions no longer power agents
The quiet loophole that powered a wave of agent experimentation is now closed.
Self-hosted AI search with local LLMs
Vane is one of the more pragmatic entries in the “AI search with citations” space: a self-hosted answering engine that mixes live web retrieval with local or cloud LLMs, while keeping the whole stack under your control.
Agentic coding, now with local model backends.
Claude Code is not autocomplete with better marketing. It is an agentic coding tool: it reads your codebase, edits files, runs commands, and integrates with your development tools.
Hermes Agent install and quickstart for devs
Hermes Agent is a self-hosted, model-agnostic AI assistant that runs on a local machine or low-cost VPS, works through terminal and messaging interfaces, and improves over time by turning repeated tasks into reusable skills.
Install TGI, ship fast, debug faster
Text Generation Inference (TGI) has a very specific energy. It is not the newest kid on the inference block, but it is the one that has already learned how production breaks.
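A minimal sketch of the standard Docker launch, assuming an NVIDIA GPU and the upstream image; the model ID is just an example:

```bash
# Run TGI and expose its HTTP API on localhost:8080
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id HuggingFaceH4/zephyr-7b-beta    # example model; any supported HF id works
```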
llama.cpp token speed on 16 GB VRAM (tables).
Here I compare the speed of several LLMs running on a GPU with 16 GB of VRAM and pick the best one for self-hosting.
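One common way to produce numbers like these is llama.cpp's bundled benchmark tool; a sketch, with an illustrative model path:

```bash
# llama-bench reports prompt-processing and text-generation tokens/sec.
# -ngl 99 offloads all layers; -p/-n set prompt and generation lengths.
llama-bench -m ./models/model-q4_k_m.gguf -ngl 99 -p 512 -n 128
```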
Compose-first Ollama server with GPU and persistence.
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
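A minimal compose sketch along those lines, assuming an NVIDIA GPU with nvidia-container-toolkit installed; the image tag and volume name are illustrative:

```yaml
services:
  ollama:
    image: ollama/ollama:0.6.5       # pin a tag you have validated, not :latest
    ports:
      - "11434:11434"                # stable endpoint for clients
    volumes:
      - ollama-data:/root/.ollama    # persist pulled models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]    # request GPU access from the NVIDIA runtime
volumes:
  ollama-data:
```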
HTTPS Ollama without breaking streaming responses.
Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
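A minimal nginx sketch of that idea: the key detail is disabling response buffering so streamed tokens are flushed as they arrive. Server name, certificate paths, and upstream address are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name ollama.example.com;            # placeholder
    ssl_certificate     /etc/ssl/ollama.crt;   # placeholder paths
    ssl_certificate_key /etc/ssl/ollama.key;

    location / {
        proxy_pass http://127.0.0.1:11434;     # local Ollama
        proxy_http_version 1.1;
        proxy_buffering off;                   # do not buffer streamed tokens
        proxy_read_timeout 300s;               # allow long generations
        proxy_set_header Host $host;
    }
}
```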
Serve open models fast with SGLang.
SGLang is a high-performance serving framework for large language models and multimodal models, built to deliver low-latency and high-throughput inference across everything from a single GPU to distributed clusters.
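A minimal single-GPU launch sketch using SGLang's documented launcher module; the model and port are examples:

```bash
pip install "sglang[all]"    # server extras; exact extras may differ by version
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --port 30000
```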