OpenClaw Rise and Fall — Timeline and Real Reasons Behind the Collapse
OpenClaw rose fast. Then vanished faster.
OpenClaw did not fail as a product. It lost its fuel.
Serve and swap LLMs without restarts.
For a long time, llama.cpp had a glaring limitation:
you could only serve one model per process, and switching meant a restart.
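For context, a minimal sketch of the classic single-model launch that this limitation describes; the model path is illustrative:

```bash
# One llama-server process serves exactly one GGUF model;
# switching models meant stopping this process and starting another.
# -ngl 99 offloads all layers to the GPU.
llama-server -m ./models/qwen2.5-7b-instruct-q4_k_m.gguf --port 8080 -ngl 99
```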
Build Claude Skills that survive real work
Most teams misuse Claude Skills in one of two ways. They either turn SKILL.md into a dumping ground, or they never graduate from giant copy-pasted prompts.
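For contrast with the dumping-ground failure mode, a rough sketch of a lean SKILL.md; the frontmatter fields follow Anthropic's published Agent Skills format, while the skill itself is an invented example:

```markdown
---
name: release-notes
description: Draft release notes from merged PR titles. Use when the user asks to summarize a release.
---

# Release notes

1. Collect the merged PR titles for the tagged range.
2. Group them into Features, Fixes, and Internal.
3. Keep each bullet to one line and link the PR number.
```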
Profile-first Hermes setups for serious workloads
Hermes AI assistant, officially documented as Hermes Agent, is not positioned as a simple chat wrapper.
The skills worth keeping, and the ones to skip
OpenClaw has two extension stories, and they are easy to mix up.
Plugins extend the runtime. Skills extend the agent’s behavior.
Plugins first. Skills naming in brief.
This article is about OpenClaw plugins — native gateway packages that add channels, model providers, tools, speech, memory, media, web search, and other runtime surfaces.
How real OpenClaw systems are actually structured
OpenClaw looks simple in demos. In production, it becomes a system.
Claude subscriptions no longer power agents
The quiet loophole that powered a wave of agent experimentation is now closed.
Self-hosted AI search with local LLMs
Vane is one of the more pragmatic entries in the “AI search with citations” space: a self-hosted answering engine that mixes live web retrieval with local or cloud LLMs, while keeping the whole stack under your control.
Agentic coding, now with local model backends.
Claude Code is not autocomplete with better marketing. It is an agentic coding tool: it reads your codebase, edits files, runs commands, and integrates with your development tools.
Hermes Agent install and quickstart for devs
Hermes Agent is a self-hosted, model-agnostic AI assistant that runs on a local machine or low-cost VPS, works through terminal and messaging interfaces, and improves over time by turning repeated tasks into reusable skills.
Install TGI, ship fast, debug faster
Text Generation Inference (TGI) has a very specific energy. It is not the newest kid on the inference block, but it is the one that has already learned how production breaks.
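A minimal sketch of the standard Docker launch, assuming an NVIDIA GPU and the upstream image; the model ID is just an example:

```bash
# Run TGI and expose its HTTP API on localhost:8080
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id HuggingFaceH4/zephyr-7b-beta    # example model; any supported HF id works
```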
llama.cpp token speed on 16 GB VRAM (tables).
Here I compare the speed of several LLMs running on a GPU with 16 GB of VRAM and pick the best one for self-hosting.
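One common way to produce numbers like these is llama.cpp's bundled benchmark tool; a sketch, with an illustrative model path:

```bash
# llama-bench reports prompt-processing and text-generation tokens/sec.
# -ngl 99 offloads all layers; -p/-n set prompt and generation lengths.
llama-bench -m ./models/model-q4_k_m.gguf -ngl 99 -p 512 -n 128
```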
Compose-first Ollama server with GPU and persistence.
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
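A minimal compose sketch along those lines, assuming an NVIDIA GPU with nvidia-container-toolkit installed; the image tag and volume name are illustrative:

```yaml
services:
  ollama:
    image: ollama/ollama:0.6.5       # pin a tag you have validated, not :latest
    ports:
      - "11434:11434"                # stable endpoint for clients
    volumes:
      - ollama-data:/root/.ollama    # persist pulled models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]    # request GPU access from the NVIDIA runtime
volumes:
  ollama-data:
```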
HTTPS Ollama without breaking streaming responses.
Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
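A minimal nginx sketch of that idea: the key detail is disabling response buffering so streamed tokens are flushed as they arrive. Server name, certificate paths, and upstream address are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name ollama.example.com;            # placeholder
    ssl_certificate     /etc/ssl/ollama.crt;   # placeholder paths
    ssl_certificate_key /etc/ssl/ollama.key;

    location / {
        proxy_pass http://127.0.0.1:11434;     # local Ollama
        proxy_http_version 1.1;
        proxy_buffering off;                   # do not buffer streamed tokens
        proxy_read_timeout 300s;               # allow long generations
        proxy_set_header Host $host;
    }
}
```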
Serve open models fast with SGLang.
SGLang is a high-performance serving framework for large language models and multimodal models, built to deliver low-latency and high-throughput inference across everything from a single GPU to distributed clusters.
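A minimal single-GPU launch sketch using SGLang's documented launcher module; the model and port are examples:

```bash
pip install "sglang[all]"    # server extras; exact extras may differ by version
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --port 30000
```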