Implementing CQRS in Go: A Practical Guide to Scalable Architecture
Build CQRS in Go without needless ceremony
CQRS is one of those patterns that gets oversold, overcomplicated, and occasionally misdiagnosed as a cure for plain old CRUD boredom.
Diagrams as code, without the drama.
Mermaid is a text-based diagramming tool for people who would rather write diagrams than drag boxes around a canvas.
Organize notes by action, not topic.
Organizing notes by topic sounds logical until you have notes on PostgreSQL in five different folders and cannot find the one that matters for today’s problem.
Notes that improve instead of decaying.
Most engineering notes are written once and forgotten. You capture something in a debugging session, paste it somewhere, and find it two years later with no context for why it mattered.
Publish knowledge that grows, not just posts.
The dominant model for publishing knowledge online has not changed much since the early 2000s: write something, polish it, publish it, move on.
Pick the simplest pattern that works.
Single-model systems are simple. Multi-model systems are powerful. The challenge isn’t choosing models — it’s designing the architecture that orchestrates them.
The right model for the right task.
Running a 70B parameter model to summarize a 200-word email is wasteful. Running a 3B model to review production code is reckless. Most systems live somewhere in between — and that’s where model routing comes in.
Control the risk, not just the model.
LLMs are unpredictable. They hallucinate, leak data, generate harmful content, or refuse legitimate requests. Guardrails constrain model behavior without sacrificing capability.
Spend tokens where they actually matter.
LLM costs scale linearly with usage. A system processing 10,000 requests a day at $0.01 per request costs $100 daily, or $36,500 a year. At enterprise volumes, the total climbs far higher.
Working, structured, and retrieval memory for assistants.
Memory turns assistants from reactive to persistent, but it is also where many systems quietly rot. Surveys argue the short-term versus long-term split is no longer enough for modern agent memory; OpenAI and LangGraph SDKs point to a simpler stack — working memory, durable state, and retrieval.
How serious assistants are actually built.
A production AI assistant is not “an LLM with a prompt”. It is a system that accepts intent, keeps state, decides when to retrieve or act, and exposes enough runtime detail to debug failures.
AI changes knowledge management, not its purpose.
AI is not replacing knowledge management; it is changing the shape of it for both individuals and teams.
Build a developer knowledge graph.
Developers do not usually suffer from a lack of information. We suffer from too much of it.
Stars, tokens, downloads — who actually wins?
Open-source AI agent frameworks are exploding in popularity on GitHub. Two projects at the core of the self-hosted AI systems ecosystem — OpenClaw and Hermes Agent — have pulled so far ahead that the rest of the field is fighting for a distant third place.
MTP vs standard decoding on RTX 4080 — real benchmarks
I tested speculative decoding (Multi-Token Prediction, MTP) performance in Qwen 3.6 27B and 35B on an RTX 4080 with 16 GB VRAM.
Free VRAM without killing llama-server.
llama.cpp router mode is one of the most useful changes to llama-server in years. It finally gives local LLM operators something close to the model management experience people expect from Ollama, while keeping the raw performance and low-level control that make llama.cpp worth using in the first place.