AI Systems Memory — Persistent Knowledge and Agent Memory

Persistent knowledge beyond a single chat thread.


This section collects guides on persistent knowledge and memory for AI systems — how assistants keep facts, preferences, and distilled context across sessions without stuffing every token into one prompt. Here, memory means intentional retention (user facts, summaries, plugin-backed stores), not GPU RAM or model weights.

It complements the broader AI Systems cluster — OpenClaw, Hermes, orchestration — and sits beside RAG for retrieval mechanics and LLM Hosting for running models.

Within the assistant stack described in AI Assistant Architecture, memory sits alongside routing, tooling, and observability.
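To make "intentional retention" concrete, here is a minimal Python sketch. It assumes a hypothetical `FactStore` class that writes chosen user facts to a local JSON file; the file name and layout are illustrative, not any framework's format. A later session reloads the file and injects only the facts relevant to the current turn, instead of replaying the whole chat history.

```python
import json
import tempfile
from pathlib import Path


class FactStore:
    """Hypothetical file-backed store for user facts that outlive one chat session."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload facts written by earlier sessions, if the file exists.
        self.facts: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def remember(self, key: str, value: str) -> None:
        # Intentional retention: only facts the assistant chooses to keep are written.
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts, indent=2), encoding="utf-8")

    def context_block(self, keys: list[str]) -> str:
        # Inject only the facts relevant to this turn, not the full history.
        lines = [f"- {k}: {self.facts[k]}" for k in keys if k in self.facts]
        return "Known about the user:\n" + "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "facts.json"
        FactStore(path).remember("editor", "Neovim")  # session 1 writes a fact
        later = FactStore(path)                        # session 2 reloads it from disk
        print(later.context_block(["editor", "timezone"]))
```

Unknown keys such as `timezone` are skipped, so the prompt only grows with facts that were deliberately stored.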

Memory design for assistants

Cross-framework guide to short-term, structured, and retrieval memory: consolidation policy, vector trade-offs, and patterns from OpenAI, LangGraph, Hermes, and OpenClaw.

Memory Systems in AI Assistants That Actually Help — working memory, structured state, retrieval layers, and when memory helps versus hurts
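A consolidation policy can be sketched in a few lines. This is an illustration under stated assumptions, not the pattern any one framework ships: a hypothetical `LayeredMemory` keeps a bounded window of recent turns verbatim and, when the window overflows, folds the oldest turns into a summary. `summarize` stands in for an LLM call and simply truncates text so the sketch runs offline; the `window` and `batch` sizes are arbitrary.

```python
from collections import deque
from dataclasses import dataclass, field


def summarize(turns: list[str]) -> str:
    # Stand-in for an LLM summarization call (assumption); it just truncates
    # each turn so the sketch runs without a model.
    return " | ".join(t[:40] for t in turns)


@dataclass
class LayeredMemory:
    """Hypothetical two-tier memory: bounded working window plus consolidated summaries."""

    window: int = 4  # recent turns kept verbatim (arbitrary budget)
    batch: int = 2   # oldest turns folded into one summary per overflow
    working: deque = field(default_factory=deque)
    summaries: list[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        self.working.append(turn)
        # Consolidation policy: on overflow, summarize the oldest turns
        # instead of silently dropping them.
        if len(self.working) > self.window:
            count = min(self.batch, len(self.working))
            oldest = [self.working.popleft() for _ in range(count)]
            self.summaries.append(summarize(oldest))

    def prompt_context(self) -> str:
        # Summaries first, then the verbatim recent turns.
        return "\n".join(["Summaries:", *self.summaries, "Recent:", *self.working])


if __name__ == "__main__":
    memory = LayeredMemory()
    for i in range(1, 6):
        memory.add_turn(f"turn {i}")
    print(memory.prompt_context())
```

The design choice worth noticing is that overflow triggers summarization rather than truncation; whether that helps or hurts depends on summary quality, which the linked guide discusses.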

Agent memory providers

Drop-in backends exposed by frameworks such as Hermes Agent and OpenClaw — Honcho, OpenViking, Mem0, Hindsight, and others — with different LLM, embedding, and database trade-offs.

Agent memory providers compared — full table, dependency notes, and Hermes memory setup flows
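Swappable backends usually come down to a narrow contract that the assistant codes against. The sketch below assumes a hypothetical `MemoryProvider` protocol with `add` and `search` methods; it is not the plugin API of Hermes Agent, OpenClaw, or any provider named above. The toy `KeywordProvider` needs no LLM, embedding model, or database, which is the far end of the trade-off spectrum; a real backend would replace word overlap with embeddings or LLM-based extraction.

```python
from typing import Protocol


class MemoryProvider(Protocol):
    """Hypothetical minimal contract; real frameworks define their own provider APIs."""

    def add(self, user_id: str, text: str) -> None: ...

    def search(self, user_id: str, query: str, limit: int = 3) -> list[str]: ...


class KeywordProvider:
    """Toy backend: no LLM, no embeddings, no database, just word overlap in RAM."""

    def __init__(self) -> None:
        self._rows: dict[str, list[str]] = {}

    def add(self, user_id: str, text: str) -> None:
        # Memories are partitioned per user.
        self._rows.setdefault(user_id, []).append(text)

    def search(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t) for t in self._rows.get(user_id, [])]
        # Keep memories sharing at least one word with the query, best match first.
        hits = [(s, t) for s, t in scored if s > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in hits[:limit]]


def recall(provider: MemoryProvider, user_id: str, query: str) -> str:
    # Assistant code depends only on the contract, so backends can be swapped.
    return "\n".join(provider.search(user_id, query))


if __name__ == "__main__":
    provider = KeywordProvider()
    provider.add("u1", "prefers dark mode")
    provider.add("u1", "works in Rust and Go")
    print(recall(provider, "u1", "which mode does the user prefer"))
```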

For Hermes-only bounded core memory (MEMORY.md / USER.md), see Hermes Agent Memory System.

Knowledge graphs and Cognee

Institutional and project knowledge extracted into graphs for retrieval-aware assistants.

Self-Hosting Cognee — Choosing LLM on Ollama — hands-on Cognee quickstart with local models
Choosing the Right LLM for Cognee — Local Ollama Setup — model comparison for graph quality vs hardware

Graph builders such as Cognee typically ingest Markdown vaults, wikis, or exports that people have already edited, so salience, naming, and "why this mattered" are largely settled before chunks hit embeddings. A sloppy upstream corpus feeds ambiguity back into the assistant; disciplined capture-through-expression workflows limit that damage. For that human-centered framing, including how it differs from retrieval-first RAG, see Second brain explained for engineers.
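A rough illustration of why upstream naming matters, assuming an Obsidian-style vault where notes link with `[[Target]]` or `[[Target|alias]]`. The hypothetical `link_graph` function is not Cognee's ingestion pipeline; it only extracts the links people wrote. Two spellings of the same concept become two separate nodes, which is the kind of ambiguity a later graph or embedding step inherits.

```python
import re
from collections import defaultdict

# Obsidian-style [[Target]], [[Target|alias]] or [[Target#Heading]] links (assumed vault convention).
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def link_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical pre-ingestion pass: map each note title to the titles it links to."""
    edges: dict[str, set[str]] = defaultdict(set)
    for title, body in notes.items():
        for target in WIKILINK.findall(body):
            edges[title].add(target.strip())
    return dict(edges)


if __name__ == "__main__":
    notes = {
        "RAG notes": "Compare [[Vector DB|vector stores]] with [[Cognee]].",
        "Ops log": "Migrated the [[vector-db]] last week.",  # same concept, different name
    }
    graph = link_graph(notes)
    targets = set().union(*graph.values())
    print(sorted(targets))
```

On the sample notes this yields three link targets, where a consistently named vault would yield two.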

