Open WebUI: Self-Hosted LLM Interface
Self-hosted ChatGPT alternative for local LLMs
Open WebUI is a powerful, extensible, and feature-rich self-hosted web interface for interacting with large language models.
Real AUD pricing from Aussie retailers now
The NVIDIA DGX Spark (GB10 Grace Blackwell) is now available in Australia at major PC retailers with local stock. If you’ve been following the global DGX Spark pricing and availability, you’ll be interested to know that Australian pricing ranges from $6,249 to $7,999 AUD depending on storage configuration and retailer.
Testing Cognee with local LLMs - real results
Cognee is a Python framework for building knowledge graphs from documents using LLMs. But does it work with self-hosted models?
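For orientation, here is a minimal sketch of Cognee's quickstart flow (add documents, build the graph, query it); exact function signatures may vary between versions:

```python
import asyncio
import cognee

async def main():
    # Ingest a snippet of text (files and larger corpora work the same way)
    await cognee.add("Cognee builds knowledge graphs from documents using LLMs.")
    # Run the LLM-powered pipeline that extracts entities and relations
    await cognee.cognify()
    # Query the resulting graph (kwarg name per Cognee's quickstart; may differ by version)
    results = await cognee.search(query_text="What does Cognee do?")
    print(results)

asyncio.run(main())
```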
Type-safe LLM outputs with BAML and Instructor
When working with Large Language Models in production, getting structured, type-safe outputs is critical. Two popular frameworks - BAML and Instructor - take different approaches to solving this problem.
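As a flavour of one side of that comparison, here is a minimal Instructor sketch: declare a Pydantic model, patch the OpenAI client, and responses come back validated against the type (model name and prompt are placeholders):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Patch the OpenAI client so completions return validated Pydantic objects
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any supported chat model works
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)  # -> John Doe 30
```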
Thoughts on LLMs for self-hosted Cognee
Choosing the best LLM for Cognee means balancing graph-building quality, hallucination rates, and hardware constraints. Cognee excels with larger, low-hallucination models (32B+) via Ollama, but mid-size options work for lighter setups.
Build AI search agents with Python and Ollama
Ollama’s Python library now includes native web search capabilities. With just a few lines of code, you can augment your local LLMs with real-time information from the web, reducing hallucinations and improving accuracy.
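As a minimal sketch (the hosted search endpoint requires an API key in the OLLAMA_API_KEY environment variable):

```python
import ollama

# Web search is served by Ollama's hosted endpoint, so an API key is required;
# the library reads it from the OLLAMA_API_KEY environment variable.
results = ollama.web_search("latest Ollama release notes")
print(results)
```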
Build AI search agents with Go and Ollama
Ollama’s Web Search API lets you augment local LLMs with real-time web information. This guide shows you how to implement web search capabilities in Go, from simple API calls to full-featured search agents.
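The guide itself works in Go, but the underlying REST call is language-agnostic. Here is the same request sketched in Python, just to show the shape of the API; the endpoint and Bearer-token auth follow Ollama's documentation, and the key is assumed to live in OLLAMA_API_KEY:

```python
import os
import requests

resp = requests.post(
    "https://ollama.com/api/web_search",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={"query": "golang ollama web search"},
)
resp.raise_for_status()
print(resp.json())  # list of result objects with titles, URLs, and content
```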
Master local LLM deployment with 12+ tools compared
Local deployment of LLMs has become increasingly popular as developers and organizations seek enhanced privacy, reduced latency, and greater control over their AI infrastructure.
Deploy enterprise AI on budget hardware with open models
The democratization of AI is here. With open-source LLMs like Llama 3, Mixtral, and Qwen now rivaling proprietary models, teams can build powerful AI infrastructure using consumer hardware - slashing costs while maintaining complete control over data privacy and deployment.
GPT-OSS 120b benchmarks on three AI platforms
I dug up some interesting performance tests of GPT-OSS 120b running on Ollama across three different platforms: NVIDIA DGX Spark, Mac Studio, and RTX 4080. The GPT-OSS 120b model from the Ollama library weighs in at 65GB, which means it doesn’t fit into the 16GB VRAM of an RTX 4080 (or the newer RTX 5080).
Compare Docker Model Runner and Ollama for local LLM
Running large language models (LLMs) locally has become increasingly popular for privacy, cost control, and offline capabilities. The landscape shifted significantly in April 2025 when Docker introduced Docker Model Runner (DMR), its official solution for AI model deployment.
Integrate Ollama with Go: SDK guide, examples, and production best practices
This guide provides a comprehensive overview of available Go SDKs for Ollama and compares their feature sets.
Comparing the speed, parameters, and performance of Qwen3:30b and GPT-OSS:20b
Here is a comparison of Qwen3:30b and GPT-OSS:20b, focusing on instruction following, performance parameters, specs, and speed.
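If you want to reproduce the speed side yourself, here's a rough sketch: Ollama's responses report eval_count (generated tokens) and eval_duration (nanoseconds), from which tokens per second falls out directly. The model tags are assumptions; use whatever you have pulled locally.

```python
import ollama

PROMPT = "Summarize TCP slow start in three sentences."

for model in ("qwen3:30b", "gpt-oss:20b"):  # assumed tags; adjust to your local pulls
    resp = ollama.chat(model=model, messages=[{"role": "user", "content": PROMPT}])
    # eval_count = tokens generated; eval_duration = generation time in nanoseconds
    tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"{model}: {tps:.1f} tokens/sec")
```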
Connect Python to Ollama + Specific Examples Using Thinking LLMs
In this post, we’ll explore two ways to connect your Python application to Ollama: via the HTTP REST API, and via the official Ollama Python library.
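A minimal sketch of both, assuming a local Ollama server on its default port (11434) and a model such as llama3.2 already pulled:

```python
import requests
import ollama

# 1. HTTP REST API: Ollama listens on localhost:11434 by default
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
)
print(r.json()["response"])

# 2. Official Python library: same server, nicer ergonomics
reply = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(reply["message"]["content"])
```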
GPT-OSS structured output in Ollama: not very nice
Ollama’s GPT-OSS models have recurring issues handling structured output, especially when used with frameworks like LangChain, the OpenAI SDK, vLLM, and others.
A couple of ways to get structured output from Ollama
Large Language Models (LLMs) are powerful, but in production we rarely want free-form paragraphs. Instead, we want predictable data: attributes, facts, or structured objects you can feed into an app. That’s LLM Structured Output.
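As a taste of one such way, here is a minimal sketch using Ollama's format parameter, which accepts a JSON schema and pairs naturally with Pydantic (the model tag is a placeholder):

```python
from pydantic import BaseModel
import ollama

class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

resp = ollama.chat(
    model="llama3.2",  # placeholder; any local model that follows schemas
    messages=[{"role": "user", "content": "Tell me about France."}],
    format=Country.model_json_schema(),  # constrain the output to this schema
)

# The reply is JSON matching the schema; validate it into the typed object
country = Country.model_validate_json(resp["message"]["content"])
print(country)
```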