Reduce LLM Costs: Token Optimization Strategies
Cut LLM costs by 80% with smart token optimization
Token optimization is the critical skill separating cost-effective LLM applications from budget-draining experiments.
Build MCP servers for AI assistants with Python examples
The Model Context Protocol (MCP) is revolutionizing how AI assistants interact with external data sources and tools. In this guide, we’ll explore how to build MCP servers in Python, with examples focused on web search and scraping capabilities.
Availability, real-world retail pricing across six countries, and comparison against Mac Studio.
NVIDIA DGX Spark is real, on sale Oct 15, 2025, and targeted at CUDA developers needing local LLM work with an integrated NVIDIA AI stack. US MSRP $3,999; UK/DE/JP retail is higher due to VAT and channel. AUD/KRW public sticker prices are not yet widely posted.
Integrate Ollama with Go: SDK guide, examples, and production best practices.
This guide provides a comprehensive overview of available Go SDKs for Ollama and compares their feature sets.
Comparing the speed, parameters, and performance of these two models
Here is a comparison of Qwen3:30b and GPT-OSS:20b, focusing on instruction following, performance parameters, specs, and speed.
+ Specific Examples Using Thinking LLMs
In this post, we’ll explore two ways to connect your Python application to Ollama: via the HTTP REST API and via the official Ollama Python library.
Not very nice.
Ollama’s GPT-OSS models have recurring issues handling structured output, especially when used with frameworks like LangChain, OpenAI SDK, vLLM, and others.
Slightly different APIs require a special approach.
Here’s a side-by-side comparison of structured output support (getting reliable JSON back) across popular LLM providers, plus minimal Python examples.
A couple of ways to get structured output from Ollama
Large Language Models (LLMs) are powerful, but in production we rarely want free-form paragraphs. Instead, we want predictable data: attributes, facts, or structured objects you can feed into an app. That’s LLM Structured Output.
Description, plans, command list, and keyboard shortcuts
Here is an up-to-date GitHub Copilot cheat sheet, covering essential shortcuts, commands, usage tips, and context features for Visual Studio Code and Copilot Chat.
Long read about MCP specs and implementation in Go
Here is a description of the Model Context Protocol (MCP), with short notes on how to implement an MCP server in Go, including message structure and protocol specifications.
Implementing RAG? Here are some Go code bits - 2...
Since standard Ollama doesn’t have a direct rerank API, you’ll need to implement reranking using Qwen3 Reranker in Go by generating embeddings for query-document pairs and scoring them.
Implementing RAG? Here are some code snippets in Golang...
This small Go reranking example calls Ollama to generate embeddings for the query and for each candidate document, then sorts the candidates in descending order of cosine similarity.
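The scoring step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Go code from the post, and it assumes the embeddings have already been fetched from Ollama; the `cosine_similarity` and `rerank` helpers and the toy vectors are hypothetical, not taken from the original article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rerank(query_vec, docs):
    """docs is a list of (text, embedding) pairs.

    Returns (score, text) pairs sorted by descending cosine similarity.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy 2-dimensional vectors standing in for real embeddings (assumption).
query = [1.0, 0.0]
candidates = [
    ("doc A", [0.0, 1.0]),
    ("doc B", [1.0, 0.0]),
    ("doc C", [1.0, 1.0]),
]
ranked = rerank(query, candidates)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

In a real pipeline the vectors would come from an embedding model served by Ollama, and only the sorting logic here would stay the same.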
LLM to extract text from HTML...
The Ollama model library includes models that can convert HTML content to Markdown, which is useful for content conversion tasks.
What is this trendy AI-assisted coding?
Vibe coding is an AI-driven programming approach where developers describe desired functionality in natural language, allowing AI tools to generate code automatically.