How Ollama Handles Parallel Requests
Configuring Ollama for parallel request execution.
When the Ollama server receives two requests at the same time, its behavior depends on its configuration and available system resources.
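You can observe this behavior by firing two requests at once. Below is a minimal sketch, assuming the official `ollama` Python client, a running local server, and a pulled `llama3` model; `OLLAMA_NUM_PARALLEL` is the server-side environment variable that controls how many requests a loaded model serves concurrently.

```python
import concurrent.futures

import ollama

def ask(prompt: str) -> str:
    # Each call blocks until the server returns the full response.
    return ollama.generate(model="llama3", prompt=prompt)["response"]

prompts = ["Why is the sky blue?", "Why is grass green?"]

# Two threads fire the requests at (almost) the same time; whether Ollama
# serves them in parallel or queues the second one depends on the
# OLLAMA_NUM_PARALLEL setting and on available memory.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    for answer in pool.map(ask, prompts):
        print(answer[:100], "...")
```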
Comparing two deepseek-r1 models with the two base models they were distilled from
DeepSeek’s first generation of reasoning models achieves performance comparable to OpenAI-o1 and includes six dense models distilled from DeepSeek-R1, based on Llama and Qwen.
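One way to run such a comparison is to send the same prompt to a distilled model and to a corresponding base model, then inspect the outputs side by side. A minimal sketch, assuming the `ollama` Python client; the model tags are an illustrative distilled/base pair, not necessarily the exact ones compared here.

```python
import ollama

PROMPT = "How many times does the letter 'r' appear in 'strawberry'?"

# deepseek-r1:7b is a distillation based on Qwen; qwen2.5:7b stands in
# as the base model for comparison.
for tag in ["deepseek-r1:7b", "qwen2.5:7b"]:
    reply = ollama.generate(model=tag, prompt=PROMPT)["response"]
    print(f"--- {tag} ---")
    print(reply[:300])
```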
Python code for the reranking step of a RAG pipeline.
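A hedged sketch of such a reranking step, using a cross-encoder from the sentence-transformers package; the checkpoint name is a common public reranker, not necessarily the one used here.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    # Score every (query, document) pair, then keep the top_k best docs.
    scores = reranker.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

docs = [
    "Ollama can serve several requests in parallel.",
    "Bananas are rich in potassium.",
    "OLLAMA_NUM_PARALLEL controls concurrent request slots.",
]
print(rerank("How does Ollama handle parallel requests?", docs, top_k=2))
```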
It requires some experimentation, but there are common approaches to writing good prompts so that the LLM does not get confused trying to understand what you want from it.
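For illustration, one such common approach is to give the model an explicit role, clearly delimited context, and a required output format. The template below is a generic example of that pattern, not a prescription from this article.

```python
# A generic structured prompt: an explicit role, delimited context, and a
# stated output format reduce the chance the model misreads the task.
PROMPT_TEMPLATE = """You are a concise technical assistant.

Task: answer the question using only the context below.
If the context does not contain the answer, say "I don't know."

Context:
\"\"\"{context}\"\"\"

Question: {question}

Answer in at most two sentences."""

prompt = PROMPT_TEMPLATE.format(
    context="Ollama queues extra requests once all parallel slots are busy.",
    question="What happens to a request when all slots are taken?",
)
print(prompt)
```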
8 llama3 (Meta) and 5 phi3 (Microsoft) LLM versions
Testing how models with different numbers of parameters and quantization levels behave.
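A minimal sketch of such a test loop, assuming the `ollama` Python client; the tags are examples of publicly available llama3 and phi3 builds, not the full set of versions tested.

```python
import time

import ollama

# Example tags varying parameter count and quantization level.
TAGS = ["llama3:8b-instruct-q4_0", "llama3:8b-instruct-q8_0", "phi3:3.8b"]
PROMPT = "Explain in one sentence what quantization does to a model."

for tag in TAGS:
    start = time.perf_counter()
    reply = ollama.generate(model=tag, prompt=PROMPT)["response"]
    elapsed = time.perf_counter() - start
    # Crude comparison: wall-clock latency and response length per tag.
    print(f"{tag:32s} {elapsed:6.1f}s  {len(reply)} chars")
```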