
Building RAG Applications with Ollama and Python: Complete 2025 Tutorial

Retrieval-Augmented Generation (RAG) has revolutionized how we build intelligent applications that can access and reason over external knowledge bases. In this comprehensive tutorial, we’ll explore how to build production-ready RAG applications using Ollama and Python, leveraging the latest techniques and best practices for 2025. What is RAG and Why Use Ollama? Retrieval-Augmented Generation combines the […]

Agentic AI in Customer Service: The Complete Technical Implementation Guide for 2025

Let’s get one thing straight—if you’re still deploying rule-based chatbots in 2025, you’re essentially bringing a flip phone to a smartphone convention. I’ve been in the trenches with AI implementations for years, and I can tell you that the shift from reactive customer service bots to autonomous agentic AI isn’t just evolutionary—it’s revolutionary. And frankly, […]

10 Agentic AI Tools That Will Replace ChatGPT in 2025

Stop settling for AI that just answers questions. The future belongs to AI that actually does the work. If you’re still using ChatGPT like it’s 2023, you’re about to be left behind. While you’ve been asking ChatGPT to write emails, a revolutionary shift is happening in the AI world—and it’s called Agentic AI. Here’s the […]

Ollama vs ChatGPT 2025: Complete Technical Comparison Guide

Ollama vs ChatGPT 2025: A Comprehensive Comparison. A comprehensive technical analysis comparing local LLM deployment via Ollama against cloud-based ChatGPT APIs, including performance benchmarks, cost analysis, and implementation strategies. The artificial intelligence landscape has reached a critical inflection point in 2025. Organizations worldwide face a fundamental strategic decision that will define their AI capabilities for […]

Best Ollama Models 2025: Performance Comparison Guide

Top Picks for Best Ollama Models 2025. A comprehensive technical analysis of the most powerful local language models available through Ollama, including benchmarks, implementation guides, and optimization strategies. Introduction to Ollama’s 2025 Ecosystem: The landscape of local language model deployment has dramatically evolved in 2025, with Ollama establishing itself as the de facto standard for […]

Understanding the n8n app and Its Solutions

In today’s digital world, we use dozens of different apps and services every day. Email, Slack, Google Sheets, databases, social media, CRM systems – the list goes on. While each tool serves its purpose, getting them to work together smoothly can be a nightmare. Enter n8n (pronounced “n-eight-n”), a powerful workflow automation platform that connects […]

LM Studio vs Ollama: Picking the Right Tool for Local LLM Use

LM Studio prioritizes ease of use with a polished GUI ideal for beginners, while Ollama offers greater flexibility and control through its developer-friendly command-line interface and REST API. Choose LM Studio if you want a plug-and-play experience with visual controls, or Ollama if you prefer command-line power and deeper customization options. The landscape of local […]

How to Build, Run, and Package AI Models Locally with Docker Model Runner

Introduction

As a Senior DevOps Engineer and Docker Captain, I’ve helped build AI systems for everything from retail personalization to medical imaging. One truth stands out: AI capabilities are core to modern infrastructure.

This guide will show you how to run and package local AI models with Docker Model Runner — a lightweight, developer-friendly tool for working with AI models pulled from Docker Hub or Hugging Face. You’ll learn how to run models in the CLI or via API, publish your own model artifacts, and do it all without setting up Python environments or web servers.

What is AI in Development?

Artificial Intelligence (AI) refers to systems that mimic human intelligence, including:

  • Making decisions via machine learning
  • Understanding language through NLP
  • Recognizing images with computer vision
  • Learning from new data automatically

Common Types of AI in Development:

  • Machine Learning (ML): Learns from structured and unstructured data
  • Deep Learning: Neural networks for pattern recognition
  • Natural Language Processing (NLP): Understands/generates human language
  • Computer Vision: Recognizes and interprets images

Why Package and Run Your Own AI Model?

Local model packaging and execution offer full control over your AI workflows. Instead of relying on external APIs, you can run models directly on your machine — unlocking:

  • Faster inference with local compute (no latency from API calls)
  • Greater privacy by keeping data and prompts on your own hardware
  • Customization through packaging and versioning your own models
  • Seamless CI/CD integration with tools like Docker and GitHub Actions
  • Offline capabilities for edge use cases or constrained environments

Platforms like Docker and Hugging Face make cutting-edge AI models instantly accessible without building from scratch. Running them locally means lower latency, better privacy, and faster iteration.

Real-World Use Cases for AI

  • Chatbots & Virtual Assistants: Automate support (e.g., ChatGPT, Alexa)
  • Generative AI: Create text, art, music (e.g., Midjourney, Lensa)
  • Dev Tools: Autocomplete and debug code (e.g., GitHub Copilot)
  • Retail Intelligence: Recommend products based on behavior
  • Medical Imaging: Analyze scans for faster diagnosis

How to Package and Run AI Models Locally with Docker Model Runner

Prerequisites: a working Docker Desktop installation (Model Runner currently ships as an experimental/beta feature, so use a recent release).

Step 0 — Enable Docker Model Runner

  1. Open Docker Desktop
  2. Go to Settings → Features in development
  3. Under the Experimental features tab, enable Access experimental features
  4. Click Apply and restart
  5. Quit and reopen Docker Desktop to ensure the changes take effect
  6. Reopen Settings → Features in development
  7. Switch to the Beta tab and check Enable Docker Model Runner
  8. (Optional) Enable host-side TCP support to access the API from localhost

Once enabled, you can use the docker model CLI and manage models in the Models tab.
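
For a quick sanity check from the terminal, the docker model status subcommand reports whether the Model Runner backend is running:

docker model status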

Screenshot of Docker Desktop’s Features in development tab with Docker Model Runner and Dev Environments enabled.

Step 1: Pull a Model

From Docker Hub:

docker model pull ai/smollm2

Or from Hugging Face (GGUF format):

docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

Note: Only GGUF models are supported. GGUF is a lightweight binary file format designed for efficient local inference, especially with CPU-optimized runtimes like llama.cpp. It bundles the model weights, tokenizer, and metadata in a single file, making it ideal for packaging and distributing LLMs in containerized environments.
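
If your Docker Desktop version includes it, the inspect subcommand is a handy way to confirm what was just pulled (format, parameters, size):

docker model inspect ai/smollm2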

Step 2: Tag and Push to Local Registry (Optional)

If you want to push models to a private or local registry:

Tag the model with your registry’s address (the local registry started below listens on host port 6000):

docker model tag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF localhost:6000/foobar

Run a local Docker registry:

docker run -d -p 6000:5000 --name registry registry:2

Push the model to the local registry:

docker model push localhost:6000/foobar

Check your local models with:

docker model list
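
To confirm the round trip, you can also pull the model back from the local registry using the same placeholder tag pushed above:

docker model pull localhost:6000/foobar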

Step 3: Run the Model

Run a prompt (one-shot)

docker model run ai/smollm2 "What is Docker?"

Interactive chat mode

docker model run ai/smollm2

Note: Models are loaded into memory on demand and unloaded after 5 minutes of inactivity.

Step 4: Test via OpenAI-Compatible API

To call the model from the host:

  1. Enable TCP host access for Model Runner (via the Docker Desktop GUI or the CLI):

Screenshot of Docker Desktop’s Features in development tab showing host-side TCP support enabled for Docker Model Runner.

docker desktop enable model-runner --tcp 12434

  2. Send a prompt using the OpenAI-compatible chat endpoint:
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me about the fall of Rome."}
    ]
  }'

Note: No API key required — this runs locally and securely on your machine.
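
The same OpenAI-compatible surface also exposes a model listing endpoint on the same port, which is useful for checking what the runner can serve (this assumes host-side TCP access on port 12434 as enabled above):

curl http://localhost:12434/engines/llama.cpp/v1/models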

Step 5: Package Your Own Model

You can package your own pre-trained GGUF model as a Docker-compatible artifact if you already have a .gguf file — such as one downloaded from Hugging Face or converted using tools like llama.cpp.

Note: This guide assumes you already have a .gguf model file. It does not cover how to train or convert models to GGUF.

docker model package \
  --gguf "$(pwd)/model.gguf" \
  --license "$(pwd)/LICENSE.txt" \
  --push registry.example.com/ai/custom-llm:v1

This is ideal for custom-trained or private models. You can now pull it like any other model:

docker model pull registry.example.com/ai/custom-llm:v1
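
Once pulled, the packaged model behaves like any other local model. For example (the registry address and tag here are the same placeholder values used in the packaging command above):

docker model run registry.example.com/ai/custom-llm:v1 "Give a one-sentence summary of what you can do."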

Step 6: Optimize & Iterate

  • Use docker model logs to monitor model usage and debug issues
  • Set up CI/CD to automate pulls, scans, and packaging
  • Track model lineage and training versions to ensure consistency
  • Use semantic versioning (:v1, :2025-05, etc.) instead of latest when packaging custom models
  • Only one model can be loaded at a time; requesting a new model will unload the previous one.

Compose Integration (Optional)

Docker Compose v2.35+ (included in Docker Desktop 4.41+) introduces support for AI model services using a new provider.type: model. You can define models directly in your compose.yml and reference them in app services using depends_on.

During docker compose up, Docker Model Runner automatically pulls the model and starts it on the host system, then injects connection details into dependent services using environment variables such as MY_MODEL_URL and MY_MODEL_MODEL, where MY_MODEL matches the name of the model service.
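
A minimal compose.yml sketch based on that description might look like the following (the service names and application image are placeholders, not values from a specific reference file):

services:
  my_model:
    provider:
      type: model
      options:
        model: ai/smollm2

  app:
    image: my-app:latest          # placeholder application image
    depends_on:
      - my_model
    # At runtime this service receives MY_MODEL_URL and MY_MODEL_MODEL,
    # injected by Docker Model Runner as described above.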

This enables seamless multi-container AI applications — with zero extra glue code. Learn more.

Navigating AI Development Challenges

  • Latency: Use quantized GGUF models
  • Security: Never run unknown models; validate sources and attach licenses
  • Compliance: Mask PII, respect data consent
  • Costs: Run locally to avoid cloud compute bills

Best Practices

  • Prefer GGUF models for optimal CPU inference
  • Use the --license flag when packaging custom models to ensure compliance
  • Use versioned tags (e.g., :v1, :2025-05) instead of latest
  • Monitor model logs using docker model logs
  • Validate model sources before pulling or packaging
  • Only pull models from trusted sources (e.g., Docker Hub’s ai/ namespace or verified Hugging Face repos)
  • Review the license and usage terms for each model before packaging or deploying

The Road Ahead

  • Support for Retrieval-Augmented Generation (RAG)
  • Expanded multimodal support (text + images, video, audio)
  • LLMs as services in Docker Compose (Requires Docker Compose v2.35+)
  • More granular Model Dashboard features in Docker Desktop
  • Secure packaging and deployment pipelines for private AI models

Docker Model Runner lets DevOps teams treat models like any other artifact — pulled, tagged, versioned, tested, and deployed.

Final Thoughts

You don’t need a GPU cluster or external API to build AI apps. Learn more and explore everything you can do with Docker Model Runner:

  • Pull prebuilt models from Docker Hub or Hugging Face
  • Run them locally using the CLI, API, or Docker Desktop’s Model tab
  • Package and push your own models as OCI artifacts
  • Integrate with your CI/CD pipelines securely

You can also find other helpful information to get started at: 

You’re not just deploying containers — you’re delivering intelligence.


What is Agentic AI?

So you’ve probably heard the buzz about “Agentic AI” floating around tech circles lately, right? Maybe you’re wondering if it’s just another fancy buzzword or if there’s actually something revolutionary happening here. Well, let me tell you – this isn’t just hype. We’re looking at what might be the biggest shift in how AI works […]

What is the Difference Between Generative AI and Agentic AI? A Complete Guide

As artificial intelligence continues to transform industries and reshape how we work, two key terms have emerged that often confuse both technical professionals and business leaders: generative AI and agentic AI. While these technologies may seem similar on the surface, they serve fundamentally different purposes and operate in distinct ways. Understanding the difference between generative […]

What is Agentic AI? A Deep Dive into MCP and the Modern Agent Ecosystem

The artificial intelligence landscape is undergoing a fundamental transformation. While traditional AI systems excel at responding to prompts and generating content, a new paradigm is emerging: Agentic AI. These systems don’t just respond—they reason, plan, and act autonomously to achieve complex objectives. At the heart of this revolution lies groundbreaking infrastructure like the Model Context […]

Ollama vs Docker Model Runner: 5 Key Reasons to Switch

Ollama vs Docker Model Runner: Key Differences Explained. In recent months, the LLM deployment landscape has been evolving rapidly, with users experiencing frustration with some existing solutions. A Reddit thread titled “How to move on from Ollama?” highlights growing discontent with Ollama’s performance and reliability issues. As Docker enters this space with Model Runner, it’s […]

Securing the Model Context Protocol: A Comprehensive Guide

The Model Context Protocol (MCP) represents a significant advancement in AI capabilities, offering a universal interface that connects AI models directly to various data sources and tools. Launched by Anthropic in November 2024, MCP standardizes how applications provide context to LLMs, functioning as a “USB-C port for AI applications.” While MCP offers tremendous potential for […]

Top 10 Interesting MCP Servers You Should Know About in 2025

Model Context Protocol (MCP) servers represent a significant advancement in the world of AI and Large Language Models (LLMs). These specialized interfaces enable LLMs like Claude, ChatGPT, and others to interact with external tools, APIs, and services, dramatically extending their capabilities beyond simple text generation. Think of MCP servers as bridges that connect the reasoning […]

Running AI Agents Locally with Ollama and AutoGen

Have you ever wished you could build smart AI agents without shipping your data to third-party servers? What if I told you you can run powerful language models like Llama3 directly on your machine while building sophisticated AI agent systems? Let’s roll up our sleeves and create a self-contained AI development environment using Ollama and […]