How to Build, Run, and Package AI Models Locally with Docker Model Runner

Introduction

As a Senior DevOps Engineer and Docker Captain, I’ve helped build AI systems for everything from retail personalization to medical imaging. One truth stands out: AI capabilities are core to modern infrastructure.

This guide will show you how to run and package local AI models with Docker Model Runner — a lightweight, developer-friendly tool for working with AI models pulled from Docker Hub or Hugging Face. You’ll learn how to run models in the CLI or via API, publish your own model artifacts, and do it all without setting up Python environments or web servers.

What is AI in Development?

Artificial Intelligence (AI) refers to systems that mimic human intelligence, including:

  • Making decisions via machine learning
  • Understanding language through NLP
  • Recognizing images with computer vision
  • Learning from new data automatically

Common Types of AI in Development:

  • Machine Learning (ML): Learns from structured and unstructured data
  • Deep Learning: Neural networks for pattern recognition
  • Natural Language Processing (NLP): Understands/generates human language
  • Computer Vision: Recognizes and interprets images

Why Package and Run Your Own AI Model?

Local model packaging and execution offer full control over your AI workflows. Instead of relying on external APIs, you can run models directly on your machine — unlocking:

  • Faster inference with local compute (no latency from API calls)
  • Greater privacy by keeping data and prompts on your own hardware
  • Customization through packaging and versioning your own models
  • Seamless CI/CD integration with tools like Docker and GitHub Actions
  • Offline capabilities for edge use cases or constrained environments

Platforms like Docker and Hugging Face make cutting-edge AI models instantly accessible without building from scratch. Running them locally means lower latency, better privacy, and faster iteration.

Real-World Use Cases for AI

  • Chatbots & Virtual Assistants: Automate support (e.g., ChatGPT, Alexa)
  • Generative AI: Create text, art, music (e.g., Midjourney, Lensa)
  • Dev Tools: Autocomplete and debug code (e.g., GitHub Copilot)
  • Retail Intelligence: Recommend products based on behavior
  • Medical Imaging: Analyze scans for faster diagnosis

How to Package and Run AI Models Locally with Docker Model Runner

Prerequisites: a recent Docker Desktop release with the Docker Model Runner beta feature available (Docker Desktop 4.40 or later).

Step 0: Enable Docker Model Runner

  1. Open Docker Desktop
  2. Go to Settings → Features in development
  3. Under the Experimental features tab, enable Access experimental features
  4. Click Apply and restart
  5. Quit and reopen Docker Desktop to ensure the changes take effect
  6. Reopen Settings → Features in development
  7. Switch to the Beta tab and check Enable Docker Model Runner
  8. (Optional) Enable host-side TCP support to access the API from localhost

Once enabled, you can use the docker model CLI and manage models in the Models tab.
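
You can also sanity-check the setup from a terminal. A minimal sketch, assuming a recent Docker Desktop where these subcommands are available:

# Enable Model Runner from the CLI instead of the GUI (flags may vary by version)
docker desktop enable model-runner

# Confirm that the Model Runner backend is running
docker model status

# List the available docker model subcommands
docker model --help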

Screenshot of Docker Desktop’s Features in development tab with Docker Model Runner and Dev Environments enabled.

Step 1: Pull a Model

From Docker Hub:

docker model pull ai/smollm2

Or from Hugging Face (GGUF format):

docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

Note: Only GGUF models are supported. GGUF (GPT-Generated Unified Format) is a lightweight binary file format designed for efficient local inference, especially with CPU-optimized runtimes like llama.cpp. It bundles the model weights, tokenizer, and metadata in a single file, making it ideal for packaging and distributing LLMs in containerized environments.
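
Hugging Face repositories often publish several quantizations of the same model, and you can usually pick one by appending its tag to the pull reference. The Q4_K_M tag below is an assumption and only works if the repo publishes that variant:

# Pull a specific quantization instead of the repo default (tag is illustrative)
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M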

Step 2: Tag and Push to Local Registry (Optional)

If you want to push models to a private or local registry:

Tag the model with your registry’s address (the host port must match the registry you run below):

docker model tag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF localhost:6000/foobar

Run a local Docker registry:

docker run -d -p 6000:5000 --name registry registry:2

Push the model to the local registry:

docker model push localhost:6000/foobar

Check your local models with:

docker model list
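
To confirm the round trip, you can drop the local copy and pull it back from your registry (using the example tag from above):

# Remove the local copy, then re-pull it from the local registry
docker model rm localhost:6000/foobar
docker model pull localhost:6000/foobar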

Step 3: Run the Model

Run a prompt (one-shot)

docker model run ai/smollm2 "What is Docker?"

Interactive chat mode

docker model run ai/smollm2

Note: Models are loaded into memory on demand and unloaded after 5 minutes of inactivity.
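
Because the one-shot form writes the completion to stdout, it composes naturally with shell pipelines. A small sketch (file names are illustrative):

# Summarize a local file and save the model's answer
docker model run ai/smollm2 "Summarize the following notes: $(cat notes.txt)" > summary.txt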

Step 4: Test via OpenAI-Compatible API

To call the model from the host:

  1. Enable TCP host access for Model Runner (via the Docker Desktop GUI or CLI):

docker desktop enable model-runner --tcp 12434

Screenshot of Docker Desktop’s Features in development tab showing host-side TCP support enabled for Docker Model Runner.

  2. Send a prompt using the OpenAI-compatible chat endpoint:
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me about the fall of Rome."}
    ]
  }'

Note: No API key required — this runs locally and securely on your machine.
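
The endpoint follows the OpenAI API shape beyond chat completions, and containers can reach Model Runner without the TCP toggle through an internal hostname. A sketch, assuming the documented model-runner.docker.internal endpoint:

# List the models the engine currently serves (host-side, TCP enabled)
curl http://localhost:12434/engines/llama.cpp/v1/models

# From inside a container, no TCP toggle is needed
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2", "messages": [{"role": "user", "content": "Hello!"}]}'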

Step 5: Package Your Own Model

You can package your own pre-trained GGUF model as a Docker-compatible artifact if you already have a .gguf file — such as one downloaded from Hugging Face or converted using tools like llama.cpp.

Note: This guide assumes you already have a .gguf model file. It does not cover how to train or convert models to GGUF.

docker model package \
  --gguf "$(pwd)/model.gguf" \
  --license "$(pwd)/LICENSE.txt" \
  --push registry.example.com/ai/custom-llm:v1

This is ideal for custom-trained or private models. You can now pull it like any other model:

docker model pull registry.example.com/ai/custom-llm:v1
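
And it runs exactly like a catalog model (the registry path is the placeholder from the packaging step above):

# One-shot prompt against your privately packaged model
docker model run registry.example.com/ai/custom-llm:v1 "Give me a one-line status check."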

Step 6: Optimize & Iterate

  • Use docker model logs to monitor model usage and debug issues
  • Set up CI/CD to automate pulls, scans, and packaging (see the sketch after this list)
  • Track model lineage and training versions to ensure consistency
  • Use versioned tags (:v1, :2025-05, etc.) instead of latest when packaging custom models
  • Only one model can be loaded at a time; requesting a new model will unload the previous one
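
The CI/CD item above can start as a small shell script in your pipeline. A minimal sketch: the output check is illustrative since LLM responses are nondeterministic, and the registry path is a placeholder:

#!/bin/sh
set -eu

# Pull the base model and run a quick smoke-test prompt
docker model pull ai/smollm2
docker model run ai/smollm2 "Reply with the single word OK." | grep -qi "ok"

# Package and push the custom GGUF artifact on success
docker model package \
  --gguf "$(pwd)/model.gguf" \
  --license "$(pwd)/LICENSE.txt" \
  --push registry.example.com/ai/custom-llm:v1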

Compose Integration (Optional)

Docker Compose v2.35+ (included in Docker Desktop 4.41+) introduces support for AI model services using a new provider.type: model. You can define models directly in your compose.yml and reference them in app services using depends_on.

During docker compose up, Docker Model Runner automatically pulls the model and starts it on the host system, then injects connection details into dependent services using environment variables such as MY_MODEL_URL and MY_MODEL_MODEL, where MY_MODEL matches the name of the model service.

This enables seamless multi-container AI applications — with zero extra glue code. Learn more.
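
A minimal compose.yml sketch of that wiring, assuming the provider syntax from Compose v2.35+; the service and image names are placeholders, and Compose would inject MY_MODEL_URL and MY_MODEL_MODEL into the app container:

services:
  my_model:
    provider:
      type: model
      options:
        model: ai/smollm2

  app:
    image: my-app:latest   # placeholder application image
    depends_on:
      - my_model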

Navigating AI Development Challenges

  • Latency: Use quantized GGUF models
  • Security: Never run unknown models; validate sources and attach licenses
  • Compliance: Mask PII, respect data consent
  • Costs: Run locally to avoid cloud compute bills

Best Practices

  • Prefer GGUF models for optimal CPU inference
  • Use the --license flag when packaging custom models to ensure compliance
  • Use versioned tags (e.g., :v1, :2025-05) instead of latest
  • Monitor model logs using docker model logs
  • Validate model sources before pulling or packaging
  • Only pull models from trusted sources (e.g., Docker Hub’s ai/ namespace or verified Hugging Face repos)
  • Review the license and usage terms for each model before packaging or deploying

The Road Ahead

  • Support for Retrieval-Augmented Generation (RAG)
  • Expanded multimodal support (text + images, video, audio)
  • LLMs as services in Docker Compose (requires Docker Compose v2.35+)
  • More granular Model Dashboard features in Docker Desktop
  • Secure packaging and deployment pipelines for private AI models

Docker Model Runner lets DevOps teams treat models like any other artifact — pulled, tagged, versioned, tested, and deployed.

Final Thoughts

You don’t need a GPU cluster or external API to build AI apps. Learn more and explore everything you can do with Docker Model Runner:

  • Pull prebuilt models from Docker Hub or Hugging Face
  • Run them locally using the CLI, API, or Docker Desktop’s Model tab
  • Package and push your own models as OCI artifacts
  • Integrate with your CI/CD pipelines securely

You can also find more helpful information to get started in the Docker Model Runner documentation.

You’re not just deploying containers — you’re delivering intelligence.

Introducing the Beta Launch of Docker’s AI Agent, Transforming Development Experiences

For years, Docker has been an essential partner for developers, empowering everyone from small startups to the world’s largest enterprises. Today, AI is transforming organizations across industries, creating opportunities for those who embrace it to gain a competitive edge. Yet, for many teams, the question of where to start and how to effectively integrate AI into daily workflows remains a challenge. True to its developer-first philosophy, Docker is here to bridge that gap.

We’re thrilled to introduce the beta launch of Docker AI Agent (also known as Project: Gordon)—an embedded, context-aware assistant seamlessly integrated into the Docker suite. Available within Docker Desktop and the CLI, this innovative agent delivers tailored guidance for tasks like building and running containers, authoring Dockerfiles, and Docker-specific troubleshooting—eliminating disruptive context-switching. By addressing challenges precisely when and where developers encounter them, Docker AI Agent ensures a smoother, more productive workflow.

As the AI Agent evolves, enterprise teams will unlock even greater capabilities, including customizable features that streamline collaboration, enhance security, and help developers work smarter. With the Docker AI Agent, we’re making Docker even easier and more effective to use than it has ever been — AI accessible, actionable, and indispensable for developers everywhere.

How Docker’s AI Agent Simplifies Development Challenges  

Developing in today’s fast-paced tech landscape is increasingly complex, with developers having to learn an ever-growing number of tools, libraries, and technologies.

By integrating a GenAI Agent into Docker’s ecosystem, we aim to provide developers with a powerful assistant that can help them navigate these complexities. 

The Docker AI Agent helps developers accelerate their work, providing real-time assistance, actionable suggestions, and automations that remove many of the manual tasks associated with containerized application development. Delivering the most helpful, expert-level guidance on Docker-related questions and technologies, Gordon serves as a powerful support system for developers, meeting them exactly where they are in their workflow. 

If you’re a developer who favors graphical interfaces, the Docker Desktop AI UI will help you navigate issues with running containers, image size management, and more general Dockerfile-oriented questions. If you’re a command-line user, you can call the agent, and share context with it, directly in your favorite terminal.

So what can Docker’s AI Agent do today? 

We’re delivering an expert assistant for every Docker-related concept and technology, whether it’s getting started, optimizing an existing Dockerfile or Compose file, or understanding Docker technologies in general. With Docker AI Agent, you also have the ability to delegate actions while maintaining full control and review over the process.

As a first example: if you want to run a container from an image, our agent can suggest the most appropriate docker run command tailored to your needs. This eliminates guesswork and the need to search Docker Hub, saving you time and effort. The result combines a custom prompt, live data from Docker Hub, Docker container expertise, and private usage insights unique to Docker Inc.

Screenshot of a Docker AI Agent (Gordon) chat in Docker Desktop.

We’ve intentionally designed the output to be concise and actionable, avoiding the overwhelming verbosity often associated with AI-generated commands. We also provide sources for most of the AI agent recommendations, pointing directly to our documentation website. Our goal is to continuously refine this experience, ensuring that Docker’s AI Agent always provides the best possible command based on your specific local context.

Besides helping you run containers, the Docker AI Agent can today:

  • Explain, rate, and optimize Dockerfiles, leveraging the latest version of Docker.
  • Help you run containers in an effective, concise way, leveraging local context (e.g., checking ports already in use or existing volumes).
  • Answer any Docker-related question using the latest documentation for our whole tool suite, covering all Docker tools and technologies.
  • Containerize a software project, helping you run your software in containers.
  • Help with Docker-related GitHub Actions.
  • Suggest fixes when a container fails to start in Docker Desktop.
  • Provide contextual help for containers, images, and volumes.
  • Augment its answers with per-directory MCP servers (see the docs).
Screenshot of the Docker AI Agent in the Docker Desktop terminal, recommending a Node.js version.

A note for Node.js experts: in the above screenshot, the AI recommends Node 20.12, which is not the latest version but the one it found in the project’s package.json.

With every future version of Docker Desktop, and thanks to the feedback you provide, the agent will be able to do much more.

How can you try Docker AI Agent? 

This first beta release of Docker AI Agent is now progressively rolling out to all signed-in users*. By default, the Docker AI Agent is disabled; to enable it, follow the steps below:

  1. Install or update to the latest release of Docker Desktop 4.38
  2. Enable Docker AI in Docker Desktop Settings → Features in development
  3. For the best experience, ensure the Docker terminal is enabled by going to Settings → General
  4. Apply the changes
Screenshot of the Docker AI Agent settings in Docker Desktop.

* If you’re a business subscriber, your administrator needs to enable the Docker AI Agent for the organization first. This can be done through Settings Management. If this applies to you, feel free to contact us through support for further information.

Docker AI Agent’s Vision for 2025

By 2025, we aim to expand the agent’s capabilities with features like customizing your experience with more context from your registry, enhanced GitHub Copilot integrations, and deeper presence across the development tools you already use. With regular updates and your feedback, Docker AI Agent is being built to become an indispensable part of your development process.

For now, this beta is the start of an exciting evolution in how we approach developer productivity. Stay tuned for more updates as we continue to shape a smarter, more streamlined way to build, secure, and ship applications. We want to hear from you: if you have feedback or want more information, you can contact us.
