Docker State of App Dev: AI

AI is changing software development — but not how you think

The hype is real, but so are the challenges. Here’s what developers, teams, and tech leaders need to know about AI’s uneven, evolving role in software.

Rumors of AI’s pervasiveness in software development have been greatly exaggerated. A look under the hood shows adoption is far from uniform. While some dev teams are embedding AI into daily workflows, others are still kicking the tires or sitting it out entirely. Real-world usage reveals a nuanced picture shaped by industry, role, and data readiness.


Here are six key insights into AI tools and development from Docker’s second annual State of Application Development Survey, based on responses from over 4,500 industry professionals.

1. How are people using AI?

Right off the bat, we saw a split between two classes of respondents: 

  • Those who use AI tools like ChatGPT and GitHub Copilot for everyday work-related tasks such as writing, documentation, and research 
  • Those who build applications with AI/ML functionality

2. IT leads the way in AI tool usage and app development 

Only about 1 in 5 respondents (22%) report using AI tools for work. But there’s a huge spread across industries — from 1% to 84%. Among the top AI users are IT/SaaS folks (76%). And because we surveyed over three times more users this year than last year, the snapshot covers a broader spectrum of industries beyond just those focused on IT.

Underscoring tech’s embrace of AI: 34% of IT/SaaS respondents say they develop AI/ML apps, compared to just 8% outside that bubble.

And strategy reflects this gulf. Only 16% of companies outside IT report having a real AI strategy. Within tech, the number soars to 73%. Translation: AI is gaining traction, but it’s concentrated in certain industries — at least for now.

3. AI tools are overhyped — and incredibly useful

Here’s the paradox: 64% of users say AI tools make work easier, yet almost as many (59%) think AI tools are overhyped. The hype may be loud, but utility is speaking louder, especially for those who’ve stuck with it. In fact, 65% of current users say they’re using AI more than they did a year ago, and that same percentage use it every day.

This tracks roughly with findings in our 2024 report, in which 61% of respondents agreed AI made their job easier, even as 45% reported feeling AI was overhyped. And 65% agreed that AI was a positive option.

4. AI tool usage is up — and ChatGPT leads the pack

No surprises here. The most-used AI-powered tools are the same as in our 2024 survey — ChatGPT (especially among full-stack developers), GitHub Copilot, and Google Gemini. 

But usage this year far outstrips what users reported last year, with 80% selecting ChatGPT (versus 46% in our 2024 report), 53% Copilot (versus 30%), and 23% Gemini (versus 19%).

5. Developers don’t use AI the same way

The top overall use case is coding. Beyond that, it depends.

  • Seasoned devs turn to AI to write documentation and tests but use it sparingly. 
  • DevOps engineers use it for CLI help and writing docs.
  • Software devs tap AI to write tests and do research.

And not all devs lean on AI equally. Seasoned devs are the least reliant, most often rating themselves as not at all dependent (0/10), while DevOps engineers rate their dependence at 7/10. Software devs are somewhere in the middle, usually landing at a 5/10 on the dependence scale. For comparison, the overall average dependence on AI in our 2024 survey was about 4 out of 10 (all users).

Looking ahead, it will be interesting to see how dependence on AI shifts and becomes further integrated by role. 

6. Data is the bottleneck no one talks about

The use of AI/ML in app development is a new and rapidly growing phenomenon that, not surprisingly, brings new pain points. For teams building AI/ML apps, one headache stands out: data prep. A full 24% of AI builders say they’re not confident in how to identify or prepare the right datasets.

Even with the right intent and tools, teams hit friction where it hurts productivity most — upfront.

Bottom line:
We’re in the early stages of the next tech revolution — complex, fast-evolving, and rife with challenges. Developers are meeting it head-on, quickly ramping up on new tools and architectures, and driving innovation at every layer of the stack. And Docker is right there with them, empowering innovation every step of the way.

Building RAG Applications with Ollama and Python: Complete 2025 Tutorial

Retrieval-Augmented Generation (RAG) has revolutionized how we build intelligent applications that can access and reason over external knowledge bases. In this comprehensive tutorial, we’ll explore how to build production-ready RAG applications using Ollama and Python, leveraging the latest techniques and best practices for 2025. What is RAG and Why Use Ollama? Retrieval-Augmented Generation combines the […]

Portainer + Talos Linux

Portainer now manages Talos Linux Kubernetes

🍿 YouTube Video


At KubeCon London, I took a workshop on using Portainer to deploy and manage Talos Linux and Kubernetes together. Portainer now manages Talos Linux, and I got to spend some time with these two tools for a one-stop shop of deploying a very unique Talos OS (from Sidero Labs, Inc.) and setting up a Kubernetes cluster with just a few clicks inside Portainer.

The Portainer Workshop: https://academy.portainer.io/yachtops/

Talos Linux: https://www.talos.dev/

Get CNDO Weekly

Cloud Native DevOps education. Bestselling courses, live streams, and podcasts on DevOps, platform engineering, and containers, from a Docker Captain and Cloud Native Ambassador.


👀 In case you missed a newsletter, read them at bret.news


Agentic AI in Customer Service: The Complete Technical Implementation Guide for 2025

Let’s get one thing straight—if you’re still deploying rule-based chatbots in 2025, you’re essentially bringing a flip phone to a smartphone convention. I’ve been in the trenches with AI implementations for years, and I can tell you that the shift from reactive customer service bots to autonomous agentic AI isn’t just evolutionary—it’s revolutionary. And frankly, […]

10 Agentic AI Tools That Will Replace ChatGPT in 2025

Stop settling for AI that just answers questions. The future belongs to AI that actually does the work. If you’re still using ChatGPT like it’s 2023, you’re about to be left behind. While you’ve been asking ChatGPT to write emails, a revolutionary shift is happening in the AI world—and it’s called Agentic AI. Here’s the […]

Ollama vs ChatGPT 2025: Complete Technical Comparison Guide

Ollama vs ChatGPT 2025: A Comprehensive Comparison A comprehensive technical analysis comparing local LLM deployment via Ollama against cloud-based ChatGPT APIs, including performance benchmarks, cost analysis, and implementation strategies The artificial intelligence landscape has reached a critical inflection point in 2025. Organizations worldwide face a fundamental strategic decision that will define their AI capabilities for […]

Best Ollama Models 2025: Performance Comparison Guide

Top Picks for Best Ollama Models 2025 A comprehensive technical analysis of the most powerful local language models available through Ollama, including benchmarks, implementation guides, and optimization strategies Introduction to Ollama’s 2025 Ecosystem The landscape of local language model deployment has dramatically evolved in 2025, with Ollama establishing itself as the de facto standard for […]

Docker Multi-Stage Builds for Python Developers: A Complete Guide

As a Python developer, you’ve probably experienced the pain of slow Docker builds, bloated images filled with build tools, and the frustration of waiting 10+ minutes for a simple code change to rebuild. Docker multi-stage builds solve these problems elegantly, and they’re particularly powerful for Python applications. In this comprehensive guide, we’ll explore how to […]

Optimize Your AI Containers with Docker Multi-Stage Builds: A Complete Guide

If you’re developing AI applications, you’ve probably experienced the frustration of slow Docker builds, bloated container images, and inefficient caching. Every time you tweak your model code, you’re stuck waiting for dependencies to reinstall, and your production images are loaded with unnecessary build tools. Docker multi-stage builds solve these problems elegantly, and they’re particularly powerful […]

Docker State of App Dev: Security

Security is a team sport: why everyone owns it now

Six security takeaways from Docker’s 2025 State of Application Development Report.

In the evolving world of software development, one thing is clear — security is no longer a siloed specialty. It’s a team sport, especially when vulnerabilities strike. That’s one of several key security findings in the 2025 Docker State of Application Development Survey.

Here’s what else we learned about security from our second annual report, which was based on an online survey of over 4,500 industry professionals.

1. Security isn’t someone else’s problem

Forget the myth that only “security people” handle security. Across orgs big and small, roles are blending. If you’re writing code, you’re in the security game. As one respondent put it, “We don’t have dedicated teams — we all do it.” According to the survey, just 1 in 5 organizations outsource security. And it’s top of mind at most others: only 1% of respondents say security is not a concern at their organization.

One exception to this trend: In larger organizations (50 or more employees), software security is more likely to be the exclusive domain of security engineers, with other types of engineers playing less of a role.

2. Everyone thinks they’re in charge of security

Team leads from multiple corners report that they’re the ones focused on security. Seasoned developers are as likely to zero in on it as are mid-career security engineers. And they’re both right. Security has become woven into every function — devs, leads, and ops alike.

3. When vulnerabilities hit, it’s all hands on deck

No turf wars here. When scan alerts go off, everyone pitches in — whether it’s security engineers helping experienced devs to decode scan results, engineering managers overseeing the incident, or DevOps engineers filling in where needed.

Fixing vulnerabilities is also a major time suck. Among security-related tasks that respondents routinely deal with, it was the most selected option across all roles. Worth noting: Last year’s State of Application Development Survey identified security/vulnerability remediation tools as a key area where better tools were needed in the development process.

4. Security isn’t the bottleneck — planning and execution are

Surprisingly, security doesn’t crack the top 10 issues holding teams back. Planning and execution-type activities are bigger sticking points. Translation? Security is better integrated into the workflow than many give it credit for. 

5. Shift-left is yesterday’s news

The once-pervasive mantra of “shift security left” is now only the 9th most important trend. Has the shift left already happened? Is AI and cloud complexity drowning it out? Or is this further evidence that security is, by necessity, shifting everywhere?

Again, perhaps security tools have gotten better, making it easier to shift left. (Our 2024 survey identified the shift-left approach as a possible source of frustration for developers and an area where more effective tools could make a difference.) Or perhaps there’s simply broader acceptance of the shift-left trend.

6. Shifting security left may not be the buzziest trend, but it’s still influential

The impact of shifting security left pales beside more dominant trends such as Generative AI and infrastructure as code. But it’s still a strong influence for developers in leadership roles. 

Bottom line: Security is no longer a roadblock; it’s a reflex. Teams aren’t asking, “Who owns security?” — they’re asking, “How can we all do it better?”

Why Docker Chose OCI Artifacts for AI Model Packaging

As AI development accelerates, developers need tools that let them move fast without having to reinvent their workflows. Docker Model Runner introduces a new specification for packaging large language models (LLMs) as OCI artifacts — a format developers already know and trust. It brings model sharing into the same workflows used for containers, with support for OCI registries like Docker Hub.

By using OCI artifacts, teams can skip custom toolchains and work with models the same way they do with container images. In this post, we’ll share why we chose OCI artifacts, how the format works, and what it unlocks for GenAI developers.

Why OCI artifacts?

One of Docker’s goals is to make genAI application development accessible to a larger community of developers. We can do this by helping models become first-class citizens within the cloud native ecosystem. 

When models are packaged as OCI artifacts, developers can get started with AI development without the need to learn, vet, and adopt a new distribution toolchain. Instead, developers can discover new models on Hub and distribute variants publicly or privately via existing OCI registries, just like they do with container images today! For teams using Docker Hub, enterprise features like Registry Access Management (RAM) provide policy-based controls and guardrails to help enforce secure, consistent access.

Packaging models as OCI artifacts also paves the way for deeper integration between inference runners like Docker Model Runner and existing tools like containerd and Kubernetes.

Understanding OCI images and artifacts

Many of these advantages apply equally to OCI images and OCI artifacts. To understand why images can be a less optimal fit for LLMs and why a custom artifact specification conveys additional advantages, it helps to first revisit the components of an OCI image and its generic cousin, the OCI artifact.

What are OCI images?

OCI images are a standardized format for container images, defined by the Open Container Initiative (OCI). They package everything needed to run a container: metadata, configuration, and filesystem layers.

An OCI image is composed of three main components:

  • An image manifest – a JSON file containing references to an image configuration and a set of filesystem layers.
  • An image configuration – a JSON file containing the layer ordering and OCI runtime configuration.
  • One or more layers – TAR archives (typically compressed), containing filesystem changesets that, applied in order, produce a container root filesystem.

Below is an example manifest from the busybox image:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:7b4721e214600044496305a20ca3902677e572127d4d976ed0e54da0137c243a",
    "size": 477
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:189fdd1508372905e80cc3edcdb56cdc4fa216aebef6f332dd3cba6e300238ea",
      "size": 1844697
    }
  ],
  "annotations": {
    "org.opencontainers.image.url": "https://github.com/docker-library/busybox",
    "org.opencontainers.image.version": "1.37.0-glibc"
  }
}

Because the image manifest contains content-addressable references to all image components, the hash of the manifest file, otherwise known as the image digest, can be used to uniquely identify an image.

What are OCI artifacts?

OCI artifacts offer a way to extend the OCI image format to support distributing content beyond container images. They follow the same structure: a manifest, a config file, and one or more layers. 

The artifact guidance in the OCI image specifications describes how this same basic structure (manifest + config + layers) can be used to distribute other types of content.

The artifact type is designated by the config file’s media type. For example, in the manifest below config.mediaType is set to application/vnd.cncf.helm.config.v1+json. This indicates to registries and other tooling that the artifact is a Helm chart and should be parsed accordingly.

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.cncf.helm.config.v1+json",
    "digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111",
    "size": 117
  },
  "layers": [
    {
      "mediaType": "application/vnd.cncf.helm.chart.content.v1.tar+gzip",
      "digest": "sha256:1b251d38cfe948dfc0a5745b7af5ca574ecb61e52aed10b19039db39af6e1617",
      "size": 2487
    }
  ]
}

In an OCI artifact, layers may be of any media type and are not restricted to filesystem changesets. Whoever defines the artifact type defines the supported layer types and determines how the contents should be used and interpreted.

Using container images vs. custom artifact types

With this background in mind, while we could have packaged LLMs as container images, defining a custom type has some important advantages:

  1. A custom artifact type allows us to define a domain-specific config schema. Programmatic access to key metadata provides a support structure for an ecosystem of useful tools specifically tailored to AI use-cases.
  2. A custom artifact type allows us to package content in formats other than compressed TAR archives, thus avoiding performance issues that arise when LLMs are packaged as image layers. For more details on how model layers are different and why it matters, see the Layers section below.
  3. A custom type ensures that models are packaged and distributed separately from inference engines. This separation is important because it allows users to consume the variant of the inference engine optimized for their system without requiring every model to be packaged in combination with every engine.
  4. A custom artifact type frees us from the expectations that typically accompany a container image. Standalone models are not executable without an inference engine. Packaging as a custom type makes clear that they are not independently runnable, thus avoiding confusion and unexpected errors.

Docker Model Artifacts

Now that we understand the high-level goals, let’s dig deeper into the details of the format.

Media Types

The model specification defines the following media types:

  • application/vnd.docker.ai.model.config.v0.1+json – identifies a model config JSON file. When set as config.mediaType in a manifest, this value identifies the artifact as a Docker model whose config file adheres to v0.1 of the specification.
  • application/vnd.docker.ai.gguf.v3 – indicates that a layer contains a model packaged as a GGUF file.
  • application/vnd.docker.ai.license – indicates that a layer contains a plain text software license file.

Expect more media types to be defined in the future as we add runtime configuration, add support for new features like projectors and LoRA adapters, and expand the supported packaging formats for model files.

Manifest

A model manifest is formatted like an image manifest and distinguished by its config.mediaType. The following example manifest, taken from the ai/gemma3 model, references a model config JSON and two layers: one containing a GGUF file and the other containing the model’s license.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "size": 372,
    "digest": "sha256:22273fd2f4e6dbaf5b5dae5c5e1064ca7d0ff8877d308eb0faf0e6569be41539"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.gguf.v3",
      "size": 2489757856,
      "digest": "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7"
    },
    {
      "mediaType": "application/vnd.docker.ai.license",
      "size": 8346,
      "digest": "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
    }
  ]
}


Model ID

The manifest digest uniquely identifies the model and is used by Docker Model Runner as the model ID.
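
To make that concrete, here is a minimal Go sketch, assuming the raw manifest JSON has been saved locally as manifest.json; the model ID is simply the sha256 digest of those manifest bytes.

package main

import (
    "crypto/sha256"
    "fmt"
    "os"
)

func main() {
    // Read the manifest exactly as it was served by the registry.
    manifestBytes, err := os.ReadFile("manifest.json")
    if err != nil {
        panic(err)
    }
    // The manifest digest doubles as the model ID.
    digest := sha256.Sum256(manifestBytes)
    fmt.Printf("model ID: sha256:%x\n", digest)
}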

Model Config JSON

The model configuration is a JSON file that surfaces important metadata about the model, such as size, parameter count, and quantization, as well as metadata about the artifact provenance (like the creation timestamp).
The following example comes from the ai/gemma3 model on Docker Hub:

{
  "config": {
    "format": "gguf",
    "quantization": "IQ2_XXS/Q4_K_M",
    "parameters": "3.88 B",
    "architecture": "gemma3",
    "size": "2.31 GiB"
  },
  "descriptor": {
    "created": "2025-03-26T09:57:32.086694+01:00"
  },
  "rootfs": {
    "type": "rootfs",
    "diff_ids": [
      "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7",
      "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
    ]
  }
}

By defining a domain-specific configuration schema, we allow tools to access and use model metadata cheaply — by fetching and parsing a small JSON file — only fetching the model itself when needed.

For example, a registry frontend like Docker Hub can directly surface this data to users who can, in turn, use it to compare models or select based on system capabilities and requirements. Tooling might use this data to estimate memory requirements for a given model. It could then assist in the selection process by suggesting the best variant that is compatible with the available resources.
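
As a rough illustration of how cheaply this metadata can be consumed, here is a hedged Go sketch that parses a config JSON like the one above and prints the fields a selection tool might surface. The struct mirrors the example document shown here, not an official schema.

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// Field names mirror the example config above; this is not an official schema.
type modelConfig struct {
    Config struct {
        Format       string `json:"format"`
        Quantization string `json:"quantization"`
        Parameters   string `json:"parameters"`
        Architecture string `json:"architecture"`
        Size         string `json:"size"`
    } `json:"config"`
    Descriptor struct {
        Created string `json:"created"`
    } `json:"descriptor"`
}

func main() {
    // The config blob is small, so it can be fetched and parsed without pulling the model itself.
    data, err := os.ReadFile("model-config.json")
    if err != nil {
        panic(err)
    }
    var cfg modelConfig
    if err := json.Unmarshal(data, &cfg); err != nil {
        panic(err)
    }
    fmt.Printf("%s: %s parameters, %s quantization, %s on disk (created %s)\n",
        cfg.Config.Architecture, cfg.Config.Parameters,
        cfg.Config.Quantization, cfg.Config.Size, cfg.Descriptor.Created)
}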

Layers

Layers in a model artifact differ from layers within an OCI image in two important respects.

Unlike an image layer, where compression is recommended, model layers are always uncompressed. Because models are large, high-entropy files, compressing them provides a negligible reduction in size, while (un)compressing is time and compute-intensive.

In contrast to a layer in an OCI image, which contains multiple files in an archive, each “layer” in a model artifact must contain a single raw file. This allows runtimes like Docker Model Runner to reduce disk usage on the client machine by storing a single uncompressed copy of the model. This file can then be directly memory mapped by the inference engine at runtime.

The lack of file names, hierarchy, and metadata (e.g. modification time) ensures that identical model files always result in identical reusable layer blobs. This prevents unnecessary duplication, which is particularly important when working with LLMs, given the file size.

You may have noticed that these “layers” are not really filesystem layers at all. They are files, but they do not specify a filesystem. So, how does this work at runtime? When Docker Model Runner runs a model, it does not look up the GGUF file by name in a model filesystem; instead, it identifies the desired file by its media type (application/vnd.docker.ai.gguf.v3) and fetches it from the model store. For more information on the Model Runner architecture, please see the architecture overview in this accompanying blog post.
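
Here is a hedged Go sketch of that lookup: it scans a model manifest for the GGUF media type and resolves the layer digest to a blob path. The blobs/sha256/<hex> layout is purely illustrative; the actual model store layout is an implementation detail of Docker Model Runner.

package main

import (
    "encoding/json"
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

const ggufMediaType = "application/vnd.docker.ai.gguf.v3"

type manifest struct {
    Layers []struct {
        MediaType string `json:"mediaType"`
        Digest    string `json:"digest"`
    } `json:"layers"`
}

func main() {
    data, err := os.ReadFile("manifest.json")
    if err != nil {
        panic(err)
    }
    var m manifest
    if err := json.Unmarshal(data, &m); err != nil {
        panic(err)
    }
    for _, layer := range m.Layers {
        if layer.MediaType == ggufMediaType {
            // Hypothetical content-addressed layout: one uncompressed file per digest,
            // ready to be memory mapped by the inference engine.
            blob := filepath.Join("blobs", strings.ReplaceAll(layer.Digest, ":", "/"))
            fmt.Println("GGUF model file:", blob)
            return
        }
    }
    fmt.Println("no GGUF layer found in manifest")
}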

Distribution

Like OCI images and other OCI artifacts, Docker model artifacts are distributed via registries such as Docker Hub, Artifactory, or Azure Container Registry that comply with the OCI distribution specification.

Discovery

Docker Hub

The Docker Hub Gen AI catalog aids in the discovery of popular models. These models are packaged in the format described here and are compatible with Docker Model Runner and any other runtime that supports the OCI specification.

Hugging Face

If you are accustomed to exploring models on Hugging Face, there’s good news! Hugging Face now supports on-demand conversion to the Docker Model Artifact format when you pull from Hugging Face with docker model pull.

What’s Next?

Hopefully, you now have a better understanding of the Docker OCI Model format and how it supports our goal of making AI app development more accessible to developers via familiar workflows and commands. But this version of the artifact format is just the beginning! In the future, you can expect enhancements to the packaging format to bring this level of accessibility and flexibility to a broader range of use cases. Future versions will support:

  • Additional runtime configuration options like templates, context size, and default parameters. This will allow users to configure models for specific use cases and distribute that config alongside the model, as a single immutable artifact.
  • LoRA adapters, allowing users to extend existing model artifacts with use-case-specific fine-tuning.
  • Multi-modal projectors, enabling users to package multi-modal models, such as language-and-vision models, using LLaVA-style projectors.
  • Model index files that reference a set of models with different parameter counts and quantizations, allowing runtimes to select the best option for the available resources.

In addition to adding features, we are committed to fostering an open ecosystem. Expect:

  • Deeper integrations into containerd for a more native runtime experience.
  • Efforts to harmonize with ModelPack and other model packaging standards to improve interoperability.

These advancements show our ongoing commitment to making the OCI artifact a versatile and flexible way to package and run AI models, delivering the same ease and reliability developers already expect from Docker.

Learn more

Behind the scenes: How we designed Docker Model Runner and what’s next

The last few years have made it clear that AI models will continue to be a fundamental component of many applications. The catch is that they’re also a fundamentally different type of component, with complex software and hardware requirements that don’t (yet) fit neatly into the constraints of container-oriented development lifecycles and architectures. To help address this problem, Docker launched the Docker Model Runner with Docker Desktop 4.40. Since then, we’ve been working aggressively to expand Docker Model Runner with additional OS and hardware support, deeper integration with popular Docker tools, and improvements to both performance and usability.
For those interested in Docker Model Runner and its future, we offer a behind-the-scenes look at its design, development, and roadmap.

Note: Docker Model Runner is really two components: the model runner and the model distribution specification. In this article, we’ll be covering the former, but be sure to check out the companion blog post by Emily Casey for the equally important distribution side of the story.

Design goals

Docker Model Runner’s primary design goal was to allow users to run AI models locally and to access them from both containers and host processes. While that’s simple enough to articulate, it still leaves an enormous design space in which to find a solution. Fortunately, we had some additional constraints: we were a small engineering team, and we had some ambitious timelines. Most importantly, we didn’t want to compromise on UX, even if we couldn’t deliver it all at once. In the end, this motivated design decisions that have so far allowed us to deliver a viable solution while leaving plenty of room for future improvement.





Multiple backends

One thing we knew early on was that we weren’t going to write our own inference engine (Docker’s wheelhouse is containerized development, not low-level inference engines). We’re also big proponents of open-source, and there were just so many great existing solutions! There’s llama.cpp, vLLM, MLX, ONNX, and PyTorch, just to name a few.

Of course, being spoiled for choice can also be a curse — which to choose? The obvious answer was: as many as possible, but not all at once.

We decided to go with llama.cpp for our initial implementation, but we intentionally designed our APIs with an additional, optional path component (the {name} in /engines/{name}) to allow users to take advantage of multiple future backends. We also designed interfaces and stubbed out implementations for other backends to enforce good development hygiene and to avoid becoming tethered to one “initial” implementation.

OpenAI API compatibility

The second design choice we had to make was how to expose inference to consumers in containers. While there was also a fair amount of choice in the inference API space, we found that the OpenAI API standard seemed to offer the best initial tooling compatibility. We were also motivated by the fact that several teams inside Docker were already using this API for various real-world products. While we may support additional APIs in the future, we’ve so far found that this API surface is sufficient for most applications. One gap that we know exists is full compatibility with this API surface, which is something we’re working on iteratively.

This decision also drove our choice of llama.cpp as our initial backend. The llama.cpp project already offered a turnkey option for OpenAI API compatibility through its server implementation. While we had to make some small modifications (e.g. Unix domain socket support), this offered us the fastest path to a solution. We’ve also started contributing these small patches upstream, and we hope to expand our contributions to these projects in the future.

First-class citizenship for models in the Docker API

While the OpenAI API standard was the most ubiquitous option amongst existing tooling, we also knew that we wanted models to be first-class citizens in the Docker Engine API. Models have a fundamentally different execution lifecycle than the processes that typically make up the ENTRYPOINTs of containers, and thus, they don’t fit well under the standard /containers endpoints of the Docker Engine API. However, much like containers, images, networks, and volumes, models are such a fundamental component that they really deserve their own API resource type. This motivated the addition of a set of /models endpoints, closely modeled after the /images endpoints, but separate for reasons that are best discussed in the distribution blog post.

GPU acceleration

Another critical design goal was support for GPU acceleration of inference operations. Even the smallest useful models are extremely computationally demanding, while more sophisticated models (such as those with tool-calling capabilities) would be a stretch to fit onto local hardware at all. GPU support was going to be non-negotiable for a useful experience.

Unfortunately, passing GPUs across the VM boundary in Docker Desktop, especially in a way that would be reliable across platforms and offer a usable computation API inside containers, was going to be either impossible or very flaky.

As a compromise, we decided to run inference operations outside of the Docker Desktop VM and simply proxy API calls from the VM to the host. While there are some risks with this approach, we are working on initiatives to mitigate these with containerd-hosted sandboxing on macOS and Windows. Moreover, with Docker-provided models and application-provided prompts, the risk is somewhat lower, especially given that inference consists primarily of numerical operations. We assess the risk in Docker Desktop to be about on par with accessing host-side services via host.docker.internal (something already enabled by default).

However, agents that drive tool usage with model output can cause more significant side effects, and that’s something we needed to address. Fortunately, using the Docker MCP Toolkit, we’re able to perform tool invocation inside ephemeral containers, offering reliable encapsulation of the side effects that models might drive. This hybrid approach allows us to offer the best possible local performance with relative peace of mind when using tools.

Outside the context of Docker Desktop, for example, in Docker CE, we’re in a significantly better position due to the lack of a VM boundary (or at least a very transparent VM boundary in the case of a hypervisor) between the host hardware and containers. When running in standalone mode in Docker CE, the Docker Model Runner will have direct access to host hardware (e.g. via the NVIDIA Container Toolkit) and will run inference operations within a container.

Modularity, iteration, and open-sourcing

As previously mentioned, the Docker Model Runner team is relatively small, which meant that we couldn’t rely on a monolithic architecture if we wanted to effectively parallelize the development work for Docker Model Runner. Moreover, we had an early and overarching directive: open-source as much as possible.

We decided on three high-level components around which we could organize development work: the model runner, the model distribution tooling, and the model CLI plugin.

Breaking up these components allowed us to divide work more effectively, iterate faster, and define clean API boundaries between different concerns. While there have been some tricky dependency hurdles (in particular when integrating with closed-source components), we’ve found that the modular approach has facilitated faster incremental changes and support for new platforms.

The High-Level Architecture

At a high level, the Docker Model Runner architecture is composed of the three components mentioned above (the runner, the distribution code, and the CLI), but there are also some interesting sub-components within each:

Figure 1: Docker Model Runner high-level architecture

How these components are packaged and hosted (and how they interact) also depends on the platform where they’re deployed. In each case it looks slightly different. Sometimes they run on the host, sometimes they run in a VM, sometimes they run in a container, but the overall architecture looks the same.

Model storage and client

The core architectural component is the model store. This component, provided by the model distribution code, is where the actual model tensor files are stored. These files are stored differently (and separately) from images because (1) they’re high-entropy and not particularly compressible and (2) the inference engine needs direct access to the files so that it can do things like mapping them into its virtual address space via mmap(). For more information, see the accompanying model distribution blog post.

The model distribution code also provides the model distribution client. This component performs operations (such as pulling models) using the model distribution protocol against OCI registries.

Model runner

Built on top of the model store is the model runner. The model runner maps inbound inference API requests (e.g. /v1/chat/completions or /v1/embeddings requests) to processes hosting pairs of inference engines and models. It includes scheduler, loader, and runner components that coordinate the loading of models in and out of memory so that concurrent requests can be serviced, even if models can’t be loaded simultaneously (e.g. due to resource constraints). This makes the execution lifecycle of models different from that of containers, with engines and models operating as ephemeral processes (mostly hidden from users) that can be terminated and unloaded from memory as necessary (or when idle). A different backend process is run for each combination of engine (e.g. llama.cpp) and model (e.g. ai/qwen3:8B-Q4_K_M) as required by inference API requests (though multiple requests targeting the same pair will reuse the same runner and backend processes if possible).

The runner also includes an installer service that can dynamically download backend binaries and libraries, allowing users to selectively enable features (such as CUDA support) that might require downloading hundreds of MBs of dependencies.

Finally, the model runner serves as the central server for all Docker Model Runner APIs, including the /models APIs (which it routes to the model distribution code) and the /engines APIs (which it routes to its scheduler). This API server will always opt to hold in-flight requests until the resources (primarily RAM or VRAM) are available to service them, rather than returning something like a 503 response. This is critical for a number of usage patterns, such as multiple agents running with different models or concurrent requests for both embedding and completion.

Model CLI

The primary user-facing component of the Docker Model Runner architecture is the model CLI. This component is a standard Docker CLI plugin that offers an interface very similar to the docker image command. While the lifecycle of model execution is different from that of containers, the concepts (such as pushing, pulling, and running) should be familiar enough to existing Docker users.

The model CLI communicates with the model runner’s APIs to perform almost all of its operations (though the transport for that communication varies by platform). The model CLI is context-aware, allowing it to determine if it’s talking to a Docker Desktop model runner, Docker CE model runner, or a model runner on some custom platform. Because we’re using the standard Docker CLI plugin framework, we get all of the standard Docker Context functionality for free, making this detection much easier.

API design and routing

As previously mentioned, the Docker Model Runner comprises two sets of APIs: the Docker-style APIs and the OpenAI-compatible APIs. The Docker-style APIs (modeled after the /images APIs) include the following endpoints:

  • POST /models/create (Model pulling)
  • GET /models (Model listing)
  • GET /models/{namespace}/{name} (Model metadata)
  • DELETE /models/{namespace}/{name} (Model deletion)

The bodies for these requests look very similar to their image analogs. There’s no documentation at the moment, but you can get a glimpse of the format by looking at their corresponding Go types.
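
For a concrete feel, here is a small, hedged Go sketch that lists models over the host TCP endpoint described below (localhost:12434, where the /models routes are exposed without any /exp prefix). Since the response schema isn’t formally documented yet, it simply prints the raw JSON.

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Assumes the host-side TCP endpoint (localhost:12434) is enabled;
    // it is on by default in Docker CE and optional in Docker Desktop.
    resp, err := http.Get("http://localhost:12434/models")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    // Raw JSON; see the corresponding Go types for the exact schema.
    fmt.Println(string(body))
}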

In contrast, the OpenAI endpoints follow a different but still RESTful convention:

  • GET /engines/{engine}/v1/models (OpenAI-format model listing)
  • GET /engines/{engine}/v1/models/{namespace}/{name} (OpenAI-format model metadata)
  • POST /engines/{engine}/v1/chat/completions (Chat completions)
  • POST /engines/{engine}/v1/completions (Completions (legacy endpoint))
  • POST /engines/{engine}/v1/embeddings (Create embeddings)

At this point in time, only one {engine} value is supported (llama.cpp), and it can also be omitted to use the default (llama.cpp) engine.
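
And here is a minimal sketch of an OpenAI-style chat completion against the default llama.cpp engine over the same host TCP endpoint. The model name is illustrative; any previously pulled model tag will do.

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // OpenAI-compatible chat completion routed to the llama.cpp backend.
    payload := []byte(`{
      "model": "ai/gemma3",
      "messages": [{"role": "user", "content": "Say hello in one sentence."}]
    }`)
    resp, err := http.Post(
        "http://localhost:12434/engines/llama.cpp/v1/chat/completions",
        "application/json",
        bytes.NewReader(payload),
    )
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(string(body))
}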

We make these APIs available on several different endpoints:


First, in Docker Desktop, they’re available on the Docker socket (/var/run/docker.sock), both inside and outside containers. This is in service of our design goal of having models as a first-class citizen in the Docker Engine API. At the moment, these endpoints are prefixed with a /exp/vDD4.40 path (to avoid dependencies on APIs that may evolve during development), but we’ll likely remove this prefix in the next few releases since these APIs have now mostly stabilized and will evolve in a backward-compatible way.

Second, also in Docker Desktop, we make the APIs available on a special model-runner.docker.internal endpoint that’s accessible just from containers (though not currently from ECI containers, because we want to have inference sandboxing implemented first). This TCP-based endpoint exposes just the /models and /engines API endpoints (not the whole Docker API) and is designed to serve existing tooling (which likely can’t access APIs via a Unix domain socket). No /exp/vDD4.40 prefix is used in this case.

Finally, in both Docker Desktop and Docker CE, we make the /models and /engines API endpoints available on a host TCP endpoint (localhost:12434 by default, again without any /exp/vDD4.40 prefix). In Docker Desktop this is optional and not enabled by default. In Docker CE, it’s a critical component of how the API endpoints are accessed: we currently lack the integration to add endpoints to Docker CE’s /var/run/docker.sock or to inject a custom model-runner.docker.internal hostname, so we advise using the standard 172.17.0.1 host gateway address to access this localhost-exposed port (e.g., setting your OpenAI API base URL to http://172.17.0.1:12434/engines/v1). Hopefully we’ll be able to unify this across Docker platforms in the near future (see our roadmap below).

First up: Docker Desktop

The natural first step for Docker Model Runner was integration into Docker Desktop. In Docker Desktop, we have more direct control over integration with the Docker Engine, as well as existing processes that we can use to host the model runner components. In this case, the model runner and model distribution components live in the Docker Desktop host backend process (the com.docker.backend process you may have seen running) and we use special middleware and networking magic to route requests on /var/run/docker.sock and model-runner.docker.internal to the model runner’s API server. Since the individual inference backend processes run as subprocesses of com.docker.backend, there’s no risk of a crash in Docker Desktop if, for example, an inference backend is killed by an Out Of Memory (OOM) error.

We started initially with support for macOS on Apple Silicon, because it provided the most uniform platform for developing the model runner functionality, but we implemented most of the functionality along the way to build and test for all Docker Desktop platforms. This made it significantly easier to port to Windows on AMD64 and ARM64 platforms, as well as the GPU variations that we found there.

The one complexity with Windows was the larger size of the supporting library dependencies for the GPU-based backends. It wouldn’t have been feasible (or tolerated) if we added another 500 MB – 1 GB to the Docker Desktop for Windows installer, so we decided to default to a CPU-based backend in Docker Desktop for Windows with optional support for the GPU backend. This was the primary motivating factor for the dynamic installer component of the model runner (in addition to our desire for incremental updates to different backends).

This all sounds like a very well-planned exercise, and we did indeed start with a three-component design and strictly enforced API boundaries, but in truth we started with the model runner service code as a sub-package of the Docker Desktop source code. This made it much easier to iterate quickly, especially as we were exploring the architecture for the different services. Fortunately, by sticking to a relatively strict isolation policy for the code, and enforcing clean dependencies through APIs and interfaces, we were able to easily extract the code (kudos to the excellent git-filter-repo tool) into a separate repository for the purposes of open-sourcing.

Next stop: Docker CE

Aside from Docker’s penchant for open-sourcing, one of the main reasons that we wanted to make the Docker Model Runner source code publicly available was to support integration into Docker CE. Our goal was to package the docker model command in the same way as docker buildx and docker compose.

The trick with Docker CE is that we wanted to ship Docker Model Runner as a “vanilla” Docker CLI plugin (i.e. without any special privileges or API access), which meant that we didn’t have a backend process that could host the model runner service. However, in the Docker CE case, the boundary between host hardware and container processes is much less disruptive, meaning that we could actually run Docker Model Runner in a container and simply make any accelerator hardware available to it directly. So, much like a standalone BuildKit builder container, we run the Docker Model Runner as a standalone container in Docker CE, with a special named volume for model storage (meaning you can uninstall the runner without having to re-pull models). This “installation” is performed by the model CLI automatically (and when necessary) by pulling the docker/model-runner image and starting a container. Explicit configuration for the runner can also be specified using the docker model install-runner command. If you want, you can also remove the model runner (and optionally the model storage) using docker model uninstall-runner.

This unfortunately leads to one small compromise with the UX: we don’t currently support the model runner APIs on /var/run/docker.sock or on the special model-runner.docker.internal URL. Instead, the model runner API server listens on the host system’s loopback interface at localhost:12434 (by default), which is available inside most containers at 172.17.0.1:12434. If desired, users can also make this available on model-runner.docker.internal:12434 by utilizing something like --add-host=model-runner.docker.internal:host-gateway when running docker run or docker create commands. This can also be achieved by using the extra_hosts key in a Compose YAML file. We have plans to make this more ergonomic in future releases.

The road ahead…

The status quo is Docker Model Runner support in Docker Desktop on macOS and Windows and support for Docker CE on Linux (including WSL2), but that’s definitely not the end of the story. Over the next few months, we have a number of initiatives planned that we think will reshape the user experience, performance, and security of Docker Model Runner.

Additional GUI and CLI functionality

The most visible functionality coming out over the next few months will be in the model CLI and the “Models” tab in the Docker Desktop dashboard. Expect to see new commands (such as df, ps, and unload) that will provide more direct support for monitoring and controlling model execution. Also, expect to see new and expanded layouts and functionality in the Models tab.

Expanded OpenAI API support

A less-visible but equally important aspect of the Docker Model Runner user experience is our compatibility with the OpenAI API. There are dozens of endpoints and parameters to support (and we already support many), so we will work to expand API surface compatibility with a focus on practical use cases and prioritization of compatibility with existing tools.

containerd and Moby integration

One of the longer-term initiatives that we’re looking at is integration with containerd. containerd already provides a modular runtime system that allows for task execution coordinated with storage. We believe this is the right way forward and that it will allow us to better codify the relationship between model storage, model execution, and model execution sandboxing.

In combination with the containerd work, we would also like tighter integration with the Moby project. While our existing Docker CE integration offers a viable and performant solution, we believe that better ergonomics could be achieved with more direct integration. In particular, niceties like support for model-runner.docker.internal DNS resolution in Docker CE are on our radar. Perhaps the biggest win from this tighter integration would be to expose Docker Model Runner APIs on the Docker socket and to include the API endpoints (e.g. /models) in the official Docker Engine API documentation.

Kubernetes

One of the product goals for Docker Model Runner was a consistent experience from development inner loop to production, and Kubernetes is inarguably a part of that path. The existing Docker Model Runner images that we’re using for Docker CE will also work within a Kubernetes cluster, and we’re currently developing instructions to set up a Docker Model Runner instance in a Kubernetes cluster. The big difference with Kubernetes is the variety of cluster and application architectures in use, so we’ll likely end up with different “recipes” for how to configure the Docker Model Runner in different scenarios.

vLLM

One of the things we’ve heard from a number of customers is that vLLM forms a core component of their production stack. This was also the first alternate backend that we stubbed out in the model runner repository, and the time has come to start poking at an implementation.

Even more to come…

Finally, there are some bits that we just can’t talk about yet, but they will fundamentally shift the way that developers interact with models. Be sure to tune in to Docker’s sessions at WeAreDevelopers from July 9–11 for some exciting announcements around AI-related initiatives at Docker.

Learn more

Understanding the n8n app and Its Solutions

In today’s digital world, we use dozens of different apps and services every day. Email, Slack, Google Sheets, databases, social media, CRM systems – the list goes on. While each tool serves its purpose, getting them to work together smoothly can be a nightmare. Enter n8n (pronounced “n-eight-n”), a powerful workflow automation platform that connects […]

LM Studio vs Ollama: Picking the Right Tool for Local LLM Use

LM Studio prioritizes ease of use with a polished GUI ideal for beginners, while Ollama offers greater flexibility and control through its developer-friendly command-line interface and REST API. Choose LM Studio if you want a plug-and-play experience with visual controls, or Ollama if you prefer command-line power and deeper customization options. The landscape of local […]

Neural Networks and AI’s Impact on the Evolution of Gaming Experience

Artificial intelligence is no longer just a background tool in the gaming world. Neural networks now shape how stories unfold, how enemies react, and how levels are generated. Players are seeing more personalized, dynamic, and unpredictable experiences thanks to AI integration. But what happens when systems collect too much personal data during gameplay? What if […]