
Why Docker Chose OCI Artifacts for AI Model Packaging

By: Emily Casey
June 18, 2025, 1:32 PM

As AI development accelerates, developers need tools that let them move fast without having to reinvent their workflows. Docker Model Runner introduces a new specification for packaging large language models (LLMs) as OCI artifacts — a format developers already know and trust. It brings model sharing into the same workflows used for containers, with support for OCI registries like Docker Hub.

By using OCI artifacts, teams can skip custom toolchains and work with models the same way they do with container images. In this post, we’ll share why we chose OCI artifacts, how the format works, and what it unlocks for GenAI developers.

Why OCI artifacts?

One of Docker’s goals is to make GenAI application development accessible to a larger community of developers. We can do this by helping models become first-class citizens within the cloud native ecosystem.

When models are packaged as OCI artifacts, developers can get started with AI development without the need to learn, vet, and adopt a new distribution toolchain. Instead, developers can discover new models on Hub and distribute variants publicly or privately via existing OCI registries, just like they do with container images today! For teams using Docker Hub, enterprise features like Registry Access Management (RAM) provide policy-based controls and guardrails to help enforce secure, consistent access.

Packaging models as OCI artifacts also paves the way for deeper integration between inference runners like Docker Model Runner and existing tools like containerd and Kubernetes.

Understanding OCI images and artifacts

Many of these advantages apply equally to OCI images and OCI artifacts. To understand why images can be a less optimal fit for LLMs and why a custom artifact specification conveys additional advantages, it helps to first revisit the components of an OCI image and its generic cousin, the OCI artifact.

What are OCI images?

OCI images are a standardized format for container images, defined by the Open Container Initiative (OCI). They package everything needed to run a container: metadata, configuration, and filesystem layers.

An OCI image is composed of three main components:

  • An image manifest – a JSON file containing references to an image configuration and a set of filesystem layers.
  • An image configuration – a JSON file containing the layer ordering and OCI runtime configuration.
  • One or more layers – TAR archives (typically compressed), containing filesystem changesets that, applied in order, produce a container root filesystem.

Below is an example manifest from the busybox image:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:7b4721e214600044496305a20ca3902677e572127d4d976ed0e54da0137c243a",
    "size": 477
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:189fdd1508372905e80cc3edcdb56cdc4fa216aebef6f332dd3cba6e300238ea",
      "size": 1844697
    }
  ],
  "annotations": {
    "org.opencontainers.image.url": "https://github.com/docker-library/busybox",
    "org.opencontainers.image.version": "1.37.0-glibc"
  }
}

Because the image manifest contains content-addressable references to all image components, the hash of the manifest file, otherwise known as the image digest, can be used to uniquely identify an image.
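To see this relationship in practice, here is a hedged sketch using buildx imagetools (for a multi-platform tag like busybox, the digest shown is that of the image index rather than a single platform manifest, and sha256sum may be shasum -a 256 on macOS):

# Show the digest that the registry reports for the tag
docker buildx imagetools inspect busybox:1.37.0-glibc

# Fetch the raw manifest bytes and hash them; the result should match the digest above
docker buildx imagetools inspect --raw busybox:1.37.0-glibc | sha256sum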

What are OCI artifacts?

OCI artifacts offer a way to extend the OCI image format to support distributing content beyond container images. They follow the same structure: a manifest, a config file, and one or more layers. 

The artifact guidance in the OCI image specifications describes how this same basic structure (manifest + config + layers) can be used to distribute other types of content.

The artifact type is designated by the config file’s media type. For example, in the manifest below config.mediaType is set to application/vnd.cncf.helm.config.v1+json. This indicates to registries and other tooling that the artifact is a Helm chart and should be parsed accordingly.

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.cncf.helm.config.v1+json",
    "digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111",
    "size": 117
  },
  "layers": [
    {
      "mediaType": "application/vnd.cncf.helm.chart.content.v1.tar+gzip",
      "digest": "sha256:1b251d38cfe948dfc0a5745b7af5ca574ecb61e52aed10b19039db39af6e1617",
      "size": 2487
    }
  ]
}

In an OCI artifact, layers may be of any media type and are not restricted to filesystem changesets. Whoever defines the artifact type defines the supported layer types and determines how the contents should be used and interpreted.

Using container images vs. custom artifact types

With this background in mind, while we could have packaged LLMs as container images, defining a custom type has some important advantages:

  1. A custom artifact type allows us to define a domain-specific config schema. Programmatic access to key metadata provides a support structure for an ecosystem of useful tools specifically tailored to AI use-cases.
  2. A custom artifact type allows us to package content in formats other than compressed TAR archives, thus avoiding performance issues that arise when LLMs are packaged as image layers. For more details on how model layers are different and why it matters, see the Layers section below.
  3. A custom type ensures that models are packaged and distributed separately from inference engines. This separation is important because it allows users to consume the variant of the inference engine optimized for their system without requiring every model to be packaged in combination with every engine.
  4. A custom artifact type frees us from the expectations that typically accompany a container image. Standalone models are not executable without an inference engine. Packaging as a custom type makes clear that they are not independently runnable, thus avoiding confusion and unexpected errors.

Docker Model Artifacts

Now that we understand the high-level goals, let’s dig deeper into the details of the format.

Media Types

The model specification defines the following media types:

  • application/vnd.docker.ai.model.config.v0.1+json – identifies a model config JSON file. When used as config.mediaType in a manifest, it marks the artifact as a Docker model whose config file adheres to v0.1 of the specification.
  • application/vnd.docker.ai.gguf.v3 – indicates that a layer contains a model packaged as a GGUF file.
  • application/vnd.docker.ai.license – indicates that a layer contains a plain text software license file.

Expect more media types to be defined in the future as we add runtime configuration, add support for new features like projectors and LoRA adaptors, and expand the supported packaging formats for model files.

Manifest

A model manifest is formatted like an image manifest and distinguished by the config.MediaType. The following example manifest, taken from the ai/gemma3 model, references a model config JSON and two layers, one containing a GGUF file and the other containing the model’s license.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "size": 372,
    "digest": "sha256:22273fd2f4e6dbaf5b5dae5c5e1064ca7d0ff8877d308eb0faf0e6569be41539"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.gguf.v3",
      "size": 2489757856,
      "digest": "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7"
    },
    {
      "mediaType": "application/vnd.docker.ai.license",
      "size": 8346,
      "digest": "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
    }
  ]
}


Model ID

The manifest digest uniquely identifies the model and is used by Docker Model Runner as the model ID.

Model Config JSON

The model configuration is a JSON file that surfaces important metadata about the model, such as size, parameter count, and quantization, as well as metadata about the artifact’s provenance (like the creation timestamp).
The following example comes from the ai/gemma3 model on Docker Hub:

{
  "config": {
    "format": "gguf",
    "quantization": "IQ2_XXS/Q4_K_M",
    "parameters": "3.88 B",
    "architecture": "gemma3",
    "size": "2.31 GiB"
  },
  "descriptor": {
    "created": "2025-03-26T09:57:32.086694+01:00"
  },
  "rootfs": {
    "type": "rootfs",
    "diff_ids": [
      "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7",
      "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
    ]
  }
}

By defining a domain-specific configuration schema, we allow tools to access and use model metadata cheaply — by fetching and parsing a small JSON file — only fetching the model itself when needed.

For example, a registry frontend like Docker Hub can directly surface this data to users who can, in turn, use it to compare models or select based on system capabilities and requirements. Tooling might use this data to estimate memory requirements for a given model. It could then assist in the selection process by suggesting the best variant that is compatible with the available resources.
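For instance, here is a hedged sketch of how such a tool might fetch only the config JSON for ai/gemma3 from Docker Hub via the standard OCI distribution API (the token endpoint, the latest tag, and the use of jq are assumptions about the environment rather than part of the model spec):

# Obtain a pull token for the public ai/gemma3 repository
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ai/gemma3:pull" | jq -r .token)

# Fetch the small manifest and extract the config digest
CONFIG_DIGEST=$(curl -s -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://registry-1.docker.io/v2/ai/gemma3/manifests/latest" | jq -r .config.digest)

# Fetch only the config blob (a few hundred bytes); no model download required
# (the registry may redirect blob downloads to a CDN, hence -L)
curl -sL -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ai/gemma3/blobs/$CONFIG_DIGEST" | jq .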

Layers

Layers in a model artifact differ from layers within an OCI image in two important respects.

Unlike an image layer, where compression is recommended, model layers are always uncompressed. Because models are large, high-entropy files, compressing them provides a negligible reduction in size, while compressing and decompressing them is time- and compute-intensive.

In contrast to a layer in an OCI image, which contains multiple files in an archive, each “layer” in a model artifact must contain a single raw file. This allows runtimes like Docker Model Runner to reduce disk usage on the client machine by storing a single uncompressed copy of the model. This file can then be directly memory mapped by the inference engine at runtime.

The lack of file names, hierarchy, and metadata (e.g. modification time) ensures that identical model files always result in identical reusable layer blobs. This prevents unnecessary duplication, which is particularly important when working with LLMs, given the file size.

You may have noticed that these “layers” are not really filesystem layers at all. They are files, but they do not specify a filesystem. So, how does this work at runtime? When Docker Model Runner runs a model, instead of finding the GGUF file by name in a model filesystem, it identifies the desired file by its media type (application/vnd.docker.ai.gguf.v3) and fetches it from the model store. For more information on the Model Runner architecture, please see the architecture overview in the accompanying blog post.

Distribution

Like OCI images and other OCI artifacts, Docker model artifacts are distributed via registries like Docker Hub, Artifactory, or Azure Container Registry that comply with the OCI distribution specification.

Discovery

Docker Hub

The Docker Hub Gen AI catalog aids in the discovery of popular models. These models are packaged in the format described here and are compatible with Docker Model Runner and any other runtime that supports the OCI specification.

Hugging Face

If you are accustomed to exploring models on Hugging Face, there’s good news! Hugging Face now supports on-demand conversion to the Docker Model Artifact format when you pull from Hugging Face with docker model pull.

What’s Next?

Hopefully, you now have a better understanding of the Docker OCI Model format and how it supports our goal of making AI app development more accessible to developers via familiar workflows and commands. But this version of the artifact format is just the beginning! In the future, you can expect enhancements to the packaging format to bring this level of accessibility and flexibility to a broader range of use cases. Future versions will support:

  • Additional runtime configuration options like templates, context size, and default parameters. This will allow users to configure models for specific use cases and distribute that config alongside the model, as a single immutable artifact.
  • LoRA adapters, allowing users to extend existing model artifacts with use-case-specific fine-tuning.
  • Multi-modal projectors, enabling users to package multi-modal models, such as language-and-vision models, using LLaVA-style projectors.
  • Model index files that reference a set of models with different parameter counts and quantizations, allowing runtimes to select the best option for the available resources.

In addition to adding features, we are committed to fostering an open ecosystem. Expect:

  • Deeper integrations into containerd for a more native runtime experience.
  • Efforts to harmonize with ModelPack and other model packaging standards to improve interoperability.

These advancements show our ongoing commitment to making the OCI artifact a versatile and flexible way to package and run AI models, delivering the same ease and reliability developers already expect from Docker.

Learn more

Behind the scenes: How we designed Docker Model Runner and what’s next

By: Jacob Howard
June 18, 2025, 1:30 PM

The last few years have made it clear that AI models will continue to be a fundamental component of many applications. The catch is that they’re also a fundamentally different type of component, with complex software and hardware requirements that don’t (yet) fit neatly into the constraints of container-oriented development lifecycles and architectures. To help address this problem, Docker launched the Docker Model Runner with Docker Desktop 4.40. Since then, we’ve been working aggressively to expand Docker Model Runner with additional OS and hardware support, deeper integration with popular Docker tools, and improvements to both performance and usability.
For those interested in Docker Model Runner and its future, we offer a behind-the-scenes look at its design, development, and roadmap.

Note: Docker Model Runner is really two components: the model runner and the model distribution specification. In this article, we’ll be covering the former, but be sure to check out the companion blog post by Emily Casey for the equally important distribution side of the story.

Design goals

Docker Model Runner’s primary design goal was to allow users to run AI models locally and to access them from both containers and host processes. While that’s simple enough to articulate, it still leaves an enormous design space in which to find a solution. Fortunately, we had some additional constraints: we were a small engineering team, and we had some ambitious timelines. Most importantly, we didn’t want to compromise on UX, even if we couldn’t deliver it all at once. In the end, this motivated design decisions that have so far allowed us to deliver a viable solution while leaving plenty of room for future improvement.





Multiple backends

One thing we knew early on was that we weren’t going to write our own inference engine (Docker’s wheelhouse is containerized development, not low-level inference engines). We’re also big proponents of open-source, and there were just so many great existing solutions! There’s llama.cpp, vLLM, MLX, ONNX, and PyTorch, just to name a few.

Of course, being spoiled for choice can also be a curse — which to choose? The obvious answer was: as many as possible, but not all at once.

We decided to go with llama.cpp for our initial implementation, but we intentionally designed our APIs with an additional, optional path component (the {name} in /engines/{name}) to allow users to take advantage of multiple future backends. We also designed interfaces and stubbed out implementations for other backends to enforce good development hygiene and to avoid becoming tethered to one “initial” implementation.

OpenAI API compatibility

The second design choice we had to make was how to expose inference to consumers in containers. While there was also a fair amount of choice in the inference API space, we found that the OpenAI API standard seemed to offer the best initial tooling compatibility. We were also motivated by the fact that several teams inside Docker were already using this API for various real-world products. While we may support additional APIs in the future, we’ve so far found that this API surface is sufficient for most applications. One gap that we know exists is full compatibility with this API surface, which is something we’re working on iteratively.

This decision also drove our choice of llama.cpp as our initial backend. The llama.cpp project already offered a turnkey option for OpenAI API compatibility through its server implementation. While we had to make some small modifications (e.g. Unix domain socket support), this offered us the fastest path to a solution. We’ve also started contributing these small patches upstream, and we hope to expand our contributions to these projects in the future.

First-class citizenship for models in the Docker API

While the OpenAI API standard was the most ubiquitous option amongst existing tooling, we also knew that we wanted models to be first-class citizens in the Docker Engine API. Models have a fundamentally different execution lifecycle than the processes that typically make up the ENTRYPOINTs of containers, and thus, they don’t fit well under the standard /containers endpoints of the Docker Engine API. However, much like containers, images, networks, and volumes, models are such a fundamental component that they really deserve their own API resource type. This motivated the addition of a set of /models endpoints, closely modeled after the /images endpoints, but separate for reasons that are best discussed in the distribution blog post.

GPU acceleration

Another critical design goal was support for GPU acceleration of inference operations. Even the smallest useful models are extremely computationally demanding, while more sophisticated models (such as those with tool-calling capabilities) would be a stretch to fit onto local hardware at all. GPU support was going to be non-negotiable for a useful experience.

Unfortunately, passing GPUs across the VM boundary in Docker Desktop, especially in a way that would be reliable across platforms and offer a usable computation API inside containers, was going to be either impossible or very flaky.

As a compromise, we decided to run inference operations outside of the Docker Desktop VM and simply proxy API calls from the VM to the host. While there are some risks with this approach, we are working on initiatives to mitigate these with containerd-hosted sandboxing on macOS and Windows. Moreover, with Docker-provided models and application-provided prompts, the risk is somewhat lower, especially given that inference consists primarily of numerical operations. We assess the risk in Docker Desktop to be about on par with accessing host-side services via host.docker.internal (something already enabled by default).

However, agents that drive tool usage with model output can cause more significant side effects, and that’s something we needed to address. Fortunately, using the Docker MCP Toolkit, we’re able to perform tool invocation inside ephemeral containers, offering reliable encapsulation of the side effects that models might drive. This hybrid approach allows us to offer the best possible local performance with relative peace of mind when using tools.

Outside the context of Docker Desktop, for example, in Docker CE, we’re in a significantly better position due to the lack of a VM boundary (or at least a very transparent VM boundary in the case of a hypervisor) between the host hardware and containers. When running in standalone mode in Docker CE, the Docker Model Runner will have direct access to host hardware (e.g. via the NVIDIA Container Toolkit) and will run inference operations within a container.

Modularity, iteration, and open-sourcing

As previously mentioned, the Docker Model Runner team is relatively small, which meant that we couldn’t rely on a monolithic architecture if we wanted to effectively parallelize the development work for Docker Model Runner. Moreover, we had an early and overarching directive: open-source as much as possible.

We decided on three high-level components around which we could organize development work: the model runner, the model distribution tooling, and the model CLI plugin.

Breaking up these components allowed us to divide work more effectively, iterate faster, and define clean API boundaries between different concerns. While there have been some tricky dependency hurdles (in particular when integrating with closed-source components), we’ve found that the modular approach has facilitated faster incremental changes and support for new platforms.

The High-Level Architecture

At a high level, the Docker Model Runner architecture is composed of the three components mentioned above (the runner, the distribution code, and the CLI), but there are also some interesting sub-components within each:


Figure 1: Docker Model Runner high-level architecture

How these components are packaged and hosted (and how they interact) also depends on the platform where they’re deployed. In each case it looks slightly different. Sometimes they run on the host, sometimes they run in a VM, sometimes they run in a container, but the overall architecture looks the same.

Model storage and client

The core architectural component is the model store. This component, provided by the model distribution code, is where the actual model tensor files are stored. These files are stored differently (and separately) from images because (1) they’re high-entropy and not particularly compressible and (2) the inference engine needs direct access to the files so that it can do things like mapping them into its virtual address space via mmap(). For more information, see the accompanying model distribution blog post.

The model distribution code also provides the model distribution client. This component performs operations (such as pulling models) using the model distribution protocol against OCI registries.

Model runner

Built on top of the model store is the model runner. The model runner maps inbound inference API requests (e.g. /v1/chat/completions or /v1/embeddings requests) to processes hosting pairs of inference engines and models. It includes scheduler, loader, and runner components that coordinate the loading of models in and out of memory so that concurrent requests can be serviced, even if models can’t be loaded simultaneously (e.g. due to resource constraints). This makes the execution lifecycle of models different from that of containers, with engines and models operating as ephemeral processes (mostly hidden from users) that can be terminated and unloaded from memory as necessary (or when idle). A different backend process is run for each combination of engine (e.g. llama.cpp) and model (e.g. ai/qwen3:8B-Q4_K_M) as required by inference API requests (though multiple requests targeting the same pair will reuse the same runner and backend processes if possible).

The runner also includes an installer service that can dynamically download backend binaries and libraries, allowing users to selectively enable features (such as CUDA support) that might require downloading hundreds of MBs of dependencies.

Finally, the model runner serves as the central server for all Docker Model Runner APIs, including the /models APIs (which it routes to the model distribution code) and the /engines APIs (which it routes to its scheduler). This API server will always opt to hold in-flight requests until the resources (primarily RAM or VRAM) are available to service them, rather than returning something like a 503 response. This is critical for a number of usage patterns, such as multiple agents running with different models, or concurrent requests for both embeddings and completions.

Model CLI

The primary user-facing component of the Docker Model Runner architecture is the model CLI. This component is a standard Docker CLI plugin that offers an interface very similar to the docker image command. While the lifecycle of model execution is different from that of containers, the concepts (such as pushing, pulling, and running) should be familiar enough to existing Docker users.

The model CLI communicates with the model runner’s APIs to perform almost all of its operations (though the transport for that communication varies by platform). The model CLI is context-aware, allowing it to determine if it’s talking to a Docker Desktop model runner, Docker CE model runner, or a model runner on some custom platform. Because we’re using the standard Docker CLI plugin framework, we get all of the standard Docker Context functionality for free, making this detection much easier.

API design and routing

As previously mentioned, the Docker Model Runner comprises two sets of APIs: the Docker-style APIs and the OpenAI-compatible APIs. The Docker-style APIs (modeled after the /images APIs) include the following endpoints:

  • POST /models/create (Model pulling)
  • GET /models (Model listing)
  • GET /models/{namespace}/{name} (Model metadata)
  • DELETE /models/{namespace}/{name} (Model deletion)

The bodies for these requests look very similar to their image analogs. There’s no documentation at the moment, but you can get a glimpse of the format by looking at their corresponding Go types.

In contrast, the OpenAI endpoints follow a different but still RESTful convention:

  • GET /engines/{engine}/v1/models (OpenAI-format model listing)
  • GET /engines/{engine}/v1/models/{namespace}/{name} (OpenAI-format model metadata)
  • POST /engines/{engine}/v1/chat/completions (Chat completions)
  • POST /engines/{engine}/v1/completions (Completions (legacy endpoint))
  • POST /engines/{engine}/v1/embeddings (Create embeddings)

At this point in time, only one {engine} value is supported (llama.cpp), and it can also be omitted to use the default (llama.cpp) engine.

We make these APIs available on several different endpoints:


First, in Docker Desktop, they’re available on the Docker socket (/var/run/docker.sock), both inside and outside containers. This is in service of our design goal of having models as a first-class citizen in the Docker Engine API. At the moment, these endpoints are prefixed with a /exp/vDD4.40 path (to avoid dependencies on APIs that may evolve during development), but we’ll likely remove this prefix in the next few releases since these APIs have now mostly stabilized and will evolve in a backward-compatible way.
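For example, a hedged sketch of listing models through the Docker socket on Docker Desktop today (the prefix may disappear in a future release, as noted above):

# List models via the Docker socket (Docker Desktop, experimental API prefix)
curl -s --unix-socket /var/run/docker.sock \
  "http://localhost/exp/vDD4.40/models" | jq .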

Second, also in Docker Desktop, we make the APIs available on a special model-runner.docker.internal endpoint that’s accessible just from containers (though not currently from ECI containers, because we want to have inference sandboxing implemented first). This TCP-based endpoint exposes just the /models and /engines API endpoints (not the whole Docker API) and is designed to serve existing tooling (which likely can’t access APIs via a Unix domain socket). No /exp/vDD4.40 prefix is used in this case.

Finally, in both Docker Desktop and Docker CE, we make the /models and /engines API endpoints available on a host TCP endpoint (localhost:12434, by default, again without any /exp/vDD4.40 prefix). In Docker Desktop this is optional and not enabled by default. In Docker CE, it’s a critical component of how the API endpoints are accessed, because we currently lack the integration to add endpoints to Docker CE’s /var/run/docker.sock or to inject a custom model-runner.docker.internal hostname, so we advise using the standard 172.17.0.1 host gateway address to access this localhost-exposed port (e.g. setting your OpenAI API base URL to http://172.17.0.1:12434/engines/v1). Hopefully we’ll be able to unify this across Docker platforms in the near future (see our roadmap below).
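For example, from inside a container on Docker CE, a chat completion request might look like this sketch (assuming the default bridge network, the default port, and that the ai/smollm2 model has already been pulled):

# Chat completion request from inside a container, via the host gateway address
curl http://172.17.0.1:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'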

First up: Docker Desktop

The natural first step for Docker Model Runner was integration into Docker Desktop. In Docker Desktop, we have more direct control over integration with the Docker Engine, as well as existing processes that we can use to host the model runner components. In this case, the model runner and model distribution components live in the Docker Desktop host backend process (the com.docker.backend process you may have seen running) and we use special middleware and networking magic to route requests on /var/run/docker.sock and model-runner.docker.internal to the model runner’s API server. Since the individual inference backend processes run as subprocesses of com.docker.backend, there’s no risk of a crash in Docker Desktop if, for example, an inference backend is killed by an Out Of Memory (OOM) error.

We started initially with support for macOS on Apple Silicon, because it provided the most uniform platform for developing the model runner functionality, but we implemented most of the functionality along the way to build and test for all Docker Desktop platforms. This made it significantly easier to port to Windows on AMD64 and ARM64 platforms, as well as the GPU variations that we found there.

The one complexity with Windows was the larger size of the supporting library dependencies for the GPU-based backends. It wouldn’t have been feasible (or tolerated) to add another 500 MB – 1 GB to the Docker Desktop for Windows installer, so we decided to default to a CPU-based backend in Docker Desktop for Windows with optional support for the GPU backend. This was the primary motivating factor for the dynamic installer component of the model runner (in addition to our desire for incremental updates to different backends).

This all sounds like a very well-planned exercise, and we did indeed start with a three-component design and strictly enforced API boundaries, but in truth we started with the model runner service code as a sub-package of the Docker Desktop source code. This made it much easier to iterate quickly, especially as we were exploring the architecture for the different services. Fortunately, by sticking to a relatively strict isolation policy for the code, and enforcing clean dependencies through APIs and interfaces, we were able to easily extract the code (kudos to the excellent git-filter-repo tool) into a separate repository for the purposes of open-sourcing.

Next stop: Docker CE

Aside from Docker’s penchant for open-sourcing, one of the main reasons that we wanted to make the Docker Model Runner source code publicly available was to support integration into Docker CE. Our goal was to package the docker model command in the same way as docker buildx and docker compose.

The trick with Docker CE is that we wanted to ship Docker Model Runner as a “vanilla” Docker CLI plugin (i.e. without any special privileges or API access), which meant that we didn’t have a backend process that could host the model runner service. However, in the Docker CE case, the boundary between host hardware and container processes is much less disruptive, meaning that we could actually run Docker Model Runner in a container and simply make any accelerator hardware available to it directly. So, much like a standalone BuildKit builder container, we run the Docker Model Runner as a standalone container in Docker CE, with a special named volume for model storage (meaning you can uninstall the runner without having to re-pull models). This “installation” is performed by the model CLI automatically (and when necessary) by pulling the docker/model-runner image and starting a container. Explicit configuration for the runner can also be specified using the docker model install-runner command. If you want, you can also remove the model runner (and optionally the model storage) using docker model uninstall-runner.

This unfortunately leads to one small compromise with the UX: we don’t currently support the model runner APIs on /var/run/docker.sock or on the special model-runner.docker.internal URL. Instead, the model runner API server listens on the host system’s loopback interface at localhost:12434 (by default), which is available inside most containers at 172.17.0.1:12434. If desired, users can also make this available on model-runner.docker.internal:12434 by utilizing something like --add-host=model-runner.docker.internal:host-gateway when running docker run or docker create commands. This can also be achieved by using the extra_hosts key in a Compose YAML file, as shown in the sketch below. We have plans to make this more ergonomic in future releases.
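As a sketch of the Compose variant mentioned above (the service name, image, and OPENAI_BASE_URL variable are placeholders; the exact environment variable depends on your OpenAI client):

services:
  app:
    image: my-app:latest                      # hypothetical application image
    extra_hosts:
      - "model-runner.docker.internal:host-gateway"
    environment:
      OPENAI_BASE_URL: "http://model-runner.docker.internal:12434/engines/v1"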

The road ahead…

The status quo is Docker Model Runner support in Docker Desktop on macOS and Windows and support for Docker CE on Linux (including WSL2), but that’s definitely not the end of the story. Over the next few months, we have a number of initiatives planned that we think will reshape the user experience, performance, and security of Docker Model Runner.

Additional GUI and CLI functionality

The most visible functionality coming out over the next few months will be in the model CLI and the “Models” tab in the Docker Desktop dashboard. Expect to see new commands (such as df, ps, and unload) that will provide more direct support for monitoring and controlling model execution. Also, expect to see new and expanded layouts and functionality in the Models tab.

Expanded OpenAI API support

A less-visible but equally important aspect of the Docker Model Runner user experience is our compatibility with the OpenAI API. There are dozens of endpoints and parameters to support (and we already support many), so we will work to expand API surface compatibility with a focus on practical use cases and prioritization of compatibility with existing tools.

containerd and Moby integration

One of the longer-term initiatives that we’re looking at is integration with containerd. containerd already provides a modular runtime system that allows for task execution coordinated with storage. We believe this is the right way forward and that it will allow us to better codify the relationship between model storage, model execution, and model execution sandboxing.

In combination with the containerd work, we would also like tighter integration with the Moby project. While our existing Docker CE integration offers a viable and performant solution, we believe that better ergonomics could be achieved with more direct integration. In particular, niceties like support for model-runner.docker.internal DNS resolution in Docker CE are on our radar. Perhaps the biggest win from this tighter integration would be to expose Docker Model Runner APIs on the Docker socket and to include the API endpoints (e.g. /models) in the official Docker Engine API documentation.

Kubernetes

One of the product goals for Docker Model Runner was a consistent experience from development inner loop to production, and Kubernetes is inarguably a part of that path. The existing Docker Model Runner images that we’re using for Docker CE will also work within a Kubernetes cluster, and we’re currently developing instructions to set up a Docker Model Runner instance in a Kubernetes cluster. The big difference with Kubernetes is the variety of cluster and application architectures in use, so we’ll likely end up with different “recipes” for how to configure the Docker Model Runner in different scenarios.

vLLM

One of the things we’ve heard from a number of customers is that vLLM forms a core component of their production stack. This was also the first alternate backend that we stubbed out in the model runner repository, and the time has come to start poking at an implementation.

Even more to come…

Finally, there are some bits that we just can’t talk about yet, but they will fundamentally shift the way that developers interact with models. Be sure to tune in to Docker’s sessions at WeAreDevelopers from July 9–11 for some exciting announcements around AI-related initiatives at Docker.

Learn more

How to Build, Run, and Package AI Models Locally with Docker Model Runner

June 12, 2025, 4:00 PM

Introduction

As a Senior DevOps Engineer and Docker Captain, I’ve helped build AI systems for everything from retail personalization to medical imaging. One truth stands out: AI capabilities are core to modern infrastructure.

This guide will show you how to run and package local AI models with Docker Model Runner — a lightweight, developer-friendly tool for working with AI models pulled from Docker Hub or Hugging Face. You’ll learn how to run models in the CLI or via API, publish your own model artifacts, and do it all without setting up Python environments or web servers.

What is AI in Development?

Artificial Intelligence (AI) refers to systems that mimic human intelligence, including:

  • Making decisions via machine learning
  • Understanding language through NLP
  • Recognizing images with computer vision
  • Learning from new data automatically

Common Types of AI in Development:

  • Machine Learning (ML): Learns from structured and unstructured data
  • Deep Learning: Neural networks for pattern recognition
  • Natural Language Processing (NLP): Understands/generates human language
  • Computer Vision: Recognizes and interprets images

Why Package and Run Your Own AI Model?

Local model packaging and execution offer full control over your AI workflows. Instead of relying on external APIs, you can run models directly on your machine — unlocking:

  • Faster inference with local compute (no latency from API calls)
  • Greater privacy by keeping data and prompts on your own hardware
  • Customization through packaging and versioning your own models
  • Seamless CI/CD integration with tools like Docker and GitHub Actions
  • Offline capabilities for edge use cases or constrained environments

Platforms like Docker and Hugging Face make cutting-edge AI models instantly accessible without building from scratch. Running them locally means lower latency, better privacy, and faster iteration.

Real-World Use Cases for AI

  • Chatbots & Virtual Assistants: Automate support (e.g., ChatGPT, Alexa)
  • Generative AI: Create text, art, music (e.g., Midjourney, Lensa)
  • Dev Tools: Autocomplete and debug code (e.g., GitHub Copilot)
  • Retail Intelligence: Recommend products based on behavior
  • Medical Imaging: Analyze scans for faster diagnosis

How to Package and Run AI Models Locally with Docker Model Runner

Prerequisites:

Step 0 — Enable Docker Model Runner

  1. Open Docker Desktop
  2. Go to Settings → Features in development
  3. Under the Experimental features tab, enable Access experimental features
  4. Click Apply and restart
  5. Quit and reopen Docker Desktop to ensure changes take effect
  6. Reopen Settings → Features in development
  7. Switch to the Beta tab and check Enable Docker Model Runner
  8. (Optional) Enable host-side TCP support to access the API from localhost

Once enabled, you can use the docker model CLI and manage models in the Models tab.

Screenshot of Docker Desktop’s Features in development tab with Docker Model Runner and Dev Environments enabled.

Step 1: Pull a Model

From Docker Hub:

docker model pull ai/smollm2

Or from Hugging Face (GGUF format):

docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

Note: Only GGUF models are supported. GGUF (GPT-style General Use Format) is a lightweight binary file format designed for efficient local inference, especially with CPU-optimized runtimes like llama.cpp. It includes the model weights, tokenizer, and metadata all in one place, making it ideal for packaging and distributing LLMs in containerized environments.

Step 2: Tag and Push to Local Registry (Optional)

If you want to push models to a private or local registry:

Tag model with your registry’s address:

docker model tag hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF localhost:6000/foobar

Run a local Docker registry:

docker run -d -p 6000:5000 --name registry registry:2

Push the model to the local registry:

docker model push localhost:6000/foobar

Check your local models with:

docker model list

Step 3: Run the Model

Run a prompt (one-shot)

docker model run ai/smollm2 "What is Docker?"

Interactive chat mode

docker model run ai/smollm2

Note: Models are loaded into memory on demand and unloaded after 5 minutes of inactivity.

Step 4: Test via OpenAI-Compatible API

To call the model from the host:

  1. Enable TCP host access for Model Runner (via Docker Desktop GUI or CLI):

Screenshot of Docker Desktop’s Features in development tab showing host-side TCP support enabled for Docker Model Runner.

docker desktop enable model-runner --tcp 12434

  2. Send a prompt using the OpenAI-compatible chat endpoint:
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me about the fall of Rome."}
    ]
  }'

Note: No API key required — this runs locally and securely on your machine.

Step 5: Package Your Own Model

You can package your own pre-trained GGUF model as a Docker-compatible artifact if you already have a .gguf file — such as one downloaded from Hugging Face or converted using tools like llama.cpp.

Note: This guide assumes you already have a .gguf model file. It does not cover how to train or convert models to GGUF.

docker model package \
  --gguf "$(pwd)/model.gguf" \
  --license "$(pwd)/LICENSE.txt" \
  --push registry.example.com/ai/custom-llm:v1

This is ideal for custom-trained or private models. You can now pull it like any other model:

docker model pull registry.example.com/ai/custom-llm:v1

Step 6: Optimize & Iterate

  • Use docker model logs to monitor model usage and debug issues
  • Set up CI/CD to automate pulls, scans, and packaging
  • Track model lineage and training versions to ensure consistency
  • Use semantic versioning (:v1, :2025-05, etc.) instead of latest when packaging custom models
  • Only one model can be loaded at a time; requesting a new model will unload the previous one.

Compose Integration (Optional)

Docker Compose v2.35+ (included in Docker Desktop 4.41+) introduces support for AI model services using a new provider.type: model. You can define models directly in your compose.yml and reference them in app services using depends_on.

During docker compose up, Docker Model Runner automatically pulls the model and starts it on the host system, then injects connection details into dependent services using environment variables such as MY_MODEL_URL and MY_MODEL_MODEL, where MY_MODEL matches the name of the model service.

This enables seamless multi-container AI applications — with zero extra glue code. Learn more.
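Here is a minimal sketch of what such a Compose file can look like (the app image is a placeholder, and the exact provider schema should be checked against the Compose documentation):

# compose.yml: sketch of a model provider service (Docker Compose v2.35+)
services:
  my_model:
    provider:
      type: model
      options:
        model: ai/smollm2

  app:
    image: my-app:latest          # hypothetical application image
    depends_on:
      - my_model
    # Compose injects MY_MODEL_URL and MY_MODEL_MODEL into this service at startup.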

Navigating AI Development Challenges

  • Latency: Use quantized GGUF models
  • Security: Never run unknown models; validate sources and attach licenses
  • Compliance: Mask PII, respect data consent
  • Costs: Run locally to avoid cloud compute bills

Best Practices

  • Prefer GGUF models for optimal CPU inference
  • Use the --license flag when packaging custom models to ensure compliance
  • Use versioned tags (e.g., :v1, :2025-05) instead of latest
  • Monitor model logs using docker model logs
  • Validate model sources before pulling or packaging
  • Only pull models from trusted sources (e.g., Docker Hub’s ai/ namespace or verified Hugging Face repos).
  • Review the license and usage terms for each model before packaging or deploying.

The Road Ahead

  • Support for Retrieval-Augmented Generation (RAG)
  • Expanded multimodal support (text + images, video, audio)
  • LLMs as services in Docker Compose (Requires Docker Compose v2.35+)
  • More granular Model Dashboard features in Docker Desktop
  • Secure packaging and deployment pipelines for private AI models

Docker Model Runner lets DevOps teams treat models like any other artifact — pulled, tagged, versioned, tested, and deployed.

Final Thoughts

You don’t need a GPU cluster or external API to build AI apps. Learn more and explore everything you can do with Docker Model Runner:

  • Pull prebuilt models from Docker Hub or Hugging Face
  • Run them locally using the CLI, API, or Docker Desktop’s Model tab
  • Package and push your own models as OCI artifacts
  • Integrate with your CI/CD pipelines securely

You can also find other helpful information to get started at: 

You’re not just deploying containers — you’re delivering intelligence.

Learn more

Publishing AI models to Docker Hub

By: Kevin Wittek
June 11, 2025, 12:16 PM

When we first released Docker Model Runner, it came with built-in support for running AI models published and maintained by Docker on Docker Hub. This made it simple to pull a model like llama3.2 or gemma3 and start using it locally with familiar Docker-style commands.

Model Runner now supports three new commands: tag, push, and package. These enable you to share models with your team, your organization, or the wider community. Whether you’re managing your own fine-tuned models or curating a set of open-source models, Model Runner now lets you publish them to Docker Hub or any other OCI-artifact-compatible container registry. For teams using Docker Hub, enterprise features like Registry Access Management (RAM) provide policy-based controls and guardrails to help enforce secure, consistent access.

Tagging and pushing to Docker Hub

Let’s start by republishing an existing model from Docker Hub under your own namespace.

# Step 1: Pull the model from Docker Hub
$ docker model pull ai/smollm2

# Step 2: Tag it for your own organization
$ docker model tag ai/smollm2 myorg/smollm2

# Step 3: Push it to Docker Hub
$ docker model push myorg/smollm2

That’s it! Your model is now available at myorg/smollm2 and ready to be consumed using Model Runner by anyone with access.

Pushing to other container registries

Model Runner supports other container registries beyond Docker Hub, including GitHub Container Registry (GHCR).

# Step 1: Tag for GHCR
$ docker model tag ai/smollm2 ghcr.io/myorg/smollm2

# Step 2: Push to GHCR
$ docker model push ghcr.io/myorg/smollm2

Authentication and permissions work just like they do with regular Docker images in the context of GHCR, so you can leverage your existing workflow for managing registry credentials.

Packaging a custom GGUF file

Want to publish your own model file? You can use the package command to wrap a .gguf file into a Docker-compatible OCI artifact and directly push it into a Container Registry, such as Docker Hub.

# Step 1: Download a model, e.g. from HuggingFace
$ curl -L -o model.gguf https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf

# Step 2: Package and push it
$ docker model package --gguf "$(pwd)/model.gguf" --push myorg/mistral-7b-v0.1:Q4_K_M

You’ve now turned a raw model file in GGUF format into a portable, versioned, and shareable artifact that works seamlessly with docker model run.
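Anyone with access can now pull and run it just like the Docker-curated models, for example:

docker model pull myorg/mistral-7b-v0.1:Q4_K_M
docker model run myorg/mistral-7b-v0.1:Q4_K_M "Summarize what GGUF is in one sentence."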

Conclusion

We’ve seen how easy it is to publish your own models using Docker Model Runner’s new tag, push, and package commands. These additions bring the familiar Docker developer experience to the world of AI model sharing. Teams and enterprises using Docker Hub can securely manage access and control for their models, just like with container images, making it easier to scale GenAI applications across teams.

Stay tuned for more improvements to Model Runner that will make packaging and running models even more powerful and flexible.

Learn more

Docker Desktop 4.42: Native IPv6, Built-In MCP, and Better Model Packaging

June 10, 2025, 4:35 PM

Docker Desktop 4.42 introduces powerful new capabilities that enhance network flexibility, improve security, and deepen AI toolchain integration, all while reducing setup friction. With native IPv6 support, a fully integrated MCP Toolkit, and major upgrades to Docker Model Runner and our AI agent Gordon, this release continues our commitment to helping developers move faster, ship smarter, and build securely across any environment. Whether you’re managing enterprise-grade networks or experimenting with agentic workflows, Docker Desktop 4.42 brings the tools you need right into your development workflows. 


IPv6 support 

Docker Desktop now provides IPv6 networking capabilities with customization options to better support diverse network environments. You can now choose between dual IPv4/IPv6 (default), IPv4-only, or IPv6-only networking modes to align with your organization’s network requirements. The new intelligent DNS resolution behavior automatically detects your host’s network stack and filters unsupported record types, preventing connectivity timeouts in IPv4-only or IPv6-only environments. 

These IPv6 settings are available in the Docker Desktop Settings > Resources > Network section and can be enforced across teams using Settings Management, making Docker Desktop more reliable in complex enterprise network configurations, including IPv6-only deployments.

Further documentation here.

Screenshot of Docker Desktop IPv6 settings

Figure 1: Docker Desktop IPv6 settings

Docker MCP Toolkit integrated into Docker Desktop

Last month, we launched the Docker MCP Catalog and Toolkit to help developers easily discover MCP servers and securely connect them to their favorite clients and agentic apps. We’re humbled by the incredible support from the community. User growth is up by over 50%, and we’ve crossed 1 million pulls! Now, we’re excited to share that the MCP Toolkit is built right into Docker Desktop, no separate extension required.

You can now access more than 100 MCP servers, including GitHub, MongoDB, Hashicorp, and more, directly from Docker Desktop – just enable the servers you need, configure them, and connect to clients like Claude Desktop, Cursor, Continue.dev, or Docker’s AI agent Gordon.

Unlike typical setups that run MCP servers via npx or uvx processes with broad access to the host system, Docker Desktop runs these servers inside isolated containers with well-defined security boundaries. All container images are cryptographically signed, with proper isolation of secrets and configuration data. 

Screenshot of the MCP Toolkit tab on Docker Desktop, showing a list of downloadable and connected clients.

Figure 2: Docker MCP Toolkit is now integrated natively into Docker Desktop

To meet developers where they are, we’re bringing Docker MCP support to the CLI, using the same command structure you’re already familiar with. With the new docker mcp commands, you can launch, configure, and manage MCP servers directly from the terminal. The CLI plugin offers comprehensive functionality, including catalog management, client connection setup, and secret management.

Screenshot of the available Docker MCP CLI commands, including catalog, client, config, and more.

Figure 3:  Docker MCP CLI commands.

Docker AI Agent Gordon Now Supports MCP Toolkit Integration

In this release, we’ve upgraded Gordon, Docker’s AI agent, with direct integration to the MCP Toolkit in Docker Desktop. To enable it, open Gordon, click the “Tools” button, and toggle on the “MCP” Toolkit option. Once activated, the MCP Toolkit tab will display tools available from any MCP servers you’ve configured.

Screenshot of Gordon working with MCP Toolkit

Figure 4: Docker’s AI Agent Gordon now integrates with Docker’s MCP Toolkit, bringing 100+ MCP servers

This integration gives you immediate access to 100+ MCP servers with no extra setup, letting you experiment with AI capabilities directly in your Docker workflow. Gordon now acts as a bridge between Docker’s native tooling and the broader AI ecosystem, letting you leverage specialized tools for everything from screenshot capture to data analysis and API interactions – all from a consistent, unified interface.

Screenshot of Gordon calling Github

Figure 5: Docker’s AI Agent Gordon uses the GitHub MCP server to pull issues and suggest solutions.

Finally, we’ve also improved the Dockerize feature with expanded support for Java, Kotlin, Gradle, and Maven projects. These improvements make it easier to containerize a wider range of applications with minimal configuration. With expanded containerization capabilities and integrated access to the MCP Toolkit, Gordon is more powerful than ever. It streamlines container workflows, reduces repetitive tasks, and gives you access to specialized tools, so you can stay focused on building, shipping, and running your applications efficiently.

Docker Model Runner adds Qualcomm support, Docker Engine Integration, and UX Upgrades

Staying true to our philosophy of giving developers more flexibility and meeting them where they are, the latest version of Docker Model Runner adds broader OS support, deeper integration with popular Docker tools, and improvements in both performance and usability.

In addition to supporting Apple Silicon and Windows systems with NVIDIA GPUs, Docker Model Runner now works on Windows devices with Qualcomm chipsets. Under the hood, we’ve upgraded our inference engine to use the latest version of llama.cpp, bringing significantly enhanced tool calling capabilities to your AI applications.

Docker Model Runner can now be installed directly in Docker Engine Community Edition across multiple Linux distributions supported by Docker Engine. This integration is particularly valuable for developers looking to incorporate AI capabilities into their CI/CD pipelines and automated testing workflows. To get started, check out our documentation for the setup guide.

Get Up and Running with Models Faster

The Docker Model Runner user experience has been upgraded with expanded GUI functionality in Docker Desktop. All of these UI enhancements are designed to help you get started with Model Runner quickly and build applications faster. A dedicated interface now includes three new tabs that simplify model discovery, management, and streamline troubleshooting workflows. Additionally, Docker Desktop’s updated GUI introduces a more intuitive onboarding experience with streamlined “two-click” actions.

After clicking on the Model tab, you’ll see three new sub-tabs. The first, labeled “Local,” displays a set of models in various sizes that you can quickly pull. Once a model is pulled, you can launch a chat interface to test and experiment with it immediately.

Screenshot of the Models menu within Docker Desktop, along with suggested models.

Figure 6: Access a set of models of various sizes to get quickly started in Models menu of Docker Desktop

The second tab, “Docker Hub”, offers a comprehensive view for browsing and pulling models from Docker Hub’s AI Catalog, making it easy to get started directly within Docker Desktop, without switching contexts.

Screenshot of the Docker Hub tab within the Docker Desktop Models menu.

Figure 7: A shortcut to the Model catalog from Docker Hub in Models menu of Docker Desktop

The third tab “Logs” offers real-time access to the inference engine’s log tail, giving developers immediate visibility into model execution status and debugging information directly within the Docker Desktop interface.


Figure 8: Gain visibility into model execution status and debugging information in Docker Desktop

Model Packaging Made Simple via CLI

As part of the Docker Model CLI, the most significant enhancement is the introduction of the docker model package command. This new command enables developers to package their models from GGUF format into OCI-compliant artifacts, fundamentally transforming how AI models are distributed and shared. It enables seamless publishing to both public and private OCI-compatible repositories such as Docker Hub, and establishes a standardized, secure workflow for model distribution, using the same trusted Docker tools developers already rely on. See our docs for more details.

Conclusion 

From intelligent networking enhancements to seamless AI integrations, Docker Desktop 4.42 makes it easier than ever to build with confidence. With native support for IPv6, in-app access to 100+ MCP servers, and expanded platform compatibility for Docker Model Runner, this release is all about meeting developers where they are and equipping them with the tools to take their work further. Update to the latest version today and unlock everything Docker Desktop 4.42 has to offer.

Learn more

How to Make an AI Chatbot from Scratch using Docker Model Runner

Par : Harsh Manvar
3 juin 2025 à 18:40

Today, we’ll show you how to build a fully functional Generative AI chatbot using Docker Model Runner and powerful observability tools, including Prometheus, Grafana, and Jaeger. We’ll walk you through the common challenges developers face when building AI-powered applications, demonstrate how Docker Model Runner solves these pain points, and then guide you step-by-step through building a production-ready chatbot with comprehensive monitoring and metrics.

By the end of this guide, you’ll know how to make an AI chatbot and run it locally. You’ll also learn how to set up real-time monitoring insights, streaming responses, and a modern React interface — all orchestrated through familiar Docker workflows.

The current challenges with GenAI development

Generative AI (GenAI) is revolutionizing software development, but creating AI-powered applications comes with significant challenges. First, the current AI landscape is fragmented — developers must piece together various libraries, frameworks, and platforms that weren’t designed to work together. Second, running large language models efficiently requires specialized hardware configurations that vary across platforms, while AI model execution remains disconnected from standard container workflows. This forces teams to maintain separate environments for their application code and AI models.

Third, without standardized methods for storing, versioning, and serving models, development teams struggle with inconsistent deployment practices. Meanwhile, relying on cloud-based AI services creates financial strain through unpredictable costs that scale with usage. Additionally, sending data to external AI services introduces privacy and security risks, especially for applications handling sensitive information.

These challenges combine to create a frustrating developer experience that hinders experimentation and slows innovation precisely when businesses need to accelerate their AI adoption. Docker Model Runner addresses these pain points by providing a streamlined solution for running AI models locally, right within your existing Docker workflow.

How Docker is solving these challenges

Docker Model Runner offers a revolutionary approach to GenAI development by integrating AI model execution directly into familiar container workflows. 

Figure 1: Comparison diagram showing complex multi-step traditional GenAI setup versus simplified Docker Model Runner single-command workflow

Many developers successfully use containerized AI models, benefiting from integrated workflows, cost control, and data privacy. Docker Model Runner builds on these strengths by making it even easier and more efficient to work with models. By running models natively on your host machine while maintaining the familiar Docker interface, Model Runner delivers:

  • Simplified Model Execution: Run AI models locally with a simple Docker CLI command, no complex setup required.
  • Hardware Acceleration: Direct access to GPU resources without containerization overhead
  • Integrated Workflow: Seamless integration with existing Docker tools and container development practices
  • Standardized Packaging: Models are distributed as OCI artifacts through the same registries you already use
  • Cost Control: Eliminate unpredictable API costs by running models locally
  • Data Privacy: Keep sensitive data within your infrastructure with no external API calls

This approach fundamentally changes how developers can build and test AI-powered applications, making local development faster, more secure, and dramatically more efficient.

How to create an AI chatbot with Docker

In this guide, we’ll build a comprehensive GenAI application that showcases how to create a fully-featured chat interface powered by Docker Model Runner, complete with advanced observability tools to monitor and optimize your AI models.

Project overview

The project is a complete Generative AI interface that demonstrates how to:

  1. Create a responsive React/TypeScript chat UI with streaming responses
  2. Build a Go backend server that integrates with Docker Model Runner
  3. Implement comprehensive observability with metrics, logging, and tracing
  4. Monitor AI model performance with real-time metrics

Architecture

The application consists of these main components:

  1. The frontend sends chat messages to the backend API
  2. The backend formats the messages and sends them to the Model Runner
  3. The LLM processes the input and generates a response
  4. The backend streams the tokens back to the frontend as they’re generated
  5. The frontend displays the incoming tokens in real-time
  6. Observability components collect metrics, logs, and traces throughout the process

Figure 2: Architecture diagram showing data flow between frontend, backend, Model Runner, and observability tools like Prometheus, Grafana, and Jaeger.

Project structure

The project has the following structure:

tree -L 2
.
├── Dockerfile
├── README-model-runner.md
├── README.md
├── backend.env
├── compose.yaml
├── frontend
..
├── go.mod
├── go.sum
├── grafana
│   └── provisioning
├── main.go
├── main_branch_update.md
├── observability
│   └── README.md
├── pkg
│   ├── health
│   ├── logger
│   ├── metrics
│   ├── middleware
│   └── tracing
├── prometheus
│   └── prometheus.yml
├── refs
│   └── heads
..


21 directories, 33 files

We’ll examine the key files and understand how they work together throughout this guide.

Prerequisites

Before we begin, make sure you have:

  • Docker Desktop (version 4.40 or newer) 
  • Docker Model Runner enabled
  • At least 16GB of RAM for running AI models efficiently
  • Familiarity with Go (for backend development)
  • Familiarity with React and TypeScript (for frontend development)

Getting started

To run the application:

  1. Clone the repository: 
git clone https://github.com/dockersamples/genai-model-runner-metrics
cd genai-model-runner-metrics


  2. Enable Docker Model Runner in Docker Desktop:
  • Go to Settings > Features in Development > Beta tab
  • Enable “Docker Model Runner”
  • Select “Apply and restart”

Figure 3: Screenshot of Docker Desktop Beta Features settings panel with Docker AI, Docker Model Runner, and TCP support enabled.

  3. Download the model

For this demo, we’ll use Llama 3.2, but you can substitute any model of your choice:

docker model pull ai/llama3.2:1B-Q8_0


Just as you do with containers, you can manage your downloaded AI models directly in the Docker Dashboard under the Models section. Here you can view model details and storage usage, and manage your local AI model library.

Figure 4: View of Docker Dashboard showing locally downloaded AI models with details like size, parameters, and quantization.
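
If you prefer the terminal, the same models can be inspected and exercised with the Docker Model CLI. The commands below are a quick sketch; run docker model --help to confirm what your installed version supports:

# List locally pulled models
docker model list

# Run a one-off prompt against a pulled model
docker model run ai/llama3.2:1B-Q8_0 "Summarize what Docker Model Runner does."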

  4. Start the application:
docker compose up -d --build

Figure 5: List of active running containers in Docker Dashboard, including Jaeger, Prometheus, backend, frontend, and genai-model-runner-metrics.

  5. Open your browser and navigate to the frontend URL at http://localhost:3000. You’ll be greeted with a modern chat interface (see screenshot) featuring:
  • Clean, responsive design with dark/light mode toggle
  • Message input area ready for your first prompt
  • Model information displayed in the footer

Figure 6: GenAI chatbot interface showing live metrics panel with input/output tokens, response time, and error rate.

  6. Click on Expand to view metrics such as:
  • Input tokens
  • Output tokens
  • Total Requests
  • Average Response Time
  • Error Rate

Figure 7: Expanded metrics view with input and output tokens, detailed chat prompt, and response generated by Llama 3.2 model.

Grafana allows you to visualize metrics through customizable dashboards. Click on View Detailed Dashboard to open up Grafana dashboard.

Figure 8: Chat interface showing metrics dashboard with prompt and response plus option to view detailed metrics in Grafana.

Log in with the default credentials (enter “admin” as user and password) to explore pre-configured AI performance dashboards (see screenshot below) showing real-time metrics like tokens per second, memory usage, and model performance. 

Select Add your first data source and choose Prometheus. Enter “http://prometheus:9090” as the Prometheus server URL, scroll to the bottom of the page, and click “Save and test”. You should see “Successfully queried the Prometheus API” as confirmation. Then select Dashboards and click Re-import for each of the pre-configured dashboards.

You should now have a Prometheus 2.0 Stats dashboard up and running.

Figure 9: Grafana dashboard with multiple graph panels monitoring GenAI chatbot performance, displaying time-series charts for memory consumption, processing speeds, and application health

Prometheus allows you to collect and store time-series metrics data. Open the Prometheus query interface http://localhost:9091 and start typing “genai” in the query box to explore all available AI metrics (as shown in the screenshot below). You’ll see dozens of automatically collected metrics, including tokens per second, latency measurements, and llama.cpp-specific performance data. 

Figure 10: Prometheus web interface showing dropdown of available GenAI metrics including genai_app_active_requests and genai_app_token_latency

Jaeger provides a visual exploration of request flows and performance bottlenecks. You can access it at http://localhost:16686.

Implementation details

Let’s explore how the key components of the project work:

  1. Frontend implementation

The React frontend provides a clean, responsive chat interface built with TypeScript and modern React patterns. The core App.tsx component manages two essential pieces of state: dark mode preferences for user experience and model metadata fetched from the backend’s health endpoint. 

When the component mounts, the useEffect hook automatically retrieves information about the currently running AI model. It displays details like the model name directly in the footer to give users transparency about which LLM is powering their conversations.

// Essential App.tsx structure
function App() {
  const [darkMode, setDarkMode] = useState(false);
  const [modelInfo, setModelInfo] = useState<ModelMetadata | null>(null);

  // Fetch model info from backend
  useEffect(() => {
    fetch('http://localhost:8080/health')
      .then(res => res.json())
      .then(data => setModelInfo(data.model_info));
  }, []);

  return (
    <div className="min-h-screen bg-white dark:bg-gray-900">
      <Header toggleDarkMode={() => setDarkMode(!darkMode)} />
      <ChatBox />
      <footer>
        Powered by Docker Model Runner running {modelInfo?.model}
      </footer>
    </div>
  );
}

The main App component orchestrates the overall layout while delegating specific functionality to specialized components, like Header for navigation controls and ChatBox for the conversation interface. This separation of concerns keeps the codebase maintainable, and the automatic model-info fetching shows how the frontend integrates with Docker Model Runner through the Go backend’s API, creating a unified user experience that abstracts away the complexity of local AI model execution.

  2. Backend implementation: Integration with Model Runner

The core of this application is a Go backend that communicates with Docker Model Runner. Let’s examine the key parts of our main.go file:

client := openai.NewClient(
    option.WithBaseURL(baseURL),
    option.WithAPIKey(apiKey),
)

This demonstrates how we leverage Docker Model Runner’s OpenAI-compatible API. The Model Runner exposes endpoints that match OpenAI’s API structure, allowing us to use standard clients. Depending on your connection method, baseURL is set to either:

  • http://model-runner.docker.internal/engines/llama.cpp/v1/ (for Docker socket)
  • http://host.docker.internal:12434/engines/llama.cpp/v1/ (for TCP)
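
Because the endpoints follow the OpenAI API shape, even a plain HTTP client can talk to Model Runner. The snippet below is a minimal, self-contained sketch (not the project’s main.go) that sends a single non-streaming chat completion request; the base URL assumes the TCP connection method listed above:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Assumes Model Runner is reachable via TCP host-side support.
    baseURL := "http://localhost:12434/engines/llama.cpp/v1"

    body, _ := json.Marshal(map[string]any{
        "model": "ai/llama3.2:1B-Q8_0",
        "messages": []map[string]string{
            {"role": "user", "content": "Say hello in one sentence."},
        },
    })

    // Standard OpenAI-style chat completions endpoint.
    resp, err := http.Post(baseURL+"/chat/completions", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    raw, _ := io.ReadAll(resp.Body)
    fmt.Println(string(raw)) // Raw JSON; parse choices[0].message.content in real code.
}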

How metrics flow from host to containers

One key architectural detail worth understanding: llama.cpp runs natively on your host (via Docker Model Runner), while Prometheus and Grafana run in containers. Here’s how they communicate:

The Backend as Metrics Bridge:

  • Connects to llama.cpp via Model Runner API (http://localhost:12434)
  • Collects performance data from each API call (response times, token counts)
  • Calculates metrics like tokens per second and memory usage
  • Exposes all metrics in Prometheus format at http://backend:9090/metrics
  • Enables containerized Prometheus to scrape metrics without host access

This hybrid architecture gives you the performance benefits of native model execution with the convenience of containerized observability.
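
A minimal sketch of that bridge pattern, assuming the standard Prometheus Go client rather than the project’s exact code, looks like this: the backend registers its collectors and serves them on a dedicated metrics port that the containerized Prometheus can scrape.

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical gauge, updated after each Model Runner API call.
var tokensPerSecond = promauto.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_tokens_per_second",
        Help: "Tokens generated per second",
    },
    []string{"model"},
)

func main() {
    // Expose all registered metrics on :9090/metrics for Prometheus to scrape.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9090", nil))
}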

llama.cpp metrics integration

The project provides detailed real-time metrics specifically for llama.cpp models:

  • Tokens per Second: measure of model generation speed (LlamaCppTokensPerSecond in metrics.go)
  • Context Window Size: maximum context length in tokens (LlamaCppContextSize in metrics.go)
  • Prompt Evaluation Time: time spent processing the input prompt (LlamaCppPromptEvalTime in metrics.go)
  • Memory per Token: memory efficiency measurement (LlamaCppMemoryPerToken in metrics.go)
  • Thread Utilization: number of CPU threads used (LlamaCppThreadsUsed in metrics.go)
  • Batch Size: token processing batch size (LlamaCppBatchSize in metrics.go)

One of the most powerful features is our detailed metrics collection for llama.cpp models. These metrics help optimize model performance and identify bottlenecks in your inference pipeline.

// LlamaCpp metrics
llamacppContextSize = promautoFactory.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_context_size",
        Help: "Context window size in tokens for llama.cpp models",
    },
    []string{"model"},
)

llamacppTokensPerSecond = promautoFactory.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_tokens_per_second",
        Help: "Tokens generated per second",
    },
    []string{"model"},
)

// More metrics definitions...


These metrics are collected, processed, and exposed both for Prometheus scraping and for real-time display in the front end. This gives us unprecedented visibility into how the llama.cpp inference engine is performing.

Chat implementation with streaming

The chat endpoint implements streaming for real-time token generation:


// Set up streaming with a proper SSE format
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")

// Stream each chunk as it arrives
if len(chunk.Choices) > 0 && chunk.Choices[0].Delta.Content != "" {
    outputTokens++
    _, err := fmt.Fprintf(w, "%s", chunk.Choices[0].Delta.Content)
    if err != nil {
        log.Printf("Error writing to stream: %v", err)
        return
    }
    w.(http.Flusher).Flush()
}

This streaming implementation ensures that tokens appear in real-time in the user interface, providing a smooth and responsive chat experience. You can also measure key performance metrics like time to first token and tokens per second.

Performance measurement

You can measure various performance aspects of the model:

// Record first token time
if firstTokenTime.IsZero() && len(chunk.Choices) > 0 && 
chunk.Choices[0].Delta.Content != "" {
    firstTokenTime = time.Now()
    
    // For llama.cpp, record prompt evaluation time
    if strings.Contains(strings.ToLower(model), "llama") || 
       strings.Contains(apiBaseURL, "llama.cpp") {
        promptEvalTime := firstTokenTime.Sub(promptEvalStartTime)
        llamacppPromptEvalTime.WithLabelValues(model).Observe(promptEvalTime.Seconds())
    }
}

// Calculate tokens per second for llama.cpp metrics
if strings.Contains(strings.ToLower(model), "llama") || 
   strings.Contains(apiBaseURL, "llama.cpp") {
    totalTime := time.Since(firstTokenTime).Seconds()
    if totalTime > 0 && outputTokens > 0 {
        tokensPerSecond := float64(outputTokens) / totalTime
        llamacppTokensPerSecond.WithLabelValues(model).Set(tokensPerSecond)
    }
}

These measurements help us understand the model’s performance characteristics and optimize the user experience.

Metrics collection

The metrics.go file is a core component of our observability stack for the Docker Model Runner-based chatbot. This file defines a comprehensive set of Prometheus metrics that allow us to monitor both the application performance and the underlying llama.cpp model behavior.

Core metrics architecture

The file establishes a collection of Prometheus metric types:

  • Counters: For tracking cumulative values (like request counts, token counts)
  • Gauges: For tracking values that can increase and decrease (like active requests)
  • Histograms: For measuring distributions of values (like latencies)

Each metric is created using the promauto factory, which automatically registers metrics with Prometheus.

Categories of metrics

The metrics can be divided into three main categories:

1. HTTP and application metrics

// RequestCounter counts total HTTP requests
RequestCounter = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "genai_app_http_requests_total",
        Help: "Total number of HTTP requests",
    },
    []string{"method", "endpoint", "status"},
)

// RequestDuration measures HTTP request durations
RequestDuration = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "genai_app_http_request_duration_seconds",
        Help:    "HTTP request duration in seconds",
        Buckets: prometheus.DefBuckets,
    },
    []string{"method", "endpoint"},
)

These metrics monitor the HTTP server performance, tracking request counts, durations, and error rates. The metrics are labelled with dimensions like method, endpoint, and status to enable detailed analysis.
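
For illustration, a small helper like the one below could update those counters and histograms from an HTTP middleware. The helper itself is hypothetical; only the metric calls mirror the definitions above, and it assumes the package-level metrics plus a time import from the surrounding file.

// recordRequestMetrics shows how the request metrics above might be
// updated once a handler has finished serving a request.
func recordRequestMetrics(method, endpoint, status string, started time.Time) {
    RequestCounter.WithLabelValues(method, endpoint, status).Inc()
    RequestDuration.WithLabelValues(method, endpoint).Observe(time.Since(started).Seconds())
}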

2. Model performance metrics

// ChatTokensCounter counts tokens in chat requests and responses
ChatTokensCounter = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "genai_app_chat_tokens_total",
        Help: "Total number of tokens processed in chat",
    },
    []string{"direction", "model"},
)

// ModelLatency measures model response time
ModelLatency = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "genai_app_model_latency_seconds",
        Help:    "Model response time in seconds",
        Buckets: []float64{0.1, 0.5, 1, 2, 5, 10, 20, 30, 60},
    },
    []string{"model", "operation"},
)

These metrics track the LLM usage patterns and performance, including token counts (both input and output) and overall latency. The FirstTokenLatency metric is particularly important as it measures the time to get the first token from the model, which is a critical user experience factor.
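
FirstTokenLatency is not shown in the excerpt above; a definition in the same style would look roughly like the following. The metric name, buckets, and labels here are assumptions, not the repository’s exact code.

// FirstTokenLatency measures the time until the first streamed token arrives.
FirstTokenLatency = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "genai_app_first_token_latency_seconds",
        Help:    "Time to first token in seconds",
        Buckets: []float64{0.1, 0.25, 0.5, 1, 2, 5, 10},
    },
    []string{"model"},
)

// Observed once per request, as soon as the first token is received:
// FirstTokenLatency.WithLabelValues(model).Observe(firstTokenTime.Sub(requestStart).Seconds())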

3. llama.cpp specific metrics

// LlamaCppContextSize measures the context window size
LlamaCppContextSize = promauto.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_context_size",
        Help: "Context window size in tokens for llama.cpp models",
    },
    []string{"model"},
)

// LlamaCppTokensPerSecond measures generation speed
LlamaCppTokensPerSecond = promauto.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_tokens_per_second",
        Help: "Tokens generated per second",
    },
    []string{"model"},
)

These metrics capture detailed performance characteristics specific to the llama.cpp inference engine used by Docker Model Runner. They include:

1. Context Size: 

It represents the token window size used by the model, typically ranging from 2048 to 8192 tokens. The optimization goal is balancing memory usage against conversation quality. When memory usage becomes problematic, reduce the context size to 2048 tokens for faster processing.

2. Prompt Evaluation Time

It measures the time spent processing input before generating tokens, essentially your time-to-first-token latency with a target of under 2 seconds. The optimization focus is minimizing user wait time for the initial response. If evaluation time exceeds 3 seconds, reduce context size or implement prompt compression techniques.

3. Tokens Per Second

It indicates generation speed, with a target of 8+ TPS for good user experience. This metric requires balancing response speed with model quality. When TPS drops below 5, switch to more aggressive quantization (Q4 instead of Q8) or use a smaller model variant. 

4. Memory Per Token

It tracks RAM consumption per generated token, with the goal of preventing out-of-memory crashes and keeping resource usage in check. When memory consumption exceeds 100MB per token, implement aggressive conversation pruning to reduce memory pressure. If memory usage grows over time during extended conversations, add automatic conversation resets after a set number of exchanges.

5. Threads Used

It monitors the number of CPU cores actively processing model operations, with the goal of maximizing throughput without overwhelming the system. If thread utilization falls below 50% of available cores, increase the thread count for better performance. 

6. Batch Size

It controls how many tokens are processed simultaneously, requiring optimization based on your specific use case balancing latency versus throughput. For real-time chat applications, use smaller batches of 32-64 tokens to minimize latency and provide faster response times.

In a nutshell, these metrics are crucial for understanding and optimizing llama.cpp performance characteristics, which directly affect the user experience of the chatbot.
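
As a rough illustration of how those thresholds could be acted on programmatically, the function below turns observed values into tuning hints. The limits mirror the guidance above, but the function itself is hypothetical and not part of the repository.

// suggestTuning returns human-readable tuning hints based on observed
// llama.cpp metrics, using the rule-of-thumb thresholds described above.
func suggestTuning(tokensPerSecond, memoryMBPerToken, promptEvalSeconds float64) []string {
    var hints []string
    if promptEvalSeconds > 3 {
        hints = append(hints, "prompt evaluation is slow: reduce context size or compress prompts")
    }
    if tokensPerSecond < 5 {
        hints = append(hints, "generation is slow: try a Q4 quantization or a smaller model variant")
    }
    if memoryMBPerToken > 100 {
        hints = append(hints, "memory per token is high: prune conversation history more aggressively")
    }
    return hints
}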

Docker Compose: LLM as a first-class service

With Docker Model Runner integration, Compose makes AI model deployment as simple as any other service. A single Compose file defines your entire AI application:

  • Your AI models (via Docker Model Runner)
  • Application backend and frontend
  • Observability stack (Prometheus, Grafana, Jaeger)
  • All networking and dependencies

The most innovative aspect is the llm service using Docker’s model provider, which simplifies model deployment by directly integrating with Docker Model Runner without requiring complex configuration. This composition creates a complete, scalable AI application stack with comprehensive observability.

  llm:
    provider:
      type: model
      options:
        model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0}


This configuration tells Docker Compose to treat an AI model as a standard service in your application stack, just like a database or web server.

  • The provider syntax is Docker’s new way of handling AI models natively. Instead of building containers or pulling images, Docker automatically manages the entire model-serving infrastructure for you. 
  • The model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0} line uses an environment variable with a fallback, meaning it will use whatever model you specify in LLM_MODEL_NAME, or default to Llama 3.2 1B if nothing is set.

Docker Compose: One command to run your entire stack

Why is this revolutionary? Before this, deploying an LLM required dozens of lines of complex configuration – custom Dockerfiles, GPU device mappings, volume mounts for model files, health checks, and intricate startup commands.

Now, those few lines replace all of that complexity. Docker handles downloading the model, configuring the inference engine, setting up GPU access, and exposing the API endpoints automatically. Your other services can connect to the LLM using simple service names, making AI models as easy to use as any other infrastructure component. This transforms AI from a specialized deployment challenge into standard infrastructure-as-code.

Here’s the full compose.yml file that orchestrates the entire application:

services:
  backend:
    env_file: 'backend.env'
    build:
      context: .
      target: backend
    ports:
      - '8080:8080'
      - '9090:9090'  # Metrics port
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # Add Docker socket access
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:8080/health']
      interval: 3s
      timeout: 3s
      retries: 3
    networks:
      - app-network
    depends_on:
      - llm

  frontend:
    build:
      context: ./frontend
    ports:
      - '3000:3000'
    depends_on:
      backend:
        condition: service_healthy
    networks:
      - app-network

  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - '9091:9090'
    networks:
      - app-network

  grafana:
    image: grafana/grafana:10.1.0
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_DOMAIN=localhost
    ports:
      - '3001:3000'
    depends_on:
      - prometheus
    networks:
      - app-network

  jaeger:
    image: jaegertracing/all-in-one:1.46
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
    ports:
      - '16686:16686'  # UI
      - '4317:4317'    # OTLP gRPC
      - '4318:4318'    # OTLP HTTP
    networks:
      - app-network

  # New LLM service using Docker Compose's model provider
  llm:
    provider:
      type: model
      options:
        model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0}

volumes:
  grafana-data:

networks:
  app-network:
    driver: bridge

This compose.yml defines a complete microservices architecture for the application with integrated observability tools and Model Runner support:

backend

  • Go-based API server with Docker socket access for container management
  • Implements health checks and exposes both API (8080) and metrics (9090) ports

frontend

  • React-based user interface for an interactive chat experience
  • Waits for backend health before starting to ensure system reliability

prometheus

  • Time-series metrics database for collecting and storing performance data
  • Configured with custom settings for monitoring application behavior

grafana

  • Data visualization platform for metrics with persistent dashboard storage
  • Pre-configured with admin access and connected to the Prometheus data source

jaeger

  • Distributed tracing system for visualizing request flows across services
  • Supports multiple protocols (gRPC/HTTP) with UI on port 16686

How Docker Model Runner integration works

The project integrates with Docker Model Runner through the following mechanisms:

  1. Connection Configuration:
    • Using internal DNS: http://model-runner.docker.internal/engines/llama.cpp/v1/
    • Using TCP via host-side support: localhost:12434
  2. Docker’s Host Networking:
    • The extra_hosts configuration maps host.docker.internal to the host’s gateway IP
  3. Environment Variables:
    • BASE_URL: URL for the model runner
    • MODEL: Model identifier (e.g., ai/llama3.2:1B-Q8_0); a sample backend.env sketch follows this list
  4. API Communication:
    • The backend formats messages and sends them to Docker Model Runner
    • It then streams tokens back to the frontend in real-time
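
Putting the pieces together, a backend.env for the Docker-socket connection method might contain something like the following. The variable names match the list above; the values are illustrative.

# backend.env (sketch)
BASE_URL=http://model-runner.docker.internal/engines/llama.cpp/v1/
MODEL=ai/llama3.2:1B-Q8_0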

Why this approach excels

Building GenAI applications with Docker Model Runner and comprehensive observability offers several advantages:

  • Privacy and Security: All data stays on your local infrastructure
  • Cost Control: No per-token or per-request API charges
  • Performance Insights: Deep visibility into model behavior and efficiency
  • Developer Experience: Familiar Docker-based workflow with powerful monitoring
  • Flexibility: Easy to experiment with different models and configurations

Conclusion

The genai-model-runner-metrics project demonstrates a powerful approach to building AI-powered applications with Docker Model Runner while maintaining visibility into performance characteristics. By combining local model execution with comprehensive metrics, you get the best of both worlds: the privacy and cost benefits of local execution with the observability needed for production applications.

Whether you’re building a customer support bot, a content generation tool, or a specialized AI assistant, this architecture provides the foundation for reliable, observable, and efficient AI applications. The metrics-driven approach ensures you can continuously monitor and optimize your application, leading to better user experiences and more efficient resource utilization.

Ready to get started? Clone the repository, fire up Docker Desktop, and experience the future of AI development — your own local, metrics-driven GenAI application is just a docker compose up away!

Learn more

Settings Management for Docker Desktop now generally available in the Admin Console

4 juin 2025 à 15:39

We’re excited to announce that Settings Management for Docker Desktop is now Generally Available! Settings Management can be configured in the Admin Console for customers with a Docker Business subscription. After a successful Early Access period, this powerful administrative solution has been enhanced with new compliance reporting capabilities, completing our vision for centralized Docker Desktop configuration management at scale through the Admin Console.

For additional context, Docker provides an enterprise-grade integrated solution suite for container development. This includes administration and management capabilities that support enterprise needs for security, governance, compliance, scale, ease of use, control, insights, and observability. The new Settings Management capabilities in the Admin Console for managing Docker Desktop instances are the latest enhancement in this area. This feature gives organization administrators a single, unified interface to configure and enforce security policies and control Docker Desktop settings across all users in their organization. Overall, Settings Management eliminates the need to manually configure each individual Docker Desktop machine and ensures consistent compliance and security standards company-wide.

Enterprise-grade management for Docker Desktop

First introduced in Docker Desktop 4.36 as an Early Access feature, Docker Desktop Settings Management enables administrators to centrally deploy and enforce settings policies directly from the Admin Console. From the Docker Admin Console, administrators can configure Docker Desktop settings according to a security policy and select users to whom the policy applies. When users start Docker Desktop, those settings are automatically applied and enforced.

With the addition of Desktop Settings Reporting in Docker Desktop 4.40, the solution offers end-to-end management capabilities from policy creation to compliance verification.

This comprehensive approach to settings management delivers on our promise to simplify Docker Desktop administration while ensuring organizational compliance across diverse enterprise environments.

Complete settings management lifecycle

Desktop Settings Management now offers multiple administration capabilities:

  • Admin Console policies: Configure and enforce default Docker Desktop settings directly from the cloud-based Admin Console. There’s no need to distribute admin-settings.json files to local machines via MDM.
  • Quick import: Seamlessly migrate existing configurations from admin-settings.json files
  • Export and share: Easily share policies as JSON files with security and compliance teams
  • Targeted testing: Roll out policies to smaller groups before deploying globally
  • Enhanced security: Benefit from improved signing and reporting methods that reduce the risk of tampering with settings
  • Settings compliance reporting: Track and verify policy application across all developers in your engineering organization

Figure 1: Admin Console Settings Management

New: Desktop Settings Reporting

The newly added settings reporting dashboard in the Admin Console provides administrators with crucial visibility into the compliance status of all users:

  • Real-time settings compliance tracking: Easily monitor which users are compliant with their assigned settings policies.
  • Streamlined troubleshooting: Detailed status information helps administrators diagnose and resolve non-compliance issues.

The settings reporting dashboard is accessible via Admin Console > Docker Desktop > Reporting, offering options to:

  • Search by username or email address
  • Filter by assigned policies
  • Toggle visibility of compliant users to focus on potential issues
  • View detailed compliance information for specific users
  • Download comprehensive compliance data as a CSV file

For non-compliant users, the settings reporting dashboard provides targeted resolution steps to help administrators quickly address issues and ensure organizational compliance.

Figure 2: Admin Console Settings Reporting

Figure 3: Locked settings in Docker Desktop

Enhanced security through centralized management

Desktop Settings Management is particularly valuable for engineering organizations with strict security and compliance requirements. This GA release enables administrators to:

  • Enforce consistent configuration across all Docker Desktop instances without relying on complicated, error-prone MDM-based deployments
  • Verify policy application and quickly remediate non-compliant systems
  • Reduce the risk of tampering with local settings
  • Generate compliance reports for security audits

Getting started

To take advantage of Desktop Settings Management:

  1. Ensure your Docker Desktop users are signed in on version 4.40 or later
  2. Log in to the Docker Admin Console
  3. Navigate to Docker Desktop > Settings Management to create policies
  4. Navigate to Docker Desktop > Reporting to monitor compliance

For more detailed information, visit our documentation on Settings Management.

What’s next?

Included with Docker Business, the GA release of Settings Management for Docker Desktop represents a significant milestone in our commitment to delivering enterprise-grade management, governance, and administration tools. We’ll continue to enhance these capabilities based on customer feedback, enterprise needs, and evolving security requirements.

We encourage you to explore Settings Management and let us know how it’s helping you manage Docker Desktop instances more efficiently across your development teams and engineering organization.

We’re thrilled to meet the management and administration needs of our customers with these exciting enhancements, and we want you to stay connected with us as we build even more administration and management capabilities for development teams and engineering organizations.

Learn more

Thank you!

Introducing Docker Hardened Images: Secure, Minimal, and Ready for Production

19 mai 2025 à 13:00

From the start, Docker has focused on enabling developers to build, share, and run software efficiently and securely. Today, Docker Hub powers software delivery at a global scale, with over 14 million images and more than 11 billion pulls each month. That scale gives us a unique vantage point into how modern software is built and the challenges teams face in securing it.

That’s why we’ve made security a cornerstone of our platform. From trusted Docker Official Images to SBOM support for transparency, the launch of Docker Scout for real-time vulnerability insights, and a hardened Docker Desktop to secure local development, every investment reflects our commitment to making software supply chain security more accessible, actionable, and developer-first.

Now, we’re taking that commitment even further.

We’re excited to introduce Docker Hardened Images (DHI) — secure-by-default container images purpose-built for modern production environments.

These images go far beyond being just slim or minimal. Docker Hardened Images start with a dramatically reduced attack surface, up to 95% smaller, to limit exposure from the outset. Each image is curated and maintained by Docker, kept continuously up to date to ensure near-zero known CVEs. They support widely adopted distros like Alpine and Debian, so teams can integrate them without retooling or compromising compatibility.

Plus, they’re designed to work seamlessly with the tools you already depend on. We’ve partnered with a range of leading security and DevOps platforms, including Microsoft, NGINX, Sonatype, GitLab, Wiz, Grype, Neo4j, JFrog, Sysdig and Cloudsmith, to ensure seamless integration with scanning tools, registries, and CI/CD pipelines.

What we’re hearing from customers

We talk to teams every day, from fast-moving startups to global enterprises, and the same themes keep coming up.

Integrity is a growing concern: “How do we know every component in our software is exactly what it claims to be—and hasn’t been tampered with?” With so many dependencies, it’s getting harder to answer that with confidence.

Then there’s the attack surface problem. Most teams start with general-purpose base images like Ubuntu or Alpine. But over time, these containers get bloated with unnecessary packages and outdated software, creating more ways in for attackers.

And of course, operational overhead is through the roof. Security teams are flooded with CVEs. Developers are stuck in a loop of patching and re-patching, instead of shipping new features. We’re hearing about vulnerability scanners lighting up constantly, platform teams stretched thin by centralized dependencies, and developers resorting to manual upgrades just to stay afloat. These challenges aren’t isolated — they’re systemic. And they’re exactly what we designed Docker Hardened Images to address.

Inside Docker Hardened Images

Docker Hardened Images aren’t just trimmed-down versions of existing containers — they’re built from the ground up with security, efficiency, and real-world usability in mind. They’re designed to meet teams where they are. Here’s how they deliver value across three essential areas:

Seamless Migration

First, they integrate seamlessly into existing workflows. Unlike other minimal or “secure” images that force teams to change base OSes, rewrite Dockerfiles, or abandon tooling, DHI supports the distributions developers already use, including familiar Debian and Alpine variants. In fact, upgrading to a DHI is often as simple as updating one line in your Dockerfile.
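
As a purely illustrative sketch: the exact repository path for a hardened image depends on how DHI is mirrored into your organization’s namespace, so treat the image reference below as a placeholder rather than an official name.

# Before: a general-purpose Node base image
# FROM node:20-alpine

# After: the hardened equivalent, pulled from your organization's DHI namespace
FROM <your-namespace>/dhi-node:20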

Flexible customization

Second, they strike the right balance between security and flexibility. Security shouldn’t mean sacrificing usability. DHI supports the customizations teams rely on, including certificates, packages, scripts, and configuration files, without compromising the hardened foundation. You get the security posture you need with the flexibility to tailor images to your environment.

Under the hood, Docker Hardened Images follow a distroless philosophy, stripping away unnecessary components like shells, package managers, and debugging tools that commonly introduce risk. While these extras might be helpful during development, they significantly expand the attack surface in production, slow down startup times, and complicate security management.

By including only the essential runtime dependencies needed to run your application, DHI delivers leaner, faster containers that are easier to secure and maintain. This focused, minimal design leads to up to a 95% reduction in attack surface, giving teams a dramatically stronger security posture right out of the box.

Automated Patching & Rapid CVE Response

Finally, patching and updates are continuous and automated. Docker monitors upstream sources, OS packages, and CVEs across all dependencies. When updates are released, DHI images are rebuilt, subjected to extensive testing, and published with fresh attestations—ensuring integrity and compliance within our SLSA Build Level 3–compliant build system. The result: you’re always running the most secure, verified version—no manual intervention required.

Most importantly, essential components are built directly from source, allowing us to deliver critical patches faster and remediate vulnerabilities promptly. We patch Critical and High-severity CVEs within 7 days — faster than typical industry response times — and back it all with an enterprise-grade SLA for added peace of mind.

Internal Adoption: Validating Docker Hardened Images in Production Environments

We’ve been using DHI internally across several key projects — putting them to the test in real-world, production environments. One standout example is our internal use of a hardened Node image. 

By replacing the standard Node base image with a Docker Hardened Image, we saw immediate and measurable results: vulnerabilities dropped to zero, and the package count was reduced by over 98%. 

That reduction in packages isn’t just a matter of image size, it directly translates to a smaller attack surface, fewer moving parts to manage, and significantly less overhead for our security and platform teams. This shift gave us a stronger security posture and simplified operational complexity — exactly the kind of outcome we designed DHI to deliver.

Ready to get started?

Docker Hardened Images are designed to help you ship software with confidence by dramatically reducing your attack surface, automating patching, and integrating seamlessly into your existing workflows. Developers stay focused on building. Security teams get the assurance they need.

Looking to reduce your vulnerability count?

We’re here to help. Get in touch with us and let’s harden your software supply chain, together.

Docker at Microsoft Build 2025: Where Secure Software Meets Intelligent Innovation

15 mai 2025 à 22:57

This year at Microsoft Build, Docker will blend developer experience, security, and AI innovation with our latest product announcements. Whether you attend in person at the Seattle Convention Center or tune in online, you’ll see how Docker is redefining the way teams build, secure, and scale modern applications.

Docker’s Vision for Developers

At Microsoft Build 2025, Docker’s EVP of Product and Engineering, Tushar Jain, will present the company’s vision for AI-native software delivery, prioritizing simplicity, security, and developer flow. His session will explore how Docker is helping teams adopt AI without complexity and scale confidently from local development to production using the workflows they already trust.

This vision starts with security. Today’s developers are expected to manage a growing number of vulnerabilities, stay compliant with evolving standards, and still ship software on time. Docker helps teams simplify container security by integrating with tools like Microsoft Defender, Azure Container Registry, and AKS. This makes it easier to build secure, production-ready applications without overhauling existing workflows.

This session explores how Docker is streamlining agentic AI development by bringing models and MCP tools together in one familiar environment. Learn how to build agentic AI with your existing workflows and commands. Explore curated AI tools on Docker Hub to get inspired and jumpstart your projects. No steep learning curve is required! With built-in security, access control, and secret management, Docker handles the heavy lifting so you can focus on building smarter, more capable agents.

Don’t miss our follow-up demo session with Principal Engineer Jim Clark. He’ll show how to build an agentic app that uses Docker’s latest AI tools and familiar workflows.

Visit Docker at Booth #400 to see us in action

Throughout the conference, Docker will be live at Booth #400. Drop by for demos, expert walkthroughs, and to try out Docker Hardened Images, Model Runner, and MCP Catalog and Toolkit. Our product, engineering, and DevRel teams will be on-site to answer questions and help you get hands-on.

Party with your fellow Developers at MOPOP

We’re hosting an evening event at one of Seattle’s most iconic pop culture venues to celebrate the launch of our latest tools.

Docker MCP @ MOPOP
Date: Monday, May 19
Time: 7:00–10:00 PM
Location: Museum of Pop Culture, Seattle

Enjoy live demos, food and drinks, access to Docker engineers and leaders, and private after-hours access to the museum. Space is limited. RSVP now to reserve your spot!

Securing Model Context Protocol: Safer Agentic AI with Containers

6 mai 2025 à 18:38

Model Context Protocol (MCP) tools remain primarily in the hands of early adopters, but broader adoption is accelerating. Alongside this growth, MCP security concerns are becoming more urgent. By increasing agent autonomy, MCP tools introduce new risks related to misalignment between agent behavior and user expectations and uncontrolled execution. These systems also present a novel attack surface, creating new software supply chain threats. As a result, MCP adoption raises critical questions about trust, isolation, and runtime control before these systems are integrated into production environments.

Where MCP tools fall short on security

Most of us first experimented with MCP tools by configuring files like the one shown below. This workflow is fast, flexible, and productive, ideal for early experimentation. But it also comes with trade-offs. MCP servers are pulled directly from the internet, executed on the host machine, and configured with sensitive credentials passed as plaintext environment variables. It’s a bit like setting off fireworks in your living room: thrilling, but not very safe.

{
  "mcpServers": {
    "mcpserver": {
      "command": "npx",
      "args": [
        "-y",
        "@org/mcp-server",
        "--user", "me"
      ],
      "env": {
        "SECRET_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}

As MCP tools move closer to production use, they force us to confront a set of foundational questions:

Can we trust the MCP server?

Can we guarantee the right software is installed on the host? Without that baseline, reproducibility and reliability fall apart. How do we verify the provenance and integrity of the MCP server itself? If we can’t trace where it came from or confirm what it contains, we can’t trust it to run safely. Even if it runs, how do we know it hasn’t been tampered with — either before it reached us or while it’s executing?

Are we managing secrets and access securely?

Secret management also becomes a pressing concern. Environment variables are convenient, but they’re not secure. We need ways to safely inject sensitive data into only the runtimes permitted to read it and nowhere else. The same goes for access control. As teams scale up their use of MCP tools, it becomes essential to define which agents are allowed to talk to which servers and ensure those rules are enforced at runtime.

Figure 1: Discussions on not storing secrets in .env files on Reddit. Credit: amirshk

How do we detect threats early? 

And then there’s the question of detection. Are we equipped to recognize the kinds of threats that are emerging around MCP tools? From prompt injection to malicious server responses, new attack vectors are already appearing. Without purpose-built tooling and clear security standards, we risk walking into these threats blind. Some recent threat patterns include:

  • MCP Rug Pull – A malicious MCP server can perform a “rug pull” by altering a tool’s description after it’s been approved by the user.
  • MCP Shadowing – A malicious server injects a tool description that alters the agent’s behavior toward a trusted service or tool. 
  • Tool Poisoning – Malicious instructions in MCP tool descriptions, hidden from users but readable by AI models.

What’s clear is that the practices that worked for early-stage experimentation won’t scale safely. As adoption grows, the need for secure, standardized mechanisms to package, verify, and run MCP servers becomes critical. Without them, the very autonomy that makes MCP tools powerful could also make them dangerous.

Why Containers for MCP servers

Developers quickly realized that the same container technology used to deliver cloud-native applications is also a natural fit for safely powering agentic systems. Containers aren’t just about packaging, they give us a controlled runtime environment where we can add guardrails and build a safer path toward adopting MCP servers.

Making MCP servers portable and secure 

Most of us are familiar with how containers are used to move software around, providing runtime consistency and easy distribution. Containers also provide a strong layer of isolation between workloads, helping prevent one application from interfering with another or with the host system. This isolation limits the blast radius of a compromise and makes it easier to enforce least-privilege access. In addition, containers can provide us with verification of both provenance and integrity. This continues to be one of the important lessons from software supply chain security. Together, these properties help mitigate the risks of running untrusted MCP servers directly on the host.

As a first step, we can use what we already know about cloud native delivery and simply distribute the MCP servers in a container. 

{
  "mcpServers": {
    "mcpserver": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "org/mcpserver:latest",
        "--user", "me"
      ],
      "env": {
        "SECRET_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
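
The same docker run invocation can be tightened with ordinary runtime flags. The command below is an illustrative sketch rather than a required configuration; each restriction should only be relaxed if the tool genuinely needs that capability.

# Illustrative hardened invocation of a containerized MCP server.
# --read-only: no writes to the container filesystem
# --cap-drop ALL: drop all Linux capabilities
# --security-opt no-new-privileges: block privilege escalation
# --network none: remove network access when the tool doesn't need it
docker run -i --rm --read-only --cap-drop ALL \
  --security-opt no-new-privileges --network none \
  org/mcpserver:latest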

But containerizing the server is only half the story. Developers would still need to specify arguments for the MCP server runtime and secrets. If those arguments are misconfigured, or worse, intentionally altered, they could expose sensitive data or make the server unsafe to run.

In the next section, we’ll cover key design considerations, guardrails, and best practices for mitigating these risks.

Designing secure containerized architectures for MCP servers and clients

Containers provide a solid foundation for securely running MCP servers, but they’re just the beginning. It’s important to consider additional guardrails and designs, such as how to handle secrets, defend against threats, and manage tool selection and authorization as the number of MCP servers and clients increases. 

Secure secrets handling

When these servers require runtime configuration secrets, container-based solutions must provide a secure interface for users to supply that data. Sensitive information like credentials, API keys, or OAuth access tokens should then be injected into only the authorized container runtimes. As with cloud-native deployments, secrets remain isolated and scoped to the workloads that need them, reducing the risk of accidental exposure or misuse.

Defenses against new MCP threats

Many of the emerging threats in the MCP ecosystem involve malicious servers attempting to trick agents and MCP servers into taking actions that conflict with the user’s intent. These attacks often begin with poisoned data flowing from the server to the client.

To mitigate this, it’s recommended to route all MCP client traffic through a single connection endpoint, an MCP Gateway, or a proxy built on top of containers. Think of MCP servers like passengers at an airport: by establishing one centralized security checkpoint (the Gateway), you ensure that everyone is screened before boarding the plane (the MCP client). This Gateway becomes the critical interface where threats like MCP Rug Pull attacks, MCP Shadowing, and Tool Poisoning can be detected early and stopped. Mitigations include:

  • MCP Rug Pull: Prevents a server from changing its tool description after user consent. Clients must re-authorize if a new version is introduced.
  • MCP Shadowing: Detects agent sessions with access to sets of tools with semantically close descriptions, or outright conflicts.
  • Tool Poisoning: Uses heuristics or signature-based scanning to detect suspicious patterns in tool metadata, such as manipulative prompts or misleading capabilities, that are common in poisoning attacks.

Managing MCP server selection and authorization

As agentic systems evolve, it’s important to distinguish between two separate decisions: which MCP servers are trusted across an environment, and which are actually needed by a specific agent. The first defines a trusted perimeter, determining which servers can be used. The second is about intent and scope — deciding which servers should be used by a given client.

With the number of available MCP servers expected to grow rapidly, most agents will only require a small, curated subset. Managing this calls for clear policies around trust, selective exposure, and strict runtime controls. Ideally, these decisions should be enforced through platforms that already support container-based distribution, with built-in capabilities for storing, managing, and securely sharing workloads, along with the necessary guardrails to limit unintended access.

MCP security best practices

As the MCP spec evolves, we are already seeing helpful additions such as tool-level annotations like readOnlyHint and destructiveHint. A readOnlyHint can direct the runtime to mount file systems in read-only mode, minimizing the risk of unintentional changes. Networking hints can isolate an MCP server from the internet entirely or restrict outbound connections to a limited set of routes. Declaring these annotations in your tool’s metadata is strongly recommended. They can be enforced at container runtime and help drive adoption — users are more likely to trust and run tools with clearly defined boundaries.
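
For example, a tool definition carrying these hints might look roughly like this. The tool name and description are made up, and only the two annotation names mentioned above are shown; consult the current MCP specification for the full schema.

{
  "name": "query_reports",
  "description": "Run read-only queries against the reporting database",
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false
  }
}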

We’re starting by focusing on developer productivity. But making these guardrails easy to adopt and test means they won’t get in the way, and that’s a critical step toward building safer, more resilient agentic systems by default.

How Docker helps  

Containers offer a natural way to package and isolate MCP tools, making them easier and safer to run. Docker extends this further with its latest MCP Catalog and Toolkit, streamlining how trusted tools are discovered, shared, and executed.

While many developers know that Docker provides an API for containerized workloads, the Docker MCP Toolkit builds on that by enabling MCP clients to securely connect to any trusted server listed in your MCP Catalog. This creates a controlled interface between agents and tools, with the familiar benefits of container-based delivery: portability, consistency, and isolation.

Figure 2: Docker MCP Catalog and Toolkit securely connects MCP servers to clients by running them in containers

The MCP Catalog, a part of Docker Hub, helps manage the growing ecosystem of tools by letting you identify trusted MCP servers while still giving you the flexibility to configure your MCP clients. Developers can not only decide which servers to make available to any agent, but also scope specific servers to their agents. The MCP Toolkit simplifies this further by exposing any set of trusted MCP servers through a single, unified connection, the MCP Gateway. 

Developers stay in control, defining how secrets are stored and which MCP servers are authorized to access them. Each server is referenced by a URL that points to a fully configured, ready-to-run Docker container. Since the runtime handles both content and configuration, agents interact only with MCP runtimes that are reproducible, verifiable, and self-contained. These runtimes are tamper-resistant, isolated, and constrained to access only the resources explicitly granted by the user. Since all MCP messages pass through one gateway, the MCP Toolkit offers a single enforcement point for detecting threats before they become visible to the MCP client.

Going back to the earlier example, our configuration is now a single connection to the Catalog with an allowed set of configured MCP server containers. The MCP client sees a managed view of the configured MCP servers over STDIO. The result: MCP clients have a safe connection to the MCP ecosystem!

{
  "mcpServers": {
    "mcpserver": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "alpine/socat", "STDIO", "TCP:host.docker.internal:8811"
      ]
    }
  }
}

Summary

We’re at a pivotal point in the evolution of MCP tool adoption. The ecosystem is expanding rapidly, and while it remains developer-led, more users are exploring ways to safely extend their agentic systems. Containers are proving to be the ideal delivery model for MCP tools — providing isolation, reproducibility, and security with minimal friction.

Docker’s MCP Catalog and Toolkit build on this foundation, offering a lightweight way to share and run trusted MCP servers. By packaging tools as containers, we can introduce guardrails without disrupting how users already consume MCP from their existing clients. The Catalog is compatible with any MCP client today, making it easy to get started without vendor lock-in.

Our goal is to support this fast-moving space by making MCP adoption as safe and seamless as possible, without getting in the way of innovation. We’re excited to keep working with the community to make MCP adoption not just easy and productive, but secure by default.

Learn more

Introducing Docker MCP Catalog and Toolkit: The Simple and Secure Way to Power AI Agents with MCP

5 mai 2025 à 15:57

Model Context Protocols (MCPs) are quickly becoming the standard for connecting AI agents to external tools, but the developer experience hasn’t caught up. Discovery is fragmented, setup is clunky, and security is too often bolted on last. Fixing this experience isn’t a solo mission—it will take an industry-wide effort. A secure, scalable, and trusted MCP ecosystem demands collaboration across platforms and vendors.

That’s why we’re excited to announce Docker MCP Catalog and Toolkit are now available in Beta. The Docker MCP Catalog, now a part of Docker Hub, is your starting point for discovery, surfacing a curated set of popular, containerized MCP servers to jumpstart agentic AI development. But discovery alone isn’t enough. That’s where the MCP Toolkit comes in. It simplifies installation, manages credentials, enforces access control, and secures the runtime environment. Together, Docker MCP Catalog and MCP Toolkit give developers and teams a complete foundation for working with MCP tools, making them easier to find, safer to use, and ready to scale across projects and teams.

We’re partnering with some of the most trusted names in cloud, developer tooling, and AI, including Stripe, Elastic, Heroku, Pulumi, Grafana Labs, Kong Inc., Neo4j, New Relic, Continue.dev, and many more, to shape a secure ecosystem for MCP tools. With a one-click connection right from Docker Desktop to leading MCP clients like Gordon (Docker AI Agent), Claude, Cursor, VSCode, Windsurf, continue.dev, and Goose, building powerful, intelligent AI agents has never been easier.

This aligns perfectly with our mission. Docker pioneered the container revolution, transforming how developers build and deploy software. Today, over 20 million registered developers rely on Docker to build, share, and run modern applications. Now, we’re bringing that same trusted experience to the next frontier: Agentic AI with MCP tools.

Model Context Protocol is gaining momentum — what improvements are still needed?

As MCPs become the backbone of agentic AI systems, the developer experience still faces key challenges. Here are some of the major hurdles:

Discovering the right, official, and/or trustworthy tools is hard

Finding MCP servers is fragmented. Developers search across registries, community-curated lists, and blog posts—yet it’s still hard to know which ones are official and trustworthy.

Complex installations and distribution

Getting started with MCP tools remains complex. Developers often have to clone repositories, wrangle conflicting dependencies in environments like Node.js or Python, and self-host local services—many of which aren’t containerized, making setup and portability even harder. On top of that, connecting MCP clients adds more friction, with each one requiring custom configuration that slows down onboarding and adoption.

Auth and permissions fall short

Many MCP tools run with full access to the host, launched via npx or uvx, with no isolation or sandboxing. Credentials are commonly passed as plaintext environment variables, exposing sensitive data and increasing the risk of leaks. Moreover, these tools often aren’t designed for scale and security. They’re missing enterprise-ready features like policy enforcement, audit logs, and standardized security. 
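To make the risk concrete, a typical unsandboxed configuration launches the server with npx and passes credentials as plaintext environment variables. The server and package names below are hypothetical; only the shape matters:

{
  "mcpServers": {
    "example-saas": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server"],
      "env": {
        "EXAMPLE_API_TOKEN": "plaintext-token-readable-by-anything-on-this-machine"
      }
    }
  }
}

The process runs directly on the host with whatever permissions the user has, and anything that can read this file can read the token.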

How Docker can help solve these challenges

The Docker MCP Catalog and Toolkit are designed to address the above pain points by securely streamlining the discovery, installation, and authentication of MCP servers — making it easy to connect with your favorite MCP clients. 

Discover and run MCP servers easily in secure, isolated containers

The MCP Catalog makes it easy to discover and access 100+ MCP servers — including Stripe, Elastic, Neo4j, and many more — all available on Docker Hub. With the MCP Toolkit Docker Desktop extension, you can quickly and securely run and interact with these servers. By packaging MCP servers as containers, developers can sidestep common challenges such as runtime setup, dependency conflicts, and environment inconsistencies — just run the container, and it works. 

Figure 1: Discover curated and popular MCP servers in Docker MCP Catalog, part of the Docker Hub

We’re not just simplifying discovery and installation — we’re placing security at the heart of the MCP experience. Because MCPs run inside Docker container images, they inherit the same built-in security features developers already trust and a rich ecosystem of tools for securing software throughout the supply chain. And we’re going further. The Docker MCP Toolkit addresses emerging threats unique to MCP servers like Tool Poisoning and Tool Rug Pulls, by leveraging Docker’s strong position as both a provider of secure content and secure runtimes.

Figure 2: The MCP Toolkit Docker Desktop Extension allows you to easily and securely run MCP servers in containers.

Go to the extensions menu of Docker Desktop to get started with Docker MCP Catalog and Toolkit, or use this for installation. Check out our doc for more information.

One-Click MCP Client Integration with Built-In Secure Authentication

While a curated list of MCPs and simplified security is a great starting point, it’s just the beginning. You can connect popular MCP servers from the Docker MCP Catalog to any MCP client. For clients like Gordon (Docker AI Agent), Claude, Cursor, VSCode, Windsurf, continue.dev, and Goose, one-click setup will make integration seamless. 

The Docker MCP Toolkit includes built-in OAuth support and secure credential storage, enabling clients to authenticate with MCP servers and third-party services without hardcoding secrets into environment variables. This ensures your MCP tools run securely and reliably right from the start.

Figure 3: Easily connect to your favorite MCP clients like Gordon, Claude, Cursor, and continue.dev with one click.

Enterprise-Ready MCP Tooling: Build, manage, and share in Docker Hub

Soon, you’ll be able to build and share your own MCPs on Docker Hub—home to over 14 million images, millions of active users, and a robust ecosystem of trusted content. Teams count on Docker Hub for verified images, deep image analysis, lifecycle management, and enterprise-grade tooling. Those same trusted capabilities will soon extend to MCPs, giving teams access to the latest tools and a secure, reliable way to distribute their own. And just like container images, MCPs will integrate with enterprise features like Registry Access Management and Image Access Management, ensuring secure, streamlined developer workflows from end to end. 

Wrapping up

Docker MCP Catalog and Toolkit bring much-needed structure, security, and simplicity to the fast-growing world of MCP tools. By standardizing how MCP servers are discovered, installed, and secured, we’re removing friction for developers building smarter, more capable AI-powered applications and agents.

Whether you’re connecting to external tools, customizing workflows, or scaling automation inside your IDE, Docker makes the entire process easy and secure. And this is just the beginning. With ongoing investments in expanding the MCP ecosystem and streamlining how tools are managed, we’re committed to making powerful AI tooling accessible to every team.

With Docker Catalog and Toolkit, your AI agent isn’t limited by what’s built in — it’s empowered by everything you can plug in. 

Go to the extensions menu of Docker Desktop to get started with Docker MCP Catalog and Toolkit, or use this for installation. See it in action during our upcoming webinar. Interested in hosting your MCP servers on Docker? Let’s connect.

Learn more

Simplifying Enterprise Management with Docker Desktop on the Microsoft Store

1 mai 2025 à 23:13

We’re excited to announce that Docker Desktop is now available on the Microsoft Store! This new distribution channel enhances both the installation and update experience for individual developers while significantly simplifying management for enterprise IT teams.

This milestone reinforces our commitment to Windows, our most widely used platform among Docker Desktop users. By partnering with the Microsoft Store, we’re ensuring seamless compatibility with enterprise management tools while delivering a more consistent experience to our shared customers.

Figure 1: MS Store listing: https://apps.microsoft.com/detail/xp8cbj40xlbwkx?hl=en-GB&gl=GB

Seamless deployment and control for enterprises

For developers:

  • Automatic Updates: The Microsoft Store handles all update processes automatically, ensuring you’re always running the latest version without manual intervention.
  • Streamlined Installation: Experience a more reliable setup process with fewer startup errors.
  • Unified Management: Manage Docker Desktop alongside your other applications in one familiar interface.

For IT administrators:

  • Native Intune MDM Integration: Deploy Docker Desktop across your organization using Microsoft’s enterprise management tools — Learn how to add Microsoft Store apps via Intune.
  • Centralized Control: Easily roll out Docker Desktop through the Microsoft Store’s enterprise distribution channels.
  • Security-Compatible Updates: Updates are handled automatically by the Microsoft Store infrastructure, even in organizations where users don’t have direct store access.
  • Updates Without Direct Store Access: The native integration with Intune allows automatic updates to function even when users don’t have Microsoft Store access — a significant advantage for security-conscious organizations with restricted environments.
  • Familiar Workflow: The update mechanism works similarly to winget commands (winget install --id=XP8CBJ40XLBWKX --source=msstore), providing consistency with other enterprise software management.

Why it matters for businesses and developers 

With 99% of enterprise users not running the latest version of Docker Desktop, the Microsoft Store’s automatic update capabilities directly address compliance and security concerns while minimizing downtime. IT administrators can now:

  • Increase Productivity: Developers can focus on innovation instead of managing installations.
  • Improve Operational Efficiency: Better control over Docker Desktop deployments reduces IT bottlenecks.
  • Enhance Compliance: Automatic updates and secure installations support enterprise security protocols.

Conclusion

Docker Desktop’s availability on the Microsoft Store represents a significant step forward in simplifying how organizations deploy and maintain development environments. By focusing on seamless updates, reliability, and enterprise-grade management, Docker and Microsoft are empowering teams to innovate with greater confidence.

Ready to try it out? Download Docker Desktop from the Microsoft Store today!

Learn more

Update on the Docker DX extension for VS Code

30 avril 2025 à 20:52

It’s now been a couple of weeks since we released the new Docker DX extension for Visual Studio Code. This launch reflects a deeper collaboration between Docker and Microsoft to better support developers building containerized applications.

Over the past few weeks, you may have noticed some changes to your Docker extension in VS Code. We want to take a moment to explain what’s happening—and where we’re headed next.

What’s Changing?

The original Docker extension in VS Code is being migrated to the new Container Tools extension, maintained by Microsoft. It’s designed to make it easier to build, manage, and deploy containers—streamlining the container development experience directly inside VS Code.

As part of this partnership, we decided to bundle the new Docker DX extension with the existing Docker extension so that it would install automatically, making the process seamless.

While the automatic installation was intended to simplify the experience, we realize it may have caught some users off guard. To provide more clarity and choice, the next release will make Docker DX Extension an opt-in installation, giving you full control over when and how you want to use it. 

What’s New from Docker?

Docker is introducing the new Docker DX extension, focused on delivering a best-in-class authoring experience for Dockerfiles, Compose files, and Bake files.

Key features include:

  • Dockerfile linting: Get build warnings and best-practice suggestions directly from BuildKit and Buildx—so you can catch issues early, right inside your editor.
  • Image vulnerability remediation (experimental): Automatically flag references to container images with known vulnerabilities, directly in your Dockerfiles.
  • Bake file support: Enjoy code completion, variable navigation, and inline suggestions when authoring Bake files—including the ability to generate targets based on your Dockerfile stages.
  • Compose file outline: Easily navigate and understand complex Compose files with a new outline view in the editor.

Better Together

These two extensions are designed to work side-by-side, giving you the best of both worlds:

  • Powerful tooling to build, manage, and deploy your containers
  • Smart, contextual authoring support for Dockerfiles, Compose files, and Bake files

And the best part? Both extensions are free and fully open source.

Thank You for Your Patience

We know changes like this can be disruptive. While our goal was to make the transition as seamless as possible, we recognize that the approach caused some confusion, and we sincerely apologize for the lack of early communication.

The teams at Docker and Microsoft are committed to delivering the best container development experience possible—and this is just the beginning.

Where Docker DX is Going Next

At Docker, we’re proud of the contributions we’ve made to the container ecosystem, including Dockerfiles, Compose, and Bake.

We’re committed to ensuring the best possible experience when editing these files in your IDE, with instant feedback while you work.

Here’s a glimpse of what’s coming:

  • Expanded Dockerfile checks: More best-practice validations, actionable tips, and guidance—surfaced right when you need them.
  • Stronger security insights: Deeper visibility into vulnerabilities across your Dockerfiles, Compose files, and Bake configurations.
  • Improved debugging and troubleshooting: Soon, you’ll be able to live debug Docker builds—step through your Dockerfile line-by-line, inspect the filesystem at each stage, see what’s cached, and troubleshoot issues faster.

We Want Your Feedback!

Your feedback is critical in helping us improve the Docker DX extension and your overall container development experience.

If you encounter any issues or have ideas for enhancements you’d like to see, please let us know.

We’re listening and excited to keep making things better for you! 

Docker Desktop 4.41: Docker Model Runner supports Windows, Compose, and Testcontainers integrations, Docker Desktop on the Microsoft Store

Par : Yiwen Xu
29 avril 2025 à 20:20

Big things are happening in Docker Desktop 4.41! Whether you’re building the next AI breakthrough or managing development environments at scale, this release is packed with tools to help you move faster and collaborate smarter. From bringing Docker Model Runner to Windows (with NVIDIA GPU acceleration!), Compose and Testcontainers, to new ways to manage models in Docker Desktop, we’re making AI development more accessible than ever. Plus, we’ve got fresh updates for your favorite workflows — like a new Docker DX Extension for Visual Studio Code, a speed boost for Mac users, and even a new location for Docker Desktop on the Microsoft Store. Also, we’re enabling ACH transfer as a payment option for self-serve customers. Let’s dive into what’s new!

Docker Model Runner now supports Windows, Compose & Testcontainers

This release brings Docker Model Runner to Windows users with NVIDIA GPU support. We’ve also introduced improvements that make it easier to manage, push, and share models on Docker Hub and to integrate with familiar tools like Docker Compose and Testcontainers. Docker Model Runner now works with Docker Compose projects to orchestrate model pulls and inject model runner services, and with Testcontainers via its libraries. These updates continue our focus on helping developers build AI applications faster using existing tools and workflows.

In addition to CLI support for managing models, Docker Desktop now includes a dedicated “Models” section in the GUI. This gives developers more flexibility to browse, run, and manage models visually, right alongside their containers, volumes, and images.

Figure 1: Easily browse, run, and manage models from Docker Desktop
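
Alongside the GUI, the model CLI mentioned above covers the common workflow with a handful of commands; for example (the model name and prompt are illustrative):

docker model pull ai/gemma3
docker model list
docker model run ai/gemma3 "Give me a one-line summary of what an OCI artifact is."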

Further extending the developer experience, you can now push models directly to Docker Hub, just like you would with container images. This creates a consistent, unified workflow for storing, sharing, and collaborating on models across teams. With models treated as first-class artifacts, developers can version, distribute, and deploy them using the same trusted Docker tooling they already use for containers — no extra infrastructure or custom registries required.

docker model push <model>

The Docker Compose integration makes it easy to define, configure, and run AI applications alongside traditional microservices within a single Compose file. This removes the need for separate tools or custom configurations, so teams can treat models like any other service in their dev environment.

Figure 2: Using Docker Compose to declare services, including running AI models

Similarly, the Testcontainers integration extends testing to AI models, with initial support for Java and Go and more languages on the way. This allows developers to run applications and create automated tests using AI services powered by Docker Model Runner. By enabling full end-to-end testing with Large Language Models, teams can confidently validate application logic and integration code, and drive high-quality releases.

String modelName = "ai/gemma3";
DockerModelRunnerContainer modelRunnerContainer = new DockerModelRunnerContainer()
       .withModel(modelName);
modelRunnerContainer.start();


OpenAiChatModel model = OpenAiChatModel.builder()
       .baseUrl(modelRunnerContainer.getOpenAIEndpoint())
       .modelName(modelName)
       .logRequests(true)
       .logResponses(true)
       .build();


String answer = model.chat("Give me a fact about Whales.");
System.out.println(answer);

Docker DX Extension in Visual Studio Code: Catch issues early, code with confidence

The Docker DX Extension is now live on the Visual Studio Marketplace. This extension streamlines your container development workflow with rich editing, linting features, and built-in vulnerability scanning. You’ll get inline warnings and best-practice recommendations for your Dockerfiles, powered by Build Check — a feature we introduced last year. 

It also flags known vulnerabilities in container image references, helping you catch issues early in the dev cycle. For Bake files, it offers completion, variable navigation, and inline suggestions based on your Dockerfile stages. And for those managing complex Docker Compose setups, an outline view makes it easier to navigate and understand services at a glance.

Figure 3: Docker DX Extension in Visual Studio Code provides actionable recommendations for fixing vulnerabilities and optimizing Dockerfiles

Read more about this in our announcement blog and GitHub repo. Get started today by installing the Docker DX extension from the Visual Studio Marketplace.

macOS QEMU virtualization option deprecation

The QEMU virtualization option in Docker Desktop for Mac will be deprecated on July 14, 2025.

With the new Apple Virtualization Framework, you’ll experience improved performance, stability, and compatibility with macOS updates as well as tighter integration with Apple Silicon architecture. 

What this means for you:

  • If you’re using QEMU as your virtualization backend on macOS, you’ll need to switch to either Apple Virtualization Framework (default) or Docker VMM (beta) options.
  • This does NOT affect QEMU’s role in emulating non-native architectures for multi-platform builds.
  • Your multi-architecture builds will continue to work as before.

For complete details, please see our official announcement.

Introducing Docker Desktop in the Microsoft Store

Docker Desktop is now available for download from the Microsoft Store! We’re rolling out an EXE-based installer for Docker Desktop on Windows. This new distribution channel provides an enhanced installation and update experience for Windows users while simplifying deployment management for IT administrators across enterprise environments.

Key benefits

For developers:

  • Automatic Updates: The Microsoft Store handles all update processes automatically, ensuring you’re always running the latest version without manual intervention.
  • Streamlined Installation: Experience a more reliable setup process with fewer startup errors.
  • Simplified Management: Manage Docker Desktop alongside your other applications in one familiar interface.

For IT admins: 

  • Native Intune MDM Integration: Deploy Docker Desktop across your organization with Microsoft’s native management tools.
  • Centralized Deployment Control: Roll out Docker Desktop more easily through the Microsoft Store’s enterprise distribution channels.
  • Automatic Updates Regardless of Security Settings: Updates are handled automatically by the Microsoft Store infrastructure, even in organizations where users don’t have direct store access.
  • Familiar Process: The update mechanism maps to the winget command, providing consistency with other enterprise software management tools.

This new distribution option represents our commitment to improving the Docker experience for Windows users while providing enterprise IT teams with the management capabilities they need.

Unlock greater flexibility: Enable ACH transfer as a payment option for self-serve customers

We’re focused on making it easier for teams to scale, grow, and innovate, all on their own terms. That’s why we’re excited to announce an upgrade to the self-serve purchasing experience: customers can pay via ACH transfer starting on 4/30/25.

Historically, self-serve purchases were limited to credit card payments, forcing many customers who could not use credit cards into manual sales processes, even for small seat expansions. With the introduction of an ACH transfer payment option, customers can choose the payment method that works best for their business. Fewer delays and less unnecessary friction.

This payment option upgrade empowers customers to:

  • Purchase more independently without engaging sales
  • Choose between credit card or ACH transfer with a verified bank account

By empowering enterprises and developers, we’re freeing up your time, and ours, to focus on what matters most: building, scaling, and succeeding with Docker.

Visit our documentation to explore the new payment options, or log in to your Docker account to get started today!

Wrapping up 

With Docker Desktop 4.41, we’re continuing to meet developers where they are — making it easier to build, test, and ship innovative apps, no matter your stack or setup. Whether you’re pushing AI models to Docker Hub, catching issues early with the Docker DX Extension, or enjoying faster virtualization on macOS, these updates are all about helping you do your best work with the tools you already know and love. We can’t wait to see what you build next!

Learn more

How to build and deliver an MCP server for production

Par : Moby Dock
25 avril 2025 à 16:04

In December of 2024, we published a blog with Anthropic about their totally new spec (back then) to run tools with AI agents: the Model Context Protocol, or MCP. Since then, we’ve seen an explosion in developer appetite to build, share, and run their tools with Agentic AI – all using MCP. We’ve seen new MCP clients pop up, and big players like Google and OpenAI committing to this standard. However, nearly immediately, early growing pains have led to friction when it comes down to actually building and using MCP tools. At the moment, we’ve hit a major bump in the road.

MCP Pain Points

  • Runtime:
    • Getting up and running with MCP servers is a headache for devs. The standard runtimes for MCP servers rely on a specific version of Python or NodeJS, and combining tools means managing those versions, on top of extra dependencies an MCP server may require.
  • Security:
    • Giving an LLM direct access to run software on the host system is unacceptable to devs outside of hobbyist environments. In the event of hallucinations or incorrect output, significant damage could be done.
    • Users are asked to configure sensitive data in plaintext json files. An MCP config file contains all of the necessary data for your agent to act on your behalf, but likewise it centralizes everything a bad actor needs to exploit your accounts.
  • Discoverability:
    • The tools are out there, but there isn’t a single good place to find the best MCP servers. Marketplaces are beginning to crop up, but developers are still required to hunt down good sources of tools for themselves.
    • Later on in the MCP user experience, it’s very easy to end up with enough servers and tools to overwhelm your LLM, leading to incorrect tools being used and worse outcomes. When an LLM has the right tools for the job, it can execute more efficiently. When an LLM gets the wrong tools, or too many tools to choose from, hallucinations spike while evals plummet.
  • Trust:
    • When the tools are run by LLMs on behalf of the developer, it’s critical to trust the publisher of MCP servers. The current MCP publisher landscape looks like a gold rush, and is therefore vulnerable to supply-chain attacks from untrusted authors.

Docker as an MCP Runtime

Docker is a tried and true runtime to stabilize the environment in which tools run. Instead of managing multiple Node or Python installations, using Dockerized MCP servers allows anyone with the Docker Engine to run MCP servers.

Docker provides sandboxed isolation for tools so that undesirable LLM behavior can’t damage the host configuration. The LLM has no access to the host filesystem, for example, unless a directory is explicitly bound into that MCP container.
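
As a rough sketch of what that isolation looks like in practice, an MCP server container can be limited to a single read-only directory and cut off from the network entirely (the image name here is hypothetical):

docker run -i --rm \
  --network none \
  -v "$(pwd)/docs:/workspace:ro" \
  example/filesystem-mcp-server

Everything outside that bind mount simply doesn’t exist from the tool’s point of view, no matter what the LLM asks it to do.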

The MCP Gateway

In order for LLMs to work autonomously, they need to be able to discover and run tools for themselves. That is nearly impossible with a sprawl of individual MCP servers: every time a new tool is added, a config file needs to be updated and the MCP client reloaded. The current workaround is to develop MCP servers that configure other MCP servers, but even this requires reloading. A much better approach is to use a single MCP server: Docker. This server acts as a gateway into a dynamic set of containerized tools. But how can tools be dynamic?

The MCP Catalog 

A dynamic set of tools in one MCP server means that users can go somewhere to add or remove MCP tools without modifying any config. This is achieved through a simple UI in Docker Desktop that maintains the list of tools the MCP gateway can serve out. Users can configure their MCP clients to use hundreds of Dockerized servers simply by “connecting” to the gateway MCP server.

Much like Docker Hub, Docker MCP Catalog delivers a trusted, centralized hub to discover tools for developers. And for tool authors, that same hub becomes a critical distribution channel: a way to reach new users and ensure compatibility with platforms like Claude, Cursor, OpenAI, and VS Code. 

Docker Secrets

Finally, to securely pass access tokens and other secrets to containers, we’ve developed a feature as part of Docker Desktop to manage secrets. When configured, secrets are only exposed to the MCP server’s container process. That means the secret won’t appear even when inspecting the running container. Keeping secrets tightly scoped to the tools that need them means you no longer risk a breach from plaintext MCP config files left on disk.

Dockerizing MCP – Bringing Discovery, Simplicity, and Trust to the Ecosystem

Par : Mark Cavage
22 avril 2025 à 13:02

AI agents are moving fast—from labs to real-world apps. And as they go from generating text to taking real action, the Model Context Protocol (MCP) has emerged as the de facto standard for connecting agents to tools.

MCP is exciting. It’s simple, modular, and built on web-native principles. We believe it has the potential to do for agentic AI interaction what containers did for app deployment – standardize and simplify a complex, fragmented landscape.

But, that leaves us at a classic inflection point. MCP Clients and Servers hold enormous potential, but the experience isn’t production-ready – yet. Discovery is fragmented, trust is manual, and core capabilities like security and authentication are still patched together with workarounds. 

To move from prototypes to production, a few things need to become non-negotiable. First, developers need a trusted, centralized hub to discover tools – no more digging through Discord threads or Twitter replies. And for tool authors, that same hub becomes a critical distribution channel: a way to reach new users and ensure compatibility with platforms like Claude, Cursor, OpenAI, and VS Code. Today, that channel simply doesn’t exist. Second, containerization should be the default; cloning repos and wrangling dependencies just to get started is unnecessary friction. Third, credential management must be seamless and secure – centralized, encrypted, and built to fit modern pipelines. And finally, security has to be foundational. Sandbox it. Permission it. Audit it. Trust can’t be an afterthought—it needs to be built in from day one. And it needs to be simple to use: accessible to all developers.

This moment for MCP reminds us a lot of the early days of the cloud and containers – high potential, a few sharp edges, and massive opportunity ahead. These aren’t abstract problems – they’re the same challenges developers face every time a new technology hits its inflection point. We’ve seen it before. And we know how to help. Back in the early days of the cloud, Docker brought structure to chaos by making immutability and isolation the standard, building in authentication, and launching Docker Hub as a central discovery layer. It didn’t just streamline deployment – it redefined how software gets built, shared, and trusted. Today, Docker serves over 20 million developers and powers billions of image pulls every month. If we bring that same clarity, trust, and scalability to MCP, we unlock a whole new generation of intelligent agents and real-world automation. That’s exactly what we’re doing – with Docker MCP Catalog and Docker MCP Toolkit.

And we’re not doing it alone. We’re partnering with leaders like Stripe, Elastic, Heroku, Pulumi, Grafana Labs, Kong Inc., Neo4j, New Relic, Continue.dev, and more – each contributing their expertise to help shape a robust, open, and secure MCP ecosystem. This isn’t just another product launch – it’s the foundation of a platform shift. And we’re building it together.

The world we’ve envisioned is one we’re building together with our partners — and it all begins this May. Starting then, the Docker MCP Catalog will serve as the trusted home for discovering MCP tools – seamlessly integrated into Docker Hub. At launch, it will include over 100 verified tools from leading partners like Stripe, Elastic, Neo4j, and more. Each tool will feature publisher verification, versioned releases, and curated collections to help developers find exactly what they need, faster. And just like container images, MCP tools will be distributed via Docker’s proven pull-based infrastructure – the same trusted backbone behind billions of downloads every month.

Alongside it, the Docker MCP Toolkit brings these tools to life – making them secure, seamless, and instantly usable on your local machine or anywhere Docker runs. With one-click launch from Docker Desktop, you can spin up MCP servers in seconds and connect them to clients like Docker AI Agent, Claude, Cursor, VS Code, Windsurf, continue.dev, and Goose – no complex setup required. It also includes built-in credentials and OAuth management, integrated with your Docker Hub account, ensuring smooth authentication and making it easy to revoke credentials when necessary. A Gateway MCP Server dynamically exposes enabled tools to compatible clients, while the new docker mcp CLI lets you build, run, and manage them with ease. And with built-in memory, network and disk isolation, every tool runs securely by default, ready for production from day one.

So what does the future look like with Docker MCP Catalog and Toolkit? Picture this: browsing hundreds of ready-to-run MCP servers directly on Docker Hub and spinning them up as easily as Redis or Postgres. Instantly connecting them to agents with a few clicks. No more hardcoded secrets, no more launching tools with full host access via npx or uvx, and no more compromising on isolation or security. Best of all? Run a Docker container, and the MCP tools just work. With familiar commands and tooling, the learning curve is nearly zero—and the possibilities are massive.

Whether you’re building tools, creating agents, or just exploring what’s possible with MCP—we’d love to hear from you. Eager to try the Docker MCP Toolkit and MCP Catalog? Click here to join our alert list. Want a sneak peek? Schedule a session with our DevRel team here. Interested in hosting your own tools on the MCP Catalog? Get in touch with us here. Let’s build this ecosystem together.

Docker Desktop for Mac: QEMU Virtualization Option to be Deprecated in 90 Days

15 avril 2025 à 16:10

We are announcing the upcoming deprecation of QEMU as a virtualization option for Docker Desktop on Apple Silicon Macs. After serving as our legacy virtualization solution during the early transition to Apple Silicon, QEMU will be fully deprecated 90 days from today, on July 14, 2025. This deprecation does not affect QEMU’s role in emulating non-native architectures for multi-platform builds. By moving to Apple Virtualization Framework or Docker VMM, you will ensure optimal performance.

Why We’re Making This Change

Our telemetry shows that a very small percentage of users are still using the QEMU option. We’ve maintained QEMU support for backward compatibility, but both Docker VMM and Apple Virtualization Framework now offer:

  • Significantly better performance
  • Improved stability
  • Enhanced compatibility with macOS updates
  • Better integration with Apple Silicon architecture

What This Means For You

If you’re currently using QEMU as your Virtual Machine Manager (VMM) on Docker Desktop for Mac:

  • Your current installation will continue to work normally during the 90-day transition period
  • After July 1, 2025, Docker Desktop releases will automatically migrate your environment to Apple Virtualization Framework
  • You’ll experience improved performance and stability with the newer virtualization options

Migration Plan

The migration process will be smooth and straightforward:

  1. Users on the latest Docker Desktop release will be automatically migrated to Apple Virtualization Framework after the 90-day period
  2. During the transition period, you can manually switch to either Docker VMM (our fastest option for Apple Silicon Macs) or Apple Virtualization Framework through Settings > General > Virtual Machine Options
  3. For 30 days after the deprecation date, the QEMU option will remain available in settings for users who encounter migration issues
  4. After this extended period, the QEMU option will be fully removed

Note: This deprecation does not affect QEMU’s role in emulating non-native architectures for multi-platform builds.

What You Should Do Now

We recommend proactively switching to one of our newer VMM options before the automatic migration:

  1. Update to the latest version of Docker Desktop for Mac
  2. Open Docker Desktop Settings > General
  3. Under “Choose Virtual Machine Manager (VMM)” select either:
    • Docker VMM (BETA) – Our fastest option for Apple Silicon Macs
    • Apple Virtualization Framework – A mature, high-performance alternative

Questions or Concerns?

If you have questions or encounter any issues during migration, please reach out.

We’re committed to making this transition as seamless as possible while delivering the best development experience on macOS.

New Docker Extension for Visual Studio Code

Par : Remy Suen
11 avril 2025 à 00:10

Today, we are excited to announce the release of a new, open-source Docker Language Server and Docker DX VS Code extension. In a joint collaboration between Docker and the Microsoft Container Tools team, this new integration enhances the existing Docker extension with improved Dockerfile linting, inline image vulnerability checks, Docker Bake file support, and outlines for Docker Compose files. By working directly with Microsoft, we’re ensuring a native, high-performance experience that complements the existing developer workflow. It’s the next evolution of Docker tooling in VS Code — built to help you move faster, catch issues earlier, and focus on what matters most: building great software.

What’s the Docker DX extension?

The Docker DX extension is focused on providing developers with faster feedback as they edit. Whether you’re authoring a complex Compose file or fine-tuning a Dockerfile, the extension surfaces relevant suggestions, validations, and warnings in real time. 

Key features include:

  • Dockerfile linting: Get build warnings and best-practice suggestions directly from BuildKit and Buildx.
  • Image vulnerability remediation (experimental): Flags references to container images with known vulnerabilities directly in Dockerfiles.
  • Bake file support: Includes code completion, variable navigation, and inline suggestions for generating targets based on your Dockerfile stages.
  • Compose file outline: Easily navigate complex Compose files with an outline view in the editor.

If you’re already using the Docker VS Code extension, the new features are included — just update the extension and start using them!

Dockerfile linting and vulnerability remediation

The inline Dockerfile linting provides warnings and best-practice guidance for writing Dockerfiles from the experts at Docker, powered by Build Checks. Potential vulnerabilities are highlighted directly in the editor with context about their severity and impact, powered by Docker Scout.

Figure 1: Providing actionable recommendations for fixing vulnerabilities and optimizing Dockerfiles

Early feedback directly in Dockerfiles keeps you focused and saves you and your team time debugging and remediating later.

Docker Bake files

The Docker DX extension makes authoring and editing Docker Bake files quick and easy. It provides code completion, code navigation, and error reporting to make editing Bake files a breeze. The extension will also look at your Dockerfile and suggest Bake targets based on the build stages you have defined in your Dockerfile.

Figure 2: Editing Bake files is simple and intuitive with the rich language features that the Docker DX extension provides.

Figure 3: Creating new Bake files is straightforward as your Dockerfile’s build stages are analyzed and suggested as Bake targets.

Compose outlines

Quickly navigate complex Compose files with the extension’s support for outlines available directly through VS Code’s command palette.

Figure 4: Navigate complex Compose files with the outline panel.

Don’t use VS Code? Try the Language Server!

The features offered by the Docker DX extension are powered by the brand-new Docker Language Server, built on the Language Server Protocol (LSP). This means the same smart editing experience — like real-time feedback, validation, and suggestions for Dockerfiles, Compose, and Bake files — is available in your favorite editor.

Wrapping up

Install the extension from Docker DX – Visual Studio Marketplace today! The functionality is also automatically installed with the existing Docker VS Code extension from Microsoft.

Share your feedback on how it’s working for you, and share what features you’d like to see next. If you’d like to learn more or contribute to the project, check out our GitHub repo.

Learn more

Run Gemma 3 with Docker Model Runner: Fully Local GenAI Developer Experience

9 avril 2025 à 13:01

The landscape of generative AI development is evolving rapidly but comes with significant challenges. API usage costs can quickly add up, especially during development. Privacy concerns arise when sensitive data must be sent to external services. And relying on external APIs can introduce connectivity issues and latency.

Enter Gemma 3 and Docker Model Runner, a powerful combination that brings state-of-the-art language models to your local environment, addressing these challenges head-on.

In this blog post, we’ll explore how to run Gemma 3 locally using Docker Model Runner. We’ll also walk through a practical case study: a Comment Processing System that analyzes user feedback about a fictional AI assistant named Jarvis.

The power of local GenAI development

Before diving into the implementation, let’s look at why local GenAI development is becoming increasingly important:

  1. Cost efficiency: With no per-token or per-request charges, you can experiment freely without worrying about usage fees.
  2. Data privacy: Sensitive data stays within your environment, with no third-party exposure.
  3. Reduced network latency: Eliminates reliance on external APIs and enables offline use.
  4. Full control: Run the model on your terms, with no intermediaries and full transparency.

Setting up Docker Model Runner with Gemma 3

Docker Model Runner provides an OpenAI-compatible API interface to run models locally.
It is included in Docker Desktop for macOS, starting with version 4.40.0.

Here’s how to set it up with Gemma 3:

docker desktop enable model-runner --tcp 12434
docker model pull ai/gemma3

Once setup is complete, the OpenAI-compatible API provided by the Model Runner is available at: http://localhost:12434/engines/v1
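
A quick way to sanity-check the endpoint is a plain OpenAI-style chat completion request; this assumes the standard /chat/completions route exposed by the compatible API:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gemma3",
        "messages": [{"role": "user", "content": "Say hello in five words."}]
      }'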

Case study: Comment processing system

To demonstrate the power of local GenAI development, we’ve built a Comment Processing System that leverages Gemma 3 for multiple NLP tasks. This system:

  • Generates synthetic user comments about a fictional AI assistant
  • Categorizes comments as positive, negative, or neutral
  • Clusters similar comments together using embeddings
  • Identifies potential product features from the comments
  • Generates contextually appropriate responses

All tasks are performed locally with no external API calls.

Implementation details

Configuring the OpenAI SDK to use local models

To make this work, we configure the OpenAI SDK to point to the Docker Model Runner:

// config.js

export default {
  // Model configuration
  openai: {
    baseURL: "http://localhost:12434/engines/v1", // Base URL for Docker Model Runner
    apiKey: 'ignored',
    model: "ai/gemma3",
    commentGeneration: { // Each task has its own configuration; for example, comment generation uses a higher temperature than classification for more varied output
      temperature: 0.3, 
      max_tokens: 250,
      n: 1,
    },
    embedding: {
      model: "ai/mxbai-embed-large", // Model for generating embeddings
    },
  },
  // ... other configuration options
};

import OpenAI from 'openai';
import config from './config.js';

// Initialize OpenAI client with local endpoint
const client = new OpenAI({
  baseURL: config.openai.baseURL,
  apiKey: config.openai.apiKey,
});

Task-specific configuration

One key benefit of running models locally is the ability to experiment freely with different configurations for each task without worrying about API costs or rate limits.

In our case:

  • Synthetic comment generation uses a higher temperature for creativity.
  • Categorization uses a lower temperature and a 10-token limit for consistency.
  • Clustering allows up to 20 tokens to improve semantic richness in embeddings.

This flexibility lets us iterate quickly, tune for performance, and tailor the model’s behavior to each use case.
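
For example, the categorization task can be sketched as a single low-temperature call with a tight token budget. The function below follows the patterns used elsewhere in this post and reuses the same client and config objects; the exact prompt wording and temperature value are illustrative:

/**
 * Categorize a comment as positive, negative, or neutral
 * @param {string} commentText - The comment to classify
 * @returns {Promise<string>} - The predicted category
 */
async function categorizeComment(commentText) {
  const response = await client.chat.completions.create({
    model: config.openai.model,
    messages: [
      {
        role: "system",
        content: "You are a sentiment classifier. Reply with exactly one word: positive, negative, or neutral.",
      },
      { role: "user", content: commentText },
    ],
    temperature: 0.1, // low temperature for consistent labels
    max_tokens: 10,   // the 10-token limit mentioned above
  });

  return response.choices[0].message.content.trim().toLowerCase();
}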

Generating synthetic comments

To simulate user feedback, we use Gemma 3’s ability to follow detailed, context-aware prompts.

/**
 * Create a prompt for comment generation
 * @param {string} type - Type of comment (positive, negative, neutral)
 * @param {string} topic - Topic of the comment
 * @returns {string} - Prompt for OpenAI
 */
function createPromptForCommentGeneration(type, topic) {
  let sentiment = '';
  
  switch (type) {
    case 'positive':
      sentiment = 'positive and appreciative';
      break;
    case 'negative':
      sentiment = 'negative and critical';
      break;
    case 'neutral':
      sentiment = 'neutral and balanced';
      break;
    default:
      sentiment = 'general';
  }
  
  return `Generate a realistic ${sentiment} user comment about an AI assistant called Jarvis, focusing on its ${topic}.
  
The comment should sound natural, as if written by a real user who has been using Jarvis.
Keep the comment concise (1-3 sentences) and focused on the specific topic.
Do not include ratings (like "5/5 stars") or formatting.
Just return the comment text without any additional context or explanation.`;
}

Examples:

"Honestly, Jarvis is just a lot of empty promises. It keeps suggesting irrelevant articles and failing to actually understand my requests for help with my work – it’s not helpful at all."

"Jarvis is seriously impressive – the speed at which it responds is incredible! I’ve never used an AI assistant that’s so quick and efficient, it’s a game changer.

The ability to produce realistic feedback on demand is incredibly useful for simulating user data with zero API cost.

Generating contextual responses

We also use Gemma 3 to simulate polite, on-brand support responses to user comments. Here’s the prompt logic:

const response = await client.chat.completions.create({
    model: config.openai.model,
    messages: [
      {
        role: "system",
        content: `You are a customer support representative for an AI assistant called Jarvis. Your task is to generate polite, helpful responses to user comments.

Guidelines for responses:
1. Be empathetic and acknowledge the user's feedback
2. Thank the user for their input
3. If the comment is positive, express appreciation
4. If the comment is negative, apologize for the inconvenience and assure them you're working on improvements
5. If the comment is neutral, acknowledge their observation
6. If relevant, mention that their feedback will be considered for future updates
7. Keep responses concise (2-4 sentences) and professional
8. Do not make specific promises about feature implementation or timelines
9. Sign the response as "The Jarvis Team"`
      },
      {
        role: "user",
        content: `User comment: "${comment.text}"
Comment category: ${comment.category || 'unknown'}

${featuresContext}

Generate a polite, helpful response to this user comment.`
      }
    ],
    temperature: 0.7,
    max_tokens: 200
  });

Examples:

For a positive comment:

Thank you so much for your positive feedback regarding Jarvis’s interface! We’re thrilled to hear you find it clean and intuitive – that’s exactly what we’re aiming for. We appreciate you pointing out your desire for more visual customization options, and your feedback will definitely be considered as we continue to develop Jarvis.

The Jarvis Team

For a negative comment:

Thank you for your feedback – we appreciate you taking the time to share your observations about Jarvis. We sincerely apologize for the glitches and freezes you’ve experienced; we understand how frustrating that can be. Your input is valuable, and we’re actively working on improvements to enhance Jarvis’s reliability and accuracy. 

The Jarvis Team

This approach ensures a consistent, human-like support experience generated entirely locally.

Extracting product features from user feedback

Beyond generating and responding to comments, we also use Gemma 3 to analyze user feedback and identify actionable insights. This helps simulate the role of a product analyst, surfacing recurring themes, user pain points, and opportunities for improvement.

Here, we provide a prompt instructing the model to identify up to three potential features or improvements based on a set of user comments. 

/**
 * Extract features from comments
 * @param {string} commentsText - Text of comments
 * @returns {Promise<Array>} - Array of identified features
 */
async function extractFeaturesFromComments(commentsText) {
  const response = await client.chat.completions.create({
    model: config.openai.model,
    messages: [
      {
        role: "system",
        content: `You are a product analyst for an AI assistant called Jarvis. Your task is to identify potential product features or improvements based on user comments.
        
For each set of comments, identify up to 3 potential features or improvements that could address the user feedback.

For each feature, provide:
1. A short name (2-5 words)
2. A brief description (1-2 sentences)
3. The type of feature (New Feature, Improvement, Bug Fix)
4. Priority (High, Medium, Low)

Format your response as a JSON array of features, with each feature having the fields: name, description, type, and priority.`
      },
      {
        role: "user",
        content: `Here are some user comments about Jarvis. Identify potential features or improvements based on these comments:

${commentsText}`
      }
    ],
    response_format: { type: "json_object" },
    temperature: 0.5
  });
  
  try {
    const result = JSON.parse(response.choices[0].message.content);
    return result.features || [];
  } catch (error) {
    console.error('Error parsing feature identification response:', error);
    return [];
  }
}

Here’s an example of what the model might return:

"features": [
    {
      "name": "Enhanced Visual Customization",
      "description": "Allows users to personalize the Jarvis interface with more themes, icon styles, and display options to improve visual appeal and user preference.",
      "type": "Improvement",
      "priority": "Medium",
      "clusters": [
        "1"
      ]
    },

And just like everything else in this project, it’s generated locally with no external services.

Conclusion

By combining Gemma 3 with Docker Model Runner, we’ve unlocked a local GenAI workflow that’s fast, private, cost-effective, and fully under our control. In building our Comment Processing System, we experienced firsthand the benefits of this approach:

  • Rapid iteration without worrying about API costs or rate limits
  • Flexibility to test different configurations for each task
  • Offline development with no dependency on external services
  • Significant cost savings during development

And this is just one example of what’s possible. Whether you’re prototyping a new AI product, building internal tools, or exploring advanced NLP use cases, running models locally puts you in the driver’s seat.

As open-source models and local tooling continue to evolve, the barrier to entry for building powerful AI systems keeps getting lower.

Don’t just consume AI; develop, shape, and own the process.

Try it yourself: clone the repository and start experimenting today.
