
Docker State of App Dev: AI

June 25, 2025 at 14:42

AI is changing software development — but not how you think

The hype is real, but so are the challenges. Here’s what developers, teams, and tech leaders need to know about AI’s uneven, evolving role in software.

Rumors of AI’s pervasiveness in software development have been greatly exaggerated. A look under the hood shows adoption is far from uniform. While some dev teams are embedding AI into daily workflows, others are still kicking the tires or sitting it out entirely. Real-world usage reveals a nuanced picture shaped by industry, role, and data readiness.


Here are six key insights into AI tools and development from Docker’s second annual State of Application Development Survey, based on responses from over 4,500 industry professionals.

1. How are people using AI?

Right off the bat, we saw a split between two classes of respondents: 

  • Those who use AI tools like ChatGPT and GitHub Copilot for everyday work-related tasks such as writing, documentation, and research 
  • Those who build applications with AI/ML functionality

2. IT leads the way in AI tool usage and app development 

Only about 1 in 5 respondents (22%) report using AI tools for work. But there’s a huge spread across industries — from 1% to 84%. Among the top AI users are IT/SaaS folks (76%). And because we surveyed over three times more users this year than for last year’s report, the snapshot covers a broader spectrum of industries beyond just those focused on IT.

Underscoring tech’s embrace of AI: 34% of IT/SaaS respondents say they develop AI/ML apps, compared to just 8% outside that bubble.

And strategy reflects this gulf. Only 16% of companies outside IT report having a real AI strategy. Within tech, the number soars to 73%. Translation: AI is gaining traction, but it’s concentrated in certain industries — at least for now.

3. AI tools are overhyped — and incredibly useful

Here’s the paradox: 64% of users say AI tools make work easier, yet almost as many (59%) think AI tools are overhyped. The hype may be loud, but utility is speaking louder, especially for those who’ve stuck with it. In fact, 65% of current users say they’re using AI more than they did a year ago, and that same percentage use it every day.

This tracks roughly with findings in our 2024 report, in which 61% of respondents agreed AI made their job easier, even as 45% reported feeling AI was overhyped. And 65% agreed that AI was a positive option.

4. AI tool usage is up — and ChatGPT leads the pack

No surprises here. The most-used AI-powered tools are the same as in our 2024 survey — ChatGPT (especially among full-stack developers), GitHub Copilot, and Google Gemini. 

But usage this year far outstrips what users reported last year, with 80% selecting ChatGPT (versus 46% in our 2024 report), 53% Copilot (versus 30%), and 23% Gemini (versus 19%).

5. Developers don’t use AI the same way

The top overall use case is coding. Beyond that, it depends.

  • Seasoned devs turn to AI to write documentation and tests but use it sparingly. 
  • DevOps engineers use it for CLI help and writing docs.
  • Software devs tap AI to write tests and do research.

And not all devs lean on AI equally. Seasoned devs are the least reliant, most often rating themselves as not at all dependent (0/10), while DevOps engineers rate their dependence at 7/10. Software devs are somewhere in the middle, usually landing at a 5/10 on the dependence scale. For comparison, the overall average dependence on AI in our 2024 survey was about 4 out of 10 (all users).

Looking ahead, it will be interesting to see how dependence on AI shifts and becomes further integrated by role. 

6. Data is the bottleneck no one talks about

The use of AI/ML in app development is a new and rapidly growing phenomenon that, not surprisingly, brings new pain points. For teams building AI/ML apps, one headache stands out: data prep. A full 24% of AI builders say they’re not confident in how to identify or prepare the right datasets.

Even with the right intent and tools, teams hit friction where it hurts productivity most — upfront.

Bottom line:
We’re in the early stages of the next tech revolution — complex, fast-evolving, and rife with challenges. Developers are meeting it head-on, quickly ramping up on new tools and architectures, and driving innovation at every layer of the stack. And Docker is right there with them, empowering innovation every step of the way.

Building RAG Applications with Ollama and Python: Complete 2025 Tutorial

June 24, 2025 at 16:30
Retrieval-Augmented Generation (RAG) has revolutionized how we build intelligent applications that can access and reason over external knowledge bases. In this comprehensive tutorial, we’ll explore how to build production-ready RAG applications using Ollama and Python, leveraging the latest techniques and best practices for 2025. What is RAG and Why Use Ollama? Retrieval-Augmented Generation combines the […]

Docker Multi-Stage Builds for Python Developers: A Complete Guide

June 19, 2025 at 02:44
As a Python developer, you’ve probably experienced the pain of slow Docker builds, bloated images filled with build tools, and the frustration of waiting 10+ minutes for a simple code change to rebuild. Docker multi-stage builds solve these problems elegantly, and they’re particularly powerful for Python applications. In this comprehensive guide, we’ll explore how to […]

Optimize Your AI Containers with Docker Multi-Stage Builds: A Complete Guide

June 19, 2025 at 02:08
If you’re developing AI applications, you’ve probably experienced the frustration of slow Docker builds, bloated container images, and inefficient caching. Every time you tweak your model code, you’re stuck waiting for dependencies to reinstall, and your production images are loaded with unnecessary build tools. Docker multi-stage builds solve these problems elegantly, and they’re particularly powerful […]

Docker State of App Dev: Security

June 18, 2025 at 15:51

Security is a team sport: why everyone owns it now

Six security takeaways from Docker’s 2025 State of Application Development Report.

In the evolving world of software development, one thing is clear — security is no longer a siloed specialty. It’s a team sport, especially when vulnerabilities strike. That’s one of several key security findings in the 2025 Docker State of Application Development Survey.

Here’s what else we learned about security from our second annual report, which was based on an online survey of over 4,500 industry professionals.

1. Security isn’t someone else’s problem

Forget the myth that only “security people” handle security. Across orgs big and small, roles are blending. If you’re writing code, you’re in the security game. As one respondent put it, “We don’t have dedicated teams — we all do it.” According to the survey, just 1 in 5 organizations outsource security. And it’s top of mind at most others: only 1% of respondents say security is not a concern at their organization.

One exception to this trend: In larger organizations (50 or more employees), software security is more likely to be the exclusive domain of security engineers, with other types of engineers playing less of a role.

2. Everyone thinks they’re in charge of security

Team leads from multiple corners report that they’re the ones focused on security. Seasoned developers are as likely to zero in on it as are mid-career security engineers. And they’re both right. Security has become woven into every function — devs, leads, and ops alike.

3. When vulnerabilities hit, it’s all hands on deck

No turf wars here. When scan alerts go off, everyone pitches in — whether it’s security engineers helping experienced devs to decode scan results, engineering managers overseeing the incident, or DevOps engineers filling in where needed.

Fixing vulnerabilities is also a major time suck. Among security-related tasks that respondents routinely deal with, it was the most selected option across all roles. Worth noting: Last year’s State of Application Development Survey identified security/vulnerability remediation tools as a key area where better tools were needed in the development process.

4. Security isn’t the bottleneck — planning and execution are

Surprisingly, security doesn’t crack the top 10 issues holding teams back. Planning and execution-type activities are bigger sticking points. Translation? Security is better integrated into the workflow than many give it credit for. 

5. Shift-left is yesterday’s news

The once-pervasive mantra of “shift security left” is now only the 9th most important trend. Has the shift left already happened? Are AI and cloud complexity drowning it out? Or is this further evidence that security is, by necessity, shifting everywhere?

Again, perhaps security tools have gotten better, making it easier to shift left. (Our 2024 survey identified the shift-left approach as a possible source of frustration for developers and an area where more effective tools could make a difference.) Or perhaps there’s simply broader acceptance of the shift-left trend.

6. Shifting security left may not be the buzziest trend, but it’s still influential

The impact of shifting security left pales beside more dominant trends such as Generative AI and infrastructure as code. But it’s still a strong influence for developers in leadership roles. 

Bottom line: Security is no longer a roadblock; it’s a reflex. Teams aren’t asking, “Who owns security?” — they’re asking, “How can we all do it better?”

Model Context Protocol: Integrate Claude with Discord

By: Ajeet Raina
June 8, 2025 at 08:05
As Discord continues to evolve as the primary platform for community building, gaming, and collaboration, the need for intelligent automation and AI-powered community management becomes increasingly important. The Model Context Protocol (MCP) opens up exciting possibilities for integrating AI assistants like Claude directly with Discord servers, enabling sophisticated bot interactions, content moderation, and community engagement […]

How to Make an AI Chatbot from Scratch using Docker Model Runner

By: Harsh Manvar
June 3, 2025 at 18:40

Today, we’ll show you how to build a fully functional Generative AI chatbot using Docker Model Runner and powerful observability tools, including Prometheus, Grafana, and Jaeger. We’ll walk you through the common challenges developers face when building AI-powered applications, demonstrate how Docker Model Runner solves these pain points, and then guide you step-by-step through building a production-ready chatbot with comprehensive monitoring and metrics.

By the end of this guide, you’ll know how to make an AI chatbot and run it locally. You’ll also learn how to set up real-time monitoring insights, streaming responses, and a modern React interface — all orchestrated through familiar Docker workflows.

The current challenges with GenAI development

Generative AI (GenAI) is revolutionizing software development, but creating AI-powered applications comes with significant challenges. First, the current AI landscape is fragmented — developers must piece together various libraries, frameworks, and platforms that weren’t designed to work together. Second, running large language models efficiently requires specialized hardware configurations that vary across platforms, while AI model execution remains disconnected from standard container workflows. This forces teams to maintain separate environments for their application code and AI models.

Third, without standardized methods for storing, versioning, and serving models, development teams struggle with inconsistent deployment practices. Meanwhile, relying on cloud-based AI services creates financial strain through unpredictable costs that scale with usage. Additionally, sending data to external AI services introduces privacy and security risks, especially for applications handling sensitive information.

These challenges combine to create a frustrating developer experience that hinders experimentation and slows innovation precisely when businesses need to accelerate their AI adoption. Docker Model Runner addresses these pain points by providing a streamlined solution for running AI models locally, right within your existing Docker workflow.

How Docker is solving these challenges

Docker Model Runner offers a revolutionary approach to GenAI development by integrating AI model execution directly into familiar container workflows. 


Figure 1: Comparison diagram showing complex multi-step traditional GenAI setup versus simplified Docker Model Runner single-command workflow

Many developers successfully use containerized AI models, benefiting from integrated workflows, cost control, and data privacy. Docker Model Runner builds on these strengths by making it even easier and more efficient to work with models. By running models natively on your host machine while maintaining the familiar Docker interface, Model Runner delivers:

  • Simplified Model Execution: Run AI models locally with a simple Docker CLI command, no complex setup required.
  • Hardware Acceleration: Direct access to GPU resources without containerization overhead
  • Integrated Workflow: Seamless integration with existing Docker tools and container development practices
  • Standardized Packaging: Models are distributed as OCI artifacts through the same registries you already use
  • Cost Control: Eliminate unpredictable API costs by running models locally
  • Data Privacy: Keep sensitive data within your infrastructure with no external API calls

This approach fundamentally changes how developers can build and test AI-powered applications, making local development faster, more secure, and dramatically more efficient.

How to create an AI chatbot with Docker

In this guide, we’ll build a comprehensive GenAI application that showcases how to create a fully-featured chat interface powered by Docker Model Runner, complete with advanced observability tools to monitor and optimize your AI models.

Project overview

The project is a complete Generative AI interface that demonstrates how to:

  1. Create a responsive React/TypeScript chat UI with streaming responses
  2. Build a Go backend server that integrates with Docker Model Runner
  3. Implement comprehensive observability with metrics, logging, and tracing
  4. Monitor AI model performance with real-time metrics

Architecture

The application consists of these main components:

  1. The frontend sends chat messages to the backend API
  2. The backend formats the messages and sends them to the Model Runner
  3. The LLM processes the input and generates a response
  4. The backend streams the tokens back to the frontend as they’re generated
  5. The frontend displays the incoming tokens in real-time
  6. Observability components collect metrics, logs, and traces throughout the process


Figure 2: Architecture diagram showing data flow between frontend, backend, Model Runner, and observability tools like Prometheus, Grafana, and Jaeger.

Project structure

The project has the following structure:

tree -L 2
.
├── Dockerfile
├── README-model-runner.md
├── README.md
├── backend.env
├── compose.yaml
├── frontend
..
├── go.mod
├── go.sum
├── grafana
│   └── provisioning
├── main.go
├── main_branch_update.md
├── observability
│   └── README.md
├── pkg
│   ├── health
│   ├── logger
│   ├── metrics
│   ├── middleware
│   └── tracing
├── prometheus
│   └── prometheus.yml
├── refs
│   └── heads
..


21 directories, 33 files

We’ll examine the key files and understand how they work together throughout this guide.

Prerequisites

Before we begin, make sure you have:

  • Docker Desktop (version 4.40 or newer) 
  • Docker Model Runner enabled
  • At least 16GB of RAM for running AI models efficiently
  • Familiarity with Go (for backend development)
  • Familiarity with React and TypeScript (for frontend development)

Getting started

To run the application:

  1. Clone the repository:

git clone https://github.com/dockersamples/genai-model-runner-metrics
cd genai-model-runner-metrics

  2. Enable Docker Model Runner in Docker Desktop:
  • Go to Settings > Features in Development > Beta tab
  • Enable “Docker Model Runner”
  • Select “Apply and restart”

Figure 3: Screenshot of Docker Desktop Beta Features settings panel with Docker AI, Docker Model Runner, and TCP support enabled.

  3. Download the model

For this demo, we’ll use Llama 3.2, but you can substitute any model of your choice:

docker model pull ai/llama3.2:1B-Q8_0


Just like viewing containers, you can manage your downloaded AI models directly in Docker Dashboard under the Models section. Here you can see model details, storage usage, and manage your local AI model library.


Figure 4: View of Docker Dashboard showing locally downloaded AI models with details like size, parameters, and quantization.

  4. Start the application:
docker compose up -d --build

Figure 5: List of active running containers in Docker Dashboard, including Jaeger, Prometheus, backend, frontend, and genai-model-runner-metrics.

  5. Open your browser and navigate to the frontend URL at http://localhost:3000. You’ll be greeted with a modern chat interface (see screenshot) featuring:
  • Clean, responsive design with dark/light mode toggle
  • Message input area ready for your first prompt
  • Model information displayed in the footer

Figure 6: GenAI chatbot interface showing live metrics panel with input/output tokens, response time, and error rate.

  6. Click on Expand to view metrics like:
  • Input tokens
  • Output tokens
  • Total Requests
  • Average Response Time
  • Error Rate

Figure 7: Expanded metrics view with input and output tokens, detailed chat prompt, and response generated by Llama 3.2 model.

Grafana allows you to visualize metrics through customizable dashboards. Click on View Detailed Dashboard to open the Grafana dashboard.


Figure 8: Chat interface showing metrics dashboard with prompt and response plus option to view detailed metrics in Grafana.

Log in with the default credentials (enter “admin” as user and password) to explore pre-configured AI performance dashboards (see screenshot below) showing real-time metrics like tokens per second, memory usage, and model performance. 

Select Add your first data source and choose Prometheus as the data source. Enter “http://prometheus:9090” as the Prometheus server URL, scroll to the bottom of the page, and click “Save and test”. You should see “Successfully queried the Prometheus API” as confirmation. Then open the Dashboards tab and click Re-import for each of the listed dashboards.

By now, you should have a Prometheus 2.0 Stats dashboard up and running.


Figure 9: Grafana dashboard with multiple graph panels monitoring GenAI chatbot performance, displaying time-series charts for memory consumption, processing speeds, and application health

Prometheus allows you to collect and store time-series metrics data. Open the Prometheus query interface at http://localhost:9091 and start typing “genai” in the query box to explore all available AI metrics (as shown in the screenshot below). You’ll see dozens of automatically collected metrics, including tokens per second, latency measurements, and llama.cpp-specific performance data.


Figure 10: Prometheus web interface showing dropdown of available GenAI metrics including genai_app_active_requests and genai_app_token_latency

Jaeger provides a visual exploration of request flows and performance bottlenecks. You can access it at http://localhost:16686.

Implementation details

Let’s explore how the key components of the project work:

  1. Frontend implementation

The React frontend provides a clean, responsive chat interface built with TypeScript and modern React patterns. The core App.tsx component manages two essential pieces of state: dark mode preferences for user experience and model metadata fetched from the backend’s health endpoint. 

When the component mounts, the useEffect hook automatically retrieves information about the currently running AI model. It displays details like the model name directly in the footer to give users transparency about which LLM is powering their conversations.

// Essential App.tsx structure
function App() {
  const [darkMode, setDarkMode] = useState(false);
  const [modelInfo, setModelInfo] = useState<ModelMetadata | null>(null);

  // Fetch model info from backend
  useEffect(() => {
    fetch('http://localhost:8080/health')
      .then(res => res.json())
      .then(data => setModelInfo(data.model_info));
  }, []);

  return (
    <div className="min-h-screen bg-white dark:bg-gray-900">
      <Header toggleDarkMode={() => setDarkMode(!darkMode)} />
      <ChatBox />
      <footer>
        Powered by Docker Model Runner running {modelInfo?.model}
      </footer>
    </div>
  );
}

The main App component orchestrates the overall layout while delegating specific functionality to specialized components like Header for navigation controls and ChatBox for the actual conversation interface. This separation of concerns makes the codebase maintainable while the automatic model info fetching demonstrates how the frontend seamlessly integrates with the Docker Model Runner through the Go backend’s API, creating a unified user experience that abstracts away the complexity of local AI model execution.

  2. Backend implementation: Integration with Model Runner

The core of this application is a Go backend that communicates with Docker Model Runner. Let’s examine the key parts of our main.go file:

client := openai.NewClient(
    option.WithBaseURL(baseURL),
    option.WithAPIKey(apiKey),
)

This demonstrates how we leverage Docker Model Runner’s OpenAI-compatible API. The Model Runner exposes endpoints that match OpenAI’s API structure, allowing us to use standard clients (or even plain HTTP, as sketched after the list below). Depending on your connection method, baseURL is set to either:

  • http://model-runner.docker.internal/engines/llama.cpp/v1/ (for Docker socket)
  • http://host.docker.internal:12434/engines/llama.cpp/v1/ (for TCP)
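
Because the API is OpenAI-compatible, you don’t strictly need a dedicated SDK to talk to it. As a quick connectivity check, here is a minimal Go sketch (not the project’s client code) that posts a chat completion request to the TCP endpoint using only the standard library; the endpoint path and model tag are the ones used elsewhere in this guide, so adjust them if your setup differs.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// TCP endpoint from the host; swap in the model-runner.docker.internal
	// URL above when calling from inside a container over the Docker socket.
	baseURL := "http://localhost:12434/engines/llama.cpp/v1"

	payload, _ := json.Marshal(map[string]any{
		"model": "ai/llama3.2:1B-Q8_0",
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello in one sentence."},
		},
	})

	resp, err := http.Post(baseURL+"/chat/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response body follows the OpenAI chat completions schema.
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}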

How metrics flow from host to containers

One key architectural detail worth understanding: llama.cpp runs natively on your host (via Docker Model Runner), while Prometheus and Grafana run in containers. Here’s how they communicate:

The Backend as Metrics Bridge:

  • Connects to llama.cpp via Model Runner API (http://localhost:12434)
  • Collects performance data from each API call (response times, token counts)
  • Calculates metrics like tokens per second and memory usage
  • Exposes all metrics in Prometheus format at http://backend:9090/metrics
  • Enables containerized Prometheus to scrape metrics without host access

This hybrid architecture gives you the performance benefits of native model execution with the convenience of containerized observability.
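
As a rough sketch of the bridge’s scrape endpoint (the real backend wires this into its existing HTTP server), the backend only needs to expose the default Prometheus registry on the metrics port; this assumes the standard github.com/prometheus/client_golang promhttp handler, and port 9090 matches the metrics port published in the compose file later in this guide.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler serves every metric registered with the default
	// Prometheus registry, including the promauto-created gauges and
	// counters defined in metrics.go.
	http.Handle("/metrics", promhttp.Handler())

	// Port 9090 is the port the containerized Prometheus scrapes.
	log.Fatal(http.ListenAndServe(":9090", nil))
}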

Llama.cpp metrics integration

The project provides detailed real-time metrics specifically for llama.cpp models:

Metric | Description | Implementation in Code
Tokens per Second | Measure of model generation speed | LlamaCppTokensPerSecond in metrics.go
Context Window Size | Maximum context length in tokens | LlamaCppContextSize in metrics.go
Prompt Evaluation Time | Time spent processing input prompt | LlamaCppPromptEvalTime in metrics.go
Memory per Token | Memory efficiency measurement | LlamaCppMemoryPerToken in metrics.go
Thread Utilization | Number of CPU threads used | LlamaCppThreadsUsed in metrics.go
Batch Size | Token processing batch size | LlamaCppBatchSize in metrics.go

One of the most powerful features is our detailed metrics collection for llama.cpp models. These metrics help optimize model performance and identify bottlenecks in your inference pipeline.

// LlamaCpp metrics
llamacppContextSize = promautoFactory.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_context_size",
        Help: "Context window size in tokens for llama.cpp models",
    },
    []string{"model"},
)

llamacppTokensPerSecond = promautoFactory.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_tokens_per_second",
        Help: "Tokens generated per second",
    },
    []string{"model"},
)

// More metrics definitions...


These metrics are collected, processed, and exposed both for Prometheus scraping and for real-time display in the front end. This gives us unprecedented visibility into how the llama.cpp inference engine is performing.

Chat implementation with streaming

The chat endpoint implements streaming for real-time token generation:


// Set up streaming with a proper SSE format
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")

// Stream each chunk as it arrives
if len(chunk.Choices) > 0 && chunk.Choices[0].Delta.Content != "" {
    outputTokens++
    _, err := fmt.Fprintf(w, "%s", chunk.Choices[0].Delta.Content)
    if err != nil {
        log.Printf("Error writing to stream: %v", err)
        return
    }
    w.(http.Flusher).Flush()
}

This streaming implementation ensures that tokens appear in real-time in the user interface, providing a smooth and responsive chat experience. You can also measure key performance metrics like time to first token and tokens per second.

Performance measurement

You can measure various performance aspects of the model:

// Record first token time
if firstTokenTime.IsZero() && len(chunk.Choices) > 0 && 
chunk.Choices[0].Delta.Content != "" {
    firstTokenTime = time.Now()
    
    // For llama.cpp, record prompt evaluation time
    if strings.Contains(strings.ToLower(model), "llama") || 
       strings.Contains(apiBaseURL, "llama.cpp") {
        promptEvalTime := firstTokenTime.Sub(promptEvalStartTime)
        llamacppPromptEvalTime.WithLabelValues(model).Observe(promptEvalTime.Seconds())
    }
}

// Calculate tokens per second for llama.cpp metrics
if strings.Contains(strings.ToLower(model), "llama") || 
   strings.Contains(apiBaseURL, "llama.cpp") {
    totalTime := time.Since(firstTokenTime).Seconds()
    if totalTime > 0 && outputTokens > 0 {
        tokensPerSecond := float64(outputTokens) / totalTime
        llamacppTokensPerSecond.WithLabelValues(model).Set(tokensPerSecond)
    }
}

These measurements help us understand the model’s performance characteristics and optimize the user experience.

Metrics collection

The metrics.go file is a core component of our observability stack for the Docker Model Runner-based chatbot. This file defines a comprehensive set of Prometheus metrics that allow us to monitor both the application performance and the underlying llama.cpp model behavior.

Core metrics architecture

The file establishes a collection of Prometheus metric types:

  • Counters: For tracking cumulative values (like request counts, token counts)
  • Gauges: For tracking values that can increase and decrease (like active requests)
  • Histograms: For measuring distributions of values (like latencies)

Each metric is created using the promauto factory, which automatically registers metrics with Prometheus.

Categories of metrics

The metrics can be divided into three main categories:

1. HTTP and application metrics

// RequestCounter counts total HTTP requests
RequestCounter = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "genai_app_http_requests_total",
        Help: "Total number of HTTP requests",
    },
    []string{"method", "endpoint", "status"},
)

// RequestDuration measures HTTP request durations
RequestDuration = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "genai_app_http_request_duration_seconds",
        Help:    "HTTP request duration in seconds",
        Buckets: prometheus.DefBuckets,
    },
    []string{"method", "endpoint"},
)

These metrics monitor the HTTP server performance, tracking request counts, durations, and error rates. The metrics are labelled with dimensions like method, endpoint, and status to enable detailed analysis.
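
To see how these definitions get used, here is a hedged sketch of an HTTP middleware that records them per request. The project keeps its middleware under pkg/middleware, and the exact implementation there may differ; this version assumes imports of net/http, strconv, and time alongside the metric variables defined above.

// Illustrative middleware (not necessarily the project's exact code) showing
// how the counter and histogram above can be recorded for every request.
func MetricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()

        // Wrap the ResponseWriter so the final status code can be used as a label.
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, r)

        // Record count and duration with the label dimensions declared above.
        RequestCounter.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(rec.status)).Inc()
        RequestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(time.Since(start).Seconds())
    })
}

type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (s *statusRecorder) WriteHeader(code int) {
    s.status = code
    s.ResponseWriter.WriteHeader(code)
}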

2. Model performance metrics

// ChatTokensCounter counts tokens in chat requests and responses
ChatTokensCounter = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "genai_app_chat_tokens_total",
        Help: "Total number of tokens processed in chat",
    },
    []string{"direction", "model"},
)

// ModelLatency measures model response time
ModelLatency = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "genai_app_model_latency_seconds",
        Help:    "Model response time in seconds",
        Buckets: []float64{0.1, 0.5, 1, 2, 5, 10, 20, 30, 60},
    },
    []string{"model", "operation"},
)

These metrics track the LLM usage patterns and performance, including token counts (both input and output) and overall latency. The FirstTokenLatency metric is particularly important as it measures the time to get the first token from the model, which is a critical user experience factor.
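
The excerpt above doesn’t show FirstTokenLatency itself, so here is a hypothetical sketch of how such a histogram might be declared and observed; the metric name, buckets, and labels are assumptions for illustration, not the project’s actual definition.

// Hypothetical sketch of the FirstTokenLatency histogram referenced above;
// the real definition in metrics.go may use a different name and buckets.
FirstTokenLatency = promauto.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "genai_app_first_token_latency_seconds",
        Help:    "Time from request start to the first streamed token",
        Buckets: []float64{0.1, 0.25, 0.5, 1, 2, 5, 10},
    },
    []string{"model"},
)

// Observed in the streaming handler once the first chunk arrives:
// FirstTokenLatency.WithLabelValues(model).Observe(firstTokenTime.Sub(requestStart).Seconds())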

3. llama.cpp specific metrics

// LlamaCppContextSize measures the context window size
LlamaCppContextSize = promauto.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_context_size",
        Help: "Context window size in tokens for llama.cpp models",
    },
    []string{"model"},
)

// LlamaCppTokensPerSecond measures generation speed
LlamaCppTokensPerSecond = promauto.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "genai_app_llamacpp_tokens_per_second",
        Help: "Tokens generated per second",
    },
    []string{"model"},
)

These metrics capture detailed performance characteristics specific to the llama.cpp inference engine used by Docker Model Runner. They include:

1. Context Size

It represents the token window size used by the model, typically ranging from 2048 to 8192 tokens. The optimization goal is balancing memory usage against conversation quality. When memory usage becomes problematic, reduce the context size to 2048 tokens for faster processing.

2. Prompt Evaluation Time

It measures the time spent processing input before generating tokens, essentially your time-to-first-token latency with a target of under 2 seconds. The optimization focus is minimizing user wait time for the initial response. If evaluation time exceeds 3 seconds, reduce context size or implement prompt compression techniques.

3. Tokens Per Second

It indicates generation speed, with a target of 8+ TPS for a good user experience. This metric requires balancing response speed with model quality. When TPS drops below 5, switch to more aggressive quantization (Q4 instead of Q8) or use a smaller model variant.

4. Memory Per Token

It tracks RAM consumption per generated token, with optimization aimed at preventing out-of-memory crashes and optimizing resource usage. When memory consumption exceeds 100MB per token, implement aggressive conversation pruning to reduce memory pressure. If memory usage grows over time during extended conversations, add automatic conversation resets after a set number of exchanges.

5. Threads Used

It monitors the number of CPU cores actively processing model operations, with the goal of maximizing throughput without overwhelming the system. If thread utilization falls below 50% of available cores, increase the thread count for better performance.

6. Batch Size

It controls how many tokens are processed simultaneously, requiring optimization based on your specific use case, balancing latency versus throughput. For real-time chat applications, use smaller batches of 32-64 tokens to minimize latency and provide faster response times.

In a nutshell, these metrics are crucial for understanding and optimizing llama.cpp performance characteristics, which directly affect the user experience of the chatbot.
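
To make that guidance actionable, here is a small illustrative helper (not part of the repository) that maps the thresholds above to tuning suggestions; the cutoffs come straight from the list, while the type and function names are invented for the example.

// Illustrative helper (not part of the project) that turns the thresholds
// discussed above into concrete tuning recommendations.
type LlamaCppSnapshot struct {
    TokensPerSecond   float64 // generation speed
    PromptEvalSeconds float64 // time to first token
    MemoryMBPerToken  float64 // RAM per generated token
    ThreadsUsed       int     // CPU threads in use
    AvailableCores    int     // CPU cores on the host
}

func TuningHints(s LlamaCppSnapshot) []string {
    var hints []string
    if s.TokensPerSecond < 5 {
        hints = append(hints, "TPS below 5: use more aggressive quantization (Q4 instead of Q8) or a smaller model variant")
    }
    if s.PromptEvalSeconds > 3 {
        hints = append(hints, "prompt evaluation over 3s: reduce context size or compress the prompt")
    }
    if s.MemoryMBPerToken > 100 {
        hints = append(hints, "over 100MB per token: prune conversation history or reset long conversations")
    }
    if s.AvailableCores > 0 && float64(s.ThreadsUsed) < 0.5*float64(s.AvailableCores) {
        hints = append(hints, "thread utilization under 50% of cores: increase the thread count")
    }
    return hints
}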

Docker Compose: LLM as a first-class service

With Docker Model Runner integration, Compose makes AI model deployment as simple as any other service. One compose.yaml file defines your entire AI application:

  • Your AI models (via Docker Model Runner)
  • Application backend and frontend
  • Observability stack (Prometheus, Grafana, Jaeger)
  • All networking and dependencies

The most innovative aspect is the llm service using Docker’s model provider, which simplifies model deployment by directly integrating with Docker Model Runner without requiring complex configuration. This composition creates a complete, scalable AI application stack with comprehensive observability.

  llm:
    provider:
      type: model
      options:
        model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0}


This configuration tells Docker Compose to treat an AI model as a standard service in your application stack, just like a database or web server.

  • The provider syntax is Docker’s new way of handling AI models natively. Instead of building containers or pulling images, Docker automatically manages the entire model-serving infrastructure for you. 
  • The model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0} line uses an environment variable with a fallback, meaning it will use whatever model you specify in LLM_MODEL_NAME, or default to Llama 3.2 1B if nothing is set.

Docker Compose: One command to run your entire stack

Why is this revolutionary? Before this, deploying an LLM required dozens of lines of complex configuration – custom Dockerfiles, GPU device mappings, volume mounts for model files, health checks, and intricate startup commands.

Now, those four lines replace all of that complexity. Docker handles downloading the model, configuring the inference engine, setting up GPU access, and exposing the API endpoints automatically. Your other services can connect to the LLM using simple service names, making AI models as easy to use as any other infrastructure component. This transforms AI from a specialized deployment challenge into standard infrastructure-as-code.

Here’s the full compose.yaml file that orchestrates the entire application:

services:
  backend:
    env_file: 'backend.env'
    build:
      context: .
      target: backend
    ports:
      - '8080:8080'
      - '9090:9090'  # Metrics port
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # Add Docker socket access
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:8080/health']
      interval: 3s
      timeout: 3s
      retries: 3
    networks:
      - app-network
    depends_on:
      - llm

  frontend:
    build:
      context: ./frontend
    ports:
      - '3000:3000'
    depends_on:
      backend:
        condition: service_healthy
    networks:
      - app-network

  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - '9091:9090'
    networks:
      - app-network

  grafana:
    image: grafana/grafana:10.1.0
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_DOMAIN=localhost
    ports:
      - '3001:3000'
    depends_on:
      - prometheus
    networks:
      - app-network

  jaeger:
    image: jaegertracing/all-in-one:1.46
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
    ports:
      - '16686:16686'  # UI
      - '4317:4317'    # OTLP gRPC
      - '4318:4318'    # OTLP HTTP
    networks:
      - app-network

  # New LLM service using Docker Compose's model provider
  llm:
    provider:
      type: model
      options:
        model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0}

volumes:
  grafana-data:

networks:
  app-network:
    driver: bridge

This compose.yaml defines a complete microservices architecture for the application with integrated observability tools and Model Runner support:

backend

  • Go-based API server with Docker socket access for container management
  • Implements health checks and exposes both API (8080) and metrics (9090) ports

frontend

  • React-based user interface for an interactive chat experience
  • Waits for backend health before starting to ensure system reliability

prometheus

  • Time-series metrics database for collecting and storing performance data
  • Configured with custom settings for monitoring application behavior

grafana

  • Data visualization platform for metrics with persistent dashboard storage
  • Pre-configured with admin access and connected to the Prometheus data source

jaeger

  • Distributed tracing system for visualizing request flows across services
  • Supports multiple protocols (gRPC/HTTP) with UI on port 16686

How Docker Model Runner integration works

The project integrates with Docker Model Runner through the following mechanisms:

  1. Connection Configuration:
    • Using internal DNS: http://model-runner.docker.internal/engines/llama.cpp/v1/
    • Using TCP via host-side support: localhost:12434
  2. Docker’s Host Networking:
    • The extra_hosts configuration maps host.docker.internal to the host’s gateway IP
  3. Environment Variables (see the configuration sketch after this list):
    • BASE_URL: URL for the model runner
    • MODEL: Model identifier (e.g., ai/llama3.2:1B-Q8_0)
  4. API Communication:
    • The backend formats messages and sends them to Docker Model Runner
    • It then streams tokens back to the frontend in real-time
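
Putting the connection configuration and environment variables together, here is a hedged sketch of how the backend might resolve these settings; the function is illustrative, the defaults are the values listed above, and the real main.go may handle this differently. It assumes an import of os.

// Sketch of resolving the connection settings above; illustrative only.
func modelRunnerConfig() (baseURL, model string) {
    baseURL = os.Getenv("BASE_URL")
    if baseURL == "" {
        // Docker socket route; use http://host.docker.internal:12434/engines/llama.cpp/v1/ for TCP.
        baseURL = "http://model-runner.docker.internal/engines/llama.cpp/v1/"
    }
    model = os.Getenv("MODEL")
    if model == "" {
        model = "ai/llama3.2:1B-Q8_0"
    }
    return baseURL, model
}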

Why this approach excels

Building GenAI applications with Docker Model Runner and comprehensive observability offers several advantages:

  • Privacy and Security: All data stays on your local infrastructure
  • Cost Control: No per-token or per-request API charges
  • Performance Insights: Deep visibility into model behavior and efficiency
  • Developer Experience: Familiar Docker-based workflow with powerful monitoring
  • Flexibility: Easy to experiment with different models and configurations

Conclusion

The genai-model-runner-metrics project demonstrates a powerful approach to building AI-powered applications with Docker Model Runner while maintaining visibility into performance characteristics. By combining local model execution with comprehensive metrics, you get the best of both worlds: the privacy and cost benefits of local execution with the observability needed for production applications.

Whether you’re building a customer support bot, a content generation tool, or a specialized AI assistant, this architecture provides the foundation for reliable, observable, and efficient AI applications. The metrics-driven approach ensures you can continuously monitor and optimize your application, leading to better user experiences and more efficient resource utilization.

Ready to get started? Clone the repository, fire up Docker Desktop, and experience the future of AI development — your own local, metrics-driven GenAI application is just a docker compose up away!

Learn more

Settings Management for Docker Desktop now generally available in the Admin Console

June 4, 2025 at 15:39

We’re excited to announce that Settings Management for Docker Desktop is now Generally Available!  Settings Management can be configured in the Admin Console for customers with a Docker Business subscription.  After a successful Early Access period, this powerful administrative solution has been enhanced with new compliance reporting capabilities, completing our vision for centralized Docker Desktop configuration management at scale through the Admin Console.

To add additional context, Docker provides an enterprise-grade integrated solution suite for container development.  This includes administration and management capabilities that support enterprise needs for security, governance, compliance, scale, ease of use, control, insights, and observability.  The new Settings Management capabilities in the Admin Console for managing Docker Desktop instances are the latest enhancement to this area.  This new feature provides organization administrators with a single, unified interface to configure and enforce security policies, and control Docker Desktop settings across all users in their organization.  Overall, Settings Management eliminates the need to manually configure each individual Docker machine and ensures consistent compliance and security standards company-wide.

Enterprise-grade management for Docker Desktop

First introduced in Docker Desktop 4.36 as an Early Access feature, Docker Desktop Settings Management enables administrators to centrally deploy and enforce settings policies directly from the Admin Console. From the Docker Admin Console, administrators can configure Docker Desktop settings according to a security policy and select users to whom the policy applies. When users start Docker Desktop, those settings are automatically applied and enforced.

With the addition of Desktop Settings Reporting in Docker Desktop 4.40, the solution offers end-to-end management capabilities from policy creation to compliance verification.

This comprehensive approach to settings management delivers on our promise to simplify Docker Desktop administration while ensuring organizational compliance across diverse enterprise environments.

Complete settings management lifecycle

Desktop Settings Management now offers multiple administration capabilities:

  • Admin Console policies: Configure and enforce default Docker Desktop settings directly from the cloud-based Admin Console. There’s no need to distribute admin-settings.json files to local machines via MDM.
  • Quick import: Seamlessly migrate existing configurations from admin-settings.json files
  • Export and share: Easily share policies as JSON files with security and compliance teams
  • Targeted testing: Roll out policies to smaller groups before deploying globally
  • Enhanced security: Benefit from improved signing and reporting methods that reduce the risk of tampering with settings
  • Settings compliance reporting: Track and verify policy application across all developers in your engineering organization

Figure 1: Admin Console Settings Management


New: Desktop Settings Reporting

The newly added settings reporting dashboard in the Admin Console provides administrators with crucial visibility into the compliance status of all users:

  • Real-time settings compliance tracking: Easily monitor which users are compliant with their assigned settings policies.
  • Streamlined troubleshooting: Detailed status information helps administrators diagnose and resolve non-compliance issues.

The settings reporting dashboard is accessible via Admin Console > Docker Desktop > Reporting, offering options to:

  • Search by username or email address
  • Filter by assigned policies
  • Toggle visibility of compliant users to focus on potential issues
  • View detailed compliance information for specific users
  • Download comprehensive compliance data as a CSV file

For non-compliant users, the settings reporting dashboard provides targeted resolution steps to help administrators quickly address issues and ensure organizational compliance.

Figure 2: Admin Console Settings Reporting


Figure 3: Locked settings in Docker Desktop


Enhanced security through centralized management

Desktop Settings Management is particularly valuable for engineering organizations with strict security and compliance requirements. This GA release enables administrators to:

  • Enforce consistent configuration across all Docker Desktop instances, without having to go through complicated, error-prone MDM-based deployments
  • Verify policy application and quickly remediate non-compliant systems
  • Reduce the risk of tampering with local settings
  • Generate compliance reports for security audits

Getting started

To take advantage of Desktop Settings Management:

  1. Ensure your Docker Desktop users are signed in on version 4.40 or later
  2. Log in to the Docker Admin Console
  3. Navigate to Docker Desktop > Settings Management to create policies
  4. Navigate to Docker Desktop > Reporting to monitor compliance

For more detailed information, visit our documentation on Settings Management.

What’s next?

Included with Docker Business, the GA release of Settings Management for Docker Desktop represents a significant milestone in our commitment to delivering enterprise-grade management, governance, and administration tools. We’ll continue to enhance these capabilities based on customer feedback, enterprise needs, and evolving security requirements.

We encourage you to explore Settings Management and let us know how it’s helping you manage Docker Desktop instances more efficiently across your development teams and engineering organization.

We’re thrilled to meet the management and administration needs of our customers with these exciting enhancements, and we want you to stay connected with us as we build even more administration and management capabilities for development teams and engineering organizations.

Learn more

Thank you!


AWS MCP Servers: Revolutionizing AI-Powered Cloud Development with the Model Context Protocol

June 1, 2025 at 17:27
The landscape of AI-assisted development is evolving rapidly, and AWS Labs has introduced a game-changing suite of specialized MCP servers that bring AWS best practices directly to your development workflow. Whether you’re building cloud-native applications, managing infrastructure, or optimizing costs, AWS MCP Servers are transforming how developers interact with AWS services through AI coding assistants. […]

How to successfully run Open WebUI with Docker Model Runner

May 30, 2025 at 11:48
How to Use Open WebUI with Docker Model Runner The landscape of local AI development has evolved dramatically in recent years, with developers increasingly seeking privacy-focused, offline-capable solutions for running Large Language Models (LLMs). Two powerful tools have emerged to address this need: OpenWebUI and Docker Model Runner. This comprehensive guide will explore both technologies […]

Before and After MCP: The Evolution of AI Tool Integration

May 21, 2025 at 11:21
This past weekend, I presented a talk titled “How Docker is revolutionizing the MCP Landscape,” which garnered positive feedback from attendees. During the presentation, I provided an in-depth exploration of both Model Runner capabilities and MCP Toolkit functionalities. For those keeping track of recent AI developments, the rapid pace of innovation is unmistakable, with MCP […]

Which Model to Choose with Docker Model Runner?

May 17, 2025 at 16:52
Choosing the Right Docker Model Runner for Your Needs Docker Model Runner allows you to run AI models locally through Docker Desktop. Here’s a breakdown of the available models and their recommended use cases: 1. ai/smollm2 2. ai/llama3.2 3. ai/llama3.3 4. ai/gemma3 5. ai/phi4 6. ai/mistral and ai/mistral-nemo 7. ai/qwen2.5 8. ai/deepseek-r1-distill-llama How to Choose […]

Docker at Microsoft Build 2025: Where Secure Software Meets Intelligent Innovation

May 15, 2025 at 22:57

This year at Microsoft Build, Docker will blend developer experience, security, and AI innovation with our latest product announcements. Whether you attend in person at the Seattle Convention Center or tune in online, you’ll see how Docker is redefining the way teams build, secure, and scale modern applications.

Docker’s Vision for Developers

At Microsoft Build 2025, Docker’s EVP of Product and Engineering, Tushar Jain, will present the company’s vision for AI-native software delivery, prioritizing simplicity, security, and developer flow. His session will explore how Docker is helping teams adopt AI without complexity and scale confidently from local development to production using the workflows they already trust.

This vision starts with security. Today’s developers are expected to manage a growing number of vulnerabilities, stay compliant with evolving standards, and still ship software on time. Docker helps teams simplify container security by integrating with tools like Microsoft Defender, Azure Container Registry, and AKS. This makes it easier to build secure, production-ready applications without overhauling existing workflows.

This session explores how Docker is streamlining agentic AI development by bringing models and MCP tools together in one familiar environment. Learn how to build agentic AI with your existing workflows and commands. Explore curated AI tools on Docker Hub to get inspired and jumpstart your projects. No steep learning curve is required! With built-in security, access control, and secret management, Docker handles the heavy lifting so you can focus on building smarter, more capable agents.

Don’t miss our follow-up demo session with Principal Engineer Jim Clark. He’ll show how to build an agentic app that uses Docker’s latest AI tools and familiar workflows.

Visit Docker at Booth #400 to see us in action

Throughout the conference, Docker will be live at Booth #400. Drop by for demos, expert walkthroughs, and to try out Docker Hardened Images, Model Runner, and MCP Catalog and Toolkit. Our product, engineering, and DevRel teams will be on-site to answer questions and help you get hands-on.

Party with your fellow Developers at MOPOP

We’re hosting an evening event at one of Seattle’s most iconic pop culture venues to celebrate the launch of our latest tools.

Docker MCP @ MOPOP
Date: Monday, May 19
Time: 7:00–10:00 PM
Location: Museum of Pop Culture, Seattle

Enjoy live demos, food and drinks, access to Docker engineers and leaders, and private after-hours access to the museum. Space is limited. RSVP now to reserve your spot!

How to Build Your First MCP Server in Python

May 15, 2025 at 08:15
The Model Context Protocol (MCP) is an open standard designed to help AI systems maintain context throughout a conversation. It provides a consistent way for AI applications to manage context, making it easier to build reliable AI systems with persistent memory. In this blog, I will show you how to build MCP server from the […]

5 Minutes to Kubernetes MCP Server using Docker MCP Toolkit

By: Ajeet Raina
May 8, 2025 at 09:27
Have you ever wished you could manage your Kubernetes clusters more easily without switching between multiple tools and terminals? Imagine managing K8s clusters using simple natural language commands instead of memorizing dozens of kubectl incantations. Well, the wait is over. Docker’s Model Context Protocol (MCP) Toolkit is revolutionizing how we interact with Kubernetes, bringing AI-powered […]