Tracing External Processes with Akka.NET and OpenTelemetry: Part 1 (The Code)

Distributed tracing is one of the most useful observability tools you can add to your products. Digging into the steps of some process to see what happened and how long everything took gives you a valuable debugging tool for distributed systems. It's usually straightforward to add tracing to HTTP components - you can get a lot of the work for free if you use a service mesh like Istio - but I had an interesting problem where I wanted to monitor processes running in an external system.

I cover the easy(ish) way to do this with HTTP services, and look at the benefits of observability in my 5* Pluralsight course Managing Apps on Kubernetes with Istio.

The system is a risk calculation engine. It has a REST API where you submit work and check on progress, but it doesn't expose much useful instrumentation. When we submit a piece of work it goes through several stages, which range in duration from 5 minutes to several hours. In that time we can poll the API for a progress report, but we just get a snapshot of the current status, we don't get an overall picture of the workflow.

I wanted to capture the stages of processing as a tracing graph, so we could build a dashboard with a list of completed processes, and drill down into the details for each. Something like the classic Jaeger view:

[Image: the classic Jaeger trace view]

Terminology

To make sense of the rest of this post (and the series), some definitions:

  • each job we send to the calculation engine is called a Workflow
  • each Workflow has several stages, represented in the API as a collection of Workflow Entities in the Workflow object

In the real system there are different categories of job, each of which creates a Workflow with a different set of Entities. For this series I'm using a simplified version where every workflow has three Entities which run in sequence:

  • Data Loader, representing the initial setup of data, which typically takes from 2 to 10 minutes
  • Processor, which is the real work and can take from 30 to 240 minutes
  • Output Generator, which transforms the processor output into the required format and can take from 5 to 60 minutes.

I have a dummy API for testing which does nothing but report on Workflow progress, using random durations for each Entity.

Architecture

We've been live with the real system for a while so we have a good understanding of the workload. It's pretty bursty with batches of processing coming in for a few hours, and then going quiet. During the batches we have a fairly small number of workflows, typically under 500. The external system breaks each Processor stage into tens of thousands of tasks (running on Spark), but we're only interested in high-level progress of the Workflow and Entities. We also have a custom-built infrastructure around the external system, to publish events when we submit work, and a backend processor which listens for those events.

So to monitor the processes we need to spin up ~500 watchers which can poll the external system and track workflow progress. The actor model with Akka.NET is a great fit here; I can use one actor for each Workflow - and the Workflow actor in turn manages an actor for each Workflow Entity - and not have to worry about threads, parallelism, timers or managing lifetime. Here's the overall design:

  • register a supervisor process with Akka.NET and listen for "workflow started" event messages (which we already publish to Redis)
  • on receipt of a message, the supervisor creates an actor to monitor that new Workflow
  • each actor polls the external REST API to get the status of the Workflow, and as the stages progress it creates its own actors to monitor the Workflow Entities
  • status updates are recorded in the actors using OpenTelemetry, stopping and starting spans for each Workflow Entity, linked to the overall trace for the Workflow.
I've published a full code sample on GitHub here if you want to see how it all fits together: sixeyed/tracing-external-worflows.

By the end of processing, each Workflow monitor actor will have created three Entity monitor actors, one for each stage. The Workflow owns the overall trace, and in this example the spans for Data Loader and Processor would be complete, while the span for Output Generator would still be running:

[Image: Workflow trace with Data Loader and Processor spans complete and the Output Generator span still running]

Interesting Bits of Code

In the worker a background service runs which creates the supervisor actor and subscribes to Redis, listening for Workflow started messages. When it gets a message it sends it on to the supervisor:

_supervisor = _actorSystem.ActorOf(Props.Create<TSupervisor>(), ActorCollectionName);

_subscriber = _redis.GetSubscriber();
_subscriber.Subscribe(MessageType, (channel, value) =>
{
  var message = JsonSerializer.Deserialize<TStartedMessage>(value);
  _supervisor.Tell(message);
});

(The work happens in base classes because in the real system we actually have a few types of process we monitor - hence the generics - but in the sample code there's just one type).

When the supervisor gets a "started" message, it spins up a monitor actor to watch the Workflow:

 var id = started.GetId();
 var props = DependencyResolver.For(Context.System).Props<TMonitor>();
 
 var monitor = Context.ActorOf(props, id);
 _monitors.Add(id, monitor);
 monitor.Forward(started);

The monitor is loaded with the DependencyResolver, which connects the .NET Dependency Injection framework to Akka.NET. The monitor uses an Akka.NET periodic timer to trigger polling the external API for updates, and an additional one-off timer is also used as a timeout, so if the Workflow stalls (which can happen) we don't keep watching it forever.
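
Both timers are set up with Akka.NET's built-in ITimerScheduler - here's a minimal sketch of how that might look (the class name and the intervals are illustrative, not taken from the sample):

// sketch: the monitor implements IWithTimers, so Akka.NET supplies a scheduler
// whose timers are cleaned up automatically when the actor stops
public class WorkflowMonitor : ReceiveActor, IWithTimers
{
    public ITimerScheduler Timers { get; set; }

    protected override void PreStart()
    {
        // periodic timer to trigger polling the external API (illustrative interval)
        Timers.StartPeriodicTimer("refresh", new MonitorRefresh(),
            TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(30));

        // one-off timer as a timeout, so a stalled Workflow isn't watched forever
        Timers.StartSingleTimer("timeout", new MonitorTimeout(),
            TimeSpan.FromHours(6));
    }
}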

So the Workflow Actor responds to four message types - when the workflow starts, when an update is due, when the update is received and if the timeout fires:

Receive<TStartedMessage>(StartActivity);

ReceiveAsync<MonitorRefresh>(async refresh => await RefreshStatus());

Receive<TUpdatedMessage>(UpdateActivity);

Receive<MonitorTimeout>(_ => Terminate("Monitor timed out"));

When the refresh timer fires, the actor calls the external API to get the current status of the Workflow and its Entities. The client code is generated from the system's OpenAPI spec and then wrapped in services. Those are all registered with standard .NET DI, and every call to the API uses a scoped client:

using (var scope = _serviceProvider.CreateScope())
{
  var workflowService = scope.ServiceProvider.GetRequiredService<WorkflowService>();
  workflow = await workflowService.GetWorkflow(EntityId);
}
_log.Info("Loaded workflow");
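
The registration side is ordinary service collection wiring - something like this sketch, where the configuration key is a placeholder:

// sketch: register the wrapper as a typed client, so each resolution
// gets an HttpClient pointing at the external system's REST API
services.AddHttpClient<WorkflowService>(client =>
{
    client.BaseAddress = new Uri(configuration["CalculationEngine:Url"]); // placeholder key
});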

Each monitor actor tracks state using an Activity object, which is part of the .NET implementation of OpenTelemetry tracing. The Activity gets started when the actor is created, and updated when there's a status update in the response from polling the API. The status updates include the current stage of the process, and for each stage the workflow monitor actor creates a Workflow Entity actor which has its own Activity linked to the parent Activity:

foreach (var entity in workflow.WorkflowEntities)
{
  var entityType = Enum.Parse<EntityType>(entity.Key);
  if (!_entityMonitors.ContainsKey(entityType))
  {
    var entityMonitor = Context.ActorOf(WorkflowEntityMonitor.Props(entityType, Activity), entity.Key);
    _entityMonitors.Add(entityType, entityMonitor);
  }
}
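
Inside the entity monitor, the child span is started from the parent's context, which is what nests the Entity spans under the Workflow trace. A sketch of that, assuming a shared ActivitySource for the app:

// sketch: start the child Activity from the parent Workflow Activity's context;
// MonitoringSource.ActivitySource is an assumed static ActivitySource,
// and _parentActivity is the Workflow's Activity passed in via Props
_activity = MonitoringSource.ActivitySource.StartActivity(
    $"workflow-entity:{_entityType}",       // illustrative span name
    ActivityKind.Internal,
    _parentActivity.Context);               // links the span to the Workflow trace

_activity?.AddTag("entityType", _entityType.ToString());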

When the stage completes, the Workflow Entity actor ends the child Activity - which ends the span - and sends a message to the Workflow monitor actor to say this Entity is finished:

_activity.AddTagIfNew("endTime", entity.EntityEndTime);
if (string.IsNullOrEmpty(entity.EntityErrorMessage))
{
  _activity.SetStatus(ActivityStatusCode.Ok);
}
else
{
  _activity.SetStatus(ActivityStatusCode.Error, entity.EntityErrorMessage);
}

_activity.SetEndTime(entity.EntityEndTime.Value.DateTime);
_activity.Stop();

var ended = new WorkflowEntityEnded(_entityType);
Context.Parent.Tell(ended, Self);

And when all the Entities are done and the whole Workflow is finished, the parent Activity is ended which completes the trace and sends it on to the exporters. In the sample code I've configured the console exporter so traces get published as logs, and the OTLP exporter to send the traces to a real collector so you can visualize them:

[Image: traces exported to the console and to an OTLP collector]
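
The exporter setup is the usual OpenTelemetry .NET bootstrapping - a sketch like this, where the service name, source name and collector address are placeholders:

// sketch: a TracerProvider that listens to the app's ActivitySource and
// exports finished spans to the console and to an OTLP collector
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault()
        .AddService("workflow-monitor"))                  // placeholder service name
    .AddSource("WorkflowMonitoring")                      // must match the ActivitySource name
    .AddConsoleExporter()                                 // traces published as logs
    .AddOtlpExporter(otlp => otlp.Endpoint = new Uri("http://collector:4317")) // placeholder
    .Build();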

In the next post I'll show you how to run the sample app with Docker containers, collecting the traces with Tempo and exploring them with Grafana.

You can't always have Kubernetes: running containers in Azure VM Scale Sets

Rule number 1 for running containers in production: don't run them on individual Docker servers. You want reliability, scale and automated upgrades and for that you need an orchestrator like Kubernetes, or a managed container platform like Azure Container Instances.

If you're choosing between container platforms, my new Pluralsight course Deploying Containerized Applications walks you through the major options.

But the thing about production is: you've got to get your system running, and real systems have technical constraints. Those constraints might mean you have to forget the rules. This post covers a client project I worked on where my design had to forsake rule number 1, and build a scalable and reliable system based on containers running on VMs.

This post is a mixture of architecture diagrams and scripts - just like the client engagement.

When Kubernetes won't do

I was brought in to design the production deployment, and build out the DevOps pipeline. The system was for provisioning bots which join online meetings. The client had run a successful prototype with a single bot running on a VM in Azure.

The goal was to scale the solution to run multiple bots, with each bot running in a Docker container. In production the system would need to scale quickly, spinning up more containers to join meetings on demand - and more hosts to provide capacity for more containers.

So far, so Kubernetes. Each bot needs to be individually addressable, and the connection from the bot to the meeting server uses mutual TLS. The bot has two communication channels - HTTPS for a REST API, and a direct TCP connection for the data stream from the meeting. That can all be done with Kubernetes - Services with custom ports for each bot, Secrets for the TLS certs, and a public IP address for each node.

If you want to learn how to model an app like that, my book Learn Kubernetes in a Month of Lunches is just the thing for you :)

But... The bot uses a Windows-only library to connect to the meeting, and the bot workload involves a lot of video manipulation. So that brought in the technical constraints for the containers:

  • they need to run with GPU access
  • the app uses the Windows video subsystem, and that needs the full (big!) Windows base Docker image.

Right now you can run GPU workloads in Kubernetes, but only in Linux Pods, and you can run containers with GPUs in Azure Container Instances, but only for Linux containers. So we're looking at a valid scenario where orchestration and managed container services won't do.

The alternative - Docker containers on Windows VMs in Azure

You can run Docker containers with GPU access on Windows using the --device flag. You need to have your GPU drivers set up and configured, and then your containers will have GPU access (the DirectX Container Sample walks through it all):

# on Windows 10 20H2:
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 sixeyed/winml-runner:20H2

# on Windows Server LTSC 2019:
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 sixeyed/winml-runner:1809

The container also needs to be running with process isolation - see my container show ECS-W4: Isolation and Versioning in Windows Containers on YouTube for more details on that.

Note - we're talking about the standard Docker Engine here. GPU access for containers used to require an Nvidia fork of Docker, but now GPU access is part of the main Docker runtime.

You can spin up Windows VMs with GPUs in Azure, and have Docker already installed using the Windows Server 2019 Datacenter with Containers VM image. And for the scaling requirements, there are Virtual Machine Scale Sets (VMSS), which let you run multiple instances of the same VM image - where each instance can run multiple containers.

The design I sketched out looked like this:

[Image: design diagram - Azure load balancer in front of a VM Scale Set, with multiple bot containers on each VM]

  • each VM hosts multiple containers, each using custom ports
  • a load balancer spans all the VMs in the scale set
  • load balancer rules are configured for each bot's ports

The idea is to run a minimum number of VMs, providing a stable pool of bot containers. Then we can scale up and add more VMs running more containers as required. Each bot is uniquely addressable within the pool, with a predictable address range, so bots.sixeyed.com:8031 would reach the first container on the third VM and bots.sixeyed.com:8084 would reach the fourth container on the eighth VM.
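
That addressing convention is easy to compute - the port encodes the VM index and the container index (a trivial sketch of the scheme as I've described it, not code from the real system):

// illustrative only: VM 3, container 1 => 8031; VM 8, container 4 => 8084
static int BotPort(int vmIndex, int containerIndex) =>
    8000 + (vmIndex * 10) + containerIndex;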

Using a custom VM image

With this approach the VM is the unit of scale. My assumption was that adding a new VM to provide more bot capacity would take several minutes - too long for a client waiting for a bot to join. So the plan was to run with spare capacity in the bot pool, scaling up the VMSS when the pool of free bots fell below a threshold.

Even so, scaling up to add a new VM had to be a quick operation - not waiting minutes to pull the super-sized Windows base image and extract all the layers. The first step in minimizing scale-up time is to use a custom VM image for the scale set.

A VMSS base image can be set up manually by running a VM and doing whatever you need to do. In this case I could use the Windows Server 2019 image with Docker configured, and then run an Azure extension to install the Nvidia GPU drivers:

# create vm:
az vm create `
  --resource-group $rg `
  --name $vmName `
  --image 'MicrosoftWindowsServer:WindowsServer:2019-Datacenter-Core-with-Containers' `
  --size 'Standard_NC6_Promo' `
  --admin-username $username `
  --admin-password $password

# deploy the nvidia drivers:
az vm extension set `
  --resource-group $rg `
  --vm-name $vmName `
  --name NvidiaGpuDriverWindows `
  --publisher Microsoft.HpcCompute `
  --version 1.3

That's the additional setup for this particular VM.

Then you can create a private base image from the VM, first deallocating and generalizing it:

az vm deallocate --resource-group $rg --name $vmName

az vm generalize --resource-group $rg --name $vmName

az image create --resource-group $rg `
    --name $imageName --source $vmName

The image can be in its own Resource Group - you can use it for VMSSs in other Resource Groups.

Creating the VM Scale Set

Scripting all the setup with the Azure CLI makes for a nice repeatable process - which you can easily put into a GitHub workflow. The az documentation is excellent and you can build up pretty much any Azure solution using just the CLI.

There are a few nice features you can use with VMSS that simplify the rest of the deployment. This abridged command shows the main details:

az vmss create `
   --image $imageId `
   --subnet $subnetId `
   --public-ip-per-vm `
   --public-ip-address-dns-name $vmssPipDomainName `
   --assign-identity `
  ...

That's going to use my custom base image, and attach the VMs in the scale set to a specific virtual network subnet - so they can connect to other components in the client's backend. Each VM will get its own public IP address, and a custom DNS name will be applied to the public IP address for the load balancer across the set.

The VMs will use managed identity - so they can securely use other Azure resources without passing credentials around. You can use az role assignment create to grant access for the VMSS managed identity to ACR.

When the VMSS is created, you can set up the rules for the load balancer, directing the traffic for each port to a specific bot container. This is what makes each container individually addressable - only one container in the VMSS will listen on a specific port. A health probe in the LB tests for a TCP connection on the port, so only the VM which is running that container will pass the probe and be sent traffic.

# health probe:
az network lb probe create `
 --resource-group $rg --lb-name $lbName `
 -n "p$port" --protocol tcp --port $port

# LB rule:
az network lb rule create `
 --resource-group $rgName --lb-name $lbName `
 --frontend-ip-name loadBalancerFrontEnd `
 --backend-pool-name $backendPoolName `
 --probe-name "p$port" -n "p$port" --protocol Tcp `
 --frontend-port $port --backend-port $port

Spinning up containers on VMSS instances

You can use the Azure VM custom script extension to run a script on a VM, and you can trigger that on all the instances in a VMSS. This is the deployment and upgrade process for the bot containers - run a script which pulls the app image and starts the containers.

Up until now the solution is pretty solid. This script is the ugly part, because we're going to manually spin up the containers using docker run:

docker container run -d `
 -p "$($port):443" `
 --restart always `
 --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 `
 $imageName

The real script adds an env-file for config settings, and the run commands are in a loop so we can dynamically set the number of containers to run on each VM. So what's wrong with this? Nothing is managing the containers. The restart flag means Docker will restart the container if the app crashes, and start the containers if the VM restarts, but that's all the additional reliability we'll get.

In the client's solution, they added functionality to their backend API to manage the containers - but that sounds a lot like writing a custom orchestrator...

Moving on from the script, upgrading the VMSS instances is simple to do. The script and any additional assets - env files and certs - can be uploaded to private blob storage, using SAS tokens for the VM to download. You use JSON configuration for the script extension and you can split out sensitive settings.

# set the script on the VMSS:
az vmss extension set `
    --publisher Microsoft.Compute `
    --version 1.10 `
    --name CustomScriptExtension `
    --resource-group $rg `
    --vmss-name $vmss `
    --settings $settings.Replace('"','\"') `
    --protected-settings $protectedSettings.Replace('"','\"')

# updating all instances triggers the script:
az vmss update-instances `
 --instance-ids * `
 --name $vmss `
 --resource-group $rg

Applying the custom script extension updates the model for the VMSS - but it doesn't actually run the script. The next step does that: updating the instances runs the script on each of them, replacing the containers with the new Docker image version.

Code and infra workflows

All the Azure scripts can live in a separate GitHub repo, with secrets added for the az authentication, cert passwords and everything else. The upgrade scripts to deploy the custom script extension and update the VMSS instances can sit in a workflow with a workflow_dispatch trigger and input parameters:

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy: dev, test or prod'     
        required: true
        default: 'dev'
      imageTag:
        description: 'Image tag to deploy, e.g. v1.0-175'     
        required: true
        default: 'v1.0'

The Dockerfile for the image lives in the source code repo with the rest of the bot code. The workflow in that repo builds and pushes the image, and ends by triggering the upgrade deployment in the infra repo - using Ben Coleman's benc-uk/workflow-dispatch action:

deploy-dev:  
  if: ${{ github.ref == 'refs/heads/dev' }}
  runs-on: ubuntu-18.04
  needs: build-teams-bot
  steps:
    - name: Dispatch upgrade workflow
      uses: benc-uk/workflow-dispatch@v1
      with:
        workflow: Upgrade bot containers
        repo: org/infra-repo
        token: ${{ secrets.ACCESS_TOKEN }}
        inputs: '{"environment":"dev", "imageTag":"v1.0-${{github.run_number}}"}'
        ref: master

So the final pipeline looks like this:

  • devs push to the main codebase
  • build workflow triggered - uses Docker to compile the code and package the image
  • if the build is successful, that triggers the publish workflow in the infrastructure repo
  • the publish workflow updates the VM script to use the new image label, and deploys it to the Azure VMSS.

I covered GitHub workflows with Docker in ECS-C2: Continuous Deployment with Docker and GitHub on YouTube

Neat and automated for a reliable and scalable deployment. Just don't tell anyone we're running containers on individual servers, instead of using an orchestrator...

Experimenting with .NET 5 and 6 using Docker containers

The .NET team publish Docker images for every release of the .NET SDK and runtime. Running .NET in containers is a great way to experiment with a new release or try out an upgrade of an existing project, without deploying any new runtimes onto your machine.

In case you missed it, .NET 5 is the latest version of .NET and it's the end of the ".NET Core" and ".NET Framework" names. .NET Framework ends with 4.8, which is the last supported version, and .NET Core ends with 3.1 - evolving into straight ".NET". The first release is .NET 5 and the next version - .NET 6 - will be a long-term support release.

If you're new to the SDK/runtime distinction, check my blog post on the .NET Docker images for Windows and Linux.

Run a .NET 5 development environment in a Docker container

You can use the .NET 5.0 SDK image to run a container with all the build and dev tools installed. These are official Microsoft images, published to MCR (the Microsoft Container Registry).

Create a local folder for the source code and mount it inside a container:

mkdir -p /tmp/dotnet-5-docker

docker run -it --rm \
  -p 5000:5000 \
  -v /tmp/dotnet-5-docker:/src \
  mcr.microsoft.com/dotnet/sdk:5.0

All you need to run this command is Docker Desktop on Windows or macOS, or Docker Community Edition on Linux.

Docker will pull the .NET 5.0 SDK image the first time you use it, and start running a container. If you're new to Docker this is what the options mean:

  • -it connects you to an interactive session inside the container
  • -p publishes a network port, so you can send traffic into the container from your machine
  • --rm deletes the container and its storage when you exit the session
  • -v mounts a local folder from your machine into the container filesystem - when you use /src inside the container it's actually using the /tmp/dotnet-5-docker folder on your machine
  • mcr.microsoft.com/dotnet/sdk:5.0 is the full image name for the 5.0 release of the SDK

And this is how it looks:

[Image: running the .NET 5 SDK container in a terminal]

When the container starts you'll drop into a shell session inside the container, which has the .NET 5.0 runtime and developer tools installed. Now you can start playing with .NET 5, using the Docker container to run commands but working with the source code on your local machine.

In the container session, run this to check the version of the SDK:

dotnet --list-sdks

Run a quickstart project

The dotnet new command creates a new project from a template. There are plenty of templates to choose from; we'll start with a nice simple REST service, using ASP.NET WebAPI.

Initialize and run a new project:

# create a WebAPI project without HTTPS or Swagger:
dotnet new webapi \
  -o /src/api \
  --no-openapi --no-https

# configure ASP.NET to listen on port 5000:
export ASPNETCORE_URLS=http://+:5000

# run the new project:
dotnet run \
  --no-launch-profile \
  --project /src/api/api.csproj

When you run this you'll see lots of output from the build process - NuGet packages being restored and the C# project being compiled. The output ends with the ASP.NET runtime showing the address where it's listening for requests.

Now your .NET 5 app is running inside Docker, and because the container has a published port to the host machine, you can browse to http://localhost:5000/weatherforecast on your machine. Docker sends the request into the container, and the ASP.NET app processes it and sends the response.

Package your app into a Docker image

What you have now isn't fit to ship and run in another environment, but it's easy to get there by building your own Docker image to package your app.

I cover the path to production in my Udemy course Docker for .NET Apps

To ship your app you can use this .NET 5 sample Dockerfile to package it up. You'll do this from your host machine, so you can stop the .NET app in the container with Ctrl-C and then run exit to get back to your command line.

Use Docker to publish and package your WebAPI app:

# verify the source code is on your machine: 
ls /tmp/dotnet-5-docker/api

# switch to your local source code folder:
cd /tmp/dotnet-5-docker

# download the sample Dockerfile:
curl -o Dockerfile https://raw.githubusercontent.com/sixeyed/blog/master/dotnet-5-with-docker/Dockerfile

# use Docker to package from source code:
docker build -t dotnet-api:5.0 .

Now you have your own Docker image, with your .NET 5 app packaged and ready to run. You can edit the code on your local machine and repeat the docker build command to package a new version.

Run your app in a new container

The SDK container you ran is gone, but now you have an application image so you can run your app without any additional setup. Your image is configured with the ASP.NET runtime and when you start a container from the image it will run your app.

Start a new container listening on a different port:

# run a container from your .NET 5 API image:
docker run -d -p 8010:80 --name api dotnet-api:5.0

# check the container logs:
docker logs api

In the logs you'll see the usual ASP.NET startup log entries, telling you the app is listening on port 80. That's port 80 inside the container though, which is published to port 8010 on the host.

The container is running in the background, waiting for traffic. You can try your app again, running this on the host:

curl http://localhost:8010/weatherforecast

When you're done fetching fictional weather forecasts, you can stop and remove your container with a single command:

docker rm -f api

And if you're done experimenting, you can remove your image and the .NET 5 images:

docker image rm dotnet-api:5.0

docker image rm mcr.microsoft.com/dotnet/sdk:5.0

docker image rm mcr.microsoft.com/dotnet/aspnet:5.0

Now your machine is back to the exact same state it was in before you tried .NET 5.

What about .NET 6?

You can do exactly the same thing for .NET 6, just changing the version number in the image tags. .NET 6 is in preview right now but the 6.0 tag is a moving target which gets updated with each new release (check the .NET SDK repository and the ASP.NET runtime repository on Docker Hub for the full version names).

To try .NET 6 you're going to run this for your dev environment:

mkdir -p /tmp/dotnet-6-docker

docker run -it --rm \
  -p 5000:5000 \
  -v /tmp/dotnet-6-docker:/src \
  mcr.microsoft.com/dotnet/sdk:6.0

Then you can repeat the steps to create a new .NET 6 app and run it inside a container.

And in your Dockerfile you'll use the mcr.microsoft.com/dotnet/sdk:6.0 image for the builder stage and the mcr.microsoft.com/dotnet/aspnet:6.0 image for the final application image.

It's a nice workflow to try out a new major or minor version of .NET with no dependencies (other than Docker). You can even put your docker build command into a GitHub workflow and build and package your app from your source code repo - check my YouTube show Continuous Deployment with Docker and GitHub for more information on that.

Build Docker images *quickly* with GitHub Actions and a self-hosted runner

GitHub Actions is a fantastic workflow engine. Combine it with multi-stage Docker builds and you have a CI process defined in a few lines of YAML, which lives inside your Git repo.

I covered this in an episode of my container show - ECS-C2: Continuous Deployment with Docker and GitHub on YouTube

You can use GitHub's own servers (in Azure) to run your workflows - they call them runners and they have Linux and Windows options, with a bunch of software preinstalled (including Docker). There's an allocation of free minutes with your account which means your whole CI (and CD) process can be zero cost.

The downside of using GitHub's runners is that every job starts with a fresh environment. That means no Docker build cache and no pre-pulled images (apart from these Linux base images on the Ubuntu runner and these on Windows). If your Dockerfiles are heavily optimized to use the cache, you'll suddenly lose all that benefit because every run starts with an empty cache.

Speeding up the build farm

You have quite a few options here. Caching Docker builds in GitHub Actions: Which approach is the fastest? 🤔 A research by Thai Pangsakulyanont gives you an excellent overview:

  • using the GitHub Actions cache with BuildKit
  • saving and loading images as TAR files in the Actions cache
  • using a local Docker registry in the build
  • using GitHub's package registry (now GitHub Container Registry).

None of those will work if your base images are huge.

The GitHub Actions cache is only good for 5GB so that's out. Pulling from remote registries will take too long. Image layers are heavily compressed, and when Docker pulls an image it extracts the archive - so gigabytes of pulls will take network transfer time and lots of CPU time (GitHub's hosted runners only have 2 cores).

This blog walks through the alternative approach, using your own infrastructure to run the build - a self-hosted runner. That's your own VM which you'll reuse for every build. You can pre-pull whatever SDK and runtime images you need and they'll always be there, and you get the Docker build cache optimizations without any funky setup.

Self-hosted runners are particularly useful for Windows apps, but the approach is the same for Linux. I dug into this when I was building out a Dockerized CI process for a client, and every build was taking 45 minutes...

Create a self-hosted runner

This is all surprisingly easy. You don't need any special ports open in your VM or a fixed IP address. The GitHub docs to create a self-hosted runner explain it all nicely; the approach is basically:

  • create your VM
  • follow the scripts in your GitHub repo to deploy the runner
  • as part of the setup, you'll configure the runner as a daemon (or Windows Service) so it's always available.

In the Settings...Actions section of your repo on GitHub you'll find the option to add a runner. GitHub supports cross-platform runners, so you can deploy to Windows or macOS on Intel, and Linux on Intel or Arm:

[Image: adding a self-hosted runner in the repo's Settings - Actions page]

That's all straightforward, but you don't want a VM running 24x7 to provide a CI service you'll only use when code gets pushed, so here's the good part: you'll start and stop your VM as part of the GitHub workflow.

Managing the VM in the workflow

My self-hosted runner is an Azure VM. In Azure you only pay for the compute when your VM is running, and you can easily start and stop VMs with az, the Azure command line:

# start the VM:
az vm start -g ci-resource-group -n runner-vm

# deallocate the VM - deallocation means the VM stops and we're not charged for compute:
az vm deallocate -g ci-resource-group -n runner-vm

It's easy enough to add those start and stop steps in your workflow. You can map dependencies so the build step won't happen until the runner has been started. So your GitHub action will have three jobs:

  • job 1 - on GitHub's hosted runner - start the VM for the self-hosted runner
  • job 2 - on the self-hosted runner - execute your super-fast Docker build
  • job 3 - on GitHub's hosted runner - stop the VM

You'll need to create a Service Principal and save the credentials as a GitHub secret so you can log in with the Azure Login action.

The full workflow looks something like this:

name: optimized Docker build

on:
  push:
    paths:
      - "docker/**"
      - "src/**"
      - ".github/workflows/build.yaml"
  schedule:
    - cron: "0 5 * * *"
  workflow_dispatch:

jobs:
  start-runner:
    runs-on: ubuntu-18.04
    steps:
      - name: Login 
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}     
      - name: Start self-hosted runner
        run: |
          az vm start -g ci-rg -n ci-runner

  build:
    runs-on: [self-hosted, docker]
    needs: start-runner
    steps:
      - uses: actions/checkout@master   
      - name: Build images   
        working-directory: docker/base
        run: |
          docker-compose build --pull 
          
  stop-runner:
    runs-on: ubuntu-18.04
    needs: build
    steps:
      - name: Login 
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deallocate self-hosted runner
        run: |
          az vm deallocate -g ci-rg -n ci-runner --no-wait

Here are the notable points:

  • an on-push trigger with path filters, so the workflow will run when a push has a change to source code, or the Docker artifacts or the workflow definition

  • a scheduled trigger so the build runs every day. You should definitely do this with Dockerized builds. SDK and runtime image updates could fail your build, and you want to know that ASAP

  • the build job won't be queued until the start-runner job has finished. It will stay queued until your runner comes online - even if it takes a minute or so for the runner daemon to start. As soon as the runner starts, the build step runs.

Improvement and cost

This build was for a Windows app that uses the graphics subsystem so it needs the full Windows Docker image. That's a big one, so the jobs were taking 45-60 minutes to run every time - no performance advantage from all my best-practice Dockerfile optimization.

With the self-hosted runner, repeat builds take 9-10 minutes. Starting the VM takes 1-2 minutes, and the build stage takes around 5 minutes. If we run 10 builds a day, we'll only be billed for 1 hour of VM compute time.

Your mileage may vary.

Understanding Microsoft's Docker Images for .NET Apps

To run .NET apps in containers you need to have the .NET Framework or .NET Core runtime installed in the container image. That's not something you need to manage yourself, because Microsoft provide Docker images with the runtimes already installed, and you'll use those as the base image to package your own apps.

There are several variations of .NET images, covering different versions and different runtimes. This is your guide to picking the right image for your applications.

I cover this in plenty of detail in my Udemy course Docker for .NET Apps

Using a Base Image

Your app has a bunch of pre-requisites it needs to run, things like an operating system and the language runtime. Typically the owners of the platform package an image with all the pre-reqs installed and publish it on Docker Hub - you'll see Go, Node.js, Java etc. all as official images.

Microsoft do the same for .NET apps, so you can use one of their images as the base image for your container images. They're regularly updated so you can patch your images just by rebuilding them using the latest Microsoft image.

The Docker images for .NET apps are hosted on Microsoft's own container registry, mcr.microsoft.com, but they're still listed on Docker Hub, so that's where you'll go to find them:

Those are umbrella pages which list lots of different variants of the .NET images, splitting them between SDK images and runtime images.

Runtime and SDK Images

You can package .NET apps using a runtime image with a simple Dockerfile like this:

FROM mcr.microsoft.com/dotnet/framework/aspnet:4.8
SHELL ["powershell"]

COPY app.msi /
RUN Start-Process msiexec.exe -ArgumentList '/i', 'C:\app.msi', '/quiet', '/norestart' -NoNewWindow -Wait

(see the full ASP.NET 4.8 app Dockerfile on GitHub).

That's an easy way to get into Docker, taking an existing deployment package (an MSI installer in this case) and installing it using a PowerShell command running in the container.

This example uses the ASP.NET 4.8 base image, so the image you build from this Dockerfile:

  • has IIS, .NET Framework 4.8 and ASP.NET already configured
  • deploys your app from the MSI, which hopefully is an ASP.NET app
  • requires you to have an existing process to create the MSI.

It's a simple approach, but it's problematic: the Dockerfile is the packaging format and it should have all the details about the deployment, but all the installation steps are hidden in the MSI - which is a redundant additional artifact.

Instead you can compile the app from source code using Docker, which is where the SDK images come in. Those SDK images have all the build tools for your apps: MSBuild and NuGet or the dotnet CLI. You use them in a multi-stage Docker build, where stage 1 compiles from source and stage 2 packages the compiled build from stage 1:

# the build stage uses the SDK image:
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 as builder
COPY src /src
RUN dotnet publish -c Release -o /out app.csproj

# the final app uses the runtime image:
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1
COPY --from=builder /out/ .
ENTRYPOINT ["dotnet", "app.dll"]

(see the full ASP.NET Core app Dockerfile on GitHub).

This approach is much better because:

  • the whole build is portable, you just need Docker and the source code to build and run the app, you don't need any .NET SDKs or runtimes installed on your machine
  • your Dockerfile is the deployment script, every step is clear and it's in one place with no additional deployment artifacts
  • your final image has all the runtime pre-reqs it needs, but none of the extra tools - MSBuild etc. are only used in the builder stage

I show you how to use GitHub actions with multi-stage Docker builds in my YouTube show ECS-C2: Continuous Deployment with Docker and GitHub.

There are still lots of variants of the .NET Docker images, so the next job is to work out which ones to use for different apps.

Docker Images for .NET Framework Apps

.NET Framework apps are the simplest because they only run on Windows, and they need the full Windows Server Core feature set (you can't run .NET fx apps on the minimal Nano Server OS). You'll use these for any .NET Framework apps you want to containerize - you can run them using Windows containers in Docker, Docker Swarm and Kubernetes.

All the current .NET Framework Docker images use mcr.microsoft.com/windows/servercore:ltsc2019 as the base image - that's the latest long-term support version of Windows Server Core 2019. Then the .NET images extend from the base Windows image in a hierarchy:

[Image: the .NET Framework Docker image hierarchy, built on Windows Server Core]

The Docker image names are shortened in that graphic, they're all hosted on MCR so they all need to be prefixed with mcr.microsoft.com/. The tag for each is the latest release, so that's a moving target - the :ltsc2019 Windows image is updated every month with new OS patches, so if you use that in your FROM instruction you'll always get the current release.

Microsoft also publish images with more specific tags, so you can pin to a particular release and you know that image won't change in the future. The .NET 4.8 SDK image was updated last year to include .NET 5 updates, and that broke some builds - so you could use mcr.microsoft.com/dotnet/framework/sdk:4.8-20201013-windowsservercore-ltsc2019 in your builder stage, which is pinned to the version before the change.

Here's how you'll choose between the images:

  • windows/servercore:ltsc2019 comes with .NET 4.7, so you can use it for .NET Console apps, but not ASP.NET or .NET Core apps;
  • dotnet/framework/runtime:4.8 has the last supported version of .NET Framework which you can use to run containerized console apps;
  • dotnet/framework/sdk:4.8 has MSBuild, NuGet and all the targeting packs installed, so you should be able to build pretty much any .NET Framework app - you'll use this in the builder stage only;
  • dotnet/framework/aspnet:4.8 has ASP.NET 4.8 installed and configured with IIS, so you can use it for any web apps - WebForms, MVC, Web API etc.

There's also dotnet/framework/wcf:4.8 for running WCF apps. All the Dockerfiles for those images are on GitHub at microsoft/dotnet-framework-docker in the src folder, and there are also a whole bunch of .NET Framework Docker sample apps.

Those images have the 4.x runtime installed, so they can run most .NET Framework apps - everything from 1.x to 4.x but not 3.5. The 3.5 runtime adds another gigabyte or so and it's only needed for some apps, so they have their own set of images:

  • dotnet/framework/runtime:3.5
  • dotnet/framework/sdk:3.5
  • dotnet/framework/aspnet:3.5

Docker Images for .NET Core Apps

.NET Core gets a bit more complicated, because it's a cross-platform framework with different images available for Windows and Linux containers. You'll use the Linux variants as a preference because they're leaner and you don't need to pay OS licences for the host machine.

If you're not sure on the difference with Docker on Windows and Linux, check out ECS-W1: We Need to Talk About Windows Containers on YouTube or enrol on Docker for .NET Apps on Udemy.

The Linux variants are derived from Debian, and they use a similar hierarchical build approach and have the same naming standards as the .NET Framework images:

[Image: the .NET Core Linux Docker image hierarchy]

Again those image names need to be prefixed with mcr.microsoft.com/, and the tags are for the latest LTS release so they're moving targets - right now aspnet:3.1 is an alias for aspnet:3.1.11, but next month the same 3.1 tag will be used for an updated release.

  • dotnet/core/runtime:3.1 has the .NET Core runtime, so you can use it for console apps;
  • dotnet/core/sdk:3.1 has the SDK installed so you'll use it in the builder stage to compile .NET Core apps;
  • dotnet/core/aspnet:3.1 has ASP.NET Core 3.1 installed, so you can use it to run web apps (they're still console apps in .NET Core, but the web runtime has extra dependencies).

.NET Core 3.1 will be supported until December 2022; 2.1 is also an LTS release with support until August 2021, and there are images available for the 2.1 runtime using the same image names and the tag :2.1. You'll find all the Dockerfiles and some sample apps on GitHub in the dotnet/dotnet-docker repo.

There are also Alpine Linux variants of the .NET Core images, which are smaller and leaner still. If you're building images to run on Linux and you're not interested in cross-platform running on Windows, these are preferable - but some dependencies don't work correctly in Alpine (Sqlite is one), so you'll need to test your apps:

  • dotnet/core/runtime:3.1-alpine
  • dotnet/core/sdk:3.1-alpine
  • dotnet/core/aspnet:3.1-alpine

If you do want to build images for Linux and Windows from the same source code and the same Dockerfiles, stick with the generic :3.1 tags - these are multi-architecture images, so there are versions published for Linux, Windows, Intel and Arm 64.

The Windows variants are all based on Nano Server:

[Image: the .NET Core Windows Docker image hierarchy, based on Nano Server]

Note that they have the same image names - with multi-architecture images Docker will pull the correct version to match the OS and CPU you're using. You can check all the available variants by looking at the manifest (you need experimental features enabled in the Docker CLI for this):

docker manifest inspect mcr.microsoft.com/dotnet/core/runtime:3.1

You'll see a chunk of JSON in the response, which includes details of all the variants - here's a trimmed version:

"manifests": [
      {        
         "digest": "sha256:6c67be...",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "digest": "sha256:d50e61...",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
      {
         "digest": "sha256:3eb5f6...",
         "platform": {
            "architecture": "amd64",
            "os": "windows",
            "os.version": "10.0.17763.1697"
         }
      },
      {
         "digest": "sha256:4d53d2d...",
         "platform": {
            "architecture": "amd64",
            "os": "windows",
            "os.version": "10.0.18363.1316"
         }
      }
]

You can see in there that the single image tag dotnet/core/runtime:3.1 has image variants available for Linux on Intel, Linux on Arm and multiple versions of Windows on Intel. As long as you keep your Dockerfiles generic - and don't include OS-specific commands in RUN instructions - you can build your own multi-arch .NET Core apps based on Microsoft's images.

Going Forward - Docker Images for .NET 5

.NET 5 is the new evolution of .NET Core, and there are Docker images for the usual variants on MCR:

  • dotnet/runtime:5.0
  • dotnet/sdk:5.0
  • dotnet/aspnet:5.0

Note that "core" has been dropped from the image name - there's more information in this issue .NET Docker Repo Name Change.

Migrating .NET Core apps to .NET 5 should be a simple change, but remember that 5 is not an LTS version - you'll need to wait for .NET 6, which is LTS (see Microsoft's .NET Core and .NET 5 Support Policy).

Learn Docker in a Month: your week 4 guide

The YouTube series of my book Learn Docker in a Month of Lunches is all done! The final five episodes dig into some more advanced topics which are essential in your container journey, with the theme: getting your containers ready for production.

The whole series is on the Learn Docker in a Month of Lunches playlist and you can find out about the book at the DIAMOL homepage

Episode 16: Optimizing your Docker images for size, speed and security

It's easy to get started with Docker, packaging your apps into images using basic Dockerfiles. But you really need a good understanding of the best practices to save yourself from trouble later on.

Docker images are composed of multiple layers, and layers can be cached and shared between images. That's what makes container images so lightweight - similar apps can share all the common layers. Knowing how the cache works and how to make the best use of it speeds up your build times and reduces image size.

Smaller images mean faster network transfers and less disk usage, but they have a bigger impact too. The space you save is typically from removing software your apps don't actually need to run, and that reduces the attack surface for your application in production - here's how optimization counts:

[Image: how image optimization reduces size and attack surface]

This episode covers all those with recommendations for using multi-stage Dockerfiles to optimize your builds and your runtime images.

Episode 17: Application configuration management in containers

Your container images should be generic - you should run the same image in every environment. The image is the packaging format and one of the main advantages of Docker is that you can be certain the app you deploy to production will work in the same way as the test environment, because it has the exact same set of binaries in the image.

Images are built in the CI process and then deployed by running containers from the image in the test environments and then onto production. Every environment uses the same image, so to allow for different setups in each environment your application needs to be able to read configuration from the container environment.

Docker creates that environment and you can set configuration using environment variables or files. Your application needs to look for settings in known locations and then you can provide those settings in your Dockerfile and container run commands. The typical approach is to use a hierarchy of config sources, which can be set by the container platform and read by the app:

[Image: the config hierarchy - settings read by the app from the image, the platform and the environment]

Episode 17 walks through different variations of that config hierarchy in Docker, using examples in Node.js with node-config, Go with Viper and the standard config systems in .NET Core and Java Spring Boot.
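
For the .NET Core case, that hierarchy is just the standard configuration builder layering - a sketch, with placeholder file paths:

// sketch: default settings baked into the image, an optional override file
// the container platform can mount, and environment variables on top
var config = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json")                           // packaged defaults
    .AddJsonFile("/app/config/override.json", optional: true)  // mounted by the platform
    .AddEnvironmentVariables()                                 // highest precedence
    .Build();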

Episode 18: Writing and managing application logs with Docker

Docker adds a consistent management layer to all your apps - you don't need to know what tech stack they use or how they're configured to know that you start them with docker run and you can monitor them with docker top and docker logs. For that to work, your app needs to fit with the conventions Docker expects.

Container logs are collected from the standard output and standard error streams of the startup process (the one in the CMD or ENTRYPOINT instruction). Modern app platforms run as foreground processes which fits neatly with Docker's expectations. Older apps might write to a different log sink which means you need to relay logs from a file (or other source) to standard out.

You can do that in your Dockerfile without any changes to your application which means old and new apps behave in the same way when they're running in containers:

[Image: relaying application log files to standard output in the container]
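
One way to do the relay is a tiny companion process that follows the app's log file and echoes new lines to standard output - a minimal sketch, assuming the app writes to /logs/app.log:

// minimal sketch of a log relay: tail the file and write each new line to stdout,
// where Docker collects it as container log output
using var reader = new StreamReader(new FileStream(
    "/logs/app.log", FileMode.Open, FileAccess.Read, FileShare.ReadWrite));

reader.BaseStream.Seek(0, SeekOrigin.End);   // start from the current end of the file
while (true)
{
    var line = await reader.ReadLineAsync();
    if (line != null) Console.WriteLine(line);
    else await Task.Delay(500);              // wait for the app to write more
}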

This episode shows you how to get logs out from your apps into containers, and then collect those logs from Docker and forward them to a central system for storage and searching - using the EFK stack (Elasticsearch, Fluentd and Kibana).

Episode 19: Controlling HTTP traffic to containers with a reverse proxy

The series ends with a couple of more in-depth topics which will help you understand how your application architecture might look as you migrate more apps to containers. The first is managing network traffic using a reverse proxy.

A reverse proxy runs in a container and publishes ports to Docker. It's the only publicly-accessible component, all your other containers are internal and can only be reached by other containers on the same Docker network. The reverse proxy receives all incoming traffic and fetches the content from the application container:

[Image: a reverse proxy container receiving all incoming traffic and fetching content from application containers]

Reverse proxies can do a lot of work for you - SSL termination, response caching, sticky sessions - and we see them all in this episode. The demos use two of the most popular technologies in this space - Nginx and Traefik - and help you to evaluate them.

Episode 20: Asynchronous communication with a message queue

This is one of my favourite topics. Message queues let components of your apps communicate asynchronously - decoupling the consumer and the service. It's a great way to add reliability and scale to your architecture, but it used to be complex and expensive before Docker.

Now you can run an enterprise-grade message queue like NATS in a container with minimal effort and start moving your apps to a modern event-driven approach. With a message queue in place you can have multiple features triggering in response to events being created:

[Image: multiple features triggered by events published to a message queue]

This is an enabler for all sorts of patterns, and episode 20 walks you through a few of them: decoupling a web app from the database to increase scale and adding new features without changing the existing application.
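
The publishing side is only a few lines with the NATS .NET client - a sketch, with the server address and subject name as placeholders:

// sketch: publish an event to NATS; any number of subscribers can react to it
using NATS.Client;
using System.Text.Json;

var connection = new ConnectionFactory().CreateConnection("nats://message-queue:4222"); // placeholder address
var payload = JsonSerializer.SerializeToUtf8Bytes(new { ItemId = 42, Event = "created" });
connection.Publish("events.todo.itemsaved", payload);   // illustrative subject name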

This episode also covers Chapter 22 of the book, with some tips on helping you to gain adoption for Docker in your organization.

And next... Elton's Container Show (ECS)

That's all for the book serialization. I'll do the same thing when my new book Learn Kubernetes in a Month of Lunches gets released - it's in the finishing stages now and you can read all the chapters online.

In the meantime I have a new YouTube show all about containers called... Elton's Container Show. It runs once a week and each month I'll focus on a particular topic. The first topic is Windows containers and then I'll move on to orchestration.

You'll find all the info here at https://eltons.show and the first episode is ECS-W1: We Need to Talk About Windows Containers.

Hope to see you there :)

LEARN DOCKER IN ONE MONTH! Your week 3 guide.

My YouTube series to help you learn Docker continued this week with five more episodes. The theme for the week is running at scale with a container orchestrator.

You can find all the episodes on the Learn Docker in a Month of Lunches playlist, and more details about the book at https://diamol.net

Episode 11: Understanding Orchestration - Docker Swarm and Kubernetes

Orchestration is how you run containers at scale in a production environment. You join together a bunch of servers which have Docker running - that's called the cluster. Then you install orchestration software to manage containers for you. When you deploy an application you send a description of the desired state of your app to the cluster. The orchestrator creates containers to run your app and makes sure they keep running even if there are problems with individual servers.

The most common container orchestrators are Docker Swarm and Kubernetes. They have very different ways of modelling your applications and different feature sets, but they work in broadly the same way. They manage the containers running on the server and they expose an API endpoint you use for deployment and administration:

[Image: an orchestrator managing containers across a cluster, exposing an API for deployment and administration]

This episode walks through the main features of orchestration, like high availability and scale, and the abstractions they provide for compute, networking and storage. The exercises all use Docker Swarm which is very simple to set up - it's a one-line command once you have Docker installed:

docker swarm init

And that's it :)

Swarm uses the Docker Compose specification to model applications so it's very simple to get started. In the episode I compare Swarm and Kubernetes and suggest starting with Swarm - even if you plan to use Kubernetes some day. The learning curve for Swarm is much smoother than Kubernetes, and once you know Swarm you'll have a good understanding of orchestration which will help you learn Kubernetes (although you'll need a good book to help you, something like Learn Kubernetes in a Month of Lunches).

Episode 12: Deploying Distributed Apps as Stacks in Docker Swarm

Container orchestrators use a distributed database to store application definitions, and your deployments can include custom data for your application - which you can use for configuration settings. That lets you use the same container images which have passed all your automated and manual tests in your production environment, but with production settings applied.

Promoting the same container image up through environments is how you guarantee your production deployment is using the exact same binaries you've approved in testing. The image contains the whole application stack so there's no danger of the deployment failing on missing dependencies or version mismatches.

To support different behaviour in production you create your config objects in the cluster and reference them in your application manifest (the YAML file, which is in Docker Compose format if you use Swarm, or Kubernetes' own format). When the orchestrator creates a container which references a config object, it surfaces the content of the object as a file in the container filesystem - as in this sample Docker Swarm manifest:

todo-web:
  image: diamol/ch06-todo-list
  ports:
    - 8080:80
  configs:
    - source: todo-list-config
      target: /app/config/config.json

Config objects are used for app configuration and secrets are used in the same way, but for sensitive data. The episode shows you how to use them and includes other considerations for deploying apps in Swarm mode - including setting compute limits on the containers and persisting data in volumes.

Episode 13: Automating Releases with Upgrades and Rollbacks

One of the goals of orchestration is to have the cluster manage the application for you, and that includes providing zero-downtime updates. Swarm and Kubernetes both provide automated rollouts when you upgrade your applications. The containers running your app are gradually updated, with the ones running the old version replaced with new containers running the new version.

During the rollout the new containers are monitored to make sure they're healthy. If there are any issues then the rollout can be stopped or automatically rolled back to the previous version. The episode walks through several updates and rollbacks, demonstrating all the different configurations you can apply to control the process. This is the overall process you'll see when you watch:

LEARN DOCKER IN ONE MONTH! Your week 3 guide.

You can configure all the aspects of the rollout - how many containers are started, whether the new ones are started first or the old ones are removed first, how long to monitor the health of new containers and what to do if they're not healthy. You need a pretty good understanding of all the options so you can plan your rollouts and know how they'll behave if there's a problem.
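
Those options map onto the deploy section of your compose file, or onto flags for an imperative update - a sketch with assumed service and image names:

docker service update \
  --update-parallelism 2 \
  --update-order start-first \
  --update-monitor 60s \
  --update-failure-action rollback \
  --image diamol/ch06-todo-list:v2 \
  todo_todo-web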

Episode 14: Configuring Docker for Secure Remote Access and CI/CD

The Docker command line doesn't do much by itself - it just sends instructions to the Docker API which is running on your machine. The API is part of the Docker Engine, which is what runs and manages containers. You can expose the API to make it remotely available, which means you can manage your Docker servers in the cloud from the Docker command line running on your laptop.

There are good and bad ways to expose the API, and this episode covers the different approaches - including secure access using SSH and TLS. You'll also learn how to use remote machines as Docker Contexts so you can easily apply your credentials and switch between machines with commands like this:

# create a context using TLS certs:
docker context create x --docker "host=tcp://x,ca=/certs/ca.pem,cert=/certs/client-cert.pem,key=/certs/client-key.pem"

# for SSH it would be:
docker context create y --docker "host=ssh://user@y"

# connect:
docker context use x

# now this will list containers running on server x:
docker ps

You'll also see why using environment variables is preferable to docker context use...
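
The short version: docker context use changes a default which applies to every terminal session, while an environment variable only affects the current one - something like this:

# applies only to this shell session:
export DOCKER_CONTEXT=x

# talks to server x:
docker ps

# back to the local engine:
unset DOCKER_CONTEXT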

Remote access is how you enable the Continuous Deployment part of the CI/CD pipeline. This episode uses Play with Docker, an online Docker playground, as a remote target for a deployment running in Jenkins on a local container. It's a pretty slick exercise (if I say so myself), which you can try out in chapter 15 of Learn Docker in a Month of Lunches.

This feature-packed episode ends with an overview of the access model in Docker, and explains why you need to carefully control who has access to your machines.

Episode 15: Building Docker Images that Run Anywhere: Linux, Windows, Intel & Arm

Every exercise in the book uses Docker images which are built to run on any of the main computing architectures - Windows and Linux operating systems, Intel and Arm CPUs - so you can follow along whatever computer you're using. In this episode you'll find out how that works, with multi-architecture images. A multi-arch image is effectively one image tag which has multiple variants:

LEARN DOCKER IN ONE MONTH! Your week 3 guide.

There are two ways to create multi-arch images: build and push all the variants yourself and then push a manifest to the registry which describes the variants, or have the new Docker buildx plugin do it all for you. The episode covers both options with lots of examples and shows you the benefits and limitations of each.
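
As a sketch of the buildx route (the builder and image names are placeholders):

# create a builder that can target multiple platforms:
docker buildx create --name multi --use

# build the variants, push them and publish the manifest in one command:
docker buildx build -t <registry>/<image>:v1 --platform linux/amd64,linux/arm64 --push .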

You'll also learn why multi-arch images are important (example: you could cut your cloud bills almost in half by running containers on Arm instead of Intel on AWS), and the Dockerfile best practices for supporting multi-arch builds.

Coming next

Week 3 covered orchestration and some best practices for production deployments. Week 4 is the final week and the theme is production readiness. You'll learn everything you need to take a Docker proof-of-concept project into production, including optimizing your images, managing app configuration and logging, and controlling traffic to your application with a reverse proxy.

The live stream is running through September 2020 and kicks off on my YouTube channel weekdays at 19:00 UTC. The episodes are available to watch on demand as soon as the session ends.

Hope you can join me on the final leg of your journey to learn Docker in one month :)

Learn Docker in *ONE MONTH*. Your guide to week 2.

Learn Docker in *ONE MONTH*. Your guide to week 2.

I've added five more episodes to my YouTube series Learn Docker in a Month of Lunches. You can find the overview at https://diamol.net and the theme for week 2 is:

Running distributed applications in containers

This follows a tried-and-trusted path for learning Docker which I've used for workshops and training sessions over many years. Week 1 is all about getting used to Docker and the key container concepts. Week 2 is about understanding what Docker enables you to do, focusing on multi-container apps with Docker Compose.

Episode 6: Running multi-container apps with Docker Compose

Docker Compose is a tool for modelling and managing containers. You model your apps in a YAML file and use the Compose command line to start and stop the whole app or individual components.

We start with a nice simple docker-compose.yml file which models a single container plugged into a Docker network called nat:

version: '3.7'

services:
  
  todo-web:
    image: diamol/ch06-todo-list
    ports:
      - "8020:80"
    networks:
      - app-net

networks:
  app-net:
    external:
      name: nat

The services section defines a single component called todo-web which will run in a container. The configuration for the service includes the container image to use, the ports to publish and the Docker network to connect to.

Docker Compose files effectively capture all the options you would put in a docker run command, but using a declarative approach. When you deploy apps with Compose it uses the spec in the YAML file as the desired state. It looks at the current state of what's running in Docker and creates/updates/removes objects (like containers or networks) to get to the desired state.

Here's how to run that app with Docker Compose:

# in this example the network needs to exist first:
docker network create nat

# compose will create the container:
docker-compose up

In the episode you'll see how to build on that, defining distributed apps which run across multiple containers in a single Compose file and exploring the commands to manage them.
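
The management commands you'll use most often are a small set - a quick sketch using the same Compose file:

docker-compose up -d             # create or update the whole app
docker-compose ps                # list the app's containers
docker-compose logs todo-web     # show the logs for one service
docker-compose down              # stop and remove the app's containers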

You'll also learn how you can inject configuration settings into containerized apps using Compose, and you'll understand the limitations of what Compose can do.

Episode 7: Supporting reliability with health checks and dependency checks

Running your apps in containers unlocks new possibilities, like scaling up and down on demand and spreading your workload across a highly-available cluster of machines. But a distributed architecture introduces new failure modes too, like slow connections and timeouts from unresponsive services.

Docker lets you build reliability into your container images so the platform you use can understand if your applications are healthy and take corrective action if they're not. That gets you on the path to self-healing applications, which manage themselves through transient failures.

The first part of this is the Docker HEALTHCHECK instruction which lets you configure Docker to test if your application inside the container is healthy - here's the simplest example in a Dockerfile:

# builder stage omitted in this snippet
FROM diamol/dotnet-aspnet

ENTRYPOINT ["dotnet", "/app/Numbers.Api.dll"]
HEALTHCHECK CMD curl --fail http://localhost/health

WORKDIR /app
COPY --from=builder /out/ .

This is a basic example which uses curl - I've already written about why it's a bad idea to use curl for container healthchecks and you'll see in this episode the better practice of using the application runtime for your healthcheck.

When a container image has a healthcheck specified, Docker runs the command inside the container to see if the application is healthy. If it's unhealthy for a number of successive checks (the default is three) the Docker API raises an event. Then the container platform can take corrective action like restarting the container or removing it and replacing it.
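
You can check the health status for yourself while a container with a healthcheck is running - a quick sketch (the container name is a placeholder):

# the STATUS column shows (healthy) or (unhealthy) alongside the uptime:
docker container ls

# or query the health state directly from the container metadata:
docker inspect --format '{{.State.Health.Status}}' <container-name>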

This episode also covers dependency checks, which you can use in your CMD or ENTRYPOINT instruction to verify your app has all the dependencies it needs before it starts. This is useful in scenarios where components can't do anything useful if they're missing dependencies - without the check the container would start and it would look as if everything was OK.
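
A dependency check can be as simple as a startup script which only launches the app if its dependency responds - a rough sketch (the URL and app path are made up, and the same caveat about curl applies here too):

#!/bin/sh
# exit immediately if the dependency isn't reachable:
curl --fail --silent http://numbers-api/healthz || exit 1

# dependency check passed - hand over to the real app process:
exec dotnet /app/Numbers.Web.dll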

Episode 8: Adding observability with containerized monitoring

Healthchecks and dependency checks take you a good way towards reliability, but you also need to see what's going on inside your containers for situations where things go wrong in unexpected ways.

One of the big issues for ops teams moving from VMs to containers is going from a fairly static environment with a known set of machines to monitor, to a dynamic environment where containers appear and disappear all the time.

This episode introduces the typical monitoring stack for containerized apps using Prometheus. In this architecture all your containers expose metrics in an HTTP endpoint, as do your Docker servers. Prometheus runs in a container too and it collects those metrics and stores them in a time-series database.

Learn Docker in *ONE MONTH*. Your guide to week 2.
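
If you want to try the core idea outside the episode's exercises, a minimal sketch is running Prometheus in a container with your own scrape config mounted in (the config file is assumed to exist in the current folder):

# the Prometheus UI will be on port 9090:
docker container run -d -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus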

You need to add metrics to your app using a Prometheus client library, which will provide a set of runtime metrics (like memory and CPU usage) for free. The client library also gives you a simple way to capture your own metrics.

The demo apps for this module have components in .NET, Go, Java and Node.js so you can see how to use client libraries in different languages and wire them up to Prometheus.

You'll learn how to run a monitoring solution in containers alongside your application, all modelled in Docker Compose. One of the great benefits of containerized monitoring is that you can run the same tools in every environment - so developers can use the same Grafana dashboard that ops use in production.

Episode 9: Running multiple environments with Docker Compose

Docker is great for density - running lots of containers on very little hardware. You particularly see that for non-production environments where you don't need high availability and you don't have a lot of traffic to deal with.

This episode shows you how to run multiple environments - different configurations of the same application - on a single server. It covers more advanced Compose topics like override files and extension fields.
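
Spinning up an extra environment is then just a case of layering an override file and giving the project its own name - a sketch with made-up file names:

# the test override changes ports and config, the project name keeps it separate:
docker-compose -f docker-compose.yml -f docker-compose-test.yml -p todo-test up -d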

You'll also learn how to apply configuration to your apps with different approaches in the Compose file, like this docker-compose.yml example:

version: "3.7"

services:
  todo-web:
    ports:
      - 8089:80
    environment:
      - Database:Provider=Sqlite
    env_file:
      - ./config/logging.debug.env

secrets:
  todo-db-connection:
    file: ./config/empty.json

The episode has lots of examples of how you can use Compose to model different configurations of the same application, while keeping your Compose files clean and easy to manage.

Episode 10: Building and testing applications with Docker and Docker Compose

Containers make it easy to build a Continuous Integration pipeline where every component runs in Docker and you can dispense with build servers that need careful nurturing.

This episode shows you how to build a simple pipeline using techniques you learned in earlier episodes - like multi-stage Dockerfiles - to keep your CI process portable and maintainable.

You'll see how to run a complete build infrastructure in containers, using Gogs as the Git server, Jenkins to trigger the builds, and a local Docker registry in a container. The exercises focus on the patterns rather than the individual tools, so all the setup is done for you.

The easy way to keep your pipeline definitions clean is to use Docker Compose to model the build workflow as well as the runtime spec. This docker-compose-build.yml file is an override file which isolates the build settings, and uses variables and extension fields to reduce duplication:

version: "3.7"

x-args: &args
  args:
    BUILD_NUMBER: ${BUILD_NUMBER:-0}
    BUILD_TAG: ${BUILD_TAG:-local}

services:
  numbers-api:
    build:
      context: numbers
      dockerfile: numbers-api/Dockerfile.v4
      <<: *args

  numbers-web:
    build:
      context: numbers
      dockerfile: numbers-web/Dockerfile.v4
      <<: *args
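
Running that build locally is just a case of layering the override file - a sketch, with example values for the variables:

# the build args come from the environment, or fall back to the defaults in the file:
BUILD_NUMBER=123 BUILD_TAG=v123 docker-compose -f docker-compose.yml -f docker-compose-build.yml build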

Of course you're more likely to use managed services like GitHub and Azure DevOps, but the principle is the same - keep all the logic in your Dockerfiles and your Docker Compose files, and all you need from your service provider is Docker. That makes it super easy to migrate between providers without rewriting all your build scripts.

This episode also covers the secure software supply chain, extending your pipeline to include security scanning and signing so you can be sure the containers you run in production are safe.

Coming next

Week 2 covered multi-container apps, and in week 3 we move on to orchestration. We'll use Docker Swarm which is the production-grade orchestrator built into Docker. It's simpler than Kubernetes (which needs its own series - Learn Kubernetes in a Month of Lunches will be televised in 2021), and it uses the now-familiar Docker Compose specification to model apps.

You can always find the upcoming episode at diamol.net/stream and there are often book giveaways at diamol.net/giveaway.

The live stream is running through September 2020 and kicks off on Elton Stoneman's YouTube channel weekdays at 19:00 UTC. The episodes are available to watch on demand as soon as the session ends.

Hope you can join me and continue to make progress in your Docker journey :)

Learn Docker in one month! Your guide to week 1

Learn Docker in one month! Your guide to week 1

I'm streaming every chapter of my new book Learn Docker in a Month of Lunches on YouTube, and the first week's episodes are out now.

Here's the Learn Docker in a Month of Lunches playlist.

The book is aimed at new and improving Docker users. It starts from the basics - with best practices built in - and moves on to more advanced topics like production readiness, orchestration, observability and HTTP routing.

It's a hands-on introduction to Docker, and the learning path is one I've honed from teaching Docker and Kubernetes at conference workshops and at clients for many years. Every exercise is built to work on Mac, Windows and Arm machines so you can follow along with whatever tech you like.

Episode 1: Understanding Docker and running Hello, World

You start by learning what a container is - a virtualized environment around the processes which make up an application. The container shares the OS kernel of the machine it's running on, which makes Docker super efficient and lightweight.

The very first exercise gets you to run a simple app in a container to see what the virtual environment looks like (all you need to follow along is Docker):

docker container run diamol/ch02-hello-diamol

That container just prints some information and exits. In the rest of the episode (which covers chapters 1 & 2 of the book), you'll learn about different ways to run containers, and how containers are different from other types of virtualization.

Episode 2: Building your own Docker images

You package your application into an image so you can run it in containers. All the exercises so far use images which I've already built, and this chapter introduces the Dockerfile syntax and shows you how to build your own images.

An important best practice is to make your container images portable - so in production you use the exact same Docker image that you've tested and approved in other environments. That means no gaps in the release: the deployment is the same set of binaries that you've successfully deployed in test.

Portable images need to be able to read configuration from the environment, so you can tweak the behaviour of your apps even though the image is the same. You'll run an exercise like this which shows you how to inject configuration settings using environment variables:

docker container run --env TARGET=google.com diamol/ch03-web-ping

Watch the episode to learn how that works, and to understand how images are stored as layers. That affects build speeds, image size and the security profile of your app, so it's fundamental to understanding image optimization.

Episode 3: Packaging apps from source code into Docker images

The Dockerfile syntax is pretty simple and you can use it to copy binaries from your machine into the container image, or download and extract archives from a web address.

But things get more interesting with multi-stage Dockerfiles, which you can use to compile applications from source code using Docker. The exercises in this chapter use Go, Java and Node.js - and you don't need any of those runtimes installed on your machine because all the tools run inside containers.

Here's a sample Dockerfile for a Java app built with Maven:

FROM diamol/maven AS builder

WORKDIR /usr/src/iotd
COPY pom.xml .
RUN mvn -B dependency:go-offline

COPY . .
RUN mvn package

# app
FROM diamol/openjdk

WORKDIR /app
COPY --from=builder /usr/src/iotd/target/iotd-service-0.1.0.jar .

EXPOSE 80
ENTRYPOINT ["java", "-jar", "/app/iotd-service-0.1.0.jar"]

All the tools to download libraries, compile and package the app are in the SDK image - using Maven in this example. The final image is based on a much smaller image with just the Java runtime installed and none of the additional tools.

This approach is supported in all the major languages and it effectively means you can use Docker as your build server and everyone in the team has the exact same toolset because everyone uses the same images.

Episode 4: Sharing images with Docker Hub and other registries

Building your own images means you can run your apps in containers, but if you want to make them available to other people you need to share them on a registry like Docker Hub.

This chapter teaches you about image references and how you can use tags to version your applications. If you've only ever used the latest tag then you should watch this one to understand why that's a moving target and explicit version tags are a much better approach.

You'll push images to Docker Hub in the exercises (you can sign up for a free account with generous usage levels) and you'll also learn how to run your own registry server in a container with a simple command like this:

docker container run -d -p 5000:5000 --restart always diamol/registry

It's usually better to use a managed registry like Docker Hub or Azure Container Registry but it's useful to know how to run a registry in your own organization. It can be a simple backup plan if your provider has an outage or you lose Internet connectivity.
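
As a sketch of how a local registry gets used - you re-tag an existing image with the registry domain and push it (the target tag here is my own choice):

docker image tag diamol/ch03-web-ping localhost:5000/web-ping:v1

docker image push localhost:5000/web-ping:v1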

This chapter also explains the concept of golden images which your organization can use to ensure all your apps are running from an approved set of base images, curated by an infrastructure or security team.

Episode 5: Using Docker volumes for persistent storage

Containers are great for stateless apps, and you can run apps which write data in containers too - as long as you understand where the data goes. This episode walks you through the container filesystem so you can see how the disk which the container sees is actually composed from multiple sources.

Persisting state is all about separating the lifecycle of the data from the lifecycle of the container. When you update your apps in production you'll delete the existing container and replace it with a new one from the new application image. You can attach the storage from the old container to the new one so all the data is there.

You'll learn how to do that with Docker volumes and with bind mounts, in exercises which use a simple to-do list app that stores data in a SQLite database file:

docker container run --name todo1 -d -p 8010:80 diamol/ch06-todo-list

# browse to http://localhost:8010
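
When you replace that container you can carry its storage over to the new one - a rough sketch, assuming the image declares a volume for the database file:

# stop the original container, but don't remove it yet:
docker container stop todo1

# the new container reuses the old container's volumes, so the data survives:
docker container run --name todo2 -d -p 8011:80 --volumes-from todo1 diamol/ch06-todo-list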

There are some limitations to mounting external data sources into the container filesystem which you'll learn all about in the chapter.

Coming next

Week 1 covers the basics of Docker: containers, images, registries and storage. Week 2 looks at running multi-container apps, introducing Docker Compose to manage multiple containers and approaches to deal with distributed applications - including monitoring and healthchecks.

You can always find the upcoming episode at diamol.net/stream and there are often book giveaways at diamol.net/giveaway.

The live stream is running through September 2020 and kicks off on Elton Stoneman's YouTube channel weekdays at 19:00 UTC. The episodes are available to watch on demand as soon as the session ends.

Hope you can join me and make progress in your Docker journey :)

What do Istio, SRE and Jenkins have in common? My latest Pluralsight courses

What do Istio, SRE and Jenkins have in common? My latest Pluralsight courses

One of my goals for 2020 was to publish much more content, and I've started well with a new Pluralsight course every month :) Here's what's new.

And it's FREE April on Pluralsight so you can watch these all now for free!

Managing Apps on Kubernetes with Istio

I've been using Istio for 18 months or so, and I really like it - but it has a pretty steep learning curve, so I was glad to get this course out. It covers the basics of service mesh technology and the patterns it supports, focusing on the key features of Istio.

Managing Apps on Kubernetes with Istio

You'll get more out of this one if you have a working knowledge of Docker and Kubernetes, but if you don't have that this course will still give you a good understanding of service mesh architectures.

It covers:

  • managing service traffic, using Istio for dark launches, blue/green deployments and canary deployments; applying a circuit breaker to keep apps healthy

  • securing communication between services with mutual TLS and certificates managed by Istio; authentication and authorization for services and end-users (using JWT)

  • observation of the service mesh, using telemetry recorded by Istio; visualisation with Kiali, dashboards with Grafana, distributed tracing with Jaeger, logging with Fluentd and Kibana

  • running Istio in production, deploying to Azure Kubernetes Service and managing a cluster with some Istio-enabled apps and others not on the service mesh; migrating existing apps to Istio; understanding failure conditions; evaluating if you need a service mesh.

Istio is a powerful technology and it's complex to learn if you try to dive straight in - this course leads you on a gradual learning journey with lots of demos and attractive diagrams like this:

What do Istio, SRE and Jenkins have in common? My latest Pluralsight courses

Site Reliability Engineering (SRE): The Big Picture

The first time I worked on a DevOps project was in 2014 (the same year I started using Docker), and I was hooked (on Docker too). Since then I've worked with lots of organizations who have tried to adopt DevOps and found the transition very hard.

DevOps is just too big a change for a lot of places, and for them Site Reliability Engineering is likely to be a much better fit.

Site Reliability Engineering (SRE): The Big Picture

This is a high-level big picture course - I think it's the first one I've done with no demos - and it introduces all the principles and practices. It's aimed at helping you evaluate SRE and understand the path to implementing it:

  • comparing SRE with traditional ops and with DevOps, understanding why SRE works for lots of organisations and how SRE is growing

  • identifying and measuring toil, with the goal of restricting toil time to a known amount; automation to eliminate toil and how to prioritize toil-reducing projects

  • using Service Level Objectives and an Error Budget to define product availability; specifying Service Level Indicators and monitoring and alerting on them

  • incident management process and roles; guidance for working on incidents effectively; structuring on-call time and avoiding overload; why you should produce incident postmortems - like this:

What do Istio, SRE and Jenkins have in common? My latest Pluralsight courses

Google's SRE books are the de-facto resource for applying SRE, but their examples can be a bit... Googly. I try to use examples and guidance which are better suited for organizations which are not Google.

Using and Managing Jenkins Plugins

Ah, Jenkins. Myself and several other Pluralsight authors are busy building courses for a new Jenkins learning path, which should take the misery, pain, heartache and difficulty out of using the world's most popular build tool.

My first contribution is aimed at helping you get the most out of plugins - which is a big topic because pretty much all the useful functionality of Jenkins comes from plugins.

Using and Managing Jenkins Plugins

This is partly a walkthrough of some of the must-have plugins for freestyle and pipeline jobs, but it's also about effective management of plugins, so you don't find yourself updating a plugin for a security fix and ending up breaking all of your jobs (seen it happen). It covers:

  • understanding plugin architecture and dependencies; pitfalls using Jenkins's suggested plugins; how plugin updates work and why you should aim to minimize your plugin usage

  • installing and using plugins - three approaches; standard freestyle jobs with manual plugin installation; scripted builds with offline plugin installs; Jenkins running in Docker with automated plugin installs

  • writing your own custom plugin; walkthrough of the Java tools you can use to bootstrap a new Jenkins plugin - showing you don't need to be a Java guru; simple demo plugin with deployment options

  • managing plugins and upgrades - understanding the impact of plugin updates with breaking changes and what happens when updates fail; how to structure repeatable Jenkins deployments like this:

What do Istio, SRE and Jenkins have in common? My latest Pluralsight courses

And there's more to come...

I've got another course planned for April and my book Learn Docker in a Month of Lunches is in the production stage so that will be out soon. Follow @EltonStoneman on Twitter for all the latest news.

Learn Docker in a Month of Lunches - My New Book

Learn Docker in a Month of Lunches - My New Book

You can get access to the first few chapters of my new book right now.

It's called Learn Docker in a Month of Lunches, and the goal is simple: to get you from zero knowledge of containers to the point where you're confident running your own POC with Docker (and knowing what you need to take it to production).

Learn Docker in a Month of Lunches on Manning.com

Here's a shiny promo video which also has a nice discount code:

About the book

Learn Docker in a Month of Lunches (DIAMOL for short) is a fully up-to-date, fully cross-platform, task-focused approach to learning Docker. I've tried hard to keep pre-requisite knowledge to a minimum, so it's not aimed solely at devs or sysadmins or IT pros or architects. Docker spans all those disciplines, and this book should work for you no matter what your background in IT.

Each chapter is full of exercises you can try for yourself, and ends with a lab to challenge you. All the resources I use in the book are online:

  • the source code is all on GitHub at sixeyed/diamol
  • the Docker images are all on Docker Hub in the diamol organization

Every single image is multi-arch, which means it works on Windows, Linux, Intel and Arm. You can follow along using Docker Desktop on Windows 10 or Mac, or Docker Community Edition on Linux - Raspberry Pi users welcome too.

The sample apps I use range across Java, Go, JavaScript and .NET Core. There's not too much focus on source code, but having sample apps in the major languages should help you map the concepts back to your own work.

Week One

The month-of-lunches format works really nicely for learning Docker. Each chapter has a clean focus on one aspect of Docker, and the idea is that you can read the chapter and follow along with the exercises in about an hour. So you can go from zero to wannabe Docker Captain in a month :)

Chapters 1-6 are finished (although I can edit them if you have feedback, which I'd love to hear). They cover the basics of understanding Docker containers and images. If you're new to Docker this will get you up to speed running apps in containers and packaging your own apps in containers.

You can watch the recording of me joining Bret Fisher's Docker and DevOps YouTube show to find a discount code...

If you've already worked with Docker but you're not using multi-stage builds, you haven't optimized your Dockerfiles to make good use of Docker's image layer cache, you've never run your own Docker registry or you can't answer the question "what will Docker do for us?", then there's still plenty for you here.

Chapter 6 says "a Docker volume is a unit of storage - you can think of it as a USB stick for containers", which should be enough to make anyone want to read more.

Week Two

Is all about running distributed applications in containers. You'll learn all about Docker Compose and how to use the Compose file format to define multi-container applications, and the Docker Compose command line to run and manage distributed apps.

Containers in a Docker network can all talk to each other using the container name as the target DNS name, but containers can't see containers in other Docker networks. You can use that with Compose to run multiple versions of the same app on one machine, to serve several test environments or to replicate a production issue locally.

You'll also learn some important production readiness techniques here - chapter 8 introduces health checks and reliability checks, and chapter 9 covers monitoring with Prometheus and Grafana. You'll end a busy week by running Jenkins in a container and building a fully containerized CI pipeline.

Week Three

Orchestration time! In this section you'll learn all about container orchestrators - clusters of servers which run containers for you. You'll use Docker Swarm, the powerful orchestrator built right into Docker, which uses the Compose format (which you're a master of by now) to define apps.

Orchestrators also take care of automated rollouts to upgrade your app, and rollbacks when things go wrong - tying into the health check work you did last week. And when you have a cluster available, you can finish your CI/CD pipeline - which we do in chapter 15.

Last thing for this section is multi-architecture images, packaging apps in Docker which run as Linux and Windows containers, on Intel and Arm. Every single image I use in the book is multi-arch, which is how you can follow along with everything from that $50 Raspberry Pi setup to a $50K Mac Pro rig.

I only cover Kubernetes briefly in this book - Docker is the primary focus. The techniques in this section all apply to Kubernetes too, but you'll need to wait for Learn Kubernetes in a Month of Lunches for the full picture :)

Week Four

Let's get ready for production. Chapter 17 tells you how to optimize your Docker images for size, speed and security. Chapters 18 and 19 show you how to integrate your applications with Docker, so they can read configuration settings from the platform and write log entries back out.

The next two chapters are about taking some fairly advanced architectures - with reverse proxies and message queues - and seeing how easily you can add them to your own apps with Docker, and what benefits they bring.

By the end of the book you'll be ready to make the case for containers in your own organization, and chapter 22 gives you practical advice on doing that, which stakeholders you should involve and what the move to Docker means for them.

Go get it!

You can read the full table of contents and get the digital copy right now. The first draft is all done and we're entering the production stage, so physical copies will be hitting the shelves in a few months.

Pluralsight's FREE Weekend: Your Docker & Kubernetes Course Guide

Pluralsight's FREE Weekend: Your Docker & Kubernetes Course Guide

Pluralsight's entire course library is FREE for everyone this weekend, from Friday 6th September to Sunday 8th September! That's over 6,000 courses covering everything in development, ops and creative.

Pluralsight's FREE Weekend: Your Docker & Kubernetes Course Guide

That's too much for one weekend...

So here's my guide of the top courses to get you up and running in the most wanted tech skills right now - Docker and Kubernetes. These are all highly-rated 4.5 or 5 star courses, and they're not all mine :)

Friday - Get the Basics Down

The ecosystem around containers is huge, but you should start by focusing on three technologies: you need a pretty thorough understanding of Docker and Docker Compose, and a solid introduction to Kubernetes. These courses are a great way to accelerate your learning.

Docker and Kubernetes: The Big Picture

This is the place to learn what containers can do and why Docker and Kubernetes are core technologies in modern application development and deployment. Lasts for 1 hour and 47 minutes, so you can watch this during an (extended) lunch break on Friday.

You might want to check out the rest of Nigel Poulton's courses too.

Docker for Web Developers

A very thorough walkthrough of Docker from the web developer's perspective, with examples in ASP.NET Core and Node.js. Covers Docker, Docker Compose and Kubernetes over 5 hours and 52 minutes - what better way to spend a Friday evening?

One of many great courses by Dan Wahlin.

Saturday - Dockerize Your Own Apps

Knowing the technologies is the first part, and next you need to learn how to map your own applications onto the new architectures and approaches. These two are focused on app modernization, which shows you how apps are architected and monitored in the new stack.

Modernizing .NET Framework Apps with Docker

This brave new world isn't just for brand-new apps. This course shows you how to take older .NET apps and run them in Docker containers on Windows, then break up monolithic architectures into distributed apps across multiple containers. Clocks in at 3 hours and 42 minutes, so it'll take you up to lunchtime.

This is one of mine.

Monitoring Containerized Application Health with Docker

Getting your app running in Docker is usually straightforward - but before you go to production you need monitoring. This course walks through the cloud-native approach to monitoring for Docker containers on Windows and Linux - using Prometheus and Grafana. 2 hours and 43 minutes, so you can watch this after lunch and then have the evening off.

This one's mine too.

Sunday - Understanding the Clusters

Docker Swarm and Kubernetes are the clustering technologies which get you self-healing apps, automated updates, load-balancing, scale and more. Kubernetes is more powerful but more complicated - Docker Swarm lacks some of the features, but is far easier to work with. These two courses will give you a good feel for working with them, which will help you make the choice.

Managing the Kubernetes API Server and Pods

Kubernetes has a lot of moving parts. This course shows you how the Kubernetes API works, and how you can use labels, annotations and namespaces to organise and manage your Kubernetes objects. There's a deep dive on pods too, covering multi-container pods and shared state. 3 hours makes up the perfect Sunday morning :)

One of Anthony Nocentino's Kubernetes courses.

Managing Load Balancing and Scale in Docker Swarm Mode Clusters

Now you've seen some of the complexity of Kubernetes, it's worth checking out the competition - this course shows you how load balancing and scale works in Docker Swarm, which is the native clustering technology built into Docker. You might learn from this course that Swarm does all you need, and you can think about Kubernetes for the future. 1 hour 58 minutes, so you have plenty of time after lunch on Sunday to relax (or make some notes...)

Mine too :)

Docker on Windows: Second Edition - Fully Updated for Windows Server 2019

Docker on Windows: Second Edition - Fully Updated for Windows Server 2019

The Second Edition of my book Docker on Windows is out now. Every code sample and exercise has been fully rewritten to work on Windows Server 2019, and Windows 10 from update 1809.

Get Docker on Windows: Second Edition now on Amazon

If you're not into books, the source code and Dockerfiles are all available on GitHub: sixeyed/docker-on-windows, with some READMEs which are variably helpful.

Or if you prefer something more interactive and hands-on, check out my Docker on Windows Workshop.

Docker Containers on Windows Server 2019

There are at least six things you can do with Docker on Windows Server 2019 that you couldn't do on Windows Server 2016. The base images are much smaller, ports publish on localhost and volume mounts work logically.

You should be using Windows Server 2019 for Docker

(Unless you're already invested in Windows Server 2016 containers, which are still supported by Docker and Microsoft).

Windows Server 2019 is also the minimum version if you want to run Windows containers in a Kubernetes cluster.

Updated Content

The second edition of Docker on Windows takes you on the same journey as the previous edition, starting with the 101 of Windows containers, through packaging .NET Core and .NET Framework apps with Docker, to transforming monolithic apps into modern distributed architectures. And it takes in security, production readiness and CI/CD on the way.

Some new capabilities are unlocked in the latest release of Windows containers, so there's some great new content to take advantage of that.

The monitoring content is especially important. It helps you understand how to bring cloud-native monitoring approaches to .NET apps, with an architecture like this:

Docker on Windows: Second Edition - Fully Updated for Windows Server 2019

If you want to learn more about observability in modern applications, check out my Pluralsight course Monitoring Containerized Application Health with Docker

The Evolution of Windows Containers

It's great to see how much attention Windows containers are getting from Microsoft and Docker. The next big thing is running Windows containers in Kubernetes, which is supported now and available in preview in AKS.

Kubernetes is a whole different learning curve, but it will become increasingly important as more providers support Windows nodes in their Kubernetes offerings. You'll be able to capture your whole application definition in a set of Kube manifests and deploy the same app without any changes on any platform from Docker Enterprise on-prem, to AKS or any other cloud service.

To get there you need to master Docker first, and the latest edition of Docker on Windows helps get you there.

Getting Started with Kubernetes on Windows

Getting Started with Kubernetes on Windows

Kubernetes now supports Windows machines as worker nodes. You can spin up a hybrid cluster and have Windows workloads running in Windows pods, talking to Linux workloads running in Linux pods.

TL;DR - I've scripted all the setup steps to create a three-node hybrid cluster, you'll find them with instructions at sixeyed/k8s-win

Now you can take older .NET Framework apps and run them in Kubernetes, which is going to help you move them to the cloud and modernize the architecture. You start by running your old monolithic app in a Windows container, then you gradually break features out and run them in .NET Core on Linux containers.

Organizations have been taking that approach with Docker Swarm for a few years now. I cover it in my book Docker on Windows and in my Docker Windows Workshop. It's a very successful way to do migrations - breaking up monoliths to get the benefits of cloud-native architecture, without a full-on rewrite project.

Now you can do those migrations with Kubernetes. That opens up some interesting new patterns, and the option of running containerized Windows workloads in a managed Kubernetes service in the cloud.

Cautionary Notes

Windows support in Kubernetes is still pretty new. The feature went GA in Kubernetes 1.14, and the current release is only 1.15. There are a few things you need to be aware of:

  • cloud support is in early stages. You can spin up a hybrid Windows/Linux Kubernetes cluster in AKS, but right now it's in preview.

  • core components are in beta. Pod networking is a separate component in Kubernetes, and the main options - Calico and Flannel - only have beta support for Windows nodes.

  • Windows Server 2019 is the minimum version which supports Kubernetes.

  • the developer experience is not optimal, especially if you're used to using Docker Desktop. You can run Windows containers natively on Windows 10, and even run a single-node Docker Swarm on your laptop to do stack deployments. Kubernetes needs a Linux master node, so your dev environment is going to be multiple VMs.

  • Kubernetes is complicated. It has a wider feature set than Docker Swarm but the cost of all the features is complexity. Application manifests in Kubernetes are about 4X the size of equivalent Docker Compose files, and there are way more abstractions between the entrypoint to your app and the container which ultimately does the work.

If you want to get stuck into Kubernetes on Windows, you need to bear this all in mind and be aware that you're at the front-end right now. The safer, simpler, proven alternative is Docker Swarm - but if you want to see what Kubernetes on Windows can do, now's the time to get started.

Kubernetes on Windows: Cluster Options

Kubernetes has a master-worker architecture for the cluster. The control plane runs on the master, and right now those components are Linux-only. You can't have an all-Windows Kubernetes cluster. Your infrastructure setup will be one or more Linux masters, one or more Windows workers, and one or more Linux workers:

Getting Started with Kubernetes on Windows

For a development environment you can get away with one Linux master and one Windows worker, running any Linux workloads on the master, but an additional Linux worker is preferred.

You can spin up a managed Kubernetes cluster in the cloud - Azure and AWS both offer Windows nodes in preview for their Kubernetes services.

Kubernetes has a pluggable architecture for core components like networking and DNS. The cloud services take care of all that for you, but if you want to get deeper and check out the setup for yourself, you can build a local hybrid cluster with a few VMs.

Tasks for setting up a local cluster

There's already pretty good documentation on how to set up a local Kubernetes cluster with Windows nodes, but there are a lot of manual steps. This post walks through the setup using scripts which automate as much as possible. The original sources are:

If you want to follow along and use my scripts you'll need to have three VMs set up. The scripts are going to install Docker and the Kubernetes components, and then:

  • initialise the Kubernetes master with kubeadm
  • install pod networking, using Flannel
  • add the Windows worker node
  • add the Linux worker node

When that's done you can administer the cluster using kubectl and deploy applications which are all-Windows, all-Linux, or a mixture.

There are still a few manual steps, but the scripts take away most of the pain.

Provision VMs

You'll want three VMs in the same virtual network. My local cluster is for development and testing, so I'm not using any firewalls and all ports are open between the VMs.

I set up the following VMs:

  • k8s-master - which will become the master. Running Ubuntu Server 18.04 with nothing installed except the OpenSSH server;

  • k8s-worker - which will become the Linux worker. Set up in the same way as the master, with Ubuntu 18.04 and OpenSSH;

  • k8s-win-worker - which will be the Windows worker. Set up with Windows Server 2019 Core (the non-UI edition).

I'm using Parallels on the Mac for my VMs, and the IP addresses are all in the 10.211.55.* range.

The scripts assign two network address ranges for Kubernetes: 10.244.0.0/16 and 10.96.0.0/12. You'll need to use a different range for your VM network, or edit the scripts.

Initialise the Linux Master

Kubernetes installation has come far since the days of Kubernetes the Hard Way - the kubeadm tool does most of the hard work.

On the master node you're going to install Docker and kubeadm, along with the kubelet and kubectl using this setup script, running as administrator (that's sudo su on Ubuntu):

sudo su

curl -fsSL https://raw.githubusercontent.com/sixeyed/k8s-win/master/setup/ub-1804-setup.sh | sh

If you're not familiar with the tools: kubeadm bootstraps and administers cluster nodes, kubelet is the agent which runs on each node and manages its workloads, and kubectl is the command line for operating the cluster.

The master setup script initialises the cluster and installs the pod network using Flannel. There's a bunch of this that needs root too:

sudo su

curl -fsSL https://raw.githubusercontent.com/sixeyed/k8s-win/master/setup/ub-1804-master.sh | sh

That gives you a Kubernetes master node. The final thing is to configure kubectl for your local user, so run this configuration script as your normal account (it will ask for your password when it does some sudo):

curl -fsSL https://raw.githubusercontent.com/sixeyed/k8s-win/master/setup/ub-1804-config.sh | sh

The output from that script is the Kubernetes config file. Everything you need to manage the cluster is in that file - including certificates for secure communication using kubectl.

You should copy the config block to the clipboard on your dev machine - you'll need it later to join the worker nodes.

Treat that config file carefully - it has all the connection information anyone needs to control your cluster.

You can verify your cluster nodes now with kubectl get nodes:

elton@k8s-master:~$ kubectl get nodes
NAME         STATUS     ROLES    AGE    VERSION
k8s-master   Ready      master   109m   v1.15.1

Add a Windows Worker Node

There's a bunch of additional setup tasks you need on the Windows node. I'd recommend starting with the setup I blogged about in Getting Started with Docker on Windows Server 2019 - that tells you where to get the trial version download, and how to configure remote access and Windows Updates.

Don't follow the Docker installation steps from that post though - you'll be using scripts for that.

The rest is scripted out from the steps which are described in the Microsoft docs. There are a couple of steps because the installs need a restart.

First run the Windows setup script, which installs Docker and ends by restarting your VM:

iwr -outf win-2019-setup.ps1 https://raw.githubusercontent.com/sixeyed/k8s-win/master/setup/win-2019-setup.ps1

./win-2019-setup.ps1

When your VM restarts, connect again and copy your Kubernetes config into a file on the VM:

mkdir C:\k

notepad C:\k\config

Now you can paste in the configuration file you copied from the Linux master and save it - make sure the filename is config when you save it, and don't let Notepad save it as config.txt.

Windows Server Core does have some GUI functionality. Notepad and Task Manager are useful ones :)

Now you're ready to download the Kubernetes components, join the node to the cluster and start Windows Services for all the Kube pieces. That's done in the Windows worker script. You need to pass a parameter to this one, which is the IP address of your Windows VM (the machine you're running this command on - use ipconfig to find it):

iwr -outf win-2019-worker.ps1 https://raw.githubusercontent.com/sixeyed/k8s-win/master/setup/win-2019-worker.ps1

./win-2019-worker.ps1 -ManagementIP <YOUR_WINDOWS_IP_GOES_HERE>

You'll see various "START" lines in the output there. If all goes well you should be able to run kubectl get nodes on the master and see both nodes ready:

elton@k8s-master:~$ kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
k8s-master       Ready    master   5h23m   v1.15.1
k8s-win-worker   Ready    <none>   75m     v1.15.1

You can leave it there and get working, but Kubernetes doesn't let you schedule user workloads on the master by default. You can specify that it's OK to run Linux pods on the master in your application YAML files, but it's better to leave the master alone and add a second Linux node as a worker.

Add a Linux Worker Node

You're going to start in the same way as the Linux master, installing Docker and the Kubernetes components using the setup script.

SSH into the k8s-worker node and run:

sudo su

curl -fsSL https://raw.githubusercontent.com/sixeyed/k8s-win/master/setup/ub-1804-setup.sh | sh

That gives you all the pieces, and you can use kubeadm to join the cluster. You'll need a token for that which you can get from the join command on the master, so hop back to that SSH session on k8s-master and run:

kubeadm token create --print-join-command

The output from that is exactly what you need to run on the Linux worker node to join the cluster. Your master IP address and token will be unique to the cluster, but the command you want is something like:

sudo kubeadm join 10.211.55.27:6443 --token 28bj3n.l91uy8dskdmxznbn --discovery-token-ca-cert-hash sha256:ff571ad198ae0...

Those tokens are short-lived (24-hour TTL), so you'll need to run the token create command on the master if your token expires when you add a new node.

And that's it. Now you can list the nodes on the master and you'll see a functioning dev cluster:

elton@k8s-master:~$ kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
k8s-master       Ready    master   5h41m   v1.15.1
k8s-win-worker   Ready    <none>   92m     v1.15.1
k8s-worker       Ready    <none>   34s     v1.15.1

You can copy out the Kubernetes config into your local .kube folder on your laptop if you want to manage the cluster directly, rather than logging into the master VM.
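
A rough sketch of that, reusing the user and host names from the earlier examples (your paths may differ, and you can use the master's IP address if the host name doesn't resolve):

# on your laptop - kubectl reads ~/.kube/config by default:
mkdir -p ~/.kube
scp elton@k8s-master:~/.kube/config ~/.kube/config

# now you can manage the cluster without an SSH session:
kubectl get nodes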

Run a Hybrid .NET App

There's a very simple ASP.NET web app I use in my Docker on Windows workshop which you can now run as a distributed app in containers on Kubernetes. There are Kube specs for that app in sixeyed/k8s-win to run SQL Server in a Linux pod and the web app on a Windows pod.

Head back to the master node, or use your laptop if you've set up the Kube config. Clone the repo to get all the YAML files:

git clone https://github.com/sixeyed/k8s-win.git

Now switch to the dwwx directory and deploy all the spec files in the v1 folder:

cd k8s-win/dwwx

kubectl apply -f v1

You'll see output telling you the services and deployments have been created. The images that get used in the pods are quite big, so it will take a few minutes to pull them. When it's done you'll see two pods running:

$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
signup-db-6f95f88795-s5vfv   1/1     Running   0          9s
signup-web-785cccf48-8zfx2   1/1     Running   0          9s

List the services and you'll see the ports where the web application (and SQL Server) are listening:

$ kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP          6h18m
signup-db    NodePort    10.96.65.255     <none>        1433:32266/TCP   19m
signup-web   NodePort    10.103.241.188   <none>        8020:31872/TCP   19m

It's the signup-web service you're interested in - in my case the node port is 31872. So now you can browse to the Kubernetes master node's IP address, on the service port, and the /app endpoint and you'll see this:

Getting Started with Kubernetes on Windows

It's a basic .NET demo app which has a sign-up form for a fake newsletter (currently running on .NET 4.7, but it originally started life as a .NET 2.0 app). Click on Sign Up and you can go and complete the form. The dropdowns you see are populated from reference data in the database, which means the web app - running in a Windows pod - is connected to the database - running in a Linux pod:

Getting Started with Kubernetes on Windows

You can go ahead and fill in the form, and that inserts a row into the database. The SQL Server pod has a service with a node port too (32266 in my case), so you can connect a client like Sqlectron directly to the containerized database (credentials are sa/DockerCon!!!). You'll see the data you saved:

Getting Started with Kubernetes on Windows

Next Steps

This is pretty cool. The setup is still a bit funky (and my scripts come with no guarantees :), but once you have a functioning cluster you can deploy hybrid apps using the same YAMLs you'll use in other clusters.

I'll be adding more hybrid apps to the GitHub repo, so stay tuned to @EltonStoneman on Twitter.

ARMing a Hybrid Docker Swarm: Part 4 - Reverse Proxying with Traefik

ARMing a Hybrid Docker Swarm: Part 4 - Reverse Proxying with Traefik

A reverse proxy quickly becomes a must-have when you're running a container orchestrator with more than a couple of services. Network ports are single-occupancy resources: you can't have multiple processes listening on the same port, whether they're different apps or different containers.

You can't run different web apps in containers all listening on port 80. Docker lets you map ports instead, so if the app inside a container expects traffic on port 80 you can actually publish to a different port on the host - say 8080 - and Docker will receive traffic on port 8080 and send it to port 80 in the container.

That doesn't work for public-facing services though. Non-standard ports are only suitable for private or test environments. Public HTTP clients expect to use port 80, and HTTPS to use port 443. That's where a reverse proxy comes in.

You run the proxy in a container, publishing ports 80 and 443. All your other services run in containers, but they don't publish any ports - only the reverse proxy is accessible. It's the single entrypoint for all your services, and it has rules to relay incoming requests to other containers using private Docker networks.

Reverse Proxy Containers

A reverse proxy is just an HTTP server which doesn't have any of its own content, but fetches content from other servers - containers in this case. You define rules to link the incoming request to the target service. The HTTP host header is the typical example. The reverse proxy can load content for blog.sixeyed.com from the container called blog, and api.sixeyed.com from the container called api.

Reverse proxies can do a lot more than route traffic. They can load-balance requests across containers, or use sticky sessions to keep serving known users from the same container. They can cache responses to reduce the load on your web apps, they can apply SSL so you keep security concerns out of your app code, and they can modify responses, stripping HTTP headers or adding new ones.

Nginx and HAProxy are very popular options for running a reverse proxy, but they aren't explicitly container-aware. They need to be configured with a static list of rules where the targets are container names or service names. Traefik is different - it's all about containers. You run Traefik in a container, and it accesses the Docker API and builds its own rules based on labels you apply to containers or swarm services.

I'll use Traefik to proxy my other applications, based on domain names I have specified in Dnsmasq. For example jenkins.athome.ga is configured as a CNAME for managers.swarm.sixeyed which contains A records for all the swarm manager IPs. Requests for jenkins.athome.ga will be routed to a Traefik container running on one of the managers, and it will proxy content from Jenkins running in a container on one of the workers:

ARMing a Hybrid Docker Swarm: Part 4 - Reverse Proxying with Traefik

Configuring Traefik with Docker Swarm

The Traefik setup is very simple - the docs are excellent and tell you exactly how to configure Traefik to run in Docker Swarm. The Traefik team already publish a multi-arch image with an ARM64 variant, so I can use their image directly:

> docker manifest inspect traefik:1.7.9
...
{
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 739,
         "digest": "sha256:917444e807edd21cb02a979e7d805391c5f6edfc3c02...",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      }

Traefik has a very similar runtime profile to the DNS service I walked through in Part 3 - Name Resolution with Dnsmasq. It has high availability requirements and low compute requirements, so I'm also running it as a global service on my manager nodes.

If you're flinching at these workloads running on the managers, I promise it will only be Traefik and Dnsmasq, and that leaves plenty of RAM on the manager nodes.

Here's my Docker Compose specification for running Traefik. Nothing special - the deployment section specifies the placement and resource constraints:

    deploy:
      mode: global
      resources:
        limits:
          cpus: '0.50'
          memory: 250M
      placement:
        constraints:
          - node.platform.os == linux
          - node.role == manager

The startup command tells Traefik it's running in Docker Swarm and it should connect to the Docker API using the local pipe, so each container connects to the Docker engine where it is running.
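For reference, the rest of my service definition looks roughly like this - a sketch using flag names from the Traefik 1.7 docs, with the frontend network assumed from the routing labels below:

  traefik:
    image: traefik:1.7.9
    command:
      - --docker                                        # enable the Docker provider
      - --docker.swarmMode                              # build rules from swarm service labels
      - --docker.endpoint=unix:///var/run/docker.sock   # the local engine pipe
      - --docker.watch                                  # keep the routing list updated
      - --api                                           # admin web UI on port 8080
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - frontend

In this sketch the admin port isn't published on the host - the UI gets reached through the proxy itself, which is what the labels below set up.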

Traefik queries the Docker API looking for labels which configure front-end routing rules. As services and containers come and go in the swarm, Traefik keeps its routing list up-to-date.

You can see that routing list in Traefik's admin Web UI, which is enabled with the --api flag in the startup command. My compose file includes Traefik routing labels for Traefik itself, so the host name proxy.athome.ga gets served from the admin UI on port 8080:

      labels:
        - "traefik.frontend.rule=Host:proxy.athome.ga"
        - "traefik.port=8080"
        - "traefik.docker.network=frontend"

Running Traefik as a Docker Swarm Service

Deploying the proxy is just a case of deploying the stack:

docker stack deploy -c .\traefik.yml proxy

Now I have DNS resolution with Dnsmasq and reverse proxying with Traefik, which gives a friendly DNS name to a whole bunch of services:

> docker stack ls
NAME                SERVICES            ORCHESTRATOR
blog-dev            1                   Swarm
dns                 1                   Swarm
gogs                1                   Swarm
jenkins             1                   Swarm
nextcloud           1                   Swarm
proxy               1                   Swarm
registry            1                   Swarm
samba               1                   Swarm
squeezebox          1                   Swarm
unifi               1                   Swarm

At the proxy address, I see the routing rules Traefik has built. "Front-ends" are the HTTP request configuration Traefik is looking for, and "back-ends" are the containers the requests get proxied to:

ARMing a Hybrid Docker Swarm: Part 4 - Reverse Proxying with Traefik

I'm all set with the core services in my cluster now - I can run apps anywhere on the workers and access them with a friendly DNS name and a standard HTTP port.

But all the other services I want to run are stateful, which means I need a shared storage solution across the swarm. There are a few options for that, and in the next article I'll walk through my choice to set up GlusterFS.

Articles which may appear in this series:

Part 1 - Hardware and OS

Part 2 - Deploying the Swarm

Part 3 - Name Resolution with Dnsmasq

Part 4 - Reverse Proxying with Traefik

Part 5 - Distributed Storage with GlusterFS

Part 6 - CI/CD with Gogs, Jenkins & Registry

Part 7 - Building and Pushing Multi-Arch Images

Adventures in Docker: Coding on a Remote Browser

Adventures in Docker: Coding on a Remote Browser

This adventure lets you code on your normal dev machine from some other machine, using the browser. It's powered by Docker plus:

  • code-server - VS Code running in a container with browser access
  • ngrok - a public HTTP tunnel

And it's very simple. You just run code-server in a Docker container on your dev machine, mapping volumes for the data you want to be able to access and publishing a port. Then you expose that port to the Internet using ngrok, make a note of the URL and walk out the door.

Headless VS Code in Docker

code-server has done all the hard work here. They publish images to codercom/code-server on Docker Hub. There are only x64 Linux images right now.

Run the latest version with:

docker container run \
 -d -p 8443:8443 \
 -v /scm:/scm \
 codercom/code-server:1.621 \
 --allow-http --no-auth

That command runs VS Code as a headless server in a background container. The options:

  • publish port 8443 on your local machine into the container
  • mount the local /scm directory into /scm on the container
  • run insecure with plain HTTP and no authentication.

You can run insecure on your home network (if you trust folks who can access your network), because you'll add security with ngrok.

Now you can browse to http://localhost:8443 and you have VS Code running in the browser:

Adventures in Docker: Coding on a Remote Browser

That volume mount means all of the code in the scm folder on my machine is accessible from the VS Code instance. And you can fire up a terminal in VS Code in the browser, which means you can do pretty much anything else you need to do. But remember the terminal is executing inside the container, so the environment is the container.

The code-server image comes with a few dev tools installed, like Git and OpenSSL. But there are no dev toolkits, so you can't actually compile or run any code... Unless you're using multi-stage Dockerfiles and official images with SDKs installed. Then all you need is Docker.

Headless VS Code with Docker

code-server doesn't have the Docker CLI installed, but I've added that in my fork. So you can run my version and mount the local Docker socket as a volume, meaning you can use docker commands inside the browser-based VS Code instance:

docker container run \
 -d -p 8443:8443 \
 -v /scm:/scm \
 -v /var/run/docker.sock:/var/run/docker.sock \
 --network code-server \
 sixeyed/code-server:1.621 \
 --allow-http --no-auth

(I'm also using an explicit Docker network here which I created with docker network create code-server. You'll see why in a moment).

Now you can refresh your browser at http://localhost:8443, open up a terminal and run all the docker commands you like (with sudo). The Docker CLI inside the container is connected to the Docker Engine which is running the container.

Let's try out the .NET Core 3.0 preview. You can run these commands in VS Code on the browser. They all execute inside the container:

git clone https://github.com/sixeyed/whoami-dotnet.git
cd whoami-dotnet
sudo docker image build -t sixeyed/whoami-dotnet:3.0-linux-amd64 .
sudo docker container run -d \
 --network code-server --name whoami \
 sixeyed/whoami-dotnet:3.0-linux-amd64

Now the whoami container is running in the same Docker network as the code-server container, so you can reach it by the container name:

curl http://whoami

And here it is for real:

Adventures in Docker: Coding on a Remote Browser

Now this is a usable development environment. The multi-stage Dockerfile I've built starts with a build stage that uses an image with the .NET Core SDK, so there's no need to install any tools in the dev environment. You can do the same with Java, Go etc. - they all have official build images on Docker Hub.

And the final step is to make it publicly available through ngrok.

Remote Headless VS Code with Docker

Sign up for an ngrok account, and follow the setup instructions to install the software and apply your credentials. Now you can expose any local port through a public Internet tunnel - just by running something like ngrok http 8443.

But you can do more with ngrok. This command sets up a tunnel for my VS Code server with HTTPS and basic authentication:

ngrok http -bind-tls=true -auth="elton:DockerCon" 8443

You'll see output like this, telling you the public URL for your tunnel and some stats about who's using it:

Adventures in Docker: Coding on a Remote Browser

The Forwarding line tells you the public URL and the local port it's forwarding. Mine is https://112f7fb1.ngrok.io (you can use custom domains instead of the random ones). That endpoint is HTTPS so it's secure, and it's using basic auth so you'll need the username and password you specified in the ngrok command:

Adventures in Docker: Coding on a Remote Browser

Now you can access the headless VS Code instance running on your dev machine from anywhere on the Internet. Browser sessions are separate, so you can even have multiple people doing different things on the same remote code server:

Adventures in Docker: Coding on a Remote Browser

ngrok collects metrics while it's running, and there's an admin portal you can browse to locally - it shows you all the requests and responses the tunnel has handled:

Adventures in Docker: Coding on a Remote Browser

What about Windows?

I've only had a quick look, but it seems like this could work on Windows. ngrok already has Windows support, and it should just mean packaging code-server with a different Dockerfile.

Sounds like a nice weekend project for someone. Docker on Windows - second edition! will help :)

You can't always have Kubernetes: running containers in Azure VM Scale Sets

You can't always have Kubernetes: running containers in Azure VM Scale Sets

Rule number 1 for running containers in production: don't run them on individual Docker servers. You want reliability, scale and automated upgrades and for that you need an orchestrator like Kubernetes, or a managed container platform like Azure Container Instances.

If you're choosing between container platforms, my new Pluralsight course Deploying Containerized Applications walks you through the major options.

But the thing about production is: you've got to get your system running, and real systems have technical constraints. Those constraints might mean you have to forget the rules. This post covers a client project I worked on where my design had to forsake rule number 1, and build a scalable and reliable system based on containers running on VMs.

This post is a mixture of architecture diagrams and scripts - just like the client engagement.

When Kubernetes won't do

I was brought in to design the production deployment, and build out the DevOps pipeline. The system was for provisioning bots which join online meetings. The client had run a successful prototype with a single bot running on a VM in Azure.

The goal was to scale the solution to run multiple bots, with each bot running in a Docker container. In production the system would need to scale quickly, spinning up more containers to join meetings on demand - and more hosts to provide capacity for more containers.

So far, so Kubernetes. Each bot needs to be individually addressable, and the connection from the bot to the meeting server uses mutual TLS. The bot has two communication channels - HTTPS for a REST API, and a direct TCP connection for the data stream from the meeting. That can all be done with Kubernetes - Services with custom ports for each bot, Secrets for the TLS certs, and a public IP address for each node.
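To make that concrete, here's a minimal sketch of how one bot might be modelled - all the names and port numbers are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: bot-31
spec:
  type: LoadBalancer
  selector:
    app: meeting-bot
    botId: "31"
  ports:
    - name: api          # HTTPS REST API into the bot
      port: 8031
      targetPort: 443
    - name: media        # direct TCP channel for the meeting data stream
      port: 9031
      targetPort: 8445

The mutual TLS cert would live in a Secret, mounted into the bot's Pod spec.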

If you want to learn how to model an app like that, my book Learn Kubernetes in a Month of Lunches is just the thing for you :)

But... The bot uses a Windows-only library to connect to the meeting, and the bot workload involves a lot of video manipulation. So that brought in the technical constraints for the containers:

  • they need to run with GPU access
  • the app uses the Windows video subsystem, and that needs the full (big!) Windows base Docker image.

Right now you can run GPU workloads in Kubernetes, but only in Linux Pods, and you can run containers with GPUs in Azure Container Instances, but only for Linux containers. So we're looking at a valid scenario where orchestration and managed container services won't do.

The alternative - Docker containers on Windows VMs in Azure

You can run Docker containers with GPU access on Windows with the devices flag. You need to have your GPU drivers set up and configured, and then your containers will have GPU access (the DirectX Container Sample walks through it all):

# on Windows 10 20H2:
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 sixeyed/winml-runner:20H2

# on Windows Server LTSC 2019:
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 sixeyed/winml-runner:1809  

The container also needs to be running with process isolation - see my container show ECS-W4: Isolation and Versioning in Windows Containers on YouTube for more details on that.

Note - we're talking about the standard Docker Engine here. GPU access for containers used to require an Nvidia fork of Docker, but now GPU access is part of the main Docker runtime.

You can spin up Windows VMs with GPUs in Azure, and have Docker already installed using the Windows Server 2019 Datacenter with Containers VM image. And for the scaling requirements, there are Virtual Machine Scale Sets (VMSS), which let you run multiple instances of the same VM image - where each instance can run multiple containers.

The design I sketched out looked like this:

You can't always have Kubernetes: running containers in Azure VM Scale Sets

  • each VM hosts multiple containers, each using custom ports
  • a load balancer spans all the VMs in the scale set
  • load balancer rules are configured for each bot's ports

The idea is to run a minimum number of VMs, providing a stable pool of bot containers. Then we can scale up and add more VMs running more containers as required. Each bot is uniquely addressable within the pool, with a predictable address range, so bots.sixeyed.com:8031 would reach the first container on the third VM and bots.sixeyed.com:8084 would reach the fourth container on the eighth VM.

Using a custom VM image

With this approach the VM is the unit of scale. My assumption was that adding a new VM to provide more bot capacity would take several minutes - too long for a client waiting for a bot to join. So the plan was to run with spare capacity in the bot pool, scaling up the VMSS when the pool of free bots fell below a threshold.

Even so, scaling up to add a new VM had to be a quick operation - not waiting minutes to pull the super-sized Windows base image and extract all the layers. The first step in minimizing scale-up time is to use a custom VM image for the scale set.

A VMSS base image can be set up manually by running a VM and doing whatever you need to do. In this case I could use the Windows Server 2019 image with Docker configured, and then run an Azure extension to install the Nvidia GPU drivers:

# create vm:
az vm create `  
  --resource-group $rg `
  --name $vmName `
  --image 'MicrosoftWindowsServer:WindowsServer:2019-Datacenter-Core-with-Containers' `
  --size 'Standard_NC6_Promo' `
  --admin-username $username `
  --admin-password $password

# deploy the nvidia drivers:
az vm extension set `  
  --resource-group $rg `
  --vm-name $vmName `
  --name NvidiaGpuDriverWindows `
  --publisher Microsoft.HpcCompute `
  --version 1.3

That covers the additional setup for this particular VM.

Then you can create a private base image from the VM, first deallocating and generalizing it:

az vm deallocate --resource-group $rg --name $vmName

az vm generalize --resource-group $rg --name $vmName

az image create --resource-group $rg `  
    --name $imageName --source $vmName

The image can be in its own Resource Group - you can use it for VMSSs in other Resource Groups.

Creating the VM Scale Set

Scripting all the setup with the Azure CLI makes for a nice repeatable process - which you can easily put into a GitHub workflow. The az documentation is excellent and you can build up pretty much any Azure solution using just the CLI.

There are a few nice features you can use with VMSS that simplify the rest of the deployment. This abridged command shows the main details:

az vmss create `  
   --image $imageId `
   --subnet $subnetId `
   --public-ip-per-vm `
   --public-ip-address-dns-name $vmssPipDomainName `
   --assign-identity `
  ...

That's going to use my custom base image, and attach the VMs in the scale set to a specific virtual network subnet - so they can connect to other components in the client's backend. Each VM will get its own public IP address, and a custom DNS name will be applied to the public IP address for the load balancer across the set.

The VMs will use managed identity - so they can securely use other Azure resources without passing credentials around. You can use az role assignment create to grant access for the VMSS managed identity to ACR.
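That grant is one more az command - a sketch, assuming you've already captured the identity's principal ID and the registry's resource ID in variables:

az role assignment create `
 --assignee $vmssPrincipalId `
 --role AcrPull `
 --scope $acrResourceId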

When the VMSS is created, you can set up the rules for the load balancer, directing the traffic for each port to a specific bot container. This is what makes each container individually addressable - only one container in the VMSS will listen on a specific port. A health probe in the LB tests for a TCP connection on the port, so only the VM which is running that container will pass the probe and be sent traffic.

# health probe:
az network lb probe create `  
 --resource-group $rg --lb-name $lbName `
 -n "p$port" --protocol tcp --port $port

# LB rule:
az network lb rule create `  
 --resource-group $rgName --lb-name $lbName `
 --frontend-ip-name loadBalancerFrontEnd `
 --backend-pool-name $backendPoolName `
 --probe-name "p$port" -n "p$port" --protocol Tcp `
 --frontend-port $port --backend-port $port

Spinning up containers on VMSS instances

You can use the Azure VM custom script extension to run a script on a VM, and you can trigger that on all the instances in a VMSS. This is the deployment and upgrade process for the bot containers - run a script which pulls the app image and starts the containers.

Up until now the solution is pretty solid. This script is the ugly part, because we're going to manually spin up the containers using docker run:

docker container run -d `  
 -p "$($port):443" `
 --restart always `
 --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 `
 $imageName

The real script adds an env-file for config settings, and the run commands are in a loop so we can dynamically set the number of containers to run on each VM. So what's wrong with this? Nothing is managing the containers. The restart flag means Docker will restart the container if the app crashes, and start the containers if the VM restarts, but that's all the additional reliability we'll get.
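Here's a rough sketch of that loop - the variable names and env-file are hypothetical, and the port calculation assumes the 80XY scheme described earlier:

# $vmIndex identifies this VM; $botsPerVm and $imageName are passed in to the script
for ($i = 1; $i -le $botsPerVm; $i++) {
  $port = 8000 + ($vmIndex * 10) + $i    # e.g. VM 3, container 1 => port 8031

  docker container run -d `
   -p "$($port):443" `
   --restart always `
   --env-file .\bot.env `
   --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 `
   $imageName
}

The restart policy is still the only thing looking after those containers, though.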

In the client's solution, they added functionality to their backend API to manage the containers - but that sounds a lot like writing a custom orchestrator...

Moving on from the script, upgrading the VMSS instances is simple to do. The script and any additional assets - env files and certs - can be uploaded to private blob storage, using SAS tokens for the VM to download. You use JSON configuration for the script extension and you can split out sensitive settings.
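The JSON is simple enough - a sketch with made-up file names, keeping the command line in the protected settings:

$settings = '{
  "fileUris": ["https://<storage-account>.blob.core.windows.net/deploy/start-bots.ps1?<sas-token>"]
}'

$protectedSettings = '{
  "commandToExecute": "powershell -ExecutionPolicy Unrestricted -File start-bots.ps1 -ImageTag v1.0-175"
}'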

# set the script on the VMSS:
az vmss extension set `  
    --publisher Microsoft.Compute `
    --version 1.10 `
    --name CustomScriptExtension `
    --resource-group $rg `
    --vmss-name $vmss `
    --settings $settings.Replace('"','\"') `
    --protected-settings $protectedSettings.Replace('"','\"')

# updating all instances triggers the script:
az vmss update-instances `  
 --instance-ids * `
 --name $vmss `
 --resource-group $rg

Applying the custom script extension updates the model for the VMSS - but it doesn't actually run the script. The next step does that: updating the instances runs the script on each of them, replacing the containers with the new Docker image version.

Code and infra workflows

All the Azure scripts can live in a separate GitHub repo, with secrets added for the az authentication, cert passwords and everything else. The upgrade scripts to deploy the custom script extension and update the VMSS instances can sit in a workflow with a workflow_dispatch trigger and input parameters:

on:  
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy: dev, test or prod'     
        required: true
        default: 'dev'
      imageTag:
        description: 'Image tag to deploy, e.g. v1.0-175'     
        required: true
        default: 'v1.0'

The Dockerfile for the image lives in the source code repo with the rest of the bot code. The workflow in that repo builds and pushes the image, and ends by triggering the upgrade deployment in the infra repo - using Ben Coleman's benc-uk/workflow-dispatch action:

deploy-dev:  
  if: ${{ github.ref == 'refs/heads/dev' }}
  runs-on: ubuntu-18.04
  needs: build-teams-bot
  steps:
    - name: Dispatch upgrade workflow
      uses: benc-uk/workflow-dispatch@v1
      with:
        workflow: Upgrade bot containers
        repo: org/infra-repo
        token: ${{ secrets.ACCESS_TOKEN }}
        inputs: '{"environment":"dev", "imageTag":"v1.0-${{github.run_number}}"}'
        ref: master

So the final pipeline looks like this:

  • devs push to the main codebase
  • build workflow triggered - uses Docker to compile the code and package the image
  • if the build is successful, that triggers the publish workflow in the infrastructure repo
  • the publish workflow updates the VM script to use the new image label, and deploys it to the Azure VMSS.

I covered GitHub workflows with Docker in ECS-C2: Continuous Deployment with Docker and GitHub on YouTube

Neat and automated for a reliable and scalable deployment. Just don't tell anyone we're running containers on individual servers, instead of using an orchestrator...

Experimenting with .NET 5 and 6 using Docker containers

Experimenting with .NET 5 and 6 using Docker containers

The .NET team publish Docker images for every release of the .NET SDK and runtime. Running .NET in containers is a great way to experiment with a new release or try out an upgrade of an existing project, without deploying any new runtimes onto your machine.

In case you missed it, .NET 5 is the latest version of .NET and it's the end of the ".NET Core" and ".NET Framework" names. .NET Framework ends with 4.8, which is the last version (and remains supported), and .NET Core ends with 3.1 - the product evolves into straight ".NET". The first release is .NET 5 and the next version - .NET 6 - will be a long-term support release.

If you're new to the SDK/runtime distinction, check my blog post on the .NET Docker images for Windows and Linux.

Run a .NET 5 development environment in a Docker container

You can use the .NET 5.0 SDK image to run a container with all the build and dev tools installed. These are official Microsoft images, published to MCR (the Microsoft Container Registry).

Create a local folder for the source code and mount it inside a container:

mkdir -p /tmp/dotnet-5-docker

docker run -it --rm \  
  -p 5000:5000 \
  -v /tmp/dotnet-5-docker:/src \
  mcr.microsoft.com/dotnet/sdk:5.0

All you need to run this command is Docker Desktop on Windows or macOS, or Docker Community Edition on Linux.

Docker will pull the .NET 5.0 SDK image the first time you use it, and start running a container. If you're new to Docker this is what the options mean:

  • -it connects you to an interactive session inside the container
  • -p publishes a network port, so you can send traffic into the container from your machine
  • --rm deletes the container and its storage when you exit the session
  • -v mounts a local folder from your machine into the container filesystem - when you use /src inside the container it's actually using the /tmp/dotnet-5-docker folder on your machine
  • mcr.microsoft.com/dotnet/sdk:5.0 is the full image name for the 5.0 release of the SDK

And this is how it looks:

Experimenting with .NET 5 and 6 using Docker containers

When the container starts you'll drop into a shell session inside the container, which has the .NET 5.0 runtime and developer tools installed. Now you can start playing with .NET 5, using the Docker container to run commands but working with the source code on your local machine.

In the container session, run this to check the version of the SDK:

dotnet --list-sdks  

Run a quickstart project

The dotnet new command creates a new project from a template. There are plenty of templates to choose from; we'll start with a nice simple REST service, using ASP.NET WebAPI.

Initialize and run a new project:

# create a WebAPI project without HTTPS or Swagger:
dotnet new webapi \  
  -o /src/api \
  --no-openapi --no-https

# configure ASP.NET to listen on port 5000:
export ASPNETCORE_URLS=http://+:5000

# run the new project:
dotnet run \  
  --no-launch-profile \
  --project /src/api/api.csproj

When you run this you'll see lots of output from the build process - NuGet packages being restored and the C# project being compiled. The output ends with the ASP.NET runtime showing the address where it's listening for requests.

Now your .NET 5 app is running inside Docker, and because the container has a published port to the host machine, you can browse to http://localhost:5000/weatherforecast on your machine. Docker sends the request into the container, and the ASP.NET app processes it and sends the response.

Package your app into a Docker image

What you have now isn't fit to ship and run in another environment, but it's easy to get there by building your own Docker image to package your app.

I cover the path to production in my Udemy course Docker for .NET Apps

To ship your app you can use this .NET 5 sample Dockerfile to package it up. You'll do this from your host machine, so you can stop the .NET app in the container with Ctrl-C and then run exit to get back to your command line.
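The sample Dockerfile is a standard multi-stage build - roughly this shape (a sketch, not the exact file):

# build stage - compile and publish the app using the SDK image:
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS builder
WORKDIR /src
COPY api/ .
RUN dotnet publish -c Release -o /out api.csproj

# final stage - just the ASP.NET runtime plus the published output:
FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=builder /out/ .
ENTRYPOINT ["dotnet", "api.dll"]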

Use Docker to publish and package your WebAPI app:

# verify the source code is on your machine: 
ls /tmp/dotnet-5-docker/api

# switch to your local source code folder:
cd /tmp/dotnet-5-docker

# download the sample Dockerfile:
curl -o Dockerfile https://raw.githubusercontent.com/sixeyed/blog/master/dotnet-5-with-docker/Dockerfile

# use Docker to package from source code:
docker build -t dotnet-api:5.0 .  

Now you have your own Docker image, with your .NET 5 app packaged and ready to run. You can edit the code on your local machine and repeat the docker build command to package a new version.

Run your app in a new container

The SDK container you ran is gone, but now you have an application image so you can run your app without any additional setup. Your image is configured with the ASP.NET runtime and when you start a container from the image it will run your app.

Start a new container listening on a different port:

# run a container from your .NET 5 API image:
docker run -d -p 8010:80 --name api dotnet-api:5.0

# check the container logs:
docker logs api  

In the logs you'll see the usual ASP.NET startup log entries, telling you the app is listening on port 80. That's port 80 inside the container though, which is published to port 8010 on the host.

The container is running in the background, waiting for traffic. You can try your app again, running this on the host:

curl http://localhost:8010/weatherforecast  

When you're done fetching fictional weather forecasts, you can stop and remove your container with a single command:

docker rm -f api  

And if you're done experimenting, you can remove your image and the .NET 5 images:

docker image rm dotnet-api:5.0

docker image rm mcr.microsoft.com/dotnet/sdk:5.0

docker image rm mcr.microsoft.com/dotnet/aspnet:5.0  

Now your machine is back to the exact same state before you tried .NET 5.

What about .NET 6?

You can do exactly the same thing for .NET 6, just changing the version number in the image tags. .NET 6 is in preview right now but the 6.0 tag is a moving target which gets updated with each new release (check the .NET SDK repository and the ASP.NET runtime repository on Docker Hub for the full version names).

To try .NET 6 you're going to run this for your dev environment:

mkdir -p /tmp/dotnet-6-docker

docker run -it --rm \  
  -p 5000:5000 \
  -v /tmp/dotnet-6-docker:/src \
  mcr.microsoft.com/dotnet/sdk:6.0

Then you can repeat the steps to create a new .NET 6 app and run it inside a container.

And in your Dockerfile you'll use the mcr.microsoft.com/dotnet/sdk:6.0 image for the builder stage and the mcr.microsoft.com/dotnet/aspnet:6.0 image for the final application image.

It's a nice workflow to try out a new major or minor version of .NET with no dependencies (other than Docker). You can even put your docker build command into a GitHub workflow and build and package your app from your source code repo - check my YouTube show Continuous Deployment with Docker and GitHub for more information on that.

Build Docker images *quickly* with GitHub Actions and a self-hosted runner

Build Docker images *quickly* with GitHub Actions and a self-hosted runner

GitHub Actions is a fantastic workflow engine. Combine it with multi-stage Docker builds and you have a CI process defined in a few lines of YAML, which lives inside your Git repo.

I covered this in an episode of my container show - ECS-C2: Continuous Deployment with Docker and GitHub on YouTube

You can use GitHub's own servers (in Azure) to run your workflows - they call them runners and they have Linux and Windows options, with a bunch of software preinstalled (including Docker). There's an allocation of free minutes with your account which means your whole CI (and CD) process can be zero cost.

The downside of using GitHub's runners is that every job starts with a fresh environment. That means no Docker build cache and no pre-pulled images (apart from these Linux base images on the Ubuntu runner and these on Windows). If your Dockerfiles are heavily optimized to use the cache, you'll suddenly lose all that benefit because every run starts with an empty cache.

Speeding up the build farm

You have quite a few options here. Caching Docker builds in GitHub Actions: Which approach is the fastest? 🤔 A research by Thai Pangsakulyanont gives you an excellent overview:

  • using the GitHub Actions cache with BuildKit
  • saving and loading images as TAR files in the Actions cache
  • using a local Docker registry in the build
  • using GitHub's package registry (now GitHub Container Registry).

None of those will work if your base images are huge.

The GitHub Actions cache is only good for 5GB so that's out. Pulling from remote registries will take too long. Image layers are heavily compressed, and when Docker pulls an image it extracts the archive - so gigabytes of pulls will take network transfer time and lots of CPU time (GitHub's hosted runners only have 2 cores).

This blog walks through the alternative approach, using your own infrastructure to run the build - a self-hosted runner. That's your own VM which you'll reuse for every build. You can pre-pull whatever SDK and runtime images you need and they'll always be there, and you get the Docker build cache optimizations without any funky setup.
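Priming the runner is just a case of pulling the big images once on the VM - these tags are examples, use whatever your Dockerfiles reference:

# e.g. for a Windows runner building full-framework apps:
docker pull mcr.microsoft.com/windows:1809
docker pull mcr.microsoft.com/dotnet/framework/sdk:4.8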

Self-hosted runners are particularly useful for Windows apps, but the approach is the same for Linux. I dug into this when I was building out a Dockerized CI process for a client, and every build was taking 45 minutes...

Create a self-hosted runner

This is all surprisingly easy. You don't need any special ports open in your VM or a fixed IP address. The GitHub docs to create a self-hosted runner explain it all nicely; the approach is basically:

  • create your VM
  • follow the scripts in your GitHub repo to deploy the runner (sketched after this list)
  • as part of the setup, you'll configure the runner as a daemon (or Windows Service) so it's always available.
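The deployment script is only a few commands - this is a sketch for a Linux runner, where GitHub's UI gives you the exact download URL and a one-time registration token:

# register the runner against your repo, with a custom label you can target in workflows:
./config.sh --url https://github.com/<your-org>/<your-repo> --token <RUNNER_TOKEN> --labels docker

# install and start the runner as a service (daemon):
sudo ./svc.sh install
sudo ./svc.sh start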

In the Settings...Actions section of your repo on GitHub you'll find the option to add a runner. GitHub supports cross-platform runners, so you can deploy to Windows or macOS on Intel, and Linux on Intel or Arm:

Build Docker images *quickly* with GitHub Actions and a self-hosted runner

That's all straightforward, but you don't want a VM running 24x7 to provide a CI service you'll only use when code gets pushed, so here's the good part: you'll start and stop your VM as part of the GitHub workflow.

Managing the VM in the workflow

My self-hosted runner is an Azure VM. In Azure you only pay for the compute when your VM is running, and you can easily start and stop VMs with az, the Azure command line:

# start the VM:
az vm start -g ci-resource-group -n runner-vm

# deallocate the VM - deallocation means the VM stops and we're not charged for compute:
az vm deallocate -g ci-resource-group -n runner-vm

It's easy enough to add those start and stop steps in your workflow. You can map dependencies so the build step won't happen until the runner has been started. So your GitHub action will have three jobs:

  • job 1 - on GitHub's hosted runner - start the VM for the self-hosted runner
  • job 2 - on the self-hosted runner - execute your super-fast Docker build
  • job 3 - on GitHub's hosted runner - stop the VM

You'll need to create a Service Principal and save the credentials as a GitHub secret so you can log in with the Azure Login action.
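Creating that Service Principal is a one-liner with az - the JSON it prints is what goes into the AZURE_CREDENTIALS secret (the name, subscription and resource group here are placeholders):

az ad sp create-for-rbac \
  --name "github-runner-ci" \
  --role contributor \
  --scopes /subscriptions/<subscription-id>/resourceGroups/ci-rg \
  --sdk-auth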

The full workflow looks something like this:

name: optimized Docker build

on:  
  push:
    paths:
      - "docker/**"
      - "src/**"
      - ".github/workflows/build.yaml"
  schedule:
    - cron: "0 5 * * *"
  workflow_dispatch:

jobs:  
  start-runner:
    runs-on: ubuntu-18.04
    steps:
      - name: Login 
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}     
      - name: Start self-hosted runner
        run: |
          az vm start -g ci-rg -n ci-runner

  build:
    runs-on: [self-hosted, docker]
    needs: start-runner
    steps:
      - uses: actions/checkout@master   
      - name: Build images   
        working-directory: docker/base
        run: |
          docker-compose build --pull 

  stop-runner:
    runs-on: ubuntu-18.04
    needs: build
    steps:
      - name: Login 
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deallocate self-hosted runner
        run: |
          az vm deallocate -g ci-rg -n ci-runner --no-wait

Here are the notable points:

  • an on-push trigger with path filters, so the workflow will run when a push has a change to source code, or the Docker artifacts or the workflow definition

  • a scheduled trigger so the build runs every day. You should definitely do this with Dockerized builds. SDK and runtime image updates could fail your build, and you want to know that ASAP

  • the build job won't be queued until the start-runner job has finished. It will stay queued until your runner comes online - even if it takes a minute or so for the runner daemon to start. As soon as the runner starts, the build step runs.

Improvement and cost

This build was for a Windows app that uses the graphics subsystem so it needs the full Windows Docker image. That's a big one, so the jobs were taking 45-60 minutes to run every time - no performance advantage from all my best-practice Dockerfile optimization.

With the self-hosted runner, repeat builds take 9-10 minutes. Starting the VM takes 1-2 minutes, and the build stage takes around 5 minutes. If we run 10 builds a day, we'll only be billed for 1 hour of VM compute time.

Your mileage may vary.

❌