
Running DeepSeek R1 on Azure Kubernetes Service (AKS) using Ollama

By: Adesoji Alu
March 11, 2025 at 12:28
Introduction DeepSeek is an advanced open-source large language model (LLM) that has gained significant popularity in the developer community. When it is paired with Ollama, an easy-to-use framework for running and managing LLMs locally, and deployed on Azure Kubernetes Service (AKS), the result is a powerful, scalable, and cost-effective environment for AI applications. This blog post walks […]

You can't always have Kubernetes: running containers in Azure VM Scale Sets

March 9, 2021 at 15:51

Rule number 1 for running containers in production: don't run them on individual Docker servers. You want reliability, scale, and automated upgrades, and for that you need an orchestrator like Kubernetes, or a managed container platform like Azure Container Instances.

If you're choosing between container platforms, my new Pluralsight course Deploying Containerized Applications walks you through the major options.

But the thing about production is: you've got to get your system running, and real systems have technical constraints. Those constraints might mean you have to forget the rules. This post covers a client project I worked on where my design had to forsake rule number 1, and build a scalable and reliable system based on containers running on VMs.

This post is a mixture of architecture diagrams and scripts - just like the client engagement.

When Kubernetes won't do

I was brought in to design the production deployment, and build out the DevOps pipeline. The system was for provisioning bots which join online meetings. The client had run a successful prototype with a single bot running on a VM in Azure.

The goal was to scale the solution to run multiple bots, with each bot running in a Docker container. In production the system would need to scale quickly, spinning up more containers to join meetings on demand - and more hosts to provide capacity for more containers.

So far, so Kubernetes. Each bot needs to be individually addressable, and the connection from the bot to the meeting server uses mutual TLS. The bot has two communication channels - HTTPS for a REST API, and a direct TCP connection for the data stream from the meeting. That can all be done with Kubernetes - Services with custom ports for each bot, Secrets for the TLS certs, and a public IP address for each node.

If you want to learn how to model an app like that, my book Learn Kubernetes in a Month of Lunches is just the thing for you :)

But... The bot uses a Windows-only library to connect to the meeting, and the bot workload involves a lot of video manipulation. So that brought in the technical constraints for the containers:

  • they need to run with GPU access
  • the app uses the Windows video subsystem, and that needs the full (big!) Windows base Docker image.

Right now you can run GPU workloads in Kubernetes, but only in Linux Pods, and you can run containers with GPUs in Azure Container Instances, but only for Linux containers. So we're looking at a valid scenario where orchestration and managed container services won't do.

The alternative - Docker containers on Windows VMs in Azure

You can run Docker containers with GPU access on Windows with the --device flag. You need to have your GPU drivers set up and configured, and then your containers will have GPU access (the DirectX Container Sample walks through it all):

# on Windows 10 20H2:
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 sixeyed/winml-runner:20H2

# on Windows Server LTSC 2019:
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 sixeyed/winml-runner:1809

The container also needs to be running with process isolation - see my container show ECS-W4: Isolation and Versioning in Windows Containers on YouTube for more details on that.

Note - we're talking about the standard Docker Engine here. GPU access for containers used to require an Nvidia fork of Docker, but now GPU access is part of the main Docker runtime.

You can spin up Windows VMs with GPUs in Azure, and have Docker already installed using the Windows Server 2019 Datacenter with Containers VM image. And for the scaling requirements, there are Virtual Machine Scale Sets (VMSS), which let you run multiple instances of the same VM image - where each instance can run multiple containers.

The design I sketched out looked like this:

[Architecture diagram: containers in an Azure VM Scale Set behind a load balancer]

  • each VM hosts multiple containers, each using custom ports
  • a load balancer spans all the VMs in the scale set
  • load balancer rules are configured for each bot's ports

The idea is to run a minimum number of VMs, providing a stable pool of bot containers. Then we can scale up and add more VMs running more containers as required. Each bot is uniquely addressable within the pool, with a predictable address range, so bots.sixeyed.com:8031 would reach the first container on the third VM and bots.sixeyed.com:8084 would reach the fourth container on the eighth VM.
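That addressing scheme can be computed directly. A minimal sketch, assuming ports follow the pattern 8000 + (VM number × 10) + container number, which is consistent with the two examples above (the helper name is mine):

```shell
# Hypothetical helper for the bot addressing scheme:
# port = 8000 + (VM number * 10) + (container number on that VM)
bot_port() {
  local vm=$1 container=$2
  echo $((8000 + vm * 10 + container))
}

bot_port 3 1   # first container on the third VM
bot_port 8 4   # fourth container on the eighth VM
```

With this layout each VM can host up to nine bots, and the load balancer rule for a port only ever matches one container in the pool.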

Using a custom VM image

With this approach the VM is the unit of scale. My assumption was that adding a new VM to provide more bot capacity would take several minutes - too long for a client waiting for a bot to join. So the plan was to run with spare capacity in the bot pool, scaling up the VMSS when the pool of free bots fell below a threshold.

Even so, scaling up to add a new VM had to be a quick operation - not waiting minutes to pull the super-sized Windows base image and extract all the layers. The first step in minimizing scale-up time is to use a custom VM image for the scale set.

A VMSS base image can be set up manually by running a VM and doing whatever you need to do. In this case I could use the Windows Server 2019 image with Docker configured, and then run an Azure extension to install the Nvidia GPU drivers:

# create vm:
az vm create `
  --resource-group $rg `
  --name $vmName `
  --image 'MicrosoftWindowsServer:WindowsServer:2019-Datacenter-Core-with-Containers' `
  --size 'Standard_NC6_Promo' `
  --admin-username $username `
  --admin-password $password

# deploy the nvidia drivers:
az vm extension set `
  --resource-group $rg `
  --vm-name $vmName `
  --name NvidiaGpuDriverWindows `
  --publisher Microsoft.HpcCompute `
  --version 1.3

That's the additional setup for this particular VM.

Then you can create a private base image from the VM, first deallocating and generalizing it:

az vm deallocate --resource-group $rg --name $vmName

az vm generalize --resource-group $rg --name $vmName

az image create --resource-group $rg `
    --name $imageName --source $vmName

The image can be in its own Resource Group - you can use it for VMSSs in other Resource Groups.

Creating the VM Scale Set

Scripting all the setup with the Azure CLI makes for a nice repeatable process - which you can easily put into a GitHub workflow. The az documentation is excellent and you can build up pretty much any Azure solution using just the CLI.

There are a few nice features you can use with VMSS that simplify the rest of the deployment. This abridged command shows the main details:

az vmss create `
   --image $imageId `
   --subnet $subnetId `
   --public-ip-per-vm `
   --public-ip-address-dns-name $vmssPipDomainName `
   --assign-identity `
  ...

That's going to use my custom base image, and attach the VMs in the scale set to a specific virtual network subnet - so they can connect to other components in the client's backend. Each VM will get its own public IP address, and a custom DNS name will be applied to the public IP address for the load balancer across the set.

The VMs will use managed identity - so they can securely use other Azure resources without passing credentials around. You can use az role assignment create to grant access for the VMSS managed identity to ACR.

When the VMSS is created, you can set up the rules for the load balancer, directing the traffic for each port to a specific bot container. This is what makes each container individually addressable - only one container in the VMSS will listen on a specific port. A health probe in the LB tests for a TCP connection on the port, so only the VM which is running that container will pass the probe and be sent traffic.

# health probe:
az network lb probe create `
 --resource-group $rg --lb-name $lbName `
 -n "p$port" --protocol tcp --port $port

# LB rule:
az network lb rule create `
 --resource-group $rgName --lb-name $lbName `
 --frontend-ip-name loadBalancerFrontEnd `
 --backend-pool-name $backendPoolName `
 --probe-name "p$port" -n "p$port" --protocol Tcp `
 --frontend-port $port --backend-port $port

Spinning up containers on VMSS instances

You can use the Azure VM custom script extension to run a script on a VM, and you can trigger that on all the instances in a VMSS. This is the deployment and upgrade process for the bot containers - run a script which pulls the app image and starts the containers.

Up until now the solution is pretty solid. This script is the ugly part, because we're going to manually spin up the containers using docker run:

docker container run -d `
 -p "$($port):443" `
 --restart always `
 --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 `
 $imageName

The real script adds an env-file for config settings, and the run commands are in a loop so we can dynamically set the number of containers to run on each VM. So what's wrong with this? Nothing is managing the containers. The restart flag means Docker will restart the container if the app crashes, and start the containers if the VM restarts, but that's all the additional reliability we'll get.
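The loop itself is simple. A dry-run sketch in bash (the real script would be PowerShell on the Windows VMs; the base port, container count, env file, and image name are all assumptions) that prints the docker run command for each container slot instead of executing it:

```shell
# Dry-run sketch: print (rather than run) the docker run command for each
# container slot on this VM. base_port and count would come from config;
# the env file and image name are placeholders.
print_run_commands() {
  local base_port=$1 count=$2
  for i in $(seq 1 "$count"); do
    local port=$((base_port + i))
    echo "docker container run -d -p $port:443 --restart always" \
         "--env-file ./bot.env" \
         "--device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 bot-image"
  done
}

print_run_commands 8030 3
```

Swapping echo for an actual invocation gives the real deployment loop; the per-slot port is the only thing that varies between containers on one VM.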

In the client's solution, they added functionality to their backend API to manage the containers - but that sounds a lot like writing a custom orchestrator...

Moving on from the script, upgrading the VMSS instances is simple to do. The script and any additional assets - env files and certs - can be uploaded to private blob storage, using SAS tokens for the VM to download. You use JSON configuration for the script extension and you can split out sensitive settings.
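The split between the two settings documents might look like this (the storage account, container, script name, and SAS token are placeholders; anything sensitive goes in the protected settings, which Azure stores encrypted):

```shell
# Sketch of the two JSON documents for the custom script extension.
# Public settings: where to download the script from (SAS token shown here
# for illustration - in practice keep the tokenized URL protected too).
settings='{
  "fileUris": ["https://<storage-account>.blob.core.windows.net/scripts/deploy-bots.ps1?<sas-token>"]
}'

# Protected settings: the command to execute, plus any secrets it needs.
protectedSettings='{
  "commandToExecute": "powershell -ExecutionPolicy Unrestricted -File deploy-bots.ps1"
}'
```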

# set the script on the VMSS:
az vmss extension set `
    --publisher Microsoft.Compute `
    --version 1.10 `
    --name CustomScriptExtension `
    --resource-group $rg `
    --vmss-name $vmss `
    --settings $settings.Replace('"','\"') `
    --protected-settings $protectedSettings.Replace('"','\"')

# updating all instances triggers the script:
az vmss update-instances `
 --instance-ids * `
 --name $vmss `
 --resource-group $rg

Applying the custom script extension updates the model for the VMSS - but it doesn't actually run the script. The next step does that: updating the instances runs the script on each of them, replacing the containers with the new Docker image version.

Code and infra workflows

All the Azure scripts can live in a separate GitHub repo, with secrets added for the az authentication, cert passwords and everything else. The upgrade scripts to deploy the custom script extension and update the VMSS instances can sit in a workflow with a workflow_dispatch trigger and input parameters:

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy: dev, test or prod'     
        required: true
        default: 'dev'
      imageTag:
        description: 'Image tag to deploy, e.g. v1.0-175'     
        required: true
        default: 'v1.0'

The Dockerfile for the image lives in the source code repo with the rest of the bot code. The workflow in that repo builds and pushes the image and ends by triggering the upgrade deployment in the infra repo - using Ben Coleman's benc-uk/workflow-dispatch action:

deploy-dev:
  if: ${{ github.ref == 'refs/heads/dev' }}
  runs-on: ubuntu-18.04
  needs: build-teams-bot
  steps:
    - name: Dispatch upgrade workflow
      uses: benc-uk/workflow-dispatch@v1
      with:
        workflow: Upgrade bot containers
        repo: org/infra-repo
        token: ${{ secrets.ACCESS_TOKEN }}
        inputs: '{"environment":"dev", "imageTag":"v1.0-${{github.run_number}}"}'
        ref: master

So the final pipeline looks like this:

  • devs push to the main codebase
  • build workflow triggered - uses Docker to compile the code and package the image
  • if the build is successful, that triggers the publish workflow in the infrastructure repo
  • the publish workflow updates the VM script to use the new image label, and deploys it to the Azure VMSS.

I covered GitHub workflows with Docker in ECS-C2: Continuous Deployment with Docker and GitHub on YouTube

Neat and automated for a reliable and scalable deployment. Just don't tell anyone we're running containers on individual servers, instead of using an orchestrator...

Crossplane Providers and Managed Resources | Tutorial (Part 2)

March 7, 2024 at 16:11

In this second installment of our Crossplane tutorial series, we dive deeper into the world of Crossplane Providers and Managed Resources. Watch as we guide you through the process of setting up and utilizing various providers and resources to manage your cloud infrastructure and services using Crossplane’s Kubernetes-style APIs. In this video, you’ll learn how to configure connections to cloud providers like AWS, GCP, and Azure, and how to create and manage resources in those clouds.

▬▬▬▬▬▬ 📖 The Book 📖 ▬▬▬▬▬▬
Amazon: https://www.amazon.com/dp/B0CWCYP5CJ
LeanPub: https://leanpub.com/crossplane

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/aa5ecfa315608d1257ba56df18088f2f
🔗 Crossplane: https://crossplane.io
🎬 Say Goodbye to Containers – Ephemeral Environments with Nix Shell: https://youtu.be/0ulldVwZiKA

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

Azure Container Registry and Docker Hub: Connecting the Dots with Seamless Authentication and Artifact Cache

By: Jason Dunne
February 29, 2024 at 14:48

By leveraging the wide array of public images available on Docker Hub, developers can accelerate development workflows, enhance productivity, and, ultimately, ship scalable applications that run like clockwork. When building with public content, acknowledging the potential operational risks associated with using that content without proper authentication is crucial. 

In this post, we will describe best practices for mitigating these risks and ensuring the security and reliability of your containers.

[Image: black padlock on a light blue digital background]

Import public content locally

There are several advantages to importing public content locally. Doing so improves the availability and reliability of your public content pipeline and protects you from failed CI builds. By importing your public content, you can easily validate, verify, and deploy images to help run your business more reliably.

For more information on this best practice, check out the Open Container Initiative’s guide on Consuming Public Content.

Configure Artifact Cache to consume public content

Another best practice is to configure Artifact Cache to consume public content. Azure Container Registry’s (ACR) Artifact Cache feature allows you to cache your container artifacts in your own Azure Container Registry, even for private networks. This approach limits the impact of rate limits and dramatically increases pull reliability when combined with geo-replicated ACR, allowing you to pull artifacts from the region closest to your Azure resource. 

Additionally, ACR offers various security features, such as private networks, firewall configuration, service principals, and more, which can help you secure your container workloads. For complete information on using public content with ACR Artifact Cache, refer to the Artifact Cache technical documentation.

Authenticate pulls with public registries

We recommend authenticating your pull requests to Docker Hub using subscription credentials. Docker Hub offers developers the ability to authenticate when building with public library content. Authenticated users also have access to pull content directly from private repositories. For more information, visit the Docker subscriptions page. Microsoft Artifact Cache also supports authenticating with other public registries, providing an additional layer of security for your container workloads.

Following these best practices when using public content from Docker Hub can help mitigate security and reliability risks in your development and operational cycles. By importing public content locally, configuring Artifact Cache, and setting up preferred authentication methods, you can ensure your container workloads are secure and reliable.

Learn more about securing containers

Additional resources for improving container security for Microsoft and Docker customers

Stop Giving Permanent Access To Anyone: Just-in-Time with Apono

September 11, 2023 at 14:34

Granting permanent access to anyone for anything is dangerous and unnecessary. Instead, we should be using the just-in-time approach, and Apono might be just the solution for that. Enhance security, prevent breaches, and empower us to control access more effectively while easily giving temporary access to whoever needs it with Apono.

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/557b6cfcf0655a78e6f2a146564e4861

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

How To Create A Complete Internal Developer Platform (IDP)?

May 15, 2023 at 15:22

It’s time to build an internal developer platform (IDP) with Crossplane, Argo CD, SchemaHero, External Secrets Operator (ESO), GitHub Actions, Port, and a few others.

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/78c1d2a87baf31512b87a2254194b11c
🎬 DevOps MUST Build Internal Developer Platform (IDP): https://youtu.be/j5i00z3QXyU
🎬 How To Create A “Proper” CLI With Shell And Charm Gum: https://youtu.be/U8zCHA-9VLA
🎬 Crossplane – GitOps-based Infrastructure as Code through Kubernetes API: https://youtu.be/n8KjVmuHm7A
🎬 How To Shift Left Infrastructure Management Using Crossplane Compositions: https://youtu.be/AtbS1u2j7po
🎬 Argo CD – Applying GitOps Principles To Manage A Production Environment In Kubernetes: https://youtu.be/vpWQeoaiRM4
🎬 How To Apply GitOps To Everything – Combining Argo CD And Crossplane: https://youtu.be/yrj4lmScKHQ
🎬 SchemaHero – Database Schema Migrations Inside Kubernetes: https://youtu.be/SofQxb4CDQQ
🎬 Manage Kubernetes Secrets With External Secrets Operator (ESO): https://youtu.be/SyRZe5YVCVk
🎬 Github Actions Review And Tutorial: https://youtu.be/eZcAvTb0rbA
🎬 GitHub CLI (gh) – How to manage repositories more efficiently: https://youtu.be/BII6ZY2Rnlc
🎬 How To Build A UI For An Internal Developer Platform (IDP) With Port?: https://youtu.be/ro-h7tsp0qI

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

How To Secure Everything Without Making Everyone Suffer

April 3, 2023 at 15:14

What makes a system secure? How do we secure everything, no matter whether it’s running inside Kubernetes clusters, Cloud providers like AWS, Azure, Google Cloud (GCP), or anything else?

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
🎬 Virtual Machines (VMs) Inside Kubernetes Clusters With KubeVirt: https://youtu.be/oO8VEmpojz0
🎬 How To Create, Provision, And Operate Kubernetes With Cluster API (CAPI): https://youtu.be/8yUDUhZ6ako
🎬 Crossplane – GitOps-based Infrastructure as Code through Kubernetes API: https://youtu.be/n8KjVmuHm7A
🎬 Metacontroller – Custom Kubernetes Controllers The Easy Way: https://youtu.be/3xkLYOpXy2U
🎬 Cloud-Native Apps With Open Application Model (OAM) And KubeVela: https://youtu.be/2CBu6sOTtwk
🎬 How To Shift Left Infrastructure Management Using Crossplane Compositions: https://youtu.be/AtbS1u2j7po
🎬 How to apply policies in Kubernetes using Open Policy Agent (OPA) and Gatekeeper: https://youtu.be/14lGc7xMAe4
🎬 Kubernetes-Native Policy Management With Kyverno: https://youtu.be/DREjzfTzNpA
🎬 Admission Controllers Or CLI? Kubernetes Policy Validations with Datree: https://youtu.be/WTh84BPHC4o
🎬 Kubernetes Validating Admission Policy Changes The Game: https://youtu.be/EsZcDUaSUss
🎬 Argo CD – Applying GitOps Principles To Manage A Production Environment In Kubernetes: https://youtu.be/vpWQeoaiRM4
🎬 Flux CD v2 With GitOps Toolkit – Kubernetes Deployment And Sync Mechanism: https://youtu.be/R6OeIgb7lUI
🎬 Rancher Fleet: GitOps Across A Large Number Of Kubernetes Clusters: https://youtu.be/rIH_2CUXmwM
🎬 Signing And Verifying Container Images With Sigstore Cosign And Kyverno: https://youtu.be/HLb1Q086u6M
🎬 Manage Container (Docker) Images, Helm, CNAB, and Other Artifacts With Harbor: https://youtu.be/f931M4-my1k
🎬 Manage Kubernetes Secrets With External Secrets Operator (ESO): https://youtu.be/SyRZe5YVCVk
🎬 Eliminate Kubernetes Secrets With Secrets Store CSI Driver (SSCSID): https://youtu.be/DsQu66ZMG4M
🎬 Bitnami Sealed Secrets – How To Store Kubernetes Secrets In Git Repositories: https://youtu.be/xd2QoV6GJlc

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/
