
Shift-Left Testing with Testcontainers: Catching Bugs Early with Local Integration Tests

March 13, 2025, 18:42

Modern software development emphasizes speed and agility, making efficient testing crucial. DORA research reveals that elite teams thrive with both high performance and reliability. They achieve 127x faster lead times, 182x more deployments per year, 8x lower change failure rates, and, most impressively, 2,293x faster recovery times after incidents. The secret sauce is that they “shift left.”

Shift-Left is a practice that moves integration activities like testing and security earlier in the development cycle, allowing teams to detect and fix issues before they reach production. By incorporating local and integration tests early, developers can prevent costly late-stage defects, accelerate development, and improve software quality. 

In this article, you’ll learn how integration tests can help you catch defects earlier in the development inner loop and how Testcontainers can make them feel as lightweight and easy as unit tests. Finally, we’ll break down the impact that shifting integration tests left has on development velocity and lead time for changes, according to DORA metrics.

Real-world example: Case sensitivity bug in user registration

In a traditional workflow, integration and E2E tests are often executed in the outer loop of the development cycle, leading to delayed bug detection and expensive fixes. For example, if you are building a user registration service where users enter their email addresses, you must ensure that the emails are case-insensitive and not duplicated when stored. 

If case sensitivity is not handled properly and is assumed to be managed by the database, testing a scenario where users can register with duplicate emails differing only in letter case would only occur during E2E tests or manual checks. At that stage, it’s too late in the SDLC and can result in costly fixes.

By shifting testing earlier and enabling developers to spin up real services locally — such as databases, message brokers, cloud emulators, or other microservices — the testing process becomes significantly faster. This allows developers to detect and resolve defects sooner, preventing expensive late-stage fixes.

Let’s dive deep into this example scenario and how different types of tests would handle it.

Scenario

A new developer is implementing a user registration service and preparing for production deployment.

Code Example of the registerUser method

async registerUser(email: string, username: string): Promise<User> {
    // Look up the email exactly as the user typed it. This comparison is
    // case-sensitive unless the database column itself is case-insensitive.
    const existingUser = await this.userRepository.findOne({
        where: {
            email: email
        }
    });

    if (existingUser) {
        throw new Error("Email already exists");
    }
    ...
}

The Bug

The registerUser method doesn’t handle case sensitivity properly and relies on the database or the UI framework to handle case insensitivity by default. So, in practice, users can register duplicate emails that differ only in letter case (e.g., user@example.com and USER@example.com).
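
For reference, one possible fix is to normalize the email before both the lookup and the insert, so the application no longer depends on the database’s collation. Here is a minimal sketch, assuming the TypeORM-style repository shown above (the save call and its shape are illustrative):

async registerUser(email: string, username: string): Promise<User> {
    // Normalize once so lookups and stored values are always lowercase.
    const normalizedEmail = email.trim().toLowerCase();

    const existingUser = await this.userRepository.findOne({
        where: { email: normalizedEmail }
    });

    if (existingUser) {
        throw new Error("Email already exists");
    }

    // Persist the normalized form so future comparisons stay consistent.
    return this.userRepository.save({ email: normalizedEmail, username });
}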

Impact

  • Authentication issues arise because email case mismatches cause login failures.
  • Security vulnerabilities appear due to duplicate user identities.
  • Data inconsistencies complicate user identity management.

Testing method 1: Unit tests. 

These tests only validate the code itself, so email case sensitivity verification relies on the database where SQL queries are executed. Since unit tests don’t run against a real database, they can’t catch issues like case sensitivity. 

Testing method 2: End-to-end test or manual checks. 

These verifications will only catch the issue after the code is deployed to a staging environment. While automation can help, detecting issues this late in the development cycle delays feedback to developers and makes fixes more time-consuming and costly.

Testing method 3: Using mocks to simulate database interactions in unit tests.

One approach that could work and allow us to iterate quickly would be to mock the database layer and define a mock repository that responds with the error. Then, we could write a unit test that executes really fast:

test('should prevent registration with same email in different case', async () => {
  const userService = new UserRegistrationService(new MockRepository());
  await userService.registerUser({ email: 'user@example.com', password: 'password123' });
  await expect(userService.registerUser({ email: 'USER@example.com', password: 'password123' }))
    .rejects.toThrow('Email already exists');
});

In the above example, the User service is created with a mock repository that holds an in-memory representation of the database, i.e., a map of users. This mock repository detects when the same user is registered twice, probably using a lowercased email as the key, and returns the expected error.
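
For illustration, such a mock might look like the following sketch (assuming a User entity with at least an email field; the real repository interface in your project will differ):

class MockRepository {
    private users = new Map<string, User>();

    async findOne(options: { where: { email: string } }): Promise<User | null> {
        // The mock must re-implement the case-insensitive comparison itself.
        return this.users.get(options.where.email.toLowerCase()) ?? null;
    }

    async save(user: User): Promise<User> {
        this.users.set(user.email.toLowerCase(), user);
        return user;
    }
}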

Here, we have to code the validation logic in the mock, replicating what the User service or the database should do. Whenever the user validation changes, e.g., to disallow special characters, we have to change the mock too. Otherwise, our tests will assert against an outdated version of the validations. If mocks are spread across the entire codebase, this maintenance can become very hard.

To avoid that, we recommend integration tests that use real representations of the services we depend on. In the above example, using the real database repository is much better than mocks because it gives us more confidence in what we are testing.

Testing method 4: Shift-left local integration tests with Testcontainers 

Instead of using mocks or waiting for staging to run the integration or E2E tests, we can detect the issue earlier. This is achieved by enabling developers to run the project’s integration tests locally, in the developer’s inner loop, using Testcontainers with a real PostgreSQL database.

Benefits

  • Time Savings: Tests run in seconds, catching the bug early.
  • More Realistic Testing: Uses an actual database instead of mocks.
  • Confidence in Production Readiness: Ensures business-critical logic behaves as expected.

Example integration test

First, let’s set up a PostgreSQL container using the Testcontainers library and create a userRepository to connect to this PostgreSQL instance:

import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { DataSource } from "typeorm";

let container: StartedPostgreSqlContainer;
let dataSource: DataSource;
let userService: UserRegistrationService;

beforeAll(async () => {
    // Start a disposable PostgreSQL instance for this test suite.
    container = await new PostgreSqlContainer("postgres:16").start();

    // Point a TypeORM data source at the container's dynamically mapped port.
    dataSource = new DataSource({
        type: "postgres",
        host: container.getHost(),
        port: container.getMappedPort(5432),
        username: container.getUsername(),
        password: container.getPassword(),
        database: container.getDatabase(),
        entities: [User],
        synchronize: true,
        logging: true,
        connectTimeoutMS: 5000
    });
    await dataSource.initialize();

    const userRepository = dataSource.getRepository(User);
    userService = new UserRegistrationService(userRepository);
}, 30000);
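
A matching teardown keeps the suite self-contained (a sketch, assuming the same container and dataSource variables as above):

afterAll(async () => {
    // Close the TypeORM connection and stop the throwaway PostgreSQL container.
    await dataSource.destroy();
    await container.stop();
});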

Now, with the initialized userService, we can use the registerUser method to test user registration against the real PostgreSQL instance:

test('should prevent registration with same email in different case', async () => {
  await userService.registerUser({ email: 'user@example.com', password: 'password123' });
  await expect(userService.registerUser({ email: 'USER@example.com', password: 'password123' }))
    .rejects.toThrow('Email already exists');
});

Why This Works

  • Uses a real PostgreSQL database via Testcontainers
  • Validates case-insensitive email uniqueness
  • Verifies email storage format

How Testcontainers helps

Testcontainers modules provide preconfigured implementations for the most popular technologies, making it easier than ever to write robust tests. Whether your application relies on databases, message brokers, cloud services like AWS (via LocalStack), or other microservices, Testcontainers has a module to streamline your testing workflow.
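
For example, spinning up a message broker for a test suite looks much like the PostgreSQL setup above. Here is a sketch assuming the @testcontainers/kafka module (pin the image tag to whatever your project standardizes on):

import { KafkaContainer, StartedKafkaContainer } from "@testcontainers/kafka";

let kafka: StartedKafkaContainer;

beforeAll(async () => {
    // Start a disposable Kafka broker for this test suite.
    kafka = await new KafkaContainer("confluentinc/cp-kafka:7.5.0").start();
}, 60000);

afterAll(async () => {
    await kafka.stop();
});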

With Testcontainers, you can also mock and simulate service-level interactions or use contract tests to verify how your services interact with others. By combining this approach with local testing against real dependencies, Testcontainers provides a comprehensive solution for local integration testing and eliminates the need for shared integration testing environments, which are often difficult and costly to set up and manage. To run Testcontainers tests, you need a Docker context to spin up containers. Docker Desktop ensures seamless compatibility with Testcontainers for local testing.

Testcontainers Cloud: Scalable Testing for High-Performing Teams

Testcontainers is a great solution to enable integration testing with real dependencies locally. If you want to take testing a step further — scaling Testcontainers usage across teams, monitoring images used for testing, or seamlessly running Testcontainers tests in CI — you should consider using Testcontainers Cloud. It provides ephemeral environments without the overhead of managing dedicated test infrastructure. Using Testcontainers Cloud locally and in CI ensures consistent testing outcomes, giving you greater confidence in your code changes. Additionally, Testcontainers Cloud allows you to seamlessly run integration tests in CI across multiple pipelines, helping to maintain high-quality standards at scale. Finally, Testcontainers Cloud is more secure and ideal for teams and enterprises who have more stringent requirements for containers’ security mechanisms.   

Measuring the business impact of shift-left testing

As we have seen, shift-left testing with Testcontainers significantly improves the defect detection rate, shortens detection time, and reduces context switching for developers. Let’s take the example above and compare different production deployment workflows and how early-stage testing impacts developer productivity.

Traditional workflow (shared integration environment)

Process breakdown:

The traditional workflow comprises writing feature code, running unit tests locally, committing changes, and creating pull requests for the verification flow in the outer loop. If a bug is detected in the outer loop, developers have to go back to their IDE and repeat the process of running the unit test locally and other steps to verify the fix. 


Figure 1: Workflow of a traditional shared integration environment broken down by time taken for each step.

Lead Time for Changes (LTC): It takes at least 1 to 2 hours to discover and fix the bug (more depending on CI/CD load and established practices). In the best-case scenario, it would take approximately 2 hours from code commit to production deployment. In the worst-case scenario, it may take several hours or even days if multiple iterations are required.

Deployment Frequency (DF) Impact: Since fixing a pipeline failure can take around 2 hours and there’s a daily time constraint (8-hour workday), you can realistically deploy only 3 to 4 times per day. If multiple failures occur, deployment frequency can drop further.

Additional associated costs: Pipeline workers’ runtime minutes and Shared Integration Environment maintenance costs.

Developer Context Switching: Since bug detection occurs about 30 minutes after the code commit, developers lose focus. This leads to increased cognitive load as they constantly context switch, debug, and then context switch again.

Shift-left workflow (local integration testing with Testcontainers)

Process breakdown:

The shift-left workflow is much simpler and starts with writing code and running unit tests. Instead of running integration tests in the outer loop, developers can run them locally in the inner loop to troubleshoot and fix issues. The changes are verified again before proceeding to the next steps and the outer loop. 


Figure 2: Shift-Left Local Integration Testing with Testcontainers workflow broken down by time taken for each step. The feedback loop is much faster and saves developers time and headaches downstream.

Lead Time for Changes (LTC): It takes less than 20 minutes to discover and fix the bug in the developers’ inner loop. Therefore, local integration testing enables at least 65% faster defect identification than testing on a Shared Integration Environment.  

Deployment Frequency (DF) Impact: Since the defect was identified and fixed locally within 20 minutes, the pipeline would run to production, allowing for 10 or more deployments daily.

Additional associated costs: 5 Testcontainers Cloud minutes are consumed.  

Developer Context Switching: No context switching for the developer, as tests running locally provide immediate feedback on code changes and let the developer stay focused within the IDE and in the inner loop.

Key Takeaways

| Metric | Traditional Workflow (Shared Integration Environment) | Shift-Left Workflow (Local Integration Testing with Testcontainers) | Improvements and further references |
|---|---|---|---|
| Faster Lead Time for Changes (LTC) | Code changes validated in hours or days. Developers wait for shared CI/CD environments. | Code changes validated in minutes. Testing is immediate and local. | >65% faster Lead Time for Changes. Microsoft reduced lead time from days to hours by adopting shift-left practices. |
| Higher Deployment Frequency (DF) | Deployment happens daily, weekly, or even monthly due to slow validation cycles. | Continuous testing allows multiple deployments per day. | 2x higher Deployment Frequency. The 2024 DORA Report shows shift-left practices more than double deployment frequency; elite teams deploy 182x more often. |
| Lower Change Failure Rate (CFR) | Bugs that escape into production can lead to costly rollbacks and emergency fixes. | More bugs are caught earlier in CI/CD, reducing production failures. | Lower Change Failure Rate. IBM's Systems Sciences Institute estimates defects found in production cost 15x more to fix than those caught early. |
| Faster Mean Time to Recovery (MTTR) | Fixes take hours, days, or weeks due to complex debugging in shared environments. | Rapid bug resolution with local testing; fixes verified in minutes. | Faster MTTR. DORA's elite performers restore service in less than one hour, compared to weeks to a month for low performers. |
| Cost Savings | Expensive shared environments, slow pipeline runs, high maintenance costs. | Eliminates costly test environments, reducing infrastructure expenses. | Significant cost savings. ThoughtWorks Technology Radar highlights shared integration environments as fragile and expensive. |

Table 1: Summary of key metric improvements from a shift-left workflow with local testing using Testcontainers


Conclusion

Shift-left testing improves software quality by catching issues earlier, reducing debugging effort, enhancing system stability, and overall increasing developer productivity. As we’ve seen, traditional workflows relying on shared integration environments introduce inefficiencies, increasing lead time for changes, deployment delays, and cognitive load due to frequent context switching. In contrast, by introducing Testcontainers for local integration testing, developers can achieve:

  • Faster feedback loops – Bugs are identified and resolved within minutes, preventing delays.
  • More reliable application behavior – Testing in realistic environments ensures confidence in releases.
  • Reduced reliance on expensive staging environments – Minimizing shared infrastructure cuts costs and streamlines the CI/CD process.
  • Better developer flow state – Easily setting up local test scenarios and re-running them fast for debugging helps developers stay focused on innovation.

Testcontainers provides an easy and efficient way to test locally and catch expensive issues earlier. To scale across teams, developers can consider using Docker Desktop and Testcontainers Cloud to run unit and integration tests locally, in CI, or in ephemeral environments without the complexity of maintaining dedicated test infrastructure. Learn more about Testcontainers and Testcontainers Cloud in our docs.


Powered by Docker: Streamlining Engineering Operations as a Platform Engineer

By: Neal Patel
February 18, 2025, 18:40

Powered by Docker is a series of blog posts featuring use cases and success stories from Docker partners and practitioners. This story was contributed by Neal Patel from Siimpl.io. Neal has more than ten years of experience developing software and is a Docker Captain.

Background

As a platform engineer at a mid-size startup, I’m responsible for identifying bottlenecks and developing solutions to streamline engineering operations to keep up with the velocity and scale of the engineering organization. In this post, I outline some of the challenges we faced with one of our clients, how we addressed them, and provide guides on how to tackle these challenges at your company.

One of our clients faced critical engineering challenges, including poor synchronization between development and CI/CD environments, slow incident response due to inadequate rollback mechanisms, and fragmented telemetry tools that delayed issue resolution. Siimpl implemented strategic solutions to enhance development efficiency, improve system reliability, and streamline observability, turning obstacles into opportunities for growth.

Let’s walk through the primary challenges we encountered.

Inefficient development and deployment

  • Problem: We lacked parity between developer tooling and CI/CD tooling, which made it difficult for engineers to test changes confidently.
  • Goal: We needed to ensure consistent environments across development, testing, and production.

Unreliable incident response

  • Problem: If a rollback was necessary, we did not have the proper infrastructure to accomplish this efficiently.
  • Goal: We wanted to revert to stable versions in case of deployment issues easily.

Lack of comprehensive telemetry

  • Problem: Our SRE team created tooling to simplify collecting and publishing telemetry, but distribution and upgradability were poor. Also, we found adoption to be extremely low.
  • Goal: We needed to standardize how we configure telemetry collection, and simplify the configuration of auto-instrumentation libraries so the developer experience is turnkey.

Solution: Efficient development and deployment


CI/CD configuration with self-hosted GitHub runners and Docker Buildx

We had a requirement for multi-architecture support (arm64/amd64), which we initially implemented in CI/CD with Docker Buildx and QEMU. However, we noticed an extreme dip in performance due to the emulated architecture build times.

We were able to reduce build times by almost 90% by ditching QEMU (emulated builds) and targeting arm64 and amd64 self-hosted runners. This gave us the advantage of blazing-fast native-architecture builds while still supporting multi-arch by publishing the manifest after the fact.
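
Publishing the multi-arch manifest after the fact can be as simple as stitching the two per-architecture tags together; a sketch with illustrative image names:

# Combine the independently built per-arch images into a single multi-arch tag.
docker buildx imagetools create \
  -t registry.example.com/myapp:1.2.3 \
  registry.example.com/myapp:1.2.3-amd64 \
  registry.example.com/myapp:1.2.3-arm64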

Here’s a working example of the solution we will walk through: https://github.com/siimpl/multi-architecture-cicd

If you’d like to deploy this yourself, there’s a guide in the README.md.

Prerequisites

This project uses the following tools:

  • Docker Build Cloud (included in all paid Docker subscriptions)
  • DBC cloud driver
  • GitHub/GitHub Actions
  • A managed container orchestration service like Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE)
  • Terraform
  • Helm

Because this project uses industry-standard tooling like Terraform, Kubernetes, and Helm, it can be easily adapted to any CI/CD or cloud solution you need.

Key features

The secret sauce of this solution is provisioning the self-hosted runners in a way that allows our CI/CD to specify which architecture to execute the build on.

The first step is to provision two node pools, an amd64 node pool and an arm64 node pool, which can be found in aks.tf. In this example, the node_count is fixed at 1 for both node pools, but for better scalability/flexibility you can also enable autoscaling for a dynamic pool.

resource "azurerm_kubernetes_cluster_node_pool" "amd64" {
  name                  = "amd64pool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.cicd.id
  vm_size               = "Standard_DS2_v2" # AMD-based instance
  node_count            = 1
  os_type               = "Linux"
  tags = {
    environment = "dev"
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "arm64" {
  name                  = "arm64pool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.cicd.id
  vm_size               = "Standard_D4ps_v5" # ARM-based instance
  node_count            = 1
  os_type               = "Linux"
  tags = {
    environment = "dev"
  }
}

Next, we need to update the self-hosted runners’ values.yaml to have a configurable nodeSelector. This will allow us to deploy one runner scale set to the arm64pool and one to the amd64pool.
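
The exact keys depend on the runner Helm chart used in the repository, but conceptually the override is a node selector per scale set. An illustrative values.yaml fragment for the arm64 scale set:

# Pin this runner scale set to the arm64 node pool.
# (Key path is illustrative; check the chart's values schema.)
nodeSelector:
  kubernetes.io/arch: arm64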

Once the Terraform resources are successfully created, the runners should be registered to the organization or repository you specified in the GitHub config URL. We can now update the REGISTRY values for the emulated-build and the native-build.

After creating a pull request with those changes, navigate to the Actions tab to witness the results.


You should see two jobs kick off: one using the emulated build path with QEMU, and the other using the self-hosted runners for native node builds. Depending on cache hits or the Dockerfile being built, the performance improvements can be up to 90%. Even with this substantial improvement, using Docker Build Cloud can improve performance by up to 95%. More importantly, you can reap the benefits during development builds! Take a look at the docker-build-cloud.yml workflow for more details. All you need is a Docker Build Cloud subscription and a cloud driver to take advantage of the improved pipeline.

Getting Started

1. Generate GitHub PAT

2. Update the variables.tf

3. Initialize AZ CLI

4. Deploy Cluster

5. Create a PR to validate pipelines

README.md for reference

Reliable Incident Response

Leveraging SemVer Tagged Containers for Easy Rollback

Recognizing that deployment issues can arise unexpectedly, we needed a mechanism to quickly and reliably rollback production deployments. Below is an example workflow for properly rolling back a deployment based on the tagging strategy we implemented above.

  1. Rollback Process:
    • In case of a problematic build, deployment was rolled back to a previous stable version using the tagged images.
    • AWS CLI commands were used to update ECS services with the desired image tag:
on:
  workflow_call:
    inputs:
      image-version:
        required: true
        type: string
jobs:
  rollback:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
     - name: Rollback to previous version
       run: |
         aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment --image ${{ secrets.REGISTRY }}/myapp:${{ inputs.image-version }}

Comprehensive Telemetry

Configuring Sidecar Containers in ECS for Aggregating/Publishing Telemetry Data (OTEL)

As we adopted OpenTelemetry to standardize observability, we quickly realized that adoption was one of the toughest hurdles. As a team, we decided to bake as much configuration as possible into the infrastructure (Terraform modules) so that we could easily distribute and maintain observability instrumentation.

  1. Sidecar Container Setup:
    • Sidecar containers were defined in the ECS task definitions to run OpenTelemetry collectors.
    • The collectors were configured to aggregate and publish telemetry data from the application containers.
  2. Task Definition Example:
{
  "containerDefinitions": [
    {
      "name": "myapp",
      "image": "myapp:1.0.0",
      "essential": true,
      "portMappings": [{ "containerPort": 8080 }]
    },
    {
      "name": "otel-collector",
      "image": "otel/opentelemetry-collector:latest",
      "essential": false,
      "portMappings": [{ "containerPort": 4317 }],
      "environment": [
        { "name": "OTEL_RESOURCE_ATTRIBUTES", "value": "service.name=myapp" }
      ]
    }
  ],
  "family": "my-task"
}

Configuring Multi-Stage Dockerfiles for OpenTelemetry Auto-Instrumentation Libraries (Node.js)

At the application level, configuring the auto-instrumentation posed a challenge since most applications varied in their build process. By leveraging multi-stage Dockerfiles, we were able to standardize the way we initialized the auto-instrumentation libraries across microservices. We were primarily a Node.js shop, so below is an example Dockerfile for that.

  1. Multi-Stage Dockerfile:
    • The Dockerfile is divided into stages to separate the build environment from the final runtime environment, ensuring a clean and efficient image.
    • OpenTelemetry libraries are installed in the build stage and copied to the runtime stage:
# Stage 1: Build stage
FROM node:20 AS build
WORKDIR /app
COPY package.json package-lock.json ./
# package.json defines otel libs (ex. @opentelemetry/node @opentelemetry/tracing)
RUN npm install
COPY . .
RUN npm run build

# Stage 2: Runtime stage
FROM node:20
WORKDIR /app
COPY --from=build /app /app
CMD ["node", "dist/index.js"]

Results

By addressing these challenges we were able to reduce build times by ~90%, which alone dropped our DORA metrics for Lead time for changes and Time to restore by ~50%. With the rollback strategy and telemetry changes, we were able to reduce our Mean time to Detect (MTTD) and Mean time to resolve (MTTR) by ~30%. We believe that it could get to 50-60% with tuning of alerts and the addition of runbooks (automated and manual).

  1. Enhanced Development Efficiency: Consistent environments across development, testing, and production sped up the development process, and the native-architecture solution delivered roughly 90% faster build times.
  2. Reliable Rollbacks: Quick and efficient rollbacks minimized downtime and maintained system integrity.
  3. Comprehensive Telemetry: Sidecar containers enabled detailed monitoring of system health and security without impacting application performance, and they were baked right into the infrastructure developers were deploying. Auto-instrumentation of the application code was simplified drastically by adopting our standardized Dockerfiles.

Siimpl: Transforming Enterprises with Cloud-First Solutions

With Docker at the core, Siimpl.io’s solutions demonstrate how teams can build faster, more reliable, and scalable systems. Whether you’re optimizing CI/CD pipelines, enhancing telemetry, or ensuring secure rollbacks, Docker provides the foundation for success. Try Docker today to unlock new levels of developer productivity and operational efficiency.

Learn more from our website or contact us at solutions@siimpl.io

Secure by Design for AI: Building Resilient Systems from the Ground Up

September 16, 2024, 14:23

As artificial intelligence (AI) has erupted, Secure by Design for AI has emerged as a critical paradigm. AI is being integrated into every aspect of our lives — from healthcare and finance to developer tools, autonomous vehicles, and smart cities — and its integration into critical infrastructure requires that we move quickly to understand and combat threats.

Necessity of Secure by Design for AI

AI’s rapid integration into critical infrastructure has accelerated the need to understand and combat potential threats. Security measures must be embedded into AI products from the beginning and evolve as the model evolves. This proactive approach ensures that AI systems are resilient against emerging threats and can adapt to new challenges as they arise. In this article, we will explore two polarizing examples — the developer industry and the healthcare industry.


Complexities of threat modeling in AI

AI brings forth new challenges and conundrums when building an accurate threat model. Unlike traditional systems, where data passes through simple edit and validation checks that can be programmed systematically, AI validation checks need to learn with the system and focus on data manipulation, corruption, and extraction.

  • Data poisoning: Data poisoning is a significant risk in AI, where the integrity of the data used by the system can be compromised. This can happen intentionally or unintentionally and can lead to severe consequences. For example, bias and discrimination in AI systems have already led to issues, such as the wrongful arrest of a man in Detroit due to a false facial recognition match. Such incidents highlight the importance of unbiased models and diverse data sets. Testing for bias and involving a diverse workforce in the development process are critical steps in mitigating these risks.

In healthcare, for example, bias may be simpler to detect. You can examine data fields based on areas such as gender, race, etc. 

In development tools, bias is less clear-cut. Bias could result from the underrepresentation of certain development languages, such as Clojure. Bias may even result from code samples based on regional differences in coding preferences and teachings. In developer tools, you likely won’t have the information available to detect this bias. IP addresses may give you information about where a person is living currently, but not about where they grew up or learned to code. Therefore, detecting bias will be more difficult. 

  • Data manipulation: Attackers can manipulate data sets with malicious intent, altering how AI systems behave. 
  • Privacy violations: Without proper data controls, personal or sensitive information could unintentionally be introduced into the system, potentially leading to privacy violations. Establishing strong data management practices to prevent such scenarios is crucial.
  • Evasion and abuse: Malicious actors may attempt to alter inputs to manipulate how an AI system responds, thereby compromising its integrity. There’s also the potential for AI systems to be abused in ways developers did not anticipate. For example, AI-driven impersonation scams have led to significant financial losses, such as the case where an employee transferred $26 million to scammers impersonating the company’s CFO.

These examples underscore the need for controls at various points in the AI data lifecycle to identify and mitigate “bad data” and ensure the security and reliability of AI systems.

Key areas for implementing Secure by Design in AI

To effectively secure AI systems, implementing controls in three major areas is essential (Figure 1):

Figure 1: Key areas for implementing security controls, showing the flow of data from users to data management, model tuning, and model maintenance.

1. Data management

The key to data management is to understand what data needs to be collected to train the model, to identify the sensitive data fields, and to prevent the collection of unnecessary data. Data management also involves ensuring you have the correct checks and balances to prevent the collection of unneeded data or bad data.

In healthcare, sensitive data fields are easy to identify. Doctors’ offices often collect national identifiers, such as driver’s licenses, passports, and Social Security numbers. They also collect date of birth, race, and many other sensitive data fields. If the tool is aimed at helping doctors identify potential conditions faster based on symptoms, you would need anonymized data but would still need to collect certain factors such as age and race. You would not need to collect national identifiers.

In developer tools, sensitive data may not be as clearly defined. For example, an environment variable may be used to pass secrets or pass confidential information, such as the image name from the developer to the AI tool. There may be secrets in fields you would not suspect. Data management in this scenario involves blocking the collection of fields where sensitive data could exist and/or ensuring there are mechanisms to scrub sensitive data built into the tool so that data does not make it to the model. 
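
As a simple illustration of that kind of control, collection code can run every payload through an allow list and a scrubber before anything reaches the training pipeline. A sketch (field names and the secret pattern are hypothetical):

const ALLOWED_FIELDS = new Set(["language", "imageName", "errorCategory"]);
const SECRET_PATTERN = /(api[_-]?key|token|password)\s*[:=]\s*\S+/gi;

function sanitizeTelemetry(payload: Record<string, string>): Record<string, string> {
    const clean: Record<string, string> = {};
    for (const [field, value] of Object.entries(payload)) {
        // Drop anything we never agreed to collect.
        if (!ALLOWED_FIELDS.has(field)) continue;
        // Redact values that look like secrets before they can reach the model.
        clean[field] = value.replace(SECRET_PATTERN, "[REDACTED]");
    }
    return clean;
}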

Data management should include the following:

  • Implementing checks for unexpected data: In healthcare, this process may involve “allow” lists for certain data fields to prevent collecting irrelevant or harmful information. In developer tools, it’s about ensuring the model isn’t trained on malicious code, such as unsanitized inputs that could introduce vulnerabilities.
  • Evaluating the legitimacy of users and their activities: In healthcare tools, this step could mean verifying that users are licensed professionals, while in developer tools, it might involve detecting and mitigating the impact of bot accounts or spam users.
  • Continuous data auditing: This process ensures that unexpected data is not collected and that the data checks are updated as needed. 

2. Alerting and monitoring 

With AI, alerting and monitoring are imperative to ensuring the health of the data model. Controls must be both adaptive and configurable to detect anomalous and malicious activities. As AI systems grow and adapt, so too must the controls. Establish thresholds for data, automate adjustments where possible, and conduct manual reviews where necessary.

In a healthcare AI tool, you might set a threshold before new data is surfaced to ensure its accuracy. For example, if patients begin reporting a new symptom that is believed to be associated with diabetes, you may not report this to doctors until it is reported by a certain percentage (15%) of total patients. 

In a developer tool, this might involve determining when new code should be incorporated into the model as a prompt for other users. The model would need to be able to log and analyze user queries and feedback, track unhandled or poorly handled requests, and detect new patterns in usage. Data should be analyzed for high frequencies of unhandled prompts, and alerts should be generated to ensure that additional data sets are reviewed and added to the model.

3. Model tuning and maintenance

Producers of AI tools should regularly review and adjust AI models to ensure they remain secure. This includes monitoring for unexpected data, adjusting algorithms as needed, and ensuring that sensitive data is scrubbed or redacted appropriately.

For healthcare, model tuning may be more intensive. Results may be compared to published medical studies to ensure that patient conditions are in line with other baselines established across the world. Audits should also be conducted to ensure that doctors with reported malpractice claims or doctors whose medical license has been revoked are scrubbed from the system to ensure that potentially compromised data sets are not influencing the model. 

In a developer tool, model tuning will look very different. You may look at hyperparameter optimization using techniques such as grid search, random search, and Bayesian search. You may study subsets of data; for example, you may perform regular reviews of the most recent data looking for new programming languages, frameworks, or coding practices. 

Model tuning and maintenance should include the following:

  • Perform data audits to ensure data integrity and that unnecessary data is not inadvertently being collected. 
  • Review whether “allow” lists and “deny” lists need to be updated.
  • Regularly audit and monitor alerts for algorithms to determine if adjustments need to be made; consider the population of your user base and how the model is being trained when adjusting these parameters.
  • Ensure you have the controls in place to isolate data sets for removal if a source has become compromised; consider unique identifiers that allow you to identify a source without providing unnecessary sensitive data.
  • Regularly back up data models so you can return to a previous version without heavy loss of data if the source becomes compromised.

AI security begins with design

Security must be a foundational aspect of AI development, not an afterthought. By identifying data fields upfront, conducting thorough AI threat modeling, implementing robust data management controls, and continuously tuning and maintaining models, organizations can build AI systems that are secure by design. 

This approach protects against potential threats and ensures that AI systems remain reliable, trustworthy, and compliant with regulatory requirements as they evolve alongside their user base.


Adding the ESLint Tool to an AI Assistant: Improving Recommendations for JS/TS Projects

By: Docker Labs
August 12, 2024, 13:04

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Can an AI assistant help you write better JavaScript or TypeScript?

Background and introduction

Projects that heavily rely on JavaScript (JS) or TypeScript (TS) are synonymous with the web, so there is a high demand for tools that can improve the consistency and quality of projects using these languages. In previous Docker Labs GenAI posts, we’ve introduced the idea that tools both enable AI assistants to understand our code more and enable them to take action based on this understanding.

In this article, we’ll be trying to enable our AI assistant to provide advice that is both helpful and actionable for linting JS/TS projects and to finally delve into the NPM ecosystem.


Another simple prompt

As we learned in this previous Docker Labs GenAI article, you won’t get much help asking an LLM to tell you how to lint your project without any details. So, like before, we’re using our “linguist” tool to learn about the languages used in the project and augment the prompt (Figure 1):

How do I lint my project?
{{# linguist }}
This project contains code from the language {{ language }} so if you have any 
recommendations pertaining to {{ language }}, please include them.
{{/linguist}}

What LLMs provide out of the box

Figure 1: The AI assistant responds to “How do I lint my project?” with information about ESLint.

In Figure 2, we see that GPT-4 recognizes that ESLint is highly configurable and actually doesn’t work without a config, and so it is trying to provide that for us by either helping us run ESLint’s init tool or by writing a config to use.

Figure 2: AI assistant provides information for setting up and running ESLint.

However, this response gives us either a config that does not work for many projects, or a boilerplate setup task for the user to do manually. This is in contrast with other linters, like Pylint or golangci-lint, where linguist was actually enough for the LLM to find a clear path to linting. So, with ESLint, we need to add more knowledge to help the LLM figure this out.

Configuring ESLint

Using StandardJS

StandardJS is a community-led effort to simplify ESLint configurations. Let’s start by nudging the assistant toward using this as a starting point. The ESLint config is published under its own package, StandardJS, so we can add the following prompt:

If there are no ESLint configuration files found, use StandardJS to lint the project with a consistent config.

We will also add a function definition so that our assistant knows how to run StandardJS. Note the container image defined at the bottom of the following definition:

- name: run-standardjs
  description: Lints the current project with StandardJS
  parameters:
    type: object
    properties:
      typescript:
        type: boolean
        description: Whether to lint Typescript files
      fix:
        type: boolean
        description: Whether to fix the files
      files:
        type: array
        items:
          type: string
          description: The filepaths to pass to the linter. Defaults to '.'
    required:
      - typescript
      - fix
  container:
    image: vonwig/standardjs:latest

This definition will work for both TypeScript and JavaScript projects via the typescript argument. The assistant uses the project content to determine how to optimally set the typescript property.

When using StandardJS with TypeScript, two things happen in the container:

  • It lints with ts-standard instead of standard.
  • It runs ts-standard from the working directory containing tsconfig.json.

But, with the right tools, this behavior is enabled with a single prompt:

When using StandardJS, use Typescript only if there are tsconfigs in the project.

Docker environments

Both ESLint and StandardJS run in Node.js environments. In our current prototype, our assistant uses three different Docker images.

Docker is significant because of the previously mentioned requirement of using ts-standard in a directory with tsconfig.json. When we baked this logic into the Docker image, we effectively introduced a contract bridging the AI Assistant, the linter tool, and the overall structure of the repository.

After determining that a project uses JavaScript or TypeScript, our assistant also adds Git Hooks. (See this GenAI article for details.) Docker gives us a way to reliably distribute these tools.

If we detect `TypeScript` in the project, we can add the following hook to the hooks entry in the `local` repo entry.

```yaml
id: standardjs-docker-ts
name: standardjs linter (TS and JS)
entry: vonwig/standardjs:latest '{"typescript": true, "fix": false}'
language: docker_image
files "\\.(ts|tsx)$"

Fixing violations

Linting output comes in the form of violations. A violation is attached to a range in the code file with the offending code and the violation reason. As mentioned previously, 75% of StandardJS violations are automatically fixable. Can we use the AI assistant to automatically fix the remaining violations?

Respond with an edit to resolve the violation using the following JSON format:

{
  "start": [1,4],
  "end": [1,6],
  "edit": "lorem ipsum"
}

If you take, for example, the lint rule that forbids type-coercing equality, all of the models we tested will replace == with ===. Here’s the response when we ask for fixes to lines with the no-this-before-super violation:

Here's the proposed edit to resolve the violation:

```json
{
  "start": [1, 4],
  "end": [2, 6],
  "edit": "class myclass extends base {\n\tval: number;\n\tconstructor()
{\n\t\tsuper();\n\t\tthis.val = 10;\n\t}\n}"
}
```

If these models are able to fix these violations, why doesn’t ESLint just make them automatically fixable? In many of the cases, they represent riskier changes that still require some developer supervision. Perhaps the best thing an assistant can do is present these auto-fixes to the user directly in their editors. For example, a fix that has been generated by our assistant can be presented in VSCode (Figure 3).

Figure 3: A fix generated by the assistant is presented to the user in VSCode.

Editor complaints

With the rise of tools like GitHub Copilot, developers are now becoming accustomed to assistants being present in their editors (Figure 4).

Figure 4: The AI assistant is present in the editor, with a “Fix this” action highlighted.

Our work is showing that linting tools can improve the quality of these fixes.

For example, when asking Copilot to fix the line from earlier, it lacks the additional context from ESLint (Figure 5).

Figure 5: Additional context is needed.

The assistant is unable to infer that there is a violation there. In this instance, Copilot is hallucinating because it was triggered by the developer’s editor action without any of the context coming in from the linter. As far as Copilot knows, I just asked it to fix perfectly good code.

To improve this, we can use the output of a linter to “complain” about a violation. The editor allows us to surface a quick action to fix the code. Figure 6 shows the same “fix using Copilot” from the “problems” window, triggered by another violation:

Figure 6: “Fix using Copilot” is shown in the problems window.

This is shown in VSCode’s “problems” window, which helps developers locate problems in the codebase. An assistant can use the editor to put the ESLint tool in a more effective relationship with the developer (Figure 7).

Figure 7: A more complete fix: an immediate resolution rather than a hallucination.

Most importantly, we get an immediate resolution rather than a hallucination. We’re also hosting these tools in Docker, so these improvements do not require installs of Node.js, NPM, or ESLint.

Summary

We continue to investigate the use of tools for gathering context and improving suggestions. In this article, we have looked at how AI assistants can provide significant value to developers by:

  • Cutting out busy work setting up Node/NPM/ESLint.
  • Leveraging expert knowledge about ESLint to “level up” developers
  • Generating and surfacing actionable fixes directly to developers where they’re already working (in the editor)
  • Generating simple workflows as outcomes from natural language prompts and tools

As always, feel free to follow along in our new public repo and please reach out. Everything we’ve discussed in this blog post is available for you to try out on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.


How to Create Dockerfiles with GenAI 

By: Docker Labs
July 29, 2024, 14:00

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

As we learn how to work more effectively with generative AI, one of the tasks we keep returning to is Dockerfile generation. Given that we can equip our AI assistants with tools, can we reframe the problem of content generation as an interaction between LLMs and tools?


If you ask ChatGPT to write a Dockerfile for your project, it will initiate a conversation with you and ask you to extract some details from your project (Figure 1).

Figure 1: Asking ChatGPT, “Can you write a Dockerfile for this project?”

However, we can also equip our assistant with a tool to extract this information from the project directly. Instead of asking the user, the assistant can ask a tool.

Using tools

Here’s an alternative framing. Let’s prompt the assistant with two high-level instructions whenever it is asked to create a Dockerfile.

* First, analyze the project to determine how it should be built. 
* Once the analysis is complete, create a Dockerfile to build that project.

In addition to these new prompts, we will also supply the LLM with two function definitions.

[{"name": "analyze_project",
 "description": "Analyze a project to determine how it should be built"},
 {"name": "write_files",
  "description": "write a set of files to my project",
  "parameters": {...}}]

With these two functions, the assistant will have enough agency that it can write a Dockerfile directly to a user’s project. Unfortunately, without best practices, the actual Dockerfile content is never very good. 

Here’s an example of the kind of Dockerfile that gets generated when the assistant knows the kind of project but not much else.

# Use the official Node.js image from the Docker Hub
FROM node:14

# Create and change to the app directory
WORKDIR /usr/src/app

# Copy the package.json and package-lock.json files
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port your app runs on
EXPOSE 3000

# Define the command to run your app
CMD ["npm", "start"]

This example shows several standard Dockerfile authoring problems.

  • Uses out-of-date base images.
  • Does not employ multi-stage builds.
  • Skips newer Docker features such as cache mounts.

So, what can we do about this? Let’s extract some best practices for authoring NPM project Dockerfiles.

Write Dockerfiles for NPM projects using three stages.  Do these three steps sequentially.
* the first node dependencies stage should be called "deps" 
  and it should fetch the runtime dependencies using npm ci
  with the --omit=dev flag.
* The second Dockerfile stage should be called build  and it should be based on the deps stage. 
  It should run npm ci and then npm build
* The third Dockerfile stage should select a base image 
  that is recommended by Scout and use that in the FROM line.
  This stage should do then do three things.
	1. it copies the node_modules directory from the deps stage.
    2. it copies the dist directory from the build stage.
    3. it then runs npm start

If you need to use a RUN statement containing npm ci always 
add the argument --mount=type=cache,target=/root/.npm to the RUN instruction.  
The --mount argument should be placed between the word RUN and the npm command.
This will cache the npm packages in the docker build cache and speed up the build process.
After selecting a base image, get a recommended tag. This recommendation will change over time but start with what is recommended now.

We’ll include this content in any conversation occurring in the context of an NPM project. We’ll also add a third function.

[{"name": "docker_scout_tag_recommendation",
  "description": "get a recommended tag",
  "parameters": {...}}]

At the time of writing, here’s the output from our assistant. In this example, the assistant is using the GPT-4 model, but we are seeing the same results with other local models.

# Stage 1 - Downloading dependencies
FROM node:22-slim AS deps
WORKDIR /usr/src/app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev

# Stage 2 - Building application
FROM deps AS build
RUN --mount=type=cache,target=/root/.npm npm ci && npm build

# Stage 3 - Using a recommended base image from Scout
FROM node:22-slim
WORKDIR /usr/src/app
COPY --from=deps /usr/src/app/node_modules ./node_modules
COPY --from=build /usr/src/app/dist ./dist
CMD [ "npm", "start" ]

This is a big improvement. The three inputs that underlie this improvement are listed below.

  • The two top-level instructions that anchor the process our assistant should use:
* First, analyze the project to determine how it should be built. 
* Once the analysis is complete, create a Dockerfile to build that project.
  • The Markdown content that summarizes our Docker specific best practices.
  • The functions (analyze_project, write_files, and docker_scout_tag_recommendation).

Distilling best practices is certainly not easy, but a little can go a long way. Still, it is hard to know whether this can be considered an improvement over just using code templates. Our continuing investigation centers around whether generative AI will help us more easily adapt these models to new knowledge, new features, and new project types.

Conversation loop

It’s instructive to go through the actual messages passed between agents here. It’s particularly interesting to look at the interplay between the Assistant and the Tool agents. Near the end of the exchange, there is an unexpected twist where the write_files tool complains that it was passed an invalid JSON file. In this case, the assistant actually did generate invalid JSON and the tool was right to complain. However, the assistant also responded to this error by fixing the JSON and calling the tool a second time! That was unscripted.

Another interesting example of an unscripted moment is the last message from the assistant. The assistant correctly determines that the task is now complete and generates a summary of the work that was done and why.

Transcript

System:

You are an assistant who specializes in authoring Dockerfiles for projects. Since you are an expert and know about their project, be definitive about recommendations.

User:

  • First, analyze the project to determine how it should be built.
  • Once the analysis is complete, create a Dockerfile to build that project.

Assistant:

[{:function {:arguments "{}" :name "analyze_project"}}]

Tool (analyze_project):

This is an NPM project.

System:

Write Dockerfiles for NPM projects using three stages. Do these three steps sequentially …

… (system prompts excluded for brevity) ..

Assistant:

[{:function {:arguments "{
  \"repository\": \"node\"
}"
             :name "docker_scout_tag_recommendation"}}]

Tool (docker_scout_tag_recommendation):

22-slim

Assistant:

[{:function {:arguments "{
  \"files\": [
    {
      \"path\": \"Dockerfile\",
      \"content\": `
# Stage 1 - the build process
FROM node:22-slim AS deps
WORKDIR /usr/src/app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev
# Stage 2 - the production environment
FROM deps AS build
RUN npm ci && npm run build
# Stage 3 - the production environment
FROM node:22-slim
WORKDIR /usr/src/app
COPY --from=deps /usr/src/app/node_modules ./node_modules
COPY --from=build /usr/src/app/dist ./dist
CMD [ \"npm\", \"start\" ]`,
      \"executable\": false
    }
  ]
}"
             :name "write_files"}
  :id "call_6VVCLiXcIBNyEGPQwftR7xEV"}]

Tool:

function call write_files failed call exited with non-zero code (1):  call exited with non-zero code (1): #error {\r\n :cause Unexpected character (‘`’)

Assistant:

I apologize for the error. Let me try that again.

[{:function {:arguments "{
  \"files\": [
    {
      \"path\": \"Dockerfile\",
      \"content\": \"FROM node:22-slim AS deps\\nWORKDIR /usr/src/app\\nCOPY package*.json ./\\nRUN --mount=type=cache,target=/root/.npm npm ci --omit=dev\\n\\nFROM deps AS build\\nRUN npm ci && npm run build\\n\\nFROM node:22-slim\\nWORKDIR /usr/src/app\\nCOPY --from=deps /usr/src/app/node_modules ./node_modules\\nCOPY --from=build /usr/src/app/dist ./dist\\nCMD [ \\\"npm\\\", \\\"start\\\" ]\",
      \"executable\": false
    }
  ]
}"
             :name "write_files"}}]

Tool (write_files):

wrote 1 files: Dockerfile

Assistant:

I have written the Dockerfile for your NPM project. You can find it in your project folder. This Dockerfile uses a multi-stage build process to efficiently build and run your application within a Docker container. The first stage fetches the runtime dependencies, the second stage builds the application, and the final stage runs the application. Please let me know if you need further assistance.

Summary

For a generative AI assistant to improve its ability to generate Dockerfile content, we have identified both tools and stack-specific best practice content as key enablers. Incorporating these elements into a generative AI based assistant can serve as an alternative to code templates. We continue to evaluate whether this approach allows us to more easily adapt to new project types.

As always, feel free to follow along in our public repo and reach out. Everything we’ve discussed in this blog post is available for you to try on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.


How an AI Assistant Can Help Configure Your Project’s Git Hooks

By: Docker Labs
July 15, 2024, 13:00

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Can an AI assistant help configure your project’s Git hooks? 

Git hooks can save manual work during repetitive tasks, such as authoring commit messages or running checks before committing an update. But they can also be hard to configure, and are project dependent. Can generative AI make Git hooks easier to configure and use?

Simple prompts

From a high level, the basic prompt is simple:

How do I set up git hooks?

Although this includes no details about the actual project, the response from many foundation models is still useful. If you run this prompt in ChatGPT, you’ll see that the response contains details about how to use the .git/hooks folder, hints about authoring hook scripts, and even practical next steps for what you’ll need to learn about next. However, the advice is general. It has not been grounded by your project.

Project context

Your project itself is an important source of information for an assistant. Let’s start by providing information about types of source code in a project. Fortunately, there are plenty of existing tools for extracting project context, and these tools are often already available in Docker containers. 

For example, here’s an image that will analyze any Git repository and return a list of languages being used. Let’s update our prompt with this new context.

How do I set up git hooks?

{{# linguist }}
This project contains code from the language {{ language }} so if you have any 
recommendations pertaining to {{ language }}, please include them.
{{/linguist}}

In this example, we use Mustache templates to bind the output of our “linguist” analysis into this prompt.

The response from an LLM-powered assistant will change dramatically. Armed with specific advice about what kinds of files might be changed, the LLM will generate sample scripts and make suggestions about specific tools that might be useful for the kinds of code developed in this project. It might even be possible to cut and paste code out of the response to try setting up hooks yourself. 

The pattern is quite simple. We already have tools to analyze projects, so let’s plug these in locally and give the LLM more context to make better suggestions (Figure 1).

Flow chart illustration showing addition of tools to provide project context for the LLM. The process includes Project, Tool (go-linguist), Docker Desktop, Prompt templates and LLM.
Figure 1: Adding tools to provide context for LLM.

Expertise

Generative AI also offers new opportunities for experts to contribute knowledge that AI assistants can leverage to become even more useful. For example, we have learned that pre-commit can be helpful to organize the set of tools used to implement Git hooks. 

To represent this learning, we add this prompt:

When configuring git hooks, our organization uses a tool called
[pre-commit](https://github.com/pre-commit/pre-commit).

There’s also a base configuration that we have found useful in all projects. We also add that to our assistant’s knowledge base.

If a user wants to configure git hooks, use this template which will need to be written to pre-commit-config.yaml 
in the root of the user's project.

Start with the following code block:

```yaml
repos:
    - repo: http://github.com/pre-commit/pre-commit-hooks
      rev: v2.3.0
      hooks:
          - id: check-yaml
          - id: trailing-whitespace
          - id: check-merge-conflict
    - repo: https://github.com/jorisroovers/gitlint
      rev: main
      hooks:
          - id: gitlint
    - repo: local
      hooks:
```

Finally, as we learn about new tools that are useful for certain projects, we describe this information. For example, as an expert, I might want to suggest that teams using Golang include a particular linting tool in the Git hooks configuration.

If we detect `Go` in the project, add the following hook to the hooks entry in the `local` repo entry.

```yaml
id: golangci-lint
name: golang cli
entry: golangci/golangci-lint
files: "\\.go$"
```

With these additions, the response from our assistant becomes precise. We have found that our assistant can now write hooks scripts and write complete YAML configuration files that are project-specific and ready to copy directly into a project. 

Somewhat surprisingly, the assistant can now also recommend tools not mentioned explicitly in our prompts but that use the same syntax established for other tools. Using our examples as guidance, the LLM appears capable of extending the assistant’s reach to new tools while still configuring them with our suggested framework and syntax.

Most importantly, the response from the assistant is now not only actionable to the developer, saving them time, but it is also specific enough that we could pass the response to a simple agent to take the action automatically.

Adding tools

For this example, the only tool we really need is a file-writer. The change to our prompt is to add one instruction to go ahead and write the configuration into the project.

Write the final yaml content to our project at the path pre-commit-config.yaml.  Write both the `pre-commit` and `commit-message` scripts to `git/hooks` and make them executable.

Besides the prompt, there is another crucial step that we are skipping at this point. The assistant must be told that it is capable of writing content into files. However, this is really just a registration step. 

The important thing is that we can give our agent the tools it needs to perform tasks. In doing so, the response from the LLM undergoes a transition. Instead of text output, the LLM responds with instructions for our agent. If we’re using OpenAI function calling, we’ll see a request that looks something like the following JSON. It’s not meant to be read by us, of course. It’s an instruction to the agent that knows how to update your project for you.

{
  "id": "call_4LCo0CQqCHCGGZea3qlaTg5h",
  "type": "function",
  "function": {
    "name": "write_file",
    "arguments": "{\n  \"path\": \"pre-commit-config.yaml\",\n  \"content\": \"repos:\\n    - repo: http://github.com/pre-commi
t/pre-commit-hooks\\n      rev: v2.3.0\\n      hooks:\\n          - id: check-yaml\\n          - id: trailing-whitespace\\n          - id
: check-merge-conflict\\n    - repo https://github.com/jorisroovers/gitlint\\n      rev: main\\n      hooks:\\n          - id: gitlint\\n
    - repo: local\\n      hooks:\\n          - id: markdownlint\\n            name: markdown linter\\n            entry: markdownlint/mar
kdownlint\\n            files: \\\"\\\\.md$\\\"\\n          - id: python-black\\n            name: python black formatter\\n            e
ntry: black\\n            files: \\\"\\\\.py$\\\"\"\n}"
    }
}

A more sophisticated version of the file-writer function might communicate with an editor agent capable of presenting recommended file changes to a developer using native IDE concepts, like editor quick-fixes and hints. In other words, tools can help generative AI to meet developers where they are. And the answer to the question:

How do I set up git hooks?

becomes, “Let me just show you.”

Docker as tool engine

The tools mentioned in the previous sections have all been delivered as Docker containers.  One goal of this work has been to verify that an assistant can bootstrap itself starting from a Docker-only environment. Docker is important here because it has been critical in smoothing over many of the system/environment gaps that LLMs struggle with. 

We have observed that a significant barrier to activating even simple local assistants is the complexity of managing a safe and reliable environment for running these tools. Therefore, we are constraining ourselves to use only tools that can be lazily pulled from public registries.

For AI assistants to transform how we consume tools, we believe that both tool distribution and knowledge distribution are key factors. In the above example, we can see how LLM responses can be transformed by tools from unactionable and vague to hyper-project-focused and actionable. The difference is tools.

To follow along with this effort, check out the GitHub repository for this project.

Learn more

How to Run Hugging Face Models Programmatically Using Ollama and Testcontainers

July 11, 2024 at 13:00

Hugging Face now hosts more than 700,000 models, with the number continuously rising. It has become the premier repository for AI/ML models, catering to both general and highly specialized needs.

As the adoption of AI/ML models accelerates, more application developers are eager to integrate them into their projects. However, the entry barrier remains high due to the complexity of setup and lack of developer-friendly tools. Imagine if deploying an AI/ML model could be as straightforward as spinning up a database. Intrigued? Keep reading to find out how.

Introduction to Ollama and Testcontainers

Recently, Ollama announced support for running models from Hugging Face. This development is exciting because it brings the rich ecosystem of AI/ML components from Hugging Face to Ollama end users, who are often developers. 

Testcontainers libraries already provide an Ollama module, making it straightforward to spin up a container with Ollama without needing to know the details of how to run Ollama using Docker:

import org.testcontainers.ollama.OllamaContainer; 

var ollama = new OllamaContainer("ollama/ollama:0.1.44"); 
ollama.start();

These lines of code are all that is needed to have Ollama running inside a Docker container effortlessly.

Running models in Ollama

By default, Ollama does not include any models, so you need to download the one you want to use. With Testcontainers, this step is straightforward by leveraging the execInContainer API provided by Testcontainers:

ollama.execInContainer("ollama", "pull", "moondream");

At this point, you have the moondream model ready to be used via the Ollama API. 

Excited to try it out? Hold on for a bit. This model is running in a container, so what happens if the container dies? Will you need to spin up a new container and pull the model again? Ideally not, as these models can be quite large.

Thankfully, Testcontainers makes this scenario easy to handle by providing an API to commit a container image programmatically:

public void createImage(String imageName) {
    var ollama = new OllamaContainer("ollama/ollama:0.1.44");
    ollama.start();
    ollama.execInContainer("ollama", "pull", "moondream");
    ollama.commitToImage(imageName);
}

This code creates an image from the container with the model included. In subsequent runs, you can create a container from that image, and the model will already be present. Here’s the pattern:

var imageName = "tc-ollama-moondream";
var ollama = new OllamaContainer(DockerImageName.parse(imageName)
        .asCompatibleSubstituteFor("ollama/ollama:0.1.44"));
try {
    ollama.start();
} catch (ContainerFetchException ex) {
    // If image doesn't exist, create it. Subsequent runs will reuse the image.
    createImage(imageName);
    ollama.start();
}

Now, you have a model ready to be used, and because it is running in Ollama, you can interact with its API:

var image = getImageInBase64("/whale.jpeg");
String response = given()
        .baseUri(ollama.getEndpoint())
        .header(new Header("Content-Type", "application/json"))
        .body(new CompletionRequest("moondream:latest", "Describe the image.", Collections.singletonList(image), false))
        .post("/api/generate")
        .getBody().as(CompletionResponse.class).response();

System.out.println("Response from LLM " + response);
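
The CompletionRequest and CompletionResponse types above are not part of Testcontainers or Ollama; they are small DTOs you define yourself to map the JSON payload of Ollama’s /api/generate endpoint. A minimal sketch (field names assumed from the Ollama API, with serialization left to REST Assured’s default Jackson mapping) might look like this:

import java.util.List;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Request body for POST /api/generate: model name, prompt, optional base64-encoded images, streaming flag.
record CompletionRequest(String model, String prompt, List<String> images, boolean stream) {}

// Response body; Ollama returns additional fields, so unknown properties are ignored.
@JsonIgnoreProperties(ignoreUnknown = true)
record CompletionResponse(String response) {}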

Using Hugging Face models

The previous example demonstrated using a model already provided by Ollama. However, with the ability to use Hugging Face models in Ollama, your available model options have now expanded by thousands. 

To use a model from Hugging Face in Ollama, you need a GGUF file for the model. Currently, there are 20,647 models available in GGUF format. How cool is that?

The steps to run a Hugging Face model in Ollama are straightforward, but we’ve simplified the process further by scripting it into a custom OllamaHuggingFaceContainer. Note that this custom container is not part of the default library, so you can copy and paste the implementation of OllamaHuggingFaceContainer and customize it to suit your needs.

To run a Hugging Face model, do the following:

public void createImage(String imageName, String repository, String model) {
    var hfModel = new OllamaHuggingFaceContainer.HuggingFaceModel(repository, model);
    var huggingFaceContainer = new OllamaHuggingFaceContainer(hfModel);
    huggingFaceContainer.start();
    huggingFaceContainer.commitToImage(imageName);
}

By providing the repository name and the model file as shown, you can run Hugging Face models in Ollama via Testcontainers. 

You can find an example using an embedding model and an example using a chat model on GitHub.

Customize your container

One key strength of using Testcontainers is its flexibility in customizing container setups to fit specific project needs by encapsulating complex setups into manageable containers. 

For example, you can create a custom container tailored to your requirements. Here’s an example of TinyLlama, a specialized container for spinning up the DavidAU/DistiLabelOrca-TinyLLama-1.1B-Q8_0-GGUF model from Hugging Face:

public class TinyLlama extends OllamaContainer {

    private final String imageName;

    public TinyLlama(String imageName) {
        super(DockerImageName.parse(imageName)
.asCompatibleSubstituteFor("ollama/ollama:0.1.44"));
        this.imageName = imageName;
    }

    public void createImage(String imageName) {
        var ollama = new OllamaContainer("ollama/ollama:0.1.44");
        ollama.start();
        try {
            ollama.execInContainer("apt-get", "update");
            ollama.execInContainer("apt-get", "upgrade", "-y");
            ollama.execInContainer("apt-get", "install", "-y", "python3-pip");
            ollama.execInContainer("pip", "install", "huggingface-hub");
            ollama.execInContainer(
                    "huggingface-cli",
                    "download",
                    "DavidAU/DistiLabelOrca-TinyLLama-1.1B-Q8_0-GGUF",
                    "distilabelorca-tinyllama-1.1b.Q8_0.gguf",
                    "--local-dir",
                    "."
            );
            ollama.execInContainer(
                    "sh",
                    "-c",
                    String.format("echo '%s' > Modelfile", "FROM distilabelorca-tinyllama-1.1b.Q8_0.gguf")
            );
            ollama.execInContainer("ollama", "create", "distilabelorca-tinyllama-1.1b.Q8_0.gguf", "-f", "Modelfile");
            ollama.execInContainer("rm", "distilabelorca-tinyllama-1.1b.Q8_0.gguf");
            ollama.commitToImage(imageName);
        } catch (IOException | InterruptedException e) {
            throw new ContainerFetchException(e.getMessage());
        }
    }

    public String getModelName() {
        return "distilabelorca-tinyllama-1.1b.Q8_0.gguf";
    }

    @Override
    public void start() {
        try {
            super.start();
        } catch (ContainerFetchException ex) {
            // If image doesn't exist, create it. Subsequent runs will reuse the image.
            createImage(imageName);
            super.start();
        }
    }
}

Once defined, you can easily instantiate and utilize your custom container in your application:

var tinyLlama = new TinyLlama("example");
tinyLlama.start();
String response = given()
        .baseUri(tinyLlama.getEndpoint())
        .header(new Header("Content-Type", "application/json"))
        .body(new CompletionRequest(tinyLlama.getModelName() + ":latest", List.of(new Message("user", "What is the capital of France?")), false))
        .post("/api/chat")
        .getBody().as(ChatResponse.class).message.content;
System.out.println("Response from LLM " + response);

Note how all the implementation details are hidden inside the TinyLlama class; the end user doesn’t need to know how to actually install the model into Ollama, what GGUF is, or that getting huggingface-cli requires pip install huggingface-hub.
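
Likewise, the Message and ChatResponse types (and the chat-oriented constructor of CompletionRequest) used in the snippet above are DTOs you provide for Ollama’s /api/chat endpoint. A possible sketch, matching the field access used in the example, with field names assumed from the Ollama API:

import java.util.List;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// A single chat message; public fields match the .message.content access used in the example.
@JsonIgnoreProperties(ignoreUnknown = true)
class Message {
    public String role;
    public String content;

    Message() {} // required by Jackson for deserialization

    Message(String role, String content) {
        this.role = role;
        this.content = content;
    }
}

// Request body for POST /api/chat (the example reuses the CompletionRequest name for this shape).
record ChatRequest(String model, List<Message> messages, boolean stream) {}

// Response body; only the generated message is mapped here.
@JsonIgnoreProperties(ignoreUnknown = true)
class ChatResponse {
    public Message message;
}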

Advantages of this approach

  • Programmatic access: Developers gain seamless programmatic access to the Hugging Face ecosystem.
  • Reproducible configuration: All configuration, from setup to lifecycle management, is codified, ensuring reproducibility across team members and CI environments.
  • Familiar workflows: By using containers, developers familiar with containerization can easily integrate AI/ML models, making the process more accessible.
  • Automated setups: Provides a straightforward clone-and-run experience for developers.

This approach leverages the strengths of both Hugging Face and Ollama, supported by the automation and encapsulation provided by the Testcontainers module, making powerful AI tools more accessible and manageable for developers across different ecosystems.

Conclusion

Integrating AI models into applications need not be a daunting task. By leveraging Ollama and Testcontainers, developers can seamlessly incorporate Hugging Face models into their projects with minimal effort. This approach not only simplifies development environment setup but also ensures reproducibility and ease of use. With the ability to programmatically manage models and containerize them for consistent environments, developers can focus on building innovative solutions without getting bogged down by complex setup procedures.

The combination of Ollama’s support for Hugging Face models and Testcontainers’ robust container management capabilities provides a powerful toolkit for modern AI development. As AI continues to evolve and expand, these tools will play a crucial role in making advanced models accessible and manageable for developers across various fields. So, dive in, experiment with different models, and unlock the potential of AI in your applications today.

Stay current on the latest Docker news. Subscribe to the Docker Newsletter.

Learn more

Using Generative AI to Create Runnable Markdown

By: Docker Labs
July 1, 2024 at 13:00

This ongoing GenAI Docker Labs series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Generative AI (GenAI) is changing how we interact with tools. Today, we might experience this predominantly through the use of new AI-powered chat assistants, but there are other opportunities for generative AI to improve the life of a developer.

When developers start working on a new project, they need to get up to speed on the tools used in that project. A common practice is to document these practices in a project README.md and to version that documentation along with the project. 

Can we use generative AI to generate this content? We want this content to represent best practices for how tools should be used in general but, more importantly, how tools should be used in this particular project.

We can think of this as a kind of conversation between developers, agents representing tools used by a project, and the project itself. Let’s look at this for the Docker tool itself.

Generating Markdown in VSCode

For this project, we have written a VSCode extension that adds one new command called “Generate a runbook for this project.” Figure 1 shows it in action:

Animated gif showing VSCode extension to generate a runbook for this project.
Figure 1: VSCode extension to generate a runbook.

This approach combines prompts written by tool experts with knowledge about the project itself. This combined context improves the LLM’s ability to generate documentation (Figure 2).

 Illustration showing process flow from expert prompts plus project facts to LLM.
Figure 2: This approach combines expert prompts with knowledge about the project itself.

Although we’re illustrating this idea on a tool that we know very well (Docker!), the idea of generating content in this manner is quite generic. The prompts we used for getting started with the Docker build, run, and compose are available from GitHub. There is certainly an art to writing these prompts, but we think that tool experts have the right knowledge to create prompts of this kind, especially if AI assistants can then help them make their work easier to consume.

There is also an essential point here. If we think of the project as a database from which we can retrieve context, then we’re effectively giving an LLM the ability to retrieve facts about the project. This allows our prompts to depend on local context. For a Docker-specific example, we might want to prompt the AI to not talk about compose if the project has no compose.yaml files. 

“I am not using Docker Compose in this project.”

That turns out to be a transformative user prompt if it’s true. This is what we’d normally learn through a conversation. However, there are certain project details that are always useful. This is why having our assistants right there in the local project can be so helpful.

Runnable Markdown

Although Markdown files are mainly for reading, they often contain runnable things. LLMs converse with us in text that often contains code blocks that represent actual runnable commands. And, in VSCode, developers use the embedded terminal to run commands against the currently open project. Let’s short-circuit this interaction and make commands runnable directly from these Markdown runbooks.

In the current extension, we’ve added a code action to every code block that contains a shell command so that users can launch that command in the embedded terminal. During our exploration of this functionality, we have found that treating the Markdown file as a kind of REPL (read-eval-print-loop) can help to refine the output from the LLM and improve the final content. Figure 3 shows what this looks like in action:

 Animated gif showing addition of code action that contains a shell command so users can launch that command in the embedded terminal.
Figure 3: Adding code to allow users to launch the command in the embedded terminal.

Markdown extends your editor

In the long run, nobody is going to navigate to a Markdown file in order to run a command. However, we can treat these Markdown files as scripts that create commands for the developer’s edit session. We can even let developers bind them to keystrokes (e.g., type ,b to run the build code block from your project runbook).

In the end, this is just the AI Assistant talking to itself. The Assistant recommends a command. We find the command useful. We turn it into a shortcut. The Assistant remembers this shortcut because it’s in our runbook, and then makes it available whenever we’re developing this project.

Animated gif showing the AI Assistant generating context-aware content.
Figure 4: The Assistant in action.

Figure 4 shows a real feedback loop between the Assistant, the generated content, and the developer who is actually running these commands.

As developers, we tend to vote with our keyboards. If this command is useful, let’s make it really easy to run! And if it’s useful for me, it might be useful for other members of my team, too.

The GitHub repository and install instructions are ready for you to try today.

For more, see this demo: VSCode Walkthrough of Runnable Markdown from GenAI.

Subscribe to Docker Navigator to stay current on the latest Docker news.

Learn more

Develop Kubernetes Operators in Java without Breaking a Sweat

June 6, 2024 at 13:48

Developing Kubernetes operators in Java is not yet the norm. So far, Go has been the language of choice here, not least because of its excellent support for writing corresponding tests. 

One challenge in developing Java-based projects has been the lack of easy automated integration testing that interacts with a Kubernetes API server. However, thanks to the open source library Kindcontainer, based on the widely used Testcontainers integration test library, this gap can be bridged, enabling easier development of Java-based Kubernetes projects. 

In this article, we’ll show how to use Testcontainers to test custom Kubernetes controllers and operators implemented in Java.

Kubernetes in Docker

Testcontainers allows starting arbitrary infrastructure components and processes running in Docker containers from tests running within a Java virtual machine (JVM). The framework takes care of binding the lifecycle and cleanup of Docker containers to the test execution. Even if the JVM is terminated abruptly during debugging, for example, it ensures that the started Docker containers are also stopped and removed. In addition to a generic class for any Docker image, Testcontainers offers specialized implementations in the form of subclasses — for components with sophisticated configuration options, for example. 

These specialized implementations can also be provided by third-party libraries. The open source project Kindcontainer is one such third-party library that provides specialized implementations for various Kubernetes containers based on Testcontainers:

  • ApiServerContainer
  • K3sContainer
  • KindContainer

Although ApiServerContainer focuses on providing only a small part of the Kubernetes control plane, namely the Kubernetes API server, K3sContainer and KindContainer launch complete single-node Kubernetes clusters in Docker containers. 

This allows for a trade-off depending on the requirements of the respective tests: If only interaction with the API server is necessary for testing, then the significantly faster-starting ApiServerContainer is usually sufficient. However, if testing complex interactions with other components of the Kubernetes control plane or even other operators is in the scope, then the two “larger” implementations provide the necessary tools for that — albeit at the expense of startup time. For perspective, depending on the hardware configuration, startup times can reach a minute or more.

A first example

To illustrate how straightforward testing against a Kubernetes container can be, let’s look at an example using JUnit 5:

@Testcontainers
public class SomeApiServerTest {
  @Container
  public ApiServerContainer<?> K8S = new ApiServerContainer<>();

  @Test
  public void verify_no_node_is_present() {
    Config kubeconfig = Config.fromKubeconfig(K8S.getKubeconfig());
    try (KubernetesClient client = new KubernetesClientBuilder()
           .withConfig(kubeconfig).build()) {
      // Verify that ApiServerContainer has no nodes
      assertTrue(client.nodes().list().getItems().isEmpty());
    }
  }
}

Thanks to the @Testcontainers JUnit 5 extension, lifecycle management of the ApiServerContainer is easily handled by marking the container that should be managed with the @Container annotation. Once the container is started, a YAML document containing the necessary details to establish a connection with the API server can be retrieved via the getKubeconfig() method. 

This YAML document represents the standard way of presenting connection information in the Kubernetes world. The fabric8 Kubernetes client used in the example can be configured using Config.fromKubeconfig(). Any other Kubernetes client library will offer similar interfaces. Kindcontainer does not impose any specific requirements in this regard.

All three container implementations rely on a common API. Therefore, if it becomes clear at a later stage of development that one of the heavier implementations is necessary for a test, you can simply switch to it without any further code changes — the already implemented test code can remain unchanged.

Customizing your Testcontainers

In many situations, after the Kubernetes container has started, a lot of preparatory work needs to be done before the actual test case can begin. For an operator, for example, the API server must first be made aware of a Custom Resource Definition (CRD), or another controller must be installed via a Helm chart. What may sound complicated at first is made simple by Kindcontainer along with intuitively usable Fluent APIs for the command-line tools kubectl and helm.

The following listing shows how a CRD is first applied from the test’s classpath using kubectl, followed by the installation of a Helm chart:

@Testcontainers
public class FluentApiTest {
  @Container
  public static final K3sContainer<?> K3S = new K3sContainer<>()
    .withKubectl(kubectl -> {
      kubectl.apply.fileFromClasspath("manifests/mycrd.yaml").run();
    })
    .withHelm3(helm -> {
      helm.repo.add.run("repo", "https://repo.example.com");
      helm.repo.update.run();
      helm.install.run("release", "repo/chart");
    });
  // Tests go here
}

Kindcontainer ensures that all commands are executed before the first test starts. If there are dependencies between the commands, they can be easily resolved; Kindcontainer guarantees that they are executed in the order they are specified.

The Fluent API is translated into calls to the respective command-line tools. These are executed in separate containers, which are automatically started with the necessary connection details and connected to the Kubernetes container via the Docker internal network. This approach avoids dependencies on the Kubernetes image and version conflicts regarding the available tooling within it.

Selecting your Kubernetes version

If nothing else is specified by the developer, Kindcontainer starts the latest supported Kubernetes version by default. However, relying on this default is generally discouraged; best practice is to explicitly specify one of the supported versions when creating the container, as shown:

@Testcontainers
public class SpecificVersionTest {
  @Container
  KindContainer<?> container = new KindContainer<>(KindContainerVersion.VERSION_1_24_1);
  // Tests go here
}

Each of the three container implementations has its own Enum, through which one of the supported Kubernetes versions can be selected. The test suite of the Kindcontainer project itself ensures — with the help of an elaborate matrix-based integration test setup — that the full feature set can be easily utilized for each of these versions. This elaborate testing process is necessary because the Kubernetes ecosystem evolves rapidly, and different initialization steps need to be performed depending on the Kubernetes version.

Generally, the project places great emphasis on supporting all currently maintained Kubernetes major versions, which are released every 4 months. Older Kubernetes versions are marked as @Deprecated and eventually removed when supporting them in Kindcontainer becomes too burdensome. However, this should only happen at a time when using the respective Kubernetes version is no longer recommended.

Bring your own Docker registry

Accessing Docker images from public sources is often not straightforward, especially in corporate environments that rely on an internal Docker registry with manual or automated auditing. Kindcontainer allows developers to specify their own coordinates for the Docker images used for this purpose. However, because Kindcontainer still needs to know which Kubernetes version is being used due to potentially different initialization steps, these custom coordinates are appended to the respective Enum value:

@Testcontainers
public class CustomKubernetesImageTest {
  @Container
  KindContainer<?> container = new KindContainer<>(KindContainerVersion.VERSION_1_24_1
    .withImage("my-registry/kind:1.24.1"));
  // Tests go here
}

In addition to the Kubernetes images themselves, Kindcontainer also uses several other Docker images. As already explained, command-line tools such as kubectl and helm are executed in their own containers. Appropriately, the Docker images required for these tools are configurable as well. Fortunately, no version-dependent code paths are needed for their execution. 

Therefore, the configuration shown in the following is simpler than in the case of the Kubernetes image:

@Testcontainers
public class CustomFluentApiImageTest {
  @Container
  KindContainer<?> container = new KindContainer<>()
    .withKubectlImage(
      DockerImageName
        .parse("my-registry/kubectl:1.21.9-debian-10-r10"))
    .withHelm3Image(DockerImageName.parse("my-registry/helm:3.7.2"));
  // Tests go here
}

The coordinates of the images for all other containers started can also be easily chosen manually. However, it is always the developer’s responsibility to ensure the use of the same or at least compatible images. For this purpose, a complete list of the Docker images used and their versions can be found in the documentation of Kindcontainer on GitHub.

Admission controller webhooks

For the test scenarios shown so far, the communication direction is clear: A Kubernetes client running in the JVM accesses the locally or remotely running Kubernetes container over the network to communicate with the API server running inside it. Docker makes this standard case incredibly straightforward: A port is opened on the Docker container for the API server, making it accessible. 

Kindcontainer automatically performs the necessary configuration steps for this process and provides suitable connection information as Kubeconfig for the respective network configuration.

However, admission controller webhooks present a technically more challenging testing scenario. For these, the API server must be able to communicate with external webhooks via HTTPS when processing manifests. In our case, these webhooks typically run in the JVM where the test logic is executed. However, they may not be easily accessible from the Docker container.

To facilitate testing of these webhooks independently of the network setup, yet still make it simple, Kindcontainer employs a trick. In addition to the Kubernetes container itself, two more containers are started. An SSH server provides the ability to establish a tunnel from the test JVM into the Kubernetes container and set up reverse port forwarding, allowing the API server to communicate back to the JVM. 

Because Kubernetes requires TLS-secured communication with webhooks, an Nginx container is also started to handle TLS termination for the webhooks. Kindcontainer manages the administration of the required certificate material for this. 

The entire setup of processes, containers, and their network communication is illustrated in Figure 1.

Illustration of network setup for testing webhooks, showing JVM on the left side with Webhook server, SSH client, and JUnit test, and Docker network on right side with SSH server, Nginx container, and Kubernetes container.
Figure 1: Network setup for testing webhooks.

Fortunately, Kindcontainer hides this complexity behind an easy-to-use API:

@Testcontainers
public class WebhookTest {
    @Container
    ApiServerContainer<?> container = new ApiServerContainer<>()
        .withAdmissionController(admission -> {
        admission.mutating()
                .withNewWebhook("mutating.example.com")
                .atPort(webhookPort) // Local port of webhook
                .withNewRule()
                .withApiGroups("")
                .withApiVersions("v1")
                .withOperations("CREATE", "UPDATE")
                .withResources("configmaps")
                .withScope("Namespaced")
                .endRule()
                .endWebhook()
                .build();
    });

    // Tests go here
}

The developer only needs to provide the port of the locally running webhook along with some necessary information for setting up in Kubernetes. Kindcontainer then automatically handles the configuration of SSH tunneling, TLS termination, and Kubernetes.

Consider Java

Starting from the simple example of a minimal JUnit test, we have shown how to test custom Kubernetes controllers and operators implemented in Java. We have explained how to use familiar command-line tools from the ecosystem with the help of Fluent APIs and how to easily execute integration tests even in restricted network environments. Finally, we have shown how even the technically challenging use case of testing admission controller webhooks can be implemented simply and conveniently with Kindcontainer. 

Thanks to these new testing possibilities, we hope more developers will consider Java as the language of choice for their Kubernetes-related projects in the future.

Learn more

A Promising Methodology for Testing GenAI Applications in Java

April 24, 2024 at 16:03

In the vast universe of programming, the era of generative artificial intelligence (GenAI) has marked a turning point, opening up a plethora of possibilities for developers.

Tools such as LangChain4j and Spring AI have democratized access to the creation of GenAI applications in Java, allowing Java developers to dive into this fascinating world. With LangChain4j, for instance, setting up and interacting with large language models (LLMs) has become exceptionally straightforward. Consider the following Java code snippet:

public static void main(String[] args) {
    var llm = OpenAiChatModel.builder()
            .apiKey("demo")
            .modelName("gpt-3.5-turbo")
            .build();
    System.out.println(llm.generate("Hello, how are you?"));
}

This example illustrates how a developer can quickly instantiate an LLM within a Java application. By simply configuring the model with an API key and specifying the model name, developers can begin generating text responses immediately. This accessibility is pivotal for fostering innovation and exploration within the Java community. More than that, we have a wide range of models that can be run locally, and various vector databases for storing embeddings and performing semantic searches, among other technological marvels.

Despite this progress, however, we are faced with a persistent challenge: the difficulty of testing applications that incorporate artificial intelligence. This aspect seems to be a field where there is still much to explore and develop.

In this article, I will share a methodology that I find promising for testing GenAI applications.

Project overview

The example project focuses on an application that provides an API for interacting with two AI agents capable of answering questions. 

An AI agent is a software entity designed to perform tasks autonomously, using artificial intelligence to simulate human-like interactions and responses. 

In this project, one agent uses direct knowledge already contained within the LLM, while the other leverages internal documentation to enrich the LLM through retrieval-augmented generation (RAG). This approach allows the agents to provide precise and contextually relevant answers based on the input they receive.

I prefer to omit the technical details about RAG, as ample information is available elsewhere. I’ll simply note that this example employs a particular variant of RAG, which simplifies the traditional process of generating and storing embeddings for information retrieval.

Instead of dividing documents into chunks and making embeddings of those chunks, in this project, we will use an LLM to generate a summary of the documents. The embedding is generated based on that summary.

When the user writes a question, an embedding of the question will be generated and a semantic search will be performed against the embeddings of the summaries. If a match is found, the user’s message will be augmented with the original document.

This way, there’s no need to deal with the configuration of document chunks, worry about setting the number of chunks to retrieve, or worry about whether the way of augmenting the user’s message makes sense. If there is a document that talks about what the user is asking, it will be included in the message sent to the LLM.
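
The following is a rough sketch of that summary-based ingestion and retrieval flow, not the project’s actual code. It assumes LangChain4j’s in-memory embedding store, a chat model, and an embedding model; class and method names (and the 0.7 score threshold) are illustrative and may differ between LangChain4j versions.

import java.util.List;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

class SummaryRag {

    private final ChatLanguageModel llm;
    private final EmbeddingModel embeddingModel;
    private final InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

    SummaryRag(ChatLanguageModel llm, EmbeddingModel embeddingModel) {
        this.llm = llm;
        this.embeddingModel = embeddingModel;
    }

    // Ingestion: summarize the document, embed the summary, and keep the full document as the payload.
    void ingest(String document) {
        String summary = llm.generate("Summarize the following document in a few sentences:\n" + document);
        Embedding summaryEmbedding = embeddingModel.embed(summary).content();
        store.add(summaryEmbedding, TextSegment.from(document));
    }

    // Retrieval: embed the question and, if a summary matches well enough, augment the user's
    // message with the original document before sending it to the LLM.
    String answer(String question) {
        Embedding questionEmbedding = embeddingModel.embed(question).content();
        List<EmbeddingMatch<TextSegment>> matches = store.findRelevant(questionEmbedding, 1);
        String prompt = question;
        if (!matches.isEmpty() && matches.get(0).score() > 0.7) { // threshold is an arbitrary example value
            prompt = question + "\n\nUse this document to answer:\n" + matches.get(0).embedded().text();
        }
        return llm.generate(prompt);
    }
}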

Technical stack

The project is developed in Java and utilizes a Spring Boot application with Testcontainers and LangChain4j.

For setting up the project, I followed the steps outlined in Local Development Environment with Testcontainers and Spring Boot Application Testing and Development with Testcontainers.

I also use Testcontainers Desktop to facilitate database access, to verify the generated embeddings, and to review the container logs.

The challenge of testing

The real challenge arises when trying to test the responses generated by language models. Traditionally, we could settle for verifying that the response includes certain keywords, which is insufficient and prone to errors.

static String question = "How I can install Testcontainers Desktop?";

@Test
void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    assertThat(answer).contains("https://testcontainers.com/desktop/");
}

This approach is not only fragile but also lacks the ability to assess the relevance or coherence of the response.

An alternative is to employ cosine similarity to compare the embeddings of a “reference” response and the actual response, providing a more semantic form of evaluation. 

This method measures the similarity between two vectors/embeddings by calculating the cosine of the angle between them. If both vectors point in the same direction, it means the “reference” response is semantically the same as the actual response.

static String question = "How I can install Testcontainers Desktop?";
static String reference = """
        - Answer must indicate to download Testcontainers Desktop from https://testcontainers.com/desktop/
        - Answer must indicate to use brew to install Testcontainers Desktop in MacOS
        - Answer must be less than 5 sentences
        """;

@Test
void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    double cosineSimilarity = getCosineSimilarity(reference, answer);
    assertThat(cosineSimilarity).isGreaterThan(0.8);
}
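
The getCosineSimilarity helper is not shown in the article; here is a minimal sketch, assuming an in-process LangChain4j embedding model such as AllMiniLmL6V2EmbeddingModel (any embedding model works, and the exact package of this class varies by LangChain4j version):

import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;

static double getCosineSimilarity(String reference, String answer) {
    EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
    float[] v1 = embeddingModel.embed(reference).content().vector();
    float[] v2 = embeddingModel.embed(answer).content().vector();

    double dot = 0, norm1 = 0, norm2 = 0;
    for (int i = 0; i < v1.length; i++) {
        dot += v1[i] * v2[i];
        norm1 += v1[i] * v1[i];
        norm2 += v2[i] * v2[i];
    }
    // Cosine of the angle between the two embedding vectors; values close to 1.0 mean
    // the reference and the answer are semantically similar.
    return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
}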

However, this method introduces the problem of selecting an appropriate threshold to determine the acceptability of the response, in addition to the opacity of the evaluation process.

Toward a more effective method

The real problem here arises from the fact that answers provided by the LLM are in natural language and non-deterministic. Because of this, using current testing methods to verify them is difficult, as these methods are better suited to testing predictable values. 

However, we already have a great tool for understanding non-deterministic answers in natural language: LLMs themselves. Thus, the key may lie in using one LLM to evaluate the adequacy of responses generated by another LLM. 

This proposal involves defining detailed validation criteria and using an LLM as a “Validator Agent” to determine if the responses meet the specified requirements. This approach can be applied to validate answers to specific questions, drawing on both general knowledge and specialized information.

By incorporating detailed instructions and examples, the Validator Agent can provide accurate and justified evaluations, offering clarity on why a response is considered correct or incorrect.

static String question = "How I can install Testcontainers Desktop?";
static String reference = """
        - Answer must indicate to download Testcontainers Desktop from https://testcontainers.com/desktop/
        - Answer must indicate to use brew to install Testcontainers Desktop in MacOS
        - Answer must be less than 5 sentences
        """;

@Test
void verifyStraightAgentFailsToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/straight?question={question}", ChatController.ChatResponse.class, question).message();
    ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
    assertThat(validate.response()).isEqualTo("no");
}

@Test
void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
    assertThat(validate.response()).isEqualTo("yes");
}

We can even test more complex responses where the LLM should suggest a better alternative to the user’s question.

static String question = "How I can find the random port of a Testcontainer to connect to it?";
static String reference = """
        - Answer must not mention using getMappedPort() method to find the random port of a Testcontainer
        - Answer must mention that you don't need to find the random port of a Testcontainer to connect to it
        - Answer must indicate that you can use the Testcontainers Desktop app to configure fixed port
        - Answer must be less than 5 sentences
        """;

@Test
void verifyRaggedAgentSucceedToAnswerHowToDebugWithTCD() {
    String answer = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
    ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
    assertThat(validate.response()).isEqualTo("yes");
}

Validator Agent

The configuration for the Validator Agent doesn’t differ from that of other agents. It is built using the LangChain4j AI Service and a list of specific instructions:

public interface ValidatorAgent {
    @SystemMessage("""
                ### Instructions
                You are a strict validator.
                You will be provided with a question, an answer, and a reference.
                Your task is to validate whether the answer is correct for the given question, based on the reference.
                
                Follow these instructions:
                - Respond only 'yes', 'no' or 'unsure' and always include the reason for your response
                - Respond with 'yes' if the answer is correct
                - Respond with 'no' if the answer is incorrect
                - If you are unsure, simply respond with 'unsure'
                - Respond with 'no' if the answer is not clear or concise
                - Respond with 'no' if the answer is not based on the reference
                
                Your response must be a json object with the following structure:
                {
                    "response": "yes",
                    "reason": "The answer is correct because it is based on the reference provided."
                }
                
                ### Example
                Question: Is Madrid the capital of Spain?
                Answer: No, it's Barcelona.
                Reference: The capital of Spain is Madrid
                ###
                Response: {
                    "response": "no",
                    "reason": "The answer is incorrect because the reference states that the capital of Spain is Madrid."
                }
                """)
    @UserMessage("""
            ###
            Question: {{question}}
            ###
            Answer: {{answer}}
            ###
            Reference: {{reference}}
            ###
            """)
    ValidatorResponse validate(@V("question") String question, @V("answer") String answer, @V("reference") String reference);

    record ValidatorResponse(String response, String reason) {}
}

As you can see, I’m using Few-Shot Prompting to guide the LLM on the expected responses. I also request a JSON format for responses to facilitate parsing them into objects, and I specify that the reason for the answer must be included, to better understand the basis of its verdict.
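
Wiring the agent up is then a one-liner with LangChain4j’s AiServices factory. A minimal sketch, reusing the “demo” OpenAI model from the introduction (any configured chat model will do):

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

// Any configured chat model can back the validator; this mirrors the earlier OpenAiChatModel example.
ChatLanguageModel chatModel = OpenAiChatModel.builder()
        .apiKey("demo")
        .modelName("gpt-3.5-turbo")
        .build();

ValidatorAgent validatorAgent = AiServices.create(ValidatorAgent.class, chatModel);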

Conclusion

The evolution of GenAI applications brings with it the challenge of developing testing methods that can effectively evaluate the complexity and subtlety of responses generated by advanced artificial intelligences. 

The proposal to use an LLM as a Validator Agent represents a promising approach, paving the way towards a new era of software development and evaluation in the field of artificial intelligence. Over time, we hope to see more innovations that allow us to overcome the current challenges and maximize the potential of these transformative technologies.

Learn more

Get Started with the Latest Updates for Dockerfile Syntax (v1.7.0)

By: Tonis Tiigi
April 9, 2024 at 15:16

Dockerfiles are fundamental tools for developers working with Docker, serving as a blueprint for creating Docker images. These text documents contain all the commands a user could call on the command line to assemble an image. Understanding and effectively utilizing Dockerfiles can significantly streamline the development process, allowing for the automation of image creation and ensuring consistent environments across different stages of development. Dockerfiles are pivotal in defining project environments, dependencies, and the configuration of applications within Docker containers.

With new versions of the BuildKit builder toolkit, Docker Buildx CLI, and Dockerfile frontend for BuildKit (v1.7.0), developers now have access to enhanced Dockerfile capabilities. This blog post delves into these new Dockerfile capabilities and explains how you can leverage them in your projects to further optimize your Docker workflows.

Versioning

Before we get started, here’s a quick reminder of how Dockerfile is versioned and what you should do to update it. 

Although most projects use Dockerfiles to build images, BuildKit is not limited only to that format. BuildKit supports multiple different frontends for defining the build steps for BuildKit to process. Anyone can create these frontends, package them as regular container images, and load them from a registry when you invoke the build.

With the new release, we have published two such images to Docker Hub: docker/dockerfile:1.7.0 and docker/dockerfile:1.7.0-labs.

To use these frontends, you need to specify a #syntax directive at the beginning of the file to tell BuildKit which frontend image to use for the build. Here we have set it to use the latest of the 1.x.x major version. For example:

#syntax=docker/dockerfile:1

FROM alpine
...

This means that BuildKit is decoupled from the Dockerfile frontend syntax. You can start using new Dockerfile features right away without worrying about which BuildKit version you’re using. All the examples described in this article will work with any version of Docker that supports BuildKit (the default builder as of Docker 23), as long as you define the correct #syntax directive on the top of your Dockerfile.

You can learn more about Dockerfile frontend versions in the documentation. 

Variable expansions

When you write Dockerfiles, build steps can contain variables that are defined using the build arguments (ARG) and environment variables (ENV) instructions. The difference between build arguments and environment variables is that environment variables are kept in the resulting image and persist when a container is created from it.

When you use such variables, you most likely use ${NAME} or, more simply, $NAME in COPY, RUN, and other commands.

You might not know that Dockerfile supports two forms of Bash-like variable expansion:

  • ${variable:-word}: Expands to word if the variable is unset or empty; otherwise, expands to the variable’s value
  • ${variable:+word}: Expands to word if the variable is set; otherwise, expands to an empty string

Up to this point, these special forms were not that useful in Dockerfiles because the default value of ARG instructions can be set directly:

FROM alpine
ARG foo="default value"

If you are an expert in various shell applications, you know that Bash and other tools usually have many additional forms of variable expansion to ease the development of your scripts.

In Dockerfile v1.7, we have added:

  • ${variable#pattern} and ${variable##pattern} to remove the shortest or longest prefix from the variable’s value.
  • ${variable%pattern} and ${variable%%pattern} to remove the shortest or longest suffix from the variable’s value.
  • ${variable/pattern/replacement} to replace the first occurrence of a pattern.
  • ${variable//pattern/replacement} to replace all occurrences of a pattern.

How these rules are used might not be completely obvious at first. So, let’s look at a few examples seen in actual Dockerfiles.

For example, projects often can’t agree on whether versions for downloading your dependencies should have a “v” prefix or not. The following allows you to get the format you need:

# example VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# VERSION is now '1.2.3'

In the next example, multiple variants are used by the same project:

ARG VERSION=v1.7.13
ADD https://github.com/containerd/containerd/releases/download/${VERSION}/containerd-${VERSION#v}-linux-amd64.tar.gz / 

To configure different command behaviors for multi-platform builds, BuildKit provides useful built-in variables like TARGETOS and TARGETARCH. Unfortunately, not all projects use the same values. For example, in containers and the Go ecosystem, we refer to 64-bit ARM architecture as arm64, but sometimes you need aarch64 instead.

ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${TARGETARCH/arm64/aarch64}.zip /

In this case, the URL also uses a custom name for AMD64 architecture. To pass a variable through multiple expansions, use another ARG definition with an expansion from the previous value. You could also write all the definitions on a single line, as ARG allows multiple parameters, which may hurt readability.

ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${ARCH}.zip /

Note that the example above is written in a way that if a user passes their own --build-arg ARCH=value, then that value is used as-is.

Now, let’s look at how new expansions can be useful in multi-stage builds.

One of the techniques described in “Advanced multi-stage build patterns” shows how build arguments can be used so that different Dockerfile commands run depending on the build-arg value. For example, you can use that pattern if you build a multi-platform image and want to run additional COPY or RUN commands only for specific platforms. If this method is new to you, you can learn more about it from that post.

In summarized form, the idea is to define a global build argument and then define build stages that use the build argument value in the stage name while pointing to the base of your target stage via the build-arg name.

Old example:

ARG BUILD_VERSION=1

FROM alpine AS base
RUN …

FROM base AS branch-version-1
RUN touch version1

FROM base AS branch-version-2
RUN touch version2

FROM branch-version-${BUILD_VERSION} AS after-condition

FROM after-condition
RUN …

When using this pattern for multi-platform builds, one of the limitations is that all the possible values for the build-arg need to be defined by your Dockerfile. This is problematic as we want Dockerfile to be built in a way that it can build on any platform and not limit it to a specific set. 

You can see other examples here and here of Dockerfiles where dummy stage aliases must be defined for all architectures, and no other architecture can be built. Instead, the pattern we would like to use is that there is one architecture that has a special behavior, and everything else shares another common behavior.

With new expansions, we can write this to demonstrate running special commands only on RISC-V, which is still somewhat new and may need custom behavior:

#syntax=docker/dockerfile:1.7

ARG ARCH=${TARGETARCH#riscv64}
ARG ARCH=${ARCH:+"common"}
ARG ARCH=${ARCH:-$TARGETARCH}

FROM --platform=$BUILDPLATFORM alpine AS base-common
ARG TARGETARCH
RUN echo "Common build, I am $TARGETARCH" > /out

FROM --platform=$BUILDPLATFORM alpine AS base-riscv64
ARG TARGETARCH
RUN echo "Riscv only special build, I am $TARGETARCH" > /out

FROM base-${ARCH} AS base

Let’s look at these ARCH definitions more closely.

  • The first sets ARCH to TARGETARCH but removes riscv64 from the value.
  • Next, because we don’t actually want the other architectures to use their own values but rather to share a single common behavior, we set ARCH to common, unless it was already cleared to an empty string by the previous riscv64 rule.
  • Now, if we still have an empty value, we default it back to $TARGETARCH.
  • The last definition is optional, as we would already have a unique value for both cases, but it makes the final stage name base-riscv64 nicer to read.
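
To make the values concrete, here is a hypothetical trace of the three expansions for two platforms, written as Dockerfile comments:

# TARGETARCH              amd64        riscv64
# ${TARGETARCH#riscv64}   "amd64"      ""          (riscv64 prefix stripped)
# ${ARCH:+"common"}       "common"     ""          (replaced only when non-empty)
# ${ARCH:-$TARGETARCH}    "common"     "riscv64"   (fallback only when empty)
#
# FROM base-${ARCH} therefore resolves to base-common on amd64
# and to base-riscv64 on riscv64.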

Additional examples of including multiple conditions with shared conditions, or conditions based on architecture variants can be found in this GitHub Gist page.

Comparing this example to the initial example of conditions between stages, the new pattern isn’t limited to just controlling the platform differences of your builds but can be used with any build-arg. If you have used this pattern before, then you can effectively now define an “else” clause, whereas previously, you were limited to only “if” clauses.
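
For instance, here is a minimal sketch of the same if/else idea driven by an arbitrary build-arg (BUILD_FLAVOR and the stage names are illustrative; remember that # strips a prefix, so exact-match values are assumed):

#syntax=docker/dockerfile:1.7

ARG BUILD_FLAVOR=standard
ARG FLAVOR=${BUILD_FLAVOR#special}
ARG FLAVOR=${FLAVOR:+"standard"}
ARG FLAVOR=${FLAVOR:-special}

FROM alpine AS branch-standard
RUN echo "standard build" > /out

FROM alpine AS branch-special
RUN echo "special build" > /out

FROM branch-${FLAVOR} AS final

Passing --build-arg BUILD_FLAVOR=special selects branch-special, while any other non-empty value falls through to branch-standard.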

Copy with keeping parent directories

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature:

#syntax=docker/dockerfile:1.7-labs

When you are copying files in your Dockerfile, for example, do this:

COPY app/file /to/dest/dir/

This example means the source file is copied directly to the destination directory. If your source path was a directory, all the files inside that directory would be copied directly to the destination path.

What if you have a file structure like the following:

.
├── app1
│   ├── docs
│   │   └── manual.md
│   └── src
│       └── server.go
└── app2
    └── src
        └── client.go

Suppose you want to copy only the files in app1/src, but in such a way that the final file at the destination is /to/dest/dir/app1/src/server.go and not just /to/dest/dir/server.go.

With the new COPY --parents flag, you can write:

COPY --parents /app1/src/ /to/dest/dir/  

This will copy the files inside the src directory and recreate the app1/src directory structure for these files.

Things get more powerful when you start to use wildcard paths. To copy the src directories for both apps into their respective locations, you can write:

COPY --parents */src/ /to/dest/dir/ 

This will create both /to/dest/dir/app1 and /to/dest/dir/app2, but it will not copy the docs directory. Previously, this kind of copy was not possible with a single command. You would have needed multiple copies for individual files (as shown in this example) or used some workaround with the RUN --mount instruction instead.

You can also use double-star wildcard (**) to match files under any directory structure. For example, to copy only the Go source code files anywhere in your build context, you can write:

COPY --parents **/*.go /to/dest/dir/

If you are thinking about why you would need to copy specific files instead of just using COPY ./ to copy all files, remember that your build cache gets invalidated when you include new files in your build. If you copy all files, the cache gets invalidated when any file is added or changed, whereas if you copy only Go files, only changes in these files influence the cache.
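
As a sketch of how this interacts with caching (assuming the 1.7-labs syntax line above and a Go module project), dependencies can be cached separately from the source files:

COPY go.mod go.sum ./
RUN go mod download
# From here on, only additions or edits to *.go files invalidate the cache
COPY --parents **/*.go ./
RUN go build ./...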

The new --parents flag is not only for COPY instructions from your build context; you can also use it in multi-stage builds when copying files between stages with COPY --from.

Note that with COPY --from syntax, all source paths are expected to be absolute, meaning that if the --parents flag is used with such paths, they will be fully replicated as they were in the source stage. That may not always be desirable, and instead, you may want to keep some parents but discard and replace others. In that case, you can use a special /./ relative pivot point in your source path to mark which parents you wish to copy and which should be ignored. This special path component resembles how rsync works with the --relative flag.

#syntax=docker/dockerfile:1.7-labs
FROM ... AS base
RUN ./generate-lot-of-files -o /out/
# /out/usr/bin/foo
# /out/usr/lib/bar.so
# /out/usr/local/bin/baz

FROM scratch
COPY --from=base --parents /out/./**/bin/ /
# /usr/bin/foo
# /usr/local/bin/baz

The example above shows how only the bin directories are copied from the collection of files that the intermediate stage generated, while all of them keep their paths relative to the out directory.

Exclusion filters

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature:

#syntax=docker/dockerfile:1.7-labs

Another related case when moving files in your Dockerfile with COPY and ADD instructions is when you want to move a group of files but exclude a specific subset. Previously, your only options were to use RUN --mount or try to define your excluded files inside a .dockerignore file. 

.dockerignore files, however, are not a good solution for this problem: they only exclude files from the client-side build context, they do not apply to builds from remote Git/HTTP URLs, and they are limited to one per Dockerfile. You should use them the way you use .gitignore, to mark files that are never part of your project, but not as a way to define your application-specific build logic.

With the new --exclude=[pattern] flag, you can now define such exclusion filters for your COPY and ADD commands directly in the Dockerfile. The pattern uses the same format as .dockerignore.

The following example copies all the files in a directory except Markdown files:

COPY --exclude=*.md app /dest/

You can use the flag multiple times to add multiple filters. The next example excludes Markdown files and also a file called README:

COPY --exclude=*.md --exclude=README app /dest/

A double-star wildcard excludes Markdown files not only in the top level of the copied directory but also in any of its subdirectories:

COPY --exclude=**/*.md app /dest/

As in .dockerignore files, you can also define exceptions to the exclusions with ! prefix. The following example excludes all Markdown files in any copied directory, except if the file is called important.md — in that case, it is still copied.

COPY --exclude=**/*.md --exclude=!**/important.md app /dest/

This double negative may be confusing initially, but note that this is a reversal of the previous exclude rule, and “include patterns” are defined by the source parameter of the COPY instruction.

When using --exclude together with previously described --parents copy mode, note that the exclude patterns are relative to the copied parent directories or to the pivot point /./ if one is defined. See the following directory structure for example:

assets
├── app1
│   ├── icons32x32
│   ├── icons64x64
│   ├── notes
│   └── backup
├── app2
│   └── icons32x32
└── testapp
    └── icons32x32

COPY --parents --exclude=testapp assets/./**/icons* /dest/

This command would create the directory structure below. Note that only directories with the icons prefix were copied, the root parent directory assets was skipped because it comes before the relative pivot point, and testapp was not copied because it matches an exclusion filter.

dest
├── app1
│   ├── icons32x32
│   └── icons64x64
└── app2
    └── icons32x32

Conclusion

We hope this post gave you ideas for improving your Dockerfiles and that the patterns shown here will help you describe your build more efficiently. Remember that your Dockerfile can start using all these features today by defining the #syntax line on top, even if you haven’t updated to the latest Docker yet.

For a full list of other features in the new BuildKit, Buildx, and Dockerfile releases, check out the changelogs:

Thanks to community members @tstenner, @DYefimov, and @leandrosansilva for helping to implement these features!

If you have issues or suggestions you want to share, let us know in the issue tracker.

Learn more

Debian’s Dedication to Security: A Robust Foundation for Docker Developers

4 avril 2024 à 14:03

As security threats become more and more prevalent, building software with security top of mind is essential. Security has become an increasing concern for container workloads specifically and, commensurately, for container base-image choice. Many conversations around choosing a secure base image focus on CVE counts, but security involves a lot more than that. 

One organization that has been leading the way in secure software development is the Debian Project. In this post, I will outline how and why Debian operates as a secure basis for development.

For more than 30 years, Debian’s diverse group of volunteers has provided a free, open, stable, and secure GNU/Linux distribution. Debian’s emphasis on engineering excellence and clean design, as well as its wide variety of packages and supported architectures, have made it not only a widely used distribution in its own right but also a meta-distribution. Many other Linux distributions, such as Ubuntu, Linux Mint, and Kali Linux, are built on top of Debian, as are many Docker Official Images (DOI). In fact, more than 1,000 Docker Official Images variants use the debian DOI or the Debian-derived ubuntu DOI as their base image. 

Why Debian?

As a bit of a disclaimer, I have been using Debian GNU/Linux for a long time. I remember installing Debian from floppy disks in the 1990s on a PC that I cobbled together, and later reinstalling so I could test prerelease versions of the netinst network installer. Installing over the network took a while using a 56-kbps modem. At those network speeds, you had to be very particular about which packages you chose in dselect.

Having used a few other distributions before trying Debian, I still remember being amazed by how well-organized and architected the system was. No dangling or broken dependencies. No download failures. No incompatible shared libraries. No package conflicts, but rather a thoughtful handling of packages providing similar functionality. 

Much has changed over the years: no more floppies, dselect has been retired, my network connection speed has increased by a few orders of magnitude, and now I “install” Debian via docker pull debian. What has not changed is the feeling of amazement I have toward Debian and its community.

Open source software and security

Despite the achievements of the Debian project and the many other projects it has spawned, it is not without detractors. Like many other open source projects, Debian has received its share of criticism in the past few years from opportunists lamenting the state of open source security. Writing about the software supply chain while bemoaning high-profile CVEs and pointing to malware that has been uploaded to an open source package ecosystem, such as PyPI or NPM, has become all too common.

The pernicious assumption in such articles is that open source software is the problem. We know this is not the case. We’ve been through this before. Back when I was installing Debian over a 56-kbps modem, all sorts of fear, uncertainty, and doubt (FUD) was being spread by various proprietary software vendors. We learned then that open source is not a security problem — it is a security solution. 

Being open source does not automatically convey an improved security status compared to closed-source software, but it does provide significant advantages. In his Secure Programming HOWTO, David Wheeler provides a balanced summary of the relationship between open source software and security. A purported advantage conveyed by closed-source software is the nondisclosure of its source code, but we know that security through obscurity is no security at all. 

The transparency of open source software and open ecosystems allows us to better know our security posture. Openness allows for the rapid identification and remediation of vulnerabilities. Openness enables the vast majority of the security and supply chain tooling that developers regularly use. How many closed-source tools regularly publish CVEs? With proprietary software, you often only find out about a vulnerability after it is too late.

Debian’s rapid response strategy

Debian has been criticized for moving too slowly on the security front. But this narrative, like the open vs. closed-source narrative, captures neither the nuance nor reality. Although several distributions wait to publish CVEs until a fixed version is available, Debian opts for complete transparency and urgency when communicating security information to its users.

Furthermore, Debian maintainers are not a mindless fleet of automatons hastily applying patches and releasing new package versions. As a rule, Debian maintainers are experts among experts, deeply steeped in software and delivery engineering, open source culture, and the software they package.

zlib vulnerability example

A recent zlib vulnerability, CVE-2023-45853, provides an insightful example of the Debian project’s diligent, thorough approach to security. Several distributions grabbed a patch for the vulnerability, applied it, rebuilt, packaged, and released a new zlib package. The Debian security community took a closer look.

As mentioned in the CVE summary, the vulnerability was in minizip, which is a utility under the contrib directory of the zlib source code. No minizip source files are compiled into the zlib library, libz. As such, this vulnerability did not actually affect any zlib packages.

If that were where the story had ended, the only harm would be in updating a package unnecessarily. But the story did not end there. As detailed in the Debian bug thread, the offending minizip code was copied (i.e., vendored) and used in a lot of other widely used software. In fact, the vendored minizip code in both Chromium and Node.js was patched about a month before the zlib CVE was even published. 

Unfortunately, other commonly used software packages also had vendored copies of minizip that were still vulnerable. Thanks to the diligence of the Debian project, either the patch was applied to those projects as well, or they were compiled against the patched system minizip (not zlib!) dev package rather than the vendored version. In other distributions, those buggy vendored copies are in some cases still being compiled into software packages, with nary a mention in any CVE.

Thinking beyond CVEs

In the past 30 years, we have seen an astronomical increase in the role open source software plays in the tech industry. Despite the productivity gains that software engineers get by leveraging the massive amount of high-quality open source software available, we are once again hearing the same FUD we heard in the early days of open source. 

The next time you see an article about the dangers lurking in your open source dependencies, don’t be afraid to look past the headlines and question the assumptions. Open ecosystems lead to secure software, and the Debian project provides a model we would all do well to emulate. Debian’s goal is security, which encompasses a lot more than a report showing zero CVEs. Consumers of operating systems and container images would be wise to understand the difference. 

So go ahead and build on top of the debian DOI. FROM debian is never a bad way to start a Dockerfile!
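
For instance, a minimal sketch of such a Dockerfile (the tag, package, and binary name are illustrative):

FROM debian:bookworm-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY ./my-app /usr/local/bin/my-app
CMD ["/usr/local/bin/my-app"]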

Learn more

OpenSSH and XZ/liblzma: A Nation-State Attack Was Thwarted, What Did We Learn?

1 avril 2024 à 19:05

I have been recently watching The Americans, a decade-old TV series about undercover KGB agents living disguised as a normal American family in Reagan’s America in a paranoid period of the Cold War. I was not expecting this weekend to be reading mailing list posts of the same type of operation being performed on open source maintainers by agents with equally shadowy identities (CVE-2024-3094).

As The Grugq explains, “The JK-persona hounds Lasse (the maintainer) over multiple threads for many months. Fortunately for Lasse, his new friend and star developer is there, and even more fortunately, Jia Tan has the time available to help out with maintenance tasks. What luck! This is exactly the style of operation a HUMINT organization will run to get an agent in place. They will position someone and then create a crisis for the target, one which the agent is able to solve.”

The operation played out over two years, getting the agent in place, setting up the infrastructure for the attack, hiding it from various tools, and then rushing to get it into Linux distributions before some recent changes in systemd were shipped that would have stopped this attack from working.

In an equally unlikely accident, Andres Freund, a Postgres maintainer, discovered the attack before it had reached the vast majority of systems, starting from a probably accidental performance slowdown. Andres says, “I didn’t even notice it while logging in with SSH or such. I was doing some micro-benchmarking at the time and was looking to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd. Which showed lots of cpu time in code with perf unable to attribute it to a symbol, with the dso showing as liblzma. Got suspicious. Then I recalled that I had seen an odd valgrind complaint in my automated testing of Postgres, a few weeks earlier, after some package updates were installed. Really required a lot of coincidences.”

It is hard to overstate how lucky we were here, as there are no tools that will detect this vulnerability. Even ex-post it is not possible to detect externally as we do not have the private key needed to trigger the vulnerability, and the code is very well hidden. While Linus’s law has been stated as “given enough eyeballs all bugs are shallow,” we have seen in the past this is not always true, or there are just not enough eyeballs looking at all the code we consume, even if this time it worked.

In terms of immediate actions, the attack appears to have been targeted at a subset of OpenSSH servers patched to integrate with systemd. Running SSH servers in containers is rare, and the initial priority should be container hosts, although as the issue was caught early it is likely that few people updated. There is a stream of fixes to liblzma, the xz compression library where the exploit was placed, as the commits from the last two years are examined, although at present there is no evidence that there are exploits for any software other than OpenSSH included. In the Docker Scout web interface you can search for “lzma” in package names, and issues will be flagged in the “high profile vulnerabilities” policy.

So many commentators have simple technical solutions, and so many vendors are using this to push their tools. As a technical community, we want there to be technical solutions to problems like this. Vendors want to sell their products after events like this, even though none even detected it. Rewrite it in Rust, shoot autotools, stop using GitHub tarballs, and checked-in artifacts, the list goes on. These are not bad things to do, and there is no doubt that understandability and clarity are valuable for security, although we often will trade them off for performance. It is the case that m4 and autotools are pretty hard to read and understand, while tools like ifunc allow dynamic dispatch even in a mostly static ecosystem. Large investments in the ecosystem to fix these issues would be worthwhile, but we know that attackers would simply find new vectors and weird machines. Equally, there are many naive suggestions about the people, as if having an identity for open source developers would solve a problem, when there are very genuine people who wish to stay private while state actors can easily find fake identities, or “just say no” to untrusted people. Beware of people bringing easy solutions, there are so many in this hot-take world.

Where can we go from here? Awareness and observability first. Hyper awareness even, as we see in this case small clues matter. Don’t focus on the exact details of this attack, which will be different next time, but think more generally. Start by understanding your organization’s software consumption, supply chain, and critical points. Ask what you should be funding to make it different. Then build in resilience. Defense in depth, and diversity — not a monoculture. OpenSSH will always be a target because it is so widespread, and the OpenBSD developers are doing great work and the target was upstream of them because of this. But we need a diverse ecosystem with multiple strong solutions, and as an organization you need second suppliers for critical software. The third critical piece of security in this era is recoverability. Planning for the scenario in which the worst case has happened and understanding the outcomes and recovery process is everyone’s homework now, and making sure you are prepared with tabletop exercises around zero days. 

This is an opportunity for all of us to continue working together to strengthen the open source supply chain, and to work on resilience for when this happens next. We encourage dialogue and discussion on this within Docker communities.

Learn more

Crossplane Composition Functions | Tutorial (Part 5)

28 mars 2024 à 16:38

In this fifth installment of our Crossplane tutorial series, we are exploring Composition Functions. They allow us infinite flexibility to define which Managed Resources to create, manage, and delete, and how to do that.

▬▬▬▬▬▬ 📖 The Book 📖 ▬▬▬▬▬▬
Amazon: https://www.amazon.com/dp/B0CWCYP5CJ
LeanPub: https://leanpub.com/crossplane

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/aa5ecfa315608d1257ba56df18088f2f
🔗 Crossplane: https://crossplane.io
🎬 Say Goodbye to Containers – Ephemeral Environments with Nix Shell: https://youtu.be/0ulldVwZiKA

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

Is Your Container Image Really Distroless?

27 mars 2024 à 13:25

Containerization helped drastically improve the security of applications by providing engineers with greater control over the runtime environment of their applications. However, a significant time investment is required to maintain the security posture of those applications, given the daily discovery of new vulnerabilities as well as regular releases of languages and frameworks. 

The concept of “distroless” images offers the promise of greatly reducing the time needed to keep applications secure by eliminating most of the software contained in typical container images. This approach also reduces the amount of time teams spend remediating vulnerabilities, allowing them to focus only on the software they are using. 

In this article, we explain what makes an image distroless, describe tools that make the creation of distroless images practical, and discuss whether distroless images live up to their potential.

What’s a distro?

A Linux distribution is a complete operating system built around the Linux kernel, comprising a package management system, GNU tools and libraries, additional software, and often a graphical user interface.

Common Linux distributions include Debian, Ubuntu, Arch Linux, Fedora, Red Hat Enterprise Linux, CentOS, and Alpine Linux (which is more common in the world of containers). These Linux distributions, like most Linux distros, treat security seriously, with teams working diligently to release frequent patches and updates to known vulnerabilities. A key challenge that all Linux distributions must face involves the usability/security dilemma. 

On its own, the Linux kernel is not very usable, so many utility commands are included in distributions to cover a large array of use cases. Having the right utilities included in the distribution without having to install additional packages greatly improves a distro’s usability. The downside of this increase in usability, however, is an increased attack surface area to keep up to date. 

A Linux distro must strike a balance between these two elements, and different distros have different approaches to doing so. A key aspect to keep in mind is that a distro that emphasizes usability is not “less secure” than one that does not emphasize usability. What it means is that the distro with more utility packages requires more effort from its users to keep it secure.

Multi-stage builds

Multi-stage builds allow developers to separate build-time dependencies from runtime ones. Developers can now start from a full-featured build image with all the necessary components installed, perform the necessary build step, and then copy only the result of those steps to a more minimal or even an empty image, called “scratch”. With this approach, there’s no need to clean up dependencies and, as an added bonus, the build stages are also cacheable, which can considerably reduce build time. 

The following example shows a Go program taking advantage of multi-stage builds. Because the Golang runtime is compiled into the binary, only the binary and root certificates need to be copied to the blank slate image.

FROM golang:1.21.5-alpine as build
WORKDIR /
COPY go.* .
RUN go mod download
COPY . .
RUN go build -o my-app


FROM scratch
COPY --from=build \
  /etc/ssl/certs/ca-certificates.crt \
  /etc/ssl/certs/ca-certificates.crt
COPY --from=build /my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]
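
Building and running the result might look like the following (the tag is illustrative); the final image contains only the binary and the CA certificates copied above:

docker build -t my-app .
docker run --rm my-app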

BuildKit

BuildKit, the current engine used by docker build, helps developers create minimal images thanks to its extensible, pluggable architecture. It provides the ability to specify alternative frontends (with the default being the familiar Dockerfile) to abstract and hide the complexity of creating distroless images. These frontends can accept more streamlined and declarative inputs for builds and can produce images that contain only the software needed for the application to run. 

The following example shows the input for a frontend for creating Python applications called mopy by Julian Goede.

#syntax=cmdjulian/mopy
apiVersion: v1
python: 3.9.2
build-deps:
  - libopenblas-dev
  - gfortran
  - build-essential
envs:
  MYENV: envVar1
pip:
  - numpy==1.22
  - slycot
  - ./my_local_pip/
  - ./requirements.txt
labels:
  foo: bar
  fizz: ${mopy.sbom}
project: my-python-app/

So, is your image really distroless?

Thanks to new tools for creating container images like multi-stage builds and BuildKit, it is now a lot more practical to create images that only contain the required software and its runtime dependencies. 

However, many images claiming to be distroless still include a shell (usually Bash) and/or BusyBox, which provides many of the commands a Linux distribution does — including wget — that can leave containers vulnerable to Living off the land (LOTL) attacks. This raises the question, “Why would an image trying to be distroless still include key parts of a Linux distribution?” The answer typically involves container initialization. 

Developers often have to make their applications configurable to meet the needs of their users. Most of the time, those configurations are not known at build time so they need to be configured at run time. Often, these configurations are applied using shell initialization scripts, which in turn depend on common Linux utilities such as sed, grep, cp, etc. When this is the case, the shell and utilities are only needed for the first few seconds of the container’s lifetime. Luckily, there is a way to create true distroless images while still allowing initialization using tools available from most container orchestrators: init containers.

Init containers

In Kubernetes, an init container is a container that starts and must complete successfully before the primary container can start. By using a non-distroless container as an init container that shares a volume with the primary container, the runtime environment and application can be configured before the application starts. 

The lifetime of that init container is short (often just a couple seconds), and it typically doesn’t need to be exposed to the internet. Much like multi-stage builds allow developers to separate the build-time dependencies from the runtime dependencies, init containers allow developers to separate initialization dependencies from the execution dependencies. 

The concept of init container may be familiar if you are using relational databases, where an init container is often used to perform schema migration before a new version of an application is started.

Kubernetes example

Here are two examples of using init containers. First, using Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: kubecon-postgress-pod
  labels:
    app.kubernetes.io/name: KubeConPostgress
spec:
  containers:
  - name: postgress
    image: laurentgoderre689/postgres-distroless
    securityContext:
      runAsUser: 70
      runAsGroup: 70
    volumeMounts:
    - name: db
      mountPath: /var/lib/postgresql/data/
  initContainers:
  - name: init-postgress
    image: postgres:alpine3.18
    env:
      - name: POSTGRES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: kubecon-postgress-admin-pwd
            key: password
    command: ['docker-ensure-initdb.sh']
    volumeMounts:
    - name: db
      mountPath: /var/lib/postgresql/data/
  volumes:
  - name: db
    emptyDir: {}

- - - 

> kubectl apply -f pod.yml && kubectl get pods
pod/kubecon-postgress-pod created
NAME                    READY   STATUS     RESTARTS   AGE
kubecon-postgress-pod   0/1     Init:0/1   0          0s
> kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
kubecon-postgress-pod   1/1     Running   0          10s

Docker Compose example

The init container concept can also be emulated in Docker Compose for local development using service dependencies and conditions.

services:
 db:
   image: laurentgoderre689/postgres-distroless
   user: postgres
   volumes:
     - pgdata:/var/lib/postgresql/data/
   depends_on:
     db-init:
       condition: service_completed_successfully

 db-init:
   image: postgres:alpine3.18
   environment:
      POSTGRES_PASSWORD: example
   volumes:
     - pgdata:/var/lib/postgresql/data/
   user: postgres
   command: docker-ensure-initdb.sh

volumes:
 pgdata:

- - - 
> docker-compose up 
[+] Running 4/0
 ✔ Network compose_default      Created                                                                                                                      
 ✔ Volume "compose_pgdata"      Created                                                                                                                     
 ✔ Container compose-db-init-1  Created                                                                                                                      
 ✔ Container compose-db-1       Created                                                                                                                      
Attaching to db-1, db-init-1
db-init-1  | The files belonging to this database system will be owned by user "postgres".
db-init-1  | This user must also own the server process.
db-init-1  | 
db-init-1  | The database cluster will be initialized with locale "en_US.utf8".
db-init-1  | The default database encoding has accordingly been set to "UTF8".
db-init-1  | The default text search configuration will be set to "english".
db-init-1  | [...]
db-init-1 exited with code 0
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  starting PostgreSQL 16.1 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db-1       | 2024-02-23 14:59:33.191 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db-1       | 2024-02-23 14:59:33.194 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db-1       | 2024-02-23 14:59:33.196 UTC [9] LOG:  database system was shut down at 2024-02-23 14:59:32 UTC
db-1       | 2024-02-23 14:59:33.198 UTC [1] LOG:  database system is ready to accept connections

As demonstrated by the previous example, an init container can be used alongside a container to remove the need for general-purpose software and allow the creation of true distroless images. 

Conclusion

This article explained how Docker build tools allow for the separation of build-time dependencies from run-time dependencies to create “distroless” images. For example, using init containers allows developers to separate the logic needed to configure a runtime environment from the environment itself and provide a more secure container. This approach also helps teams focus their efforts on the software they use and find a better balance between security and usability.

Learn more

Crossplane Compositions | Tutorial (Part 3)

14 mars 2024 à 16:13

In this third installment of our Crossplane tutorial series, we are exploring Compositions, probably the most important feature in Crossplane. They allow us to define interfaces (CRDs) and controllers that represent services we can use to enable others to manage resources like databases, clusters, applications, or anything else.

▬▬▬▬▬▬ 📖 The Book 📖 ▬▬▬▬▬▬
Amazon: https://www.amazon.com/dp/B0CWCYP5CJ
LeanPub: https://leanpub.com/crossplane

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/aa5ecfa315608d1257ba56df18088f2f
🔗 Crossplane: https://crossplane.io
🎬 Say Goodbye to Containers – Ephemeral Environments with Nix Shell: https://youtu.be/0ulldVwZiKA

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

Argo CD Synchronization is BROKEN! It Should Switch to Eventual Consistency!

26 février 2024 à 16:08

We dive into the challenges faced with Argo CD’s current synchronization strategy. Despite its wide adoption for managing Kubernetes deployments, there are cracks in the system’s design, particularly when it comes to handling sync operations. We’ll explore why Argo CD’s insistence on strong consistency is causing more problems than it solves and discuss how an eventual consistency model could be the game-changer that Argo CD needs to streamline deployment workflows and minimize outages.

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/82e08206a786bc4373f7325af1d3fea9
🔗 Kubernetes Deployment Order: https://argoproj.github.io/cd/
🎬 Kubernetes? Database Schema? Schema Management with Atlas Operator: https://youtu.be/1iZoEFzlvhM
🎬 Manage Kubernetes Secrets With External Secrets Operator (ESO): https://youtu.be/SyRZe5YVCVk
🎬 Argo CD – Applying GitOps Principles To Manage A Production Environment In Kubernetes: https://youtu.be/vpWQeoaiRM4
🎬 How To Apply GitOps To Everything – Combining Argo CD And Crossplane: https://youtu.be/yrj4lmScKHQ
🎬 Mastering Argo CD Sync Waves: A Deep Dive into Effective GitOps Synchronization Strategies: https://youtu.be/LKuRtOTvlXk
🎬 Crossplane – GitOps-based Infrastructure as Code through Kubernetes API: https://youtu.be/n8KjVmuHm7A
🎬 How To Shift Left Infrastructure Management Using Crossplane Compositions: https://youtu.be/AtbS1u2j7po
🎬 Crossplane Composition Functions: Unleashing the Full Potential: https://youtu.be/jjtpEhvwgMw

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/

Docker Security Advisory: Multiple Vulnerabilities in runc, BuildKit, and Moby

31 janvier 2024 à 20:05

February 1 updates:

January 31 updates:

  • Patches for runc, BuildKit, and Moby (Docker Engine) are now available.
  • Updates have been rolled out to Docker Build Cloud builders.

We at Docker prioritize the security and integrity of our software and the trust of our users. Security researchers at Snyk Labs recently identified and reported four security vulnerabilities in the container ecosystem. One of the vulnerabilities, CVE-2024-21626, concerns the runc container runtime, and the other three affect BuildKit (CVE-2024-23651, CVE-2024-23652, and CVE-2024-23653). We want to assure our community that our team, in collaboration with the reporters and open source maintainers, has been diligently working on coordinating and implementing necessary remediations.

We are committed to maintaining the highest security standards. We will publish patched versions of runc, BuildKit, and Moby on January 31 and release an update for Docker Desktop on February 1 to address these vulnerabilities.  Additionally, our latest Moby and BuildKit releases will include fixes for CVE-2024-23650 and CVE-2024-24557, discovered respectively by an independent researcher and through Docker’s internal research initiatives.

Versions impacted:

  • runc <= 1.1.11
  • BuildKit <= 0.12.4
  • Moby (Docker Engine) <= 25.0.1 and <= 24.0.8
  • Docker Desktop <= 4.27.0

These vulnerabilities can only be exploited if a user actively engages with malicious content by incorporating it into the build process or running a container from a suspect image (particularly relevant for the CVE-2024-21626 container escape vulnerability). Potential impacts include unauthorized access to the host filesystem, compromising the integrity of the build cache, and, in the case of CVE-2024-21626, a scenario that could lead to full container escape. 

We strongly urge all customers to prioritize security by applying the available updates as soon as they are released. Timely application of these updates is the most effective measure to safeguard your systems against these vulnerabilities and maintain a secure and reliable Docker environment.

What should I do if I’m on an affected version?

If you are using affected versions of runc, BuildKit, Moby, or Docker Desktop, make sure to update to the latest versions as soon as patched versions become available (all to be released no later than February 1 and linked in the following table):

Patched versions:

  • runc >= 1.1.12
  • BuildKit >= 0.12.5
  • Moby (Docker Engine) >= 25.0.2 and >= 24.0.9*
  • Docker Desktop >= 4.27.1

* Only CVE-2024-21626 and CVE-2024-24557 were fixed in Moby 24.0.9.


If you are unable to update to an unaffected version promptly after it is released, follow these best practices to mitigate risk: 

  • Only use trusted Docker images (such as Docker Official Images).
  • Don’t build Docker images from untrusted sources or untrusted Dockerfiles.
  • If you are a Docker Business customer using Docker Desktop and unable to update to v4.27.1 immediately after it’s released, make sure to enable Hardened Docker Desktop features such as:
  • For CVE-2024-23650, CVE-2024-23651, CVE-2024-23652, and CVE-2024-23653, avoid using BuildKit frontend from an untrusted source. A frontend image is usually specified as the #syntax line on your Dockerfile, or with --frontend flag when using the buildctl build command.
  • To mitigate CVE-2024-24557, make sure to either use BuildKit or disable caching when building images. From the CLI this can be done via the DOCKER_BUILDKIT=1 environment variable (default for Moby >= v23.0 if the buildx plugin is installed) or the --no-cache flag. If you are using the HTTP API directly or through a client, the same can be done by setting nocache to true or version to 2 for the /build API endpoint.
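
For example, a minimal sketch of the CLI mitigation (the image name is illustrative):

# Build with BuildKit enabled
DOCKER_BUILDKIT=1 docker build -t myimage .

# Or keep the classic builder but disable caching
docker build --no-cache -t myimage .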

Technical details and impact

CVE-2024-21626 (High)

In runc v1.1.11 and earlier, due to certain leaked file descriptors, an attacker can gain access to the host filesystem by causing a newly-spawned container process (from runc exec) to have a working directory in the host filesystem namespace, or by tricking a user to run a malicious image and allow a container process to gain access to the host filesystem through runc run. The attacks can also be adapted to overwrite semi-arbitrary host binaries, allowing for complete container escapes. Note that when using higher-level runtimes (such as Docker or Kubernetes), this vulnerability can be exploited by running a malicious container image without additional configuration or by passing specific workdir options when starting a container. The vulnerability can also be exploited from within Dockerfiles in the case of Docker.

  • The issue has been fixed in runc v1.1.12.

CVE-2024-23651 (High)

In BuildKit <= v0.12.4, two malicious build steps running in parallel sharing the same cache mounts with subpaths could cause a race condition, leading to files from the host system being accessible to the build container. This will only occur if a user is trying to build a Dockerfile of a malicious project.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-23652 (High)

In BuildKit <= v0.12.4, a malicious BuildKit frontend or Dockerfile using RUN --mount could trick the feature that removes empty files created for the mountpoints into removing a file outside the container from the host system. This will only occur if a user is using a malicious Dockerfile.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-23653 (High)

In addition to running containers as build steps, BuildKit also provides APIs for running interactive containers based on built images. In BuildKit <= v0.12.4, it is possible to use these APIs to ask BuildKit to run a container with elevated privileges. Normally, running such containers is only allowed if special security.insecure entitlement is enabled both by buildkitd configuration and allowed by the user initializing the build request.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-23650 (Medium)

In BuildKit <= v0.12.4, a malicious BuildKit client or frontend could craft a request that could lead to BuildKit daemon crashing with a panic.

  • The issue will be fixed in BuildKit v0.12.5.

CVE-2024-24557 (Medium)

In Moby <= v25.0.1 and <= v24.0.8, the classic builder cache system is prone to cache poisoning if the image is built FROM scratch. Also, changes to some instructions (most important being HEALTHCHECK and ONBUILD) would not cause a cache miss. An attacker with knowledge of the Dockerfile someone is using could poison their cache by making them pull a specially crafted image that would be considered a valid cache candidate for some build steps.

  • The issue will be fixed in Moby >= v25.0.2 and >= v24.0.9.

How are Docker products affected? 

The following Docker products are affected. No other products are affected by these vulnerabilities.

Docker Desktop

Docker Desktop v4.27.0 and earlier are affected. Docker Desktop v4.27.1 will be released on February 1 and includes runc, BuildKit, and dockerd binaries patches. In addition to updating to this new version, we encourage all Docker users to diligently use Docker images and Dockerfiles and ensure you only use trusted content in your builds.

As always, you should check Docker Desktop system requirements for your operating system (Windows, Linux, Mac) before updating to ensure full compatibility.

Docker Build Cloud

Any new Docker Build Cloud builder instances will be provisioned with the latest Docker Engine and BuildKit versions after fixes are released and will, therefore, be unaffected by these CVEs. Docker will also be rolling out gradual updates to any existing builder instances.

Security at Docker

At Docker, we know that part of being developer-obsessed is providing secure software to developers. We appreciate the responsible disclosure of these vulnerabilities. If you’re aware of potential security vulnerabilities in any Docker product, report them to security@docker.com. For more information on Docker’s security practices, see our website.

Advisory links

Harmony in Code: How Software Development Mirrors a Symphony Orchestra

14 novembre 2023 à 16:08

This video explores the fascinating parallels between software development and a symphony orchestra, highlighting the importance of teamwork, coordination, and harmony in creating a cohesive and beautiful end result. Discover the similarities and gain a new perspective on the intricate process behind both software engineering and musical composition. We’re exploring the role of developers, database administrators (DBAs), system engineers, and other roles in an attempt to produce symphony-like harmony.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Get one month of 🎉 VPS FREE 🎉 at 🔗 https://hivelocityinc.net/3SqKZZX 🔗.
Use code ”DEVOPS1” at checkout.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: https://gist.github.com/vfarcic/ac430638751525a6d00098f811fb76c5
🔗 Hivelocity: https://hivelocityinc.net/3FFpixP
🎬 How To Shift Left Infrastructure Management Using Crossplane Compositions: https://youtu.be/AtbS1u2j7po
🎬 How To Apply GitOps To Everything – Combining Argo CD And Crossplane: https://youtu.be/yrj4lmScKHQ
🎬 Kubernetes? Database Schema? Schema Management with Atlas Operator: https://youtu.be/1iZoEFzlvhM
🎬 GitHub CLI (gh) – How to manage repositories more efficiently: https://youtu.be/BII6ZY2Rnlc

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use https://calendly.com/vfarcic/meet to book a timeslot that suits and we’ll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below)

▬▬▬▬▬▬ 🚀 Livestreams & podcasts 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Follow me on Twitter: https://twitter.com/vfarcic
➡ Follow me on LinkedIn: https://www.linkedin.com/in/viktorfarcic/
