Vue normale

Il y a de nouveaux articles disponibles, cliquez pour rafraîchir la page.
À partir d’avant-hierFlux principal

Simplify AI Development with the Model Context Protocol and Docker

Par : Docker Labs
15 janvier 2025 à 13:07

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

In December, we published The Model Context Protocol: Simplifying Building AI apps with Anthropic Claude Desktop and Docker. Along with the blog post, we also created Docker versions for each of the reference servers from Anthropic and published them to a new Docker Hub mcp namespace.

This provides lots of ways for you to experiment with new AI capabilities using nothing but Docker Desktop.

2400x1260 docker labs genai

For example, to extend Claude Desktop to use Puppeteer, update your claude_desktop_config.json file with the following snippet:

"puppeteer": {
    "command": "docker",
    "args": ["run", "-i", "--rm", "--init", "-e", "DOCKER_CONTAINER=true",          "mcp/puppeteer"]
  }

After restarting Claude Desktop, you can ask Claude to take a screenshot of any URL using a Headless Chromium browser running in Docker.

You can do the same thing for a Model Context Protocol (MCP) server that you’ve written. You will then be able to distribute this server to your users without requiring them to have anything besides Docker Desktop.

How to create an MCP server Docker Image

An MCP server can be written in any language. However, most of the examples, including the set of reference servers from Anthropic, are written in either Python or TypeScript and use one of the official SDKs documented on the MCP site.

For typical uv-based Python projects (projects with a pyproject.toml and uv.lock in the root), or npm TypeScript projects, it’s simple to distribute your server as a Docker image.

  1. If you don’t already have Docker Desktop, sign up for a free Docker Personal subscription so that you can push your images to others.
  2. Run docker login from your terminal.
  3. Copy either this npm Dockerfile or this Python Dockerfile template into the root of your project. The Python Dockerfile will need at least one update to the last line.
  4. Run the build with the Docker CLI (instructions below).

The two Dockerfiles shown above are just templates. If your MCP server includes other runtime dependencies, you can update the Dockerfiles to include these additions. The runtime of your MCP server should be self-contained for easy distribution.

If you don’t have an MCP server ready to distribute, you can use a simple mcp-hello-world project to practice. It’s a simple Python codebase containing a server with one tool call. Get started by forking the repo, cloning it to your machine, and then following the following instructions to build the MCP server image.

Building the image

Most sample MCP servers are still designed to run locally (on the same machine as the MCP client, communication over stdio). Over the next few months, you’ll begin to see more clients supporting remote MCP servers but for now, you need to plan for your server running on at least two different architectures (amd64 and arm64). This means that you should always distribute what we call multi-platform images when your target is local MCP servers. Fortunately, this is easy to do.

Create a multi-platform builder

The first step is to create a local builder that will be able to build both platforms. Don’t worry; this builder will use emulation to build the platforms that you don’t have. See the multi-platform documentation for more details.

docker buildx create \
  --name mcp-builder \
  --driver docker-container \
  --bootstrap

Build and push the image

In the command line below, substitute <your-account> and your mcp-server-name for valid values, then run a build and push it to your account.

docker buildx build \
  --builder=mcp-builder \
  --platform linux/amd64,linux/arm64 \
  -t <your-docker-account>/mcp-server-name \
  --push .

Extending Claude Desktop

Once the image is pushed, your users will be able to attach your MCP server to Claude Desktop by adding an entry to claude_desktop_config.json that looks something like:

"your-server-name": {
    "command": "docker",
    "args": ["run", "-i", "--rm", "--pull=always",
             "your-account/your-server-name"]
  }

This is a minimal set of arguments. You may want to pass in additional command-line arguments, environment variables, or volume mounts.

Next steps

The MCP protocol gives us a standard way to extend AI applications. Make sure your extension is easy to distribute by packaging it as a Docker image. Check out the Docker Hub mcp namespace for examples that you can try out in Claude Desktop today.

As always, feel free to follow along in our public repo.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

💾

This demo will use the Puppeteer MCP server to take a screenshot of a website and invert the colors using Claude Desktop and Docker Desktop. Doing this witho...

Meet Gordon: An AI Agent for Docker

Par : Docker Labs
13 janvier 2025 à 14:20

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

In previous articles, we focused on how AI-based tools can help developers streamline tasks and offered ideas for enabling agentic workflows, like reviewing branches and understanding code changes.

In this article, we’ll explore our experiments around the idea of creating a Docker AI Agent — something that could both help new users learn about our tools and products and help power users get things done faster.

2400x1260 docker labs genai

During our explorations around this Docker Agent and AI-based tools, we noticed that the main pain points we encountered were often the same:

  • LLMs need good context to provide good answers (garbage in -> garbage out).
  • Using AI tools often requires context switching (moving to another app, to a different website, etc.).
  • We’d like agents to be able to suggest and perform actions on behalf of the users.
  • Direct product integrations with AI are often more satisfying to use than chat interfaces.

At first, we tried to see what’s possible using off-the-shelf services like ChatGPT or Claude. 

By using testing prompts such as “optimize the following Dockerfile, following all best practices” and providing the model with a sub-par but common Dockerfile, we could sometimes get decent answers. Often, though, the resulting Dockerfile had subtle bugs, hallucinations, or simply wasn’t optimized or didn’t use many of the best practices we would’ve hoped for. Thus, this approach was not reliable enough.

Data ended up being the main issue. Training data for LLM models is always outdated by some amount of time, and the number of bad Dockerfiles that you can find online vastly outnumbers the amount of up-to-date Dockerfiles using all best practices, etc.

After doing proof-of-concept tests using a RAG approach, including some documents with lots of useful advice for creating good Dockerfiles, we realized that the AI Agent idea was definitely possible. However, setting up all the things required for a good RAG would’ve taken too much bandwidth from our small team.

Because of this, we opted to use kapa.ai for that specific part of our agent. Docker already uses them to provide the AI docs assistant on Docker docs, so most of our high-quality documentation is already available for us to reference as part of our LLM usage through their service. Using kapa.ai allowed us to experiment more, getting high-quality results faster, and allowing us to try different ideas around the AI agent concept.

Enter Gordon

Out of this experimentation came a new product that you can try: Gordon. With Gordon, we’d like to tackle these pain points. By integrating Gordon into Docker Desktop and the Docker CLI (Figure 1), we can:

  • Access much more context that can be used by the LLMs to best understand the user’s questions and provide better answers or even perform actions on the user’s behalf.
  • Be where the users are. If you launch a container via Docker Desktop and it fails, you can quickly debug with Gordon. If you’re in the terminal hacking away, Docker AI will be there, too.
  • Avoid being a purely chat-based agent by providing Gordon-based features directly as part of Docker Desktop UI elements. If Gordon detects certain scenarios, like a container that failed to start, a button will appear in the UI to directly get suggestions, or run actions, etc. (Figure 2).
Screenshot of Docker Desktop showing the Gordon icon next to a container name in the list of containers.
Figure 1: Gordon icon on Docker Desktop.
Screenshot of Docker Desktop showing the Ask Gordon tab next to Logs, Inspect, Files, Stats and other options.
Figure 2: Ask Gordon (beta).

What Gordon can do

We want to start with Gordon by optimizing for Docker-related tasks — not general-purpose questions — but we are not excluding expanding the scope to more development-related tasks as work on the agent continues.

Work on Gordon is at an early stage and its capabilities are constantly evolving, but it’s already really good at some things (Figure 3). Here are things to definitely try out:

  • Ask general Docker-related questions. Gordon knows Docker well and has access to all of our documentation.
  • Get help debugging container build or runtime errors.
  • Remediate policy deviations from Docker Scout.
  • Get help optimizing Docker-related files and configurations.
  • Ask it how to run specific containers (e.g., “How can I run MongoDB?”).
Screenshot of results after asking Docker AI to explain a Dockerfile.
Figure 3: Using Gordon to understand a Dockerfile.

How Gordon works

The Gordon backend lives on Docker servers, while the client is a CLI that lives on the user’s machine and is bundled with Docker Desktop. Docker Desktop uses the CLI to access the local machine’s files, asking the user for the directory each time it needs that context to answer a question. When using the CLI directly, it has access to the working directory it’s executed in. For example, if you are in a directory with a Dockerfile and you run “Docker AI, rate my Dockerfile”, it will find the one that’s present in that directory

Currently, Gordon does not have write access to any files, so it will not edit any of your files. We’re hard at work on future features that will allow the agent to do the work for you, instead of only suggesting solutions. 

Figure 4 shows a rough overview of how we are thinking about things behind the scenes.

Illustration showing an overview of how Gordon works, with flow steps starting with "Understand user's input" and going to "Gather context" to "prepare final prompts" then "check results", "reply to user", and more.
Figure 4: Overview of Gordon.

The first step of this pipeline, “Understand the user’s input and figure out which action to perform”, is done using “tool calling” (also known as “function calling”) with the OpenAI API

Although this is a popular approach, we noticed that the documentation online isn’t very good, and general best practices aren’t well defined yet. This led us to experiment a lot with the feature and try to figure out what works for us and what doesn’t.

Things we noticed:

  • Tool descriptions are important, and we should prefer more in-depth descriptions with examples.
  • Testing around tool-detection code is also important. Adding new tools to a request could confuse the LLM and cause it to no longer trigger the expected tool.
  • The LLM model used influences how the whole tool calling functionality should be implemented, as different models might prefer descriptions written in a certain way, behave better/worse under certain scenarios (e.g. when using lots of tools), etc.

Try Gordon for yourself

Gordon is available as an opt-in Beta feature starting with Docker Desktop version 4.37. To participate in the closed beta, all you need to do is fill out the form on the site.

Initially, Gordon will be available for use both in Docker Desktop and the Docker CLI, but our idea is to surface parts of this tech in various other parts of our products as well.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

How to Create and Use an AI Git Agent

Par : Docker Labs
16 décembre 2024 à 14:23

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

In our past experiments, we started our work from the assumption that we had a project ready to work on. That means someone like a UI tech writer would need to understand Git operations in order to use the tools we built for them. Naturally, because we have been touching on Git so frequently, we wanted to try getting a Git agent started. Then, we want to use this Git agent to understand PR branches for a variety of user personas — without anyone needing to know the ins and outs of Git.

2400x1260 docker labs genai

Git as an agent

We are exploring the idea that tools are agents. So, what would a Git agent do? 

Let’s tackle our UI use case prompt. 

Previously:

You are at $PWD of /project, which is a git repo.
Force checkout {{branch}}
Run a three-dot diff of the files changed in {{branch}} compared to main using --name-only.

A drawback that isn’t shown here, is that there is no authentication. So, if you haven’t fetched that branch or pulled commits already, this prompt at best will be unreliable and more than likely will fail (Figure 1):

Screenshot of Logs showing failure to authenticate.
Figure 1: No authentication occurs.

Now:

You are a helpful assistant that checks a PR for user-facing changes.
1. Fetch everything and get on latest main.
2. Checkout the PR branch and pull latest.
3. Run a three-dot git diff against main for just files. Write the output to /thread/diff.txt.

This time around, you can see that we are being less explicit about the Git operations, we have the ability to export outputs to the conversation thread and, most importantly, we have authentication with a new prompt!

Preparing GitHub authentication

Note: These prompts should be easily adaptable to other Git providers, but we use GitHub at Docker.

Before we can do anything with GitHub, we have to authenticate. There are several ways to do this, but for this post we’ll focus on SSH-based auth rather than using HTTPS through the CLI. Without getting too deep into the Git world, we will be authenticating with keys on our machine that are associated with our account. These keys and configurations are commonly located at ~/.ssh on Linux/Mac. Furthermore, users commonly maintain Git config at ~/.gitconfig

The .gitconfig file is particularly useful because it lets us specify carriage return rules — something that can easily cause Git to fail when running in a Linux container. We will also need to modify our SSH config to remove UseKeychain. We found these changes are enough to authenticate using SSH in Alpine/Git. But we, of course, don’t want to modify any host configuration.

We came up with a fairly simple flow that lets us prepare to use Git in a container without messing with any host SSH configs.

  1. Readonly mounts: Git config and SSH keys are stored on specific folders on the host machine. We need to mount those in.
    a. Mount ~/.ssh into a container as /root/.ssh-base readonly.
    b. Mount ~/.gitconfig into the same container as /root/.gitconfig.
  2. Copy /root/.ssh-base to /root/.ssh and make the new file readwrite.
  3. Make necessary changes to config.
  4. For the LLM, we also need it to verify the config is in the thread and the changes were made to it. In the event that it fails to make the right changes, the LLM can self-correct.
  5. Copy the .ssh directory and .gitconfig to /thread.

All of this is baked in a prompt you can find and run on GitHub. Simply replace <username> with your system username (Figure 2).

Screenshot of mounts, showing storage of Git config and SSH keys.
Figure 2: Readonly mounts.

If you’re using the default behavior in our extension for VSCode, then the /thread/ volume is ephemeral, so you would need to make sure your agent runs it before any Git tools. However, if you use the Set prompt thread ID command in VSCode, you can keep the /thread volume around under your own tag (Figure 3). 

Tagging the thread persists everything after the prompt completes, so you would only need to run the prompt once before going on to develop your Git tools.

Screenshot of setting the prompt thread ID.
Figure 3: Set the prompt thread ID.

If you opt to persist the thread, then this prompt would only need to be run once and the Git credentials will stick around (Figure 4).

Screenshot showing the stored configuration info.
Figure 4: Stored credential data.

Now that we have our credentials ready, we need our Git tool to pick up on them.

container:
   image: alpine/git
   entrypoint:
     - /bin/sh
     - -c
   command:
     - "cp -r /thread/.ssh /root/.ssh && cp -r /thread/.gitconfig /root/.gitconfig && git --no-pager {{args|safe}}" 

Our container definition copies the SSH credentials and .gitconfig files to the correct directory before running Git commands.

Unpacking objects:  90% (9/10)
Unpacking objects: 100% (10/10)
Unpacking objects: 100% (10/10), 4.14 KiB | 141.00 KiB/s, done.
From github.com:docker/repo
   abcd123..abcd123  main           -> origin/main
 * [new tag]           branch/v2982.0.0 -> branch/v2982.0.0
# ROLE assistant
I have fetched all origin branches. The latest updates have been applied to your local repository, and any new tags have been downloaded.
```
Now, we can see our Git tool is authenticated as if it were us.
Running the agent now, it can:
## ROLE assistant
I have run the git fetch --all command, which retrieves updates from all remotes. If you need further assistance with your Git repository, feel free to ask!
```

# ROLE assistant
```json
{"request":"checkout main"}
```

## ROLE assistant
```json
{"args":"checkout main"}
```

## ROLE assistant
You are already on the 'main' branch, and it is up to date with 'origin/main'.
```

# ROLE assistant
```json
{"request":"checkout the branch branch and pull latest changes."}
```

...
## ROLE assistant
The branch branch has been successfully checked out, and it's already up to date with the latest changes.
...

# ROLE assistant
```json
{"request":"Run a three-dot git diff against main and write the output for just files to /thread/diff.txt."}
```

...

## ROLE assistant
```json
{"args":"diff main... --name-only","outfile":"/thread/diff.txt"}
```

*entering tool*

## ROLE tool (git)
file1.tsx
file2.tsx
...
## ROLE assistant

The three-dot Git diff against the main branch has been executed, and the output has been saved to /thread/diff.txt. This includes the names of files that have differences.
``

With the new Git agent, we have enabled our assistant to fetch new information from upstream, switch branches, and pull in new code to review. If we provide our agent with credentials that are limited to read-only access, this agent could be used by someone that does not have a local git install.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

How AI Assistants Can Decode GitHub Repos for UI Writers

Par : Docker Labs
16 décembre 2024 à 14:19

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

Can an AI-powered assistant understand a GitHub repo enough to answer questions for UI writers?

2400x1260 docker labs genai

Across many projects, user-facing content is rendered based on some sort of client-side code. Whether a website, a game, or a mobile app, it’s critical to nail the text copy displayed to the user.

So let’s take a sample question: Do any open PRs in this project need to be reviewed for UI copy? In other words, we want to scan a GitHub repo’s PRs and gain intelligence about the changes included.

Disclaimer: The best practice to accomplish this at a mature organization would be to implement Localization (i18n), which would facilitate centralized user-facing text. However, in a world of AI-powered tools, we believe our assistants will help minimize friction for all projects, not just ones that have adopted i18n.

So, let’s start off by seeing what options we already have.

The first instinct someone might have is to open the new copilot friend in the GitHub nav

genai series 13 f1
Figure 1: Type / to search.

We tried to get it to answer basic questions, first: “How many PR’s are open?”

genai series 13 f2
Figure 2: How many PR’s are there open? The answer doesn’t give a number.

Despite having access to the GitHub repo, the Copilot agent provides less helpful information than we might expect.

genai series 13 f3
Figure 3: Copilot is powered by AI, so mistakes are possible.

We don’t even get a number like we asked, despite GitHub surfacing that information on the repository’s main page. Following up our first query with the main query we want to ask effectively just gives us the same answer

genai series 13 f4
Figure 4: The third PR is filesharing: add some missing contexts.

And, after inspecting the third PR in the list, it doesn’t contain user-facing changes. One great indicator for this web project is the lack of any clientside code being modified. This was a backend change so we didn’t want to see this one.

genai series 13 f5
Figure 5: The PR doesn’t contain user-facing changes.

So let’s try to improve this:

First prompt file

---
functions:
  - name: bash
	description: Run a bash script in the utilities container.
	parameters:
  	  type: object
  	  properties:
    	    command:
      	      type: string
      	description: The command to send to bash
	container:
    	  image: wbitt/network-multitool  
    	  command:
      	    - "bash"
      	    - "-c"
      	    - "{{command|safe}}"
  - name: git
	description: Run a git command.
	parameters:
  	  type: object
  	  properties:
    	    command:
      	      type: string
      	description: The git command to run, excluding the `git` command itself
	container:
  	  image: alpine/git
  	  entrypoint:
    	    - "/bin/sh"
  	  command:
    	    - "-c"
    	    - "git --no-pager {{command|safe}}"
---

# prompt system

You are a helpful assistant that helps the user to check if a PR contains any user-facing changes.

You are given a container to run bash in with the following tools:

  curl, wget, jq
and default alpine linux tools too.

# prompt user
You are at $PWD of /project, which is a git repo.

Checkout branch `{{branch}}`.

Diff the changes and report any containing user facing changes

This prompt was promising, but it ended up with a few blocking flaws. The reason is that using git to compare files is quite tricky for an LLM.

  • git diff uses a pager, and therefore needs the --no-pager arg to send stdout to the conversation.
  • The total number of files affected via git diff can be quite large.
  • Given each file, the raw diff output can be massive and difficult to parse.
  • The important files changed in a PR might be buried with many extra files in the diff output.
  • The container has many more tools than necessary, allowing the LLM to hallucinate.

The agent needs some understanding of the repo to determine the sorts of files that contain user-facing changes, and it needs to be capable of seeing just the important pieces of information.

Our next pass involves a few tweaks:

  • Switch to alpine git image and a file writer as the only tools necessary.
  • Use –files-only and –no-pager args.
# ROLE assistant


The following files are likely to contain user-facing changes as they mainly consist of UI components, hooks, and API functionalities.

```
file1.ts
fil2.tsx
file3.tsx
...
```
Remember that this isn't a guarantee of whether there are user-facing changes, but just an indication of where they might be if there are any.

Remember that this isn’t a guarantee of whether there are user-facing changes, but just an indication of where they might be if there are any.

Giving the agent the tool run-javascript-sandbox allowed our agent to write a script to save the output for later.

genai series 13 f6
Figure 6: Folder called user-changes with files.txt.

To check out the final prompt here, use our Gist.

Expert knowledge

This is a great start; however, we now need to inspect the files themselves for user-facing changes. When we started this, we realized that user-facing changes could manifest in a diverse set of “diff”s so we needed to include expert knowledge. We synced up with Mark Higson, a staff SWE currently working on the frontend platform here at Docker. Mark was able to help provide some key advice for what “user-facing” changes look like in many repos at Docker, so I baked the tips into the prompt.

Straightforward approaches

Looking for changes in text nodes found in a JSX tree is the easiest example.

JSX node with interpolation

<div>{functionReturningString()}</div>

If the result is a string, the result is probably user-facing, but the components that create the string could be elsewhere, so look for:

Nuanced indicators

  • Standard user-facing components. Example: notifications. If a notification’s props change, we can likely infer that it is a user-facing change.
  • Constructors for commonly used components. Example: errors. If an Error() is constructed with a different argument, we know that error could show up differently.

The key for UI reviewers is the overall amount of text is changed, rather than layout.

So, despite not being able to catch everything that could be rendered, we will be able to deliver value by focusing on these areas.

Diffing the files

There are a few approaches to finding changes in the files. To start, we’ll use git again.

Before including expert advice, git diff was hallucinating changes that weren’t actually represented in the output. After the expert advice, we are seeing much more focused outputs from the LLM.

Our second prompt just needs to read the user-facing files that we already saved, and then compare them to main. After a bit of trial and error with git, we found that a combination of --no-color and --minimal was able to give the LLM enough context without consuming too many tokens for each change.

There’s a lot of buzz around generating and reading diffs with LLMs. The teams at Aider and Cursor are both focusing on fine-tuning LLM’s using speculative decoding:

For our use cases, however, we found that the average “big” PR at Docker was not a problem for traditional LLMs when given good advice and the right tools.

So here’s the second prompt:

## User-facing content

The following pieces of code are what we want to focus on for user-facing content:

Text Nodes in any JSX or TSX files.

Text Nodes in any HTML files.

Standard user-facing components. Example: Definitions/Constructors for notifications, toasts, and errors.

# prompt user

Can you tell me if there are any user-facing changes in {{branch}}?

To check out the full prompt here, use our Gist.

Output

Note: Some full paths, branches, and diffs are trimmed or replaced with placeholders for this article.

# ROLE assistant

```json
{
  "args": ["checkout", "branch"]
}
```

# ROLE tool (git)


Already on 'branch'
Your branch is up to date with 'origin/branch'.

# ROLE assistant

```json
{
  "path": "/thread/user-changes/files.txt"
}
```

*entering tool*

# ROLE tool (read-file)

path1
path2
path3
...rest of paths

# ROLE assistant

```json
{
  "args": ["diff", "main", "--no-color", "--minimal", path1 path2 path3...]
}
```

*entering tool*

# ROLE tool (git)

<Diff output truncated>
# ROLE assistant

The following user-facing changes were detected between the "main" branch and the "branch" branch:

1. In `File1.tsx` file, the text "Sign In" was modified to "Sign in".

2. In `File2.ts` file, titles "Databases & Storage", "Monitoring & Observability", "Developer Tools", "New Extensions", were changed to "Databases & storage", "Monitoring & observability", "Developer tools", "New extensions" respectively.

3. In `File3.rtl.test.tsx.snap` and `ImageHeader/index.tsx files`, the text "Compressed Size " was changed to "Compressed size".

4. In `File4.rtl.test.tsx` and `ImageDetail/index.tsx files`, a test-case previously checking for the existence of "Compressed Size" now looks for "Compressed size".

5. In `File5.tsx file`, the button text "Add Collaborator" has been changed to "Add collaborator".

6. In `route.tsx file`, the collaborators table title "Current Collaborators" was changed to "Current collaborators".

7. In `File6.tsx` file, the description was updated.

8. In `File7.tsx`, `RepositoryOverview.tsx`, `RepositoryReadMe.tsx`, `TagsAndScans.tsx` files, the description when there is no category or description was updated.

9. In `File8.rtl.test.tsx`, `DeleteRepo.tsx`, the text for confirming repo deletion and the delete button text was adjusted.

10. In `route.tsx file`, the filter label "Trusted Content" was changed to "Trusted content".

11. In `File9.tsx`, the text "Your Organizations" has been changed to "Your organizations".

12. In `Component/index.tsx`, the search group label "Trusted Content" has been changed to "Trusted content".

13. In `file10.test.ts` file, a test-case was updated to check for the new text "Sign in".
```

Overall, this was a positive result:

genai series 13 f7
Figure 7: Mark’s response “Damn, that’s nice output (assuming it’s accurate).”

Follow up:

genai series 13 f8
Figure 8: Mark’s response, “I checked, it’s accurate.”

Another example from a different PR:

The changes in the `branch` branch compared to `main` include several modifications that affect user-facing content, particularly in JSX and TSX files. Here is a summary of the key changes:

...

These changes primarily involve simplification and refactoring of the existing UI logic, possibly to streamline the user interaction by removing complexities related to feature flags and reducing the use of modals or conditional rendering for specific purchasing flows.

Try it yourself

Here is a markdown file that you can paste into VSCode to try these prompts on your own branch. In the last line, update my-branch to one of your local branches that you’d like to review: https://gist.github.com/ColinMcNeil/2e8f25e2d4092f3c7a0ce8992d2e197c#file-readme-md

Next steps

This is already a promising flow. For example, a tech writer could clone the git repo and run this prompt to inspect a branch for user-facing changes. From here, we might extend the functionality:

  • Allow user input for PR to review without knowing the branch or git needing to use git.
  • Automatic git clone & pull with auth.
  • Support for larger >15 files changed PR by allowing agents to automate their tasks.
  • “Baking” the final flow into CI/CD so that it can automatically assign reviewers to relevant PRs.

If you’re interested in running this prompt on your own repo or just want to follow along with the code, watch our new public repo and reach out. We also appreciate your GitHub Stars.

Everything we’ve discussed in this blog post is available for you to try out on your own projects. 

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Extending the Interaction Between AI Agents and Editors

Par : Docker Labs
18 novembre 2024 à 18:28

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

We recently met up with developers at GitHub Universe at the Docker booth. In addition to demonstrating the upcoming Docker agent extension for GitHub Copilot, we also hosted a “Hack with Docker Labs” session. 

2400x1260 docker labs genai

To facilitate these sessions, we created a VSCode extension to explore the relationship between agents and tools. We encouraged attendees to think about how agents can change how we interact with tools by mixing tool definitions (anything you can package in a Docker container) with prompts using a simple Markdown-based canvas.

Many of these sessions followed a simple pattern.

  • Choose a tool and describe what you want it to do.
  • Let the agent interact with that tool.
  • Ask the agent to explain what it did or adjust the strategy and try again.

It was great to facilitate these discussions and learn more about how agents are challenging us to interact with tools in new ways.  

Figure 1 shows a short example of a session where we generated a QR code (qrencode). We start by defining both a tool and a prompt in the Markdown file. Then, we pass control over to the agent and let it interact with that tool (the output from the agent pops up on the right-hand side).

Animated gif showing generation of QR code using tool and prompt in Markdown file.
Figure 1: Generating a QR code.

Feel free to create an issue in our repo if you want to learn more.

Editors

This year’s trip to GitHub Universe also felt like an opportunity to reflect on how developer workflows are changing with the introduction of coding assistants. Developers may have had language services in the editor for a long time now, but coding assistants that can predict the next most likely tokens have taught us something new. We were all writing more or less the same programs (Figure 2).

Diagram showing process flow to and from Editor, Language Service, and LLM.
Figure 2: Language service interaction.

Other agents

Tools like Cursor, GitHub Copilot Chat, and others are also teaching us new ways in which coding assistants are expanding beyond simple predictions. In this series, we’ve been highlighting tools that typically work in the background. Agents armed with these tools will track other kinds of issues, such as build problems, outdated dependencies, fixable linting violations, and security remediations.

Extending the previous diagram, we can imagine an updated picture where agents send diagnostics and propose code actions, while still offering chat interfaces for other kinds of user input (Figure 3). If the ecosystem has felt closed, get ready for it to open up to new kinds of custom agents.

Diagram showing language service interaction on the left, with extension to Agent and Chat on the right.
Figure 3: Agent extension.

More to come

In the next few posts, we’ll take this series in a new direction and look at how agents are able to use LSPs to interact with developers in new ways. An agent that represents background tasks, such as updating dependencies, or fixing linting violations, can now start to use language services and editors as tools! We think this will be a great way for agents to start helping developers better understand the changes they’re making and open up these platforms to input from new kinds of tools.

GitHub Universe was a great opportunity to check in with developers, and we were excited to learn how many more tools developers wanted to bring to their workflows. As always, to follow along with this effort, check out the GitHub repository for this project.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Using AI Tools to Convert a PDF into Images

Par : Docker Labs
4 novembre 2024 à 14:01

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

How do you turn a PDF into a set of images? A modern suggestion could be to get Claude or GPT to write you a CLI command for something like ImageMagick. Let’s follow that train of thought.

2400x1260 docker labs genai

If we enter a prompt like “How do I turn a PDF into a bunch of images using ImageMagick,” we will indeed get something that looks like it might be a solution.

Imagemagick includes a command-line tool called convert which you can use to convert PDF files into images. Here's how you can do it:

1. Open your terminal

2. Change directory to where your PDF is located:

```
cd /path/to/your/pdf

```
3. Use the following command to convert the PDF into images (in this case JPEG):

```
convert -density 300 -quality 100 your_file.pdf output_name.jpg

``` 

The agent will then explain how convert works and finish with a suggestion to “please make sure that you have installed ImageMagick before running these commands.”

Instead of worrying about whether ImageMagick is installed, however, let’s use Docker as our tool provider and prompt it to go ahead and run the command. If you’re following along in this series, you’ll have seen that we are using Markdown files to mix together tools and prompts.  Here’s the first prompt we tried:

---
tools:
  - name: imagemagick
---
# prompt user

Use Imagemagick to convert the family.pdf file into a bunch of jpg images.

After executing this prompt, the LLM generated a tool call, which we executed in the Docker runtime, and it successfully converted family.pdf into nine .jpg files (my family.pdf file had nine pages). 

Figure 1 shows the flow from our VSCode Extension.

Animated VSCode workflow showing the process of converting PDFs to images.
Figure 1: Workflow from VSCode Extension.

We have given enough context to the LLM that it is able to plan a call to this ImageMagick binary. And, because this tool is available on Docker Hub, we don’t have to “make sure that ImageMagick is installed.” This would be the equivalent command if you were to use docker run directly:

# family.pdf must be located in your $PWD

docker run --rm -v $PWD:/project --workdir /project vonwig/imageMagick:latest convert -density 300 -quality 300 family.pdf family.jpg 

The tool ecosystem

How did this work? The process relied on two things:

  • Tool distribution and discovery (pulling tools into Docker Hub for distribution to our Docker Desktop runtime).
  • Automatic generation of Agent Tool interfaces.

When we first started this project, we expected that we’d begin with a small set of tools because the interface for each tool would take time to design. We thought we were going to need to bootstrap an ecosystem of tools that had been prepared to be used in these agent workflows. 

However, we learned that we can use a much more generic approach. Most tools already come with documentation, such as command-line help, examples, and man pages. Instead of treating each tool as something special, we are using an architecture where an agent responds to failures by reading documentation and trying again (Figure 2).

Illustration of circular process showing "Run tool" leading to "Capture errors" leading to "Read docs" in a continuous loop.
Figure 2: Agent process.

We see a process of experimenting with tools that is not unlike what we, as developers, do on the command line. Try a command line, read a doc, adjust the command line, and try again.

The value of this kind of looping has changed our expectations. Step one is simply pulling the tool into Docker Hub and seeing whether the agent can use it with nothing more than its out-of-the-box documentation. We are also pulling open source software (OSS)  tools directly from nixpkgs, which gives us access to tens of thousands of different tools to experiment with. 

Docker keeps our runtimes isolated from the host operating system, while the nixpkgs ecosystem and maintainers provide a rich source of OSS tools.

As expected, packaging agents still run into issues that force us to re-plan how tools are packaged. For example, the prompt we showed above might have generated the correct tool call on the first try, but the ImageMagick container failed on the first run with this terrible-looking error message:

function call failed call exited with non-zero code (1): Error: sh: 1: gs: not found  

Fortunately, feeding that error back into the LLM resulted in the suggestion that convert needs another tool, called Ghostscript, to run successfully. Our agent was not able to fix this automatically today. However, we adjusted the image build slightly and now the “latest” version of the vonwig/imagemagick:latest no longer has this issue. This is an example of something we only need to learn once.

The LLM figured out convert on its own. But its agency came from the addition of a tool.

Read the Docker Labs GenAI series to see more of what we’ve been working on.

Learn more

Using Docker AI Tools for Devs to Provide Context for Better Code Fixes

Par : Docker Labs
21 octobre 2024 à 13:00

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

At Docker Labs, we’ve been exploring how LLMs can connect different parts of the developer workflow, bridging gaps between tools and processes. A key insight is that LLMs excel at fixing code issues when they have the right context. To provide this context, we’ve developed a process that maps out the codebase using linting violations and the structure of top-level code blocks. 

By combining these elements, we teach the LLM to construct a comprehensive view of the code, enabling it to fix issues more effectively. By leveraging containerization, integrating these tools becomes much simpler.

2400x1260 docker labs genai

Previously, my linting process felt a bit disjointed. I’d introduce an error, run Pylint, and receive a message that was sometimes cryptic, forcing me to consult Pylint’s manual to understand the issue. When OpenAI released ChatGPT, the process improved slightly. I could run Pylint, and if I didn’t grasp an error message, I’d copy the code and the violation into GPT to get a better explanation. Sometimes, I’d ask it to fix the code and then manually paste the solution back into my editor.

However, this approach still required several manual steps: copying code, switching between applications, and integrating fixes. How might we improve this process?

Docker’s AI Tools for Devs prompt runner is an architecture that allows us to integrate tools like Pylint directly into the LLM’s workflow through containerization. By containerizing Pylint and creating prompts that the LLM can use to interact with it, we’ve developed a system where the LLM can access the necessary tools and context to help fix code issues effectively.

Understanding the cognitive architecture

For the LLM to assist effectively, it needs a structured way of accessing and processing information. In our setup, the LLM uses the Docker prompt runner to interact with containerized tools and the codebase. The project context is extracted using tools such as Pylint and Tree-sitter that run against the project. This context is then stored and managed, allowing the LLM to access it when needed.

By having access to the codebase, linting tools, and the context of previous prompts, the LLM can understand where problems are, what they are, and have the right code fragments to fix them. This setup replaces the manual process of finding issues and feeding them to the LLM with something automatic and more engaging.

Streamlining the workflow

Now, within my workflow, I can ask the assistant about code quality and violations directly. The assistant, powered by an LLM, has immediate access to a containerized Pylint tool and a database of my code through the Docker prompt runner. This integration allows the LLM to use tools to assist me directly during development, making the programming experience more efficient.

This approach helps us rethink how we interact with our tools. By enabling a conversational interface with tools that map code to issues, we’re exploring possibilities for a more intuitive development experience. Instead of manually finding problems and feeding them to an AI, we can convert our relationship with tools themselves to be conversational partners that can automatically detect issues, understand the context, and provide solutions.

Walking through the prompts

Our project is structured around a series of prompts that guide the LLM through the tasks it needs to perform. These prompts are stored in a Git repository and can be versioned, tracked, and shared. They form the backbone of the project, allowing the LLM to interact with tools and the codebase effectively. We automate this entire process using Docker and a series of prompts stored in a Git repository. Each prompt corresponds to a specific task in the workflow, and Docker containers ensure a consistent environment for running tools and scripts.

Workflow steps

An immediate and existential challenge we encountered was that this class of problem has a lot of opportunities to overwhelm the context of the LLM. Want to read a source code file? It has to be small enough to read. Need to work on more than one file? Your realistic limit is three to four files at once. To solve this, we can instruct the LLM to automate its own workflow with tools, where each step runs in a Docker container.

Again, each step in this workflow runs in a Docker container, which ensures a consistent and isolated environment for running tools and scripts. The first four steps prepare the agent to be able to extract the right context for fixing violations. Once the agent has the necessary context, the LLM can effectively fix the code issues in step 5.

1. Generate violations report using Pylint:

Run Pylint to produce a violation report.

2. Create a SQLite database:

Set up the database schema to store violation data and code snippets.

3. Generate and run INSERT statements:

  • Decouple violations from the range they represent.
  • Use a script to convert every violation and range from the report into SQL insert statements.
  • Run the statements against the database to populate it with the necessary data.

4. Index code in the database:

  • Generate an abstract syntax tree (AST) of the project with Tree-sitter (Figure 1).
Screenshot of syntax tree, showing files, with detailed look at Example .py.parsed.
Figure 1: Generating an abstract syntax tree.
  • Find all second-level nodes (Figure 2). In Python’s grammar, second-level nodes are statements inside of a module.
Expanded look at Example .py.parsed with highlighted statements.
Figure 2: Extracting content for the database.
  • Index these top-level ranges into the database.
  • Populate a new table to store the source code at these top-level ranges.

5. Fix violations based on context:

Once the agent has gathered and indexed the necessary context, use prompts to instruct the LLM to query the database and fix the code issues (Figure 3).

Illustration of instructions, for example, to "fix the violation "some violation" which occurs in file.py on line 1" with information on the function it occurs in.
Figure 3: Instructions for fixing violations.

Each step from 1 to 4 builds the foundation for step 5, where the LLM, with the proper context, can effectively fix violations. The structured preparation ensures that the LLM has all the information it needs to address code issues with precision.

Refining the context for LLM fixes

To understand how our system improves code fixes, let’s consider a specific violation flagged by Pylint. Say we receive a message that there’s a violation on line 60 of our code file block_listed_name.py:

{
  "type": "convention",
  "module": "block_listed_name",
  "obj": "do_front",
  "line": 60,
  "column": 4,
  "endLine": 60,
  "endColumn": 7,
  "path": "cloned_repo/naming_conventions/block_listed_name.py",
  "symbol": "disallowed-name",
  "message": "Disallowed name \"foo\"",
  "message-id": "C0104"
}

From this Pylint violation, we know that the variable foo is a disallowed name. However, if we tried to ask the LLM to fix this issue based solely on this snippet of information, the response wouldn’t be as effective. Why? The LLM lacks the surrounding context — the full picture of the function in which this violation occurs.

This is where indexing the codebase becomes essential

Because we’ve mapped out the codebase, we can now ask the LLM to query the index and retrieve the surrounding code that includes the do_front function. The LLM can even generate the SQL query for us, thanks to its knowledge of the database schema. Once we’ve retrieved the full function definition, the LLM can work with a more complete view of the problem:

def do_front(front_filename, back_filename):
   """
   Front strategy: loop over front image,
   detect blue pixels there,
   substitute in pixels from back.
   Return changed front image.
   """
   foo = SimpleImage(front_filename)
   back = SimpleImage(back_filename)
   for y in range(foo.height):xc
       for x in range(foo.width):
           pixel = foo.get_pixel(x, y)
           # Detect blue pixels in front and replace with back pixels
           if pixel[2] > 2 * max(pixel[0], pixel[1]):
               back_pixel = back.get_pixel(x, y)
               foo.set_pixel(x, y, back_pixel)
   return foo

Now that the LLM can see the whole function, it’s able to propose a more meaningful fix. Here’s what it suggests after querying the indexed codebase and running the fix:

def do_front(front_filename, back_filename):
 """
    Front strategy: loop over front image,
 detect blue pixels there,
    substitute in pixels from back.
    Return changed front image.
    """
    front_image = SimpleImage(front)
    back_image = SimpleImage(back_filename)
    for y in range(front_image.height):
        for x in range(front_image.width            pixel = front_image.get_pixel(x y)
            # Detect blue pixels in front and replace with back pixels
            if pixel[2 >  * max(pixel[0 pixel[1]):
                back_pixel = back_image.get_pixel(x,)
                front_image.set_pixel(x,, back_pixel)
    return front_image

Here, the variable foo has been replaced with the more descriptive front_image, making the code more readable and understandable. The key step was providing the LLM with the correct level of detail — the top-level range — instead of just a single line or violation message. With the right context, the LLM’s ability to fix code becomes much more effective, which ultimately streamlines the development process.

Remember, all of this information is retrieved and indexed by the LLM itself through the prompts we’ve set up. Through this series of prompts, we’ve reached a point where the assistant has a comprehensive understanding of the codebase. 

At this stage, not only can I ask for a fix, but I can even ask questions like “what’s the violation at line 60 in naming_conventions/block_listed_name.py?” and the assistant responds with:

On line 60 of naming_conventions/block_listed_name.py, there's a violation: Disallowed name 'foo'. The variable name 'foo' is discouraged because it doesn't convey meaningful information about its purpose.

Although Pylint has been our focus here, this approach points to a new conversational way to interact with many tools that map code to issues. By integrating LLMs with containerized tools through architectures like the Docker prompt runner, we can enhance various aspects of the development workflow.

We’ve learned that combining tool integration, cognitive preparation of the LLM, and a seamless workflow can significantly improve the development experience. This integration allows an LLM to use tools to directly help while developing, and while Pylint has been the focus here, this also points to a new conversational way to interact with many tools that map code to issues.

To follow along with this effort, check out the GitHub repository for this project.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Using an AI Assistant to Script Tools

Par : Docker Labs
7 octobre 2024 à 13:48

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

LLMs are now quite good at transforming data. For example, we were recently working with some data generated by the Pylint tool. This tool generates big arrays of code violations. 

2400x1260 docker labs genai

Here’s an example showing the kind of data that gets returned.

[
    {
        "type": "convention",
        "module": "app",
        "line": 1,
        "column": 0,
        "endLine": 1,
        "endColumn": 13,
        "path": "src/app.py",
        "symbol": "missing-module-docstring",
        "message": "Missing module docstring",
        "message-id": "C0114"
    },
    {
       ...
    },
    ...
]

During this session with our AI assistant, we decided that it would be helpful to create a database and insert the data to make it easier for the AI to analyze (LLMs are very good at writing SQL). As is now our habit, we wrote a quick prompt to see if the assistant could generate the SQL:

1. Read the json data from /thread/violations.json
2. For each element in the array, transform each element into two SQL INSERT statements.

* The first should insert columns PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN into a table named RANGES.
* The second should insert columns MESSAGE, TYPE, RANGE, and VIOLATION_ID into a table named VIOLATIONS.

LLMs are obviously good at tasks of this kind, and this was no exception. Our prompt engine had been augmented with a function to read local files but, besides that, this was a pretty straightforward prompt (we used GPT-3.5). The LLM responded with the correct INSERT statements.

INSERT INTO RANGES (PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN)
VALUES ('src/app.py', 1, 1, 0, 13);
INSERT INTO VIOLATIONS (MESSAGE, TYPE, RANGE, VIOLATION_ID)
VALUES ('Missing module docstring', 'convention', (SELECT ID FROM RANGES WHERE PATH='src/app.py' AND START_LINE=1 AND START_COLUMN=0), 'C0114');
...
...

We’re starting to get accustomed to this kind of capability, so the result wasn’t too surprising. However, what about the context window here? It really doesn’t make a lot of sense to pass all of this data to the LLM, especially if this is a task that we’ll need to do continuously. Plus, this is really not how a programmer would have solved the problem.

Programmers would write programs.

So, instead of asking the LLM to do a thing, we should try asking it to write a program to do that same thing. Starting with the same prompt, let’s prefix it with “Please write a JavaScript program to …” In other words, let’s ask the LLM to describe how it would accomplish the task in JavaScript, and hopefully automate itself out of the loop. Here’s the adjusted prompt:

Write a javascript function that reads json from a file named `/thread/violations.json`.

For each element in the array, transform each element into two SQL INSERT statements.

* The first should insert columns PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN into a table named RANGES.
* The second should insert columns MESSAGE, TYPE, RANGE, and VIOLATION_ID into a table named VIOLATIONS.

and write them to the file /thread/insert.sql

Now execute the javascript code in a container.

Tool code

Current LLMs can do tasks of this kind. Here’s a sample of the output generated by GPT-3.5. A quick scan will convince many of you that this is probably going to do the trick. However, if our prompt engine is already running in Docker, we might as well just run it:

const fs = require('fs');
const crypto = require('crypto');

function getRandomId() {
    return crypto.randomBytes(16).toString('hex');
}

function escapeString(str) {
    return str.replace(/'/g, \\\"''\\\");
}

let data = JSON.parse(fs.readFileSync('/thread/violations.json', 'utf8'));
let sqlStatements = '';

data.forEach((violation) => {
    let id = getRandomId();
    sqlStatements += `INSERT INTO RANGES(ID, PATH, START_LINE, END_LINE, START_COLUMN, END_COLUMN) VALUES('${id}', '${escapeString(violation.path)}', ${violation.line}, ${violation.endLine}, ${violation.column}, ${violation.endColumn});\\\\n`;
    sqlStatements += `INSERT INTO VIOLATIONS(MESSAGE, TYPE, RANGE) VALUES('${escapeString(violation.message)}', '${escapeString(violation.type)}', '${id}');\\\\n`;
    });

fs.writeFileSync('/thread/insert.sql', sqlStatements);

Running the tool

You may not have noticed but at the end of our prompt, we added a final instruction: “Now execute the JavaScript code in a container”. This is a nice addition to our session, because it means we get to see the results.

This is also where tool calling comes back into the picture. To give our AI the capacity to try running the program that it has just written, we have defined a new function to create an isolated runtime sandbox for trying out our new tool.

Here’s the agent’s new tool definition:

tools:
  - name: run-javascript-sandbox
    description: execute javascript code in a container
    parameters:
      type: object
      properties:
        javascript:
          type: string
          description: the javascript code to run
    container:
      image: vonwig/javascript-runner
      command:
        - "{{javascript|safe}}"

We’ve asked the AI assistant to generate a tool from a description of that tool. As long as the description of the tools doesn’t change, the workflow won’t have to go back to the AI to ask it to build a new tool version.

The role of Docker in this pattern is to create the sandbox for this code to run. This function really doesn’t need much of a runtime, so we give it a pretty small sandbox.

  • No access to a network.
  • No access to the host file system (does have access to isolated volumes for sharing data between tools).
  • No access to GPU.
  • Almost no access to software besides the Node.js runtime (no shell for example).

The ability for one tool to create another tool is not just a trick. It has very practical implications for the kinds of workflows that we can build up because it gives us a way for us to control the volume of data sent to LLMs, and it gives the assistant a way to “automate” itself out of the loop.

Next steps

This example was a bit abstract but in our next post, we will describe the practical scenarios that have driven us to look at this idea of prompts generating new tools. Most of the workflows we’re exploring are still just off-the-shelf tools like Pylint, SQLite, and tree_sitter (which we embed using Docker, of course!). For example:

  1. Use pylint to extract violations from my codebase.
  2. Transform the violations into SQL and then send that to a new SQLite.
  3. Find the most common violations of type error and show me the top level code blocks containing them.

However, you’ll also see that part of being able to author workflows of this kind is being able to recognize when you just need to add a custom tool to the mix.

Read the Docker Labs GenAI series to see more of what we’ve been working on.

Learn more

Using an AI Assistant to Read Tool Documentation

Par : Docker Labs
23 septembre 2024 à 13:41

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

Using new tools on the command line can be frustrating. Even if we are confident that we’ve found the right tool, we might not know how to use it.

Telling an agent to RT(F)M

A typical workflow might look something like the following.

  • Install tool.
  • Read the documentation.
  • Run the command.
  • Repeat.

Can we improve this flow using LLMs?

2400x1260 docker labs genai

Install tool

Docker provides us with isolated environments to run tools. Instead of requiring that commands be installed, we have created minimal Docker images for each tool so that using the tool does not impact the host system. Leave no trace, so to speak.

Read the documentation

Man pages are one of the ways that authors of tools ship content about how to use that tool. This content also comes with standard retrieval mechanisms (the man tool). A tool might also support a command-line option like --help. Let’s start with the idealistic notion that we should be able to retrieve usage information from the tool itself.

In this experiment, we’ve created two entry points for each tool. The first entry point is the obvious one. It is a set of arguments passed directly to a command-line program. The OpenAI-compatible description that we generate for this entry point is shown below. We are using the same interface for every tool.

{"name": "run_my_tool",
   "description": "Run the my_tool command.",
   "parameters":
   {"type": "object",
    "properties":
    {"args":
     {"type": "string",
      "description": "The arguments to pass to my_tool"}}},
   "container": {"image": "namespace/my_tool:latest"}}

The second entrypoint gives the agent the ability to read the man page and, hopefully, improve its ability to run the first entrypoint. The second entrypoint is simpler, because it only does one thing (asks a tool how to use it).

{"name": "my_tool_manual",
   "description": "Read the man page for my_tool",
   "container": {"image": "namespace/my_tool:latest", "command": ["man"]}}

Run the command

Let’s start with a simple example. We want to use a tool called qrencode to generate a QR code for a link. We have used our image generation pipeline to package this tool into a minimal image for qrencode. We will now pass this prompt to a few different LLMs; we are using LLMs that have been trained for tool calling (e.g., GPT 4, Llama 3.1, and Mistral). Here’s the prompt that we are testing:

Generate a QR code for the content https://github.com/docker/labs-ai-tools-for-devs/blob/main/prompts/qrencode/README.md. Save the generated image to qrcode.png.
If the command fails, read the man page and try again.

Note the optimism in this prompt. Because it’s hard to predict what different LLMs have already seen in their training sets, and many command-line tools use common names for arguments, it’s interesting to see what LLM will infer before adding the man page to the context.

The output of the prompt is shown below. Grab your phone and check it out.

Black and white QR code generated by AI assistant.
Figure 1: Content QR code generated by AI assistant.

Repeat

When an LLM generates a description of how to run something, it will usually format that output in such a way that it will be easy for a user to cut and paste the response into a terminal:

qrencode -o qrcode.png 'my content'

However, if the LLM is generating tool calls, we’ll see output that is instead formatted to be easier to run:

[{"function": {"arguments": "{
  \"args\": \"-o qrcode.png 'my content'\"
}"
               "name": "qrencode"}
  "id": "call_Vdw2gDFMAwaEUMgxLYBTg8MB"}]

We respond to this by spinning up a Docker container.

Running the tool as part of the conversation loop is useful even when the command fails. In Unix, there are standard ways to communicate failures. For example, we have exit codes, and stderr streams. This is how tools create feedback loops and correct our behavior while we’re iterating at the terminal. This same mechanism can be used in a conversation loop involving an LLM.

To illustrate, here is another simple example. We’ll try running a tool that writes ASCII art with the following prompt.

Write ascii art with the message "Docker" using the font "helvetica".  Read the man page for figlet if you detect any errors and then try again.

In our test, this did fail. However, it also described the apparent issue on the stderr output stream.

Error: error: could not load font helvetica

By including this message in the conversation loop, the assistant can suggest different courses of action. Different LLMs produced different results here. For example, Llama 3.1 gives instructions for how to install the missing font. On the other hand, GPT 4 re-ran the command, but only after having made the “executive” decision to try a different font.

I'm sorry, it seems the font Helvetica is not available. Let's try it with the default font.

Here is the output of the figlet command with the default font (term): 

 ____             _             
|   \    _       | | _____ _ 
| | | |/  \ / _| |/ /  \ '_|
| |_| | (_) | (__|   <  __/ |   
|____/ \___/ \___|_|\_\___|_|

We are very early in understanding how to take advantage of this apparent capacity to try different approaches. But this is another reason why quarantining these tools in Docker containers is useful. It limits their blast radius while we encourage experimentation.

Results

We started by creating a pipeline to produce minimal Docker images for each tool. The set of tools was selected based on whether they have outputs useful for developer-facing workflows. We continue to add new tools as we think of new use cases. The initial set is listed below.

gh pylint commitlint scalafix gitlint yamllint checkmake gqlint sqlint golint golangci-lint hadolint markdownlint-cli2 cargo-toml-lint ruff dockle clj-kondo selene tflint rslint yapf puppet-lint oxlint kube-linter csslint cpplint ansible-lint actionlint black checkov jfmt datefmt rustfmt cbfmt yamlfmt whatstyle rufo fnlfmt shfmt zprint jet typos docker-ls nerdctl diffoci dive kompose git-test kubectl fastly infracost sops curl fzf ffmpeg babl unzip jq graphviz pstree figlet toilet tldr qrencode clippy go-tools ripgrep awscli2 azure-cli luaformatter nixpkgs-lint hclfmt fop dnstracer undocker dockfmt fixup_yarn_lock github-runner swiftformat swiftlint nix-linter go-critic regal textlint formatjson5 commitmsgfmt

There was a set of initial problems with context extraction.

Missing manual pages

Only about 60% of the tools we selected have man pages. However, even in those cases, there are usually other ways to get help content. The following steps show the final procedure we used:

  • Try to run the man page.
  • Try to run the tool with the argument --help.
  • Try to run the tool with the argument -h.
  • Try to run the tool with --broken args and then read stderr.

Using this procedure, every tool in the list above eventually produced documentation.

Long manual pages

Limited context lengths impacted some of the longer manual pages, so it was still necessary to employ standard RAG techniques to summarize verbose man pages. Our tactic was to focus on descriptions of command-line arguments and sections that had sample usage. These had the largest impact on the quality of the agent’s output. The structure of Unix man pages helped with the chunking, because we were able to rely on standard sections to chunk the content.

Subcommands

For a small set of tools, it was necessary to traverse a tree of help menus. However, these were all relatively popular tools, and the LLMs we deployed already knew about this command structure. It’s easy to check this out for yourself. Ask an LLM, for example: “What are the subcommands of Git?” or “What are the subcommands of Docker?” Maybe only popular tools get big enough that they start to be broken up into subcommands.

Summary

We should consider the active role that agents can play when determining how to use a tool. The Unix model has given us standards such as man pages, stderr streams, and exit codes, and we can take advantage of these conventions when asking an assistant to learn a tool. Beyond distribution, Docker also provides us with process isolation, which is useful when creating environments for safe exploration.

Whether or not an AI can successfully generate tool calls may also become a metric for whether or not a tool has been well documented.

To follow along with this effort, check out the GitHub repository for this project.

Learn more

Getting Started with the Labs AI Tools for Devs Docker Desktop Extension

Par : Docker Labs
9 septembre 2024 à 16:32

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

We’ve released a simple way to run AI tools in Docker Desktop. With the Labs AI Tools for Devs Docker Desktop extension, people who want a simple way to run prompts can easily get started. 

If you’re a prompt author, this approach also allows you to build, run, and share your prompts more easily. Here’s how you can get started.

2400x1260 docker labs genai

Get the extension

You can download the extension from Docker Hub. Once it’s installed, enter an OpenAI key.

Import a project

With our approach, the information a prompt needs should be extractable from a project. Add projects here that you want to run SDLC tools inside (Figure 1).

Screenshot showing blue "Add project" button.
Figure 1: Add projects.

Inputting prompts

A prompt can be a git ref or a git URL, which will convert to a ref. You can also import your own local prompt files, which allows you to quickly iterate on building custom prompts.

Sample prompts

(copy + paste the ref)

ToolGit RefLinkDescription
Dockergithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/dockerhttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/dockerGenerates a runbook for any Docker project
Dockerfilesgithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/dockerfileshttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/dockerfilesGenerate multi-stage Dockerfiles for NPM projects
Lazy Dockergithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/lazy_dockerhttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/lazy_dockerGenerates a runbook for Lazy Docker
NPMgithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/npmhttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/npmResponds with helpful information about NPM projects
ESLintgithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/eslinthttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/eslintRuns ESLint in your project
ESLint Fixgithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/eslint_fixhttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/eslint_fixRuns ESLint in your project and responds with a fix for the first violation it finds
Pylintgithub.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/pylinthttps://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/pylintRuns Pylint in your project, and responds with a fix for the first violation it finds
Screenshot showing blue "Add local prompt" button next to text box in which to enter GitHub ref or URL
Figure 2: Enter a GitHub ref or URL.

Writing and testing your own prompt

Create a prompt file

A prompt file is a markdown file. Here’s an example: prompt.md

# prompt system
You are an assistant who can write comedic monologs in the style of Stephen Colbert.

# prompt user
Tell me about my project.

Now, we need to add information about the project. Doing so is done with mustache templates:

# prompt system
You are an assistant who can write comedic monologues in the style of Stephen Colbert.

# prompt user
Tell me about my project. 

My project uses the following languages:
{{project.languages}}

My project has the following files:
{{project.files}}

Leverage tools in your project

Just like extractors, which can be used to render prompts, we define tools in the form of Docker images. A function image follows the same spec as extractors but in reverse. 

  • The Docker image is automatically bind-mounted to the project.
  • The Docker image entry point is automatically run within the project using –workdir.
  • The first argument will be a JSON payload. This payload is generated when the LLM tries to call our function.
- name: write_files
  description: Write a set of files to my project
  parameters:
    type: object
    properties:
      	files:
        type: array
        items:
          type: object
          properties:
            path:
              type: string
              description: the relative path to the file that should be written
            content:
              type: string
              description: the content that should be written to a file
            executable:
              type: boolean
              description: whether to make the file executable
  container:
    image: vonwig/function_write_files:latest

Test your prompt

  1. Add the file to a Git repository and push to a public remote.
  2. Paste the URL to reference the file on GitHub.

Alternatively, import a local prompt and select the file on your computer.

Screenshot showing text box for entering the URL for the folder.
Figure 3: Add the URL for the folder.

3. Run.

## ROLE assistant

Don't even get me started on the files, I mean, have you ever sat down and really looked at a list of files? This project has got more layers than that seven-layer bean dip I had at last weekend's potluck. This project isn't just files on files, its files within files, its dot something after dot something else – and before you ask: Yes, all of these are REQUIRED!

Coming down to Dockerfile. Now, I've seen some Dockerfiles but our Dockerfile, folks, it's something else. It lifts, it grinds, it effectively orchestrates our code like a veteran conductor at the symphony. We also have multiple templates because who doesn't love a good template, right?

Oh, and did I mention the walkthroughs and the resources? Let's just say this isn't a "teach a man to fish" situation. This is more of a “teach a man to create an entire fishing corporation” scenario. Now THAT'S dedication.

Finally we've got the main.js, and let's be real, is it even a project without a main.js anymore?

As always, feel free to follow along in our new public repo. Everything we’ve discussed in this blog post is available for you to try out on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

How to Fix ESLint Violations with AI Assistance

Par : Docker Labs
26 août 2024 à 13:10

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

In a previous article in this series, we discussed linters, and we talked about running ESLint with an AI assistant. In this post, we will cover AI-assisted violation resolutions. 

2400x1260 docker labs genai

Before an AI assistant can take action on violations via the editor or Git (Figure 1), we first need to answer the question: Can an LLM generate resolutions for ESLint violations with the right knowledge and tools?

 Illustration that places this article in the context of using AI to fix violations found by ESLint.
Figure 1: Fixing ESLint violations.
Fix any lint violations you can in my project.

ESLint violations

To resolve a violation, the LLM needs information about the violation from the linter, and some context about the source code causing it. A level of human supervision may also be necessary. We want to optimize to shrink context, because models both limit context windows and yield better results with less context. This makes us wonder: “What is the minimum amount of context and supervision necessary to resolve a lint violation?”

For example, without human instruction, our assistant can’t know how to balance our parentheses:

// export const dockerfileRegex = new RegExp('(.*\\.?Dockerfile', 'i');
// Goal: new RegExp('(.*)\\.?Dockerfile', 'i');

Your `dockerfileRegex` regular expression is currently unterminated. This means the group you started at `(.*\\.?Dockerfile` doesn't have an ending bracket. Here is a fixed version of it:
```typescript
export const dockerfileRegex = new RegExp('(.*\\.?Dockerfile)', 'i');
```
I added a ")" to close the group.

Focusing on certain violations

We want to narrow the list of ESLint rules to violations the LLM can help with.

In Figure 2, we have compiled every ESLint rule that:

  • Is recommended
  • Is not auto-fixable by ESLint itself

And we want rules that:

  • Do not require supervision to make reasonable fixes.
Text list showing Context and Supervision requirements (for example, "Do we need source code to make the fix?") to help identify groups of violations.
Screenshot showing text and a table specifying rules for violations that LLM can help with, including Violation, Context, and Supervision.
Figure 2: Violations that the LLM can help with.

This context yields four total groups of violations:

Group 1 (No context needed from the surrounding source code or supervision needed.) Scripts can be applied to fix these violations.

Group 2 (No context needed from the surrounding source code, but these might be mistakes that necessitate the LLM to evaluate some of the code.) LLM can generate scripts to resolve these violations, but we would be assuming violations are mistakes. There are some violations that the LLM can evaluate on its own.

Group 3 (Context is needed from surrounding source code, but no supervision is necessary.) This is the best opportunity for our LLM to provide help. We can use tools to pull the right context in.

Group 4 (Context is needed from surrounding source code and supervision might be needed.)

  • Invalid Regex
  • Unsafe optional chaining
  • Constant condition

Depends a lot on the exact situation, but LLMs may be useful?

Thankfully, nearly all the violations could have reasonable fixes applied without supervision. These are the violations that we will focus on.

Initial prompts

First, we create the prompts to attempt to fix ESLint violations.

You are an AI assistant who specializes in resolving lint violations in projects. Use the tools available to quickly take action and be very brief.
1. Run linter.
2. Evaluate total violations.
 // What to do?

Unfortunately, we run into a roadblock when it comes to learning about lint violations from our ESLint tool. When using summary output, we don’t have enough information to know what we’re fixing. However, when using JSON output, we found that as few as 100 violations in a project caused ESLint to send over 10,000 characters over standard out. That would be a problem since many models at current limit us to 4-8k tokens. We need a way to persist this large output but without consuming tokens.

Artifacts

While we want to use ESLint, it can easily balloon an AI conversation past the model’s context size. To address this, we modified the ESLint tool to write eslint.json to a new ephemeral Docker volume. Reading and writing to this volume serves as another conversation between tools, but sandboxed away from the LLM’s context-sensitive function call output. 

We added tools to consume the ESLint artifact in different ways depending on the number of violations summarized.

- name: parse_lint_results
  description: Loads lint violations grouped by type.
  parameters:
    type: object
      properties:
        outputLevel:
   	  type: string
   	  description: Supports condensed or complaints
  container:
    image: vonwig/read_eslint

Now we can use this tool to finish our prompt.

You are an AI assistant who specializes in resolving lint violations in projects. Use the tools available to quickly take action and be very brief.
1. Run lint. // Will write /thread/eslint.json
2. Evaluate total violations. // Will read /thread/eslint.json
<10 violations: Parse output with complaints output.
10+ violations: Parse output with condensed output.
3. Fix the violations using the following steps:
## Condensed:
{>fixing}

A user prompt of:

1. Get the line Use read_eslint with the `path` arg to get all of the violations for a file.
2. Make the correction Respond in the following format:

```json
{
"start": [1, 4],
"end": [2, 4],
"edit": "Lorem ipsum"
}
```

Once you have fixed one file, move on to the next.
You are able to fix the following violations:
no-async-promise-executor
... 

yields:

The violation is located in the file src/utils/promiseUtils.ts at line 30-31. The current code is:
```javascript
return new Promise(async (resolve, reject) => {
  ...
});
```

To resolve the no-async-promise-executor violation, remove the async keyword from the `Promise` executor function. The corrected code would look like:

```javascript
return new Promise((resolve, reject) => {
  ...
});
```

Unfortunately, we’re still not seeing a good result. Without the violating context from the code, again, our assistant can’t provide an actionable fix. So we need to make a tool to get that.

Tree Sitter

Without the right tools, the LLM is, at best, vague and helpful — and at worse quite liable to hallucinate. We need tools to let the LLM be actionable here. Tree-sitter is a popular open source tool we can use to parse a variety of languages into abstract syntax trees (or ASTs). 

The powerful part of Tree-sitter that we want to take advantage of is the community of grammar authors providing support for almost any language our assistant could encounter. We have Dockerized this tool to get context given a file path and line number.

Results

We have trimmed debug finish responses from the payload but kept function calls.

## ROLE assistant
...({:function {:name run_lint, :arguments {}}, :id call_MaK7CGpOISktz1fZP3GS7m4H})

## ROLE assistant
.........................({:function {:name run-eslint, :arguments {
  "args": ".",
"version": 8,
"outputLevel": "summary"
}}, :id call_RHX2hztWvghxbyZx5cvSXgMU})

## ROLE tool (run-eslint)
Running npx with args: eslint@8 --format json .
Peparing output
Found 41 violations in 271 files.

## ROLE tool (run_lint)
I ran ESLint (version 8) on your entire project. Here are the results:

- 271 files were checked for violations
- Out of these, 41 files had one or more violations

## ROLE assistant
.............({:function {:name parse_lint_results, :arguments {
  "outputLevel": "condensed"
}}, :id call_sYHP8tXuKbXGaXdNCdO3W5kU})

## ROLE tool (parse_lint_results)
{   "@typescript-eslint/no-floating-promises": [ "src/commands/help.ts",
"src/commands/registerWorkspaceCommand.ts"   ],   "@typescript-eslint/no-explicit-any": [ "src/commands/registerLocalCommand.ts"   ],   "@typescript-eslint/naming-convention": [ "src/constants.ts"   ],   "eqeqeq": [ "src/debugging/DockerDebugConfigurationProvider.ts", "src/debugging/DockerServerReadyAction.ts", "src/runtimes/files/ContainerFilesProvider.ts", "src/tree/LocalRootTreeItemBase.ts"   ],   "@typescript-eslint/no-unused-vars": [ "src/extension.ts", "src/scaffolding/wizard/netCore/NetCoreGatherInformationStep.ts"   ],   "@typescript-eslint/semi": [ "src/utils/registryExperience.ts"   ] }

## ROLE assistant
...............({:function {:name violations_for_file, :arguments {
  "path": "src/commands/help.ts"
}}, :id call_fAN5kTrXndaNs7Ix4iH51E4T})

## ROLE tool (violations_for_file)
Getting eslint.json for path: src/commands/help.ts
ESLint violations for src/commands/help.ts:
{ "filePath": "src/commands/help.ts", "messages": [ { "ruleId": "@typescript-eslint/no-floating-promises", "severity": 1, "message": "Promises must be awaited, end with a call to .catch, end with a call to .then with a rejection handler or be explicitly marked as ignored with the void operator.", "line": 35, "column": 5, "nodeType": "ExpressionStatement", "messageId": "floatingVoid", "endLine": 35, "endColumn": 100] }

## ROLE assistant
......................({:function {:name run_tree_sitter, :arguments {
  "path": "src/commands/help.ts",
  "line": 35
}}, :id call_cKnNRQs5hmMsTszTfDnwfK12})

## ROLE tool (run_tree_sitter)
{
  offending_line: '}',
  line_node: SyntaxNode {
type: },
startPosition: {row: 35, column: 0},
endPosition: {row: 35, column: 1},
childCount: 0,
  },
  parent: SetNode {
type: set,
startPosition: {row: 33, column: 45},
endPosition: {row: 35, column: 1},
childCount: 4,
  },
  parent_text: '{\n' +
" vscode.env.openExternal(vscode.Uri.parse('https://github.com/microsoft/vscode-docker/issues'));\n" +
'}'
}

## ROLE assistant
The violation on line 35 of the `src/commands/help.ts` file is due to a floating promise. To resolve this, add the void operator to the beginning of the line as below:
```js
void vscode.env.openExternal(vscode.Uri.parse('https://github.com/microsoft/vscode-docker/issues'));
```.

This result demonstrates that the LLM can orchestrate a small code improvement using the right tools.

Failures

Unfortunately, it terminates after coming to the end of the first violation rather than continuing on to the next. This seems to be difficult to change, even trying two other combinations of prompts failed to get a loop going.

Fix any lint violations you can using the tools provided.
After resolving one violation, move onto the next.
Once you have fixed one file, move on to the next.

Successful prompt

However, what we did find was that specifying the lint violation to focus on did work:

Please only fix no-unused-vars

yields:

## ROLE assistant
The violation in the file "src/extension.ts" is a declared variable 'unuseVar' at line 41, which is not used anywhere. The fix would be the removal of that line..nil

Figure 3 lists a few interesting violations we tested.

Screenshot of a table showing results for violations tested, with columns for Violation, Result, and Feedback.
Figure 3: Violations tested.

Overall, we can see that our assistant, when given the right tools to find and fix violations, can make good code change suggestions for even difficult lint violations. We welcome you to check out the code and test with different violations that we didn’t cover.

Summary

We continue to investigate the use of tools for gathering context and improving suggestions. In this installment, we have looked at how AI assistants can provide significant value to developers by:

  • Using a linter to learn about violations in a TS project without Node, NPM, or ESLint.
  • Leveraging an ephemeral volume to store large context without using tokens.
  • Using Tree_sitter to load precise, relevant code context from a project.
  • Generating fixes for violations in a TS project.

As always, feel free to follow along in our new public repo and please reach out. Everything we’ve discussed in this blog post is available for you to try out on your own projects.

Learn more

Adding the ESLint Tool to an AI Assistant: Improving Recommendations for JS/TS Projects

Par : Docker Labs
12 août 2024 à 13:04

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Can an AI assistant help you write better JavaScript or TypeScript?

Background and introduction

Projects that heavily rely on JavaScript (JS) or TypeScript (TS) are synonymous with the web, so there is a high demand for tools that can improve the consistency and quality of projects using these languages. In previous Docker Labs GenAI posts, we’ve introduced the idea that tools both enable AI assistants to understand our code more and enable them to take action based on this understanding.

In this article, we’ll be trying to enable our AI assistant to provide advice that is both helpful and actionable for linting JS/TS projects and to finally delve into the NPM ecosystem.

2400x1260 docker labs genai

Another simple prompt

As we learned in this previous Docker Labs GenAI article, you won’t get much help asking an LLM to tell you how to lint your project without any details. So, like before, we’re using our “linguist” tool to learn about the languages used in the project and augment the prompt (Figure 1):

How do I lint my project?
{{# linguist }}
This project contains code from the language {{ language }} so if you have any 
recommendations pertaining to {{ language }}, please include them.
{{/linguist}}

What LLMs provide out of the box

Exchange between developer and AI assistant, asking "How do I lint my project?" The response suggests using ESLint.
Figure 1: AI assistant responds with information about ESLint.

In Figure 2, we see that GPT-4 recognizes that ESLint is highly configurable and actually doesn’t work without a config, and so it is trying to provide that for us by either helping us run ESLint’s init tool or by writing a config to use.

Screenshot of AI response providing details for setting up, running, and configuring ESLint.
Figure 2: AI assistant provides information for setting up and running ESLint.

However, this response gives us either a config that does not work for many projects, or a boilerplate setup task for the user to do manually. This is in contrast with other linters, like Pylint or golangci-lint, where linguist was actually enough for the LLM to find a clear path to linting. So, with ESLint, we need to add more knowledge to help the LLM figure this out.

Configuring ESLint

Using StandardJS

StandardJS is a community-led effort to simplify ESLint configurations. Let’s start by nudging the assistant toward using this as a starting point. The ESLint config is published under its own package, StandardJS, so we can add the following prompt:

If there are no ESLint configuration files found, use StandardJS to lint the project with a consistent config.

We will also add a function definition so that our assistant knows how to run StandardJS. Note the container image defined at the bottom of the following definition:

- name: run-standardjs
  description: Lints the current project with StandardJS
  parameters:
    type: object
    properties:
      typescript:
   	type: boolean
   	description: Whether to lint Typescript files
      fix:
   	type: boolean
   	description: Whether to fix the files
      files:
   	type: array
   	items:
          type: string
          description: The filepaths to pass to the linter. Defaults to '.'
    required:
      - typescript
      - fix
  container:
    image: vonwig/standardjs:latest

This definition will work for both TypeScript and JavaScript projects using an argument. The assistant uses the project content to determine how to optimally set the TypeScript property.

When using StandardJS with TypeScript, two things happen in the container:

  • Lints with ts-standard instead of standard
  • Lints ts-standard from the working directory containing tsconfig.json.

But, with the right tools, this behavior is enabled with a single prompt:

When using StandardJS, use Typescript only if there are tsconfigs in the project.

Docker environments

Both ESLint and StandardJS run in Node.js environments. In our current prototype, our assistant uses three different Docker images.

Docker is significant because of the previously mentioned requirement of using ts-standard in a directory with tsconfig.json. When we baked this logic into the Docker image, we effectively introduced a contract bridging the AI Assistant, the linter tool, and the overall structure of the repository.

After determining that a project uses JavaScript or TypeScript, our assistant also adds Git Hooks. (See this GenAI article for details.) Docker gives us a way to reliably distribute these tools.

If we detect `TypeScript` in the project, we can add the following hook to the hooks entry in the `local` repo entry.

```yaml
id: standardjs-docker-ts
name: standardjs linter (TS and JS)
entry: vonwig/standardjs:latest '{"typescript": true, "fix": false}'
language: docker_image
files "\\.(ts|tsx)$"

Fixing violations

Linting output comes in the form of violations. A violation is attached to a range in the code file with the offending code and the violation reason. As mentioned previously, 75% of StandardJS violations are automatically fixable. Can we use the AI assistant to automatically fix the remaining violations?

Respond with an edit to resolve the violation using the following JSON format:

{
  "start": [1,4],
  "end": [1,6],
  "edit": "lorem ipsum"
}

If you take, for example, the lint rule for type casting, all of the models we tested will replace === with ==. Here’s the response when we ask for fixes to lines with the no-this-before-super violation:

Here's the proposed edit to resolve the violation:

```json
{
  "start": [1, 4],
  "end": [2, 6],
  "edit": "class myclass extends base {\n\tval: number;\n\tconstructor()
{\n\t\tsuper();\n\t\tthis.val = 10;\n\t}\n}"
}
```

If these models are able to fix these violations, why doesn’t ESLint just make them automatically fixable? In many of the cases, they represent riskier changes that still require some developer supervision. Perhaps the best thing an assistant can do is present these auto-fixes to the user directly in their editors. For example, a fix that has been generated by our assistant can be presented in VSCode (Figure 3).

A fix that has been generated by the AI assistant  presented in VSCode.
Figure 3: A fix is presented to the user.

Editor complaints

With the rise of tools like GitHub Copilot, developers are now becoming accustomed to assistants being present in their editors (Figure 4).

Screenshot showing Copilot's presence in the editor, with "Fix this" highlighted in green.
Figure 4: AI assistant is present in the editor.

Our work is showing that linting tools can improve the quality of these fixes.

For example, when asking Copilot to fix the line from earlier, it lacks the additional context from ESLint (Figure 5).

Screenshot of AI response, which lacks additional context from ESLint.
Figure 5: Additional context is needed.

The assistant is unable to infer that there is a violation there. In this instance, Copilot is hallucinating because it was triggered by the developer’s editor action without any of the context coming in from the linter. As far as Copilot knows, I just asked it to fix perfectly good code.

To improve this, we can use the output of a linter to “complain” about a violation. The editor allows us to surface a quick action to fix the code. Figure 6 shows the same “fix using Copilot” from the “problems” window, triggered by another violation:

Screenshot showing "Fix using Copilot” in the problems window.
Figure 6: “Fix using Copilot” is shown in the problems window.

This is shown in VSCode’s “problems” window, which helps developers locate problems in the codebase. An assistant can use the editor to put the ESLint tool in a more effective relationship with the developer (Figure 7).

Screenshot showing an immediate resolution, rather than a hallucination from the assistant.
Figure 7: A more complete fix.

Most importantly, we get an immediate resolution rather than a hallucination. We’re also hosting these tools in Docker, so these improvements do not require installs of Node.js, NPM, or ESLint.

Summary

We continue to investigate the use of tools for gathering context and improving suggestions. In this article, we have looked at how AI assistants can provide significant value to developers by:

  • Cutting out busy work setting up Node/NPM/ESLint.
  • Leveraging expert knowledge about ESLint to “level up” developers
  • Generating and surfacing actionable fixes directly to developers where they’re already working (in the editor)
  • Generating simple workflows as outcomes from natural language prompts and tools

As always, feel free to follow along in our new public repo and please reach out. Everything we’ve discussed in this blog post is available for you to try out on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

How to Create Dockerfiles with GenAI 

Par : Docker Labs
29 juillet 2024 à 14:00

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

As we learn how to work more effectively with generative AI, one of the tasks we return to is Dockerfile generation. Because we can equip our AI assistants with tools, does this allow us to reframe the problem of content generation as an interaction between LLMs and tools?

2400x1260 docker labs genai

If you ask ChatGPT to write a Dockerfile for your project, it will initiate a conversation with you and ask you to extract some details from your project (Figure 1).

Screenshot of conversation with ChatGPT starting with the question: Can you write a Dockerfile for this project?
Figure 1: Asking ChatGPT for help writing a Dockerfile.

However, we can also equip our assistant with a tool to extract this information from the project directly. Instead of asking the user, an assistant can ask for a tool instead.

Using tools

Here’s an alternative framing. Let’s prompt the assistant with two high-level instructions whenever it is asked to create a Dockerfile.

* First, analyze the project to determine how it should be built. 
* Once the analysis is complete, create a Dockerfile to build that project.

In addition to these new prompts, we will also supply the LLM with two function definitions.

[{"name": "analyze_project",
 "description": "Analyze a project to determine how it should be built"},
 {"name": "write_files",
  "description": "write a set of files to my project",
  "parameters": {...}}]

With these two functions, the assistant will have enough agency that it can write a Dockerfile directly to a user’s project. Unfortunately, without best practices, the actual Dockerfile content is never very good. 

Here’s an example of the kind of Dockerfile that gets generated when the assistant knows the kind of project but not much else.

# Use the official Node.js image from the Docker Hub
FROM node:14

# Create and change to the app directory
WORKDIR /usr/src/app

# Copy the package.json and package-lock.json files
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port your app runs on
EXPOSE 3000

# Define the command to run your app
CMD ["npm", "start"]

This example shows several standard Dockerfile authoring problems.

  • Uses out of date base images.
  • Does not employ multi-stage builds.
  • Skips newer Docker features such as cache mounts.

So, what can we do about this? Let’s extract some best practices for authoring NPM project Dockerfiles.

Write Dockerfiles for NPM projects using three stages.  Do these three steps sequentially.
* the first node dependencies stage should be called "deps" 
  and it should fetch the runtime dependencies using npm ci
  with the --omit=dev flag.
* The second Dockerfile stage should be called build  and it should be based on the deps stage. 
  It should run npm ci and then npm build
* The third Dockerfile stage should select a base image 
  that is recommended by Scout and use that in the FROM line.
  This stage should do then do three things.
	1. it copies the node_modules directory from the deps stage.
    2. it copies the dist directory from the build stage.
    3. it then runs npm start

If you need to use a RUN statement containing npm ci always 
add the argument --mount=type=cache,target=/root/.npm to the RUN instruction.  
The --mount argument should be placed between the word RUN and the npm command.
This will cache the npm packages in the docker build cache and speed up the build process.
After selecting a base image, get a recommended tag. This recommendation will change over time but start with what is recommended now.

We’ll include this content in any conversation occurring in the context of an NPM project. We’ll also add a third function.

[{"name": "docker_scout_tag_recommendation",
  "description": "get a recommended tag",
  "parameters": {...}}]

At the time of writing, here’s the output from our assistant. In this example, the assistant is using the GPT-4 model, but we are seeing the same results with other local models.

# Stage 1 - Downloading dependencies
FROM node:22-slim AS deps
WORKDIR /usr/src/app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev

# Stage 2 - Building application
FROM deps AS build
RUN --mount=type=cache,target=/root/.npm npm ci && npm build

# Stage 3 - Using a recommended base image from Scout
FROM node:22-slim
WORKDIR /usr/src/app
COPY --from=deps /usr/src/app/node_modules ./node_modules
COPY --from=build /usr/src/app/dist ./dist
CMD [ "npm", "start" ]

This is a big improvement. The three inputs that underlie this improvement are listed below.

  • The two top-level instructions that anchor the process our assistant should use:
* First, analyze the project to determine how it should be built. 
* Once the analysis is complete, create a Dockerfile to build that project.
  • The Markdown content that summarizes our Docker specific best practices.
  • The functions (analyze_project, write_files, and docker_scout_tag_recommendation).

Distilling best practices is certainly not easy, but a little can go a long way. Still, it is still hard to know whether this can be considered an improvement over just using code templates. Our continuing investigation centers around whether generative AI will help us to more easily adapt these models to new knowledge, new features, and new project types.

Conversation loop

It’s instructive to go through the actual messages passed between agents here. It’s particularly interesting to look at the interplay between the Assistant and the Tool agents. Near the end of the exchange, there is an unexpected twist where the write_files tool complains that it was passed an invalid JSON file. In this case, the assistant actually did generate invalid JSON and the tool was right to complain. However, the assistant also responded to this error by fixing the JSON and calling the tool a second time! That was unscripted.

Another interesting example of an unscripted moment is the last message from the assistant. The assistant correctly determines that the task is now complete and generates a summary of the work that was done and why.

Transcript

System:

You are an assistant who specializes in authoring Dockerfiles for projects. Since you are an expert and know about their project, be definitive about recommendations.

User:

  • First, analyze the project to determine how it should be built.
  • Once the analysis is complete, create a Dockerfile to build that project.

Assistant:

[{:function {:arguments "{}" :name "analyze_project"}}]

Tool (analyze_project):

This is an NPM project.

System:

Write Dockerfiles for NPM projects using three stages. Do these three steps sequentially …

… (system prompts excluded for brevity) ..

Assistant:

[{:function {:arguments "{
  \"repository\": \"node\"
}"
             :name "docker_scout_tag_recommendation"}}]

Tool (docker_scout_tag_recommendation):

22-slim

Assistant:

[{:function {:arguments "{
  \"files\": [
    {
      \"path\": \"Dockerfile\",
      \"content\": `
# Stage 1 - the build process
FROM node:22-slim AS deps
WORKDIR /usr/src/app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev
# Stage 2 - the production environment
FROM deps AS build
RUN npm ci && npm run build
# Stage 3 - the production environment
FROM node:22-slim
WORKDIR /usr/src/app
COPY --from=deps /usr/src/app/node_modules ./node_modules
COPY --from=build /usr/src/app/dist ./dist
CMD [ \"npm\", \"start\" ]`,
      \"executable\": false
    }
  ]
}"
             :name "write_files"}
  :id "call_6VVCLiXcIBNyEGPQwftR7xEV"}]

Tool:

function call write_files failed call exited with non-zero code (1):  call exited with non-zero code (1): #error {\r\n :cause Unexpected character (‘`’)

Assistant:

I apologize for the error. Let me try that again.

[{:function {:arguments "{
  \"files\": [
    {
      \"path\": \"Dockerfile\",
      \"content\": \"FROM node:22-slim AS deps\\nWORKDIR /usr/src/app\\nCOPY package*.json ./\\nRUN --mount=type=cache,target=/root/.npm npm ci --omit=dev\\n\\nFROM deps AS build\\nRUN npm ci && npm run build\\n\\nFROM node:22-slim\\nWORKDIR /usr/src/app\\nCOPY --from=deps /usr/src/app/node_modules ./node_modules\\nCOPY --from=build /usr/src/app/dist ./dist\\nCMD [ \\\"npm\\\", \\\"start\\\" ]\",
      \"executable\": false
    }
  ]
}"
             :name "write_files"}}]

Tool (write_files):

wrote 1 files: Dockerfile

Assistant:

I have written the Dockerfile for your NPM project. You can find it in your project folder. This Dockerfile uses a multi-stage build process to efficiently build and run your application within a Docker container. The first stage fetches the runtime dependencies, the second stage builds the application, and the final stage runs the application. Please let me know if you need further assistance.

Summary

For a generative AI assistant to improve its ability to generate Dockerfile content, we have identified both tools and stack-specific best practice content as key enablers. Incorporating these elements into a generative AI based assistant can serve as an alternative to code templates. We continue to evaluate whether this approach allows us to more easily adapt to new project types.

As always, feel free to follow along in our public repo and reach out. Everything we’ve discussed in this blog post is available for you to try on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

How an AI Assistant Can Help Configure Your Project’s Git Hooks

Par : Docker Labs
15 juillet 2024 à 13:00

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Can an AI assistant help configure your project’s Git hooks? 

Git hooks can save manual work during repetitive tasks, such as authoring commit messages or running checks before committing an update. But they can also be hard to configure, and are project dependent. Can generative AI make Git hooks easier to configure and use?

2400x1260 docker labs genai

Simple prompts

From a high level, the basic prompt is simple:

How do I set up git hooks?

Although this includes no details about the actual project, the response from many foundation models is still useful. If you run this prompt in ChatGPT, you’ll see that the response contains details about how to use the .git/hooks folder, hints about authoring hook scripts, and even practical next steps for what you’ll need to learn about next. However, the advice is general. It has not been grounded by your project.

Project context

Your project itself is an important source of information for an assistant. Let’s start by providing information about types of source code in a project. Fortunately, there are plenty of existing tools for extracting project context, and these tools are often already available in Docker containers. 

For example, here’s an image that will analyze any Git repository and return a list of languages being used. Let’s update our prompt with this new context.

How do I set up git hooks?

{{# linguist }}
This project contains code from the language {{ language }} so if you have any 
recommendations pertaining to {{ language }}, please include them.
{{/linguist}}

In this example, we use moustache templates to bind the output of our “linguist” analysis into this prompt.

The response from an LLM-powered assistant will change dramatically. Armed with specific advice about what kinds of files might be changed, the LLM will generate sample scripts and make suggestions about specific tools that might be useful for the kinds of code developed in this project. It might even be possible to cut and paste code out of the response to try setting up hooks yourself. 

The pattern is quite simple. We already have tools to analyze projects, so let’s plug these in locally and give the LLM more context to make better suggestions (Figure 1).

Flow chart illustration showing addition of tools to provide project context for the LLM. The process includes Project, Tool (go-linguist), Docker Desktop, Prompt templates and LLM.
Figure 1: Adding tools to provide context for LLM.

Expertise

Generative AI also offers new opportunities for experts to contribute knowledge that AI assistants can leverage to become even more useful. For example, we have learned that pre-commit can be helpful to organize the set of tools used to implement Git hooks. 

To represent this learning, we add this prompt:

When configuring git hooks, our organization uses a tool called
[pre-commit](https://github.com/pre-commit/pre-commit).

There’s also a base configuration that we have found useful in all projects. We also add that to our assistant’s knowledge base.

If a user wants to configure git hooks, use this template which will need to be written to pre-commit-config.yaml 
in the root of the user's project.

Start with the following code block:

```yaml
repos:
    - repo: http://github.com/pre-commit/pre-commit-hooks
      rev: v2.3.0
      hooks:
          - id: check-yaml
          - id: trailing-whitespace
          - id: check-merge-conflict
    - repo https://github.com/jorisroovers/gitlint
      rev: main
      hooks:
          - id: gitlint
    - repo: local
      hooks:
```

Finally, as we learn about new tools that are useful for certain projects, we describe this information. For example, as an expert, I might want to suggest that teams using Golang include a particular linting tool in the Git hooks configuration.

If we detect `Go` in the project, add the following hook to the hooks entry in the `local` repo entry.

```yaml
id: golangcli-lint
name: golang cli
entry: golangci/golangci-lint
files "\\.go$"
```

With these additions, the response from our assistant becomes precise. We have found that our assistant can now write hooks scripts and write complete YAML configuration files that are project-specific and ready to copy directly into a project. 

Somewhat surprisingly, the assistant can also now recommend tools not mentioned explicitly in our prompts but that use the same syntax established for other tools. Using these examples, the LLM appears to be capable of extending the assistant’s capabilities to other tools. Using our examples as guidance, the LLM suggests new tools but still configures them using our suggested framework and syntax.

Most importantly, the response from the assistant is now not only actionable to the developer, saving them time, but it is also specific enough that we could pass the response to a simple agent to take the action automatically.

Adding tools

For this example, the only tool we really need is a file-writer. The change to our prompt is to add one instruction to go ahead and write the configuration into the project.

Write the final yaml content to our project at the path pre-commit-config.yaml.  Write both the `pre-commit` and `commit-message` scripts to `git/hooks` and make them executable.

Besides the prompt, there is another crucial step that we are skipping at this point. The assistant must be told that it is capable of writing content into files. However, this is really just a registration step. 

The important thing is that we can give our agent the tools it needs to perform tasks. In doing so, the response from the LLM undergoes a transition. Instead of text output, the LLM responds with instructions for our agent. If we’re using an OpenAI function call, we’ll see a request that looks something like the following .json file. It’s not meant to be read by us, of course. It’s an instruction to the agent that knows how to update your project for you.

{
  "id": "call_4LCo0CQqCHCGGZea3qlaTg5h",
  "type": "function",
  "function": {
    "name": "write_file",
    "arguments": "{\n  \"path\": \"pre-commit-config.yaml\",\n  \"content\": \"repos:\\n    - repo: http://github.com/pre-commi
t/pre-commit-hooks\\n      rev: v2.3.0\\n      hooks:\\n          - id: check-yaml\\n          - id: trailing-whitespace\\n          - id
: check-merge-conflict\\n    - repo https://github.com/jorisroovers/gitlint\\n      rev: main\\n      hooks:\\n          - id: gitlint\\n
    - repo: local\\n      hooks:\\n          - id: markdownlint\\n            name: markdown linter\\n            entry: markdownlint/mar
kdownlint\\n            files: \\\"\\\\.md$\\\"\\n          - id: python-black\\n            name: python black formatter\\n            e
ntry: black\\n            files: \\\"\\\\.py$\\\"\"\n}"
    }
}

A more sophisticated version of the file-writer function might communicate with an editor agent capable of presenting recommended file changes to a developer using native IDE concepts, like editor quick-fixes and hints. In other words, tools can help generative AI to meet developers where they are. And the answer to the question:

How do I set up git hooks?

becomes, “Let me just show you.”

Docker as tool engine

The tools mentioned in the previous sections have all been delivered as Docker containers.  One goal of this work has been to verify that an assistant can bootstrap itself starting from a Docker-only environment. Docker is important here because it has been critical in smoothing over many of the system/environment gaps that LLMs struggle with. 

We have observed that a significant barrier to activating even simple local assistants is the complexity of managing a safe and reliable environment for running these tools. Therefore, we are constraining ourselves to use only tools that can be lazily pulled from public registries.

For AI assistants to transform how we consume tools, we believe that both tool distribution and knowledge distribution are key factors. In the above example, we can see how LLM responses can be transformed by tools from unactionable and vague to hyper-project-focused and actionable. The difference is tools.

To follow along with this effort, check out the GitHub repository for this project.

Learn more

Using Generative AI to Create Runnable Markdown

Par : Docker Labs
1 juillet 2024 à 13:00

This ongoing GenAI Docker Labs series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Generative AI (GenAI) is changing how we interact with tools. Today, we might experience this predominantly through the use of new AI-powered chat assistants, but there are other opportunities for generative AI to improve the life of a developer.

When developers start working on a new project, they need to get up to speed on the tools used in that project. A common practice is to document these practices in a project README.md and to version that documentation along with the project. 

Can we use generative AI to generate this content? We want this content to represent best practices for how tools should be used in general but, more importantly, how tools should be used in this particular project.

We can think of this as a kind of conversation between developers, agents representing tools used by a project, and the project itself. Let’s look at this for the Docker tool itself.

2400x1260 docker labs genai

Generating Markdown in VSCode

For this project, we have written a VSCode extension that adds one new command called “Generate a runbook for this project.” Figure 1 shows it in action:

Animated gif showing VSCode extension to generate a runbook for this project.
Figure 1: VSCode extension to generate a runbook.

This approach combines prompts written by tool experts with knowledge about the project itself. This combined context improves the LLM’s ability to generate documentation (Figure 2).

 Illustration showing process flow from expert prompts plus project facts to LLM.
Figure 2: This approach combines expert prompts with knowledge about the project itself.

Although we’re illustrating this idea on a tool that we know very well (Docker!), the idea of generating content in this manner is quite generic. The prompts we used for getting started with the Docker build, run, and compose are available from GitHub. There is certainly an art to writing these prompts, but we think that tool experts have the right knowledge to create prompts of this kind, especially if AI assistants can then help them make their work easier to consume.

There is also an essential point here. If we think of the project as a database from which we can retrieve context, then we’re effectively giving an LLM the ability to retrieve facts about the project. This allows our prompts to depend on local context. For a Docker-specific example, we might want to prompt the AI to not talk about compose if the project has no compose.yaml files. 

“I am not using Docker Compose in this project.”

That turns out to be a transformative user prompt if it’s true. This is what we’d normally learn through a conversation. However, there are certain project details that are always useful. This is why having our assistants right there in the local project can be so helpful.

Runnable Markdown

Although Markdown files are mainly for reading, they often contain runnable things. LLMs converse with us in text that often contains code blocks that represent actual runnable commands. And, in VSCode, developers use the embedded terminal to run commands against the currently open project. Let’s short-circuit this interaction and make commands runnable directly from these Markdown runbooks.

In the current extension, we’ve added a code action to every code block that contains a shell command so that users can launch that command in the embedded terminal. During our exploration of this functionality, we have found that treating the Markdown file as a kind of REPL (read-eval-print-loop) can help to refine the output from the LLM and improve the final content. Figure 3 what this looks like in action:

 Animated gif showing addition of code action that contains a shell command so users can launch that command in the embedded terminal.
Figure 3: Adding code to allow users to launch the command in the embedded terminal.

Markdown extends your editor

In the long run, nobody is going to navigate to a Markdown file in order to run a command. However, we can treat these Markdown files as scripts that create commands for the developer’s edit session. We can even let developers bind them to keystrokes (e.g., type ,b to run the build code block from your project runbook).

In the end, this is just the AI Assistant talking to itself. The Assistant recommends a command. We find the command useful. We turn it into a shortcut. The Assistant remembers this shortcut because it’s in our runbook, and then makes it available whenever we’re developing this project.

Animated gif showing the AI Assistant generating context-aware content.
Figure 4: The Assistant in action.

Figure 4 shows a real feedback loop between the Assistant, the generated content, and the developer that is actually running these commands. 

As developers, we tend to vote with our keyboards. If this command is useful, let’s make it really easy to run! And if it’s useful for me, it might be useful for other members of my team, too.

The GitHub repository and install instructions are ready for you to try today.

For more, see this demo: VSCode Walkthrough of Runnable Markdown from GenAI.

Subscribe to Docker Navigator to stay current on the latest Docker news.

Learn more

❌
❌