
Simplify AI Development with the Model Context Protocol and Docker

By: Docker Labs
January 15, 2025 at 13:07

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

In December, we published The Model Context Protocol: Simplifying Building AI apps with Anthropic Claude Desktop and Docker. Along with the blog post, we also created Docker versions for each of the reference servers from Anthropic and published them to a new Docker Hub mcp namespace.

This provides lots of ways for you to experiment with new AI capabilities using nothing but Docker Desktop.

For example, to extend Claude Desktop to use Puppeteer, update your claude_desktop_config.json file with the following snippet:

"puppeteer": {
    "command": "docker",
    "args": ["run", "-i", "--rm", "--init", "-e", "DOCKER_CONTAINER=true",          "mcp/puppeteer"]
  }

After restarting Claude Desktop, you can ask Claude to take a screenshot of any URL using a Headless Chromium browser running in Docker.

You can do the same thing for a Model Context Protocol (MCP) server that you’ve written. You will then be able to distribute this server to your users without requiring them to have anything besides Docker Desktop.

How to create an MCP server Docker Image

An MCP server can be written in any language. However, most of the examples, including the set of reference servers from Anthropic, are written in either Python or TypeScript and use one of the official SDKs documented on the MCP site.

For typical uv-based Python projects (projects with a pyproject.toml and uv.lock in the root), or npm TypeScript projects, it’s simple to distribute your server as a Docker image.

  1. If you don’t already have Docker Desktop, sign up for a free Docker Personal subscription so that you can push your images to Docker Hub for others to use.
  2. Run docker login from your terminal.
  3. Copy either this npm Dockerfile or this Python Dockerfile template into the root of your project. The Python Dockerfile will need at least one update to the last line.
  4. Run the build with the Docker CLI (instructions below).

The two Dockerfiles shown above are just templates. If your MCP server includes other runtime dependencies, you can update the Dockerfiles to include these additions. The runtime of your MCP server should be self-contained for easy distribution.

If you don’t have an MCP server ready to distribute, you can use a simple mcp-hello-world project to practice. It’s a simple Python codebase containing a server with one tool call. Get started by forking the repo, cloning it to your machine, and then following the instructions below to build the MCP server image.
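If you want a feel for how little code such a server needs, here is a minimal sketch of a stdio MCP server with a single tool, written with the FastMCP helper from the official Python SDK (the mcp package). This is only an illustration; the actual mcp-hello-world repo may be organized differently.

# server.py — a minimal, illustrative MCP server with one tool (sketch only)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("hello-world")

@mcp.tool()
def hello(name: str) -> str:
    """Return a friendly greeting for the given name."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    # Serve over stdio, which is how Claude Desktop talks to local MCP servers.
    mcp.run()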

Building the image

Most sample MCP servers are still designed to run locally (on the same machine as the MCP client, communicating over stdio). Over the next few months, you’ll begin to see more clients supporting remote MCP servers, but for now, you need to plan for your server running on at least two different architectures (amd64 and arm64). This means that you should always distribute what we call multi-platform images when your target is local MCP servers. Fortunately, this is easy to do.

Create a multi-platform builder

The first step is to create a local builder that will be able to build both platforms. Don’t worry; this builder will use emulation to build the platforms that you don’t have. See the multi-platform documentation for more details.

docker buildx create \
  --name mcp-builder \
  --driver docker-container \
  --bootstrap

Build and push the image

In the command line below, substitute <your-docker-account> and mcp-server-name with valid values, then run the build and push the image to your account.

docker buildx build \
  --builder=mcp-builder \
  --platform linux/amd64,linux/arm64 \
  -t <your-docker-account>/mcp-server-name \
  --push .

Extending Claude Desktop

Once the image is pushed, your users will be able to attach your MCP server to Claude Desktop by adding an entry to claude_desktop_config.json that looks something like:

"your-server-name": {
    "command": "docker",
    "args": ["run", "-i", "--rm", "--pull=always",
             "your-account/your-server-name"]
  }

This is a minimal set of arguments. You may want to pass in additional command-line arguments, environment variables, or volume mounts.

Next steps

The MCP protocol gives us a standard way to extend AI applications. Make sure your extension is easy to distribute by packaging it as a Docker image. Check out the Docker Hub mcp namespace for examples that you can try out in Claude Desktop today.

As always, feel free to follow along in our public repo.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Video demo: Using the Puppeteer MCP server to take a screenshot of a website and invert the colors using Claude Desktop and Docker Desktop.

Meet Gordon: An AI Agent for Docker

By: Docker Labs
January 13, 2025 at 14:20

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

In previous articles, we focused on how AI-based tools can help developers streamline tasks and offered ideas for enabling agentic workflows, like reviewing branches and understanding code changes.

In this article, we’ll explore our experiments around the idea of creating a Docker AI Agent — something that could both help new users learn about our tools and products and help power users get things done faster.

During our explorations around this Docker Agent and AI-based tools, we noticed that the main pain points we encountered were often the same:

  • LLMs need good context to provide good answers (garbage in -> garbage out).
  • Using AI tools often requires context switching (moving to another app, to a different website, etc.).
  • We’d like agents to be able to suggest and perform actions on behalf of the users.
  • Direct product integrations with AI are often more satisfying to use than chat interfaces.

At first, we tried to see what’s possible using off-the-shelf services like ChatGPT or Claude. 

By using testing prompts such as “optimize the following Dockerfile, following all best practices” and providing the model with a sub-par but common Dockerfile, we could sometimes get decent answers. Often, though, the resulting Dockerfile had subtle bugs, hallucinations, or simply wasn’t optimized or didn’t use many of the best practices we would’ve hoped for. Thus, this approach was not reliable enough.

Data ended up being the main issue. Training data for LLMs is always outdated by some amount of time, and the bad Dockerfiles you can find online vastly outnumber the up-to-date Dockerfiles that follow all the best practices.

After doing proof-of-concept tests using a RAG approach, including some documents with lots of useful advice for creating good Dockerfiles, we realized that the AI Agent idea was definitely possible. However, setting up all the things required for a good RAG would’ve taken too much bandwidth from our small team.

Because of this, we opted to use kapa.ai for that specific part of our agent. Docker already uses them to provide the AI docs assistant on Docker docs, so most of our high-quality documentation is already available for us to reference as part of our LLM usage through their service. Using kapa.ai allowed us to experiment more, get high-quality results faster, and try different ideas around the AI agent concept.

Enter Gordon

Out of this experimentation came a new product that you can try: Gordon. With Gordon, we’d like to tackle these pain points. By integrating Gordon into Docker Desktop and the Docker CLI (Figure 1), we can:

  • Access much more context that can be used by the LLMs to best understand the user’s questions and provide better answers or even perform actions on the user’s behalf.
  • Be where the users are. If you launch a container via Docker Desktop and it fails, you can quickly debug with Gordon. If you’re in the terminal hacking away, Docker AI will be there, too.
  • Avoid being a purely chat-based agent by providing Gordon-based features directly as part of Docker Desktop UI elements. If Gordon detects certain scenarios, like a container that failed to start, a button appears in the UI to get suggestions or run actions directly (Figure 2).
Screenshot of Docker Desktop showing the Gordon icon next to a container name in the list of containers.
Figure 1: Gordon icon on Docker Desktop.
Screenshot of Docker Desktop showing the Ask Gordon tab next to Logs, Inspect, Files, Stats and other options.
Figure 2: Ask Gordon (beta).

What Gordon can do

We want to start with Gordon by optimizing for Docker-related tasks — not general-purpose questions — but we are not excluding expanding the scope to more development-related tasks as work on the agent continues.

Work on Gordon is at an early stage and its capabilities are constantly evolving, but it’s already really good at some things (Figure 3). Here are things to definitely try out:

  • Ask general Docker-related questions. Gordon knows Docker well and has access to all of our documentation.
  • Get help debugging container build or runtime errors.
  • Remediate policy deviations from Docker Scout.
  • Get help optimizing Docker-related files and configurations.
  • Ask it how to run specific containers (e.g., “How can I run MongoDB?”).
Screenshot of results after asking Docker AI to explain a Dockerfile.
Figure 3: Using Gordon to understand a Dockerfile.

How Gordon works

The Gordon backend lives on Docker servers, while the client is a CLI that lives on the user’s machine and is bundled with Docker Desktop. Docker Desktop uses the CLI to access the local machine’s files, asking the user for the directory each time it needs that context to answer a question. When using the CLI directly, it has access to the working directory it’s executed in. For example, if you are in a directory with a Dockerfile and you run “Docker AI, rate my Dockerfile”, it will find the one that’s present in that directory.

Currently, Gordon does not have write access to any files, so it will not edit any of your files. We’re hard at work on future features that will allow the agent to do the work for you, instead of only suggesting solutions. 

Figure 4 shows a rough overview of how we are thinking about things behind the scenes.

Illustration showing an overview of how Gordon works, with flow steps starting with "Understand user's input" and going to "Gather context" to "prepare final prompts" then "check results", "reply to user", and more.
Figure 4: Overview of Gordon.

The first step of this pipeline, “Understand the user’s input and figure out which action to perform”, is done using “tool calling” (also known as “function calling”) with the OpenAI API.

Although this is a popular approach, we noticed that the documentation online isn’t very good, and general best practices aren’t well defined yet. This led us to experiment a lot with the feature and try to figure out what works for us and what doesn’t.

Things we noticed:

  • Tool descriptions are important, and we should prefer more in-depth descriptions with examples.
  • Testing around tool-detection code is also important. Adding new tools to a request could confuse the LLM and cause it to no longer trigger the expected tool.
  • The LLM model used influences how the whole tool calling functionality should be implemented, as different models might prefer descriptions written in a certain way, behave better/worse under certain scenarios (e.g. when using lots of tools), etc.
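To make the observations above concrete, here is a minimal sketch of tool calling with the OpenAI Python SDK. The rate_dockerfile tool is purely illustrative (it is not Gordon’s actual tool set), but it shows where the tool description that matters so much actually lives.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single tool definition; the description tells the model when to call it.
tools = [
    {
        "type": "function",
        "function": {
            "name": "rate_dockerfile",  # illustrative tool, not Gordon's real one
            "description": (
                "Analyze a Dockerfile and report best-practice violations. "
                "Use this when the user asks to rate, review, or optimize a Dockerfile."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Relative path to the Dockerfile in the working directory",
                    }
                },
                "required": ["path"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Docker AI, rate my Dockerfile"}],
    tools=tools,
)

# If the model decided a tool is needed, the call (name plus JSON arguments) is
# returned instead of a plain answer; the application then runs the tool and reports back.
for tool_call in response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, tool_call.function.arguments)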

Try Gordon for yourself

Gordon is available as an opt-in Beta feature starting with Docker Desktop version 4.37. To participate in the closed beta, all you need to do is fill out the form on the site.

Initially, Gordon will be available for use both in Docker Desktop and the Docker CLI, but our idea is to surface parts of this tech in various other parts of our products as well.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Unlocking Efficiency with Docker for AI and Cloud-Native Development

By: Yiwen Xu
January 8, 2025 at 14:22

The need for secure and high-quality software becomes more critical every day as the impact of vulnerabilities increases and related costs continue to rise. For example, flawed software cost the U.S. economy $2.08 trillion in 2020 alone, according to the Consortium for Information and Software Quality (CISQ). And a software defect that might cost $100 to fix if found early in the development process can grow exponentially to $10,000 if discovered later in production.

Docker helps you deliver secure, efficient applications by providing consistent environments and fast, reliable container management, building on best practices that let you discover and resolve issues earlier in the software development life cycle (SDLC).

Shifting left to ensure fewer defects

In a previous blog post, we talked about using the right tools, including Docker’s suite of products to boost developer productivity. Besides having the right tools, you also need to implement the right processes to optimize your software development and improve team productivity. 

The software development process is typically broken into two distinct loops, the inner and the outer loops. At Docker, we believe that investing in the inner loop is crucial. This means shifting security left and identifying problems as soon as you can. This approach improves efficiency and reduces costs by helping teams find and fix software issues earlier.

Using Docker tools to adopt best practices

Docker’s products help you adopt these best practices — we are focused on enhancing the software development lifecycle, especially around refining the inner loop. Products like Docker Desktop allow your dev team in the inner loop to run, test, code, and build everything fast and consistently. This consistency eliminates the “it works on my machine” issue, meaning applications behave the same in both development and production.  

Shifting left lets your dev team identify problems earlier in your software project lifecycle. When you detect issues sooner, you increase efficiency and help ensure secure builds and compliance. By shifting security left with Docker Scout, your dev teams can identify vulnerabilities sooner and help avoid issues down the road. 

Another example of shifting left involves testing — doing testing earlier in the process leads to more robust software and faster release cycles. This is when Testcontainers Cloud comes in handy because it enables developers to run reliable integration tests, with real dependencies defined in code. 

Accelerate development within the hybrid inner loop

We see more and more companies adopting the so-called hybrid inner loop, which combines the best of two worlds — local and cloud. The results provide greater flexibility for your dev teams and encourage better collaboration. For example, Docker Build Cloud uses the power of the cloud to speed up build time without sacrificing the local development experience that developers love. 

By using these Docker products across the software development life cycle, teams get quick feedback loops and faster issue resolution, ensuring a smooth development flow from inception to deployment. 

Simplifying AI application development

When you’re using the right tools and processes to accelerate your application delivery and maximize efficiency throughout your SDLC, processes that were once cumbersome become your new baseline, freeing up time for true innovation. 

Docker also helps accelerate innovation by simplifying AI/ML development. We are continually investing in AI to help your developers deliver AI-backed applications that differentiate your business and enhance competitiveness.

Docker AI tools

Docker’s GenAI Stack accelerates the incorporation of large language models (LLMs) and AI/ML into your code, enabling the delivery of AI-backed applications. All containers work harmoniously and are managed directly from Docker Desktop, allowing your team to monitor and adjust components without leaving their development environment. Deploying the GenAI Stack is quick and easy, and leveraging Docker’s containerization technology helps speed setup and simplify scaling as applications grow.

Earlier this year, we announced the preview of Docker Extension for GitHub Copilot. By standardizing best practices and enabling integrations with tools like GitHub Copilot, Docker empowers developers to focus on innovation, closing the gap from the first line of code to production.

And, more recently, we launched the Docker AI Catalog in Docker Hub. This new feature simplifies the process of integrating AI into applications by providing trusted and ready-to-use content supported by comprehensive documentation. Your dev team will benefit from shorter development cycles, improved productivity, and a more streamlined path to integrating AI into both new and existing applications.

Wrapping up

Docker products help you establish sound processes and practices related to shifting left and discovering issues earlier to avoid headaches down the road. This approach ultimately unlocks developer productivity, giving your dev team more time to code and innovate. Docker also allows you to quickly use AI to close knowledge gaps and offers trusted tools to build AI/ML applications and accelerate time to market. 

To see how Docker continues to empower developers with the latest innovations and tools, check out our Docker 2024 Highlights.

Learn about Docker’s updated subscriptions and find the ideal plan for your team’s needs.

Learn more

How to Create and Use an AI Git Agent

By: Docker Labs
December 16, 2024 at 14:23

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

In our past experiments, we started our work from the assumption that we had a project ready to work on. That means someone like a UI tech writer would need to understand Git operations in order to use the tools we built for them. Naturally, because we have been touching on Git so frequently, we wanted to try getting a Git agent started. Then, we want to use this Git agent to understand PR branches for a variety of user personas — without anyone needing to know the ins and outs of Git.

Git as an agent

We are exploring the idea that tools are agents. So, what would a Git agent do? 

Let’s tackle our UI use case prompt. 

Previously:

You are at $PWD of /project, which is a git repo.
Force checkout {{branch}}
Run a three-dot diff of the files changed in {{branch}} compared to main using --name-only.

A drawback that isn’t shown here is that there is no authentication. So, if you haven’t fetched that branch or pulled commits already, this prompt at best will be unreliable and more than likely will fail (Figure 1):

Screenshot of Logs showing failure to authenticate.
Figure 1: No authentication occurs.

Now:

You are a helpful assistant that checks a PR for user-facing changes.
1. Fetch everything and get on latest main.
2. Checkout the PR branch and pull latest.
3. Run a three-dot git diff against main for just files. Write the output to /thread/diff.txt.

This time around, you can see that we are being less explicit about the Git operations, that we can export outputs to the conversation thread, and, most importantly, that we have authentication with a new prompt!

Preparing GitHub authentication

Note: These prompts should be easily adaptable to other Git providers, but we use GitHub at Docker.

Before we can do anything with GitHub, we have to authenticate. There are several ways to do this, but for this post we’ll focus on SSH-based auth rather than using HTTPS through the CLI. Without getting too deep into the Git world, we will be authenticating with keys on our machine that are associated with our account. These keys and configurations are commonly located at ~/.ssh on Linux/Mac. Furthermore, users commonly maintain Git config at ~/.gitconfig.

The .gitconfig file is particularly useful because it lets us specify carriage return rules — something that can easily cause Git to fail when running in a Linux container. We will also need to modify our SSH config to remove UseKeychain. We found these changes are enough to authenticate using SSH in Alpine/Git. But we, of course, don’t want to modify any host configuration.

We came up with a fairly simple flow that lets us prepare to use Git in a container without messing with any host SSH configs.

  1. Readonly mounts: Git config and SSH keys are stored on specific folders on the host machine. We need to mount those in.
    a. Mount ~/.ssh into a container as /root/.ssh-base readonly.
    b. Mount ~/.gitconfig into the same container as /root/.gitconfig.
  2. Copy /root/.ssh-base to /root/.ssh and make the new copy read-write.
  3. Make necessary changes to config.
  4. For the LLM, we also need it to verify the config is in the thread and the changes were made to it. In the event that it fails to make the right changes, the LLM can self-correct.
  5. Copy the .ssh directory and .gitconfig to /thread.

All of this is baked in a prompt you can find and run on GitHub. Simply replace <username> with your system username (Figure 2).

Screenshot of mounts, showing storage of Git config and SSH keys.
Figure 2: Readonly mounts.

If you’re using the default behavior in our extension for VSCode, then the /thread/ volume is ephemeral, so you would need to make sure your agent runs it before any Git tools. However, if you use the Set prompt thread ID command in VSCode, you can keep the /thread volume around under your own tag (Figure 3). 

Tagging the thread persists everything after the prompt completes, so you would only need to run the prompt once before going on to develop your Git tools.

Screenshot of setting the prompt thread ID.
Figure 3: Set the prompt thread ID.

If you opt to persist the thread, then this prompt would only need to be run once and the Git credentials will stick around (Figure 4).

Screenshot showing the stored configuration info.
Figure 4: Stored credential data.

Now that we have our credentials ready, we need our Git tool to pick up on them.

container:
  image: alpine/git
  entrypoint:
    - /bin/sh
    - -c
  command:
    - "cp -r /thread/.ssh /root/.ssh && cp -r /thread/.gitconfig /root/.gitconfig && git --no-pager {{args|safe}}"

Our container definition copies the SSH credentials and .gitconfig files to the correct directory before running Git commands.

Unpacking objects:  90% (9/10)
Unpacking objects: 100% (10/10)
Unpacking objects: 100% (10/10), 4.14 KiB | 141.00 KiB/s, done.
From github.com:docker/repo
   abcd123..abcd123  main           -> origin/main
 * [new tag]           branch/v2982.0.0 -> branch/v2982.0.0
# ROLE assistant
I have fetched all origin branches. The latest updates have been applied to your local repository, and any new tags have been downloaded.

Now, we can see our Git tool is authenticated as if it were us.

Running the agent now, it can:

## ROLE assistant
I have run the git fetch --all command, which retrieves updates from all remotes. If you need further assistance with your Git repository, feel free to ask!

# ROLE assistant
```json
{"request":"checkout main"}
```

## ROLE assistant
```json
{"args":"checkout main"}
```

## ROLE assistant
You are already on the 'main' branch, and it is up to date with 'origin/main'.

# ROLE assistant
```json
{"request":"checkout the branch branch and pull latest changes."}
```

...
## ROLE assistant
The branch branch has been successfully checked out, and it's already up to date with the latest changes.
...

# ROLE assistant
```json
{"request":"Run a three-dot git diff against main and write the output for just files to /thread/diff.txt."}
```

...

## ROLE assistant
```json
{"args":"diff main... --name-only","outfile":"/thread/diff.txt"}
```

*entering tool*

## ROLE tool (git)
file1.tsx
file2.tsx
...
## ROLE assistant

The three-dot Git diff against the main branch has been executed, and the output has been saved to /thread/diff.txt. This includes the names of files that have differences.

With the new Git agent, we have enabled our assistant to fetch new information from upstream, switch branches, and pull in new code to review. If we provide our agent with credentials that are limited to read-only access, this agent could be used by someone who does not have a local Git installation.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Extending the Interaction Between AI Agents and Editors

By: Docker Labs
November 18, 2024 at 18:28

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

We recently met up with developers at GitHub Universe at the Docker booth. In addition to demonstrating the upcoming Docker agent extension for GitHub Copilot, we also hosted a “Hack with Docker Labs” session. 

To facilitate these sessions, we created a VSCode extension to explore the relationship between agents and tools. We encouraged attendees to think about how agents can change how we interact with tools by mixing tool definitions (anything you can package in a Docker container) with prompts using a simple Markdown-based canvas.

Many of these sessions followed a simple pattern.

  • Choose a tool and describe what you want it to do.
  • Let the agent interact with that tool.
  • Ask the agent to explain what it did or adjust the strategy and try again.

It was great to facilitate these discussions and learn more about how agents are challenging us to interact with tools in new ways.  

Figure 1 shows a short example of a session where we generated a QR code (qrencode). We start by defining both a tool and a prompt in the Markdown file. Then, we pass control over to the agent and let it interact with that tool (the output from the agent pops up on the right-hand side).

Animated gif showing generation of QR code using tool and prompt in Markdown file.
Figure 1: Generating a QR code.

Feel free to create an issue in our repo if you want to learn more.

Editors

This year’s trip to GitHub Universe also felt like an opportunity to reflect on how developer workflows are changing with the introduction of coding assistants. Developers may have had language services in the editor for a long time now, but coding assistants that can predict the next most likely tokens have taught us something new. We were all writing more or less the same programs (Figure 2).

Diagram showing process flow to and from Editor, Language Service, and LLM.
Figure 2: Language service interaction.

Other agents

Tools like Cursor, GitHub Copilot Chat, and others are also teaching us new ways in which coding assistants are expanding beyond simple predictions. In this series, we’ve been highlighting tools that typically work in the background. Agents armed with these tools will track other kinds of issues, such as build problems, outdated dependencies, fixable linting violations, and security remediations.

Extending the previous diagram, we can imagine an updated picture where agents send diagnostics and propose code actions, while still offering chat interfaces for other kinds of user input (Figure 3). If the ecosystem has felt closed, get ready for it to open up to new kinds of custom agents.

Diagram showing language service interaction on the left, with extension to Agent and Chat on the right.
Figure 3: Agent extension.

More to come

In the next few posts, we’ll take this series in a new direction and look at how agents are able to use LSPs to interact with developers in new ways. An agent that represents background tasks, such as updating dependencies, or fixing linting violations, can now start to use language services and editors as tools! We think this will be a great way for agents to start helping developers better understand the changes they’re making and open up these platforms to input from new kinds of tools.

GitHub Universe was a great opportunity to check in with developers, and we were excited to learn how many more tools developers wanted to bring to their workflows. As always, to follow along with this effort, check out the GitHub repository for this project.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Getting Started with the Labs AI Tools for Devs Docker Desktop Extension

By: Docker Labs
September 9, 2024 at 16:32

This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.

We’ve released a simple way to run AI tools in Docker Desktop. With the Labs AI Tools for Devs Docker Desktop extension, people who want a simple way to run prompts can easily get started. 

If you’re a prompt author, this approach also allows you to build, run, and share your prompts more easily. Here’s how you can get started.

Get the extension

You can download the extension from Docker Hub. Once it’s installed, enter an OpenAI key.

Import a project

With our approach, the information a prompt needs should be extractable from a project. Add projects here that you want to run SDLC tools inside (Figure 1).

Screenshot showing blue "Add project" button.
Figure 1: Add projects.

Inputting prompts

A prompt can be a git ref or a git URL, which will convert to a ref. You can also import your own local prompt files, which allows you to quickly iterate on building custom prompts.

Sample prompts

(copy + paste the ref)

| Tool | Git Ref | Link | Description |
|------|---------|------|-------------|
| Docker | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/docker | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/docker | Generates a runbook for any Docker project |
| Dockerfiles | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/dockerfiles | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/dockerfiles | Generate multi-stage Dockerfiles for NPM projects |
| Lazy Docker | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/lazy_docker | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/lazy_docker | Generates a runbook for Lazy Docker |
| NPM | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/npm | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/npm | Responds with helpful information about NPM projects |
| ESLint | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/eslint | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/eslint | Runs ESLint in your project |
| ESLint Fix | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/eslint_fix | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/eslint_fix | Runs ESLint in your project and responds with a fix for the first violation it finds |
| Pylint | github.com:docker/labs-ai-tools-for-devs?ref=main&path=prompts/pylint | https://github.com/docker/labs-ai-tools-for-devs/tree/main/prompts/pylint | Runs Pylint in your project and responds with a fix for the first violation it finds |
Screenshot showing blue "Add local prompt" button next to text box in which to enter GitHub ref or URL
Figure 2: Enter a GitHub ref or URL.

Writing and testing your own prompt

Create a prompt file

A prompt file is a markdown file. Here’s an example: prompt.md

# prompt system
You are an assistant who can write comedic monologs in the style of Stephen Colbert.

# prompt user
Tell me about my project.

Now, we need to add information about the project. This is done with mustache templates:

# prompt system
You are an assistant who can write comedic monologues in the style of Stephen Colbert.

# prompt user
Tell me about my project. 

My project uses the following languages:
{{project.languages}}

My project has the following files:
{{project.files}}

Leverage tools in your project

Just like extractors, which can be used to render prompts, we define tools in the form of Docker images. A function image follows the same spec as extractors but in reverse. 

  • The Docker image is automatically bind-mounted to the project.
  • The Docker image entry point is automatically run within the project using --workdir.
  • The first argument will be a JSON payload. This payload is generated when the LLM tries to call our function (a sketch of such an entrypoint follows the definition below).
- name: write_files
  description: Write a set of files to my project
  parameters:
    type: object
    properties:
      files:
        type: array
        items:
          type: object
          properties:
            path:
              type: string
              description: the relative path to the file that should be written
            content:
              type: string
              description: the content that should be written to a file
            executable:
              type: boolean
              description: whether to make the file executable
  container:
    image: vonwig/function_write_files:latest
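The definition above only declares the tool; the behavior lives in the container image. As a rough sketch (the real vonwig/function_write_files image may be implemented differently), the entrypoint of such a function image simply parses the JSON payload passed as its first argument and acts on it from the project’s working directory:

#!/usr/bin/env python3
# Sketch of a function-image entrypoint: argv[1] is the JSON payload produced
# when the LLM calls the write_files tool (see the parameters schema above).
import json
import os
import sys

def main() -> None:
    payload = json.loads(sys.argv[1])
    for spec in payload.get("files", []):
        path = spec["path"]
        # Paths are relative to the project, which is bind-mounted as the workdir.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w") as f:
            f.write(spec["content"])
        if spec.get("executable"):
            os.chmod(path, 0o755)
        print(f"wrote {path}")

if __name__ == "__main__":
    main()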

Test your prompt

  1. Add the file to a Git repository and push to a public remote.
  2. Paste the URL to reference the file on GitHub.

Alternatively, import a local prompt and select the file on your computer.

Screenshot showing text box for entering the URL for the folder.
Figure 3: Add the URL for the folder.

3. Run.

## ROLE assistant

Don't even get me started on the files, I mean, have you ever sat down and really looked at a list of files? This project has got more layers than that seven-layer bean dip I had at last weekend's potluck. This project isn't just files on files, its files within files, its dot something after dot something else – and before you ask: Yes, all of these are REQUIRED!

Coming down to Dockerfile. Now, I've seen some Dockerfiles but our Dockerfile, folks, it's something else. It lifts, it grinds, it effectively orchestrates our code like a veteran conductor at the symphony. We also have multiple templates because who doesn't love a good template, right?

Oh, and did I mention the walkthroughs and the resources? Let's just say this isn't a "teach a man to fish" situation. This is more of a “teach a man to create an entire fishing corporation” scenario. Now THAT'S dedication.

Finally we've got the main.js, and let's be real, is it even a project without a main.js anymore?

As always, feel free to follow along in our new public repo. Everything we’ve discussed in this blog post is available for you to try out on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Adding the ESLint Tool to an AI Assistant: Improving Recommendations for JS/TS Projects

By: Docker Labs
August 12, 2024 at 13:04

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Can an AI assistant help you write better JavaScript or TypeScript?

Background and introduction

Projects that heavily rely on JavaScript (JS) or TypeScript (TS) are synonymous with the web, so there is a high demand for tools that can improve the consistency and quality of projects using these languages. In previous Docker Labs GenAI posts, we’ve introduced the idea that tools both enable AI assistants to understand our code more and enable them to take action based on this understanding.

In this article, we’ll be trying to enable our AI assistant to provide advice that is both helpful and actionable for linting JS/TS projects and to finally delve into the NPM ecosystem.

Another simple prompt

As we learned in this previous Docker Labs GenAI article, you won’t get much help asking an LLM to tell you how to lint your project without any details. So, like before, we’re using our “linguist” tool to learn about the languages used in the project and augment the prompt (Figure 1):

How do I lint my project?
{{# linguist }}
This project contains code from the language {{ language }} so if you have any 
recommendations pertaining to {{ language }}, please include them.
{{/linguist}}

What LLMs provide out of the box

Exchange between developer and AI assistant, asking "How do I lint my project?" The response suggests using ESLint.
Figure 1: AI assistant responds with information about ESLint.

In Figure 2, we see that GPT-4 recognizes that ESLint is highly configurable and actually doesn’t work without a config, and so it is trying to provide that for us by either helping us run ESLint’s init tool or by writing a config to use.

Screenshot of AI response providing details for setting up, running, and configuring ESLint.
Figure 2: AI assistant provides information for setting up and running ESLint.

However, this response gives us either a config that does not work for many projects, or a boilerplate setup task for the user to do manually. This is in contrast with other linters, like Pylint or golangci-lint, where linguist was actually enough for the LLM to find a clear path to linting. So, with ESLint, we need to add more knowledge to help the LLM figure this out.

Configuring ESLint

Using StandardJS

StandardJS is a community-led effort to simplify ESLint configurations. Let’s start by nudging the assistant toward using this as a starting point. The ESLint config is published under its own package, StandardJS, so we can add the following prompt:

If there are no ESLint configuration files found, use StandardJS to lint the project with a consistent config.

We will also add a function definition so that our assistant knows how to run StandardJS. Note the container image defined at the bottom of the following definition:

- name: run-standardjs
  description: Lints the current project with StandardJS
  parameters:
    type: object
    properties:
      typescript:
        type: boolean
        description: Whether to lint Typescript files
      fix:
        type: boolean
        description: Whether to fix the files
      files:
        type: array
        items:
          type: string
          description: The filepaths to pass to the linter. Defaults to '.'
    required:
      - typescript
      - fix
  container:
    image: vonwig/standardjs:latest

This definition will work for both TypeScript and JavaScript projects using an argument. The assistant uses the project content to determine how to optimally set the TypeScript property.

When using StandardJS with TypeScript, two things happen in the container:

  • It lints with ts-standard instead of standard.
  • It runs ts-standard from the working directory containing tsconfig.json.

But, with the right tools, this behavior is enabled with a single prompt:

When using StandardJS, use Typescript only if there are tsconfigs in the project.

Docker environments

Both ESLint and StandardJS run in Node.js environments. In our current prototype, our assistant uses three different Docker images.

Docker is significant because of the previously mentioned requirement of using ts-standard in a directory with tsconfig.json. When we baked this logic into the Docker image, we effectively introduced a contract bridging the AI Assistant, the linter tool, and the overall structure of the repository.

After determining that a project uses JavaScript or TypeScript, our assistant also adds Git Hooks. (See this GenAI article for details.) Docker gives us a way to reliably distribute these tools.

If we detect `TypeScript` in the project, we can add the following hook to the hooks entry in the `local` repo entry.

```yaml
id: standardjs-docker-ts
name: standardjs linter (TS and JS)
entry: vonwig/standardjs:latest '{"typescript": true, "fix": false}'
language: docker_image
files: "\\.(ts|tsx)$"
```

Fixing violations

Linting output comes in the form of violations. A violation is attached to a range in the code file with the offending code and the violation reason. As mentioned previously, 75% of StandardJS violations are automatically fixable. Can we use the AI assistant to automatically fix the remaining violations?

Respond with an edit to resolve the violation using the following JSON format:

{
  "start": [1,4],
  "end": [1,6],
  "edit": "lorem ipsum"
}
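For context, here is a sketch of how a harness around the assistant could apply such an edit to a source file. It assumes the start and end pairs are 1-based [line, column] positions with an exclusive end column; the real extension may interpret the ranges differently.

import json

def apply_edit(source: str, edit_json: str) -> str:
    """Apply an LLM-proposed edit of the form
    {"start": [line, col], "end": [line, col], "edit": "..."} to source text."""
    edit = json.loads(edit_json)
    lines = source.splitlines(keepends=True)
    (start_line, start_col), (end_line, end_col) = edit["start"], edit["end"]
    # Everything before the start position, then the replacement, then the rest.
    before = "".join(lines[: start_line - 1]) + lines[start_line - 1][: start_col - 1]
    after = lines[end_line - 1][end_col - 1 :] + "".join(lines[end_line:])
    return before + edit["edit"] + after

# Example: replace the loose equality flagged by the linter.
code = "if (a == b) {\n  doThing()\n}\n"
fix = '{"start": [1, 7], "end": [1, 9], "edit": "==="}'
print(apply_edit(code, fix))  # prints: if (a === b) { ...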

If you take, for example, the lint rule for type casting, all of the models we tested will replace == with ===. Here’s the response when we ask for fixes to lines with the no-this-before-super violation:

Here's the proposed edit to resolve the violation:

```json
{
  "start": [1, 4],
  "end": [2, 6],
  "edit": "class myclass extends base {\n\tval: number;\n\tconstructor()
{\n\t\tsuper();\n\t\tthis.val = 10;\n\t}\n}"
}
```

If these models are able to fix these violations, why doesn’t ESLint just make them automatically fixable? In many of the cases, they represent riskier changes that still require some developer supervision. Perhaps the best thing an assistant can do is present these auto-fixes to the user directly in their editors. For example, a fix that has been generated by our assistant can be presented in VSCode (Figure 3).

A fix that has been generated by the AI assistant  presented in VSCode.
Figure 3: A fix is presented to the user.

Editor complaints

With the rise of tools like GitHub Copilot, developers are now becoming accustomed to assistants being present in their editors (Figure 4).

Screenshot showing Copilot's presence in the editor, with "Fix this" highlighted in green.
Figure 4: AI assistant is present in the editor.

Our work is showing that linting tools can improve the quality of these fixes.

For example, when asking Copilot to fix the line from earlier, it lacks the additional context from ESLint (Figure 5).

Screenshot of AI response, which lacks additional context from ESLint.
Figure 5: Additional context is needed.

The assistant is unable to infer that there is a violation there. In this instance, Copilot is hallucinating because it was triggered by the developer’s editor action without any of the context coming in from the linter. As far as Copilot knows, I just asked it to fix perfectly good code.

To improve this, we can use the output of a linter to “complain” about a violation. The editor allows us to surface a quick action to fix the code. Figure 6 shows the same “fix using Copilot” from the “problems” window, triggered by another violation:

Screenshot showing "Fix using Copilot” in the problems window.
Figure 6: “Fix using Copilot” is shown in the problems window.

This is shown in VSCode’s “problems” window, which helps developers locate problems in the codebase. An assistant can use the editor to put the ESLint tool in a more effective relationship with the developer (Figure 7).

Screenshot showing an immediate resolution, rather than a hallucination from the assistant.
Figure 7: A more complete fix.

Most importantly, we get an immediate resolution rather than a hallucination. We’re also hosting these tools in Docker, so these improvements do not require installs of Node.js, NPM, or ESLint.

Summary

We continue to investigate the use of tools for gathering context and improving suggestions. In this article, we have looked at how AI assistants can provide significant value to developers by:

  • Cutting out busy work setting up Node/NPM/ESLint.
  • Leveraging expert knowledge about ESLint to “level up” developers.
  • Generating and surfacing actionable fixes directly to developers where they’re already working (in the editor).
  • Generating simple workflows as outcomes from natural language prompts and tools.

As always, feel free to follow along in our new public repo and please reach out. Everything we’ve discussed in this blog post is available for you to try out on your own projects.

For more on what we’re doing at Docker, subscribe to our newsletter.

Learn more

Local LLM Messenger: Chat with GenAI on Your iPhone

July 23, 2024 at 15:00

In this AI/ML Hackathon post, we want to share another winning project from last year’s Docker AI/ML Hackathon. This time we will dive into Local LLM Messenger, an honorable mention winner created by Justin Garrison.

Developers are pushing the boundaries to bring the power of artificial intelligence (AI) to everyone. One exciting approach involves integrating Large Language Models (LLMs) with familiar messaging platforms like Slack and iMessage. This isn’t just about convenience; it’s about transforming these platforms into launchpads for interacting with powerful AI tools.

Imagine this: You need a quick code snippet or some help brainstorming solutions to coding problems. With LLMs integrated into your messaging app, you can chat with your AI assistant directly within the familiar interface to generate creative ideas or get help brainstorming solutions. No more complex commands or clunky interfaces — just a natural conversation to unlock the power of AI.

Integrating with messaging platforms can be a time-consuming task, especially for macOS users. That’s where Local LLM Messenger (LoLLMM) steps in, offering a streamlined solution for connecting with your AI via iMessage. 

What makes LoLLM Messenger unique?

The following demo, which was submitted to the AI/ML Hackathon, provides an overview of LoLLM Messenger (Figure 1).

Figure 1: Demo of LoLLM Messenger as submitted to the AI/ML Hackathon.

The LoLLM Messenger bot allows you to send iMessages to Generative AI (GenAI) models running directly on your computer. This approach eliminates the need for complex setups and cloud services, making it easier for developers to experiment with LLMs locally.

Key features of LoLLM Messenger

LoLLM Messenger includes impressive features that make it a standout among similar projects, such as:

  • Local execution: Runs on your computer, eliminating the need for cloud-based services and ensuring data privacy.
  • Scalability: Handles multiple AI models simultaneously, allowing users to experiment with different models and switch between them easily.
  • User-friendly interface: Offers a simple and intuitive interface, making it accessible to users of all skill levels.
  • Integration with Sendblue: Integrates seamlessly with Sendblue, enabling users to send iMessages to the bot and receive responses directly in their inbox.
  • Support for ChatGPT: Supports the GPT-3.5 Turbo and DALL-E 2 models, providing users with access to powerful AI capabilities.
  • Customization: Allows users to customize the bot’s behavior by modifying the available commands and integrating their own AI models.

How does it work?

The architecture diagram shown in Figure 2 provides a high-level overview of the components and interactions within the LoLLM Messenger project. It illustrates how the main application, AI models, messaging platform, and external APIs work together to enable users to send iMessages to AI models running on their computers.

Illustration showing components and processes in LoLLM Messenger, including User, SendBlue API, Docker, and AI Models.
Figure 2: Overview of the components and interactions in the LoLLM Messenger project.

By leveraging Docker, Sendblue, and Ollama, LoLLM Messenger offers a seamless and efficient solution for those seeking to explore AI models without the need for cloud-based services. LoLLM Messenger utilizes Docker Compose to manage the required services. 

Docker Compose simplifies the process by handling the setup and configuration of multiple containers, including the main application, ngrok (for creating a secure tunnel), and Ollama (a server that bridges the gap between messaging apps and AI models).

Technical stack

The LoLLM Messenger tech stack includes:

  • Lollmm service: This service is responsible for running the main application. It handles incoming iMessages, processes user requests, and interacts with the AI models. The lollmm service communicates with the Ollama model, which is a powerful AI model for text and image generation.
  • Ngrok: This service is used to expose the main application’s port 8000 to the internet using ngrok. It runs in the Alpine image and forwards traffic from port 8000 to the ngrok tunnel. The service is set to run in the host network mode.
  • Ollama: This service runs the Ollama model, which is a powerful AI model for text and image generation. It listens on port 11434 and mounts a volume from ./run/ollama to /home/ollama. The service is set to deploy with GPU resources, ensuring that it can utilize an NVIDIA GPU if available.
  • Sendblue: The project integrates with Sendblue to handle iMessages. You can set up Sendblue by adding your API Key and API Secret in the app/.env file and adding your phone number as a Sendblue contact.
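To give a rough idea of how the lollmm service can talk to the Ollama container over port 11434, a non-streaming generation request looks something like the sketch below. The URL matches the OLLAMA_API default used in main.py, but the helper function itself is ours, not code from the repo.

import requests

OLLAMA_API = "http://ollama:11434/api"  # the Compose service name and port from above

def ask_ollama(prompt: str, model: str = "llama2:latest") -> str:
    """Send a single non-streaming generation request to the Ollama service."""
    resp = requests.post(
        OLLAMA_API + "/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_ollama("Write a haiku about Docker."))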

Getting started

To get started, ensure that you have installed and set up the following components:

Clone the repository

Open a terminal window and run the following command to clone this sample application:

git clone https://github.com/dockersamples/local-llm-messenger

You should now have the following files in your local-llm-messenger directory:

.
├── LICENSE
├── README.md
├── app
│   ├── Dockerfile
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── default.ai
│   ├── log_conf.yaml
│   └── main.py
├── docker-compose.yaml
├── img
│   ├── banner.png
│   ├── lasers.gif
│   └── lollm-demo-1.gif
├── justfile
└── test
    ├── msg.json
    └── ollama.json

4 directories, 15 files

The main.py file under the /app directory is a Python script that uses the FastAPI framework to create a web server for an AI-powered messaging application. The script interacts with OpenAI’s GPT-3 model and an Ollama endpoint for generating responses. It uses Sendblue’s API for sending messages.

The script first imports necessary libraries, including FastAPI, requests, logging, and other required modules.

from dotenv import load_dotenv
import os, requests, time, openai, json, logging
from pprint import pprint
from typing import Union, List

from fastapi import FastAPI
from pydantic import BaseModel

from sendblue import Sendblue

This section sets up configuration variables, such as API keys, callback URL, Ollama API endpoint, and maximum context and word limits.

SENDBLUE_API_KEY = os.environ.get("SENDBLUE_API_KEY")
SENDBLUE_API_SECRET = os.environ.get("SENDBLUE_API_SECRET")
openai.api_key = os.environ.get("OPENAI_API_KEY")
OLLAMA_API = os.environ.get("OLLAMA_API_ENDPOINT", "http://ollama:11434/api")
# could also use request.headers.get('referer') to do dynamically
CALLBACK_URL = os.environ.get("CALLBACK_URL")
MAX_WORDS = os.environ.get("MAX_WORDS")

Next, the script configures logging, setting the log level to INFO and creating a file handler that writes log messages to a file named app.log.
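That setup is not reproduced in the excerpts here, so below is a rough sketch of what it looks like (the exact configuration in main.py may differ); the logger it creates is the one the functions further down call.

import logging  # already imported at the top of main.py

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Also write log messages to app.log alongside the console output.
file_handler = logging.FileHandler("app.log")
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(file_handler)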

It then defines various functions for interacting with the AI models, managing context, sending messages, handling callbacks, and executing slash commands.

def set_default_model(model: str):
    try:
        with open("default.ai", "w") as f:
            f.write(model)
            f.close()
            return
    except IOError:
        logger.error("Could not open file")
        exit(1)


def get_default_model() -> str:
    try:
        with open("default.ai") as f:
            default = f.readline().strip("\n")
            f.close()
            if default != "":
                return default
            else:
                set_default_model("llama2:latest")
                return ""
    except IOError:
        logger.error("Could not open file")
        exit(1)


def validate_model(model: str) -> bool:
    available_models = get_model_list()
    if model in available_models:
        return True
    else:
        return False


def get_ollama_model_list() -> List[str]:
    available_models = []
   
    tags = requests.get(OLLAMA_API + "/tags")
    all_models = json.loads(tags.text)
    for model in all_models["models"]:
        available_models.append(model["name"])
    return available_models


def get_openai_model_list() -> List[str]:
    return ["gpt-3.5-turbo", "dall-e-2"]


def get_model_list() -> List[str]:
    ollama_models = []
    openai_models = []
    all_models = []
    if "OPENAI_API_KEY" in os.environ:
        # print(openai.Model.list())
        openai_models = get_openai_model_list()

    ollama_models = get_ollama_model_list()
    all_models = ollama_models + openai_models
    return all_models


DEFAULT_MODEL = get_default_model()

if DEFAULT_MODEL == "":
    # This is probably the first run so we need to install a model
    if "OPENAI_API_KEY" in os.environ:
        print("No default model set. openai is enabled. using gpt-3.5-turbo")
        DEFAULT_MODEL = "gpt-3.5-turbo"
    else:
        print("No model found and openai not enabled. Installing llama2:latest")
        pull_data = '{"name": "llama2:latest","stream": false}'
        try:
            pull_resp = requests.post(OLLAMA_API + "/pull", data=pull_data)
            pull_resp.raise_for_status()
        except requests.exceptions.HTTPError as err:
            raise SystemExit(err)
        set_default_model("llama2:latest")
        DEFAULT_MODEL = "llama2:latest"


if validate_model(DEFAULT_MODEL):
    logger.info("Using model: " + DEFAULT_MODEL)
else:
    logger.error("Model " + DEFAULT_MODEL + " not available.")
    logger.info(get_model_list())

    pull_data = '{"name": "' + DEFAULT_MODEL + '","stream": false}'
    try:
        pull_resp = requests.post(OLLAMA_API + "/pull", data=pull_data)
        pull_resp.raise_for_status()
    except requests.exceptions.HTTPError as err:
        raise SystemExit(err)



def set_msg_send_style(received_msg: str):
    """Will return a style for the message to send based on matched words in received message"""
    celebration_match = ["happy"]
    shooting_star_match = ["star", "stars"]
    fireworks_match = ["celebrate", "firework"]
    lasers_match = ["cool", "lasers", "laser"]
    love_match = ["love"]
    confetti_match = ["yay"]
    balloons_match = ["party"]
    echo_match = ["what did you say"]
    invisible_match = ["quietly"]
    gentle_match = []  # no trigger words defined yet, so this style is never selected
    loud_match = ["hear"]
    slam_match = []  # no trigger words defined yet, so this style is never selected

    received_msg_lower = received_msg.lower()
    if any(x in received_msg_lower for x in celebration_match):
        return "celebration"
    elif any(x in received_msg_lower for x in shooting_star_match):
        return "shooting_star"
    elif any(x in received_msg_lower for x in fireworks_match):
        return "fireworks"
    elif any(x in received_msg_lower for x in lasers_match):
        return "lasers"
    elif any(x in received_msg_lower for x in love_match):
        return "love"
    elif any(x in received_msg_lower for x in confetti_match):
        return "confetti"
    elif any(x in received_msg_lower for x in balloons_match):
        return "balloons"
    elif any(x in received_msg_lower for x in echo_match):
        return "echo"
    elif any(x in received_msg_lower for x in invisible_match):
        return "invisible"
    elif any(x in received_msg_lower for x in gentle_match):
        return "gentle"
    elif any(x in received_msg_lower for x in loud_match):
        return "loud"
    elif any(x in received_msg_lower for x in slam_match):
        return "slam"
    else:
        return

Two classes, Msg and Callback, are defined to represent the structure of incoming messages and callback data. The code also includes various functions and classes to handle different aspects of the messaging platform, such as setting default models, validating models, interacting with the Sendblue API, and processing messages. It also includes functions to handle slash commands, create messages from context, and append context to a file.

class Msg(BaseModel):
    accountEmail: str
    content: str
    media_url: str
    is_outbound: bool
    status: str
    error_code: int | None = None
    error_message: str | None = None
    message_handle: str
    date_sent: str
    date_updated: str
    from_number: str
    number: str
    to_number: str
    was_downgraded: bool | None = None
    plan: str


class Callback(BaseModel):
    accountEmail: str
    content: str
    is_outbound: bool
    status: str
    error_code: int | None = None
    error_message: str | None = None
    message_handle: str
    date_sent: str
    date_updated: str
    from_number: str
    number: str
    to_number: str
    was_downgraded: bool | None = None
    plan: str
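
The context helpers append_context and create_messages_from_context are not shown in this listing. Based on how msg_openai and msg_ollama call them, a minimal sketch could look like the following (the file name and JSON-lines format are assumptions, not the project's actual implementation):

CONTEXT_FILE = "context.jsonl"  # assumed file name


def append_context(role: str, content: str):
    """Append one message as a JSON line so it can be replayed later."""
    with open(CONTEXT_FILE, "a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")


def create_messages_from_context(provider: str) -> List[str]:
    """Return the stored context as a list of JSON strings, one per message."""
    # provider is accepted to mirror the call sites; it is unused in this sketch
    try:
        with open(CONTEXT_FILE) as f:
            return [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return []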



def msg_openai(msg: Msg, model="gpt-3.5-turbo"):
    """Sends a message to openai"""
    message_with_context = create_messages_from_context("openai")

    # Add the user's message and system context to the messages list
    messages = [
        {"role": "user", "content": msg.content},
        {"role": "system", "content": "You are an AI assistant. You will answer in haiku."},
    ]

    # Convert JSON strings to Python dictionaries and add them to messages
    messages.extend(
        [
            json.loads(line)  # Convert each JSON string back into a dictionary
            for line in message_with_context
        ]
    )

    # Send the messages to the OpenAI model
    gpt_resp = client.chat.completions.create(
        model=model,
        messages=messages,
    )

    # Append the system context to the context file
    append_context("system", gpt_resp.choices[0].message.content)

    # Send a message to the sender
    msg_response = sendblue.send_message(
        msg.from_number,
        {
            "content": gpt_resp.choices[0].message.content,
            "status_callback": CALLBACK_URL,
        },
    )
    
    return




def msg_ollama(msg: Msg, model=None):
    """Sends a message to the ollama endpoint"""
    if model is None:
        logger.error("Model is None when calling msg_ollama")
        return  # Optionally handle the case more gracefully

    ollama_headers = {"Content-Type": "application/json"}
    ollama_data = (
        '{"model":"' + model +
        '", "stream": false, "prompt":"' +
        msg.content +
        " in under " +
        str(MAX_WORDS) +  # Make sure MAX_WORDS is a string
        ' words"}'
    )
    ollama_resp = requests.post(
        OLLAMA_API + "/generate", headers=ollama_headers, data=ollama_data
    )
    response_dict = json.loads(ollama_resp.text)
    if ollama_resp.ok:
        send_style = set_msg_send_style(msg.content)
        append_context("system", response_dict["response"])
        msg_response = sendblue.send_message(
            msg.from_number,
            {
                "content": response_dict["response"],
                "status_callback": CALLBACK_URL,
                "send_style": send_style,
            },
        )
    else:
        msg_response = sendblue.send_message(
            msg.from_number,
            {
                "content": "I'm sorry, I had a problem processing that question. Please try again.",
                "status_callback": CALLBACK_URL,
            },
        )
    return
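
Note that msg_ollama assembles the request body by hand with string concatenation, which breaks as soon as msg.content contains quotes or newlines. A safer variant (a sketch, not part of the original listing) builds the same payload as a dictionary and lets requests serialize it:

payload = {
    "model": model,
    "stream": False,
    "prompt": f"{msg.content} in under {MAX_WORDS} words",
}
# requests serializes the dict to JSON and sets the Content-Type header for us
ollama_resp = requests.post(OLLAMA_API + "/generate", json=payload)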

Navigate to the app/ directory and create a new file for adding environment variables.

touch .env
SENDBLUE_API_KEY=your_sendblue_api_key
SENDBLUE_API_SECRET=your_sendblue_api_secret
OLLAMA_API_ENDPOINT=http://host.docker.internal:11434/api
OPENAI_API_KEY=your_openai_api_key

Next, add the ngrok authtoken to the Docker Compose file. You can get the authtoken from your ngrok dashboard.

services:
  lollm:
    build: ./app
    # command:
      # - sleep
      # - 1d
    ports:
      - 8000:8000
    env_file: ./app/.env
    volumes:
      - ./run/lollm:/run/lollm
    depends_on:
      - ollama
    restart: unless-stopped
    network_mode: "host"
  ngrok:
    image: ngrok/ngrok:alpine
    command:
      - "http"
      - "8000"
      - "--log"
      - "stdout"
    environment:
      - NGROK_AUTHTOKEN=2i6iXXXXXXXXhpqk1aY1
    network_mode: "host"
  ollama:
    image: ollama/ollama
    ports:
      - 11434:11434
    volumes:
      - ./run/ollama:/home/ollama
    network_mode: "host"

Running the application stack

Next, you can run the application stack, as follows:

$ docker compose up

You will see output similar to the following:

[+] Running 4/4
 ✔ Container local-llm-messenger-ollama-1                           Create...                                          0.0s
 ✔ Container local-llm-messenger-ngrok-1                            Created                                            0.0s
 ✔ Container local-llm-messenger-lollm-1                            Recreat...                                         0.1s
 ! lollm Published ports are discarded when using host network mode                                                    0.0s
Attaching to lollm-1, ngrok-1, ollama-1
ollama-1  | 2024/06/20 03:14:46 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
ollama-1  | time=2024-06-20T03:14:46.308Z level=INFO source=images.go:725 msg="total blobs: 0"
ollama-1  | time=2024-06-20T03:14:46.309Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
ollama-1  | time=2024-06-20T03:14:46.309Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.44)"
ollama-1  | time=2024-06-20T03:14:46.309Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2210839504/runners
ngrok-1   | t=2024-06-20T03:14:46+0000 lvl=info msg="open config file" path=/var/lib/ngrok/ngrok.yml err=nil
ngrok-1   | t=2024-06-20T03:14:46+0000 lvl=info msg="open config file" path=/var/lib/ngrok/auth-config.yml err=nil
ngrok-1   | t=2024-06-20T03:14:46+0000 lvl=info msg="starting web service" obj=web addr=0.0.0.0:4040 allow_hosts=[]
ngrok-1   | t=2024-06-20T03:14:46+0000 lvl=info msg="client session established" obj=tunnels.session
ngrok-1   | t=2024-06-20T03:14:46+0000 lvl=info msg="tunnel session started" obj=tunnels.session
ngrok-1   | t=2024-06-20T03:14:46+0000 lvl=info msg="started tunnel" obj=tunnels name=command_line addr=http://localhost:8000 url=https://94e1-223-185-128-160.ngrok-free.app
ollama-1  | time=2024-06-20T03:14:48.602Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
ollama-1  | time=2024-06-20T03:14:48.603Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="7.7 GiB" available="3.9 GiB"
lollm-1   | INFO:     Started server process [1]
lollm-1   | INFO:     Waiting for application startup.
lollm-1   | INFO:     Application startup complete.
lollm-1   | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
ngrok-1   | t=2024-06-20T03:16:58+0000 lvl=info msg="join connections" obj=join id=ce119162e042 l=127.0.0.1:8000 r=[2401:4900:8838:8063:f0b0:1866:e957:b3ba]:54384
lollm-1   | OLLAMA API IS http://host.docker.internal:11434/api
lollm-1   | INFO:     2401:4900:8838:8063:f0b0:1866:e957:b3ba:0 - "GET / HTTP/1.1" 200 OK

If you’re testing it on a system without an NVIDIA GPU, you can skip the deploy attribute (which reserves GPU resources) in the project’s Compose file.

Watch the output for your ngrok endpoint. In our case, it is https://94e1-223-185-128-160.ngrok-free.app.

Next, append /msg to that ngrok URL to form the webhook URL, for example: https://94e1-223-185-128-160.ngrok-free.app/msg

Then, add it under the webhooks URL section on Sendblue and save it (Figure 3). The ngrok service exposes the lollm service on port 8000 and provides a secure tunnel to the public internet through an ngrok-free.app domain.

The ngrok logs confirm that the web service has started, a client session has been established, and the tunnel to the lollm service is up. The service authenticates with the ngrok authtoken you configured in the Compose file, so at this point you have a working secure tunnel to the lollm service.

Screenshot of Sendblue showing the ngrok webhook URL added under Webhooks.
Figure 3: Adding the ngrok webhook URL in Sendblue.

Ensure that there are no error logs when you run the ngrok container (Figure 4).

Screenshot showing local-llm-messenger-ngrok-1 log output.
Figure 4: Checking the logs for errors.

Ensure that the LoLLM Messenger container is actively up and running (Figure 5).

Screen showing local-llm-messenger-ngrok-1 status.
Figure 5: Ensure the LoLLM Messenger container is running.

The logs show that the Ollama service has opened the specified port (11434) and is listening for incoming connections. The logs also indicate that the Ollama service has mounted the ./run/ollama directory from the host machine to /home/ollama within the container.

Overall, the Ollama service is running correctly and is ready to provide AI models for inference.

Testing the functionality

To test the functionality of the lollm service, you first need to add your contact number to the Sendblue dashboard. Then you should be able to send messages to the Sendblue number and observe the responses from the lollm service (Figure 6).

 iMessage image showing messages sent to the SendBlue number and responses from the lollm service.
Figure 6: Testing functionality of lollm service.

The Sendblue platform will send HTTP requests to the /msg endpoint of your lollm service, which will process these requests and return the appropriate responses (a minimal endpoint sketch follows the list below).

  1. The lollm service is set up to listen on port 8000.
  2. The ngrok tunnel is started and provides a public URL, such as https://94e1-223-185-128-160.ngrok-free.app.
  3. The lollm service receives HTTP requests from the ngrok tunnel, including GET requests to the root path (/) and other paths, such as /favicon.ico, /predict, /mdg, and /msg.
  4. The lollm service responds to these requests with appropriate HTTP status codes, such as 200 OK for successful requests and 404 Not Found for requests to paths that do not exist.
  5. The ngrok tunnel logs the join connections, indicating that clients are connecting to the lollm service through the ngrok tunnel.
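
For orientation, here is a minimal sketch of what such an endpoint could look like with FastAPI (the real handler in the project also parses slash commands and chooses between OpenAI and Ollama; handle_slash_command below is a hypothetical helper):

from fastapi import FastAPI

app = FastAPI()


@app.post("/msg")
def receive_msg(msg: Msg):
    """Entry point for Sendblue webhooks; dispatch to the selected model."""
    if msg.content.startswith("/"):
        return handle_slash_command(msg)  # hypothetical helper for slash commands
    if DEFAULT_MODEL.startswith("gpt-"):  # simple heuristic for choosing the provider
        msg_openai(msg, model=DEFAULT_MODEL)
    else:
        msg_ollama(msg, model=DEFAULT_MODEL)
    return {"status": "ok"}
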
iMessage image showing requests (/list and /help) and responses in chat.
Figure 7: Sending requests and receiving responses.

The first time you chat with the LLM by typing /list (Figure 7), you can check the logs as shown:

ngrok-1   | t=2024-07-09T02:34:30+0000 lvl=info msg="join connections" obj=join id=12bd50a8030b l=127.0.0.1:8000 r=18.223.220.3:44370
lollm-1   | OLLAMA API IS http://host.docker.internal:11434/api
lollm-1   | INFO:     18.223.220.3:0 - "POST /msg HTTP/1.1" 200 OK
ngrok-1   | t=2024-07-09T02:34:53+0000 lvl=info msg="join connections" obj=join id=259fda936691 l=127.0.0.1:8000 r=18.223.220.3:36712
lollm-1   | INFO:     18.223.220.3:0 - "POST /msg HTTP/1.1" 200 OK

Next, let’s install the codellama model by typing /install codellama:latest (Figure 8).

iMessage image installation of codellama model by typing /install codellama:latest.
Figure 8: Installing the `codellama` model.

You can see the following container logs once you set the default model to codellama:latest as shown:

ngrok-1   | t=2024-07-09T03:39:23+0000 lvl=info msg="join connections" obj=join id=026d8fad5c87 l=127.0.0.1:8000 r=18.223.220.3:36282
lollm-1   | setting default model 
lollm-1   | INFO:     18.223.220.3:0 - "POST /msg HTTP/1.1" 200 OK

The lollm service is running correctly and can handle HTTP requests from the ngrok tunnel. You can use the ngrok tunnel URL to test the functionality of the lollm service by sending HTTP requests to the appropriate paths (Figure 9).

iMessage image showing sample questions sent to test functionality, such as "Who won the FIFA World Cup 2022?".
Figure 9: Testing the messaging functionality.

Conclusion

LoLLM Messenger is a valuable tool for developers and enthusiasts looking to push the boundaries of LLM integration within messaging apps. It allows developers to craft custom chatbots for specific needs, add real-time sentiment analysis to messages, or explore entirely new AI features in their messaging experience.

To get started, you can explore the LoLLM Messenger project on GitHub and discover the potential of local LLM.

Learn more


How an AI Assistant Can Help Configure Your Project’s Git Hooks

Par : Docker Labs
15 juillet 2024 à 13:00

This ongoing Docker Labs GenAI series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Can an AI assistant help configure your project’s Git hooks? 

Git hooks can save manual work during repetitive tasks, such as authoring commit messages or running checks before committing an update. But they can also be hard to configure, and are project dependent. Can generative AI make Git hooks easier to configure and use?


Simple prompts

From a high level, the basic prompt is simple:

How do I set up git hooks?

Although this includes no details about the actual project, the response from many foundation models is still useful. If you run this prompt in ChatGPT, you’ll see that the response contains details about how to use the .git/hooks folder, hints about authoring hook scripts, and even practical pointers on what to learn next. However, the advice is general. It has not been grounded by your project.

Project context

Your project itself is an important source of information for an assistant. Let’s start by providing information about types of source code in a project. Fortunately, there are plenty of existing tools for extracting project context, and these tools are often already available in Docker containers. 

For example, here’s an image that will analyze any Git repository and return a list of languages being used. Let’s update our prompt with this new context.

How do I set up git hooks?

{{# linguist }}
This project contains code from the language {{ language }} so if you have any 
recommendations pertaining to {{ language }}, please include them.
{{/linguist}}

In this example, we use Mustache templates to bind the output of our “linguist” analysis into this prompt.
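
To make the binding concrete, here is a small sketch of how such a prompt could be rendered with the chevron package (one of several Python implementations of Mustache); the linguist_output value is an assumption about what the analysis container returns:

import chevron

prompt_template = """How do I set up git hooks?

{{# linguist }}
This project contains code from the language {{ language }} so if you have any
recommendations pertaining to {{ language }}, please include them.
{{/ linguist }}
"""

# Hypothetical output from the linguist analysis container
linguist_output = {"linguist": [{"language": "Go"}, {"language": "Python"}]}

rendered_prompt = chevron.render(prompt_template, linguist_output)
print(rendered_prompt)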

The response from an LLM-powered assistant will change dramatically. Armed with specific advice about what kinds of files might be changed, the LLM will generate sample scripts and make suggestions about specific tools that might be useful for the kinds of code developed in this project. It might even be possible to cut and paste code out of the response to try setting up hooks yourself. 

The pattern is quite simple. We already have tools to analyze projects, so let’s plug these in locally and give the LLM more context to make better suggestions (Figure 1).

Flow chart illustration showing addition of tools to provide project context for the LLM. The process includes Project, Tool (go-linguist), Docker Desktop, Prompt templates and LLM.
Figure 1: Adding tools to provide context for LLM.

Expertise

Generative AI also offers new opportunities for experts to contribute knowledge that AI assistants can leverage to become even more useful. For example, we have learned that pre-commit can be helpful to organize the set of tools used to implement Git hooks. 

To represent this learning, we add this prompt:

When configuring git hooks, our organization uses a tool called
[pre-commit](https://github.com/pre-commit/pre-commit).

There’s also a base configuration that we have found useful in all projects. We also add that to our assistant’s knowledge base.

If a user wants to configure git hooks, use this template which will need to be written to pre-commit-config.yaml 
in the root of the user's project.

Start with the following code block:

```yaml
repos:
    - repo: http://github.com/pre-commit/pre-commit-hooks
      rev: v2.3.0
      hooks:
          - id: check-yaml
          - id: trailing-whitespace
          - id: check-merge-conflict
    - repo: https://github.com/jorisroovers/gitlint
      rev: main
      hooks:
          - id: gitlint
    - repo: local
      hooks:
```

Finally, as we learn about new tools that are useful for certain projects, we describe this information. For example, as an expert, I might want to suggest that teams using Golang include a particular linting tool in the Git hooks configuration.

If we detect `Go` in the project, add the following hook to the hooks entry in the `local` repo entry.

```yaml
id: golangcli-lint
name: golang cli
entry: golangci/golangci-lint
files: "\\.go$"
```

With these additions, the response from our assistant becomes precise. We have found that our assistant can now write hooks scripts and write complete YAML configuration files that are project-specific and ready to copy directly into a project. 

Somewhat surprisingly, the assistant can now also recommend tools that were not mentioned explicitly in our prompts but that follow the same syntax established for the other tools. Guided by our examples, the LLM suggests new tools yet still configures them using our suggested framework and syntax.

Most importantly, the response from the assistant is now not only actionable to the developer, saving them time, but it is also specific enough that we could pass the response to a simple agent to take the action automatically.

Adding tools

For this example, the only tool we really need is a file-writer. The change to our prompt is to add one instruction to go ahead and write the configuration into the project.

Write the final yaml content to our project at the path pre-commit-config.yaml.  Write both the `pre-commit` and `commit-message` scripts to `git/hooks` and make them executable.

Besides the prompt, there is another crucial step that we are skipping at this point. The assistant must be told that it is capable of writing content into files. However, this is really just a registration step. 

The important thing is that we can give our agent the tools it needs to perform tasks. In doing so, the response from the LLM undergoes a transition. Instead of text output, the LLM responds with instructions for our agent. If we’re using an OpenAI function call, we’ll see a request that looks something like the following .json file. It’s not meant to be read by us, of course. It’s an instruction to the agent that knows how to update your project for you.

{
  "id": "call_4LCo0CQqCHCGGZea3qlaTg5h",
  "type": "function",
  "function": {
    "name": "write_file",
    "arguments": "{\n  \"path\": \"pre-commit-config.yaml\",\n  \"content\": \"repos:\\n    - repo: http://github.com/pre-commit/pre-commit-hooks\\n      rev: v2.3.0\\n      hooks:\\n          - id: check-yaml\\n          - id: trailing-whitespace\\n          - id: check-merge-conflict\\n    - repo https://github.com/jorisroovers/gitlint\\n      rev: main\\n      hooks:\\n          - id: gitlint\\n    - repo: local\\n      hooks:\\n          - id: markdownlint\\n            name: markdown linter\\n            entry: markdownlint/markdownlint\\n            files: \\\"\\\\.md$\\\"\\n          - id: python-black\\n            name: python black formatter\\n            entry: black\\n            files: \\\"\\\\.py$\\\"\"\n}"
  }
}
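
On the agent side, handling this function call is mostly plumbing. A minimal sketch (assuming the tool-call arguments arrive as the JSON string shown above) might look like:

import json
import os


def handle_tool_call(tool_call: dict, project_root: str) -> None:
    """Dispatch a single OpenAI-style tool call to a local implementation."""
    function = tool_call["function"]
    if function["name"] == "write_file":
        args = json.loads(function["arguments"])
        target = os.path.join(project_root, args["path"])
        # Create any missing directories, then write the generated content
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        with open(target, "w") as f:
            f.write(args["content"])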

A more sophisticated version of the file-writer function might communicate with an editor agent capable of presenting recommended file changes to a developer using native IDE concepts, like editor quick-fixes and hints. In other words, tools can help generative AI to meet developers where they are. And the answer to the question:

How do I set up git hooks?

becomes, “Let me just show you.”

Docker as tool engine

The tools mentioned in the previous sections have all been delivered as Docker containers.  One goal of this work has been to verify that an assistant can bootstrap itself starting from a Docker-only environment. Docker is important here because it has been critical in smoothing over many of the system/environment gaps that LLMs struggle with. 

We have observed that a significant barrier to activating even simple local assistants is the complexity of managing a safe and reliable environment for running these tools. Therefore, we are constraining ourselves to use only tools that can be lazily pulled from public registries.

For AI assistants to transform how we consume tools, we believe that both tool distribution and knowledge distribution are key factors. In the above example, we can see how LLM responses can be transformed by tools from unactionable and vague to hyper-project-focused and actionable. The difference is tools.

To follow along with this effort, check out the GitHub repository for this project.

Learn more

How to Run Hugging Face Models Programmatically Using Ollama and Testcontainers

11 juillet 2024 à 13:00

Hugging Face now hosts more than 700,000 models, with the number continuously rising. It has become the premier repository for AI/ML models, catering to both general and highly specialized needs.

As the adoption of AI/ML models accelerates, more application developers are eager to integrate them into their projects. However, the entry barrier remains high due to the complexity of setup and lack of developer-friendly tools. Imagine if deploying an AI/ML model could be as straightforward as spinning up a database. Intrigued? Keep reading to find out how.


Introduction to Ollama and Testcontainers

Recently, Ollama announced support for running models from Hugging Face. This development is exciting because it brings the rich ecosystem of AI/ML components from Hugging Face to Ollama end users, who are often developers. 

Testcontainers libraries already provide an Ollama module, making it straightforward to spin up a container with Ollama without needing to know the details of how to run Ollama using Docker:

import org.testcontainers.ollama.OllamaContainer; 

var ollama = new OllamaContainer("ollama/ollama:0.1.44"); 
ollama.start();

These lines of code are all that is needed to have Ollama running inside a Docker container effortlessly.

Running models in Ollama

By default, Ollama does not include any models, so you need to download the one you want to use. With Testcontainers, this step is straightforward by leveraging the execInContainer API provided by Testcontainers:

ollama.execInContainer("ollama", "pull", "moondream");

At this point, you have the moondream model ready to be used via the Ollama API. 

Excited to try it out? Hold on for a bit. This model is running in a container, so what happens if the container dies? Will you need to spin up a new container and pull the model again? Ideally not, as these models can be quite large.

Thankfully, Testcontainers makes it easy to handle this scenario, by providing an easy-to-use API to commit a container image programmatically:

public void createImage(String imageName) {
    var ollama = new OllamaContainer("ollama/ollama:0.1.44");
    ollama.start();
    ollama.execInContainer("ollama", "pull", "moondream");
    ollama.commitToImage(imageName);
}

This code creates an image from the container with the model included. In subsequent runs, you can create a container from that image, and the model will already be present. Here’s the pattern:

var imageName = "tc-ollama-moondream";
var ollama = new OllamaContainer(DockerImageName.parse(imageName)
        .asCompatibleSubstituteFor("ollama/ollama:0.1.44"));
try {
    ollama.start();
} catch (ContainerFetchException ex) {
    // If image doesn't exist, create it. Subsequent runs will reuse the image.
    createImage(imageName);
    ollama.start();
}

Now, you have a model ready to be used, and because it is running in Ollama, you can interact with its API:

var image = getImageInBase64("/whale.jpeg");
String response = given()
        .baseUri(ollama.getEndpoint())
        .header(new Header("Content-Type", "application/json"))
        .body(new CompletionRequest("moondream:latest", "Describe the image.", Collections.singletonList(image), false))
        .post("/api/generate")
        .getBody().as(CompletionResponse.class).response();

System.out.println("Response from LLM " + response);

Using Hugging Face models

The previous example demonstrated using a model already provided by Ollama. However, with the ability to use Hugging Face models in Ollama, your available model options have now expanded by thousands. 

To use a model from Hugging Face in Ollama, you need a GGUF file for the model. Currently, there are 20,647 models available in GGUF format. How cool is that?

The steps to run a Hugging Face model in Ollama are straightforward, but we’ve simplified the process further by scripting it into a custom OllamaHuggingFaceContainer. Note that this custom container is not part of the default library, so you can copy and paste the implementation of OllamaHuggingFaceContainer and customize it to suit your needs.

To run a Hugging Face model, do the following:

public void createImage(String imageName, String repository, String model) {
    var hfModel = new OllamaHuggingFaceContainer.HuggingFaceModel(repository, model);
    var huggingFaceContainer = new OllamaHuggingFaceContainer(hfModel);
    huggingFaceContainer.start();
    huggingFaceContainer.commitToImage(imageName);
}

By providing the repository name and the model file as shown, you can run Hugging Face models in Ollama via Testcontainers. 

You can find an example using an embedding model and an example using a chat model on GitHub.

Customize your container

One key strength of using Testcontainers is its flexibility in customizing container setups to fit specific project needs by encapsulating complex setups into manageable containers. 

For example, you can create a custom container tailored to your requirements. Here’s an example of TinyLlama, a specialized container for spinning up the DavidAU/DistiLabelOrca-TinyLLama-1.1B-Q8_0-GGUF model from Hugging Face:

public class TinyLlama extends OllamaContainer {

    private final String imageName;

    public TinyLlama(String imageName) {
        super(DockerImageName.parse(imageName)
                .asCompatibleSubstituteFor("ollama/ollama:0.1.44"));
        this.imageName = imageName;
    }

    public void createImage(String imageName) {
        var ollama = new OllamaContainer("ollama/ollama:0.1.44");
        ollama.start();
        try {
            ollama.execInContainer("apt-get", "update");
            ollama.execInContainer("apt-get", "upgrade", "-y");
            ollama.execInContainer("apt-get", "install", "-y", "python3-pip");
            ollama.execInContainer("pip", "install", "huggingface-hub");
            ollama.execInContainer(
                    "huggingface-cli",
                    "download",
                    "DavidAU/DistiLabelOrca-TinyLLama-1.1B-Q8_0-GGUF",
                    "distilabelorca-tinyllama-1.1b.Q8_0.gguf",
                    "--local-dir",
                    "."
            );
            ollama.execInContainer(
                    "sh",
                    "-c",
                    String.format("echo '%s' > Modelfile", "FROM distilabelorca-tinyllama-1.1b.Q8_0.gguf")
            );
            ollama.execInContainer("ollama", "create", "distilabelorca-tinyllama-1.1b.Q8_0.gguf", "-f", "Modelfile");
            ollama.execInContainer("rm", "distilabelorca-tinyllama-1.1b.Q8_0.gguf");
            ollama.commitToImage(imageName);
        } catch (IOException | InterruptedException e) {
            throw new ContainerFetchException(e.getMessage());
        }
    }

    public String getModelName() {
        return "distilabelorca-tinyllama-1.1b.Q8_0.gguf";
    }

    @Override
    public void start() {
        try {
            super.start();
        } catch (ContainerFetchException ex) {
            // If image doesn't exist, create it. Subsequent runs will reuse the image.
            createImage(imageName);
            super.start();
        }
    }
}

Once defined, you can easily instantiate and utilize your custom container in your application:

var tinyLlama = new TinyLlama("example");
tinyLlama.start();
String response = given()
        .baseUri(tinyLlama.getEndpoint())
        .header(new Header("Content-Type", "application/json"))
        .body(new CompletionRequest(tinyLlama.getModelName() + ":latest", List.of(new Message("user", "What is the capital of France?")), false))
        .post("/api/chat")
        .getBody().as(ChatResponse.class).message.content;
System.out.println("Response from LLM " + response);

Note how all the implementation details are hidden behind the TinyLlama class; the end user doesn’t need to know how to install the model into Ollama, what GGUF is, or that getting huggingface-cli requires pip install huggingface-hub.

Advantages of this approach

  • Programmatic access: Developers gain seamless programmatic access to the Hugging Face ecosystem.
  • Reproducible configuration: All configuration, from setup to lifecycle management is codified, ensuring reproducibility across team members and CI environments.
  • Familiar workflows: By using containers, developers familiar with containerization can easily integrate AI/ML models, making the process more accessible.
  • Automated setups: Provides a straightforward clone-and-run experience for developers.

This approach leverages the strengths of both Hugging Face and Ollama, supported by the automation and encapsulation provided by the Testcontainers module, making powerful AI tools more accessible and manageable for developers across different ecosystems.

Conclusion

Integrating AI models into applications need not be a daunting task. By leveraging Ollama and Testcontainers, developers can seamlessly incorporate Hugging Face models into their projects with minimal effort. This approach not only simplifies setting up the development environment but also ensures reproducibility and ease of use. With the ability to programmatically manage models and containerize them for consistent environments, developers can focus on building innovative solutions without getting bogged down by complex setup procedures.

The combination of Ollama’s support for Hugging Face models and Testcontainers’ robust container management capabilities provides a powerful toolkit for modern AI development. As AI continues to evolve and expand, these tools will play a crucial role in making advanced models accessible and manageable for developers across various fields. So, dive in, experiment with different models, and unlock the potential of AI in your applications today.

Stay current on the latest Docker news. Subscribe to the Docker Newsletter.

Learn more

Using Generative AI to Create Runnable Markdown

Par : Docker Labs
1 juillet 2024 à 13:00

This ongoing GenAI Docker Labs series will explore the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real-time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing things as open source so you can play, explore, and hack with us, too.

Generative AI (GenAI) is changing how we interact with tools. Today, we might experience this predominantly through the use of new AI-powered chat assistants, but there are other opportunities for generative AI to improve the life of a developer.

When developers start working on a new project, they need to get up to speed on the tools used in that project. A common practice is to document these practices in a project README.md and to version that documentation along with the project. 

Can we use generative AI to generate this content? We want this content to represent best practices for how tools should be used in general but, more importantly, how tools should be used in this particular project.

We can think of this as a kind of conversation between developers, agents representing tools used by a project, and the project itself. Let’s look at this for the Docker tool itself.


Generating Markdown in VSCode

For this project, we have written a VSCode extension that adds one new command called “Generate a runbook for this project.” Figure 1 shows it in action:

Animated gif showing VSCode extension to generate a runbook for this project.
Figure 1: VSCode extension to generate a runbook.

This approach combines prompts written by tool experts with knowledge about the project itself. This combined context improves the LLM’s ability to generate documentation (Figure 2).

 Illustration showing process flow from expert prompts plus project facts to LLM.
Figure 2: This approach combines expert prompts with knowledge about the project itself.

Although we’re illustrating this idea on a tool that we know very well (Docker!), the idea of generating content in this manner is quite generic. The prompts we used for getting started with the Docker build, run, and compose are available from GitHub. There is certainly an art to writing these prompts, but we think that tool experts have the right knowledge to create prompts of this kind, especially if AI assistants can then help them make their work easier to consume.

There is also an essential point here. If we think of the project as a database from which we can retrieve context, then we’re effectively giving an LLM the ability to retrieve facts about the project. This allows our prompts to depend on local context. For a Docker-specific example, we might want to prompt the AI to not talk about compose if the project has no compose.yaml files. 

“I am not using Docker Compose in this project.”

That turns out to be a transformative user prompt if it’s true. This is what we’d normally learn through a conversation. However, there are certain project details that are always useful. This is why having our assistants right there in the local project can be so helpful.

Runnable Markdown

Although Markdown files are mainly for reading, they often contain runnable things. LLMs converse with us in text that often contains code blocks that represent actual runnable commands. And, in VSCode, developers use the embedded terminal to run commands against the currently open project. Let’s short-circuit this interaction and make commands runnable directly from these Markdown runbooks.

In the current extension, we’ve added a code action to every code block that contains a shell command so that users can launch that command in the embedded terminal. During our exploration of this functionality, we have found that treating the Markdown file as a kind of REPL (read-eval-print-loop) can help to refine the output from the LLM and improve the final content. Figure 3 shows what this looks like in action:

 Animated gif showing addition of code action that contains a shell command so users can launch that command in the embedded terminal.
Figure 3: Adding code to allow users to launch the command in the embedded terminal.

Markdown extends your editor

In the long run, nobody is going to navigate to a Markdown file in order to run a command. However, we can treat these Markdown files as scripts that create commands for the developer’s edit session. We can even let developers bind them to keystrokes (e.g., type ,b to run the build code block from your project runbook).

In the end, this is just the AI Assistant talking to itself. The Assistant recommends a command. We find the command useful. We turn it into a shortcut. The Assistant remembers this shortcut because it’s in our runbook, and then makes it available whenever we’re developing this project.

Animated gif showing the AI Assistant generating context-aware content.
Figure 4: The Assistant in action.

Figure 4 shows a real feedback loop between the Assistant, the generated content, and the developer that is actually running these commands. 

As developers, we tend to vote with our keyboards. If this command is useful, let’s make it really easy to run! And if it’s useful for me, it might be useful for other members of my team, too.

The GitHub repository and install instructions are ready for you to try today.

For more, see this demo: VSCode Walkthrough of Runnable Markdown from GenAI.

Subscribe to Docker Navigator to stay current on the latest Docker news.

Learn more

Build Your Own AI-Driven Code Analysis Chatbot for Developers with the GenAI Stack

6 juin 2024 à 14:26

The topic of GenAI is everywhere now, but even with so much interest, many developers are still trying to understand what the real-world use cases are. Last year, Docker hosted an AI/ML Hackathon, and genuinely interesting projects were submitted. 

In this AI/ML Hackathon post, we will dive into a winning submission, Code Explorer, in the hope that it sparks project ideas for you. 

AI/ML hackathon

For developers, understanding and navigating codebases can be a constant challenge. Even popular AI assistant tools like ChatGPT can fail to understand the context of your projects because they lack access to your code, and they can struggle with complex logic or unique project requirements. Although large language models (LLMs) can be valuable companions during development, they may not always grasp the specific nuances of your codebase. This is where the need for a deeper understanding and additional resources comes in.

Imagine you’re working on a project that queries datasets for both cats and dogs. You already have functional code in DogQuery.py that retrieves dog data using pagination (a technique for fetching data in parts). Now, you want to update CatQuery.py to achieve the same functionality for cat data. Wouldn’t it be amazing if you could ask your AI assistant to reference the existing code in DogQuery.py and guide you through the modification process? 

This is where Code Explorer, an AI-powered chatbot comes in. 

What makes Code Explorer unique?

The following demo, which was submitted to the AI/ML Hackathon, provides an overview of Code Explorer (Figure 1).

Figure 1: Demo of the Code Explorer extension as submitted to the AI/ML Hackathon.

Code Explorer helps you find answers about your code by searching relevant information based on the programming language and folder location. Unlike general-purpose chatbots, Code Explorer goes beyond generic coding knowledge. It leverages a powerful AI technique called retrieval-augmented generation (RAG) to understand your code’s specific context. This allows it to provide more relevant and accurate answers based on your actual project.

Code Explorer supports a variety of programming languages, such as *.swift, *.py, *.java, *.cs, etc. This tool can be useful for learning or debugging your code projects, such as Xcode projects, Android projects, AI applications, web dev, and more.

Benefits of Code Explorer include:

  • Effortless learning: Explore and understand your codebase more easily.
  • Efficient debugging: Troubleshoot issues faster by getting insights from your code itself.
  • Improved productivity: Spend less time deciphering code and more time building amazing things.
  • Supports various languages: Works with popular languages like Python, Java, Swift, C#, and more.

Use cases include:

  • Understanding complex logic: “Explain how the calculate_price function interacts with the get_discount function in billing.py.”
  • Debugging errors: “Why is my getUserData function in user.py returning an empty list?”
  • Learning from existing code: “How can I modify search.py to implement pagination similar to search_results.py?”

How does it work?

Code Explorer leverages the power of a RAG-based AI framework, providing context about your code to an existing LLM model. Figure 2 shows the magic behind the scenes.

Flow diagram detailing the three main steps of Code Explorer (Step 1: Process documents, Step 2: Create LLM chains, Step 3: User asks questions and AI chatbot responds), along with the Docker services used.
Figure 2: Diagram of Code Explorer steps.

Step 1. Process documents

The user selects a codebase folder through the Streamlit app. The process_documents function in the file db.py is called. This function performs the following actions:

  1. Parsing code: It reads and parses the code files within the selected folder. This involves using language-specific parsers (e.g., ast module for Python) to understand the code structure and syntax.
  2. Extracting information: It extracts relevant information from the code, such as:
    • Variable names and their types
    • Function names, parameters, and return types
    • Class definitions and properties
    • Code comments and docstrings
  3. Documents are loaded and chunked: It creates a RecursiveCharacterTextSplitter object based on the language. This object splits each document into smaller chunks of a specified size (5000 characters) with some overlap (500 characters) for better context.
  4. Creating Neo4j vector store: It creates a Neo4j vector store, a type of database that stores and connects code elements using vectors. These vectors represent the relationships and similarities between different parts of the code (see the sketch after this list).
    • Each code element (e.g., function, variable) is represented as a node in the Neo4j graph database.
    • Relationships between elements (e.g., function call, variable assignment) are represented as edges connecting the nodes.
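
As a rough sketch of the chunking and indexing described above, the relevant part of db.py might look something like this (the LangChain class names are real, but the helper name, parameter values, and connection details here are assumptions):

from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Neo4jVector


def chunk_and_index(documents, embeddings, language=Language.PYTHON):
    """Split code documents into overlapping chunks and index them in Neo4j."""
    splitter = RecursiveCharacterTextSplitter.from_language(
        language=language, chunk_size=5000, chunk_overlap=500
    )
    chunks = splitter.split_documents(documents)

    # Store chunk embeddings in a Neo4j vector index (connection values assumed)
    return Neo4jVector.from_documents(
        chunks,
        embeddings,
        url="neo4j://database:7687",
        username="neo4j",
        password="password",
    )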

Step 2. Create LLM chains

This step is triggered only after the codebase has been processed (Step 1).

Two LLM chains are created:

  • Create Documents QnA chain: This chain allows users to talk to the chatbot in a question-and-answer style. It consults the vector database, and therefore the source code files, when answering coding questions.
  • Create Agent chain: A separate Agent chain is created, which uses the QnA chain as a tool. You can think of it as an additional layer on top of the QnA chain that allows you to communicate with the chatbot more casually. Under the hood, the chatbot may ask the QnA chain for help with the coding question, effectively one AI discussing the user’s question with another AI before returning the final answer. In testing, the agent tends to summarize rather than give a detailed technical response, in contrast to the QnA chain on its own.

Langchain is used to orchestrate the chatbot pipeline/flow.
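
The exact chain construction lives in chains.py, but a generic question-and-answer chain over the vector store can be sketched as follows (the model name, base URL, and vector_store variable are assumptions carried over from the previous sketch):

from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

llm = Ollama(model="codellama:7b-instruct", base_url="http://host.docker.internal:11434")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),  # vector_store returned by chunk_and_index above
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "How does pagination work in DogQuery.py?"})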

Step 3. User asks questions and AI chatbot responds

The Streamlit app provides a chat interface for users to ask questions about their code. User inputs are stored and used to query the LLM or the QA/Agent chains. Based on the following factors, the app chooses how to answer the user (a sketch of this dispatch logic follows the list):

  • Codebase processed:
    • Yes: The QA RAG chain is used if the user has selected Detailed mode in the sidebar. This mode leverages the processed codebase for in-depth answers.
    • Yes: A custom agent logic (using the get_agent function) is used if the user has selected Agent mode. This mode might provide more concise answers compared to the QA RAG model.
  • Codebase not processed:
    • The LLM chain is used directly if the user has not processed the codebase yet.
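
Put together, the selection logic amounts to something like the following simplified sketch (function and variable names are illustrative, not the project's actual code):

def answer_question(question: str, codebase_processed: bool, detailed_mode: bool) -> str:
    """Pick the right chain for the current sidebar settings."""
    if not codebase_processed:
        return llm_chain.invoke({"question": question})  # plain LLM, no code context
    if detailed_mode:
        return qa_chain.invoke({"query": question})  # QA RAG chain over the codebase
    return get_agent().run(question)  # Agent mode, typically more concise answers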

Getting started

To get started with Code Explorer, make sure you have Docker Desktop installed. Then, complete the four steps explained below.

1. Clone the repository

Open a terminal window and run the following command to clone the sample application.

git clone https://github.com/dockersamples/CodeExplorer

You should now have the following files in your CodeExplorer directory:

tree
.
├── LICENSE
├── README.md
├── agent.py
├── bot.Dockerfile
├── bot.py
├── chains.py
├── db.py
├── docker-compose.yml
├── images
│   ├── app.png
│   └── diagram.png
├── pull_model.Dockerfile
├── requirements.txt
└── utils.py

2 directories, 13 files

2. Create environment variables

Before running the GenAI stack services, open the .env file and modify the following variables according to your needs. This file stores environment variables that influence your application's behavior.

OPENAI_API_KEY=sk-XXXXX
LLM=codellama:7b-instruct
OLLAMA_BASE_URL=http://host.docker.internal:11434
NEO4J_URI=neo4j://database:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=XXXX
EMBEDDING_MODEL=ollama
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true # false
LANGCHAIN_PROJECT=default
LANGCHAIN_API_KEY=ls__cbaXXXXXXXX06dd

Note:

  • If using EMBEDDING_MODEL=sentence_transformer, uncomment code in requirements.txt and chains.py. It was commented out to reduce code size.
  • Make sure to set the OLLAMA_BASE_URL=http://llm:11434 in the .env file when using the Ollama Docker container. If you’re running on Mac, set OLLAMA_BASE_URL=http://host.docker.internal:11434 instead.

3. Build and run Docker GenAI services

Run the following command to build and bring up Docker Compose services:

docker compose --profile linux up --build

This gets the following output:

[+] Running 5/5
 ✔ Network codeexplorer_net             Created                                              0.0s
 ✔ Container codeexplorer-database-1    Created                                              0.1s
 ✔ Container codeexplorer-llm-1         Created                                              0.1s
 ✔ Container codeexplorer-pull-model-1  Created                                              0.1s
 ✔ Container codeexplorer-bot-1         Created                                              0.1s
Attaching to bot-1, database-1, llm-1, pull-model-1
llm-1         | Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
llm-1         | Your new public key is:
llm-1         |
llm-1         | ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEM2BIxSSje6NFssxK7J1+X+46n+cWTQufEQjMUzLGC
llm-1         |
llm-1         | 2024/05/23 15:05:47 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
llm-1         | time=2024-05-23T15:05:47.265Z level=INFO source=images.go:704 msg="total blobs: 0"
llm-1         | time=2024-05-23T15:05:47.265Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
llm-1         | time=2024-05-23T15:05:47.265Z level=INFO source=routes.go:1054 msg="Listening on [::]:11434 (version 0.1.38)"
llm-1         | time=2024-05-23T15:05:47.266Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2106292006/runners
pull-model-1  | pulling ollama model codellama:7b-instruct using http://host.docker.internal:11434
database-1    | Installing Plugin 'apoc' from /var/lib/neo4j/labs/apoc-*-core.jar to /var/lib/neo4j/plugins/apoc.jar
database-1    | Applying default values for plugin apoc to neo4j.conf
pulling manifest
pull-model-1  | pulling 3a43f93b78ec... 100% ▕████████████████▏ 3.8 GB
pulling manifest
pulling manifest
pull-model-1  | pulling 3a43f93b78ec... 100% ▕████████████████▏ 3.8 GB
pull-model-1  | pulling 8c17c2ebb0ea... 100% ▕████████████████▏ 7.0 KB
pull-model-1  | pulling 590d74a5569b... 100% ▕████████████████▏ 4.8 KB
pull-model-1  | pulling 2e0493f67d0c... 100% ▕████████████████▏   59 B
pull-model-1  | pulling 7f6a57943a88... 100% ▕████████████████▏  120 B
pull-model-1  | pulling 316526ac7323... 100% ▕████████████████▏  529 B
pull-model-1  | verifying sha256 digest
pull-model-1  | writing manifest
pull-model-1  | removing any unused layers
pull-model-1  | success
llm-1         | time=2024-05-23T15:05:52.802Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
llm-1         | time=2024-05-23T15:05:52.806Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="7.7 GiB" available="2.5 GiB"
pull-model-1 exited with code 0
database-1    | 2024-05-23 15:05:53.411+0000 INFO  Starting...
database-1    | 2024-05-23 15:05:53.933+0000 INFO  This instance is ServerId{ddce4389} (ddce4389-d9fd-4d98-9116-affa229ad5c5)
database-1    | 2024-05-23 15:05:54.431+0000 INFO  ======== Neo4j 5.11.0 ========
database-1    | 2024-05-23 15:05:58.048+0000 INFO  Bolt enabled on 0.0.0.0:7687.
database-1    | [main] INFO org.eclipse.jetty.server.Server - jetty-10.0.15; built: 2023-04-11T17:25:14.480Z; git: 68017dbd00236bb7e187330d7585a059610f661d; jvm 17.0.8.1+1
database-1    | [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.h.MovedContextHandler@7c007713{/,null,AVAILABLE}
database-1    | [main] INFO org.eclipse.jetty.server.session.DefaultSessionIdManager - Session workerName=node0
database-1    | [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@5bd5ace9{/db,null,AVAILABLE}
database-1    | [main] INFO org.eclipse.jetty.webapp.StandardDescriptorProcessor - NO JSP Support for /browser, did not find org.eclipse.jetty.jsp.JettyJspServlet
database-1    | [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@38f183e9{/browser,jar:file:/var/lib/neo4j/lib/neo4j-browser-5.11.0.jar!/browser,AVAILABLE}
database-1    | [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@769580de{/,null,AVAILABLE}
database-1    | [main] INFO org.eclipse.jetty.server.AbstractConnector - Started http@6bd87866{HTTP/1.1, (http/1.1)}{0.0.0.0:7474}
database-1    | [main] INFO org.eclipse.jetty.server.Server - Started Server@60171a27{STARTING}[10.0.15,sto=0] @5997ms
database-1    | 2024-05-23 15:05:58.619+0000 INFO  Remote interface available at http://localhost:7474/
database-1    | 2024-05-23 15:05:58.621+0000 INFO  id: F2936F8E5116E0229C97F43AD52142685F388BE889D34E000D35E074D612BE37
database-1    | 2024-05-23 15:05:58.621+0000 INFO  name: system
database-1    | 2024-05-23 15:05:58.621+0000 INFO  creationDate: 2024-05-23T12:47:52.888Z
database-1    | 2024-05-23 15:05:58.622+0000 INFO  Started.

The logs indicate that the application has successfully started all its components, including the LLM, Neo4j database, and the main application container. You should now be able to interact with the application through the user interface.

You can view the services via the Docker Desktop dashboard (Figure 3).

Screenshot of Docker Desktop showing Code Explorer running.
Figure 3: The Docker Desktop dashboard showing the running Code Explorer powered with GenAI stack.

The Code Explorer stack consists of the following services:

Bot

  • The bot service is the core application. 
  • Built with Streamlit, it provides the user interface through a web browser. The build section uses a Dockerfile named bot.Dockerfile to build a custom image, containing your Streamlit application code. 
  • This service exposes port 8501, which makes the bot UI accessible through a web browser.

Pull model

  • This service downloads the codellama:7b-instruct model. 
  • The model is based on the Llama2 model, which achieves similar performance to OpenAI’s LLM but is trained with additional code context. 
  • However, codellama:7b-instruct is additionally trained on code-related contexts and fine-tuned to understand and respond in human language. 
  • This specialization makes it particularly adept at handling questions about code.

Note: You may notice that pull-model-1 service exits with code 0, which indicates successful execution. This service is designed just to download the LLM model (codellama:7b-instruct). Once the download is complete, there’s no further need for this service to remain running. Exiting with code 0 signifies that the service finished its task successfully (downloading the model).

Database

  • This service manages a Neo4j graph database.
  • It efficiently stores and retrieves vector embeddings, which represent the code files in a mathematical format suitable for analysis by the LLM model.
  • The Neo4j vector database can be explored at http://localhost:7474 (Figure 4).
 Screenshot of Code Explorer database service showing Neo4j database information.
Figure 4: Neo4j database information.

LLM

  • This service acts as the LLM host, utilizing the Ollama framework.
  • It manages the downloaded LLM model (not the embedding), making it accessible for use by the bot application.

4. Access the application

You can now view your Streamlit app in your browser by accessing http://localhost:8501 (Figure 5).

Screenshot of Code Explorer showing "Process files" option in left sidebar.
Figure 5: View the app.

In the sidebar, enter the path to your code folder and select Process files (Figure 6). Then, you can start asking questions about your code in the main chat.

Screenshot of Code Explorer showing red arrow pointing to running process in left sidebar.
Figure 6: The app is running.

You will find a toggle switch in the sidebar. By default, Detailed mode is enabled. In this mode, the QA RAG chain is used (detailedMode=true). This mode leverages the processed codebase for in-depth answers.

When you toggle the switch to the other mode (detailedMode=false), the Agent chain is selected. This works like one AI discussing the question with another AI to create the final answer. In testing, the agent tends to summarize rather than give a detailed technical response, in contrast to the QA chain.

Here’s a result when detailedMode=true (Figure 7):

Screenshot of Code Explorer response to "What's the purpose of LLMSingleActionAgent() function?" when detailMode=true.
Figure 7: Result when detailedMode=true.

Figure 8 shows a result when detailedMode=false:

Screenshot of Code Explorer response to "What's the purpose of LLMSingleActionAgent() function?" when detailMode=false
Figure 8: Result when detailedMode=false.

Start exploring

Code Explorer, powered by the GenAI Stack, offers a compelling solution for developers seeking AI assistance with coding. This chatbot leverages RAG to delve into your codebase, providing insightful answers to your specific questions. Docker containers ensure smooth operation, while LangChain orchestrates the workflow. Neo4j stores code representations for efficient analysis. 

Explore Code Explorer and the GenAI Stack to unlock the potential of AI in your development journey!

Learn more



Creating AI-Enhanced Document Management with the GenAI Stack

By: Angel Borroy
May 7, 2024 at 14:39

Organizations must deal with countless reports, contracts, research papers, and other documents, but managing, deciphering, and extracting pertinent information from these documents can be challenging and time-consuming. In such scenarios, an AI-powered document management system can offer a transformative solution.

Developing Generative AI (GenAI) technologies with Docker offers endless possibilities: not only summarizing lengthy documents, but also categorizing them, generating detailed descriptions, and even surfacing, through prompts, insights you may have missed. This multi-faceted approach, powered by AI, changes the way organizations interact with textual data, saving both time and effort.

In this article, we’ll look at how to integrate Alfresco, a robust document management system, with the GenAI Stack to open up possibilities such as enhancing document analysis, automating content classification, transforming search capabilities, and more.


High-level architecture of Alfresco document management 

Alfresco is an open source content management platform designed to help organizations manage, share, and collaborate on digital content and documents. It provides a range of features for document management, workflow automation, collaboration, and records management.

You can find the Alfresco Community platform on Docker Hub. The Docker image for the UI, named alfresco-content-app, has more than 10 million pulls, while other core platform services have more than 1 million pulls.

Alfresco Community platform (Figure 1) provides various open source technologies to create a Content Service Platform, including:

  • Alfresco content repository: The core of Alfresco, responsible for storing and managing content. This component exposes a REST API to perform operations in the repository.
  • Database: PostgreSQL, among others, serves as the database management system, storing the metadata associated with a document.
  • Apache Solr: Enhancing search capabilities, Solr enables efficient content and metadata searches within Alfresco.
  • Apache ActiveMQ: As an open source message broker, ActiveMQ enables asynchronous communication between various Alfresco services. Its Messaging API handles asynchronous messages in the repository.
  • UI reference applications: Share and Alfresco Content App provide intuitive interfaces for user interaction and accessibility.

For detailed instructions on deploying Alfresco Community with Docker Compose, refer to the official Alfresco documentation.

 Illustration of Alfresco Community platform architecture, showing PostgreSQL, ActiveMQ, repo, share, Alfresco content app, and more.
Figure 1: Basic diagram for Alfresco Community deployment with Docker.

Why integrate Alfresco with the GenAI Stack?

Integrating Alfresco with the GenAI Stack unlocks a powerful suite of GenAI services, significantly enhancing document management capabilities. Enhancing Alfresco document management with the GenAI Stack services offers several benefits:

  • Use different deployments according to resources available: Docker allows you to easily switch between different Large Language Models (LLMs) of different sizes. Additionally, if you have access to GPUs, you can deploy a container with a GPU-accelerated LLM for faster inference. Conversely, if GPU resources are limited or unavailable, you can deploy a container with a CPU-based LLM.
  • Portability: Docker containers encapsulate the GenAI service, its dependencies, and runtime environment, ensuring consistent behavior across different environments. This portability allows you to develop and test the AI model locally and then deploy it seamlessly to various platforms.
  • Production-ready: The stack provides support for GPU-accelerated computing, making it well suited for deploying GenAI models in production environments. Docker’s declarative approach to deployment allows you to define the desired state of the system and let Docker handle the deployment details, ensuring consistency and reliability.
  • Integration with applications: Docker facilitates integration between GenAI services and other applications deployed as containers. You can deploy multiple containers within the same Docker environment and orchestrate communication between them using Docker networking. This integration enables you to build complex systems composed of microservices, where GenAI capabilities can be easily integrated into larger applications or workflows.

How does it work?

Alfresco provides two main APIs for integration purposes: the Alfresco REST API and the Alfresco Messaging API (Figure 2).

  • The Alfresco REST API provides a set of endpoints that allow developers to interact with Alfresco content management functionalities over HTTP. It enables operations such as creating, reading, updating, and deleting documents, folders, users, groups, and permissions within Alfresco. 
  • The Alfresco Messaging API provides a messaging infrastructure for asynchronous communication built on top of Apache ActiveMQ and follows the publish-subscribe messaging pattern. Integration with the Messaging API allows developers to build event-driven applications and workflows that respond dynamically to changes and updates within the Alfresco Repository.

The Alfresco Repository can be updated with the enrichment data provided by the GenAI Service using both APIs:

  • The Alfresco REST API can be used to retrieve metadata and content from existing repository nodes, send them to the GenAI Service, and write the results back to the nodes. 
  • The Alfresco Messaging API can be used to react to new and updated nodes in the repository and enrich them with the results obtained from the GenAI Service.
 Illustration showing integration of two main Alfresco APIs: REST API and Messaging API.
Figure 2: Alfresco provides two main APIs for integration purposes: the Alfresco REST API and the Alfresco Messaging API.
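To make the REST-based flow concrete, here is a minimal sketch (not code from the project) of a client that downloads a node's content, asks the GenAI Service summary endpoint described below for a summary, and writes the result back to the node's metadata. It assumes Alfresco is reachable at localhost:8080 with the default credentials, and the genai:summary property name is an illustrative stand-in for the custom content model defined in the project's alfresco-ai-model module.

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;

import java.util.Map;

public class SummaryRoundTrip {

    static final String ALFRESCO = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1";
    static final String GENAI = "http://localhost:8506";

    public static void main(String[] args) {
        // Requires a JSON converter (e.g., Jackson) on the classpath, as in any Spring Boot app
        RestTemplate rest = new RestTemplateBuilder().basicAuthentication("admin", "admin").build();
        String nodeId = args[0]; // id of an existing document node in the repository

        // 1. Retrieve the document content through the Alfresco REST API
        byte[] content = rest.getForObject(ALFRESCO + "/nodes/" + nodeId + "/content", byte[].class);

        // 2. Send the content to the GenAI Stack summary endpoint as multipart form data
        MultiValueMap<String, Object> form = new LinkedMultiValueMap<>();
        form.add("file", new ByteArrayResource(content) {
            @Override public String getFilename() { return "file.pdf"; }
        });
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
        Map<?, ?> summary = rest.postForObject(GENAI + "/summary", new HttpEntity<>(form, headers), Map.class);

        // 3. Update the node metadata with the result (genai:summary is an assumed property name)
        Object text = summary.get("summary");
        rest.put(ALFRESCO + "/nodes/" + nodeId,
                Map.of("properties", Map.of("genai:summary", text)));
    }
}

The alfresco-ai-applier iterates over existing nodes in roughly this way, while the alfresco-ai-listener performs the same kind of enrichment in reaction to ActiveMQ events.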

Technically, Docker deployment includes both the Alfresco and GenAI Stack platforms running over the same Docker network (Figure 3). 

The GenAI Stack works as a REST API service with endpoints available at genai:8506, whereas Alfresco uses a REST API client (named alfresco-ai-applier) and a Messaging API client (named alfresco-ai-listener) to integrate with the AI services. Both clients can also be run as containers.

 Illustration of deployment architecture, showing Alfresco and GenAI Stack.
Figure 3: Deployment architecture for Alfresco integration with GenAI Stack services.

The GenAI Stack service provides the following endpoints:

  • summary: Returns a summary of a document together with several tags. It allows some customization, like the language of the response, the number of words in the summary and the number of tags.
  • classify: Returns a term from a list that best matches the document. It requires a list of terms as input in addition to the document to be classified.
  • prompt: Replies to a custom user prompt using retrieval-augmented generation (RAG) for the document to limit the scope of the response.
  • describe: Returns a text description for an input picture.

The implementation of the GenAI Stack services splits the document text into chunks and loads them into the Neo4j vector database to improve the QA chains with embeddings and prevent hallucinations in the response. Pictures are processed using an LLM with a visual encoder (LLaVA) to generate descriptions (Figure 4). Note that the Docker GenAI Stack allows multiple LLMs to be used for different goals.

 Illustration of GenAI Stack Services, showing Document Loader, LLM embeddings, VectorDB, QA Chain, and more.
Figure 4: The GenAI Stack services are implemented using RAG and an LLM with a visual encoder (LLaVA) for describing pictures.

Getting started 

To get started, you need Docker Desktop installed with enough memory allocated to it.

You can check the amount of RAM available to Docker Desktop (reported in bytes) with the following command:

docker info --format '{{json .MemTotal}}'

If the result is under 20 GiB (21,474,836,480 bytes), follow the instructions in the official Docker documentation for your operating system to increase the memory limit for Docker Desktop.

Clone the repository

Use the following command to clone the repository:

git clone https://github.com/aborroy/alfresco-genai.git

The project includes the following components:

  • genai-stack folder uses the https://github.com/docker/genai-stack project to build a REST endpoint that provides AI services for a given document.
  • alfresco folder includes a Docker Compose template to deploy Alfresco Community 23.1.
  • alfresco-ai folder includes a set of projects related to Alfresco integration.
    • alfresco-ai-model defines a custom Alfresco content model to store summaries, terms and prompts to be deployed in Alfresco Repository and Share App.
    • alfresco-ai-applier uses the Alfresco REST API to apply summaries or terms for a populated Alfresco Repository.
    • alfresco-ai-listener listens to messages and generates summaries for created or updated nodes in Alfresco Repository.
  • compose.yaml file describes a deployment for Alfresco and GenAI Stack services using the include directive.

Starting Docker GenAI service

The Docker GenAI Service for Alfresco, located in the genai-stack folder, is based on the Docker GenAI Stack project and provides the AI services (summary, classify, prompt, and describe) as REST endpoints to be consumed by the Alfresco integration.

cd genai-stack

Before running the service, modify the .env file to adjust available preferences:

# Choose any of the on premise models supported by ollama
LLM=mistral
LLM_VISION=llava
# Any language name supported by chosen LLM
SUMMARY_LANGUAGE=English
# Number of words for the summary
SUMMARY_SIZE=120
# Number of tags to be identified with the summary
TAGS_NUMBER=3

Start the Docker Stack using the standard command:

docker compose up --build --force-recreate

After the service is up and ready, the summary REST endpoint becomes accessible. You can test its functionality using a curl command.

Use a local PDF file (file.pdf in the following sample) to obtain a summary and a number of tags.

curl --location 'http://localhost:8506/summary' \
--form 'file=@"./file.pdf"'
{ 
  "summary": " The text discusses...", 
  "tags": " Golang, Merkle, Difficulty", 
  "model": "mistral"
}

Use a local PDF file (file.pdf in the following sample) and a list of terms (such as Japanese or Spanish) to obtain a classification of the document.

curl --location \
'http://localhost:8506/classify?termList=%22Japanese%2CSpanish%22' \
--form 'file=@"./file.pdf"'
{
    "term": " Japanese",
    "model": "mistral"
}

Use a local PDF file (file.pdf in the following sample) and a prompt (such as “What is the name of the son?”) to obtain a response regarding the document.

curl --location \
'http://localhost:8506/prompt?prompt=%22What%20is%20the%20name%20of%20the%20son%3F%22' \
--form 'file=@"./file.pdf"'
{
    "answer": " The name of the son is Musuko.",
    "model": "mistral"
}

Use a local picture file (picture.jpg in the following sample) to obtain a text description of the image.

curl --location 'http://localhost:8506/describe' \
--form 'image=@"./picture.jpg"'
{
    "description": " The image features a man standing... ",
    "model": "llava"
}

Note that, in this case, the LLaVA LLM is used instead of Mistral.

Make sure to stop Docker Compose before continuing to the next step.

Starting Alfresco

The Alfresco Platform, located in the alfresco folder, provides a sample deployment of the Alfresco Repository including a customized content model to store results obtained from the integration with the GenAI Service.

Because we want to run both Alfresco and GenAI together, we’ll use the compose.yaml file located in the project’s main folder.

include:
  - genai-stack/compose.yaml
  - alfresco/compose.yaml
#  - alfresco/compose-ai.yaml

In this step, we're deploying only the GenAI Stack and Alfresco, so make sure to leave the alfresco/compose-ai.yaml line commented out.

Start the stack using the standard command:

docker compose up --build --force-recreate

After the services are up and ready, the Alfresco Repository becomes accessible. You can test the platform by signing in with the default credentials (admin/admin).

Enhancing existing documents within Alfresco 

The AI Applier application, located in the alfresco-ai/alfresco-ai-applier folder, contains a Spring Boot application that retrieves documents stored in an Alfresco folder, obtains the response from the GenAI Service and updates the original document in Alfresco.

Before running the application for the first time, you’ll need to build the source code using Maven.

cd alfresco-ai/alfresco-ai-applier
mvn clean package

With the GenAI Service and the Alfresco Platform up and running from the previous steps, we can upload documents to the Alfresco Shared Files/summary folder and run the program to update the documents with their summaries.

java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:summary \
--applier.action=SUMMARY
...
Processing 2 documents of a total of 2
END: All documents have been processed. The app may need to be executed again for nodes without existing PDF rendition.

Once the process has been completed, every Alfresco document in the Shared Files/summary folder will include the information obtained by the GenAI Stack service: summary, tags, and LLM used (Figure 5).

Screenshot of Document details in Alfresco, showing Document properties, Summary, tags, and LLM used.
Figure 5: The document has been updated in Alfresco Repository with summary, tags and model (LLM).

You can now upload documents to the Alfresco Shared Files/classify folder to prepare the repository for the next step.

The classify action can be applied to documents in the Alfresco Shared Files/classify folder using the following command. The GenAI Service will pick the term from the list (English, Spanish, Japanese) that best matches each document in the folder.

java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:classify \
--applier.action=CLASSIFY \
--applier.action.classify.term.list=English,Spanish,Japanese
...
Processing 2 documents of a total of 2
END: All documents have been processed. The app may need to be executed again for nodes without existing PDF rendition.

Upon completion, every Alfresco document in the Shared Files/classify folder will include the information obtained by the GenAI Stack service: a term from the list of terms and the LLM used (Figure 6).

Screenshot showing document classification update in Alfresco Repository.
Figure 6: The document has been updated in Alfresco Repository with term and model (LLM).

You can now create a new folder named picture under the Shared Files folder and upload pictures to it, preparing the repository for the next step.

To obtain a text description for each image in the Shared Files/picture folder, run the following command:

java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:picture \
--applier.action=DESCRIBE
...
Processing 1 documents of a total of 1
END: All documents have been processed. The app may need to be executed again for nodes without existing PDF rendition.

Following this process, every Alfresco image in the picture folder will include the information obtained by the GenAI Stack service: a text description and the LLM used (Figure 7).

Screenshot showing document description update in Alfresco repository.
Figure 7: The document has been updated in Alfresco Repository with text description and model (LLM).

Enhancing new documents uploaded to Alfresco

The AI Listener application, located in the alfresco-ai/alfresco-ai-listener folder, contains a Spring Boot application that listens to Alfresco messages, obtains the response from the GenAI Service and updates the original document in Alfresco.

Before running the application for the first time, you'll need to build the source code using Maven and build the Docker image.

cd alfresco-ai/alfresco-ai-listener
mvn clean package
docker build . -t alfresco-ai-listener

As we are running the AI Listener application as a container, stop the Alfresco deployment and uncomment the alfresco/compose-ai.yaml line (which adds the alfresco-ai-listener container) in the compose.yaml file.

include:
  - genai-stack/compose.yaml
  - alfresco/compose.yaml
  - alfresco/compose-ai.yaml

Start the stack using the standard command:

docker compose up --build --force-recreate

After the services are up and ready again, the Alfresco Repository becomes accessible. You can verify that the platform is working by signing in with the default credentials (admin/admin).

Summarization

Next, upload a new document and apply the “Summarizable with AI” aspect to the document. After a while, the document will include the information obtained by the GenAI Stack service: summary, tags, and LLM used.

Description

If you want to use AI enhancement, you might want to set up a folder that automatically applies the necessary aspect, instead of doing it manually.

Create a new folder named pictures in Alfresco Repository and create a rule with the following settings in it:

  • Name: description
  • When: Items are created or enter this folder
  • If all criteria are met: All Items
  • Perform Action: Add “Descriptable with AI” aspect

Upload a new picture to this folder. After a while, without manual setting of the aspect, the document will include the information obtained by the GenAI Stack service: description and LLM used.

Classification

Create a new folder named classifiable in the Alfresco Repository. Apply the “Classifiable with AI” aspect to this folder and add a comma-separated list of terms in the “Terms” property (such as English, Japanese, Spanish).

Create a new rule for the classifiable folder with the following settings:

  • Name: classifiable
  • When: Items are created or enter this folder
  • If all criteria are met: All Items
  • Perform Action: Add “Classified with AI” aspect

Upload a new document to this folder. After a while, the document will include the information obtained by the GenAI Stack service: term and LLM used.

A degree of automation can be achieved when using classification with AI. To do this, a simple Alfresco Repository script named classify.js needs to be created in the folder “Repository/Data Dictionary/Scripts” with the following content.

// Move the document into the child folder of the classifiable folder
// whose name matches the value of the genai:term property
document.move(
  document.parent.childByNamePath(
    document.properties["genai:term"]));

Create a new rule for the classifiable folder to apply this script, with the following settings:

  • Name: move
  • When: Items are updated
  • If all criteria are met: All Items
  • Perform Action: Execute classify.js script

Create a child folder of the classifiable folder for each term defined in the “Terms” property. 

When you set up this configuration, any documents uploaded to the folder will automatically be moved to a subfolder based on the identified term. This means that the documents are classified automatically.

Prompting

Finally, to use the prompting GenAI feature, apply the “Promptable with AI” aspect to an existing document. Type your question in the “Question” property.

After a while, the document will include the information obtained by the GenAI Stack service: answer and LLM used.

A new era of document management

By embracing this framework, you can not only unlock a new level of efficiency, productivity, and user experience but also lay the foundation for limitless innovation. With Alfresco and GenAI Stack, the possibilities are endless — from enhancing document analysis and automating content classification to revolutionizing search capabilities and beyond.

If you’re unsure about any part of this process, check out the following video, which demonstrates all the steps live:

Learn more

Video: A live demo of the features provided by the alfresco-genai project (https://github.com/aborroy/alfresco-genai).

A Promising Methodology for Testing GenAI Applications in Java

April 24, 2024 at 16:03

In the vast universe of programming, the era of generative artificial intelligence (GenAI) has marked a turning point, opening up a plethora of possibilities for developers.

Tools such as LangChain4j and Spring AI have democratized access to the creation of GenAI applications in Java, allowing Java developers to dive into this fascinating world. With LangChain4j, for instance, setting up and interacting with large language models (LLMs) has become exceptionally straightforward. Consider the following Java code snippet:

public static void main(String[] args) {
    var llm = OpenAiChatModel.builder()
            .apiKey("demo")
            .modelName("gpt-3.5-turbo")
            .build();
    System.out.println(llm.generate("Hello, how are you?"));
}

This example illustrates how a developer can quickly instantiate an LLM within a Java application. By simply configuring the model with an API key and specifying the model name, developers can begin generating text responses immediately. This accessibility is pivotal for fostering innovation and exploration within the Java community. More than that, we have a wide range of models that can be run locally, and various vector databases for storing embeddings and performing semantic searches, among other technological marvels.
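As an aside, swapping in a locally hosted model is just as direct. Here is a hedged variation of the snippet above using LangChain4j's Ollama integration; the base URL and model name are assumptions that depend on how you run Ollama (for example, as a Docker container):

import dev.langchain4j.model.ollama.OllamaChatModel;

public class HelloLocalLlm {
    public static void main(String[] args) {
        var llm = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434") // default Ollama endpoint; adjust to your setup
                .modelName("mistral")              // any model you have pulled locally
                .build();
        System.out.println(llm.generate("Hello, how are you?"));
    }
}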

Despite this progress, however, we are faced with a persistent challenge: the difficulty of testing applications that incorporate artificial intelligence. This aspect seems to be a field where there is still much to explore and develop.

In this article, I will share a methodology that I find promising for testing GenAI applications.


Project overview

The example project focuses on an application that provides an API for interacting with two AI agents capable of answering questions. 

An AI agent is a software entity designed to perform tasks autonomously, using artificial intelligence to simulate human-like interactions and responses. 

In this project, one agent uses direct knowledge already contained within the LLM, while the other leverages internal documentation to enrich the LLM through retrieval-augmented generation (RAG). This approach allows the agents to provide precise and contextually relevant answers based on the input they receive.

I prefer to omit the technical details about RAG, as ample information is available elsewhere. I’ll simply note that this example employs a particular variant of RAG, which simplifies the traditional process of generating and storing embeddings for information retrieval.

Instead of dividing documents into chunks and making embeddings of those chunks, in this project, we will use an LLM to generate a summary of the documents. The embedding is generated based on that summary.

When the user writes a question, an embedding of the question will be generated and a semantic search will be performed against the embeddings of the summaries. If a match is found, the user’s message will be augmented with the original document.

This way, there’s no need to deal with the configuration of document chunks, worry about setting the number of chunks to retrieve, or worry about whether the way of augmenting the user’s message makes sense. If there is a document that talks about what the user is asking, it will be included in the message sent to the LLM.
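As a rough sketch of this summary-based variant (not the project's actual code), the flow could be wired with LangChain4j roughly as follows; the in-memory store, the OpenAI models, and the sample strings are all illustrative choices:

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

public class SummaryRag {
    public static void main(String[] args) {
        var llm = OpenAiChatModel.builder().apiKey("demo").modelName("gpt-3.5-turbo").build();
        var embeddings = OpenAiEmbeddingModel.builder().apiKey("demo").build();
        var store = new InMemoryEmbeddingStore<TextSegment>();

        // Ingestion: embed a summary of each document, but keep the full document as the payload
        String document = "Testcontainers Desktop can be downloaded from https://testcontainers.com/desktop/ ...";
        String summary = llm.generate("Summarize this document in three sentences:\n" + document);
        store.add(embeddings.embed(summary).content(), TextSegment.from(document));

        // Question time: embed the question and search against the summary embeddings
        String question = "How can I install Testcontainers Desktop?";
        Embedding questionEmbedding = embeddings.embed(question).content();
        List<EmbeddingMatch<TextSegment>> matches = store.findRelevant(questionEmbedding, 1);

        // If a summary matches, augment the user's message with the original document
        String prompt = matches.isEmpty()
                ? question
                : question + "\n\nUse this document to answer:\n" + matches.get(0).embedded().text();
        System.out.println(llm.generate(prompt));
    }
}

The important detail is that the embedding is computed from the summary, while the payload stored next to it, and later injected into the prompt, is the original document.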

Technical stack

The project is developed in Java and utilizes a Spring Boot application with Testcontainers and LangChain4j.

For setting up the project, I followed the steps outlined in Local Development Environment with Testcontainers and Spring Boot Application Testing and Development with Testcontainers.

I also use Testcontainers Desktop to facilitate database access and to verify the generated embeddings, as well as to review the container logs.

The challenge of testing

The real challenge arises when trying to test the responses generated by language models. Traditionally, we could settle for verifying that the response includes certain keywords, which is insufficient and prone to errors.

static String question = "How I can install Testcontainers Desktop?";
@Test
    void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
        String answer  = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
        assertThat(answer).contains("https://testcontainers.com/desktop/");
    }

This approach is not only fragile but also lacks the ability to assess the relevance or coherence of the response.

An alternative is to employ cosine similarity to compare the embeddings of a “reference” response and the actual response, providing a more semantic form of evaluation. 

This method measures the similarity between two vectors/embeddings by calculating the cosine of the angle between them. If both vectors point in the same direction, it means the “reference” response is semantically the same as the actual response.

static String question = "How I can install Testcontainers Desktop?";
static String reference = """
       - Answer must indicate to download Testcontainers Desktop from https://testcontainers.com/desktop/
       - Answer must indicate to use brew to install Testcontainers Desktop in MacOS
       - Answer must be less than 5 sentences
       """;
@Test
    void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
        String answer  = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
        double cosineSimilarity = getCosineSimilarity(reference, answer);
        assertThat(cosineSimilarity).isGreaterThan(0.8);
    }
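The getCosineSimilarity helper is not shown in the article; presumably it holds an embedding model in a field and takes only the two strings. A minimal sketch of the same idea, with the model passed in explicitly and any LangChain4j EmbeddingModel assumed to be available, could look like this:

import dev.langchain4j.model.embedding.EmbeddingModel;

double getCosineSimilarity(EmbeddingModel embeddingModel, String reference, String answer) {
    // Turn both texts into vectors and compute the cosine of the angle between them
    float[] a = embeddingModel.embed(reference).content().vector();
    float[] b = embeddingModel.embed(answer).content().vector();
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

A value close to 1 means the two texts point in nearly the same semantic direction, which is what the isGreaterThan(0.8) assertion relies on.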

However, this method introduces the problem of selecting an appropriate threshold to determine the acceptability of the response, in addition to the opacity of the evaluation process.

Toward a more effective method

The real problem here arises from the fact that answers provided by the LLM are in natural language and non-deterministic. Because of this, using current testing methods to verify them is difficult, as these methods are better suited to testing predictable values. 

However, we already have a great tool for understanding non-deterministic answers in natural language: LLMs themselves. Thus, the key may lie in using one LLM to evaluate the adequacy of responses generated by another LLM. 

This proposal involves defining detailed validation criteria and using an LLM as a “Validator Agent” to determine if the responses meet the specified requirements. This approach can be applied to validate answers to specific questions, drawing on both general knowledge and specialized information.

By incorporating detailed instructions and examples, the Validator Agent can provide accurate and justified evaluations, offering clarity on why a response is considered correct or incorrect.

static String question = "How I can install Testcontainers Desktop?";
    static String reference = """
            - Answer must indicate to download Testcontainers Desktop from https://testcontainers.com/desktop/
            - Answer must indicate to use brew to install Testcontainers Desktop in MacOS
            - Answer must be less than 5 sentences
            """;

    @Test
    void verifyStraightAgentFailsToAnswerHowToInstallTCD() {
        String answer  = restTemplate.getForObject("/chat/straight?question={question}", ChatController.ChatResponse.class, question).message();
        ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
        assertThat(validate.response()).isEqualTo("no");
    }

    @Test
    void verifyRaggedAgentSucceedToAnswerHowToInstallTCD() {
        String answer  = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
        ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
        assertThat(validate.response()).isEqualTo("yes");
    }

We can even test more complex responses where the LLM should suggest a better alternative to the user’s question.

static String question = "How I can find the random port of a Testcontainer to connect to it?";
    static String reference = """
            - Answer must not mention using getMappedPort() method to find the random port of a Testcontainer
            - Answer must mention that you don't need to find the random port of a Testcontainer to connect to it
            - Answer must indicate that you can use the Testcontainers Desktop app to configure fixed port
            - Answer must be less than 5 sentences
            """;

    @Test
    void verifyRaggedAgentSucceedToAnswerHowToDebugWithTCD() {
        String answer  = restTemplate.getForObject("/chat/rag?question={question}", ChatController.ChatResponse.class, question).message();
        ValidatorAgent.ValidatorResponse validate = validatorAgent.validate(question, answer, reference);
        assertThat(validate.response()).isEqualTo("yes");
    }

Validator Agent

The configuration for the Validator Agent doesn’t differ from that of other agents. It is built using the LangChain4j AI Service and a list of specific instructions:

public interface ValidatorAgent {
    @SystemMessage("""
                ### Instructions
                You are a strict validator.
                You will be provided with a question, an answer, and a reference.
                Your task is to validate whether the answer is correct for the given question, based on the reference.
                
                Follow these instructions:
                - Respond only 'yes', 'no' or 'unsure' and always include the reason for your response
                - Respond with 'yes' if the answer is correct
                - Respond with 'no' if the answer is incorrect
                - If you are unsure, simply respond with 'unsure'
                - Respond with 'no' if the answer is not clear or concise
                - Respond with 'no' if the answer is not based on the reference
                
                Your response must be a json object with the following structure:
                {
                    "response": "yes",
                    "reason": "The answer is correct because it is based on the reference provided."
                }
                
                ### Example
                Question: Is Madrid the capital of Spain?
                Answer: No, it's Barcelona.
                Reference: The capital of Spain is Madrid
                ###
                Response: {
                    "response": "no",
                    "reason": "The answer is incorrect because the reference states that the capital of Spain is Madrid."
                }
                """)
    @UserMessage("""
            ###
            Question: {{question}}
            ###
            Answer: {{answer}}
            ###
            Reference: {{reference}}
            ###
            """)
    ValidatorResponse validate(@V("question") String question, @V("answer") String answer, @V("reference") String reference);

    record ValidatorResponse(String response, String reason) {}
}

As you can see, I’m using Few-Shot Prompting to guide the LLM on the expected responses. I also request a JSON format for responses to facilitate parsing them into objects, and I specify that the reason for the answer must be included, to better understand the basis of its verdict.
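The article does not show how validatorAgent is instantiated. With LangChain4j's AI Service support, wiring the interface to a chat model is typically a one-liner; the model configuration below is illustrative:

import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

// Build a proxy that implements ValidatorAgent using the system/user message templates above
ValidatorAgent validatorAgent = AiServices.create(
        ValidatorAgent.class,
        OpenAiChatModel.builder().apiKey("demo").modelName("gpt-3.5-turbo").build());

Because the validate method returns the ValidatorResponse record, LangChain4j can parse the JSON reply directly into that record.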

Conclusion

The evolution of GenAI applications brings with it the challenge of developing testing methods that can effectively evaluate the complexity and subtlety of responses generated by advanced artificial intelligences. 

The proposal to use an LLM as a Validator Agent represents a promising approach, paving the way towards a new era of software development and evaluation in the field of artificial intelligence. Over time, we hope to see more innovations that allow us to overcome the current challenges and maximize the potential of these transformative technologies.

Learn more

Building a Video Analysis and Transcription Chatbot with the GenAI Stack

March 28, 2024 at 14:32

Videos are full of valuable information, but tools are often needed to help find it. From educational institutions seeking to analyze lectures and tutorials to businesses aiming to understand customer sentiment in video reviews, transcribing and understanding video content is crucial for informed decision-making and innovation. Recently, advancements in AI/ML technologies have made this task more accessible than ever. 

Developing GenAI technologies with Docker opens up endless possibilities for unlocking insights from video content. By leveraging transcription, embeddings, and large language models (LLMs), organizations can gain deeper understanding and make informed decisions using diverse and raw data such as videos. 

In this article, we’ll dive into a video transcription and chat project that leverages the GenAI Stack, along with seamless integration provided by Docker, to streamline video content processing and understanding. 


High-level architecture 

The application’s architecture is designed to facilitate efficient processing and analysis of video content, leveraging cutting-edge AI technologies and containerization for scalability and flexibility. Figure 1 shows an overview of the architecture, which uses Pinecone to store and retrieve the embeddings of video transcriptions. 

Two-part illustration showing “yt-whisper” process on the left, which involves downloading audio, transcribing it using Whisper (an audio transcription system), computing embeddings (mathematical representations of the audio features), and saving those embeddings into Pinecone. On the right side (labeled "dockerbot"), the process includes computing a question embedding, completing a chat with the question combined with provided transcriptions and knowledge, and retrieving relevant transcriptions.
Figure 1: Schematic diagram outlining a two-component system for processing and interacting with video data.

The application’s high-level service architecture includes the following:

  • yt-whisper: A local service, run by Docker Compose, that interacts with the remote OpenAI and Pinecone services. Whisper is an automatic speech recognition (ASR) system developed by OpenAI, representing a significant milestone in AI-driven speech processing. Trained on an extensive dataset of 680,000 hours of multilingual and multitask supervised data sourced from the web, Whisper demonstrates remarkable robustness and accuracy in English speech recognition. 
  • Dockerbot: A local service, run by Docker Compose, that interacts with the remote OpenAI and Pinecone services. The service takes the question of a user, computes a corresponding embedding, and then finds the most relevant transcriptions in the video knowledge database. The transcriptions are then presented to an LLM, which takes the transcriptions and the question and tries to provide an answer based on this information.
  • OpenAI: The OpenAI API provides an LLM service, which is known for its cutting-edge AI and machine learning technologies. In this application, OpenAI’s technology is used to generate transcriptions from audio (using the Whisper model) and to create embeddings for text data, as well as to generate responses to user queries (using GPT and chat completions).
  • Pinecone: A vector database service optimized for similarity search, used for building and deploying large-scale vector search applications. In this application, Pinecone is employed to store and retrieve the embeddings of video transcriptions, enabling efficient and relevant search functionality within the application based on user queries.

Getting started

To get started, complete the following steps to clone the repository, add your API keys, and build and run the application.

The application is a chatbot that can answer questions from a video. Additionally, it provides timestamps from the video that can help you find the sources used to answer your question.

Clone the repository 

The next step is to clone the repository:

git clone https://github.com/dockersamples/docker-genai.git

The project contains the following directories and files:

├── docker-genai/
│ ├── docker-bot/
│ ├── yt-whisper/
│ ├── .env.example
│ ├── .gitignore
│ ├── LICENSE
│ ├── README.md
│ └── docker-compose.yaml

Specify your API keys

In the /docker-genai directory, create a text file called .env, and specify your API keys inside. The following snippet shows the contents of the .env.example file that you can refer to as an example.

#-------------------------------------------------------------
# OpenAI
#-------------------------------------------------------------
OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key

#-------------------------------------------------------------
# Pinecone
#--------------------------------------------------------------
PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key

Build and run the application

In a terminal, change directory to your docker-genai directory and run the following command:

docker compose up --build

Next, Docker Compose builds and runs the application based on the services defined in the docker-compose.yaml file. When the application is running, you’ll see the logs of two services in the terminal.

In the logs, you’ll see the services are exposed on ports 8503 and 8504. The two services are complementary to each other.

The yt-whisper service is running on port 8503. This service feeds the Pinecone database with videos that you want to archive in your knowledge database. The next section explores the yt-whisper service.

Using yt-whisper

The yt-whisper service is a YouTube video processing service that uses the OpenAI Whisper model to generate transcriptions of videos and stores them in a Pinecone database. The following steps outline how to use the service.

Open a browser and access the yt-whisper service at http://localhost:8503. Once the application appears, specify a YouTube video URL in the URL field and select Submit. The example shown in Figure 2 uses a video from David Cardozo.

Screenshot showing example of processed content with "download transcription" option for a video from David Cardozo on how to "Develop ML interactive gpu-workflows with Visual Studio Code, Docker and Docker Hub."
Figure 2: A web interface showcasing processed video content with a feature to download transcriptions.

Submitting a video

The yt-whisper service downloads the audio of the video, then uses Whisper to transcribe it into a WebVTT (*.vtt) format (which you can download). Next, it uses the “text-embedding-3-small” model to create embeddings and finally uploads those embeddings into the Pinecone database.

After the video is processed, a video list appears in the web app that informs you which videos have been indexed in Pinecone. It also provides a button to download the transcript.

Accessing Dockerbot chat service

You can now access the Dockerbot chat service on port 8504 and ask questions about the videos as shown in Figure 3.

Screenshot of Dockerbot interaction with user asking a question about Nvidia containers and Dockerbot responding with links to specific timestamps in the video.
Figure 3: Example of a user asking Dockerbot about NVIDIA containers and the application giving a response with links to specific timestamps in the video.

Conclusion

In this article, we explored the exciting potential of GenAI technologies combined with Docker for unlocking valuable insights from video content. We showed how the integration of cutting-edge AI models like Whisper, coupled with efficient database solutions like Pinecone, empowers organizations to transform raw video data into actionable knowledge. 

Whether you’re an experienced developer or just starting to explore the world of AI, the provided resources and code make it simple to embark on your own video-understanding projects. 

Learn more
