## Introduction

Since their debut in the ML/AI landscape, large language models (LLMs) have seen widespread adoption, delivering significant value across diverse fields. Today, a variety of LLMs are available, each with unique capabilities and specialized strengths. Because these models have different strengths, integrating several of them into a software product lets you build AI-powered products that adapt to diverse requirements and workloads with greater reliability and robustness, improving the overall user experience.

[Portkey](https://portkey.ai/), a control panel for AI apps, offers a suite of development tools to help with this. Among them is [AI Gateway](https://portkey.ai/features/ai-gateway), which connects, load balances, and manages multiple LLMs through a single, consistent API. Portkey's AI Gateway supports over 100 AI models, offers access to vision, audio, and image generation capabilities, and keeps applications running by allowing model switching during failures.

In this tutorial, you will create a simple LLM querying application with the option to submit questions to two different LLMs, Llama 3.1 hosted by Together AI and Mixtral 8x7B hosted by Groq, using Portkey's AI Gateway.

## Prerequisites

To successfully follow this tutorial, you'll need:

- Node.js and npm installed. The demo app in this tutorial uses version 20 of Node.js.
- A [Together AI](https://api.together.xyz/) account.
- A [Groq](https://console.groq.com/) account.
- A [Koyeb](https://app.koyeb.com/) account.

## Get LLM API Keys

The two LLMs used in this tutorial require valid API keys for access. In this section, you'll obtain the API keys for both.

First, log into your Together AI account. Click the **profile icon** in the top right corner and go to the **settings** page. Then, navigate to the **API KEYS** tab, copy your API key, and store it securely for future use.

Next, log into your Groq account. In the left sidebar, click the **API Keys** link and click the **Create API Key** button to create an API key. Copy your API key and store it securely for future use.

In the next section, you will set up Portkey's AI Gateway using Docker.

## Deploy the AI Gateway

Portkey provides, amongst other options, a [Docker image](https://hub.docker.com/r/portkeyai/gateway) for deploying the AI Gateway. This ready-to-use service provides an authenticated API on port 8787, with endpoints for chat and image features from supported LLMs.

To access the AI Gateway API, you must first deploy the Docker image and start the service. Begin by logging into your [Koyeb control panel](https://app.koyeb.com/) and following these steps:

1. Click the **Create Service** button in the sidebar.
2. Choose the **Docker** web service option.
3. Type `portkeyai/gateway:latest` into the **Docker image** field.
4. Select your preferred instance and region.
5. In the **Exposed ports** section, change the **Port** value to `8787`.
6. Choose a name for your service in the **Service name** section.
7. Click **Deploy**.

Koyeb pulls and runs the AI Gateway Docker image. Once the deployment is finished, copy the service's public URL and save it for later; you'll need it when configuring the demo application.

In the next section, you'll create an `npm` project for the demo application.

## Create a demo project

In this section, you'll set up an `npm` project and install the essential packages for the demo application. To get started, run the following command in your terminal:

```bash copy
mkdir example-portkey
```

The command creates an `example-portkey` directory on your development machine, which will be the application's root directory. Next, run the commands below to initialize a Git repository within the `example-portkey` directory:

```bash copy
cd example-portkey
git init
```

The first command switches your terminal to the `example-portkey` directory, and the second command initializes a Git repository within the directory.

Next, initialize an `npm` project in the root directory by running this command in your terminal:

```bash copy
npm init -y
```

The command above creates an `npm` project with the default configurations in the `example-portkey` directory, creating a `package.json` file in the process. Next, install the required packages by executing the commands below:

```bash copy
npm install axios body-parser ejs express
npm install -D dotenv nodemon
```

These commands install the specified JavaScript packages from the `npm` registry. The `-D` flag in the second command marks its packages as development-only dependencies. The installed packages include:

- `axios`: A promise-based HTTP client for the browser and Node.js.
- `body-parser`: A body parsing middleware for Node.js.
- `ejs`: A JavaScript templating engine.
- `express`: A web framework for Node.js.

The development-only packages include:

- `dotenv`: A package for handling environment variables during development.
- `nodemon`: A package that automatically restarts development servers whenever code changes are detected.

With the packages installed, you've set up an `npm` project for the demo application. Next, you'll configure an Express server for the application.

## Set up the Express server

In this section, you'll configure an Express web server for the demo application.

First, create a file named `index.js` in the root directory. Then, add the following code to that file:

```javascript copy
require('dotenv').config()

const express = require('express')
const path = require('path')
const bodyParser = require('body-parser')

const app = express()
const port = process.env.PORT || 3000

app.use(express.json())
app.use(bodyParser.urlencoded({ extended: true }))

app.set('view engine', 'ejs')
app.set('views', path.join(__dirname, 'views'))

app.get('/', (_req, res) => {
  res.render('index')
})

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`)
})
```

The code begins by importing the following packages:

- `dotenv`: to manage environment variables.
- `express`: to create and manage a web server.
- `path`: to handle file and directory paths.
- `body-parser`: to parse the body of incoming requests.

It then creates an instance of an Express application and sets the server to listen on the port defined by the `PORT` environment variable, using port `3000` if the variable is not set. The server is configured to parse JSON and URL-encoded data, uses `ejs` as the view engine, and looks for EJS templates in the `views` directory.

A route handler is defined for the root path (`/`), which renders the index view when accessed. Finally, the server starts listening for requests on the specified port and logs a confirmation message that it is running.

Now that the Express server is set up, the next section will walk you through creating a page to query the LLMs.

## Set up query page

The LLM query page will include a form with an input field for questions, a dropdown menu to select the LLM, and a submit button. Upon submission, the LLM's response will be displayed on the page.

To begin, create a `views` directory in the root of your project:

```bash copy
mkdir views
```

Inside this new `views` directory, create an `index.ejs` file and add the following code to it:

```html copy
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Portkey Gateway Questionnaire</title>
    <link
      href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"
      rel="stylesheet"
      integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH"
      crossorigin="anonymous"
    />
  </head>

  <body>
    <div class="container py-4">
      <div class="bg-light rounded-3 mb-4 p-5">
        <div class="container-fluid">
          <h1 class="display-5 fw-bold">Ask a Question</h1>
          <div class="col-md-8 fs-6">
            <form id="questionForm" method="POST" action="/ask">
              <div class="mb-3">
                <label for="question">Question</label>
                <input
                  type="text"
                  class="form-control col-6"
                  id="question"
                  name="question"
                  placeholder="Type your question here"
                  required
                />
              </div>
              <div class="mb-3">
                <label for="model">Model</label>
                <select class="form-control col-6" id="model" name="model" required>
                  <option value="together">Together AI</option>
                  <option value="groq">Groq</option>
                </select>
              </div>
              <button type="submit" class="btn btn-primary">Ask</button>
            </form>

            <% if(typeof response !=='undefined' ) {%>
            <h2 class="display-7 fw-bold mt-5">Answer:</h2>
            <p id="answer" class="h-100 text-bg-dark rounded-3 px-3 py-3"><%= response %></p>
            <%}%>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>
```

The code added in the file above provides the HTML structure for the index view, which is rendered by the root route handler. It contains:

- [Bootstrap](https://getbootstrap.com/) for styling.
- An HTML form with an input field and a select dropdown.
- A submit button.
- A section to display the LLM response.

To view the page, modify the `scripts` section of the `package.json` file with the following code:

{/* prettier-ignore-start */}

```json copy
. . .
"scripts": {
  "dev": "nodemon index.js",  // [!code ++]
  "test": "echo \"Error: no test specified\" && exit 1"
}
. . .
```

{/* prettier-ignore-end */}

The code adds a `dev` script for starting the development server. It executes the `index.js` file using `nodemon`.

To run the demo application on your local machine, enter the following command in your terminal:

```bash copy
npm run dev
```

Running the command starts the Express server and shows a message confirming that it's running on the specified port. To view the page, open your web browser and go to `http://localhost:<YOUR_PORT>`. You should see the query form displayed.

In the next section, you'll set up the logic to query the LLMs through the AI Gateway.

## Add LLM querying functionality

The AI Gateway provides a [chat](https://docs.portkey.ai/docs/provider-endpoints/chat) endpoint at `/v1/chat/completions` where you can send POST requests to generate LLM responses for chat conversations. In this section, you'll add a route handler to process form data, call the chat endpoint to get a response, and return the response to the page.

First, create a `.env` file in your root directory and add the code below to the file, substituting your own API keys and gateway URL:

```bash copy
TOGETHER_API_KEY="<YOUR TOGETHER API KEY>"
GROQ_API_KEY="<YOUR GROQ API KEY>"
GATEWAY_URL="<YOUR DEPLOYED AI GATEWAY URL>" # URL without the trailing slash (/)
```

Since the environment variables entered above are sensitive, make sure they aren't committed to your Git history. To prevent this, run the following command in your terminal:

```bash copy
printf "%s\n" ".env" "node_modules" > .gitignore
```

The command creates a `.gitignore` file and adds the `.env` file and `node_modules` directory to it, excluding them from the Git history.

Next, make the following changes to the code in your `index.js` file:

{/* prettier-ignore-start */}

```javascript copy
require('dotenv').config()

const express = require('express')
const path = require('path')
const bodyParser = require('body-parser')
const axios = require('axios') // [!code ++]

const app = express()
const port = process.env.PORT || 3000

// Middleware
app.use(express.json())
app.use(bodyParser.urlencoded({ extended: true }))

// set up EJS as view engine
app.set('view engine', 'ejs')
app.set('views', path.join(__dirname, 'views'))

const MODEL_MAP = { // [!code ++]
  groq: { // [!code ++]
    providerSlug: 'groq', // [!code ++]
    model: 'mixtral-8x7b-32768', // [!code ++]
    apiKey: process.env.GROQ_API_KEY, // [!code ++]
  }, // [!code ++]
  together: { // [!code ++]
    providerSlug: 'together-ai', // [!code ++]
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo', // [!code ++]
    apiKey: process.env.TOGETHER_API_KEY, // [!code ++]
  }, // [!code ++]
} // [!code ++]

app.get('/', (_req, res) => {
  res.render('index')
})

app.post('/ask', async (req, res) => { // [!code ++]
  const { question, model } = req.body // [!code ++]
  const modelInfo = MODEL_MAP[model] // [!code ++]

  if (!modelInfo) { // [!code ++]
    return res.status(400).json({ error: 'Model not found' }) // [!code ++]
  } // [!code ++]

  const { providerSlug: provider, apiKey, model: modelName } = modelInfo // [!code ++]
  const data = { // [!code ++]
    model: modelName, // [!code ++]
    messages: [{ role: 'user', content: question }], // [!code ++]
  } // [!code ++]

  try { // [!code ++]
    const url = `${process.env.GATEWAY_URL}/v1/chat/completions` // [!code ++]
    const response = await axios.post(url, data, { // [!code ++]
      headers: { // [!code ++]
        Authorization: `Bearer ${apiKey}`, // [!code ++]
        'Content-Type': 'application/json', // [!code ++]
        'x-portkey-provider': provider, // [!code ++]
      }, // [!code ++]
    }) // [!code ++]

    res.render('index', { response: `${response.data.choices[0].message.content}` }) // [!code ++]
  } catch (error) { // [!code ++]
    res.status(500).json({ error: error.message }) // [!code ++]
  } // [!code ++]
}) // [!code ++]

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`)
})
```

{/* prettier-ignore-end */}

The modified code imports the `axios` library and defines a `MODEL_MAP` object, which stores the configurations for two LLMs. Each configuration includes the provider's name, the model name, and the API key needed for access.

Next, the code sets up a `POST` route handler for the `/ask` endpoint. When a request is received, it extracts the question and model from the request body. It then looks up the model's configuration in the `MODEL_MAP` object and returns an error if the model is not found.

Afterwards, it creates the request payload for the AI Gateway and sends the request, including the API key in the `Authorization` header and the provider name in the `x-portkey-provider` header.

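Finally, the handler extracts the answer text from the first choice in the AI Gateway's response and renders it in the `index` view. If the request fails, the handler responds with a `500` status and a JSON error message.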

To test the functionality, start the server, open the UI page in your browser, enter a question, choose your preferred LLM, and submit. The response should appear on the page.

In the next section, you will deploy the demo application online on Koyeb.

## Deploy to Koyeb

The demo application is now complete and interacts with the deployed AI Gateway service to answer questions using two different LLMs. The final step is to deploy the demo application to the cloud on Koyeb.

To get started, update the `scripts` section in your `package.json` file with the code below:

{/* prettier-ignore-start */}

```json copy
...
"scripts": {
  "dev": "nodemon index.js",
  "start": "node index.js",  // [!code ++]
  "test": "echo \"Error: no test specified\" && exit 1"
}
...
```

{/* prettier-ignore-end */}

The code above modifies the `scripts` section of the `package.json` file, adding a `start` script which runs the `index.js` file using `node`.

Next, [create a GitHub repository](https://github.com/new) for your code, then use the following commands to push your local code to the repository:

```bash copy
git add --all
git commit -m "Complete AI Gateway powered LLM query app."
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_REPOSITORY_NAME>.git
git branch -M main
git push -u origin main
```

To deploy the code from the GitHub repository, go to the [Koyeb control panel](https://app.koyeb.com/). Then, on the **Overview** page:

1. Click **Create Service** in the left sidebar.
2. Choose the **GitHub** deploy option.
3. Search for and select your repository. Alternatively, you can use the [public example repo](https://github.com/koyeb/example-portkey) for this article by pasting the following in the **Public GitHub repository** field: `https://github.com/koyeb/example-portkey`.
4. Choose your preferred instance and deployment region.
5. Under **Environment variables**, for each variable in your `.env` file:
   - Enter the variable name.
   - Select **Secret** as the type.
   - For the value, click **Create secret**, then specify the secret name and value, and click **Create**.
6. In the **Service name** section, enter a name for the service or use the default.
7. Click **Deploy** to start the deployment.

The Koyeb platform builds and deploys your code, then starts the application using the `start` script from the `package.json` file. You can track the deployment progress through the provided logs. Once the deployment is complete and health checks pass, your application will be up and running.

Click the provided public URL to access your live application.

## Conclusion

In this tutorial, you built a simple application that queries two different LLMs using Portkey's AI Gateway. The AI Gateway offers more than just chat completion, with features like caching, fallbacks, and load balancing. For more details on these features, refer to the [Portkey Gateway documentation](https://docs.portkey.ai/docs/product/ai-gateway).

When your application is deployed from your own repository using the Git deployment option, any code push to the deployed branch will automatically trigger a new build. The changes will go live once the deployment succeeds. If the deployment fails, Koyeb will keep the last successful production deployment active, ensuring your application continues to run without interruption.


Learn how to deploy Portkey Gateway, a request and prompt router for LLMs with a unified API, and build an application that can query more than one LLM easily.

Deploy Portkey Gateway to Koyeb to Streamline Requests to 200+ LLMs


## Introduction

Real-time automated transcription is incredibly useful for anyone who needs to capture spoken content quickly and accurately. Whether you're creating subtitles for videos, transcribing podcast episodes, or documenting meeting notes, having an automated system can save you a lot of time and effort.

In practical terms, automated transcription can be used in various real-world scenarios. For instance, journalists can transcribe interviews on the fly, educators can provide real-time captions for their lectures, and businesses can document conference calls and meetings more efficiently.

[OpenAI Whisper](https://github.com/openai/whisper) is an open-source solution built for this purpose. It uses state-of-the-art machine learning algorithms to transcribe speech with high accuracy, even handling different accents and speaking speeds.

In this tutorial, you will learn how to set up a [Streamlit](https://streamlit.io/) application, integrate OpenAI Whisper for real-time podcast transcription, and deploy the application using Docker and Koyeb, creating a scalable transcription service.

You can consult the [project repository](https://github.com/koyeb/example-whisper-transcription) as you work through this guide. You can deploy the podcast transcription application as built in this tutorial using the [Deploy to Koyeb](https://www.koyeb.com/docs/build-and-deploy/deploy-to-koyeb-button) button below:

[![Deploy to Koyeb](https://www.koyeb.com/static/images/deploy/button.svg)](https://app.koyeb.com/deploy?name=whisper-transcription&type=git&repository=koyeb%2Fexample-whisper-transcription&branch=main&builder=dockerfile&instance_type=gpu-nvidia-rtx-4000-sff-ada&env%5B%5D=&ports=8000%3Bhttp%3B%2F)

## Requirements

To successfully follow this tutorial, you will need the following:

- [Git](https://git-scm.com/) installed
- [FFmpeg](https://ffmpeg.org/) installed
- [Python 3.6](https://www.python.org/downloads/release/python-360/) or later
- A [Koyeb](https://app.koyeb.com/) account

## Demo

Before we jump into the technical details, let me give you a sneak peek of what you will be building in this tutorial:

<iframe
  width="560"
  height="315"
  src="https://www.youtube.com/embed/HlVhwUriUnE"
  title="Using OpenAI Whisper to Transcribe Podcasts"
  frameborder="0"
  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
  referrerpolicy="strict-origin-when-cross-origin"
  allowfullscreen
></iframe>

## Understanding the components

### OpenAI Whisper

OpenAI Whisper is a sophisticated speech-to-text (STT) model designed to transcribe spoken words into written text with high accuracy. Utilizing advanced machine learning algorithms, Whisper is capable of recognizing various accents, dialects, and speaking speeds. It can be integrated into voice assistants, dictation software, and real-time translation services to convert spoken language into text efficiently.

OpenAI Whisper can be used in sectors such as healthcare for medical dictation, in customer service for automated call transcriptions, and in media for generating subtitles for videos and podcasts. Its ability to handle complex speech patterns and languages makes it the go-to service in any application requiring high-quality speech-to-text conversion.

### Streamlit

Streamlit is an open-source Python library designed to create interactive data applications, often referred to as dashboards. It empowers developers to build and share data apps simply and intuitively, eliminating the need for extensive web development expertise.

Streamlit apps are created as Python scripts, which are then executed within the Streamlit environment. The library offers a set of functions for adding interactive elements to the app, such as a file upload button.

## Steps

To build the transcription service, we'll complete the following steps:

1. [Set up the environment](#set-up-the-environment): Start by setting up your project directory, installing necessary dependencies, and configuring environment variables.
2. [Set up Streamlit](#set-up-streamlit): Next, install Streamlit and create the initial user interface for your application.
3. [Transcribe podcasts with OpenAI Whisper](#transcribe-podcasts-with-open-ai-whisper): Use OpenAI Whisper to transcribe podcasts into text with timestamps.
4. [Dockerize the Streamlit application](#dockerize-the-streamlit-application): Create a Dockerfile to containerize your application for consistent deployment.
5. [Deploy to Koyeb](#deploy-to-koyeb): Finally, deploy your application on the Koyeb platform.

## Set up the environment

Let's start by creating a new Streamlit project. To keep your Python dependencies organized, you should create a virtual environment.

First, create and navigate into a local directory:

```bash copy
# Create and move to the new directory
mkdir example-whisper-koyeb-gpu
cd example-whisper-koyeb-gpu
```

Afterwards, create and activate a new virtual environment:

```bash copy
# Create a virtual environment
python -m venv venv

# Activate the virtual environment (Windows)
.\venv\Scripts\activate.bat

# Activate the virtual environment (Linux/macOS)
source ./venv/bin/activate
```

Now, you can install the required dependencies. Create a `requirements.txt` file in your project directory with the following contents:

```none copy
streamlit
openai-whisper
watchdog
```

Pass the file to `pip` to install the dependencies:

```bash copy
pip install -r requirements.txt
```

For the dependencies, we have included Streamlit for building the web app, OpenAI Whisper for real-time transcription, and watchdog to monitor file system events.

Optionally, pin the exact versions that were installed by overwriting `requirements.txt` with the output of `pip freeze`:

```bash copy
pip freeze > requirements.txt
```
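Whisper decodes audio by calling the `ffmpeg` command-line tool, which is why FFmpeg appears in the requirements. As a minimal sketch, the following standard-library check can confirm that the tool is on your `PATH` before you go further. The helper name `find_missing_tools` is hypothetical and not part of this project.

```python
import shutil


def find_missing_tools(tools=("ffmpeg",)):
    """Return the names of required command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print(f"Missing tools: {', '.join(missing)}")
    else:
        print("All required tools are available.")
```

If the check reports `ffmpeg` as missing, install it with your operating system's package manager before running the application.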

Now, let's move on to creating a new Streamlit project.

## Set up Streamlit

In this step, you will set up the Streamlit UI that will allow users to upload an audio file, click a button to start the transcribing process, and finally present the segmented transcriptions in a user-friendly manner. All of the logic for the project will reside in this file, so you can start by creating an `app.py` file with the following code:

{/* prettier-ignore-start */}

```python copy
# File: app.py

import streamlit

stream_button_styles = """
<style>
    header { display: none !important; }
</style>
"""

page_styles = """
<style>
    h1 { font-size: 2rem; font-weight: 700; }
    h2 { font-size: 1.7rem; font-weight: 600; }
    .timestamp { color: gray; font-size: 0.9rem; }
</style>
"""

page_title = "Using OpenAI Whisper to Transcribe Podcasts"

page_description = "A demo showcasing the use of OpenAI Whisper to accurately and efficiently convert spoken content from podcasts into written text."

koyeb_box = "To deploy Whisper within minutes, <a href=\"https://koyeb.com/ai\">Koyeb GPUs</a> provide the easiest and most efficient way. Koyeb offers a seamless platform for deploying AI models, leveraging high-performance GPUs to ensure fast and reliable transcriptions."

step_1 = "1. Upload Podcast"

step_2 = "2. Invoke OpenAI Whisper to transcribe podcast 👇🏻"

step_3 = "3. Transcription &nbsp; 🎉"

def unsafe_html(tag, text):
    return streamlit.markdown(f"<{tag}>{text}</{tag}>", unsafe_allow_html=True)

def main():
    # Set title for the page
    streamlit.set_page_config(page_title, layout="centered")
    # Inject hide buttons CSS
    streamlit.markdown(stream_button_styles, unsafe_allow_html=True)
    # Inject page CSS
    streamlit.markdown(page_styles, unsafe_allow_html=True)
    # Create a H1 heading on the page
    streamlit.title(page_title)
    unsafe_html("h2", page_description)
    unsafe_html("p", koyeb_box)
    audio_file = streamlit.file_uploader(step_1, type=["mp3", "mp4", "wav", "m4a"])
    if audio_file:
        # If file is received
        # Write the file on the server
        # Show next step
        unsafe_html("small", step_2)
        if streamlit.button("Transcribe"):
            # Get the transcription
            unsafe_html("small", step_3)
            # Showcase the transcription

if __name__ == "__main__":
    main()
```

{/* prettier-ignore-end */}

The code above does the following:

- Begins by importing the Streamlit module
- Defines CSS for hiding the navigation bar and styling the headings
- Defines text values for the page's title, description, and each step
- Defines the `unsafe_html` function to dynamically create the HTML tags with content
- Accepts an audio file using Streamlit's built-in `file_uploader` function

With this, you have set up a UI that is able to accept podcast audio files from the user. Now, let's move on to transcribing the audio file obtained.
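Because `unsafe_html` passes its text to Streamlit with `unsafe_allow_html=True`, any markup inside that text is rendered as HTML. That is intended for strings you write yourself, such as `koyeb_box` with its link, but it may be undesirable for text that comes from elsewhere. The sketch below shows one way to escape such text with Python's standard `html` module before wrapping it in a tag. The `safe_tag` helper is a hypothetical name used for illustration and is not part of the tutorial's code.

```python
import html


def safe_tag(tag, text):
    """Wrap text in an HTML tag after escaping special characters in the text."""
    return f"<{tag}>{html.escape(text)}</{tag}>"


# Characters such as & and < are converted to HTML entities
print(safe_tag("p", "Tom & Jerry <live>"))
# prints: <p>Tom &amp; Jerry &lt;live&gt;</p>
```

The returned string could then be handed to `streamlit.markdown` in the same way `unsafe_html` does, assuming you only want plain text inside the tag.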

## Transcribe podcasts with OpenAI Whisper

In this step, you will invoke OpenAI Whisper's base model to transcribe an audio file. By default, the model is able to return the timestamps along with the transcription. This enables you to use the generated transcriptions as subtitles as well. Make the following additions in the `app.py` file:

{/* prettier-ignore-start */}

```python copy
# File: app.py

# Existing imports
# . . .
import whisper # [!code ++]

model = whisper.load_model("base") # [!code ++]

# ...

def unsafe_html(tag, text):
    # ...

# Format a transcription segment as HTML with its timestamps
def timestamp_html(segment): # [!code ++]
    return f'<span class="timestamp">[{segment["start"]:.2f} - {segment["end"]:.2f}]</span> {segment["text"]}' # [!code ++]

# Transcribe an audio file
def transcribe_audio(audio_file): # [!code ++]
    return model.transcribe(audio_file.name) # [!code ++]

# Write the audio file on server
def write_audio(audio_file): # [!code ++]
    with open(audio_file.name, "wb") as f: # [!code ++]
        f.write(audio_file.read()) # [!code ++]

def main():
    # ...
    if audio_file:
        # If file is received
        # Write the file on the server
        write_audio(audio_file) # [!code ++]
        # Show next step
        unsafe_html("small", step_2)
        if streamlit.button("Transcribe"):
            # Get the transcription
            transcript_text = transcribe_audio(audio_file) # [!code ++]
            unsafe_html("small", step_3)
            # Showcase the transcription
            for segment in transcript_text["segments"]: # [!code ++]
                unsafe_html("div", timestamp_html(segment)) # [!code ++]

if __name__ == "__main__":
    main()
```

{/* prettier-ignore-end */}

The changes above do the following:

- Import and instantiate the OpenAI Whisper base model
- Define a `timestamp_html` function to display the transcription with **start** and **end** timestamps
- Define a `transcribe_audio` function which invokes the model to generate transcriptions of the audio file
- Define a `write_audio` function to write the audio file on the server
- Write the uploaded audio file to the server as soon as it is received
- Invoke the `transcribe_audio` and `timestamp_html` functions when the **Transcribe** button is clicked in the UI to generate and display the transcription of the podcast
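Since each segment carries `start` and `end` times in seconds, the same data can be turned into subtitles. The following is a minimal sketch, not part of the tutorial's application, that formats segments as SubRip (`.srt`) text. The function names `srt_timestamp` and `segments_to_srt` are hypothetical, and the sample segments are hand-written stand-ins for `transcript_text["segments"]`.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT subtitle text from segments with start, end, and text keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        # Each SRT block is: counter, time range, text, then a blank line
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


# Hand-written sample segments standing in for transcript_text["segments"]
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the show."},
    {"start": 2.5, "end": 61.04, "text": " Today we talk about GPUs."},
]

print(segments_to_srt(sample_segments))
```

In the application, the resulting string could be offered to the user as a downloadable file, although that step is left out of this sketch.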

Now, you can run the Streamlit application with:

```bash copy
streamlit run ./app.py --server.port 8000
```

The application is now available at `http://localhost:8000`. Test it by uploading one of your favorite podcast files and watching the transcription appear.
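One thing to keep in mind about `write_audio`: it saves each upload in the working directory under the file name supplied by the client, so two uploads with the same name overwrite each other. As a hedged alternative, the sketch below writes the upload to a uniquely named temporary file and returns its path. The `write_audio_to_temp` helper is a hypothetical name, and the `BytesIO` object stands in for a Streamlit upload, which exposes the same `name` attribute and `read()` method used in `app.py`.

```python
import io
import os
import tempfile


def write_audio_to_temp(audio_file):
    """Save an uploaded file-like object to a uniquely named temporary file.

    Returns the path of the temporary file. The caller is responsible for
    deleting it once transcription is finished.
    """
    # Keep only the extension of the client-supplied name;
    # the rest of the name is discarded.
    suffix = os.path.splitext(audio_file.name)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(audio_file.read())
        return tmp.name


# Stand-in for a Streamlit upload: a BytesIO object with a name attribute
fake_upload = io.BytesIO(b"not really audio")
fake_upload.name = "episode.mp3"

path = write_audio_to_temp(fake_upload)
try:
    print(path.endswith(".mp3"))  # prints: True
finally:
    os.remove(path)  # Remove the temporary file when done
```

With this approach, the returned path would be passed to `model.transcribe` in place of `audio_file.name`.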

Next, let's dockerize the application to ensure consistency between multiple deployments.

## Dockerize the Streamlit application

Dockerizing deployments creates a consistent and reproducible environment, ensuring that the application runs the same way on any system. It simplifies dependency management and enhances scalability, making deployments more efficient and reliable. To dockerize, create a `Dockerfile` at the root of your project with the following content:

{/* prettier-ignore-start */}

```dockerfile {16,24,26} copy
FROM python:3.9 AS builder

WORKDIR /app

RUN python3 -m venv venv
ENV VIRTUAL_ENV=/app/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

COPY requirements.txt .
RUN pip install -r requirements.txt

FROM python:3.9 AS runner

WORKDIR /app

RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

COPY --from=builder /app/venv venv
COPY app.py app.py

ENV VIRTUAL_ENV=/app/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

EXPOSE 8000

CMD ["streamlit", "run", "./app.py", "--server.port", "8000"]
```

{/* prettier-ignore-end */}

Beyond what a typical Dockerfile for a Python application contains, this one includes the following additions:

- `RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*` is used to install `ffmpeg`, and then clean up package lists to reduce image size
- `EXPOSE 8000` is used to specify the port on which the Streamlit application will run
- `CMD ["streamlit", "run", "./app.py", "--server.port", "8000"]` is used to define the command to start the Streamlit app on port 8000

With everything configured, let's move on to deploying the application to Koyeb.

## Deploy to Koyeb

Now that you have the application running locally, you can also deploy it on Koyeb and make it available on the internet.

Create a [new repository on your GitHub account](https://github.com/new) so that you can push your code.

You can download a [standard `.gitignore` file](https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore) for Python from GitHub to exclude certain directories and files from being pushed to the repository:

```bash copy
curl -L https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore -o .gitignore
```

Run the following commands in your terminal to commit and push your code to the repository:

```bash copy
git init
git add app.py Dockerfile requirements.txt .gitignore
git commit -m "first commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main
```

You should now have all of your local code in your remote repository. Now it is time to deploy the application.

Within the [Koyeb control panel](https://app.koyeb.com/), while on the Overview tab, initiate the app creation and deployment process by clicking **Create Service** and then choosing **Create web service**.

1. Select **GitHub** as the deployment source.
2. Select your repository from the menu. Alternatively, deploy from the [example repository associated with this tutorial](https://github.com/koyeb/example-whisper-transcription) by entering `https://github.com/koyeb/example-whisper-transcription` in the public repository field.
3. In the **Instance** selection, select a GPU Instance.
4. In the **Builder** section, choose **Dockerfile**.
5. Finally, click the **Deploy** button.

Once the application is deployed, you can visit the Koyeb service URL (ending in `.koyeb.app`) to access the Streamlit application.

## Conclusion

In this tutorial, you built a podcast transcription application with the Streamlit framework and OpenAI Whisper. During the process, you learned how to invoke the OpenAI Whisper model in Python to generate transcription with timestamps, and how to use the Streamlit framework to quickly prototype a user interface with a functioning upload button in a few lines of code.

Because the application was deployed using the Git deployment option, each subsequent push to the deployed branch automatically triggers a new build. Changes go live once the deployment succeeds. If a deployment fails, Koyeb keeps the most recent working production deployment running, so your application stays available.



## Introduction

[Continue](https://continue.dev) is an open-source AI code assistant that connects any models and context to build custom autocomplete prompts and chat experiences inside IDEs such as VS Code and JetBrains.

[Ollama](https://ollama.com/) is a self-hosted AI solution to run open-source large language models on your own infrastructure, and [Codestral](https://ollama.com/library/codestral) is [MistralAI's](https://mistral.ai/) first-ever code model designed for code generation tasks.

In this guide, we will demonstrate how to use Continue with [Ollama](/deploy/ollama), the [Mistral Codestral](https://mistral.ai/news/codestral/) model, and [Koyeb GPUs](/blog/gpus-public-preview-run-ai-workloads-on-h100-a100-l40s-and-more) to build a custom, self-hosted AI code assistant.

When complete, you will have a private AI code assistant for autocomplete prompts and chat available within VS Code and JetBrains.

## Requirements

To successfully follow and complete this guide, you need:

- A [Koyeb account](https://app.koyeb.com) to deploy and run Ollama
- [VS Code](https://code.visualstudio.com/) or [JetBrains](https://www.jetbrains.com/) installed on your machine

## Steps

To complete this guide and build a custom AI code assistant using Continue, Ollama, Codestral, and Koyeb GPUs, you need to follow these steps:

1. [**Deploy Ollama on Koyeb's GPUs**](#deploy-ollama-on-koyebs-gpus)
2. [**Install and configure the Continue package in VS Code**](#install-and-configure-the-continue-package-in-vs-code)
3. [**Get started with your custom AI code assistant**](#get-started-with-your-custom-ai-code-assistant)

## Deploy Ollama on Koyeb's GPUs

To get started, we will deploy Ollama on Koyeb's GPUs. Ollama will be used to run the Mistral Codestral model on a Koyeb RTX 4000 SFF ADA, which is ideal for cost-effective AI inference and running open-source large language models.

To create and deploy Ollama on Koyeb, we will use the [Deploy to Koyeb](https://www.koyeb.com/docs/build-and-deploy/deploy-to-koyeb-button) button below:

[![Deploy to Koyeb](https://www.koyeb.com/static/images/deploy/button.svg)](https://app.koyeb.com/deploy?name=ollama&type=docker&image=ollama%2Follama&command=serve&instance_type=gpu-nvidia-rtx-4000-sff-ada&env%5B%5D=&ports=11434%3Bhttp%3B%2F)

On the service configuration page, you can customize the [Service](/docs/reference/services) name, [Instance](/docs/reference/instances) type, and other settings to match your requirements.

When you are ready, click the **Deploy** button to create the service and start the deployment process.

After a few seconds, your Ollama service will be deployed and running on Koyeb.

The next step is to pull the Mistral Codestral model to use it with Ollama. To do so, retrieve the Service URL from the Koyeb dashboard and run the following command in your terminal:

```bash
curl https://<YOUR_SUBDOMAIN>.koyeb.app/api/pull -d '{
  "name": "codestral"
}'
```

<Admonition>Take care to replace the base URL ending in `koyeb.app` with your actual service URL.</Admonition>

Ollama will pull the Mistral Codestral model and prepare it for use. This might take a few moments. Once it's done, we can move to the next step and configure Continue to use the `ollama` provider.
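To confirm that the pull finished, you can ask Ollama which models it has stored locally. The sketch below assumes your service exposes Ollama's `/api/tags` endpoint at the same base URL used for the pull; the helper names `fetch_tags` and `model_is_available` are hypothetical and only for illustration:

```python
import json
import urllib.request


def model_is_available(tags_payload, model):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    # Ollama lists models with a tag suffix such as "codestral:latest",
    # so compare against both the full name and the part before the colon.
    return any(name == model or name.split(":")[0] == model for name in names)


def fetch_tags(base_url):
    """Fetch the list of locally available models from an Ollama server."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


# Example usage (requires network access to your own service):
# tags = fetch_tags("https://<YOUR_SUBDOMAIN>.koyeb.app")
# print(model_is_available(tags, "codestral"))
```

If the check reports that the model is missing, the pull may still be in progress, so waiting and retrying is a reasonable first step.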

## Install and configure the Continue package in VS Code

With Ollama deployed, we will show how to configure Continue for VS Code to use `ollama` as a provider. For JetBrains, please refer to the [Continue documentation](https://docs.continue.dev/quickstart#jetbrains).

Get started by opening the [Continue VS Code extension](https://marketplace.visualstudio.com/items?itemName=Continue.continue) page, then click the **Install** button to install the extension.

Once the install has completed, open the `~/.continue/config.json` file on your machine and edit it to match the format below:

```json
{
  "models": [
    {
      "title": "Codestral on Koyeb",
      "apiBase": "https://<YOUR_SUBDOMAIN>.koyeb.app/",
      "provider": "ollama",
      "model": "codestral"
    }
  ]
}
```

The above configuration tells Continue to:

1. use the `ollama` provider
2. use the Mistral Codestral model
3. use the Ollama Instance located at the Koyeb Service URL

<Admonition>Take care to replace the `apiBase` value with your Ollama Service URL.</Admonition>
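If you prefer to generate this file content rather than edit it by hand, the structure can be built programmatically. This is a minimal sketch that assumes the model name matches the one pulled earlier (`codestral`); `continue_config` is a hypothetical helper, and the script only prints the JSON so that it does not overwrite an existing `~/.continue/config.json`:

```python
import json


def continue_config(service_url, model="codestral"):
    """Build a Continue config dict with a single Ollama-backed model."""
    return {
        "models": [
            {
                "title": "Codestral on Koyeb",
                # Normalize to exactly one trailing slash, as in the example above.
                "apiBase": service_url.rstrip("/") + "/",
                "provider": "ollama",
                "model": model,
            }
        ]
    }


# Print the JSON so it can be reviewed before being saved.
print(json.dumps(continue_config("https://<YOUR_SUBDOMAIN>.koyeb.app"), indent=2))
```

Printing first makes it easy to merge the `models` entry into a config file that already contains other settings.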

Restart VS Code to apply the changes and get started using the AI code assistant.

## Get started with your custom AI code assistant

Use the following shortcuts to access Continue and interact with the AI code assistant:

- Cmd+L (macOS)
- Ctrl+L (Windows / Linux)

You can now start asking questions about your codebase, get autocomplete suggestions, and more.
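If the assistant does not respond, it can be useful to query the Ollama service directly, outside of the IDE. The sketch below assumes Ollama's `/api/generate` endpoint with streaming disabled and the `codestral` model pulled earlier; `build_generate_request` and `generate` are hypothetical helper names:

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, model="codestral"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, model="codestral"):
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(base_url, prompt, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["response"]


# Example usage (requires network access to your own service):
# print(generate("https://<YOUR_SUBDOMAIN>.koyeb.app", "Write a Python hello world."))
```

A successful reply here suggests the service and model are working and that any remaining issue is likely in the Continue configuration.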

<Banner
  title="Blazing-Fast AI Deployments"
  description="Enjoy automatic continuous deployment, global load balancing, real-time metrics and monitoring, autoscaling, and more."
  type="claim-free"
  buttonText="Deploy Now"
  buttonLink="https://app.koyeb.com/"
/>

## Conclusion

In this guide, we demonstrated how to use Continue, Ollama, MistralAI's Codestral, and Koyeb GPUs to build a custom autocomplete and chat experience inside of VS Code.
This tutorial covers the basics of getting started with Continue. To go further, be sure to check out the [Continue documentation](https://docs.continue.dev/intro).

