You can find the deployed app over here
Transcription is the process of converting audio to text. Although you could implement a Machine Learning model yourself to get the text from audio, doing so is cumbersome:
- Extensive knowledge of audio signal processing is needed to extract features from an audio signal.
- A large amount of data will have to be mined/scraped from various sources
- Knowledge of Machine Learning Libraries such as PyTorch or Tensorflow is required
Fortunately, AssemblyAI offers a free tier that lets us transcribe audio by making just a few requests.
In this article, we will build a web app that can transcribe audio using AssemblyAI and Streamlit, a Python Library to build UIs for Machine Learning Models.
You can find the repo with the entire source code here
Requirements
- An AssemblyAI account (sign up for free here)
- An AssemblyAI API key (you can find it here)
- Basic knowledge of Python 3.5+ (Note: I will be using Python 3.9 for this tutorial)
- Although not required, familiarity with the requests library will be helpful
Libraries we will use
AssemblyAI
AssemblyAI is used to convert audio to text. It provides a REST API that can be used from any language that can make HTTP requests, such as JavaScript, PHP, or Python. We will be using Python to make requests to the API.
Streamlit
Streamlit is an open source app framework for building a UI for Machine Learning Models without needing to know HTML, CSS, or JavaScript. It has an extensive library of pre-built components that can be used to build a simple UI in a matter of minutes.
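To give a sense of how little code Streamlit needs, here is a minimal, self-contained sketch (a hypothetical hello_app.py, separate from our project):

import streamlit as st

# A complete Streamlit app: run it with `streamlit run hello_app.py`
st.header("Hello, Streamlit")    # renders like an <h1> heading
st.text("A UI with no HTML, CSS, or JavaScript")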
Requests
We will be using the Requests library to make requests to AssemblyAI’s REST API.
Python-dotenv
We will be using the Python-dotenv library to read variables from .env files.
Setting up the Project Directory
Create a new folder/directory using the command line
mkdir ASSEMBLYAI
To keep credentials secret, it’s good practice to store them inside a .env file. We can then use the Python-dotenv library to read the credentials from the .env file. We could also store them in environment variables, if preferred.
Inside the new directory you created, ASSEMBLYAI, create two new Python files and a .env file
If using Windows:
New-Item main.py, transcribe.py, .env
And if using macOS or Linux:
touch main.py transcribe.py .env
The file main.py will contain all the code related to the Streamlit UI while the file transcribe.py will contain the helper functions and the code which interacts with AssemblyAI’s API.
You can download a sample MP3 file from here. Name the file “testData” and save it inside the ASSEMBLYAI directory.
Setting up the Project Environment
Ensure you are in the ASSEMBLYAI directory; if you are not, use the cd command to change into it.
cd ASSEMBLYAI
If this is your first time working with virtual environments, you’ll have to install virtualenv
If using Windows:
python -m pip install --user virtualenv
And if using macOS or Linux:
python3 -m pip install --user virtualenv
Next, we need to create a virtual environment by entering the following command on the command line:
If using Windows:
python -m venv venv
And if using macOS or Linux:
python3 -m venv venv
We will then need to activate the local virtual environment with the following command on the command-line:
If using Windows:
venv\Scripts\activate
And if using macOS or Linux:
source venv/bin/activate
For more details on how to setup a virtual environment, refer to this website.
To install the Requests, Streamlit and Python-dotenv libraries, we can enter this one line on the command line:
pip install streamlit requests python-dotenv
This will install the latest versions of the required libraries.
This is how your file structure should look.
Add API Key to .env file
- Open the .env file you created in the “Setting up the Project Directory” section
- Add the following
API_TOKEN="Your API Key"
- Replace the string “Your API Key” with the API Key given to you by AssemblyAI (the snippet below shows a quick way to verify that the key loads)
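To sanity-check the setup, you can read the variable back from a small throwaway script (check_env.py here is a hypothetical file name):

import os
from dotenv import load_dotenv

load_dotenv()                    # reads variables from the .env file
print(os.getenv("API_TOKEN"))    # should print your AssemblyAI API key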
How to clone the repo and run it
- Go to the GitHub repo and download it
- Start a command line and change directory to the downloaded folder
- Set up a virtual environment by following the previous section
- Activate the virtual environment
- Create a .env file inside the downloaded folder and add your API Key (Refer to the previous section)
- To install the required libraries, you can type the names manually and install them or use the requirements.txt file provided
pip install streamlit requests python-dotenv
Or
pip install -r requirements.txt
- Once all the requirements have been successfully installed, type the following command
streamlit run main.py
This should run the webapp. You can try uploading a file.
Transcribing mp3 files
Before building the UI, we will need a few helper functions which we can use to upload the file to AssemblyAI’s server for the model to work on and return the transcribed text.
The code for the helper functions should be written in the transcribe.py file
Import the required modules
This piece of code should be present at the beginning of the transcribe.py file
import os
from dotenv import load_dotenv
import requests
Helper Function 1: Uploading a local audio file to AssemblyAI
The first function we need to write is a way to upload an audio file stored on our local machine. This function should be present inside the transcribe.py file
The AssemblyAI model expects the file to be accessible via a URL. Therefore, we will need to upload the audio file to blob storage to make it accessible via a URL. Fortunately, AssemblyAI provides a quick and easy way to do this.
We need to make a POST request to the following AssemblyAI API endpoint:
https://api.assemblyai.com/v2/upload
The response will contain a temporary URL to the file, which we can then pass to the AssemblyAI `transcript` API endpoint. The URL is a private URL accessible only to the AssemblyAI servers.
All the uploads are immediately deleted after transcription and never stored.
We will use the Requests library that we installed earlier to make the POST request.
def get_url(token, data):
    '''
    Parameter:
        token: The API key
        data: The File Object to upload
    Return Value:
        url: URL of the uploaded file
    '''
    headers = {'authorization': token}
    response = requests.post('https://api.assemblyai.com/v2/upload',
                             headers=headers,
                             data=data)
    url = response.json()["upload_url"]
    print("Uploaded File and got temporary URL to file")
    return url
- The function accepts two parameters: the API token and the file object to be uploaded
- We make a POST request to the above-mentioned AssemblyAI upload API endpoint and include the API token and the file object as part of the request body.
- The response object contains the URL to the uploaded file. This URL is returned by the function (a usage sketch follows below)
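For instance, here is a hedged usage sketch in a throwaway script. It assumes the sample file from earlier is saved as testData.mp3 and that API_TOKEN is set in your .env file:

import os
from dotenv import load_dotenv
from transcribe import get_url

load_dotenv()
token = os.getenv("API_TOKEN")

# Upload the sample file and print its temporary URL
with open("testData.mp3", "rb") as f:
    url = get_url(token, f)
print(url)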
Helper Function 2: Uploading a file for transcription
Now that we have a function to get a URL for our audio file, we will use this URL and make a request to the endpoint which will actually transcribe the file. This function should also be present inside the transcribe.py file
Initially, when we request a transcription the audio file has a status of “queued”. We will talk more about how the file goes from being “queued” to “complete” in the last helper function. For now, we only need to make a request to the Transcription Endpoint along with the URL to the file. We need to make a request to the following AssemblyAI API endpoint:
https://api.assemblyai.com/v2/transcript
This function is pretty similar to the previous function.
def get_transcribe_id(token, url):
    '''
    Parameter:
        token: The API key
        url: URL of the uploaded file
    Return Value:
        id: The transcription ID of the file
    '''
    endpoint = "https://api.assemblyai.com/v2/transcript"
    json = {
        "audio_url": url
    }
    headers = {
        "authorization": token,
        "content-type": "application/json"
    }
    response = requests.post(endpoint, json=json, headers=headers)
    id = response.json()['id']
    print("Made request and file is currently queued")
    return id
- The function accepts two parameters: the API token and the audio file URL from the previous function.
- We make a POST request to the AssemblyAI “transcript” API endpoint. If no audio file is currently in progress, the new file is processed immediately. If there is a transcription in progress, then the new audio file is queued until the previous job is complete.
If you wish to be able to run multiple jobs simultaneously, you will need to upgrade to a premium plan
- The response object will contain the ID of the transcription. This ID along with a separate endpoint will be used to get the status of the transcription.
- The function will then return this ID
Helper Function 3: Downloading an audio transcription
Once we have the transcription ID of an audio file, we can make a GET request to the following AssemblyAI API endpoint to check the status of the transcription
https://api.assemblyai.com/v2/transcript/{transcribe_id}
The status of transcription changes from “queued” to “processing” to “completed” as long as no errors are encountered.
We will need to poll this endpoint until we get a response object with the status “completed”.
We can make use of a while loop to continuously make requests to the endpoint. During each iteration of the loop, we will check the status of the transcription. The loop will keep on running till the status is “completed”. This process of making requests and waiting till the status is complete is known as polling. We will implement this polling feature in the “Building the Streamlit UI” section.
The following function will simply get the current status of the transcription. This function should be present in the transcribe.py file
def get_text(token, transcribe_id):
    '''
    Parameter:
        token: The API key
        transcribe_id: The ID of the file which is being transcribed
    Return Value:
        result: The response object
    '''
    endpoint = f"https://api.assemblyai.com/v2/transcript/{transcribe_id}"
    headers = {
        "authorization": token
    }
    result = requests.get(endpoint, headers=headers).json()
    return result
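Continuing the earlier sketch (same assumptions: testData.mp3 and API_TOKEN in your .env file), the whole pipeline without any UI boils down to this polling loop:

import os
import time
from dotenv import load_dotenv
from transcribe import get_url, get_transcribe_id, get_text

load_dotenv()
token = os.getenv("API_TOKEN")

with open("testData.mp3", "rb") as f:           # upload the sample file
    url = get_url(token, f)
transcribe_id = get_transcribe_id(token, url)   # request the transcription

result = get_text(token, transcribe_id)
while result.get("status") != "completed":      # poll once per second
    time.sleep(1)
    result = get_text(token, transcribe_id)
print(result["text"])                           # the transcribed text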
Helper Function 4: Requesting a transcription from the UI
This function will call the first two helper functions in succession.
This function will also be connected to the “Upload” button in our Streamlit UI. The function has only one parameter: the file object. The function will do the following
- It will load the API token from our .env file.
- It will use the token to call the previously defined functions
- It will return the transcription ID
Below is the code snippet for the function. This function should be present in the transcribe.py file
def upload_file(fileObj):
    '''
    Parameter:
        fileObj: The File Object to transcribe
    Return Value:
        token: The API key
        transcribe_id: The ID of the file which is being transcribed
    '''
    load_dotenv()
    token = os.getenv("API_TOKEN")
    file_url = get_url(token, fileObj)
    transcribe_id = get_transcribe_id(token, file_url)
    return token, transcribe_id
- We will use the load_dotenv() function to load our .env file. Then we will use the getenv() function from the os module to read the API token from the .env file.
- Call the get_url() function with the file object and token as parameters
- Call the get_transcribe_id() function with the token and file_url returned by the get_url() function
- Return the token and the transcription ID
Building the Streamlit UI
Now that we have all the required helper functions, we can begin work on the Streamlit UI.
However, before moving on to the actual code of the Streamlit UI, let’s take a look at the Streamlit components we will be using.
- header(string), subheader(string), text(string) — These components display text of various sizes on our UI. header() can be thought of as the <h1> tag, subheader() as <h2> and text() as <p>
- file_uploader(label) — This creates a button to upload the file. The parameter label is the string to be displayed above the button. It returns a file object. We will use this to accept files from the user
- progress(integer) — Creates a progress bar. The integer has to be between 0 and 100. It represents the percentage of the specified task completed. If we create a for loop with a sleep of 0.1 s between each iteration, we can create a cool progress bar animation (see the sketch after this list)
- spinner(label) — The label is displayed as long as we are inside its code block
- balloons() — This displays balloons, yup it is pretty cool 🎈
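Here is a short, self-contained sketch (a hypothetical demo_components.py, separate from our app) that exercises these components:

import time
import streamlit as st

st.header("Header")          # like <h1>
st.subheader("Subheader")    # like <h2>
st.text("Plain text")        # like <p>

bar = st.progress(0)
with st.spinner("Filling the bar..."):   # label shows while inside this block
    for percent in range(101):
        time.sleep(0.1)                  # 0.1 s between steps animates the bar
        bar.progress(percent)

st.balloons()                # 🎈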
Building the components of the UI
The below code should be written in the main.py file. The main.py file will be the entry point for our web app.
First, we will need to import all our required modules and libraries
import streamlit as st
from transcribe import *
import time
transcribe is the name of the file with our helper functions.
To make sure that you have imported the libraries correctly, you could try running the following command in the command line. Ensure your virtual environment is activated before running the command and that you are currently in the root folder (ASSEMBLYAI)
streamlit run main.py
You should see a blank web app. To re-run the app, you can either click the hamburger menu and select Rerun, or focus the web app and press “Ctrl + R” or “Cmd + R”
Let’s start by creating a header and the upload button.
Type the following code in the main.py file
st.header("Transcribe Audio")
fileObject = st.file_uploader(label="Please upload your file")
After you re-run the app, you should see the following
Initially, the variable fileObject is None; once a file is uploaded, its value is the file object.
if fileObject:
    token, t_id = upload_file(fileObject)
    result = {}
    # Polling: wait in the queue until processing starts
    sleep_duration = 1
    percent_complete = 0
    progress_bar = st.progress(percent_complete)
    st.text("Currently in queue")
    while result.get("status") not in ("processing", "completed"):
        percent_complete += sleep_duration
        time.sleep(sleep_duration)
        # Move the bar 1% per second while queued (capped at 100)
        progress_bar.progress(min(percent_complete, 100))
        result = get_text(token, t_id)
    # Once processing starts, sweep the bar quickly to 100%
    sleep_duration = 0.01
    for percent in range(percent_complete, 101):
        time.sleep(sleep_duration)
        progress_bar.progress(percent)
- Essentially, if the variable fileObject is not None, we call the upload_file function.
- We use a while loop to poll the endpoint.
- A progress bar is created and in every iteration of the while loop, the program sleeps for a second and increments the percentage of the progress bar by 1
- Once the status changes to “processing”, the sleep time is decreased to 0.01 seconds. This results in a pretty cool animation: initially, the progress bar progresses slowly, and once the file is being processed it progresses really quickly
- Once the progress bar is at 100%, we poll the endpoint again, this time checking whether the status is “completed”. We use the spinner() function to show text on the screen while we are polling (see the snippet below)
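The code for this final polling step is not shown above; based on the description, a minimal version (continuing inside the if block) looks like this:

    # Poll again, now waiting for the status to become "completed"
    with st.spinner("Transcribing..."):
        while result.get("status") != "completed":
            time.sleep(1)
            result = get_text(token, t_id)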
    st.balloons()
    st.header("Transcribed Text")
    st.subheader(result['text'])
- Once the status is “completed”, we exit the while loop and show balloons on the screen using the balloons() function
- Finally, we display the transcribed text on the screen
Conclusion
Congratulations! 👏 You have successfully built a web app that can transcribe audio. Here are some additional features you could build on top of the web app:
- AssemblyAI lets the user specify the Acoustic Model and/or Language Model. You could use Streamlit’s select box to build a drop-down feature (see the sketch after this list)
- Add a feature that lets the user record their voice and transcribe it. This might be slightly complicated since Streamlit doesn’t have any in-built components to record voice. However, you can use HTML and JavaScript to build such a feature. Check out this answer for reference.
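For the first idea, the Streamlit side of the drop-down could look like the sketch below. The option labels are purely illustrative, and you would need to check AssemblyAI’s docs for the exact field names to send in the transcript request body:

import streamlit as st

# Hypothetical model picker; map the choice to the matching request field
model_choice = st.selectbox(
    "Choose a model",
    ("Default", "Custom Acoustic Model", "Custom Language Model"),
)
st.text(f"You picked: {model_choice}")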