Streamlit Pytube screncast

How to build a web app to download Youtube videos in 30 lines of code

Posted by

We will be using Streamlit and pytube to build our youtube downloader web app. I will also give a brief overview of the pytube library.

We will be implementing the following features

  • The ability for the user to give the URL as an input
  • If available, the ability to chose between video with audio/audio download

Setup Virtual Environment

Before we start, we will need to set up and activate a virtual environment

pip install virtualenv /* Install virtual environment */
virtualenv venv /* Create a virtual environment */
venv/Scripts/activate /* Activate the virtual environment */

Install Required Libraries

We will need to install the following Libraries

Streamlit

https://www.streamlit.io/

pytube

pytube – pytube 10.1.0 documentation

Type the following command to install the libraries

pip install streamlit, pytube

Step 1: Import Libraries

import streamlit as st
from pytube import YouTube

Import the installed libraries and define aliases where needed.

Step 2: Header and URL Input

Screenshot by Author

We will be using streamlit’s title() and subheader() functions for the text. Both of them take a string as a parameter and display it.

For the user input, we will use the text_input() function. A label can be passed as a parameter.

st.title("Youtube Video Donwloader")
st.subheader("Enter the URL:")
url = st.text_input(label='URL')

We will need to store the URL inside a variable. Every time the user types in an URL and hit’s enter, the app is re-run and the text_input() function will return the input URL

I will be working with the following URL https://www.youtube.com/watch?v=4Q46xYqUwZQ

Step 3: Getting the Data for the Youtube video

First, we will need to create an instance of the YouTube object we imported from pytube. The constructor of Youtube object requires the URL to be passed in as a parameter.

yt = YouTube(url)

The YouTube Object has many useful attributes, some of them are listed below

  • thumbnail_url: The URL to the thumbnail image. This URL can be used by streamlit’s image() function to display an image.
  • title: The title of the youtube video
  • length: The video length in seconds
  • rating: Average rating of the video

The Youtube Object also has an attribute called streams. It is a list of stream objects. We will be discussing them in the next step.

Step 4: Understanding the Stream Object

print(yt.streams)

I have modified the output slightly. I have also truncated the output since it is pretty long.

---------------------- Stream Object 1 -----------------------------
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" type="video">, 

---------------------- Stream Object 2 -----------------------------
<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4
a.40.2" progressive="True" type="video">, 

---------------------- Stream Object 3 -----------------------------
<Stream: itag="299" mime_type="video/mp4" res="1080p" fps="60fps" 
vcodec="avc1.64002a" progressive="False" type="video">, 

---------------------- Stream Object 4 -----------------------------
<Stream: itag="303" mime_type="video/webm" res="1080p" fps="60fps" vcod
ec="vp9" progressive="False" type="video">

We will need to choose one of the streams and use it to download our video/audio. There are two types of streams Progressive and Dynamic Adaptive Streaming over HTTP (DASH). A progressive stream has both the video and audio components while a DASH stream only has one of them. DASH streams are used to download videos in high resolution. The video and audio components will have to be download separately and then merged together using a software. 

If you inspect the Stream object, you will notice each object has a boolean property called progressive. Stream Object 1 and Stream Object 2 have progressive set to True implying it is a progressive stream. Both of them a vcodec property and a acodec property referring to video and audio respectively. 

On the other hand, Stream Object 3 and Stream Object 4 have progressive set to False implying it is a DASH stream. Both only have the vcodec property and do not have the acodec property. Therefore they are only the video components.

Each stream object also has a res propery which is the resolution of the video.

According to pytube’s official docs

The legacy streams that contain the audio and video in a single file (referred to as “progressive download”) are still available, but only for resolutions 720p and below.

Each stream object has a download() method which takes in the path as an optional parameter. If no path is provided, the file will be downloaded in the same folder as your python script.

Step 5: Chosing a Stream Object to Download

As mentioned before, the YouTube object has an attribute called streams. This is a StreamQuery Object. It has quite a few helpful methods to help us get a stream object. 

  • filter(res , progressive , only_audio , only_video): This method returns the list of streams based on the value of the parameters passed. res is a string while the others are boolean. Setting progressive to True will only return the progressive streams. Setting only_audio to True will return the streams with only audio components and similarly setting only_video to True will only return the video components which do not have an audio component. This method returns a StreamQuery object.
  • first(): Returns the first stream object in the StreamQuery Object
  • last(): Returns the last stream object in the StreamQuery Object
  • get_highest_resolution(): Returns progressive stream object with highest resolution
  • get_lowest_resolution(): Returns progressive stream object with lowest resolution
  • get_by_itag(itag): If you want a specific stream object, you can pass it’s itag

Note: The filter() method returns a StreamQuery Object, one of the 5 methods below the filter() method will need to be used to return a Stream Object which we can download.

Step 6: Combining the the above Steps

Screenshot by Author
if url != '':
    yt = YouTube(url)
    st.image(yt.thumbnail_url, width=300)
    st.subheader('''
    {}
    ## Length: {} seconds
    ## Rating: {} 
    '''.format(yt.title , yt.length , yt.rating))
    video = yt.streams
    if len(video) > 0:
        downloaded , download_audio = False , False
        download_video = st.button("Download Video")
        if yt.streams.filter(only_audio=True):
            download_audio = st.button("Download Audio Only")
        if download_video:
            video.get_lowest_resolution().download()
            downloaded = True
        if download_audio:
            video.filter(only_audio=True).first().download()
            downloaded = True
        if downloaded:
            st.subheader("Download Complete")
    else:
        st.subheader("Sorry, this video can not be downloaded")
  • First, we check if the user has input the URL. If we create a Youtube object with an empty URL, it will give an error
  • Then we get the information about the video such as the Thumbnail Image URL, Title, Length of video, and Average Ratings. We use streamlit’s image() method to display the image.
  • Next, we get the StreamQuery Object, some Youtube videos are not available to download and will return an empty StreamQuery Object. Songs like God’s Plan by Drake return an empty StreamQuery. I am not sure why this is the case, do let me know in the comments if you find a workaround 🙂
  • We use streamlit’s button() method to create a button. It returns a boolean method which is initially set to False. Every time the button is clicked, Streamlit re-runs the app and the button() method returns True
  • Based on the button clicked by the user, we either download the video with audio or the audio-only.

Conclusion

We have successfully built a web app to download YouTube videos in 30 lines of code 😃 As mentioned above, some youtube videos return an empty StreamQuery object and as a result, can not be downloaded. Please let me know if you find a work-around or have any speculation as to why this happens.

If you are interested in deploying your streamlit app, check my tutorial below