Let’s build a web app using Streamlit and sklearn
In this tutorial, we will work with three datasets (Iris, Breast Cancer, Wine).
We will use three different models (KNN, SVM, Random Forest) for classification and let the user set some of their parameters.
Install and Import Necessary Libraries
Setup Virtual Environment
pip install virtualenv          # Install virtualenv
virtualenv venv                 # Create a virtual environment
venv/Scripts/activate           # Activate the virtual environment (Windows)
# On macOS or Linux, activate with: source venv/bin/activate
Install Libraries
Make sure your virtual environment is activated before installing the libraries
pip install streamlit seaborn scikit-learn
Import the Libraries
import streamlit as st
from sklearn.datasets import load_wine, load_breast_cancer, load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
We import streamlit, the datasets from sklearn, various models from sklearn, libraries needed to make our plots and pandas.
Helper Functions
Function to get the dataset
def return_data(dataset):
    if dataset == 'Wine':
        data = load_wine()
    elif dataset == 'Iris':
        data = load_iris()
    else:
        data = load_breast_cancer()
    df = pd.DataFrame(data.data, columns=data.feature_names, index=None)
    df['Type'] = data.target
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=1, test_size=0.2)
    return X_train, X_test, y_train, y_test, df, data.target_names
- The function takes a string containing the name of the dataset the user selects
- It loads the relevant dataset
- We create a dataframe which we can show in our UI
- We use sklearn’s train_test_split() to create the training and test sets, holding out 20% of the samples for testing
- The function returns the training and test features, the training and test labels, the dataframe and the target class names
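To make the split concrete, here is a minimal sketch that applies the same train_test_split() settings to the Iris dataset outside of Streamlit and prints the resulting shapes. It is only an illustration of the step described above, not a replacement for return_data().

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()  # 150 samples, 4 features

# Same settings as return_data(): 20% held out, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=1, test_size=0.2
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(list(data.target_names))
```

train_test_split() also accepts a stratify argument that keeps the class proportions the same in both sets. The helper above does not use it, so treat that as an optional variation.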
Function to return the model
We will be using streamlit’s slider component to get the input for the parameters from the user.
st.sidebar.slider(label='', min_value=1, max_value=100) creates a slider in the sidebar and returns the value the user has currently selected.
def getClassifier(classifier):
    if classifier == 'SVM':
        c = st.sidebar.slider(label='Choose value of C', min_value=0.0001, max_value=10.0)
        model = SVC(C=c)
    elif classifier == 'KNN':
        neighbors = st.sidebar.slider(label='Choose Number of Neighbors', min_value=1, max_value=20)
        model = KNeighborsClassifier(n_neighbors=neighbors)
    else:
        max_depth = st.sidebar.slider('max_depth', 2, 10)
        n_estimators = st.sidebar.slider('n_estimators', 1, 100)
        model = RandomForestClassifier(max_depth=max_depth, n_estimators=n_estimators, random_state=1)
    return model
- Like the previous function, this function takes a parameter which is a string containing the model’s name.
- Based on the selected model, we ask the user to give the value of the parameter.
- For SVM, we take the C parameter as an input from the user
- For KNN, we take the number of nearest neighbours for the model to consider while making its prediction
- For Random Forest, we take the number of decision trees (n_estimators) and the maximum depth of each tree (max_depth)
- We then create an instance of the model and return the model
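The steps above mix widget code with model construction. As a sketch of one possible variation, the model-building part can be pulled into a plain function that takes the parameter values as a dictionary, which makes it easy to try out without launching Streamlit. The name build_classifier and the params dictionary are hypothetical and are not part of the original app.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_classifier(name, params):
    """Hypothetical helper: build a model from its name and a dict of settings."""
    if name == 'SVM':
        return SVC(C=params['C'])
    if name == 'KNN':
        return KNeighborsClassifier(n_neighbors=params['n_neighbors'])
    # Any other name falls through to Random Forest, as in getClassifier().
    return RandomForestClassifier(
        max_depth=params['max_depth'],
        n_estimators=params['n_estimators'],
        random_state=1,
    )


model = build_classifier('KNN', {'n_neighbors': 5})
print(model)
```

In the app, the dictionary values would come from the sliders, for example {'n_neighbors': st.sidebar.slider(...)}.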
Function for PCA
def getPCA(df):
    pca = PCA(n_components=3)
    result = pca.fit_transform(df.loc[:, df.columns != 'Type'])
    df['pca-1'] = result[:, 0]
    df['pca-2'] = result[:, 1]
    df['pca-3'] = result[:, 2]
    return df
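We use sklearn’s PCA to reduce every feature column (everything except 'Type') to 3 components. We add the 3 components to the dataframe as the columns pca-1, pca-2 and pca-3 and return it.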
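To see what the three components capture, here is a small sketch that runs the same PCA settings on the Iris features and prints how much of the variance each component explains. It works on the raw NumPy array rather than the dataframe, purely to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

pca = PCA(n_components=3)
result = pca.fit_transform(X)

print(result.shape)  # (150, 3)

# Fraction of the total variance captured by each component, largest first.
ratios = pca.explained_variance_ratio_
print(ratios)
```

PCA is sensitive to the scale of the features, so standardizing them first (for example with sklearn’s StandardScaler) is a common preprocessing step. The helper above does not do this, so consider it an optional refinement.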
Building the UI
# Title
st.title("Classifiers in Action")

# Description
st.text("Choose a Dataset and a Classifier in the sidebar. Input your values and get a prediction")

# Sidebar
sideBar = st.sidebar
dataset = sideBar.selectbox('Which Dataset do you want to use?', ('Wine', 'Breast Cancer', 'Iris'))
classifier = sideBar.selectbox('Which Classifier do you want to use?', ('SVM', 'KNN', 'Random Forest'))
We use streamlit’s selectbox component to create a dropdown menu for the user to select the dataset and the model
# Get Data
X_train, X_test, y_train, y_test, df, classes = return_data(dataset)
st.dataframe(df.sample(n=5, random_state=1))
st.subheader("Classes")
for idx, value in enumerate(classes):
    st.text('{}: {}'.format(idx, value))
- We use our helper function to get the data
- We use streamlit’s dataframe component to display a sample of our dataset
- We also display the classes using the last variable our helper function returns
We will use seaborn and matplotlib to visualize the PCA in 2-D and 3-D.
streamlit’s pyplot component takes in a figure as the parameter and displays the plot in the UI.
# 2-D PCA
df = getPCA(df)
fig = plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="pca-1", y="pca-2",
    hue="Type",
    palette=sns.color_palette("hls", len(classes)),
    data=df,
    legend="full"
)
plt.xlabel('PCA One')
plt.ylabel('PCA Two')
plt.title("2-D PCA Visualization")
st.pyplot(fig)
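# 3-D PCA
fig2 = plt.figure(figsize=(16, 10))
# Passing projection='3d' to gca() no longer works in recent Matplotlib
# releases, so we create the 3-D axes with add_subplot() instead.
ax = fig2.add_subplot(projection='3d')
ax.scatter(
    xs=df["pca-1"],
    ys=df["pca-2"],
    zs=df["pca-3"],
    c=df["Type"],
)
ax.set_xlabel('pca-one')
ax.set_ylabel('pca-two')
ax.set_zlabel('pca-three')
st.pyplot(fig2)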
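The 2-D plot above is built through pyplot’s global state. As an alternative, here is a minimal sketch that builds a figure through an explicit Figure and Axes pair and returns it. make_scatter_figure is a hypothetical helper, not part of the original app, and the sample points are made up for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt


def make_scatter_figure(x, y, labels):
    """Hypothetical helper: return a 2-D scatter figure coloured by label."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(x, y, c=labels)
    ax.set_xlabel('PCA One')
    ax.set_ylabel('PCA Two')
    ax.set_title('2-D PCA Visualization')
    return fig


# Made-up points, only to show the call.
fig = make_scatter_figure([0.0, 1.0, 2.0], [1.0, 0.5, 2.0], [0, 1, 1])
```

Inside the app, the returned figure could be passed straight to st.pyplot(fig), with the pca-1, pca-2 and Type columns of the dataframe as the three arguments.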
Finally, we train the model and report its accuracy on the training and test sets.
# Train Model
model = getClassifier(classifier)
model.fit(X_train, y_train)
test_score = round(model.score(X_test, y_test), 2)
train_score = round(model.score(X_train, y_train), 2)
st.subheader('Train Score: {}'.format(train_score))
st.subheader('Test Score: {}'.format(test_score))
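For sklearn classifiers, score() returns the mean accuracy on the data it is given. The sketch below checks this on the Iris dataset by comparing score() with accuracy_score() applied to the model’s predictions. The choice of KNN with 5 neighbours is only an example value, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=1, test_size=0.2
)

model = KNeighborsClassifier(n_neighbors=5)  # example value
model.fit(X_train, y_train)

# Both lines compute the fraction of test samples classified correctly.
test_score = model.score(X_test, y_test)
same_score = accuracy_score(y_test, model.predict(X_test))

print(round(test_score, 2), round(same_score, 2))
```

A training score much higher than the test score is a common sign of overfitting, which is why the app shows both numbers.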
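To launch the app, save the code to a file (for example app.py, a name assumed here) and run streamlit run app.py from the activated virtual environment.
You have successfully built a project you can showcase on your portfolio 👏 👏 👏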