How to Build a Streamlit UI to Analyze Different Classifiers on the Wine, Iris and Breast Cancer Datasets

Let’s build a web app using Streamlit and sklearn

Screen Capture by Author

In this tutorial, we will work with three datasets: Iris, Breast Cancer and Wine.

We will use three models (KNN, SVM, Random Forest) for classification and let the user set some of their parameters.

Install and Import Necessary Libraries

Setup Virtual Environment

pip install virtualenv     # Install virtualenv
virtualenv venv            # Create a virtual environment
venv/Scripts/activate      # Activate the virtual environment (Windows)
source venv/bin/activate   # Activate the virtual environment (macOS/Linux)

Install Libraries

Make sure your virtual environment is activated before installing the libraries. Package names are separated by spaces, not commas.

pip install streamlit seaborn scikit-learn

Import the Libraries

import streamlit as st
from sklearn.datasets import load_wine, load_breast_cancer, load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

We import Streamlit, the datasets and models from sklearn, the plotting libraries (matplotlib and seaborn), and pandas.

Helper Functions

Function to get the dataset

def return_data(dataset):
    # Load the dataset that matches the user's selection
    if dataset == 'Wine':
        data = load_wine()
    elif dataset == 'Iris':
        data = load_iris()
    else:
        data = load_breast_cancer()
    # Build a dataframe for display, with the target stored in a 'Type' column
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df['Type'] = data.target
    # Hold out 20% of the samples for testing
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=1, test_size=0.2
    )
    return X_train, X_test, y_train, y_test, df, data.target_names

The function takes a string containing the name of the dataset the user selects.
It loads the corresponding dataset.
It creates a dataframe that we can show in our UI.
It uses sklearn’s train_test_split() to create the training and test sets.
It returns the training and test splits, the dataframe and the target class names.
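
To sanity-check the split outside Streamlit, here is a minimal sketch that repeats the same train_test_split() call for each dataset and prints the resulting shapes. The LOADERS dictionary and the split_shapes function are illustrative names, not part of the app.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.model_selection import train_test_split

# Illustrative lookup table (an assumption, not part of the app).
LOADERS = {"Wine": load_wine, "Iris": load_iris, "Breast Cancer": load_breast_cancer}


def split_shapes(name):
    """Return the (train, test) feature-matrix shapes for one dataset."""
    data = LOADERS[name]()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=1, test_size=0.2
    )
    return X_train.shape, X_test.shape


for name in LOADERS:
    train_shape, test_shape = split_shapes(name)
    print(f"{name}: train={train_shape}, test={test_shape}")
```

With test_size=0.2, roughly one fifth of each dataset ends up in the test set, and the number of columns matches the number of features in the dataset.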

Function to return the model

We will be using streamlit’s slider component to get the input for the parameters from the user.

st.sidebar.slider(label='', min_value=1, max_value=100) creates a slider in the sidebar and returns the value the user has currently selected.

def getClassifier(classifier):
    if classifier == 'SVM':
        # Regularization parameter C (smaller values regularize more)
        c = st.sidebar.slider(label='Choose value of C', min_value=0.0001, max_value=10.0)
        model = SVC(C=c)
    elif classifier == 'KNN':
        neighbors = st.sidebar.slider(label='Choose Number of Neighbors', min_value=1, max_value=20)
        model = KNeighborsClassifier(n_neighbors=neighbors)
    else:
        # Random Forest: depth of each tree and number of trees
        max_depth = st.sidebar.slider('max_depth', 2, 10)
        n_estimators = st.sidebar.slider('n_estimators', 1, 100)
        model = RandomForestClassifier(max_depth=max_depth, n_estimators=n_estimators, random_state=1)
    return model

Like the previous function, this function takes a string parameter containing the model’s name.
Based on the selected model, we ask the user to set the values of its parameters.
For SVM, we take the C parameter as an input from the user.
For KNN, we take the number of nearest neighbours the model considers while making its prediction.
For Random Forest, we take the number of decision trees (n_estimators) and the maximum depth of each tree (max_depth).
We then create an instance of the model and return it.
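
To get a feel for what a slider changes, the sketch below trains KNN on Iris for a few values of n_neighbors and prints the test accuracy for each. It assumes the same split settings as return_data (test_size=0.2, random_state=1); the helper name knn_scores and the values of k are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_scores(neighbor_values):
    """Map each n_neighbors value to its test accuracy on Iris."""
    data = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=1, test_size=0.2
    )
    scores = {}
    for k in neighbor_values:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores[k] = model.score(X_test, y_test)
    return scores


for k, score in knn_scores([1, 5, 20]).items():
    print(f"n_neighbors={k}: test accuracy={score:.2f}")
```

In the app, the same comparison happens interactively: each time the user moves a slider, Streamlit reruns the script and the model is retrained with the new value.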

Function for PCA

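def getPCA(df):
    pca = PCA(n_components=3)
    # Fit PCA on every column except the target column 'Type'
    result = pca.fit_transform(df.loc[:, df.columns != 'Type'])
    # Store each principal component as a new column
    df['pca-1'] = result[:, 0]
    df['pca-2'] = result[:, 1]
    df['pca-3'] = result[:, 2]
    return df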

We use sklearn’s PCA to project the features onto three principal components, add them to the dataframe as new columns, and return it.
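
To check how much of the original variance the three components keep, one option is to inspect the fitted PCA object’s explained_variance_ratio_ attribute. The sketch below does this for the Wine dataset outside the app; the variable names are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

data = load_wine()
pca = PCA(n_components=3)
components = pca.fit_transform(data.data)

# explained_variance_ratio_ holds the fraction of variance per component.
ratios = pca.explained_variance_ratio_
for i, ratio in enumerate(ratios, start=1):
    print(f"pca-{i}: {ratio:.3f}")
print(f"total retained: {ratios.sum():.3f}")
```

Because the app applies PCA to unscaled features, columns with large numeric ranges tend to dominate the first component.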

Building the UI

# Title
st.title("Classifiers in Action")

# Description
st.text("Choose a Dataset and a Classifier in the sidebar. Input your values and get a prediction")

# Sidebar
sideBar = st.sidebar
dataset = sideBar.selectbox('Which Dataset do you want to use?', ('Wine', 'Breast Cancer', 'Iris'))
classifier = sideBar.selectbox('Which Classifier do you want to use?', ('SVM', 'KNN', 'Random Forest'))

We use Streamlit’s selectbox component to create dropdown menus for the user to select the dataset and the model.

# Get Data
X_train, X_test, y_train, y_test, df, classes = return_data(dataset)
st.dataframe(df.sample(n=5, random_state=1))
st.subheader("Classes")
for idx, value in enumerate(classes):
    st.text('{}: {}'.format(idx, value))

We use our helper function to get the data.
We use Streamlit’s dataframe component to display a sample of our dataset.
We also display the classes using the last variable our helper function returns.

We will use seaborn and matplotlib to visualize the PCA in 2-D and 3-D.

streamlit’s pyplot component takes in a figure as the parameter and displays the plot in the UI.

# 2-D PCA
df = getPCA(df)
fig = plt.figure(figsize=(16,10))
sns.scatterplot(
    x="pca-1", y="pca-2",
    hue="Type",
    palette=sns.color_palette("hls", len(classes)),
    data=df,
    legend="full"
)
plt.xlabel('PCA One')
plt.ylabel('PCA Two')
plt.title("2-D PCA Visualization")
st.pyplot(fig)

# 3-D PCA
fig2 = plt.figure(figsize=(16, 10))
# add_subplot is used because recent Matplotlib releases no longer accept gca(projection='3d')
ax = fig2.add_subplot(projection='3d')
ax.scatter(
    xs=df["pca-1"],
    ys=df["pca-2"],
    zs=df["pca-3"],
    c=df["Type"],
)
ax.set_xlabel('pca-one')
ax.set_ylabel('pca-two')
ax.set_zlabel('pca-three')
st.pyplot(fig2)

Finally, we will train the model and compute its accuracy on the training and test sets.

# Train Model
model = getClassifier(classifier)
model.fit(X_train, y_train)
test_score = round(model.score(X_test, y_test), 2)
train_score = round(model.score(X_train, y_train), 2)

st.subheader('Train Score: {}'.format(train_score))
st.subheader('Test Score: {}'.format(test_score))
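
For sklearn classifiers, score() returns the mean accuracy, that is, the fraction of samples predicted correctly. The sketch below checks this by hand on Iris with a small Random Forest; the parameter values are arbitrary stand-ins for the slider inputs.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=1, test_size=0.2
)
# Arbitrary parameter values standing in for the slider inputs.
model = RandomForestClassifier(max_depth=2, n_estimators=10, random_state=1)
model.fit(X_train, y_train)

# Accuracy computed by hand: the share of test samples predicted correctly.
manual_accuracy = (model.predict(X_test) == y_test).mean()
print(manual_accuracy, model.score(X_test, y_test))
```

A train score that is much higher than the test score suggests the model is overfitting, which is worth watching for as you increase max_depth or lower the number of neighbours.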

You have successfully built a project you can showcase on your portfolio 👏 👏 👏