#HealthDataStories: Building A Machine Learning Model To Predict Medical Appointment Attendance

Introduction

Adrian Ochanyo Ochieng'
6 min read · Jun 24, 2022

Over the years there has been a rise in demand for data professionals; with this growing demand there has also been an increase in the scope of work data professionals are required to do.

Most of them, however, have concentrated on building models while ignoring other components of the data science project lifecycle. As a result, many data scientists are only comfortable working within the confines of Jupyter Notebooks. This leaves models with little business value, since most are never put into production for the intended end users. It also creates a dependence on external teams to deploy models to production, which is costly for small companies that do not have a data pipeline.

In part two of this series, #HealthDataStories, we built a machine learning model and exported it out of Jupyter Notebook; we then built a web app that accepts user input, feeds it into the machine learning model, and displays predictions to the user.

We will follow these steps:

  1. Build and fine tune a machine learning model
  2. Save the model
  3. Load the model into different environments for use
  4. Build a web app from the model using streamlit python library

The web app takes a user’s demographic and health parameters, uses them to predict the chances of the person missing their medical appointment, and then displays the result on screen.

Background

The data presented to us was collected by the Brazilian Ministry of Health. It profiles the medical appointment attendance of patients in Brazil. We analyzed the data, found insights, and made recommendations based on the analysis.

We then used this data to predict whether a patient is likely to miss their appointment, as demonstrated herein. The data contains demographic and health parameters for these patients, and it is on these parameters that we base the predictions. To do this we followed the four-step process outlined above.

Understanding Pre-Requisites

We used a second Python IDE, since we were building a Streamlit web app, which cannot be run within Jupyter Notebook. (You can select an IDE from the many available, build the model within the notebook, then load and run it in a different IDE.)

Building the Model

We used the dataset that we had used for our previous analysis task.

Requisite libraries that we used were imported:

Importing Requisite Libraries
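The original import cell is not shown; a typical set for this workflow (an assumption, not the notebook's exact imports) would be:

```python
# Typical imports for this workflow (the notebook's exact imports
# are not shown, so this set is an assumption)
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import joblib
```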

The dataset was loaded and checked for the various attributes:

Loading the dataset and checking the attributes of the dataset
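The loading step might look like the following. The CSV filename is illustrative, and a tiny stand-in frame with the same attributes is used here so the snippet runs on its own:

```python
import pandas as pd

# In the notebook the data would come from the CSV, e.g.:
# df = pd.read_csv("appointments.csv")  # filename illustrative

# Tiny stand-in frame with the same attributes, so this is self-contained
df = pd.DataFrame({
    "Gender": ["F", "M", "F"], "Age": [62, 56, 8],
    "Scholarship": [0, 0, 0], "Hipertension": [1, 0, 0],
    "Diabetes": [0, 0, 0], "Alcoholism": [0, 0, 0],
    "Handicap": [0, 0, 0], "SMS_received": [0, 0, 1],
    "No-show": ["No", "Yes", "No"],
})
print(df.shape)    # (rows, attributes)
print(df.dtypes)   # data type of each attribute
```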

The dataset had 14 attributes, with ‘no-show’ being the target (dependent) variable and the other 13 being independent variables. The 8 attributes used to predict appointment attendance are:

  1. Gender
  2. Age
  3. Scholarship
  4. Hipertension
  5. Diabetes
  6. Alcoholism
  7. Handicap
  8. SMS_received

Data Cleaning

We cleaned and prepared the data for building the machine learning model.

Data Cleaning and Preparation

We replaced the gender and no_show values with 0 and 1, lowercased the column names, and replaced ‘-’ with ‘_’. The columns with variables we did not intend to use were then dropped.
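Those cleaning steps could be sketched as follows (the unused column shown being dropped is an example; the real dataset has several):

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "M"], "No-show": ["No", "Yes"],
    "Age": [62, 56], "AppointmentID": [5642903, 5642549],  # example unused column
})

# Lowercase the column names and replace '-' with '_'
df.columns = df.columns.str.lower().str.replace("-", "_")

# Replace the gender and no_show values with 0 and 1
df["gender"] = df["gender"].map({"F": 0, "M": 1})
df["no_show"] = df["no_show"].map({"No": 0, "Yes": 1})

# Drop columns that will not be used for prediction
df = df.drop(columns=["appointmentid"])
print(df.head())
```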

On further inspection of the target variable, we saw that the data was likely imbalanced (remember the 80%–20% attendance split in the analysis). So, when implementing the RandomForestClassifier, we performed random oversampling to prevent the model from training on an imbalanced dataset and predicting the majority class 100% of the time.
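Random oversampling duplicates minority-class rows until the classes are balanced. A sketch on simulated labels using `sklearn.utils.resample` (imblearn's `RandomOverSampler` does the same job):

```python
import numpy as np
from sklearn.utils import resample

# Simulated imbalanced labels: ~80% attended (0), ~20% no-show (1)
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(100, 8))
y = np.array([0] * 80 + [1] * 20)

# Upsample the minority class until it matches the majority class count
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=80, random_state=42)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))  # → [80 80]
```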

Obtaining the correlation matrix of the dataset:

Correlation matrix of the dataset
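In pandas the correlation matrix is a one-liner; a minimal sketch on stand-in data:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [62, 56, 8, 76], "hipertension": [1, 0, 0, 1],
    "diabetes": [0, 0, 0, 1], "no_show": [0, 1, 0, 1],
})
corr = df.corr()   # pairwise Pearson correlations between attributes
print(corr.round(2))
```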

Fitting the model (RandomForestClassifier) to the variables in the dataframe:

Evaluating the model’s performance using its accuracy score:

The model has an accuracy of 0.6177 (61.77%).

Taking a look at what the model predicted:

Model Predictions
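The fit, evaluate, and predict steps above could look like this. Synthetic data stands in for the real features, so the accuracy printed here will differ from the 0.6177 obtained on the actual dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 8 encoded features and the no_show target
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(400, 8)).astype(float)
X[:, 1] = rng.integers(0, 100, 400)          # the age column
y = rng.integers(0, 2, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

preds = clf.predict(X_test)                   # look at what the model predicts
print(accuracy_score(y_test, preds))          # 0.6177 on the real data
```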

Saving the Model

The model was then saved using the Joblib library.
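Saving and reloading with Joblib is two calls; the filename here is our assumption, and a small fitted model stands in for the one trained above:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

# Small fitted model standing in for the one trained above
clf = RandomForestClassifier(random_state=42).fit([[0, 30], [1, 70]], [0, 1])

joblib.dump(clf, "hds_model.joblib")      # save to disk
model = joblib.load("hds_model.joblib")   # reload in any environment
print(model.predict([[0, 30]]))
```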

The model has been built and saved; it can now be accessed easily in different environments and used to make predictions on external data.

Building The Streamlit Web App

Using the model created and saved above, we can use Streamlit, a Python library, to build a web app. If the library is not installed, a quick `pip install streamlit` installs it.

We then created a .py file, which we named ‘hds_streamlit.py’. This file has to be in the same directory as the saved model.

model and the .py file in the same directory

We then imported the requisite libraries in the .py file

Importing requisite libraries in the .py file

A header was then created, along with a script for input boxes to take input from the users; we have 8 independent variables to collect from the user. The following code creates the input boxes for users to enter data and divides the page into two columns to give the app a better UI/UX.

Creating header and Input Boxes

We tested if the app works by typing and running this command in the terminal: `streamlit run hds_streamlit.py`

The output will be as below:

Navigating to http://localhost:8501 where the app resides opens a page like this:

Web app User Interface

This confirmed that the app was up and working.

We then proceeded to make predictions, collecting the user input every time it is entered. The input data is passed into the classifier, which gives us an output prediction.

The user input is in the form of strings (Yes/No), so we encoded it in the same form as the training data and ran the following block of code to transform it:

Transforming user-input
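A sketch of that transformation. The example widget values and the helper name `yes_no` are illustrative; the encoding mirrors the 0/1 scheme used when cleaning the training data:

```python
# Example widget values (in the app these come from the input boxes)
gender, age = "Female", 30
scholarship, hypertension, diabetes = "No", "Yes", "No"
alcoholism, handicap, sms_received = "No", "No", "Yes"

def yes_no(value):
    """Encode a Yes/No string the same way the training data was encoded."""
    return 1 if value == "Yes" else 0

features = [[
    0 if gender == "Female" else 1,   # matches the 0/1 gender encoding
    age,
    yes_no(scholarship), yes_no(hypertension), yes_no(diabetes),
    yes_no(alcoholism), yes_no(handicap), yes_no(sms_received),
]]
print(features)  # → [[0, 30, 0, 1, 0, 0, 0, 1]]
```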

We then loaded the model and used it to make predictions from the user input:

Loading the model

We then made the predictions (by pressing the predict button). We edited the code for the predict button we had earlier, changing it as below:

With these changes, predictions are made and displayed as text rather than the 0 and 1 encoding we used earlier.

This gives us a simple app that a user can use to predict appointment attendance.

The challenge is that the app is hosted locally. The next step would be to deploy it on Heroku, AWS, or GCP to make it publicly accessible to users all across the internet as a live public-facing website.

(The code to this project can be found on this repo; you can fork it and play around with it)

