Updated: Mar 23
The novel coronavirus is wreaking havoc all over the world! According to the news, Italy has been at the center of this, with more than 47,000 infections and 4,000 deaths as of yesterday. The health authorities there have been analyzing the data for clues. According to one source (Bloomberg), they found that more than 99% of fatalities were people who had pre-existing medical conditions. Of the fatalities whose medical records were examined (about 18% of the total), just 3 victims, or 0.8% of those examined, had no previous pathology. Almost half of the victims suffered from at least three prior illnesses, and about a fourth had either one or two previous conditions. More than 75% had high blood pressure, about 35% had diabetes, and a third suffered from heart disease. The median age of the infected is 63, while those who died had an average age of 79.5. All of Italy’s victims under 40 have been males with serious existing medical conditions. This data tells me that people of different ages and health conditions face different risks of surviving this deadly virus.
The question I asked myself - considering my age and health, what is my personal strength to fight this virus?
This blog post shows how I used research from the news to build a Machine Learning (ML) model that assesses and predicts my personal strength to fight the deadly virus.
I built this application - from data to dashboard - on the Braintoy AI platform.
Step 1 - Data Engineering
Data Engineering starts with loading the data, then defining a dataset, and finally generating a cross-validation dataset, which Step 2 uses for Machine Learning.
Loading my data
A master dataset, COVID_19_synthetic_data.csv, was uploaded. It contains the following columns:
Column | Range | Description
Row # | n/a | A serial number
Age | 1 - 99 | Age of the patient
Hypertension | 0 or 1 | 1 if the individual suffered from hypertension, 0 otherwise
Heart_Disease | 0 or 1 | 1 if the individual suffered from heart disease, 0 otherwise
Diabetes | 0 or 1 | 1 if the individual suffered from diabetes, 0 otherwise
Lung_Disease | 0 or 1 | 1 if the individual suffered from lung disease, 0 otherwise
Immune_Strength | 0 - 5 | Immune strength score, reflecting whether the individual was hospitalized for fever and breathing issues
Risk_Score | 1 - 10 | Risk percentile of the individual; Strength = 10 - Risk_Score
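Because the ranges above are explicit, they can double as a quick sanity check on the CSV before it is uploaded. The sketch below is a minimal illustration, assuming the file uses exactly these column names; the sample rows are made up, and the helper names (validate_row, strength) are hypothetical rather than part of the Braintoy platform. It also encodes the Strength = 10 - Risk_Score relation from the table.

```python
import csv
import io

# Allowed (min, max) range per column, copied from the data dictionary above.
# "Row #" is only a serial number, so it is not checked.
COLUMN_RANGES = {
    "Age": (1, 99),
    "Hypertension": (0, 1),
    "Heart_Disease": (0, 1),
    "Diabetes": (0, 1),
    "Lung_Disease": (0, 1),
    "Immune_Strength": (0, 5),
    "Risk_Score": (1, 10),
}


def validate_row(row):
    """Return a list of problems found in one CSV row (empty list if the row looks valid)."""
    problems = []
    for column, (low, high) in COLUMN_RANGES.items():
        raw = row.get(column)
        if raw is None or raw.strip() == "":
            problems.append(f"{column}: missing value")
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{column}: {raw!r} is not a number")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    return problems


def strength(risk_score):
    """Strength as defined in the data dictionary: Strength = 10 - Risk_Score."""
    return 10 - risk_score


# Made-up rows for illustration only; they are NOT taken from COVID_19_synthetic_data.csv.
SAMPLE_CSV = """Row #,Age,Hypertension,Heart_Disease,Diabetes,Lung_Disease,Immune_Strength,Risk_Score
1,45,0,0,0,0,4,2
2,79,1,1,1,0,1,9
3,120,0,0,0,0,3,5
"""

if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        issues = validate_row(row)
        print(row["Row #"], "OK" if not issues else "; ".join(issues))
```

Running it flags only the third sample row, whose Age of 120 falls outside the 1 to 99 range.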
Once loaded, the data can be viewed in a table.
Fig. 1: A view of the tabular data
The next step was to define a dataset.
The ‘Target (Output)’ was set to “Risk_Score”, and all the remaining columns from the data file were selected as ‘Feature (Input)’.
Fig. 2: Selecting input and output features for model building
Feature Pre-processor: This step lets you apply pre-built algorithms such as 0 to 1 normalization, categorical-to-numeric conversion, standard scaling, and min-max normalization to prepare the data for ML modeling.
No pre-processing was needed, since all the data is numeric with no missing values.
Fig. 3: The ‘Feature Pre-processing’ step of ‘Define Dataset’
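For reference, two of these pre-processors fit in a few lines each. This is a minimal sketch of min-max normalization (rescaling to the 0 to 1 range) and standard scaling (mean 0, standard deviation 1); the function names are hypothetical, it is not Braintoy's implementation, and, as noted above, neither step was needed for this dataset.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly to [0, 1] using (x - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in values]
    return [(x - low) / (high - low) for x in values]


def standard_scale(values):
    """Shift values to mean 0 and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


if __name__ == "__main__":
    ages = [20, 40, 60, 80]  # illustrative values, not from the dataset
    print(min_max_normalize(ages))
    print(standard_scale(ages))
```

Min-max normalization keeps the shape of the distribution but is sensitive to outliers, since a single extreme value sets the range; standard scaling is the usual alternative when that matters.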
Review and Save: The dataset was named ‘ds_mar_19’ and saved.
Fig. 4: Naming the dataset by clicking on Define Dataset button
The dataset was then split into two parts, a training dataset and a validation dataset, using the common 80/20 ratio.
The system randomly selected 80% of the data for training and the remaining 20% for validation. The training set is used to train the machine learning model; the system then uses the validation set to test the model and to calculate and display the performance metrics.
Fig. 5: Generating the Cross-Validation files for the dataset
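A random 80/20 hold-out split like the one described above can be sketched as follows. The function name and the fixed seed are assumptions for illustration; the seed only makes the shuffle reproducible, and the platform may pick its rows differently.

```python
import random


def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into (training, validation) lists."""
    shuffled = list(rows)  # copy, so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    # round() guards against float error such as 0.8 * n landing just below an integer.
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    rows = list(range(100))  # stand-in for 100 data rows
    train, validation = train_validation_split(rows)
    print(len(train), len(validation))  # 80 20
```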
Step 2 - Machine Learning
This step uses the cross-validation datasets created in Step 1 - Data Engineering to build an ML model.
Since a numerical value of Risk_Score (the target variable) is to be predicted, this is a Regression problem, and hence the ‘Regression’ tab is chosen.
Click on ‘Add Base Model’ to select the desired dataset.
Fig. 6: Selecting the dataset for machine learning
Clicking ‘Select Dataset’ opens the ‘Select Regressor’ window.
I chose the Neural Network Regressor to start this modeling experiment. The system had already suggested parameters for my dataset, so I clicked ‘Select Regressor’ to choose it.
Fig. 7: Choosing an appropriate algorithm for building an ML model
Clicking ‘Create New Model Version’ builds an ML model using the Neural Network algorithm.
Fig. 8: Machine Learning Model building progress.
The new model appears with a generic name and version. Selecting it shows its rank, error, auto-generated documentation, and a ready-to-publish button, while the right-hand panel shows the performance metrics. This model has a Mean Absolute Error (MAE) of 0.24, a mediocre result.
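Mean Absolute Error is the average absolute difference between the actual and predicted values, MAE = (1/n) Σ |y_i - ŷ_i|, and it is expressed in the same units as Risk_Score. An MAE of 0.24 therefore means predictions were off by about a quarter of a point on the 1 to 10 scale, on average. A minimal sketch, using made-up numbers rather than the model's real predictions:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical validation targets and model predictions.
    actual = [2, 5, 9, 7]
    predicted = [2.5, 5.0, 8.5, 7.0]
    print(mean_absolute_error(actual, predicted))  # 0.25
```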
The system comes pre-shipped with many algorithms. Rather than trying them one at a time, I decided to use ‘Autopilot’, a feature that builds ML models with various algorithms and ranks them by how well they perform on the dataset.
I hit the ‘Autopilot’ button.
Fig. 9: ‘Autopilot’ creates ML models using various algorithms and ranks them by their performance scores.
Notice that the Decision Tree and Random Forest Regressors came out on top, tied at rank #1 with a Mean Absolute Error of 0.01 each. Compared with the Neural Network, the Decision Tree and Random Forest Regressors proved better suited to my dataset.
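The idea behind ‘Autopilot’, fitting several candidate algorithms on the same training split and ranking them by validation error, can be illustrated with a small pure-Python sketch. The candidates here (a mean baseline and k-nearest-neighbour regressors) and the toy data rule are stand-ins chosen so the example runs without extra libraries; they are not the algorithms Braintoy uses, and the platform's actual ranking logic is not documented in this post. Tied scores share a rank, as the Decision Tree and Random Forest did above.

```python
import random


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_mean_baseline(X, y):
    """Always predict the training mean of the target."""
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_knn(k):
    """Return a fitter for a k-nearest-neighbour regressor (squared Euclidean distance)."""
    def fit(X, y):
        data = list(zip(X, y))

        def predict(x):
            nearest = sorted(
                data, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x))
            )[:k]
            return sum(target for _, target in nearest) / len(nearest)

        return predict

    return fit


def rank_models(candidates, X_train, y_train, X_val, y_val):
    """Fit each candidate, score it by validation MAE, and rank (lower MAE is better)."""
    results = []
    for name, fit in candidates.items():
        model = fit(X_train, y_train)
        predictions = [model(x) for x in X_val]
        results.append((mean_absolute_error(y_val, predictions), name))
    results.sort()
    ranked = []
    for mae, name in results:
        # Competition ranking: candidates with equal MAE share the same rank.
        rank = 1 + sum(1 for other, _ in results if other < mae)
        ranked.append((rank, name, mae))
    return ranked


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: (age, number of conditions) with an ASSUMED risk rule, not the real dataset.
    X = [(rng.randint(1, 99), rng.randint(0, 4)) for _ in range(200)]
    y = [min(10, 1 + age // 15 + conditions) for age, conditions in X]
    candidates = {"Mean baseline": fit_mean_baseline, "1-NN": fit_knn(1), "5-NN": fit_knn(5)}
    for rank, name, mae in rank_models(candidates, X[:160], y[:160], X[160:], y[160:]):
        print(f"#{rank} {name}: MAE={mae:.2f}")
```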
Both top models were ready to be chosen and published; I chose the one built with the RandomForestRegressor. Hitting the publish icon brings up a pop-up window for additional comments and confirmation. The pop-up also offers the option to publish the model for review (described in Step 3 - Model Governance).
Fig. 10: The selected ML model is being published for review
That brings us to the end of Step 2 - Machine Learning. The next step is Step 3 - Model Governance.
Step 3 - Model Governance
Good governance practice means that AI needs to be validated before production use. To avoid unintended consequences, the person who created the model publishes it to a peer or a third party for review.
The system sends an email to the selected reviewer, giving them interactive documentation to validate whether the model actually solves the problem it was created for and achieves the desired performance and results.
Model review is a topic of its own and will be covered in a separate blog post.
Fig. 11: The reviewer can Accept or Reject the ML model
Once the reviewer accepts or rejects a model, the system emails the modeler to confirm the reviewer’s decision. This model was accepted, which brings us to Step 4 - Deployment.
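The publish, review, and notify loop described above behaves like a small state machine: a model can be deployed only after a reviewer has accepted it. The sketch below models that rule. The class, the state names, and the outbox list standing in for emails are all hypothetical and do not reflect Braintoy's internals.

```python
from enum import Enum


class ReviewState(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Allowed transitions in this hypothetical workflow; accepted and rejected are final here.
TRANSITIONS = {
    ReviewState.DRAFT: {ReviewState.IN_REVIEW},
    ReviewState.IN_REVIEW: {ReviewState.ACCEPTED, ReviewState.REJECTED},
    ReviewState.ACCEPTED: set(),
    ReviewState.REJECTED: set(),
}


class ModelReview:
    def __init__(self, model_name, modeler, reviewer):
        self.model_name = model_name
        self.modeler = modeler
        self.reviewer = reviewer
        self.state = ReviewState.DRAFT
        self.outbox = []  # (recipient, message) pairs standing in for emails

    def _move(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def publish_for_review(self):
        self._move(ReviewState.IN_REVIEW)
        self.outbox.append((self.reviewer, f"Please review {self.model_name}"))

    def decide(self, accept):
        self._move(ReviewState.ACCEPTED if accept else ReviewState.REJECTED)
        self.outbox.append((self.modeler, f"{self.model_name} was {self.state.value}"))

    def can_deploy(self):
        return self.state is ReviewState.ACCEPTED


if __name__ == "__main__":
    review = ModelReview("rf_model_v1", "modeler@example.com", "reviewer@example.com")
    review.publish_for_review()
    review.decide(accept=True)
    print(review.outbox, review.can_deploy())
```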
Step 4 - Deployment
It is now time to deploy the accepted model to production use.
Since a regression model was created, the user selects the Regression tab under ‘Deploy Models’. The accepted model(s) will be available under ‘Select Model & Deploy’.
To deploy the model, the user selects it and then clicks ‘Use this model’, ‘Generate Code’ and ‘Deploy’ in turn. These buttons trigger a confirmation pop-up window.
Fig. 12: Deploying the model to production in a few clicks
Clicking the Deploy button starts a Docker process that bundles all the necessary code into a container and creates an API, which any application can call.
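Conceptually, the deployed container wraps the trained model in a request handler that parses the input fields, runs the model, and returns the prediction as JSON. The sketch below shows only that request and response logic in plain Python. The field names follow the data dictionary; the function name, JSON shape, and error format are assumptions, since the post does not document the generated API, and the model passed in stands in for the real Random Forest.

```python
import json

# Input fields, in the order the model expects them (taken from the data dictionary).
FEATURES = ["Age", "Hypertension", "Heart_Disease", "Diabetes", "Lung_Disease", "Immune_Strength"]


def handle_predict(body, model):
    """Parse a JSON request body, run `model` on the features, and return a JSON response body."""
    try:
        payload = json.loads(body)
        features = [float(payload[name]) for name in FEATURES]
    except (ValueError, KeyError, TypeError) as exc:
        # json.JSONDecodeError is a subclass of ValueError, so bad JSON lands here too.
        return json.dumps({"error": f"bad request: {exc}"})
    risk = model(features)
    return json.dumps({"Risk_Score": round(risk, 2)})


if __name__ == "__main__":
    request = json.dumps({"Age": 45, "Hypertension": 0, "Heart_Disease": 0,
                          "Diabetes": 0, "Lung_Disease": 0, "Immune_Strength": 4})
    # A dummy model that always returns 3.0, used only to exercise the handler.
    print(handle_predict(request, model=lambda features: 3.0))  # {"Risk_Score": 3.0}
```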
My ML model is now live.
It was time to interact with it in Step 5 - Dashboard.
Step 5 - Dashboard
A real-time dashboard is automatically generated for every deployed model.
Fig. 13: Dashboard window from which the user can interact with the deployed model
Fig. 14: Automatically generated dashboard of the deployed model
This completes the model building and deployment.
The application UI below calls the API of the deployed model to show the results.
Fig. 15: Application User Interface
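A client such as this UI only needs to send the user's inputs as JSON and turn the returned Risk_Score into a strength using Strength = 10 - Risk_Score from the data dictionary. The sketch below uses Python's standard library and builds the request without sending it. The endpoint URL and the JSON field layout are placeholders, because the post does not document the deployed API.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL of the deployed API is not given in this post.
API_URL = "https://example.com/api/predict"


def build_request(age, hypertension, heart_disease, diabetes, lung_disease, immune_strength):
    """Build (but do not send) a JSON POST request for the prediction API."""
    payload = {
        "Age": age,
        "Hypertension": hypertension,
        "Heart_Disease": heart_disease,
        "Diabetes": diabetes,
        "Lung_Disease": lung_disease,
        "Immune_Strength": immune_strength,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def strength_from_response(response_body):
    """Convert the API's Risk_Score into Strength = 10 - Risk_Score."""
    risk = json.loads(response_body)["Risk_Score"]
    return 10 - risk


# To call a live endpoint (not run here):
#   with urllib.request.urlopen(build_request(45, 0, 0, 0, 0, 4)) as resp:
#       print(strength_from_response(resp.read()))
```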
I had asked myself a question before starting this - considering my age and health, what is my personal strength to fight this virus? And I got my answer!
The application is published at https://www.console.mlos.io/covid19. Now anyone can answer the same question by entering their age and health conditions to get a prediction of their strength.