{"id":7157,"date":"2020-12-01T15:36:21","date_gmt":"2020-12-01T10:06:21","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=7157"},"modified":"2025-12-10T03:43:24","modified_gmt":"2025-12-10T08:43:24","slug":"linear-classifier-with-tensorflow-keras","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/linear-classifier-with-tensorflow-keras\/","title":{"rendered":"Building a Linear Classifier with Tensorflow Keras"},"content":{"rendered":"\n<p>Supervised machine learning problems can be broadly divided into two: regression problems and classification problems.&nbsp;<\/p>\n\n\n\n<p>In the previous tutorials, we have examined how to build a linear regression model with Tensorflow and Keras. In this tutorial, we shall be turning our attention to classification problems. Classification problems make up a large share of <a href=\"https:\/\/www.h2kinfosys.com\/blog\/what-is-machine-learning-how-does-it-work\/\" class=\"rank-math-link\">machine learning<\/a> problems. Thus, it is critically important to understand how to build classifiers using machine learning algorithms or deep learning techniques.&nbsp;<\/p>\n\n\n\n<p>Later in this tutorial, we will build a linear classifier using Tensorflow Keras. 
We&#8217;ll begin by brushing up on the theoretical concepts of linear classifiers before going ahead to build one.<\/p>\n\n\n\n<p>By the end of the tutorial, you&#8217;ll have learned:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What is a Linear Classifier?\u00a0<\/li>\n\n\n\n<li>Types of Classification Problems\u00a0<\/li>\n\n\n\n<li>The Workings of a Binary Classifier<\/li>\n\n\n\n<li>How the Performance of a Binary Classifier is Measured\u00a0<\/li>\n\n\n\n<li>Exploratory Data Analysis (EDA)<\/li>\n\n\n\n<li>Checking for an Imbalanced Dataset<\/li>\n\n\n\n<li>Checking for Correlation\u00a0<\/li>\n\n\n\n<li>Data Preprocessing<\/li>\n\n\n\n<li>Building a Single Layer Perceptron for binary classification<\/li>\n\n\n\n<li>Building a Multilayer Perceptron for binary classification<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is a Linear Classifier?<\/strong><\/h2>\n\n\n\n<p>To answer this, we need to first understand what a classifier is. A classifier is a model that predicts the class of an object, given its properties. For instance, if my model determines whether an object is a cat or a dog, that is a classifier. In classification problems, the labels (called classes) are discrete, rather than the continuous numbers of regression problems.&nbsp;<\/p>\n\n\n\n<p>Basically, a classifier splits the observations into their classes. While this splitting can sometimes be done with a straight hyperplane, some datasets contain class boundaries that cannot be split by one. A model in which the classification cannot be done with a hyperplane is called a nonlinear classifier. 
A linear classifier, on the other hand, is a model that can capture the class boundaries using straight lines or hyperplanes.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Types of Classification Problems&nbsp;<\/strong><\/h2>\n\n\n\n<p>Classification problems can be divided into three types based on the label classes \u2013 binary classification, multiclass classification, and multilabel classification.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Binary classification problem: This is a classification problem where the label contains only two classes. For instance, a model that predicts whether an individual has COVID-19 or not, or a model that determines whether a mail is spam or not.\u00a0<\/li>\n\n\n\n<li>Multiclass classification problem: In this type of classification problem, the label contains more than two classes. For example, the popular iris dataset contains three classes (iris setosa, iris virginica, and iris versicolor). Such a classification problem is called a multiclass classification problem.\u00a0<\/li>\n\n\n\n<li>Multilabel classification problem: For this kind of classification problem, each observation can have more than one class. In photo recognition problems, there may be more than one object in the picture, say a dog and a house. Therefore, the model would predict more than one class for this photo. This is a typical multilabel classification problem.\u00a0<\/li>\n<\/ul>\n\n\n\n<p>In this tutorial, we shall build a binary classifier. Let\u2019s understand how it works.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Workings of a Binary Classifier<\/strong><\/h2>\n\n\n\n<p>In supervised learning, the dataset comprises independent variables (called features) and dependent variables (called labels). For linear regression problems, which we treated in the last tutorial, the labels are continuous numbers (any real number). 
In such cases, the model attempts to predict the exact number and checks its success by determining how close it is to the correct number. Metrics such as root mean square error, r-squared, or mean absolute error are commonly used to check how well the model has performed.&nbsp;<\/p>\n\n\n\n<p>For binary classification problems, the labels are two discrete numbers, 1 (yes) or 0 (no). The classifier predicts the probability of the occurrence of each class and then returns the class with the highest probability. Logistic regression is typically used to compute the probability of each class in a binary classification problem. But how does logistic regression work?&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>A Logistic Function<\/strong><\/h3>\n\n\n\n<p>Logistic regression makes use of the logistic function (also called a sigmoid function) to return a class output. The function is an S-curve that receives any continuous number and maps it within the range of 0 to 1 (although never exactly equal to 0 or 1).&nbsp;<\/p>\n\n\n\n<p>The function is given as<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"235\" height=\"93\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image.png\" alt=\"A Logistic Function\" class=\"wp-image-7161\" title=\"\"><\/figure>\n<\/div>\n\n\n<p>Where x is the continuous number that would be transformed into a number between 0 and 1. 
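A quick numeric sketch of this function (our own illustration, not part of the tutorial's code; the function name `sigmoid` is our choice):

```python
import math

def sigmoid(x):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# the midpoint of the S-curve sits at x = 0
print(sigmoid(0))    # 0.5
# large positive inputs approach 1; large negative inputs approach 0
print(round(sigmoid(6), 4))
print(round(sigmoid(-6), 4))
```

Note that the output gets arbitrarily close to, but never exactly reaches, 0 or 1.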
The graph is typically in this form&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/VExdCyROF0QsAaoUFe_x_ATAm5uZTchSzpOljUUahOnA6CCi0o1J6zVzVso3UJImDLFhiyAKj2M9lp7nQQR5eDm9LQTfTlPTyCYbCJNFKYX04DR9u7q1RdwvJE41Q2WQ2sEK_Cw\" alt=\"A Logistic Function\" title=\"\"><\/figure>\n<\/div>\n\n\n<p>Source: <a href=\"https:\/\/towardsdatascience.com\/logistic-regression-explained-9ee73cede081\" rel=\"nofollow noopener\" target=\"_blank\">TowardsDataScience<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Logistic Regression Equation<\/strong><\/h3>\n\n\n\n<p>In logistic regression, the independent variables (x) alongside some assigned weights (b) are used to predict the binary outputs (y). The logistic regression equation is given below.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"198\" height=\"87\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-2.png\" alt=\" Logistic Regression Equation\" class=\"wp-image-7163\" title=\"\"><\/figure>\n<\/div>\n\n\n<p>Where b0 is the bias and b1 is the weight of each independent variable (x). The weight shows how the independent variables (x) are correlated with the dependent variable (y). Hence, a positive correlation increases the probability of the positive class while a negative correlation decreases it.&nbsp;<\/p>\n\n\n\n<p>Once the output of the equation is found, the logistic function converts it into a probability.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How the Performance of a Binary Classifier is Measured&nbsp;<\/strong><\/h2>\n\n\n\n<p>Having built your binary classifier, you\u2019d need to check how well it is performing. There are a couple of metrics for checking your classifier\u2019s performance. 
Let\u2019s talk about the most popular ones.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Accuracy<\/li>\n<\/ul>\n\n\n\n<p>Accuracy is perhaps the simplest and most common metric. It is simply the ratio of the total number of correct predictions to the total number of predictions made. For example, consider a dataset of 1000 samples with labels indicating whether a mail is spam or ham. If the model makes 850 correct predictions, the accuracy is simply 850 \/ 1000, which is 85%.<\/p>\n\n\n\n<p>Making use of accuracy has some shortcomings, however. In an unbalanced dataset, using accuracy can be misleading. An unbalanced dataset is one where the occurrence of one class far outweighs the occurrence of the other. In the earlier instance, imagine that 850 of the labels belong to ham and 150 belong to spam. This is an example of an imbalanced dataset since 850 outweighs 150 by a large margin.<\/p>\n\n\n\n<p>If we build a dummy model that predicts that all observations are ham, without even checking the independent features, the model would still have an accuracy of 85%. Accuracy is intrinsically not a good metric to use for such data. Furthermore, accuracy does not take into account the probability of each class. This may also be a bottleneck when you wish to finetune the outputs of the model for better results.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Confusion Matrix\u00a0<\/li>\n<\/ul>\n\n\n\n<p>The confusion matrix is another popular metric for binary classification problems. The matrix is divided into four quadrants: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). 
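Before moving on, the accuracy pitfall described above can be verified in plain Python (a sketch using the made-up 850/150 ham/spam counts from the example):

```python
# 1000 mails: 850 ham (label 0) and 150 spam (label 1), as in the example above
y_true = [0] * 850 + [1] * 150

# a dummy "model" that predicts ham for every mail, ignoring all features
y_pred = [0] * 1000

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.85 -- looks impressive, yet the model never catches a single spam
```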
Let\u2019s see what the confusion matrix looks like, then explain what these terms mean.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/pAWGkIBjbad5a8NIWmGufPSqO-ErKX3xpTeqIfCCojO3l_ja6K0-WuQV8RXa59sfdHkbKVXQkyyG9uocUnC1KCg3NkzWsBkcaSuAclDJaK9ZAdh9bF0PxVVr_9kCnr7r4UxNlwg\" alt=\"Linear Classifier \" title=\"\"><\/figure>\n\n\n\n<p>Source: <a href=\"https:\/\/subscription.packtpub.com\/book\/big_data_and_business_intelligence\/9781838555078\/6\/ch06lvl1sec34\/confusion-matrix\" rel=\"nofollow noopener\" target=\"_blank\">Packt<\/a><\/p>\n\n\n\n<p>As seen above, the rows indicate the actual values and the columns indicate the predicted values. The four quadrants are the intersection of the actual and predicted values.&nbsp;<\/p>\n\n\n\n<p>TN: This is when the model predicts that an observation is NOT in a class and is correct. Say, the model predicts that a mail is not spam, and it is indeed not spam.&nbsp;<\/p>\n\n\n\n<p>FN: This is when the model predicts that an observation is NOT in a class but is incorrect. Say, the model predicts that a mail is not spam, but it is actually spam.&nbsp;<\/p>\n\n\n\n<p>FP: This is when the model predicts that an observation is in a class but is incorrect. Say, the model predicts that a mail is spam, but it is actually not spam.&nbsp;<\/p>\n\n\n\n<p>TP: This is when the model predicts that an observation is in a class and is correct. Say, the model predicts that a mail is spam, and it is indeed spam.&nbsp;<\/p>\n\n\n\n<p>The confusion matrix adds a lot of flexibility to performance measurement and gives birth to concepts such as precision and recall.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Precision: The precision indicates how accurate the positive class predictions are. 
Mathematically, the precision is given by<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"258\" height=\"63\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-3.png\" alt=\"\" class=\"wp-image-7164\" title=\"\"><\/figure>\n<\/div>\n\n\n<p>If the model correctly predicts all positive classes, the precision will be equal to 1. This is not a very good standalone metric, especially for an unbalanced dataset with a dominant positive class, since the precision neglects the negative classes. It can however be useful when we place grave importance on the positive class. An example would be a spam (P)\/ham (N) classifier. In this problem, the major concern is to correctly predict that spam is spam (TP) while not predicting a ham as spam. Predicting a ham when it is actually spam (FN) may not have serious consequences, but predicting a ham as spam (FP) can be very expensive. You have a high precision when the TP is high and the FP is low.<\/p>\n\n\n\n<p>Precision is typically combined with another metric called the recall.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Recall: Recall measures the proportion of actual positive classes that were predicted correctly. It is also called the sensitivity or true positive rate. 
Mathematically, the recall is given by<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"223\" height=\"78\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-4.png\" alt=\"\" class=\"wp-image-7165\" title=\"\" srcset=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-4.png 223w, https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-4-220x78.png 220w\" sizes=\"(max-width: 223px) 100vw, 223px\" \/><\/figure>\n<\/div>\n\n\n<p>The recall is particularly important in situations where the true positive is critically important, such as a cancer classifier. In such a situation, predicting that a patient has cancer when they do not may not be severe. However, predicting that a patient does not have cancer when they actually do (FN) can be very grave. In situations such as this, it is critically important to check the recall of your model. This is because to get a high recall, the FN must be low.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>F1 score<\/li>\n<\/ul>\n\n\n\n<p>The F1 score takes both the recall and precision into account. 
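As a numeric sketch of these three metrics together (the TP, FP, and FN counts below are made up purely for illustration):

```python
# hypothetical confusion-matrix counts
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)                           # TP / (TP + FP) = 0.8
recall = tp / (tp + fn)                              # TP / (TP + FN) = 2/3
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(precision, round(recall, 3), round(f1, 3))
```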
Mathematically, the f1 score is given by<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"334\" height=\"71\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-5.png\" alt=\"\" class=\"wp-image-7166\" title=\"\" srcset=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-5.png 334w, https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/12\/image-5-300x64.png 300w\" sizes=\"(max-width: 334px) 100vw, 334px\" \/><\/figure>\n<\/div>\n\n\n<p>If you are looking for a balance between recall and precision, you should go for the F1 score.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Building a Binary Classifier with Keras on Tensorflow<\/strong><\/h2>\n\n\n\n<p>In this section, we will go ahead and build a binary classifier using Keras. The dataset used is from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset shows whether or not a patient has diabetes based on some diagnostic measurements such as age, glucose level, blood pressure, insulin level, and so on. We will begin by importing the data using the pandas read_csv() method. 
We will then check out how the data frame looks like by printing the first five rows.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#import necessary libraries<\/em>\n<strong>import<\/strong> <strong>pandas<\/strong> <strong>as<\/strong> <strong>pd<\/strong>\n<strong>import<\/strong> <strong>matplotlib.pyplot<\/strong> <strong>as<\/strong> <strong>plt<\/strong>\n<strong>import<\/strong> <strong>seaborn<\/strong> <strong>as<\/strong> <strong>sns<\/strong>\n<strong>from<\/strong> <strong>sklearn.model_selection<\/strong> <strong>import<\/strong> train_test_split\n<strong>import<\/strong> <strong>tensorflow<\/strong> <strong>as<\/strong> <strong>tf<\/strong>\n<strong>from<\/strong> <strong>tensorflow.keras.models<\/strong> <strong>import<\/strong> Sequential\n<strong>from<\/strong> <strong>tensorflow.keras.layers<\/strong> <strong>import<\/strong> Dense, Dropout\n\n<em>#read the dataset file<\/em>\ndf = pd.read_csv('diabetes.csv')\n<em>#print the first five rows of the dataframe<\/em>\n<strong>print<\/strong>(df.head())<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&nbsp;&nbsp;&nbsp;Pregnancies&nbsp; Glucose&nbsp; BloodPressure&nbsp; SkinThickness&nbsp; Insulin &nbsp; BMI&nbsp; \\\n0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 6&nbsp; &nbsp; &nbsp; 148 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 72 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 35&nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp; 33.6&nbsp;&nbsp;&nbsp;\n1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; 85 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 66 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 29&nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp; 26.6&nbsp;&nbsp;&nbsp;\n2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 8&nbsp; &nbsp; &nbsp; 183 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 64&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp; 23.3&nbsp;&nbsp;&nbsp;\n3&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; 89 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp; 66 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 23 &nbsp; &nbsp; &nbsp; 94&nbsp; 28.1&nbsp;&nbsp;&nbsp;\n4&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp; &nbsp; &nbsp; 137 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 40 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 35&nbsp; &nbsp; &nbsp; 168&nbsp; 43.1&nbsp;&nbsp;&nbsp;\n\n&nbsp;&nbsp;&nbsp;DiabetesPedigreeFunction&nbsp; Age&nbsp; Outcome&nbsp;&nbsp;\n0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.627 &nbsp; 50&nbsp; &nbsp; &nbsp; &nbsp; 1&nbsp;&nbsp;\n1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.351 &nbsp; 31&nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp;&nbsp;\n2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.672 &nbsp; 32&nbsp; &nbsp; &nbsp; &nbsp; 1&nbsp;&nbsp;\n3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.167 &nbsp; 21&nbsp; &nbsp; &nbsp; &nbsp; 0&nbsp;&nbsp;\n4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2.288 &nbsp; 33&nbsp; &nbsp; &nbsp; &nbsp; 1\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Exploratory Data Analysis (EDA)<\/strong><\/h2>\n\n\n\n<p>It is important to know the numbers of rows in your dataset. That way, you\u2019d be able to determine how large or small your data is.&nbsp;<\/p>\n\n\n\n<p><code>df.shape<\/code><\/p>\n\n\n\n<p>Output:<\/p>\n\n\n\n<p><code>(768, 9)<\/code><\/p>\n\n\n\n<p>So as shown above, the dataset has 768 rows with 9 columns (including the target column). This is thus a fairly small dataset. 
Let\u2019s get some more information about each column.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#check the datatype of each column<\/em>\ndf.info()<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&lt;<strong>class<\/strong> '<strong>pandas<\/strong>.core.frame.DataFrame'&gt;\nRangeIndex: 768 entries, 0 to 767\nData columns (total 9 columns):\nPregnancies &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\nGlucose &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\nBloodPressure &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\nSkinThickness &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\nInsulin &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\nBMI &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null float64\nDiabetesPedigreeFunction&nbsp; &nbsp; 768 non-null float64\nAge &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\nOutcome &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768 non-null int64\ndtypes: float64(2), int64(7)\nmemory usage: 54.1 KB<\/pre>\n\n\n\n<p>All columns are of int data type except for BMI and Diabetes Pedigree Function that are of data type float64. The info() method also revealed that all columns contain non-null values. 
We can however confirm this by using the isnull() method.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#check for null values&nbsp;<\/em>\ndf.isnull().sum()\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Pregnancies &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nGlucose &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nBloodPressure &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nSkinThickness &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nInsulin &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nBMI &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nDiabetesPedigreeFunction&nbsp; &nbsp; 0\nAge &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nOutcome &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\ndtype: int64\n<\/pre>\n\n\n\n<p>Next, we would use the describe method to get the statistical details for each column.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#print the statistical summary for each column<\/em>\ndf.describe()<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pregnancies &nbsp; &nbsp; Glucose&nbsp; BloodPressure&nbsp; SkinThickness &nbsp; &nbsp; Insulin&nbsp; \\\ncount &nbsp; 768.000000&nbsp; 768.000000 &nbsp; &nbsp; 768.000000 &nbsp; &nbsp; 768.000000&nbsp; 768.000000&nbsp;&nbsp;&nbsp;\nmean&nbsp; &nbsp; &nbsp; 3.845052&nbsp; 120.894531&nbsp; &nbsp; &nbsp; 69.105469&nbsp; &nbsp; &nbsp; 20.536458 &nbsp; 79.799479&nbsp;&nbsp;&nbsp;\nstd &nbsp; &nbsp; &nbsp; 3.369578 &nbsp; 31.972618&nbsp; &nbsp; &nbsp; 19.355807&nbsp; &nbsp; &nbsp; 15.952218&nbsp; 115.244002&nbsp;&nbsp;&nbsp;\nmin &nbsp; &nbsp; &nbsp; 0.000000&nbsp; &nbsp; 0.000000 &nbsp; &nbsp; &nbsp; 0.000000 &nbsp; &nbsp; &nbsp; 0.000000&nbsp; &nbsp; 0.000000&nbsp;&nbsp;&nbsp;\n25% &nbsp; &nbsp; &nbsp; 1.000000 &nbsp; 
99.000000&nbsp; &nbsp; &nbsp; 62.000000 &nbsp; &nbsp; &nbsp; 0.000000&nbsp; &nbsp; 0.000000&nbsp;&nbsp;&nbsp;\n50% &nbsp; &nbsp; &nbsp; 3.000000&nbsp; 117.000000&nbsp; &nbsp; &nbsp; 72.000000&nbsp; &nbsp; &nbsp; 23.000000 &nbsp; 30.500000&nbsp;&nbsp;&nbsp;\n75% &nbsp; &nbsp; &nbsp; 6.000000&nbsp; 140.250000&nbsp; &nbsp; &nbsp; 80.000000&nbsp; &nbsp; &nbsp; 32.000000&nbsp; 127.250000&nbsp;&nbsp;&nbsp;\nmax&nbsp; &nbsp; &nbsp; 17.000000&nbsp; 199.000000 &nbsp; &nbsp; 122.000000&nbsp; &nbsp; &nbsp; 99.000000&nbsp; 846.000000&nbsp;&nbsp;&nbsp;\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BMI&nbsp; DiabetesPedigreeFunction &nbsp; &nbsp; &nbsp; &nbsp; Age &nbsp; &nbsp; Outcome&nbsp;&nbsp;\ncount&nbsp; 768.000000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 768.000000&nbsp; 768.000000&nbsp; 768.000000&nbsp;&nbsp;\nmean&nbsp; &nbsp; 31.992578&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.471876 &nbsp; 33.240885&nbsp; &nbsp; 0.348958&nbsp;&nbsp;\nstd&nbsp; &nbsp; &nbsp; 7.884160&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.331329 &nbsp; 11.760232&nbsp; &nbsp; 0.476951&nbsp;&nbsp;\nmin&nbsp; &nbsp; &nbsp; 0.000000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.078000 &nbsp; 21.000000&nbsp; &nbsp; 0.000000&nbsp;&nbsp;\n25% &nbsp; &nbsp; 27.300000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.243750 &nbsp; 24.000000&nbsp; &nbsp; 0.000000&nbsp;&nbsp;\n50% &nbsp; &nbsp; 32.000000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.372500 &nbsp; 29.000000&nbsp; &nbsp; 0.000000&nbsp;&nbsp;\n75% &nbsp; &nbsp; 36.600000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.626250 &nbsp; 41.000000&nbsp; &nbsp; 1.000000&nbsp;&nbsp;\nmax &nbsp; &nbsp; 67.100000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2.420000 &nbsp; 81.000000&nbsp; &nbsp; 1.000000&nbsp;<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Checking if the 
data is imbalanced<\/strong><\/h2>\n\n\n\n<p>When dealing with binary classification problems, this is an important step. If an imbalanced dataset is fed into a machine learning model, the model tends to perform poorly. Let\u2019s see whether our data is imbalanced. We use seaborn\u2019s countplot() method. This method counts the occurrence of each class in a column and plots a simple bar graph.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#plot a bar plot that shows the count of each class label<\/em>\nsns.countplot(df['Outcome'])<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/wEj0VGHCOr74hxkqIM8DvR9CUj7VE5uEbkknaSV_bmoxP7HUSFT7HtiwVUE-JX8VK2P1OjWfC0HapNjD_ON2XT6NmKOzwRY96oDJQXLTDkc1fYgYwJr_Kl1xk9O2SFFrZvrZh9A\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>You can get more insight by plotting bivariate graphs of the columns using seaborn\u2019s pairplot() method.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#make a plot of the columns against one another<\/em>\nsns.pairplot(df, hue='Outcome')<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/Pcl-kVvmFLkvZwPG1v68rBg9Bi1ZyD3nCc5jusO8Cji8JGAGB19ju4TCiw7GHfuLP-xsnWv0stxCQ4y1cJHC3jaUEioPdPBILS8Ig7S-fv-mVmpLsrMRvNOm5pqiiK5hGSxlnbE\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>From the plot, you\u2019d notice that the classes are largely separable with a hyperplane (or line) splitting the data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Checking for Correlation&nbsp;<\/strong><\/h2>\n\n\n\n<p>As a way of exploring your data, you should check for correlation. This gives you insight into how the attributes affect one another and the way they are connected. Armed with this knowledge, you could drop or merge columns to reduce the dimensionality of the data. A strong correlation can also give you an idea of what to replace missing values with. 
Let\u2019s check if some columns are strongly correlated. We use the corr() method and then draw a heat map.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#check the correlation of each column<\/em>\nplt.figure(figsize=(10, 6))\nsns.heatmap(df.corr(), cmap='Blues', annot=True)<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/q0jckR6thCNM2_ykzSh71yMnFntt0Ij-pyu9UsaJNnmfa93ZuvExxfwMdeHxCYmue2NJCyHwT8Fn1mVXoT9Hrz-CM7xLsqRrlX21rOSFywP9e5it4UA6T6yr8wm7CS57yDcoWVY\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Split the Data into Train and Test Data<\/strong><\/h2>\n\n\n\n<p>As we gear up to build the deep learning model, we need to split the data into train and test data. Before that, the data needs to be split into its independent variables (X) and dependent variable (y).&nbsp; The independent variables are simply the data without the label column. Hence, we drop that column. The dependent variable, on the other hand, is the label column alone.&nbsp;<\/p>\n\n\n\n<p>Afterward, the X and y data are split into train and test datasets, with the test data taking 20% of the entire data. If you are wondering why the data is split into train and test sets: the test data is a portion of the data that is hidden from the model during training. It is then used to measure how well the model will predict unseen data.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#split the data into features and labels<\/em>\nX = df.drop(['Outcome'], axis=1)\ny = df['Outcome']\n\n<em>#further split the labels and features into train and test data<\/em>\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Checking for Outliers<\/strong><\/h2>\n\n\n\n<p>As explained in the previous tutorial, outliers can affect how the models perform. 
Thus, it is pivotal to check for them and deal with them. Here, we will graph the boxplot for each column to detect the presence of outliers.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#import numpy for the log transformation below<\/em>\n<strong>import<\/strong> <strong>numpy<\/strong> <strong>as<\/strong> <strong>np<\/strong>\n\n<em>#create a copy of the dataset<\/em>\ndf1 = df.copy()\n<em># Create a figure with 6 subplots with a width spacing of 1.5<\/em>\nfig, ax = plt.subplots(2,3)\nfig.subplots_adjust(wspace=1.5)\n\n<em># Create a boxplot for the continuous features (log-transformed)<\/em>\nbox_plot1 = sns.boxplot(y=np.log(df1[df1.columns[0]]), ax=ax[0][0])\nbox_plot2 = sns.boxplot(y=np.log(df1[df1.columns[1]]), ax=ax[0][1])\nbox_plot3 = sns.boxplot(y=np.log(df1[df1.columns[2]]), ax=ax[0][2])\nbox_plot6 = sns.boxplot(y=np.log(df1[df1.columns[5]]), ax=ax[1][0])\nbox_plot7 = sns.boxplot(y=np.log(df1[df1.columns[6]]), ax=ax[1][1])\nbox_plot8 = sns.boxplot(y=np.log(df1[df1.columns[7]]), ax=ax[1][2])\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/EBw2VDnqpVOL3QT-I1fs6Cni9ECYuxa7xVED6V-6K2iO6-3ye_VOBD4rtaAlSkziKs86a6OLJLRKwyoU6LiTKfDQvXZWIXSct1HaHHVfFIwgvCUq0g73ilCFf4VZLlH-uj8tRrU\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>You\u2019d notice that Glucose, BloodPressure, and BMI have sprinkles of outliers. The effect of these data points can be softened by standardizing the data.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Building the Neural Network (a Single Layer Perceptron)<\/strong><\/h2>\n\n\n\n<p>We will begin by building a simple neural network \u2013 one fully connected hidden layer with the same number of nodes as the independent variables (8). This is a sensible default when starting to build neural networks. The ReLU activation function is used for this hidden layer. The output layer is a single node that spits out the probability of the positive class. Hence, a sigmoid activation function is used. 
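Since the output node emits a probability, a decision threshold turns it into a class label (0.5 is the common default; the probabilities below are made up for illustration, not real model outputs). A minimal sketch with NumPy:

```python
import numpy as np

# pretend these came from estimator.predict(X_test): one probability per sample
probs = np.array([0.08, 0.47, 0.51, 0.93])

# probability above 0.5 -> class 1 (diabetic); otherwise class 0
classes = (probs > 0.5).astype(int)
print(classes)  # [0 0 1 1]
```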
This probability can be easily converted into class values.&nbsp;<\/p>\n\n\n\n<p>Furthermore, the common binary_crossentropy is used as the loss function. This is the preferred loss function for binary classification problems. The Adam optimizer is used for the gradient descent optimization. Finally, accuracy, precision, and recall are set as the model\u2019s metrics.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>def<\/strong> create_model():\n&nbsp;&nbsp;&nbsp;&nbsp;<em>'''The function creates a Perceptron using Keras'''<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;model = Sequential()\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dense(8, input_dim=len(X.columns), activation='relu'))\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dense(1, activation='sigmoid'))\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>return<\/strong> model\nestimator = create_model()\nestimator.compile(optimizer='adam', metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()], loss='binary_crossentropy')\n<\/pre>\n\n\n\n<p>The model is then trained on the train dataset, specifying the test dataset as the validation data.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#train the model<\/em>\nhistory = estimator.fit(X_train, y_train, epochs=300, validation_data=(X_test, y_test))<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Train on 614 samples, validate on 154 samples\nEpoch 1\/300\n614\/614 [==============================] - 3s 5ms\/sample - loss: 0.6690 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6787 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 2\/300\n614\/614 [==============================] - 0s 168us\/sample - loss: 0.6657 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6754 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 3\/300\n614\/614 [==============================] - 0s 173us\/sample - loss: 0.6622 - acc: 0.6531 
- precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6729 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 4\/300\n614\/614 [==============================] - 0s 163us\/sample - loss: 0.6600 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6713 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 5\/300\n614\/614 [==============================] - 0s 166us\/sample - loss: 0.6572 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6689 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 6\/300\n614\/614 [==============================] - 0s 164us\/sample - loss: 0.6547 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6663 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 7\/300\n614\/614 [==============================] - 0s 229us\/sample - loss: 0.6524 - acc: 0.6531 - precision_14: 0.0000e+00 - recall_14: 0.0000e+00 - val_loss: 0.6645 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 8\/300\n614\/614 [==============================] - 0s 214us\/sample - loss: 0.6504 - acc: 0.6547 - precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6626 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 9\/300\n614\/614 [==============================] - 0s 161us\/sample - loss: 0.6483 - acc: 0.6547 - precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6607 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 10\/300\n614\/614 [==============================] - 0s 171us\/sample - loss: 0.6463 - acc: 0.6547 - precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6594 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 11\/300\n614\/614 [==============================] - 0s 168us\/sample - loss: 0.6445 - acc: 0.6547 - 
precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6575 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 12\/300\n614\/614 [==============================] - 0s 163us\/sample - loss: 0.6422 - acc: 0.6547 - precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6547 - val_acc: 0.6429 - val_precision_14: 0.0000e+00 - val_recall_14: 0.0000e+00\nEpoch 13\/300\n614\/614 [==============================] - 0s 179us\/sample - loss: 0.6407 - acc: 0.6547 - precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6524 - val_acc: 0.6494 - val_precision_14: 1.0000 - val_recall_14: 0.0182\nEpoch 14\/300\n614\/614 [==============================] - 0s 197us\/sample - loss: 0.6382 - acc: 0.6547 - precision_14: 1.0000 - recall_14: 0.0047 - val_loss: 0.6510 - val_acc: 0.6494 - val_precision_14: 1.0000 - val_recall_14: 0.0182\nEpoch 15\/300\n614\/614 [==============================] - 0s 163us\/sample - loss: 0.6362 - acc: 0.6564 - precision_14: 1.0000 - recall_14: 0.0094 - val_loss: 0.6487 - val_acc: 0.6494 - val_precision_14: 1.0000 - val_recall_14: 0.0182\nEpoch 16\/300\n614\/614 [==============================] - 0s 200us\/sample - loss: 0.6340 - acc: 0.6564 - precision_14: 1.0000 - recall_14: 0.0094 - val_loss: 0.6464 - val_acc: 0.6429 - val_precision_14: 0.5000 - val_recall_14: 0.0182\nEpoch 17\/300\n614\/614 [==============================] - 0s 184us\/sample - loss: 0.6319 - acc: 0.6547 - precision_14: 0.6667 - recall_14: 0.0094 - val_loss: 0.6443 - val_acc: 0.6429 - val_precision_14: 0.5000 - val_recall_14: 0.0182\nEpoch 18\/300\n614\/614 [==============================] - 0s 168us\/sample - loss: 0.6301 - acc: 0.6547 - precision_14: 0.6667 - recall_14: 0.0094 - val_loss: 0.6426 - val_acc: 0.6494 - val_precision_14: 0.6667 - val_recall_14: 0.0364\nEpoch 19\/300\n614\/614 [==============================] - 0s 171us\/sample - loss: 0.6278 - acc: 0.6547 - precision_14: 0.6667 - recall_14: 0.0094 - val_loss: 0.6410 - val_acc: 0.6429 - 
val_precision_14: 0.5000 - val_recall_14: 0.0364\nEpoch 20\/300\n614\/614 [==============================] - 0s 176us\/sample - loss: 0.6257 - acc: 0.6547 - precision_14: 0.6667 - recall_14: 0.0094 - val_loss: 0.6390 - val_acc: 0.6429 - val_precision_14: 0.5000 - val_recall_14: 0.0364\n\u2026\nEpoch 281\/300\n614\/614 [==============================] - 0s 211us\/sample - loss: 0.4527 - acc: 0.7785 - precision_14: 0.7251 - recall_14: 0.5822 - val_loss: 0.4638 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 282\/300\n614\/614 [==============================] - 0s 166us\/sample - loss: 0.4520 - acc: 0.7818 - precision_14: 0.7365 - recall_14: 0.5775 - val_loss: 0.4647 - val_acc: 0.8052 - val_precision_14: 0.7907 - val_recall_14: 0.6182\nEpoch 283\/300\n614\/614 [==============================] - 0s 169us\/sample - loss: 0.4525 - acc: 0.7850 - precision_14: 0.7580 - recall_14: 0.5587 - val_loss: 0.4653 - val_acc: 0.7922 - val_precision_14: 0.7805 - val_recall_14: 0.5818\nEpoch 284\/300\n614\/614 [==============================] - 0s 197us\/sample - loss: 0.4515 - acc: 0.7818 - precision_14: 0.7365 - recall_14: 0.5775 - val_loss: 0.4637 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 285\/300\n614\/614 [==============================] - 0s 176us\/sample - loss: 0.4526 - acc: 0.7818 - precision_14: 0.7158 - recall_14: 0.6150 - val_loss: 0.4633 - val_acc: 0.7987 - val_precision_14: 0.7609 - val_recall_14: 0.6364\nEpoch 286\/300\n614\/614 [==============================] - 0s 210us\/sample - loss: 0.4524 - acc: 0.7785 - precision_14: 0.7278 - recall_14: 0.5775 - val_loss: 0.4641 - val_acc: 0.8052 - val_precision_14: 0.7907 - val_recall_14: 0.6182\nEpoch 287\/300\n614\/614 [==============================] - 0s 176us\/sample - loss: 0.4515 - acc: 0.7834 - precision_14: 0.7381 - recall_14: 0.5822 - val_loss: 0.4634 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 288\/300\n614\/614 
[==============================] - 0s 176us\/sample - loss: 0.4516 - acc: 0.7818 - precision_14: 0.7365 - recall_14: 0.5775 - val_loss: 0.4635 - val_acc: 0.7987 - val_precision_14: 0.7727 - val_recall_14: 0.6182\nEpoch 289\/300\n614\/614 [==============================] - 0s 178us\/sample - loss: 0.4519 - acc: 0.7818 - precision_14: 0.7453 - recall_14: 0.5634 - val_loss: 0.4651 - val_acc: 0.7987 - val_precision_14: 0.7857 - val_recall_14: 0.6000\nEpoch 290\/300\n614\/614 [==============================] - 0s 180us\/sample - loss: 0.4509 - acc: 0.7834 - precision_14: 0.7381 - recall_14: 0.5822 - val_loss: 0.4630 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 291\/300\n614\/614 [==============================] - 0s 197us\/sample - loss: 0.4514 - acc: 0.7866 - precision_14: 0.7330 - recall_14: 0.6056 - val_loss: 0.4631 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 292\/300\n614\/614 [==============================] - 0s 124us\/sample - loss: 0.4522 - acc: 0.7818 - precision_14: 0.7365 - recall_14: 0.5775 - val_loss: 0.4642 - val_acc: 0.8052 - val_precision_14: 0.7907 - val_recall_14: 0.6182\nEpoch 293\/300\n614\/614 [==============================] - 0s 134us\/sample - loss: 0.4510 - acc: 0.7834 - precision_14: 0.7326 - recall_14: 0.5915 - val_loss: 0.4629 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 294\/300\n614\/614 [==============================] - 0s 141us\/sample - loss: 0.4512 - acc: 0.7866 - precision_14: 0.7330 - recall_14: 0.6056 - val_loss: 0.4630 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 295\/300\n614\/614 [==============================] - 0s 176us\/sample - loss: 0.4512 - acc: 0.7818 - precision_14: 0.7232 - recall_14: 0.6009 - val_loss: 0.4630 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 296\/300\n614\/614 [==============================] - 0s 166us\/sample - loss: 0.4510 - acc: 0.7834 - 
precision_14: 0.7326 - recall_14: 0.5915 - val_loss: 0.4637 - val_acc: 0.7987 - val_precision_14: 0.7727 - val_recall_14: 0.6182\nEpoch 297\/300\n614\/614 [==============================] - 0s 189us\/sample - loss: 0.4509 - acc: 0.7834 - precision_14: 0.7299 - recall_14: 0.5962 - val_loss: 0.4633 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 298\/300\n614\/614 [==============================] - 0s 139us\/sample - loss: 0.4511 - acc: 0.7801 - precision_14: 0.7294 - recall_14: 0.5822 - val_loss: 0.4633 - val_acc: 0.8052 - val_precision_14: 0.7778 - val_recall_14: 0.6364\nEpoch 299\/300\n614\/614 [==============================] - 0s 172us\/sample - loss: 0.4533 - acc: 0.7801 - precision_14: 0.7143 - recall_14: 0.6103 - val_loss: 0.4629 - val_acc: 0.7987 - val_precision_14: 0.7609 - val_recall_14: 0.6364\nEpoch 300\/300\n614\/614 [==============================] - 0s 181us\/sample - loss: 0.4509 - acc: 0.7850 - precision_14: 0.7425 - recall_14: 0.5822 - val_loss: 0.4638 - val_acc: 0.8052 - val_precision_14: 0.7907 - val_recall_14: 0.6182<\/pre>\n\n\n\n<p>As seen above, at the end of the 300<sup>th<\/sup> epoch, the model had a loss of 45.09%, an accuracy of 78.50%, precision of 74.25%, and a recall of 58.22%. 
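<\/p>\n\n\n\n<p>Recall that the sigmoid layer outputs one probability per sample. Below is a minimal sketch (NumPy only, with made-up probabilities standing in for what estimator.predict(X_test) would return) of converting those probabilities into class values with a 0.5 threshold:<\/p>

```python
import numpy as np

# Hypothetical sigmoid outputs for six samples
# (stand-ins for what estimator.predict(X_test) would return)
probs = np.array([0.12, 0.48, 0.50, 0.73, 0.91, 0.33])

# Threshold at 0.5: probabilities strictly above 0.5 become class 1, the rest class 0
labels = (probs > 0.5).astype(int)
print(labels.tolist())  # [0, 0, 0, 1, 1, 0]
```

<p>This 0.5 cutoff is also the default threshold Keras uses when computing the accuracy, precision and recall metrics reported above.<\/p>\n\n\n\n<p>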
This is a pretty decent result with a very simple neural network.<\/p>\n\n\n\n<p>Let\u2019s visualize how the training process went.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#plot the training and validation loss<\/em>\nhistory_df = pd.DataFrame(history.history)\nplt.plot(history_df['loss'], label='loss')\nplt.plot(history_df['val_loss'], label='val_loss')\n\nplt.legend()<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/0X5Lm7MXT5JDEcj94nAtH338y6AlWC8WgFJ-xTJfPU3K88oTNWbm9WZyHuhTeeWkthp9A30KI62fkHdh4Cp3ojbUwWe_uaFNK2KnDrS9NaFAdv6ucGvnQMQVAX3CQqfDJD9Wxoo\" alt=\"Training and validation loss curves for the single-layer perceptron\" title=\"\"><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Adding Some Hidden Layers (a Multilayer Perceptron)<\/strong><\/h2>\n\n\n\n<p>Let\u2019s tweak the neural network architecture by adding more layers with some dropout and see how the performance is affected.&nbsp;<\/p>\n\n\n\n<p>This time, the first hidden layer has 16 nodes with a ReLU activation function. The next hidden layer has 12 nodes, followed by a 20% dropout to reduce overfitting. 
The next layer has 3 nodes with a ReLU activation, while the final layer is a single-node layer with a sigmoid activation.&nbsp;<\/p>\n\n\n\n<p>The optimizer, metrics and loss function remain the same as in the last architecture.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>import<\/strong> <strong>tensorflow<\/strong> <strong>as<\/strong> <strong>tf<\/strong>\n<strong>from<\/strong> <strong>tensorflow.keras.layers<\/strong> <strong>import<\/strong> <strong>Dropout<\/strong>\n\n<strong>def<\/strong> create_model():\n&nbsp;&nbsp;&nbsp;&nbsp;<em>'''The function creates a multilayer perceptron using Keras'''<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;model = Sequential()\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dense(16, input_dim=len(X.columns), activation='relu'))\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dense(12, activation='relu'))\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dropout(0.2))\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dense(3, activation='relu'))\n&nbsp;&nbsp;&nbsp;&nbsp;model.add(Dense(1, activation='sigmoid'))\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>return<\/strong> model\nestimator = create_model()\nestimator.compile(optimizer='adam', metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()], loss='binary_crossentropy')<\/pre>\n\n\n\n<p>Now, we train the model for 300 epochs as well.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#train the model<\/em>\nhistory = estimator.fit(X_train, y_train, epochs=300, validation_data=(X_test, y_test))<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Train on 614 samples, validate on 154 samples\nEpoch 1\/300\n614\/614 [==============================] - 2s 4ms\/sample - loss: 0.7018 - acc: 0.4235 - precision_17: 0.3667 - recall_17: 0.9108 - val_loss: 0.6878 - val_acc: 0.7338 - val_precision_17: 0.6591 - val_recall_17: 0.5273\nEpoch 2\/300\n614\/614 [==============================] - 0s 156us\/sample - loss: 0.6924 - acc: 0.5749 - precision_17: 0.4172 - recall_17: 0.5681 - val_loss: 0.6855 - val_acc: 0.6883 - val_precision_17: 0.6842 - val_recall_17: 0.2364\nEpoch 3\/300\n614\/614 
[==============================] - 0s 175us\/sample - loss: 0.6883 - acc: 0.6173 - precision_17: 0.4505 - recall_17: 0.4695 - val_loss: 0.6831 - val_acc: 0.6948 - val_precision_17: 0.7500 - val_recall_17: 0.2182\nEpoch 4\/300\n614\/614 [==============================] - 0s 173us\/sample - loss: 0.6842 - acc: 0.6564 - precision_17: 0.5079 - recall_17: 0.3005 - val_loss: 0.6809 - val_acc: 0.6623 - val_precision_17: 0.8000 - val_recall_17: 0.0727\nEpoch 5\/300\n614\/614 [==============================] - 0s 188us\/sample - loss: 0.6799 - acc: 0.6775 - precision_17: 0.5882 - recall_17: 0.2347 - val_loss: 0.6784 - val_acc: 0.6558 - val_precision_17: 0.6667 - val_recall_17: 0.0727\nEpoch 6\/300\n614\/614 [==============================] - 0s 191us\/sample - loss: 0.6771 - acc: 0.6808 - precision_17: 0.6049 - recall_17: 0.2300 - val_loss: 0.6758 - val_acc: 0.6623 - val_precision_17: 0.7143 - val_recall_17: 0.0909\nEpoch 7\/300\n614\/614 [==============================] - 0s 221us\/sample - loss: 0.6752 - acc: 0.6775 - precision_17: 0.6087 - recall_17: 0.1972 - val_loss: 0.6737 - val_acc: 0.6623 - val_precision_17: 0.8000 - val_recall_17: 0.0727\nEpoch 8\/300\n614\/614 [==============================] - 0s 210us\/sample - loss: 0.6731 - acc: 0.6759 - precision_17: 0.6522 - recall_17: 0.1408 - val_loss: 0.6715 - val_acc: 0.6558 - val_precision_17: 1.0000 - val_recall_17: 0.0364\nEpoch 9\/300\n614\/614 [==============================] - 0s 170us\/sample - loss: 0.6699 - acc: 0.6645 - precision_17: 0.6000 - recall_17: 0.0986 - val_loss: 0.6690 - val_acc: 0.6623 - val_precision_17: 0.8000 - val_recall_17: 0.0727\nEpoch 10\/300\n614\/614 [==============================] - 0s 189us\/sample - loss: 0.6675 - acc: 0.6824 - precision_17: 0.6667 - recall_17: 0.1690 - val_loss: 0.6663 - val_acc: 0.6753 - val_precision_17: 0.7778 - val_recall_17: 0.1273\nEpoch 11\/300\n614\/614 [==============================] - 0s 184us\/sample - loss: 0.6643 - acc: 0.6889 - precision_17: 0.7115 - 
recall_17: 0.1737 - val_loss: 0.6643 - val_acc: 0.6623 - val_precision_17: 1.0000 - val_recall_17: 0.0545\nEpoch 12\/300\n614\/614 [==============================] - 0s 170us\/sample - loss: 0.6628 - acc: 0.6661 - precision_17: 0.5870 - recall_17: 0.1268 - val_loss: 0.6617 - val_acc: 0.6753 - val_precision_17: 0.8571 - val_recall_17: 0.1091\nEpoch 13\/300\n614\/614 [==============================] - 0s 166us\/sample - loss: 0.6595 - acc: 0.7003 - precision_17: 0.7164 - recall_17: 0.2254 - val_loss: 0.6589 - val_acc: 0.6818 - val_precision_17: 0.8750 - val_recall_17: 0.1273\nEpoch 14\/300\n614\/614 [==============================] - 0s 163us\/sample - loss: 0.6553 - acc: 0.6938 - precision_17: 0.7451 - recall_17: 0.1784 - val_loss: 0.6562 - val_acc: 0.6818 - val_precision_17: 0.8750 - val_recall_17: 0.1273\nEpoch 15\/300\n614\/614 [==============================] - 0s 164us\/sample - loss: 0.6560 - acc: 0.6873 - precision_17: 0.6842 - recall_17: 0.1831 - val_loss: 0.6532 - val_acc: 0.6948 - val_precision_17: 0.9000 - val_recall_17: 0.1636\nEpoch 16\/300\n614\/614 [==============================] - 0s 168us\/sample - loss: 0.6521 - acc: 0.7003 - precision_17: 0.7042 - recall_17: 0.2347 - val_loss: 0.6508 - val_acc: 0.6948 - val_precision_17: 0.9000 - val_recall_17: 0.1636\nEpoch 17\/300\n614\/614 [==============================] - 0s 178us\/sample - loss: 0.6506 - acc: 0.6840 - precision_17: 0.6727 - recall_17: 0.1737 - val_loss: 0.6478 - val_acc: 0.7013 - val_precision_17: 0.9091 - val_recall_17: 0.1818\nEpoch 18\/300\n614\/614 [==============================] - 0s 163us\/sample - loss: 0.6425 - acc: 0.7036 - precision_17: 0.6914 - recall_17: 0.2629 - val_loss: 0.6452 - val_acc: 0.7468 - val_precision_17: 0.7667 - val_recall_17: 0.4182\nEpoch 19\/300\n614\/614 [==============================] - 0s 179us\/sample - loss: 0.6465 - acc: 0.7215 - precision_17: 0.6458 - recall_17: 0.4366 - val_loss: 0.6408 - val_acc: 0.7208 - val_precision_17: 0.8333 - val_recall_17: 
0.2727\nEpoch 20\/300\n614\/614 [==============================] - 0s 158us\/sample - loss: 0.6365 - acc: 0.7264 - precision_17: 0.7528 - recall_17: 0.3146 - val_loss: 0.6375 - val_acc: 0.7468 - val_precision_17: 0.7500 - val_recall_17: 0.4364\n\u2026\nEpoch 281\/300\n\n614\/614 [==============================] - 0s 197us\/sample - loss: 0.4374 - acc: 0.7883 - precision_17: 0.7862 - recall_17: 0.5352 - val_loss: 0.4249 - val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000\nEpoch 282\/300\n614\/614 [==============================] - 0s 203us\/sample - loss: 0.4210 - acc: 0.8127 - precision_17: 0.7882 - recall_17: 0.6291 - val_loss: 0.4293 - val_acc: 0.7792 - val_precision_17: 0.7442 - val_recall_17: 0.5818\nEpoch 283\/300\n614\/614 [==============================] - 0s 200us\/sample - loss: 0.4402 - acc: 0.7850 - precision_17: 0.7718 - recall_17: 0.5399 - val_loss: 0.4301 - val_acc: 0.7987 - val_precision_17: 0.7857 - val_recall_17: 0.6000\nEpoch 284\/300\n614\/614 [==============================] - 0s 207us\/sample - loss: 0.4354 - acc: 0.8208 - precision_17: 0.8084 - recall_17: 0.6338 - val_loss: 0.4280 - val_acc: 0.7857 - val_precision_17: 0.7391 - val_recall_17: 0.6182\nEpoch 285\/300\n614\/614 [==============================] - 0s 200us\/sample - loss: 0.4444 - acc: 0.7915 - precision_17: 0.7673 - recall_17: 0.5728 - val_loss: 0.4310 - val_acc: 0.7922 - val_precision_17: 0.7805 - val_recall_17: 0.5818\nEpoch 286\/300\n614\/614 [==============================] - 0s 220us\/sample - loss: 0.4380 - acc: 0.8046 - precision_17: 0.7888 - recall_17: 0.5962 - val_loss: 0.4315 - val_acc: 0.7727 - val_precision_17: 0.7381 - val_recall_17: 0.5636\nEpoch 287\/300\n614\/614 [==============================] - 0s 208us\/sample - loss: 0.4297 - acc: 0.8013 - precision_17: 0.7725 - recall_17: 0.6056 - val_loss: 0.4278 - val_acc: 0.8052 - val_precision_17: 0.7660 - val_recall_17: 0.6545\nEpoch 288\/300\n614\/614 [==============================] - 0s 192us\/sample - 
loss: 0.4266 - acc: 0.8062 - precision_17: 0.8013 - recall_17: 0.5869 - val_loss: 0.4295 - val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000\nEpoch 289\/300\n614\/614 [==============================] - 0s 163us\/sample - loss: 0.4294 - acc: 0.8078 - precision_17: 0.7987 - recall_17: 0.5962 - val_loss: 0.4298 - val_acc: 0.7922 - val_precision_17: 0.7556 - val_recall_17: 0.6182\nEpoch 290\/300\n614\/614 [==============================] - 0s 192us\/sample - loss: 0.4299 - acc: 0.8094 - precision_17: 0.7667 - recall_17: 0.6479 - val_loss: 0.4346 - val_acc: 0.7857 - val_precision_17: 0.7750 - val_recall_17: 0.5636\nEpoch 291\/300\n614\/614 [==============================] - 0s 193us\/sample - loss: 0.4333 - acc: 0.8094 - precision_17: 0.7824 - recall_17: 0.6244 - val_loss: 0.4301 - val_acc: 0.7987 - val_precision_17: 0.7727 - val_recall_17: 0.6182\nEpoch 292\/300\n614\/614 [==============================] - 0s 263us\/sample - loss: 0.4380 - acc: 0.8029 - precision_17: 0.7674 - recall_17: 0.6197 - val_loss: 0.4278 - val_acc: 0.7922 - val_precision_17: 0.7556 - val_recall_17: 0.6182\nEpoch 293\/300\n614\/614 [==============================] - 0s 206us\/sample - loss: 0.4212 - acc: 0.8127 - precision_17: 0.7816 - recall_17: 0.6385 - val_loss: 0.4263 - val_acc: 0.7987 - val_precision_17: 0.7609 - val_recall_17: 0.6364\nEpoch 294\/300\n614\/614 [==============================] - 0s 185us\/sample - loss: 0.4322 - acc: 0.8078 - precision_17: 0.7844 - recall_17: 0.6150 - val_loss: 0.4293 - val_acc: 0.7922 - val_precision_17: 0.7674 - val_recall_17: 0.6000\nEpoch 295\/300\n614\/614 [==============================] - 0s 243us\/sample - loss: 0.4377 - acc: 0.8046 - precision_17: 0.7784 - recall_17: 0.6103 - val_loss: 0.4298 - val_acc: 0.7922 - val_precision_17: 0.7556 - val_recall_17: 0.6182\nEpoch 296\/300\n614\/614 [==============================] - 0s 217us\/sample - loss: 0.4348 - acc: 0.7866 - precision_17: 0.7563 - recall_17: 0.5681 - val_loss: 0.4308 - 
val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000\nEpoch 297\/300\n614\/614 [==============================] - 0s 180us\/sample - loss: 0.4295 - acc: 0.7997 - precision_17: 0.7557 - recall_17: 0.6244 - val_loss: 0.4287 - val_acc: 0.7857 - val_precision_17: 0.7619 - val_recall_17: 0.5818\nEpoch 298\/300\n614\/614 [==============================] - 0s 200us\/sample - loss: 0.4431 - acc: 0.7915 - precision_17: 0.7607 - recall_17: 0.5822 - val_loss: 0.4296 - val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000\nEpoch 299\/300\n614\/614 [==============================] - 0s 167us\/sample - loss: 0.4310 - acc: 0.8046 - precision_17: 0.7853 - recall_17: 0.6009 - val_loss: 0.4296 - val_acc: 0.7792 - val_precision_17: 0.7442 - val_recall_17: 0.5818\nEpoch 300\/300\n614\/614 [==============================] - 0s 178us\/sample - loss: 0.4190 - acc: 0.8078 - precision_17: 0.7684 - recall_17: 0.6385 - val_loss: 0.4269 - val_acc: 0.7857 - val_precision_17: 0.7500 - val_recall_17: 0.6000<\/pre>\n\n\n\n<p>This time, the model has a loss of 41.9%, an accuracy of 80.78%, precision of 76.84%, and a recall of 63.85%. 
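<\/p>\n\n\n\n<p>To make the precision and recall figures concrete, here is a minimal sketch (NumPy only, with made-up labels rather than the tutorial\u2019s data) of how these metrics are computed from thresholded predictions:<\/p>

```python
import numpy as np

# Hypothetical ground-truth labels and thresholded model predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

precision = tp / (tp + fp)  # of predicted positives, the fraction truly positive
recall = tp / (tp + fn)     # of the truly positive samples, the fraction found
print(precision, recall)  # 0.75 0.75
```

<p>A high precision with a near-zero recall, as seen in the early epochs above, means the model makes very few positive predictions but tends to be right when it does.<\/p>\n\n\n\n<p>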
These results are slightly better than the previous model\u2019s, but not by a convincing margin.&nbsp;<\/p>\n\n\n\n<p>It goes to show that for this dataset, increasing the number of layers does not necessarily improve the performance of the model.&nbsp;<\/p>\n\n\n\n<p>Finally, we can visualize the training process with the code below.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#plot the training and validation loss<\/em>\nhistory_df = pd.DataFrame(history.history)\nplt.plot(history_df['loss'], label='loss')\nplt.plot(history_df['val_loss'], label='val_loss')\nplt.legend()<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/djhZEeLtkFDNa8AUntvpipUhZ9CuwTvsZpq-ti_HCgTaqm7BgvqEccnRzH8IOWcgPjewBJStjxjhvHoB_DLjcWwmEPhEIgeOnxivTzHk3kTT0Ly6MFWVeCvgQ7UujXnIl1HZ18g\" alt=\"Training and validation loss curves for the multilayer perceptron\" title=\"\"><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Summary<\/strong><\/h2>\n\n\n\n<p>In conclusion, you have discovered how to build a binary classifier using Keras. We started out by carrying out some EDA, after which the data was preprocessed and then fed into the neural network. For this dataset, we found that increasing the number of hidden layers in the neural network architecture does not have a significant effect on the performance of the model.&nbsp;<\/p>\n\n\n\n<p>You can conclude that this is because the dataset was relatively small. This explains why a single perceptron was able to capture the patterns in the data with an accuracy of over 80%.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Supervised machine learning problems can be broadly divided into 2: Regression problems and classification problems.&nbsp; In the previous tutorials, we have examined how to build a linear regression model with Tensorflow and Keras. In this tutorial, we shall be turning our attention to classification problems. Classification problems take a large chunk of machine learning problems. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":7358,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[498],"tags":[],"class_list":["post-7157","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-tutorials"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/7157","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=7157"}],"version-history":[{"count":1,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/7157\/revisions"}],"predecessor-version":[{"id":32717,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/7157\/revisions\/32717"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/7358"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=7157"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=7157"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=7157"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}