{"id":7021,"date":"2020-11-24T15:00:35","date_gmt":"2020-11-24T09:30:35","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=7021"},"modified":"2022-06-29T11:39:34","modified_gmt":"2022-06-29T06:09:34","slug":"linear-regression-with-keras-on-tensorflow","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/linear-regression-with-keras-on-tensorflow\/","title":{"rendered":"Linear Regression with Keras on Tensorflow"},"content":{"rendered":"\n<p>In the last tutorial, we introduced the concept of linear regression with Keras and how to build a linear regression model using <a href=\"https:\/\/www.h2kinfosys.com\/blog\/introduction-to-tensorflow-tensorflow-architecture-with-example\/\">Tensorflow\u2019s <\/a>estimator API. In that tutorial, however, we skipped a step that is vital for real-life problems. Building any machine learning model requires you to preprocess the data before feeding it to the machine learning algorithm or neural network architecture. This is because some of the data may contain missing values, duplicate values, unreasonable entries, or redundant features. These anomalies can greatly affect the performance of your model. Data preprocessing involves data cleaning, data augmentation, exploratory data analysis, data standardization, normalization, feature extraction, etc.\u00a0<\/p>\n\n\n\n<p>In this tutorial, we will build a linear regression model with Keras, this time taking data preprocessing into account. Just as in the last tutorial, we will be using the Boston dataset to train and test our model. The Boston dataset is a popular dataset that relates the median price of a house to several explanatory factors. 
The dataset can be gotten from the Scikit-learn inbuilt dataset and the description of the dataset is shown below.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Boston house prices dataset\n---------------------------\n\n**Data Set Characteristics:**&nbsp;&nbsp;\n\n&nbsp;&nbsp;&nbsp;&nbsp;:Number of Instances: 506&nbsp;\n\n&nbsp;&nbsp;&nbsp;&nbsp;:Number of Attributes: 13 numeric\/categorical predictive. Median Value (attribute 14) <strong>is<\/strong> usually the target.\n\n&nbsp;&nbsp;&nbsp;&nbsp;:Attribute Information (<strong>in<\/strong> order):\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- CRIM &nbsp; &nbsp; per capita crime rate by town\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- ZN &nbsp; &nbsp; &nbsp; proportion of residential land zoned <strong>for<\/strong> lots over 25,000 sq.ft.\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- INDUS&nbsp; &nbsp; proportion of non-retail business acres per town\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- CHAS &nbsp; &nbsp; Charles River dummy variable (= 1 <strong>if<\/strong> tract bounds river; 0 otherwise)\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- NOX&nbsp; &nbsp; &nbsp; nitric oxides concentration (parts per 10 million)\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- RM &nbsp; &nbsp; &nbsp; average number of rooms per dwelling\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- AGE&nbsp; &nbsp; &nbsp; proportion of owner-occupied units built prior to 1940\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- DIS&nbsp; &nbsp; &nbsp; weighted distances to five Boston employment centres\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- RAD&nbsp; &nbsp; &nbsp; index of accessibility to radial highways\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- TAX&nbsp; &nbsp; &nbsp; full-value property-tax rate per $10,000\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- PTRATIO&nbsp; pupil-teacher ratio by town\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- B&nbsp; &nbsp; &nbsp; &nbsp; 
1000(Bk - 0.63)^2 where Bk <strong>is<\/strong> the proportion of blacks by town\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- LSTAT&nbsp; &nbsp; % lower status of the population\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- MEDV &nbsp; &nbsp; Median value of owner-occupied homes <strong>in<\/strong> $1000's\n\n&nbsp;&nbsp;&nbsp;&nbsp;:Missing Attribute Values: None<\/pre>\n\n\n\n<p>A machine learning model makes predictions by generalizing from patterns in the training data. This implies that the model must first learn the patterns embedded in the data. And that&#8217;s intuitive. If I give you a sequence of numbers, say 2, 4, 6, 8, and ask you to predict the next number, you&#8217;d most likely say 10. You know this because you discovered that the sequence increases in steps of 2. You generalized the observed pattern in the data.&nbsp;<\/p>\n\n\n\n<p>The same principle holds for machine learning models. Only this time, the data is far more voluminous, so we humans may not see the embedded patterns easily. For instance, the Boston dataset, which is regarded as a small dataset, has 13 features plus the target, with 506 samples. It\u2019s almost impossible to spot any substantial pattern in the data by merely looking at the numbers.&nbsp;<\/p>\n\n\n\n<p>But here&#8217;s the thing. Most times, not all features directly affect the target. Features that do not meaningfully affect the labels are called noise and should be de-emphasized or removed entirely. Selecting only the most important features can greatly improve the performance of your model. Not only does it make the data more compact, it also allows the model to learn patterns more quickly during training.&nbsp;<\/p>\n\n\n\n<p>Another point to take note of is that some features are highly correlated, such that a change in one strongly affects the other. This is called multicollinearity. 
If you observe this in your data, it is good practice to remove one of the features or, better still, merge the correlated features into one. While multicollinearity may not affect your model\u2019s predictive performance, it is good practice to check for it and deal with it so as to eliminate redundant features.&nbsp;<\/p>\n\n\n\n<p>In this tutorial, you will learn the steps involved in data preprocessing and model building. By the end of this tutorial, you will have learned<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>How to get a quick overview of your data<\/li><li>How to deal with missing values<\/li><li>Checking for multicollinearity<\/li><li>How to deal with multicollinearity<\/li><li>How to inspect your data<\/li><li>How to check for outliers<\/li><li>How to normalize and standardize your data<\/li><li>Building a neural network with Keras<\/li><li>Training a neural network<\/li><li>Evaluating the model<\/li><li>Improving the model<\/li><\/ul>\n\n\n\n<p>These steps are the framework for building machine learning models. Let&#8217;s dive in.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Data Overview<\/strong><\/h2>\n\n\n\n<p>Inspecting the data is a critical step when building a machine learning model. This is because, many times, the data has imperfections. Moreover, you&#8217;d need to be conversant with the features of the data and their specific data types. A good and common practice is to check the first five rows of the data.&nbsp;<\/p>\n\n\n\n<p>Let&#8217;s start by importing the necessary libraries and loading the data from the sklearn.datasets module. 
Throughout the course of this tutorial, we will use other libraries such as matplotlib, seaborn, NumPy, and of course TensorFlow.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em># import the necessary libraries<\/em>\n<strong>from<\/strong> <strong>sklearn.datasets<\/strong> <strong>import<\/strong> load_boston\n<strong>import<\/strong> <strong>pandas<\/strong> <strong>as<\/strong> <strong>pd<\/strong>\n<strong>import<\/strong> <strong>matplotlib.pyplot<\/strong> <strong>as<\/strong> <strong>plt<\/strong>\n<strong>import<\/strong> <strong>seaborn<\/strong> <strong>as<\/strong> <strong>sns<\/strong>\n<strong>import<\/strong> <strong>numpy<\/strong> <strong>as<\/strong> <strong>np<\/strong>\n<strong>from<\/strong> <strong>statsmodels.stats.outliers_influence<\/strong> <strong>import<\/strong> variance_inflation_factor <strong>as<\/strong> vif\n<strong>from<\/strong> <strong>sklearn.decomposition<\/strong> <strong>import<\/strong> PCA\n<strong>from<\/strong> <strong>sklearn.preprocessing<\/strong> <strong>import<\/strong> RobustScaler\n<strong>from<\/strong> <strong>sklearn.preprocessing<\/strong> <strong>import<\/strong> MinMaxScaler\n<strong>from<\/strong> <strong>sklearn.model_selection<\/strong> <strong>import<\/strong> train_test_split\n<strong>import<\/strong> <strong>tensorflow<\/strong> <strong>as<\/strong> <strong>tf<\/strong>\n<strong>from<\/strong> <strong>tensorflow.keras<\/strong> <strong>import<\/strong> Sequential\n<strong>from<\/strong> <strong>tensorflow.keras.layers<\/strong> <strong>import<\/strong> Dense\n\n<em>#load the dataset<\/em>\ndata = load_boston()\n<em>#convert the dataset into a Pandas dataframe and add the target column named 'Price'<\/em>\ndf = pd.DataFrame(data.data, columns=data.feature_names)\ndf['Price'] = data.target<\/pre>\n\n\n\n<p>We&#8217;d do this using the head() method of pandas to print the first 5 rows of the dataset. Needless to say, you&#8217;d need to have pandas installed on your machine. 
If you do not, simply type pip install pandas on your console.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#print the first five rows of the dataset<\/em>\ndf.head()<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CRIM&nbsp; &nbsp; ZN&nbsp; INDUS&nbsp; CHAS&nbsp; &nbsp; NOX &nbsp; &nbsp; RM &nbsp; AGE &nbsp; &nbsp; DIS&nbsp; RAD&nbsp; &nbsp; TAX&nbsp; \\\n0&nbsp; 0.00632&nbsp; 18.0 &nbsp; 2.31 &nbsp; 0.0&nbsp; 0.538&nbsp; 6.575&nbsp; 65.2&nbsp; 4.0900&nbsp; 1.0&nbsp; 296.0&nbsp;&nbsp;&nbsp;\n1&nbsp; 0.02731 &nbsp; 0.0 &nbsp; 7.07 &nbsp; 0.0&nbsp; 0.469&nbsp; 6.421&nbsp; 78.9&nbsp; 4.9671&nbsp; 2.0&nbsp; 242.0&nbsp;&nbsp;&nbsp;\n2&nbsp; 0.02729 &nbsp; 0.0 &nbsp; 7.07 &nbsp; 0.0&nbsp; 0.469&nbsp; 7.185&nbsp; 61.1&nbsp; 4.9671&nbsp; 2.0&nbsp; 242.0&nbsp;&nbsp;&nbsp;\n3&nbsp; 0.03237 &nbsp; 0.0 &nbsp; 2.18 &nbsp; 0.0&nbsp; 0.458&nbsp; 6.998&nbsp; 45.8&nbsp; 6.0622&nbsp; 3.0&nbsp; 222.0&nbsp;&nbsp;&nbsp;\n4&nbsp; 0.06905 &nbsp; 0.0 &nbsp; 2.18 &nbsp; 0.0&nbsp; 0.458&nbsp; 7.147&nbsp; 54.2&nbsp; 6.0622&nbsp; 3.0&nbsp; 222.0&nbsp;&nbsp;&nbsp;\n&nbsp;\n&nbsp;&nbsp;&nbsp;PTRATIO &nbsp; &nbsp; &nbsp; B&nbsp; LSTAT&nbsp; Price&nbsp;&nbsp;\n0 &nbsp; &nbsp; 15.3&nbsp; 396.90 &nbsp; 4.98 &nbsp; 24.0&nbsp;&nbsp;\n1 &nbsp; &nbsp; 17.8&nbsp; 396.90 &nbsp; 9.14 &nbsp; 21.6&nbsp;&nbsp;\n2 &nbsp; &nbsp; 17.8&nbsp; 392.83 &nbsp; 4.03 &nbsp; 34.7&nbsp;&nbsp;\n3 &nbsp; &nbsp; 18.7&nbsp; 394.63 &nbsp; 2.94 &nbsp; 33.4&nbsp;&nbsp;\n4 &nbsp; &nbsp; 18.7&nbsp; 396.90 &nbsp; 5.33 &nbsp; 36.2&nbsp;&nbsp;<\/pre>\n\n\n\n<p>Let\u2019s see the number of rows and columns we have in our dataset. This will give an idea of how large the dataset is. We use the shape attribute of the dataframe.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#check the number of rows and columns in the dataset<\/em>\ndf.shape<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">(506, 14)\n<\/pre>\n\n\n\n<p>To get an overview of the data, we use the describe() method. 
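The call itself is a one-liner; here is a minimal sketch on a hypothetical toy frame (with our data you would write df.describe().T, where the .T transpose puts each feature on its own row, as in the output table that follows):

```python
import pandas as pd

# Toy frame standing in for df; describe() works the same on any DataFrame
toy = pd.DataFrame({"CRIM": [0.00632, 0.02731, 0.02729],
                    "ZN": [18.0, 0.0, 0.0]})
summary = toy.describe().T  # .T puts each feature on its own row
print(summary[["count", "mean", "min", "max"]])
```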
The method shows the mean, standard deviation, minimum value, 25th percentile, median, 75th percentile, and the maximum value of each column.&nbsp;<\/p>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp; &nbsp; &nbsp; &nbsp; mean &nbsp; &nbsp; &nbsp; &nbsp; std&nbsp; &nbsp; &nbsp; &nbsp; min &nbsp; &nbsp; &nbsp; &nbsp; 25%&nbsp; &nbsp; &nbsp; &nbsp; 50%&nbsp; \\\nCRIM &nbsp; &nbsp; 506.0&nbsp; &nbsp; 3.613524&nbsp; &nbsp; 8.601545&nbsp; &nbsp; 0.00632&nbsp; &nbsp; 0.082045&nbsp; &nbsp; 0.25651&nbsp;&nbsp;&nbsp;\nZN &nbsp; &nbsp; &nbsp; 506.0 &nbsp; 11.363636 &nbsp; 23.322453&nbsp; &nbsp; 0.00000&nbsp; &nbsp; 0.000000&nbsp; &nbsp; 0.00000&nbsp;&nbsp;&nbsp;\nINDUS&nbsp; &nbsp; 506.0 &nbsp; 11.136779&nbsp; &nbsp; 6.860353&nbsp; &nbsp; 0.46000&nbsp; &nbsp; 5.190000&nbsp; &nbsp; 9.69000&nbsp;&nbsp;&nbsp;\nCHAS &nbsp; &nbsp; 506.0&nbsp; &nbsp; 0.069170&nbsp; &nbsp; 0.253994&nbsp; &nbsp; 0.00000&nbsp; &nbsp; 0.000000&nbsp; &nbsp; 0.00000&nbsp;&nbsp;&nbsp;\nNOX&nbsp; &nbsp; &nbsp; 506.0&nbsp; &nbsp; 0.554695&nbsp; &nbsp; 0.115878&nbsp; &nbsp; 0.38500&nbsp; &nbsp; 0.449000&nbsp; &nbsp; 0.53800&nbsp;&nbsp;&nbsp;\nRM &nbsp; &nbsp; &nbsp; 506.0&nbsp; &nbsp; 6.284634&nbsp; &nbsp; 0.702617&nbsp; &nbsp; 3.56100&nbsp; &nbsp; 5.885500&nbsp; &nbsp; 6.20850&nbsp;&nbsp;&nbsp;\nAGE&nbsp; &nbsp; &nbsp; 506.0 &nbsp; 68.574901 &nbsp; 28.148861&nbsp; &nbsp; 2.90000 &nbsp; 45.025000 &nbsp; 77.50000&nbsp;&nbsp;&nbsp;\nDIS&nbsp; &nbsp; &nbsp; 506.0&nbsp; &nbsp; 3.795043&nbsp; &nbsp; 2.105710&nbsp; &nbsp; 1.12960&nbsp; &nbsp; 2.100175&nbsp; &nbsp; 3.20745&nbsp;&nbsp;&nbsp;\nRAD&nbsp; &nbsp; &nbsp; 506.0&nbsp; &nbsp; 9.549407&nbsp; &nbsp; 8.707259&nbsp; &nbsp; 1.00000&nbsp; &nbsp; 4.000000&nbsp; &nbsp; 5.00000&nbsp;&nbsp;&nbsp;\nTAX&nbsp; &nbsp; &nbsp; 506.0&nbsp; 408.237154&nbsp; 168.537116&nbsp; 187.00000&nbsp; 279.000000&nbsp; 330.00000&nbsp;&nbsp;&nbsp;\nPTRATIO&nbsp; 506.0 &nbsp; 18.455534&nbsp; &nbsp; 2.164946 &nbsp; 12.60000 &nbsp; 
17.400000 &nbsp; 19.05000&nbsp;&nbsp;&nbsp;\nB&nbsp; &nbsp; &nbsp; &nbsp; 506.0&nbsp; 356.674032 &nbsp; 91.294864&nbsp; &nbsp; 0.32000&nbsp; 375.377500&nbsp; 391.44000&nbsp;&nbsp;&nbsp;\nLSTAT&nbsp; &nbsp; 506.0 &nbsp; 12.653063&nbsp; &nbsp; 7.141062&nbsp; &nbsp; 1.73000&nbsp; &nbsp; 6.950000 &nbsp; 11.36000&nbsp;&nbsp;&nbsp;\nPrice&nbsp; &nbsp; 506.0 &nbsp; 22.532806&nbsp; &nbsp; 9.197104&nbsp; &nbsp; 5.00000 &nbsp; 17.025000 &nbsp; 21.20000&nbsp;&nbsp;&nbsp;<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;75% &nbsp; &nbsp; &nbsp; max&nbsp;&nbsp;\nCRIM &nbsp; &nbsp; &nbsp; 3.677083 &nbsp; 88.9762&nbsp;&nbsp;\nZN&nbsp; &nbsp; &nbsp; &nbsp; 12.500000&nbsp; 100.0000&nbsp;&nbsp;\nINDUS &nbsp; &nbsp; 18.100000 &nbsp; 27.7400&nbsp;&nbsp;\nCHAS &nbsp; &nbsp; &nbsp; 0.000000&nbsp; &nbsp; 1.0000&nbsp;&nbsp;\nNOX&nbsp; &nbsp; &nbsp; &nbsp; 0.624000&nbsp; &nbsp; 0.8710&nbsp;&nbsp;\nRM &nbsp; &nbsp; &nbsp; &nbsp; 6.623500&nbsp; &nbsp; 8.7800&nbsp;&nbsp;\nAGE &nbsp; &nbsp; &nbsp; 94.075000&nbsp; 100.0000&nbsp;&nbsp;\nDIS&nbsp; &nbsp; &nbsp; &nbsp; 5.188425 &nbsp; 12.1265&nbsp;&nbsp;\nRAD &nbsp; &nbsp; &nbsp; 24.000000 &nbsp; 24.0000&nbsp;&nbsp;\nTAX&nbsp; &nbsp; &nbsp; 666.000000&nbsp; 711.0000&nbsp;&nbsp;\nPTRATIO &nbsp; 20.200000 &nbsp; 22.0000&nbsp;&nbsp;\nB&nbsp; &nbsp; &nbsp; &nbsp; 396.225000&nbsp; 396.9000&nbsp;&nbsp;\nLSTAT &nbsp; &nbsp; 16.955000 &nbsp; 37.9700&nbsp;&nbsp;\nPrice &nbsp; &nbsp; 25.000000 &nbsp; 50.0000&nbsp;<\/pre>\n\n\n\n<p>You&#8217;d observe that while some columns contain fairly large numbers (e.g. the TAX column with a mean of 408.2), others contain small numbers (e.g. the NOX column with a mean of 0.55). Having a dataset with such a wide range of numbers makes it difficult for our machine learning model to learn. To fix this, the data should be rescaled through standardization or normalization. 
We would explain what these terms mean later in this tutorial.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Dealing with Missing Values<\/strong><\/h2>\n\n\n\n<p>Going forward, we check for missing values. The presence of missing values can greatly affect how the machine learning model behaves. This makes it critically important to check whether missing values exist in your data and to deal with them appropriately. To check for missing values, we use the isnull() method.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#check for null values<\/em>\ndf.isnull().sum()\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CRIM &nbsp; &nbsp; &nbsp; 0\nZN &nbsp; &nbsp; &nbsp; &nbsp; 0\nINDUS&nbsp; &nbsp; &nbsp; 0\nCHAS &nbsp; &nbsp; &nbsp; 0\nNOX&nbsp; &nbsp; &nbsp; &nbsp; 0\nRM &nbsp; &nbsp; &nbsp; &nbsp; 0\nAGE&nbsp; &nbsp; &nbsp; &nbsp; 0\nDIS&nbsp; &nbsp; &nbsp; &nbsp; 0\nRAD&nbsp; &nbsp; &nbsp; &nbsp; 0\nTAX&nbsp; &nbsp; &nbsp; &nbsp; 0\nPTRATIO&nbsp; &nbsp; 0\nB&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0\nLSTAT&nbsp; &nbsp; &nbsp; 0\nPrice&nbsp; &nbsp; &nbsp; 0\ndtype: int64\n<\/pre>\n\n\n\n<p>As seen, there are no missing values in this particular dataset, which is usually not the case for raw real-world data. Where missing values exist, you can drop the affected rows completely if they are relatively few. If, however, the missing values are many, dropping every row that contains one is not advisable, as you\u2019d lose a lot of information. In such cases, you can replace missing values with an aggregate of the column, such as its mean, median, or mode.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Checking for Multicollinearity<\/strong><\/h2>\n\n\n\n<p>Multicollinearity may not have a serious impact on the performance of most machine learning algorithms, but it is important to check for it to gain a better understanding of your data. 
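A quick numeric first pass is the pairwise correlation matrix. Here is a minimal sketch on a synthetic frame (with our data you would pass df to .corr() directly; the 0.75 cutoff is an illustrative assumption, not a universal rule):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# 'b' is deliberately built to be almost a multiple of 'a'; 'c' is independent
toy = pd.DataFrame({"a": a,
                    "b": 2 * a + rng.normal(scale=0.1, size=200),
                    "c": rng.normal(size=200)})
corr = toy.corr()
# flag feature pairs whose absolute correlation exceeds 0.75
pairs = [(i, j) for i in corr.columns for j in corr.columns
         if i < j and abs(corr.loc[i, j]) > 0.75]
print(pairs)  # [('a', 'b')]
```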
So let\u2019s first discuss what <a href=\"https:\/\/en.wikipedia.org\/wiki\/Multicollinearity#:~:text=Multicollinearity%20refers%20to%20a%20situation,equal%20to%201%20or%20%E2%88%921.\" rel=\"nofollow noopener\" target=\"_blank\">multicollinearity <\/a>is. Multicollinearity occurs when two or more independent variables (features) are strongly correlated. This can be a big deal in linear regression problems, as it undermines the reliability of the regression coefficients. By implication, you won&#8217;t have clear insight into how the individual features affect the target variable.\u00a0<\/p>\n\n\n\n<p>Let&#8217;s say you have a linear regression problem given by the equation<\/p>\n\n\n\n<p>Y = m<sub>1<\/sub>x<sub>1<\/sub> + m<sub>2<\/sub>x<sub>2<\/sub> + m<sub>3<\/sub>x<sub>3<\/sub> + \u2026 + m<sub>n<\/sub>x<sub>n<\/sub> + c<\/p>\n\n\n\n<p>If x<sub>1<\/sub> and x<sub>2<\/sub> are strongly correlated, an increase in x<sub>1<\/sub> goes hand in hand with an increase in x<sub>2<\/sub>. Thus, you won&#8217;t be able to determine how x<sub>1<\/sub> and x<sub>2<\/sub> individually affect the target variable, Y.&nbsp;<\/p>\n\n\n\n<p>Now that we have an idea of what multicollinearity is, how do we detect it?<\/p>\n\n\n\n<p>Before dealing with multicollinearity, you must first detect it. There are a couple of methods for doing so. One way is to plot the correlation matrix for the data as a heat map and observe the features that have a strong correlation (positive or negative). Another method is to calculate the VIF of each column and check for columns with high VIF scores.&nbsp; In this tutorial, we will focus on the VIF method.&nbsp;<\/p>\n\n\n\n<p>VIF stands for Variance Inflation Factor. 
It is the reciprocal of 1 &#8211; R<sup>2<\/sup>, where R<sup>2<\/sup> is obtained by regressing a feature on all the other features.&nbsp;<\/p>\n\n\n\n<p>A VIF score equal to 1 means there is no correlation at all.&nbsp;<\/p>\n\n\n\n<p>A VIF score between 1 and 5 means there&#8217;s a slight correlation.&nbsp;<\/p>\n\n\n\n<p>A VIF score greater than 10 means there is a strong correlation.<\/p>\n\n\n\n<p>We calculate the VIF scores for each column using the statsmodels library. The code that calculates the VIF scores and collects them in a DataFrame is shown below.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>def<\/strong> create_vif(dataframe):\n&nbsp;&nbsp;&nbsp;&nbsp;<em>''' This function calculates the Variance Inflation Factor for each column and collects the results in a dataframe'''<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#create an empty dataframe<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;vif_table = pd.DataFrame()\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#populate the first column with the columns of the dataset<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;vif_table['variables'] = dataframe.columns\n&nbsp;&nbsp;&nbsp;&nbsp;<em>#calculate the VIF of each column and create a VIF column to store the numbers<\/em>\n&nbsp;&nbsp;&nbsp;&nbsp;vif_table['VIF'] = [vif(dataframe.values, i) <strong>for<\/strong> i <strong>in<\/strong> range(dataframe.shape[1])]\n&nbsp;&nbsp;&nbsp;&nbsp;\n&nbsp;&nbsp;&nbsp;&nbsp;<strong>return<\/strong> vif_table\n\n<em>#print the VIF table for each variable<\/em>\n<strong>print<\/strong>(create_vif(df))\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&nbsp;&nbsp;variables &nbsp; &nbsp; &nbsp; &nbsp; VIF\n0 &nbsp; &nbsp; &nbsp; CRIM&nbsp; &nbsp; 2.131404\n1 &nbsp; &nbsp; &nbsp; &nbsp; ZN&nbsp; &nbsp; 2.910004\n2&nbsp; &nbsp; &nbsp; INDUS &nbsp; 14.485874\n3 &nbsp; &nbsp; &nbsp; CHAS&nbsp; &nbsp; 1.176266\n4&nbsp; &nbsp; &nbsp; &nbsp; NOX &nbsp; 74.004269\n5 &nbsp; &nbsp; &nbsp; &nbsp; RM&nbsp; 136.101743\n6&nbsp; &nbsp; &nbsp; &nbsp; AGE &nbsp; 21.398863\n7&nbsp; 
&nbsp; &nbsp; &nbsp; DIS &nbsp; 15.430455\n8&nbsp; &nbsp; &nbsp; &nbsp; RAD &nbsp; 15.369980\n9&nbsp; &nbsp; &nbsp; &nbsp; TAX &nbsp; 61.939713\n10 &nbsp; PTRATIO &nbsp; 87.227233\n11 &nbsp; &nbsp; &nbsp; &nbsp; B &nbsp; 21.351015\n12 &nbsp; &nbsp; LSTAT &nbsp; 12.615188\n13 &nbsp; &nbsp; Price &nbsp; 24.503206\n<\/pre>\n\n\n\n<p>As seen from the table, DIS, RAD, and INDUS have VIF scores of 15.43, 15.37, and 14.49 respectively. These values are greater than 10 and are close together. By implication, these three columns are strongly correlated with one another. So how do we deal with them?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Dealing with Multicollinearity<\/strong><\/h2>\n\n\n\n<p>One option is to drop all but one of the strongly correlated columns (there may be more than two such columns), leaving a single representative behind. Of course, the idea is that the column left behind behaves like the ones dropped and can stand in their stead. Other data scientists combine all the correlated columns into one.&nbsp;<\/p>\n\n\n\n<p>Here, we will combine the correlated columns into one so as to retain as much of the individual columns&#8217; behavior as possible. We do this using the Principal Component Analysis (PCA) transformation technique. PCA reduces the dimensionality of the data while preserving the important properties of each column. To perform the PCA transformation, we instantiate the class and then call fit_transform on the correlated columns. 
The code below explains this procedure.&nbsp;&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#compress the columns 'DIS', 'RAD', 'INDUS' into 1 column<\/em>\npca = PCA(n_components=1)\n<em>#call the compressed column 'new'<\/em>\ndf['new'] = pca.fit_transform(df[['DIS', 'RAD', 'INDUS']])\n<em>#drop the three columns from the dataset<\/em>\ndf = df.drop(['DIS', 'RAD', 'INDUS'], axis=1)\n<\/pre>\n\n\n\n<p>With the new dataframe, we can recheck the VIF using the function we created earlier.&nbsp;<\/p>\n\n\n\n<p>If you now look at the new column, you&#8217;d see that it has a VIF below 10, which is good.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#recheck the new VIF table<\/em>\n<strong>print<\/strong>(create_vif(df))\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">&nbsp;&nbsp;variables &nbsp; &nbsp; &nbsp; &nbsp; VIF\n0 &nbsp; &nbsp; &nbsp; CRIM&nbsp; &nbsp; 2.006392\n1 &nbsp; &nbsp; &nbsp; &nbsp; ZN&nbsp; &nbsp; 2.349186\n2 &nbsp; &nbsp; &nbsp; CHAS&nbsp; &nbsp; 1.173519\n3&nbsp; &nbsp; &nbsp; &nbsp; NOX &nbsp; 65.166302\n4 &nbsp; &nbsp; &nbsp; &nbsp; RM&nbsp; 133.757986\n5&nbsp; &nbsp; &nbsp; &nbsp; AGE &nbsp; 18.823276\n6&nbsp; &nbsp; &nbsp; &nbsp; TAX &nbsp; 56.391909\n7&nbsp; &nbsp; PTRATIO &nbsp; 77.938234\n8&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; B &nbsp; 21.345554\n9&nbsp; &nbsp; &nbsp; LSTAT &nbsp; 12.580803\n10 &nbsp; &nbsp; Price &nbsp; 23.131681\n11 &nbsp; &nbsp; &nbsp; new&nbsp; &nbsp; 9.194328\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Inspecting the Data<\/strong><\/h2>\n\n\n\n<p>You should also inspect your data by plotting the features against each other. The seaborn library provides an easy way to do this with its pairplot method. 
We select 3 features with high VIF (NOX, RM, TAX) and 2 features with lower VIF (LSTAT, new).&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#print a pairplot to check the relationships between strongly correlated features<\/em>\npp = sns.pairplot(df[['NOX', 'RM', 'TAX', 'LSTAT', 'new']])\npp = pp.map_lower(sns.regplot)\npp = pp.map_upper(sns.kdeplot)\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/b2M2xUe2uEyVL1fD0oqxJl4lC2X8zsyIkBoV-u0ffPPt9vuvxrPJgszR5d7bgT777YZxh6vtzLs-EJ_UFYi74phgR-Au-bE6wALJ8co-HmtrLGoCQMnuCoWzYB-2BeI4MfTIazFGH9fsLpoGHg\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>We can see the relationships between the features from the pairplot. For some features, the data points follow a pattern. This is a pointer to the fact that a linear regression model can learn the data and subsequently make predictions. You\u2019d also notice that some data points sit far from where the majority are. Next, we will discuss how to make our model robust to such data points, called outliers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Checking for Outliers<\/strong><\/h2>\n\n\n\n<p>Statistical parameters such as the mean and standard deviation, as well as machine learning algorithms such as linear regression and ANOVA, are sensitive to outliers. Ideally, the values in a column follow a normal distribution curve (bell shape), with the majority of the values near the center. A dataset with outliers, however, has exceptional values at the extreme ends of the distribution curve. These unusual occurrences are called outliers.&nbsp;<\/p>\n\n\n\n<p>Outliers immensely affect the training of machine learning models. Most times, they cause longer training times and reduced model accuracy.&nbsp;<\/p>\n\n\n\n<p>There are various ways of detecting outliers in a dataset. 
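One purely numeric check is the 1.5 &times; IQR rule, which is the same rule boxplot whiskers use. A minimal sketch, using the first few LSTAT values from the head() table plus that column's maximum (37.97):

```python
import numpy as np

x = np.array([4.98, 9.14, 4.03, 2.94, 5.33, 37.97])  # 37.97 is LSTAT's max
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
# anything beyond 1.5 interquartile ranges from the quartiles is flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = x[(x < lower) | (x > upper)]
print(outliers)  # [37.97]
```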
For this tutorial, we&#8217;ll plot boxplots to visualize how the data points are distributed. Any point above or below the whiskers of a boxplot is an outlier; such points are typically drawn as circles above or below the whiskers.&nbsp;<\/p>\n\n\n\n<p>We use the seaborn library to plot a boxplot for the independent variables of the dataset. The features are log-transformed to fit them on comparable scales; log1p is used rather than log because ZN and CHAS contain zeros.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df1 = df.copy()\n<em># Create a figure with 10 subplots with a width spacing of 1.5<\/em>\nfig, ax = plt.subplots(2, 5)\nfig.subplots_adjust(wspace=1.5)\n\n<em># Create a boxplot for each continuous feature<\/em>\ncols = ['CRIM', 'ZN', 'CHAS', 'NOX', 'RM', 'AGE', 'TAX', 'PTRATIO', 'B', 'LSTAT']\n<strong>for<\/strong> i, col <strong>in<\/strong> enumerate(cols):\n&nbsp;&nbsp;&nbsp;&nbsp;sns.boxplot(y=np.log1p(df1[col]), ax=ax[i \/\/ 5][i % 5])\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/PKTj8Y9UYNhoO1vHgxBEuTSv8r_X5DPQe60YATV2G5veil_04-DOZddCZ6c1SyRv0gGvKUZZs-O84bJpVgBrx5vKQDY66A9X5jdnb5QWJVAhK5QsTzDPpUsDktEZX2taivlDyEhBaBtY6a-pzw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>From the boxplots, you&#8217;d observe that features such as RM, AGE, PTRATIO, B, and LSTAT have outliers. So how do we deal with them? Dropping every row that contains an outlier is generally not the best idea. 
This is especially true when the outliers are many, as we&#8217;d be losing a lot of information. Instead, you can rescale your data in a way that is robust to outliers.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Data Normalization and Standardization&nbsp;<\/strong><\/h2>\n\n\n\n<p>We can rescale the data distribution through normalization or standardization. Normalization involves rescaling your data such that the minimum and maximum values fall within a predetermined range, typically 0 to 1. Standardization, on the other hand, involves rescaling your data to zero mean and unit variance, reshaping the frequency distribution towards the bell curve shape.<\/p>\n\n\n\n<p>Scikit-learn&#8217;s preprocessing module allows us to carry out the various standardization and normalization steps. Let&#8217;s discuss some of the options.&nbsp;<\/p>\n\n\n\n<p>1. StandardScaler: This rescales the data by subtracting the mean from each entry and dividing by the standard deviation. After a StandardScaler step has been carried out, the mean of the distribution is equal to zero, and for a normally distributed feature roughly 68% of the values fall between -1 and 1.<\/p>\n\n\n\n<p>&nbsp;2. MinMaxScaler: The MinMaxScaler subtracts the minimum value of the feature from each entry and divides by the range of the feature. The MinMaxScaler does not change the shape of the distribution but squeezes the values into the range 0 to 1.&nbsp;<\/p>\n\n\n\n<p>3. RobustScaler: The RobustScaler subtracts the median from each entry and divides by the interquartile range of the feature. Since the median and interquartile range are insensitive to extreme values, the rescaled distribution is far less influenced by outliers. This makes RobustScaler well suited to data with outliers.&nbsp;<\/p>\n\n\n\n<p>Since our data contains outliers, we will rescale it using the RobustScaler class. Note that we&#8217;d need to split the data into train and test sets first. 
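Before wiring it up, the RobustScaler transform just described can be checked by hand on toy numbers (a sketch for intuition only; the pipeline below uses sklearn's class):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is an outlier
median = np.median(x)                      # 3.0
q1, q3 = np.percentile(x, [25, 75])        # 2.0 and 4.0
scaled = (x - median) / (q3 - q1)          # subtract median, divide by IQR
print(scaled)                              # -1.0, -0.5, 0.0, 0.5, 48.5
```

Note how the bulk of the values land between -1 and 1 while the outlier remains extreme, instead of dragging the scale of every other point with it.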
We would also need to one-hot encode the CHAS column (a categorical feature). We then fit the RobustScaler class on the train dataset but transform both the train and test datasets. The code below does all of this.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#One-Hot Encode the CHAS column<\/em>\ndf = pd.get_dummies(df, columns=['CHAS'], drop_first=True)\n<em>#define the features and the labels, X and y<\/em>\nX = df.drop(['Price'], axis=1)\ny = df['Price']\n\n<em>#split the features and labels into train and test data<\/em>\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)\n\n<em>#rescale the data to be robust to outliers<\/em>\nscaler = RobustScaler()\nscaler.fit(X_train)\nX_train = scaler.transform(X_train)\nX_test = scaler.transform(X_test)<\/pre>\n\n\n\n<p>Now that we have preprocessed the data, it is time to build the neural network model using Keras.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Building a Multilayer Neural Network with Tensorflow Keras&nbsp;<\/strong><\/h2>\n\n\n\n<p>Before training our model, we have to build it. In Keras, the architecture of a neural network is built using the Sequential class. You can add as many layers as you desire.&nbsp;<\/p>\n\n\n\n<p>First off, we will create a single hidden layer and see how the model performs.&nbsp;<\/p>\n\n\n\n<p>Since the data we are passing into the model has 11 features, we must set the input_dim parameter of the first layer to 11. Our single hidden layer is set to have 15 nodes, and its output is passed to the output layer with just one node. Since this is a linear regression problem and the output is a single number, the final layer should have one node.&nbsp;<\/p>\n\n\n\n<p>In addition, the hidden layer has a ReLU activation function, whereas the output layer has a linear activation function. 
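As a sanity check on the architecture just described, the parameter count can be worked out by hand: each Dense layer has inputs &times; units weights plus one bias per unit. You can compare the result against model.summary() once the model is built.

```python
# hidden layer: 11 inputs feeding 15 nodes, plus 15 biases
hidden_params = 11 * 15 + 15   # 180
# output layer: 15 inputs feeding 1 node, plus 1 bias
output_params = 15 * 1 + 1     # 16
print(hidden_params + output_params)  # 196
```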
If you don&#8217;t know what activation functions are, I like to think of them as &#8216;switches&#8217; that transform the weighted sum of a node\u2019s inputs before it is passed on to the next layer.&nbsp;<\/p>\n\n\n\n<p>The code to build the neural network architecture is shown below.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#build the neural network architecture<\/em>\nmodel = Sequential()\nmodel.add(Dense(15, input_dim=11, activation='relu'))\nmodel.add(Dense(1, activation='linear'))<\/pre>\n\n\n\n<p>The next step is to compile the model. We use the Adam optimizer with a mean squared error loss and define the evaluation metrics to be mean squared error and mean absolute error.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae'])<\/pre>\n\n\n\n<p><strong>Training the Model<\/strong><\/p>\n\n\n\n<p>The model is trained for 200 epochs, with 20% of the train data held out as a validation set. The validation set helps you check how well the model is learning during the training process, based on the loss function.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#train the neural network on the train dataset<\/em>\nhistory = model.fit(X_train, y_train, epochs=200, validation_split=0.2)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">Output:\nTrain on 323 samples, validate on 81 samples\nEpoch 1\/200\n323\/323 [==============================] - 0s 1ms\/sample - loss: 616.0037 - mean_squared_error: 616.0037 - mean_absolute_error: 23.3245 - val_loss: 584.0988 - val_mean_squared_error: 584.0989 - val_mean_absolute_error: 22.4651\nEpoch 2\/200\n323\/323 [==============================] - 0s 127us\/sample - loss: 606.8097 - mean_squared_error: 606.8097 - mean_absolute_error: 23.1052 - val_loss: 576.2635 - val_mean_squared_error: 576.2634 - val_mean_absolute_error: 22.2775\nEpoch 3\/200\n323\/323 [==============================] - 0s 161us\/sample - loss: 598.1349 - 
mean_squared_error: 598.1349 - mean_absolute_error: 22.8789 - val_loss: 568.6242 - val_mean_squared_error: 568.6242 - val_mean_absolute_error: 22.0914\nEpoch 4\/200\n323\/323 [==============================] - 0s 248us\/sample - loss: 590.0231 - mean_squared_error: 590.0231 - mean_absolute_error: 22.6751 - val_loss: 561.2776 - val_mean_squared_error: 561.2776 - val_mean_absolute_error: 21.9079\nEpoch 5\/200\n323\/323 [==============================] - 0s 161us\/sample - loss: 582.1993 - mean_squared_error: 582.1993 - mean_absolute_error: 22.4697 - val_loss: 554.2171 - val_mean_squared_error: 554.2170 - val_mean_absolute_error: 21.7276\nEpoch 6\/200\n323\/323 [==============================] - 0s 198us\/sample - loss: 574.5526 - mean_squared_error: 574.5526 - mean_absolute_error: 22.2655 - val_loss: 547.2002 - val_mean_squared_error: 547.2002 - val_mean_absolute_error: 21.5468\nEpoch 7\/200\n323\/323 [==============================] - 0s 248us\/sample - loss: 566.7739 - mean_squared_error: 566.7739 - mean_absolute_error: 22.0529 - val_loss: 540.1250 - val_mean_squared_error: 540.1251 - val_mean_absolute_error: 21.3606\nEpoch 8\/200\n323\/323 [==============================] - 0s 111us\/sample - loss: 559.2289 - mean_squared_error: 559.2289 - mean_absolute_error: 21.8367 - val_loss: 532.9769 - val_mean_squared_error: 532.9769 - val_mean_absolute_error: 21.1680\nEpoch 9\/200\n323\/323 [==============================] - 0s 111us\/sample - loss: 551.4707 - mean_squared_error: 551.4707 - mean_absolute_error: 21.6204 - val_loss: 526.0247 - val_mean_squared_error: 526.0247 - val_mean_absolute_error: 20.9819\nEpoch 10\/200\n323\/323 [==============================] - 0s 149us\/sample - loss: 543.9210 - mean_squared_error: 543.9210 - mean_absolute_error: 21.4173 - val_loss: 519.0010 - val_mean_squared_error: 519.0010 - val_mean_absolute_error: 20.7915\nEpoch 11\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 536.3257 - mean_squared_error: 536.3257 
- mean_absolute_error: 21.2125 - val_loss: 511.7967 - val_mean_squared_error: 511.7967 - val_mean_absolute_error: 20.5944\nEpoch 12\/200\n323\/323 [==============================] - 0s 136us\/sample - loss: 528.6936 - mean_squared_error: 528.6937 - mean_absolute_error: 21.0106 - val_loss: 504.5885 - val_mean_squared_error: 504.5885 - val_mean_absolute_error: 20.3977\nEpoch 13\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 520.8847 - mean_squared_error: 520.8847 - mean_absolute_error: 20.7995 - val_loss: 497.2613 - val_mean_squared_error: 497.2613 - val_mean_absolute_error: 20.2193\nEpoch 14\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 513.0849 - mean_squared_error: 513.0849 - mean_absolute_error: 20.5858 - val_loss: 489.8176 - val_mean_squared_error: 489.8176 - val_mean_absolute_error: 20.0351\nEpoch 15\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 505.3566 - mean_squared_error: 505.3567 - mean_absolute_error: 20.3856 - val_loss: 482.2511 - val_mean_squared_error: 482.2511 - val_mean_absolute_error: 19.8488\nEpoch 16\/200\n323\/323 [==============================] - 0s 99us\/sample - loss: 497.5187 - mean_squared_error: 497.5188 - mean_absolute_error: 20.1893 - val_loss: 474.6838 - val_mean_squared_error: 474.6838 - val_mean_absolute_error: 19.6661\nEpoch 17\/200\n323\/323 [==============================] - 0s 149us\/sample - loss: 489.7085 - mean_squared_error: 489.7086 - mean_absolute_error: 19.9929 - val_loss: 467.2122 - val_mean_squared_error: 467.2122 - val_mean_absolute_error: 19.4878\nEpoch 18\/200\n323\/323 [==============================] - 0s 223us\/sample - loss: 482.0081 - mean_squared_error: 482.0081 - mean_absolute_error: 19.8129 - val_loss: 459.4699 - val_mean_squared_error: 459.4698 - val_mean_absolute_error: 19.3026\nEpoch 19\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 474.0288 - mean_squared_error: 474.0287 - mean_absolute_error: 
19.6281 - val_loss: 451.8731 - val_mean_squared_error: 451.8731 - val_mean_absolute_error: 19.1187\nEpoch 20\/200\n323\/323 [==============================] - 0s 111us\/sample - loss: 466.1271 - mean_squared_error: 466.1271 - mean_absolute_error: 19.4428 - val_loss: 444.5884 - val_mean_squared_error: 444.5884 - val_mean_absolute_error: 18.9436\n\u2026\nEpoch 181\/200\n323\/323 [==============================] - 0s 149us\/sample - loss: 28.0329 - mean_squared_error: 28.0329 - mean_absolute_error: 3.8922 - val_loss: 29.0025 - val_mean_squared_error: 29.0025 - val_mean_absolute_error: 3.8905\nEpoch 182\/200\n323\/323 [==============================] - 0s 136us\/sample - loss: 27.7569 - mean_squared_error: 27.7569 - mean_absolute_error: 3.8608 - val_loss: 28.9420 - val_mean_squared_error: 28.9420 - val_mean_absolute_error: 3.8719\nEpoch 183\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 27.5550 - mean_squared_error: 27.5550 - mean_absolute_error: 3.8354 - val_loss: 28.9521 - val_mean_squared_error: 28.9521 - val_mean_absolute_error: 3.8516\nEpoch 184\/200\n323\/323 [==============================] - 0s 149us\/sample - loss: 27.3054 - mean_squared_error: 27.3054 - mean_absolute_error: 3.8107 - val_loss: 28.5168 - val_mean_squared_error: 28.5168 - val_mean_absolute_error: 3.8161\nEpoch 185\/200\n323\/323 [==============================] - 0s 173us\/sample - loss: 27.0219 - mean_squared_error: 27.0219 - mean_absolute_error: 3.7885 - val_loss: 28.0858 - val_mean_squared_error: 28.0858 - val_mean_absolute_error: 3.7814\nEpoch 186\/200\n323\/323 [==============================] - 0s 161us\/sample - loss: 26.7649 - mean_squared_error: 26.7649 - mean_absolute_error: 3.7670 - val_loss: 27.8294 - val_mean_squared_error: 27.8294 - val_mean_absolute_error: 3.7574\nEpoch 187\/200\n323\/323 [==============================] - 0s 136us\/sample - loss: 26.5128 - mean_squared_error: 26.5128 - mean_absolute_error: 3.7427 - val_loss: 27.4006 - 
val_mean_squared_error: 27.4006 - val_mean_absolute_error: 3.7293\nEpoch 188\/200\n323\/323 [==============================] - 0s 161us\/sample - loss: 26.3242 - mean_squared_error: 26.3242 - mean_absolute_error: 3.7329 - val_loss: 27.1109 - val_mean_squared_error: 27.1109 - val_mean_absolute_error: 3.7049\nEpoch 189\/200\n323\/323 [==============================] - 0s 136us\/sample - loss: 26.0745 - mean_squared_error: 26.0745 - mean_absolute_error: 3.7042 - val_loss: 27.0394 - val_mean_squared_error: 27.0394 - val_mean_absolute_error: 3.6909\nEpoch 190\/200\n323\/323 [==============================] - 0s 161us\/sample - loss: 25.8574 - mean_squared_error: 25.8574 - mean_absolute_error: 3.6782 - val_loss: 26.9795 - val_mean_squared_error: 26.9795 - val_mean_absolute_error: 3.6774\nEpoch 191\/200\n323\/323 [==============================] - 0s 149us\/sample - loss: 25.6682 - mean_squared_error: 25.6682 - mean_absolute_error: 3.6587 - val_loss: 26.8557 - val_mean_squared_error: 26.8557 - val_mean_absolute_error: 3.6599\nEpoch 192\/200\n323\/323 [==============================] - 0s 149us\/sample - loss: 25.4568 - mean_squared_error: 25.4568 - mean_absolute_error: 3.6391 - val_loss: 26.5597 - val_mean_squared_error: 26.5597 - val_mean_absolute_error: 3.6302\nEpoch 193\/200\n323\/323 [==============================] - 0s 111us\/sample - loss: 25.2383 - mean_squared_error: 25.2383 - mean_absolute_error: 3.6239 - val_loss: 26.2430 - val_mean_squared_error: 26.2430 - val_mean_absolute_error: 3.6019\nEpoch 194\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 25.0200 - mean_squared_error: 25.0200 - mean_absolute_error: 3.6001 - val_loss: 26.2021 - val_mean_squared_error: 26.2021 - val_mean_absolute_error: 3.5890\nEpoch 195\/200\n323\/323 [==============================] - 0s 124us\/sample - loss: 24.8465 - mean_squared_error: 24.8465 - mean_absolute_error: 3.5796 - val_loss: 25.9885 - val_mean_squared_error: 25.9885 - val_mean_absolute_error: 
3.5653\nEpoch 196\/200\n323\/323 [==============================] - 0s 111us\/sample - loss: 24.6697 - mean_squared_error: 24.6697 - mean_absolute_error: 3.5667 - val_loss: 25.7908 - val_mean_squared_error: 25.7908 - val_mean_absolute_error: 3.5423\nEpoch 197\/200\n323\/323 [==============================] - 0s 99us\/sample - loss: 24.4858 - mean_squared_error: 24.4858 - mean_absolute_error: 3.5508 - val_loss: 25.7717 - val_mean_squared_error: 25.7717 - val_mean_absolute_error: 3.5298\nEpoch 198\/200\n323\/323 [==============================] - 0s 136us\/sample - loss: 24.2800 - mean_squared_error: 24.2800 - mean_absolute_error: 3.5314 - val_loss: 25.8030 - val_mean_squared_error: 25.8030 - val_mean_absolute_error: 3.5115\nEpoch 199\/200\n323\/323 [==============================] - 0s 99us\/sample - loss: 24.2206 - mean_squared_error: 24.2206 - mean_absolute_error: 3.5227 - val_loss: 25.5244 - val_mean_squared_error: 25.5244 - val_mean_absolute_error: 3.4847\nEpoch 200\/200\n323\/323 [==============================] - 0s 111us\/sample - loss: 23.9753 - mean_squared_error: 23.9753 - mean_absolute_error: 3.5040 - val_loss: 25.1087 - val_mean_squared_error: 25.1087 - val_mean_absolute_error: 3.4590<\/pre>\n\n\n\n<p>We have successfully trained our model. You\u2019d notice that the loss moved from 616.0 in the first epoch to 23.9 in the 200<sup>th<\/sup> epoch. This shows that the model was improving with every epoch.&nbsp;<\/p>\n\n\n\n<p>To visualize the losses, we will convert the history object into a dataframe and plot the graph of the loss and the validation loss. 
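It helps to know what history.history actually is: a plain dictionary mapping each metric name to a list of per-epoch values, which is why it converts cleanly to a dataframe. A minimal sketch with made-up stand-in values (not the numbers from the actual run):

```python
import pandas as pd

# Stand-in for the history.history dict returned by model.fit():
# metric name -> list of per-epoch values (numbers here are illustrative)
history_dict = {
    'loss': [616.0, 606.8, 598.1],
    'val_loss': [584.1, 576.3, 568.6],
}

history_df = pd.DataFrame(history_dict)
print(history_df.shape)  # (3, 2) -> one row per epoch, one column per metric
```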
If the gap between the two is large, it usually means the model is overfitting: it has memorized the training data rather than learning patterns that generalize to unseen data.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#plot the loss and validation loss of the dataset<\/em>\nhistory_df = pd.DataFrame(history.history)\nplt.plot(history_df['loss'], label='loss')\nplt.plot(history_df['val_loss'], label='val_loss')\n\nplt.legend()\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/_uE0MaO-3xMn9IFdKkPVH0IafTtkidiy8-PGN-sETjyokX41jKLXJwvf_e4ovTs63EpZ6LHXM4A9C6UzAXIDf4T_zXCkS05p8_XT7rOI_SJndJ3mraifUsNJoPEKC5gp9BhItCoQ3qOZ7h5MdQ\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>In our case, the model has learned the data well since the training loss and validation loss are close. Furthermore, the loss dropped sharply in the first few epochs and stabilized without going back up. This is an indication that the model has learned without overfitting.&nbsp;<\/p>\n\n\n\n<p>Notice that at the end of the training, the loss values are as follows.<\/p>\n\n\n\n<p>Training loss: 23.97<\/p>\n\n\n\n<p>Mean absolute error: 3.50<\/p>\n\n\n\n<p>Validation loss: 25.11<\/p>\n\n\n\n<p>Validation mean absolute error: 3.46<\/p>\n\n\n\n<p><strong>Evaluating the Model<\/strong><\/p>\n\n\n\n<p>We can evaluate the model with the evaluate() method. This runs the model on the test data, compares its predictions with the true labels, and calculates the loss\/error.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#evaluate the model<\/em>\nmodel.evaluate(X_test, y_test, batch_size=128)\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">102\/102 [==============================] - 0s 153us\/sample - loss: 22.5740 - mean_squared_error: 22.5740 - mean_absolute_error: 3.5839\nOut[51]:\n[22.573974609375, 22.573975, 3.5838845]<\/pre>\n\n\n\n<p>We can get an overview of what the model predicts versus the actual values using a simple plot.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">y_pred = model.predict(X_test).flatten()\n\na = plt.axes(aspect='equal')\nplt.scatter(y_test, y_pred)\nplt.xlabel('True values')\nplt.ylabel('Predicted values')\nplt.title('A plot that shows the true and predicted values')\nplt.xlim([0, 60])\nplt.ylim([0, 60])\nplt.plot([0, 60], [0, 60])<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/QBw59V713m4dkokRtQpSJInMqvgVfWP_iUxNdcWbFSH045DB0SMclrhlRpGoEzgjLfqUkkuPXZXZ52RD14-a6AmLEkdWugGQR2jU-nKHlpgWBntk0raL3zfDViCUjEH54sdU5g87RLpgUw4A3A\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>From the plot, you\u2019d see that the model performs reasonably well in making predictions.&nbsp;<\/p>\n\n\n\n<p>We could still tweak our model to further enhance its performance. There are a lot of techniques that can be used to improve a neural network. 
They include adding more hidden layers, increasing the number of nodes in a layer, changing the activation function, adding more data, tweaking optimizer parameters, etc.&nbsp;<\/p>\n\n\n\n<p>Let\u2019s see how adding more hidden layers will improve the model.&nbsp;<\/p>\n\n\n\n<p><strong>Improving the Model by Adding More Hidden Layers<\/strong><\/p>\n\n\n\n<p>One of the ways of improving neural network performance is by adding more hidden layers. Deeper networks can often capture more complex patterns, although deeper is not automatically better in every case.&nbsp;<\/p>\n\n\n\n<p>So let&#8217;s go ahead and change our model by adding 2 more hidden layers: one with 7 nodes and the other with 3 nodes, both still with the ReLU activation function. Just like in the last model, we compile it with a mean squared error loss, an Adam optimizer, and both the mean squared error and mean absolute error metrics.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#build the neural network architecture<\/em>\nmodel = Sequential()\nmodel.add(Dense(15, input_dim=11, activation='relu'))\nmodel.add(Dense(7, activation='relu'))\nmodel.add(Dense(3, activation='relu'))\nmodel.add(Dense(1, activation='linear'))\n\nmodel.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae'])\n\n<em>#train the neural network on the train dataset<\/em>\nhistory = model.fit(X_train, y_train, epochs=200, validation_split=0.2)<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Train on 323 samples, validate on 81 samples\nEpoch 1\/200\n323\/323 [==============================] - 1s 2ms\/sample - loss: 584.4734 - mean_squared_error: 584.4734 - mean_absolute_error: 22.6072 - val_loss: 553.0111 - val_mean_squared_error: 553.0111 - val_mean_absolute_error: 21.6163\nEpoch 2\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 575.5218 - mean_squared_error: 575.5219 - mean_absolute_error: 22.3731 - val_loss: 544.7089 - val_mean_squared_error: 544.7089 - val_mean_absolute_error: 21.4320\nEpoch 3\/200\n323\/323 
[==============================] - 0s 97us\/sample - loss: 565.2366 - mean_squared_error: 565.2367 - mean_absolute_error: 22.1050 - val_loss: 535.9432 - val_mean_squared_error: 535.9432 - val_mean_absolute_error: 21.2384\nEpoch 4\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 554.2672 - mean_squared_error: 554.2672 - mean_absolute_error: 21.8140 - val_loss: 525.9688 - val_mean_squared_error: 525.9689 - val_mean_absolute_error: 21.0172\nEpoch 5\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 541.8079 - mean_squared_error: 541.8079 - mean_absolute_error: 21.4882 - val_loss: 514.2750 - val_mean_squared_error: 514.2750 - val_mean_absolute_error: 20.7664\nEpoch 6\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 527.6235 - mean_squared_error: 527.6235 - mean_absolute_error: 21.1237 - val_loss: 500.1802 - val_mean_squared_error: 500.1802 - val_mean_absolute_error: 20.4756\nEpoch 7\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 510.5902 - mean_squared_error: 510.5903 - mean_absolute_error: 20.7072 - val_loss: 483.6809 - val_mean_squared_error: 483.6808 - val_mean_absolute_error: 20.1316\nEpoch 8\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 490.7871 - mean_squared_error: 490.7871 - mean_absolute_error: 20.2235 - val_loss: 463.2415 - val_mean_squared_error: 463.2415 - val_mean_absolute_error: 19.7122\nEpoch 9\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 465.3827 - mean_squared_error: 465.3828 - mean_absolute_error: 19.6485 - val_loss: 439.1797 - val_mean_squared_error: 439.1796 - val_mean_absolute_error: 19.2099\nEpoch 10\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 436.7313 - mean_squared_error: 436.7312 - mean_absolute_error: 18.9943 - val_loss: 410.8449 - val_mean_squared_error: 410.8448 - val_mean_absolute_error: 18.5876\nEpoch 11\/200\n323\/323 
[==============================] - 0s 145us\/sample - loss: 404.6039 - mean_squared_error: 404.6039 - mean_absolute_error: 18.2399 - val_loss: 379.6046 - val_mean_squared_error: 379.6046 - val_mean_absolute_error: 17.8701\nEpoch 12\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 369.8315 - mean_squared_error: 369.8315 - mean_absolute_error: 17.4045 - val_loss: 346.7320 - val_mean_squared_error: 346.7320 - val_mean_absolute_error: 17.0592\nEpoch 13\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 332.8788 - mean_squared_error: 332.8788 - mean_absolute_error: 16.4958 - val_loss: 314.1923 - val_mean_squared_error: 314.1923 - val_mean_absolute_error: 16.2052\nEpoch 14\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 298.7931 - mean_squared_error: 298.7931 - mean_absolute_error: 15.5864 - val_loss: 281.9098 - val_mean_squared_error: 281.9098 - val_mean_absolute_error: 15.3273\nEpoch 15\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 265.7078 - mean_squared_error: 265.7079 - mean_absolute_error: 14.5916 - val_loss: 253.8650 - val_mean_squared_error: 253.8650 - val_mean_absolute_error: 14.4485\nEpoch 16\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 237.9645 - mean_squared_error: 237.9644 - mean_absolute_error: 13.6058 - val_loss: 230.3261 - val_mean_squared_error: 230.3261 - val_mean_absolute_error: 13.6310\nEpoch 17\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 213.5237 - mean_squared_error: 213.5237 - mean_absolute_error: 12.7039 - val_loss: 210.8874 - val_mean_squared_error: 210.8874 - val_mean_absolute_error: 13.0260\nEpoch 18\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 193.0863 - mean_squared_error: 193.0863 - mean_absolute_error: 11.8859 - val_loss: 194.1782 - val_mean_squared_error: 194.1782 - val_mean_absolute_error: 12.4450\nEpoch 19\/200\n323\/323 
[==============================] - 0s 97us\/sample - loss: 176.8083 - mean_squared_error: 176.8083 - mean_absolute_error: 11.3360 - val_loss: 180.8584 - val_mean_squared_error: 180.8584 - val_mean_absolute_error: 11.8897\nEpoch 20\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 164.0756 - mean_squared_error: 164.0756 - mean_absolute_error: 10.8522 - val_loss: 171.2320 - val_mean_squared_error: 171.2320 - val_mean_absolute_error: 11.4639\n\u2026\nEpoch 181\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 12.1372 - mean_squared_error: 12.1372 - mean_absolute_error: 2.3793 - val_loss: 15.9544 - val_mean_squared_error: 15.9544 - val_mean_absolute_error: 2.3558\nEpoch 182\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 12.0800 - mean_squared_error: 12.0800 - mean_absolute_error: 2.3553 - val_loss: 15.8774 - val_mean_squared_error: 15.8774 - val_mean_absolute_error: 2.3423\nEpoch 183\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 12.0202 - mean_squared_error: 12.0202 - mean_absolute_error: 2.3414 - val_loss: 15.7801 - val_mean_squared_error: 15.7801 - val_mean_absolute_error: 2.3369\nEpoch 184\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.9876 - mean_squared_error: 11.9876 - mean_absolute_error: 2.3502 - val_loss: 15.7188 - val_mean_squared_error: 15.7188 - val_mean_absolute_error: 2.3659\nEpoch 185\/200\n323\/323 [==============================] - 0s 242us\/sample - loss: 11.9647 - mean_squared_error: 11.9647 - mean_absolute_error: 2.3655 - val_loss: 15.8191 - val_mean_squared_error: 15.8191 - val_mean_absolute_error: 2.4131\nEpoch 186\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 12.0691 - mean_squared_error: 12.0691 - mean_absolute_error: 2.4635 - val_loss: 16.2266 - val_mean_squared_error: 16.2266 - val_mean_absolute_error: 2.6174\nEpoch 187\/200\n323\/323 [==============================] - 0s 
145us\/sample - loss: 12.1569 - mean_squared_error: 12.1569 - mean_absolute_error: 2.4610 - val_loss: 15.6773 - val_mean_squared_error: 15.6773 - val_mean_absolute_error: 2.4570\nEpoch 188\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.8792 - mean_squared_error: 11.8792 - mean_absolute_error: 2.3678 - val_loss: 15.7074 - val_mean_squared_error: 15.7074 - val_mean_absolute_error: 2.3972\nEpoch 189\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.9190 - mean_squared_error: 11.9190 - mean_absolute_error: 2.3617 - val_loss: 15.8393 - val_mean_squared_error: 15.8393 - val_mean_absolute_error: 2.3808\nEpoch 190\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 12.8232 - mean_squared_error: 12.8232 - mean_absolute_error: 2.5695 - val_loss: 16.4048 - val_mean_squared_error: 16.4048 - val_mean_absolute_error: 2.6977\nEpoch 191\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 12.0817 - mean_squared_error: 12.0817 - mean_absolute_error: 2.4824 - val_loss: 15.5024 - val_mean_squared_error: 15.5024 - val_mean_absolute_error: 2.4516\nEpoch 192\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.8084 - mean_squared_error: 11.8084 - mean_absolute_error: 2.3831 - val_loss: 15.4221 - val_mean_squared_error: 15.4221 - val_mean_absolute_error: 2.4194\nEpoch 193\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 11.7507 - mean_squared_error: 11.7507 - mean_absolute_error: 2.3955 - val_loss: 15.4557 - val_mean_squared_error: 15.4557 - val_mean_absolute_error: 2.4357\nEpoch 194\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.6437 - mean_squared_error: 11.6437 - mean_absolute_error: 2.3657 - val_loss: 15.3709 - val_mean_squared_error: 15.3709 - val_mean_absolute_error: 2.3435\nEpoch 195\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.6290 - mean_squared_error: 11.6290 - 
mean_absolute_error: 2.3445 - val_loss: 15.3940 - val_mean_squared_error: 15.3940 - val_mean_absolute_error: 2.3470\nEpoch 196\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 11.6334 - mean_squared_error: 11.6334 - mean_absolute_error: 2.3860 - val_loss: 15.4824 - val_mean_squared_error: 15.4824 - val_mean_absolute_error: 2.3938\nEpoch 197\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.6110 - mean_squared_error: 11.6110 - mean_absolute_error: 2.3495 - val_loss: 15.5030 - val_mean_squared_error: 15.5030 - val_mean_absolute_error: 2.2746\nEpoch 198\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.8521 - mean_squared_error: 11.8521 - mean_absolute_error: 2.3540 - val_loss: 15.2363 - val_mean_squared_error: 15.2363 - val_mean_absolute_error: 2.3209\nEpoch 199\/200\n323\/323 [==============================] - 0s 145us\/sample - loss: 11.5532 - mean_squared_error: 11.5532 - mean_absolute_error: 2.3486 - val_loss: 15.3506 - val_mean_squared_error: 15.3506 - val_mean_absolute_error: 2.3752\nEpoch 200\/200\n323\/323 [==============================] - 0s 97us\/sample - loss: 11.4892 - mean_squared_error: 11.4892 - mean_absolute_error: 2.3523 - val_loss: 15.3902 - val_mean_squared_error: 15.3902 - val_mean_absolute_error: 2.3758<\/pre>\n\n\n\n<p>Let\u2019s compare the final losses with those of the previous model.&nbsp;<\/p>\n\n\n\n<p>Training loss: 11.49<\/p>\n\n\n\n<p>Mean absolute error: 2.35<\/p>\n\n\n\n<p>Validation loss: 15.39<\/p>\n\n\n\n<p>Validation mean absolute error: 2.38<\/p>\n\n\n\n<p>Notice that the training loss is now 11.49, down from 23.97, with the same number of epochs. 
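Since these losses are mean squared errors in squared units of the target (the Boston target is the median house price in thousands of dollars), taking the square root gives a more interpretable typical error. A quick back-of-the-envelope check on the final loss values above:

```python
import math

# Final MSE values read from the training logs above
train_mse = 11.49
val_mse = 15.39

# RMSE is in the same units as the target (price in $1000s)
print(round(math.sqrt(train_mse), 2))  # 3.39 -> typical training error of about $3,390
print(round(math.sqrt(val_mse), 2))    # 3.92 -> typical validation error of about $3,920
```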
We can again plot the graph to show both the training loss and validation loss.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#plot the loss and validation loss of the dataset<\/em>\nhistory_df = pd.DataFrame(history.history)\nplt.plot(history_df['loss'], label='loss')\nplt.plot(history_df['val_loss'], label='val_loss')\n\nplt.legend()\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/KgMjEwEpoPxSUkp3NWFt1ypE47B57lp9K66F-uvD9p5X7nYBwhrUbTBRZ3YrHJYra_Zi2uYVgMq8Zy4CvGQK9E-omhfukRD7rH4JL43YZAkeM3WyDLaQcHYNo8TGjPy3AQMNr2MT2wiNWQghuw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>As seen in the figure above, the model\u2019s loss stabilizes after the first 50 epochs. This is an improvement as it took the previous model 175 epochs to get to the local minimum.&nbsp;<\/p>\n\n\n\n<p>We can also evaluate this model to determine how accurate it is.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#evaluate the model<\/em>\nmodel.evaluate(X_test, y_test, batch_size=128)\n<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">102\/102 [==============================] - 0s 0s\/sample - loss: 13.0725 - mean_squared_error: 13.0725 - mean_absolute_error: 2.7085<\/pre>\n\n\n\n<p>Finally, we will visualize the prediction with a simple plot.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">y_pred = model.predict(X_test).flatten()\n\na = plt.axes(aspect='equal')\nplt.scatter(y_test, y_pred)\nplt.xlabel('True values')\nplt.ylabel('Predicted values')\nplt.title('A plot that shows the true and predicted values')\nplt.xlim([0, 60])\nplt.ylim([0, 60])\nplt.plot([0, 60], [0, 60])<\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/njobeSx60XGLEPOFXb1irzLLeaaD6QNswJmGq9AU42U-v1SIqmEZzrDbtu9gU1iSDKKJtiUqQMhGpzZT8fZCzHTzxqXZ4GMv7LUc_4i1qax8cdZiCKvhI4EGocSPrRq9O5rWP31CJMhmwartyQ\" 
alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Notice that this time, the data points are more concentrated on the straight line. It indicates that our model is performing well.&nbsp;<\/p>\n\n\n\n<p><strong>Conclusion<\/strong><\/p>\n\n\n\n<p>In this tutorial, you have learned the step-by-step approach to data preprocessing and building a linear regression model. We saw how to build a neural network using Keras in TensorFlow and went a step further to improve the model by increasing the number of hidden layers.&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the last tutorial, we introduced the concept of linear regression with Keras and how to build a Linear Regression problem using Tensorflow\u2019s estimator API. In that tutorial, we neglected a step which for real-life problems is very vital. Building any machine learning model whatsoever would require you to preprocess the data before feeding it [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":7024,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[498],"tags":[],"class_list":["post-7021","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-tutorials"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/7021","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=7021"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/7021\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.co
m\/blog\/wp-json\/wp\/v2\/media\/7024"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=7021"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=7021"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=7021"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}