{"id":4842,"date":"2020-09-15T14:13:02","date_gmt":"2020-09-15T08:43:02","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=4842"},"modified":"2020-09-15T14:13:05","modified_gmt":"2020-09-15T08:43:05","slug":"applying-descriptive-statistics-using-pandas","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/applying-descriptive-statistics-using-pandas\/","title":{"rendered":"Applying Descriptive Statistics using Pandas"},"content":{"rendered":"\n<p>Statistical analysis is used to analyze the results and deduce or infer meaning about the underlying dataset or the reality that it attempts to describe.<\/p>\n\n\n\n<p>Statistical analysis may be used to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Present key findings revealed by a dataset.<\/li><li>Summarize information.<\/li><li>Calculate measures of cohesiveness, relevance, or diversity in data.<\/li><li>Make future predictions based on previously recorded data.<\/li><li>Test experimental predictions.<\/li><\/ul>\n\n\n\n<p>Any data scientist will spend much of their day performing these functions.<br><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/cw-RT2QkS_3-AE_aimpySD-aZV5yIQrJ_OYK91HjgeQqd_ueE0Nao6ZVSW78Nasvnx8mVywCJ6QYUyTrwnR6iUvgfmSL02hlC0YvbbiAn81RH7kZGwZusKPYxe5z139bo_BNFZ9g4y1KwUXe1A\" width=\"583\" height=\"179\" alt=\"\" title=\"\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Aggregation functions<\/strong><\/h2>\n\n\n\n<p>An essential piece of analysis of large data is efficient summarization: computing aggregations like sum(), mean(), median(), min(), and max(), in which a single number gives insight into the nature of a potentially large dataset. In this section, we&#8217;ll explore aggregations in Pandas, starting with simple operations akin to what we&#8217;ve seen on NumPy arrays.<\/p>\n\n\n\n<p>.sum() function. 
Applied to a DataFrame, it returns a Series holding the sum of each column; applied to a Series, it returns a single number.<\/p>\n\n\n\n<p>Consider the following DataFrame:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>A&nbsp;&nbsp;<\/strong><\/td><td><strong>B<\/strong><\/td><td><strong>C&nbsp;&nbsp;<\/strong><\/td><td><strong>D<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>83<\/td><td>67<\/td><td>17<\/td><td>38<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>92<\/td><td>72<\/td><td>81<\/td><td>16<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>54<\/td><td>83<\/td><td>27<\/td><td>13<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><code>df.sum()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>A&nbsp; &nbsp;<\/td><td>229<\/td><\/tr><tr><td>B&nbsp; &nbsp;<\/td><td>222<\/td><\/tr><tr><td>C&nbsp; &nbsp;<\/td><td>125<\/td><\/tr><tr><td>D &nbsp;<\/td><td>&nbsp; 67<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><code>dtype: int64<\/code><\/p>\n\n\n\n<p>We can also find the sum of an individual column in the DataFrame:<\/p>\n\n\n\n<p><code>df['A'].sum()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<p>229<\/p>\n\n\n\n<p>Next we will compute the mean(), median(), min(), and max() values, assuming the definitions of mean and median are already familiar.<\/p>\n\n\n\n<p>The <strong>.mean()<\/strong> function returns the mean of the values for the requested axis. If the method is applied on a pandas Series object, it returns a scalar value, which is the mean of all the observations in the Series. 
If the method is applied on a <a href=\"https:\/\/www.h2kinfosys.com\/blog\/getting-started-with-pandas\/\">pandas DataFrame object<\/a>, it returns a pandas Series object which contains the mean of the values over the specified axis.<\/p>\n\n\n\n<p><code>df.mean()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>A&nbsp;<\/td><td>76.333333<\/td><\/tr><tr><td>B&nbsp;<\/td><td>74.000000<\/td><\/tr><tr><td>C&nbsp;<\/td><td>41.666667<\/td><\/tr><tr><td>D&nbsp;<\/td><td>22.333333<\/td><\/tr><tr><td>dtype:<\/td><td>float64<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>We can also find the mean of an individual column in the DataFrame:<\/p>\n\n\n\n<p><code>df['A'].mean()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<p>76.33333333333333<\/p>\n\n\n\n<p>The <strong>.median()<\/strong> function returns the median of each column in the given DataFrame.<\/p>\n\n\n\n<p><code>df.median()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-blue-background-color has-background\"><tbody><tr><td>A&nbsp; &nbsp; 83.0<br>B&nbsp; &nbsp; 72.0<br>C&nbsp; &nbsp; 27.0<br>D&nbsp; &nbsp; 16.0<br>dtype: float64<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The <strong>.max()<\/strong> function returns the maximum of the values in the given object. If the input is a Series, the method returns a scalar, the maximum of the values in the Series. If the input is a DataFrame, the method returns a Series with the maximum of the values over the specified axis. 
By default the axis is the index axis (axis=0), so the result holds one value per column.<\/p>\n\n\n\n<p><code>df.max()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-blue-background-color has-background\"><tbody><tr><td>A&nbsp; &nbsp; 92<br>B&nbsp; &nbsp; 83<br>C&nbsp; &nbsp; 81<br>D&nbsp; &nbsp; 38<br>dtype: int64<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Similarly, the <strong>.min()<\/strong> function returns the minimum of the values in the given object. If the input is a Series, the method returns a scalar, the minimum of the values in the Series. If the input is a DataFrame, the method returns a Series with the minimum of the values over the specified axis.<\/p>\n\n\n\n<p><code>df.min()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-blue-background-color has-background\"><tbody><tr><td>A&nbsp; &nbsp; 54<br>B&nbsp; &nbsp; 67<br>C&nbsp; &nbsp; 17<br>D&nbsp; &nbsp; 13<br>dtype: int64<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>.count()<\/strong> counts the number of non-NA\/null observations across the given axis. 
It works with non-numeric data, such as strings, as well.<\/p>\n\n\n\n<p><code>df.count()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-blue-background-color has-background\"><tbody><tr><td>A&nbsp; &nbsp; 3<br>B&nbsp; &nbsp; 3<br>C&nbsp; &nbsp; 3<br>D&nbsp; &nbsp; 3<br>dtype: int64<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The <strong>.std()<\/strong> function returns the sample standard deviation (normalized by N-1 by default) over the requested axis.<\/p>\n\n\n\n<p><code>df.std()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-blue-background-color has-background\"><tbody><tr><td>A&nbsp; &nbsp; 19.857828<br>B &nbsp; &nbsp; 8.185353<br>C&nbsp; &nbsp; 34.428670<br>D&nbsp; &nbsp; 13.650397<br>dtype: float64<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>If the DataFrame has missing values, they are skipped by default. This is the behavior of <code>df.std(skipna=True)<\/code>; passing skipna=False instead makes the result NaN for any column that contains a missing value.<\/p>\n\n\n\n<p><strong>.describe()<\/strong> is used to view basic statistical details such as the count, mean, standard deviation, and percentiles of a DataFrame or a Series of numeric values. 
When this method is applied to a Series of strings, it returns a different summary: count, unique, top, and freq.<\/p>\n\n\n\n<p><code>df.describe()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter is-style-regular\"><table><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; A&nbsp;&nbsp;<\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; B&nbsp;&nbsp;<\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; C&nbsp;&nbsp;<\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; D<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>&nbsp; &nbsp; count&nbsp;&nbsp;<\/strong><\/td><td>3.000000<\/td><td>3.000000<\/td><td>3.000000<\/td><td>3.000000<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>&nbsp; &nbsp; mean&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>76.333333<\/td><td>74.000000<\/td><td>41.666667<\/td><td>22.333333<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>&nbsp; &nbsp; &nbsp; &nbsp; std&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>19.857828<\/td><td>8.185353<\/td><td>34.428670<\/td><td>13.650397<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>&nbsp; &nbsp; &nbsp; &nbsp; min&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>54.000000<\/td><td>67.000000<\/td><td>17.000000<\/td><td>13.000000<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>25%<\/strong><\/td><td>68.500000<\/td><td>69.500000<\/td><td>22.000000<\/td><td>14.500000<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>50%<\/strong><\/td><td>83.000000<\/td><td>72.000000<\/td><td>27.000000<\/td><td>16.000000<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>75%<\/strong><\/td><td>87.500000<\/td><td>77.500000<\/td><td>54.000000<\/td><td>27.000000<\/td><\/tr><tr><td 
class=\"has-text-align-center\" data-align=\"center\"><strong>&nbsp; &nbsp; &nbsp; &nbsp; max&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>92.000000<\/td><td>83.000000<\/td><td>81.000000<\/td><td>38.000000<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>.corr()<\/strong> is used to find the <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.corr.html\" rel=\"nofollow noopener\" target=\"_blank\">pairwise correlation<\/a> of all columns in the DataFrame. Any NaN values are automatically excluded. Non-numeric columns are not part of the calculation; depending on the pandas version they are either ignored automatically or must be excluded with numeric_only=True.<\/p>\n\n\n\n<p><code>df.corr()<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table is-style-regular\"><table><tbody><tr><td><strong>&nbsp;&nbsp;<\/strong><\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; A&nbsp;<\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; B&nbsp;<\/td><td>&nbsp; &nbsp; &nbsp; &nbsp; C&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>D<\/td><\/tr><tr><td><strong>A&nbsp;<\/strong><\/td><td>1.000000<\/td><td>-0.858233<\/td><td>0.569956<\/td><td>0.394121<\/td><\/tr><tr><td><strong>B&nbsp;<\/strong><\/td><td>-0.858233<\/td><td>1.000000<\/td><td>-0.067421<\/td><td>-0.809964<\/td><\/tr><tr><td><strong>C&nbsp;<\/strong><\/td><td>0.569956<\/td><td>-0.067421<\/td><td>1.000000<\/td><td>-0.530536<\/td><\/tr><tr><td><strong>D&nbsp;<\/strong><\/td><td>0.394121<\/td><td>-0.809964<\/td><td>-0.530536<\/td><td>1.000000<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Each cell of the output DataFrame holds the correlation between its row variable and its column variable, so the table is symmetric. The correlation of a variable with itself is 1, which is why all the diagonal values are 1.00.<\/p>\n\n\n\n<p><strong>Importance of GroupBy in Statistical Analysis<\/strong><\/p>\n\n\n\n<p>Pandas\u2019 GroupBy is a powerful and versatile function in Python. 
It allows you to split your data into separate groups to perform computations for better analysis. GroupBy is used to quickly summarize data and aggregate it in a way that\u2019s easy to interpret.<\/p>\n\n\n\n<p>The .groupby() method allows you to group rows of data together and call aggregate functions on each group.<\/p>\n\n\n\n<p>Consider the following DataFrame:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-regular\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td><strong>Points&nbsp;&nbsp;<\/strong><\/td><td><strong>Rank&nbsp;&nbsp;<\/strong><\/td><td><strong>&nbsp; Team&nbsp;&nbsp;<\/strong><\/td><td><strong>Year<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>876<\/td><td>1<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>789<\/td><td>2<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>863<\/td><td>2<\/td><td>Devils&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>3<\/strong><\/td><td>673<\/td><td>3<\/td><td>Devils&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>4<\/strong><\/td><td>741<\/td><td>3<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>5<\/strong><\/td><td>812<\/td><td>4<\/td><td>kings&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>6<\/strong><\/td><td>756<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>7<\/strong><\/td><td>788<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2017<\/td><\/tr><tr><td><strong>8<\/strong><\/td><td>694<\/td><td>2<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>9<\/strong><\/td><td>701<\/td><td>4<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>10<\/strong><\/td><td>804<\/td><td>1<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now you can use the .groupby() method to group rows together based on a column name. 
For instance, let&#8217;s group based on Team. This creates a DataFrameGroupBy object.<\/p>\n\n\n\n<p><code>df.groupby('Team')<\/code><\/p>\n\n\n\n<p>The grouped data is held in an object:<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<p>&lt;pandas.core.groupby.DataFrameGroupBy object at 0x113014128&gt;<\/p>\n\n\n\n<p>Now we can save this object in a new variable named team:<\/p>\n\n\n\n<p><code>team = df.groupby('Team')<\/code><\/p>\n\n\n\n<p>Now we can call the aggregation functions on that variable. Here the group means are truncated to integers with astype('int32'):<\/p>\n\n\n\n<p><code>team.mean().astype('int32')<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>Points&nbsp;&nbsp;<\/td><td>Rank&nbsp;&nbsp;<\/td><td>Year<\/td><\/tr><tr><td><strong>Team&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;<\/td><\/tr><tr><td><strong>Devils&nbsp;&nbsp;<\/strong><\/td><td>768<\/td><td>2<\/td><td>2014<\/td><\/tr><tr><td><strong>Kings&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>761<\/td><td>1<\/td><td>2015<\/td><\/tr><tr><td><strong>Riders&nbsp;&nbsp;<\/strong><\/td><td>786<\/td><td>1<\/td><td>2015<\/td><\/tr><tr><td><strong>Royals&nbsp;&nbsp;<\/strong><\/td><td>752<\/td><td>2<\/td><td>2014<\/td><\/tr><tr><td><strong>kings&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>812<\/td><td>4<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>We can apply other aggregation functions, such as median, min, or max, to this grouped data in the same way. Note that Kings and kings appear as separate groups because group keys are case-sensitive.<\/p>\n\n\n\n<p>We can also group by two columns at the same time. 
This is how you can do it:<\/p>\n\n\n\n<p><code>teams = df.groupby(['Team', 'Rank'])<br>teams.mean().astype('int32')<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Points&nbsp;&nbsp;<\/td><td>Year<\/td><\/tr><tr><td><strong>Team&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>Rank&nbsp;<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td><\/td><\/tr><tr><td><strong>Devils&nbsp;<\/strong><\/td><td>2<\/td><td>863<\/td><td>2014<\/td><\/tr><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>3<\/td><td>673<\/td><td>2015<\/td><\/tr><tr><td><strong>Kings&nbsp;&nbsp;<\/strong><\/td><td>1<\/td><td>772<\/td><td>2016<\/td><\/tr><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>3<\/td><td>741<\/td><td>2014<\/td><\/tr><tr><td><strong>Riders&nbsp;<\/strong><\/td><td>1<\/td><td>876<\/td><td>2014<\/td><\/tr><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>2<\/td><td>741<\/td><td>2015<\/td><\/tr><tr><td><strong>Royals&nbsp;<\/strong><\/td><td>1<\/td><td>804<\/td><td>2015<\/td><\/tr><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td>4<\/td><td>701<\/td><td>2014<\/td><\/tr><tr><td><strong>kings&nbsp;&nbsp;<\/strong><\/td><td>4<\/td><td>812<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Filtering and Sorting<\/strong>&nbsp;<\/h2>\n\n\n\n<p>Statistical analysis also involves a lot of filtering operations. 
Pandas provides many ways to filter a DataFrame.<\/p>\n\n\n\n<p>We can filter the data using Python <strong>Comparison Operators<\/strong> and <strong>Boolean Operators<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Comparison Operators&nbsp;<\/strong><\/h2>\n\n\n\n<p>These operators compare the values on either side of them and decide the relation between them. They are also called relational operators.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>==<\/strong><\/h4>\n\n\n\n<p>If the values of the two operands are equal, then the condition becomes true.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>!=<\/strong><\/h4>\n\n\n\n<p>If the values of the two operands are not equal, then the condition becomes true.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>&gt;<\/strong><\/h4>\n\n\n\n<p>If the value of the left operand is greater than the value of the right operand, then the condition becomes true.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>&lt;<\/strong><\/h4>\n\n\n\n<p>If the value of the left operand is less than the value of the right operand, then the condition becomes true.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>&gt;=<\/strong><\/h4>\n\n\n\n<p>If the value of the left operand is greater than or equal to the value of the right operand, then the condition becomes true.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>&lt;=<\/strong><\/h4>\n\n\n\n<p>If the value of the left operand is less than or equal to the value of the right operand, then the condition becomes true.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Boolean Operators<\/strong><\/h2>\n\n\n\n<p>Python has three Boolean operators that are typed out as plain English words. When combining conditions on pandas columns, use their element-wise counterparts instead: | for or, &amp; for and, and ~ for not.<\/p>\n\n\n\n<p><strong>or<\/strong><\/p>\n\n\n\n<p>Returns True if at least one of the statements is true<\/p>\n\n\n\n<p><strong>and<\/strong><\/p>\n\n\n\n<p>Returns True if both statements are true<\/p>\n\n\n\n<p><strong>not<\/strong><\/p>\n\n\n\n<p>Reverses the result: returns False if the result is 
true<\/p>\n\n\n\n<p>Now we will learn how to filter the rows of a DataFrame based on a condition.<\/p>\n\n\n\n<p>Consider the following DataFrame:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td><strong>Points&nbsp;&nbsp;<\/strong><\/td><td><strong>Rank&nbsp;&nbsp;<\/strong><\/td><td><strong>&nbsp; Team&nbsp;&nbsp;<\/strong><\/td><td><strong>Year<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>876<\/td><td>1<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>789<\/td><td>2<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>863<\/td><td>2<\/td><td>Devils&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>3<\/strong><\/td><td>673<\/td><td>3<\/td><td>Devils&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>4<\/strong><\/td><td>741<\/td><td>3<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>5<\/strong><\/td><td>812<\/td><td>4<\/td><td>kings&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>6<\/strong><\/td><td>756<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>7<\/strong><\/td><td>788<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2017<\/td><\/tr><tr><td><strong>8<\/strong><\/td><td>694<\/td><td>2<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>9<\/strong><\/td><td>701<\/td><td>4<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>10<\/strong><\/td><td>804<\/td><td>1<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Suppose we want to find the teams with rank 1. 
This is how we can do it:<\/p>\n\n\n\n<p><code>df[df['Rank']==1]<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td><strong>Points&nbsp;&nbsp;<\/strong><\/td><td><strong>Rank&nbsp;&nbsp;<\/strong><\/td><td><strong>&nbsp; Team&nbsp;&nbsp;<\/strong><\/td><td><strong>Year<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>876<\/td><td>1<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>6<\/strong><\/td><td>756<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>7<\/strong><\/td><td>788<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2017<\/td><\/tr><tr><td><strong>10<\/strong><\/td><td>804<\/td><td>1<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>To find the teams with a rank less than 3:<\/p>\n\n\n\n<p><code>df[df['Rank']&lt;3]<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td><strong>Points&nbsp;&nbsp;<\/strong><\/td><td><strong>Rank&nbsp;&nbsp;<\/strong><\/td><td><strong>&nbsp; 
Team&nbsp;&nbsp;<\/strong><\/td><td><strong>Year<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>876<\/td><td>1<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>789<\/td><td>2<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>863<\/td><td>2<\/td><td>Devils&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>6<\/strong><\/td><td>756<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>7<\/strong><\/td><td>788<\/td><td>1<\/td><td>Kings&nbsp;&nbsp;<\/td><td>2017<\/td><\/tr><tr><td><strong>8<\/strong><\/td><td>694<\/td><td>2<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2016<\/td><\/tr><tr><td><strong>10<\/strong><\/td><td>804<\/td><td>1<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>We can also check multiple conditions at once by combining them with the element-wise operators &amp; (and) and | (or), wrapping each condition in parentheses.<\/p>\n\n\n\n<p>Now we will find the teams with rank 1 and points greater than 800:<\/p>\n\n\n\n<p><code>df[(df['Rank']==1)&amp;(df['Points']&gt;800)]<\/code><\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><\/td><td><strong>Points&nbsp;&nbsp;<\/strong><\/td><td><strong>Rank&nbsp;&nbsp;<\/strong><\/td><td><strong>&nbsp; Team&nbsp;&nbsp;<\/strong><\/td><td><strong>Year<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>876<\/td><td>1<\/td><td>Riders&nbsp;&nbsp;<\/td><td>2014<\/td><\/tr><tr><td><strong>10<\/strong><\/td><td>804<\/td><td>1<\/td><td>Royals&nbsp;&nbsp;<\/td><td>2015<\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Statistical analysis is used to analyze the results and deduce or infer meaning about the underlying dataset or the reality that it attempts to describe. Statistical analysis may be used to: Present key findings revealed by a dataset. Summarize information. 
Calculate measures of cohesiveness, relevance, or diversity in data. Make future predictions based on previously [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4859,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[500],"tags":[1358,1356,1357],"class_list":["post-4842","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science-using-python-tutorials","tag-aggregation-functions","tag-descriptive-statistics-using-pandas","tag-statistical-analysis"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=4842"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4842\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/4859"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=4842"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=4842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=4842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}