{"id":5725,"date":"2020-10-19T15:55:13","date_gmt":"2020-10-19T10:25:13","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=5725"},"modified":"2020-10-19T15:55:15","modified_gmt":"2020-10-19T10:25:15","slug":"introduction-to-seaborn","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/introduction-to-seaborn\/","title":{"rendered":"Introduction to Seaborn"},"content":{"rendered":"\n<p>There is something extraordinary about a well-designed visualization. The colors stand out, the layers blend together, the contours flow throughout, and the overall package is not only pleasing to look at but also provides meaningful insights. This matters in data science, where we often work with large amounts of messy data, so the ability to visualize it is critical for a data scientist. Our stakeholders or clients will more often than not rely on visual cues rather than the intricacies of a machine learning model. There are plenty of excellent Python visualization libraries available, including the widely used Matplotlib. Matplotlib has proven to be an incredibly useful and popular visualization tool, but even avid users will admit it often leaves much to be desired. Seaborn is an answer to many of these shortcomings.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is Seaborn?<\/strong><\/h2>\n\n\n\n<p>Seaborn is a Python <a href=\"https:\/\/www.h2kinfosys.com\/blog\/advanced-data-visualization-using-matplot\/\">data visualization<\/a> library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn provides simple high-level functions for common statistical plot types and integrates with the functionality provided by Pandas DataFrames.<\/p>\n\n\n\n<p><strong>Why should we use Seaborn over matplotlib? 
<\/strong><\/p>\n\n\n\n<p>\u2022 Matplotlib functions don\u2019t work well with DataFrames, whereas seaborn functions accept them directly.<\/p>\n\n\n\n<p>\u2022 Seaborn comes with a large number of high-level interfaces and built-in themes. With matplotlib alone, it is not easy to figure out the settings that make plots attractive.<\/p>\n\n\n\n<p>\u2022 Matplotlib&#8217;s API is relatively low level. Doing sophisticated statistical visualization is possible, but often requires a <em>lot<\/em> of boilerplate code.<\/p>\n\n\n\n<p>\u2022 Matplotlib predates Pandas by several years and thus was not designed for use with Pandas DataFrames. To visualize data from a Pandas DataFrame, you must extract each Series and often concatenate them together in the right format. It would be nicer to have a plotting library that can intelligently use the DataFrame labels in a plot.<\/p>\n\n\n\n<p><strong>How to install Seaborn<\/strong><\/p>\n\n\n\n<p>To install Seaborn and use it effectively, we first need to install its dependencies. The following are the four mandatory dependencies:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>NumPy (version &gt;= 1.9.3)<\/li><li>SciPy (version &gt;= 0.14.0)<\/li><li>matplotlib (version &gt;= 1.4.3)<\/li><li>Pandas (version &gt;= 0.15.2)<\/li><\/ul>\n\n\n\n<p>There are also some optional dependencies:<\/p>\n\n\n\n<p>\u2022 Statsmodels, for advanced regression plots<\/p>\n\n\n\n<p>\u2022 Fastcluster, for clustering large matrices<\/p>\n\n\n\n<p>Once this step is done, we are all set to install Seaborn and enjoy its mesmerizing plots. 
To install the latest release of seaborn, you can use pip:<\/p>\n\n\n\n<p><code>pip install seaborn<\/code><\/p>\n\n\n\n<p>You can also use <code>conda<\/code> to install the latest version of seaborn:<\/p>\n\n\n\n<p><code>conda install seaborn<\/code><\/p>\n\n\n\n<p>To import the dependencies and seaborn itself in your code, you can use the following code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom scipy import stats\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Data Visualization using Seaborn<\/strong><\/h2>\n\n\n\n<p>The following are the types of plots we can draw using seaborn:<\/p>\n\n\n\n<p>\u2022 Relational plots<\/p>\n\n\n\n<p>\u2022 Distribution plots<\/p>\n\n\n\n<p>\u2022 Categorical plots<\/p>\n\n\n\n<p>\u2022 Regression plots<\/p>\n\n\n\n<p>\u2022 Matrix plots<\/p>\n\n\n\n<p>\u2022 Pair plots &amp; Joint plots<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Relational plots<\/strong><\/h2>\n\n\n\n<p>Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualization can be a core component of this process because, when data are visualized properly, the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Human_visual_system_model\" rel=\"nofollow noopener\" target=\"_blank\">human visual system<\/a> can see trends and patterns that indicate a relationship. The function we will use most is relplot(). 
This is a figure-level function for visualizing statistical relationships using two common approaches: scatter plots and line plots.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>scatterplot() ## Draw a scatter plot with the possibility of several semantic groupings.<\/li><li>lineplot() ## Draw a line plot with the possibility of several semantic groupings.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Distribution plots<\/strong><\/h2>\n\n\n\n<p>Whenever we are dealing with a dataset, we want to know how the data or the variables are distributed.<\/p>\n\n\n\n<p>displot() is the figure-level function for visualizing distributions. The distribution of the data can tell us a lot about its nature. Distributions come in two types:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Univariate Distributions<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>distplot() ## Flexibly plot a univariate distribution of observations (deprecated since seaborn 0.11 in favor of displot() and histplot()).<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Bivariate Distributions<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>histplot() ## Plot univariate or bivariate histograms to show distributions of datasets.<\/li><li>kdeplot() ## Plot univariate or bivariate distributions using kernel density estimation.<\/li><li>ecdfplot() ## Plot empirical cumulative distribution functions.<\/li><li>rugplot() ## Plot marginal distributions by drawing ticks along the x and y axes.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Categorical plots<\/strong><\/h2>\n\n\n\n<p>In this section, we\u2019ll look at the relationship between two variables, one of which is categorical and divides the data into groups. 
We\u2019ll be using catplot(), the figure-level function of the seaborn library for drawing plots of categorical data.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>barplot() ## Show point estimates and confidence intervals as rectangular bars.<\/li><li>countplot() ## Show the counts of observations in each categorical bin using bars.<\/li><li>boxplot() ## Draw a box plot to show distributions with respect to categories.<\/li><li>violinplot() ## Draw a combination of boxplot and kernel density estimate.<\/li><li>pointplot() ## Show point estimates and confidence intervals using scatter plot glyphs.<\/li><li>swarmplot() ## Draw a categorical scatterplot with non-overlapping points.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Regression Plots<\/strong><\/h2>\n\n\n\n<p>The regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analyses. Regression plots, as the name suggests, draw a regression line between two variables and help to visualize their linear relationship.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>regplot() ## Plot data and a linear regression model fit.<\/li><li>residplot() ## Plot the residuals of a linear regression.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Matrix Plots<\/strong><\/h2>\n\n\n\n<p>A <strong>matrix plot <\/strong>is a <strong>plot <\/strong>of <strong>matrix <\/strong>data. 
A <strong>matrix plot <\/strong>is a color-coded diagram that has rows of data, columns of data, and values.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>heatmap() ## Plot rectangular data as a color-encoded matrix.<\/li><li>clustermap() ## Plot a matrix dataset as a hierarchically clustered heatmap.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Pair plots &amp; Joint plots<\/strong><\/h2>\n\n\n\n<p>We can also plot multiple bivariate distributions in a dataset by using the seaborn library. This shows the relationship between each pair of columns in the dataset. It also draws the univariate distribution of each variable on the diagonal axes.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>pairplot() ## Plot pairwise relationships in a dataset.<\/li><li>jointplot() ## Draw a plot of two variables with bivariate and univariate graphs.<\/li><\/ul>\n\n\n\n<p>Using all these plots we can perform data analysis and draw meaningful conclusions. We can also quickly see trends and outliers. If we can see something, we internalize it quickly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Importing Datasets<\/strong><\/h2>\n\n\n\n<p>Seaborn gives access to a few useful example datasets. They are not bundled with the installation: load_dataset() downloads the requested dataset from the online seaborn-data repository the first time it is called, so an internet connection is required. You can use any of these datasets for your learning. With the help of the following function you can load the required dataset:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import seaborn as sns\n\n# Signature: sns.load_dataset(name, cache=True, data_home=None, **kws)\n# name: name of the dataset (name.csv on https:\/\/github.com\/mwaskom\/seaborn-data)
<\/code><\/pre>\n\n\n\n<p>cache : boolean, optional<\/p>\n\n\n\n<p>If True, then cache data locally and use the cache on subsequent calls.<\/p>\n\n\n\n<p>kws : dict, optional<\/p>\n\n\n\n<p>Passed to pandas.read_csv.<\/p>\n\n\n\n<p>To view all the available datasets in the Seaborn library, you can use the <strong>get_dataset_names()<\/strong> function as shown below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sns.get_dataset_names()<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Output:<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;'anagrams','anscombe','attention','brain_networks','car_crashes','diamonds','dots','exercise','flights','fmri','gammas','geyser','iris','mpg','penguins','planets','tips','titanic']\n<\/code><\/pre>\n\n\n\n<p>This is the list of datasets available in seaborn at the time of writing. We can use these datasets for practice.<\/p>\n\n\n\n<p>Now we will learn how to import these datasets.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Importing Data as Pandas DataFrame<\/strong><\/h2>\n\n\n\n<p>Now, we will import a dataset. This dataset loads as a Pandas DataFrame by default. 
Any method available on a <a href=\"https:\/\/www.h2kinfosys.com\/blog\/getting-started-with-pandas\/\">Pandas<\/a> DataFrame therefore works on it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import seaborn as sns\n# Load the built-in tips dataset as a Pandas DataFrame\ndf = sns.load_dataset('tips')\ndf.head()  # show the first five rows\n<\/code><\/pre>\n\n\n\n<p>We imported the built-in tips dataset from seaborn.<\/p>\n\n\n\n<p>The code above generates the following output:<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong><em>total_bill<\/em><\/strong><\/td><td><strong><em>tip<\/em><\/strong><\/td><td><strong><em>sex<\/em><\/strong><\/td><td><strong><em>smoker<\/em><\/strong><\/td><td><strong><em>day<\/em><\/strong><\/td><td><strong><em>time<\/em><\/strong><\/td><td><strong><em>size<\/em><\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td><strong>16.99<\/strong><\/td><td><strong>1.01<\/strong><\/td><td><strong>Female<\/strong><\/td><td><strong>No<\/strong><\/td><td><strong>Sun<\/strong><\/td><td><strong>Dinner<\/strong><\/td><td><strong>2<\/strong><\/td><\/tr><tr><td><strong>1<\/strong><\/td><td><strong>10.34<\/strong><\/td><td><strong>1.66<\/strong><\/td><td><strong>Male<\/strong><\/td><td><strong>No<\/strong><\/td><td><strong>Sun<\/strong><\/td><td><strong>Dinner<\/strong><\/td><td><strong>3<\/strong><\/td><\/tr><tr><td><strong>2<\/strong><\/td><td><strong>21.01<\/strong><\/td><td><strong>3.50<\/strong><\/td><td><strong>Male<\/strong><\/td><td><strong>No<\/strong><\/td><td><strong>Sun<\/strong><\/td><td><strong>Dinner<\/strong><\/td><td><strong>3<\/strong><\/td><\/tr><tr><td><strong>3<\/strong><\/td><td><strong>23.68<\/strong><\/td><td><strong>3.31<\/strong><\/td><td><strong>Male<\/strong><\/td><td><strong>No<\/strong><\/td><td><strong>Sun<\/strong><\/td><td><strong>Dinner<\/strong><\/td><td><strong>2<\/strong><\/td><\/tr><tr><td><strong>4<\/strong><\/td><td><strong>24.59<\/strong><\/td><td><strong>3.61<\/strong><\/td><td><strong>Female<\/strong><\/td><td><strong>No<\/strong><\/td><td><strong>Sun<\/strong><\/td><td><strong>Dinner<\/strong><\/td><td><strong>4<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In the next article, we will learn how to visualize all the seaborn plots.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is just something extraordinary about a well-designed visualization. The colors stand out, the layers blend nicely together, the contours flow throughout, and the overall package not only has a nice aesthetic quality, but it provides meaningful insights to us as well. This is quite important in data science where we often work with a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5809,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[500],"tags":[],"class_list":["post-5725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science-using-python-tutorials"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=5725"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5725\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/5809"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=5725"}],"wp:term":[{"taxonomy":"category","embeddable":true
,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=5725"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=5725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}