Data visualization reveals patterns that raw data alone does not. Python offers a wide range of data visualization tools, many of them interactive. The most basic plot types are shared across libraries, but others are only available in certain ones.
Top Python Libraries for Data Visualization
Matplotlib is one of Python’s most important data visualization libraries. Despite being well over a decade old, it’s still the most widely used plotting library in the Python community. It was designed to closely resemble MATLAB, a proprietary programming language developed in the 1980s.
Because matplotlib was one of the first Python data visualization libraries, many other libraries are built on top of it or designed to work in tandem with it during analysis. Some libraries, like pandas and Seaborn, are “wrappers” over matplotlib. They let you access several of matplotlib’s methods with less code.
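To make the “wrapper” idea concrete, here is a minimal sketch that draws the same line plot twice: once with matplotlib directly and once through the pandas plot method. The data and the column names (month, units) are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly figures, for illustration only
sales = pd.DataFrame({"month": [1, 2, 3, 4], "units": [10, 14, 9, 17]})

# Plain matplotlib: create the figure and axes, then draw and label by hand
fig, ax_manual = plt.subplots()
ax_manual.plot(sales["month"], sales["units"])
ax_manual.set_xlabel("month")
ax_manual.set_ylabel("units")

# pandas wrapper: one call that builds a matplotlib Axes behind the scenes
ax_pandas = sales.plot(x="month", y="units")

# The result is an ordinary matplotlib Axes, so it can be customized further
ax_pandas.set_ylabel("units")
```

Because the pandas call returns a regular matplotlib Axes, anything you know about matplotlib still applies to the result.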
While matplotlib is good for getting a sense of the data, it’s not very useful for creating publication-quality charts quickly and easily. Matplotlib is extremely powerful, but with that power comes complexity.
Matplotlib is versatile enough to produce visualization types such as:
- Scatter plots
- Bar charts
- Line plots
- Pie charts
- Stem plots
- Contour plots
- Quiver plots
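As a rough sketch of how a few of these plot types look in code, the example below draws a scatter plot, a bar chart, a line plot, and a stem plot on a 2x2 grid. The data is arbitrary and only there to give each panel something to show.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Arbitrary sample data: x = 1..5 and its squares
x = np.arange(1, 6)
y = x ** 2

# One figure with four panels, one plot type per panel
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].scatter(x, y)
axes[0, 0].set_title("Scatter plot")

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar chart")

axes[1, 0].plot(x, y)
axes[1, 0].set_title("Line plot")

axes[1, 1].stem(x, y)
axes[1, 1].set_title("Stem plot")

fig.tight_layout()
# fig.savefig("plot_types.png")  # uncomment to write the figure to disk
```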
%matplotlib is a magic function in IPython. %matplotlib inline sets the backend of matplotlib to the ‘inline’ backend. With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.
Seaborn is a popular data visualization library built on top of matplotlib. It harnesses the power of matplotlib to create attractive charts in a few lines of code, and it aims to make visualization a central part of exploring and understanding data. The key difference from plain matplotlib is Seaborn’s default styles and color palettes, which are designed to be more aesthetically pleasing and modern. Since Seaborn is built on top of matplotlib, you’ll need to know matplotlib to tweak Seaborn’s defaults.
Seaborn is a higher-level library: it makes certain kinds of plots easier to generate, including heat maps, time series, and violin plots.
It provides a high-level interface for drawing attractive and informative statistical graphics. It is tightly integrated with the PyData stack, including support for NumPy and pandas data structures.
The following are some of the useful graphs we can generate using Seaborn:
- Bar Plot
- Factor Plot (named catplot since Seaborn 0.9)
- Density Plot
- Joint Distribution Plot
The Plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.
ggplot is based on ggplot2, an R plotting system, and concepts from The Grammar of Graphics. ggplot operates differently than matplotlib: it lets you layer components to create a complete plot. For instance, you can start with axes, then add points, then a line, a trend-line, etc. Although The Grammar of Graphics has been praised as an “intuitive” method for plotting, seasoned matplotlib users might need time to adjust to this new mindset.
According to the creator, ggplot isn’t designed for creating highly customized graphics. It sacrifices complexity for a simpler method of plotting. ggplot is tightly integrated with pandas, so it’s best to store your data in a DataFrame when using ggplot.
It is hard to describe how ggplot2 works because it embodies a deep philosophy of visualization. However, in most cases, you start with ggplot(), supply a dataset and an aesthetic mapping (with aes()). You then add on layers (like geom_point() or geom_histogram()), scales (like scale_colour_brewer()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).
Bokeh, native to Python, is also based on The Grammar of Graphics, like ggplot. It also supports streaming and real-time data. Its unique selling proposition is its ability to create interactive, web-ready plots, which can easily be output as JSON objects, HTML documents, or interactive web applications.
Bokeh has three interfaces with varying degrees of control to accommodate different types of users. The topmost level is for creating charts quickly. It includes methods for creating common charts such as bar plots, box plots, and histograms. The middle level allows the user to control the basic building blocks of each chart (for example, the dots in a scatter plot) and has the same specificity as Matplotlib. The bottom level is geared toward developers and software engineers. It has no pre-set defaults and requires the user to define every element of the chart.
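As a hedged illustration, the sketch below uses the bokeh.plotting interface, which appears to correspond to the middle level described above: you add individual glyphs (a line and markers) to a figure. The data is made up, and the plot is exported as an HTML document to match the web-ready output mentioned earlier.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up data
months = [1, 2, 3, 4]
units = [10, 14, 9, 17]

# Create a figure, then add glyphs to it one by one
p = figure(title="Units per month", x_axis_label="month", y_axis_label="units")
p.line(months, units, line_width=2)   # connect the points
p.scatter(months, units, size=8)      # draw a marker at each point

# Render the plot as a standalone HTML document (BokehJS loaded from a CDN)
html = file_html(p, CDN, "Units per month")
```

Writing the html string to a file and opening it in a browser gives an interactive plot with Bokeh’s default pan and zoom tools.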
geoplotlib is a toolbox for creating maps and plotting geographical data. You can use it to create a variety of map types, like choropleths, heat maps, and dot-density maps. You must have Pyglet (an object-oriented programming interface) installed to use geoplotlib. Nonetheless, since most Python data visualization libraries don’t offer maps, it’s nice to have a library dedicated solely to them.
folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via folium.
folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations and the passing of rich vector/raster/HTML visualizations as markers on the map. The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. folium supports Image, Video, GeoJSON, and TopoJSON overlays.
Dealing with missing data is a pain. missingno allows you to quickly gauge the completeness of a dataset with a visual summary, instead of trudging through a table. You can filter and sort data based on completion or spot correlations with a heatmap or a dendrogram.