Data Science using Python Tutorials

Choosing color palettes in Seaborn

Seaborn makes it easy to use colors that are well-suited to the characteristics of your data and your visualization goals. This chapter discusses both the general principles that should guide your choices and the tools in seaborn that help you quickly find the best solution for a given application. 

General principles for using color in plots  

Components of color  

Because of the way our eyes work, a color can be defined using three components. We usually program colors in a computer by specifying their RGB values, which set the intensity of the red, green, and blue channels in a display. But for analyzing the perceptual attributes of color, it’s better to think in terms of hue, saturation, and luminance channels. 

Hue is the component that distinguishes “different colors” in a non-technical sense. It’s the property of color that leads to first-order names like “red” and “blue”.

Saturation (or chroma) is the colorfulness. Two colors with different hues will look more distinct when they have more saturation.

And lightness corresponds to how much light is emitted (or reflected, for printed colors), ranging from black to white.
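To make the three components concrete, here is a minimal sketch that converts RGB values to hue, lightness, and saturation using Python's standard `colorsys` module. The two RGB triples are illustrative values chosen for this example, and the HLS model in `colorsys` is a simple geometric transform of RGB rather than a perceptual color model, so treat the numbers as rough approximations.

```python
import colorsys

# Two colors given as RGB triples in the 0-1 range (illustrative values).
blue = (0.12, 0.47, 0.71)
orange = (1.00, 0.50, 0.05)

for name, rgb in [("blue", blue), ("orange", orange)]:
    # Note the return order: hue, lightness, saturation.
    h, l, s = colorsys.rgb_to_hls(*rgb)
    print(f"{name}: hue={h:.2f} lightness={l:.2f} saturation={s:.2f}")
```

In this sketch the two colors end up far apart in hue but fairly close in lightness, which is the kind of difference the next section relies on.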

Vary hue to distinguish categories  

When you want to represent multiple categories in a plot, you will usually want to vary the color of the elements. Consider this simple example: in which of these two plots is it easier to count the number of triangular points? 

In the plot on the right, the orange triangles “pop out”, making it easy to distinguish them from the circles. This pop-out effect happens because our visual system prioritizes color differences. 

The blue and orange colors differ mostly in terms of their hue. Hue is useful for representing categories, because most people can distinguish a moderate number of hues relatively easily, and points that have different hues but similar brightness or intensity seem equally important. It also makes plots easier to talk about. Consider this example:

Most people would be able to quickly ascertain that there are five distinct categories in the plot on the left and, if asked to characterize the “blue” points, would be able to do so. 

With the plot on the right, where the points are all blue but vary in their luminance and saturation, it’s harder to say how many unique categories are present. And how would we talk about a particular category? “The fairly-but-not-too-blue points?” What’s more, the gray dots seem to fade into the background, de-emphasizing them relative to the more intense blue dots. If the categories are equally important,  this is a poor representation. 

So as a general rule, use hue variation to represent categories. With that said, here are a few notes of caution. If you have more than a handful of colors in your plot, it can become difficult to keep in mind what each one means, unless there are pre-existing associations between the categories and the colors used to represent them.

This makes your plot harder to interpret: rather than focusing on the data, a viewer will have to continually refer to the legend to make sense of what is shown. So you should strive not to make plots that are too complex. And it is important to be mindful that not everyone sees colors the same way. Varying both shape (or some other attribute) and color can help people with anomalous color vision understand your plots, and it can keep them (somewhat) interpretable if they are printed in black and white.
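As a rough illustration of varying both color and shape, the sketch below maps one categorical column to both `hue` and `style` in `scatterplot()`. The DataFrame, its column names, and the choice of the "colorblind" palette are assumptions made for this example, not part of the original tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical toy data: three equally important groups.
points = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 1, 4, 3, 6, 5],
    "group": ["a", "b", "c", "a", "b", "c"],
})

# A qualitative palette with one distinct hue per category.
palette = sns.color_palette("colorblind", n_colors=points["group"].nunique())

# Encode the category twice: once as color (hue) and once as marker shape (style).
ax = sns.scatterplot(
    data=points, x="x", y="y",
    hue="group", style="group", palette=palette,
)
```

Because the category is encoded redundantly, the groups should stay distinguishable even if the colors are hard to tell apart.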

Vary luminance to represent numbers  

On the other hand, hue variations are not well suited to represent numeric data. Consider this example, where we need colors to represent the counts in a bivariate histogram.  

On the left, we use a circular colormap, where gradual changes in hue correspond to gradual changes in the number of observations within each bin. On the right, we use a palette that uses brighter colors to represent bins with larger counts.

With the hue-based palette, it’s quite difficult to ascertain the shape of the bivariate distribution. In contrast, the luminance palette makes it much clearer that there are two prominent peaks.

Varying luminance helps you see structure in data, and changes in luminance are more intuitively processed as changes in importance. But the plot on the right does not use a grayscale colormap. Its colorfulness is more interesting, and the subtle hue variation increases the perceptual distance between two values, making small differences slightly easier to resolve.
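One way to see why a pure hue sweep carries no natural ordering is to compare the lightness of the colors in each kind of palette. The sketch below builds both palettes by hand with the standard `colorsys` module; the specific hue, lightness, and saturation values are assumptions chosen for illustration, and HLS lightness is only a rough stand-in for perceived luminance.

```python
import colorsys

n = 6

# Hue-based palette: hue sweeps around the color wheel, lightness held fixed.
hue_palette = [colorsys.hls_to_rgb(i / n, 0.5, 0.8) for i in range(n)]

# Luminance-based palette: one base hue, lightness rising with the value.
lum_palette = [
    colorsys.hls_to_rgb(0.6, 0.2 + 0.6 * i / (n - 1), 0.8) for i in range(n)
]

def lightness(rgb):
    """Return the HLS lightness of an RGB triple."""
    return colorsys.rgb_to_hls(*rgb)[1]

hue_lightness = [round(lightness(c), 3) for c in hue_palette]
lum_lightness = [round(lightness(c), 3) for c in lum_palette]

print("hue palette lightness:      ", hue_lightness)
print("luminance palette lightness:", lum_lightness)
```

Every color in the hue sweep has the same lightness, so nothing signals which values are larger, while the second palette's lightness increases steadily with the value.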

These examples show that color palette choices are about more than aesthetics: the colors you choose can reveal patterns in your data if used effectively or hide them if used poorly. There is not one optimal palette, but there are palettes that are better or worse for particular datasets and visualization approaches.

And aesthetics do matter: the more that people want to look at your figures, the greater the chance that they will learn something from them. This is true even when you are making plots for yourself.

During exploratory data analysis, you may generate many similar figures. Varying the color palettes will add a sense of novelty, which keeps you engaged and prepared to notice interesting features of your data. So how can you choose color palettes that represent your data well and look attractive?

Tools for choosing color palettes  

The most important function for working with color palettes is, aptly, color_palette(). This function provides an interface to most of the possible ways that one can generate color palettes in seaborn. And it’s used internally by any function that has a palette argument.

The primary argument to color_palette() is usually a string: either the name of a specific palette or the name of a family and additional arguments to select a specific member. 

In the latter case, color_palette() will delegate to a more specific function, such as cubehelix_palette().
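The string forms can be sketched as follows. The palette names and the `"ch:"` parameter values are examples picked for illustration, and the `"ch:"` shorthand assumes a reasonably recent seaborn release.

```python
import seaborn as sns

# The name of a specific palette.
deep = sns.color_palette("deep")

# The name of a family, plus an argument selecting how many colors to draw.
husl_five = sns.color_palette("husl", 5)

# A "ch:" string is parsed and handed off to cubehelix_palette().
cube = sns.color_palette("ch:start=.2,rot=-.3", n_colors=6)

# The equivalent direct call to the more specific function.
cube_direct = sns.cubehelix_palette(6, start=.2, rot=-.3)
```

Here `cube` and `cube_direct` should contain the same colors, which is what delegation means in practice.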

It’s also possible to pass a list of colors specified in any way that matplotlib accepts (an RGB tuple, a hex code, or a name in the X11 table).
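A minimal sketch of passing an explicit list, mixing the three color specifications mentioned above. The particular colors are arbitrary choices for this example.

```python
import seaborn as sns

custom = sns.color_palette([
    (0.2, 0.4, 0.6),  # RGB tuple with channels in the 0-1 range
    "#e74c3c",        # hex code
    "seagreen",       # named color
])

# Each entry is normalized to an RGB tuple, whatever form it was given in.
for rgb in custom:
    print(rgb)
```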

The return value is an object that wraps a list of RGB tuples with a few useful methods, such as conversion to hex codes and a rich HTML representation.
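As a short sketch of working with the returned object, the snippet below indexes it like a list and converts it to hex codes with `as_hex()`. The "Set2" palette and the count of four are arbitrary choices for this example.

```python
import seaborn as sns

palette = sns.color_palette("Set2", 4)

# The palette behaves like a list of RGB tuples.
first_rgb = palette[0]

# as_hex() converts every color to a "#rrggbb" string.
hex_codes = palette.as_hex()

print(first_rgb)
print(list(hex_codes))
```

In a Jupyter notebook, evaluating `palette` on its own line should display the colors as swatches, which is the rich HTML representation mentioned above.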

Calling color_palette() with no arguments will return the current default color palette that matplotlib (and most seaborn functions) will use if colors are not otherwise specified. This default palette can be set with the corresponding set_palette() function, which calls color_palette() internally and accepts the same arguments.
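The relationship between the two functions can be sketched like this. The choice of "Set2" is arbitrary, and note that `set_palette()` changes global plotting state for the rest of the session.

```python
import seaborn as sns

# Change the default color cycle used by matplotlib and seaborn.
sns.set_palette("Set2")

# With no arguments, color_palette() reports the current default.
current = sns.color_palette()

# For comparison, the same palette requested by name.
expected = sns.color_palette("Set2")

print(list(current.as_hex()))
```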

To motivate the different options that color_palette() provides, it will be useful to introduce a classification scheme for color palettes. Broadly, palettes fall into one of three categories:

  • Qualitative palettes, good for representing categorical data
  • Sequential palettes, good for representing numeric data
  • Diverging palettes, good for representing numeric data with a categorical boundary
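To give a feel for the three categories, here is a hedged sketch that requests one palette of each kind and prints a rough lightness value per color. The palette names ("deep", "rocket", "vlag") are examples chosen for illustration, and the lightness measure comes from the standard `colorsys` module, so it only approximates perceived luminance.

```python
import colorsys

import seaborn as sns

qualitative = sns.color_palette("deep", 5)   # distinct hues for categories
sequential = sns.color_palette("rocket", 7)  # dark to light for ordered values
diverging = sns.color_palette("vlag", 7)     # two hues meeting at a light midpoint

def lightness(rgb):
    """Return the HLS lightness of an RGB triple."""
    return colorsys.rgb_to_hls(*rgb)[1]

for name, pal in [
    ("qualitative", qualitative),
    ("sequential", sequential),
    ("diverging", diverging),
]:
    print(name, [round(lightness(c), 2) for c in pal])
```

The sequential palette should run from dark to light, while the diverging palette should be lightest in the middle and darker toward both ends.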
