Cluster analysis, or clustering, in Tableau divides a data set into segments (clusters) of related data values, which helps us conduct a comparative analysis of the data. A cluster comprises similar data values of a dimension, i.e., the values within a cluster are more closely related to each other than to the values in other clusters. Clustering is done using specific clustering algorithms that keep similar values together as part of a group. We can also have a cluster of up to seven color shades or codes at a time.
For example, suppose we have sales data for a product across different types of consumers/buyers and we want to analyze their purchasing capacity or trends. We can create clusters that segregate consumers based on their purchasing capacities. Once we can see and analyze consumers’ purchasing or spending capacities, we can develop strategies to maximize sales.
Tableau creates clusters on a worksheet using the K-means clustering algorithm. It is called K-means because it divides a data set into K clusters or segments based on a similarity metric and then calculates the mean of each cluster, which gives the centroid of that cluster.
After calculating the centroid of each cluster, the algorithm arranges the clusters so that the total sum of squared distances between each centroid and the members of its cluster is as small as possible. In this way, K-means gives us closely packed clusters, each made up of closely related/similar values.
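The assign-then-average loop described above can be illustrated with a minimal sketch in plain Python. This is not Tableau's internal implementation: the function names, the sample points, and the random choice of starting centroids are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points (the centroid)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Illustrative K-means: returns (labels, centroids).

    Assumption: starting centroids are k points picked at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Which group ends up labeled 0 or 1 depends on the starting centroids, so only the grouping itself is meaningful, not the label numbers.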
How to Create Clusters in Tableau?
As a prerequisite for creating a cluster in Tableau, we first create a scatter plot for sales.
Step 1: To create a cluster, go to the Analytics tab and select Cluster from the Model section.
Step 2: Click and hold the Cluster option, then drag and drop it onto the data visualization area.
Step 3: A Clusters dialog box opens with two sections: Variables and Number of Clusters. In the Variables section, you can add the dimension or measure fields to be included in the clustering. In the Number of Clusters section, you can enter the number of clusters you want.
Once you have configured the cluster settings, the clusters of data values appear in the visualization area. On the extreme right, you can see each cluster's name with its corresponding color. Here, it shows four clusters (Cluster 1, Cluster 2, Cluster 3, and Cluster 4) because we set the Number of Clusters value to 4 in the previous step.
Step 4: On the left panel, Cluster (1) is shown, where we can explore and manage the cluster’s properties further. To access the drop-down menu, click the arrow right next to Cluster (1). To see the description of the selected cluster, click the Describe clusters… option.
This opens a detailed description window for the active cluster with two tabs: Summary and Models. The Summary tab gives a detailed summary of the clustering, including the variables, level of detail, number of clusters, scaling, number of points, and total sum of squares, along with information on each cluster. The Models tab describes the clustering model, with analysis of variance statistics for each variable.
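To make the "total sum of squares" figure concrete, here is a small Python sketch of how such a quantity can be computed and how it splits into a within-cluster part and a between-cluster part. The function names, sample points, and labels are hypothetical, and the sketch does not reproduce any scaling Tableau applies before clustering, so its numbers would not necessarily match the dialog.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sum_of_squares(points, labels):
    """Return (within, between, total) sums of squares for a labeling."""
    grand_mean = mean_point(points)
    # Total: squared distance of every point from the overall mean.
    total = sum(squared_distance(p, grand_mean) for p in points)

    # Group the points by their cluster label.
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)

    within = 0.0
    between = 0.0
    for members in groups.values():
        centroid = mean_point(members)
        # Within: spread of the members around their own centroid.
        within += sum(squared_distance(p, centroid) for p in members)
        # Between: distance of the centroid from the overall mean,
        # weighted by the number of members in the cluster.
        between += len(members) * squared_distance(centroid, grand_mean)
    return within, between, total


# Hypothetical points and cluster labels, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = [0, 0, 0, 1, 1, 1]
within, between, total = sum_of_squares(points, labels)
print(within, between, total)
```

A small within-cluster value relative to the between-cluster value indicates tightly packed, well-separated clusters, which is what K-means aims for.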
Step 5: You can also change the shape of the cluster points from the Shape card in the Marks section. This option gives you a range of different shapes to choose from.
Step 6: You can also change the color scheme of the clusters by clicking the Color option in the Marks section. In the window that opens, you can see the current color scheme, select a new color palette, and assign it to the clusters.
The color scheme of the cluster has now changed.
Clustering Conditions in Tableau:
There are certain conditions where you cannot use the clustering option in Tableau.
- Clustering is not available for authoring on the web, such as on Tableau Online or Tableau Server. It is available in Tableau Desktop.
- Clustering is not available while using a cube data source.
- Clustering is not available when a blended dimension is present in the view.
- Clustering is not available when the view has no fields that can be used as variables (inputs) for clustering.
- Clustering is not available when no dimensions are present in an aggregated view.
- The following field types cannot be used as variables (inputs) for clustering: table calculations, blended calculations, ad-hoc calculations, generated latitude/longitude values, groups, sets, dates, bins, parameters, Measure Names, and Measure Values.