Data blending allows us to bring data from two different data sources together in a single view or from a single Tableau worksheet. Data blending is different from creating joins because blending combines only relevant data from distinct data sources, whereas joins work on row-level data and often duplicate data repeating in several rows.
There are two types of data sources: a primary data source and a secondary data source.
We can also make graphs and charts using the data from both data sources simultaneously in a sheet.
Primary and Secondary Data Sources in Tableau:
To perform blending in a Tableau sheet, there should be at least one secondary data source. The data in the primary data source is the main data on which Tableau defines graphs and charts.
However, only those data that are relevant, matching, or corresponding to the values of primary data sources are taken from secondary data sources, leaving everything else at the source.
For example, in our H&M sales data, if in the primary data source, we have regions like North, West, and Central. However, the secondary data source has North, South, West, and East.
Then, after data blending, the final graph cannot show the data related to the South and East regions because it is not available in the primary data source. Therefore, you should always select the primary and secondary data sources wisely, depending on the fields and values you wish to display in a chart. You can also select a data source as primary by simply using its fields first in a chart.
Types of Tableau Data Blending:
There are two types of Data Blending:
- Automatically Defined Relationship: The automatically defined data blending is defined as a relationship in Tableau that only works best if the field in which you are working consists of the field name that is the same for both sources of data. If not, you have alias names so that they can match.
- Manual Tableau Data Blending: This process is used when there exists a scenario that requires a more complex blend such as the budget comparison data from spreadsheets with data from a database.
Defining Link Relationships for Data Blending:
You can define relationships or create links between data fields from a blended primary and secondary data source. Usually, Tableau creates links between similar fields automatically, but you can define more relationships with the help of the Edit Relationships option.
For creating a link between two fields, you must have common dimensions in both the data sources that are known as linking fields. Once you have linking fields in your primary and secondary data sources, you can easily create a link.
Here are some points that are required while establishing a link between two linking fields.
- All the links are displayed in the secondary data source pane in a sheet. You will find an orange link icon in front of the linked fields and a grey broken link icon for the linking fields.
- You can only create a link between the two fields if they contain similar data. For example, if Table 1 has the Regions field and Table 2 has the Zones field with the same field values, i.e., North, South, East, West, you can establish a link manually between these two fields or common dimensions rather than renaming one of them.
- You also need to click on a broken link icon to activate an inactive linking field.
- If the primary and secondary data sources have more than one field in common or there is more than one potential linking field, then multiple links can be established between them at a time in a single Tableau worksheet.
How to Blend Data in Tableau?
Let us take an example to understand data blending in Tableau. We have two data sets of sales of the store H&M in 2018 and 2019. Now, the number of fields in both the data sets is the same but data is different.
For instance, in table H&M 2018, the field Region stores all the data of regions, whereas, in the H&M 2019, the field is having the same five regions called Zone. With data blending, we can equate these two fields as they contain similar regions like north, south, central, east, and west.
Hence, data blending helps establish relationships between two relevant data sources, making data analysis more meaningful and insightful. We can also compare two data sets more efficiently by blending them in a single Tableau worksheet. So, before starting, let us see our two sets of sample data for H&M sales.
We will make separate connections to both the data sets named H&M Sales 2018 and H&M sales 2019.
Now, we will see the existing relationship between these two data sets, and to make new relationships, we go to the Data tab and then select the option Edit Relationship.
A relationship dialog box will open, which displays the primary data source, a secondary data source, and a list of already existing data or automatically detected relationships between the fields of the two tables.
You can also change the primary and secondary data source from the drop-down list. Also, you can change the option Automatic to Custom to make a new relationship.
Now, you need to create a relation between the fields Region and Zone using the two tables stating that the constituent data in them is the same (not identical). For this, you need to select the Custom option and click on Add.
A list of available fields from both the tables will open. You can select Zone and Region from here. Click on OK.
This will create a new relationship between the Zone and Region fields. Click on OK to confirm.
We have our two data sources on our worksheet, which have blue and orange tick marks in front of their names, indicating the primary data source (blue) and secondary data source (orange).
Now, we are ready to use the data from these two data sets and start our analysis. You will also find a link icon in front of the fields that are linked between both the tables, which means that you can use those fields from the primary data set as a common field as they are linked.
As shown in the screenshot below, we made a bar graph for total sales in H&M stores in 2018 and 2019 (in the USA). We could get region-wise and state-wise sales data for both the years in one graph because of data blending. Here, we could use the Zone field from the 2019 sales dataset as a common field between the tables to provide information on regions.
When to Blend your Data in Tableau?
The data-blending feature in Tableau is particularly useful in the following cases:
- When you are not able to use cross-database joins in some specific database that does not support it, such as, Oracle Essbase, Google Analytics (an extract only connection), you can import or connect to separate data in such case sources and then combine them using data blending. This allows you to use a combination of data from distinct data sources on a single Tableau worksheet.
- Another case is where it is fit to use data blending is when your data values exist at different levels of details or have different granularity.
- Data blending is the best option to use when you are using larger data sets. Instead of using Joins, you can blend the data because Joins combine the data beforehand and then aggregate it for the view, which will affect the performance when the database is large. On the other hand, when we blend data, it aggregates the data first and then combines it whenever required saving a lot of computational power in case of large data sets.
Difference between Data Blending and Data Joining:
The main difference between the data blending and data join is when the join is performed concerning aggregation, it will require at least two data sources termed as primary and secondary sources. When designated a primary data source, it functions as the primary data source or main table and any subsequent data sources used on the sheet are treated as a secondary data source. The columns of secondary data sources contain corresponding matches in the primary data source appear in the view.
After characterizing the primary and secondary data sources, defining the standard dimension or dimension between two data sources is necessary. This process of defining is termed as the linking field.
If the date field available in the primary and secondary data sources have the same name, then Tableau will create the association between them and this relationship is a link icon ( ) next to the date field when the field is in the view of the secondary data source.
If two dimensions do not have the same name, it defines a relationship that creates an exact mapping between date fields among primary and secondary data sources.
Limitations of Data Blending in Tableau:
- You cannot publish a blended data source as a single data source on the server. Instead, you need to publish the two data sources separately on the same server and then blend the published sources.
- It is essential to always aggregate data in calculations from the secondary data sources.
- You may face issues while using non-additive aggregates such as MEDIAN, COUNTD, etc. when blending data.