Subject Area: Architecture
Level of complexity: Advanced
Approximate Time to complete: 1 hour
Companion Workbook (twbx): n/a
Additional Resources on the web:
- http://onlinehelp.tableau.com/current/pro/online/en-us/joins_addtables_datawindow.html
- http://tableaulove.tumblr.com/post/11692301750/what-i-learned-about-tableau-join-culling-over-summer
- http://tableaulove.tumblr.com/post/62447366098/what-i-learned-about-tableau-join-culling-over-fall
With Tableau 8.1, you have the option to control how joined tables are included in queries when using an inner join. Join culling will assume that the database tables have referential integrity; if a query requires only data from one table, the other joined tables will not be referenced, resulting in improved performance. You get the speed of working with one table when only one is required, and the flexibility of two or more tables when needed.
In cases where you have joins but no referential integrity defined in the database for performance reasons, you can impose soft referential integrity in Tableau by simply checking a box and Tableau will eliminate unnecessary inner joins, making your queries faster. In this example, we will use Tableau's Superstore dataset where we create an inner join on the Orders and Users table.
Assuming the above join has been done (and you're connecting live), select Data>[Your Data Connection Name]>Edit Tables…
Here, select "Include joined tables only when referenced (assume referential integrity)" to turn on join culling, and select OK*:
*It should be noted that if your tables have explicit Primary Key and Foreign Key relationships, join culling is automatically in effect. The above setting, "Include joined tables only when referenced" is only for cases when the PK/FK relationship is not defined in the database and the user wants to tell Tableau that there should be such a relationship.
By dragging the Sales field just onto the rows shelf, we get, without culling:
SELECT SUM([Orders$].[Sales]) AS
FROM [Orders$]
INNER JOIN [Users$] ON [Orders$].[Region] = [Users$].[Region]
HAVING (COUNT(1) > 0)
and with culling:
SELECT SUM([Orders$].[Sales]) AS
FROM [Orders$]
HAVING (COUNT(1) > 0)
How does this work? When you select the option to "Include joined tables only when referenced", you are telling Tableau that the joined tables have referential integrity. You are effectively confirming that the secondary tables will always have a matching row in the primary table. Because that is true, Tableau doesn't need any information from other unnecessary tables to return results.
It's important to realize when and how join culling happens. Join culling occurs when:
- You're connected to a star schema (snowflake schemas will not work)
- You're using inner joins
- You have referential integrity
If you check the "include joined tables only when referenced" box and you don't have referential integrity in your data, the results may not be accurate and sometimes inconsistent (depending on the number of tables being joined and what fields are in the view).
When Does Join Culling NOT Work?
- Using Custom SQL
- Anything besides a star schema
- Context Filters, User Filters, or Data Source filters
- Left, Right, or Full outer joins
Filtering Alternatives: If you choose to utilize a context filter or data source filter, be aware that this turns off join culling and can adversely affect workbook load performance. In cases where you use context filters, Tableau creates a temporary table which does not allow for join culling. Additionally, User Filters default to having "Add to Context" turned on, resulting in this issue as well.
The solution in this case is to remove the filter from context by right-clicking on the filter, and selecting "Remove from Context":
By selecting "Remove from Context", you will now allow for join culling to take effect (assuming the above bullet points hold true) which should improve performance.
It should be noted, however, that doing this to a User Filter will remove the security layer that a temporary table provides for user filters. If it is the case that you are not implementing user filters for security reasons (i.e., it's okay for a given user to see other users' data in your workbook), then removing the context filter will allow for join culling to take effect.
Because data source filters operate at a higher precedence than context filters, they will have the same effect on join culling and would need to be removed if performance is your primary concern. Therefore, if your workbook viewing experience is significantly slowed after using context or data source filters (and you're not implementing these filter types due to security concerns), you may consider removing all context and data source filters to improve overall workbook speed.