Let Tableau Write Your SQL for You
Kirk Munroe, Chief Analytics Officer and Founding Partner at Paint with Data and Tableau DataDev Ambassador, explains the value of using relationships in your Tableau data models.
Let Tableau write your SQL for you—your dashboard consumers will thank you for it
I recently saw a post on LinkedIn where a Tableau expert asked his network how often people used custom SQL in Tableau. Of the nearly 230 respondents, 41% answered either “all the time” or “frequently”!
This set of answers does not definitively represent the entirety of the Tableau Community, of course. I was a little surprised it was this high, but I was not completely shocked either. I often attend Tableau User Groups and watch Tableau experts on social media, and it is a very common occurrence to see either custom SQL or joins in Tableau.
You might be asking, “What’s the problem with custom SQL and joins?” Leaving aside that custom SQL can run very slow in Tableau (on a live connection, Tableau will run your SQL code as a subquery on most interactions with a dashboard), joins in general will both explode your data and limit the number of questions that you can answer with your data set. That’s right—more data and fewer answers!
How can this be possible? The reason is that when you join your data, you are creating a data set that is designed to answer any question from a subset of your total data—unless you are using full outer joins everywhere, but then your data set will be very large and full of NULL values.
Dynamic SQL with joins and relationships
Tableau has generated SQL dynamically since day one, 20 years ago. However, the SQL it generated could only happen against a single physical table. Tableau added a user interface to allow for joins, but these joins occur before the SQL generated by the pills you pull into your view.
Relationships, released in 2020.2 (over 4 years ago!), spared you from needing to worry about joining your data before bringing it into Tableau. Before this, there were a few dynamic data modeling features in Tableau. Those features were either under the covers (e.g. join culling) or difficult and brittle in the UI (e.g. blends).
The evolution to Relationships was magic! The entire reason we write (SELECT-based) SQL in the first place is to answer questions from our data. How can we realistically write this SQL before we know what the questions are? We just let Tableau know how the tables are related and Tableau does the rest, depending on the fields in the view.
Let’s look at a simple example to start. Imagine that Superstore sales were in one table with line items and a product catalog is in another table. What happens if we join these tables together?
Here we can see that we have four options: left join, right join, inner join, (full) outer join. Each one has its drawbacks. If we pick left, we can’t see the products that haven’t sold; if we pick right, we can’t see the sales without products; if we go inner, we get only sales with products AND the products that have sold; and if we go outer, we get a ton of messy NULL values.
Also, no matter which one we pick, we will end up with a table that is bigger than just adding the size of the two input tables.
What happens if we decide to create a relationship between these tables?
First, Tableau doesn’t explode the data. The size of the model is the additive size of both tables. Second, we can answer any question we want!
Let’s take the case of orders and product sales. Just by creating a relationship on Product ID, we can discover which products have and haven’t sold. By dragging our Product ID pill from our ProductDB table to the view and then bringing in the count of line items… voila! A list of all the products that have—and haven’t—sold!
Tableau was clever enough to know that it shouldn’t do an inner join and eliminate all the rows with products that weren’t in the line items (sold items) table.
Quick queries with multi-fact relationships
What should we do if we're measuring things in our organization that have unique data structures because they come from different source systems in various business functions?
Take an example of what a table of support calls might look like:
Now imagine that we wanted to check the relationship with product sales. That is, which products generate the most calls based on sales volume. The tables share no fields, but have many more records, and could look like:
In the past, relating these tables wouldn’t be possible with Relationships. With the introduction of Multi-fact Relationships in 2024.2, Tableau can now handle these types of queries for you too!
If you were joining your data, this would be very tricky. You would have to join Orders, Addresses, Customers, Line Items, and Products into a single, massive table. You would also have to create a big table with Support Cases, Products, and Customers. Then, you would need to join those tables together to get your answers—at the right level of aggregation.
With multi-fact relationships, Tableau can also do these more complex queries involving shared dimensions without exploding your data. It is also designed for performance. If the query would be too complex in the database, Tableau can query each “side” of your query and stitch it together in your viz!
Let’s look at our use case of support calls by product. We can now create a calculation to get the ratio of support calls generated per line item (product order), even those these are from completely different tables:
If we bring this field and Sub-Category to the view, we can see which sub-categories are generating the most calls per sale!
Another benefit with multi-fact relationships is that we future-proof our calculations, like the one I used here. If we decide to add another shared dimension table to our model in the future, this calculation will continue to respect it—and any new shared dimensions we add. If we were doing a join, union, or custom SQL, we might have to go back and update all of our calculations!
Learn more about using multi-fact relationships in Tableau
The best decision you can make for the business users you serve is to use relationships and let Tableau write your SQL for you. Small data sets. More answers. More flexibility. What more could you want?
How about learning more about multi-fact relationships? I presented at TC24 with Tableau product management director, Thomas Nhan—watch the session recording on Salesforce+ to dive deeper and see data modeling with multi-fact relationships in action.
相關文章
Subscribe to our blog
在收件匣中收到最新的 Tableau 消息。