DASH: Some Text Walked into a Bar (Chart)

Tableau Research presents at IEEE VIS 2024: DASH, a bimodal data exploration tool for interactive text and visualizations with encoded semantic metadata.

Text as a first-class data analysis citizen

When we read about a current world conflict, a contentious election, or an amazing sports achievement, why aren’t we just shown a bunch of charts and graphs? Why didn’t I author this blog as a chart? What is it about text that so dominates our information gathering?

In the face of data technologies like Tableau, AWS, Python, and AI, talking about ‘text’ feels like bringing a pencil to a GPU fight. But then why is literally every newspaper, magazine, blog, book, academic paper, and how-to manual in the world text first?

One way to approach this is to look at a 2021 paper by Lundgard and Satyanarayan that outlines four levels of semantic content for natural language descriptions of data visualizations. Briefly, these levels are:

| Level | Label | Examples |
| --- | --- | --- |
| Level 4 (L4) | Contextual and domain-specific | Domain-specific insights, explanations |
| Level 3 (L3) | Perceptual and cognitive | Complex trends |
| Level 2 (L2) | Statistical and relational | Descriptive statistics, outliers |
| Level 1 (L1) | Elemental and encoded | Chart type, title, axis ranges |

As we move up from the lower levels to the higher levels, the semantics of the chart descriptions become richer and more meaningful. At the bottom (Level 1), they are primarily factual; at the top (Level 4) they deal with explanations and domain-specific knowledge.

Taking inspiration from this, we developed our own four-level semantic hierarchy of data consumption:

| Level | Label | Examples |
| --- | --- | --- |
| Level 4 (L4) | Insights and integration of domain knowledge | “The steady increase in CEO pay may be playing a role in the recent uptick in investor activism.” |
| Level 3 (L3) | Relationships among data and statistics | Clusters, correlations, outliers, and trends |
| Level 2 (L2) | Statistics | STDEV(Profit) |
| Level 1 (L1) | Base data | Rows and columns of a SQL database |

Looking at this, we can notice a few things:

  1. The lower levels are super easy to display with a chart. For example, we can plot Profit for every store in the country (L1). We can plot the average store profit per state (L2). We can even begin to cluster states by profit (L3); see the sketch after this list. L4 is a little tougher—what would we chart exactly to show this? And would it be as efficient and clear as simply saying it with text? (Spoiler alert: No.)
  2. The higher levels are super easy to convey with text. We can discuss why lower profits might lead to CEOs getting fired (L4). We can discuss which states show similar sales performance (L3). We could even discuss per-state averages (L2), although the fifty-state comma-separated sentence would run on a bit. L1 would be hard—are we really going to recite every number in our database? We may as well just print the SQL table and be done.
  3. There is a relatively smooth handoff between text and charts right around L2 and L3. Showing a handful of clusters and discussing a handful of clusters are both totally reasonable; so is discussing national and state averages, or showing trend lines.
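
As a quick, hedged illustration of items 1 and 2 above, here is roughly what those chartable L1-L3 computations could look like in Python. The store, state, and profit values and the field names are made up for this sketch; they are not from the paper.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical store-level profit data (field names and numbers are invented).
stores = pd.DataFrame({
    "Store":  ["S1", "S2", "S3", "S4", "S5", "S6"],
    "State":  ["WA", "WA", "OR", "OR", "CA", "CA"],
    "Profit": [1200.0, 950.0, 430.0, 610.0, 2100.0, 1800.0],
})

# L1: the base data itself -- rows and columns.
print(stores)

# L2: a statistic -- average store profit per state (STDEV(Profit) would also
# live at this level).
state_profit = stores.groupby("State", as_index=False)["Profit"].mean()
print(state_profit)

# L3: a relationship among data and statistics -- cluster states by profit.
state_profit["cluster"] = KMeans(n_clusters=2, n_init=10).fit_predict(
    state_profit[["Profit"]]
)
print(state_profit)

# L4 (insight plus domain knowledge) has no obvious computation to chart;
# as the text argues, it is best said in words.
```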

Once we codified these semantic levels, things began to make a bit more sense. When we read the news or a blog, or listen to a podcast, we are looking for knowledge and analysis, not data. Semantic levels 1-3 are the data facts of the matter, but Level 4 is the knowledge and analysis that we need to make decisions and understand our world. And Level 4 information is best conveyed by text.

DASH: A data exploration tool for all semantic levels

Once we articulated these four levels and their relationship to text and charts, we created DASH (Figure 1), a system for data analysis with semantic hierarchies.

DASH interface from Tableau Research showing five zones (A containing text, B containing charts, C displaying a JSON representation of semantic metadata, D displaying the DASH data-exchange JSON packet, and E displaying text with color encoding)

Figure 1: DASH tool’s interface for supporting textual and visual data analysis. (A, B) Text and charts encoded with DASH semantic metadata. (C) A JSON representation of semantic metadata, including the semantic level, the data field, and the data value. (D) The DASH data-exchange JSON packet comprises the interactive text, its metadata, and identifiers that link the textual narrative to specific data points. (E) DASH semantic level assignment using the Lundgard et al. color encoding.

The DASH interface consists of a few major pieces:

  1. The text area (Figure 1A) that renders natural language text descriptions of the data.
  2. The chart area (Figure 1B) that renders visualizations of the data.
  3. Metadata (Figure 1C and 1D) that support the text, telling DASH which semantic level the text belongs to. More on this below.

To begin a data analysis session, we kickstart DASH by giving it high-level knowledge about what we are looking at and what we want to accomplish. For example, in the session shown in Figure 2, here’s what we gave DASH:

  1. A dataset of Seattle real estate listings.
  2. A natural language description of the dataset, explaining that it covers Seattle real estate, along with a description of its columns.
  3. A natural language description of our analytical goal: to find a good neighborhood for a family of four. (A sketch of what this kickoff might look like follows the list.)
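
The post does not prescribe a format for these three inputs, so treat the following purely as a sketch; the file name and field names are hypothetical.

```python
# Hypothetical kickoff configuration for a DASH session; the field names and
# file name are illustrative, not taken from the paper.
session_kickoff = {
    # 1. The dataset itself.
    "dataset": "seattle_real_estate.csv",
    # 2. A natural language description of the dataset and its columns.
    "dataset_description": (
        "Seattle real estate listings, with columns for zip code, sale price, "
        "number of bedrooms, number of bathrooms, house size, and lot size."
    ),
    # 3. A natural language description of the analytical goal.
    "analysis_goal": "Find a good neighborhood for a family of four.",
}
```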

DASH then kicks off the session with a high-level analysis of the dataset as seen through the lens of our analytical goal. As you can see in Figure 1A and paragraph 1 of Figure 2, we are using a large language model (LLM; for example, GPT-4 Turbo) to generate the text. But the LLM is doing more than just text generation; the text is delivered to DASH in a JSON structure that contains a ton of metadata (Figures 1C and 1D).
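
Per the Figure 1 caption, that packet carries the interactive text plus, for each span, its semantic level, data field, data value, and identifiers linking the narrative to specific data points. The exact schema is not given in this post, so the shape below is only a guess, with invented numbers and an invented example sentence.

```python
# Rough, hypothetical shape of a DASH data-exchange packet; the real schema
# in the paper may differ, and the values below are made up.
dash_packet = {
    "paragraph": "One zip code stands out as an affordable option with a "
                 "higher average number of bedrooms.",
    "spans": [
        {
            "text": "affordable option",      # user-facing interactive text
            "semantic_level": 3,              # L1-L4, per the hierarchy above
            "fields": ["avg_price"],          # data field(s) behind the text
            "values": [612000],               # backing value(s) (invented)
            "data_point_ids": ["zip_98117"],  # links to specific data points
        },
        {
            "text": "higher average number of bedrooms",
            "semantic_level": 3,
            "fields": ["avg_bedrooms"],
            "values": [3.4],
            "data_point_ids": ["zip_98117"],
        },
    ],
}
```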

DASH interface from Tableau Research annotated to show examples of interactivity and how text elements can be expanded on with additional text or represented visually using the "Tell Me More" and "Show Me More" buttons, respectively.

Figure 2: The DASH interface and an associated interactivity example. Generated text paragraphs are labeled as Px, and interactive mouse-drag gestures are labeled as Ix. The DASH interface comprises two primary sections: the text area on the left and the chart area on the right. The Tell Me More and Show Me More buttons at the bottom left trigger additional text and chart rendering, respectively. Text-to-text (I1), text-to-chart (I2-I5), chart-to-chart (I6), and chart-to-text (I7) interactions are illustrated.

As a result, for every piece of text in the natural language description, DASH understands its semantic level, its related data fields (zip code, price, etc.), and its exact user-facing wording. This means, for example, that the vague-sounding text “higher average number of bedrooms” is actually carting around metadata that tells DASH exactly what the text is talking about—specifically, its data field and its value. It’s actually not vague at all; it is hard-linked to a very specific data definition.

So what does this buy us? It lets us treat text like any other data mark in a chart; it has one or more fields and one or more values. In fact, to DASH, text is completely indistinguishable from charts; words and marks both carry the same data. As a result, they are completely interchangeable. Text and charts can now exchange data fluidly, without barriers.
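
Purely as an illustration of that point (this is not the paper's implementation), a dropped text span and a dropped chart mark could both reduce to the same tiny record of fields and values:

```python
# Illustrative only: both a text span (with DASH metadata) and a selected
# chart mark can be reduced to the same record, which is why they can act as
# interchangeable drag sources. All values here are invented.
def mark_from_text_span(span):
    """span: one interactive text span, as in the packet sketch above."""
    return {"fields": span["fields"], "values": span["values"],
            "label": span["text"]}

def mark_from_chart_mark(selected):
    """selected: a selected mark in a chart, e.g. the bar for one zip code."""
    return {"fields": selected["fields"], "values": selected["values"],
            "label": selected.get("tooltip", "")}

# Either record can now be dropped on Tell Me More, Show Me More, or a chart.
text_mark  = mark_from_text_span({"text": "affordable option",
                                  "fields": ["avg_price"], "values": [612000]})
chart_mark = mark_from_chart_mark({"fields": ["zip_code"],
                                   "values": ["98112"], "tooltip": "98112"})
```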

To illustrate this, let’s look back at Figure 2, which shows an interactive DASH data analysis session with generated text paragraphs (Px) and interactions (Ix).

  1. DASH starts out with a high-level discussion of the data as analyzed through the lens of our analytical goal—finding a neighborhood for a family of four (P1).
  2. We drag the text ‘house and lot sizes’ to Tell Me More (I1), whereupon DASH creates P2, a paragraph describing why house and lot sizes are important for a growing family.
  3. We then drag the text ‘affordable option’ (which, under the hood, is tagged with the data field ‘avg_price’) to Show Me More (I2), which shows us Seattle house prices by creating the Average House Price chart.
  4. To focus on specific zip codes, we drag the text ‘98101’ and ‘98015, 98112, and 98117’ (I3 and I4) to the house price chart, highlighting those zip codes.
  5. To get a sense of beds and bath numbers, we drag the text ‘bedrooms and bathrooms’ to Show Me More (I5), which creates the chart Average Number of Bedrooms by Average Number of Bathrooms.
  6. We notice that zip code 98112 is pretty expensive. To figure out why, we drag the 98112 mark from the Average House Price chart to the Average Number of Bedrooms by Average Number of Bathrooms chart (I6), highlighting 98112. From this, we can see that 98112 homes have lots of beds and baths, possibly contributing to their high house price.
  7. Finally, to get a deeper analysis of the 98112 zip code, we drag the 98112 mark from the Average Number of Bedrooms by Average Number of Bathrooms chart over to Tell Me More, whereupon DASH gives us a high-level text description of the 98112 zip code (P3). (A rough sketch of how these drops might be routed appears after this list.)
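
Pulling the walkthrough together, here is a hedged sketch of how such drops might be routed. None of these function names come from DASH itself; the stubs only stand in for whatever LLM call and chart renderer the real system uses, and its actual logic is certainly richer.

```python
# Hypothetical drop-routing logic, not DASH's actual code.
def llm_explain(fields, values):
    return f"(LLM-generated discussion of {fields} = {values})"

def render_chart(fields):
    return {"mark": "bar", "x": "zip_code", "y": fields}

class Chart:
    def __init__(self, marks):
        self.marks = marks  # each mark: {"fields": [...], "values": [...]}

    def highlight(self, where):
        # Highlight marks that share the dropped mark's field values.
        return [m for m in self.marks
                if all(v in m["values"] for v in where.values())]

def handle_drop(mark, target, charts):
    if target == "tell_me_more":
        # Text-to-text and chart-to-text (I1, I7): generate more text.
        return llm_explain(mark["fields"], mark["values"])
    if target == "show_me_more":
        # Text-to-chart (I2, I5): render a new chart of the dropped field(s).
        return render_chart(mark["fields"])
    # Dropping onto an existing chart (I3, I4, I6): highlight matching marks.
    return charts[target].highlight(dict(zip(mark["fields"], mark["values"])))

# e.g. interaction I6, dragging the 98112 mark onto another chart:
# handle_drop({"fields": ["zip_code"], "values": ["98112"]},
#             "beds_by_baths", charts)   # "beds_by_baths" is a made-up key
```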

Why is this useful and cool?

We’ve already discussed how charts excel at lower semantic levels, and text excels at higher semantic levels. So now imagine what you could do if DASH were integrated into newspapers: you’re reading an article about a conflict in some country. Where is that country anyway? Just drag the country’s name to Show Me More and get a map. What about the bordering countries? What do they think of this conflict? Just drag their marks to Tell Me More and get an analysis of the local geopolitics.

DASH breaks down the barriers between text and charts and lets you move around inside the data in a way that you just can’t do with either text or charts individually. DASH opens the door for a data discussion, not just a chart-based data exposition.

So what’s next?

Lots of cool stuff. We have a lot of ideas kicking around and we can’t wait to share them with you. Until then, please read our paper (co-authored with Vidya Setlur) and check out the accompanying video on the Tableau Research website.

If you’re able to join us (virtually) at the IEEE VIS 2024 Conference, we’ll be showing DASH along with several other Tableau Research projects.

Thanks for reading!