Governed data curation bridges the gap between data and business.
As data sources become more complex, diverse, and numerous, data management is even more critical in modern BI deployments. As more of the workforce uses data to drive decisions, organizations must ensure that both the data and the analysis built on it are accurate.
Organizations have turned to data curation to address the data management and governance challenges that come with this broader data access. Data curation encompasses the way an organization captures, cleans, defines, and aligns disparate data. This process creates a bridge between the data and its real-world applications.
Organizations are already spending millions of dollars on technologies that integrate data definitions with the tools used to analyze the data, aiming to remove ambiguity across teams and organizations. In response, data curation tools and processes (like data catalogs and semantic governance) are converging with BI platforms to link data with business context.
A data catalog acts as an enterprise business glossary of data sources and common data definitions. Subject matter experts like data engineers and data stewards can add descriptions and definitions to data sources and fields, tags for better discoverability, and even helpful data quality indicators, including notifications for certification of trusted content and for maintenance or deprecation of data assets.
Everyday users don’t need to know where data lives in the data source, but they do need to understand what the data represents in the real world. For example, analysts and consumers of content often need to verify the origin of a piece of data (also called lineage analysis). And if data sets change, data engineers and data stewards need to analyze the downstream impact on assets connected to the tables or schemas they manage. Combining a data catalog and a BI platform helps to streamline all of these tasks, and provides usage metrics to quickly identify the most frequently accessed data sources and dashboards.
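To make lineage and downstream impact analysis concrete, here is a minimal sketch in Python. It assumes the catalog can be modeled as a simple mapping from each asset to the assets that read from it. The asset names and the `downstream_impact` and `lineage` functions are hypothetical illustrations, not the API of any particular catalog or BI product.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
DOWNSTREAM = {
    "warehouse.orders": ["extract.sales", "extract.returns"],
    "extract.sales": ["dashboard.revenue", "dashboard.regional_sales"],
    "extract.returns": ["dashboard.revenue"],
}


def downstream_impact(asset, graph=DOWNSTREAM):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen, queue, order = {asset}, deque([asset]), []
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def lineage(asset, graph=DOWNSTREAM):
    """Return every upstream asset that `asset` ultimately depends on."""
    # Invert the edges, then reuse the same breadth-first traversal.
    upstream = {}
    for parent, children in graph.items():
        for child in children:
            upstream.setdefault(child, []).append(parent)
    return downstream_impact(asset, upstream)
```

In this sketch, a data steward changing `warehouse.orders` would call `downstream_impact("warehouse.orders")` to list the extracts and dashboards that might be affected, while an analyst questioning a number on `dashboard.revenue` would call `lineage("dashboard.revenue")` to trace it back to its sources.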
As necessary as data catalogs may be, there is arguably greater opportunity beyond metadata governance, in the area of semantic governance. Semantics help to connect not just the context of data but also the intent of analytical actions, for example by mapping synonyms so that a phrase like "order size" resolves to "quantity." This enables new modalities for the full spectrum of data workers to interact with data and quickly arrive at new insights. One is natural language interaction, where a BI platform understands layered requests that involve multiple queries, such as "Highlight the highest, lowest, and average."
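As a rough illustration of the synonym mapping described above, the Python sketch below resolves a business phrase to a field name and expands one layered request into several aggregate queries. The synonym table, the field names, and the `resolve_field` and `expand_request` helpers are all assumptions made for this example. A real semantic layer would be considerably richer.

```python
# Hypothetical synonym table: business phrases mapped to catalog field names.
SYNONYMS = {
    "order size": "quantity",
    "units": "quantity",
    "revenue": "sales_amount",
}


def resolve_field(phrase, synonyms=SYNONYMS):
    """Map a user's phrase to a field name, falling back to the phrase itself."""
    key = " ".join(phrase.lower().split())  # normalize case and whitespace
    return synonyms.get(key, key)


def expand_request(field_phrase, aggregates=("max", "min", "avg")):
    """Turn one request into one (aggregate, field) pair per statistic."""
    field = resolve_field(field_phrase)
    return [(agg, field) for agg in aggregates]
```

Under these assumptions, a request like "Highlight the highest, lowest, and average order size" could be handled by calling `expand_request("order size")`, which yields three separate aggregate lookups against the `quantity` field.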
As these technologies and processes continue to converge, data curation and semantics will provide a stronger foundation for the rest of the analytical experience. This will unify more disparate components of the data ecosystem—like cleansing and downstream analysis—and feed stronger machine-generated recommendations for tables, joins, and data models. Ultimately, advancements in data curation will enable the workforce to move beyond just asking questions of their data during analysis, toward asking questions of their business.