What Is Data Discovery and Why Does It Matter?

Last Published: Aug 16, 2023 |
Nathan Turajski

“A journey of a thousand miles begins with a single step,” according to Lao Tzu. This centuries-old proverb remains relevant today as it helps illustrate the role data discovery plays in the journey toward visualizing data, improving data governance, and operationalizing data intelligence for business value creation.

Data discovery is the first step in solving modern data transparency challenges, such as enterprise-wide data democratization. It applies analytics, privacy, and trust assessments to support compliance and deliver greater data insights on the road to digital transformation success. Let’s dig deeper!

What is data discovery?

Data discovery enables your organization to identify, catalog, and classify business-critical and sensitive data, so you can govern it for meaningful purposes with increased transparency. Data discovery helps you:

  • Uncover new insights for opportunities in business value creation
  • Apply data protection to lower risk exposure from abuse and comply with privacy mandates
  • Drive similar high-value business outcomes where data is the fuel of modern business operations

As the first step in a data governance journey, data discovery provides the data intelligence an organization needs to develop new products and services, optimize data use, and protect data from risk exposure. The result is more opportunity for new revenue sources as organizations collect and discover greater volumes of data across the enterprise.

As an example, information captured from a company’s consumers, such as personal preferences and transaction records, may lack the necessary transparency when it is scattered across enterprise systems. Data discovery uses AI and machine learning to automate building a metadata repository, which accelerates an understanding of where data is located and where it’s being moved and used. It also helps determine the data’s value to the organization, so it can be made available through data democratization efforts such as a data marketplace.

Data discovery helps ensure data is trusted and available when it’s aligned with data governance policies that enhance enterprise and customer confidence, avoid abuses during data handling, and deliver next-generation optimized operations as a key part of a digital transformation effort. It is also increasingly a prerequisite for determining which application workloads and data types can be safely migrated to the cloud.

Data intelligence is the key benefit of data discovery

There’s a staggering statistic from the World Economic Forum: “By 2025, it’s estimated that 463 exabytes of data will be created each day globally—that’s the equivalent of 212,765,957 DVDs per day!” But, with all this data, there’s something fundamentally missing: How to make sense of a massive collection of enterprise data to understand its relevance. Or, is it simply a record of what was -- rather than what could be?

Increasingly, innovative organizations are realizing it’s the latter: a strategic opportunity to move past their competition, for example through predictive analytics. Why? Data discovery is that first critical step to answer today’s important enterprise questions, enabling you to transform raw data into increased data intelligence. Here are a handful of examples:

Data discovery to improve customer experience

Arguably, every company is in business to better serve customers, and data discovery is that critical first step toward understanding how to better fulfill consumer needs. Consider that during 2020, consumers spent about $1 million per minute online. The metadata generated about each transaction creates a potentially untapped record of future buying behavior. How was a product or service found and paid for? What else was purchased with it?

Data discovery can help reduce buying friction, leverage affinity marketing to group similar buyers with common needs, and increase customer brand loyalty by improving the alignment between buyers and sellers for greater satisfaction. Optimizing revenue through improved product planning and delivery means better consumer relationships that begin with data discovery and transparency.

Data discovery to fuel self-service analytics

Modern organizations are emerging through a period of digital transformation with new insights into data that improve business operations and accelerate new product and service development. As increased data literacy and data democratization strive to make more raw data available for data analytics, it’s not just the data scientists tapping into new value. Chief data officers, line-of-business owners, marketing, and similar teams focused on revenue-growth agendas are increasingly mining data and applying data visualization tools to unlock hidden potential as an output of data discovery.

Is data free of unfair bias and used appropriately to be considered trustworthy? What operational gaps exist for understanding and protecting classified data types? Where is data duplicated across an organization, creating risk exposure?

Data discovery helps turn data unknowns into data opportunities with insights.

Data discovery to reduce risk exposure for compliance

With the data privacy law landscape evolving as global regulators adopt best practices for data protection and data transparency, two fundamental data discovery questions have emerged:

  • What data classes require data governance oversight?
  • Where is that data being used that creates risk exposure?

Today’s data privacy mandates focus on personal information, with new consumer rights obligations that require timely fulfillment of routine inquiries. Data needs to be classified as subject to mandates such as the GDPR, the CCPA, the LGPD, and other laws. Just as critical, that data requires monitoring to track appropriate use. With the Schrems II decision invalidating the EU-US Privacy Shield framework, organizations in the EU, as well as forward-thinking global organizations in general, need to consider data location and movement risks, data residency rules, cross-border transfers, and other data lineage insights to better control unnecessary risk exposure as data is shared.

Data discovery for optimized business operations

Boosting revenue through greater efficiencies, controlling operational costs and deploying resources for improved business outcomes can start with data discovery for improving data intelligence around operational processes. This is increasingly true when determining the impact of cloud data migration for greater economies.

Data discovery helps determine what data requires protection, what data should be minimized to reduce ROT (redundant, outdated, trivial) that drives retention costs and risks, what data is ideally suited for migration to cloud apps, and which data priorities take precedence. Every business planner needs to consider operational and capital expenses to maximize revenue, and using data discovery to drive efficiencies can help increase the bottom line when turning raw data into data intelligence.
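To make the "redundant" part of ROT concrete, here is a minimal Python sketch of one way duplicated records could be spotted across systems: hash each record's content and report any fingerprint that appears in more than one source. The source names, sample records, and the `fingerprint` and `find_duplicates` helpers are hypothetical and for illustration only. A real discovery tool would likely use fuzzier matching than exact content hashes.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(record):
    """Hash a record's content; key order does not affect the result."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(sources):
    """Map each content fingerprint to the sources holding it, keeping
    only fingerprints that appear in more than one source."""
    seen = defaultdict(set)
    for name, records in sources.items():
        for record in records:
            seen[fingerprint(record)].add(name)
    return {fp: names for fp, names in seen.items() if len(names) > 1}


# Hypothetical sample data: the same customer record lives in two systems.
sources = {
    "crm": [{"id": 1, "email": "ana@example.com"}],
    "marketing_export": [
        {"email": "ana@example.com", "id": 1},
        {"id": 2, "email": "bo@example.com"},
    ],
}
duplicates = find_duplicates(sources)
for names in duplicates.values():
    print("Duplicated across:", sorted(names))
```

Exact hashing only catches byte-for-byte equivalent content after key sorting. Near-duplicates, such as the same customer with different capitalization, would need normalization first.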

Six fundamentals of data discovery

Organizations can accelerate data discovery by adopting modular capabilities across an integrated platform approach for gathering and managing metadata-driven intelligence. A platform approach allows organizations to scale efficiently and adopt solutions based on organizational maturity.

Here are six fundamental requirements of a complete data discovery approach to consider on your road to data maturity:

1. Data collection with complete scope to discover the unknowns

Your data discovery program is only as good as your enterprise-wide coverage. How well can you search across data sources and scan for information, such as dark data, to catalog data types?

Organizations that have tens of thousands of data sources cannot inventory this business-critical data manually. There simply isn’t enough time, and accuracy will be in question. You need to use AI and ML to automate data source scanning. Advanced scanners can help you create a complete metadata repository, where trusted data can be curated.

2. Data preparation to enable high quality in the data discovered

Data preparation is the process of transforming raw data into actionable information that offers data intelligence. It processes raw data from the multitude of sources covered during data collection and sanitizes it for further handling and analysis. Organizations with mature extract, transform, and load (ETL) data management tools are better positioned to fuel data discovery and analysis, and then to operationalize further data use.

3. Metadata inventory to capture the details of discovered data

A data inventory lists all the data assets available to a data catalog. This includes the data location, details about each data repository and its data types, and similar metadata (data about the data) from your data collection. Like data collection, this is near impossible to perform manually and requires AI and ML to accelerate building a metadata inventory.
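As a rough illustration of what a metadata inventory holds, the Python sketch below scans a few sampled rows per table and records only metadata: the source, the location, the column names with a coarse type label, and a row count. The `InventoryEntry` fields, the `build_inventory` helper, and the sample tables are assumptions made for this example, not the schema of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One data asset in the inventory (field names are illustrative)."""
    source: str                 # e.g. a database or file share name
    location: str               # table or path within the source
    column_types: dict = field(default_factory=dict)
    row_count: int = 0


def infer_type(values):
    """Return a coarse type label for a column from its sampled values."""
    kinds = {type(v).__name__ for v in values if v is not None}
    if not kinds:
        return "unknown"
    return kinds.pop() if len(kinds) == 1 else "mixed"


def build_inventory(source, tables):
    """Record metadata about each table, not the data itself."""
    entries = []
    for location, rows in tables.items():
        columns = sorted({key for row in rows for key in row})
        column_types = {
            col: infer_type([row.get(col) for row in rows]) for col in columns
        }
        entries.append(InventoryEntry(source, location, column_types, len(rows)))
    return entries


# Hypothetical sampled rows from two tables in one source.
sample_tables = {
    "customers": [
        {"id": 1, "email": "ana@example.com"},
        {"id": 2, "email": None},
    ],
    "orders": [{"order_id": 10, "total": 19.99}],
}
inventory = build_inventory("crm_db", sample_tables)
for entry in inventory:
    print(entry.location, entry.column_types, entry.row_count)
```

At enterprise scale the same idea is applied by automated scanners across thousands of sources, which is where the AI and ML assistance mentioned above comes in.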

4. Data exploration and analysis to turn discovered data into data intelligence

For data collection to result in increased data intelligence, data sets need to offer meaningful insights that inform context-of-use policies for data governance programs. Data visualization tools can help drive improved data handling. Data exploration and assessment using analysis tools can help shape the important questions that need asking and resolving.

5. Data lineage for transparency into data movement and sharing

It is critical to apply AI and ML not only to efficiently determine where data is located, but also to understand where data is moving, as this directly affects governing appropriate use or taking action to mitigate improper exposure. Data lineage supports data visualization with insights into where data is shared. With the recent disruption of supply chains and a workplace pivot toward remote workforces, data proliferation has only increased, and the resulting risk exposure creates new obstacles to data trust.
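One common way to picture data lineage is as a directed graph, where each edge means "data flows from this asset to that one." The Python sketch below, with hypothetical asset names and helper functions, walks such a graph to list everything downstream of a given source. That is the kind of question a lineage view answers when assessing where sensitive data may have spread.

```python
from collections import defaultdict, deque


def build_lineage(edges):
    """Build an adjacency map from (upstream, downstream) pairs."""
    graph = defaultdict(set)
    for upstream, downstream_asset in edges:
        graph[upstream].add(downstream_asset)
    return graph


def downstream(graph, start):
    """Return every asset reachable from `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical data flows between systems.
edges = [
    ("crm.customers", "lake.customers_raw"),
    ("lake.customers_raw", "warehouse.customer_360"),
    ("erp.orders", "warehouse.customer_360"),
    ("warehouse.customer_360", "bi.churn_dashboard"),
]
lineage = build_lineage(edges)
impacted = downstream(lineage, "crm.customers")
print(sorted(impacted))
```

Reversing the edges gives the opposite view: for a given report or dashboard, which upstream sources feed it.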

6. Classification and tagging to organize data domains from discovered data

To begin to derive insights, data types require classification into data domains. Data domains may consist of personally identifiable information (PII), electronic protected health information (ePHI), financial data (e.g., payment cards), or similar classes. Classification and tagging are driven by rules for matching data that enable it to be dispositioned. As an example, today’s privacy regulations such as the EU’s GDPR or California’s CCPA, along with similar regional residency rules governing data proliferation, make tagging important to identify personal information along with location attributes to apply data protection for regulatory compliance.
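A simple way to see how rule-driven classification might work is a set of pattern rules applied to sampled column values. The Python sketch below is a minimal illustration under stated assumptions: the tag names, the regular expressions, and the 80% match threshold are all hypothetical choices. Payment card candidates get an extra Luhn checksum test to cut false positives. Production classifiers typically combine many more signals, such as column names, dictionaries, and ML models.

```python
import re

# Illustrative rules only; real rule sets are far broader.
RULES = {
    "PII.email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PII.us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "financial.payment_card": re.compile(r"(?:\d[ -]?){13,19}"),
}


def luhn_valid(number):
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def matches(tag, value):
    """Apply one rule to one value, with a checksum step for cards."""
    if not RULES[tag].fullmatch(value.strip()):
        return False
    if tag == "financial.payment_card":
        return luhn_valid(value)
    return True


def classify_column(values, threshold=0.8):
    """Tag a column when enough of its non-empty values match a rule."""
    values = [v for v in values if v]
    tags = set()
    for tag in RULES:
        hits = sum(matches(tag, v) for v in values)
        if values and hits / len(values) >= threshold:
            tags.add(tag)
    return tags


email_tags = classify_column(["ana@example.com", "bo@example.org"])
card_tags = classify_column(["4111 1111 1111 1111"])  # well-known test number
print(email_tags, card_tags)
```

Tags like these can then be combined with location attributes from the metadata inventory to decide which protection and residency policies apply.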

Customer success with data discovery use cases

Here are a few examples of how data discovery helped enable greater insights into business-critical information to accelerate data intelligence and unleash value across enterprises.

Railinc modernizes its data infrastructure

Railinc Corporation, based in North Carolina, provides rail data and messaging services to the North American freight railway industry as a for-profit subsidiary of the Association of American Railroads. As Railinc modernizes its data infrastructure, critical data has become distributed across Oracle, data lakes, and tools such as SAS.

However, business users were challenged to find and understand the data they need. Railinc embarked on a major effort to address this problem by cataloging all data assets and adding business context to the data to enable easy search and discovery.

Confidence in the data is improved through visibility into how data sets are created, along with business descriptions and tags that add business context, resulting in a focused and phased approach to drive business user adoption. Railinc learned valuable lessons and best practices related to deployment, scaling, and driving increased business user adoption on its mission to better serve its customers with data insights.

Biogen achieves a unified view of data across the enterprise

Biogen is an American multinational biotechnology company specializing in treatments for neurological diseases for patients worldwide. By establishing an enterprise data catalog to provide a unified view of data across the enterprise, it is driving the use of self-service analytics, reducing the time required to find and understand relevant data, and helping ensure data is used appropriately.

With increased data transparency through data visualization and lineage mapping, data discovery helps Biogen plan its cloud data lake modernization, with end-to-end lineage visualization supporting the development of data pipelines for the cloud data lake. Moreover, along with business glossary integration and data curation, data discovery supports enterprise data governance programs for an enterprise-wide approach to effectively manage data value creation agendas while minimizing data risk exposure.

Eli Lilly & Co. creates trusted data foundation to drive decision-making

As a global pharmaceutical leader, Eli Lilly is uniting care with data discovery to create medicine and improve lives around the world. Utilizing Informatica’s data governance and privacy platform, along with Amazon Web Services, Salesforce, and SAP to drive its Cloud First and Customer-Centric strategies, Eli Lilly delivers consistent and trusted data across its organization to fuel intelligent business decisions across 120 countries.

Not only is it increasing revenue with greater agility, but it’s also enabling high-performance data discovery that allows the company to process over 10,000 daily transactions in under a minute!

On its journey, Eli Lilly created an enterprise data governance program to discover and collect silos of healthcare knowledge and better understand its context within the business; improve analyst and engineering productivity by making it easier to locate and understand the data they need; and monitor compliance with data protection regulations such as the GDPR and the CCPA.

For privacy compliance, it can respond more quickly to data subject access requests for data privacy and protection requirements with an understanding of personal data types, locations, and identities.

Data discovery: Take your first step to new data insights

Data discovery opens the door to new insights, helping transform raw data collection into data intelligence that fuels value creation and data risk management in the modern enterprise. To learn more about Informatica’s comprehensive data governance and privacy solutions, download the Extract Value from Your Data with AI-Powered Data Discovery ebook and see our Meet the Experts webinar on data discovery for data privacy.

Accelerate your journey today! 

First Published: Sep 30, 2021