What is a Data Product?

What do you think of when you hear the term “product”? Typically, a product is something we purchase for a specific purpose. We buy groceries so we can make meals, plates and silverware so we can eat those meals, and sponges, dish soap, and towels so we can clean the dishes afterwards.

Similarly, data is often considered within the context of the insights it can provide. What does this market data say about the viability of our stocks? What does our consumer data say about how likely customers are to purchase certain items? How can this demographic information help us predict the spread of a new virus?

The mechanisms and applications that are built to answer these questions are known as “data products.” In this article, we’ll define the concept of a data product, explain how data products are used for specific purposes, and explore some of the most common types of contemporary data product applications.

What is a Data Product?

Your team likely has regular goals you need to meet, such as key performance indicators or strategic business objectives. It’s likely that you also have data that you keep in cloud-based or on-premises storage platforms. How can you take this data and apply it towards meeting these goals?

This is where data products play an important role. A data product is a self-contained application, tool, or platform that is created to process data for a specific purpose or purposes. These are practical tools that teams build to apply their data for specific organizational needs. They handle some analytical legwork, providing users with fast, actionable insights so they can bridge the gap between data resources and business objectives.

Given their similar names, data products can sometimes be confused with the concept of “data as a product.” While data products are self-contained applications or systems, data as a product is a lens through which to treat data resources. One of the core tenets of the data mesh, data as a product calls for data to be viewed, designed, and manipulated from the consumer’s point of view. This can involve creating data products, platforms, and other tools that make the user experience as seamless and effective as possible.

Characteristics of Data Products

In order to drive business objectives, data products should possess the following characteristics:

Accessibility

Data products should not exist in a vacuum. They need to be discoverable by and accessible to the users who are meant to benefit from them. This means that the creation and implementation of each data product should be well documented, and the product itself should be registered in a centralized location for ease of access and maintenance. Making the data product accessible also helps ensure that it will be reusable to drive future goals that require similar capabilities.

Observability

Data products are not a “set and forget” means of using data. While they are geared towards streamlining processes, they are still subject to changes made to the underlying data. If data sources change, those changes could disrupt automated processes built into a data product, leading to breakdowns in reporting or inaccurate results. You should ensure that your data products are reliably monitored and can adapt to these changes.
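One way to picture this kind of monitoring is a check that compares incoming records against the structure the data product was built to expect. The following is a minimal sketch in Python, not a production monitor; the schema, column names, and the detect_schema_drift helper are all hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Hypothetical schema this data product was built against: column name -> type name.
EXPECTED_SCHEMA: Dict[str, str] = {
    "order_id": "int",
    "amount": "float",
    "region": "str",
}


def detect_schema_drift(expected: Dict[str, str], record: dict) -> List[str]:
    """Return human-readable descriptions of how a record deviates from the expected schema."""
    issues = []
    for column, type_name in expected.items():
        if column not in record:
            issues.append(f"missing column: {column}")
        elif type(record[column]).__name__ != type_name:
            actual = type(record[column]).__name__
            issues.append(f"type change in {column}: expected {type_name}, got {actual}")
    for column in record:
        if column not in expected:
            issues.append(f"unexpected column: {column}")
    return issues


# Illustrative case: the upstream source renamed "region" to "territory"
# and started sending amounts as text.
changed_record = {"order_id": 1001, "amount": "19.99", "territory": "EMEA"}
drift = detect_schema_drift(EXPECTED_SCHEMA, changed_record)
for issue in drift:
    print(issue)
```

In practice, a non-empty result would typically raise an alert or pause the affected report rather than just print a message.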

Security

Data security standards extend beyond data sets and also apply to data products. The fact that an application or tool is granted access to certain types of data does not mean that everyone who uses the product should be granted the same level of access. Comprehensive data access controls and governance must be applied across data products in order to adhere to privacy regulations and keep data resources secure.
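To illustrate the difference between what a product can read and what a given user should see, here is a small Python sketch of role-based column filtering. It assumes a simple in-memory policy; the roles, column names, and filter_for_role function are hypothetical, and real deployments would usually enforce this in a dedicated access control layer.

```python
from typing import Dict, List, Set

# Hypothetical policy: which columns each role may see in this data product's output.
ROLE_COLUMNS: Dict[str, Set[str]] = {
    "analyst": {"patient_id", "age_band", "diagnosis_code"},
    "billing": {"patient_id", "invoice_total"},
}


def filter_for_role(rows: List[dict], role: str) -> List[dict]:
    """Return copies of the rows containing only the columns the role is allowed to see."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles see no columns
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


# The product itself holds the full record, but each user gets a restricted view.
rows = [
    {"patient_id": 7, "age_band": "40-49", "diagnosis_code": "E11", "invoice_total": 240.0},
]
billing_view = filter_for_role(rows, "billing")
unknown_view = filter_for_role(rows, "intern")
print(billing_view)
print(unknown_view)
```

Defaulting unknown roles to an empty set of columns is a deliberate choice in this sketch: access is denied unless a policy explicitly grants it.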

Quality

Data products are useless if they produce low-quality or inaccurate analytics. This is especially relevant when these analytics are driving business or operational decisions. To avoid this, it’s essential to enforce quality control over the data that is fed into the products. With data monitoring and detection capabilities, teams can ensure that the data fed into and extracted from data products is high quality and usable for its intended purpose(s), and they can maintain a holistic view of which users are accessing data and how.
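As a rough illustration of quality control on incoming data, the Python sketch below runs each record through a set of named rules and separates the records that pass from those that fail. The rules, field names, and split_by_quality function are assumptions made up for this example, not a prescribed rule set.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rules for a sales feed; each rule returns True when a record passes.
RULES: Dict[str, Callable[[dict], bool]] = {
    "amount is present": lambda r: r.get("amount") is not None,
    "amount is non-negative": lambda r: r.get("amount") is None or r["amount"] >= 0,
    "region is known": lambda r: r.get("region") in {"AMER", "EMEA", "APAC"},
}


def split_by_quality(records: List[dict]) -> Tuple[List[dict], List[Tuple[dict, List[str]]]]:
    """Separate records that pass every rule from those that fail, keeping the failure reasons."""
    passed, rejected = [], []
    for record in records:
        failures = [name for name, rule in RULES.items() if not rule(record)]
        if failures:
            rejected.append((record, failures))
        else:
            passed.append(record)
    return passed, rejected


# Made-up records: one clean, one with a negative amount, one incomplete.
records = [
    {"amount": 120.0, "region": "EMEA"},
    {"amount": -5.0, "region": "EMEA"},
    {"amount": None, "region": "LATAM"},
]
passed, rejected = split_by_quality(records)
print(f"{len(passed)} passed, {len(rejected)} rejected")
for record, failures in rejected:
    print(record, "->", failures)
```

Keeping the failure reasons alongside each rejected record makes it easier to trace a quality problem back to its source instead of silently dropping data.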

Common Types of Data Products

What are the most common ways data products are used in practice?

Data Visualization

Data visualization tools take data sets and present them in a manner that is easy to comprehend. This can involve charts, graphs, maps, and other visual representations that are better suited for “at-a-glance” understanding by a wide range of users. Popular visualization tools include Google Looker Studio, Tableau, Sisense, and IBM Watson.

The purpose of these tools is to speed up time-to-insights for data users. A chart depicting patient data is much easier for a healthcare worker to view and take immediate action on than a spreadsheet that requires them to comb through hundreds of rows.

Predictive Models

Where visualizations provide up-to-the-moment snapshots of current data, predictive modeling uses these resources to look towards the future. Predictive models are data products that analyze both current and historical data in order to formulate statistical models and generate data-driven predictions. These models can then be monitored and adjusted based on any newer, more relevant data that is generated.

An example of predictive modeling would be an investment firm creating a data product that analyzes past and present trends to provide predictive insights about the stock market. This model would give investors the opportunity to make more data-based decisions about the ways they invest their money.
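At its simplest, the idea of learning from historical data to project forward can be sketched as a straight-line trend fit. The Python example below uses ordinary least squares on made-up monthly prices; the numbers and the fit_line helper are purely illustrative, and a real investment model would be far more sophisticated than a linear trend.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly closing prices, for illustration only.
months = [1, 2, 3, 4, 5]
prices = [100.0, 102.0, 104.0, 106.0, 108.0]

# Fit the historical trend, then extend it one month into the future.
slope, intercept = fit_line(months, prices)
forecast_month_6 = slope * 6 + intercept
print(f"trend: {slope:.2f} per month, month 6 forecast: {forecast_month_6:.2f}")
```

As newer data arrives, the same fit can be rerun on the updated history, which mirrors the monitor-and-adjust loop described above.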

Search Engines

Most of us interact with search engines like Google and Bing on a daily basis. The purpose of these engines is to interpret a user’s query and return the most relevant information sourced from resources across the entire internet. Being able to cull the best information from this immense pool of content is not an easy task. That’s why search engines create specific data products geared towards carrying out this quality control process.

These products use machine learning and natural language processing to examine trends in both queries and popular content, surfacing the most current, relevant, and appropriate results for search engine users. This helps ensure that the content served by the engine is not stale, outdated, or simply incorrect, without requiring anyone to manually scan and interpret countless web pages.

Artificial Intelligence Chatbots

The rising popularity of artificial intelligence is spurring the generation and mass adoption of generative AI tools, large language models (LLMs), and AI chatbots. These range from OpenAI’s juggernaut ChatGPT, to fast-developing tools like Google Bard and Bing Chat, and even self-built models organizations are creating for their own use.

Each of these models is built and trained on massive amounts of diverse data, with the intent of making its query responses as accurate and reliable as possible. In this way, they are very much data products that aim to reduce search complexity and provide relevant and legitimate results to user questions. Where these chatbots have room for improvement is in their adherence to the “quality” tenet of data products. Not every query posed to these chatbots returns a reliable and truthful answer, so their output cannot be fully trusted for business-driving purposes.

Making Use of Data Products

Data products can serve an array of purposes and have a notable impact on how modern organizations meet their goals. The previous section listed some of the most commonly implemented data products, but it is by no means exhaustive. As long as your product is created to serve a specific purpose and is accessible, observable, secure, and capable of producing high-quality output, it can have a positive impact on your organization’s workstreams, business objectives, and overall bottom line.

“The data products are probably the most tangible part of the data mesh principles, something you really can quantify, you can put value on it,” said Paul Rankin, Head of Data Management Platforms at Roche. “How much does it cost to build and maintain? How much value is it deriving? How many users are using it? You can really put those metrics to a data product.”

To learn how data product creation and application can serve distributed data architectures, check out the Data Security for Data Mesh Architectures eBook.