Tips for handling unwanted data to improve data analysis

Garbage in, garbage out: The idea is true for many things in life, including data. In a corporate environment, unwanted data can cause both minor problems and serious risks.

Junk data, also called unwanted data, is data that is incomplete, incorrect or otherwise harmful, and it can come from many sources. It may be generated by bad code, for example, or arrive in a format that an organization’s systems cannot use.

“Often the most prolific source of unwanted data is one that doesn’t accurately capture what it intends to do,” said Paul Mander, senior vice president of technical solutions and services at mParticle, a customer data platform provider.

According to a study by Experian, 83% of companies see data as essential to developing a business strategy, yet they also suspect that more than a quarter of their contract and lead data is incorrect. In 2016, IBM estimated that bad data cost the US economy around $3.1 trillion a year, and it seems plausible that this number has only increased in the five years since.

As organizations increasingly rely on data analytics to guide business decisions, data quality and reliability are paramount.

“For most organizations, unwanted data is a bigger problem than they realize,” Mander said.

Mander noted that unwanted data is usually rooted in issues with data accuracy or application requirements. “Things look okay on the surface, but when someone digs into it, problems in the data are revealed,” he said.

Filter unwanted data early and often

A cohesive data strategy helps organizations routinely identify and address unwanted data before it becomes a problem. A data strategy can aid in quality analysis of existing datasets, of course, but more powerfully it can prevent unwanted data from moving forward.

“Because junk data can have many root causes, the first step is to understand what is making your data junk,” Mander said. “Only then can a business begin to clean up unwanted data and put processes in place that can prevent the data from becoming unwanted in the future.”
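As a rough illustration of filtering early while also learning what makes the data junk, the sketch below rejects records at the point of intake and tallies the reason for each rejection. It is a minimal sketch, not a method described by Mander or mParticle: the field names (`email`, `signup_date`), the rules and the sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical required fields; a real list would come from the business teams.
REQUIRED_FIELDS = ("email", "signup_date")


def junk_reasons(record):
    """Return the reasons a record counts as junk (empty list if clean)."""
    reasons = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            reasons.append(f"missing {field}")
    email = record.get("email") or ""
    # Deliberately crude format check, for illustration only.
    if email and "@" not in email:
        reasons.append("malformed email")
    return reasons


def filter_records(records):
    """Split records into clean ones and a tally of rejection reasons."""
    clean, rejected = [], Counter()
    for record in records:
        reasons = junk_reasons(record)
        if reasons:
            rejected.update(reasons)
        else:
            clean.append(record)
    return clean, rejected


# Made-up sample data.
sample = [
    {"email": "ana@example.com", "signup_date": "2021-03-01"},
    {"email": "not-an-email", "signup_date": "2021-03-02"},
    {"email": "", "signup_date": "2021-03-03"},
    {"email": "li@example.com", "signup_date": None},
]

clean, rejected = filter_records(sample)
print(f"{len(clean)} clean record(s)")
for reason, count in rejected.most_common():
    print(f"{reason}: {count}")
```

Keeping the tally of reasons, rather than silently dropping bad records, is what points a team toward the root cause, such as a form that allows empty fields or an upstream system that emits malformed values.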

All hands on deck

Even in organizations with dedicated data professionals and teams, data management is a shared responsibility. Mander recommends that organizations create a cross-functional team to develop their unwanted data management strategy.

Businesses also need to become aware of what good data is and how it supports business goals, both for the company as a whole and within specific departments. By collaborating with teams from marketing, customer service, analytics, and other departments, IT professionals can ensure that the right data is collected in the right places, using the right technology.

This approach can increase the productivity of data professionals by reducing the time they spend on “data processing”, i.e. collecting, organizing and cleaning data in order to make it available for analysis and other uses.

Joint efforts around data management are essential for organizations that have invested in digital transformation initiatives, Mander noted.

“Unwanted data can often block digital transformation efforts,” Mander said. He added that data users may not trust data if they cannot understand it or what it is used for.

Garbage or treasure?

It is important that an organization’s staff understand the value of data. It is also essential that everyone can differentiate valuable data from unwanted data and designate data accordingly.

Whether or not a particular data set is valuable may vary from department to department, so it may be necessary for organizations to develop different approaches specific to departments or use cases.
