Data governance

Why better data governance is key to better AI

Artificial intelligence is everywhere. If a business isn’t using AI, they’re either pretending to use it or pretending they’re about to start any day now. Whatever problem your business is facing, it seems like a solution powered by business intelligence, machine learning, or some other form of AI is available. Yet beneath the hype, the truth is that many businesses can actually benefit from this technology – if they take the time to learn what it can (and can’t) do for them and understand the potential pitfalls.

Essentially, AI allows its users to do useful things with a large pool of data – for example, extract insights without hogging data scientists’ time. Data is therefore fundamental to AI. There is a direct relationship between the quality (and quantity) of what is fed into a machine learning application and the accuracy of its output.

Data governance has traditionally been viewed in terms of compliance with regulations that dictate how data should be collected, stored, and processed. But AI has introduced new challenges and risks to manage. It is not enough to obtain a large amount of data; you should also consider its characteristics. Where does it come from? What does it actually represent? Is there anything you need to consider before introducing it into your algorithm? Will it train the algorithm to do the right things?
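Those questions become routine if the answers are recorded alongside each dataset and checked before the data reaches a model. The Python sketch below shows one way to do that. It assumes a hypothetical `DatasetRecord` catalogue entry; the field names and the `ready_for_training` check are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry describing where a dataset came from."""
    name: str
    source: str        # where the data came from
    represents: str    # what the data actually describes
    lawful_basis: str  # e.g. consent or contract; empty if unknown
    intended_uses: set[str] = field(default_factory=set)


def ready_for_training(record: DatasetRecord, purpose: str) -> list[str]:
    """Return the open governance questions; an empty list means no blockers found."""
    problems = []
    if not record.source:
        problems.append("unknown source")
    if not record.represents:
        problems.append("unclear what the data represents")
    if not record.lawful_basis:
        problems.append("no recorded lawful basis for processing")
    if purpose not in record.intended_uses:
        problems.append(f"not approved for purpose: {purpose}")
    return problems


# Illustrative usage with invented values
logs = DatasetRecord(
    name="web_logs_2023",
    source="CDN access logs",
    represents="page requests, including bots",
    lawful_basis="",
    intended_uses={"traffic analytics"},
)
print(ready_for_training(logs, "churn model"))
# ['no recorded lawful basis for processing', 'not approved for purpose: churn model']
```

Keeping the check as a plain function means it can run in the same place the data is loaded, so a dataset with open questions is caught before training rather than after.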

“We can use AI to identify unusual patterns of behavior in a company…or we can see a company change the way it makes money through real-time contracts,” says Franki Hackett, head of audit and ethics at data analytics company Engine B. “To do that, you obviously need a clear idea of what’s relevant, as well as high-quality governance processes on your data. Otherwise, you either find there are too many ‘risky’ things to consider, or the AI is pointing you in the wrong direction.”

No more input required

One way to approach data governance is to use what is known as an observability pipeline. This ensures that every process is visible: data is collected, then unified and cleansed to create a more consumable final dataset.

An example would be feeding raw website logs into an analytics platform. The pipeline acts as a buffer between the original data and its point of consumption: raw data enters, is processed, and is then sent to wherever it needs to be consumed. Because the underlying data is not affected, the consumption method can easily be changed; in other words, you can change how the data is presented without changing the collection process.
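The buffering idea can be sketched in a few lines of Python. This assumes the raw input is web server access logs in something close to Common Log Format, and the stage names (`parse`, `page_views`, `unique_visitors`, `run_pipeline`) are hypothetical. The point is that consumers can be swapped or added without touching the collection and cleansing step.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Iterator

# Simplified pattern for a Common Log Format line (an assumption about the raw input)
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Collect and cleanse: turn raw log lines into structured events, dropping junk."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            event = match.groupdict()
            event["status"] = int(event["status"])
            yield event


def page_views(events: Iterable[dict]) -> Counter:
    """One possible consumer: count successful views per path."""
    return Counter(e["path"] for e in events if e["status"] == 200)


def unique_visitors(events: Iterable[dict]) -> int:
    """Another consumer, added without changing parse()."""
    return len({e["ip"] for e in events})


def run_pipeline(lines: Iterable[str], sink: Callable[[Iterable[dict]], object]):
    """Raw data in, processed events out to whichever consumer is plugged in."""
    return sink(parse(lines))


raw = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:13:56:09 +0000] "GET /missing HTTP/1.1" 404 0',
    'garbage line that the pipeline drops',
]
print(run_pipeline(raw, page_views))       # Counter({'/pricing': 2})
print(run_pipeline(raw, unique_visitors))  # 2
```

Changing how the data is presented means writing a new sink function; the parsing stage, and therefore the collected data, stays exactly as it was.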

AI can both benefit from and be part of this process. The pipeline itself can feed an algorithm, and machine learning can also be used within it to detect anomalous data (based on past trends) before it travels any further. This can save users the time and effort they would otherwise spend checking and cleaning the data and, once it has been processed, investigating any irregularities. It can also ensure that business-critical algorithms don’t receive data that would lead them to draw the wrong conclusions and, potentially, negate any benefit gained from introducing AI to the process in the first place.
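A full machine learning model is beyond a short example, so the sketch below stands in with a simple statistical check based on past trends: it flags an incoming batch whose size sits more than three standard deviations from recent history and quarantines it instead of passing it on. The threshold, the use of batch size as the signal, and the function names are all assumptions; a production system might use a learned model here instead.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag a value whose z-score against past observations exceeds the threshold."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # any change from a perfectly flat history is unusual
    return abs(value - mu) / sigma > threshold


def gate(history: list[float], batch_size: float) -> str:
    """Hold back suspicious batches instead of forwarding them to the model."""
    return "quarantine" if is_anomalous(history, batch_size) else "forward"


# Hypothetical daily row counts arriving through the pipeline
daily_rows = [10_120, 9_870, 10_340, 10_010, 9_950, 10_200]
print(gate(daily_rows, 10_080))  # forward
print(gate(daily_rows, 2_300))   # quarantine
```

Quarantining rather than deleting keeps the suspicious batch available for the investigation step the paragraph describes.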

Observability has many benefits for data streams that don’t involve AI, but the sheer volume of hardware involved and the complexity of machine learning processes mean that knowing what happens to the data being processed is critical. Verifying that the number of visitors in your web analytics matches what your logs are telling you is trivial compared to understanding the output of a complex algorithm that is trained and tuned over time.
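For contrast, the simple visitor-count check amounts to comparing two numbers. This minimal sketch assumes a hypothetical 2% relative tolerance, on the assumption that an analytics tool and server logs will rarely count visits identically; `reconcile` is an illustrative name.

```python
def reconcile(analytics_count: int, log_count: int, tolerance: float = 0.02) -> bool:
    """Return True if the analytics figure agrees with the log figure within a relative tolerance."""
    if log_count == 0:
        return analytics_count == 0
    return abs(analytics_count - log_count) / log_count <= tolerance


print(reconcile(9_870, 10_000))  # True: a 1.3% gap is within tolerance
print(reconcile(8_500, 10_000))  # False: a 15% gap needs investigating
```

There is no equivalent one-line check for whether a model trained and tuned over time is still producing sensible output, which is why the rest of the pipeline needs to be observable too.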


A system that started out providing the information you were looking for can drift further and further away from generating anything useful. The better your view of what happens to the data, the easier it is to prevent that outcome.
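One common way to notice drift early is to compare the distribution of a model input today with the distribution it was trained on. The sketch below computes a population stability index (PSI) for a single numeric feature. The bin count and the small floor used for empty bins are assumptions, and the often-quoted reading that a PSI above about 0.25 signals a significant shift is a rule of thumb rather than a standard.

```python
import math


def population_stability_index(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Compare two samples of one feature; larger values mean a larger distribution shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            # Values outside the baseline range are clamped into the edge bins
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # A small floor avoids log(0) when a bin is empty
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data for illustration only
baseline = [i % 100 for i in range(1000)]        # what the model was trained on
same = [(i * 7) % 100 for i in range(1000)]      # same distribution, different order
shifted = [x + 30 for x in baseline]             # every value has moved up by 30

print(population_stability_index(baseline, same))            # 0.0
print(population_stability_index(baseline, shifted) > 0.25)  # True
```

Running a check like this on each batch turns "the model seems less useful lately" into a number that can be tracked and alerted on.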

The risks in this regard may be more serious than, for example, the potential overstatement of a set of projected sales figures. Dr. Leslie Kanthan, co-founder and CEO of AI company TurinTech, gives an example where the stakes are much higher: “If AI is applied to a hospital’s magnetic resonance imaging scanners and misdiagnoses a serious illness such as pulmonary fibrosis as bronchitis, causing the patient to take the wrong medication and experience unwanted side effects, who should be held legally responsible?”

He continues: “Similarly, if a dataset and model are considered too precise in their score, it could lead to over-representation, which would make things go horribly wrong. For example, an AI model used to predict future criminal behavior could over-fit the data and be falsely biased against ethnic minorities.”
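Bias of this kind is hard to see in a single overall score, which is why governance checks often break results down by group. The sketch below compares false positive rates across groups and reports the ratio between the highest and the lowest. The `(group, predicted, actual)` tuple format is a hypothetical input shape, and false positive rate is only one of several fairness measures a team might choose.

```python
from collections import defaultdict


def false_positive_rates(records) -> dict:
    """records: iterable of (group, predicted_positive, actually_positive) tuples."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:  # only true negatives can become false positives
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


def disparity(rates: dict) -> float:
    """Ratio of the highest to the lowest group false positive rate."""
    lowest = min(rates.values())
    return float("inf") if lowest == 0 else max(rates.values()) / lowest


# Synthetic example: 10 true negatives per group, wrongly flagged at different rates
records = [("A", i < 1, False) for i in range(10)] + [("B", i < 4, False) for i in range(10)]
rates = false_positive_rates(records)
print(rates)             # {'A': 0.1, 'B': 0.4}
print(disparity(rates))  # 4.0
```

A ratio well above 1 does not prove the model is unfair on its own, but it is the kind of signal that should trigger a review before the model is used.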

Data governance is key to ensuring that AI produces useful results. It incorporates an understanding not only of ethical and legal issues, but also of the implications they have for which material should be collected and for its potential limitations.

The organizations that will benefit the most from AI will be those that take the time to create a framework to ensure they are targeting the right data, collecting enough of it, checking and cleaning it to a high standard, and then using it appropriately and ethically.
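The "check and clean" part of such a framework can start with simple, explicit rules. This sketch assumes rows arrive as dictionaries and that the team has agreed which columns are required and what ranges are plausible; the function names, column names, and limits in the usage example are invented for illustration.

```python
def quality_report(rows: list[dict], required: list[str], ranges: dict) -> dict:
    """Summarise completeness and validity problems before the data is used."""
    missing = {col: sum(1 for r in rows if r.get(col) is None) for col in required}
    out_of_range = {
        col: sum(1 for r in rows if r.get(col) is not None and not lo <= r[col] <= hi)
        for col, (lo, hi) in ranges.items()
    }
    return {"rows": len(rows), "missing": missing, "out_of_range": out_of_range}


def clean(rows: list[dict], required: list[str], ranges: dict) -> list[dict]:
    """Keep only rows that are complete and within the agreed ranges."""
    return [
        r for r in rows
        if all(r.get(c) is not None for c in required)
        and all(lo <= r[c] <= hi for c, (lo, hi) in ranges.items() if r.get(c) is not None)
    ]


# Hypothetical records: one valid, one incomplete, one with an implausible age
patients = [{"age": 54, "bmi": 27.1}, {"age": None, "bmi": 31.0}, {"age": 212, "bmi": 22.4}]
required = ["age", "bmi"]
ranges = {"age": (0, 120), "bmi": (10, 80)}

print(quality_report(patients, required, ranges))
# {'rows': 3, 'missing': {'age': 1, 'bmi': 0}, 'out_of_range': {'age': 1, 'bmi': 0}}
print(clean(patients, required, ranges))
# [{'age': 54, 'bmi': 27.1}]
```

Producing the report as well as the cleaned rows matters for governance: it records how much data was rejected and why, instead of silently discarding it.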

With the right data governance in place, these companies can maximize the benefits and minimize the risks of using AI to deliver insights that will streamline their processes, inform their decision-making, and create powerful new products and services. There’s more than hype behind what AI can do for your business – as long as you lay the right foundation for it.