One of the oldest four-letter acronyms in the technology world is GIGO – garbage in, garbage out. It explains quite pithily that even the best and most expensive technology can’t produce good results from bad data. That is just as true for AI models, say Liam Cotter and Niall Duggan of KPMG.
Nothing has changed, yet everything has changed as a result of the advent of generative AI and large language models. Their ability to surface unstructured data and use it to generate insights is enormously powerful and dangerous at the same time.
The problem lies in the quality of the unstructured data, and indeed much of the structured data, to which these models may have access. In many cases, the best that can be said for it is that it is dubious.
In the main, this situation has arisen because organisations simply haven’t been using this data to any great extent up until now. Documents and records that have remained untouched for many years could be verified individually when they were accessed for a particular purpose.
Refining data for best results
This is no longer the case. If organisations want to take advantage of the potential benefits offered by GenAI then they must give it access to their full treasure trove of data, or as much of it as legally permitted. But if data really is the new oil, much of it is in need of refining to unlock its value.
Unfortunately, too many organisations are turning AI loose on the data they have now without first addressing the quality and governance issues associated with it. This is something we at KPMG have been flagging for some time. AI and data analytics both need good, trusted, consistent, and well-curated data to work properly and deliver value. Such data can be quite rare.
Organisations tend to have very fragmented enterprise data environments. Data can be stored on premises, in the cloud, or externally with third parties, and it can be both structured and unstructured. Typically, there are lots of silos and duplication, with the result that separate parts of the same organisation interpret the same data differently.
Finding the best storage solution
It’s a complex problem to solve. First of all, there is the sheer volume of data, much of it historical in nature, held by organisations. Then there is the way the data is passed around: it is often stored in multiple locations, amended and altered in different ways in different places, and subject to misinterpretation, which again results in the same data existing in more than one version.
This is not a new problem, and it has already been addressed for business intelligence systems. The standard solution has been the creation of data warehouses or farms which attempt to offer a single source of truth for the entire organisation. But with the enormous volume of data required for AI to deliver on its promise, the cost of maintaining and resourcing a single data warehouse or farm would quickly become prohibitive. Furthermore, the effectiveness of storing data in a single location is now questionable.
As a result, we are now seeing a move towards data mesh infrastructure. This sees data stored in multiple interconnected, decentralised domains which are all equally accessible. They are organised by business function, so the people most familiar with the data and best qualified to assess and assure its quality are in control of it. This helps to ensure the consistency and good governance of the data. That in turn is the key foundation required for the adoption of AI and GenAI in organisations.
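To make the idea of domain ownership a little more concrete, here is a minimal sketch, in Python, of what a domain-owned “data product” definition might look like. The class, field names and example values are hypothetical assumptions used purely for illustration; they are not a prescribed format or a specific product.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: names and fields are hypothetical.
# In a data mesh, each business domain publishes its data as a
# "data product" that it owns, curates and answers for.
@dataclass
class DataProduct:
    name: str                 # e.g. "customer-billing"
    owning_domain: str        # the business function accountable for it
    owner_contact: str        # who answers questions about the data
    schema: dict              # column names and types agreed with consumers
    quality_checks: list = field(default_factory=list)  # rules the owner enforces

# The billing team, not a central IT function, defines its own product.
billing_product = DataProduct(
    name="customer-billing",
    owning_domain="Billing",
    owner_contact="billing-data-owners@example.com",
    schema={"meter_id": "string", "period": "date", "kwh_billed": "float"},
    quality_checks=["meter_id is never null", "kwh_billed >= 0"],
)
```

The point of the sketch is that the people closest to the data declare its schema and quality rules themselves, which is what keeps ownership and accountability with those best placed to understand it.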
The need for team collaboration
This structure brings other advantages. It allows different parts of the business to collaborate on data, for example. In the warehouse model, all data was in the hands of the IT department, and that has serious limitations. IT professionals may be experts on secure data storage, but they are typically not familiar with the nature of the data itself and can’t be expected to vouch either for its quality or for the accuracy of any interpretation put on it.
On the other hand, when different parts of the business are responsible for the management and curation of their own data, they can make more use of it and work together to create new uses.
Of course, the people in those departments or functions are not data experts and that needs to be addressed. One way of doing so is through a hub and spoke model where the different data holders can avail of services from a centralised resource. At KPMG we have already assisted a number of clients with the establishment of data competency centres which deliver specialist data expertise to the mesh thereby ensuring consistency in approach to data storage and handling.
While organisations do need to adopt AI, and must do so fairly quickly, there is a need for a measured approach to how it is done and how the data foundations are put in place. It is not a question of boiling the ocean. The mesh does not have to be created all at once, nor should it be. It needs to be developed and tested function by function. This allows organisations to fail fast and go back and start again if necessary.
AI can be deployed while the mesh is under development and can be given access to the data in each domain as it becomes available.
Verifying data for best results
This approach also enables organisations to combine different technologies if they wish to. In many cases they will have bespoke legacy technology investments which would be very costly to discard. Taking a step-by-step approach to mesh construction can allow them to retain those elements of their existing infrastructure that are still fit for purpose.
Development of a data mesh is not simply a technology exercise, however. It is also a data cleansing and quality assurance process. All data in the mesh should be verified for its quality and consistency.
That is vitally important for organisations where the lineage of data can be doubtful. For example, an energy utility’s meter data sits in multiple areas of the business, including the billing function and the asset function. That data needs to be brought together into one coherent view, which requires disparate systems to be joined up and a common taxonomy to be shared when describing the data. This will enable AI systems to learn from the data in a consistent and more reliable way.
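As a rough illustration of what joining systems under a common taxonomy can involve, the sketch below merges hypothetical billing and asset extracts on a shared meter identifier and flags records that appear in only one system. All table and column names are invented for the example.

```python
import pandas as pd

# Hypothetical extracts: in practice these would come from the billing
# and asset systems; the column names here are invented for illustration.
billing = pd.DataFrame({
    "mtr_no": ["M001", "M002"],          # billing system's name for the meter
    "kwh_billed": [320.5, 410.0],
})
assets = pd.DataFrame({
    "device_id": ["M001", "M003"],       # asset register's name for the same meter
    "install_date": ["2019-04-01", "2021-09-15"],
})

# Map each system's local field names onto one shared taxonomy ("meter_id")
# so the joined view describes a meter consistently for downstream AI use.
taxonomy = {"mtr_no": "meter_id", "device_id": "meter_id"}
billing = billing.rename(columns=taxonomy)
assets = assets.rename(columns=taxonomy)

meters = billing.merge(assets, on="meter_id", how="outer", indicator=True)

# A basic consistency check: flag meters known to only one source system.
unmatched = meters[meters["_merge"] != "both"]
print(meters.drop(columns="_merge"))
print(f"{len(unmatched)} meter(s) missing from one of the source systems")
```

Agreeing the mapping up front is the important step; once every system describes a meter in the same terms, the join itself is largely mechanical.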
This cleansing and verification exercise offers significant benefits in relation to compliance with new reporting standards such as the Corporate Sustainability Reporting Directive (CSRD). Having readily accessible, quality-assured data will make the reporting process much less onerous.
Putting in place the correct governance and controls
Once the quality and accuracy issues have been addressed, the correct governance and controls must be put in place in relation to privacy, data protection and security. Organisations must ensure their data is not utilised inappropriately by AI systems, and this requires constant monitoring of data management and governance.
Other key aspects to be addressed are the culture of the organisation and the skills of its workforce. Organisations need to become data-centric with their people adopting a data mindset if they are to take full advantage of the value of their data. They must also look at the skills within the workforce and ensure that everyone has at least basic data skills and that the organisation is not dependent on the IT function to get business insights from its data.
KPMG is in a unique position to help organisations understand the value GenAI can bring to their business, support them in addressing the underlying risks, and provide them with the modern data platform needed to speed the journey to becoming a data-centric enterprise.
Get in touch
At KPMG we understand the pressure business leaders are under to get it right on tech and AI.
To find out more about how KPMG perspectives and fresh thinking can help your business please contact Liam Cotter of our AI team. We’d be delighted to hear from you.