Everyone recognises the value of data today, but adopting a data-driven approach to decision-making means gathering more data from more sources than ever before. As the volume of data that organisations collect explodes, it presents challenges for data management processes, which must deal with disparate data silos, ensure governance rules are complied with, and deliver high-quality data in near real time.
These issues can be addressed using the DataOps methodology, which increases the productivity of an organisation’s data engineers, resulting in faster, more accurate analytics and the competitive advantage everyone wants. As the name suggests, it borrows established practices from DevOps, which has reshaped software engineering by shortening development cycles and increasing collaboration with business stakeholders to create higher-quality software.
What is DataOps?
There is no universal definition of DataOps; however, there are several core concepts: the use of the Agile methodology to support the creation of data pipelines, automation of the build cycle using a DevOps toolkit, and the adoption of Statistical Process Control techniques to monitor active pipelines.
Agile takes a collaborative approach to development. Small cross-functional teams are created, with participants including data technology specialists and business users. Development takes place in short ‘sprint’ cycles, delivering small pieces of working functionality at rapid intervals. The inclusion of business stakeholders in the team allows the developers to receive frequent feedback and be more responsive and adaptive to changing requirements.
To support short Agile development cycles, DataOps uses automation techniques pioneered by DevOps for integration, deployment, and testing, providing continuous delivery of new features and fixes. The goal of bringing DevOps methods to data engineering is to move away from a world where experts create bespoke data pipelines for each use case to one of highly automated assembly lines delivering mass-produced pipelines.
By using code to create pipelines and infrastructure, you can spin up new environments as required, creating identical copies of specific set-ups. Our experience at KPMG is that achieving consistency across environments has significant benefits: repeatable test results across environments, standardised maintenance processes, and a reduction in troubleshooting effort.
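As a rough illustration of defining environments in code, the sketch below renders two environments from a single shared definition. The names PipelineSpec and render_environment, and all of the settings, are hypothetical and chosen only for this example; a real project would more likely use a dedicated infrastructure-as-code tool.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical declarative description of a pipeline and its infrastructure."""
    source: str
    target: str
    schedule: str
    workers: int


def render_environment(spec: PipelineSpec, env_name: str) -> dict:
    """Build the settings for one environment from the shared spec."""
    settings = asdict(spec)
    settings["environment"] = env_name
    # In this sketch only the target name is environment-specific.
    settings["target"] = f"{spec.target}_{env_name}"
    return settings


# Assumed example values, not taken from any real system.
spec = PipelineSpec(source="sales_db", target="warehouse", schedule="hourly", workers=4)
test_env = render_environment(spec, "test")
prod_env = render_environment(spec, "prod")

# Everything other than the environment-specific keys is identical.
shared_keys = ("source", "schedule", "workers")
environments_match = all(test_env[key] == prod_env[key] for key in shared_keys)
```

Because both environments are generated from the same spec, a test result in one should be reproducible in the other, which is the consistency benefit described above.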
DataOps also looks to improve the stability of operational data pipelines and the quality of data by adopting Statistical Process Control (SPC) techniques from Lean manufacturing, which was pioneered by the automotive industry, where products must be made quickly and efficiently while maintaining high quality standards.
The SPC monitoring approach collects metrics relating to data quality and performance at each stage of the pipeline to check that each metric stays within its normal operating range. If a metric moves outside that range, the data team can be automatically alerted.
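A minimal sketch of this idea, assuming a single metric (the row count of each pipeline run) and the common convention of placing control limits three standard deviations either side of the mean. The figures, run names and function names are invented for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) limits around the mean of a baseline sample."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(value, limits):
    """True when a new observation falls outside the control limits."""
    lower, upper = limits
    return value < lower or value > upper


# Hypothetical row counts from recent, healthy pipeline runs.
baseline_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
limits = control_limits(baseline_row_counts)

# New runs to check; in practice an alert would be sent to the data team.
alerts = []
for run_id, row_count in [("run-8", 1003), ("run-9", 400)]:
    if out_of_control(row_count, limits):
        alerts.append(f"ALERT {run_id}: row count {row_count} is outside the normal range")

for message in alerts:
    print(message)
```

The same check can be applied to any metric collected at a pipeline stage, such as null rates or run duration, with the baseline refreshed as more runs complete.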
Data Governance is a critical element of an organisation’s data management, and it should reflect the iterative nature that DataOps brings to the delivery of data. By using the metrics that SPC establishes at each stage of the DataOps workflow, it’s possible to apply the principle of continuous improvement to Governance. Every cycle of the DataOps process provides the opportunity for fresh observations of the data and to build those insights into the governance system.
To align with the philosophy of DataOps, there should be a drive to replace manual compliance checks with automatic procedures that become part of release and data pipelines and that enforce, measure, and report on governance standards.
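One way such an automatic procedure might look is sketched below: each governance rule is a small function, the pass rate per rule is measured and reported, and the pipeline step fails if a rule is not fully met. The rule names, record fields and threshold are assumptions made for this example, not a prescribed standard.

```python
def run_governance_checks(records, rules):
    """Measure, per rule, the fraction of records that pass."""
    report = {}
    for name, predicate in rules.items():
        passed = sum(1 for record in records if predicate(record))
        report[name] = passed / len(records) if records else 1.0
    return report


def enforce(report, minimum=1.0):
    """Fail the pipeline step if any rule falls below the required pass rate."""
    failures = {name: rate for name, rate in report.items() if rate < minimum}
    if failures:
        raise ValueError(f"Governance checks failed: {failures}")


# Hypothetical rules: every record needs an ID, and emails must be masked.
rules = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "email_masked": lambda r: "@" not in (r.get("email") or ""),
}

# Invented sample batch; the second record breaks the masking rule.
records = [
    {"customer_id": 1, "email": "***"},
    {"customer_id": 2, "email": "jane@example.com"},
]

report = run_governance_checks(records, rules)
try:
    enforce(report)
except ValueError as error:
    print(error)
```

Storing the report from every run gives a history of pass rates that can feed back into the governance standards over time.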
To successfully build a DataOps capability, an organisation must invest in the skills of its people and review the culture that surrounds its data processes.
Existing data engineering teams have typically been building and maintaining ETL processes that take data from a small number of relational databases in the enterprise and populate a corporate data warehouse. DataOps expands the role of these teams to support the ingestion of both structured and unstructured data from large numbers of data sources into repositories where it can be shared throughout the organisation. To make this transition in scale, data teams must acquire the skills and, importantly, the mindset that ‘everything is code’ so that all processes are automated.
DataOps doesn’t just require changes to how data teams operate; business stakeholders also need to participate in Agile processes, providing requirements and feedback. The responsibility for data quality and governance should be placed in the hands of the teams who work with the data, to build greater collaboration between data consumers and producers.
Starting your DataOps Journey
You probably have the foundations for DataOps already in place: skilled Agile practitioners within software development teams, experience with DevOps tools such as GitHub, Docker and Jenkins that automate code releases, and familiarity with monitoring operational systems. DataOps brings together these modern delivery techniques and applies them to unlock value from the data your organisation collects.
When selecting your first project, look to demonstrate how transformative DataOps can be. Consider candidates that will lead to a measurable financial return for the organisation, or those that grab attention and have a high profile with senior management, such as demonstrating how DataOps can underpin cutting-edge initiatives in AI or ML.
KPMG’s experience can help build your DataOps capability in different areas:
- Create a data strategy, using DataOps initiatives to deliver a capability to make data-driven decisions.
- Provide coaching on the Agile methodology to help improve collaboration between teams.
- Advise on the selection of a toolset that supports a programmatic approach to the creation of data pipelines.
- Support the creation of a unified data set by mapping schemas from multiple data sources to form a common data taxonomy for your organisation.
- Help design solutions to incorporate automatic tests in your deployment processes and data pipelines, so problems are found before they impact production.
- Establish a Data Governance Strategy, documenting practices to ensure data is secure, accurate, available and managed.
- Help you develop a security policy that delivers the control the business needs, while supporting agile exploitation of data.
If you’d like to chat about how DataOps could turn the mountains of data you’re collecting into fast, accurate answers to the questions your business is asking, get in touch.