The Difference Between DevOps and DataOps
Samuel Ward-Riggs – Managing Consultant, London
First DevOps, now DataOps (and ArchOps, DevSecOps, and WinOps!?). You would be forgiven for thinking that the tech hipsters have started a campaign of random capitalization, emulating their favourite i-products and the eXPeriments carried out at the turn of the ME-llennium. But make no mistake: DevOps is a step-change in the delivery capability of software teams, and now DataOps promises the same benefits in the world of Data and Analytics.
But before we examine the difference between DevOps and DataOps, let me set the scene…
In the beginning, there was Waterfall
At the dawn of the software cosmos were Big Bang project deliveries. Waterfall was inherited from traditional engineering disciplines, but high-rise apartments, roads, and shopping centres are repeatable, well-known quantities: software is not. Enter: Agile.
Fundamentally, Agile flips the project constraints of scope, time, and cost on their head. Whereas Waterfall produces an estimate of time and cost to complete a fixed scope, Agile fixes time and cost and produces an estimate of scope.
Figure 1: Project constraints of scope, time, and cost in Waterfall and Agile.
Agile delivers a working product to users, feature by feature, in iterative cycles. Getting working software up front means the business realises value from its investments sooner and more cheaply, since a usable product is delivered for the cost and time of only one iteration.
Agile is only a philosophy
To deliver value to customers, Agile approaches must deliver a working product. Within two-week “sprints,” however, the technical effort of releasing into production and the communication overhead of multiple, disparate teams (commonly Engineering, Testing, and Operations) make development progress challenging and may even prevent finished work from being released. This breaks the very promise of Agile, but that’s where DevOps comes in.
DevOps (a contraction of Development and Operations) aims to significantly reduce the time between when development is done and when the business realises the benefits. As well as a culture of collaboration within the project team, DevOps teams focus on achieving Continuous Integration and Continuous Delivery.
Figure 2: The DevOps lifecycle.
Continuous Integration is the development practice of frequently integrating new code into the shared repository, avoiding the “integration hell” that occurs when many developers try to merge their changes at the end of a sprint.
After code is integrated, Continuous Delivery aims to build, test, and release software, with one-click deployments from Development into higher environments. Unit and Integration testing is automated, and a workflow alerts Acceptance Testers to review the new features before release into Production.
Figure 3: The Continuous Delivery process flow.
To achieve Continuous Delivery, DevOps Engineers must put everything the project produces, from code to tests to deployment instructions, into source control. By reducing everything to instructions that can be executed, the major software houses (e.g. the FAANG companies: Facebook, Amazon, Apple, Netflix, and Google) have managed to push releases to customers as often as every minute.
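To make the idea of “everything as an executable instruction” a little more concrete, here is a minimal sketch of the Continuous Delivery flow from Figure 3 written as plain Python. The `deliver` function, its stage names, and the log messages are hypothetical, not the API of any CI/CD product; in practice teams express the same gates in their pipeline tool of choice, wired to real builds, test suites, and approval workflows.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical pipeline: each check is a zero-argument callable returning True on pass.
Check = Callable[[], bool]


def deliver(
    version: str,
    unit_tests: list[Check],
    integration_tests: list[Check],
    approve: Callable[[str], bool],
) -> list[str]:
    """Build, test, and promote one version, stopping at the first failed gate."""
    log = [f"built {version}"]  # stand-in for compiling/packaging from source control

    # Automated gates: every check in a stage must pass before moving on.
    for stage, checks in (("unit", unit_tests), ("integration", integration_tests)):
        if not all(check() for check in checks):
            log.append(f"{stage} tests failed; {version} not promoted")
            return log
        log.append(f"{stage} tests passed")

    # Manual gate: a workflow notifies Acceptance Testers, who approve or reject.
    log.append("acceptance testers notified")
    if not approve(version):
        log.append(f"{version} rejected at acceptance")
        return log

    log.append(f"{version} released to Production")
    return log


if __name__ == "__main__":
    passing = [lambda: True]
    print(deliver("1.4.0", passing, passing, approve=lambda v: True)[-1])
    # prints "1.4.0 released to Production"
```

Because the whole flow is code, it can itself live in source control and be reviewed, versioned, and re-run like any other part of the product.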
Given how well defined and supported DevOps is, why define DataOps at all? Can’t Data and Analytics projects just use DevOps?
The Data and Analytics pipeline
Data and Analytics initiatives more closely resemble systems integration and business analysis than they do a typical software project. The first major difference is the creation of a Data and Analytics pipeline, which copies the business’s operational data, transforms it according to business rules, and populates a Data Store from which analysts can understand how the business is performing.
Figure 4: The Data and Analytics pipeline and Data Store.
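As a rough illustration of those three steps, the sketch below copies a handful of operational records, applies business rules, and loads the result into a Data Store. It uses Python’s built-in `sqlite3` as a stand-in Data Store; the table, the records, and the rules (only paid orders count as revenue, region names are normalised, pence are converted to pounds) are invented for the example.

```python
from __future__ import annotations

import sqlite3


def extract() -> list[dict]:
    # Hypothetical operational records copied from a source system.
    return [
        {"order_id": 1, "region": "north", "amount_pence": 1250, "status": "paid"},
        {"order_id": 2, "region": "NORTH", "amount_pence": 800, "status": "cancelled"},
        {"order_id": 3, "region": "South", "amount_pence": 4300, "status": "paid"},
    ]


def transform(rows: list[dict]) -> list[tuple]:
    # Illustrative business rules: only paid orders count as revenue,
    # region names are normalised to lower case, pence become pounds.
    return [
        (r["order_id"], r["region"].lower(), r["amount_pence"] / 100)
        for r in rows
        if r["status"] == "paid"
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    # Populate the Data Store; re-running the load replaces rows by key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> None:
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    # Analysts query the Data Store rather than the operational system.
    print(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall())
    # prints [('north', 12.5), ('south', 43.0)]
```

Keeping each step a separate function mirrors the pipeline in Figure 4 and lets each business rule be tested on its own.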
The Stateful Data Store
Continuous Delivery in software engineering is made simpler because most software products are stateless: if a new deployment fails, we simply revert to the previous version. A Data Store, however, is stateful, and a new application version may include schema changes. Reverting to a previous application version would therefore also require reverting to a previous schema version (as well as removing any data held in new objects or fields).
To gain Continuous Delivery capability in Data and Analytics projects, the team must be able to roll back changes to the Data Store if necessary. This also means that each team member must have their own copy of the Data Store; otherwise, changes to a shared schema during development would cause local copies of the code to fail where fields have been added or renamed, or where temporary test data violates business rules elsewhere.
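One common way to make Data Store changes reversible is versioned schema migrations, each with an “up” step and a matching “down” step, recorded in a version table. The sketch below assumes that approach; the migrations, table names, and the `migrate` and `tables` helpers are hypothetical, and separate SQLite in-memory databases stand in for each team member’s private copy of the Data Store.

```python
from __future__ import annotations

import sqlite3

# Hypothetical, ordered migrations: (version, up statement, matching down statement).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customer"),
    (2, "CREATE TABLE loyalty (customer_id INTEGER, points INTEGER)",
        "DROP TABLE loyalty"),
]


def current_version(conn: sqlite3.Connection) -> int:
    # A version table records which migrations this copy of the Data Store has applied.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    (version,) = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return version or 0


def migrate(conn: sqlite3.Connection, target: int) -> None:
    """Move the schema up or down to `target`, one migration at a time."""
    version = current_version(conn)
    for v, up, _ in MIGRATIONS:  # roll forward, oldest first
        if version < v <= target:
            conn.execute(up)
            conn.execute("INSERT INTO schema_version VALUES (?)", (v,))
    for v, _, down in reversed(MIGRATIONS):  # roll back, newest first
        if target < v <= version:
            conn.execute(down)  # removes the object and any data in it
            conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
    conn.commit()


def tables(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Each developer works against an isolated copy (here, separate in-memory databases).
    alice, bob = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    migrate(alice, 2)
    migrate(bob, 1)
    alice.execute("INSERT INTO loyalty VALUES (1, 50)")
    migrate(alice, 1)  # roll back: the loyalty table and its data are gone
    print(tables(alice), tables(bob))
```

Rolling back migration 2 drops the `loyalty` table together with its data, which is exactly the extra cost of reverting a stateful store. Reversing added or renamed columns needs more involved down steps than this table-level example shows.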
Each day, as the business sells more widgets and services more customers, the Data Store is updated, and what was a valid business rule yesterday may produce errors today. A business expert can look at reports, dashboards, or predictive models and know instinctively when something is significantly off the mark. To maintain end-user confidence, the Data and Analytics platform must strive to do the same. To this end, DataOps borrows Process Control, a key concept from Lean Manufacturing.
Rather than hard business rules that limit business transactions entering the Data Store, Process Control uses statistical algorithms to alert the team to data anomalies without interrupting business as usual.
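A common form of statistical Process Control is a Shewhart-style control chart: compute the mean and standard deviation of a metric over recent history and flag any new value outside mean ± 3 standard deviations. The sketch below applies that to a daily count of loaded orders; the metric, the three-sigma threshold, and the function names are assumptions for illustration. In keeping with the point above, the check only returns an alert and never rejects the data.

```python
from __future__ import annotations

from statistics import mean, stdev


def control_limits(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Lower and upper control limits at mean ± k sample standard deviations."""
    centre, spread = mean(history), stdev(history)
    return centre - k * spread, centre + k * spread


def check(history: list[float], today: float, k: float = 3.0) -> str | None:
    """Return an alert message if today's value is out of control, else None.

    The check only reports; it never blocks the load, so business as usual continues.
    """
    lower, upper = control_limits(history, k)
    if lower <= today <= upper:
        return None
    return f"alert: {today} outside control limits [{lower:.1f}, {upper:.1f}]"


if __name__ == "__main__":
    # Hypothetical daily counts of orders loaded into the Data Store.
    daily_orders = [1020, 980, 1005, 995, 1010, 990, 1000]
    print(check(daily_orders, 1012))  # within limits: prints None
    print(check(daily_orders, 610))   # far below the lower limit: prints an alert
```

Because the limits are recomputed from history, they drift with the business rather than encoding a fixed rule that may be wrong tomorrow.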
The difference between DevOps and DataOps
DataOps aims to improve the quality and reduce the cycle time of Data and Analytics initiatives. The difference between DataOps and DevOps is in the unique nature of developing with data and delivering data to users.
Having seen some of the ways that Data and Analytics differs from software engineering (the Data and Analytics pipeline, the stateful Data Store, and Process Control), we can also show the different ways in which DevOps and DataOps deliver value and assure quality (Figure 5).
Figure 5: The difference between DevOps and DataOps.
The modern Data and Analytics team
Data and Analytics has long lagged software engineering in delivery rigour, but the world of traditional Business Intelligence has been shaken up by new technologies and a fresh injection of data professionals with varied backgrounds.
To continue to deliver business value, teams must embrace practices that raise their maturity level over the long term. The term DataOps is important because it draws a line in the sand, declaring that Data and Analytics projects can make big improvements in capability and generate more business value in the same way that DevOps and software engineering have.