The Future of Data Engineering
Neil Sparrow, Managing Consultant, Altis London
Neil recently attended a fascinating meetup on how the way we manage data could change. At the event, hosted by the DataNinjas community and Blis group, he heard speakers from the web advertising, data hosting and data science sectors.
These are his key takeaways:
Data as Product
We should treat data as a final product: not a demo or a temporary file, but a real product with attributes such as shelf life, much like any product in a supermarket. Take tins of pilchards as an example. They carry clear labelling that tells us exactly what we are consuming, who produced it, an indication of its quality and its constituent parts (carbs, fat, sugar and so on).
Why would we use this analogy with our data? Metadata is the equivalent of the label on a tin. Get this right and the end consumer trusts what we are supplying and understands its uses, benefits and limitations.
Perhaps some form of industry metadata standard is required?
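To make the label analogy concrete, here is a minimal sketch in Python of what a "tin label" for a dataset might look like. It is only an illustration: the class name DataLabel, its fields and the example values are all assumptions made for this sketch and were not presented at the meetup, nor do they follow any existing metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict


@dataclass(frozen=True)
class DataLabel:
    """Hypothetical 'tin label' for a dataset; every field name is illustrative."""
    name: str                 # what we are consuming
    producer: str             # who created it
    produced_on: date         # when it was produced
    shelf_life_days: int      # how long it stays trustworthy
    quality_score: float      # assumed scale: 0.0 (poor) to 1.0 (excellent)
    columns: Dict[str, str] = field(default_factory=dict)  # constituent parts

    def best_before(self) -> date:
        """Last date on which the data should still be considered fresh."""
        return self.produced_on + timedelta(days=self.shelf_life_days)

    def is_fresh(self, today: date) -> bool:
        """True if 'today' is on or before the best-before date."""
        return today <= self.best_before()


# Example label for an imaginary dataset.
label = DataLabel(
    name="daily_sales",
    producer="sales-etl team",
    produced_on=date(2024, 1, 1),
    shelf_life_days=7,
    quality_score=0.98,
    columns={"store_id": "integer", "revenue": "decimal, GBP"},
)
```

A consumer could then call label.is_fresh(date.today()) before using the data, in the same way a shopper checks the date on a tin.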
Data Science Meets Data Warehousing
Big data technologies and techniques are now truly embedded in the modern world of BI and are fast changing how we deliver our projects. Data lakes are now common, which leads us to treat our BI databases as just another consumer of the source lake data, like any other application, rather than as the final destination for all data and the ultimate source for other systems.
This frees us from a number of technical constraints, meaning we can build data ecosystems that deliver not only an Enterprise Class Data Warehouse but also a data lake capable of supporting numerous other applications and services. More bang for the buck.
Skills & Technologies
The landscape for data tooling has exploded in the last few years and the choice can seem daunting. With the merging of Data Science/Big Data tools into the BI arena, making the right tool decisions is more complicated than ever.
Advice from the group focused on picking a tool set and “sticking to it”, with a standard set of tools approved for use within an organisation. The alternative is a mishmash of technologies chosen by personal preference, which becomes unscalable and ultimately unsupportable.
As always, the focus was on people rather than technology. How do we enable innovation and free creative thinking while still delivering business value?
An insightful evening with great visuals. An unlabelled can of pilchards is a great metaphor for data that lacks the key element needed to make it useful: metadata.