Big Data Navigational Aids
Altis’ CEO, John Hoffman, shares some insights into how to successfully navigate the Big Data landscape.
I am sure many of you flinched when you first saw this graphic of the Big Data landscape when Matt Turck published it in April. For those of you who missed it, have a look and weep. We are being challenged to leverage Big Data and Analytics solutions to deliver tangible business outcomes, and every single vendor below has a pretty PowerPoint describing how their solution is the silver bullet. The proliferation of solutions has become so vast that it is difficult to know where to start, especially if you add the variable of not wanting to invest in a product/platform that may not get its next round of funding. On the flip side, there are some absolute winners that could deliver a competitive advantage for your organisation.
Source: Matt Turck, Firing on All Cylinders: The 2017 Big Data Landscape
In working with our clients to traverse this landscape, we have been proposing some navigational aids to increase the chances of success for big data projects. They are:
- Take an iterative/agile approach to reduce costs and increase the capability to trial solutions.
- Seriously consider open source or commercial versions of open source.
- Look to the big vendors: AWS, IBM, Microsoft, Oracle, SAP and Salesforce are investing as much as or more than the start-ups in R&D, and have the financial capacity to acquire functionality.
Let me share a few of our projects that demonstrate these navigational aids.
Recently we were engaged to develop the capability to identify blockages in water pipes using photos and images. During the sales process, we looked at a variety of solutions and had originally planned to utilise TensorFlow (Google's machine learning framework, which has now been released to the open source community). Now that we are about to kick off the initial sprint, we have changed direction and are looking to utilise Microsoft’s Custom Vision Service, which is part of their Azure AI suite. I can pretty much guarantee that we will be using a different suite of tools and logic, or at a minimum a combination of tools, to deliver the final capability. Applying our navigational aids to this project:
- Iteration: When we contracted the project, the client and Altis agreed upon the general outcome, time frames and budget. We did not attempt to define the detail of the scope or technology. It is going to be fast and iterative and some of the benefits of this approach will include discovering what tools/architectures are not a good fit.
- Open Source: There is an amazing amount of functionality available from the open source community. For example, our original architecture used TensorFlow. What is interesting is how much is actually produced by commercial organisations and shared with the open source community. TensorFlow (Google’s Machine Learning/AI framework) and Kylo (Teradata’s data lake management software) are great examples of this.
- Big Vendors: When we first pitched the project, we looked at Microsoft’s Azure services, as the client is a strong Microsoft user, but they didn’t have a robust solution for the image analysis piece. At the time, we looked at piping the results back into Microsoft’s Cortana suite, as that was Microsoft’s architecture direction. Now, as we are about to start the project, we are looking to utilise Microsoft’s newly released Custom Vision Service. We tested it in less than 30 minutes, as it is a service with an exposed REST API, and the results look promising. This is a classic example of the engineering capacity of the big vendors in a cloud environment releasing products at a staggering pace.
In another example, I am going to revisit a project that I mentioned in our last newsletter. We are now in production with a reporting and analytics solution for one of the largest Point of Sale data sets in Australia. Here is how the navigational aids were used at this client:
- Iteration: The project took 2 months from kick-off to production with 4 iterations encompassing 3 different architectures.
- Open Source: The solution leveraged a variety of open source or commercial open source tools. Specifically, Spark was used extensively. In addition, Altis utilised AWS EMR, which is the AWS commercial version of Hadoop.
- Big Vendors: The team took advantage of the significant investment that AWS is putting into its platform, including improvements in Redshift (columnar database), EMR (commercial Hadoop as a service) and Lambda (serverless architecture), to accelerate the delivery of the project.
My last example is a client with over 400 locations Australia-wide. I have included them because they are not yet using all the new technology and services that are out there; they first need to go through internal change to be prepared to do so. This is very common with established firms, and I wanted to highlight that you don’t need to fear that you have fallen, or are falling, behind. This company, like many, wants to position itself to embrace the big data landscape, but first needs to change the way it delivers work. Specifically, it needs to migrate from a waterfall/high documentation process to an agile approach. Along the way, we are helping them create a common language between IT and the Business, including basic things like agreeing on what “done” means and creating continuous communication between all the involved parties. We have just led their first four-week sprint, and there are a lot of lessons learnt that will be incorporated into future sprints, with the end game being to enable the business to react faster in using big data technologies to deliver tangible business outcomes.
- Iteration: That is the big thing here. Once the business and IT are working collaboratively and delivering outcomes faster, they will be able to trial and utilise big data technologies to benefit the organisation.
- Open Source: Currently this is not a significant component of the program of work, but may become so in the future.
- Big Vendors: This company uses technology from SAP and Microsoft but is not leveraging the latest features and services as they are released. Built into the upcoming sprints is the capacity to trial the large volume of functionality being released by their core software providers.
In summary, the Big Data landscape is vast and getting more complicated every day. That said, there are a few navigational aids that you can use to traverse this complex landscape effectively. First, focus on iterations, not big bang. Second, don’t be afraid to use open source technologies, as they are often very powerful, cost-effective and backed by large commercial organisations. Lastly, take advantage of the myriad functionality and services that result from the large amounts of money the Big Vendors spend on R&D and acquisitions for their cloud stacks.