AWS re:Invent 2017 – Data and Analytics Announcements
Every year, we take a look at the avalanche of announcements made at AWS re:Invent and explore the possibilities offered by new Big Data and Analytics products and services.
Following the trend from last year, the focus was on the Data and Analytics stack, especially the Artificial Intelligence and Machine Learning spaces. But before we look at some of those enhancements, let’s get back to fundamentals with an interesting new EC2 instance.
EC2 H1 Instances
Like the D family, the H family is data-focused, but with more processing power. The typical use case is heavy data streaming and processing. AWS already offers Managed Services covering these use cases, but there may be times when full control over the underlying OS is required, or when the public endpoints used by those services are not allowed. This is where EC2 H1 comes in.
Amazon Neptune and Aurora Serverless
There are two big announcements on the DB front. The first is Amazon Neptune, a Graph Database that will be the subject of a separate blog post.
The second is Aurora Serverless. Aurora’s architecture already decoupled storage from compute: users provisioned instances based on their compute requirements, and storage scaled automatically, so there was no need to buy more compute to get additional storage. When more compute power was needed, the instance could easily be scaled up, but there was no option to scale out. This was not ideal for highly fluctuating data processing requirements.
Now, with Aurora Serverless, both parts of the architecture (compute and storage) can scale in and out independently, making the applications that use it more resilient to sudden changes in storage and processing power requirements.
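To make the "compute scales independently" idea a little more concrete, here is a minimal sketch of how a serverless cluster could be described for boto3's RDS `create_db_cluster` call. The `EngineMode` and `ScalingConfiguration` parameters are taken from the boto3 RDS API as it exists today and may not match what was shown at the announcement. The helper name, cluster identifier, credentials and capacity values are all made up for illustration. Note that only compute bounds are declared, since storage is not something you size yourself.

```python
from typing import Any, Dict


def serverless_cluster_params(
    cluster_id: str,
    min_capacity: int = 2,
    max_capacity: int = 16,
    auto_pause_after_s: int = 300,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's rds.create_db_cluster in serverless mode.

    Hypothetical helper: it only assembles a dict and never talks to AWS.
    """
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora",
        "EngineMode": "serverless",
        "MasterUsername": "admin",          # placeholder, not a recommendation
        "MasterUserPassword": "change-me",  # placeholder, never hard-code secrets
        # Compute is expressed as a capacity range rather than an instance size.
        "ScalingConfiguration": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
            "AutoPause": True,
            "SecondsUntilAutoPause": auto_pause_after_s,
        },
    }


params = serverless_cluster_params("demo-serverless-cluster")
# With boto3 installed and credentials configured, this could be submitted as:
#   import boto3
#   boto3.client("rds").create_db_cluster(**params)
print(params["ScalingConfiguration"])
```

The point of the sketch is the shape of the request: a minimum and maximum compute capacity replace the fixed instance class, and nothing in it asks for a storage size.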
Amazon SageMaker
As mentioned previously, major announcements were made in the ML area, and the biggest announcement was the introduction of SageMaker.
SageMaker offers a workflow experience when building and publishing ML models, although every step in the workflow can be used in isolation.
At a high level, the steps are data preparation, model building/training and model publication.
For data preparation, Jupyter notebooks are used. Jupyter allows exploration of data stored in S3. To help build the models, commonly used Machine Learning algorithms are embedded in the service, while custom ones can be built using Docker.
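As a rough illustration of the model building/training step, the sketch below assembles the kind of request that boto3's SageMaker `create_training_job` call expects: a Docker image holding the algorithm, training data in S3, and an S3 location for the resulting model. The field names follow the boto3 SageMaker API. The helper name, job name, account ID, image URI, role ARN and bucket paths are all hypothetical placeholders, and the instance settings are arbitrary.

```python
from typing import Any, Dict


def training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's sagemaker.create_training_job.

    Hypothetical helper: it only builds a dict and never talks to AWS.
    """
    return {
        "TrainingJobName": job_name,
        # The algorithm is packaged as a Docker image (built-in or custom).
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Training data is read from S3.
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # The trained model artifact is written back to S3.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


request = training_job_request(
    job_name="demo-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algorithm:latest",
    role_arn="arn:aws:iam::123456789012:role/DemoSageMakerRole",
    train_s3_uri="s3://demo-bucket/train/",
    output_s3_uri="s3://demo-bucket/models/",
)
# With boto3 installed and credentials configured, this could be submitted as:
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
print(request["TrainingJobName"])
```

Swapping a built-in algorithm for a custom one would, under these assumptions, only change the image URI; the rest of the request stays the same.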
With SageMaker, AWS caters for more advanced ML practitioners who are limited by the “black-box” aspect of AWS’ other ML service called … AWS ML. Let’s hope this doesn’t confuse too many people. It will be interesting to see how AWS comparatively positions the two products.
This is my take on some of the announcements made by AWS in the Data and Analytics space. There were many more announcements, especially in the Artificial Intelligence space, which will be covered in a future blog post. We are curious to see how these new AWS features are adopted and which real-life business problems they will solve.
-Guillaume Jaudouin, AWS Practice Lead