Using Machine Learning and ElasticSearch to discover unstructured Gold
Mythili Baker, Head of Innovation
Our client, a large distributor of electricity and gas, managed a wide range of responsibilities and tasks through its call centre. The call centre data repository was vast and contained a high volume of rich unstructured text. Their hypothesis was that this data contained valuable insights that they were not taking advantage of. For example, did the free-text details entered by site workers reveal any trends or common categories of repair work? And did the details of the work carried out match the original assessment of the problem and the work required?
To see what insights were contained within the call centre data repository, they approached Altis to develop a proof of concept (POC). Altis used both Machine Learning and text interrogation to see what the data could tell us.
Firstly, we used Microsoft Azure Machine Learning to test whether rows of unstructured text could be classified into categories (e.g. Tree, Pillar, Network). This was possible using a Naïve Bayes predictive model, with which we achieved 90% accuracy on a relatively simple classification.
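To give a feel for how a Naïve Bayes text classifier works, here is a minimal sketch in plain Python. It is not the Azure Machine Learning model from the POC: the class name, the tokeniser and the sample notes are all invented for illustration, and the category labels simply reuse the examples above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration only, not the POC model.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # notes per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per word.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented example notes; real call centre text would be far messier.
training_notes = [
    "tree branch fallen on power line",
    "tree touching overhead wires after storm",
    "pillar box damaged by vehicle",
    "pillar cover missing and exposed",
    "network outage affecting whole street",
    "network fault reported across suburb",
]
training_labels = ["Tree", "Tree", "Pillar", "Pillar", "Network", "Network"]

model = NaiveBayesTextClassifier().fit(training_notes, training_labels)
print(model.predict("large tree leaning on wires"))
```

With only six training notes this is a toy, but it shows why the approach suits short free-text rows: each word contributes a little evidence towards a category, and the category with the most combined evidence wins.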
Secondly, using ElasticSearch, we interrogated the text contained in the call centre repository. We tested the ability to search through the captured free text to determine attributes, and then to predict and classify text. This enabled us to quickly analyse rows for key words, to view the original and predicted classifications side by side, and to perform more sophisticated string searches. We then enhanced this analysis by adding Kibana to present these insights in visualisations, including word clouds and trends over time.
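As an illustration of the kind of search described above, the sketch below builds an ElasticSearch query body in Python. It only constructs the JSON and does not connect to a cluster. The function name and the field names (`note_text`, `original_class`, `predicted_class`) are assumptions made for this example, not the client's actual schema, and the class fields are assumed to be mapped as `keyword`.

```python
import json


def build_note_search(keywords, original_class=None, size=20):
    """Build a search body for free-text notes (hypothetical field names)."""
    query = {
        "bool": {
            # Full-text match: every keyword must appear in the note.
            "must": [
                {"match": {"note_text": {"query": keywords, "operator": "and"}}}
            ]
        }
    }
    if original_class is not None:
        # Exact filter on the classification recorded upfront.
        query["bool"]["filter"] = [{"term": {"original_class": original_class}}]
    return {
        "size": size,
        # Return both classifications so they can be viewed together.
        "_source": ["note_text", "original_class", "predicted_class"],
        "query": query,
        # Count matching notes per predicted classification.
        "aggs": {"by_predicted_class": {"terms": {"field": "predicted_class"}}},
    }


body = build_note_search("fallen tree", original_class="Network")
print(json.dumps(body, indent=2))
```

A query like this would surface notes that mention a fallen tree but were originally logged as Network, with a count per predicted classification that could feed a Kibana visualisation.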
The POC proved that value can be extracted from this data. The outputs indicated potential relationships between the causes identified upfront and the predicted classifications. They also highlighted the preparatory work on training data sets that will be required to implement the text classification model: the more sophisticated the text classification, the bigger the training data set required to train the model.
The next step in the POC will be to build a training set for the most valuable classification and take the pilot further.
If you’re interested in exploring the value of your unstructured text, connect with us today.