Unless you have spent the last 10 years in a cave, you have heard plenty of stories about AI, Machine Learning and Data Science! Many highly skilled statisticians and actuaries have changed the title on their résumé to Data Scientist in order to look better on LinkedIn and get more mileage from their trade.
Having said that, the industry’s dirty little secret is that too many Data Science initiatives don’t make it beyond the Proof of Concept wall, either because the findings they produce aren’t perceived as valuable or because the data they have to work with is low in predictive “octane”.
This article shares some of the important factors and considerations behind the most successful Data Science deployments.
1. The Business Brief
Not unlike the marketing brief for an advertisement, the business brief is an important conversation that needs to take place between the business end user and the data scientist, in order to explain what the technology can and can’t do and to set clear objectives for the project. When data science experts try to second-guess the business users instead, they often don’t get beyond the obvious and fail to enchant with their results. In other words, a “Tell me something I don’t already know!” situation becomes highly probable.
One way of improving the chances of hitting the nail on the head is to understand the dynamics of the business and to concentrate on the aspects that the users can act upon. In a marketing organization, for example, such levers can be focusing telemarketing efforts on a set of higher-potential prospects, or shifting advertising budget toward the medium that a model predicts will perform best in a given season.
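To make the telemarketing lever concrete, here is a minimal sketch in Python. It assumes a model has already produced a conversion score per prospect; the `Prospect` type, the scores and the `capacity` parameter are all hypothetical and only serve the illustration.

```python
from typing import NamedTuple


class Prospect(NamedTuple):
    name: str
    score: float  # hypothetical model output: estimated chance of conversion


def select_call_list(prospects, capacity):
    """Return the `capacity` prospects with the highest predicted scores."""
    ranked = sorted(prospects, key=lambda p: p.score, reverse=True)
    return ranked[:capacity]


# Made-up scores, for illustration only.
prospects = [
    Prospect("A", 0.12),
    Prospect("B", 0.71),
    Prospect("C", 0.35),
    Prospect("D", 0.64),
]

# The team can only place two calls today, so it calls the two best prospects.
call_list = select_call_list(prospects, capacity=2)
print([p.name for p in call_list])
```

The point of the brief is to agree on this kind of lever up front: the call-center capacity is something the business controls, so a ranking is something it can act upon.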
2. Think outside the data sources box
With an Analytics Platform-as-a-Service, it is very easy to include new sources of data and add new features to see whether they improve the models’ test scores. Publicly available structural and cyclical data, such as weather or economic indexes, can give your models a boost. Other data about targeted customers or competitors’ activities can be inferred from their social network signals (Facebook events, visual intelligence applied to Instagram, and so on).
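The check described above, adding a source and seeing whether the test score moves, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sales rows and the `weather` lookup are synthetic, and the group-mean “model” is a toy stand-in for whatever algorithm you actually use.

```python
from collections import defaultdict
from statistics import mean

# Synthetic internal data: day id, weekend flag, units sold.
sales = [
    {"day": 1, "weekend": 0, "units": 100}, {"day": 2, "weekend": 0, "units": 72},
    {"day": 3, "weekend": 1, "units": 121}, {"day": 4, "weekend": 1, "units": 88},
    {"day": 5, "weekend": 0, "units": 104}, {"day": 6, "weekend": 0, "units": 68},
    {"day": 7, "weekend": 1, "units": 119}, {"day": 8, "weekend": 1, "units": 92},
    {"day": 9, "weekend": 0, "units": 101}, {"day": 10, "weekend": 0, "units": 71},
    {"day": 11, "weekend": 1, "units": 118}, {"day": 12, "weekend": 1, "units": 91},
]
# Synthetic external source (think of a public weather feed): day id -> rainy flag.
weather = {day: (1 if day % 2 == 0 else 0) for day in range(1, 13)}


def add_weather(rows, weather):
    """Join the external source onto the internal rows by day id."""
    return [{**row, "rainy": weather[row["day"]]} for row in rows]


def fit_group_means(rows, features, target):
    """Toy model: remember the mean target for each combination of feature values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in features)].append(row[target])
    fallback = mean(row[target] for row in rows)
    return {key: mean(values) for key, values in groups.items()}, fallback


def mean_abs_error(model, rows, features, target):
    """Mean absolute error of the toy model on held-out rows."""
    table, fallback = model
    errors = [
        abs(table.get(tuple(row[f] for f in features), fallback) - row[target])
        for row in rows
    ]
    return mean(errors)


rows = add_weather(sales, weather)
train, test = rows[:8], rows[8:]  # simple chronological split

scores = {}
for features in (["weekend"], ["weekend", "rainy"]):
    model = fit_group_means(train, features, "units")
    scores[tuple(features)] = mean_abs_error(model, test, features, "units")
    print(features, "test error:", scores[tuple(features)])
```

On this made-up data the weather flag lowers the test error; on real data the same comparison tells you whether a new source is worth keeping.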
3. Feature engineering
While any decently trained data scientist will agree that Feature Engineering is an important (if not the most important) step in the process, it is easy to succumb to the temptation of throwing all the raw data you have at the wall and lazily hoping that something will stick! For example, in a series of dates, spikes in perfume sales are explained only if you capture the fact that February 14th is Valentine’s Day and December 25th is Christmas. If you want to predict an opponent’s choice in a game of rock-paper-scissors, the machine will give you better assistance if it is fed not only the series of previous choices but also a “who won” feature.
Platforms like ForePaaS give data scientists the power of both SQL and Python to model and query potentially complex features within a data set.
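Both examples can be sketched in plain Python. This is only an illustration under stated assumptions: the `GIFT_DAYS` calendar, the seven-day run-up window and all function names are hypothetical, and a real project would use a fuller holiday calendar.

```python
from datetime import date

# Hypothetical, deliberately tiny calendar of gift-giving dates: (month, day).
GIFT_DAYS = {(2, 14): "valentines_day", (12, 25): "christmas"}


def days_until(d, month, day):
    """Days from `d` to the next occurrence of the given month and day."""
    target = date(d.year, month, day)
    if target < d:
        target = date(d.year + 1, month, day)
    return (target - d).days


def date_features(d, window=7):
    """Derive features that a model cannot easily infer from a raw date."""
    days_to_next = min(days_until(d, month, day) for (month, day) in GIFT_DAYS)
    return {
        "day_of_week": d.weekday(),
        "is_gift_day": (d.month, d.day) in GIFT_DAYS,
        "days_to_next_gift_day": days_to_next,
        "in_gift_run_up": days_to_next <= window,
    }


# Rock-paper-scissors: each move beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def outcome(opponent, me):
    """Result of one round, seen from the opponent's side."""
    if opponent == me:
        return "draw"
    return "win" if BEATS[opponent] == me else "loss"


def rps_features(opponent_moves, my_moves):
    """One training row per round: previous move, who won, and the move that followed."""
    return [
        {"prev_move": o, "prev_outcome": outcome(o, m), "next_move": nxt}
        for o, m, nxt in zip(opponent_moves, my_moves, opponent_moves[1:])
    ]


print(date_features(date(2024, 2, 10)))
print(rps_features(["rock", "paper", "paper"], ["scissors", "scissors", "rock"]))
```

The raw date and the raw move history are still available to the model; the derived columns simply spell out what a human analyst already knows matters.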
4. Train, re-Train and Score
An important part of putting predictions into production is the ability to make them part of the daily IT routine and to monitor their performance. Yet most IT departments are reluctant to take on the management of such processes, because they don’t feel comfortable answering business users’ questions if something goes wrong. A model management feature that takes care of the models’ life cycles helps here: it lets you configure periodic training and scoring routines. Alerts are also useful, so that knowledgeable experts are notified when the predictive power of a model falls below a critical threshold.
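As a rough sketch of such a routine, the Python below scores a model on recent labelled data, emits an alert when accuracy drops below a threshold, and retrains in that case. It is not any platform’s API: the function names, the toy threshold trainer and the 0.8 cut-off are assumptions for illustration, and scheduling (for example a nightly job) is left to whatever scheduler you already run.

```python
import logging

logger = logging.getLogger("model_monitor")  # hypothetical logger name


def accuracy(predictions, labels):
    """Share of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def score_and_check(predict, features, labels, threshold):
    """Score recent labelled data and report whether an alert is needed."""
    metric = accuracy([predict(x) for x in features], labels)
    alert = metric < threshold
    if alert:
        logger.warning("Accuracy %.2f is below threshold %.2f", metric, threshold)
    return metric, alert


def maintenance_cycle(predict, train, features, labels, threshold):
    """One scheduled run: score, alert if needed, retrain when performance degraded."""
    metric, alert = score_and_check(predict, features, labels, threshold)
    if alert:
        predict = train(features, labels)
    return predict, metric, alert


def train_threshold_model(features, labels):
    """Toy trainer: pick the cut-off value that best separates the labels."""
    best_cut = max(
        sorted(set(features)),
        key=lambda cut: accuracy([x >= cut for x in features], labels),
    )
    return lambda x: x >= best_cut


def stale_model(x):
    """A model trained long ago, when the cut-off was higher."""
    return x >= 8


# Made-up recent data in which the true cut-off has drifted down to 4.
recent_x = [1, 2, 3, 4, 5, 6, 7, 8]
recent_y = [False, False, False, True, True, True, True, True]

model, metric, alert = maintenance_cycle(
    stale_model, train_threshold_model, recent_x, recent_y, threshold=0.8
)
print("accuracy before retraining:", metric, "alert:", alert)
print("accuracy after retraining:", accuracy([model(x) for x in recent_x], recent_y))
```

In practice you would score the retrained model on data it has not seen before trusting it; the sketch reuses the same sample only to stay short.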
5. Prediction and Data Visualization combined
While it is easy to implement an API that serves prediction results for consumption by other systems (e.g. credit scoring), a form of collaboration between the users and the models can take place if some of the mechanics are exposed. For example, a graph showing the features in order of importance can be confirmed or challenged by a business expert. Similarly, user-generated predictions can be fed to the models as features, nudging both sides toward those surprisingly different results that may lead one of them to rethink. For example, feedback on human restocking decisions can allow supply chain managers to focus on the ones they “didn’t get right” and learn new tricks for next time!
Scaling data science projects into production quickly and successfully is still a struggle for many companies, even though more and more of them are exploring what AI makes possible. That’s why solutions that accelerate AI operationalization exist and help to optimize the whole process. By providing an all-in-one environment and abstracting complexity, ForePaaS supports companies planning to deploy AI into production by putting in place a consistent pipeline and best practices.
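One way to produce such a feature-importance view is permutation importance: shuffle one feature at a time and measure how much the model’s error grows. The article does not prescribe a technique, so treat this Python sketch as one possible choice; the toy model, the feature names and the data are made up for illustration.

```python
import random
from statistics import mean


def mean_abs_error(predict, rows, targets):
    """Mean absolute error of `predict` over the rows."""
    return mean([abs(predict(r) - t) for r, t in zip(rows, targets)])


def permutation_importance(predict, rows, targets, features, repeats=30, seed=0):
    """Average increase in error when a single feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    importances = {}
    for feature in features:
        increases = []
        for _ in range(repeats):
            column = [r[feature] for r in rows]
            rng.shuffle(column)
            shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
            increases.append(mean_abs_error(predict, shuffled, targets) - baseline)
        importances[feature] = mean(increases)
    return importances


def predict(row):
    """Hypothetical model: sales driven mostly by discount, slightly by weekend."""
    return 10 * row["discount"] + row["weekend"]


# Made-up rows; `store_id` is deliberately ignored by the model.
rows = [{"discount": i, "weekend": i % 2, "store_id": 100 + i} for i in range(8)]
targets = [predict(r) for r in rows]

importances = permutation_importance(
    predict, rows, targets, ["store_id", "weekend", "discount"]
)
ranking = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

# A text bar chart that a business expert can confirm or challenge.
top = ranking[0][1] or 1
for name, value in ranking:
    print(f"{name:<10} {'#' * round(20 * value / top)} {value:.2f}")
```

A feature the expert considers crucial that shows up near the bottom, or the reverse, is exactly the kind of disagreement worth a conversation.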