What are the five critical factors for a successful data science project?
In this blog post we explore the five critical factors that make a data science project successful. Unless you spent the last 10 years in a cave, under a rock or at the bottom of the sea, you must have heard plenty of stories about AI, Machine Learning and Data Science! A large number of highly skilled statisticians and actuaries have updated their résumés to read “Data Scientist” to look better on LinkedIn and get more mileage from their trade.
That said, the industry’s dirty little secret is that too many Data Science initiatives don’t make it beyond the Proof-of-Concept wall, either because the findings they produce aren’t perceived as valuable or because the data they have to work with is low in predictive “octane”.
This article tries to share some important factors and considerations behind the most successful Data Science deployments.
The business brief for successful data science projects
For a successful data science project, the business brief, much like a marketing brief in advertising, is an important conversation that needs to take place between the business end user and the data scientist to explain what the technology can and can’t do and to set clear objectives for the project. When data science experts try to second-guess the business users, it is not uncommon for them to get no further than the obvious and fail to enchant with their results. In other words, a “Tell me something I don’t already know!” situation becomes highly probable.
One way to improve the chances of hitting the nail on the head is to understand the dynamics of the business and to concentrate on the aspects that users can act upon. For example, in a marketing organization, such levers can be focusing telemarketing efforts on a higher-potential set of prospects, or shifting advertising budget toward the medium that a model predicts will perform best in a given season.
Think outside the data sources box
For a successful data science project, it should be very easy to include new sources of data and add new features to see if they improve the models’ test scores. Publicly available structural and cyclical data such as weather or economic indexes can give your models a boost. Other data about targeted customers or competitors’ activities can be inferred from their social network signals (e.g., Facebook events, or visual intelligence applied to Instagram).
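As a minimal sketch of that workflow, assuming a toy dataset and a deliberately simple model (predict the average outcome for each combination of feature values), the snippet below scores the same held-out rows with and without a weather feature. The column names, the numbers and the helper functions are all hypothetical and purely illustrative; in practice you would plug in your own model and data.

```python
from statistics import mean


def group_mean_model(rows, target, features):
    """Fit a tiny model: predict the mean target of training rows
    that share the same values for the chosen features."""
    overall = mean(r[target] for r in rows)
    groups = {}
    for r in rows:
        key = tuple(r[f] for f in features)
        groups.setdefault(key, []).append(r[target])
    means = {key: mean(values) for key, values in groups.items()}

    def predict(row):
        # Fall back to the overall mean for unseen combinations.
        return means.get(tuple(row[f] for f in features), overall)

    return predict


def mean_abs_error(predict, rows, target):
    """Average absolute gap between predictions and actual values."""
    return mean(abs(predict(r) - r[target]) for r in rows)


def score_feature_sets(train, test, target, feature_sets):
    """Return the held-out error for each candidate list of features."""
    return {
        name: mean_abs_error(group_mean_model(train, target, feats), test, target)
        for name, feats in feature_sets.items()
    }


# Synthetic, made-up daily sales rows (illustration only).
train = [
    {"weekend": 0, "hot": 0, "sales": 10},
    {"weekend": 0, "hot": 1, "sales": 20},
    {"weekend": 1, "hot": 0, "sales": 14},
    {"weekend": 1, "hot": 1, "sales": 24},
]
test = [
    {"weekend": 0, "hot": 1, "sales": 21},
    {"weekend": 1, "hot": 0, "sales": 13},
]

scores = score_feature_sets(
    train, test, "sales",
    {"baseline": ["weekend"], "with_weather": ["weekend", "hot"]},
)
print(scores)  # lower error is better
```

If the error with the new source is not clearly lower on data the model has never seen, the feature is probably not worth the extra plumbing.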
Successful data science projects and feature engineering
While any decently trained data scientist will agree that feature engineering is an important (if not the most important) step of a successful data science project, it is easy to succumb to the temptation of throwing all the raw data you have at the wall like spaghetti, in the lazy hope that something will stick! For example, in a series of dates, spikes in perfume sales can be explained only if you capture the fact that February 14th is Valentine’s Day and December 25th is Christmas. If you want to predict an opponent’s choice in a game of rock-paper-scissors, the machine will give you better assistance if it is fed not only the series of previous choices but also a “who wins” feature.
Platforms like ForePaaS give data scientists the power of both SQL and Python to model and query potentially complex features within a data set.
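The perfume example can be sketched in plain Python. This is only an illustration under stated assumptions: the `GIFT_DAYS` table, the feature names and the seven-day run-up window are hypothetical choices, not a prescribed recipe.

```python
from datetime import date

# Hypothetical table of gift-giving days as (month, day); extend as needed.
GIFT_DAYS = {(2, 14): "valentines_day", (12, 25): "christmas"}


def days_until_next_gift_day(d):
    """Number of days from d to the next gift day (0 if d is one)."""
    candidates = (
        date(year, month, day)
        for year in (d.year, d.year + 1)
        for (month, day) in GIFT_DAYS
    )
    return min((c - d).days for c in candidates if c >= d)


def date_features(d, run_up_days=7):
    """Turn a raw date into features that carry calendar knowledge."""
    remaining = days_until_next_gift_day(d)
    return {
        "month": d.month,
        "weekday": d.weekday(),  # Monday is 0
        "is_gift_day": (d.month, d.day) in GIFT_DAYS,
        "days_to_next_gift_day": remaining,
        "in_gift_run_up": remaining <= run_up_days,
    }


print(date_features(date(2023, 2, 7)))
```

A raw date column tells a model very little; a flag such as `in_gift_run_up` hands it the calendar knowledge a human analyst would apply without thinking.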
Train, re-train and score your machine learning models
An important part of a successful data science project is putting predictions into production, making them part of IT’s daily routines and monitoring their performance. Yet most IT departments are reluctant to take on the management of such processes because they don’t feel comfortable answering business users’ questions if something goes wrong. A model management feature that takes care of the models’ life cycles helps here: it lets you configure their periodic training and scoring routines. Alerts keep knowledgeable experts informed when the predictive power of a model drops below a critical threshold.
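A threshold alert of this kind can be sketched in a few lines. The example assumes accuracy as the measure of predictive power and a threshold of 0.8; both are illustrative choices, and `check_model` and its `notify` callback are hypothetical names rather than any platform’s API.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the observed outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def check_model(predicted, actual, threshold=0.8, notify=print):
    """Score the latest predictions and raise an alert if they degrade.

    Returns (score, degraded). The caller can use `degraded` to
    trigger a retraining job.
    """
    score = accuracy(predicted, actual)
    degraded = score < threshold
    if degraded:
        notify(f"ALERT: accuracy {score:.2f} fell below threshold {threshold:.2f}")
    return score, degraded


# Made-up predictions versus outcomes observed later.
score, degraded = check_model([1, 0, 1, 1], [1, 1, 0, 1])
if degraded:
    print("schedule retraining")
```

Run on a schedule after each scoring batch, a check like this gives the experts a clear signal to step in, so the IT team does not have to interpret the model’s behavior on its own.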
How to combine prediction with data visualization
While it is easy to implement an API that serves prediction results for consumption by other systems (e.g., credit scoring), a form of collaboration between the users and the models can take place if some of the mechanics are exposed. For example, a graph ranking features by importance can be confirmed or challenged by a business expert. Similarly, user-generated predictions can be used as features in a model to surface those surprisingly different results that may lead to a rethink on either side. For example, feedback on replenishment decisions made by humans lets supply chain managers focus on the ones they “didn’t get right” and learn new tricks for next time!
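One common way to obtain such a ranking is permutation importance: shuffle one feature at a time and measure how much the model’s error grows. The sketch below is an assumption-laden illustration, with a hypothetical `predict` function, synthetic rows and a plain-text chart standing in for a real graph.

```python
import random
from statistics import mean


def permutation_importance(predict, rows, target, features, seed=0):
    """Rank features by how much the error grows when each is shuffled."""
    rng = random.Random(seed)

    def mae(data):
        return mean(abs(predict(r) - r[target]) for r in data)

    base = mae(rows)
    importance = {}
    for f in features:
        shuffled = [r[f] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with only this one feature replaced.
        permuted = [{**r, f: v} for r, v in zip(rows, shuffled)]
        importance[f] = mae(permuted) - base
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


def as_text_chart(ranking, width=30):
    """Render the ranking as horizontal bars, most important first."""
    top = max((score for _, score in ranking), default=0) or 1
    return "\n".join(
        f"{name:<10} {'#' * round(width * max(score, 0) / top)} {score:.2f}"
        for name, score in ranking
    )


# Synthetic rows: sales depend on "hot" only; "noise" is irrelevant.
rows = [{"hot": i % 2, "noise": i % 3, "sales": 10 + 10 * (i % 2)}
        for i in range(20)]


def predict(row):
    # Stand-in for a trained model.
    return 10 + 10 * row["hot"]


ranking = permutation_importance(predict, rows, "sales", ["hot", "noise"])
print(as_text_chart(ranking))
```

A business expert looking at this chart can immediately say whether the order matches their experience, which is exactly the kind of challenge that improves both the model and the brief.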
Successfully and quickly scaling a data science project in production is still a struggle for many companies, although more and more are exploring machine learning possibilities. This is why Machine Learning platforms offering capabilities to accelerate ML, AI or analytics operationalization exist and help to optimize the whole process. By providing an all-in-one environment and abstracting complexity, ForePaaS supports companies planning to deploy machine learning into production by putting in place a consistent pipeline and best practices.
For more articles on data, analytics, machine learning, and data science, follow Paul Sinaï on Towards Data Science.