Discover the secret sauce of data science: why understanding the business context is essential for building better machine learning models.
Business understanding is the secret sauce for developing a clear data science strategy. Let's learn how.
Is feature engineering the secret sauce?
Speaking at a conference in New York earlier this year, Richard Pook, an executive search consultant at Dore Partnership, explained, “the only data scientists who can demand the huge pay are those that can bridge the gap between the analytics team and the C-suite. Everyone else gets a lot less.” Is this the secret sauce? Do you think good data scientists are hard to come by?
While it's tempting to imagine data scientists building and tuning complex algorithms, the reality is far more down-to-earth. The algorithm is only a fraction of a data scientist's work. Most of their time goes into transforming and enriching the data, creating new features that bring out relevant patterns hidden inside it: this is what is called feature engineering. A simple algorithm fed with more data and better features can therefore perform far better than a complex model built on weak assumptions. But is this the secret sauce?
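To make the feature-engineering point concrete, here is a minimal sketch on made-up toy data (it is not taken from any project described here): points on a grid are labeled by whether they fall inside a circle. The "simple algorithm" is a single-threshold rule (a decision stump), and `make_points` and `best_stump_accuracy` are hypothetical helpers written only for this illustration. A threshold on the raw coordinate `x` can do no better than always answering "outside", while the very same rule applied to the engineered feature x² + y² separates the two classes perfectly.

```python
from itertools import product


def make_points(step=0.25, limit=2.0):
    """Toy grid of (x, y) points, labeled 1 inside the unit circle and 0 outside."""
    n = int(limit / step)
    coords = [i * step for i in range(-n, n + 1)]
    return [((x, y), int(x * x + y * y < 1.0)) for x, y in product(coords, coords)]


def best_stump_accuracy(values, labels):
    """Best accuracy of a one-feature threshold rule ("predict 1 if value <= t", or its reverse)."""
    best = 0.0
    for t in sorted(set(values)):
        correct = sum(int(v <= t) == y for v, y in zip(values, labels))
        acc = correct / len(labels)
        # 1 - acc is the accuracy of the reversed rule "predict 1 if value > t".
        best = max(best, acc, 1.0 - acc)
    return best


points = make_points()
labels = [label for _, label in points]
raw_x = [x for (x, _), _ in points]                   # raw input only
radius_sq = [x * x + y * y for (x, y), _ in points]   # engineered feature

print("raw x:      ", round(best_stump_accuracy(raw_x, labels), 3))
print("x^2 + y^2:  ", round(best_stump_accuracy(radius_sq, labels), 3))
```

The model never changes between the two runs; only the feature it is given does, which is the point of the paragraph above.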
How to set up realistic machine learning models
The examples below help us better understand the secret sauce: the business understanding, or scenario understanding, needed to develop detailed data science models.
Closer to home, while preparing for end-of-year gift giving (I guess I had some time to kill, or maybe I was looking for a new hobby), I tried to use machine learning to predict which dresses my teenage daughters would like. To that end, I web-scraped pictures from a popular e-commerce site (exclusively for personal use!) and asked my daughters to label them as "I like" or "I don't like" to build a training set. Using a couple of popular algorithms, I only produced predictions that couldn't beat a coin flip!
Sure, the "I'm special and unpredictable" teenage reaction played a role, but when prompted, their explanations came down to tiny details like "the shape of the straps," "the drape," and "the hemlines."
Is this the secret sauce? Yes: my teenage daughters had put their finger on the concept of feature engineering and learning spaces!
Vladimir Vapnik, one of the pioneers of machine learning, explains that "…When musicians are training in masterclasses, the teacher does not show exactly how to play. He or she talks to students and gives some images transmitting hidden information." Vapnik speaks of a "Gestalt description" and of non-inductive approaches. Indeed, a feature is a piece of complementary information that improves the relevance of a prediction.
Another simple example is a rock-paper-scissors game modeled to predict my opponent's next move. I found that unless you create the feature "who wins," predictions based on historical data are terrible, but once you add it, the players become very predictable!
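The game model itself isn't described here, so the sketch below rests on stated assumptions. It simulates a hypothetical "win-stay, lose-shift" opponent: after a win they repeat their move, and after a loss or a tie they switch to the move that beats the one they just played. The predictor is a simple lookup table of the most frequent next move (a hypothetical stand-in, not the algorithms actually used), trained on the first half of the simulated game and tested on the second half: once with the opponent's last move as the only feature, and once with the engineered "who wins" feature added. All function names are illustrative.

```python
import random
from collections import Counter, defaultdict

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # BEATS[m] beats m


def outcome(opp, me):
    """Result of one round from the opponent's point of view: 'win', 'lose' or 'tie'."""
    if opp == me:
        return "tie"
    return "win" if BEATS[me] == opp else "lose"


def simulate(n_rounds, seed=0):
    """Play against a hypothetical win-stay, lose-shift opponent; I play uniformly at random."""
    rng = random.Random(seed)
    opp = rng.choice(MOVES)
    rounds = []
    for _ in range(n_rounds):
        me = rng.choice(MOVES)
        result = outcome(opp, me)
        rounds.append((opp, result))
        # Win: repeat the move. Loss or tie: switch to the move that beats it.
        opp = opp if result == "win" else BEATS[opp]
    return rounds


def make_examples(rounds, use_outcome):
    """Features from round t; the target is the opponent's move in round t + 1."""
    examples = []
    for (opp, result), (next_opp, _) in zip(rounds, rounds[1:]):
        key = (opp, result) if use_outcome else (opp,)
        examples.append((key, next_opp))
    return examples


def fit(examples):
    """Lookup table: for each feature key, the most frequent next move."""
    counts = defaultdict(Counter)
    for key, target in examples:
        counts[key][target] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def evaluate(use_outcome, n_rounds=1000, seed=42):
    """Train on the first half of a simulated game, report accuracy on the second half."""
    examples = make_examples(simulate(n_rounds, seed), use_outcome)
    split = len(examples) // 2
    model = fit(examples[:split])
    test = examples[split:]
    # Fall back to "rock" for a feature key never seen during training.
    hits = sum(model.get(key, "rock") == target for key, target in test)
    return hits / len(test)


if __name__ == "__main__":
    print("last move only: ", round(evaluate(use_outcome=False), 2))
    print("with 'who wins':", round(evaluate(use_outcome=True), 2))
```

With the last move alone, the table can only guess the more common follow-up, so accuracy should hover around two-thirds; adding the outcome of the previous round makes this hypothetical opponent fully predictable.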
The secret sauce: business context
In essence, this is the secret sauce: data scientists first need to understand the needs and goals of the business users. They need to enter and embrace their semantic space, and grasp and judge the subject matter, for the magic of data science to happen. It is vital to capture additional features, including those stemming from third-party data sources such as public holidays and other local celebrations for time series, in order to understand and predict shopping activity. These are just some of the critical steps to successful data science projects.
The ForePaaS platform was designed with such usage in mind, to roll out these configurations in a continuous and robust production setting.
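As a minimal sketch of that enrichment step, the function below derives calendar features for each date of a shopping time series. The `HOLIDAYS` dictionary is a hypothetical, hard-coded stand-in for a third-party public-holiday source, and `calendar_features` and `enrich` are illustrative helper names; the resulting rows would then be joined to the sales history before training a model.

```python
from datetime import date, timedelta

# Hypothetical stand-in for a third-party public-holiday feed;
# in practice this would come from an external calendar for the relevant region.
HOLIDAYS = {
    date(2023, 12, 25): "Christmas Day",
    date(2024, 1, 1): "New Year's Day",
}


def calendar_features(day, holidays=HOLIDAYS):
    """Business-calendar features for a single date of a time series."""
    upcoming = [h for h in holidays if h >= day]
    return {
        "date": day.isoformat(),
        "day_of_week": day.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": day.weekday() >= 5,
        "is_holiday": day in holidays,
        "holiday_name": holidays.get(day),
        # None when the calendar knows no holiday on or after this date.
        "days_to_next_holiday": (min(upcoming) - day).days if upcoming else None,
    }


def enrich(dates, holidays=HOLIDAYS):
    """Feature rows for a sequence of dates, ready to join with sales figures."""
    return [calendar_features(d, holidays) for d in dates]


if __name__ == "__main__":
    start = date(2023, 12, 22)
    for row in enrich(start + timedelta(days=i) for i in range(5)):
        print(row)
```

Features such as "days until the next holiday" let even a simple model pick up the pre-holiday shopping rush that the raw date alone does not expose.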
For more articles on cloud infrastructure, data, analytics, machine learning, and data science, follow Paul Sinaï on Towards Data Science.