What is the importance of understanding the business context for building better machine learning models?
What are the types of data scientists companies look for?
Speaking at a conference in New York earlier this year, Richard Pook, an executive search consultant at Dore Partnership, explained: “the only data scientists who can demand the really big pay are those that can bridge the gap between the analytics team and the C-suite. Everyone else gets a lot less.” So if you think Data Scientists are hard to come by, wait until you look for one who can take care of business!
While it’s tempting to imagine Data Scientists building and manipulating complex algorithms, the reality is far more down-to-earth. The algorithm is only a fraction of a data scientist’s work. Most of their time goes into transforming and enriching the data, adding features that highlight relevant elements hidden inside it. This is what is called feature engineering. A simple algorithm, enriched with more data and better features, can thus perform far better than a complex model built on weak assumptions.
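To make the last point concrete, here is a minimal Python sketch. It is an illustration, not the author’s method. The toy data, where the target is simply the square of the input, is hypothetical, and the “simple algorithm” is a one-variable least-squares line. Fitted on the raw input, the line misses the curve badly. Fitted on the engineered feature x², the very same model fits exactly.

```python
# Minimal sketch: the same simple model (a one-feature least-squares line),
# fitted once on a raw input and once on an engineered feature.
# The toy data (y = x squared) is hypothetical, chosen only for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


raw_x = list(range(-5, 6))
target = [x * x for x in raw_x]

# 1) Raw feature only: a straight line cannot follow a parabola.
b0, b1 = fit_line(raw_x, target)
mse_raw = mse(raw_x, target, b0, b1)

# 2) Engineered feature x**2: the same linear model now fits exactly.
squared_x = [x * x for x in raw_x]
c0, c1 = fit_line(squared_x, target)
mse_engineered = mse(squared_x, target, c0, c1)

print(f"raw feature MSE:        {mse_raw:.2f}")
print(f"engineered feature MSE: {mse_engineered:.2f}")
```

Running it should print a raw-feature MSE of 78.00 and an engineered-feature MSE of 0.00. The algorithm did not change at all; only the feature it was given did.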
How to set up realistic machine learning models
Closer to home, in preparation for end-of-year gift giving, I tried to use Machine Learning to predict which dresses my teenage daughters would like. To that end, I web-scraped pictures from a popular e-commerce site (exclusively for personal use!) and asked them to label the pictures I-like/I-don’t-like to build a training set. Using a couple of popular algorithms, I only produced predictions that couldn’t beat a coin flip!
Sure, the “I’m special and unpredictable” teenage reaction played a role, but when prompted, their explanations came down to tiny details such as “the shape of the straps”, “the drape”, “the hemlines”…
Yes, my teenage daughters put their finger on the concepts of feature engineering and learning spaces!
Vladimir Vapnik, one of the pioneers of machine learning, explains that “…When musicians are training in masterclasses, the teacher does not show exactly how to play. He or she talks to students and gives some images transmitting hidden information.” Vapnik talks of “Gestalt description” and non-inductive approaches. In this sense, a feature is complementary information that improves the relevance of a prediction.
Another simple example is a rock-paper-scissors game modeled to predict my opponent’s next move. I found that unless you create the feature “who wins”, predictions based on historical data alone are very poor, but players become very predictable once you add it!
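The rock-paper-scissors effect can be reproduced with a short simulation. This is a sketch under loudly stated assumptions. The opponent’s strategy is hypothetical: stay after a win, switch to the move that beats their last one after a loss, and switch to the move their last one beats after a draw. The predictor is a simple frequency table, not the model used in the original experiment. One predictor sees only the opponent’s previous move; the other also sees who won the previous round.

```python
import random
from collections import Counter, defaultdict

# Hypothetical opponent and predictors, for illustration only.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value
LOSES_TO = {loser: winner for winner, loser in BEATS.items()}        # LOSES_TO[m] beats m
MOVES = list(BEATS)


def outcome(opp_move, my_move):
    """Who won the round, from the opponent's point of view."""
    if opp_move == my_move:
        return "draw"
    return "win" if BEATS[opp_move] == my_move else "lose"


def opponent_next(last_move, last_result):
    """Assumed strategy: stay after a win, 'move up' after a loss, 'move down' after a draw."""
    if last_result == "win":
        return last_move
    if last_result == "lose":
        return LOSES_TO[last_move]   # the move that beats the one just played
    return BEATS[last_move]          # the move that the one just played beats


def simulate(rounds=3000, seed=0):
    """Play rounds against the opponent while I pick moves uniformly at random."""
    rng = random.Random(seed)
    history = []                     # (opponent move, outcome) for each round
    opp = rng.choice(MOVES)
    for _ in range(rounds):
        mine = rng.choice(MOVES)
        result = outcome(opp, mine)
        history.append((opp, result))
        opp = opponent_next(opp, result)
    return history


def online_accuracy(history, key_fn):
    """Predict each move from counts of past transitions keyed by key_fn(previous round)."""
    counts = defaultdict(Counter)
    hits = 0
    for prev, cur in zip(history, history[1:]):
        seen = counts[key_fn(prev)]
        guess = seen.most_common(1)[0][0] if seen else "rock"  # default before any data
        hits += guess == cur[0]
        seen[cur[0]] += 1            # learn from this transition after predicting it
    return hits / (len(history) - 1)


history = simulate()
acc_moves_only = online_accuracy(history, lambda r: r[0])  # previous move only
acc_with_winner = online_accuracy(history, lambda r: r)    # previous move + who won
print(f"history of moves only: {acc_moves_only:.2f}")
print(f"with 'who wins':       {acc_with_winner:.2f}")
```

Each next move depends on the previous outcome. The move-only predictor should therefore hover around one in three, which is chance level. The predictor with the “who wins” feature should be right almost every time once it has seen each situation once.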
The importance of understanding the business context
In essence, for the magic of Data Science to happen, one first needs to understand the users’ needs and business goals. One also needs to enter and embrace their semantic space: the way they apprehend and judge the subject matter. It is equally vital to capture additional features, including ones stemming from third-party data sources. To understand and predict shopping activity, for example, date series can be enriched with public holidays and other local celebrations. These are just some of the critical steps toward successful data science projects.
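Here is one way to enrich a date series with third-party calendar data. The sketch below derives a few calendar features for each day. The holiday list is hard-coded and illustrative, and the feature names are hypothetical. In a real project the calendar would come from an external source covering the relevant country or region.

```python
from datetime import date, timedelta

# Illustrative holiday calendar (hypothetical, hard-coded); in practice this
# would come from a third-party source of public holidays and local events.
HOLIDAYS = {
    date(2023, 12, 25): "Christmas Day",
    date(2024, 1, 1): "New Year's Day",
}


def calendar_features(day, holidays=HOLIDAYS):
    """Enrich a single date with features a shopping-activity model could use."""
    upcoming = [h for h in holidays if h >= day]
    days_to_next = (min(upcoming) - day).days if upcoming else None
    return {
        "date": day.isoformat(),
        "day_of_week": day.weekday(),      # 0 = Monday ... 6 = Sunday
        "is_weekend": day.weekday() >= 5,
        "is_holiday": day in holidays,
        "days_to_next_holiday": days_to_next,
    }


start = date(2023, 12, 22)
rows = [calendar_features(start + timedelta(days=i)) for i in range(5)]
for row in rows:
    print(row)
```

Features such as is_holiday or days_to_next_holiday can let even a simple model pick up pre-holiday shopping patterns that the raw date alone hides.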
The ForePaaS platform was designed with such usage in mind, to roll out these configurations in a continuous and robust production setting.