Data scientists should go data-upstream

Frustration in data science rarely comes from the limits of the available modeling algorithms. The data scientist’s most common excuse for failing to meet expectations is “the data is not good enough” or “there’s not enough of it”. But when asked “what is good enough data?” or “how much is enough?”, they seldom give a crisp answer. Whether or not the objections are valid, they point to a common pattern: aspiring data scientists often don’t see data quality and volume as a core part of their mission, and they tend to rely on others to provide ready-to-crunch data.

More senior professionals will tell you that in real life, 80% of their work is collecting and preparing the right data. They prefer to start with the rawest data possible, because when others prepare it for them, those others tend to introduce their own biases.

Understand the business

The humble-pie-eating data scientists think of their craft as augmenting professionals’ ability to do their job rather than replacing them. Data science is just a lever to move bigger rocks; it is not meant to replace rock-moving! So collaborating with the folks who know which rock to move is usually a sound approach. For example, shopping mall managers may not know every reason shoppers visit their locations, but they have good judgement about which new ideas to explore and which data features to take into account when modeling shopping traffic. That insight can be lost when data scientists are left out of the conversation. The exchange works both ways: algorithmic capabilities, once explained to business people, can nudge them into suggesting or exploring new avenues.

In other instances, even the business folks may have misconceptions about what they’re trying to accomplish. Recently, a lawyer described how she uses an online service to get input on subjects outside her core expertise. The service’s AI profiled her as highly involved in those particular subjects. The reality is the opposite: she doesn’t care enough about those subjects to become an expert in them, yet she is recognized among her peers in other areas that she never inquires about.

A proper AI profiling approach meant for advertising should have inferred a different persona.

Broaden the horizon

Too many data science projects stem from the desire to “do something with our data”, yet most B2B or B2C businesses are sensitive to external conditions such as weather, stock markets or other factors for which economic indexes can be good proxy indicators. Some businesses, like electrical power generators, critically rely on weather and economic activity forecasts, but many other businesses can also improve their predictions by incorporating external data and forecasts. Shopping mall traffic turned out to be decently forecastable when taking into account weather, day of week, yearly celebrations and school vacations. If you add the public event announcements made on social networks by the shops and by the competition, you get very good predictive capabilities.
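To make the idea of enriching internal data with external signals concrete, here is a minimal sketch in Python. It is only an illustration: the function name build_features, the field names and every number in the sample data are hypothetical, and a real project would pull weather and calendar data from actual sources before feeding the resulting rows to a forecasting model.

```python
from datetime import date


def build_features(traffic, weather, holidays, school_vacations):
    """Join daily visitor counts with external signals, one row per day.

    All inputs are hypothetical structures chosen for this sketch:
    traffic: dict mapping date -> visitor count (internal data)
    weather: dict mapping date -> {"temperature_c": ..., "rain_mm": ...}
    holidays: set of dates for yearly celebrations
    school_vacations: list of (start_date, end_date) pairs, inclusive
    """
    rows = []
    for day in sorted(traffic):
        day_weather = weather.get(day)
        if day_weather is None:
            # Skip days with no weather record rather than guess a value.
            continue
        rows.append({
            "date": day,
            "visitors": traffic[day],
            "day_of_week": day.weekday(),  # 0 = Monday ... 6 = Sunday
            "temperature_c": day_weather["temperature_c"],
            "rain_mm": day_weather["rain_mm"],
            "is_holiday": day in holidays,
            "is_school_vacation": any(
                start <= day <= end for start, end in school_vacations
            ),
        })
    return rows


# Made-up sample data, for illustration only.
traffic = {
    date(2024, 1, 1): 3100,
    date(2024, 1, 2): 12400,
    date(2024, 1, 3): 11800,
}
weather = {
    date(2024, 1, 1): {"temperature_c": 4.0, "rain_mm": 0.0},
    date(2024, 1, 2): {"temperature_c": 6.5, "rain_mm": 2.1},
    # No weather record for January 3rd: that day is dropped.
}
holidays = {date(2024, 1, 1)}
school_vacations = [(date(2023, 12, 23), date(2024, 1, 2))]

rows = build_features(traffic, weather, holidays, school_vacations)
for row in rows:
    print(row)
```

Each row now carries the internal measurement alongside the external context, which is the form most forecasting models expect. Social network announcements could be added the same way, as one more column keyed by date.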

The ForePaaS Platform was designed with these realities in mind. An extensive connector marketplace puts all sorts of data within reach, and the data engineering capabilities allow mixing data of various temporalities, from events captured asynchronously in real time to batches processed periodically. It also encourages dialog and interaction between data scientists and business users by offering a comprehensive point-and-click environment to build production-grade web applications that promote human consumption of, and feedback on, the information produced.