What are machine learning communication silos? Where do they occur, and how can you avoid them? Why do you need a unified data platform?
As we covered in our recent blog, The Technical and Human Challenges of Machine Learning Projects, the data you use in your machine learning (ML) models is only half of the equation. The other half is the human element: the communication silos. This human element is often far more complicated than the data element. Most ML projects (over 85%) fail, in large part due to poorly defined goals and a lack of collaboration between stakeholders.
The typical ML project has three major stakeholders: IT, data scientists, and business users. Attaining solid alignment and understanding between these three very different camps is extremely difficult, mainly due to poor communication and conflicting interests and stakes.
This blog will explore where, how, and why communication silos form during machine learning projects, and what you can do to avoid them and make sure your ML projects run smoothly to deliver their results on time and under budget.
First—let’s understand the different priorities of the various teams involved in a typical ML project.
What data scientists care about in an ML project
Every ML project has two main phases: the exploration phase, where you find your data and create and train your models, and the production phase, where you build your models and start putting them to use to see how well they work. The production phase in particular could be more effective without communication silos.
Data scientists work in the exploration phase. For them, the person to please is the business itself. They’re also interested in finding the model that will provide valuable insight and attain strong outcomes that produce genuine business results. They tinker with the building blocks of the project, the foundational elements, the model parameters, and the algorithms to find the perfect model that will generate this business value.
What business users care about in an ML project
The business users are, of course, similar to the data scientists in that their stakeholder is the business itself and the results the ML project can produce for it. However, without the technical knowledge of the data scientists, they aren’t really sure what questions to ask or where to look for a clear picture of what’s going on. Mainly, they want to be able to look at a dashboard that shows them the variables to change to reach their business objectives. If they see that, they tend to trust the technical teams to deliver on their promises. Their jobs would be more effective without communication silos.
What IT cares about in an ML project
Finally, you have IT. IT controls the data, the databases, the infrastructure, and the security. IT also has a lot of projects to work on and juggles several priorities at any given time. They have their processes in place, and specific ways they tackle new projects. Obviously, the last thing they want to see is a costly data leak like the one a large US retail company suffered a few years ago. They also have to manage the production lifecycle of the ML project, and once the ML application is deployed, they need to keep it functional 100% of the time.
ML project communication silos: IT and the Data Scientists
The primary communication breakdown in any ML project occurs between the worlds of the data scientists and IT. The data scientists work on a specific data set to solve a specific business user problem, often talking to IT only when they need access to the data. They don’t have time to worry about things like architectural constraints because they’re working mainly in theory and with the understanding that, hey, we’ll get the data and IT infrastructure figured out when we get there. These are communication silos. The data scientists are not really talking to the IT people involved in the real-world deployment and productization of their ML models early enough in the project.
When they do, they find out that IT is not ready to deploy this type of project, for various infrastructure, cost, or process reasons. Oops, that’s when they realize that their objectives and incentives aren’t the same.
But at some point, the data scientists need to integrate their model with IT’s architecture, and that’s where the problems start, because in most cases the data scientists are considering primarily their own best practices and not product deployment scenarios.
Data scientists hand off the model to IT for deployment. But this is only the first deployment, and the model has a lifecycle. It will need to be retrained with new data sets and will likely get folded into other development cycles and eventually become part of your company’s ML code library.
That’s why it’s absolutely essential for data scientists and IT to collaborate and communicate well from the get-go, without forgetting about (or leaving out) the needs of the business users.
And for that, you need a framework.
Bridging the communication silos between IT and Data Science
To successfully pull off an ML project, you need to have the data scientists, IT teams, and business users in perfect alignment, with no communication silos. To make this happen, you need a unified data platform, such as the ForePaaS End-to-End Machine Learning and Analytics Platform, that helps the data scientists prepare and launch their models into production from day one and collaborate with stakeholders like IT and the business during the entire process.
This unified data platform also gives IT the peace of mind of not having to worry about the data lifecycles and the data scientists the peace of mind of not having to worry about IT not being ready.
It’s a win-win.
Using this unified data platform, IT can specify their requirements from the very beginning, so that:
- The data scientists aren’t working in blind theory and are thinking about real-world deployments right from the outset.
- There’s no siloing of tools. The data scientists are working with their own tools (the tools they’re comfortable using), and IT is using the same framework to prepare the data and the infrastructure for the ML project. Business users can have access to them, too, and all parties can better understand how the project will tie into bottom-line business results.
- IT concerns around security, encryption, data, access control, deployment, testing, and QA are assuaged because everything has been included and accounted for in this framework.
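As a rough illustration of what "specifying requirements from the very beginning" could look like in practice, here is a minimal Python sketch. It is not how any particular platform works; the class names, fields, and thresholds are all hypothetical. The idea is simply that IT writes its constraints down as data, and data scientists check each candidate model against them during exploration instead of at hand-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentRequirements:
    """Constraints IT declares up front (all fields are hypothetical)."""
    max_latency_ms: float
    max_memory_mb: float
    requires_encryption: bool


@dataclass(frozen=True)
class ModelCandidate:
    """What a data scientist measures for a model during exploration."""
    name: str
    latency_ms: float
    memory_mb: float
    encrypts_data: bool


def check_candidate(model: ModelCandidate, reqs: DeploymentRequirements) -> list[str]:
    """Return human-readable violations; an empty list means no conflicts."""
    violations = []
    if model.latency_ms > reqs.max_latency_ms:
        violations.append(
            f"latency {model.latency_ms} ms exceeds limit {reqs.max_latency_ms} ms"
        )
    if model.memory_mb > reqs.max_memory_mb:
        violations.append(
            f"memory {model.memory_mb} MB exceeds limit {reqs.max_memory_mb} MB"
        )
    if reqs.requires_encryption and not model.encrypts_data:
        violations.append("data encryption is required but not enabled")
    return violations
```

Running a check like this early turns the "oops" moment at deployment into a list of concrete items both teams can discuss while the model is still easy to change.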
And then, moving forward, this unified data platform would also account for ML lifecycles by establishing a repeatable, manageable, traceable, and secure operational analytics structure that constantly monitors, re-evaluates, tunes, and manages your ML models on an ongoing basis to help your ML projects reach their initial goals.
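To make the "monitor and re-evaluate" idea concrete, here is a small Python sketch of one possible check. It assumes you record the model's accuracy at deployment and again on recent batches of data; the function name and the 0.05 tolerance are hypothetical choices for illustration, not a recommendation or a description of any platform's behavior.

```python
def needs_retraining(
    baseline_accuracy: float,
    recent_accuracies: list[float],
    tolerance: float = 0.05,
) -> bool:
    """Flag a model when its recent mean accuracy drops too far below baseline.

    baseline_accuracy: accuracy measured when the model was deployed.
    recent_accuracies: accuracy on recent evaluation batches (assumed available).
    tolerance: largest acceptable drop before retraining is suggested.
    """
    if not recent_accuracies:
        # Nothing has been measured yet, so there is no evidence of a drop.
        return False
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent_mean) > tolerance
```

A scheduled job could call a check like this after each evaluation run and notify both IT and the data scientists when it returns True, so retraining becomes a shared, traceable step.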
Because remember: you don’t necessarily get there the first time around. ML project teams tend not to give their models time to gestate and want results right away, but you have to keep running them over time and be patient. Sometimes models can prove their value over time.
For more articles on cloud infrastructure, data, analytics, machine learning, and data science, follow Paul Sinaï on Towards Data Science.
The image used in this post is a royalty free image from Unsplash.