Avoiding Disaster in ML Projects

A few high-level patterns to look out for in any projects you start or join

Tolstoy opened his novel Anna Karenina with the line, “Happy families are all alike; every unhappy family is unhappy in its own way.”

Be that as it may with actual families, unhappy ML teams seem to share recurring patterns. With that in mind, here’s a quick summary of red flags and pitfalls to look out for before joining any project, or to stay vigilant about if you’re starting one.

Prelude: Hanlon’s Razor

It’s worth noting that most of the examples below focus on the kinds of mistakes that can be made when everyone involved is well-meaning, but perhaps inexperienced. That said, there are additional red flags that result from individuals NOT having the best intentions. In my time as an ML engineer, I’ve witnessed data scientists who made their mission-critical pipelines intentionally opaque so that they could preserve their job security by being the only person who knew how everything worked. I’ve witnessed middle managers who used project or company resources for their own, entirely unrelated ends. I’ve even witnessed leadership at one company attempt to defraud both engineering and the business stakeholders, without realizing that their plan required one of those two groups to be functioning properly.

In most cases (including most instances of the pitfalls listed below), we can safely look to Hanlon’s Razor for guidance: “Never attribute to malice that which is adequately explained by ignorance.” The examples of malicious intent above stand out precisely because they’re the exception rather than the norm. That being said, it’s also important to remember that in some cases, sufficiently advanced ignorance may be indistinguishable from malice.

Pitfall #1: Lack of support from top leadership

If the leadership of the company or organization isn’t prioritizing an ML project, middle management may not have the resources to adequately support the project either. If this happens, the project is probably doomed no matter how talented the individual contributors are. Even if the middle management is supportive of a project, differences in support between the middle and the top can lead to disagreements that harm the contributors and the project as a whole.

If you’re starting a project within a company or existing organization, you need to be absolutely sure that whatever data science or machine learning project you’re working on is relevant to the organization’s goals. In fact, being relevant is not enough: the project should be something leadership actually sees as a priority and is willing to back with resources.

Pitfall #2: Immature Data Infrastructure

I’ve seen too many companies still relying on Microsoft Excel for project-critical data, or even inter-departmental data. I’ve encountered multiple drug-discovery companies where PhD-level computational biology grads are still passing around .csv files by thumb drive.

If you’re doing anything with machine learning or data, you need to invest the effort in building up your infrastructure, regardless of whether it’s cloud-based or on-site. It won’t matter that you have the smartest engineers if there’s still enormous friction involved in just handling the data or starting new experiments.
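As one minimal sketch of what “reducing friction” can mean in practice, the snippet below moves data out of files that get passed around by hand and into a single queryable store. SQLite is used purely for illustration (a real team would use a shared database server), and the compound data is entirely hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical example: a CSV of the kind that might otherwise be
# passed around by thumb drive.
csv_text = "compound_id,ic50_nm\nCPD-001,12.5\nCPD-002,340.0\n"

# Load it once into a shared, queryable store instead.
conn = sqlite3.connect(":memory:")  # illustration only; use a shared DB in practice
conn.execute("CREATE TABLE assay_results (compound_id TEXT, ic50_nm REAL)")
rows = [(r["compound_id"], float(r["ic50_nm"]))
        for r in csv.DictReader(io.StringIO(csv_text))]
conn.executemany("INSERT INTO assay_results VALUES (?, ?)", rows)
conn.commit()

# Now every experiment can query the same canonical copy of the data.
count, = conn.execute("SELECT COUNT(*) FROM assay_results").fetchone()
print(count)  # 2
```

The point isn’t the specific tool; it’s that there is exactly one canonical copy of the data that every experiment reads from.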

Pitfall #3: Immature Analytics

This refers to the maturity of a company’s analytics tools. Not all analytics techniques are equal; they are made incredibly unequal by the questions they can answer.

Descriptive analytics answers “what happened?”: simply identifying an event, such as a machine learning model failing or a model not reaching high enough accuracy. Diagnostic analytics goes a step further and asks “why did X happen?”. Many machine learning teams will put together a diagnostic pipeline identifying the steps of inference and where the failure point is. They might put together the most impressive model interpretability and explainability tools, but many teams stop at this point.

It’s absolutely critical to be able to move on to predictive analytics, or “what will happen?”. This may mean machine learning models themselves being used for projections, or it may involve building a pipeline that can predict which experiments to prioritize. That brings us to the final stage of analytics maturity: prescriptive analytics, which answers “how can we make X happen?”. This stage takes conditional probabilities into account even more, and overlaps heavily with the actual engineering. Having an analytics system that can produce causal models rather than just descriptive statistics is absolutely critical here.

Unfortunately, many teams stop at descriptive or diagnostic analytics without moving on to predictive or prescriptive analytics. This is about as bad as going to a doctor who is great at identifying symptoms, or maybe even identifying a disease, but has no idea what the prognosis or treatment options are. Still, prescribing treatments without identifying the disease isn’t much of an improvement either. Deliverables may need to move gradually up this ladder of maturity as they’re being planned.
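The four rungs of the ladder above can be sketched with toy monitoring numbers. Everything here is hypothetical (the accuracy series, the target, and the candidate actions and their lifts are made up for illustration):

```python
# Hypothetical daily accuracy readings for a deployed model.
daily_accuracy = [0.91, 0.90, 0.88, 0.85, 0.81]

# Descriptive: "what happened?" -- the latest reading shows a drop.
latest = daily_accuracy[-1]

# Diagnostic: "why did X happen?" -- locate the day with the steepest decline.
drops = [b - a for a, b in zip(daily_accuracy, daily_accuracy[1:])]
worst_day = drops.index(min(drops)) + 1

# Predictive: "what will happen?" -- naive linear extrapolation of the trend.
trend = drops[-1]
forecast = latest + trend

# Prescriptive: "how can we make X happen?" -- keep only the actions whose
# (assumed) expected lift gets the forecast back above a target.
actions = {"retrain": 0.06, "add_data": 0.09, "do_nothing": 0.0}
target = 0.85
viable = [a for a, lift in actions.items() if forecast + lift >= target]
print(latest, worst_day, round(forecast, 2), viable)
```

A real prescriptive system would replace the hard-coded lifts with a causal or experimental estimate of each intervention’s effect, which is exactly why that last rung overlaps so heavily with engineering.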

Pitfall #4: Unclear Scoping

This type of pitfall generally falls into two categories:

In the first, the engineers may be fulfilling multiple roles that fall outside the scope of data science or machine learning engineering. This may be as minor as requiring engineers to do much of the project management themselves, or as major as having them do the budgeting and pitching of the project.

In the second, the scope or requirements are just a laundry list of techniques and tools, without distinguishing between the “mandatory” and the “nice to have as a bonus”.

While distinct, these two scoping errors are pretty similar to the next pitfall…

Pitfall #5: Giving ML Engineers Several Jobs at Once

If you have an ML engineer or data scientist spending 40% or 60% of their time in meetings, you can easily see how very little ML engineering will get done. Humans only have so much time in the day, and piling extra duties on top of the usual responsibilities is more often than not a recipe for decreased productivity rather than multiple jobs done well. The idea behind hiring a “generalist” to save money by having one person wear multiple hats is usually fallacious. A better approach would be to apply something like Warren Buffett’s 5/25 rule to a data scientist’s or ML engineer’s list of priorities.
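The 5/25 rule mentioned above (rank everything you want to do, commit to the top five, and actively avoid the rest until those are done) is simple enough to sketch directly. The task names below are hypothetical:

```python
# Hypothetical priority list, already ranked by importance; imagine it
# extended to the full 25 items the rule describes.
priorities = [
    "ship fraud model v2", "fix data drift alerts", "audit label quality",
    "retrain churn model", "write eval harness", "polish dashboard",
    "prepare conference talk", "refactor ETL", "run hiring loop",
    "clean up docs",
]

focus = priorities[:5]                # the only items actively worked on
avoid_at_all_costs = priorities[5:]   # tempting, but deferred entirely

print(focus)
print(avoid_at_all_costs)
```

The counterintuitive part of the rule is the second list: those items aren’t a backlog to chip away at, they’re distractions to be refused until the top five are finished.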

Pitfall #6: Disconnect between Business Cases and Engineering

If there’s not a lot of communication between the project stakeholders and engineering, it usually means that the engineers will have to come up with business use-cases themselves. As you can imagine, there’s very little guarantee that they’ll come to the same conclusions as the stakeholders regarding what’s relevant and what isn’t.

Pitfall #7: Lack of Communication Channels in General between Engineering and Stakeholders

Project requirements can change a lot. In fact, this is almost inevitable for any project or organization that exists for long enough. As a consequence, it’s important that the engineering/analytics team doesn’t fall out of the loop on shifts in goals or priorities. If there aren’t efficient lines of communication between the data scientists/ML engineers and management/stakeholders, this becomes an increasingly expensive problem to fix the longer it goes on.

Pitfall #8: No emphasis on project impact

Sometimes an ML engineering team can focus too much on the tech of a project, and not enough on the impact or goals of the project. This is the ML engineering equivalent of Juicero. It might be caused by the analytics or engineering team focusing too much on improving metrics that are increasingly removed from the goal (or, even worse, being so focused on the tech because they don’t even have metrics to go on).

Pitfall #9: Multiple Data Silos and Scattered Sources

This can lead to data scientists having access to data that is less comprehensive and lower quality than data available elsewhere in the organization. It can also lead to inconsistent model performance across engineering teams, especially if the data being used for ML engineering projects has no versioning system in place.
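Even without adopting a full data-versioning tool, the core idea can be sketched in a few lines: fingerprint the exact data a model was trained on, so two teams can immediately tell whether they used the same version. The records below are hypothetical, and the serialize-then-hash approach is just one minimal way to do this:

```python
import hashlib
import json

# Hypothetical training records.
records = [
    {"id": 1, "label": "spam"},
    {"id": 2, "label": "ham"},
]

# Serialize deterministically (sorted keys, fixed separators) so the same
# data always produces the same bytes, then hash those bytes.
payload = json.dumps(records, sort_keys=True, separators=(",", ":")).encode()
data_version = hashlib.sha256(payload).hexdigest()[:12]

# Log this identifier alongside every trained model: mismatched hashes
# mean mismatched training data, which explains "inconsistent" results.
print(data_version)
```

When two teams report different numbers for “the same” model, comparing these fingerprints is often the fastest way to discover they were never training on the same data to begin with.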

Cited as:

  title   = "Avoiding Disaster in ML Projects",
  author  = "McAteer, Matthew",
  journal = "matthewmcateer.me",
  year    = "2019",
  url     = "https://matthewmcateer.me/blog/avoiding-disaster-in-ml-projects/"

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄
