As you probably recall, the 2008 recession that largely started on Wall Street had ripple effects felt by everybody across the nation for years to come. I personally had a pretty difficult time finding a job out of undergrad in 2012, and I recall my dad — a self-employed HVAC contractor — noting that several big clients who typically threw a lot of work his way had paused many of their major plans. (Fortunately, that all has since bounced back in his favor.)
Without going into too much depth on the failures of Wall Street, the general idea is that top executives placed too much faith in their very elaborate predictive models. If you’re not familiar with predictive models, the idea is that you can shove a lot of data about a current event into a fancy-pants mathematical equation and obtain a probabilistic, inferential result on the other side. These predictive models are created by observing patterns in historical data and largely assuming that those patterns will carry on into the future.
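To make that concrete, here’s a minimal sketch of the idea (all the numbers and helper functions are my own toy inventions): fit a simple straight-line trend to historical data, then extrapolate that trend forward. Real predictive models are far more elaborate, but the core recipe is the same — past patterns in, future guesses out.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = slope * t + intercept over t = 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) \
            / sum((t - t_mean) ** 2 for t in range(n))
    intercept = y_mean - slope * t_mean
    return slope, intercept

def predict(slope, intercept, t):
    return slope * t + intercept

# Pretend these are five months of historical sales (made-up numbers).
history = [100, 104, 101, 107, 110]
slope, intercept = fit_linear_trend(history)

# Forecast month 5 (the next one) by assuming the past trend continues.
forecast = predict(slope, intercept, 5)
```

Notice the big assumption baked right into that last line: the future will look like the past. That’s exactly the assumption that blew up in 2008.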
I like to use weather as an example because it is generally pretty steady from year to year. Let’s say I have a magic time machine that could plop you into any calendar date I selected. If I dropped you into a random date without you knowing what that date is, could you give a reasonable estimate as to what month you are in? (Assuming, of course, I kept you to your same geographic location.) The answer is yes. If I plopped you into a date where the air is cool, leaves are red, and Starbucks is serving Pumpkin Spice Lattes (PSLs), you could reasonably assume I dropped you into the month of September or October. How do you know this? Because you’re experiencing something that generally happens every single year.
Back to the 2008 Wall Street fiasco, the problem with the very elaborate “black box” models that these quants came up with is that they made a lot of assumptions based on past market data. Of course, hindsight is 20/20, so let’s not let our hubris carry us too far, but can you really compare market data from 1980 to 2005? Think of all the technological changes and advancements that took place between those dates. Heck, think of all the advancements between 2005 and today. Smartphones alone have totally turned the world upside down. Trying to build predictive models for today based on 2005 market data would likely be wildly inaccurate!
So the general reason Wall Street failed in 2008 is that their fancy predictive models made a lot of untrue assumptions that led to ruinous results. Again, I’m not a quant and thus cannot guess whether the intentions behind these models were good or bad, so I’m not willing to cast a “moral judgment” so quickly. What I can say, though, is that the market patterns they expected to hold in the future, based on data from the past, didn’t manifest as expected. The world changed too much, too quickly for these models to provide accurate predictions.
(Side bar: Of course, no predictive model will be accurate 100% of the time. Even if general trends remain the same, there are always minute changes that occur. We in the data science field call this model drift, and monitoring it is very important for evaluating the effectiveness of an active model.)
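If you’re curious what monitoring for drift might look like, here’s a toy sketch (the function names and the 1.5x tolerance are my own inventions, not any standard library): compare the model’s recent error against the error it had when it was first deployed, and raise a flag when it degrades too far.

```python
def mean_abs_error(predictions, actuals):
    """Average absolute gap between what we predicted and what happened."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

def drift_detected(baseline_error, recent_error, tolerance=1.5):
    """Flag drift if recent error exceeds the baseline by the given factor."""
    return recent_error > baseline_error * tolerance

# Error at launch vs. error this month (invented numbers).
baseline = mean_abs_error([10, 12, 11], [11, 12, 10])
recent   = mean_abs_error([10, 12, 11], [25, 30, 4])

print(drift_detected(baseline, recent))  # the recent error has blown up
```

In a real pipeline you’d track this over rolling windows and alert on it, but the basic idea is just this: keep grading your model’s homework after it leaves school.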
Before moving forward, you might be wondering, what’s the difference between a predictive model and a machine learning model? The answer is actually nothing: a machine learning model is one kind of predictive model. Predictive models can either be derived from machine learning or from the practices of highly trained statisticians. A great example of predictive models coming from human statisticians is the actuarial work that undergirds the whole insurance industry, which has been around for more than 100 years. I’ve been using the phrase “predictive models” so far because I believe the models associated with the 2008 Wall Street recession were “hand-created” by humans, not with machine learning.
With the evolution of computers, some smart folks found that computers could also do a reasonable job at finding patterns in data (using fancy mathematical algorithms), and thus the idea of “machines learning from data” (or machine learning) came to be a wildly popular practice. Machine learning has enabled a whole new world of predictive models that don’t necessarily require people to be math geniuses. Trust me, as much as I love and appreciate math, I don’t know nearly as much as I would like to! That doesn’t prevent me from being a successful machine learning engineer. (Thanks, computers!)
So you all are smart folks. I bet I don’t even need to explain from here why COVID-19 is crippling a lot of machine learning models. But hey, you’re already here. Let’s read on!
Let’s say a person has a machine learning model that predicts the sales of toilet paper. That person likely created the model by training it on historical toilet paper sales. I’m obviously not a toilet paper expert, but my assumption is that sales are pretty steady from month to month and year to year. If I had to throw out a wild guess, I might think sales go up in November and December, when folks have a lot more guests staying in their homes. Who knows? The Charmin bear is much more informed than I am here.
Obviously, these machine learning models could not have predicted what COVID-19 would do to the toilet paper industry. If they could have, toilet paper companies likely would have ramped up production to meet demand, because that’s a lot of money to be made! But how could these models have known? Not only was COVID-19 itself unpredictable, but toilet paper demand isn’t directly correlated with the disease. Without getting too graphic (or gross), COVID-19 is generally not a gastrointestinal virus that, ummm… warrants the need for toilet paper.
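To illustrate just how badly this kind of shift breaks a model, here’s a toy example with invented numbers: a naive “model” that predicts next month’s sales as the trailing average of the past year. In steady times it’s a perfectly fine heuristic; in a panic-buying month, it’s off by a mile.

```python
def forecast_next_month(history):
    """Naive model: predict next month's sales as the trailing average."""
    return sum(history) / len(history)

# Twelve months of steady, pre-COVID sales (made-up numbers).
pre_covid_sales = [100, 98, 103, 101, 99, 102, 100, 97, 104, 101, 100, 99]
prediction = forecast_next_month(pre_covid_sales)  # roughly 100, as usual

# Then the panic-buying month hits and shelves get stripped bare.
panic_month_actual = 340
error = panic_month_actual - prediction  # the model misses by a country mile
```

No amount of fitting against those twelve calm months could have told the model that month thirteen would look nothing like them.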
That’s just one example. Any machine learning model trained on past data that COVID-19 has radically disrupted is likely crippled today. Other quick examples might be…
- Models that predict grocery sales
- Models that predict anything in the entertainment industry
- Models that predict restaurant sales
- Models that predict cleaning product sales
But this doesn’t mean all models are now null and void. Any model that uses data NOT disrupted by COVID-19 is likely still valid. A great example of this would be anything predicting weather patterns. Granted, we have had a very odd year of weather, but I don’t think that can be attributed to COVID-19 in any meaningful way. Generally speaking, weather has been consistent with years past. In my home state of Illinois, we had a normally hot summer and are now looking at a normal transition to cooler fall temperatures.
So yeah, I feel bad for all those folks maintaining machine learning models that aren’t performing up to snuff due to COVID-19. It’s not likely you can alter those models on the fly and expect them to be performant moving forward. As I write this post, we’re still largely in the midst of the pandemic. Truthfully speaking, we have no idea how long this will last, nor do we know what a post-pandemic world will look like. Are industries like the toilet paper industry forever disrupted? Who knows?
That said, I’d be appropriately cautious about new machine learning models created during this time. The biggest question to ask yourself is this: is the data undergirding this model volatile due to COVID-19? If the answer is yes, proceed with caution. You or your company could be walking into a trap. I’m not saying to ignore the model’s inferences outright, but I would definitely proceed with a lot of extra caution.
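As one rough sanity check (this is my own back-of-the-envelope heuristic, not a standard statistical test), you could compare the data your model is seeing right now against the data it was trained on, and flag anything that looks wildly out of distribution:

```python
from statistics import mean, stdev

def looks_volatile(training_data, recent_data, z_threshold=3.0):
    """True if the recent average sits far outside the training distribution."""
    z = abs(mean(recent_data) - mean(training_data)) / stdev(training_data)
    return z > z_threshold

# Calm historical sales vs. a pandemic-style spike (invented numbers).
calm   = [100, 98, 103, 101, 99, 102]
spiked = [250, 310, 280]

print(looks_volatile(calm, spiked))  # this data looks nothing like training
```

If a check like this fires, that’s your cue to treat the model’s predictions as suggestions, not gospel.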
And that wraps up this post! I hope you learned a lot more about machine learning through this lens of COVID-19. Machine learning can be a really useful, really cool tool, but as I always say with these things: garbage in, garbage out. Being mindful of how machine learning is used in this odd time is very important for the success of your company.