Like many things in life, most people are introduced to the field of artificial intelligence (AI) through pop culture. When you think of AI, you probably think of Skynet from the Terminator series, HAL from 2001: A Space Odyssey, or Agent Smith and his cronies from The Matrix. These are entertaining depictions of what may be theoretically possible in the far-flung future, but they wildly exaggerate the capabilities of AI today.
So when I’m asked questions about AI, they’re usually posed from this sci-fi vantage point, where AI is perceived to be far more capable than it actually is. AI optimists ask me why we can’t quickly drop AI into an existing system, as if AI were a magical bot that can figure things out on its own and run your systems without human intervention. Please don’t mistake me: I’m not making fun of these people. Pop culture has a strong hand in all sorts of thinking; it’s part of why one of nuclear energy’s biggest hurdles is how negatively The Simpsons portrays nuclear power plants. I’m sure I hold some malformed belief based on something I watched on Netflix!
Speaking of Hollywood: in early January 2020, the movie studio Warner Brothers inked a deal with an AI firm, Cinelytic, to create an AI solution that will supposedly enable the studio to “assess the value of a star in any territory” and predict “how much a film is expected to make in theaters.” (Quotes taken from this article.) The article doesn’t delve into any more detail than that, but based on that information… I’m skeptical that Warner Bros. is going to get consistently positive results. Granted, I don’t have the means to peek at their algorithm or data sources, but I think intuition alone can cast reasonable doubt on this solution.
The neat thing is that you don’t need a strong background in AI to understand why this sounds like a shady deal. We’ll use the Warner Bros. case as a means to teach you how to think differently about AI, no experience required! As much as I love tinkering with the details under the hood, reasoning through this situation doesn’t require us to go that deep. Even if you have no experience with IT, my hope is that you can walk away from this post able to apply similar logic to any AI scenario you encounter.
Alright, let’s get into it!
There are lots of different manifestations of AI, but the thing they all have in common is that AI solutions look for patterns in information. There are lots of different ways a computer can do this, most of them involving some form of fancy number crunching with statistical algorithms, but you probably didn’t come here for a math lesson, and for this post we don’t need to go any deeper than that.
When I refer to “information” above, I specifically mean data. AI solutions use lots of different attributes of data — also called “features” — to find patterns in historical data. So if we were to develop an AI solution to predict future weather, we’d use lots of different features like temperature, air pressure, precipitation, geographic location, and more. It’s not uncommon for an AI solution to look for patterns across 100+ features. (This is one reason AI tends to find patterns better than humans do: humans can’t process vast amounts of information like that.)
It’s by learning these patterns that AI solutions can make inferences about what is probabilistically most likely. So if you have a weather AI solution that uses only the daily weather condition as its basis and it sees 49 straight days of sunny weather, it’s going to infer that day 50 will likely also be sunny. A well-crafted AI solution can be fantastic across many different business contexts, but the key phrase there is “well crafted.” A poorly crafted AI solution can look like it will give good results from a statistical perspective but perform very poorly in reality.
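To make that sunny-streak inference concrete, here’s a toy sketch in Python (the function name and data are purely hypothetical, for illustration only) of the simplest possible pattern-based predictor: predict tomorrow’s condition as the most common condition in the history.

```python
from collections import Counter

def predict_next_day(history):
    """Naive pattern-based inference: predict tomorrow's weather as
    the condition seen most often in the historical data."""
    condition, _count = Counter(history).most_common(1)[0]
    return condition

# 49 straight days of sunny weather...
history = ["sunny"] * 49
# ...so the predictor infers day 50 will be sunny too.
print(predict_next_day(history))  # sunny
```

Real AI solutions use far richer statistics across many features, but the core idea is the same: infer the probabilistically likeliest outcome from historical patterns.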
This is what I’m getting at with the Warner Bros. AI deal, but before we jump back into that situation, let’s quickly touch on some of the ways an AI solution can be poorly crafted.
Where AI Can Go Wrong
As we touched on in the previous section, an AI solution is only as good as the person (or team) who crafted it. No AI solution is perfect, and it’s entirely possible to rig up a solution that checks out from a math perspective but totally bombs in the real world.
In most cases, AI solutions fail at the doorstep of the data being fed into them: garbage in, garbage out. For example, let’s say I faked the data for January weather in Illinois. January there is almost always cold, but suppose this faked data told an AI solution that it was hot all month. What would you expect the AI solution to infer about next January’s weather? It’s not a trick question: it’ll tell you to expect hot weather!
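Here’s the garbage-in, garbage-out problem in miniature. The temperatures below are fabricated on purpose — which is exactly the point:

```python
def predict_january_temp(historical_temps_f):
    """A trivial 'model': predict next January's temperature as the
    average of the historical January temperatures it was fed."""
    return sum(historical_temps_f) / len(historical_temps_f)

# Deliberately faked data: real Illinois Januarys are cold,
# but we tell the model it was hot all month.
faked_january_temps_f = [85, 88, 90, 87, 85]

print(predict_january_temp(faked_january_temps_f))  # 87.0 — expect a "hot" January!
```

The model did nothing wrong mathematically; it faithfully learned a pattern from data that was garbage to begin with.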
And you don’t even have to fake data to get poor results; using real data in poor ways will almost certainly do the same. Three specific ways real data can be misused in AI solutions include the following:
- Low-Dimensional Data: Low-dimensional data is another way of saying you don’t have very many features. In the weather example we’ve been using, this would be like creating a weather AI solution that uses only temperature as the basis for its predictions. Obviously, if you want a more accurate weather predictor, that AI solution will need to take LOTS of other factors into account!
- Biased Data: In all sorts of data endeavors, we generally sample small portions of data from a population, because collecting all the information about a population generally isn’t feasible. What you have to be careful about is ensuring your sample is a good representation of the population. So if you tried applying a weather-prediction AI to the entire globe but only trained it on data collected from Illinois, it would probably give pretty poor results for the large parts of the globe that don’t share our temperate climate. The AI solution is too biased toward Illinois weather.
- “Bubble” Patterned Data: Of the three mentioned here, this one is probably the most difficult to hurdle. By “bubble” patterned data, I mean things like the data leading into an economic bubble. For example, if I had created a stock-buying AI solution in the late 1990s, it probably would have told me that “.com” stocks were the smart buy. We know that bubble burst in 2000, and a lot of .com investors lost a lot of money. But you can’t really blame the AI solution here: it did its job in finding a pattern; it just couldn’t predict that the pattern would turn on a dime and fail. This is the hardest one for us humans to catch, too, because if we all knew when bubbles were coming, we’d all avoid them. That’s kind of the definition of a bubble, after all.
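The biased-data case from the list above can be sketched the same way (again, with hypothetical numbers): a model trained only on Illinois temperatures confidently mispredicts a place whose climate it has never seen.

```python
def train_mean_model(training_temps_f):
    """'Train' by memorizing the average temperature of the sample."""
    mean = sum(training_temps_f) / len(training_temps_f)
    return lambda: mean  # the model predicts this same mean everywhere

# Trained exclusively on Illinois July temperatures: a biased sample
illinois_july_f = [84, 86, 82, 88, 85]
model = train_mean_model(illinois_july_f)

# Sydney, Australia is in mid-winter in July (highs around 55 F),
# but the Illinois-trained model has no way to know that.
prediction = model()          # 85.0 F
error = abs(prediction - 55)  # off by about 30 F
print(f"Sydney prediction error: {error:.0f} F")
```

Nothing about the math is broken; the training sample simply doesn’t represent the population the model is being applied to.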
The actual implementation of an AI algorithm can also factor into how well a solution performs, but in most cases, bad data is the reason an AI solution performs poorly. For our purposes, this is enough to cover both the Warner Bros. use case and most other AI scenarios we’ll come across in the future.
Okay, I think we’re finally in a place to scrutinize Warner Bros.’s deal. You might be a little surprised that this really is all we need to know to analyze it. Is this really all there is to AI? Pretty much! Pop culture has made AI seem far more “magical” than it actually is, and even though we didn’t go into every nuance of the field, you now have enough information to form a decent opinion on most AI matters.
Applying Our New Logic to the Warner Bros. Deal
If you’ve been paying close attention so far, you can probably guess my “red flag” concern here: the data. I’m sure Cinelytic has crafted a very robust AI algorithm, but it ultimately lives or dies by the quality of the data. Let’s break down what we know from the article.
Let’s just look at “assess[ing] the value of a star in any territory.” That’s a very bold claim! At a minimum, you would need data in the following categories to back it up with any validity:
- Historical information about how a movie performed in EVERY territory
- Basic information about EVERY people group in every territory
Those two categories of data alone are extremely demanding, and I’m going to go out on a limb and guess that Warner Bros. doesn’t have this treasure trove of data at its fingertips. This would be a tough ask even for companies like Facebook or Google, which are much more privy to information like this.
But even if they do have this information, there are still many other factors that would come into play. Here are a few questions I can think of off the top of my head:
- How important was a star’s role in a particular movie? (Leading role? Supporting role? Award winning role??)
- What’s the impact of modern social change? (Because I highly doubt a movie like Blazing Saddles would fly today, with all its racial undertones.)
- How has a star’s public perception radically changed following the release of a specific movie or two? (I still recall when Matthew McConaughey was largely written off as a pretty face, before more recent roles in movies like Dallas Buyers Club.)
- How do you solve the “cold start” problem of an unknown star becoming an overnight sensation? (Which happens far more often than you’d think.)
I could go on, but the simple idea is this: neither Warner Bros. nor Cinelytic probably has information like this. And even if they did, we all know of many cases where a star-studded cast with an award-winning premise ultimately failed at the box office. (Looking at you, Cats!)
(Bad pun incoming: The “stars” would have to align in order for the data to produce consistently fruitful predictions. 😂)
I could be wrong, and for Warner Brothers’ sake, I hope I am. Maybe they really have solved this data conundrum. But as I was writing this post, I came across a tweet thread from AI expert Dr. Hannah Fry, who gives some more concrete details on why she thinks Warner Bros. is getting a raw deal here. Just like mine, her concerns largely revolve around data matters. With somebody much smarter than me validating my thoughts, I’m feeling pretty good about my intuition here.
And that brings us to the end of another post! I hope this quick case study helped you build new intuition for thinking about AI solutions. Data lies at the heart of every AI solution, so if you can reason through whether the data can plausibly support predictive patterns, you can intuit whether that AI solution is likely to work. Thanks for checking out this post! Catch you in the next one.