The Data Scientist’s Greatest Ally

David Hundley
6 min read · Feb 19, 2021

--

I’m thinking about putting another bathroom in my basement.

I currently live in a nice but relatively modest house. Four bedrooms, two and a half bathrooms, and a mostly finished basement. Three of the bedrooms and both full bathrooms are on the second floor of the house, and the fourth bedroom (which we use as a guest bedroom) is two levels below in the basement. As you can guess, this is less than convenient for guests who want to shower in private and on their own schedule. Fortunately, there is a rough-in for a full bathroom in the basement, which would totally alleviate this issue. (Not to mention add to the value of the home overall!)

Fortunately, I have a super handy dad who will eventually help me do this, but let’s say I decide to hire it all out to contractors. I know almost nothing about plumbing, so I’d definitely need a plumber to install the shower, toilet, and sink. Let’s say this plumber arrives for the first time with his tools in hand, and right before I walk out the door to go to work, I tell the plumber, “Hey thanks for doing this! The bathroom goes in the basement. Have a great day!”

How much do you think the plumber is going to get done?

The answer is pretty much nothing. Even though this person is a master of their craft, they’re reliant on me (or somebody I hire out as an architect) to share the plans of what I desire out of the bathroom. For all this plumber knows, I could want a standing-only shower instead of a shower-bath combo. And let’s not forget the fact that I left this poor plumber no materials. It’s kind of hard to install a sink if the sink isn’t there to install! The most this plumber can do is take measurements for things and provide me as the owner with questions / options of what to do next.

I start with this analogy because I think it’s an easy one to grasp and translates well into the world of data science. The trouble I’ve observed about data science is that people in general think that it’s so complex a topic that there’s no way the average person can even begin to comprehend what data science is. Don’t get me wrong, data science and artificial intelligence can be very complex topics when you get into the weeds, but the good news is that anybody can learn these topics at a general level.

Think back on our analogy of my basement bathroom. I admittedly know very little about actual plumbing installation, but I definitely understand why plumbing is necessary. I also have a very general idea of how plumbing works (e.g. clean water flows in; used water drains away). With this rudimentary level of knowledge, I am at least able to make educated decisions alongside the expert — our plumber — in a more than reasonable capacity. I can tell the plumber, “I’d prefer a standing-only shower, and I’d like if it could go in this corner over here.” The plumber can then inform me if that’s even possible, and we can continue our back and forth until we settle on a good plan that the plumber can execute on.

My reason for writing this post in the first place is that as a machine learning engineer, I’m often asked the question, “Is data science right for my scenario?” These well-meaning folks have a sense that data science or artificial intelligence is a “magic bullet” that can solve any problem. Although we can apply data science to many scenarios, it isn’t right at all times. It makes sense for me to inquire about installing a bathroom because I have a rough-in for one in my basement. If I lived in a rented apartment with very fixed square footage, there’s no way I’d even pick up the phone to call a plumber. In that scenario, it’s just not necessary. (Or possible, unless you have a super accommodating, super friendly landlord.)

In some cases, people will hire data scientists to help solve business issues with a sense that there is a lot of opportunity given how much data the business generates. But trouble sneaks in when the data scientist is disconnected from the business and asked to build great solutions from the data alone. This is akin to me leaving the house and letting the plumber sort things out for himself. A data scientist can only get so far on the data alone.

So who is the data scientist’s greatest ally?

It’s the folks who understand the nature of the business.

Behind all the numbers and the algorithms that comprise data science, the core reason we value it so much is that we can make well-informed decisions to help the future of the business. Data science in practical application is simply looking for trends in business practices and using that trend analysis to make better (or fully automated) business decisions. I think anybody can understand that, right?

It can be pretty difficult to understand data without understanding the semantics behind the data. Let’s illustrate this with another concrete example. Let’s pretend a data scientist is hired to work for a clothing website that sells a certain style of clothing. When the data scientist goes to look at the data, they might encounter data attributes like…

  • Clothing trim
  • Fabric type
  • Fashion pattern
  • Price

Unless the data scientist happens to be a fashionista, the data scientist is likely to only understand that “Price” variable. The data scientist can do a bit of number crunching to find some trends that look plausible, but the data scientist by themselves ultimately does not know if the predictive model they created is of any value.

In this case, it would be extremely helpful if one of the company’s fashion consultants could assist the data scientist in making heads or tails of the trends the data scientist is seeing. The fashion consultant will have that extra knowledge of things like, “Yeah, our customers really like our silk items, and they really like to buy them during the spring.” (I clearly have no idea what I’m talking about. I’m currently wearing a wolf t-shirt and black athletic shorts.)
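To make that a bit more concrete, here is a minimal Python sketch of how a hint like the consultant's can change what the data scientist sees. Everything in it is an assumption made up for illustration: the order records, the field names (`fabric`, `month`, `units`), the helper functions, and the Northern Hemisphere season mapping. It is not real sales data or anyone's actual method.

```python
from collections import defaultdict

# Hypothetical order records; every value is invented for illustration.
ORDERS = [
    {"fabric": "silk", "month": 4, "units": 30},
    {"fabric": "silk", "month": 5, "units": 25},
    {"fabric": "silk", "month": 11, "units": 5},
    {"fabric": "cotton", "month": 1, "units": 20},
    {"fabric": "cotton", "month": 4, "units": 20},
    {"fabric": "cotton", "month": 7, "units": 20},
]


def season_of(month):
    """Map a month number (1-12) to a season (Northern Hemisphere assumed)."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "fall"
    return "winter"


def units_by_fabric(orders):
    """Total units per fabric: the view available from the data alone."""
    totals = defaultdict(int)
    for order in orders:
        totals[order["fabric"]] += order["units"]
    return dict(totals)


def units_by_fabric_and_season(orders):
    """Total units per (fabric, season): the split the consultant suggested."""
    totals = defaultdict(int)
    for order in orders:
        key = (order["fabric"], season_of(order["month"]))
        totals[key] += order["units"]
    return dict(totals)


if __name__ == "__main__":
    print(units_by_fabric(ORDERS))
    print(units_by_fabric_and_season(ORDERS))
```

In this made-up data, silk and cotton look identical when you only total units per fabric. Once the consultant suggests looking at seasons, the same records show silk selling mostly in the spring. The number crunching is trivial; knowing which question to ask is the part the business expert brings.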

With the assistance of a subject matter expert, a data scientist can take a much more targeted and productive approach to their data science solution. A data scientist left to their own devices might produce some insights, but it will likely take them a lot longer if they have to do extra research to understand the nature of the data. Frankly, that’s not why you hire a data scientist. It just isn’t the best use of their time to become, for example, a fashionista themselves.

I can promise you that when there is collaborative synergy between data scientists and the business matter experts, they can come up with quite a few very useful solutions. I actually know this from direct experience. In my current day job, I work in a group with data scientists who have a very strong connection with very competent folks on the business side, and that connection produces a lot of extremely valuable solutions. It’s the perfect template of what an ideal data science partnership looks like. (Kudos to all the folks I work with. You know who you are!)

That pretty much wraps up this post. I hope that this can help you make informed decisions alongside your fellow data scientists even if the algorithms and numbers seem nebulous to you. Leave all that silly number crunching to the data scientists! That’s what you hired them for. But you, friend, are still a great asset in this process. You and your expertise are indeed a data scientist’s greatest ally.

--

David Hundley

Principal machine learning engineer at a Fortune 50 company, 5x AWS certified, 2x HashiCorp certified, 1x GCP certified, M.A. in Org Leadership, PMP, ChFC, CSM