
MLflow 101

Getting your parameters, metrics, artifacts, and more logged to an MLflow tracking server

Hey there, friends, and welcome back to another post in our series on MLflow. If this is the first post you’ve seen and would like to catch up, be sure to check out the previous posts here:

As always, if you would like to see the code mentioned in this post, please be sure to check out my GitHub repo here.

This latest post is going to build right on top of part 2, so please do check that out if you missed it. Just to quickly recap what we did in that post, we deployed an MLflow tracking server to Kubernetes with Minikube on our local machines. Behind the scenes, the MLflow tracking server is supported by a Postgres metadata store and an AWS S3-like artifact store called Minio. That post was quite meaty, so I’m happy to share this one is much simpler by comparison. …



Hello there, friends! It’s been a while since I’ve written a more business-oriented post given that I’ve been focused on more data science / machine learning-related stuff. But as I begin to type this, it’s 2:00am on a Thursday, and I woke up randomly inspired to write this post. (Can’t remember what I was dreaming about, but it must have been along these lines!)

One of the areas of life I’m most passionate about is persuasion psychology. In my own words, persuasion psychology is the idea that humans are deeply irrational beings and thus behave in ways that are irrational yet oddly predictable. For example, if you’re in line at Starbucks and the five people in front of you “pay it forward” by paying for the coffee of the person behind them, how much more inclined are you to follow suit? …



MLflow 101

Creating a point for logging and tracking model artifacts in a single server running on Minikube

10/15/20 Update: In writing my next post in this series, I found several bugs that prevented me from appropriately deploying to Minikube. To that end, I’ve updated a number of things to get you up and going with a WORKING instance! 😃

Welcome back, friends! We’re back with our continued mini-series on MLflow. In case you missed part one, be sure to check it out here. The first post was a super basic introduction to logging basic parameters, metrics, and artifacts with MLflow. That post just had us logging those items to a spot on our local machines, which is not an ideal practice. In a company context, you ideally want all those things logged to a central, reusable location. That’s what we’ll be tackling in today’s post! …



MLflow 101

Helping you take your first step into the machine learning lifecycle flow with this handy tool

Hello again, friends! We’re back here with another quick tip, and because I do attempt to keep these posts quick, this is actually going to be part one in a series of tips related to MLflow. In the spirit of full transparency, MLflow is pretty new to me, so I’m going to be learning things alongside you all over the next few weeks. If you’d like to follow along with my code, check out this link to my corresponding GitHub repository.

I’m sure the first question on your mind is: what is MLflow? Simply put, MLflow is an open source platform designed to help streamline the machine learning lifecycle. Again, I’m still learning all it does, but it seems to offer a lot of promising features that I’m excited to explore in future posts, ranging from a model registry to easy deployment of models as APIs, and more. I honestly don’t know how long this sub-series will go, but I imagine we’re going to get a lot out of this neat tool! …



Ensuring your ML-serving API can handle the expected performance load when used in production

Hello again, friends! Welcome back to another data science quick tip. Now, when it comes to the full spectrum of data science (discovery to production), this post definitely falls toward the end of the spectrum. In fact, some companies might recognize this as the job of a machine learning engineer rather than a data scientist. As a machine learning engineer myself, I can verify that’s definitely true for my situation.

Still, I’m sure there are many data scientists out there who are responsible for deploying their own machine learning models, and this post will hopefully shed some light on how to do easy performance testing with this neat tool called Locust. …



Helping you to demystify what some people might perceive as a “black box” for your machine learning models

Hello there, all! Welcome back again to another data science quick tip. This particular post is most interesting for me not only because it covers the most complex subject we’ve tackled to date, but also because it’s one I just spent the last few hours learning myself. And of course, what better way to learn than to figure out how to teach it to the masses?

Before getting into it, I’ve uploaded all the work shown in this post to a single Jupyter notebook. You can find it at my personal GitHub if you’d like to follow along more closely.

So even though this is a very complex topic behind the scenes, I’m going to intentionally dial it down as much as possible for the widest possible audience. …



Teaching you two different correct ways to perform one-hot encoding (and one “wrong” way)

Hello, hello everybody! I hope you all are enjoying a nice Labor Day weekend. I personally took Friday off as well to extend my weekend into four days, and I had a lot of great quality time with my daughters these last few days. But you know me, I’m always itching to keep producing in some capacity!

Before we get into the post, please be sure to reference my personal GitHub for all the code we’ll get into below. It’s pretty simple, but if you’d like to follow along in a concise Jupyter notebook, then I’d encourage you to check that out. …
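This preview doesn't show which two "correct" approaches the post uses, so the sketch below is my own assumption about two commonly recommended ones: pandas `get_dummies` for quick analysis, and Scikit-Learn's `OneHotEncoder` for anything headed to production. (The oft-cited "wrong" way is assigning arbitrary integer codes to a nominal feature, which imposes an ordering that isn't really there — but whether that's the post's "wrong" way is my guess.)

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# Correct way #1: pandas get_dummies — quick and convenient for one-off work.
dummies = pd.get_dummies(df["color"], prefix="color")

# Correct way #2: Scikit-Learn's OneHotEncoder — it remembers the categories
# it was fit on, so new data at inference time gets encoded consistently.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df[["color"]]).toarray()
```

The practical difference: `get_dummies` re-derives columns from whatever data it sees, while a fitted `OneHotEncoder` can be pickled and reused so training and serving stay aligned.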



Learn how to make use of custom data transformers within the same Scikit-Learn pipeline

Hi there, all. We’re back again with a follow-up post to the last post’s tip on how to create Scikit-Learn pipelines in general. In case you missed that, you can now check it out at this link. (It is now officially published to Towards Data Science. w00t!) And as always, if you want to directly follow along with this post’s code, you can find it here at my personal GitHub.

To quickly recap where we left off in the last post, we had successfully created a Scikit-Learn pipeline that does all the data transformation, scaling, and inference in one clean little package. But so far, we’ve only made use of Scikit-Learn’s default transformers within our pipeline. As great as those transformers are, wouldn’t it be great if we could make use of our own custom transformations? Well, of course! I’d say it’s not only great but also necessary. If you recall from last week’s post, we built a model based on a single feature. …
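The standard way to make a custom transformer pipeline-compatible is to subclass `BaseEstimator` and `TransformerMixin` and implement `fit` and `transform`. Here is a minimal sketch of my own — the log transform is just a stand-in for whatever custom logic the post actually builds:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

class Log1pTransformer(BaseEstimator, TransformerMixin):
    """Applies log(1 + x) — a placeholder for your own custom logic."""

    def fit(self, X, y=None):
        return self  # nothing to learn for this stateless transform

    def transform(self, X):
        return np.log1p(X)

# The custom transformer drops into a Pipeline like any built-in one.
pipeline = Pipeline([
    ("log", Log1pTransformer()),
    ("model", LinearRegression()),
])

X = np.array([[1.0], [10.0], [100.0], [1000.0]])
y = np.array([0.5, 1.0, 1.5, 2.0])
pipeline.fit(X, y)
```

Because the mixin supplies `fit_transform` for free, the class slots into `Pipeline` exactly like Scikit-Learn's own transformers.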



As you probably recall, the 2008 recession that largely started on Wall Street had ripple effects felt by everybody across the nation for years to come. I personally had a pretty difficult time finding a job out of undergrad in 2012, and I recall my dad — a self-employed HVAC contractor — noting that several big clients who typically threw a lot of work his way had paused many of their major plans. (Fortunately, that all has since bounced back in his favor.)

Without going into too much depth on the failures of Wall Street, the general idea is that top executives placed too much faith in their very elaborate predictive models. If you’re not familiar with predictive models, the idea is that you can shove a lot of data about a current event into a fancy-pants mathematical equation and obtain a probabilistic, inferential result on the other side. These predictive models are created by observing past patterns in historical data and largely assuming that those patterns will carry on into the future. …
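That "learn the past, assume it continues" idea can be boiled down to a toy sketch (my own illustration with made-up numbers, not anything from the post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# "Historical" observations: the model learns the past pattern...
X_past = np.array([[1], [2], [3], [4]])
y_past = np.array([10.0, 20.0, 30.0, 40.0])

model = LinearRegression().fit(X_past, y_past)

# ...and assumes that pattern carries forward into the future. That
# assumption is exactly what breaks when conditions change, as in 2008.
future = model.predict(np.array([[5]]))
```

The model confidently extrapolates the straight line it saw — which is precisely the fragility being described above.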



Teaching you how to conjoin appropriate data transformers and a predictive algorithm in a nice, clean manner

Hello again, lovely people! We’re back this week with another data science quick tip, and this one is sort of a two-parter. In this first part, we’ll be covering how to use Scikit-Learn pipelines with Scikit-Learn’s barebones transformers, and in the next part, I’ll teach you how to use your own custom data transformers within this same pipeline framework. (Stay tuned for that post!)

Before getting into things, let me share my GitHub for this post in case you want to follow along more closely. I’ve also included the data we’ll be working with as well. …
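The "conjoin transformers and an algorithm" idea from the subtitle looks roughly like the sketch below. This is my own minimal illustration, not the post's code — the particular transformers, model, and tiny made-up dataset are all placeholders:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Chain imputation, scaling, and a classifier into one clean object.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# Tiny made-up dataset, including a missing value for the imputer to fill.
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([0, 0, 1, 1])
pipeline.fit(X, y)
```

One `fit` call runs every step in order, and one `predict` call applies the same transformations to new data — which is exactly the "nice, clean manner" the subtitle promises.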

About

David Hundley

Machine learning engineer by day, spiritual explorer by night.
