The Recommendation Engine

Zachary Hester
3 min read · Apr 18, 2021

So how does Spotify know to suggest Britney to me when I’m in my car? Or how does my phone recognize that at 6 pm, when I’m standing in my kitchen, I’m most likely to use Audible or the Podcasts application?

Recommendation engines, for better or worse, have clearly proven their staying power in the tangled, close-knit relationship most of us have with the internet. This is especially true on social media apps, where the ability to understand and predict a user’s next move ties directly into a company’s ability to monetize that product (the product being us, not their app). I don’t mean to suggest recommendation engines are inherently nefarious; I only mean to suggest their inevitability. Cue Agent Smith.

There are two fairly straightforward approaches you can use to build your own recommendation engine. We will delve briefly into each, but at their core, one depends on what a pool of users has in common, while the other looks at the attributes of products a user has previously chosen (be it genres of music or director preference; you can sub in almost any example you want). It’s important to choose the right one for your particular project.

First, let’s jump into the Jaccard index. It measures how much two sets overlap, here the likes and dislikes of two people, and produces a score ranging from 0 to 1 (TopTal). Many notable examples online reference the use of recommendation engines when scrolling for movies to watch (especially useful in this time of quarantine). At the heart of the Jaccard approach is a relatively simple and elegant scoring system: “the users, from the system’s point of view, are three things: an identifier, a set of liked movies, and a set of disliked movies.”
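To make the scoring concrete, here is a minimal sketch of the plain Jaccard index in Python: the size of the intersection of two sets divided by the size of their union. The function name, the user names, and the movie lists are made up for illustration, and this version only compares liked movies; it leaves out the disliked set mentioned in the quote above.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|, a score between 0 and 1."""
    union = a | b
    if not union:
        # Two empty sets share nothing we can measure, so call it 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical liked-movie sets for two users.
alice_likes = {"Chicago", "Jerry Maguire", "Cold Mountain"}
bob_likes = {"Chicago", "Cold Mountain", "The Matrix"}

# Two shared movies out of four distinct movies overall.
print(jaccard(alice_likes, bob_likes))  # 0.5
```

A score of 1 means the two users liked exactly the same movies, and 0 means they have nothing in common.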

As you can imagine, this route depends on a large pool of users. With so few variables per user, the users themselves become the metric by which a recommendation is offered. It doesn’t necessarily matter if the movies you liked all had Renée Zellweger at the helm; what matters is that you are a User and that your likes and dislikes are being compared directly to another User’s. This creates a compatibility link between you and other users, which makes the approach ideal for larger companies like Twitter or Facebook.
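One way that compatibility link could turn into actual suggestions is sketched below. This is an assumption about how you might wire it up, not a description of what any particular company does: every movie the target user hasn’t seen collects the similarity scores of the users who liked it, and the highest totals win. The function names and the sample data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_similar_users(target, likes_by_user, top_n=3):
    """Rank unseen movies by the summed similarity of the users who liked them."""
    target_likes = likes_by_user[target]
    scores = {}
    for other, other_likes in likes_by_user.items():
        if other == target:
            continue
        similarity = jaccard(target_likes, other_likes)
        if similarity == 0:
            # Users with nothing in common contribute no suggestions.
            continue
        for movie in other_likes - target_likes:
            scores[movie] = scores.get(movie, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [movie for movie, _ in ranked[:top_n]]


# Hypothetical data: user_a overlaps with "you", user_b does not.
likes_by_user = {
    "you": {"Chicago", "Cold Mountain"},
    "user_a": {"Chicago", "Cold Mountain", "Jerry Maguire"},
    "user_b": {"The Matrix", "Speed"},
}

print(recommend_from_similar_users("you", likes_by_user))  # ['Jerry Maguire']
```

Notice that nothing about the movies themselves is consulted here. The only signal is who else liked what, which is why a small user pool leaves the engine with so little to work with.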

The accuracy of the recommendations should grow over time as the user pool grows. If, on the other hand, you’re dealing with only a few dozen users who have liked or disliked only a few dozen products, the recommendations may not be that specific to the user. This is where your second option comes in.

Say you’re a startup and you don’t have access to millions of users. It may be more beneficial to log the actual likes and dislikes and then try to assess what, specifically, the user liked in those movies. What are the commonalities? Oh, this is your third Renée Zellweger film this week? Okay, maybe tonight you’re in the mood for a little Bridget Jones: The Edge of Reason.

By extracting data from the movies (or albums or recipes or whatever) like genre, director, year of release, etc., the recommendation engine is able to build a set of movies that are like those you have previously watched.
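Here is a rough sketch of that idea, assuming each movie is stored as a dictionary with a hypothetical "features" field. The engine counts how often each attribute value shows up in the watch history, then scores unseen movies by how many of those familiar attributes they share. The field names, the scoring rule, and the sample catalog are all illustrative choices, not a standard.

```python
from collections import Counter


def build_profile(watched):
    """Count how often each (attribute, value) pair appears in the watch history."""
    profile = Counter()
    for movie in watched:
        for attribute, value in movie["features"].items():
            profile[(attribute, value)] += 1
    return profile


def score(candidate, profile):
    """Sum the profile counts for every attribute value the candidate has."""
    # Counter returns 0 for pairs that never appeared in the history.
    return sum(profile[(a, v)] for a, v in candidate["features"].items())


# Hypothetical watch history and candidate list.
watched = [
    {"title": "Bridget Jones's Diary",
     "features": {"lead": "Renée Zellweger", "genre": "romantic comedy"}},
    {"title": "Chicago",
     "features": {"lead": "Renée Zellweger", "genre": "musical"}},
]
candidates = [
    {"title": "The Matrix",
     "features": {"lead": "Keanu Reeves", "genre": "science fiction"}},
    {"title": "Bridget Jones: The Edge of Reason",
     "features": {"lead": "Renée Zellweger", "genre": "romantic comedy"}},
]

profile = build_profile(watched)
ranked = sorted(candidates, key=lambda movie: score(movie, profile), reverse=True)
print([movie["title"] for movie in ranked])
```

This needs only one user’s history to work, which is the appeal for a small user base. It also shows the weakness: a movie that shares no attributes with your history scores zero and never surfaces.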

One concern when going down this rabbit hole is that it may lead to a somewhat pigeonholed recommendation list, but it still seems preferable to the Jaccard method when working with a smaller pool of users.

Our data is valuable. Again, that is not necessarily nefarious, and there’s nothing new or radical about this realization. Our data is becoming increasingly valuable as businesses and software developers are able to predict what we will like and dislike before we ever make the decision ourselves. Not long ago, a teacher of mine discussed the requirement for ethics classes in professions like medicine and law; he lamented that this doesn’t quite yet exist in software engineering. I don’t think it’s too far off. At least I hope not.
