That new artist you just favorited, that new genre you just discovered, and that new song you are saving for a rainy day are all products of music discovery algorithms. Although we are perfectly capable of curating our own playlists, searching for new artists and bands, and finding new genres to enjoy, sometimes we just don’t have the time or don’t feel like thumbing through hundreds of thousands of songs. This is where your music streaming app becomes your best friend: it can turn a bad day into a bearable one and a boring transit ride into a private concert, all with the push of a button. Welcome to automated music curation, a product of big data, machine learning, and artificial intelligence. Yes, this is how your music streaming app knows you so well; let’s explore how recommendation models work.
Generally speaking, there are three types of music recommendation models, each built on a different type of analysis. The first is collaborative filtering models, which look at a user’s behavior on the music streaming app as well as the behavior of other users. The second is natural language processing (NLP) models, which analyze text, and the third is audio models, which analyze the raw audio tracks themselves. In most cases, music streaming applications use a combination of all three, as together they make a uniquely powerful discovery engine.
The most common example of collaborative filtering is the star-based movie ratings you give on Netflix. They provide you with a rough sense of which movies you may like based on your previous ratings, and they provide Netflix with the ability to recommend movies and television shows based on what similar users have enjoyed. For music applications, collaborative filtering is instead based on implicit feedback data, meaning your streaming activity itself is counted. This includes how many times a track is streamed, plus additional signals such as whether a song is saved to a playlist or whether a user visits an artist’s page after listening to one of their songs. That sounds great in theory, but how does it actually work?
Suppose one user has a set of track preferences P, R, Q, and T, while another user’s preferences are R, F, P, and Q. The collaborative data shows that both users like P, Q, and R, so they are probably quite similar in what they enjoy. Taking this a step further, each user will probably like what the other listens to, so each should check out the one track not already on their own list: F for the first user, T for the second. Now the question is, how does a music streaming app do this for millions of preferences? With matrix math. Essentially, factorizing the listening data produces two kinds of vectors: an X vector for each user and a Y vector for each song. Comparing user vectors tells you which users have similar music tastes, and comparing song vectors tells you which songs are similar to the one you are currently looking at.
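The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production recommender: the play counts are made up, and a truncated SVD stands in for the large-scale factorization (typically alternating least squares) that streaming services actually run on implicit feedback.

```python
import numpy as np

# Hypothetical play-count matrix: rows are users, columns are songs
# P, Q, R, T, F. User 0 likes P, Q, R, T; user 1 likes P, Q, R, F,
# matching the example above; user 2 has different taste.
plays = np.array([
    [5, 4, 6, 3, 0],   # user 0
    [4, 5, 7, 0, 2],   # user 1
    [0, 0, 1, 6, 5],   # user 2
], dtype=float)

# Factor the matrix into user vectors (X) and song vectors (Y).
k = 2  # number of latent dimensions
U, s, Vt = np.linalg.svd(plays, full_matrices=False)
X = U[:, :k] * s[:k]   # one k-dimensional vector per user
Y = Vt[:k, :].T        # one k-dimensional vector per song

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Users 0 and 1 share P, Q, and R, so their vectors land close together,
# while user 2 overlaps far less with user 0.
print(round(cosine(X[0], X[1]), 3))
print(round(cosine(X[0], X[2]), 3))
```

Once the vectors exist, recommending track F to user 0 amounts to noticing that the most similar user vector belongs to someone who streams F.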
The second type of music recommendation comes from natural language processing, which is sourced from text data. This can include news articles, blogs, other text across the internet, and even metadata. Your music streaming application crawls the web, constantly looking for written text about music, such as blog posts or reviews, and figures out what people are saying about those songs and artists. Because natural language processing allows a computer to interpret human language, it can identify which adjectives are used most frequently in reference to the songs and artists in question. This data is then processed into cultural vectors and top terms, with each term given a weight corresponding to its importance: essentially, the probability that someone would use that specific term to describe a song, band, genre, or artist. If two pieces of music share high-probability terms, they are likely to be categorized as similar.
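A toy version of this term-weighting idea can be written in plain Python. The snippets of "crawled" text and the track names below are invented for illustration, and the weighting shown is a simple tf-idf, one common way to turn descriptive terms into the kind of weighted vectors described above.

```python
import math
from collections import Counter

# Hypothetical snippets of web text about three tracks, standing in for
# the blog posts and reviews a streaming service would crawl.
docs = {
    "track_a": "dreamy hazy shoegaze guitars dreamy reverb",
    "track_b": "dreamy ambient hazy synth textures",
    "track_c": "aggressive fast thrash riffs aggressive",
}

def tf_idf(docs):
    """Weight each term by how often it describes a track (tf) and how
    rare it is across all tracks (idf), yielding one weighted term
    vector per track."""
    n = len(docs)
    df = Counter()  # in how many documents each term appears
    for text in docs.values():
        df.update(set(text.split()))
    vectors = {}
    for track, text in docs.items():
        tf = Counter(text.split())
        vectors[track] = {term: count * math.log(n / df[term])
                          for term, count in tf.items()}
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tf_idf(docs)
# Tracks described with the same adjectives ("dreamy", "hazy") score as
# similar; tracks with no shared vocabulary score zero.
print(round(cosine(vecs["track_a"], vecs["track_b"]), 3))
print(round(cosine(vecs["track_a"], vecs["track_c"]), 3))
```

Real systems use far richer language models than word counts, but the principle is the same: songs described with the same vocabulary end up near each other.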
The third type of music recommendation comes from analyzing the raw audio tracks themselves. It may seem unnecessary if your music streaming app already has the first two, but this model improves the accuracy of recommendations by taking both old and new songs into account. For example, a new song might arrive on the platform with only 50-100 listens, giving collaborative filtering almost nothing to work with; yet it could still end up on a discovery playlist alongside popular songs. This is because raw audio models do not discriminate between new and popular songs, which matters especially when natural language processing hasn’t yet picked the track up in text online. So how do raw audio tracks get analyzed? Through convolutional neural networks applied to spectrograms: time-frequency representations of the audio. The network’s convolutional layers, which look like “thick and thin” bars, slide across frames of the spectrogram, and after passing through each layer, the model learns statistics computed across the duration of the song, or in layman’s terms, the song’s features. These can include time signature, mode, tempo, loudness, and key. Your music streaming application can then understand the fundamental similarities between songs and recommend them based on those audio features.
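The spectrogram that a CNN slides over can be computed with a short-time Fourier transform. The sketch below builds one in NumPy from a synthetic 440 Hz test tone (a stand-in for decoded audio samples); the frame size and hop length are arbitrary illustrative choices.

```python
import numpy as np

# A one-second 440 Hz sine wave, standing in for a decoded audio track.
sr = 16000                       # sample rate in Hz
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

def spectrogram(signal, n_fft=512, hop=256):
    """Slice the signal into overlapping windowed frames and take the
    magnitude of each frame's FFT, producing the time-frequency grid a
    CNN's convolutional layers slide over."""
    window = np.hanning(n_fft)
    frames = [signal[start:start + n_fft] * window
              for start in range(0, len(signal) - n_fft + 1, hop)]
    # Rows are time frames; columns are frequency bins up to Nyquist.
    return np.abs(np.fft.rfft(frames, axis=1))

spec = spectrogram(signal)
print(spec.shape)  # (number of frames, n_fft // 2 + 1 frequency bins)

# The tone shows up as one bright frequency band; mapping the loudest
# bin back to Hz recovers roughly 440.
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin * sr / 512)
```

A real pipeline would typically convert this to a mel-scaled, log-magnitude spectrogram before feeding it to the network, but the input is the same kind of image-like grid.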
When it comes down to it, your music streaming application knows you because of the massive amount of data it stores and analyzes. To work correctly, though, the audio files, matrices, and text must all be analyzed, applied, and updated in real time through machine learning processes. Yes, a form of artificial intelligence powers that perfect recommended playlist you tap into on a daily basis.