Computing Bacon Numbers
I spent the past week doing Google Foobar exercises, and that was really fun (and really addictive). I managed to stop doing that to start working on a fun project: calculating a given person’s Bacon number.
First version
My first idea was to parse IMDb pages to extract the information I would need. I discovered Beautiful Soup, which is amazing and super easy to use. To find out what movies an actor was in all I needed to do was search for /title/ in that actor’s IMDb page, and to find what actors were in a movie I searched for /name/, and it works! After 2 days of working in this version, I had a Command Line Interface that could tell if an actor’s Bacon number was 1 or greater.
But this version was just too slow to do anything other than telling if someone did a movie with Kevin Bacon or not.
Final version
Turns out IMDb lets a subset of its data available as plain text files, and everything I needed was in there. So I could create my graph by parsing text files. At first I was trying to have a graph a just the actors, and 2 actors were connected if they’d done a movie together. To do that I would have to look at every pair of actors and connect them if they have a movie in common. But I don’t need the graph to be this way in order to calculate someone’s Bacon number. If I have a graph in which an actor is connected to every movie he made and a movie is connected to every actor in it, the shortest path between an actor and Kevin Bacon will be twice that actor’s Bacon number. And I can generate this graph in a couple minutes.
Finding the shortest path
So now all I have to do is find the shortest path between 2 nodes in a graph. Since all edges have the same weight, I can solve this with a Breadth-First Search:
So if I pass to BFS my graph with all the actors and movies and source = Bacon, I get a dictionary:
And a someone’s Bacon number is just:
The final version of my code is here
tags