Build a Movie Recommendation System Project in Python with Source Code

By Faraz - July 19, 2024

Learn how to create a movie recommendation system in Python with this detailed step-by-step guide. Discover data preprocessing, feature extraction, and similarity computation techniques to build your own recommendation engine.


Creating a movie recommendation system involves several steps, including data preprocessing, feature extraction, and similarity computation. In this blog, we'll walk through how to build a simple movie recommendation system using Python. Our system will leverage movie metadata such as genres, keywords, cast, and crew to recommend movies similar to a given title.

Setting Up Your Environment

Installing Python

First, ensure you have Python installed on your system. You can download it from python.org. We recommend using Python 3.7 or later.

Necessary Libraries

You'll need several libraries to build your recommendation system:

  • Pandas: For data manipulation and analysis.
  • NumPy: For numerical operations.
  • Scikit-learn: For machine learning algorithms.
  • NLTK: For natural language processing.

Install these libraries using pip:

pip install numpy pandas nltk scikit-learn

Step by Step Movie Recommendation System in Python

1. Loading the Data

We will use two datasets: credits.csv and movies.csv. The credits.csv file contains information about the cast and crew of the movies, while the movies.csv file contains movie details such as title, overview, genres, and keywords.

import numpy as np
import pandas as pd
import nltk
import pickle
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import ast

# Load the datasets
credits = pd.read_csv('credits.csv')
movies = pd.read_csv('movies.csv')

# Merge datasets on movie title
movies = movies.merge(credits, on='title')
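movies = movies[['movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew']]

# Drop rows with missing values and renumber the rows so index labels match row positions
movies = movies.dropna().reset_index(drop=True)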
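
One caveat when merging on title: if two different films share a title, the merge pairs every matching row from both tables, which produces extra rows. The sketch below uses two made-up tables (the `id` and `movie_id` columns and all values are assumptions for illustration, not taken from the dataset) to show the effect and one possible way to filter the result.

```python
import pandas as pd

# Toy tables with made-up values; two different films share the title "Batman"
toy_movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Heat"],
})
toy_credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Heat"],
})

merged = toy_movies.merge(toy_credits, on="title")
print(len(merged))  # 5 rows: 2 x 2 for "Batman" plus 1 for "Heat"

# Keep only the rows where both sides describe the same film
matched = merged[merged["id"] == merged["movie_id"]]
print(len(matched))  # 3
```

If your files carry a shared numeric id, merging on that id instead of the title avoids the problem altogether.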

2. Data Preprocessing

We need to preprocess the data to make it suitable for our recommendation system:

  • Convert genre, keywords, and cast information from JSON strings to lists.
  • Extract the top 3 cast members and the director from the crew.
  • Remove the spaces inside each name (for example, "Science Fiction" becomes "ScienceFiction") so that a multi-word name is treated as a single token.

def convert(text):
    return [i['name'] for i in ast.literal_eval(text)]

movies['genres'] = movies['genres'].apply(convert)
movies['keywords'] = movies['keywords'].apply(convert)
movies['cast'] = movies['cast'].apply(convert).apply(lambda x: x[0:3])
movies['crew'] = movies['crew'].apply(lambda x: [i['name'] for i in ast.literal_eval(x) if i['job'] == 'Director'])

def collapse(L):
    return [i.replace(" ", "") for i in L]

movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)
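
To see what these helpers do to a single cell, here is a small standalone sketch. The sample string is a shortened genre value in the same shape as the dataset's columns; the function bodies mirror the ones above.

```python
import ast

def convert(text):
    # Parse the string into a list of dicts and keep each 'name' value
    return [item["name"] for item in ast.literal_eval(text)]

def collapse(names):
    # Remove spaces so each name becomes a single token
    return [name.replace(" ", "") for name in names]

sample = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
names = convert(sample)
print(names)            # ['Action', 'Science Fiction']
print(collapse(names))  # ['Action', 'ScienceFiction']
```

Without the collapse step, "Science" and "Fiction" would be counted as two unrelated words later on.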

We also need to process the overview text by splitting it into words and combining it with other features into a single "tags" column:

movies['overview'] = movies['overview'].apply(lambda x: x.split())
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

new = movies.drop(columns=['overview', 'genres', 'keywords', 'cast', 'crew'])
new['tags'] = new['tags'].apply(lambda x: " ".join(x))

3. Feature Extraction and Similarity Computation

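We first stem the tags with PorterStemmer so that word variants share one token, then use CountVectorizer to convert the text into numerical vectors and compute the cosine similarity between these vectors:

ps = PorterStemmer()

def stem(text):
    # Reduce each word to its stem, e.g. "loved" and "loving" both become "love"
    return " ".join([ps.stem(word) for word in text.split()])

# Stem before vectorizing; stemming afterwards would have no effect on the vectors
new['tags'] = new['tags'].apply(stem)

cv = CountVectorizer(max_features=5000, stop_words='english')
vector = cv.fit_transform(new['tags']).toarray()

similarity = cosine_similarity(vector)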
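
To make the idea concrete, the following sketch computes cosine similarity by hand on three made-up tag strings, using only the standard library. It is a simplified illustration of the word-count approach, not how scikit-learn implements it, and it skips stop-word removal and the `max_features` limit.

```python
import math
from collections import Counter

def cosine(a, b):
    # Count the words in each text (a simple bag-of-words vector)
    va, vb = Counter(a.split()), Counter(b.split())
    # Dot product over the words of the first text; missing words count as 0
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up tag strings for three hypothetical films
tags = {
    "A": "action hero gotham",
    "B": "action hero city",
    "C": "romance paris",
}
print(round(cosine(tags["A"], tags["B"]), 3))  # 0.667
print(cosine(tags["A"], tags["C"]))            # 0.0
```

Films A and B share two of their three words, so their score is 2/3, while A and C share none and score 0.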

4. Building the Recommendation Function

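Finally, we create a function that recommends movies for a given title. It finds the movie's row position, looks up that row of the precomputed similarity matrix, and returns the five most similar movies (or an empty list if the title is not in the dataset):

def recommend(movie):
    # Use the row position, not the index label, so it lines up with the similarity matrix
    matches = np.flatnonzero((new['title'] == movie).to_numpy())
    if len(matches) == 0:
        return []
    index = matches[0]
    movie_list = sorted(enumerate(similarity[index]), reverse=True, key=lambda x: x[1])
    # Skip the first entry, which is the movie itself
    return [new.iloc[i[0]].title for i in movie_list[1:6]]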

print(recommend('Batman Begins'))
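
The ranking step is easier to follow on a tiny example. The sketch below uses a hand-written 4 x 4 similarity matrix and hypothetical film titles (all values are made up for illustration) and applies the same sort-and-skip logic.

```python
# Hypothetical titles and similarity scores, chosen only for illustration
titles = ["Film A", "Film B", "Film C", "Film D"]
similarity = [
    [1.0, 0.2, 0.8, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.8, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

def top_similar(title, k=2):
    index = titles.index(title)
    # Pair each score with its position, then sort from most to least similar
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    # ranked[0] is the film itself (score 1.0), so skip it
    return [titles[i] for i, _ in ranked[1:k + 1]]

print(top_similar("Film A"))  # ['Film C', 'Film D']
```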

5. Saving the Model

To reuse the model without recomputing everything, we save the processed data and similarity matrix to disk:

# Use "with" so each file is closed as soon as it has been written
with open('movie_list.pkl', 'wb') as f:
    pickle.dump(new, f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
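
Loading the files back works the same way in reverse. The sketch below shows a save-and-load round trip with a small stand-in list in a temporary folder; in the real project you would load `movie_list.pkl` and `similarity.pkl` instead.

```python
import os
import pickle
import tempfile

# Stand-in object; in the article this would be the similarity matrix
scores = [[1.0, 0.3], [0.3, 1.0]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "similarity.pkl")
    with open(path, "wb") as f:
        pickle.dump(scores, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == scores)  # True
```

Only unpickle files that you created yourself or otherwise trust, since loading a pickle can run arbitrary code.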

Full Movie Recommendation System Project's Source Code

import numpy as np
import pandas as pd
import nltk
import pickle
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import ast

# Load the datasets
credits = pd.read_csv('credits.csv')
movies = pd.read_csv('movies.csv')

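# Merge datasets on movie title and keep only the columns we need
movies = movies.merge(credits, on='title')
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]

# Drop rows with missing values and renumber the rows so index labels match row positions
movies = movies.dropna().reset_index(drop=True)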

def convert(text):
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

movies['genres'] = movies['genres'].apply(convert)

movies['keywords'] = movies['keywords'].apply(convert)

# Keep only the first three cast members
movies['cast'] = movies['cast'].apply(convert)
movies['cast'] = movies['cast'].apply(lambda x:x[0:3])

def fetch_director(text):
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

movies['crew'] = movies['crew'].apply(fetch_director)


def collapse(L):
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)

movies['overview'] = movies['overview'].apply(lambda x:x.split())

movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

new = movies.drop(columns=['overview','genres','keywords','cast','crew'])
new['tags'] = new['tags'].apply(lambda x: " ".join(x))


ps = PorterStemmer()

def stem(text):
    y = []
    for i in text.split():
        y.append(ps.stem(i))
    return " ".join(y)

# Stem the tags first so the vectors are built from the stemmed text
new['tags'] = new['tags'].apply(stem)

cv = CountVectorizer(max_features=5000,stop_words='english')
vector = cv.fit_transform(new['tags']).toarray()

similarity = cosine_similarity(vector)

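def recommend(movie):
    # Use the row position, not the index label, so it lines up with the similarity matrix
    matches = np.flatnonzero((new['title'] == movie).to_numpy())
    if len(matches) == 0:
        return []
    movie_list = sorted(enumerate(similarity[matches[0]]), reverse=True, key=lambda x: x[1])
    # Return the titles so that print() shows the list instead of None
    return [new.iloc[i[0]].title for i in movie_list[1:6]]

print(recommend('Batman Begins'))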

# Use "with" so each file is closed as soon as it has been written
with open('movie_list.pkl','wb') as f:
    pickle.dump(new, f)
with open('similarity.pkl','wb') as f:
    pickle.dump(similarity, f)

Conclusion

You've now built a basic movie recommendation system using Python. This system processes movie metadata, calculates similarities, and provides recommendations based on content. You can further enhance this system by incorporating user ratings, more advanced machine learning techniques, or additional features for better accuracy.

Feel free to experiment with the code and customize it according to your needs.

Code by: Kanishka Sah

Demo Here: https://movie-recommender-system-ml-ca6n1lthfcd-kanishka.streamlit.app/

