Build a Movie Recommendation System Project in Python with Source Code

By Faraz - Last Updated: July 19, 2024

Learn how to create a movie recommendation system in Python with this detailed step-by-step guide. Discover data preprocessing, feature extraction, and similarity computation techniques to build your own recommendation engine.




Creating a movie recommendation system involves several steps, including data preprocessing, feature extraction, and similarity computation. In this blog, we'll walk through how to build a simple content-based movie recommendation system using Python. Our system will use movie metadata (the overview, genres, keywords, cast, and director) to recommend movies similar to a given title.

Setting Up Your Environment

Installing Python

First, ensure you have Python installed on your system. You can download it from python.org. We recommend using Python 3.7 or later.

Necessary Libraries

You'll need several libraries to build your recommendation system:

Pandas: For data manipulation and analysis.
NumPy: For numerical operations.
Scikit-learn: For machine learning algorithms.
NLTK: For natural language processing.

Install these libraries using pip:

pip install numpy pandas nltk scikit-learn

Step by Step Movie Recommendation System in Python

1. Loading the Data

We will use two datasets: credits.csv and movies.csv. The credits.csv file contains information about the cast and crew of the movies, while the movies.csv file contains movie details such as title, overview, genres, and keywords.

import numpy as np
import pandas as pd
import nltk
import pickle
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import ast

# Load the datasets
credits = pd.read_csv('credits.csv')
movies = pd.read_csv('movies.csv')

# Merge datasets on movie title
movies = movies.merge(credits, on='title')
movies = movies[['movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew']]
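movies.dropna(inplace=True)
# dropna leaves gaps in the index; renumber rows so labels match positions
movies.reset_index(drop=True, inplace=True)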
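Why renumber the rows? `dropna` removes rows without renumbering the rest, so a DataFrame's index labels and its row positions can drift apart, and the recommendation code later mixes label lookups (`.index`) with positional lookups (`similarity[...]`, `.iloc`). Here is a small sketch of that drift using a made-up three-movie frame (not the real CSV files), and how `reset_index(drop=True)` repairs it:

```python
import pandas as pd

# Hypothetical three-movie frame; the middle row has no overview
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "overview": ["An epic quest", None, "A heist goes wrong"],
})

df = df.dropna()
# "Movie C" keeps its old label 2, but it now sits at position 1
label = df[df["title"] == "Movie C"].index[0]
print(label, df.index.get_loc(label))  # 2 1

df = df.reset_index(drop=True)
# After resetting, the label and the position agree again
label = df[df["title"] == "Movie C"].index[0]
print(label, df.index.get_loc(label))  # 1 1
```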

2. Data Preprocessing

We need to preprocess the data to make it suitable for our recommendation system:

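Convert the genres, keywords, cast, and crew columns from JSON-formatted strings into Python lists of names.
Keep only the top 3 cast members and extract the director from the crew.
Remove the spaces inside each name so that a multi-word name becomes a single token (for example, "Science Fiction" becomes "ScienceFiction"). Without this step, the vectorizer would split the name into separate, unrelated words.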

def convert(text):
    return [i['name'] for i in ast.literal_eval(text)]

movies['genres'] = movies['genres'].apply(convert)
movies['keywords'] = movies['keywords'].apply(convert)
movies['cast'] = movies['cast'].apply(convert).apply(lambda x: x[0:3])
movies['crew'] = movies['crew'].apply(lambda x: [i['name'] for i in ast.literal_eval(x) if i['job'] == 'Director'])

def collapse(L):
    return [i.replace(" ", "") for i in L]

movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)
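To see what these helpers do to a single value, here is a standalone sketch. The genres string is the sample row from the dataset; the crew string is a hand-written stand-in with hypothetical names (real crew rows carry more fields, but only `name` and `job` are read here):

```python
import ast

def convert(text):
    # Parse a list of {"id": ..., "name": ...} dicts and keep only the names
    return [i['name'] for i in ast.literal_eval(text)]

def fetch_director(text):
    # Keep only crew members whose job is "Director"
    return [i['name'] for i in ast.literal_eval(text) if i['job'] == 'Director']

def collapse(L):
    # Remove spaces so each multi-word name becomes a single token
    return [i.replace(" ", "") for i in L]

genres = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
crew = '[{"name": "Alex Lee", "job": "Director"}, {"name": "Sam Park", "job": "Producer"}]'

print(collapse(convert(genres)))       # ['Action', 'ScienceFiction']
print(collapse(fetch_director(crew)))  # ['AlexLee']
```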

We also need to process the overview text by splitting it into words and combining it with other features into a single "tags" column:

movies['overview'] = movies['overview'].apply(lambda x: x.split())
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

new = movies.drop(columns=['overview', 'genres', 'keywords', 'cast', 'crew'])
new['tags'] = new['tags'].apply(lambda x: " ".join(x))
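The same tag-building steps work on a tiny example. This sketch uses one hypothetical movie whose columns are already lists of tokens. Adding object columns that hold lists concatenates the lists row by row, and the final join turns each list back into a single string, which is the input format CountVectorizer expects:

```python
import pandas as pd

# Hypothetical one-movie frame; every feature column is already a token list
movies = pd.DataFrame({
    "title": ["Example Movie"],
    "overview": [["A", "thief", "plans", "a", "heist"]],
    "genres": [["Crime", "Thriller"]],
    "keywords": [["heist"]],
    "cast": [["JaneDoe", "JohnSmith"]],
    "crew": [["AlexLee"]],
})

# List + list concatenates, so this builds one combined token list per row
movies["tags"] = (movies["overview"] + movies["genres"] + movies["keywords"]
                  + movies["cast"] + movies["crew"])

new = movies.drop(columns=["overview", "genres", "keywords", "cast", "crew"])
# Join each token list into one space-separated string
new["tags"] = new["tags"].apply(lambda x: " ".join(x))
print(new.loc[0, "tags"])
# A thief plans a heist Crime Thriller heist JaneDoe JohnSmith AlexLee
```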

3. Feature Extraction and Similarity Computation

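Before vectorizing, we stem every word in the tags with NLTK's PorterStemmer so that variants such as "loved" and "loving" collapse into one token ("love"). We then use CountVectorizer to turn each tag string into a bag-of-words count vector (keeping the 5,000 most frequent terms and dropping English stop words) and compute the cosine similarity between every pair of vectors:

ps = PorterStemmer()

def stem(text):
    return " ".join([ps.stem(word) for word in text.split()])

# Stem first, so the vectorizer counts the stemmed words
new['tags'] = new['tags'].apply(stem)

cv = CountVectorizer(max_features=5000, stop_words='english')
vector = cv.fit_transform(new['tags']).toarray()
similarity = cosine_similarity(vector)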
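To see why the stemming has to happen before vectorizing, compare two hypothetical tag strings that share meaning but not exact word forms. This sketch drops `max_features` because the toy vocabulary is tiny. Without stemming the two strings have no words in common; with stemming they share "love" and "fight":

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ps = PorterStemmer()

def stem(text):
    # Reduce every word to its Porter stem, e.g. "loving" -> "love"
    return " ".join(ps.stem(word) for word in text.split())

# Two hypothetical tag strings
docs = ["a man who loved fighting", "a woman loving fights"]

def pair_similarity(texts):
    # Bag-of-words counts without English stop words, then cosine similarity
    cv = CountVectorizer(stop_words="english")
    vectors = cv.fit_transform(texts)
    return cosine_similarity(vectors)[0, 1]

raw_sim = pair_similarity(docs)
stemmed_sim = pair_similarity([stem(d) for d in docs])
print(round(raw_sim, 3), round(stemmed_sim, 3))  # 0.0 0.667
```

One trade-off of stemming first: some common words turn into fragments (for example, "this" stems to "thi") that the built-in English stop-word list no longer recognizes.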

4. Building the Recommendation Function

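Finally, we create a function that recommends movies similar to a given title. It finds the movie's row position, sorts that movie's row of the similarity matrix in descending order, skips the first entry (the movie itself, with a similarity of 1), and returns the next five titles:

def recommend(movie):
    # Look up the row position (not the index label) of the requested title
    matches = np.flatnonzero(new['title'].to_numpy() == movie)
    if len(matches) == 0:
        raise ValueError(f"{movie!r} not found in the dataset")
    index = matches[0]
    movie_list = sorted(list(enumerate(similarity[index])), reverse=True, key=lambda x: x[1])
    recommendations = [new.iloc[i[0]].title for i in movie_list[1:6]]
    return recommendations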

print(recommend('Batman Begins'))
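The ranking logic can be tried without the full dataset. The sketch below uses a hypothetical helper, `recommend_from`, with made-up titles and a hand-written 4×4 similarity matrix. Unlike the slice `[1:6]`, it drops the query movie by position, so it still works if another title happens to tie at a similarity of 1:

```python
import numpy as np

def recommend_from(titles, similarity, movie, k=5):
    """Return the k titles most similar to `movie` (hypothetical helper)."""
    # Find the row position of the movie, independent of any DataFrame index
    matches = np.flatnonzero(np.asarray(titles) == movie)
    if len(matches) == 0:
        raise ValueError(f"{movie!r} is not in the dataset")
    index = matches[0]
    # Rank every movie by its similarity to this one, highest first
    ranked = sorted(enumerate(similarity[index]), key=lambda x: x[1], reverse=True)
    # Exclude the movie itself by position, then keep the top k
    return [titles[i] for i, _ in ranked if i != index][:k]

titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
similarity = np.array([
    [1.0, 0.2, 0.9, 0.4],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.5],
    [0.4, 0.3, 0.5, 1.0],
])
print(recommend_from(titles, similarity, "Movie A", k=2))  # ['Movie C', 'Movie D']
```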

5. Saving the Model

To reuse the model without recomputing everything, we save the processed data and similarity matrix to disk:

with open('movie_list.pkl', 'wb') as f:
    pickle.dump(new, f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
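Loading the files back is the mirror image. This sketch does a full save-and-load round trip in a temporary directory, with small hypothetical stand-ins for `new` and `similarity`; a separate app would only need the loading half:

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real `new` DataFrame and similarity matrix
new = pd.DataFrame({"movie_id": [1, 2], "title": ["Movie A", "Movie B"], "tags": ["a b", "b c"]})
similarity = np.array([[1.0, 0.5], [0.5, 1.0]])

with tempfile.TemporaryDirectory() as tmp:
    movie_path = os.path.join(tmp, "movie_list.pkl")
    sim_path = os.path.join(tmp, "similarity.pkl")

    # Save: `with` closes each file handle as soon as the write finishes
    with open(movie_path, "wb") as f:
        pickle.dump(new, f)
    with open(sim_path, "wb") as f:
        pickle.dump(similarity, f)

    # Load: read both objects back
    with open(movie_path, "rb") as f:
        loaded_movies = pickle.load(f)
    with open(sim_path, "rb") as f:
        loaded_similarity = pickle.load(f)

print(loaded_movies.equals(new), np.array_equal(loaded_similarity, similarity))  # True True
```

Two practical notes: only unpickle files you created or trust, because loading a pickle can run arbitrary code; and a dense N×N float64 similarity matrix takes 8·N² bytes, so 5,000 movies would need about 200 MB on disk.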

Full Movie Recommendation System Project's Source Code

import numpy as np
import pandas as pd
import nltk
import pickle
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import ast

# Load the datasets
credits = pd.read_csv('credits.csv')
movies = pd.read_csv('movies.csv')

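# Merge on title, keep the needed columns, and drop rows with missing values
movies = movies.merge(credits, on='title')
movies = movies[['movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew']]
movies.dropna(inplace=True)
# Renumber rows so index labels match row positions
movies.reset_index(drop=True, inplace=True)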

def convert(text):
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

movies['genres'] = movies['genres'].apply(convert)

movies['keywords'] = movies['keywords'].apply(convert)

def convert3(text):
    # Keep only the names of the first three (top-billed) cast members
    return [i['name'] for i in ast.literal_eval(text)[:3]]

movies['cast'] = movies['cast'].apply(convert3)

def fetch_director(text):
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

movies['crew'] = movies['crew'].apply(fetch_director)


def collapse(L):
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)

movies['overview'] = movies['overview'].apply(lambda x:x.split())

movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

new = movies.drop(columns=['overview','genres','keywords','cast','crew'])
new['tags'] = new['tags'].apply(lambda x: " ".join(x))


ps = PorterStemmer()


def stem(text):
    y = []

    for i in text.split():
        y.append(ps.stem(i))

    return " ".join(y)


# Stem the tags before vectorizing so the counts use the stemmed words
new['tags'] = new['tags'].apply(stem)

cv = CountVectorizer(max_features=5000, stop_words='english')
vector = cv.fit_transform(new['tags']).toarray()
similarity = cosine_similarity(vector)

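def recommend(movie):
    # Look up the row position (not the index label) of the requested title
    matches = np.flatnonzero(new['title'].to_numpy() == movie)
    if len(matches) == 0:
        raise ValueError(f"{movie!r} not found in the dataset")
    index = matches[0]
    movie_list = sorted(list(enumerate(similarity[index])), reverse=True, key=lambda x: x[1])
    # Return the five most similar titles, skipping the movie itself
    return [new.iloc[i[0]].title for i in movie_list[1:6]]

print(recommend('Batman Begins'))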

with open('movie_list.pkl', 'wb') as f:
    pickle.dump(new, f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)

Conclusion

You've now built a basic movie recommendation system using Python. This system processes movie metadata, calculates similarities, and provides recommendations based on content. You can further enhance this system by incorporating user ratings, more advanced machine learning techniques, or additional features for better accuracy.

Feel free to experiment with the code and customize it according to your needs.

Code by: Kanishka Sah

Demo Here: https://movie-recommender-system-ml-ca6n1lthfcd-kanishka.streamlit.app/


