Goodreads Data Analysis

Final research project for Econ 323. The project uses Python to analyze and visualize a dataset of book ratings from Goodreads.

Goodreads is a popular website for rating and reviewing books, and many readers also use it as a reference when deciding on a new book to read. In addition to title and author, the site collects quantitative information about each book, including its average rating, length, and total number of reviews received. This project will use this data from Goodreads to create visualizations and see whether there are any relationships between the variables. Some initial questions of interest that this project will attempt to answer are:

  1. Do longer books receive different ratings than shorter books?
  2. Are books with more reviews also higher rated?

Regressions will also be run to determine whether any of the variables can predict a book's average rating. Finally, a suggestion tool will be created to give readers an idea of what to read next, based on adjustable criteria.

Data for this project was obtained from a freely available dataset on Kaggle (https://www.kaggle.com/jealousleopard/goodreadsbooks), which was compiled from the Goodreads API.

The full project can be found on GitHub: https://github.com/Graham1212/Econ-323-Final-Project-Goodreads-Data-Analysis/blob/main/GoodreadsData_FinalProject_Graham.ipynb
