Data Critique

Selecting Our Sources

Our project examines how female representation on and off screen relates to a film's success. In our analysis, "success" encompasses revenue and popularity ratings. To assess whether female representation in the film industry affects a movie's commercial "success," we rely on the Bechdel test, a widely used tool for quantifying the role women play in a film. Using research databases such as Google Scholar, the UCLA Library, and EBSCOhost, our team compiled academic papers on the effect of female representation on the industry's success, searching with keywords such as "women AND film," "Bechdel," "representation in film," and "film revenue." Several of these articles trace female representation in film over time and how that shift affects the success of films.

Our literature review includes articles that explore the complexities of female authority within the industry, stereotypes surrounding the portrayal of women, and the gender distribution among film critics. These articles provide in-depth research that goes beyond what the dataset alone shows, helping us understand and interpret the trends we observe in it. Overall, the articles reveal a connection between how female representation in film, on and off camera, is perceived and the cultural impact that representation has on society.

Processing Our Sources

After gathering data and literature relevant to our central theme of female representation in film, we began processing the sources. Because we drew from a single, extensive dataset, we devoted our time to preparing it thoroughly, paying particular attention to the percentage of female representation, which is central to our thesis. Using our thesis as a guide, we selected the information most helpful for answering our research questions. We then used R to clean the dataset's variables and remove errors so that the data would be readable and consistent.

Presenting Our Sources

To develop our website, we used a Humspace domain provided by the UCLA Digital Humanities Department and built the site with WordPress. Our website designer, Lucy, turned the ideas from our brainstorming discussions in the lab into the visualizations on the website. Together, we built the website wireframes, and each of us contributed to specific design and layout choices. We chose a light pink theme throughout the site and styled our visualizations to match its color scheme.

We used Tableau to transform our data into useful visualizations and embedded them in our site to create a user-friendly experience for our audience. We also built our timeline with the open-source tool Timeline JS and processed several historical images to bring the rich history of female representation in film to life. To tie everything together, we featured a short film on the home page and added images showcasing the influence of women behind the camera.

Information Provided by Our Dataset

The dataset we have chosen focuses on female representation in cinema. Representation is measured through Bechdel test scores and through the gender ratios of each movie's cast and crew. The Bechdel test is specifically designed to measure the representation of women in fiction, and it is based on three criteria: a work must feature at least two named women, they must speak to each other, and their conversation must be about something other than a man.
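To show how the three criteria combine, the Python sketch below turns them into a cumulative score from 0 to 3. It assumes the convention used by bechdeltest.com: each criterion counts only if the earlier ones are met, and a score of 3 means the film passes. The function names and boolean inputs are hypothetical. Our dataset already stores the score, so this sketch only illustrates the logic behind it.

```python
def bechdel_score(two_named_women: bool, talk_to_each_other: bool,
                  about_something_besides_a_man: bool) -> int:
    """Return a cumulative Bechdel score from 0 to 3 (hypothetical helper).

    Criteria are checked in order; counting stops at the first one that
    fails, so a later criterion never counts without the earlier ones.
    """
    score = 0
    for criterion_met in (two_named_women, talk_to_each_other,
                          about_something_besides_a_man):
        if not criterion_met:
            break
        score += 1
    return score


def passes_bechdel(score: int) -> bool:
    """A film passes the full test only when all three criteria are met."""
    return score == 3


print(bechdel_score(True, True, False))                 # 2
print(passes_bechdel(bechdel_score(True, True, True)))  # True
```

Scoring cumulatively, rather than as a single pass/fail flag, keeps the distinction between a film with no named women (0) and one whose women talk only about men (2).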

The dataset includes 7,634 rows and 22 columns. Each column records a specific characteristic of a movie, including its title, release year and date, Bechdel test score, whether the submitter considered its rating dubious, IMDb (Internet Movie Database) popularity, TMDB (The Movie Database) popularity, genre(s), production company, production country, average vote, vote count, cast, crew, budget, cast gender, crew gender, cast female representation, and revenue in U.S. dollars.

The Female Representation in Cinema dataset highlights several important facts, events, and phenomena. The earliest film in our dataset is from 1878 (Sallie Gardner at a Gallop) and the most recent is from 2020 (Bad Boys for Life), giving us the opportunity to examine gender representation in film across more than 140 years. By analyzing Bechdel test scores and cast and crew gender ratios over time, we can identify how gender representation has changed both on and off screen. Combining these metrics with Oscar nominations, TMDB popularity, and film revenue also allows us to explore the potential effects of representation on a movie's success. Notably, the gender variable has four options: not specified, female, male, and non-binary. Based on these categories, one phenomenon worth examining is when the non-binary category first appears, and whether its growing use correlates with a decrease in "not specified."
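The gender categories above lend themselves to two simple calculations: a film's share of female cast or crew members, and, per release year, the share of "non-binary" and "not specified" labels. The Python sketch below assumes each film is a (year, list of gender labels) pair using the four label strings shown. This layout and the function names are hypothetical; the real dataset may encode gender differently (TMDB's API, for example, uses numeric codes). The toy inputs are invented for illustration and are not drawn from the dataset.

```python
from collections import Counter, defaultdict

# Assumed label strings (hypothetical encoding of the four gender options).
FEMALE, MALE, NON_BINARY, NOT_SPECIFIED = "female", "male", "non-binary", "not specified"


def female_share(genders: list[str], exclude_unspecified: bool = True) -> float:
    """Fraction of cast or crew members labeled female.

    With exclude_unspecified=True, "not specified" members are dropped from
    the denominator so that missing labels do not dilute the share.
    """
    counts = Counter(genders)
    total = len(genders)
    if exclude_unspecified:
        total -= counts[NOT_SPECIFIED]
    return counts[FEMALE] / total if total else 0.0


def label_shares_by_year(films):
    """Map each release year to (non-binary share, not-specified share).

    `films` is an iterable of (year, genders) pairs; labels from all films
    released in the same year are pooled before computing shares.
    """
    per_year = defaultdict(Counter)
    for year, genders in films:
        per_year[year].update(genders)
    shares = {}
    for year in sorted(per_year):
        counts = per_year[year]
        total = sum(counts.values())
        if total == 0:
            shares[year] = (0.0, 0.0)
        else:
            shares[year] = (counts[NON_BINARY] / total, counts[NOT_SPECIFIED] / total)
    return shares


# Toy inputs for illustration only (not from the dataset).
toy_films = [
    (1999, [FEMALE, MALE, MALE, NOT_SPECIFIED]),
    (2019, [FEMALE, FEMALE, MALE, NON_BINARY]),
]
print(female_share(toy_films[0][1]))    # 0.3333333333333333
print(label_shares_by_year(toy_films))  # {1999: (0.0, 0.25), 2019: (0.25, 0.0)}
```

Whether to exclude "not specified" from the denominator is a judgment call. Excluding it measures representation among people whose gender is recorded. Including it treats missing labels as not female, which can understate representation in older films with sparse records.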


Original Sources

Our dataset combines three sources: the Bechdel Test Movie List, processed by Alison Yuhan Yao; movie data that the dataset's authors retrieved from The Movie Database (TMDB) API; and a scrape of the Academy of Motion Picture Arts and Sciences archive, used to analyze the gender gap between Oscar winners and nominees. The final, merged dataset was obtained from Kaggle. Together, these three sources cover movies released from 1878 to 2020, along with their revenues, ratings, and gender representation as measured by the Bechdel test.

How the Dataset Was Funded

The selected dataset focuses on female representation in film and television. Its Bechdel data comes from the Bechdel Test Movie List, processed by Alison Yuhan Yao. The Bechdel test was originally created by cartoonist Alison Bechdel as a form of satire. The test is simple: it asks whether a film has at least two named women who talk to each other about something other than a man. The movie metadata comes from The Movie Database, which the dataset's authors combined with the Bechdel Test Movie List and with the archive of the Academy of Motion Picture Arts and Sciences, which provides information on the gender gap between Oscar winners and nominees. Together, these sources constitute our selected dataset of female representation in film and television. The Movie Database is funded by TiVo, a for-profit media company that partly owns it. Without TiVo's support, there would be insufficient funds to sustain such a database, especially since the website does not run ads.


Limitations of Our Dataset

Our dataset provides quantitative information about films, including their release year and TMDB popularity. In short, this movie metadata is meant to capture the disparity in gender representation among film casts and crews. However, the dataset fails to capture the full complexity behind these numbers. For instance, it does not account for other dimensions of identity, such as race or sexuality. Nor can it account for the social or political contexts behind why certain films succeeded or failed. Another consideration is that earlier data, such as records for films from the 1950s, may be inaccurate because of lapses in record-keeping or missing data. Finally, when analyzing films, the dataset focuses only on revenues and Oscar nominations (or lack thereof), rather than on the arguably equal or more important consideration of cultural significance.

Our dataset focuses mainly on quantitative data to examine female representation in film. It provides numbers for variables such as cast and crew gender ratios, Bechdel test score, release year, and budget and box-office metadata for each movie. This primarily quantitative ontology treats the amount of female presence as the relevant measure of representation. In doing so, it overlooks qualitative aspects such as how well women are portrayed and the nuances of their characters' agency within a film's storyline. Dividing the data into categories such as cast/crew ratios, Bechdel pass/fail, and basic film metadata reinforces the reductive notion that gender diversity and female representation are defined by quantity rather than quality. In addition, our dataset lacks more complex dimensions of intersectionality, such as how gender relates to race, class, social status, and sexuality. Ultimately, acknowledging the biases and values embedded in our dataset will allow us to approach our data visualizations with critical examination and transparency.