# Load libraries and data
library(tidyverse)
data <- read_csv("app-data/train.csv")
glimpse(data)
Riya Belani and Moyalo Ifebajo
June 26, 2025
Social media is an ever-growing force in our society. Which platforms people use and how they use them are important indicators of who we communicate with and how. This project explores the landscape of social media use by age, gender, platform, dominant emotion, and several usage metrics. The goal of the project is to create an interactive dashboard with three graphs that explore the interactions between the social media, demographic, and usage variables to answer the research questions posed:
The data is sourced from Kaggle and was researched and prepared by AI Inventor Emirhan Bulut. The data was not recorded longitudinally; it contains a single response from each of 1000 participants. Each participant reported their age, gender, the platform they use, their daily usage in minutes, the number of posts they make daily, the number of likes they receive daily, the number of comments they receive daily, the number of messages they send, and the dominant emotion they felt. The ages in the dataset range from 20 to 35. The platform choices were Instagram, Snapchat, Telegram, LinkedIn, Facebook, Twitter, and WhatsApp. The emotion choices were Anger, Anxiety, Boredom, Happiness, Neutral, and Sadness. Of the files in the Kaggle dataset, the train file was used, as it contained all 1000 responses. The age and gender values were swapped between the two columns in rows 200 to 300 and rows 700 to 800; this error was corrected in the global.R file.
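The cleaning code in global.R is not reproduced in this report, so the following is only a minimal sketch of one way such a swap could be repaired. It assumes the columns are named `Age` and `Gender`, that both were read as text, and it detects swapped rows by content (a non-numeric value in `Age`) rather than by row number. The toy data frame is invented for illustration and stands in for train.csv.

```r
library(dplyr)

# Toy stand-in for train.csv; column names and values are assumptions.
toy <- data.frame(
  Age    = c("25", "Female", "31", "Male"),
  Gender = c("Male", "27", "Non-binary", "22"),
  stringsAsFactors = FALSE
)

cleaned <- toy %>%
  mutate(
    # TRUE where the Age column holds something other than digits
    swapped    = !grepl("^[0-9]+$", Age),
    # Take each value from the other column when the row is swapped
    age_fix    = if_else(swapped, Gender, Age),
    gender_fix = if_else(swapped, Age, Gender),
    Age        = as.integer(age_fix),
    Gender     = gender_fix
  ) %>%
  select(Age, Gender)

print(cleaned)
```

Detecting the swap by content avoids hard-coding row ranges, but it relies on every age being a whole number stored as text.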
The app’s purpose is to answer the three research questions through three interactive graphs that users can modify based on the variables they are most interested in. The first graph plots a user-selected social media usage metric against gender to show how different genders interact with social media and how these interactions differ between platforms. The second graph explores how different age groups use social media daily; users can group the ages, which range from 20 to 35, into bins of different sizes and explore the usage data across platforms. The third graph explores which emotions are most dominant on each platform and how these emotions compare between platforms. For each platform and emotion, users can see the ratio of the number of people who reported that emotion while using the platform to the total number of people who reported using that platform.
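The ratio described for the third graph can be sketched as follows. This is an illustration under stated assumptions, not the app's actual code: the column names `Platform` and `Dominant_Emotion` are assumed, and the toy rows are invented purely to show the calculation.

```r
library(dplyr)

# Toy responses; column names and values are assumptions for illustration.
toy <- data.frame(
  Platform = c("Instagram", "Instagram", "Instagram", "Instagram",
               "Snapchat", "Snapchat"),
  Dominant_Emotion = c("Happiness", "Happiness", "Happiness", "Anxiety",
                       "Boredom", "Neutral"),
  stringsAsFactors = FALSE
)

emotion_ratio <- toy %>%
  # Number of people reporting each emotion on each platform
  count(Platform, Dominant_Emotion, name = "n_emotion") %>%
  group_by(Platform) %>%
  # Divide by the total number of people on that platform
  mutate(ratio = n_emotion / sum(n_emotion)) %>%
  ungroup()

print(emotion_ratio)
```

Because the ratio is computed within each platform, the ratios for one platform sum to 1, which keeps platforms with different numbers of respondents comparable.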
Graph 1: A key takeaway from graph one is that Instagram shows the highest mean values across the four metrics, and when broken down by gender, nonbinary Instagram users have the highest mean reported values for posts made, messages sent, likes received, and comments received.
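The mean values behind a comparison like this one can be computed by grouping on platform and gender. The sketch below assumes column names `Platform`, `Gender`, and `Posts_Per_Day`; the numbers are invented toy values and are not results from the dataset.

```r
library(dplyr)

# Toy data; column names and values are assumptions, not dataset results.
toy <- data.frame(
  Platform      = c("Instagram", "Instagram", "Instagram", "Twitter", "Twitter"),
  Gender        = c("Female", "Non-binary", "Non-binary", "Male", "Male"),
  Posts_Per_Day = c(4, 6, 8, 2, 4),
  stringsAsFactors = FALSE
)

mean_posts <- toy %>%
  group_by(Platform, Gender) %>%
  # One mean per platform and gender combination
  summarise(mean_posts = mean(Posts_Per_Day), .groups = "drop")

print(mean_posts)
```

Swapping `Posts_Per_Day` for another metric column would give the corresponding means for likes, comments, or messages.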
Graph 2: A key takeaway from graph two is that some social media platforms skew towards older users and others skew towards younger users. For example, Snapchat has gaps in the bins at the older end of the age range, while Telegram shows the opposite pattern, with gaps at the younger end.
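Grouping ages into bins of a chosen width, as graph two allows, can be sketched with base R's `cut()`. The bin width, the variable names, and the sample ages below are assumptions for illustration; the app's own binning code may differ.

```r
# Sample ages within the dataset's stated range of 20 to 35 (invented values).
ages <- c(20, 23, 24, 27, 31, 35)

# Hypothetical user-selected bin width.
bin_width <- 5

# Breaks start at 20 and extend one bin past 35 so the oldest age is included.
breaks <- seq(20, 35 + bin_width, by = bin_width)

# right = FALSE makes each bin include its lower bound, e.g. [20,25).
age_bin <- cut(ages, breaks = breaks, right = FALSE)

# Count of respondents per bin; an empty bin shows up as a zero.
print(table(age_bin))
```

Because `cut()` returns a factor with a level for every bin, a bin with no respondents still appears in the table with a count of zero, which is how gaps like those described above become visible.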
Graph 3: An unexpected takeaway from graph three is that Instagram has the highest ratio of users reporting happiness as their dominant emotion, and happiness is the most reported emotion among Instagram users. Based on cultural knowledge of the app, more negative emotions were expected because of the potentially unrealistic expectations portrayed on the app.
The way the dataset was recorded placed some limits on the visualizations that could be made. There was no longitudinal data, so no graphs exploring usage over time could be made. Likewise, there was no geographical data, so no maps of usage could be made either. Furthermore, each participant reported only one platform, so all of the metric data for a participant is tied to that platform. This led to gaps in some analyses, such as the heatmap, which has several blank cells where no data was reported, for example anger as the dominant emotion for Snapchat.
Overall, this project is a first look at how social media use differs across age, gender, and platform, and it begins to explore some of the outcomes of this usage by analyzing the dominant emotion reported for each platform. An interesting next step could be examining whether time spent on each platform is a confounding variable when studying the dominant emotion felt while using it. Another would be resurveying people so that they report their usage, likes, posts, messages, comments, and emotions for every social media platform, which would remove the gaps in the heatmap and produce a more robust dataset. Widening the age range surveyed and adding other popular platforms such as TikTok would also be worthwhile. Likewise, asking respondents where they are located would add a location variable, allowing maps to be generated that show which platforms are most popular in which regions. Finally, if usage data were recorded over time, it would be interesting to see how it changes over months, years, or even generations. This dataset can help people better understand how they interact with social media and how it makes them feel, which is important in our modern digital landscape.