Lab 3

Author

Your Name

Published

Invalid Date

Before you start the lab notebook

  • Update the header - put your name in the author argument and put today’s date in the date argument.
  • Click the “Render” button in RStudio and then open the rendered 3-lab3.html page.
  • Then go back and try changing the theme argument in the header to something else - you can see other available themes here. Notice the difference when you render now!

Overview of Lab 3

This lab notebook has two parts. We will use the same example datasets covered in class: forest and adult. In the first part, we will practice faceting, which splits a plot into subplots that each display a subset of the data. In the second part, we will practice more advanced chart types: radar charts, Sankey diagrams, and parallel coordinates plots.

Skills practiced:

Load Required Packages and Datasets

Code
library(tidyverse)

forest <- read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv") %>%
  mutate(
    month = tolower(month),
    day = tolower(day),
    season = case_when(
      month %in% c("dec", "jan", "feb") ~ "Winter",
      month %in% c("mar", "apr", "may") ~ "Spring",
      month %in% c("jun", "jul", "aug") ~ "Summer",
      month %in% c("sep", "oct", "nov") ~ "Fall",
      TRUE ~ "Unknown"
    ),
    RH_level = if_else(RH >= median(RH, na.rm = TRUE), "High RH", "Low RH"),
    season = factor(season, levels = c("Spring", "Summer", "Fall", "Winter")),
    RH_level = factor(RH_level, levels = c("Low RH", "High RH"))
  )

glimpse(forest)
#> Rows: 517
#> Columns: 15
#> $ X        <dbl> 7, 7, 7, 8, 8, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 8, 6, 6, 6…
#> $ Y        <dbl> 5, 4, 4, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4…
#> $ month    <chr> "mar", "oct", "oct", "mar", "mar", "aug", "aug", "aug", "sep"…
#> $ day      <chr> "fri", "tue", "sat", "fri", "sun", "sun", "mon", "mon", "tue"…
#> $ FFMC     <dbl> 86.2, 90.6, 90.6, 91.7, 89.3, 92.3, 92.3, 91.5, 91.0, 92.5, 9…
#> $ DMC      <dbl> 26.2, 35.4, 43.7, 33.3, 51.3, 85.3, 88.9, 145.4, 129.5, 88.0,…
#> $ DC       <dbl> 94.3, 669.1, 686.9, 77.5, 102.2, 488.0, 495.6, 608.2, 692.6, …
#> $ ISI      <dbl> 5.1, 6.7, 6.7, 9.0, 9.6, 14.7, 8.5, 10.7, 7.0, 7.1, 7.1, 22.6…
#> $ temp     <dbl> 8.2, 18.0, 14.6, 8.3, 11.4, 22.2, 24.1, 8.0, 13.1, 22.8, 17.8…
#> $ RH       <dbl> 51, 33, 33, 97, 99, 29, 27, 86, 63, 40, 51, 38, 72, 42, 21, 4…
#> $ wind     <dbl> 6.7, 0.9, 1.3, 4.0, 1.8, 5.4, 3.1, 2.2, 5.4, 4.0, 7.2, 4.0, 6…
#> $ rain     <dbl> 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
#> $ area     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ season   <fct> Spring, Fall, Fall, Spring, Spring, Summer, Summer, Summer, F…
#> $ RH_level <fct> High RH, Low RH, Low RH, High RH, High RH, Low RH, Low RH, Hi…
Code
adult <- read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                  col_names = c("age", "workclass", "fnlwgt", "education", "education_num",
                                "marital_status", "occupation", "relationship", "race", "sex",
                                "capital_gain", "capital_loss", "hours_per_week", "native_country", "income"))
glimpse(adult)
#> Rows: 32,561
#> Columns: 15
#> $ age            <dbl> 39, 50, 38, 53, 28, 37, 49, 52, 31, 42, 37, 30, 23, 32,…
#> $ workclass      <chr> "State-gov", "Self-emp-not-inc", "Private", "Private", …
#> $ fnlwgt         <dbl> 77516, 83311, 215646, 234721, 338409, 284582, 160187, 2…
#> $ education      <chr> "Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors…
#> $ education_num  <dbl> 13, 13, 9, 7, 13, 14, 5, 9, 14, 13, 10, 13, 13, 12, 11,…
#> $ marital_status <chr> "Never-married", "Married-civ-spouse", "Divorced", "Mar…
#> $ occupation     <chr> "Adm-clerical", "Exec-managerial", "Handlers-cleaners",…
#> $ relationship   <chr> "Not-in-family", "Husband", "Not-in-family", "Husband",…
#> $ race           <chr> "White", "White", "White", "Black", "Black", "White", "…
#> $ sex            <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Fe…
#> $ capital_gain   <dbl> 2174, 0, 0, 0, 0, 0, 0, 0, 14084, 5178, 0, 0, 0, 0, 0, …
#> $ capital_loss   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ hours_per_week <dbl> 40, 13, 40, 40, 40, 40, 16, 45, 50, 40, 80, 40, 30, 50,…
#> $ native_country <chr> "United-States", "United-States", "United-States", "Uni…
#> $ income         <chr> "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", "…

Part 1: Plot faceting

1.1. Faceting on a continuous variable

Code
#|echo: TRUE
#|eval: FALSE

ggplot(forest) +
  geom_point(aes(x = temp, y = wind)) +
  facet_wrap(~ area)

✏️ Q: What do you observe in this plot? How would you fix it?

Code
# Your code here
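
One possible direction, sketched on a toy stand-in for forest: facet_wrap() draws one panel per distinct value, so a continuous variable such as area is usually converted into a few categories before faceting. The toy_forest values and the burned column are illustrative assumptions, not part of the lab data, and other binning choices are equally reasonable.

```r
library(tidyverse)

# Toy stand-in for `forest` (made-up values, same column names)
toy_forest <- tibble(
  temp = c(8.2, 18.0, 22.2, 24.1, 13.1, 27.5),
  wind = c(6.7, 0.9, 5.4, 3.1, 5.4, 2.2),
  area = c(0, 0, 0.5, 3.2, 12.4, 88.0)
)

# Turn the continuous burned area into a two-level factor before faceting
# (`burned` is a hypothetical helper column)
toy_binned <- toy_forest %>%
  mutate(burned = factor(
    if_else(area > 0, "Burned", "Not burned"),
    levels = c("Not burned", "Burned")
  ))

# Two panels instead of one panel per distinct area value
p_burned <- ggplot(toy_binned) +
  geom_point(aes(x = temp, y = wind)) +
  facet_wrap(~ burned)
```

To try this on the lab data, replace toy_forest with forest.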

1.2. What does . do?

Code
#|echo: TRUE
#|eval: FALSE

ggplot(forest) +
  geom_point(aes(x = temp, y = wind)) +
  facet_grid(season ~ .)

Code
ggplot(forest) +
  geom_point(aes(x = temp, y = wind)) +
  facet_grid(. ~ RH_level)

✏️ Q: Without running the code, guess what plots the code above makes. Then change eval: FALSE to eval: TRUE, run the code chunks, and look at the output. What does . do?
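
To check your guess without relying on the picture alone, the sketch below inspects the panel layout that ggplot2 computes for each formula. It uses a toy stand-in for forest (made-up values), and reading the layout from ggplot_build() is just one convenient way to see how panels are arranged.

```r
library(tidyverse)

# Toy stand-in for `forest` (made-up values, same column names)
toy_forest <- tibble(
  temp = c(8.2, 18.0, 22.2, 24.1),
  wind = c(6.7, 0.9, 5.4, 3.1),
  season = factor(c("Spring", "Fall", "Summer", "Summer"),
                  levels = c("Spring", "Summer", "Fall")),
  RH_level = factor(c("High RH", "Low RH", "Low RH", "High RH"),
                    levels = c("Low RH", "High RH"))
)

base_plot <- ggplot(toy_forest) +
  geom_point(aes(x = temp, y = wind))

# Variable on the left of ~, dot on the right
p_rows <- base_plot + facet_grid(season ~ .)
# Dot on the left of ~, variable on the right
p_cols <- base_plot + facet_grid(. ~ RH_level)

# Each layout table has one row per panel, with ROW and COL positions
layout_rows <- ggplot_build(p_rows)$layout$layout
layout_cols <- ggplot_build(p_cols)$layout$layout

layout_rows[, c("ROW", "COL", "season")]
layout_cols[, c("ROW", "COL", "RH_level")]
```

Compare the ROW and COL values in the two tables with the formulas that produced them. facet_grid(rows = vars(season)) and facet_grid(cols = vars(RH_level)) are alternative spellings that avoid the formula.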


1.3. Practice with nrow, ncol, and scales (Using Histograms)

How does the distribution of daily temperatures vary by season, and what layout best supports visual comparison?

Create a faceted histogram of temperature (temp) by season using facet_wrap(). Try different combinations of nrow and ncol to test how layout affects interpretability. Also experiment with the scales argument. Which combination best supports answering the research question? How would your layout choice change if you had 12 months instead of 4 seasons?

Code
# Your code here
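
As a starting point, the sketch below builds two candidate layouts on simulated temperatures: panels stacked in one column, and panels side by side with free y scales. The toy_forest data are randomly generated for illustration only, and the binwidth of 2 is an arbitrary choice.

```r
library(tidyverse)

# Simulated stand-in for `forest`: 20 made-up temperatures per season
set.seed(1)
season_levels <- c("Spring", "Summer", "Fall", "Winter")
toy_forest <- tibble(
  temp = c(rnorm(20, 12, 3), rnorm(20, 24, 4), rnorm(20, 18, 3), rnorm(20, 8, 2)),
  season = factor(rep(season_levels, each = 20), levels = season_levels)
)

base_hist <- ggplot(toy_forest, aes(x = temp)) +
  geom_histogram(binwidth = 2)

# One column: panels are stacked and share the same x-axis
p_stacked <- base_hist + facet_wrap(~ season, ncol = 1)

# One row: panels sit side by side; each panel gets its own y-axis range
p_free <- base_hist + facet_wrap(~ season, nrow = 1, scales = "free_y")
```

Stacking the panels keeps the temperature axis aligned across seasons, which tends to make shifts in location easier to see. scales = "free_y" helps when group sizes differ, at the cost of making counts harder to compare across panels.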

✏️ Q: Which layout (rows vs columns) makes it easiest to compare shapes of the distributions?


Part 2: Advanced chart types

2.1. Radar Chart with adult Dataset

Research Question:

How do income groups differ in their numeric profiles (e.g., education, hours worked, capital gains)?

Your Tasks:

  • Group the data by income and summarize variables like education_num, hours_per_week, capital_gain, capital_loss.

  • Normalize all variables to a common 0-1 scale so they can share the radar axes.

  • Use {ggradar} to plot.

  • Optional: Create radar charts for subgroups (e.g., income × sex).

Code
# Your code here
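
The tasks above can be sketched as follows on a toy stand-in for adult. The values are made up, rescale01() is a hypothetical helper, and rescaling the raw variables before averaging is only one of several reasonable normalizations. With just two income groups, rescaling the group means themselves would force every variable to exactly 0 and 1, which is why this sketch normalizes first. {ggradar} is assumed to be installed (it is distributed on GitHub), so the plotting step is skipped when it is missing.

```r
library(tidyverse)

# Toy stand-in for `adult` (made-up rows, same column names)
toy_adult <- tibble(
  income = c("<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"),
  education_num = c(9, 10, 13, 13, 14, 16),
  hours_per_week = c(40, 35, 20, 45, 50, 60),
  capital_gain = c(0, 0, 2174, 0, 14084, 5178),
  capital_loss = c(0, 0, 0, 1902, 0, 0)
)

# Hypothetical helper: min-max rescaling onto [0, 1]
# (returns NaN for a constant column, so check for that on real data)
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Normalize each variable over all rows, then average within income group
radar_data <- toy_adult %>%
  mutate(across(where(is.numeric), rescale01)) %>%
  group_by(income) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

# ggradar() expects the group label in the first column and values in [0, 1]
if (requireNamespace("ggradar", quietly = TRUE)) {
  radar_plot <- ggradar::ggradar(radar_data)
}
```

For the optional subgroup task, one approach is to group by both income and sex and paste them into a single label column before plotting.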

✏️ Summarize your findings based on the radar chart you created

2.2. Sankey Diagram with forest Dataset

Research Question:

How do seasonal and weather factors flow into different fire size categories?

Your Tasks:

  • Create new categorical variables:

    • season from month
    • RH_level as high/low relative humidity
  • Collapse area into 3 fire size bins, stored as area_group.

  • Build a 3-axis Sankey: season → RH_level → area_group

Code
# Your code here
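
A minimal sketch of these steps on a toy stand-in for forest. The lab does not name a Sankey package, so using {ggalluvial} here is an assumption; the cut points (0 and 5) and the bin labels are arbitrary placeholders, and the data values are made up.

```r
library(tidyverse)

# Toy stand-in for `forest` (made-up values, same column names)
toy_forest <- tibble(
  season = factor(
    c("Spring", "Spring", "Summer", "Summer", "Summer", "Fall", "Fall", "Fall"),
    levels = c("Spring", "Summer", "Fall")
  ),
  RH_level = factor(
    c("High RH", "Low RH", "Low RH", "Low RH", "High RH", "Low RH", "High RH", "Low RH"),
    levels = c("Low RH", "High RH")
  ),
  area = c(0, 0, 0.5, 12.4, 3.2, 88.0, 0, 1.1)
)

# Bin burned area into three groups, then count rows per combination
flows <- toy_forest %>%
  mutate(area_group = cut(
    area,
    breaks = c(-Inf, 0, 5, Inf),          # arbitrary illustrative cut points
    labels = c("None", "Small", "Large")
  )) %>%
  count(season, RH_level, area_group, name = "n")

# One axis per categorical variable; band width is the count n
if (requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggalluvial)
  sankey_plot <- ggplot(
    flows,
    aes(axis1 = season, axis2 = RH_level, axis3 = area_group, y = n)
  ) +
    geom_alluvium(aes(fill = area_group)) +
    geom_stratum() +
    geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
    scale_x_discrete(limits = c("season", "RH_level", "area_group"))
}
```

On the real data, it is worth checking how many fires fall in each bin before settling on cut points, since many rows have area equal to 0.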

✏️ What seasonal transitions lead to larger fires?

2.3. Parallel Coordinates Plot

Research Question:

Can we identify fire weather patterns that differ across fire sizes?

Your Tasks:

  • Select variables: temp, wind, RH, rain, FFMC, DMC

  • Normalize, color lines by area_group

  • Use ggparcoord() from {GGally} to examine whether patterns differ by fire size

Code
# Your code here
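
The steps above might look like the following sketch, which assumes {GGally} is installed and uses a toy stand-in for forest with made-up values and a ready-made area_group column. scale = "uniminmax" asks ggparcoord() to rescale each variable to the 0-1 range; the alphaLines value is an arbitrary choice.

```r
library(tidyverse)

# Toy stand-in for `forest` with a fire size factor (made-up values)
toy_forest <- tibble(
  temp = c(8.2, 18.0, 22.2, 24.1, 13.1, 27.5),
  wind = c(6.7, 0.9, 5.4, 3.1, 5.4, 2.2),
  RH   = c(51, 33, 29, 27, 63, 40),
  rain = c(0.0, 0.0, 0.2, 0.0, 0.0, 1.0),
  FFMC = c(86.2, 90.6, 92.3, 92.3, 91.0, 94.8),
  DMC  = c(26.2, 35.4, 85.3, 88.9, 129.5, 142.4),
  area_group = factor(
    c("None", "None", "Small", "Small", "Large", "Large"),
    levels = c("None", "Small", "Large")
  )
)

# Variables that become the parallel axes
pc_vars <- c("temp", "wind", "RH", "rain", "FFMC", "DMC")

if (requireNamespace("GGally", quietly = TRUE)) {
  pc_plot <- GGally::ggparcoord(
    as.data.frame(toy_forest),   # plain data frame, as a precaution
    columns = pc_vars,           # axes, in this order
    groupColumn = "area_group",  # line color
    scale = "uniminmax",         # per-variable min-max scaling
    alphaLines = 0.6             # some transparency to reduce overplotting
  )
}
```

With the full dataset, lowering alphaLines further or reordering pc_vars can make differences between fire size groups easier to spot.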

✏️ How do weather patterns differ across fire sizes?

Save and Push Your Work

Remember to save your .qmd and render the HTML output before committing to GitHub.

Code
git add 3-lab3.qmd 3-lab3.html
git commit -m "Complete Lab 3"
git push