#| label: setup
#| include: false
library(readr)
library(tidyverse)

# Global chunk options for the rendered report
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  fig.path = "figs/",
  fig.width = 7.252,
  fig.height = 4,
  comment = "#>",
  fig.retina = 3
)
Sora Ely; Ashlan Jackson
George Washington University
June 26, 2025
Cancer is the leading cause of death among American adults older than 45; lung cancer is the leading cause of cancer death and the third leading cause of death among adults in the U.S. (and worldwide) [1]. Lung cancer is responsible for more deaths annually than the next three leading causes of cancer death (colon, breast, and prostate) combined [2]. This is also true in the District of Columbia. Screening for lung cancer reduces lung cancer mortality by at least 20% [3], but lung cancer screening (LCS) remains severely under-utilized more than a decade after the USPSTF first recommended it [4]. Because lung cancer is by far the leading cancer killer, LCS could potentially save more lives than breast or colon cancer screening. Yet while screening participation for breast cancer reached 81% in Washington, DC in 2020 and 76% nationwide in 2021 [5], the LCS rate was only 1.5% in DC and 4.5% nationally in 2023 [6].
Because the eligibility criteria for LCS require detailed tobacco history information, the tobacco history documented in discrete data fields in the patient's electronic chart is often inadequate to determine eligibility and prompt screening. Ensuring a high level of complete tobacco history documentation is therefore a critical first step toward increasing LCS and decreasing lung cancer mortality.
We set out to examine and track how often tobacco history is completely documented at adult patient visits at GW Medicine, as part of an ongoing pilot program to increase documentation completeness.
Primary Research Question
**How does the percentage of visits with complete tobacco history compare between specialties/staff participating (“Pilot”) versus not participating (“Non-Pilot”) in the pilot program?** Has that changed over time (especially since the start of the pilot program)?
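One way to frame this comparison is as the gap between the Pilot and Non-Pilot completion rates in each period, summarized before and after the pilot began. The sketch below is illustrative only: the `rates` data, its values, and `pilot_start_week` are hypothetical and not taken from the project data.

```r
library(dplyr)
library(tidyr)

# Hypothetical weekly summary (made-up values, assumed column names)
rates <- data.frame(
  group = rep(c("Pilot", "Non-Pilot"), each = 4),
  week_num = rep(1:4, times = 2),
  percent_complete = c(0.40, 0.42, 0.50, 0.56,
                       0.38, 0.40, 0.41, 0.40)
)

# Assumed start of the pilot program (hypothetical)
pilot_start_week <- 3

# One row per week, with the Pilot minus Non-Pilot gap
gap_by_week <- rates %>%
  pivot_wider(names_from = group, values_from = percent_complete) %>%
  mutate(
    gap = Pilot - `Non-Pilot`,
    phase = if_else(week_num < pilot_start_week,
                    "Before pilot", "After pilot start")
  )

# Average gap in each phase
gap_summary <- gap_by_week %>%
  group_by(phase) %>%
  summarize(mean_gap = mean(gap), .groups = "drop")

print(gap_summary)
```

Note that this unweighted mean treats every week equally; weighting by the number of visits per week may be more appropriate when visit volumes differ.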
Secondary Research Question
Who are the top and bottom performers among all specialties and staff?
Data routinely collected for performance tracking as part of this approved QI project was fully de-identified and modified for use in this project.
Data De-identification
- The primary data was fully de-identified of any patient-related data.
- Furthermore, the staff were also fully de-identified and assigned a unique ID number (and for the purposes of this project, no master list/link was retained; re-identification is not possible from the anonymized datasets).
Key Variables
Data was structured such that each data point (row) reflects a visit (as each visit is an opportunity to collect a complete tobacco history), not a patient.
We used several R packages to clean and explore the data. The tidyverse handled data manipulation, lubridate helped parse dates and group data by week and month, ggplot2 produced the line charts, shiny powered the interactive dashboard, and Quarto assembled the report. For the specialty-level plots, we excluded the aggregate “Pilot” and “Non-Pilot” groups so that only individual specialties remained. We then calculated the share of visits in each specialty with complete tobacco history documentation; using a rate rather than a raw count keeps the comparison fair, since visit numbers vary between specialties.
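The aggregation described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the `visits` data frame and its column names (`visit_date`, `specialty`, `tobacco_complete`) are assumptions, and the values are made up.

```r
library(dplyr)
library(lubridate)

# Hypothetical visit-level data: one row per visit (assumed column names)
visits <- data.frame(
  visit_date = as.Date(c("2025-01-06", "2025-01-07", "2025-01-08",
                         "2025-01-13", "2025-01-14", "2025-02-03")),
  specialty = c("Nephrology", "Nephrology", "Rheumatology",
                "Nephrology", "Rheumatology", "Rheumatology"),
  tobacco_complete = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Weekly rate: share of visits with complete tobacco history,
# with weeks starting on Monday
weekly_rates <- visits %>%
  mutate(week_start = floor_date(visit_date, unit = "week", week_start = 1)) %>%
  group_by(specialty, week_start) %>%
  summarize(
    n_visits = n(),
    percent_complete = mean(tobacco_complete),
    .groups = "drop"
  )

# Monthly rate: same calculation, grouped by calendar month
monthly_rates <- visits %>%
  mutate(month_start = floor_date(visit_date, unit = "month")) %>%
  group_by(specialty, month_start) %>%
  summarize(
    n_visits = n(),
    percent_complete = mean(tobacco_complete),
    .groups = "drop"
  )

print(weekly_rates)
print(monthly_rates)
```

Keeping `n_visits` alongside the rate makes it easy to see when a percentage rests on only a handful of visits.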
The weekly line chart shows documentation rates by specialty. In the most recent week, Nephrology and Gastroenterology came out on top, while Infectious Disease and Radiation Oncology were at the bottom. The aim is for this plot to help providers diagnose lung issues sooner.
library(ggplot2)
library(dplyr)
library(ggrepel)
library(viridis)

# Rank specialties by percent complete in the most recent week
summary_groups <- tobacco_data_weekly_modified %>%
  filter(!group %in% c("Pilot", "Non-Pilot")) %>%
  filter(week_num == max(week_num)) %>%
  group_by(group) %>%
  summarize(final_value = max(percent_complete, na.rm = TRUE)) %>%
  arrange(desc(final_value))

# Highlight the two highest and two lowest specialties
top_groups <- summary_groups %>% slice_head(n = 2) %>% pull(group)
bottom_groups <- summary_groups %>% slice_tail(n = 2) %>% pull(group)
highlight_groups <- c(top_groups, bottom_groups)

# Blue for the top groups, red for the bottom groups, gray for the rest
highlight_colors <- c(
  setNames(rep("steelblue", 2), top_groups),
  setNames(rep("firebrick", 2), bottom_groups),
  "Other" = "gray80"
)

ggplot(
  tobacco_data_weekly_modified %>%
    filter(!group %in% c("Pilot", "Non-Pilot")) %>%
    mutate(line_group = ifelse(group %in% highlight_groups, group, "Other")),
  aes(x = week_num, y = percent_complete, group = group)
) +
  # linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0)
  geom_line(aes(color = line_group, linewidth = line_group, alpha = line_group)) +
  geom_point(aes(color = line_group, alpha = line_group), size = 2) +
  # Label each highlighted group at its last observed week
  geom_text_repel(
    data = tobacco_data_weekly_modified %>%
      filter(group %in% highlight_groups) %>%
      group_by(group) %>%
      filter(week_num == max(week_num)),
    aes(label = group, color = group),
    nudge_x = 0.3,
    hjust = 0,
    segment.color = NA,
    size = 4,
    show.legend = FALSE
  ) +
  scale_color_manual(values = highlight_colors) +
  scale_linewidth_manual(values = c(setNames(rep(1.4, 4), highlight_groups), "Other" = 0.6)) +
  scale_alpha_manual(values = c(setNames(rep(1, 4), highlight_groups), "Other" = 0.3)) +
  scale_x_continuous(
    breaks = unique(tobacco_data_weekly_modified$week_num),
    labels = paste("Week", unique(tobacco_data_weekly_modified$week_num)),
    expand = expansion(mult = c(0.05, 0.2))
  ) +
  labs(
    title = "Highest and Lowest Departments with Complete Tobacco History (Weekly)",
    x = "Week of",
    y = "Percent Complete"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
  )
This plot shows the departments with the highest and lowest percentages of complete tobacco history by month; the aim is to help providers diagnose lung issues sooner. The monthly view shows a slightly different picture: Physical Medicine and Rehabilitation did best, and Rheumatology and Radiation Oncology were lowest.
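Both the weekly and monthly plots choose their highlighted departments with the same steps: keep the latest period, rank by percent complete, and take the first and last two groups. As a sketch, that shared logic could be factored into a helper. `pick_extremes()` and the `demo` data are hypothetical and not part of the original analysis; the `group` and `percent_complete` column names follow the plotting code.

```r
library(dplyr)

# Hypothetical helper: returns the n_each highest and lowest groups
# in the most recent period of `data`
pick_extremes <- function(data, period_col, n_each = 2) {
  ranked <- data %>%
    filter(.data[[period_col]] == max(.data[[period_col]], na.rm = TRUE)) %>%
    group_by(group) %>%
    summarize(final_value = max(percent_complete, na.rm = TRUE),
              .groups = "drop") %>%
    arrange(desc(final_value))

  list(
    top = ranked %>% slice_head(n = n_each) %>% pull(group),
    bottom = ranked %>% slice_tail(n = n_each) %>% pull(group)
  )
}

# Made-up example data with five groups observed over two weeks
demo <- data.frame(
  group = rep(c("A", "B", "C", "D", "E"), each = 2),
  week_num = rep(1:2, times = 5),
  percent_complete = c(0.2, 0.9, 0.5, 0.7, 0.4, 0.3, 0.6, 0.1, 0.8, 0.5)
)

extremes <- pick_extremes(demo, "week_num")
print(extremes)
```

With fewer than four groups in the latest period, the top and bottom sets would overlap, so a real implementation may want to guard against that case.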
library(ggplot2)
library(dplyr)
library(ggrepel)
library(viridis)

# Drop the aggregate groups and Geriatrics (Ingleside)
clean_data <- tobacco_data_monthly_moified %>%
  filter(!group %in% c("Pilot", "Non-Pilot", "Geriatrics (Ingleside)"))

# Rank specialties by percent complete in the most recent month
summary_groups <- clean_data %>%
  filter(month_num == max(month_num)) %>%
  group_by(group) %>%
  summarize(final_value = max(percent_complete, na.rm = TRUE)) %>%
  arrange(desc(final_value))

# Highlight the two highest and two lowest specialties
top_groups <- summary_groups %>% slice_head(n = 2) %>% pull(group)
bottom_groups <- summary_groups %>% slice_tail(n = 2) %>% pull(group)
highlight_groups <- c(top_groups, bottom_groups)

# Blue for the top groups, red for the bottom groups, gray for the rest
highlight_colors <- c(
  setNames(rep("steelblue", 2), top_groups),
  setNames(rep("firebrick", 2), bottom_groups),
  "Other" = "gray80"
)

plot_data <- clean_data %>%
  mutate(line_group = ifelse(group %in% highlight_groups, group, "Other"))

ggplot(plot_data, aes(x = month_num, y = percent_complete, group = group)) +
  # linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0)
  geom_line(aes(color = line_group, linewidth = line_group, alpha = line_group)) +
  geom_point(aes(color = line_group, alpha = line_group), size = 2) +
  # Label each highlighted group at its last observed month
  geom_text_repel(
    data = plot_data %>%
      filter(group %in% highlight_groups) %>%
      group_by(group) %>%
      filter(month_num == max(month_num)),
    aes(label = group, color = group),
    nudge_x = 0.3,
    hjust = 0,
    segment.color = NA,
    size = 4,
    show.legend = FALSE
  ) +
  scale_color_manual(values = highlight_colors) +
  scale_linewidth_manual(values = c(setNames(rep(1.4, 4), highlight_groups), "Other" = 0.6)) +
  scale_alpha_manual(values = c(setNames(rep(1, 4), highlight_groups), "Other" = 0.3)) +
  scale_x_continuous(
    breaks = 1:12,
    labels = month.abb[1:12],
    expand = expansion(mult = c(0.05, 0.2))
  ) +
  scale_y_continuous(
    limits = c(0, 1),
    labels = scales::percent_format(accuracy = 1)
  ) +
  labs(
    title = "Highest and Lowest Departments Complete Tobacco History (Monthly)",
    x = "Month",
    y = "Percent Complete"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
The Shiny app is a dashboard that allows users to interactively explore the completeness of tobacco history documentation across different specialties and staff at GW Medicine. Users can select time intervals and ranges to view trends, compare performance between pilot and non-pilot groups, and identify top performers.
Click here to open the interactive Shiny app
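For readers unfamiliar with Shiny, the general shape of such a dashboard might look like the sketch below. This is not the deployed app: the `demo_rates` data, the input names, and the `filter_rates()` helper are all hypothetical, and the values are random.

```r
library(shiny)
library(dplyr)
library(ggplot2)

# Hypothetical summary data: random completion rates for two groups
set.seed(1)
demo_rates <- expand.grid(
  group = c("Pilot", "Non-Pilot"),
  period = 1:8,
  interval = c("Weekly", "Monthly"),
  stringsAsFactors = FALSE
)
demo_rates$percent_complete <- runif(nrow(demo_rates), 0.2, 0.9)

# Hypothetical helper: keep one interval type and a range of periods
filter_rates <- function(data, interval_choice, period_range) {
  data %>%
    filter(
      interval == interval_choice,
      period >= period_range[1],
      period <= period_range[2]
    )
}

ui <- fluidPage(
  titlePanel("Tobacco History Completeness (sketch)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("interval", "Time interval",
                  choices = c("Weekly", "Monthly")),
      sliderInput("range", "Period range",
                  min = 1, max = 8, value = c(1, 8), step = 1)
    ),
    mainPanel(plotOutput("trend"))
  )
)

server <- function(input, output, session) {
  # Re-filter whenever the user changes the interval or range
  filtered <- reactive(filter_rates(demo_rates, input$interval, input$range))

  output$trend <- renderPlot({
    ggplot(filtered(), aes(x = period, y = percent_complete, color = group)) +
      geom_line() +
      geom_point() +
      scale_y_continuous(limits = c(0, 1),
                         labels = scales::percent_format(accuracy = 1)) +
      labs(x = "Period", y = "Percent Complete", color = NULL) +
      theme_minimal()
  })
}

app <- shinyApp(ui, server)
# To launch interactively: runApp(app)
```

Keeping the filtering in a plain function, separate from the reactive code, makes it possible to check the data logic without starting the app.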
The design of the visualizations & dashboard was somewhat limited by the large number of specialties and especially staff, which made it challenging to visualize all data clearly in a single plot.
Additionally, it was important to maintain full anonymity for this public project, so we could not include staff names or data linkages. As a result, staff cannot use the dashboard in its current form to see their own performance.
The authentication (login-required) mode of shinyapps.io requires a $50/mo or $550/y subscription, so access could not be restricted and more detailed information could not be used.
Importantly, the data used in this project was incomplete: only a subset of visits was included, and development continues on a final dataset pulled from the EMR. Results may differ with the complete data.
This was the first time we were able to visualize data from this pilot project, although the dataset was incomplete. Even though the trends are not as promising as we had hoped, it is still exciting to see the data displayed. The dashboard will also serve as a useful tool for reporting the pilot program's progress and outcomes to the funding agency and GW Medicine leadership.
A version of this dashboard will go into actual use at GW Medicine to help track the completeness of tobacco history documentation over time, and to identify top performers. The pilot participants will be able to track and compare their own performance. Hopefully, this will increase engagement and motivation.
[1] Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System, Mortality 2018-2023 on CDC WONDER Online Database, released in 2024. Data are from the Multiple Cause of Death Files, 2018-2023, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10-expanded.html on Jun 22, 2025 5:09:41 PM.
[2] American Cancer Society. Cancer Facts & Figures 2025. American Cancer Society. Atlanta, GA. 2025.
[3] National Lung Screening Trial Research Team. (2011). Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine, 365(5), 395–409. https://doi.org/10.1056/nejmoa1102873
[4] Moyer, V. A. (2014). Screening for lung cancer: U.S. Preventive Services Task Force Recommendation Statement. Annals of Internal Medicine, 160(5), 330–338. https://doi.org/10.7326/m13-2771
[5] American Cancer Society. Cancer Prevention & Early Detection Facts & Figures 2023-2024.
[6] American Lung Association. (2023). State of Lung Cancer: 2023 Report. Chicago, IL.
All analyses were conducted in R (R version 4.0.1).