πŸ† Stage 5 Β· The Final Chamber

~12 minutes Β· Collect the fragments Β· Claim the Toolkit

Instructor Β· Stage 5 (~12 min). Bring the class back together. Read each fragment aloud. Run the class debrief. Close with the Golden Data Toolkit reveal.

πŸ† The Final Chamber

Four doors. Six fragments. One toolkit.
You have passed through all of them.


The six fragments, assembled

Code
library(tidyverse)

fragments <- tibble(
  number = paste0("#", 0:5),
  source = c("index.qmd - freebie", "demo.qmd", "report.qmd",
             "report.qmd (EDA)", "report.qmd (tidymodels)", "slides.qmd"),
  word   = c("Quarto", "Render", "Reproducible", "EDA", "Tidymodels", "Publish")
)
fragments
# A tibble: 6 × 3
  number source                  word        
  <chr>  <chr>                   <chr>       
1 #0     index.qmd - freebie     Quarto      
2 #1     demo.qmd                Render      
3 #2     report.qmd              Reproducible
4 #3     report.qmd (EDA)        EDA         
5 #4     report.qmd (tidymodels) Tidymodels  
6 #5     slides.qmd              Publish     

πŸ΄β€β˜ οΈ The six words, in order:

Quarto Β· render Β· reproducible Β· EDA Β· tidymodels Β· publish

The data analyst/scientist who writes Quarto documents, clicks render, builds reproducibly, explores with EDA and analyze with tidymodels, and publishes for others β€” holds the Golden Data Toolkit.


The toolkit in plain language

🗺 Quarto

You tell the tool what you want. One document, endless outputs. More importantly, you can easily combine code with text descriptions.

⚙️ Render

You do not export, copy, or paste. You run one command and the document builds itself.
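As a rough sketch of what "one command" can look like from R: the helper name `render_if_possible()` is hypothetical and only wraps `quarto::quarto_render()` with a few safety checks. It assumes the quarto R package and the Quarto CLI are installed, and that a file such as `report.qmd` sits in the working directory.

```r
# Hypothetical helper: render a .qmd only when Quarto and the file are available
render_if_possible <- function(input, output_format = NULL) {
  ok <- requireNamespace("quarto", quietly = TRUE) &&
    !is.null(quarto::quarto_path()) &&
    file.exists(input)

  if (!ok) {
    message("Skipping render: Quarto or '", input, "' is not available.")
    return(invisible(FALSE))
  }

  # One call rebuilds the whole document from source
  quarto::quarto_render(input, output_format = output_format)
  invisible(TRUE)
}

# Default format declared in the file's YAML header
render_if_possible("report.qmd")

# Same source, different target format
render_if_possible("report.qmd", output_format = "pdf")
```

The guard is there only so the sketch fails softly on a machine without Quarto; in a working project the bare `quarto::quarto_render()` call is enough.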

πŸ” Reproducible

Your conclusions are inseparable from the code that produced them. No one can question whether you manually adjusted the chart.
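One way to picture this, as a minimal sketch with made-up scores (hypothetical values, not the workshop data): the reported number is computed from the data and the sentence is built from that value, so the text cannot drift away from the analysis.

```r
# Hypothetical scores, for illustration only
scores <- c(94, 83, 78, 77, 72)

# Compute the figure from the data instead of typing it by hand
class_avg <- round(mean(scores), 1)

# Build the reported sentence from the computed value
summary_line <- sprintf("The class average was %.1f points.", class_avg)
summary_line
```

In a .qmd file the same value can be placed directly in prose with inline R code, so a change in the data flows into the sentence on the next render.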

πŸ” EDA

You look at the data before you model it. Distributions, correlations, outliers β€” the data has a story before you impose a model on it.
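A minimal sketch of those three checks on a toy data set. The `teams` table, its columns, and the `flag_outlier()` helper are all hypothetical and exist only for illustration; the 1.5 × IQR rule is one common outlier convention, not the only one.

```r
library(dplyr)

# Hypothetical toy data, for illustration only (not the workshop file)
teams <- tibble(
  hours_spent = c(2, 3, 3, 4, 5, 5, 6, 12),
  score       = c(70, 74, 75, 80, 84, 85, 88, 60)
)

# Distributions: quick numeric summary of every column
summary(teams)

# Correlations: how strongly do the two columns move together?
cor(teams$hours_spent, teams$score)

# Outliers: flag values more than 1.5 * IQR beyond the quartiles
flag_outlier <- function(x) {
  q     <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * IQR(x)
  x < q[[1]] - fence | x > q[[2]] + fence
}

teams |>
  mutate(hours_outlier = flag_outlier(hours_spent),
         score_outlier = flag_outlier(score))
```

A flagged row is a prompt to investigate, not an instruction to delete: the point of EDA is to decide what to do with such values before a model sees them.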

πŸ” Tidymodels

You model the data you explored and make predictions. You then evaluate those models to identify the best/most optimal model.
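The fit-then-evaluate loop might look like the sketch below. It uses the built-in `mtcars` data as a stand-in for your own explored data, assumes the tidymodels and ranger packages are installed, and picks RMSE on a held-out split as the comparison metric; the formula, the seed, and the helper name `holdout_rmse()` are illustrative choices, not the workshop's.

```r
library(tidymodels)

# mtcars is a built-in stand-in; swap in the data you explored
set.seed(123)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Two candidate models fitted with the same formula
lm_fit <- linear_reg() |>
  fit(mpg ~ wt + hp, data = car_train)

rf_fit <- rand_forest(mode = "regression", trees = 500) |>
  set_engine("ranger") |>
  fit(mpg ~ wt + hp, data = car_train)

# Hypothetical helper: RMSE of one fitted model on the held-out rows
holdout_rmse <- function(model) {
  predict(model, new_data = car_test) |>
    bind_cols(car_test) |>
    rmse(truth = mpg, estimate = .pred)
}

# One row per model, lowest error first
results <- bind_rows(
  linear_model  = holdout_rmse(lm_fit),
  random_forest = holdout_rmse(rf_fit),
  .id = "model"
) |>
  arrange(.estimate)

results
```

With a test set this small the ranking can flip between seeds, so resampling (for example cross-validation) gives a steadier comparison on real projects.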

📤 Publish

Analysis that stays on your laptop has no value. Quarto makes sharing as easy as rendering. At this point you can communicate your results as a dynamic report or as slides, depending on your stakeholders.


Final scoreboard

Code
treasure <- read_csv("data/treasure_hunt.csv", show_col_types = FALSE) |>
  mutate(major = fct_reorder(major, final_treasure, .fun = mean))

treasure |>
  group_by(major) |>
  summarise(
    avg_score  = round(mean(final_treasure), 1),
    top_score  = max(final_treasure),
    n_teams    = n(),
    .groups = "drop"
  ) |>
  arrange(desc(avg_score)) |>
  mutate(rank  = row_number(),
         medal = c("🥇", "🥈", "🥉", "🎖️", "🎖️")) |>
  select(medal, major, avg_score, top_score, n_teams)
# A tibble: 5 × 5
  medal major      avg_score top_score n_teams
  <chr> <fct>          <dbl>     <dbl>   <int>
1 🥇    OMIS            94          99       5
2 🥈    Marketing       83.2        88       5
3 🥉    Management      77.6        85       5
4 🎖️    Accounting      76.8        79       5
5 🎖️    Finance         71.8        87       5
Code
treasure |>
  ggplot(aes(x = major, y = final_treasure, fill = major)) +
  geom_boxplot(alpha = 0.55, outlier.shape = NA, width = 0.45) +
  # height = 0 keeps each point at its true score; only the x position is jittered
  geom_jitter(aes(colour = major), width = 0.1, height = 0, size = 3, alpha = .9) +
  geom_hline(yintercept = mean(treasure$final_treasure),
             colour = "#d4a017", linewidth = 1, linetype = "dashed") +
  annotate("text", x = 5.35, y = mean(treasure$final_treasure) + 1.5,
           label = "class avg", colour = "#d4a017", size = 3.2) +
  scale_fill_viridis_d(option = "plasma", end = .82) +
  scale_colour_viridis_d(option = "plasma", end = .82) +
  labs(title = "Final treasure score by major", x = NULL, y = "Score") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

Where every team landed

Class debrief

Discussion prompts: pick 2 or 3
  1. “What would happen to a report you built last semester if the raw data changed?”
  2. “When would you choose a website over a PDF?”
  3. “What is the business value of reproducibility in a professional setting?”
  4. “Which part of today’s workflow surprised you most?”
  5. “Where (internship, job, grad school) would this toolkit matter most for you personally?”

The Golden Data Toolkit, revealed

🏆 The Golden Data Toolkit is yours

It is not a file. It is not a package.
It is a workflow.

Analyse once. Render to anything. Share reproducibly.

You have been building it all semester.
Today you just learned to name it.

Treasure chest opens: golden light


What to do next (optional extensions)

  • Leverage additional models. For example, compare the linear model with a random forest, then keep whichever predicts better on held-out data.
  • Add a communicate.qmd page to the website with your own EDA and tidymodels insights.
  • Publish the site to Quarto Pub (free, one command) or GitHub Pages.
  • Try quarto::quarto_render("report.qmd", output_format = "pdf"): same file, different format.
# Publish to Quarto Pub (free). Run this in the terminal from the project folder, not in the R console.
quarto publish quarto-pub

Pirate coding gif