Sankey Plot

A Sankey plot is a flow diagram in which the width of each arrow is proportional to the magnitude of the flow it represents. It is ideal for visualizing how quantities split, merge, or move between stages or categories. Related variants include alluvial diagrams and bump charts, which emphasize different aspects of the data.

Aside from the example below, a more comprehensive overview of Sankey plots can be found in the R Graph Gallery.

Basic Example with networkD3

In this example, a simple Sankey plot visualizes how a patient sample might be split into training, testing, and validation sets.

library(tidyverse)
library(networkD3)

# Define sample dataset: patient counts for each split
df_split <- tibble(
  set       = c("Total Patients", "Total Patients", "Total Patients"),
  subset    = c("Training",       "Testing",        "Validation"),
  count     = c(60,               20,               20)
)

# Create nodes: data.frame
nodes <- data.frame(name = unique(c(df_split$set, df_split$subset)))

# Create links: map node names to zero-based node indices
links <- df_split |>
  mutate(
    source = match(set,    nodes$name) - 1,
    target = match(subset, nodes$name) - 1,
    value  = count
  ) |>
  select(source, target, value)

# Draw the Sankey diagram
sankeyNetwork(
  Links     = links,
  Nodes     = nodes,
  Source    = "source",
  Target    = "target",
  Value     = "value",
  NodeID    = "name",
  fontSize  = 12,
  nodeWidth = 30,
  units     = "patients"
)
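
The same pattern extends to flows that merge before they split. The sketch below assumes a made-up edge list in which two sites feed a single cohort that is then divided as above. `edges_to_sankey()` is a hypothetical helper written for this illustration, not part of networkD3, and the site counts are invented.

```r
# Hypothetical helper (not part of networkD3): convert an edge list with
# columns from/to/value into the nodes and zero-based links that
# sankeyNetwork() expects.
edges_to_sankey <- function(edges) {
  nodes <- data.frame(
    name = unique(c(edges$from, edges$to)),
    stringsAsFactors = FALSE
  )
  links <- data.frame(
    source = match(edges$from, nodes$name) - 1,
    target = match(edges$to,   nodes$name) - 1,
    value  = edges$value
  )
  list(nodes = nodes, links = links)
}

# Invented counts: two sites merge into one cohort, which is then split
edges <- data.frame(
  from  = c("Site A", "Site B",
            "Total Patients", "Total Patients", "Total Patients"),
  to    = c("Total Patients", "Total Patients",
            "Training", "Testing", "Validation"),
  value = c(55, 45, 60, 20, 20),
  stringsAsFactors = FALSE
)

sk <- edges_to_sankey(edges)

# Build the diagram only if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  p_flow <- networkD3::sankeyNetwork(
    Links  = sk$links,
    Nodes  = sk$nodes,
    Source = "source",
    Target = "target",
    Value  = "value",
    NodeID = "name",
    units  = "patients"
  )
}
```

Printing `p_flow` in an interactive session or a notebook cell renders the diagram. Keeping the data preparation in a small function makes it easier to reuse the name-to-index mapping when more stages are added.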

Tip

Use ?sankeyNetwork to get specific details about the parameters and additional options for customizing the Sankey plot, such as how links and nodes should be defined.
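
Because `sankeyNetwork()` expects the source and target columns to hold zero-based positions in the nodes table, a quick sanity check before plotting can catch off-by-one mistakes. The following is a minimal sketch in base R; `check_sankey_input()` is a hypothetical helper, and the column names `source`, `target`, and `value` are assumed to match the example above.

```r
# Hypothetical helper: return a character vector describing problems with
# the links table (an empty vector means no problems were found).
check_sankey_input <- function(links, nodes) {
  stopifnot(
    is.data.frame(links),
    is.data.frame(nodes),
    all(c("source", "target", "value") %in% names(links))
  )
  idx <- c(links$source, links$target)
  problems <- character(0)
  if (any(idx != floor(idx))) {
    problems <- c(problems, "non-integer node index")
  }
  if (any(idx < 0)) {
    problems <- c(problems, "negative node index")
  }
  if (any(idx > nrow(nodes) - 1)) {
    problems <- c(problems, "node index beyond last node (indices are zero-based)")
  }
  if (any(links$value < 0)) {
    problems <- c(problems, "negative flow value")
  }
  problems
}

nodes_chk <- data.frame(
  name = c("Total Patients", "Training", "Testing", "Validation")
)

# Correct, zero-based links
links_ok <- data.frame(
  source = c(0, 0, 0),
  target = c(1, 2, 3),
  value  = c(60, 20, 20)
)

# Common mistake: one-based indices taken straight from match()
links_one_based <- transform(links_ok, source = source + 1, target = target + 1)

res_ok  <- check_sankey_input(links_ok, nodes_chk)
res_bad <- check_sankey_input(links_one_based, nodes_chk)
```

Here `res_ok` is empty, while `res_bad` reports that an index points past the last node.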

References

R Packages: