The world which he inherits bears him false witness. He is broken before a frozen god and he will never find he way
Cormac McCarthy, Blood Meridian
Seurat defaults to Louvain clustering, which came out of its use in CyTOF analysis (called PhenoGraph over there). Scanpy defaults to Leiden clustering, which is an improved version of Louvain. Seurat has the option of using Leiden, so here I take the pbmc3k dataset and the workflow from the Guided Clustering Tutorial, and assess its use across multiple seeds..
We are now going to take the cluster stability workflow I developed from my previous project and apply it to Louvain versus Leiden clustering. Luckily I made a number of functions for this already, which I will use below.
Let’s start with the workup.
library(Seurat)
## Loading required package: SeuratObject
## Loading required package: sp
## 'SeuratObject' was built under R 4.5.0 but the current version is
## 4.5.1; it is recomended that you reinstall 'SeuratObject' as the ABI
## for R may have changed
##
## Attaching package: 'SeuratObject'
## The following objects are masked from 'package:base':
##
## intersect, t
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(SeuratData)
## ── Installed datasets ──────────────────────────────── SeuratData v0.2.2.9002 ──
## ✔ cbmc 3.1.4 ✔ pbmcref 1.0.0
## ✔ pbmc3k 3.1.4
##
## ────────────────────────────────────── Key ─────────────────────────────────────
## ✔ Dataset loaded successfully
## ❯ Dataset built with a newer version of Seurat than installed
## ❓ Unknown version of Seurat installed
# For the gifs
library(magick)
## Linking to ImageMagick 6.9.13.29
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(base64enc)
set.seed(67)
# Main knob to turn
iters <- 20 # default 20
n_neighbors <- 20 # default 20
cells <- SeuratData::LoadData("pbmc3k")
## Validating object structure
## Updating object slots
## Ensuring keys are in the proper structure
## Warning: Assay RNA changing from Assay to Assay
## Ensuring keys are in the proper structure
## Ensuring feature names don't have underscores or pipes
## Updating slots in RNA
## Validating object structure for Assay 'RNA'
## Object representation is consistent with the most current Seurat version
## Warning: Assay RNA changing from Assay to Assay5
cells
## An object of class Seurat
## 13714 features across 2700 samples within 1 assay
## Active assay: RNA (13714 features, 0 variable features)
## 2 layers present: counts, data
cells <- NormalizeData(cells)
## Normalizing layer: counts
cells <- FindVariableFeatures(cells)
## Finding variable features for layer counts
cells <- ScaleData(cells)
## Centering and scaling data matrix
cells <- RunPCA(cells)
## PC_ 1
## Positive: MALAT1, LTB, IL32, CD2, ACAP1, STK17A, CTSW, CD247, CCL5, GIMAP5
## AQP3, GZMA, CST7, TRAF3IP3, MAL, HOPX, ITM2A, GZMK, MYC, BEX2
## GIMAP7, ETS1, LDLRAP1, ZAP70, LYAR, RIC3, TNFAIP8, KLRG1, SAMD3, NKG7
## Negative: CST3, TYROBP, LST1, AIF1, FTL, FCN1, LYZ, FTH1, S100A9, FCER1G
## TYMP, CFD, LGALS1, CTSS, S100A8, SERPINA1, LGALS2, SPI1, IFITM3, PSAP
## CFP, SAT1, IFI30, COTL1, S100A11, NPC2, LGALS3, GSTP1, PYCARD, NCF2
## PC_ 2
## Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DRA, HLA-DQB1, LINC00926, CD79B, HLA-DRB1, CD74
## HLA-DPB1, HLA-DMA, HLA-DQA2, HLA-DRB5, HLA-DPA1, HLA-DMB, FCRLA, HVCN1, LTB, BLNK
## KIAA0125, P2RX5, IRF8, IGLL5, SWAP70, ARHGAP24, SMIM14, PPP1R14A, FCRL2, C16orf74
## Negative: NKG7, PRF1, CST7, GZMA, GZMB, FGFBP2, CTSW, GNLY, GZMH, SPON2
## CCL4, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX, CTSC
## TTC38, S100A4, ANXA1, IL32, IGFBP7, ID2, ACTB, XCL1, APOBEC3G, SAMD3
## PC_ 3
## Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, CD74, HLA-DPA1, MS4A1, HLA-DRB1, HLA-DRB5
## HLA-DRA, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, HVCN1, FCRLA, IRF8, BLNK
## KIAA0125, SMIM14, PLD4, IGLL5, P2RX5, TMSB10, SWAP70, LAT2, MALAT1, IGJ
## Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU
## HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, CA2, PTCRA, ACRBP, MMD, TREML1
## NGFRAP1, F13A1, RUFY1, SEPT5, MPP1, CMTM5, TSC22D1, MYL9, RP11-367G6.3, GP1BA
## PC_ 4
## Positive: HLA-DQA1, CD79A, CD79B, HIST1H2AC, HLA-DQB1, PF4, MS4A1, SDPR, CD74, PPBP
## HLA-DPB1, GNG11, HLA-DQA2, SPARC, HLA-DRB1, HLA-DPA1, GP9, TCL1A, HLA-DRA, LINC00926
## NRGN, RGS18, HLA-DRB5, PTCRA, CD9, AP001189.4, CA2, CLU, TUBB1, ITGA2B
## Negative: VIM, S100A8, S100A6, S100A4, S100A9, TMSB10, IL32, GIMAP7, LGALS2, S100A10
## RBP7, FCN1, MAL, LYZ, S100A12, MS4A6A, CD2, FYB, S100A11, FOLR3
## GIMAP4, AQP3, ANXA1, AIF1, MALAT1, GIMAP5, IL8, IFI6, TRABD2A, TMSB4X
## PC_ 5
## Positive: GZMB, FGFBP2, NKG7, GNLY, PRF1, CCL4, CST7, SPON2, GZMA, CLIC3
## GZMH, XCL2, CTSW, TTC38, AKR1C3, CCL5, IGFBP7, XCL1, CCL3, S100A8
## TYROBP, HOPX, CD160, HAVCR2, S100A9, FCER1G, PTGDR, LGALS2, RBP7, S100A12
## Negative: LTB, VIM, AQP3, PPA1, MAL, KIAA0101, CD2, CYTIP, CORO1B, FYB
## IL32, TRADD, ANXA5, TUBA1B, HN1, TYMS, PTGES3, ITM2A, COTL1, GPR183
## TNFAIP8, ACTG1, TRAF3IP3, ATP5C1, GIMAP4, ZWINT, PRDX1, LDLRAP1, ABRACL, NGFRAP1
cells <- FindNeighbors(cells, k = n_neighbors)
## Computing nearest neighbor graph
## Computing SNN
And we make our UMAP.
cells <- RunUMAP(cells, dims = 1:10)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 17:23:51 UMAP embedding parameters a = 0.9922 b = 1.112
## 17:23:51 Read 2700 rows and found 10 numeric columns
## 17:23:51 Using Annoy for neighbor search, n_neighbors = 30
## 17:23:51 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 17:23:51 Writing NN index file to temp file /var/folders/6y/qgvrbn_j2v35vfy946hlhh9c0000gn/T//RtmpvHDBNE/file15d255c6318b1
## 17:23:51 Searching Annoy index using 1 thread, search_k = 3000
## 17:23:51 Annoy recall = 100%
## 17:23:51 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 17:23:52 Initializing from normalized Laplacian + noise (using RSpectra)
## 17:23:52 Commencing optimization for 500 epochs, with 107868 positive edges
## 17:23:52 Using rng type: pcg
## 17:23:54 Optimization finished
dimr <- cells@reductions$umap@cell.embeddings %>% as_tibble()
dimr
## # A tibble: 2,700 × 2
## umap_1 umap_2
## <dbl> <dbl>
## 1 -2.98 -4.74
## 2 -6.82 12.2
## 3 -0.290 -1.93
## 4 11.5 3.11
## 5 -9.06 -3.17
## 6 -0.933 -5.44
## 7 -4.91 -3.62
## 8 -4.50 -2.76
## 9 -5.14 -5.12
## 10 12.3 3.85
## # ℹ 2,690 more rows
Here are the relevant functions, with a modification to the centroid list function to account for louvain vs leiden clustering.
MakeCentroidList <- function(seurat_obj, num_iter = 10, clust_res = 0.5, dimr, cluster_type = "louvain") {
# Annoying naming conventions
if(cluster_type == "louvain") {
algo <- 1
} else if (cluster_type == "leiden") {
algo <- 4
}
clust_list <- lapply(seq(num_iter), function(i) {
# Get the clusters
seurat_obj <- Seurat::FindClusters(seurat_obj, random.seed = i, resolution = clust_res, cluster.name = "tmp_clusters", verbose = FALSE, algorithm = algo)
clust <- seurat_obj@meta.data$tmp_clusters
curr <- bind_cols(dimr, cluster = clust)
# Get the centroids
centroids <- lapply(unique(clust), function(j) {
curr_cent <- dplyr::filter(curr, cluster == j) %>%
dplyr::select(umap_1, umap_2) %>%
apply(., 2, mean)
return(curr_cent)
}) %>% do.call(rbind, .) %>% as_tibble()
return(centroids)
})
return(clust_list)
}
CentPlot <- function(dimr, centroids) {
p <- ggplot() +
geom_point(data = dimr,
aes(x = umap_1, y = umap_2),
color = "black",
alpha = 0.9) +
geom_point(data = centroids,
aes(x = umap_1, y = umap_2),
color = "yellow",
size = 5)
return(p)
}
MakeGif <- function(plot_list, fps = 5, outfile) {
# Set up a graphics device to capture the plots.
# You can adjust width, height, and resolution as needed.
img_list <- image_graph(width = 800, height = 600, res = 96)
# Loop over your list and print each plot.
for (plot in plot_list) {
print(plot)
}
# Turn off the graphics device to record the plots.
dev.off()
# Animate the captured images (adjust fps for desired speed).
animation <- image_animate(img_list, fps = fps)
# Write the GIF to a raw vector in memory (without saving to disk)
gif_raw <- image_write(animation, format = "gif")
# Encode the raw GIF data into a Base64 string
gif_base64 <- base64encode(gif_raw)
# Output an HTML image tag with the embedded Base64 GIF
cat(sprintf('<img src="data:image/gif;base64,%s">', gif_base64))
# Save the animated GIF
image_write(animation, path = outfile)
}
And now we run things. I refrain from further comments only because I run this workflow across a number of settings, so I am going to just provide the code below.
Here is Louvain clustering over 3 resolution settings, with 0.5 being the default in the Guided Clustering Tutorial.
setwd(here::here("output"))
cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.5, dimr = dimr, cluster_type = "louvain")
cent_plots <- lapply(cents, function(i) {
CentPlot(dimr = dimr, centroids = i)
})
MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_louvain_0.5.gif")
setwd(here::here("output"))
cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.8, dimr = dimr, cluster_type = "louvain")
cent_plots <- lapply(cents, function(i) {
CentPlot(dimr = dimr, centroids = i)
})
MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_louvain_0.8.gif")
setwd(here::here("output"))
cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 1.2, dimr = dimr, cluster_type = "louvain")
cent_plots <- lapply(cents, function(i) {
CentPlot(dimr = dimr, centroids = i)
})
MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_louvain_1.2.gif")
And here is Leiden clustering across the same three resolution settings.
setwd(here::here("output"))
cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.5, dimr = dimr, cluster_type = "leiden")
cent_plots <- lapply(cents, function(i) {
CentPlot(dimr = dimr, centroids = i)
})
MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_leiden_0.5.gif")
setwd(here::here("output"))
cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.8, dimr = dimr, cluster_type = "leiden")
cent_plots <- lapply(cents, function(i) {
CentPlot(dimr = dimr, centroids = i)
})
MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_leiden_0.8.gif")
setwd(here::here("output"))
cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 1.2, dimr = dimr, cluster_type = "leiden")
cent_plots <- lapply(cents, function(i) {
CentPlot(dimr = dimr, centroids = i)
})
MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_leiden_1.2.gif")