Louvain vs Leiden Cluster Stability

The world which he inherits bears him false witness. He is broken before a frozen god and he will never find he way

Cormac McCarthy, Blood Meridian

Introduction

Seurat defaults to Louvain clustering, which came out of its use in CyTOF analysis (called PhenoGraph over there). Scanpy defaults to Leiden clustering, which is an improved version of Louvain. Seurat has the option of using Leiden, so here I take the pbmc3k dataset and the workflow from the Guided Clustering Tutorial, and assess its use across multiple seeds..

We are now going to take the cluster stability workflow I developed from my previous project and apply it to Louvain versus Leiden clustering. Luckily I made a number of functions for this already, which I will use below.

Let’s start with the workup.

Data preparation

library(Seurat)

## Loading required package: SeuratObject

## Loading required package: sp

## 'SeuratObject' was built under R 4.5.0 but the current version is
## 4.5.1; it is recomended that you reinstall 'SeuratObject' as the ABI
## for R may have changed

## 
## Attaching package: 'SeuratObject'

## The following objects are masked from 'package:base':
## 
##     intersect, t

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(SeuratData)

## ── Installed datasets ──────────────────────────────── SeuratData v0.2.2.9002 ──
## ✔ cbmc    3.1.4                         ✔ pbmcref 1.0.0
## ✔ pbmc3k  3.1.4                         
## 
## ────────────────────────────────────── Key ─────────────────────────────────────
## ✔ Dataset loaded successfully
## ❯ Dataset built with a newer version of Seurat than installed
## ❓ Unknown version of Seurat installed

# For the gifs
library(magick)

## Linking to ImageMagick 6.9.13.29
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11

library(base64enc)

set.seed(67)

# Main knob to turn
iters <- 20 # default 20
n_neighbors <- 20 # default 20

cells <- SeuratData::LoadData("pbmc3k")

## Validating object structure

## Updating object slots

## Ensuring keys are in the proper structure

## Warning: Assay RNA changing from Assay to Assay

## Ensuring keys are in the proper structure

## Ensuring feature names don't have underscores or pipes

## Updating slots in RNA

## Validating object structure for Assay 'RNA'

## Object representation is consistent with the most current Seurat version

## Warning: Assay RNA changing from Assay to Assay5

cells

## An object of class Seurat 
## 13714 features across 2700 samples within 1 assay 
## Active assay: RNA (13714 features, 0 variable features)
##  2 layers present: counts, data

cells <- NormalizeData(cells)

## Normalizing layer: counts

cells <- FindVariableFeatures(cells)

## Finding variable features for layer counts

cells <- ScaleData(cells)

## Centering and scaling data matrix

cells <- RunPCA(cells)

## PC_ 1 
## Positive:  MALAT1, LTB, IL32, CD2, ACAP1, STK17A, CTSW, CD247, CCL5, GIMAP5 
##     AQP3, GZMA, CST7, TRAF3IP3, MAL, HOPX, ITM2A, GZMK, MYC, BEX2 
##     GIMAP7, ETS1, LDLRAP1, ZAP70, LYAR, RIC3, TNFAIP8, KLRG1, SAMD3, NKG7 
## Negative:  CST3, TYROBP, LST1, AIF1, FTL, FCN1, LYZ, FTH1, S100A9, FCER1G 
##     TYMP, CFD, LGALS1, CTSS, S100A8, SERPINA1, LGALS2, SPI1, IFITM3, PSAP 
##     CFP, SAT1, IFI30, COTL1, S100A11, NPC2, LGALS3, GSTP1, PYCARD, NCF2 
## PC_ 2 
## Positive:  CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DRA, HLA-DQB1, LINC00926, CD79B, HLA-DRB1, CD74 
##     HLA-DPB1, HLA-DMA, HLA-DQA2, HLA-DRB5, HLA-DPA1, HLA-DMB, FCRLA, HVCN1, LTB, BLNK 
##     KIAA0125, P2RX5, IRF8, IGLL5, SWAP70, ARHGAP24, SMIM14, PPP1R14A, FCRL2, C16orf74 
## Negative:  NKG7, PRF1, CST7, GZMA, GZMB, FGFBP2, CTSW, GNLY, GZMH, SPON2 
##     CCL4, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX, CTSC 
##     TTC38, S100A4, ANXA1, IL32, IGFBP7, ID2, ACTB, XCL1, APOBEC3G, SAMD3 
## PC_ 3 
## Positive:  HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, CD74, HLA-DPA1, MS4A1, HLA-DRB1, HLA-DRB5 
##     HLA-DRA, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, HVCN1, FCRLA, IRF8, BLNK 
##     KIAA0125, SMIM14, PLD4, IGLL5, P2RX5, TMSB10, SWAP70, LAT2, MALAT1, IGJ 
## Negative:  PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU 
##     HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, CA2, PTCRA, ACRBP, MMD, TREML1 
##     NGFRAP1, F13A1, RUFY1, SEPT5, MPP1, CMTM5, TSC22D1, MYL9, RP11-367G6.3, GP1BA 
## PC_ 4 
## Positive:  HLA-DQA1, CD79A, CD79B, HIST1H2AC, HLA-DQB1, PF4, MS4A1, SDPR, CD74, PPBP 
##     HLA-DPB1, GNG11, HLA-DQA2, SPARC, HLA-DRB1, HLA-DPA1, GP9, TCL1A, HLA-DRA, LINC00926 
##     NRGN, RGS18, HLA-DRB5, PTCRA, CD9, AP001189.4, CA2, CLU, TUBB1, ITGA2B 
## Negative:  VIM, S100A8, S100A6, S100A4, S100A9, TMSB10, IL32, GIMAP7, LGALS2, S100A10 
##     RBP7, FCN1, MAL, LYZ, S100A12, MS4A6A, CD2, FYB, S100A11, FOLR3 
##     GIMAP4, AQP3, ANXA1, AIF1, MALAT1, GIMAP5, IL8, IFI6, TRABD2A, TMSB4X 
## PC_ 5 
## Positive:  GZMB, FGFBP2, NKG7, GNLY, PRF1, CCL4, CST7, SPON2, GZMA, CLIC3 
##     GZMH, XCL2, CTSW, TTC38, AKR1C3, CCL5, IGFBP7, XCL1, CCL3, S100A8 
##     TYROBP, HOPX, CD160, HAVCR2, S100A9, FCER1G, PTGDR, LGALS2, RBP7, S100A12 
## Negative:  LTB, VIM, AQP3, PPA1, MAL, KIAA0101, CD2, CYTIP, CORO1B, FYB 
##     IL32, TRADD, ANXA5, TUBA1B, HN1, TYMS, PTGES3, ITM2A, COTL1, GPR183 
##     TNFAIP8, ACTG1, TRAF3IP3, ATP5C1, GIMAP4, ZWINT, PRDX1, LDLRAP1, ABRACL, NGFRAP1

cells <- FindNeighbors(cells, k = n_neighbors)

## Computing nearest neighbor graph

## Computing SNN

And we make our UMAP.

cells <- RunUMAP(cells, dims = 1:10)

## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

## 17:23:51 UMAP embedding parameters a = 0.9922 b = 1.112

## 17:23:51 Read 2700 rows and found 10 numeric columns

## 17:23:51 Using Annoy for neighbor search, n_neighbors = 30

## 17:23:51 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 17:23:51 Writing NN index file to temp file /var/folders/6y/qgvrbn_j2v35vfy946hlhh9c0000gn/T//RtmpvHDBNE/file15d255c6318b1
## 17:23:51 Searching Annoy index using 1 thread, search_k = 3000
## 17:23:51 Annoy recall = 100%
## 17:23:51 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 17:23:52 Initializing from normalized Laplacian + noise (using RSpectra)
## 17:23:52 Commencing optimization for 500 epochs, with 107868 positive edges
## 17:23:52 Using rng type: pcg
## 17:23:54 Optimization finished

dimr <- cells@reductions$umap@cell.embeddings %>% as_tibble()
dimr

## # A tibble: 2,700 × 2
##    umap_1 umap_2
##     <dbl>  <dbl>
##  1 -2.98   -4.74
##  2 -6.82   12.2 
##  3 -0.290  -1.93
##  4 11.5     3.11
##  5 -9.06   -3.17
##  6 -0.933  -5.44
##  7 -4.91   -3.62
##  8 -4.50   -2.76
##  9 -5.14   -5.12
## 10 12.3     3.85
## # ℹ 2,690 more rows

Functions

Here are the relevant functions, with a modification to the centroid list function to account for louvain vs leiden clustering.

MakeCentroidList <- function(seurat_obj, num_iter = 10, clust_res = 0.5, dimr, cluster_type = "louvain") {
    # Annoying naming conventions
    if(cluster_type == "louvain") {
        algo <- 1
    } else if (cluster_type == "leiden") {
        algo <- 4
    }
    clust_list <- lapply(seq(num_iter), function(i) {
        # Get the clusters
        seurat_obj <- Seurat::FindClusters(seurat_obj, random.seed = i, resolution = clust_res, cluster.name = "tmp_clusters", verbose = FALSE, algorithm = algo)
        clust <- seurat_obj@meta.data$tmp_clusters
        curr <- bind_cols(dimr, cluster = clust)
        
        # Get the centroids
        centroids <- lapply(unique(clust), function(j) {
            curr_cent <- dplyr::filter(curr, cluster == j) %>%
                dplyr::select(umap_1, umap_2) %>%
                apply(., 2, mean)
            return(curr_cent)
        }) %>% do.call(rbind, .) %>% as_tibble()
        
        return(centroids)
    })
    return(clust_list)
}

CentPlot <- function(dimr, centroids) {
    p <- ggplot() + 
        geom_point(data = dimr, 
                 aes(x = umap_1, y = umap_2), 
                 color = "black", 
                 alpha = 0.9) +
        geom_point(data = centroids, 
                 aes(x = umap_1, y = umap_2), 
                 color = "yellow", 
                 size = 5)
    return(p)
}

MakeGif <- function(plot_list, fps = 5, outfile) {
    # Set up a graphics device to capture the plots.
    # You can adjust width, height, and resolution as needed.
    img_list <- image_graph(width = 800, height = 600, res = 96)
    
    # Loop over your list and print each plot.
    for (plot in plot_list) {
      print(plot)
    }
    
    # Turn off the graphics device to record the plots.
    dev.off()
    
    # Animate the captured images (adjust fps for desired speed).
    animation <- image_animate(img_list, fps = fps)
    
    # Write the GIF to a raw vector in memory (without saving to disk)
    gif_raw <- image_write(animation, format = "gif")
    
    # Encode the raw GIF data into a Base64 string
    gif_base64 <- base64encode(gif_raw)
    
    # Output an HTML image tag with the embedded Base64 GIF
    cat(sprintf('<img src="data:image/gif;base64,%s">', gif_base64))
    
    # Save the animated GIF
    image_write(animation, path = outfile)
}

The workflow

And now we run things. I refrain from further comments only because I run this workflow across a number of settings, so I am going to just provide the code below.

Here is Louvain clustering over 3 resolution settings, with 0.5 being the default in the Guided Clustering Tutorial.

setwd(here::here("output"))

cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.5, dimr = dimr, cluster_type = "louvain")

cent_plots <- lapply(cents, function(i) {
    CentPlot(dimr = dimr, centroids = i)
})

MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_louvain_0.5.gif")

setwd(here::here("output"))

cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.8, dimr = dimr, cluster_type = "louvain")

cent_plots <- lapply(cents, function(i) {
    CentPlot(dimr = dimr, centroids = i)
})

MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_louvain_0.8.gif")

setwd(here::here("output"))

cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 1.2, dimr = dimr, cluster_type = "louvain")

cent_plots <- lapply(cents, function(i) {
    CentPlot(dimr = dimr, centroids = i)
})

MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_louvain_1.2.gif")

And here is Leiden clustering across the same three resolution settings.

setwd(here::here("output"))

cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.5, dimr = dimr, cluster_type = "leiden")

cent_plots <- lapply(cents, function(i) {
    CentPlot(dimr = dimr, centroids = i)
})

MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_leiden_0.5.gif")

setwd(here::here("output"))

cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 0.8, dimr = dimr, cluster_type = "leiden")

cent_plots <- lapply(cents, function(i) {
    CentPlot(dimr = dimr, centroids = i)
})

MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_leiden_0.8.gif")

setwd(here::here("output"))

cents <- MakeCentroidList(seurat_obj = cells, num_iter = iters, clust_res = 1.2, dimr = dimr, cluster_type = "leiden")

cent_plots <- lapply(cents, function(i) {
    CentPlot(dimr = dimr, centroids = i)
})

MakeGif(plot_list = cent_plots, fps = 10, outfile = "clust_leiden_1.2.gif")

Louvain vs Leiden Cluster Stability

Tyler J Burns, PhD

2025-11-17 - 2025-11-19

Introduction

Data preparation

Functions

The workflow