Home

The purpose of this markdown is to recapitulate and expand upon the main findings in figure 2 of this COVID study. We will do this directly from the Seurat object publicly available (see seurat_object_explore.Rmd). We will begin with Figure 2A, which is the standard clustered UMAP of everything. We will then proceed to produce meaningful box and whisker plots from the data.

First, we’ll read in the data.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at /Users/tylerburns/Documents/client_projects/presidio_labs/charite_covid_study
library(Seurat)
## The legacy packages maptools, rgdal, and rgeos, underpinning this package
## will retire shortly. Please refer to R-spatial evolution reports on
## https://r-spatial.org/r/2023/05/15/evolution4.html for details.
## This package is now running under evolution status 0 
## Attaching SeuratObject
setwd(here::here('data'))
dat <- readr::read_rds(list.files(pattern = ".rds"))

Figure 2A

Next, we’ll plot the UMAP of everything using the one previously made from Seurat.

Seurat::DimPlot(object = dat, reduction = "umap")

Figure 2B

From here, we will plot two UMAPs. One corresponding to mild covid and one corresponding to severe. Then we will plot a third UMAP which corresponds to knn overlap so we can see the difference. We start with a composite plot.

Now let’s try to make a composite plot.

Seurat::DimPlot(object = dat, reduction = "umap", group.by = "group_per_sample")

Now we make a new Seurat object that takes out the control samples.

unique(dat$orig.ident)
## [1] "Covid19"   "sepsis"    "one_k_v3"  "Five_k_v3" "Ten_k_v3"
cov <- subset(dat, orig.ident == 'Covid19')
Seurat::DimPlot(object = cov, reduction = "umap", group.by = "group_per_sample")

Now, we split it into two maps, corresponding to mild and severe, and we run them side by side. Here’s how we hack the color scheme to make this work.

cov$is_mild <- ifelse(cov$group_per_sample == 'mild', 1, 0) %>% as.numeric()
cov$is_severe <- ifelse(cov$group_per_sample == 'severe', 1, 0) %>% as.numeric()
p1 <- Seurat::FeaturePlot(object = cov, reduction = "umap", "is_mild")
p2 <- Seurat::FeaturePlot(object = cov, reduction = "umap", "is_severe")
p1 | p2

Now we do our KNN on it to make a composite map.

library(RANN)

nn <- RANN::nn2(data = cov@reductions$umap@cell.embeddings, k = 100)$nn.idx
pct_severe <- apply(nn, 1, function(i) {
  curr <- cov$is_severe[i]
  result <- sum(curr)/length(curr)
  return(result)
})
cov$pct_severe <- pct_severe
Seurat::FeaturePlot(object = cov, reduction = "umap", "pct_severe") + 
  scale_color_gradientn(colors = c("blue", "green", "yellow", "red"))
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Great, now let’s do this again taking the top principal components, which brings us a tad closer to the original manifold.

nn <- RANN::nn2(data = cov@reductions$pca@cell.embeddings[,1:20], k = 100)$nn.idx
pct_severe <- apply(nn, 1, function(i) {
  curr <- cov$is_severe[i]
  result <- sum(curr)/length(curr)
  return(result)
})

Now we make a plot.

cov$pct_severe <- pct_severe
Seurat::FeaturePlot(object = cov, reduction = "umap", "pct_severe") + 
  scale_color_gradientn(colors = c("blue", "green", "yellow", "red"))
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Ok, now we have to find all of our samples in the data.

dat$donor %>% unique()
##  [1] "C19-CB-0001" "C19-CB-0003" "C19-CB-0002" "C19-CB-0005" "C19-CB-0009"
##  [6] "C19-CB-0012" "C19-CB-0013" "C19-CB-0011" "C19-CB-0008" "C19-CB-0020"
## [11] "C19-CB-0021" "C19-CB-0016" "C19-CB-0198" "C19-CB-0204" "C19-CB-0199"
## [16] "C19-CB-0214" "C19-CB-0053" "C19-CB-0052" "P18F"        "P17H"       
## [21] "P20H"        "P15F"        "P08H"        "P13H"        "P07H"       
## [26] "P06F"        "P04H"        "C2P01H"      "P09H"        "P02H"       
## [31] "C2P05F"      "C2P07H"      "C2P13F"      "C2P16H"      "C2P10H"     
## [36] "C2P19H"      "C2P15H"      "one_k_v3"    "Five_k_v3"   "Ten_k_v3"

Figure 2E

And now we will make our box plots. First, we need to make a metadata table with the Seurat object.

patient_list <- lapply(unique(dat$donor), function(i) {
  curr <- subset(dat, donor == i)
  severity <- curr$group_per_sample[1] %>% unname()
  clusters <- table(curr$cluster_labels_res.0.4) 
  result <- c(severity, clusters)
  return(result)
}) %>% do.call(rbind, .) %>% as.data.frame()
names(patient_list)[1] <- "severity"

# Turn the numbers into percentages
severity <- patient_list$severity
patient_list$severity <- NULL
patient_list <- apply(patient_list, 2, as.numeric) %>% as_tibble()
patient_list <- apply(patient_list, 1, function(i) {
  100*(i/sum(i))
}) %>% t() %>% as_tibble()
patient_list$severity <- severity %>% as.factor()

Now we have to produce some box plots. This requires a bit of extra legwork with ggplot2. Below is a function I wrote in 2018, that surprisingly has not been equaled yet.

#' @title Add p-value to plot
#' @description Takes as input a ggplot object of the barplot-across-conditions
#' form, computes p-values using a wilcox test and then places the comparison bars
#' and the p-values onto the ggplot object, returning the modified plot object.
#' @param p A ggplot object in the form of a bar plot across two conditions.
#' @param groups A vector of type string corresponding to the groups being compared 
#' across the x axis of the bar plot.
#' @return The same ggplot object with the wilcox test having computed p values
#' for the conditions embedded in the original ggplot object. Within the image
#' one will see the standard scientific figure format with a line joining the
#' two 'bars' within the plot and wither "ns" printed above it, or 1-4 stars.
#' denoting respectively p < 0.05, 0.01, 0.001, and 0.0001.
AddPvalueToPlot <- function(p, groups) {
    # Make sure the control condition is first
    my_comparisons <- unique(groups) %>%
        as.character() %>%
        combn(2)

    # Makes a list of comparisons, if there is more than one condition
    # Assumes control + exp1, control + exp2, control + exp2, etc
    my_comparisons <- lapply(seq(ncol(my_comparisons)), function(j) {
        curr <- my_comparisons[,j]
    })

    # The statistical testing
    p <- p + ggpubr::stat_compare_means(
        method = "wilcox.test",
        aes(label = ..p.signif..),
        comparisons = my_comparisons,
        hide.ns = FALSE
    )

    # Some necessary aesthetics
    p$layers[[gginnards::which_layers(p, "GeomSignif")]]$aes_params$textsize <-
        6

    return(p)
}

Here is a single instance.

plt <- ggplot(patient_list, aes(severity, `HLA-DR+ CD83+ Monocytes`)) +
  geom_boxplot() +
  geom_jitter()

AddPvalueToPlot(p = plt, group = patient_list$severity)
## Warning in wilcox.test.default(c(14.775871610404, 57.3751451800232,
## 24.0498243372724, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.346260387811634, 0.27196083763938,
## 0.810910431256911, : cannot compute exact p-value with ties

And here is the rest of it. Note that ggplot2 gets really confused with backticks.

# Get rid of back ticks by getting rid of white space
for(i in names(patient_list)[1:length(patient_list) - 1]) {
  toplot <- tibble(severity = patient_list$severity, percent = patient_list[[i]])
  plt <- ggplot(toplot, aes(severity, percent)) +
  geom_boxplot() +
  geom_jitter() + 
  ggtitle(i)

  print(AddPvalueToPlot(p = plt, group = patient_list$severity))
}

## Warning in wilcox.test.default(c(14.775871610404, 57.3751451800232,
## 24.0498243372724, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.346260387811634, 0.27196083763938,
## 0.810910431256911, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(30.3541781959048, 0.464576074332172,
## 0.127754710954966, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(30.3541781959048, 0.464576074332172,
## 0.127754710954966, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(1.70821791320406, 0.054392167527876, 0, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0, 0.0387146728610143, 0.0958160332162248, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0, 0.0387146728610143, 0.0958160332162248, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(19.7368421052632, 14.6042969812347,
## 18.4666420936233, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.332042058660764, 4.76190476190476,
## 1.5330565314596, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.332042058660764, 4.76190476190476,
## 1.5330565314596, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.166021029330382, 0.232288037166086,
## 0.127754710954966, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(7.50230840258541, 4.10660864835464,
## 14.8175451529672, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0, 0, 0, 0.0755857898715042,
## 0.0354233085370174, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0, 0, 0, 0.0755857898715042,
## 0.0354233085370174, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(8.21791320406279, 2.82839271144955,
## 0.589753040914117, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.6087437742114, 0.193573364305072,
## 0.926221654423507, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(19.0094078583287, 0.2710027100271,
## 0.670712232513574, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(4.73222530009234, 0.108784335055752,
## 0.110578695171397, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(11.9258439402324, 11.4595431668602,
## 22.2612583839029, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(1.56971375807941, 1.30541202066902,
## 1.2532252119425, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(5.86607636967349, 0.193573364305072,
## 0.0958160332162248, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(5.86607636967349, 0.193573364305072,
## 0.0958160332162248, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.023084025854109, 0, 0, 0, 0, 0,
## 0.0304136253041363, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.0553403431101273, 0.0387146728610143, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.0553403431101273, 0.0387146728610143, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.0461680517082179, 0.108784335055752, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.193691200885446, 0, 0, 0,
## 0.0708466170740347, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.193691200885446, 0, 0, 0,
## 0.0708466170740347, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(2.331486611265, 1.46858852325265,
## 1.1426465167711, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.193691200885446, 0.387146728610143,
## 0.159693388693708, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.715604801477378, 0.652706010334512,
## 0.700331736085514, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0.0553403431101273, 0.116144018583043, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.0553403431101273, 0.116144018583043, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.323176361957525, 0.108784335055752,
## 0.147438260228529, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(c(0, 0, 0, 0, 0, 0, 0.0551876379690949, 0:
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0, 0, 0, 0, 0, 0, 0.0551876379690949, 0:
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(c(0.738688827331487, 0.244764753875442,
## 0.184297825285662, : cannot compute exact p-value with ties