This NIMMO R package contains the proposed NIMMO algorithm (function
NIMMO
), which is a computational method that can integrate
multimodal single-cell data.
In this tutorial (R version: 4.1.2), we use one example to explain the functions in NIMMO package. We first load the required package:
library(NIMMO)
and other necessary packages
library(Seurat)
## Attaching SeuratObject
## Attaching sp
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(umap)
library(reticulate)
NIMMO takes a seurat object with multimodal batches. Each batch has gone thourgh pre-processing steps, like normlization, highly variable gene selection, dimension reduction, etc.
We first load the scRNA-seq count data
dat.seu <- readRDS(system.file("extdata", "cbmc.seu.2mod.rds", package = "NIMMO"))
# Check the contents of this seurat object
dat.seu
## An object of class Seurat
## 20511 features across 8617 samples within 2 assays
## Active assay: ADT (10 features, 10 variable features)
## 1 other assay present: RNA
## 2 dimensional reductions calculated: pca, apca
It can be noted that there are two assays, which are
RNA
and
ADT
. The corresponding dimension-reduced data are
pca
and
apca
.
In this example, we use both assays and the dimension-reduced data
assay.ls <- c("RNA", "ADT")
red.name.ls <- c("pca", "apca")
We next check whether python can be used since the NIMMO package needs to run code in python scripts.
Sys.which("python")
## python
## ""
Note that there is nothing here and hence we need to set up the path to python using
use_python("/Users/david/opt/anaconda3/bin/python")
In order to use functions written in python, users also need to set up their python environment using the following code
### In python environment, run the following code
# conda install nomkl numpy scipy scikit-learn numexpr numba
# conda remove mkl mkl-service
# If the umap-learn python package is not installed, users can try
py_install("umap-learn")
## + '/Users/david/opt/anaconda3/bin/conda' 'install' '--yes' '--prefix' '/Users/david/opt/anaconda3' '-c' 'conda-forge' 'umap-learn'
After these setups, let’s run
NIMMO
!
options(mc.cores = 2)
nimmo.res <- NIMMO(dat.seu, assay.ls, red.name.ls,
q = 2, subsample=TRUE, n.subsample=1000, n.core=2)
The arguments
dat.seu
,
assay.ls
, and
red.name.ls
are obtained before. Setting
q = 2
means that we use the L2 norm to integrate multimodal single-cell data.
Using
subsample=TRUE
and
n.subsample=1000
means that when we downsample the data to 1000 cells to calculate the
scaling factors for each modality.
The output of
NIMMO
is also a seurat object similar to
the input
dat.seu
. The difference is that under
nimmo.res@reductions
, there is one more two-dimensional
embedding called
nimmo
, which is the integrated
results.
We next perform dimension reduction on the RNA and ADT data
separately and compare the low dimensional space among RNA, ADT and the
integrated results by
NIMMO
. Since
nimmo.res
is also a seurat object, functions like
RunUMAP
and
DimPlot
created in the seurat package can be easily
used.
nimmo.res <- RunUMAP(nimmo.res, reduction = 'pca', dims = 1:30, assay = 'RNA',
reduction.name = 'rna.umap', reduction.key = 'rnaUMAP_')
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 10:25:30 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:25:30 Read 8617 rows and found 30 numeric columns
## 10:25:30 Using Annoy for neighbor search, n_neighbors = 30
## 10:25:30 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:25:31 Writing NN index file to temp file /var/folders/fd/29msl90x01jdwg5fdg2hb47m0000gn/T//RtmpXEED4w/file98fe4ca399bf
## 10:25:31 Searching Annoy index using 1 thread, search_k = 3000
## 10:25:32 Annoy recall = 100%
## 10:25:32 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 10:25:33 Initializing from normalized Laplacian + noise (using RSpectra)
## 10:25:33 Commencing optimization for 500 epochs, with 385062 positive edges
## 10:25:47 Optimization finished
nimmo.res <- RunUMAP(nimmo.res, reduction = 'apca', dims = 1:7, assay = 'ADT',
reduction.name = 'adt.umap', reduction.key = 'adtUMAP_')
## 10:25:47 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:25:47 Read 8617 rows and found 7 numeric columns
## 10:25:47 Using Annoy for neighbor search, n_neighbors = 30
## 10:25:47 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:25:48 Writing NN index file to temp file /var/folders/fd/29msl90x01jdwg5fdg2hb47m0000gn/T//RtmpXEED4w/file98fe8c2f1e8
## 10:25:48 Searching Annoy index using 1 thread, search_k = 3000
## 10:25:50 Annoy recall = 100%
## 10:25:50 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 10:25:51 Initializing from normalized Laplacian + noise (using RSpectra)
## 10:25:51 Commencing optimization for 500 epochs, with 347352 positive edges
## 10:26:04 Optimization finished
library(ggplot2)
p1 <- DimPlot(nimmo.res, reduction='rna.umap', group.by='rna_annotations', label=T, repel=T, label.size=5.5) +
NoLegend() + ggtitle('RNA') + theme(plot.title = element_text(size=32), axis.title.x = element_blank(),
axis.title.y = element_blank())
p2 <- DimPlot(nimmo.res, reduction='adt.umap', group.by ='rna_annotations', label=T, repel=T, label.size=5.5) +
NoLegend() + ggtitle('Protein') + theme(plot.title = element_text(size=32), axis.title.x = element_blank(),
axis.title.y = element_blank())
p3 <- DimPlot(nimmo.res, reduction='nimmo', group.by = 'rna_annotations', label=T, repel=T, label.size = 5.5) +
NoLegend() + ggtitle('NIMMO') + theme(plot.title = element_text(size=32), axis.title.x = element_blank(),
axis.title.y = element_blank())
The UMAP plots are shown below.
cowplot::plot_grid(p1, p2, p3, nrow = 1)
Apart from obtaining the integrated results for multimodal data, we may be also interested in how each modality contributes to the integration of certain cell types. We also provide anxiliary functions to achieve this purpose.
We first get the cell type information from the seurat object
dat.seu
.
cell.type <- dat.seu$rna_annotations
We next specify the assays and their corresponding dimension-reduced data that we are interested in and extract the list of data matrices.
dat.list <- get.dat.list(dat.seu, assay.ls, red.name.ls)
First calculate the scaling factor for each modality. By using
subsample.norm=TRUE
and
n.subsample.norm=1000
,
we downsample the data to 1000 cells to calculate the scaling factors
for each modality.
dnorm <- cal.dnorm(dat.list, subsample.norm= TRUE, n.subsample.norm=1000, seed = NULL)
Next obtain the weights for each modality. Setting
q = 2
means we use the L2 norm, which is the same as that used in the function
NIMMO
. Using
subsample.wt=TRUE
and
n.subsample.wt=200
means that we downsample the number of
cell in each cell type to 200 cells if they exceed this number. This is
just for computational consideration.
res <- cal.modality.weights(dat.list, cell.type, q=2, dnorm, subsample.wt=TRUE, n.subsample.wt=200, seed = NULL)
Let’s have a look at the weights of each modality!
RNA.wt <- res[["RNA"]]
RNA.wt <- data.frame(rows = rownames(RNA.wt)[row(RNA.wt)], cols = colnames(RNA.wt)[col(RNA.wt)], wt = c(RNA.wt))
ADT.wt <- res[["ADT"]]
ADT.wt <- data.frame(rows = rownames(ADT.wt)[row(ADT.wt)], cols = colnames(ADT.wt)[col(ADT.wt)], wt = c(ADT.wt))
# heatmap of NIMMO weights (2-dimensional)
wt.p1 <- ggplot(RNA.wt, aes(rows, cols, fill = wt)) + geom_tile() + coord_fixed() +
scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred", midpoint = 0.5, breaks = seq(0,1,0.25), limits=c(0,1), na.value = "lightgrey") +
guides(fill = guide_colourbar(title = "weight")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank(),
axis.title.y = element_blank()) +
ggtitle("RNA")
wt.p2 <- ggplot(ADT.wt, aes(rows, cols, fill = wt)) + geom_tile() + coord_fixed() +
scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred", midpoint = 0.5, breaks = seq(0,1,0.25), limits=c(0,1), na.value = "lightgrey") +
guides(fill = guide_colourbar(title = "weight")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank(),
axis.title.y = element_blank()) +
ggtitle("ADT")
Here is the plot.
cowplot::plot_grid(wt.p1, wt.p2, nrow = 1)
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.3.6 reticulate_1.26 umap_0.2.10.0 dplyr_1.0.10
## [5] sp_1.5-0 SeuratObject_4.1.2 Seurat_4.2.0 NIMMO_0.1.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.16 colorspace_2.0-3 deldir_1.0-6
## [4] ellipsis_0.3.2 ggridges_0.5.4 rprojroot_2.0.3
## [7] rstudioapi_0.13 spatstat.data_3.0-0 farver_2.1.1
## [10] leiden_0.4.3 listenv_0.8.0 ggrepel_0.9.1
## [13] RSpectra_0.16-1 fansi_1.0.3 codetools_0.2-18
## [16] splines_4.1.2 cachem_1.0.6 knitr_1.39
## [19] polyclip_1.10-0 jsonlite_1.8.2 ica_1.0-3
## [22] cluster_2.1.3 png_0.1-7 rgeos_0.5-9
## [25] uwot_0.1.14 shiny_1.7.2 sctransform_0.3.5
## [28] spatstat.sparse_3.0-0 compiler_4.1.2 httr_1.4.4
## [31] assertthat_0.2.1 Matrix_1.5-1 fastmap_1.1.0
## [34] lazyeval_0.2.2 cli_3.4.1 later_1.3.0
## [37] htmltools_0.5.3 tools_4.1.2 igraph_1.3.5
## [40] gtable_0.3.1 glue_1.6.2 RANN_2.6.1
## [43] reshape2_1.4.4 Rcpp_1.0.9 scattermore_0.8
## [46] jquerylib_0.1.4 vctrs_0.4.2 nlme_3.1-158
## [49] progressr_0.11.0 lmtest_0.9-40 spatstat.random_3.0-1
## [52] xfun_0.31 stringr_1.4.1 globals_0.16.1
## [55] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3
## [58] irlba_2.3.5.1 goftest_1.2-3 future_1.28.0
## [61] MASS_7.3-58 zoo_1.8-11 scales_1.2.1
## [64] spatstat.core_2.4-4 promises_1.2.0.1 spatstat.utils_3.0-1
## [67] parallel_4.1.2 RColorBrewer_1.1-3 yaml_2.3.5
## [70] pbapply_1.5-0 gridExtra_2.3 sass_0.4.2
## [73] rpart_4.1.16 stringi_1.7.8 highr_0.9
## [76] rlang_1.0.6 pkgconfig_2.0.3 matrixStats_0.62.0
## [79] evaluate_0.15 lattice_0.20-45 tensor_1.5
## [82] ROCR_1.0-11 purrr_0.3.5 labeling_0.4.2
## [85] patchwork_1.1.2 htmlwidgets_1.5.4 cowplot_1.1.1
## [88] tidyselect_1.2.0 here_1.0.1 parallelly_1.32.1
## [91] RcppAnnoy_0.0.19 plyr_1.8.7 magrittr_2.0.3
## [94] R6_2.5.1 generics_0.1.3 DBI_1.1.3
## [97] withr_2.5.0 mgcv_1.8-40 pillar_1.8.1
## [100] fitdistrplus_1.1-8 abind_1.4-5 survival_3.3-1
## [103] tibble_3.1.8 future.apply_1.9.1 KernSmooth_2.23-20
## [106] utf8_1.2.2 spatstat.geom_3.0-3 plotly_4.10.0
## [109] rmarkdown_2.19.2 grid_4.1.2 data.table_1.14.4
## [112] digest_0.6.29 xtable_1.8-4 tidyr_1.2.1
## [115] httpuv_1.6.6 openssl_2.0.4 munsell_0.5.0
## [118] viridisLite_0.4.1 bslib_0.4.0 askpass_1.1