Introduction

This NIMMO R package contains the proposed NIMMO algorithm (function NIMMO ), which is a computational method that can integrate multimodal single-cell data.

In this tutorial (R version: 4.1.2), we use one example to explain the functions in NIMMO package. We first load the required package:

                library(NIMMO)

and other necessary packages

                library(Seurat)

                ## Attaching SeuratObject

                ## Attaching sp

                library(dplyr)

                ## 
                ## Attaching package: 'dplyr'

                ## The following objects are masked from 'package:stats':
                ## 
                ##     filter, lag

                ## The following objects are masked from 'package:base':
                ## 
                ##     intersect, setdiff, setequal, union

                library(umap)
                library(reticulate)

NIMMO examples

NIMMO takes a seurat object with multimodal batches. Each batch has gone thourgh pre-processing steps, like normlization, highly variable gene selection, dimension reduction, etc.

Preparing single-cell data

We first load the scRNA-seq count data

                    dat.seu <- readRDS(system.file("extdata", "cbmc.seu.2mod.rds", package = "NIMMO"))
                    # Check the contents of this seurat object
                    dat.seu

                    ## An object of class Seurat 
                    ## 20511 features across 8617 samples within 2 assays 
                    ## Active assay: ADT (10 features, 10 variable features)
                    ##  1 other assay present: RNA
                    ##  2 dimensional reductions calculated: pca, apca

It can be noted that there are two assays, which are RNA and ADT . The corresponding dimension-reduced data are pca and apca .

In this example, we use both assays and the dimension-reduced data

                    assay.ls <- c("RNA", "ADT")
                    red.name.ls <- c("pca", "apca")

Setups in python environment

We next check whether python can be used since the NIMMO package needs to run code in python scripts.

                    Sys.which("python")

                    ## python 
                    ##     ""

Note that there is nothing here and hence we need to set up the path to python using

                    use_python("/Users/david/opt/anaconda3/bin/python")

In order to use functions written in python, users also need to set up their python environment using the following code

                    ### In python environment, run the following code
                    # conda install nomkl numpy scipy scikit-learn numexpr numba
                    # conda remove mkl mkl-service

                    # If the umap-learn python package is not installed, users can try
                    py_install("umap-learn")

                    ## + '/Users/david/opt/anaconda3/bin/conda' 'install' '--yes' '--prefix' '/Users/david/opt/anaconda3' '-c' 'conda-forge' 'umap-learn'

Running NIMMO

After these setups, let’s run NIMMO !

                    options(mc.cores = 2)
                    nimmo.res <- NIMMO(dat.seu, assay.ls, red.name.ls,
                                       q = 2, subsample=TRUE, n.subsample=1000, n.core=2)

The arguments dat.seu , assay.ls , and red.name.ls are obtained before. Setting q = 2 means that we use the L2 norm to integrate multimodal single-cell data. Using subsample=TRUE and n.subsample=1000 means that when we downsample the data to 1000 cells to calculate the scaling factors for each modality.

The output of NIMMO is also a seurat object similar to the input dat.seu . The difference is that under nimmo.res@reductions , there is one more two-dimensional embedding called nimmo , which is the integrated results.

We next perform dimension reduction on the RNA and ADT data separately and compare the low dimensional space among RNA, ADT and the integrated results by NIMMO . Since nimmo.res is also a seurat object, functions like RunUMAP and DimPlot created in the seurat package can be easily used.

                    nimmo.res <- RunUMAP(nimmo.res, reduction = 'pca', dims = 1:30, assay = 'RNA', 
                                   reduction.name = 'rna.umap', reduction.key = 'rnaUMAP_')

                    ## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
                    ## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
                    ## This message will be shown once per session

                    ## 10:25:30 UMAP embedding parameters a = 0.9922 b = 1.112

                    ## 10:25:30 Read 8617 rows and found 30 numeric columns

                    ## 10:25:30 Using Annoy for neighbor search, n_neighbors = 30

                    ## 10:25:30 Building Annoy index with metric = cosine, n_trees = 50

                    ## 0%   10   20   30   40   50   60   70   80   90   100%

                    ## [----|----|----|----|----|----|----|----|----|----|

                    ## **************************************************|
                    ## 10:25:31 Writing NN index file to temp file /var/folders/fd/29msl90x01jdwg5fdg2hb47m0000gn/T//RtmpXEED4w/file98fe4ca399bf
                    ## 10:25:31 Searching Annoy index using 1 thread, search_k = 3000
                    ## 10:25:32 Annoy recall = 100%
                    ## 10:25:32 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
                    ## 10:25:33 Initializing from normalized Laplacian + noise (using RSpectra)
                    ## 10:25:33 Commencing optimization for 500 epochs, with 385062 positive edges
                    ## 10:25:47 Optimization finished

                    nimmo.res <- RunUMAP(nimmo.res, reduction = 'apca', dims = 1:7, assay = 'ADT', 
                                       reduction.name = 'adt.umap', reduction.key = 'adtUMAP_')

                    ## 10:25:47 UMAP embedding parameters a = 0.9922 b = 1.112
                    ## 10:25:47 Read 8617 rows and found 7 numeric columns
                    ## 10:25:47 Using Annoy for neighbor search, n_neighbors = 30
                    ## 10:25:47 Building Annoy index with metric = cosine, n_trees = 50
                    ## 0%   10   20   30   40   50   60   70   80   90   100%
                    ## [----|----|----|----|----|----|----|----|----|----|
                    ## **************************************************|
                    ## 10:25:48 Writing NN index file to temp file /var/folders/fd/29msl90x01jdwg5fdg2hb47m0000gn/T//RtmpXEED4w/file98fe8c2f1e8
                    ## 10:25:48 Searching Annoy index using 1 thread, search_k = 3000
                    ## 10:25:50 Annoy recall = 100%
                    ## 10:25:50 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
                    ## 10:25:51 Initializing from normalized Laplacian + noise (using RSpectra)
                    ## 10:25:51 Commencing optimization for 500 epochs, with 347352 positive edges
                    ## 10:26:04 Optimization finished

                    library(ggplot2)
                    p1 <- DimPlot(nimmo.res, reduction='rna.umap', group.by='rna_annotations', label=T, repel=T, label.size=5.5) + 
                      NoLegend() + ggtitle('RNA') + theme(plot.title = element_text(size=32), axis.title.x = element_blank(),
                                                      axis.title.y = element_blank())
                    p2 <- DimPlot(nimmo.res, reduction='adt.umap', group.by ='rna_annotations', label=T, repel=T, label.size=5.5) + 
                      NoLegend() + ggtitle('Protein') + theme(plot.title = element_text(size=32), axis.title.x = element_blank(),
                                                          axis.title.y = element_blank())
                    p3 <- DimPlot(nimmo.res, reduction='nimmo', group.by = 'rna_annotations', label=T, repel=T, label.size = 5.5) +
                      NoLegend() + ggtitle('NIMMO') + theme(plot.title = element_text(size=32), axis.title.x = element_blank(),
                                                            axis.title.y = element_blank())

The UMAP plots are shown below.

                    cowplot::plot_grid(p1, p2, p3, nrow = 1)

Calculate the weights for each modality

Apart from obtaining the integrated results for multimodal data, we may be also interested in how each modality contributes to the integration of certain cell types. We also provide anxiliary functions to achieve this purpose.

We first get the cell type information from the seurat object dat.seu .

                cell.type <- dat.seu$rna_annotations

We next specify the assays and their corresponding dimension-reduced data that we are interested in and extract the list of data matrices.

                dat.list <- get.dat.list(dat.seu, assay.ls, red.name.ls)

First calculate the scaling factor for each modality. By using subsample.norm=TRUE and n.subsample.norm=1000 , we downsample the data to 1000 cells to calculate the scaling factors for each modality.

                dnorm <- cal.dnorm(dat.list, subsample.norm= TRUE, n.subsample.norm=1000, seed = NULL)

Next obtain the weights for each modality. Setting q = 2 means we use the L2 norm, which is the same as that used in the function NIMMO . Using subsample.wt=TRUE and n.subsample.wt=200 means that we downsample the number of cell in each cell type to 200 cells if they exceed this number. This is just for computational consideration.

                res <- cal.modality.weights(dat.list, cell.type, q=2, dnorm, subsample.wt=TRUE, n.subsample.wt=200, seed = NULL)

Let’s have a look at the weights of each modality!

                RNA.wt <- res[["RNA"]]
                RNA.wt <- data.frame(rows = rownames(RNA.wt)[row(RNA.wt)], cols = colnames(RNA.wt)[col(RNA.wt)], wt = c(RNA.wt))

                ADT.wt <- res[["ADT"]]
                ADT.wt <- data.frame(rows = rownames(ADT.wt)[row(ADT.wt)], cols = colnames(ADT.wt)[col(ADT.wt)], wt = c(ADT.wt))

                # heatmap of NIMMO weights (2-dimensional)
                wt.p1 <- ggplot(RNA.wt, aes(rows, cols, fill = wt)) + geom_tile() + coord_fixed() + 
                  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred", midpoint = 0.5, breaks = seq(0,1,0.25), limits=c(0,1), na.value = "lightgrey") +
                  guides(fill = guide_colourbar(title = "weight")) + 
                  theme(axis.text.x = element_text(angle = 45, hjust = 1),
                        axis.title.x = element_blank(),
                        axis.title.y = element_blank()) + 
                  ggtitle("RNA")

                wt.p2 <- ggplot(ADT.wt, aes(rows, cols, fill = wt)) + geom_tile() + coord_fixed() + 
                  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred", midpoint = 0.5, breaks = seq(0,1,0.25), limits=c(0,1), na.value = "lightgrey") +
                  guides(fill = guide_colourbar(title = "weight")) + 
                  theme(axis.text.x = element_text(angle = 45, hjust = 1),
                        axis.title.x = element_blank(),
                        axis.title.y = element_blank()) + 
                  ggtitle("ADT")

Here is the plot.

                cowplot::plot_grid(wt.p1, wt.p2, nrow = 1)

Session Info

                sessionInfo()

                ## R version 4.1.2 (2021-11-01)
                ## Platform: x86_64-apple-darwin17.0 (64-bit)
                ## Running under: macOS Big Sur 10.16
                ## 
                ## Matrix products: default
                ## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
                ## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
                ## 
                ## locale:
                ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
                ## 
                ## attached base packages:
                ## [1] stats     graphics  grDevices utils     datasets  methods   base     
                ## 
                ## other attached packages:
                ## [1] ggplot2_3.3.6      reticulate_1.26    umap_0.2.10.0      dplyr_1.0.10      
                ## [5] sp_1.5-0           SeuratObject_4.1.2 Seurat_4.2.0       NIMMO_0.1.0       
                ## 
                ## loaded via a namespace (and not attached):
                ##   [1] Rtsne_0.16            colorspace_2.0-3      deldir_1.0-6         
                ##   [4] ellipsis_0.3.2        ggridges_0.5.4        rprojroot_2.0.3      
                ##   [7] rstudioapi_0.13       spatstat.data_3.0-0   farver_2.1.1         
                ##  [10] leiden_0.4.3          listenv_0.8.0         ggrepel_0.9.1        
                ##  [13] RSpectra_0.16-1       fansi_1.0.3           codetools_0.2-18     
                ##  [16] splines_4.1.2         cachem_1.0.6          knitr_1.39           
                ##  [19] polyclip_1.10-0       jsonlite_1.8.2        ica_1.0-3            
                ##  [22] cluster_2.1.3         png_0.1-7             rgeos_0.5-9          
                ##  [25] uwot_0.1.14           shiny_1.7.2           sctransform_0.3.5    
                ##  [28] spatstat.sparse_3.0-0 compiler_4.1.2        httr_1.4.4           
                ##  [31] assertthat_0.2.1      Matrix_1.5-1          fastmap_1.1.0        
                ##  [34] lazyeval_0.2.2        cli_3.4.1             later_1.3.0          
                ##  [37] htmltools_0.5.3       tools_4.1.2           igraph_1.3.5         
                ##  [40] gtable_0.3.1          glue_1.6.2            RANN_2.6.1           
                ##  [43] reshape2_1.4.4        Rcpp_1.0.9            scattermore_0.8      
                ##  [46] jquerylib_0.1.4       vctrs_0.4.2           nlme_3.1-158         
                ##  [49] progressr_0.11.0      lmtest_0.9-40         spatstat.random_3.0-1
                ##  [52] xfun_0.31             stringr_1.4.1         globals_0.16.1       
                ##  [55] mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.3      
                ##  [58] irlba_2.3.5.1         goftest_1.2-3         future_1.28.0        
                ##  [61] MASS_7.3-58           zoo_1.8-11            scales_1.2.1         
                ##  [64] spatstat.core_2.4-4   promises_1.2.0.1      spatstat.utils_3.0-1 
                ##  [67] parallel_4.1.2        RColorBrewer_1.1-3    yaml_2.3.5           
                ##  [70] pbapply_1.5-0         gridExtra_2.3         sass_0.4.2           
                ##  [73] rpart_4.1.16          stringi_1.7.8         highr_0.9            
                ##  [76] rlang_1.0.6           pkgconfig_2.0.3       matrixStats_0.62.0   
                ##  [79] evaluate_0.15         lattice_0.20-45       tensor_1.5           
                ##  [82] ROCR_1.0-11           purrr_0.3.5           labeling_0.4.2       
                ##  [85] patchwork_1.1.2       htmlwidgets_1.5.4     cowplot_1.1.1        
                ##  [88] tidyselect_1.2.0      here_1.0.1            parallelly_1.32.1    
                ##  [91] RcppAnnoy_0.0.19      plyr_1.8.7            magrittr_2.0.3       
                ##  [94] R6_2.5.1              generics_0.1.3        DBI_1.1.3            
                ##  [97] withr_2.5.0           mgcv_1.8-40           pillar_1.8.1         
                ## [100] fitdistrplus_1.1-8    abind_1.4-5           survival_3.3-1       
                ## [103] tibble_3.1.8          future.apply_1.9.1    KernSmooth_2.23-20   
                ## [106] utf8_1.2.2            spatstat.geom_3.0-3   plotly_4.10.0        
                ## [109] rmarkdown_2.19.2      grid_4.1.2            data.table_1.14.4    
                ## [112] digest_0.6.29         xtable_1.8-4          tidyr_1.2.1          
                ## [115] httpuv_1.6.6          openssl_2.0.4         munsell_0.5.0        
                ## [118] viridisLite_0.4.1     bslib_0.4.0           askpass_1.1