--- title: "Application on toy example" author: "Alexis Vandenbon" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Application on toy example} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = 'center' ) library(ggplot2) theme_set(cowplot::theme_cowplot()) ``` # Application on a toy dataset A small toy dataset is included in the package. The toy dataset includes: - `dat.expression`: a toy scRNA-seq dataset with genes (rows) and cells (columns) - `dat.tsne`: 2D coordinates of the cells in a t-SNE splot First, let's apply `haystack` (the main function of the package) on the toy dataset. This should take just several seconds on a typical desktop computer. ```{r example1} library(singleCellHaystack) set.seed(1234) # run the main 'haystack' analysis # inputs are: # 1) the coordinates of the cells in the input space (here: dat.tsne) # 2) the expression data (dat.expression) res <- haystack(dat.tsne, dat.expression) # the returned results 'res' is of class 'haystack' class(res) ``` Let's have a look at the most significant differentially expressed genes (DEGs). ```{r example2} # show top 10 DEGs show_result_haystack(res.haystack = res, n=10) # alternatively: use a p-value threshold #show_result_haystack(res.haystack = res, p.value.threshold = 1e-10) ``` One of the most significant DEGs is "gene_497". Here we visualize its expression in the t-SNE plot. As you can see, this DEG is expressed only in cells in the upper-left corner of the plot. ```{r} d <- cbind(dat.tsne, t(dat.expression)) d[1:4, 1:4] ``` ```{r fig.width=6, fig.height=4} library(ggplot2) ggplot(d, aes(tSNE1, tSNE2, color=gene_497)) + geom_point() + scale_color_distiller(palette="Spectral") ``` Yes, the coordinates of the cells in this toy example t-SNE space roughly resemble a haystack; see [the Haystack paintings by Monet](https://en.wikipedia.org/wiki/Haystacks_(Monet_series)). # Clustering and visualization You are not limited to single genes. Here, we pick up a set of DEGs, and group them by their expression pattern in the plot into 5 clusters. ```{r example4} # get the top most significant genes, and cluster them by their distribution pattern in the 2D plot sorted.table <- show_result_haystack(res.haystack = res, p.value.threshold = 1e-10) gene.subset <- row.names(sorted.table) # k-means clustering #km <- kmeans_haystack(dat.tsne, dat.expression[gene.subset, ], grid.coordinates=res$info$grid.coordinates, k=5) #km.clusters <- km$cluster # alternatively: hierarchical clustering hm <- hclust_haystack(dat.tsne, dat.expression[gene.subset, ], grid.coordinates=res$info$grid.coordinates) ``` ... and visualize the pattern of the selected genes. ```{r fig.width=6, fig.height=8} ComplexHeatmap::Heatmap(dat.expression[gene.subset, ], show_column_names=FALSE, cluster_rows=hm, name="expression") ``` We divide the genes into clusters with cutree. ```{r} hm.clusters <- cutree(hm, k=4) table(hm.clusters) ``` Then calculate the average expression of the genes in each cluster. ```{r} for (cluster in unique(hm.clusters)) { d[[paste0("cluster_", cluster)]] <- colMeans(dat.expression[names(which(hm.clusters == cluster)), ]) } ``` ```{r fig.width=8, fig.height=6} lapply(c("cluster_1", "cluster_2", "cluster_3", "cluster_4"), function(cluster) { ggplot(d, aes(tSNE1, tSNE2, color=.data[[cluster]])) + geom_point() + scale_color_distiller(palette="Spectral") }) |> patchwork::wrap_plots() ``` From this plot we can see that genes in each cluster are expressed in different subsets of cells.