Title: | A Universal Differential Expression Prediction Tool for Single-Cell and Spatial Genomics Data |
---|---|
Description: | One key exploratory analysis step in single-cell genomics data analysis is the prediction of features with different activity levels. For example, we want to predict differentially expressed genes (DEGs) in single-cell RNA-seq data, spatial DEGs in spatial transcriptomics data, or differentially accessible regions (DARs) in single-cell ATAC-seq data. 'singleCellHaystack' predicts differentially active features in single cell omics datasets without relying on the clustering of cells into arbitrary clusters. 'singleCellHaystack' uses Kullback-Leibler divergence to find features (e.g., genes, genomic regions, etc) that are active in subsets of cells that are non-randomly positioned inside an input space (such as 1D trajectories, 2D tissue sections, multi-dimensional embeddings, etc). For the theoretical background of 'singleCellHaystack' we refer to our original paper Vandenbon and Diez (Nature Communications, 2020) <doi:10.1038/s41467-020-17900-3> and our update Vandenbon and Diez (Scientific Reports, 2023) <doi:10.1038/s41598-023-38965-2>. |
Authors: | Alexis Vandenbon [aut, cre] |
Maintainer: | Alexis Vandenbon <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.2 |
Built: | 2025-02-04 03:24:44 UTC |
Source: | https://github.com/alexisvdb/singlecellhaystack |
Default function given by function bandwidth.nrd in MASS. No changes were made to this function.
default_bandwidth.nrd(x)
default_bandwidth.nrd(x)
x |
A numeric vector |
A suitable bandwith.
Returns a row of a sparse matrix of class dgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)
extract_row_dgRMatrix(m, i = 1)
extract_row_dgRMatrix(m, i = 1)
m |
a sparse matrix of class dgRMatrix |
i |
the index of the row to return |
A row (numerical vector) of the sparse matrix
Returns a row of a sparse matrix of class lgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)
extract_row_lgRMatrix(m, i = 1)
extract_row_lgRMatrix(m, i = 1)
m |
a sparse matrix of class lgRMatrix |
i |
the index of the row to return |
A row (logical vector) of the sparse matrix
Calculates the Kullback-Leibler divergence between distributions.
get_D_KL(classes, parameters, reference.prob, pseudo)
get_D_KL(classes, parameters, reference.prob, pseudo)
classes |
A logical vector. Values are T is the gene is expressed in a cell, F is not. |
parameters |
Parameters of the analysis, as set by function 'get_parameters_haystack' |
reference.prob |
A reference distribution to calculate the divergence against. |
pseudo |
A pseudocount, used to avoid log(0) problems. |
A numerical value, the Kullback-Leibler divergence
Calculates the Kullback-Leibler divergence between distributions for the high-dimensional continuous version of haystack.
get_D_KL_continuous_highD( weights, density.contributions, reference.prob, pseudo = 0 )
get_D_KL_continuous_highD( weights, density.contributions, reference.prob, pseudo = 0 )
weights |
A numerical vector with expression values of a gene. |
density.contributions |
A matrix of density contributions of each cell (rows) to each center point (columns). |
reference.prob |
A reference distribution to calculate the divergence against. |
pseudo |
A pseudocount, used to avoid log(0) problems. |
A numerical value, the Kullback-Leibler divergence
Calculates the Kullback-Leibler divergence between distributions for the high-dimensional version of haystack().
get_D_KL_highD(classes, density.contributions, reference.prob, pseudo = 0)
get_D_KL_highD(classes, density.contributions, reference.prob, pseudo = 0)
classes |
A logical vector. Values are T is the gene is expressed in a cell, F is not. |
density.contributions |
A matrix of density contributions of each cell (rows) to each center point (columns). |
reference.prob |
A reference distribution to calculate the divergence against. |
pseudo |
A pseudocount, used to avoid log(0) problems. |
A numerical value, the Kullback-Leibler divergence
Function to get the density of points with value TRUE in the (x,y) plot
get_density( x, y, detection, rows.subset = 1:nrow(detection), high.resolution = FALSE )
get_density( x, y, detection, rows.subset = 1:nrow(detection), high.resolution = FALSE )
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
detection |
A logical matrix or dgRMatrix showing which gens (rows) are detected in which cells (columns) |
rows.subset |
Indices of the rows of 'detection' for which to get the densities. Default: all. |
high.resolution |
Logical: should high resolution be used? Default is FALSE. |
A 3-dimensional array (dim 1: genes/rows of expression, dim 2 and 3: x and y grid points) with density data
Calculate the pairwise Euclidean distances between the rows of 2 matrices.
get_dist_two_sets(set1, set2)
get_dist_two_sets(set1, set2)
set1 |
A numerical matrix. |
set2 |
A numerical matrix. |
A matrix of pairwise distances between the rows of 2 matrices.
Calculate the Euclidean distance between x and y.
get_euclidean_distance(x, y)
get_euclidean_distance(x, y)
x |
A numerical vector. |
y |
A numerical vector. |
A numerical value, the Euclidean distance.
A function to decide grid points in a higher-dimensional space
get_grid_points(input, method = "centroid", grid.points = 100)
get_grid_points(input, method = "centroid", grid.points = 100)
input |
A numerical matrix with higher-dimensional coordinates (columns) of points (rows) |
method |
The method to decide grid points. Should be "centroid" (default) or "seeding". |
grid.points |
The number of grid points to return. Default is 100. |
Coordinates of grid points in the higher-dimensonal space.
Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations.
get_log_p_D_KL(T.counts, D_KL.observed, D_KL.randomized, output.dir = NULL)
get_log_p_D_KL(T.counts, D_KL.observed, D_KL.randomized, output.dir = NULL)
T.counts |
The number of cells in which a gene is detected. |
D_KL.observed |
A vector of observed Kullback-Leibler divergences. |
D_KL.randomized |
A matrix of Kullback-Leibler divergences of randomized datasets. |
output.dir |
Optional parameter. Default is NULL. If not NULL, some files will be written to this directory. |
A vector of log10 p values, not corrected for multiple testing using the Bonferroni correction.
Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations for the continuous version of haystack.
get_log_p_D_KL_continuous( D_KL.observed, D_KL.randomized, all.coeffVar, train.coeffVar, output.dir = NULL, spline.method = "ns" )
get_log_p_D_KL_continuous( D_KL.observed, D_KL.randomized, all.coeffVar, train.coeffVar, output.dir = NULL, spline.method = "ns" )
D_KL.observed |
A vector of observed Kullback-Leibler divergences. |
D_KL.randomized |
A matrix of Kullback-Leibler divergences of randomized datasets. |
all.coeffVar |
Coefficients of variation of all genes. Used for fitting the Kullback-Leibler divergences. |
train.coeffVar |
Coefficients of variation of genes that will be used for fitting the Kullback-Leibler divergences. |
output.dir |
Optional parameter. Default is NULL. If not NULL, some files will be written to this directory. |
spline.method |
Method to use for fitting splines "ns" (default): natural splines, "bs": B-splines. |
A vector of log10 p values, not corrected for multiple testing using the Bonferroni correction.
Function that decides most of the parameters that will be used during the "Haystack" analysis.
get_parameters_haystack(x, y, high.resolution = FALSE)
get_parameters_haystack(x, y, high.resolution = FALSE)
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
high.resolution |
Logical: should high resolution be used? Default is FALSE. |
A list containing various parameters to use in the analysis.
Get reference distribution
get_reference(param, use.advanced.sampling = NULL)
get_reference(param, use.advanced.sampling = NULL)
param |
Parameters of the analysis, as set by function 'get_parameters_haystack' |
use.advanced.sampling |
If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector. |
A list with two components, Q for the reference distribution and pseudo.
The main Haystack function
haystack(x, ...) ## S3 method for class 'matrix' haystack( x, expression, weights.advanced.Q = NULL, dir.randomization = NULL, scale = TRUE, grid.points = 100, grid.method = "centroid", ... ) ## S3 method for class 'data.frame' haystack( x, expression, weights.advanced.Q = NULL, dir.randomization = NULL, scale = TRUE, grid.points = 100, grid.method = "centroid", ... ) ## S3 method for class 'Seurat' haystack( x, coord, assay = "RNA", slot = "data", dims = NULL, cutoff = 1, method = NULL, weights.advanced.Q = NULL, ... ) ## S3 method for class 'SingleCellExperiment' haystack( x, assay = "counts", coord = "TSNE", dims = NULL, cutoff = 1, method = NULL, weights.advanced.Q = NULL, ... )
haystack(x, ...) ## S3 method for class 'matrix' haystack( x, expression, weights.advanced.Q = NULL, dir.randomization = NULL, scale = TRUE, grid.points = 100, grid.method = "centroid", ... ) ## S3 method for class 'data.frame' haystack( x, expression, weights.advanced.Q = NULL, dir.randomization = NULL, scale = TRUE, grid.points = 100, grid.method = "centroid", ... ) ## S3 method for class 'Seurat' haystack( x, coord, assay = "RNA", slot = "data", dims = NULL, cutoff = 1, method = NULL, weights.advanced.Q = NULL, ... ) ## S3 method for class 'SingleCellExperiment' haystack( x, assay = "counts", coord = "TSNE", dims = NULL, cutoff = 1, method = NULL, weights.advanced.Q = NULL, ... )
x |
a matrix or other object from which coordinates of cells can be extracted. |
... |
further parameters passed down to methods. |
expression |
a matrix with expression data of genes (rows) in cells (columns) |
weights.advanced.Q |
If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector. |
dir.randomization |
If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory. |
scale |
Logical (default=TRUE) indicating whether input coordinates in x should be scaled to mean 0 and standard deviation 1. |
grid.points |
An integer specifying the number of centers (gridpoints) to be used for estimating the density distributions of cells. Default is set to 100. |
grid.method |
The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding". |
coord |
name of coordinates slot for specific methods. |
assay |
name of assay data for Seurat method. |
slot |
name of slot for assay data for Seurat method. |
dims |
dimensions from coord to use. By default, all. |
cutoff |
cutoff for detection. |
method |
choose between highD (default) and 2D haystack. |
An object of class "haystack"
The main Haystack function, for 2-dimensional spaces.
haystack_2D( x, y, detection, use.advanced.sampling = NULL, dir.randomization = NULL )
haystack_2D( x, y, detection, use.advanced.sampling = NULL, dir.randomization = NULL )
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
detection |
A logical matrix showing which genes (rows) are detected in which cells (columns) |
use.advanced.sampling |
If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector. |
dir.randomization |
If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory. |
An object of class "haystack"
The main Haystack function, for higher-dimensional spaces and continuous expression levels.
haystack_continuous_highD( x, expression, grid.points = 100, weights.advanced.Q = NULL, dir.randomization = NULL, scale = TRUE, grid.method = "centroid", randomization.count = 100, n.genes.to.randomize = 100, selection.method.genes.to.randomize = "heavytails", grid.coord = NULL, spline.method = "ns" )
haystack_continuous_highD( x, expression, grid.points = 100, weights.advanced.Q = NULL, dir.randomization = NULL, scale = TRUE, grid.method = "centroid", randomization.count = 100, n.genes.to.randomize = 100, selection.method.genes.to.randomize = "heavytails", grid.coord = NULL, spline.method = "ns" )
x |
Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space. |
expression |
a matrix with expression data of genes (rows) in cells (columns) |
grid.points |
An integer specifying the number of centers (grid points) to be used for estimating the density distributions of cells. Default is set to 100. |
weights.advanced.Q |
(Default: NULL) Optional weights of cells for calculating a weighted distribution of expression. |
dir.randomization |
If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory. |
scale |
Logical (default=TRUE) indicating whether input coordinates in x should be scaled to mean 0 and standard deviation 1. |
grid.method |
The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding". |
randomization.count |
Number of randomizations to use. Default: 100 |
n.genes.to.randomize |
Number of genes to use in randomizations. Default: 100 |
selection.method.genes.to.randomize |
Method used to select genes for randomization. |
grid.coord |
matrix of grid coordinates. |
spline.method |
Method to use for fitting splines "ns" (default): natural splines, "bs": B-splines. |
An object of class "haystack", including the results of the analysis, and the coordinates of the grid points used to estimate densities.
# using the toy example of the singleCellHaystack package # running haystack res <- haystack(dat.tsne, dat.expression) # list top 10 biased genes show_result_haystack(res, n=10)
# using the toy example of the singleCellHaystack package # running haystack res <- haystack(dat.tsne, dat.expression) # list top 10 biased genes show_result_haystack(res, n=10)
The main Haystack function, for higher-dimensional spaces.
haystack_highD( x, detection, grid.points = 100, use.advanced.sampling = NULL, dir.randomization = NULL, scale = TRUE, grid.method = "centroid" )
haystack_highD( x, detection, grid.points = 100, use.advanced.sampling = NULL, dir.randomization = NULL, scale = TRUE, grid.method = "centroid" )
x |
Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space. |
detection |
A logical matrix showing which genes (rows) are detected in which cells (columns) |
grid.points |
An integer specifying the number of centers (grid points) to be used for estimating the density distributions of cells. Default is set to 100. |
use.advanced.sampling |
If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector. |
dir.randomization |
If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory. |
scale |
Logical (default=TRUE) indicating whether input coordinates in x should be scaled to mean 0 and standard deviation 1. |
grid.method |
The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding". |
An object of class "haystack", including the results of the analysis, and the coordinates of the grid points used to estimate densities.
# I need to add some examples. # A toy example will be added too.
# I need to add some examples. # A toy example will be added too.
Function for hierarchical clustering of genes according to their expression distribution in 2D or multi-dimensional space
hclust_haystack( x, expression, grid.coordinates, hclust.method = "ward.D", cor.method = "spearman", ... ) ## S3 method for class 'matrix' hclust_haystack( x, expression, grid.coordinates, hclust.method = "ward.D", cor.method = "spearman", ... ) ## S3 method for class 'data.frame' hclust_haystack( x, expression, grid.coordinates, hclust.method = "ward.D", cor.method = "spearman", ... )
hclust_haystack( x, expression, grid.coordinates, hclust.method = "ward.D", cor.method = "spearman", ... ) ## S3 method for class 'matrix' hclust_haystack( x, expression, grid.coordinates, hclust.method = "ward.D", cor.method = "spearman", ... ) ## S3 method for class 'data.frame' hclust_haystack( x, expression, grid.coordinates, hclust.method = "ward.D", cor.method = "spearman", ... )
x |
a matrix or other object from which coordinates of cells can be extracted. |
expression |
expression matrix. |
grid.coordinates |
coordinates of the grid points. |
hclust.method |
method used with hclust. |
cor.method |
method used with cor. |
... |
further parameters passed down to methods. |
Function for hierarchical clustering of genes according to their distribution in a higher-dimensional space.
hclust_haystack_highD( x, detection, genes, method = "ward.D", grid.coordinates = NULL, scale = TRUE )
hclust_haystack_highD( x, detection, genes, method = "ward.D", grid.coordinates = NULL, scale = TRUE )
x |
Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space. |
detection |
A logical matrix showing which genes (rows) are detected in which cells (columns) |
genes |
A set of genes (of the 'detection' data) which will be clustered. |
method |
The method to use for hierarchical clustering. See '?hclust' for more information. Default: "ward.D". |
grid.coordinates |
Coordinates of grid points in the same space as 'x', to be used to estimate densities for clustering. |
scale |
whether to scale data. |
An object of class hclust, describing a hierarchical clustering tree.
# to be added
# to be added
Function for hierarchical clustering of genes according to their distribution on a 2D plot.
hclust_haystack_raw(x, y, detection, genes, method = "ward.D")
hclust_haystack_raw(x, y, detection, genes, method = "ward.D")
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
detection |
A logical matrix showing which genes (rows) are detected in which cells (columns) |
genes |
A set of genes (of the 'detection' data) which will be clustered. |
method |
The method to use for hierarchical clustering. See '?hclust' for more information. Default: "ward.D". |
An object of class hclust, describing a hierarchical clustering tree.
Based on the MASS kde2d() function, but heavily simplified; it's just tcrossprod() now.
kde2d_faster(dens.x, dens.y)
kde2d_faster(dens.x, dens.y)
dens.x |
Contribution of all cells to densities of the x-axis grid points. |
dens.y |
Contribution of all cells to densities of the y-axis grid points. |
Function for k-means clustering of genes according to their expression distribution in 2D or multi-dimensional space
kmeans_haystack(x, expression, grid.coordinates, k, ...) ## S3 method for class 'matrix' kmeans_haystack(x, expression, grid.coordinates, k, ...) ## S3 method for class 'data.frame' kmeans_haystack(x, expression, grid.coordinates, k, ...)
kmeans_haystack(x, expression, grid.coordinates, k, ...) ## S3 method for class 'matrix' kmeans_haystack(x, expression, grid.coordinates, k, ...) ## S3 method for class 'data.frame' kmeans_haystack(x, expression, grid.coordinates, k, ...)
x |
a matrix or other object from which coordinates of cells can be extracted. |
expression |
expression matrix. |
grid.coordinates |
coordinates of the grid points. |
k |
number of clusters. |
... |
further parameters passed down to methods. |
Function for k-means clustering of genes according to their distribution in a higher-dimensional space.
kmeans_haystack_highD( x, detection, genes, grid.coordinates = NULL, k, scale = TRUE, ... )
kmeans_haystack_highD( x, detection, genes, grid.coordinates = NULL, k, scale = TRUE, ... )
x |
Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space. |
detection |
A logical matrix showing which genes (rows) are detected in which cells (columns) |
genes |
A set of genes (of the 'detection' data) which will be clustered. |
grid.coordinates |
Coordinates of grid points in the same space as 'x', to be used to estimate densities for clustering. |
k |
The number of clusters to return. |
scale |
whether to scale data. |
... |
Additional parameters which will be passed on to the kmeans function. |
An object of class kmeans, describing a clustering into 'k' clusters
# to be added
# to be added
Function for k-means clustering of genes according to their distribution on a 2D plot.
kmeans_haystack_raw(x, y, detection, genes, k, ...)
kmeans_haystack_raw(x, y, detection, genes, k, ...)
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
detection |
A logical matrix showing which genes (rows) are detected in which cells (columns) |
genes |
A set of genes (of the 'detection' data) which will be clustered. |
k |
The number of clusters to return. |
... |
Additional parameters which will be passed on to the kmeans function. |
An object of class kmeans, describing a clustering into 'k' clusters
plot_compare_ranks
plot_compare_ranks(res1, res2, sort_by = "log.p.vals")
plot_compare_ranks(res1, res2, sort_by = "log.p.vals")
res1 |
haystack result. |
res2 |
haystack result. |
sort_by |
column to sort results (default: log.p.vals). |
Visualizing the detection/expression of a gene in a 2D plot
plot_gene_haystack(x, ...) ## S3 method for class 'matrix' plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'data.frame' plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'SingleCellExperiment' plot_gene_haystack( x, dim1 = 1, dim2 = 2, assay = "counts", coord = "TSNE", ... ) ## S3 method for class 'Seurat' plot_gene_haystack( x, dim1 = 1, dim2 = 2, assay = "RNA", slot = "data", coord = "tsne", ... )
plot_gene_haystack(x, ...) ## S3 method for class 'matrix' plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'data.frame' plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'SingleCellExperiment' plot_gene_haystack( x, dim1 = 1, dim2 = 2, assay = "counts", coord = "TSNE", ... ) ## S3 method for class 'Seurat' plot_gene_haystack( x, dim1 = 1, dim2 = 2, assay = "RNA", slot = "data", coord = "tsne", ... )
x |
a matrix or other object from which coordinates of cells can be extracted. |
... |
further parameters passed to plot_gene_haystack_raw(). |
dim1 |
column index or name of matrix for x-axis coordinates. |
dim2 |
column index or name of matrix for y-axis coordinates. |
assay |
name of assay data for Seurat method. |
coord |
name of coordinates slot for specific methods. |
slot |
name of slot for assay data for Seurat method. |
Visualizing the detection/expression of a gene in a 2D plot
plot_gene_haystack_raw( x, y, gene, expression, detection = NULL, high.resolution = FALSE, point.size = 1, order.by.signal = FALSE )
plot_gene_haystack_raw( x, y, gene, expression, detection = NULL, high.resolution = FALSE, point.size = 1, order.by.signal = FALSE )
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
gene |
name of a gene that is present in the input expression data, or a numerical index |
expression |
a logical/numerical matrix showing detection/expression of genes (rows) in cells (columns) |
detection |
an optional logical matrix showing detection of genes (rows) in cells (columns). If left as NULL, the density distribution of the gene is not plotted. |
high.resolution |
logical (default: FALSE). If set to TRUE, the density plot will be of a higher resolution |
point.size |
numerical value to set size of points in plot. Default is 1. |
order.by.signal |
If TRUE, cells with higher signal will be put on the foreground in the plot. Default is FALSE. |
A plot
Visualizing the detection/expression of a set of genes in a 2D plot
plot_gene_set_haystack(x, ...) ## S3 method for class 'matrix' plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'data.frame' plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'SingleCellExperiment' plot_gene_set_haystack( x, dim1 = 1, dim2 = 2, assay = "counts", coord = "TSNE", ... ) ## S3 method for class 'Seurat' plot_gene_set_haystack( x, dim1 = 1, dim2 = 2, assay = "RNA", slot = "data", coord = "tsne", ... )
plot_gene_set_haystack(x, ...) ## S3 method for class 'matrix' plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'data.frame' plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...) ## S3 method for class 'SingleCellExperiment' plot_gene_set_haystack( x, dim1 = 1, dim2 = 2, assay = "counts", coord = "TSNE", ... ) ## S3 method for class 'Seurat' plot_gene_set_haystack( x, dim1 = 1, dim2 = 2, assay = "RNA", slot = "data", coord = "tsne", ... )
x |
a matrix or other object from which coordinates of cells can be extracted. |
... |
further parameters passed to plot_gene_haystack_raw(). |
dim1 |
column index or name of matrix for x-axis coordinates. |
dim2 |
column index or name of matrix for y-axis coordinates. |
assay |
name of assay data for Seurat method. |
coord |
name of coordinates slot for specific methods. |
slot |
name of slot for assay data for Seurat method. |
Visualizing the detection/expression of a set of genes in a 2D plot
plot_gene_set_haystack_raw( x, y, genes = NA, detection, high.resolution = TRUE, point.size = 1, order.by.signal = FALSE )
plot_gene_set_haystack_raw( x, y, genes = NA, detection, high.resolution = TRUE, point.size = 1, order.by.signal = FALSE )
x |
x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE) |
y |
y-axis coordinates of cells in a 2D representation |
genes |
Gene names that are present in the input expression data, or a numerical indeces. If NA, all genes will be used. |
detection |
a logical matrix showing detection of genes (rows) in cells (columns) |
high.resolution |
logical (default: TRUE). If set to FALSE, the density plot will be of a lower resolution |
point.size |
numerical value to set size of points in plot. Default is 1. |
order.by.signal |
If TRUE, cells with higher signal will be put on the foreground in the plot. Default is FALSE. |
A plot
plot_rand_fit
plot_rand_fit(x, type = c("mean", "sd")) ## S3 method for class 'haystack' plot_rand_fit(x, type = c("mean", "sd"))
plot_rand_fit(x, type = c("mean", "sd")) ## S3 method for class 'haystack' plot_rand_fit(x, type = c("mean", "sd"))
x |
haystack object. |
type |
whether to plot mean or sd. |
Plots the distribution of randomized KLD for each of the genes, together with the mean and standard deviation, the 0.95 quantile and the 0.95 quantile from a normal distribution with mean and standard deviations from the distribution of KLDs. The logCV is indicated in the subtitle of each plot.
plot_rand_KLD(x, n = 12, log = TRUE, tail = FALSE)
plot_rand_KLD(x, n = 12, log = TRUE, tail = FALSE)
x |
haystack result. |
n |
number of genes from randomization set to plot. |
log |
whether to use log of KLD. |
tail |
whether the genes are chosen from the tail of randomized genes. |
Function to read haystack results from file.
read_haystack(file)
read_haystack(file)
file |
A file containing 'haystack' results to read |
An object of class "haystack"
Shows the results of the 'haystack' analysis in various ways, sorted by significance. Priority of params is genes > p.value.threshold > n.
show_result_haystack( res.haystack, n = NULL, p.value.threshold = NULL, gene = NULL ) ## S3 method for class 'haystack' show_result_haystack( res.haystack, n = NULL, p.value.threshold = NULL, gene = NULL )
show_result_haystack( res.haystack, n = NULL, p.value.threshold = NULL, gene = NULL ) ## S3 method for class 'haystack' show_result_haystack( res.haystack, n = NULL, p.value.threshold = NULL, gene = NULL )
res.haystack |
A 'haystack' result object. |
n |
If defined, the top "n" significant genes will be returned. Default: NA, which shows all results. |
p.value.threshold |
If defined, genes passing this p-value threshold will be returned. |
gene |
If defined, the results of this (these) gene(s) will be returned. |
The output is a data.frame with the following columns: * D_KL the calculated KL divergence. * log.p.vals log10 p.values calculated from randomization. * log.p.adj log10 p.values adjusted by Bonferroni correction.
A data.frame with 'haystack' results sorted by log.p.vals.
# using the toy example of the singleCellHaystack package # running haystack res <- haystack(dat.tsne, dat.expression) # below are variations for showing the results in a table # 1. list top 10 biased genes show_result_haystack(res.haystack = res, n =10) # 2. list genes with p value below a certain threshold show_result_haystack(res.haystack = res, p.value.threshold=1e-10) # 3. list a set of specified genes set <- c("gene_497","gene_386", "gene_275") show_result_haystack(res.haystack = res, gene = set)
# using the toy example of the singleCellHaystack package # running haystack res <- haystack(dat.tsne, dat.expression) # below are variations for showing the results in a table # 1. list top 10 biased genes show_result_haystack(res.haystack = res, n =10) # 2. list genes with p value below a certain threshold show_result_haystack(res.haystack = res, p.value.threshold=1e-10) # 3. list a set of specified genes set <- c("gene_497","gene_386", "gene_275") show_result_haystack(res.haystack = res, gene = set)
Function to write haystack result data to file.
write_haystack(res.haystack, file)
write_haystack(res.haystack, file)
res.haystack |
A 'haystack' result variable |
file |
A file to write to |