Package 'singleCellHaystack' reference manual

Title:	A Universal Differential Expression Prediction Tool for Single-Cell and Spatial Genomics Data
Description:	One key exploratory analysis step in single-cell genomics data analysis is the prediction of features with different activity levels. For example, we want to predict differentially expressed genes (DEGs) in single-cell RNA-seq data, spatial DEGs in spatial transcriptomics data, or differentially accessible regions (DARs) in single-cell ATAC-seq data. 'singleCellHaystack' predicts differentially active features in single cell omics datasets without relying on the clustering of cells into arbitrary clusters. 'singleCellHaystack' uses Kullback-Leibler divergence to find features (e.g., genes, genomic regions, etc) that are active in subsets of cells that are non-randomly positioned inside an input space (such as 1D trajectories, 2D tissue sections, multi-dimensional embeddings, etc). For the theoretical background of 'singleCellHaystack' we refer to our original paper Vandenbon and Diez (Nature Communications, 2020) <doi:10.1038/s41467-020-17900-3> and our update Vandenbon and Diez (Scientific Reports, 2023) <doi:10.1038/s41598-023-38965-2>.
Authors:	Alexis Vandenbon [aut, cre] , Diego Diez [aut]
Maintainer:	Alexis Vandenbon <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.2
Built:	2025-03-06 03:34:21 UTC
Source:	https://github.com/alexisvdb/singlecellhaystack

Single cell tSNE coordingates.

Description

Single cell tSNE coordingates.

Default function given by function bandwidth.nrd in MASS. No changes were made to this function.

Description

Default function given by function bandwidth.nrd in MASS. No changes were made to this function.

Usage

default_bandwidth.nrd(x)
default_bandwidth.nrd(x)

Arguments

`x`	A numeric vector

Value

A suitable bandwith.

Returns a row of a sparse matrix of class dgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)

Description

Returns a row of a sparse matrix of class dgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)

Usage

extract_row_dgRMatrix(m, i = 1)
extract_row_dgRMatrix(m, i = 1)

Arguments

`m`	a sparse matrix of class dgRMatrix
`i`	the index of the row to return

Value

A row (numerical vector) of the sparse matrix

Returns a row of a sparse matrix of class lgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)

Description

Returns a row of a sparse matrix of class lgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)

Usage

extract_row_lgRMatrix(m, i = 1)
extract_row_lgRMatrix(m, i = 1)

Arguments

`m`	a sparse matrix of class lgRMatrix
`i`	the index of the row to return

Value

A row (logical vector) of the sparse matrix

Calculates the Kullback-Leibler divergence between distributions.

Description

Calculates the Kullback-Leibler divergence between distributions.

Usage

get_D_KL(classes, parameters, reference.prob, pseudo)
get_D_KL(classes, parameters, reference.prob, pseudo)

Arguments

`classes`	A logical vector. Values are T is the gene is expressed in a cell, F is not.
`parameters`	Parameters of the analysis, as set by function 'get_parameters_haystack'
`reference.prob`	A reference distribution to calculate the divergence against.
`pseudo`	A pseudocount, used to avoid log(0) problems.

Value

A numerical value, the Kullback-Leibler divergence

Calculates the Kullback-Leibler divergence between distributions for the high-dimensional continuous version of haystack.

Description

Calculates the Kullback-Leibler divergence between distributions for the high-dimensional continuous version of haystack.

Usage

get_D_KL_continuous_highD(
  weights,
  density.contributions,
  reference.prob,
  pseudo = 0
)
get_D_KL_continuous_highD(
  weights,
  density.contributions,
  reference.prob,
  pseudo = 0
)

Arguments

`weights`	A numerical vector with expression values of a gene.
`density.contributions`	A matrix of density contributions of each cell (rows) to each center point (columns).
`reference.prob`	A reference distribution to calculate the divergence against.
`pseudo`	A pseudocount, used to avoid log(0) problems.

Value

A numerical value, the Kullback-Leibler divergence

Calculates the Kullback-Leibler divergence between distributions for the high-dimensional version of haystack().

Description

Calculates the Kullback-Leibler divergence between distributions for the high-dimensional version of haystack().

Usage

get_D_KL_highD(classes, density.contributions, reference.prob, pseudo = 0)
get_D_KL_highD(classes, density.contributions, reference.prob, pseudo = 0)

Arguments

`classes`	A logical vector. Values are T is the gene is expressed in a cell, F is not.
`density.contributions`	A matrix of density contributions of each cell (rows) to each center point (columns).
`reference.prob`	A reference distribution to calculate the divergence against.
`pseudo`	A pseudocount, used to avoid log(0) problems.

Value

A numerical value, the Kullback-Leibler divergence

Function to get the density of points with value TRUE in the (x,y) plot

Description

Function to get the density of points with value TRUE in the (x,y) plot

Usage

get_density(
  x,
  y,
  detection,
  rows.subset = 1:nrow(detection),
  high.resolution = FALSE
)
get_density(
  x,
  y,
  detection,
  rows.subset = 1:nrow(detection),
  high.resolution = FALSE
)

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`detection`	A logical matrix or dgRMatrix showing which gens (rows) are detected in which cells (columns)
`rows.subset`	Indices of the rows of 'detection' for which to get the densities. Default: all.
`high.resolution`	Logical: should high resolution be used? Default is FALSE.

Value

A 3-dimensional array (dim 1: genes/rows of expression, dim 2 and 3: x and y grid points) with density data

Calculate the pairwise Euclidean distances between the rows of 2 matrices.

Description

Calculate the pairwise Euclidean distances between the rows of 2 matrices.

Usage

get_dist_two_sets(set1, set2)
get_dist_two_sets(set1, set2)

Arguments

`set1`	A numerical matrix.
`set2`	A numerical matrix.

Value

A matrix of pairwise distances between the rows of 2 matrices.

Calculate the Euclidean distance between x and y.

Description

Calculate the Euclidean distance between x and y.

Usage

get_euclidean_distance(x, y)
get_euclidean_distance(x, y)

Arguments

`x`	A numerical vector.
`y`	A numerical vector.

Value

A numerical value, the Euclidean distance.

A function to decide grid points in a higher-dimensional space

Description

A function to decide grid points in a higher-dimensional space

Usage

get_grid_points(input, method = "centroid", grid.points = 100)
get_grid_points(input, method = "centroid", grid.points = 100)

Arguments

`input`	A numerical matrix with higher-dimensional coordinates (columns) of points (rows)
`method`	The method to decide grid points. Should be "centroid" (default) or "seeding".
`grid.points`	The number of grid points to return. Default is 100.

Value

Coordinates of grid points in the higher-dimensonal space.

Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations.

Description

Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations.

Usage

get_log_p_D_KL(T.counts, D_KL.observed, D_KL.randomized, output.dir = NULL)
get_log_p_D_KL(T.counts, D_KL.observed, D_KL.randomized, output.dir = NULL)

Arguments

`T.counts`	The number of cells in which a gene is detected.
`D_KL.observed`	A vector of observed Kullback-Leibler divergences.
`D_KL.randomized`	A matrix of Kullback-Leibler divergences of randomized datasets.
`output.dir`	Optional parameter. Default is NULL. If not NULL, some files will be written to this directory.

Value

A vector of log10 p values, not corrected for multiple testing using the Bonferroni correction.

Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations for the continuous version of haystack.

Description

Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations for the continuous version of haystack.

Usage

get_log_p_D_KL_continuous(
  D_KL.observed,
  D_KL.randomized,
  all.coeffVar,
  train.coeffVar,
  output.dir = NULL,
  spline.method = "ns"
)
get_log_p_D_KL_continuous(
  D_KL.observed,
  D_KL.randomized,
  all.coeffVar,
  train.coeffVar,
  output.dir = NULL,
  spline.method = "ns"
)

Arguments

`D_KL.observed`	A vector of observed Kullback-Leibler divergences.
`D_KL.randomized`	A matrix of Kullback-Leibler divergences of randomized datasets.
`all.coeffVar`	Coefficients of variation of all genes. Used for fitting the Kullback-Leibler divergences.
`train.coeffVar`	Coefficients of variation of genes that will be used for fitting the Kullback-Leibler divergences.
`output.dir`	Optional parameter. Default is NULL. If not NULL, some files will be written to this directory.
`spline.method`	Method to use for fitting splines "ns" (default): natural splines, "bs": B-splines.

Value

A vector of log10 p values, not corrected for multiple testing using the Bonferroni correction.

Function that decides most of the parameters that will be used during the "Haystack" analysis.

Description

Function that decides most of the parameters that will be used during the "Haystack" analysis.

Usage

get_parameters_haystack(x, y, high.resolution = FALSE)
get_parameters_haystack(x, y, high.resolution = FALSE)

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`high.resolution`	Logical: should high resolution be used? Default is FALSE.

Value

A list containing various parameters to use in the analysis.

Get reference distribution

Description

Get reference distribution

Usage

get_reference(param, use.advanced.sampling = NULL)
get_reference(param, use.advanced.sampling = NULL)

Arguments

`param`	Parameters of the analysis, as set by function 'get_parameters_haystack'
`use.advanced.sampling`	If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector.

Value

A list with two components, Q for the reference distribution and pseudo.

The main Haystack function, for 2-dimensional spaces.

Description

The main Haystack function, for 2-dimensional spaces.

Usage

haystack_2D(
  x,
  y,
  detection,
  use.advanced.sampling = NULL,
  dir.randomization = NULL
)
haystack_2D(
  x,
  y,
  detection,
  use.advanced.sampling = NULL,
  dir.randomization = NULL
)

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`detection`	A logical matrix showing which genes (rows) are detected in which cells (columns)
`use.advanced.sampling`	If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector.
`dir.randomization`	If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory.

Value

An object of class "haystack"

The main Haystack function, for higher-dimensional spaces and continuous expression levels.

Description

The main Haystack function, for higher-dimensional spaces and continuous expression levels.

Usage

haystack_continuous_highD(
  x,
  expression,
  grid.points = 100,
  weights.advanced.Q = NULL,
  dir.randomization = NULL,
  scale = TRUE,
  grid.method = "centroid",
  randomization.count = 100,
  n.genes.to.randomize = 100,
  selection.method.genes.to.randomize = "heavytails",
  grid.coord = NULL,
  spline.method = "ns"
)
haystack_continuous_highD(
  x,
  expression,
  grid.points = 100,
  weights.advanced.Q = NULL,
  dir.randomization = NULL,
  scale = TRUE,
  grid.method = "centroid",
  randomization.count = 100,
  n.genes.to.randomize = 100,
  selection.method.genes.to.randomize = "heavytails",
  grid.coord = NULL,
  spline.method = "ns"
)

Arguments

`x`	Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space.
`expression`	a matrix with expression data of genes (rows) in cells (columns)
`grid.points`	An integer specifying the number of centers (grid points) to be used for estimating the density distributions of cells. Default is set to 100.
`weights.advanced.Q`	(Default: NULL) Optional weights of cells for calculating a weighted distribution of expression.
`dir.randomization`	If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory.
`scale`	Logical (default=TRUE) indicating whether input coordinates in x should be scaled to mean 0 and standard deviation 1.
`grid.method`	The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding".
`randomization.count`	Number of randomizations to use. Default: 100
`n.genes.to.randomize`	Number of genes to use in randomizations. Default: 100
`selection.method.genes.to.randomize`	Method used to select genes for randomization.
`grid.coord`	matrix of grid coordinates.
`spline.method`	Method to use for fitting splines "ns" (default): natural splines, "bs": B-splines.

Value

An object of class "haystack", including the results of the analysis, and the coordinates of the grid points used to estimate densities.

Examples

# using the toy example of the singleCellHaystack package

# running haystack
res <- haystack(dat.tsne, dat.expression)
# list top 10 biased genes
show_result_haystack(res, n=10)
# using the toy example of the singleCellHaystack package

# running haystack
res <- haystack(dat.tsne, dat.expression)
# list top 10 biased genes
show_result_haystack(res, n=10)

The main Haystack function, for higher-dimensional spaces.

Description

The main Haystack function, for higher-dimensional spaces.

Usage

haystack_highD(
  x,
  detection,
  grid.points = 100,
  use.advanced.sampling = NULL,
  dir.randomization = NULL,
  scale = TRUE,
  grid.method = "centroid"
)
haystack_highD(
  x,
  detection,
  grid.points = 100,
  use.advanced.sampling = NULL,
  dir.randomization = NULL,
  scale = TRUE,
  grid.method = "centroid"
)

Arguments

`x`	Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space.
`detection`	A logical matrix showing which genes (rows) are detected in which cells (columns)
`grid.points`	An integer specifying the number of centers (grid points) to be used for estimating the density distributions of cells. Default is set to 100.
`use.advanced.sampling`	If NULL naive sampling is used. If a vector is given (of length = no. of cells) sampling is done according to the values in the vector.
`dir.randomization`	If NULL, no output is made about the random sampling step. If not NULL, files related to the randomizations are printed to this directory.
`scale`	Logical (default=TRUE) indicating whether input coordinates in x should be scaled to mean 0 and standard deviation 1.
`grid.method`	The method to decide grid points for estimating the density in the high-dimensional space. Should be "centroid" (default) or "seeding".

Value

An object of class "haystack", including the results of the analysis, and the coordinates of the grid points used to estimate densities.

Examples

# I need to add some examples.
# A toy example will be added too.
# I need to add some examples.
# A toy example will be added too.

Function for hierarchical clustering of genes according to their expression distribution in 2D or multi-dimensional space

Description

Function for hierarchical clustering of genes according to their expression distribution in 2D or multi-dimensional space

Usage

hclust_haystack(
  x,
  expression,
  grid.coordinates,
  hclust.method = "ward.D",
  cor.method = "spearman",
  ...
)

## S3 method for class 'matrix'
hclust_haystack(
  x,
  expression,
  grid.coordinates,
  hclust.method = "ward.D",
  cor.method = "spearman",
  ...
)

## S3 method for class 'data.frame'
hclust_haystack(
  x,
  expression,
  grid.coordinates,
  hclust.method = "ward.D",
  cor.method = "spearman",
  ...
)
hclust_haystack(
  x,
  expression,
  grid.coordinates,
  hclust.method = "ward.D",
  cor.method = "spearman",
  ...
)

## S3 method for class 'matrix'
hclust_haystack(
  x,
  expression,
  grid.coordinates,
  hclust.method = "ward.D",
  cor.method = "spearman",
  ...
)

## S3 method for class 'data.frame'
hclust_haystack(
  x,
  expression,
  grid.coordinates,
  hclust.method = "ward.D",
  cor.method = "spearman",
  ...
)

Arguments

`x`	a matrix or other object from which coordinates of cells can be extracted.
`expression`	expression matrix.
`grid.coordinates`	coordinates of the grid points.
`hclust.method`	method used with hclust.
`cor.method`	method used with cor.
`...`	further parameters passed down to methods.

Function for hierarchical clustering of genes according to their distribution in a higher-dimensional space.

Description

Function for hierarchical clustering of genes according to their distribution in a higher-dimensional space.

Usage

hclust_haystack_highD(
  x,
  detection,
  genes,
  method = "ward.D",
  grid.coordinates = NULL,
  scale = TRUE
)
hclust_haystack_highD(
  x,
  detection,
  genes,
  method = "ward.D",
  grid.coordinates = NULL,
  scale = TRUE
)

Arguments

`x`	Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space.
`detection`	A logical matrix showing which genes (rows) are detected in which cells (columns)
`genes`	A set of genes (of the 'detection' data) which will be clustered.
`method`	The method to use for hierarchical clustering. See '?hclust' for more information. Default: "ward.D".
`grid.coordinates`	Coordinates of grid points in the same space as 'x', to be used to estimate densities for clustering.
`scale`	whether to scale data.

Value

An object of class hclust, describing a hierarchical clustering tree.

Examples

# to be added
# to be added

Function for hierarchical clustering of genes according to their distribution on a 2D plot.

Description

Function for hierarchical clustering of genes according to their distribution on a 2D plot.

Usage

hclust_haystack_raw(x, y, detection, genes, method = "ward.D")
hclust_haystack_raw(x, y, detection, genes, method = "ward.D")

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`detection`	A logical matrix showing which genes (rows) are detected in which cells (columns)
`genes`	A set of genes (of the 'detection' data) which will be clustered.
`method`	The method to use for hierarchical clustering. See '?hclust' for more information. Default: "ward.D".

Value

An object of class hclust, describing a hierarchical clustering tree.

Based on the MASS kde2d() function, but heavily simplified; it's just tcrossprod() now.

Description

Based on the MASS kde2d() function, but heavily simplified; it's just tcrossprod() now.

Usage

kde2d_faster(dens.x, dens.y)
kde2d_faster(dens.x, dens.y)

Arguments

`dens.x`	Contribution of all cells to densities of the x-axis grid points.
`dens.y`	Contribution of all cells to densities of the y-axis grid points.

Function for k-means clustering of genes according to their expression distribution in 2D or multi-dimensional space

Description

Function for k-means clustering of genes according to their expression distribution in 2D or multi-dimensional space

Usage

kmeans_haystack(x, expression, grid.coordinates, k, ...)

## S3 method for class 'matrix'
kmeans_haystack(x, expression, grid.coordinates, k, ...)

## S3 method for class 'data.frame'
kmeans_haystack(x, expression, grid.coordinates, k, ...)
kmeans_haystack(x, expression, grid.coordinates, k, ...)

## S3 method for class 'matrix'
kmeans_haystack(x, expression, grid.coordinates, k, ...)

## S3 method for class 'data.frame'
kmeans_haystack(x, expression, grid.coordinates, k, ...)

Arguments

`x`	a matrix or other object from which coordinates of cells can be extracted.
`expression`	expression matrix.
`grid.coordinates`	coordinates of the grid points.
`k`	number of clusters.
`...`	further parameters passed down to methods.

Function for k-means clustering of genes according to their distribution in a higher-dimensional space.

Description

Function for k-means clustering of genes according to their distribution in a higher-dimensional space.

Usage

kmeans_haystack_highD(
  x,
  detection,
  genes,
  grid.coordinates = NULL,
  k,
  scale = TRUE,
  ...
)
kmeans_haystack_highD(
  x,
  detection,
  genes,
  grid.coordinates = NULL,
  k,
  scale = TRUE,
  ...
)

Arguments

`x`	Coordinates of cells in a 2D or higher-dimensional space. Rows represent cells, columns the dimensions of the space.
`detection`	A logical matrix showing which genes (rows) are detected in which cells (columns)
`genes`	A set of genes (of the 'detection' data) which will be clustered.
`grid.coordinates`	Coordinates of grid points in the same space as 'x', to be used to estimate densities for clustering.
`k`	The number of clusters to return.
`scale`	whether to scale data.
`...`	Additional parameters which will be passed on to the kmeans function.

Value

An object of class kmeans, describing a clustering into 'k' clusters

Examples

# to be added
# to be added

Function for k-means clustering of genes according to their distribution on a 2D plot.

Description

Function for k-means clustering of genes according to their distribution on a 2D plot.

Usage

kmeans_haystack_raw(x, y, detection, genes, k, ...)
kmeans_haystack_raw(x, y, detection, genes, k, ...)

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`detection`	A logical matrix showing which genes (rows) are detected in which cells (columns)
`genes`	A set of genes (of the 'detection' data) which will be clustered.
`k`	The number of clusters to return.
`...`	Additional parameters which will be passed on to the kmeans function.

Value

An object of class kmeans, describing a clustering into 'k' clusters

plot_compare_ranks

Description

plot_compare_ranks

Usage

plot_compare_ranks(res1, res2, sort_by = "log.p.vals")
plot_compare_ranks(res1, res2, sort_by = "log.p.vals")

Arguments

`res1`	haystack result.
`res2`	haystack result.
`sort_by`	column to sort results (default: log.p.vals).

Visualizing the detection/expression of a gene in a 2D plot

Description

Visualizing the detection/expression of a gene in a 2D plot

Usage

plot_gene_haystack(x, ...)

## S3 method for class 'matrix'
plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'data.frame'
plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'SingleCellExperiment'
plot_gene_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "counts",
  coord = "TSNE",
  ...
)

## S3 method for class 'Seurat'
plot_gene_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "RNA",
  slot = "data",
  coord = "tsne",
  ...
)
plot_gene_haystack(x, ...)

## S3 method for class 'matrix'
plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'data.frame'
plot_gene_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'SingleCellExperiment'
plot_gene_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "counts",
  coord = "TSNE",
  ...
)

## S3 method for class 'Seurat'
plot_gene_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "RNA",
  slot = "data",
  coord = "tsne",
  ...
)

Arguments

`x`	a matrix or other object from which coordinates of cells can be extracted.
`...`	further parameters passed to plot_gene_haystack_raw().
`dim1`	column index or name of matrix for x-axis coordinates.
`dim2`	column index or name of matrix for y-axis coordinates.
`assay`	name of assay data for Seurat method.
`coord`	name of coordinates slot for specific methods.
`slot`	name of slot for assay data for Seurat method.

Visualizing the detection/expression of a gene in a 2D plot

Description

Visualizing the detection/expression of a gene in a 2D plot

Usage

plot_gene_haystack_raw(
  x,
  y,
  gene,
  expression,
  detection = NULL,
  high.resolution = FALSE,
  point.size = 1,
  order.by.signal = FALSE
)
plot_gene_haystack_raw(
  x,
  y,
  gene,
  expression,
  detection = NULL,
  high.resolution = FALSE,
  point.size = 1,
  order.by.signal = FALSE
)

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`gene`	name of a gene that is present in the input expression data, or a numerical index
`expression`	a logical/numerical matrix showing detection/expression of genes (rows) in cells (columns)
`detection`	an optional logical matrix showing detection of genes (rows) in cells (columns). If left as NULL, the density distribution of the gene is not plotted.
`high.resolution`	logical (default: FALSE). If set to TRUE, the density plot will be of a higher resolution
`point.size`	numerical value to set size of points in plot. Default is 1.
`order.by.signal`	If TRUE, cells with higher signal will be put on the foreground in the plot. Default is FALSE.

Value

A plot

Visualizing the detection/expression of a set of genes in a 2D plot

Description

Visualizing the detection/expression of a set of genes in a 2D plot

Usage

plot_gene_set_haystack(x, ...)

## S3 method for class 'matrix'
plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'data.frame'
plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'SingleCellExperiment'
plot_gene_set_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "counts",
  coord = "TSNE",
  ...
)

## S3 method for class 'Seurat'
plot_gene_set_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "RNA",
  slot = "data",
  coord = "tsne",
  ...
)
plot_gene_set_haystack(x, ...)

## S3 method for class 'matrix'
plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'data.frame'
plot_gene_set_haystack(x, dim1 = 1, dim2 = 2, ...)

## S3 method for class 'SingleCellExperiment'
plot_gene_set_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "counts",
  coord = "TSNE",
  ...
)

## S3 method for class 'Seurat'
plot_gene_set_haystack(
  x,
  dim1 = 1,
  dim2 = 2,
  assay = "RNA",
  slot = "data",
  coord = "tsne",
  ...
)

Arguments

`x`	a matrix or other object from which coordinates of cells can be extracted.
`...`	further parameters passed to plot_gene_haystack_raw().
`dim1`	column index or name of matrix for x-axis coordinates.
`dim2`	column index or name of matrix for y-axis coordinates.
`assay`	name of assay data for Seurat method.
`coord`	name of coordinates slot for specific methods.
`slot`	name of slot for assay data for Seurat method.

Visualizing the detection/expression of a set of genes in a 2D plot

Description

Visualizing the detection/expression of a set of genes in a 2D plot

Usage

plot_gene_set_haystack_raw(
  x,
  y,
  genes = NA,
  detection,
  high.resolution = TRUE,
  point.size = 1,
  order.by.signal = FALSE
)
plot_gene_set_haystack_raw(
  x,
  y,
  genes = NA,
  detection,
  high.resolution = TRUE,
  point.size = 1,
  order.by.signal = FALSE
)

Arguments

`x`	x-axis coordinates of cells in a 2D representation (e.g. resulting from PCA or t-SNE)
`y`	y-axis coordinates of cells in a 2D representation
`genes`	Gene names that are present in the input expression data, or a numerical indeces. If NA, all genes will be used.
`detection`	a logical matrix showing detection of genes (rows) in cells (columns)
`high.resolution`	logical (default: TRUE). If set to FALSE, the density plot will be of a lower resolution
`point.size`	numerical value to set size of points in plot. Default is 1.
`order.by.signal`	If TRUE, cells with higher signal will be put on the foreground in the plot. Default is FALSE.

Value

A plot

plot_rand_fit

Description

plot_rand_fit

Usage

plot_rand_fit(x, type = c("mean", "sd"))

## S3 method for class 'haystack'
plot_rand_fit(x, type = c("mean", "sd"))
plot_rand_fit(x, type = c("mean", "sd"))

## S3 method for class 'haystack'
plot_rand_fit(x, type = c("mean", "sd"))

Arguments

`x`	haystack object.
`type`	whether to plot mean or sd.

Plots the distribution of randomized KLD for each of the genes, together with the mean and standard deviation, the 0.95 quantile and the 0.95 quantile from a normal distribution with mean and standard deviations from the distribution of KLDs. The logCV is indicated in the subtitle of each plot.

Usage

plot_rand_KLD(x, n = 12, log = TRUE, tail = FALSE)
plot_rand_KLD(x, n = 12, log = TRUE, tail = FALSE)

Arguments

`x`	haystack result.
`n`	number of genes from randomization set to plot.
`log`	whether to use log of KLD.
`tail`	whether the genes are chosen from the tail of randomized genes.

Function to read haystack results from file.

Description

Function to read haystack results from file.

Usage

read_haystack(file)
read_haystack(file)

Arguments

file

A file containing 'haystack' results to read

Value

An object of class "haystack"

show_result_haystack

Description

Shows the results of the 'haystack' analysis in various ways, sorted by significance. Priority of params is genes > p.value.threshold > n.

Usage

show_result_haystack(
  res.haystack,
  n = NULL,
  p.value.threshold = NULL,
  gene = NULL
)

## S3 method for class 'haystack'
show_result_haystack(
  res.haystack,
  n = NULL,
  p.value.threshold = NULL,
  gene = NULL
)
show_result_haystack(
  res.haystack,
  n = NULL,
  p.value.threshold = NULL,
  gene = NULL
)

## S3 method for class 'haystack'
show_result_haystack(
  res.haystack,
  n = NULL,
  p.value.threshold = NULL,
  gene = NULL
)

Arguments

`res.haystack`	A 'haystack' result object.
`n`	If defined, the top "n" significant genes will be returned. Default: NA, which shows all results.
`p.value.threshold`	If defined, genes passing this p-value threshold will be returned.
`gene`	If defined, the results of this (these) gene(s) will be returned.

Details

The output is a data.frame with the following columns: * D_KL the calculated KL divergence. * log.p.vals log10 p.values calculated from randomization. * log.p.adj log10 p.values adjusted by Bonferroni correction.

Value

A data.frame with 'haystack' results sorted by log.p.vals.

Examples

# using the toy example of the singleCellHaystack package

# running haystack
res <- haystack(dat.tsne, dat.expression)

# below are variations for showing the results in a table
# 1. list top 10 biased genes
show_result_haystack(res.haystack = res, n =10)
# 2. list genes with p value below a certain threshold
show_result_haystack(res.haystack = res, p.value.threshold=1e-10)
# 3. list a set of specified genes
set <- c("gene_497","gene_386", "gene_275")
show_result_haystack(res.haystack = res, gene = set)
# using the toy example of the singleCellHaystack package

# running haystack
res <- haystack(dat.tsne, dat.expression)

# below are variations for showing the results in a table
# 1. list top 10 biased genes
show_result_haystack(res.haystack = res, n =10)
# 2. list genes with p value below a certain threshold
show_result_haystack(res.haystack = res, p.value.threshold=1e-10)
# 3. list a set of specified genes
set <- c("gene_497","gene_386", "gene_275")
show_result_haystack(res.haystack = res, gene = set)

Function to write haystack result data to file.

Description

Function to write haystack result data to file.

Usage

write_haystack(res.haystack, file)
write_haystack(res.haystack, file)

Arguments

`res.haystack`	A 'haystack' result variable
`file`	A file to write to

Package 'singleCellHaystack'

Help Index

Single cell RNA-seq dataset.

Description

Single cell tSNE coordingates.

Description

Default function given by function bandwidth.nrd in MASS. No changes were made to this function.

Description

Usage

Arguments

Value

Returns a row of a sparse matrix of class dgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)

Description

Usage

Arguments

Value

Returns a row of a sparse matrix of class lgRMatrix. Function made by Ben Bolker and Ott Toomet (see https://stackoverflow.com/questions/47997184/)

Description

Usage

Arguments

Value

Calculates the Kullback-Leibler divergence between distributions.

Description

Usage

Arguments

Value

Calculates the Kullback-Leibler divergence between distributions for the high-dimensional continuous version of haystack.

Description

Usage

Arguments

Value

Calculates the Kullback-Leibler divergence between distributions for the high-dimensional version of haystack().

Description

Usage

Arguments

Value

Function to get the density of points with value TRUE in the (x,y) plot

Description

Usage

Arguments

Value

Calculate the pairwise Euclidean distances between the rows of 2 matrices.

Description

Usage

Arguments

Value

Calculate the Euclidean distance between x and y.

Description

Usage

Arguments

Value

A function to decide grid points in a higher-dimensional space

Description

Usage

Arguments

Value

Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations.

Description

Usage

Arguments

Value

Estimates the significance of the observed Kullback-Leibler divergence by comparing to randomizations for the continuous version of haystack.

Description

Usage

Arguments

Value

Function that decides most of the parameters that will be used during the "Haystack" analysis.

Description

Usage

Arguments

Value

Get reference distribution

Description

Usage

Arguments

Value

The main Haystack function

Description

Usage

Arguments