Preprocessing

cellarium.ml.preprocessing.get_highly_variable_genes(gene_names: list, mean: Tensor, var: Tensor, n_top_genes: int | None = None, min_disp: float | None = 0.5, max_disp: float | None = inf, min_mean: float | None = 0.0125, max_mean: float | None = 3, n_bins: int = 20) → DataFrame

Get highly variable genes. This replicates Scanpy's highly variable gene selection with the Seurat flavor.

References:

Highly Variable Genes from Scanpy.

Parameters:

gene_names (list) – Ensembl gene ids.
mean (Tensor) – Gene expression means.
var (Tensor) – Gene expression variances.
n_top_genes (int | None) – Number of highly-variable genes to keep.
min_disp (float | None) – Lower cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
max_disp (float | None) – Upper cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
min_mean (float | None) – Lower cutoff for the mean. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
max_mean (float | None) – Upper cutoff for the mean. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
n_bins (int) – Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1. You’ll be informed about this
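
The per-bin normalization described for n_bins can be illustrated with a small standard-library sketch. This is not cellarium's implementation: the helper name normalized_dispersions is hypothetical, the inputs are assumed to be strictly positive means and variances, and the equal-width, left-closed binning is a simplification that may place bin edges differently from the library.

```python
import math
import statistics
from collections import defaultdict


def normalized_dispersions(means, variances, n_bins=20):
    """Illustrative Seurat-style dispersion normalization (hypothetical helper)."""
    # Dispersion is variance / mean; means and dispersions are then log-transformed.
    # Assumption: all means and variances are strictly positive.
    log_means = [math.log1p(m) for m in means]
    log_disps = [math.log(v / m) for m, v in zip(means, variances)]

    # Assign each gene to one of n_bins equal-width bins over the log-mean range.
    lo, hi = min(log_means), max(log_means)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all means are equal
    bin_ids = [min(int((x - lo) / width), n_bins - 1) for x in log_means]

    # Collect the log-dispersions of the genes in each bin.
    members = defaultdict(list)
    for b, d in zip(bin_ids, log_disps):
        members[b].append(d)

    normalized = []
    for b, d in zip(bin_ids, log_disps):
        group = members[b]
        if len(group) == 1:
            # A lone gene has no within-bin spread to normalize against,
            # so its normalized dispersion is set to 1.
            normalized.append(1.0)
            continue
        center = statistics.mean(group)
        spread = statistics.stdev(group)  # sample standard deviation
        # Sketch-only choice: a bin with zero spread maps to 0.0.
        normalized.append((d - center) / spread if spread > 0 else 0.0)
    return normalized


# Three genes share the low-expression bin; the fourth sits alone in its bin.
scores = normalized_dispersions([1.0, 1.0, 1.0, 100.0], [1.0, 2.0, 4.0, 50.0], n_bins=2)
```

Here the last gene is the only one in its bin, so its score is fixed at 1 regardless of its raw dispersion, which is the special case the n_bins description mentions.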

Return type:

DataFrame
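
The interaction between n_top_genes and the four cutoffs, as stated in the parameter descriptions, can be sketched as follows. The helper select_highly_variable is hypothetical and returns a plain list of booleans rather than the DataFrame this function returns. Two details are assumptions of the sketch rather than confirmed library behavior: the cutoffs are applied as strict inequalities, and the means are taken to be on the same scale as the cutoffs.

```python
import math


def select_highly_variable(means, disp_norm, n_top_genes=None,
                           min_disp=0.5, max_disp=math.inf,
                           min_mean=0.0125, max_mean=3.0):
    """Return one boolean per gene (hypothetical helper, not cellarium's API)."""
    if n_top_genes is not None:
        # Rank genes by normalized dispersion; all four cutoffs are ignored.
        order = sorted(range(len(disp_norm)), key=lambda i: disp_norm[i], reverse=True)
        chosen = set(order[:n_top_genes])
        return [i in chosen for i in range(len(disp_norm))]
    # Otherwise keep genes whose mean and normalized dispersion lie inside the cutoffs.
    # Assumption: strict inequalities on both ends.
    return [
        min_mean < m < max_mean and min_disp < d < max_disp
        for m, d in zip(means, disp_norm)
    ]


means = [0.01, 1.0, 2.0, 5.0]
disp_norm = [2.0, 0.2, 1.0, 3.0]
by_cutoffs = select_highly_variable(means, disp_norm)
by_rank = select_highly_variable(means, disp_norm, n_top_genes=2)
```

With the default cutoffs only the third gene passes, since the others fail on mean or dispersion. With n_top_genes=2 the first and fourth genes are chosen purely by rank, even though both fall outside the mean cutoffs.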