Preprocessing
- cellarium.ml.preprocessing.get_highly_variable_genes(gene_names: list, mean: Tensor, var: Tensor, n_top_genes: int | None = None, min_disp: float | None = 0.5, max_disp: float | None = inf, min_mean: float | None = 0.0125, max_mean: float | None = 3, n_bins: int = 20, batch_mean_bg: Tensor | None = None, batch_var_bg: Tensor | None = None, batch_ids: list[str] | None = None) → DataFrame
Annotate highly variable genes using the seurat flavor. Replicates scanpy.pp.highly_variable_genes with flavor='seurat'. Optionally accepts per-batch statistics for batch-aware selection.
- Parameters:
gene_names (list) – Ensembl gene ids.
mean (Tensor) – Overall gene expression means in count space (shape n_genes).
var (Tensor) – Overall gene expression variances in count space (shape n_genes).
n_top_genes (int | None) – Number of highly-variable genes to keep.
min_disp (float | None) – Ignored when n_top_genes is set.
max_disp (float | None) – Ignored when n_top_genes is set.
min_mean (float | None) – Ignored when n_top_genes is set.
max_mean (float | None) – Ignored when n_top_genes is set.
n_bins (int) – Number of bins for mean-expression binning.
batch_mean_bg (Tensor | None) – Per-batch means in count space of shape (n_batch, n_genes).
batch_var_bg (Tensor | None) – Per-batch variances in count space of shape (n_batch, n_genes).
batch_ids (list[str] | None) – Batch labels of length n_batch.
- Returns:
DataFrame indexed by gene_names with columns highly_variable, means, dispersions, dispersions_norm, mean_bin (single-batch), highly_variable_nbatches and highly_variable_intersection (batch mode).
- Return type:
DataFrame
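
The interplay between n_top_genes and the four cutoff parameters can be easier to follow in code. The block below is a minimal pure-Python sketch of Seurat-style selection as commonly described for scanpy's flavor='seurat'. It is not cellarium's implementation. The function name seurat_like_hvg is hypothetical, and several details are simplifying assumptions: equal-width bins over log1p(mean), NaN scores for bins with fewer than two usable genes, and mean cutoffs applied to log1p(mean). It covers only the single-batch case and returns a plain dict rather than a DataFrame.

```python
import math
from statistics import mean as avg, stdev


def seurat_like_hvg(gene_names, means, variances, n_top_genes=None,
                    min_disp=0.5, max_disp=math.inf,
                    min_mean=0.0125, max_mean=3.0, n_bins=20):
    """Hypothetical helper: flag highly variable genes from count-space stats."""
    # Dispersion is variance / mean; both statistics are then moved to log space.
    log_mean = [math.log1p(m) for m in means]
    log_disp = [math.log(v / m) if m > 0 and v > 0 else math.nan
                for m, v in zip(means, variances)]

    # Assign each gene to one of n_bins equal-width bins over the log-mean range.
    lo, hi = min(log_mean), max(log_mean)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all means are equal
    bins = [min(int((x - lo) / width), n_bins - 1) for x in log_mean]

    # z-score each gene's log dispersion against the genes in the same bin.
    # (Quadratic in the number of genes; fine for a sketch.)
    disp_norm = []
    for i, d in enumerate(log_disp):
        peers = [log_disp[j] for j in range(len(bins))
                 if bins[j] == bins[i] and not math.isnan(log_disp[j])]
        if math.isnan(d) or len(peers) < 2 or stdev(peers) == 0:
            disp_norm.append(math.nan)  # simplification: no score for such genes
        else:
            disp_norm.append((d - avg(peers)) / stdev(peers))

    if n_top_genes is not None:
        # Rank by normalized dispersion; the four cutoffs are ignored.
        ranked = sorted((i for i, z in enumerate(disp_norm) if not math.isnan(z)),
                        key=lambda i: disp_norm[i], reverse=True)
        chosen = set(ranked[:n_top_genes])
    else:
        # Otherwise keep genes that fall inside both the mean and dispersion windows.
        chosen = {i for i, z in enumerate(disp_norm)
                  if not math.isnan(z)
                  and min_mean < log_mean[i] < max_mean
                  and min_disp < z < max_disp}
    return {g: i in chosen for i, g in enumerate(gene_names)}


names = ["g1", "g2", "g3", "g4"]
flags = seurat_like_hvg(names, [1.0] * 4, [1.0, 2.0, 4.0, 8.0],
                        n_top_genes=2, n_bins=1)
print(flags)  # {'g1': False, 'g2': False, 'g3': True, 'g4': True}
```

With n_top_genes left as None, the same toy input keeps only g4, because g3's normalized dispersion (about 0.39) falls below the default min_disp of 0.5.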