Preprocessing

cellarium.ml.preprocessing.get_highly_variable_genes(gene_names: list, mean: Tensor, var: Tensor, n_top_genes: int | None = None, min_disp: float | None = 0.5, max_disp: float | None = inf, min_mean: float | None = 0.0125, max_mean: float | None = 3, n_bins: int = 20) DataFrame

Get highly variable genes. This replicates the Seurat flavor of Highly Variable Genes from Scanpy.
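
In the Seurat flavor, each gene's dispersion (variance over mean) and mean are log-transformed, genes are grouped into bins by log mean, and each log dispersion is z-scored within its bin. The following is a minimal NumPy/pandas sketch of that procedure, modeled on Scanpy's Seurat-flavor code path rather than taken from this module. The helper name `seurat_normalized_dispersion` is hypothetical, and `mean` and `var` are assumed to be per-gene moments of expression on the non-logarithmized scale.

```python
import numpy as np
import pandas as pd


def seurat_normalized_dispersion(
    mean: np.ndarray, var: np.ndarray, n_bins: int = 20
) -> pd.DataFrame:
    """Hypothetical helper: Seurat-flavor log mean, log dispersion and
    per-bin normalized dispersion for each gene."""
    # Copy so the caller's array is not modified in place.
    mean = np.asarray(mean, dtype=np.float64).copy()
    var = np.asarray(var, dtype=np.float64)
    # Avoid dividing by zero for genes with zero mean.
    mean[mean == 0] = 1e-12
    dispersion = var / mean
    # log(0) would be -inf, so zero dispersions are marked missing instead.
    dispersion[dispersion == 0] = np.nan
    df = pd.DataFrame({"means": np.log1p(mean), "dispersions": np.log(dispersion)})
    # Split the range of log means into n_bins equal-width bins (integer codes).
    df["mean_bin"] = pd.cut(df["means"], bins=n_bins, labels=False)
    grouped = df.groupby("mean_bin")["dispersions"]
    bin_mean = grouped.transform("mean")
    bin_std = grouped.transform("std")  # sample std (ddof=1); NaN for one gene
    # z-score each gene's log dispersion within its mean bin.
    z = (df["dispersions"] - bin_mean) / bin_std
    # A bin with a single gene has no standard deviation: set that gene to 1.
    single = bin_std.isna() & df["dispersions"].notna()
    df["dispersions_norm"] = z.mask(single, 1.0)
    return df
```

Binning on the log mean before z-scoring means each gene is compared only with genes of similar expression level, since raw dispersion tends to vary with mean expression.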

References:

  1. Highly Variable Genes from Scanpy.

Parameters:
  • gene_names (list) – Ensembl gene IDs.

  • mean (Tensor) – Gene expression means.

  • var (Tensor) – Gene expression variances.

  • n_top_genes (int | None) – Number of highly-variable genes to keep.

  • min_disp (float | None) – Lower cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

  • max_disp (float | None) – Upper cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

  • min_mean (float | None) – Lower cutoff for the mean. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

  • max_mean (float | None) – Upper cutoff for the mean. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

  • n_bins (int) – Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, its normalized dispersion is artificially set to 1. You’ll be informed about this.

Return type:

DataFrame
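
Given the normalized dispersions, the parameters listed above select genes in one of two modes: ranking by normalized dispersion when n_top_genes is set, or thresholding the means and normalized dispersions otherwise. The sketch below assumes that reading of the parameter descriptions and loosely follows Scanpy's Seurat-flavor selection. `select_highly_variable` is a hypothetical helper, not part of cellarium.ml.preprocessing, and treating NaN normalized dispersions as never selected in ranking mode is a simplifying choice of this sketch.

```python
from __future__ import annotations

import numpy as np


def select_highly_variable(
    means: np.ndarray,
    dispersions_norm: np.ndarray,
    n_top_genes: int | None = None,
    min_disp: float = 0.5,
    max_disp: float = np.inf,
    min_mean: float = 0.0125,
    max_mean: float = 3.0,
) -> np.ndarray:
    """Hypothetical helper: boolean mask of highly variable genes."""
    means = np.asarray(means, dtype=np.float64)
    disp = np.asarray(dispersions_norm, dtype=np.float64)
    if n_top_genes is not None:
        # Ranking mode: every mean and dispersion cutoff is ignored.
        ranked = np.sort(disp[~np.isnan(disp)])[::-1]
        n = min(n_top_genes, ranked.size)
        if n <= 0:
            return np.zeros(disp.shape, dtype=bool)
        cutoff = ranked[n - 1]
        # Genes tied with the cutoff value are all kept; NaNs never are.
        return np.where(np.isnan(disp), -np.inf, disp) >= cutoff
    # Cutoff mode: missing normalized dispersions count as 0.
    disp = np.where(np.isnan(disp), 0.0, disp)
    return (
        (means > min_mean) & (means < max_mean) & (disp > min_disp) & (disp < max_disp)
    )
```

Thresholding at the n-th largest value instead of taking an exact top-n slice keeps the logic simple, at the cost that ties at the cutoff can select slightly more than n_top_genes genes.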