Transforms
- class cellarium.ml.transforms.DivideByScale(scale_g: Tensor, var_names_g: ndarray, eps: float = 1e-06)
Bases: Module
Divide gene counts by a scale.
\[y_{ng} = \frac{x_{ng}}{\mathrm{scale}_g + \mathrm{eps}}\]
- Parameters:
scale_g (Tensor) – A scale for each gene.
var_names_g (ndarray) – The variable names schema for the input data validation.
eps (float) – A value added to the denominator for numerical stability.
- forward(x_ng: Tensor, var_names_g: ndarray) → dict[str, Tensor]
- Parameters:
x_ng (Tensor) – Gene counts.
var_names_g (ndarray) – The list of the variable names in the input data. If None, no validation is performed.
- Returns:
A dictionary with the following keys:
x_ng: The gene counts divided by the scale.
- Return type:
dict[str, Tensor]
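The formula above can be sketched outside the library. The following is a minimal NumPy re-implementation of the arithmetic only, not the library's own code: the function name divide_by_scale is hypothetical, and the real transform operates on torch Tensors and also validates var_names_g, which is omitted here.

```python
import numpy as np


def divide_by_scale(x_ng, scale_g, eps=1e-6):
    """Hypothetical helper: y_ng = x_ng / (scale_g + eps)."""
    x_ng = np.asarray(x_ng, dtype=np.float64)
    scale_g = np.asarray(scale_g, dtype=np.float64)
    # scale_g has shape (g,) and broadcasts across the n cells of x_ng (n, g).
    return x_ng / (scale_g + eps)


# Two cells, three genes; the third gene has a scale of zero.
x_ng = np.array([[2.0, 0.0, 9.0], [4.0, 6.0, 3.0]])
scale_g = np.array([2.0, 3.0, 0.0])
y_ng = divide_by_scale(x_ng, scale_g)
```

Because eps is added to the denominator, the gene with zero scale yields large but finite values instead of inf or nan.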
- class cellarium.ml.transforms.Filter(filter_list: Sequence[str])
Bases: Module
Filter gene counts by a list of features.
\[ \begin{aligned} \mathrm{mask}_g &= \mathrm{feature}_g \in \mathrm{filter\_list} \\ y_{ng} &= x_{ng}[:, \mathrm{mask}_g] \end{aligned} \]
- Parameters:
filter_list (Sequence[str]) – A list of features to filter by.
- filter(var_names_g: tuple) → ndarray[Any, dtype[int64]]
- Parameters:
var_names_g (tuple) – The list of the variable names in the input data.
- Returns:
An array of indices of the features in var_names_g that are in filter_list.
- Return type:
ndarray[Any, dtype[int64]]
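A rough illustration of the index lookup described above, written with NumPy. The helper name filter_indices is hypothetical, and the sketch assumes the indices follow the order of var_names_g, as the mask formula suggests; the library's actual implementation may differ.

```python
import numpy as np


def filter_indices(var_names_g, filter_list):
    """Hypothetical helper: positions in var_names_g whose name is in filter_list."""
    # mask_g[g] is True when the g-th feature name appears in filter_list.
    mask_g = np.isin(np.asarray(var_names_g), list(filter_list))
    return np.flatnonzero(mask_g)


var_names_g = ("ENSG1", "ENSG2", "ENSG3")  # made-up feature names
idx = filter_indices(var_names_g, ["ENSG3", "ENSG1"])
```

Note that the order of filter_list does not affect the result in this sketch: the indices come back in the order the features appear in var_names_g.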
- forward(x_ng: Tensor, var_names_g: ndarray) → dict[str, Tensor | ndarray]
Note
When used with CellariumModule or CellariumPipeline, the x_ng and var_names_g keys in the input dictionary will be overwritten with the filtered values.
- Parameters:
x_ng (Tensor) – Gene counts.
var_names_g (ndarray) – The list of the variable names in the input data.
- Returns:
A dictionary with the following keys:
x_ng: Gene counts filtered by filter_list.
var_names_g: The list of the variable names in the input data filtered by filter_list.
- Return type:
dict[str, Tensor | ndarray]
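To make the overwrite behaviour in the note concrete, here is a small NumPy sketch of the forward computation and of a returned dictionary replacing keys in a batch. The function filter_forward is hypothetical, and dict.update is only a stand-in for however CellariumModule or CellariumPipeline merges outputs, which this page does not specify.

```python
import numpy as np


def filter_forward(x_ng, var_names_g, filter_list):
    """Hypothetical helper mirroring y_ng = x_ng[:, mask_g]."""
    var_names_g = np.asarray(var_names_g)
    mask_g = np.isin(var_names_g, list(filter_list))
    # Both the counts and the names are filtered with the same mask,
    # so columns and names stay aligned.
    return {"x_ng": x_ng[:, mask_g], "var_names_g": var_names_g[mask_g]}


batch = {
    "x_ng": np.arange(6.0).reshape(2, 3),  # [[0, 1, 2], [3, 4, 5]]
    "var_names_g": np.array(["A", "B", "C"]),
}
# Assumed merge step: returned keys replace the same keys in the batch.
batch.update(filter_forward(batch["x_ng"], batch["var_names_g"], ["C", "A"]))
```

After the update, any downstream step reading x_ng or var_names_g from the batch sees only the filtered features.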
- class cellarium.ml.transforms.Log1p(*args, **kwargs)
Bases: Module
Log1p transform gene counts.
\[y_{ng} = \log(1 + x_{ng})\]
- forward(x_ng: Tensor) → dict[str, Tensor]
Note
When used with CellariumModule or CellariumPipeline, the x_ng key in the input dictionary will be overwritten with the log1p-transformed values.
- Parameters:
x_ng (Tensor) – Gene counts.
- Returns:
A dictionary with the following keys:
x_ng: The log1p-transformed gene counts.
- Return type:
dict[str, Tensor]
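As a quick check of the formula, the transform can be reproduced with NumPy's np.log1p. This is a sketch of the arithmetic only; the library itself works on torch Tensors.

```python
import numpy as np

# One cell, three genes with counts 0, 1 and 9.
x_ng = np.array([[0.0, 1.0, 9.0]])

# log1p(x) computes log(1 + x) and stays accurate for small x.
y_ng = np.log1p(x_ng)
```

Zero counts map to zero, so sparsity in the count matrix is preserved by this transform.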
- class cellarium.ml.transforms.NormalizeTotal(target_count: int = 10000, eps: float = 1e-06)
Bases: Module
Normalize total gene counts per cell to target count.
\[ \begin{aligned} \mathrm{total\_mrna\_umis}_n &= \sum_{g=1}^G x_{ng} \\ y_{ng} &= \frac{\mathrm{target\_count} \times x_{ng}}{\mathrm{total\_mrna\_umis}_n + \mathrm{eps}} \end{aligned} \]
- Parameters:
target_count (int) – Target gene expression count.
eps (float) – A value added to the denominator for numerical stability.
- forward(x_ng: Tensor, total_mrna_umis_n: Tensor | None = None) → dict[str, Tensor]
Note
When used with CellariumModule or CellariumPipeline, the x_ng key in the input dictionary will be overwritten with the normalized values.
- Parameters:
x_ng (Tensor) – Gene counts.
total_mrna_umis_n (Tensor | None) – Total mRNA UMI counts per cell. If None, it is computed from x_ng.
- Returns:
A dictionary with the following keys:
x_ng: The gene counts normalized to target count.
- Return type:
dict[str, Tensor]
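The normalization, including the fallback when total_mrna_umis_n is None, might look like the following in NumPy. The function normalize_total is a hypothetical stand-in for the formula above, not the library's torch implementation.

```python
import numpy as np


def normalize_total(x_ng, total_mrna_umis_n=None, target_count=10_000, eps=1e-6):
    """Hypothetical helper: scale each cell's counts to sum to ~target_count."""
    x_ng = np.asarray(x_ng, dtype=np.float64)
    if total_mrna_umis_n is None:
        # Fall back to the per-cell sum over genes, as the docs describe.
        total_mrna_umis_n = x_ng.sum(axis=-1)
    total_mrna_umis_n = np.asarray(total_mrna_umis_n, dtype=np.float64)
    # Add a trailing axis so the (n,) totals broadcast against (n, g) counts.
    return target_count * x_ng / (total_mrna_umis_n[:, None] + eps)


# Second cell is empty; eps keeps its division finite.
x_ng = np.array([[1.0, 3.0], [0.0, 0.0]])
y_ng = normalize_total(x_ng)
```

Passing total_mrna_umis_n explicitly matters when x_ng has already been filtered to a subset of genes, since the sum over the remaining genes would no longer equal the cell's full UMI count.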
- class cellarium.ml.transforms.ZScore(mean_g: Tensor, std_g: Tensor, var_names_g: ndarray, eps: float = 1e-06)
Bases: Module
Z-score gene counts using the per-gene mean and standard deviation.
\[y_{ng} = \frac{x_{ng} - \mathrm{mean}_g}{\mathrm{std}_g + \mathrm{eps}}\]
- Parameters:
mean_g (Tensor) – Means for each gene.
std_g (Tensor) – Standard deviations for each gene.
var_names_g (ndarray) – The variable names schema for the input data validation.
eps (float) – A value added to the denominator for numerical stability.
- forward(x_ng: Tensor, var_names_g: ndarray) → dict[str, Tensor]
Note
When used with CellariumModule or CellariumPipeline, the x_ng key in the input dictionary will be overwritten with the z-scored values.
- Parameters:
x_ng (Tensor) – Gene counts.
var_names_g (ndarray) – The list of the variable names in the input data. If None, no validation is performed.
- Returns:
A dictionary with the following keys:
x_ng: The z-scored gene counts.
- Return type:
dict[str, Tensor]
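A minimal NumPy sketch of the z-score formula follows. The helper z_score is hypothetical, and computing mean_g and std_g from the same batch is done here only for illustration; in the library these statistics are supplied to the constructor, and var_names_g validation is omitted from the sketch.

```python
import numpy as np


def z_score(x_ng, mean_g, std_g, eps=1e-6):
    """Hypothetical helper: y_ng = (x_ng - mean_g) / (std_g + eps)."""
    x_ng = np.asarray(x_ng, dtype=np.float64)
    # mean_g and std_g have shape (g,) and broadcast across cells.
    return (x_ng - mean_g) / (std_g + eps)


# The second gene is constant across cells, so its std is exactly zero.
x_ng = np.array([[1.0, 5.0], [3.0, 5.0]])
mean_g = x_ng.mean(axis=0)  # [2.0, 5.0]
std_g = x_ng.std(axis=0)    # [1.0, 0.0]
y_ng = z_score(x_ng, mean_g, std_g)
```

The constant gene shows why eps is in the denominator: without it, the zero standard deviation would produce 0/0 and a nan result.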