dance.transforms
- class dance.transforms.base.BaseTransform(out=None, log_level='WARNING')[source]
BaseTransform abstract object.
- Parameters:
log_level (Literal['NOTSET','DEBUG','INFO','WARNING','ERROR']) – Logging level.
out (Optional[str]) – Name of the obsm channel or layer where the transformed features will be saved. Use the current transformation name if it is not set.
- class dance.transforms.AnnDataTransform(func, **kwargs)[source]
AnnData transformation interface object.
This object provides an interface with any function that applies an in-place transformation to an AnnData object.
Example
Any one of the scanpy.pp functions should be supported. For example, we can use the scanpy.pp.normalize_total() function on the dance data object as follows
>>> AnnDataTransform(scanpy.pp.normalize_total, target_sum=10000)(data)
where data is a dance data object, e.g., dance.data.Data. Calling the above function is effectively equivalent to calling
>>> scanpy.pp.normalize_total(data.data, target_sum=10000)
- Parameters:
func (Any) –
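The wrapper pattern described for AnnDataTransform can be sketched without scanpy or dance installed. Everything below is a hypothetical stand-in (InPlaceTransformSketch, ToyData, and scale_to_total are illustrative names, not part of dance); it only shows how stored keyword arguments are forwarded to an in-place function applied to the wrapped data.data object.

```python
from typing import Any, Callable, List


class InPlaceTransformSketch:
    """Hypothetical stand-in for the wrapper pattern (not dance's implementation)."""

    def __init__(self, func: Callable[..., Any], **kwargs: Any) -> None:
        self.func = func
        self.kwargs = kwargs

    def __call__(self, data: Any) -> None:
        # Forward the wrapped object (data.data) plus the stored keyword arguments.
        self.func(data.data, **self.kwargs)


class ToyData:
    """Hypothetical container exposing the wrapped object as `.data`."""

    def __init__(self, values: List[float]) -> None:
        self.data = values


def scale_to_total(values: List[float], target_sum: float) -> None:
    """Toy in-place function: rescale values so that they sum to target_sum."""
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v * target_sum / total


toy = ToyData([1.0, 3.0])
InPlaceTransformSketch(scale_to_total, target_sum=100.0)(toy)
print(toy.data)
```

Calling the wrapper on toy has the same effect as calling scale_to_total(toy.data, target_sum=100.0) directly, which mirrors the equivalence stated in the example above.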
- class dance.transforms.BatchFeature(*, channel=None, mod=None, **kwargs)[source]
Assign statistical batch features for each cell.
- Parameters:
channel (str | None) –
mod (str | None) –
- class dance.transforms.CellPCA(n_components=400, *, channel=None, mod=None, **kwargs)[source]
- Parameters:
n_components (int) –
channel (str | None) –
mod (str | None) –
- class dance.transforms.CellTopicProfile(*, ct_select='auto', ct_key='cellType', batch_key=None, split_name=None, channel=None, channel_type='X', method='median', **kwargs)[source]
- Parameters:
ct_select (Literal['auto'] | List[str]) –
ct_key (str) –
batch_key (str | None) –
split_name (str | None) –
channel (str | None) –
channel_type (str) –
method (Literal['median', 'mean']) –
- class dance.transforms.CellwiseMaskData(distr='exp', mask_rate=0.1, seed=None, min_gene_counts=5, **kwargs)[source]
Randomly mask data in a cell-wise approach.
For every cell that has more than 5 positive counts, mask positive counts according to the masking rate and the probability generated from the distribution.
- Parameters:
distr (Optional[Literal['exp','uniform']]) – Distribution to generate masks.
mask_rate (Optional[float]) – Masking rate.
seed (Optional[int]) – Random seed.
min_gene_counts (int) – Minimum number of genes expressed within a cell, below which we do not mask that cell.
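A minimal sketch of cell-wise masking, under stated assumptions: only the uniform case is shown, the number of masked entries per cell is taken as int(n_positive * mask_rate), and cellwise_mask is a hypothetical helper rather than the dance implementation.

```python
import random
from typing import List, Optional


def cellwise_mask(counts: List[List[int]], mask_rate: float = 0.1,
                  min_gene_counts: int = 5,
                  seed: Optional[int] = None) -> List[List[bool]]:
    """Return a boolean mask (True = masked) with the same shape as counts."""
    rng = random.Random(seed)
    mask = []
    for cell in counts:
        positive = [j for j, v in enumerate(cell) if v > 0]
        row = [False] * len(cell)
        # Cells with too few positive counts are left untouched.
        if len(positive) > min_gene_counts:
            n_mask = int(len(positive) * mask_rate)  # ASSUMPTION: round down
            for j in rng.sample(positive, n_mask):   # uniform choice among positives
                row[j] = True
        mask.append(row)
    return mask


toy_counts = [
    [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],  # 10 positive counts: eligible
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # 2 positive counts: skipped
    [0, 2, 7, 1, 8, 2, 8, 0, 0, 0],  # 6 positive counts: eligible
]
toy_mask = cellwise_mask(toy_counts, mask_rate=0.5, seed=0)
```

Zero entries are never selected, so the mask only hides values that were actually observed.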
- class dance.transforms.Compose(*transforms, use_master_log_level=True, **kwargs)[source]
Compose transformation by combining several transformation objects.
- Parameters:
transforms (Tuple[BaseTransform,...]) – Transformation objects.
use_master_log_level (bool) – If set to True, then reset all transforms’ loggers to use the log_level option passed to this Compose object.
Notes
The order in which the transform objects are passed will be exactly the order in which they will be applied to the data object.
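The ordering note matters because transforms generally do not commute. The sketch below uses a hypothetical ComposeSketch class and two toy in-place transforms (none of them part of dance) to show that swapping the order changes the result.

```python
from typing import Callable, Dict


class ComposeSketch:
    """Hypothetical stand-in that applies in-place transforms in the order given."""

    def __init__(self, *transforms: Callable[[Dict[str, float]], None]) -> None:
        self.transforms = transforms

    def __call__(self, data: Dict[str, float]) -> None:
        for transform in self.transforms:
            transform(data)


def add_one(data: Dict[str, float]) -> None:
    data["x"] += 1


def double(data: Dict[str, float]) -> None:
    data["x"] *= 2


first = {"x": 3.0}
ComposeSketch(add_one, double)(first)   # (3 + 1) * 2

second = {"x": 3.0}
ComposeSketch(double, add_one)(second)  # 3 * 2 + 1
```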
- class dance.transforms.FilterCellsScanpy(min_counts=None, min_genes=None, max_counts=None, max_genes=None, split_name=None, channel=None, channel_type='X', **kwargs)[source]
Scanpy filtering cell transformation with additional options.
Allows passing gene counts as a ratio.
- Parameters:
min_counts (Optional[int]) – Minimum number of counts required for a cell to be kept.
min_genes (Union[float,int,None]) – Minimum number (or ratio) of genes required for a cell to be kept.
max_counts (Optional[int]) – Maximum number of counts required for a cell to be kept.
max_genes (Union[float,int,None]) – Maximum number (or ratio) of genes required for a cell to be kept.
split_name (Optional[str]) – Which split to be used for filtering.
channel (Optional[str]) – Channel to be used for filtering.
channel_type (Optional[str]) – Channel type to be used for filtering.
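The "number (or ratio)" wording for min_genes and max_genes can be illustrated with a small sketch. It assumes, without confirmation from the text above, that a float strictly between 0 and 1 is read as a fraction of the total number of genes; resolve_threshold and keep_cells are hypothetical helpers, not dance functions.

```python
from typing import List, Optional, Union


def resolve_threshold(value: Union[float, int, None], n_total: int) -> Optional[int]:
    """Turn a ratio into an absolute count (ASSUMPTION: floats in (0, 1) are ratios)."""
    if value is None:
        return None
    if isinstance(value, float) and 0 < value < 1:
        return int(value * n_total)
    return int(value)


def keep_cells(counts: List[List[int]],
               min_genes: Union[float, int, None] = None,
               max_genes: Union[float, int, None] = None) -> List[bool]:
    """Return a keep/drop flag per cell based on its number of expressed genes."""
    n_genes_total = len(counts[0])
    lo = resolve_threshold(min_genes, n_genes_total)
    hi = resolve_threshold(max_genes, n_genes_total)
    flags = []
    for cell in counts:
        n_expressed = sum(1 for v in cell if v > 0)
        ok = (lo is None or n_expressed >= lo) and (hi is None or n_expressed <= hi)
        flags.append(ok)
    return flags


toy_counts = [
    [1, 0, 0, 0],  # 1 of 4 genes expressed
    [2, 1, 0, 0],  # 2 of 4 genes expressed
    [3, 1, 4, 1],  # 4 of 4 genes expressed
]
flags_ratio = keep_cells(toy_counts, min_genes=0.5)  # 0.5 * 4 genes: at least 2
flags_count = keep_cells(toy_counts, min_genes=2, max_genes=3)
```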
- class dance.transforms.FilterGenesCommon(batch_key=None, split_keys=None, **kwargs)[source]
Filter genes by taking the common genes across batches or splits.
- Parameters:
batch_key (Optional[str]) – Which column in the .obs table to be used to distinguish batches.
split_keys (Optional[List[str]]) – A list of split names, e.g., ‘train’, to be used to find common genes.
Note
One and only one of batch_key or split_keys can be specified.
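As a rough sketch of taking common genes across batches, the helper below keeps a gene only if it is detected in every batch. The definition of "detected" (at least one non-zero value within the batch) is an assumption, and common_genes is a hypothetical function, not the dance implementation.

```python
from typing import Dict, List, Set


def common_genes(expr: List[List[float]], genes: List[str],
                 batches: List[str]) -> List[str]:
    """Keep genes that are expressed at least once in every batch (ASSUMPTION)."""
    detected: Dict[str, Set[str]] = {}
    for row, batch in zip(expr, batches):
        found = detected.setdefault(batch, set())
        found.update(g for g, v in zip(genes, row) if v > 0)
    shared = set.intersection(*detected.values())
    return [g for g in genes if g in shared]  # preserve the original gene order


toy_genes = ["A", "B", "C"]
toy_expr = [
    [1, 0, 2],  # batch b1
    [0, 0, 1],  # batch b1
    [3, 1, 0],  # batch b2
    [1, 0, 5],  # batch b2
]
toy_batches = ["b1", "b1", "b2", "b2"]
shared_genes = common_genes(toy_expr, toy_genes, toy_batches)
```

Gene B is dropped because it is never detected in batch b1. The same logic applies to splits if split names are used as the grouping labels instead of batch labels.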
- class dance.transforms.FilterGenesMarker(*, ct_profile_channel='CellTopicProfile', subset=True, label=None, threshold=1.25, eps=1e-06, **kwargs)[source]
Select marker genes based on log fold-change.
- Parameters:
ct_profile_channel (str) – Name of the .varm channel that contains the cell-topic profile which will be used to compute the log fold-changes for each cell-topic (e.g., cell type).
subset (bool) – If set to True, then subset the variables in place to only contain the markers.
label (Optional[str]) – If set, e.g., to 'marker', then save the marker indicator to the obs column named as marker.
threshold (float) – Threshold value of the log fold-change above which the gene will be considered as a marker.
eps (float) – A small value that prevents taking log of zeros.
- class dance.transforms.FilterGenesMatch(prefixes=None, suffixes=None, case_sensitive=False, **kwargs)[source]
Filter genes based on prefixes and suffixes.
- Parameters:
prefixes (Optional[List[str]]) – List of prefixes to remove.
suffixes (Optional[List[str]]) – List of suffixes to remove.
case_sensitive (bool) –
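Prefix and suffix filtering can be sketched in plain Python. The helper genes_to_keep is hypothetical and only illustrates the matching rule, including the case-insensitive default suggested by case_sensitive=False; the gene names are toy values.

```python
from typing import List, Optional, Sequence


def genes_to_keep(genes: Sequence[str],
                  prefixes: Optional[List[str]] = None,
                  suffixes: Optional[List[str]] = None,
                  case_sensitive: bool = False) -> List[str]:
    """Drop genes whose names start with any prefix or end with any suffix."""
    prefixes = prefixes or []
    suffixes = suffixes or []
    if not case_sensitive:
        prefixes = [p.lower() for p in prefixes]
        suffixes = [s.lower() for s in suffixes]
    kept = []
    for gene in genes:
        name = gene if case_sensitive else gene.lower()
        # startswith/endswith accept a tuple of candidates; an empty tuple never matches.
        if name.startswith(tuple(prefixes)) or name.endswith(tuple(suffixes)):
            continue
        kept.append(gene)
    return kept


toy_genes = ["MT-CO1", "mt-Nd1", "ACTB", "RPL3-AS", "GAPDH"]
kept_insensitive = genes_to_keep(toy_genes, prefixes=["MT-"], suffixes=["-AS"])
kept_sensitive = genes_to_keep(toy_genes, prefixes=["MT-"], case_sensitive=True)
```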
- class dance.transforms.FilterGenesPercentile(min_val=1, max_val=99, *, mode='sum', channel=None, channel_type=None, whitelist_indicators=None, **kwargs)[source]
Filter genes based on percentiles of the summarized gene expressions.
- Parameters:
min_val (Optional[float]) – Minimum percentile of the summarized expression value below which the genes will be discarded.
max_val (Optional[float]) – Maximum percentile of the summarized expression value above which the genes will be discarded.
mode (Literal['sum','cv','rv','var']) – Summarization mode. Available options are [sum|var|cv|rv]. sum calculates the sum of expression values, var calculates the variance of the expression values, cv uses the coefficient of variation (std / mean), and rv uses the relative variance (var / mean).
channel (Optional[str]) – Which channel, more specifically, layers, to use. Use the default .X if not set. If channel is specified, then channel_type must be set to layers as well.
channel_type (Optional[str]) – Type of channels specified. Only allow None (the default setting) or layers (when channel is specified).
whitelist_indicators (Union[str,List[str],None]) – A list of (or a single) var columns that indicate the genes to be excluded from the filtering process. Note that these genes will still be used in the summary stats computation, and thus will still contribute to the threshold percentile. If not set, then no genes will be excluded from the filtering process.
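The interaction between the percentile thresholds and the whitelist can be sketched as follows. The helpers are hypothetical; the percentile uses linear interpolation, and treating the boundaries as inclusive is an assumption. Whitelisted genes still contribute to the thresholds but are always kept.

```python
from typing import List, Optional, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_filter(gene_scores: Sequence[float], min_val: float = 1,
                      max_val: float = 99,
                      whitelist: Optional[Sequence[bool]] = None) -> List[bool]:
    """Return a keep flag per gene; whitelisted genes bypass the filter."""
    if whitelist is None:
        whitelist = [False] * len(gene_scores)
    # Thresholds are computed over ALL genes, whitelisted ones included.
    low = percentile(gene_scores, min_val)
    high = percentile(gene_scores, max_val)
    return [wl or (low <= s <= high) for s, wl in zip(gene_scores, whitelist)]


toy_sums = [0, 10, 20, 30, 1000]  # per-gene summary, e.g. mode='sum'
keep_plain = percentile_filter(toy_sums, min_val=25, max_val=75)
keep_listed = percentile_filter(toy_sums, min_val=25, max_val=75,
                                whitelist=[False, False, False, False, True])
```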
- class dance.transforms.FilterGenesScanpy(min_counts=None, min_cells=None, max_counts=None, max_cells=None, split_name=None, channel=None, channel_type='X', **kwargs)[source]
Scanpy filtering gene transformation with additional options.
- Parameters:
min_counts (Optional[int]) – Minimum number of counts required for a gene to be kept.
min_cells (Union[float,int,None]) – Minimum number (or ratio) of cells required for a gene to be kept.
max_counts (Optional[int]) – Maximum number of counts required for a gene to be kept.
max_cells (Union[float,int,None]) – Maximum number (or ratio) of cells required for a gene to be kept.
split_name (Optional[str]) – Which split to be used for filtering.
channel (Optional[str]) – Channel to be used for filtering.
channel_type (Optional[str]) – Channel type to be used for filtering.
- class dance.transforms.FilterGenesTopK(num_genes, top=True, *, mode='cv', channel=None, channel_type='X', whitelist_indicators=None, **kwargs)[source]
Select top/bottom genes based on the summarized gene expressions.
- Parameters:
num_genes (int) – Number of genes to be selected.
top (bool) – If set to True, then use the genes with highest values of the specified gene summary stats.
mode (Literal['sum','cv','rv','var']) – Summarization mode. Available options are [sum|var|cv|rv]. sum calculates the sum of expression values, var calculates the variance of the expression values, cv uses the coefficient of variation (std / mean), and rv uses the relative variance (var / mean).
channel (Optional[str]) – Which channel, more specifically, layers, to use. Use the default .X if not set. If channel is specified, then channel_type must be set to layers as well.
channel_type (Optional[str]) – Type of channels specified. Only allow None (the default setting) or layers (when channel is specified).
whitelist_indicators (Union[str,List[str],None]) – A list of (or a single) var columns that indicate the genes to be excluded from the filtering process. Note that these genes will still be used in the summary stats computation, and thus will still contribute to the threshold percentile. If not set, then no genes will be excluded from the filtering process.
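The four summarization modes and the top/bottom selection can be sketched with the standard library. The helpers summarize and top_k_genes are hypothetical, and using the population variance (dividing by n) is an assumption; genes with zero mean would need extra handling for cv and rv, which this sketch omits.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List


def summarize(values: List[float], mode: str = "cv") -> float:
    """Per-gene summary: sum, var, cv (std / mean), or rv (var / mean)."""
    mean = fmean(values)
    var = pvariance(values)  # ASSUMPTION: population variance
    if mode == "sum":
        return sum(values)
    if mode == "var":
        return var
    if mode == "cv":
        return math.sqrt(var) / mean
    if mode == "rv":
        return var / mean
    raise ValueError(f"Unknown mode: {mode!r}")


def top_k_genes(expr_by_gene: Dict[str, List[float]], num_genes: int,
                top: bool = True, mode: str = "cv") -> List[str]:
    """Rank genes by the chosen summary and return the first num_genes names."""
    scores = {gene: summarize(vals, mode) for gene, vals in expr_by_gene.items()}
    ranked = sorted(scores, key=scores.get, reverse=top)
    return ranked[:num_genes]


toy_expr = {
    "A": [1, 1, 1, 1],  # constant: cv = 0
    "B": [0, 2, 0, 2],  # mean 1, var 1
    "C": [0, 0, 0, 8],  # mean 2, var 12
    "D": [4, 4, 5, 3],  # mean 4, var 0.5
}
top_cv = top_k_genes(toy_expr, 2, mode="cv")
top_sum = top_k_genes(toy_expr, 2, mode="sum")
bottom_cv = top_k_genes(toy_expr, 1, top=False, mode="cv")
```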
- class dance.transforms.GeneHoldout(n_top=5, batch_size=512, random_state=None, **kwargs)[source]
Progressively hold out genes for DeepImpute.
Split genes into target batches. For every target gene in one batch, refer to the genes that are not in this batch and select predictor genes with high covariance with target gene.
- Parameters:
n_top (int) – Number of predictor genes per target gene.
batch_size (int) – Target batch size.
random_state (Optional[int]) – Random state.
- class dance.transforms.GeneStats(genestats_select='all', *, fill_na=None, threshold=0, pseudo=False, split_name='train', channel=None, channel_type=None, **kwargs)[source]
Gene statistics computation.
- Parameters:
genestats_select (Union[str,List[str]]) – List of names of the gene stats functions to use. If set to "all" (by default), then use all available gene stats functions.
fill_na (Optional[float]) – If not set (default), then do not fill nans. Otherwise, fill nans with the specified value.
threshold (float) – Threshold value for filtering gene expression when computing stats, e.g., mean expression values.
pseudo (bool) – If set to True, then add 1 to the numerator and denominator when computing the ratio (alpha) for which the gene expression values are above the specified threshold.
split_name (Optional[str]) – Which split to compute the gene stats on.
channel (str | None) –
channel_type (str | None) –
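The effect of the pseudo option on the ratio (alpha) can be shown with a short sketch. The helper expressed_ratio is hypothetical, and treating "above" as a strict comparison is an assumption.

```python
from typing import Sequence


def expressed_ratio(values: Sequence[float], threshold: float = 0.0,
                    pseudo: bool = False) -> float:
    """Fraction of cells whose expression exceeds threshold, with optional pseudo-count."""
    n_above = sum(1 for v in values if v > threshold)  # ASSUMPTION: strict '>'
    extra = 1 if pseudo else 0
    return (n_above + extra) / (len(values) + extra)


toy_gene = [0, 0, 3, 5]
alpha_plain = expressed_ratio(toy_gene)                # 2 / 4
alpha_pseudo = expressed_ratio(toy_gene, pseudo=True)  # (2 + 1) / (4 + 1)
```

With the pseudo-count, a gene that is never expressed gets a small positive ratio instead of exactly zero, which can help when the ratio is later used in a logarithm or a division.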
- class dance.transforms.MaskData(mask_rate=0.1, seed=None, **kwargs)[source]
Randomly mask data.
Randomly mask positive counts according to masking rate.
- Parameters:
mask_rate (Optional[float]) – Masking rate.
seed (Optional[int]) – Random seed.
- class dance.transforms.MorphologyFeature(*, model_name='resnet50', n_components=50, random_state=0, crop_size=20, target_size=299, device='cpu', channels=('spatial_pixel', 'image'), channel_types=('obsm', 'uns'), **kwargs)[source]
- Parameters:
model_name (str) –
n_components (int) –
random_state (int) –
crop_size (int) –
target_size (int) –
device (str) –
channels (Sequence[str]) –
channel_types (Sequence[str]) –
- class dance.transforms.PseudoMixture(*, n_pseudo=1000, nc_min=2, nc_max=10, ct_select='auto', ct_key='cellType', channel=None, channel_type='X', random_state=0, prefix='ps_mix_', in_split_name='ref', out_split_name='pseudo', label_batch=False, **kwargs)[source]
- Parameters:
n_pseudo (int) –
nc_min (int) –
nc_max (int) –
ct_select (Literal['auto'] | List[str]) –
ct_key (str) –
channel (str | None) –
channel_type (str | None) –
random_state (int | None) –
prefix (str) –
in_split_name (str) –
out_split_name (str | None) –
label_batch (bool) –
- class dance.transforms.RemoveSplit(*, split_name, **kwargs)[source]
Remove a particular split from the data.
- Parameters:
split_name (str) –
- class dance.transforms.SCNFeature(num_top_genes=10, alpha1=0.05, alpha2=0.001, mu=2, num_top_gene_pairs=25, max_gene_per_ct=3, *, split_name='train', channel=None, channel_type=None, **kwargs)[source]
Differential gene-pair feature used in SingleCellNet.
- Parameters:
num_top_genes (int) –
alpha1 (float) –
alpha2 (float) –
mu (float) –
num_top_gene_pairs (int) –
max_gene_per_ct (int) –
split_name (str | None) –
channel (str | None) –
channel_type (str | None) –
- class dance.transforms.SMEFeature(n_neighbors=3, n_components=50, random_state=0, *, channels=(None, 'SMEGraph'), channel_types=(None, 'obsp'), **kwargs)[source]
- Parameters:
n_neighbors (int) –
n_components (int) –
random_state (int) –
channels (Sequence[str | None]) –
channel_types (Sequence[str | None]) –
- class dance.transforms.SaveRaw(exist_ok=False, **kwargs)[source]
Save raw data.
See anndata.AnnData.raw() for more information.
- Parameters:
exist_ok (bool) – If set to False, then raise an exception if the raw attribute is already set.
- class dance.transforms.ScaleFeature(*, axis=0, split_names=None, batch_key=None, mode='normalize', eps=-1, **kwargs)[source]
Scale the feature matrix in the AnnData object.
This is an extension of scanpy.pp.scale(), allowing split- or batch-wide scaling.
- Parameters:
axis (int) – Axis along which the scaling is performed.
split_names (Union[Literal['ALL'],List[str],None]) – Indicate which splits to perform the scaling independently. If set to ‘ALL’, then go through all splits available in the data.
batch_key (Optional[str]) – Indicate which column in .obs to use as the batch index to guide the batch-wide scaling.
mode (Literal['normalize','standardize','minmax','l2']) – Scaling mode, see dance.utils.matrix.normalize() for more information.
eps (float) – Correction factor, see dance.utils.matrix.normalize() for more information.
Note
The order of checking split- or batch-wide scaling mode is: batch_key > split_names > None (i.e., all).
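The priority order in the note can be sketched as a small group-resolution step followed by per-group scaling. Both helpers are hypothetical, and the min-max rule used here is only an illustration; the actual scaling modes are defined by dance.utils.matrix.normalize().

```python
from typing import Dict, List, Optional, Sequence


def resolve_groups(n_cells: int,
                   batch_labels: Optional[Sequence[str]] = None,
                   split_indices: Optional[Dict[str, List[int]]] = None
                   ) -> Dict[str, List[int]]:
    """Pick the grouping to scale within: batch_key > split_names > None (all)."""
    if batch_labels is not None:  # highest priority: batch-wide scaling
        groups: Dict[str, List[int]] = {}
        for i, label in enumerate(batch_labels):
            groups.setdefault(label, []).append(i)
        return groups
    if split_indices is not None:  # next: split-wide scaling
        return dict(split_indices)
    return {"all": list(range(n_cells))}  # fallback: scale everything together


def minmax_by_group(values: List[float], groups: Dict[str, List[int]]) -> List[float]:
    """Illustrative min-max scaling applied independently within each group."""
    out = [float(v) for v in values]
    for idx in groups.values():
        lo = min(values[i] for i in idx)
        hi = max(values[i] for i in idx)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant groups
        for i in idx:
            out[i] = (values[i] - lo) / span
    return out


toy_values = [0, 10, 100, 200]
toy_batches = ["a", "a", "b", "b"]
toy_splits = {"train": [0, 1, 2], "test": [3]}

by_batch = minmax_by_group(toy_values, resolve_groups(4, toy_batches, toy_splits))
by_split = minmax_by_group(toy_values, resolve_groups(4, None, toy_splits))
overall = minmax_by_group(toy_values, resolve_groups(4))
```

When both batch labels and splits are supplied, the batch grouping wins, matching the order stated in the note.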
- class dance.transforms.SetConfig(config_dict, **kwargs)[source]
Set configuration options of a dance data object.
- Parameters:
config_dict (Dict[str,Any]) – Dance data object configuration dictionary. See set_config_from_dict().
- class dance.transforms.WeightedFeaturePCA(n_components=400, split_name=None, feat_norm_mode=None, feat_norm_axis=0, **kwargs)[source]
Compute the weighted gene PCA as cell features.
Given a gene expression matrix of dimension (cell x gene), the gene PCA is first computed. Then, the representation of each cell is computed by taking the weighted sum of the gene PCAs based on that cell’s gene expression values.
- Parameters:
n_components (int) – Number of PCs to use.
split_name (Optional[str]) – Which split to use to compute the gene PCA. If not set, use all data.
feat_norm_mode (str | None) –
feat_norm_axis (int) –
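The weighted-sum step can be sketched on its own, assuming the gene PCA has already been computed elsewhere. Here gene_pcs is a hypothetical (gene x component) matrix with made-up values, and weighted_gene_features is an illustrative helper; the result is equivalent to the matrix product of the (cell x gene) expression matrix with the gene PCs.

```python
from typing import List


def weighted_gene_features(expr: List[List[float]],
                           gene_pcs: List[List[float]]) -> List[List[float]]:
    """Cell features as expression-weighted sums of per-gene PC coordinates."""
    n_components = len(gene_pcs[0])
    features = []
    for cell in expr:
        # For each component, sum gene coordinates weighted by this cell's expression.
        row = [sum(x * pc[k] for x, pc in zip(cell, gene_pcs))
               for k in range(n_components)]
        features.append(row)
    return features


toy_expr = [
    [1.0, 0.0, 2.0],  # cell 0
    [0.0, 3.0, 0.0],  # cell 1
]
toy_gene_pcs = [  # ASSUMPTION: precomputed gene PCA, one row per gene
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
cell_features = weighted_gene_features(toy_expr, toy_gene_pcs)
```

Each cell ends up with n_components features regardless of how many genes it expresses, which is what makes the result usable as a fixed-size cell representation.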