dance.datasets
- class dance.datasets.base.BaseDataset(root, full_download=False)[source]
BaseDataset abstract object.
- Parameters:
root (str) – Root directory of the dataset.
full_download (bool) – If set to True, then attempt to download all raw files of the dataset.
- classmethod get_available_data()[source]
List available data of the dataset.
- Return type:
List[Union[str, Dict[str, str]]]
- abstract is_complete()[source]
Return True if the selected files have been downloaded.
- Return type:
bool
- is_complete_all()[source]
Return True if all raw files of the dataset have been downloaded.
- Return type:
bool
- load_data(transform=None, cache=False, redo_cache=False)[source]
Load dance data object and perform transformation.
If the cache option is set, then try to load the processed data from cache. The cache file hash is supposed to distinguish different datasets and different transformations. In particular, it is constructed by MD5 hashing the concatenation of the dataset MD5 hash (see hexdigest()) and the transformation MD5 hash (hexdigest()). In the case of no transformation, i.e., transform=None, the transformation MD5 hash will be the empty string "".
- Parameters:
transform (Optional[BaseTransform]) – Transformation to be applied.
cache (bool) – If set to True, then try to read and write cache to <root>/cache/<hash>.pkl.
redo_cache (bool) – If set to True, then redo the data loading and transformation, and overwrite the previous cache with the newly processed data.
- Return type:
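The cache behavior described for load_data can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names md5_hexdigest, cache_file_hash, and load_with_cache are hypothetical, the dataset and transformation hashes are passed in as plain strings, and the handling of redo_cache when cache is False is an assumption.

```python
import hashlib
import os
import pickle
from typing import Any, Callable, Optional


def md5_hexdigest(text: str) -> str:
    """Return the MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def cache_file_hash(dataset_hash: str, transform_hash: Optional[str] = None) -> str:
    """MD5 of the dataset hash concatenated with the transformation hash.

    With no transformation, the transformation hash is the empty string.
    """
    return md5_hexdigest(dataset_hash + (transform_hash or ""))


def load_with_cache(
    root: str,
    file_hash: str,
    build: Callable[[], Any],
    cache: bool = False,
    redo_cache: bool = False,
) -> Any:
    """Load processed data from <root>/cache/<hash>.pkl, or build and store it.

    `build` is a hypothetical callable standing in for raw-data loading
    followed by the transformation.
    """
    path = os.path.join(root, "cache", f"{file_hash}.pkl")

    # Reuse the cached result unless a rebuild was requested.
    if cache and not redo_cache and os.path.isfile(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    data = build()

    # Assumption: the cache file is only written when `cache` is enabled.
    if cache:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(data, f)
    return data
```

Because the file name depends on both hashes, changing either the dataset or the transformation produces a different cache file instead of silently reusing stale results.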
Single modality datasets
- class dance.datasets.singlemodality.CellTypeAnnotationDataset(full_download=False, train_dataset=None, test_dataset=None, species=None, tissue=None, train_dir='train', test_dir='test', map_path='map', data_dir='./')[source]
- class dance.datasets.singlemodality.ClusteringDataset(data_dir='./data', dataset='mouse_bladder_cell')[source]
Data downloading and loading for clustering.
- Parameters:
data_dir (str) – Path to store datasets.
dataset (str) – Choice of dataset. Available options are ‘10X_PBMC’, ‘mouse_bladder_cell’, ‘mouse_ES_cell’, ‘worm_neuron_cell’.
Multi modality datasets
- class dance.datasets.multimodality.JointEmbeddingNIPSDataset(subtask, root='./data', preprocess=None, normalize=False, pretrained_folder='.', selection_threshold=10000, span=0.3)[source]
- class dance.datasets.multimodality.ModalityMatchingDataset(subtask, root='./data', preprocess=None, pkl_path=None, span=0.3)[source]
Spatial datasets
- class dance.datasets.spatial.CellTypeDeconvoDataset(data_dir='data/spatial', data_id='GSE174746', subset_common_celltypes=True)[source]
Load raw data.
- Parameters:
subset_common_celltypes (bool) – If set to True, then subset both the reference and the real data to contain only cell types that are present in both reference and real.
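The effect of subset_common_celltypes can be illustrated with plain Python containers. This is only a sketch of the idea, not the library's code: the function name, the input layout (a mapping from reference cell ID to cell type, and a mapping from cell type to per-spot values in the real data), and the sample values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def subset_to_common_celltypes(
    ref_labels: Dict[str, str],
    real_values: Dict[str, List[float]],
) -> Tuple[Dict[str, str], Dict[str, List[float]], List[str]]:
    """Keep only cell types that appear in both the reference and the real data."""
    # Cell types present on both sides.
    common = set(ref_labels.values()) & set(real_values)

    # Drop reference cells whose type is missing from the real data.
    ref_subset = {cell: ct for cell, ct in ref_labels.items() if ct in common}
    # Drop real-data cell types that are missing from the reference.
    real_subset = {ct: vals for ct, vals in real_values.items() if ct in common}

    return ref_subset, real_subset, sorted(common)


# Hypothetical inputs: "NK" exists only in the reference, "Mono" only in the real data.
ref = {"cell1": "B", "cell2": "T", "cell3": "NK"}
real = {"B": [0.4, 0.6], "T": [0.6, 0.4], "Mono": [0.0, 0.0]}
ref_subset, real_subset, common = subset_to_common_celltypes(ref, real)
```

After the call, both outputs refer to the same set of cell types, so reference profiles and real-data values can be compared type by type.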