dance.datasets

class dance.datasets.base.BaseDataset(root, full_download=False)[source]

Abstract base class for datasets.

Parameters:
  • root (str) – Root directory of the dataset.

  • full_download (bool) – If set to True, then attempt to download all raw files of the dataset.

abstract download()[source]

Download selected files of the dataset.

download_all()[source]

Download all raw files of the dataset.

classmethod get_available_data()[source]

List available data of the dataset.

Return type:

List[Union[str, Dict[str, str]]]

hexdigest()[source]

Return the MD5 hash computed from the string-valued items in __dict__.

Return type:

str
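As a rough illustration of hashing string-valued attributes, the sketch below uses a hypothetical `ToyDataset` stand-in. The attribute ordering and the `key=value;` serialization are assumptions made for the example and may differ from what dance actually does.

```python
import hashlib


class ToyDataset:
    """Hypothetical stand-in for illustration; not the dance implementation."""

    def __init__(self, root, full_download=False):
        self.root = root                    # str: contributes to the hash
        self.full_download = full_download  # bool: skipped by this sketch

    def hexdigest(self):
        # Keep only string-valued attributes; sort by name so the result
        # does not depend on attribute insertion order (an assumption).
        items = sorted(
            (key, val) for key, val in self.__dict__.items() if isinstance(val, str)
        )
        payload = "".join(f"{key}={val};" for key, val in items)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

In this sketch, two instances with the same `root` produce the same digest even when `full_download` differs, because non-string attributes are ignored.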

abstract is_complete()[source]

Return True if the selected files have been downloaded.

Return type:

bool

is_complete_all()[source]

Return True if all raw files of the dataset have been downloaded.

Return type:

bool

load_data(transform=None, cache=False, redo_cache=False)[source]

Load dance data object and perform transformation.

If the cache option is set, try to load the processed data from the cache. The cache file hash is meant to distinguish different datasets and different transformations. It is constructed by MD5 hashing the concatenation of the dataset MD5 hash (see the dataset's hexdigest()) and the transformation MD5 hash (see the transformation's hexdigest()). When no transformation is given, i.e., transform=None, the empty string "" is used in place of the transformation MD5 hash.

Parameters:
  • transform (Optional[BaseTransform]) – Transformation to be applied.

  • cache (bool) – If set to True, then try to read the cache from, and write it to, <root>/cache/<hash>.pkl.

  • redo_cache (bool) – If set to True, then redo the data loading and transformation, and overwrite the previous cache with the newly processed data.

Return type:

Data
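The cache file naming described for load_data() can be sketched with the standard library. The helper names `md5_hex` and `cache_path` are hypothetical, and the exact string encoding is an assumption; only the "MD5 of the concatenated hashes" rule and the `<root>/cache/<hash>.pkl` layout come from the description above.

```python
import hashlib
from pathlib import Path


def md5_hex(text):
    """Return the MD5 hex digest of a string (UTF-8 encoding assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def cache_path(root, dataset_hash, transform_hash=""):
    """Build <root>/cache/<hash>.pkl from the two component hashes.

    transform_hash defaults to "" to mirror the transform=None case.
    """
    combined = md5_hex(dataset_hash + transform_hash)
    return Path(root) / "cache" / f"{combined}.pkl"
```

Because the transformation hash is part of the key, applying a different transformation to the same dataset yields a different cache file instead of overwriting the previous one.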

load_raw_data()[source]

Download data if necessary and return data in raw format.

Return type:

Any
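The following sketch shows how the two abstract methods might fit together with the "download if necessary" behavior of load_raw_data(). It uses a stand-in base class, `SketchBaseDataset`, that only mirrors the documented interface; the `_read` hook, the `ToyTextDataset` subclass, and the file name are hypothetical and not part of dance.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseDataset(ABC):
    """Stand-in mirroring the documented interface; not dance's code."""

    def __init__(self, root, full_download=False):
        self.root = Path(root)
        self.full_download = full_download

    @abstractmethod
    def download(self):
        """Download selected files of the dataset."""

    @abstractmethod
    def is_complete(self):
        """Return True if the selected files have been downloaded."""

    def load_raw_data(self):
        # Download only when the selected files are missing.
        if not self.is_complete():
            self.download()
        return self._read()

    def _read(self):  # hypothetical hook, assumed for this sketch
        raise NotImplementedError


class ToyTextDataset(SketchBaseDataset):
    FILENAME = "toy.txt"  # hypothetical file name

    def download(self):
        # Writes a local file instead of fetching anything over the network.
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / self.FILENAME).write_text("a\nb\n")

    def is_complete(self):
        return (self.root / self.FILENAME).exists()

    def _read(self):
        return (self.root / self.FILENAME).read_text().split()


# Usage: a temporary directory stands in for the dataset root.
with tempfile.TemporaryDirectory() as tmp:
    ds = ToyTextDataset(tmp)
    before = ds.is_complete()  # nothing has been written yet
    raw = ds.load_raw_data()   # calls download(), then reads the file
    after = ds.is_complete()
```

A subclass only needs to say what "selected files" means for it; the base class decides when a download is required.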

Single modality datasets

class dance.datasets.singlemodality.CellTypeAnnotationDataset(full_download=False, train_dataset=None, test_dataset=None, species=None, tissue=None, train_dir='train', test_dir='test', map_path='map', data_dir='./')[source]

download(download_map=True)[source]

Download selected files of the dataset.

download_all()[source]

Download all raw files of the dataset.

static get_map_dict(map_file_path, tissue)[source]

Load cell-type mappings.

Parameters:
  • map_file_path (str) – Path to the mapping file.

  • tissue (str) – Tissue of interest.

Return type:

Dict[str, Set[str]]

Notes

Merge mapping across all test sets for the required tissue.
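To make the Dict[str, Set[str]] return shape concrete, here is a minimal sketch of merging mappings for one tissue. It operates on in-memory rows rather than a mapping file; the `(tissue, test_label, train_label)` row layout and the function name `merge_celltype_map` are assumptions for illustration, since the mapping file format is not described here.

```python
from collections import defaultdict


def merge_celltype_map(rows, tissue):
    """Merge (tissue, test_label, train_label) rows into label -> set of labels.

    The row layout is hypothetical; only rows for the requested tissue are kept.
    """
    merged = defaultdict(set)
    for row_tissue, test_label, train_label in rows:
        if row_tissue == tissue:
            merged[test_label].add(train_label)
    return dict(merged)


rows = [
    ("Brain", "A", "a1"),
    ("Brain", "A", "a2"),   # same key from another test set: sets are merged
    ("Spleen", "A", "x"),   # different tissue: ignored
    ("Brain", "B", "b1"),
]
brain_map = merge_celltype_map(rows, "Brain")
```

Using sets as values means a label that appears in several test sets is recorded once.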

is_complete()[source]

Check if benchmarking data is complete.

is_complete_all()[source]

Check if data is complete.

class dance.datasets.singlemodality.ClusteringDataset(data_dir='./data', dataset='mouse_bladder_cell')[source]

Data downloading and loading for clustering.

Parameters:
  • data_dir (str) – Path to store datasets.

  • dataset (str) – Choice of dataset. Available options are ‘10X_PBMC’, ‘mouse_bladder_cell’, ‘mouse_ES_cell’, ‘worm_neuron_cell’.

download()[source]

Download selected files of the dataset.

is_complete()[source]

Return True if the selected files have been downloaded.

class dance.datasets.singlemodality.ImputationDataset(data_dir='data', dataset='human_stemcell', train_size=0.1)[source]

download()[source]

Download selected files of the dataset.

is_complete()[source]

Return True if the selected files have been downloaded.

Multi modality datasets

class dance.datasets.multimodality.JointEmbeddingNIPSDataset(subtask, root='./data', preprocess=None, normalize=False, pretrained_folder='.', selection_threshold=10000, span=0.3)[source]

class dance.datasets.multimodality.ModalityMatchingDataset(subtask, root='./data', preprocess=None, pkl_path=None, span=0.3)[source]

class dance.datasets.multimodality.ModalityPredictionDataset(subtask, root='./data', preprocess=None, span=0.3)[source]

class dance.datasets.multimodality.MultiModalityDataset(subtask, root='./data')[source]

download()[source]

Download selected files of the dataset.

is_complete()[source]

Return True if the selected files have been downloaded.

Return type:

bool

Spatial datasets

class dance.datasets.spatial.CellTypeDeconvoDataset(data_dir='data/spatial', data_id='GSE174746', subset_common_celltypes=True)[source]

Load raw data.

Parameters:
  • subset_common_celltypes (bool) – If set to True, then subset both the reference and the real data to contain only cell types that are present in both the reference and the real data.
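The effect of subset_common_celltypes can be illustrated with plain dictionaries keyed by cell type. This is only a sketch of the intersection logic; the function name `subset_to_common` and the dictionary layout are hypothetical and say nothing about how the dataset stores its reference and real data internally.

```python
def subset_to_common(reference, real):
    """Keep only the cell types present in both inputs.

    Both arguments map cell type -> values (hypothetical layout).
    """
    common = sorted(set(reference) & set(real))
    ref_subset = {cell_type: reference[cell_type] for cell_type in common}
    real_subset = {cell_type: real[cell_type] for cell_type in common}
    return ref_subset, real_subset


reference = {"B cell": [5, 3], "T cell": [2, 7], "NK cell": [1, 1]}
real = {"T cell": [0.6, 0.4], "B cell": [0.4, 0.5], "Neuron": [0.0, 0.1]}
ref_subset, real_subset = subset_to_common(reference, real)
```

After subsetting, both outputs share the same cell types in the same order, which keeps the two sides comparable.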

download()[source]

Download selected files of the dataset.

is_complete()[source]

Return True if the selected files have been downloaded.

class dance.datasets.spatial.SpatialLIBDDataset(root='.', full_download=False, data_id='151673', data_dir='data/spatial')[source]

download()[source]

Download selected files of the dataset.

download_all()[source]

Download all raw files of the dataset.

is_complete()[source]

Return True if the selected files have been downloaded.

is_complete_all()[source]

Return True if all raw files of the dataset have been downloaded.