dance.data
- class dance.data.base.BaseData(data, train_size=None, val_size=0, test_size=-1, split_index_range_dict=None, full_split_name=None)[source]
Base data object.
The dance data object is a wrapper of the AnnData object, with several utility methods that help retrieve data from specific splits in specific formats (see get_split_idx() and get_feature()). The AnnData object is saved in the attribute data and can be accessed directly.
Warning
Since the underlying data object is a reference to the input AnnData object, take extra care *NOT* to initialize two different dance data objects from the same AnnData object! If you are unsure, we recommend always initializing the dance data object with a copy of the input AnnData object, e.g.,
>>> adata = anndata.AnnData(...)
>>> ddata = dance.data.Data(adata.copy())
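The hazard in the warning above is ordinary Python reference sharing. The sketch below illustrates it with a hypothetical stand-in wrapper (ToyWrapper, which is not part of dance) around a plain dict. It does not use the real AnnData or dance classes, so treat it as an illustration of the mechanism only.

```python
import copy


class ToyWrapper:
    """Hypothetical stand-in for a wrapper that keeps a reference, not a copy."""

    def __init__(self, data):
        self.data = data  # stores a reference to the caller's object


shared = {"uns": {}}  # plays the role of the input AnnData object

first = ToyWrapper(shared)
second = ToyWrapper(shared)

# A change made through one wrapper is visible through the other.
first.data["uns"]["note"] = "set via first"
leaked = second.data["uns"].get("note")

# Wrapping a copy keeps the new wrapper independent of the original object.
isolated = ToyWrapper(copy.deepcopy(shared))
isolated.data["uns"]["note"] = "set via isolated"
unchanged = shared["uns"]["note"]
```

Here leaked holds the value written through first, while the write through isolated leaves shared untouched.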
Note
You can directly access some main properties of AnnData (or MuData, depending on which type of data you passed in), such as X, obs, and var.
- Parameters:
train_size (Optional[int]) – Number of cells to be used for training. If not specified, no splits will be generated.
val_size (int) – Number of cells to be used for validation. If set to -1, use what’s left from training and testing.
test_size (int) – Number of cells to be used for testing. If set to -1, use what’s left from training and validation.
split_index_range_dict (Dict[str, Tuple[int, int]] | None) –
full_split_name (str | None) –
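To make the -1 convention for the size parameters concrete, here is a minimal sketch of how the three sizes could be resolved into index ranges. The helper name resolve_split_ranges, the contiguous train/val/test ordering, and the rule that at most one size may be -1 are assumptions made for illustration; they are not taken from the dance implementation.

```python
from typing import Dict, Optional, Tuple


def resolve_split_ranges(
    n_cells: int,
    train_size: Optional[int] = None,
    val_size: int = 0,
    test_size: int = -1,
) -> Optional[Dict[str, Tuple[int, int]]]:
    """Hypothetical helper: turn split sizes into (start, end) index ranges."""
    if train_size is None:
        return None  # no splits are generated

    sizes = {"train": train_size, "val": val_size, "test": test_size}
    open_ended = [name for name, size in sizes.items() if size == -1]
    if len(open_ended) > 1:
        raise ValueError("at most one split size may be -1 in this sketch")

    fixed_total = sum(size for size in sizes.values() if size != -1)
    if fixed_total > n_cells:
        raise ValueError("split sizes exceed the number of cells")

    # The split marked -1 takes whatever the other splits leave over.
    if open_ended:
        sizes[open_ended[0]] = n_cells - fixed_total

    ranges = {}
    start = 0
    for name in ("train", "val", "test"):  # assumed ordering
        end = start + sizes[name]
        ranges[name] = (start, end)
        start = end
    return ranges


default_ranges = resolve_split_ranges(100, train_size=60)
val_fills_rest = resolve_split_ranges(100, train_size=60, val_size=-1, test_size=20)
```

With the documented defaults (val_size=0, test_size=-1), every cell that is not used for training ends up in the test range.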
- append(data, *, mode='merge', rename_dict=None, new_split_name=None, label_batch=False, **concat_kwargs)[source]
Append another dance data object to the current data object.
- Parameters:
data – New dance data object to be added.
mode (Optional[Literal['merge','rename','new_split']]) – How to combine the splits from the new data and the current data. (1) "merge": merge the splits from both data objects, e.g., the training indexes from both are used as the training indexes in the combined data. (2) "rename": rename the splits of the new data and add them to the current split index dictionary, e.g., renaming ‘train’ to ‘ref’. Requires passing rename_dict. Raises an error if a newly renamed key is already used in the current split index dictionary. (3) "new_split": assign the whole new data to a new split. Requires passing a new_split_name that is not already used as a split name in the current data. (4) None: do not specify split indexes for the newly added data.
rename_dict (Optional[Dict[str,str]]) – Optional argument that is only used when mode="rename". A dictionary mapping the split names in the new data to other names.
new_split_name (Optional[str]) – Optional argument that is only used when mode="new_split". Name of the split to assign to the new data.
label_batch (bool) – Add a “batch” column to .obs when set to True.
**concat_kwargs – See anndata.concat().
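The four mode options are easier to compare side by side. The sketch below applies them to plain dictionaries of split indexes. The function combine_splits and its n_current and n_new arguments are hypothetical. It also assumes that appended cells are indexed after the current ones, that splits missing from rename_dict keep their names, and that ValueError is the error type; none of this is confirmed by the text above.

```python
def combine_splits(current, new, n_current, n_new, mode="merge",
                   rename_dict=None, new_split_name=None):
    """Hypothetical helper mirroring the four documented modes."""
    combined = {name: list(idx) for name, idx in current.items()}
    # Assumption: cells of the new data are placed after the current cells.
    shifted = {name: [i + n_current for i in idx] for name, idx in new.items()}

    if mode == "merge":
        for name, idx in shifted.items():
            combined.setdefault(name, []).extend(idx)
    elif mode == "rename":
        if rename_dict is None:
            raise ValueError("mode='rename' requires rename_dict")
        for name, idx in shifted.items():
            new_name = rename_dict.get(name, name)  # assumption: unmapped names are kept
            if new_name in combined:
                raise ValueError(f"split name already in use: {new_name!r}")
            combined[new_name] = idx
    elif mode == "new_split":
        if new_split_name is None or new_split_name in combined:
            raise ValueError("mode='new_split' requires an unused new_split_name")
        combined[new_split_name] = list(range(n_current, n_current + n_new))
    elif mode is not None:
        raise ValueError(f"unknown mode: {mode!r}")
    # mode=None: the new cells are left without any split assignment.
    return combined


current = {"train": [0, 1], "test": [2]}
new = {"train": [0], "test": [1]}

merged = combine_splits(current, new, 3, 2, mode="merge")
renamed = combine_splits(current, new, 3, 2, mode="rename",
                         rename_dict={"train": "ref", "test": "query"})
as_new_split = combine_splits(current, new, 3, 2, mode="new_split",
                              new_split_name="ref")
unassigned = combine_splits(current, new, 3, 2, mode=None)
```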
- property config: Dict[str, Any]
Return the dance data object configuration dict.
Notes
The configuration dictionary is saved in the data attribute, which is an AnnData object. In particular, the config is saved in the .uns attribute under the key "dance_config".
- get_feature(*, split_name=None, return_type='numpy', channel=None, channel_type='obsm', mod=None)[source]
Retrieve features from data.
- Parameters:
split_name (Optional[str]) – Name of the split to retrieve. If not set, return all.
return_type (Literal['anndata','default','numpy','torch','sparse']) – How the features should be returned. sparse: return as a sparse matrix; numpy: return as a numpy array; torch: return as a torch tensor; anndata: return as an anndata object.
channel (Optional[str]) – Return a particular channel as features. If channel_type is X or raw_X, then return the .X or the .raw.X attribute from the AnnData directly. If channel_type is obs, return the column named by channel, and similarly for var. Finally, if channel_type is obsm, obsp, varm, varp, layers, or uns, then return the value corresponding to channel in that dictionary.
channel_type (Optional[str]) – Channel type to use, defaults to obsm (will be changed to X in the near future).
mod (Optional[str]) – Modality to use, defaults to None. Options other than None are only available when the underlying data object is MuData.
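The channel and channel_type rules amount to a small dispatch. The sketch below reproduces that dispatch on a stand-in object built from SimpleNamespace and plain dicts. The function select_channel and the toy field names are hypothetical, and the sketch does not model return_type conversion, split selection, or the case where channel is None.

```python
from types import SimpleNamespace

DICT_LIKE_TYPES = ("obsm", "obsp", "varm", "varp", "layers", "uns")


def select_channel(data, channel=None, channel_type="obsm"):
    """Hypothetical dispatch following the documented channel_type rules."""
    if channel_type == "X":
        return data.X
    if channel_type == "raw_X":
        return data.raw.X
    if channel_type in ("obs", "var"):
        # In AnnData these are tables; here a dict of columns stands in.
        return getattr(data, channel_type)[channel]
    if channel_type in DICT_LIKE_TYPES:
        return getattr(data, channel_type)[channel]
    raise ValueError(f"unknown channel_type: {channel_type!r}")


# Stand-in for an AnnData-like object (not a real AnnData).
toy = SimpleNamespace(
    X=[[1, 0], [0, 2]],
    raw=SimpleNamespace(X=[[5, 0], [0, 7]]),
    obs={"cell_type": ["B", "T"]},
    obsm={"X_pca": [[0.1], [0.2]]},
)

expression = select_channel(toy, channel_type="X")
raw_expression = select_channel(toy, channel_type="raw_X")
labels = select_channel(toy, channel="cell_type", channel_type="obs")
embedding = select_channel(toy, channel="X_pca")  # channel_type defaults to "obsm"
```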
- get_split_idx(split_name, error_on_miss=False)[source]
Obtain cell indices for a particular split.
- Parameters:
split_name (str) – Name of the split to retrieve.
error_on_miss (bool) – If set to True, raise KeyError if the queried split does not exist; otherwise return None.
See also
- get_split_mask(split_name, return_type='numpy')[source]
Obtain mask representation of a particular split.
- Parameters:
split_name (str) – Name of the split to retrieve.
return_type (Literal['anndata','default','numpy','torch','sparse']) – Return a numpy array if set to ‘numpy’, or a torch Tensor if set to ‘torch’.
- Return type:
Union[ndarray,Tensor]
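The index view and the mask view of a split carry the same information. The sketch below shows the relationship with plain Python lists. Both helpers (lookup_split and indices_to_mask) are hypothetical names. Only the error_on_miss behavior is taken from the description of get_split_idx(); the mask layout is an assumption.

```python
def lookup_split(split_idx_dict, split_name, error_on_miss=False):
    """Hypothetical lookup mirroring the documented error_on_miss behavior."""
    if split_name not in split_idx_dict:
        if error_on_miss:
            raise KeyError(split_name)
        return None
    return split_idx_dict[split_name]


def indices_to_mask(indices, n_cells):
    """Hypothetical conversion from cell indices to a boolean mask."""
    mask = [False] * n_cells
    for i in indices:
        mask[i] = True
    return mask


splits = {"train": [0, 1, 3], "test": [2, 4]}

train_mask = indices_to_mask(lookup_split(splits, "train"), n_cells=5)
missing = lookup_split(splits, "val")  # absent split, error_on_miss=False

try:
    lookup_split(splits, "val", error_on_miss=True)
    raised_key_error = False
except KeyError:
    raised_key_error = True
```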
- set_config(*, overwrite=False, **kwargs)[source]
Set dance data object configuration.
See set_config_from_dict().
- Parameters:
overwrite (bool) –
- set_config_from_dict(config_dict, *, overwrite=False)[source]
Set dance data object configuration from a config dict.
- Parameters:
config_dict (Dict[str,Any]) – Configuration dictionary.
overwrite (bool) – Used to determine the behaviour of resolving config conflicts. In the case of a conflict, where the config dict passed contains a key with a value that differs from an existing setting, if overwrite is set to False, then raise a KeyError. Otherwise, overwrite the configuration with the new values.
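The conflict rule for overwrite can be sketched on plain dictionaries. The function merge_config and the config keys used below are hypothetical examples. Only the rule itself (raise KeyError on a differing value unless overwrite is True) comes from the description above.

```python
def merge_config(existing, config_dict, overwrite=False):
    """Hypothetical merge following the documented conflict rule."""
    merged = dict(existing)
    for key, value in config_dict.items():
        conflict = key in merged and merged[key] != value
        if conflict and not overwrite:
            raise KeyError(f"conflicting value for config key {key!r}")
        merged[key] = value
    return merged


existing = {"label_channel": "cell_type"}  # example key, chosen for illustration

# Same value for an existing key, plus a new key: no conflict.
extended = merge_config(existing, {"label_channel": "cell_type",
                                   "feature_channel": "X_pca"})

# Differing value with overwrite=False: rejected.
try:
    merge_config(existing, {"label_channel": "batch"})
    conflict_rejected = False
except KeyError:
    conflict_rejected = True

# Differing value with overwrite=True: the new value wins.
replaced = merge_config(existing, {"label_channel": "batch"}, overwrite=True)
```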
- class dance.data.Data(data, train_size=None, val_size=0, test_size=-1, split_index_range_dict=None, full_split_name=None)[source]
- Parameters:
- get_data(split_name=None, return_type='numpy', x_kwargs={}, y_kwargs={})[source]
Retrieve cell features and labels from a particular split.
- Parameters:
split_name (Optional[str]) – Name of the split to retrieve. If not set, return all.
return_type (Literal['anndata','default','numpy','torch','sparse']) – How the features should be returned. numpy: return as a numpy array; torch: return as a torch tensor; anndata: return as an anndata object.
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
- Return type:
Tuple[Any,Any]
- get_test_data(return_type='numpy', x_kwargs={}, y_kwargs={})[source]
Retrieve cell features and labels from the ‘test’ split.
- Return type:
Tuple[Any,Any]
- Parameters:
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) –
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
- get_train_data(return_type='numpy', x_kwargs={}, y_kwargs={})[source]
Retrieve cell features and labels from the ‘train’ split.
- Return type:
Tuple[Any,Any]
- Parameters:
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) –
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
- get_val_data(return_type='numpy', x_kwargs={}, y_kwargs={})[source]
Retrieve cell features and labels from the ‘val’ split.
- Return type:
Tuple[Any,Any]
- Parameters:
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) –
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
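The three split-specific getters read like shorthand for get_data() with a fixed split_name. The sketch below shows that relationship on a hypothetical stand-in class (ToySplitData) backed by plain lists. Whether the real methods delegate this way is an assumption, and return_type, x_kwargs, and y_kwargs are left out.

```python
class ToySplitData:
    """Hypothetical stand-in; not the dance.data.Data implementation."""

    def __init__(self, features, labels, split_idx):
        self.features = features
        self.labels = labels
        self.split_idx = split_idx  # e.g. {"train": [0, 1], "val": [2], "test": [3]}

    def get_data(self, split_name=None):
        # With no split name, return every cell.
        if split_name is None:
            idx = range(len(self.features))
        else:
            idx = self.split_idx[split_name]
        x = [self.features[i] for i in idx]
        y = [self.labels[i] for i in idx]
        return x, y

    # Assumed convenience wrappers around get_data().
    def get_train_data(self):
        return self.get_data("train")

    def get_val_data(self):
        return self.get_data("val")

    def get_test_data(self):
        return self.get_data("test")


toy = ToySplitData(
    features=[[1.0], [2.0], [3.0], [4.0]],
    labels=["a", "b", "a", "b"],
    split_idx={"train": [0, 1], "val": [2], "test": [3]},
)

x_train, y_train = toy.get_train_data()
x_test, y_test = toy.get_test_data()
x_all, y_all = toy.get_data()
```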