dance.data
- class dance.data.base.BaseData(data, train_size=None, val_size=0, test_size=-1, split_index_range_dict=None, full_split_name=None)
Base data object.
The dance data object is a wrapper around the AnnData object, with several utility methods that help retrieve data from specific splits in specific formats (see get_split_idx() and get_feature()). The AnnData object is saved in the attribute data and can be accessed directly.
Warning
Since the underlying data object is a reference to the input AnnData object, be extra cautious NOT to initialize two different dance data objects using the same AnnData object! If you are unsure, we recommend always initializing the dance data object with a copy of the input AnnData object, e.g.,
>>> adata = anndata.AnnData(...)
>>> ddata = dance.data.Data(adata.copy())
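The warning comes down to ordinary Python reference semantics. The sketch below uses a hypothetical Wrapper class and a plain dict standing in for an AnnData object (both are illustrative assumptions, not dance API): two wrappers built from the same object see each other's changes, while wrappers built from separate copies stay independent. For a real AnnData object, adata.copy() plays the role of the copy step here.

```python
import copy
from typing import Any, Dict


class Wrapper:
    """Hypothetical stand-in for a dance data object: it keeps a reference."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data  # no copy is made; this is the caller's object


shared = {"uns": {}}
a = Wrapper(shared)
b = Wrapper(shared)
a.data["uns"]["note"] = "set through a"
print(b.data["uns"])  # b sees the change made through a

original = {"uns": {}}
c = Wrapper(copy.deepcopy(original))  # each wrapper gets its own copy
d = Wrapper(copy.deepcopy(original))
c.data["uns"]["note"] = "set through c"
print(d.data["uns"])  # d is unaffected
```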
Note
You can directly access some of the main properties of AnnData (or MuData, depending on which type of data you passed in), such as X, obs, and var.
- Parameters:
train_size (Optional[int]) – Number of cells to be used for training. If not specified, no splits will be generated.
val_size (int) – Number of cells to be used for validation. If set to -1, use what’s left after training and testing.
test_size (int) – Number of cells to be used for testing. If set to -1, use what’s left after training and validation.
split_index_range_dict (Dict[str, Tuple[int, int]] | None) –
full_split_name (str | None) –
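The -1 convention for val_size and test_size can be made concrete with a small helper. This is a hypothetical sketch, not dance's implementation: it assumes the splits are laid out as consecutive [start, end) index ranges in train, val, test order (roughly the shape suggested by split_index_range_dict), and that val_size and test_size are not both -1. How dance actually assigns cells to splits is not stated here.

```python
from typing import Dict, Optional, Tuple


def resolve_split_ranges(
    n_cells: int,
    train_size: Optional[int] = None,
    val_size: int = 0,
    test_size: int = -1,
) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: map split sizes to [start, end) index ranges.

    Assumes consecutive train/val/test ranges and at most one -1 size.
    """
    if train_size is None:
        return {}  # documented: no train_size means no splits
    if val_size == -1 and test_size == -1:
        raise ValueError("val_size and test_size cannot both be -1")
    if val_size == -1:
        val_size = n_cells - train_size - test_size  # leftover goes to val
    elif test_size == -1:
        test_size = n_cells - train_size - val_size  # leftover goes to test
    sizes = (train_size, val_size, test_size)
    if min(sizes) < 0 or sum(sizes) > n_cells:
        raise ValueError("split sizes do not fit the number of cells")
    bounds = [0, train_size, train_size + val_size, sum(sizes)]
    return {
        name: (bounds[i], bounds[i + 1])
        for i, name in enumerate(("train", "val", "test"))
    }


print(resolve_split_ranges(100, train_size=70, val_size=10))
# {'train': (0, 70), 'val': (70, 80), 'test': (80, 100)}
```

With the default val_size=0 and test_size=-1, every cell not used for training ends up in the test range.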
- append(data, *, mode='merge', rename_dict=None, new_split_name=None, label_batch=False, **concat_kwargs)
Append another dance data object to the current data object.
- Parameters:
data – New dance data object to be added.
mode (Optional[Literal['merge', 'rename', 'new_split']]) – How to combine the splits from the new data and the current data. (1) "merge": merge the splits from both data objects, e.g., the training indexes from both are used as the training indexes in the new combined data. (2) "rename": rename the splits of the new data and add them to the current split index dictionary, e.g., renaming ‘train’ to ‘ref’. Requires passing rename_dict. Raises an error if a newly renamed key is already used in the current split index dictionary. (3) "new_split": assign the whole new data to a new split. Requires passing a new_split_name that is not already used as a split name in the current data. (4) None: do not assign split indexes to the newly added data.
rename_dict (Optional[Dict[str, str]]) – Optional argument that is only used when mode="rename". A dictionary mapping the split names in the new data to other names.
new_split_name (Optional[str]) – Optional argument that is only used when mode="new_split". Name of the split to assign to the new data.
label_batch (bool) – Add a “batch” column to .obs when set to True.
**concat_kwargs – See anndata.concat().
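The four mode options can be illustrated on plain split index dictionaries. The sketch below is hypothetical (combine_splits is not a dance function) and assumes the new cells are concatenated after the current ones, so every index from the new data is shifted by the number of current cells. It also assumes that, in "rename" mode, splits missing from rename_dict keep their original name.

```python
from typing import Dict, List, Optional


def combine_splits(
    current: Dict[str, List[int]],
    new: Dict[str, List[int]],
    n_current: int,
    n_new: int,
    mode: Optional[str] = "merge",
    rename_dict: Optional[Dict[str, str]] = None,
    new_split_name: Optional[str] = None,
) -> Dict[str, List[int]]:
    """Hypothetical sketch of the append modes applied to split index dicts."""
    combined = {name: list(idx) for name, idx in current.items()}
    # New cells come after the current ones, so shift their indexes.
    shifted = {name: [i + n_current for i in idx] for name, idx in new.items()}

    if mode == "merge":
        for name, idx in shifted.items():
            combined.setdefault(name, []).extend(idx)
    elif mode == "rename":
        if rename_dict is None:
            raise ValueError("mode='rename' requires rename_dict")
        for old_name, idx in shifted.items():
            new_name = rename_dict.get(old_name, old_name)  # assumption: unmapped names kept
            if new_name in combined:
                raise KeyError(f"split name {new_name!r} is already used")
            combined[new_name] = idx
    elif mode == "new_split":
        if new_split_name is None or new_split_name in combined:
            raise KeyError("new_split_name must be given and not already used")
        combined[new_split_name] = list(range(n_current, n_current + n_new))
    elif mode is not None:
        raise ValueError(f"unknown mode {mode!r}")
    # mode=None: the new cells are added but assigned to no split.
    return combined


cur = {"train": [0, 1, 2], "test": [3]}
new = {"train": [0, 1]}
print(combine_splits(cur, new, 4, 2, mode="merge"))
print(combine_splits(cur, new, 4, 2, mode="rename", rename_dict={"train": "ref"}))
```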
- property config: Dict[str, Any]
Return the dance data object configuration dict.
Notes
The configuration dictionary is saved in the data attribute, which is an AnnData object. In particular, the config is saved in the .uns attribute under the key "dance_config".
- get_feature(*, split_name=None, return_type='numpy', channel=None, channel_type='obsm', mod=None)
Retrieve features from data.
- Parameters:
split_name (Optional[str]) – Name of the split to retrieve. If not set, return all.
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) – How the features should be returned. sparse: return as a sparse matrix; numpy: return as a numpy array; torch: return as a torch tensor; anndata: return as an anndata object.
channel (Optional[str]) – Return a particular channel as features. If channel_type is X or raw_X, return the .X or the .raw.X attribute of the AnnData object directly. If channel_type is obs, return the column named by channel, and similarly for var. Finally, if channel_type is obsm, obsp, varm, varp, layers, or uns, return the value corresponding to channel in that dictionary.
channel_type (Optional[str]) – Channel type to use; defaults to obsm (will be changed to X in the near future).
mod (Optional[str]) – Modality to use; defaults to None. Options other than None are only available when the underlying data object is MuData.
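The channel / channel_type lookup rules above amount to a small dispatch. The sketch below is a hypothetical reading of those rules, not dance's code: it uses a SimpleNamespace with plain lists and dicts in place of an AnnData object, ignores mod and return_type, and assumes a channel name is required for every channel_type other than X and raw_X.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Slots looked up by name: obs/var by column, the rest as dictionary entries.
_NAMED_SLOTS = {"obs", "var", "obsm", "obsp", "varm", "varp", "layers", "uns"}


def resolve_channel(adata: Any, channel: Optional[str] = None, channel_type: str = "obsm") -> Any:
    """Hypothetical sketch of how channel/channel_type pick a feature source."""
    if channel_type == "X":
        return adata.X  # the main matrix; channel is not needed
    if channel_type == "raw_X":
        return adata.raw.X  # the matrix stored under .raw
    if channel_type in _NAMED_SLOTS:
        if channel is None:
            raise ValueError(f"channel_type={channel_type!r} needs a channel name")
        return getattr(adata, channel_type)[channel]
    raise ValueError(f"unsupported channel_type {channel_type!r}")


# A toy object standing in for AnnData.
toy = SimpleNamespace(
    X=[[1, 2], [3, 4]],
    raw=SimpleNamespace(X=[[10, 20], [30, 40]]),
    obs={"cell_type": ["A", "B"]},
    var={"gene_name": ["g1", "g2"]},
    obsm={"pca": [[0.1], [0.2]]},
    obsp={}, varm={}, varp={},
    layers={"counts": [[1, 2], [3, 4]]},
    uns={},
)
print(resolve_channel(toy, "pca"))  # default channel_type is obsm
print(resolve_channel(toy, "cell_type", channel_type="obs"))
```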
- get_split_idx(split_name, error_on_miss=False)
Obtain cell indices for a particular split.
- Parameters:
split_name (str) – Name of the split to retrieve.
error_on_miss (bool) – If set to True, raise a KeyError if the queried split does not exist; otherwise return None.
See also
- get_split_mask(split_name, return_type='numpy')
Obtain the mask representation of a particular split.
- Parameters:
split_name (str) – Name of the split to retrieve.
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) – Return a numpy array if set to ‘numpy’, or a torch Tensor if set to ‘torch’.
- Return type:
Union[ndarray, Tensor]
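The relationship between a split's indices and its mask can be sketched with plain lists. The functions below are hypothetical stand-ins operating on a split dictionary, not the dance methods: the real get_split_mask returns an ndarray or Tensor and obtains the cell count from the data itself, and whether it raises on a missing split is an assumption made here.

```python
from typing import Dict, List, Optional


def get_split_idx(splits: Dict[str, List[int]], split_name: str, error_on_miss: bool = False) -> Optional[List[int]]:
    """Hypothetical sketch: look up a split, respecting error_on_miss."""
    if split_name in splits:
        return splits[split_name]
    if error_on_miss:
        raise KeyError(f"split {split_name!r} does not exist")
    return None


def get_split_mask(splits: Dict[str, List[int]], split_name: str, n_cells: int) -> List[bool]:
    """Hypothetical sketch: mark the split's cells with True, all others False."""
    idx = get_split_idx(splits, split_name, error_on_miss=True)  # assumption: strict lookup
    mask = [False] * n_cells
    for i in idx:
        mask[i] = True
    return mask


splits = {"train": [0, 2], "test": [1, 3]}
print(get_split_idx(splits, "val"))  # None: missing split and error_on_miss=False
print(get_split_mask(splits, "train", n_cells=4))  # [True, False, True, False]
```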
- set_config(*, overwrite=False, **kwargs)
Set dance data object configuration.
See set_config_from_dict().
- Parameters:
overwrite (bool) –
- set_config_from_dict(config_dict, *, overwrite=False)
Set dance data object configuration from a config dict.
- Parameters:
config_dict (Dict[str, Any]) – Configuration dictionary.
overwrite (bool) – Determines how config conflicts are resolved. A conflict occurs when the passed config dict contains a key whose value differs from the existing setting. If overwrite is set to False, a KeyError is raised; otherwise, the configuration is overwritten with the new values.
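The conflict rule can be sketched with a plain dict standing in for the .uns attribute, using the documented "dance_config" key. This is an illustration, not dance's code: the config key "feature_channel" is a hypothetical example, and the choice to reject the whole update (rather than apply the non-conflicting part) when a conflict is found is an assumption.

```python
from typing import Any, Dict

CONFIG_KEY = "dance_config"  # .uns key documented for the dance config


def set_config_from_dict(uns: Dict[str, Any], config_dict: Dict[str, Any], *, overwrite: bool = False) -> None:
    """Sketch of the documented conflict rule; a plain dict stands in for .uns."""
    config = uns.setdefault(CONFIG_KEY, {})
    if not overwrite:
        # A conflict is an existing key whose stored value differs from the new one.
        conflicts = sorted(k for k, v in config_dict.items() if k in config and config[k] != v)
        if conflicts:
            raise KeyError(f"conflicting config keys: {conflicts}")
    config.update(config_dict)


uns: Dict[str, Any] = {}
set_config_from_dict(uns, {"feature_channel": "pca"})  # hypothetical key
set_config_from_dict(uns, {"feature_channel": "pca"})  # same value: no conflict
try:
    set_config_from_dict(uns, {"feature_channel": "counts"})
except KeyError as err:
    print("rejected:", err)
set_config_from_dict(uns, {"feature_channel": "counts"}, overwrite=True)
print(uns[CONFIG_KEY])  # {'feature_channel': 'counts'}
```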
- class dance.data.Data(data, train_size=None, val_size=0, test_size=-1, split_index_range_dict=None, full_split_name=None)
- get_data(split_name=None, return_type='numpy', x_kwargs={}, y_kwargs={})
Retrieve cell features and labels from a particular split.
- Parameters:
split_name (Optional[str]) – Name of the split to retrieve. If not set, return all.
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) – How the features should be returned. numpy: return as a numpy array; torch: return as a torch tensor; anndata: return as an anndata object.
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
- Return type:
Tuple[Any, Any]
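The (features, labels) contract of get_data can be sketched with plain lists. The get_data function below is a hypothetical stand-in, not the dance method: it takes the features, labels, and split dictionary explicitly and does not model return_type, x_kwargs, or y_kwargs. The split-specific helpers (get_train_data and friends) are pictured here as get_data with split_name fixed, which is an assumption about how they relate.

```python
from functools import partial
from typing import Any, Dict, List, Optional, Sequence, Tuple


def get_data(
    features: Sequence[Any],
    labels: Sequence[Any],
    splits: Dict[str, List[int]],
    split_name: Optional[str] = None,
) -> Tuple[List[Any], List[Any]]:
    """Hypothetical sketch: return the (features, labels) pair for one split."""
    # No split requested: return every cell, as documented.
    idx = range(len(features)) if split_name is None else splits[split_name]
    x = [features[i] for i in idx]
    y = [labels[i] for i in idx]
    return x, y


# Assumption: the split-specific helpers behave like get_data with split_name fixed.
get_train_data = partial(get_data, split_name="train")

features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = ["T cell", "B cell", "T cell"]
splits = {"train": [0, 2], "test": [1]}
x_train, y_train = get_train_data(features, labels, splits)
print(y_train)  # ['T cell', 'T cell']
```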
- get_test_data(return_type='numpy', x_kwargs={}, y_kwargs={})
Retrieve cell features and labels from the ‘test’ split.
- Return type:
Tuple[Any, Any]
- Parameters:
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) –
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
- get_train_data(return_type='numpy', x_kwargs={}, y_kwargs={})
Retrieve cell features and labels from the ‘train’ split.
- Return type:
Tuple[Any, Any]
- Parameters:
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) –
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –
- get_val_data(return_type='numpy', x_kwargs={}, y_kwargs={})
Retrieve cell features and labels from the ‘val’ split.
- Return type:
Tuple[Any, Any]
- Parameters:
return_type (Literal['anndata', 'default', 'numpy', 'torch', 'sparse']) –
x_kwargs (Dict[str, Any]) –
y_kwargs (Dict[str, Any]) –