armlet.data.splitter module#
This module contains the data partitioning utilities.
- class armlet.data.splitter.ArmletDataSplitter(data_dict: dict, dist_cfg: DDict, client_split: float = 0.0, client_val_split: float = 0.0, sampling_perc: float = 1.0, server_test: bool = True, server_test_union: bool = False, keep_test: bool = True, server_split: float = 0.0, server_val_split: float = 0.0, uniform_test: bool = False)#
Bases:
objectUtility class for splitting the data across clients.
- assign(n_clients: int) dict#
- static iid(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame | None, y_test: DataFrame | None, n: int) tuple[list[ndarray], list[ndarray] | None]#
- static label_dirichlet_skew(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame | None, y_test: DataFrame | None, n: int, alpha: float = 0.1, min_sample_per_class: int = 0) tuple[list[ndarray], list[ndarray] | None]#
- static safe_label_dirichlet_skew(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame | None, y_test: DataFrame | None, n: int, alpha: float = 0.1) tuple[list[ndarray], list[ndarray] | None]#
- class armlet.data.splitter.DummyDataSplitter(dataset: DataContainer, distribution: str = 'iid', client_split: float = 0.0, sampling_perc: float = 1.0, server_test: bool = True, keep_test: bool = True, server_split: float = 0.0, uniform_test: bool = False, dist_args: DDict = None)#
Bases:
DataSplitter- assign(n_clients: int, batch_size: int = 32) tuple[tuple[list[FastDataLoader], list[FastDataLoader] | None], FastDataLoader]#
Assign the data to the clients and the server according to the configuration. Specifically, we can have the following scenarios:
server_test = Trueandkeep_test = True: The server has a test set that corresponds to the test set of the dataset. The clients have a training set and, ifclient_split > 0, a test set.server_test = Trueandkeep_test = False: The server has a test set that is sampled from the whole dataset (training set and test set are merged). The server’s sample size is indicated by theserver_splitparameter. The clients have a training set and, ifclient_split > 0, a test set.server_test = Falseandkeep_test = True: The server does not have a test set. The clients have a training set and a test set that corresponds to the test set of the dataset distributed uniformly across the clients. In this case theclient_splitis ignored.server_test = Falseandkeep_test = False: The server does not have a test set. The clients have a training set and, ifclient_split > 0, a test set.
If
uniform_test = False, the training and test set are distributed across the clients according to the provided distribution. The only exception is done for the test set in scenario 3. The test set is IID distributed across clients ifuniform_test = True.- Parameters:
n_clients (int) – The number of clients.
batch_size (Optional[int], optional) – The batch size. Defaults to 32.
- Returns:
The clients’ training and testing assignments and the server’s testing assignment.
- Return type:
tuple[tuple[list[FastDataLoader], Optional[list[FastDataLoader]]], FastDataLoader]