armlet.data.splitter module

This module contains the data partitioning utilities.

class armlet.data.splitter.ArmletDataSplitter(data_dict: dict, dist_cfg: DDict, client_split: float = 0.0, client_val_split: float = 0.0, sampling_perc: float = 1.0, server_test: bool = True, server_test_union: bool = False, keep_test: bool = True, server_split: float = 0.0, server_val_split: float = 0.0, uniform_test: bool = False)

Bases: object

Utility class for splitting the data across clients.

assign(n_clients: int) -> dict

static iid(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame | None, y_test: DataFrame | None, n: int) -> tuple[list[ndarray], list[ndarray] | None]

static label_dirichlet_skew(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame | None, y_test: DataFrame | None, n: int, alpha: float = 0.1, min_sample_per_class: int = 0) -> tuple[list[ndarray], list[ndarray] | None]

static safe_label_dirichlet_skew(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame | None, y_test: DataFrame | None, n: int, alpha: float = 0.1) -> tuple[list[ndarray], list[ndarray] | None]

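The signatures above do not document how the label skew is computed. As a rough illustration only, the following sketch shows one common way to build a Dirichlet label-skewed partition: for each class, draw per-client proportions from a symmetric Dirichlet distribution with concentration alpha and cut that class's samples accordingly. The function name `dirichlet_label_skew`, its arguments, and its return shape are assumptions made for this example. It is not the library's implementation, it works on a plain list of labels rather than DataFrames, and it does not cover `min_sample_per_class` or the "safe" variant.

```python
import random
from collections import defaultdict


def dirichlet_label_skew(labels, n_clients, alpha=0.1, seed=0):
    """Hypothetical sketch: return one list of sample indices per client."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(n_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)

        # A symmetric Dirichlet draw is a vector of Gamma(alpha, 1)
        # variates divided by their sum.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(weights) or 1.0  # guard against an all-zero draw

        # Turn the proportions into contiguous slices of this class.
        start = 0
        cumulative = 0.0
        for client, weight in enumerate(weights):
            cumulative += weight
            if client == n_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = round(cumulative / total * len(indices))
            clients[client].extend(indices[start:end])
            start = end
    return clients
```

Smaller values of alpha concentrate each class on fewer clients, while larger values move the partition toward an even spread of every class.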
class armlet.data.splitter.DummyDataSplitter(dataset: DataContainer, distribution: str = 'iid', client_split: float = 0.0, sampling_perc: float = 1.0, server_test: bool = True, keep_test: bool = True, server_split: float = 0.0, uniform_test: bool = False, dist_args: DDict = None)

Bases: DataSplitter

assign(n_clients: int, batch_size: int = 32) -> tuple[tuple[list[FastDataLoader], list[FastDataLoader] | None], FastDataLoader]

Assign the data to the clients and the server according to the configuration. Specifically, we can have the following scenarios:

  1. server_test = True and keep_test = True: The server has a test set that corresponds to the test set of the dataset. The clients have a training set and, if client_split > 0, a test set.

  2. server_test = True and keep_test = False: The server has a test set that is sampled from the whole dataset (training set and test set are merged). The server’s sample size is indicated by the server_split parameter. The clients have a training set and, if client_split > 0, a test set.

  3. server_test = False and keep_test = True: The server does not have a test set. The clients have a training set and a test set obtained by distributing the dataset's test set uniformly across the clients. In this case, client_split is ignored.

  4. server_test = False and keep_test = False: The server does not have a test set. The clients have a training set and, if client_split > 0, a test set.

If uniform_test = False, both the training set and the test set are distributed across the clients according to the provided distribution. The only exception is the test set in scenario 3, which is always distributed uniformly. If uniform_test = True, the test set is distributed IID across the clients, while the training set still follows the provided distribution.
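The four scenarios can be read as a small decision table over server_test, keep_test, and client_split. The sketch below encodes that table so the combinations are easy to check. `SplitPlan`, `resolve_scenario`, and the string labels are hypothetical names introduced for this illustration. They are not part of armlet, and the real assign method builds data loaders rather than returning labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SplitPlan:
    """Hypothetical summary of where each test set comes from."""
    scenario: int
    server_test: Optional[str]  # source of the server's test set, if any
    client_test: Optional[str]  # source of the clients' test sets, if any


def resolve_scenario(server_test: bool, keep_test: bool,
                     client_split: float) -> SplitPlan:
    """Map the flags to one of the four documented scenarios."""
    # Outside scenario 3, clients get a test set only if client_split > 0.
    client_holdout = "client_split" if client_split > 0 else None

    if server_test and keep_test:
        # Scenario 1: the server keeps the dataset's own test set.
        return SplitPlan(1, "dataset_test", client_holdout)
    if server_test and not keep_test:
        # Scenario 2: train and test are merged, and the server samples
        # a test set whose size is given by server_split.
        return SplitPlan(2, "merged_sample", client_holdout)
    if not server_test and keep_test:
        # Scenario 3: the dataset's test set is spread uniformly over
        # the clients, and client_split is ignored.
        return SplitPlan(3, None, "dataset_test_uniform")
    # Scenario 4: no server test set.
    return SplitPlan(4, None, client_holdout)
```

For example, `resolve_scenario(False, True, 0.2)` falls into scenario 3, so the client_split value of 0.2 has no effect on the result.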

Parameters:
  • n_clients (int) – The number of clients.

  • batch_size (int, optional) – The batch size. Defaults to 32.

Returns:

The clients’ training and testing assignments and the server’s testing assignment.

Return type:

tuple[tuple[list[FastDataLoader], Optional[list[FastDataLoader]]], FastDataLoader]