PlnPCAcollection

class pyPLNmodels.PlnPCAcollection(endog: Tensor | ndarray | DataFrame, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', ranks: Iterable[int] = range(3, 5), dict_of_dict_initialization: dict | None = None, take_log_offsets: bool = False, add_const: bool = True)

Bases: object

A collection of PlnPCA models indexed by rank: the value stored under key q is a PlnPCA model of rank q.

Examples

>>> from pyPLNmodels import PlnPCAcollection, get_real_count_data, get_simulation_parameters, sample_pln
>>> endog, labels = get_real_count_data(return_labels=True)
>>> data = {"endog": endog}
>>> plnpcas = PlnPCAcollection.from_formula("endog ~ 1", data=data, ranks=[5, 8, 12])
>>> plnpcas.fit()
>>> print(plnpcas)
>>> plnpcas.show()
>>> print(plnpcas.best_model())
>>> print(plnpcas[5])
>>> plnparam = get_simulation_parameters(n_samples=100, dim=60, nb_cov=2, rank=8)
>>> endog = sample_pln(plnparam)
>>> data = {"endog": endog, "cov": plnparam.exog, "offsets": plnparam.offsets}
>>> plnpcas = PlnPCAcollection.from_formula("endog ~ 0 + cov", data=data, ranks=[5, 8, 12])
>>> plnpcas.fit()
>>> print(plnpcas)
>>> plnpcas.show()

See also

PlnPCA

property AIC: Dict[int, int]

Property representing the AIC (Akaike information criterion) scores of the models in the collection, keyed by rank.

Returns:

The AIC scores of the models.

Return type:

Dict[int, int]

property BIC: Dict[int, int]

Property representing the BIC (Bayesian information criterion) scores of the models in the collection, keyed by rank.

Returns:

The BIC scores of the models.

Return type:

Dict[int, int]
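To illustrate how the two criteria relate, the sketch below computes both from a model's log-likelihood, its number of free parameters k, and the number of samples n. It assumes the penalized-likelihood convention AIC = -loglike + k and BIC = -loglike + (k / 2) log(n), under which lower is better; these are the textbook criteria divided by 2, and the exact sign and scaling used by pyPLNmodels are not stated on this page. The helpers `information_criteria` and `scores_by_rank` are hypothetical.

```python
import math
from typing import Dict, Tuple


def information_criteria(loglike: float, n_params: int, n_samples: int) -> Tuple[float, float]:
    """Return (AIC, BIC) under an ASSUMED "lower is better" convention.

    AIC = -loglike + k
    BIC = -loglike + (k / 2) * log(n)
    These are half of the textbook 2k - 2 log L and k log n - 2 log L.
    """
    aic = -loglike + n_params
    bic = -loglike + 0.5 * n_params * math.log(n_samples)
    return aic, bic


def scores_by_rank(
    loglikes: Dict[int, float], n_params: Dict[int, int], n_samples: int
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Build rank-keyed AIC and BIC dictionaries, shaped like the AIC/BIC properties."""
    aic: Dict[int, float] = {}
    bic: Dict[int, float] = {}
    for rank, ll in loglikes.items():
        aic[rank], bic[rank] = information_criteria(ll, n_params[rank], n_samples)
    return aic, bic
```

Under this convention, BIC penalizes each parameter more heavily than AIC as soon as log(n) / 2 > 1, that is for more than about 7 samples, so BIC tends to favour smaller ranks.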

__init__(endog: Tensor | ndarray | DataFrame, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', ranks: Iterable[int] = range(3, 5), dict_of_dict_initialization: dict | None = None, take_log_offsets: bool = False, add_const: bool = True)

Constructor for PlnPCAcollection.

Parameters:
  • endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data (endog), of shape (n_samples, dim).

  • exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The exog (covariates), by default None.

  • offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets, by default None.

  • offsets_formula (str, optional(keyword-only)) – The formula for offsets, by default “logsum”.

  • ranks (Iterable[int], optional(keyword-only)) – The ranks of the PlnPCA models to fit, one model per rank, by default range(3, 5) (i.e. ranks 3 and 4).

  • dict_of_dict_initialization (dict, optional(keyword-only)) – The dictionary of initialization, by default None.

  • take_log_offsets (bool, optional(keyword-only)) – Whether to take the logarithm of offsets, by default False.

  • add_const (bool, optional(keyword-only)) – Whether to add a column of ones to the exog, by default True.

  • batch_size (int, optional(keyword-only)) – The batch size used when optimizing the ELBO. If None, batch gradient descent is performed (i.e. batch_size = n_samples).

Return type:

PlnPCAcollection
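With the default offsets_formula="logsum", offsets are derived from the counts when none are supplied. The plain-Python sketch below assumes this means each sample's offset is the log of its total count, repeated across all dim columns; `logsum_offsets` is a hypothetical helper, not part of pyPLNmodels.

```python
import math
from typing import List


def logsum_offsets(endog: List[List[float]]) -> List[List[float]]:
    """ASSUMED "logsum" rule: offsets[i][j] = log(sum of row i of endog) for every column j.

    A row whose counts sum to zero would give log(0), so such rows are rejected.
    """
    offsets = []
    for i, row in enumerate(endog):
        total = sum(row)
        if total <= 0:
            raise ValueError(f"row {i} has a non-positive total count; log is undefined")
        offsets.append([math.log(total)] * len(row))
    return offsets


# Two samples with totals 4 and 10: every entry of a row gets the same offset.
example = logsum_offsets([[1, 2, 1], [0, 5, 5]])
```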

property batch_size: Tensor

Property representing the batch_size.

Returns:

The batch_size.

Return type:

torch.Tensor

property best_AIC_model_rank: int

Property representing the rank of the best model according to the AIC criterion.

Returns:

The rank of the best model.

Return type:

int

property best_BIC_model_rank: int

Property representing the rank of the best model according to the BIC criterion.

Returns:

The rank of the best model.

Return type:

int

best_model(criterion: str = 'AIC') → Any

Get the best model according to the specified criterion.

Parameters:

criterion (str, optional) – The criterion to use (‘AIC’ or ‘BIC’), by default ‘AIC’.

Returns:

The best model.

Return type:

Any
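Choosing the best model amounts to finding the rank with the optimal score in the rank-keyed AIC or BIC dictionary and returning the model stored under that rank. The sketch below assumes lower scores are better (swap min for max if the opposite sign convention applies); `best_rank` and `best_model_from` are hypothetical, and plain strings stand in for PlnPCA models.

```python
from typing import Dict, Mapping, TypeVar

M = TypeVar("M")


def best_rank(scores: Mapping[int, float]) -> int:
    """Rank with the smallest score (ASSUMED "lower is better" convention)."""
    return min(scores, key=scores.__getitem__)


def best_model_from(
    models: Mapping[int, M], criteria: Dict[str, Mapping[int, float]], criterion: str = "AIC"
) -> M:
    """Return the model stored at the best rank for the chosen criterion ('AIC' or 'BIC')."""
    if criterion not in criteria:
        raise ValueError(f"unknown criterion {criterion!r}; expected one of {sorted(criteria)}")
    return models[best_rank(criteria[criterion])]


# Hypothetical scores for ranks 5, 8 and 12.
models = {5: "rank-5 model", 8: "rank-8 model", 12: "rank-12 model"}
criteria = {
    "AIC": {5: 120.0, 8: 110.0, 12: 115.0},
    "BIC": {5: 130.0, 8: 135.0, 12: 150.0},
}
best_model_from(models, criteria, "AIC")  # 'rank-8 model'
best_model_from(models, criteria, "BIC")  # 'rank-5 model'
```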

property coef: Dict[int, Tensor]

Property representing the coefficients of each model, keyed by rank.

Returns:

The coefficients.

Return type:

Dict[int, torch.Tensor]

property components: Dict[int, Tensor]

Property representing the components of each model, keyed by rank.

Returns:

The components.

Return type:

Dict[int, torch.Tensor]

property dim: int

Property representing the dimension, i.e. the number of count variables (columns of endog).

Returns:

The dimension.

Return type:

int

property endog: Tensor

Property representing the endog.

Returns:

The endog.

Return type:

torch.Tensor

property exog: Tensor

Property representing the exog.

Returns:

The exog.

Return type:

torch.Tensor

fit(nb_max_iteration: int = 50000, *, lr: float = 0.01, tol: float = 0.001, do_smart_init: bool = True, verbose: bool = False, batch_size: int | None = None)

Fit each model in the PlnPCAcollection.

Parameters:
  • nb_max_iteration (int, optional) – The maximum number of iterations, by default 50000.

  • lr (float, optional(keyword-only)) – The learning rate, by default 0.01.

  • tol (float, optional(keyword-only)) – The tolerance, by default 0.001.

  • do_smart_init (bool, optional(keyword-only)) – Whether to do smart initialization, by default True.

  • verbose (bool, optional(keyword-only)) – Whether to print verbose output, by default False.

  • batch_size (int, optional(keyword-only)) – The batch size used when optimizing the ELBO. If None, batch gradient descent is performed (i.e. batch_size = n_samples).

Raises:

ValueError – If batch_size is greater than the number of samples or is not an int.
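The batch_size rule for fit (an int no larger than n_samples, with None meaning full-batch gradient descent) can be sketched as a validation helper plus a mini-batch index generator. `resolve_batch_size` and `minibatch_indices` are hypothetical, and both the per-epoch shuffling and the rejection of non-positive sizes are assumptions rather than documented behaviour.

```python
import random
from typing import Iterator, List, Optional


def resolve_batch_size(batch_size: Optional[int], n_samples: int) -> int:
    """None -> full batch; otherwise an int no larger than n_samples (ValueError if not)."""
    if batch_size is None:
        return n_samples  # batch gradient descent
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise ValueError(f"batch_size must be an int, got {type(batch_size).__name__}")
    if batch_size > n_samples:
        raise ValueError(f"batch_size ({batch_size}) exceeds n_samples ({n_samples})")
    if batch_size < 1:  # ASSUMPTION: non-positive sizes are rejected as well
        raise ValueError("batch_size must be at least 1")
    return batch_size


def minibatch_indices(n_samples: int, batch_size: Optional[int], seed: int = 0) -> Iterator[List[int]]:
    """Yield one epoch of shuffled sample indices in batches; the last batch may be smaller."""
    size = resolve_batch_size(batch_size, n_samples)
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # ASSUMPTION: samples are reshuffled each epoch
    for start in range(0, n_samples, size):
        yield order[start:start + size]
```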

classmethod from_formula(formula: str, data: Dict[str, Tensor | ndarray | DataFrame], *, offsets_formula: str = 'logsum', ranks: Iterable[int] = range(3, 5), dict_of_dict_initialization: dict | None = None, take_log_offsets: bool = False) → PlnPCAcollection

Create an instance of PlnPCAcollection from a formula.

Parameters:
  • formula (str) – The formula.

  • data (dict) – The data dictionary. Each value can be a torch.Tensor, np.ndarray, or pd.DataFrame.

  • offsets_formula (str, optional(keyword-only)) – The formula for offsets, by default “logsum”. Overridden if data[“offsets”] is not None.

  • ranks (Iterable[int], optional(keyword-only)) – The ranks of the PlnPCA models to fit, one model per rank, by default range(3, 5) (i.e. ranks 3 and 4).

  • dict_of_dict_initialization (dict, optional(keyword-only)) – The dictionary of initialization, by default None.

  • take_log_offsets (bool, optional(keyword-only)) – Whether to take the logarithm of offsets, by default False.

Returns:

The created PlnPCAcollection instance.

Return type:

PlnPCAcollection

Examples

>>> from pyPLNmodels import PlnPCAcollection, get_real_count_data
>>> endog = get_real_count_data()
>>> data = {"endog": endog}
>>> pca_col = PlnPCAcollection.from_formula("endog ~ 1", data=data, ranks=[5, 6])

See also

PlnPCA, __init__()
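In the Wilkinson-style formulas used in the examples (the same notation as patsy or R), "endog ~ 1" requests an intercept-only design and "endog ~ 0 + cov" requests the covariates without an intercept. The plain-Python sketch below only shows the exog matrix each formula corresponds to; `design_matrix` is a hypothetical helper, and the actual formula parsing is done by the library, not by this code.

```python
from typing import List, Optional, Sequence


def design_matrix(
    n_samples: int, cov: Optional[Sequence[Sequence[float]]] = None, intercept: bool = True
) -> List[List[float]]:
    """Build an exog matrix row by row (illustration of formula semantics only).

    intercept=True,  cov=None  -> "endog ~ 1": a single column of ones
    intercept=False, cov=given -> "endog ~ 0 + cov": the covariates only
    intercept=True,  cov=given -> "endog ~ cov": a column of ones, then the covariates
    """
    if cov is not None and len(cov) != n_samples:
        raise ValueError("cov must have one row per sample")
    rows = []
    for i in range(n_samples):
        row = [1.0] if intercept else []
        if cov is not None:
            row.extend(float(x) for x in cov[i])
        rows.append(row)
    return rows
```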

get(key: Any, default: Any) → Any

Get the model with the specified key, or return a default value if the key does not exist.

Parameters:
  • key (Any) – The key to search for.

  • default (Any) – The default value to return if the key does not exist.

Returns:

The model with the specified key, or the default value if the key does not exist.

Return type:

Any

items()

Get the key-value pairs of the models in the collection.

Returns:

The key-value pairs of the models.

Return type:

ItemsView

keys()

Get the ranks of the models in the collection.

Returns:

The ranks of the models.

Return type:

KeysView
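The get, items, keys, and values methods behave like those of a read-only mapping whose keys are ranks. A minimal sketch of such a rank-keyed container built on the standard collections.abc.Mapping, which supplies keys(), items(), values(), and get() from three core methods. `RankCollection` and its `make_model` factory are hypothetical, and strings stand in for PlnPCA models.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterable, Iterator, List


class RankCollection(Mapping):
    """Read-only mapping from rank to model, with one model built per rank."""

    def __init__(self, ranks: Iterable[int], make_model: Callable[[int], Any]):
        self._models: Dict[int, Any] = {rank: make_model(rank) for rank in ranks}

    def __getitem__(self, rank: int) -> Any:
        return self._models[rank]

    def __iter__(self) -> Iterator[int]:
        return iter(self._models)

    def __len__(self) -> int:
        return len(self._models)

    @property
    def ranks(self) -> List[int]:
        return list(self._models)


col = RankCollection(range(3, 5), lambda q: f"PlnPCA(rank={q})")
list(col.keys())  # [3, 4]
col[4]            # 'PlnPCA(rank=4)'
col.get(7, None)  # None, since rank 7 is not in the collection
```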

property latent_mean: Dict[int, Tensor]

Property representing the latent means.

Returns:

The latent means.

Return type:

Dict[int, torch.Tensor]

property latent_sqrt_var: Dict[int, Tensor]

Property representing the square roots of the latent variances (latent standard deviations).

Returns:

The square roots of the latent variances.

Return type:

Dict[int, torch.Tensor]
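As its name indicates, latent_sqrt_var holds square roots of the latent variances, so the variances are obtained by squaring elementwise (with torch tensors, `t ** 2` or `torch.square(t)`). The sketch below does the same on nested lists so it runs without torch; `latent_variances` is a hypothetical helper.

```python
from typing import Dict, List

Matrix = List[List[float]]


def latent_variances(latent_sqrt_var: Dict[int, Matrix]) -> Dict[int, Matrix]:
    """Square every entry of each rank's matrix: standard deviations -> variances."""
    return {
        rank: [[s * s for s in row] for row in matrix]
        for rank, matrix in latent_sqrt_var.items()
    }
```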

property loglikes: Dict[int, Any]

Property representing the log-likelihoods of the models in the collection.

Returns:

The log-likelihoods of the models.

Return type:

Dict[int, Any]

property n_samples: int

Property representing the number of samples.

Returns:

The number of samples.

Return type:

int

property nb_cov: int

Property representing the number of covariates (columns of exog).

Returns:

The number of covariates.

Return type:

int

property offsets: Tensor

Property representing the offsets.

Returns:

The offsets.

Return type:

torch.Tensor

property ranks: List[int]

Property representing the ranks.

Returns:

The ranks.

Return type:

List[int]

save(path_of_directory: str = './', ranks: List[int] | None = None)

Save the models in the specified directory.

Parameters:
  • path_of_directory (str, optional) – The path of the directory to save the models, by default “./”.

  • ranks (Optional[List[int]], optional) – The ranks of the models to save, by default None.

show()

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

values()

Get the models in the collection.

Returns:

The models in the collection.

Return type:

ValuesView