PlnPCAcollection
- class pyPLNmodels.PlnPCAcollection(endog: Tensor | ndarray | DataFrame, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', ranks: Iterable[int] = range(3, 5), dict_of_dict_initialization: dict | None = None, take_log_offsets: bool = False, add_const: bool = True)
Bases: object
A collection of PlnPCA models in which the key q maps to a PlnPCA object of rank q.
Examples
>>> from pyPLNmodels import PlnPCAcollection, get_real_count_data, get_simulation_parameters, sample_pln
>>> endog, labels = get_real_count_data(return_labels=True)
>>> data = {"endog": endog}
>>> plnpcas = PlnPCAcollection.from_formula("endog ~ 1", data=data, ranks=[5, 8, 12])
>>> plnpcas.fit()
>>> print(plnpcas)
>>> plnpcas.show()
>>> print(plnpcas.best_model())
>>> print(plnpcas[5])

>>> plnparam = get_simulation_parameters(n_samples=100, dim=60, nb_cov=2, rank=8)
>>> endog = sample_pln(plnparam)
>>> data = {"endog": endog, "cov": plnparam.exog, "offsets": plnparam.offsets}
>>> plnpcas = PlnPCAcollection.from_formula("endog ~ 0 + cov", data=data, ranks=[5, 8, 12])
>>> plnpcas.fit()
>>> print(plnpcas)
>>> plnpcas.show()
- property AIC: Dict[int, int]
Property representing the AIC scores of the models in the collection.
- Returns:
A dictionary mapping each rank to the AIC score of the corresponding model.
- Return type:
Dict[int, int]
- property BIC: Dict[int, int]
Property representing the BIC scores of the models in the collection.
- Returns:
A dictionary mapping each rank to the BIC score of the corresponding model.
- Return type:
Dict[int, int]
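This page does not define the two criteria. The sketch below uses the textbook definitions, AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of free parameters and lower is better. pyPLNmodels may use a rescaled or sign-flipped convention, so the numbers are illustrative only; the `n_params` counts and the log-likelihood values are hypothetical inputs, not library output.

```python
import math
from typing import Dict


def information_criteria(
    loglikes: Dict[int, float], n_params: Dict[int, int], n_samples: int
) -> Dict[str, Dict[int, float]]:
    """Textbook AIC and BIC per rank (lower is better); a sketch, not the library's code."""
    aic = {rank: 2 * n_params[rank] - 2 * ll for rank, ll in loglikes.items()}
    bic = {
        rank: n_params[rank] * math.log(n_samples) - 2 * ll
        for rank, ll in loglikes.items()
    }
    return {"AIC": aic, "BIC": bic}


# Hypothetical log-likelihoods and parameter counts for three ranks.
scores = information_criteria(
    loglikes={5: -1200.0, 8: -1150.0, 12: -1140.0},
    n_params={5: 300, 8: 480, 12: 720},
    n_samples=100,
)
print(scores["AIC"])  # {5: 3000.0, 8: 3260.0, 12: 3720.0}
```

BIC's penalty grows with log(n_samples), so for n_samples above about 7 it favours lower ranks more strongly than AIC does.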
- __init__(endog: Tensor | ndarray | DataFrame, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', ranks: Iterable[int] = range(3, 5), dict_of_dict_initialization: dict | None = None, take_log_offsets: bool = False, add_const: bool = True)
Constructor for PlnPCAcollection.
- Parameters:
endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data (endogenous variables).
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariates (exogenous variables), by default None.
offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets, by default None.
offsets_formula (str, optional(keyword-only)) – The formula for offsets, by default “logsum”.
ranks (Iterable[int], optional(keyword-only)) – The ranks of the PlnPCA models in the collection, by default range(3, 5) (i.e. ranks 3 and 4).
dict_of_dict_initialization (dict, optional(keyword-only)) – The dictionary of initialization, by default None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the logarithm of offsets, by default False.
add_const (bool, optional(keyword-only)) – Whether to add a column of ones (an intercept) to the exog. Defaults to True.
batch_size (int, optional(keyword-only)) – The batch size used when optimizing the ELBO. If None, full-batch gradient descent is performed (i.e. batch_size = n_samples).
- Return type:
None
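The keyword arguments offsets_formula, take_log_offsets and add_const describe input preprocessing. The sketch below illustrates one plausible reading using NumPy only. The function `prepare_inputs` is hypothetical; reading "logsum" as the log of each sample's total count repeated across columns, placing the intercept as the first column, and applying take_log_offsets only to user-supplied offsets are all assumptions, not the library's code.

```python
from typing import Optional

import numpy as np


def prepare_inputs(
    endog: np.ndarray,
    exog: Optional[np.ndarray] = None,
    offsets: Optional[np.ndarray] = None,
    offsets_formula: str = "logsum",
    take_log_offsets: bool = False,
    add_const: bool = True,
):
    """Hypothetical preprocessing mirroring the documented keyword arguments."""
    n_samples, dim = endog.shape
    if add_const:
        ones = np.ones((n_samples, 1))
        # Intercept column; placing it first is an assumption.
        exog = ones if exog is None else np.hstack([ones, exog])
    if offsets is None:
        if offsets_formula != "logsum":
            raise ValueError(f"unsupported offsets_formula: {offsets_formula!r}")
        # Log of each sample's total count (rows summing to 0 would give -inf),
        # repeated across all dim columns.
        row_sums = endog.sum(axis=1, keepdims=True)
        offsets = np.repeat(np.log(row_sums), dim, axis=1)
    elif take_log_offsets:
        # Assumed to apply only to offsets supplied by the user.
        offsets = np.log(offsets)
    return exog, offsets


endog = np.array([[1, 3], [2, 2], [0, 10]])
exog, offsets = prepare_inputs(endog)
print(exog.shape, offsets.shape)  # (3, 1) (3, 2)
```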
- property batch_size: Tensor
Property representing the batch_size.
- Returns:
The batch_size.
- Return type:
torch.Tensor
- property best_AIC_model_rank: int
Property representing the rank of the best model according to the AIC criterion.
- Returns:
The rank of the best model.
- Return type:
int
- property best_BIC_model_rank: int
Property representing the rank of the best model according to the BIC criterion.
- Returns:
The rank of the best model.
- Return type:
int
- best_model(criterion: str = 'AIC') → Any
Get the best model according to the specified criterion.
- Parameters:
criterion (str, optional) – The criterion to use (‘AIC’ or ‘BIC’), by default ‘AIC’.
- Returns:
The best model.
- Return type:
Any
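Together, best_model, best_AIC_model_rank and best_BIC_model_rank amount to picking the rank with the best score and returning the model stored under that rank. The following is a minimal sketch over plain dictionaries. It assumes lower scores are better and that an unknown criterion raises ValueError; both are assumptions about the library, and the placeholder strings stand in for fitted PlnPCA objects.

```python
from typing import Any, Dict


def best_model(
    models: Dict[int, Any],
    scores: Dict[str, Dict[int, float]],
    criterion: str = "AIC",
) -> Any:
    """Return the model whose rank has the lowest score under `criterion`."""
    if criterion not in ("AIC", "BIC"):
        raise ValueError(f"criterion must be 'AIC' or 'BIC', got {criterion!r}")
    per_rank = scores[criterion]
    # Lowest score wins; ties go to the first rank in iteration order.
    best_rank = min(per_rank, key=per_rank.get)
    return models[best_rank]


models = {5: "rank-5 model", 8: "rank-8 model", 12: "rank-12 model"}
scores = {
    "AIC": {5: 3000.0, 8: 2950.0, 12: 2990.0},
    "BIC": {5: 3100.0, 8: 3120.0, 12: 3300.0},
}
print(best_model(models, scores))         # rank-8 model
print(best_model(models, scores, "BIC"))  # rank-5 model
```

The example shows that the two criteria can disagree: here AIC selects rank 8 while BIC's heavier penalty selects rank 5.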
- property coef: Dict[int, Tensor]
Property representing the coefficients.
- Returns:
The coefficients.
- Return type:
Dict[int, torch.Tensor]
- property components: Dict[int, Tensor]
Property representing the components.
- Returns:
The components.
- Return type:
Dict[int, torch.Tensor]
- property dim: int
Property representing the dimension.
- Returns:
The dimension.
- Return type:
int
- property endog: Tensor
Property representing the endog.
- Returns:
The endog.
- Return type:
torch.Tensor
- property exog: Tensor
Property representing the exog.
- Returns:
The exog.
- Return type:
torch.Tensor
- fit(nb_max_iteration: int = 50000, *, lr: float = 0.01, tol: float = 0.001, do_smart_init: bool = True, verbose: bool = False, batch_size: int | None = None)
Fit each model in the PlnPCAcollection.
- Parameters:
nb_max_iteration (int, optional) – The maximum number of iterations, by default 50000.
lr (float, optional(keyword-only)) – The learning rate, by default 0.01.
tol (float, optional(keyword-only)) – The tolerance, by default 0.001.
do_smart_init (bool, optional(keyword-only)) – Whether to use smart initialization, by default True.
verbose (bool, optional(keyword-only)) – Whether to print verbose output, by default False.
batch_size (int, optional(keyword-only)) – The batch size used when optimizing the ELBO. If None, full-batch gradient descent is performed (i.e. batch_size = n_samples).
- Raises:
ValueError – If batch_size is greater than the number of samples or is not an int.
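The batch_size rules above (None means full batch, non-int or too-large values raise ValueError) can be sketched as a small validator, followed by a generator that splits sample indices into mini-batches. Both functions are hypothetical. The lower-bound check and the shuffled, one-pass-per-epoch batching scheme are assumptions added for illustration and are not listed under Raises.

```python
from typing import Iterator, Optional

import numpy as np


def resolve_batch_size(batch_size: Optional[int], n_samples: int) -> int:
    """Apply the documented rules: None means full batch; invalid values raise."""
    if batch_size is None:
        return n_samples
    # bool is a subclass of int, so reject it explicitly; numpy integer
    # types are also rejected by this strict check.
    if not isinstance(batch_size, int) or isinstance(batch_size, bool):
        raise ValueError(f"batch_size must be an int, got {type(batch_size).__name__}")
    if batch_size > n_samples:
        raise ValueError(f"batch_size ({batch_size}) is greater than n_samples ({n_samples})")
    if batch_size < 1:  # extra guard, not part of the documented Raises
        raise ValueError("batch_size must be at least 1")
    return batch_size


def minibatch_indices(n_samples: int, batch_size: int, seed: int = 0) -> Iterator[np.ndarray]:
    """Yield shuffled index batches that cover every sample exactly once."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield perm[start : start + batch_size]


print([len(b) for b in minibatch_indices(10, resolve_batch_size(4, 10))])  # [4, 4, 2]
```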
- classmethod from_formula(formula: str, data: Dict[str, Tensor | ndarray | DataFrame], *, offsets_formula: str = 'logsum', ranks: Iterable[int] = range(3, 5), dict_of_dict_initialization: dict | None = None, take_log_offsets: bool = False) → PlnPCAcollection
Create an instance of PlnPCAcollection from a formula.
- Parameters:
formula (str) – The formula.
data (dict) – The data dictionary. Each value can be a torch.Tensor, np.ndarray or pd.DataFrame.
offsets_formula (str, optional(keyword-only)) – The formula for offsets, by default “logsum”. Overridden if data[“offsets”] is not None.
ranks (Iterable[int], optional(keyword-only)) – The ranks of the PlnPCA models in the collection, by default range(3, 5) (i.e. ranks 3 and 4).
dict_of_dict_initialization (dict, optional(keyword-only)) – The dictionary of initialization, by default None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the logarithm of offsets, by default False.
- Returns:
The created PlnPCAcollection instance.
- Return type:
PlnPCAcollection
Examples
>>> from pyPLNmodels import PlnPCAcollection, get_real_count_data
>>> endog = get_real_count_data()
>>> data = {"endog": endog}
>>> pca_col = PlnPCAcollection.from_formula("endog ~ 1", data=data, ranks=[5, 6])
- get(key: Any, default: Any) → Any
Get the model with the specified key, or return a default value if the key does not exist.
- Parameters:
key (Any) – The key to search for.
default (Any) – The default value to return if the key does not exist.
- Returns:
The model with the specified key, or the default value if the key does not exist.
- Return type:
Any
- items()
Get the key-value pairs of the models in the collection.
- Returns:
The key-value pairs of the models.
- Return type:
ItemsView
- keys()
Get the ranks of the models in the collection.
- Returns:
The ranks of the models.
- Return type:
KeysView
- property latent_mean: Dict[int, Tensor]
Property representing the latent means.
- Returns:
The latent means.
- Return type:
Dict[int, torch.Tensor]
- property latent_sqrt_var: Dict[int, Tensor]
Property representing the square roots of the latent variances (i.e. the latent standard deviations).
- Returns:
The square roots of the latent variances.
- Return type:
Dict[int, torch.Tensor]
- property loglikes: Dict[int, Any]
Property representing the log-likelihoods of the models in the collection.
- Returns:
The log-likelihoods of the models.
- Return type:
Dict[int, Any]
- property n_samples: int
Property representing the number of samples.
- Returns:
The number of samples.
- Return type:
int
- property nb_cov: int
Property representing the number of covariates (columns of exog).
- Returns:
The number of covariates.
- Return type:
int
- property offsets: Tensor
Property representing the offsets.
- Returns:
The offsets.
- Return type:
torch.Tensor
- property ranks: List[int]
Property representing the ranks.
- Returns:
The ranks.
- Return type:
List[int]
- save(path_of_directory: str = './', ranks: List[int] | None = None)
Save the models in the specified directory.
- Parameters:
path_of_directory (str, optional) – The path of the directory to save the models, by default “./”.
ranks (Optional[List[int]], optional) – The ranks of the models to save, by default None.
- show()
Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.
- values()
Get the models in the collection.
- Returns:
The models in the collection.
- Return type:
ValuesView
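The container methods get, items, keys and values, together with indexing by rank as in `plnpcas[5]`, follow Python's read-only mapping protocol, and their documented return types (KeysView, ItemsView, ValuesView) match what `collections.abc.Mapping` provides. The sketch below is a hypothetical rank-to-model container that gets these methods for free from `Mapping`. Whether PlnPCAcollection itself subclasses `Mapping` is not stated here. Note also that `Mapping.get` makes `default` optional (None), whereas the signature above lists it without a default.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class RankCollection(Mapping):
    """Hypothetical rank -> model container built on collections.abc.Mapping."""

    def __init__(self, models: Dict[int, Any]):
        # Store models sorted by rank so iteration follows increasing rank.
        self._models = dict(sorted(models.items()))

    def __getitem__(self, rank: int) -> Any:
        return self._models[rank]

    def __iter__(self) -> Iterator[int]:
        return iter(self._models)

    def __len__(self) -> int:
        return len(self._models)


col = RankCollection({8: "model-8", 5: "model-5", 12: "model-12"})
print(list(col.keys()))      # [5, 8, 12]
print(col.get(7, "absent"))  # absent
print(col[5])                # model-5
```

Implementing only `__getitem__`, `__iter__` and `__len__` keeps the class small, while `Mapping` supplies `keys`, `items`, `values`, `get` and `in` with the standard view types.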