ZIPln
- class pyPLNmodels.ZIPln(endog: Tensor | ndarray | DataFrame | None, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, add_const: bool = True, use_closed_form_prob: bool = False)
Bases:
_model
- property AIC
Property representing the Akaike Information Criterion (AIC).
- Returns:
The AIC value.
- Return type:
float
- property BIC
Property representing the Bayesian Information Criterion (BIC).
- Returns:
The BIC value.
- Return type:
float
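For orientation, the textbook definitions of both criteria are sketched below. This is an illustration only: the function names are hypothetical, the inputs are made-up numbers, and pyPLNmodels may use a different sign or scaling convention, so values are best compared only between models fitted with the same library.

```python
import math


def textbook_aic(loglike: float, n_params: int) -> float:
    """Textbook AIC: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglike


def textbook_bic(loglike: float, n_params: int, n_samples: int) -> float:
    """Textbook BIC: k log(n) - 2 log L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * loglike


# Illustrative inputs, not results from a fitted model.
aic = textbook_aic(loglike=-120.0, n_params=10)
bic = textbook_bic(loglike=-120.0, n_params=10, n_samples=50)
print(aic, bic)
```

With more than about 7 samples, log(n) exceeds 2, so the BIC penalizes each extra parameter more heavily than the AIC.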
- __init__(endog: Tensor | ndarray | DataFrame | None, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, add_const: bool = True, use_closed_form_prob: bool = False)
Initializes the ZIPln class.
- Parameters:
endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data. Defaults to None.
offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.
offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”. Overridden if offsets is not None.
dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.
add_const (bool, optional(keyword-only)) – Whether to add a column of ones to the exog. Defaults to True. If exog is None, add_const is set to True anyway and a warning is issued.
use_closed_form_prob (bool, optional) – Whether to use the closed-form formula for the latent probability. Defaults to False.
- Raises:
ValueError – If the batch_size is greater than the number of samples, or not int.
- Return type:
A ZIPln object
Examples
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> print(zi)
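As a rough illustration of the default offsets, the sketch below assumes that offsets_formula="logsum" means the log of each sample's total count. That reading is an assumption, not something stated in this reference, and the toy data is hypothetical.

```python
import math

# Toy count matrix: 3 samples (rows) x 2 variables (columns).
endog = [[1, 3], [0, 5], [2, 2]]

# Assumed meaning of "logsum": one offset per sample, the log of its row sum.
logsum_offsets = [math.log(sum(row)) for row in endog]

for row, offset in zip(endog, logsum_offsets):
    print(row, round(offset, 4))
```

A sample whose counts are all zero would have no finite log-sum, so such rows would need special handling in this sketch.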
- property batch_size: int
The batch size of the model. Should not be greater than the number of samples.
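The constraint above can be sketched as a small validator. The helper name is hypothetical, and the relation between batch_size and nb_batches (a ceiling division, with a possibly smaller last batch) is an assumption rather than documented behavior.

```python
import math


def check_batch_size(batch_size, n_samples):
    """Hypothetical validator mirroring the documented constraint."""
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise ValueError("batch_size must be an int")
    if batch_size > n_samples:
        raise ValueError("batch_size must not exceed n_samples")
    return batch_size


n_samples = 100
batch_size = check_batch_size(32, n_samples)

# Assumed relation: the last batch may hold fewer samples than the others.
nb_batches = math.ceil(n_samples / batch_size)
print(nb_batches)
```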
- property closed_formula_latent_prob
The closed form for the latent probability.
- property coef
Property representing the coefficients.
- Returns:
The coefficients or None.
- Return type:
torch.Tensor or None
- property coef_inflation
Property representing the coefficients of the inflation.
- Returns:
The coefficients or None.
- Return type:
torch.Tensor or None
- property components: Tensor
Property representing the components.
- Returns:
The components.
- Return type:
torch.Tensor
- compute_elbo()
Compute the Evidence Lower BOund (ELBO) that will be maximized by pytorch.
- Returns:
The computed ELBO.
- Return type:
torch.Tensor
- property covariance: Tensor
Property representing the covariance of the latent variables.
- Returns:
The covariance tensor or None if components are not present.
- Return type:
Optional[torch.Tensor]
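Poisson log-normal models commonly parameterize the covariance through its components C as Sigma = C Cᵀ. Assuming that convention applies here (this reference does not state it), the relation looks like the following pure-Python sketch with a hypothetical helper and toy values.

```python
def covariance_from_components(components):
    """Return C @ C.T for a list-of-lists matrix C (assumed parameterization)."""
    dim = len(components)
    return [
        [
            sum(a * b for a, b in zip(components[i], components[j]))
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Toy lower-triangular components for dim = 2.
components = [[1.0, 0.0], [0.5, 2.0]]
covariance = covariance_from_components(components)
print(covariance)
```

Building the covariance this way guarantees a symmetric, positive semi-definite matrix whatever values the components take.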
- property dict_data
Property representing the data dictionary.
- Returns:
The dictionary of data.
- Return type:
dict
- property dim: int
The second dimension of the endog.
- Returns:
The second dimension of the endog.
- Return type:
int
- display_covariance(ax=None, savefig=False, name_file='')
Display the covariance matrix.
- Parameters:
ax (matplotlib.axes.Axes, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.
savefig (bool, optional) – Whether to save the figure. Defaults to False.
name_file (str, optional) – The name of the file to save. Defaults to “”.
- property endog
Property representing the endog.
- Returns:
The endog or None.
- Return type:
torch.Tensor or None
- property exog
Property representing the exog.
- Returns:
The exog or None.
- Return type:
torch.Tensor or None
- fit(nb_max_iteration: int = 50000, *, lr: float = 0.01, tol: float = 0.001, do_smart_init: bool = True, verbose: bool = False, batch_size: int | None = None)
Fit the model. The lower the tol, the more accurate the fit, at the cost of more iterations.
- Parameters:
nb_max_iteration (int, optional) – The maximum number of iterations. Defaults to 50000.
lr (float, optional(keyword-only)) – The learning rate. Defaults to 0.01.
tol (float, optional(keyword-only)) – The tolerance for convergence. Defaults to 1e-3.
do_smart_init (bool, optional(keyword-only)) – Whether to perform smart initialization. Defaults to True.
verbose (bool, optional(keyword-only)) – Whether to print training progress. Defaults to False.
batch_size (int, optional(keyword-only)) – The batch size when optimizing the elbo. If None, batch gradient descent will be performed (i.e. batch_size = n_samples).
- Raises:
ValueError – If the batch_size is greater than the number of samples, or not int.
Examples
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> print(zi)
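To show how nb_max_iteration, lr and tol interact, here is a toy gradient-ascent loop on a one-dimensional concave function. It is a stand-in only: the stopping criterion actually used by fit is not described in this reference, and toy_fit is a hypothetical name.

```python
def toy_fit(nb_max_iteration=50000, lr=0.01, tol=1e-3):
    """Maximize f(x) = -(x - 3)^2; stop when the objective gain drops below tol."""
    x = 0.0
    previous = -(x - 3.0) ** 2
    iteration = 0
    for iteration in range(1, nb_max_iteration + 1):
        x += lr * (-2.0 * (x - 3.0))  # gradient ascent step
        current = -(x - 3.0) ** 2
        if abs(current - previous) < tol:
            break  # improvement is below the tolerance: declare convergence
        previous = current
    return x, iteration


x_loose, iters_loose = toy_fit(tol=1e-3)
x_tight, iters_tight = toy_fit(tol=1e-8)
print(iters_loose, iters_tight)
```

A smaller tol runs more iterations and ends closer to the optimum, which is the trade-off the tol parameter controls.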
- property fitted: bool
Whether the model is fitted.
- Returns:
True if the model is fitted, False otherwise.
- Return type:
bool
- classmethod from_formula(formula: str, data: Dict[str, Tensor | ndarray | DataFrame], *, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, use_closed_form_prob: bool = False)
Create a ZIPln instance from a formula and data.
- Parameters:
formula (str) – The formula.
data (dict) – The data dictionary. Each value can be either a torch.Tensor, a np.ndarray or pd.DataFrame
offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”.
dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.
use_closed_form_prob (bool, optional) – Whether to use the closed-form formula for the latent probability. Defaults to False.
- Return type:
A ZIPln object
Examples
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> data = {"endog": endog}
>>> zi = ZIPln.from_formula("endog ~ 1", data=data)
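Assuming the formula follows the usual R-style convention, in which "~ 1" requests an intercept and nothing else, the covariate matrix implied by "endog ~ 1" is a single column of ones. The helper below is hypothetical and only illustrates that shape.

```python
def intercept_only_design(n_samples):
    """Design matrix for an intercept-only formula: one column of ones."""
    return [[1.0] for _ in range(n_samples)]


# Toy count matrix with 3 samples.
endog = [[1, 3], [0, 5], [2, 2]]
exog = intercept_only_design(len(endog))
print(exog)
```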
- grad_C()
- grad_M()
- grad_S()
- grad_rho()
- grad_theta()
- grad_theta_0()
- gradients_closed_form_thetas(derivative)
- property latent_mean
Property representing the latent mean.
- Returns:
The latent mean or None if it has not yet been initialized.
- Return type:
torch.Tensor or None
- property latent_parameters
Property representing the latent parameters.
- Returns:
The dictionary of latent parameters.
- Return type:
dict
- property latent_prob
- property latent_sqrt_var
Property representing the square root of the latent variance.
- Returns:
The square root of the latent variance or None.
- Return type:
torch.Tensor or None
- property latent_variables: (<class 'torch.Tensor'>, <class 'torch.Tensor'>)
Property representing the latent variables. Two latent variables are available if exog is not None.
- Returns:
The latent variables of a classic Pln model (size (n_samples, dim)) and zero inflated latent variables of size (n_samples, dim).
- Return type:
tuple(torch.Tensor, torch.Tensor)
Examples
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog, labels = get_real_count_data(return_labels=True)
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> latent_mean, latent_inflated = zi.latent_variables
>>> print(latent_mean.shape)
>>> print(latent_inflated.shape)
- property latent_variance: Tensor
Property representing the latent variance.
- Returns:
The latent variance tensor.
- Return type:
torch.Tensor
- property loglike
Property representing the log-likelihood.
- Returns:
The log-likelihood.
- Return type:
float
- property model_parameters: Dict[str, Tensor]
Property representing the model parameters.
- Returns:
The dictionary of model parameters.
- Return type:
dict
- property n_samples: int
The number of samples, i.e. the first dimension of the endog.
- Returns:
The number of samples.
- Return type:
int
- property nb_batches
- property nb_cov: int
The number of exog.
- Returns:
The number of exog.
- Return type:
int
- property nb_iteration_done: int
The number of iterations done.
- Returns:
The number of iterations done.
- Return type:
int
- property number_of_parameters
Number of parameters of the model.
- property offsets
Property representing the offsets.
- Returns:
The offsets or None.
- Return type:
torch.Tensor or None
- property optim_parameters
Property representing the optimization parameters.
- Returns:
The dictionary of optimization parameters.
- Return type:
dict
- pca_projected_latent_variables(n_components: int | None = None)
Perform PCA on the latent variables and project them onto a lower-dimensional space.
- Parameters:
n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.
- Returns:
The projected latent variables.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the number of components asked is greater than the number of dimensions.
- plot_expected_vs_true(ax=None, colors=None)
Plot the predicted value of the endog against the endog.
- Parameters:
ax (Optional[matplotlib.axes.Axes], optional) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[Any], optional) – The colors to use for plotting, by default None.
- Returns:
matplotlib.axes.Axes – The matplotlib axis.
Examples
>>> import matplotlib.pyplot as plt
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog, labels = get_real_count_data(return_labels=True)
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> zi.plot_expected_vs_true()
>>> plt.show()
>>> zi.plot_expected_vs_true(colors=labels)
>>> plt.show()
- plot_pca_correlation_graph(variables_names, indices_of_variables=None)
Visualizes variables using PCA and plots a correlation graph.
- Parameters:
variables_names (List[str]) – A list of variable names to visualize.
indices_of_variables (Optional[List[int]], optional) – A list of indices corresponding to the variables. If None, indices are determined based on column_endog, by default None
- Raises:
ValueError – If indices_of_variables is None and column_endog is not set.
ValueError – If the length of indices_of_variables is different from the length of variables_names.
- Return type:
None
- predict(exog: Tensor | ndarray | DataFrame | None = None)
Method for making predictions.
- Parameters:
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional) – The exog, by default None.
- Returns:
The predicted values or None.
- Return type:
torch.Tensor or None
- Raises:
AttributeError – If there are no exog in the model but some are provided.
RuntimeError – If the shape of the exog is incorrect.
Notes
- If exog is not provided and there are no exog in the model, None is returned.
If there are exog in the model, then the mean exog @ coef is returned.
If exog is provided, it should have the shape (_, nb_cov), where nb_cov is the number of exog.
The predicted values are obtained by multiplying the exog by the coefficients.
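The notes above describe a matrix product with a shape requirement. A minimal pure-Python sketch follows, using a hypothetical helper and toy coefficients, and assuming coef has shape (nb_cov, dim).

```python
def linear_predict(exog, coef):
    """Return exog @ coef for list-of-lists matrices, checking shapes first."""
    nb_cov = len(coef)
    if any(len(row) != nb_cov for row in exog):
        raise RuntimeError(f"exog must have shape (_, {nb_cov})")
    dim = len(coef[0])
    return [
        [sum(row[k] * coef[k][j] for k in range(nb_cov)) for j in range(dim)]
        for row in exog
    ]


coef = [[0.1, 0.2, 0.3], [1.0, 0.0, -1.0]]  # assumed shape (nb_cov=2, dim=3)
exog = [[1.0, 2.0], [1.0, 0.0]]             # shape (2 new samples, nb_cov=2)
prediction = linear_predict(exog, coef)
print(prediction)
```

Each output row has one predicted value per variable, so the result has shape (number of rows in exog, dim).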
- predict_prob_inflation(exog: Tensor | ndarray | DataFrame)
Method for estimating the probability of a zero coming from the zero inflated component.
- Parameters:
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The exog.
- Returns:
The predicted values.
- Return type:
torch.Tensor
- Raises:
RuntimeError – If the shape of the exog is incorrect.
Notes
The mean sigmoid(exog @ coef_inflation) is returned.
exog should have the shape (_, nb_cov), where nb_cov is the number of exog variables.
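The quantity sigmoid(exog @ coef_inflation) from the notes can be sketched in pure Python as follows. The helper names and toy coefficients are hypothetical, and coef_inflation is assumed to have shape (nb_cov, dim).

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inflation_probability(exog, coef_inflation):
    """Return sigmoid(exog @ coef_inflation) for list-of-lists matrices."""
    nb_cov = len(coef_inflation)
    dim = len(coef_inflation[0])
    return [
        [
            sigmoid(sum(row[k] * coef_inflation[k][j] for k in range(nb_cov)))
            for j in range(dim)
        ]
        for row in exog
    ]


coef_inflation = [[0.0, 2.0]]  # assumed shape (nb_cov=1, dim=2)
exog = [[1.0], [0.0]]          # two samples, one covariate
prob = inflation_probability(exog, coef_inflation)
print(prob)
```

A linear predictor of zero yields a probability of 0.5, and larger values push the zero-inflation probability toward 1.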
- qq_plots()
- save(path: str | None = None)
Save the model parameters to disk.
- Parameters:
path (str, optional) – The path of the directory in which to save the parameters. If None, “./” is used.
- scatter_pca_matrix(n_components=None, color=None)
Generates a scatter matrix plot based on Principal Component Analysis (PCA).
- Parameters:
n_components (int, optional) – The number of components to consider for plotting. If not specified, the maximum number of components will be used. Defaults to None.
color (str, np.ndarray) – An array with one label for each sample in the endog property of the object. Defaults to None.
- Raises:
ValueError – If the number of components requested is greater than the number of variables in the dataset.
- show(axes=None)
Show three plots: the covariance of the model, the stopping criterion with the runtime on the x-axis, and the ELBO.
- Parameters:
axes (numpy.ndarray, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.
- sigma()
Method returning the covariance matrix.
- Returns:
The covariance matrix or None.
- Return type:
torch.Tensor or None
- sk_PCA(n_components=None)
Perform the scikit-learn PCA on the latent variables.
- Parameters:
n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.
- Returns:
sklearn.decomposition.PCA object with all the features from sklearn.
- Return type:
sklearn.decomposition.PCA
- Raises:
ValueError – If the number of components asked is greater than the number of dimensions.
- transform(return_latent_prob=False)
Method for transforming the endog. Can be seen as a normalization of the endog.
- Parameters:
return_latent_prob (bool, optional) – Whether to also return the latent probability of zero inflation. Defaults to False.
- Return type:
The latent mean if return_latent_prob is False, otherwise the tuple (latent_mean, latent_prob).
- viz(*, ax=None, colors=None, show_cov: bool = False)
Visualize the latent variables with a classic PCA.
- Parameters:
ax (Optional[matplotlib.axes.Axes], optional(keyword-only)) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[np.ndarray], optional(keyword-only)) – The colors to use for plotting, by default None.
show_cov (bool, optional(keyword-only)) – If True, display ellipses showing the corresponding covariances. Defaults to False.
- Raises:
RuntimeError – If the rank is less than 2.
- Returns:
The matplotlib axis.
- Return type:
Any