ZIPln

Bases: _model

property AIC

Property representing the Akaike Information Criterion (AIC).

Returns:: The AIC value.
Return type:: float

property BIC

Property representing the Bayesian Information Criterion (BIC).

Returns:: The BIC value.
Return type:: float
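
As a point of reference, both criteria can be written down from their textbook definitions. The sketch below assumes the conventional forms AIC = 2k - 2·loglike and BIC = k·ln(n) - 2·loglike, with k the number of parameters and n the number of samples. The sign and scaling convention ZIPln uses internally may differ, and the helper names `aic` and `bic` are hypothetical, not part of the library.

```python
import math


def aic(loglike: float, n_params: int) -> float:
    """Textbook AIC: 2k - 2*loglike (assumed convention, not confirmed for ZIPln)."""
    return 2 * n_params - 2 * loglike


def bic(loglike: float, n_params: int, n_samples: int) -> float:
    """Textbook BIC: k*ln(n) - 2*loglike (assumed convention, not confirmed for ZIPln)."""
    return n_params * math.log(n_samples) - 2 * loglike


# Illustrative numbers only; they do not come from a fitted model.
example_aic = aic(loglike=-1200.0, n_params=30)
example_bic = bic(loglike=-1200.0, n_params=30, n_samples=200)
```

Under these definitions BIC penalizes extra parameters more heavily than AIC as soon as n exceeds about 7 samples, since ln(n) > 2.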

Initializes the ZIPln class.

Parameters:

endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data. Defaults to None.
offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.
offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”. Overridden if offsets is not None.
dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.
add_const (bool, optional(keyword-only)) – Whether to add a column of ones to the exog. Defaults to True. If exog is None, add_const is set to True anyway and a warning is issued.
use_closed_form_prob (bool, optional) – Whether to use the closed-form expression for the latent probability. Defaults to False.

Raises:

ValueError – If the batch_size is greater than the number of samples, or not int.

Return type:

A ZIPln object

See also

pyPLNmodels.ZIPln.from_formula()
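
The default `offsets_formula` of “logsum” can be pictured as follows. This is a minimal sketch assuming that “logsum” means the log of each sample's total count, repeated across all columns; the helper `logsum_offsets` is hypothetical and works on plain nested lists rather than tensors.

```python
import math


def logsum_offsets(endog):
    """Return offsets with the same shape as endog.

    Assumption: "logsum" is the log of each row's total count,
    repeated for every column of that row.
    """
    offsets = []
    for row in endog:
        total = sum(row)
        if total <= 0:
            # The log is undefined for an all-zero sample.
            raise ValueError("each sample needs a positive total count")
        offsets.append([math.log(total)] * len(row))
    return offsets


counts = [[1, 0, 3], [10, 5, 5]]
offsets = logsum_offsets(counts)
```

An offset of this kind accounts for differences in sampling effort between samples, so a sample with twice the total count is not read as having twice the abundance everywhere.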

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> print(zi)

property batch_size: int: The batch size of the model. Should not be greater than the number of samples.

property closed_formula_latent_prob: The closed form for the latent probability.

property coef

Property representing the coefficients.

Returns:: The coefficients or None.
Return type:: torch.Tensor or None

property coef_inflation

Property representing the coefficients of the inflation.

Returns:: The coefficients or None.
Return type:: torch.Tensor or None

property components: Tensor

Property representing the components.

Returns:: The components.
Return type:: torch.Tensor

compute_elbo()

Compute the Evidence Lower BOund (ELBO) that will be maximized by PyTorch.

Returns:: The computed ELBO.
Return type:: torch.Tensor

property covariance: Tensor

Property representing the covariance of the latent variables.

Returns:: The covariance tensor or None if components are not present.
Return type:: Optional[torch.Tensor]
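
The link between `components` and `covariance` can be illustrated as below. The sketch assumes the usual PLN parameterization in which the covariance is the components matrix C multiplied by its transpose; that relation is an assumption here, not something this page states, and `covariance_from_components` is a hypothetical helper on nested lists.

```python
def covariance_from_components(components):
    """Return C @ C.T for a (dim, rank) matrix, or None if C is missing.

    Assumption: covariance = components @ components.T.
    """
    if components is None:
        return None
    dim = len(components)
    return [
        [
            sum(a * b for a, b in zip(components[i], components[j]))
            for j in range(dim)
        ]
        for i in range(dim)
    ]


C = [[1.0, 0.0], [2.0, 1.0]]
sigma = covariance_from_components(C)
```

A matrix built this way is symmetric by construction, which is what a covariance matrix requires.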

property dict_data

Property representing the data dictionary.

Returns:: The dictionary of data.
Return type:: dict

property dim: int

The second dimension of the endog.

Returns:: The second dimension of the endog.
Return type:: int

display_covariance(ax=None, savefig=False, name_file='')

Display the covariance matrix.

Parameters:

ax (matplotlib.axes.Axes, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.
savefig (bool, optional) – Whether to save the figure. Defaults to False.
name_file (str, optional) – The name of the file to save. Defaults to “”.

property endog

Property representing the endog.

Returns:: The endog or None.
Return type:: torch.Tensor or None

property exog

Property representing the exog.

Returns:: The exog or None.
Return type:: torch.Tensor or None

fit(nb_max_iteration: int = 50000, *, lr: float = 0.01, tol: float = 0.001, do_smart_init: bool = True, verbose: bool = False, batch_size: int | None = None)

Fit the model. The lower the tol, the more accurate the fit, typically at the cost of more iterations.

Parameters:

nb_max_iteration (int, optional) – The maximum number of iterations. Defaults to 50000.
lr (float, optional(keyword-only)) – The learning rate. Defaults to 0.01.
tol (float, optional(keyword-only)) – The tolerance for convergence. Defaults to 1e-3.
do_smart_init (bool, optional(keyword-only)) – Whether to perform smart initialization. Defaults to True.
verbose (bool, optional(keyword-only)) – Whether to print training progress. Defaults to False.
batch_size (int, optional(keyword-only)) – The batch size when optimizing the elbo. If None, batch gradient descent will be performed (i.e. batch_size = n_samples).

Raises:

ValueError – If the batch_size is greater than the number of samples, or not int.
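
The `batch_size` rules above can be sketched as a small validation step. Both helpers below are hypothetical and only mirror the documented behaviour (None means full-batch, a non-int or too-large value raises ValueError); the positivity check and the ceiling-based batch count are assumptions added to keep the sketch safe.

```python
import math


def check_batch_size(batch_size, n_samples):
    """Return the effective batch size, mirroring the documented ValueError cases."""
    if batch_size is None:
        # Documented default: batch gradient descent on all samples.
        return n_samples
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise ValueError("batch_size must be an int")
    if batch_size < 1:
        # Not documented above; added so the division below is always valid.
        raise ValueError("batch_size must be positive")
    if batch_size > n_samples:
        raise ValueError("batch_size cannot exceed the number of samples")
    return batch_size


def number_of_batches(batch_size, n_samples):
    """Batches per pass over the data, counting a smaller final batch (assumption)."""
    return math.ceil(n_samples / batch_size)


effective = check_batch_size(None, 100)
batches = number_of_batches(32, 100)
```

With 100 samples and a batch size of 32, this gives three full batches and one final batch of 4 samples.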

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> print(zi)

property fitted: bool

Whether the model is fitted.

Returns:: True if the model is fitted, False otherwise.
Return type:: bool

classmethod from_formula(formula: str, data: Dict[str, Tensor | ndarray | DataFrame], *, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, use_closed_form_prob: bool = False)

Create a ZIPln instance from a formula and data.

Parameters:

formula (str) – The formula.
data (dict) – The data dictionary. Each value can be either a torch.Tensor, a np.ndarray or pd.DataFrame
offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”.
dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.
use_closed_form_prob (bool, optional) – Whether to use the closed-form expression for the latent probability. Defaults to False.

Return type:

A ZIPln object

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> data = {"endog": endog}
>>> zi = ZIPln.from_formula("endog ~ 1", data = data)

grad_C()

grad_M()

grad_S()

grad_rho()

grad_theta()

grad_theta_0()

gradients_closed_form_thetas(derivative)

property latent_mean

Property representing the latent mean.

Returns:: The latent mean or None if it has not yet been initialized.
Return type:: torch.Tensor or None

property latent_parameters

Property representing the latent parameters.

Returns:: The dictionary of latent parameters.
Return type:: dict

property latent_prob

property latent_sqrt_var

Property representing the square root of the latent variance.

Returns:: The square root of the latent variance or None.
Return type:: torch.Tensor or None

property latent_variables: (<class 'torch.Tensor'>, <class 'torch.Tensor'>)

Property representing the latent variables. Two latent variables are available if exog is not None.

Returns:: The latent variables of a classic Pln model (size (n_samples, dim)) and zero inflated latent variables of size (n_samples, dim).
Return type:: tuple(torch.Tensor, torch.Tensor)

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog, labels = get_real_count_data(return_labels = True)
>>> zi = ZIPln(endog,add_const = True)
>>> zi.fit()
>>> latent_mean, latent_inflated = zi.latent_variables
>>> print(latent_mean.shape)
>>> print(latent_inflated.shape)

property latent_variance: Tensor

Property representing the latent variance.

Returns:: The latent variance tensor.
Return type:: torch.Tensor

property loglike

Property representing the log-likelihood.

Returns:: The log-likelihood.
Return type:: float

property model_parameters: Dict[str, Tensor]

Property representing the model parameters.

Returns:: The dictionary of model parameters.
Return type:: dict

property n_samples: int

The number of samples, i.e. the first dimension of the endog.

Returns:: The number of samples.
Return type:: int

property nb_batches

property nb_cov: int

The number of exog.

Returns:: The number of exog.
Return type:: int

property nb_iteration_done: int

The number of iterations done.

Returns:: The number of iterations done.
Return type:: int

property number_of_parameters: Number of parameters of the model.

property offsets

Property representing the offsets.

Returns:: The offsets or None.
Return type:: torch.Tensor or None

property optim_parameters

Property representing the optimization parameters.

Returns:: The dictionary of optimization parameters.
Return type:: dict

pca_projected_latent_variables(n_components: int | None = None)

Perform PCA on the latent variables and project them onto a lower-dimensional space.

Parameters:: n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.
Returns:: The projected latent variables.
Return type:: numpy.ndarray
Raises:: ValueError – If the number of components asked is greater than the number of dimensions.

plot_expected_vs_true(ax=None, colors=None)

Plot the predicted value of the endog against the endog.

Parameters:

ax (Optional[matplotlib.axes.Axes], optional) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[Any], optional) – The colors to use for plotting, by default None.

Returns:

matplotlib.axes.Axes – The matplotlib axis.

Examples

>>> import matplotlib.pyplot as plt
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog, labels = get_real_count_data(return_labels = True)
>>> zi = ZIPln(endog,add_const = True)
>>> zi.fit()
>>> zi.plot_expected_vs_true()
>>> plt.show()
>>> zi.plot_expected_vs_true(colors = labels)
>>> plt.show()

plot_pca_correlation_graph(variables_names, indices_of_variables=None)

Visualizes variables using PCA and plots a correlation graph.

Parameters:

variables_names (List[str]) – A list of variable names to visualize.
indices_of_variables (Optional[List[int]], optional) – A list of indices corresponding to the variables. If None, indices are determined based on column_endog, by default None

Raises:

ValueError – If indices_of_variables is None and column_endog is not set.
ValueError – If the length of indices_of_variables is different from the length of variables_names.

Return type:

None

predict(exog: Tensor | ndarray | DataFrame | None = None)

Method for making predictions.

Parameters:

exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional) – The exog, by default None.

Returns:

The predicted values or None.

Return type:

torch.Tensor or None

Raises:

AttributeError – If there are no exog in the model but some are provided.
RuntimeError – If the shape of the exog is incorrect.

Notes

If exog is not provided and there are no exog in the model, None is returned.
If there are exog in the model, then the mean exog @ coef is returned.
If exog is provided, it should have the shape (_, nb_cov), where nb_cov is the number of exog.
The predicted values are obtained by multiplying the exog by the coefficients.
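
The rule in these notes amounts to a matrix product with a shape check. The sketch below uses plain nested lists and a hypothetical helper `predict_sketch`; it assumes `coef` has shape (nb_cov, dim) and is not the library's implementation.

```python
def predict_sketch(exog, coef):
    """Compute exog @ coef for exog of shape (_, nb_cov) and coef of shape (nb_cov, dim)."""
    nb_cov = len(coef)
    predictions = []
    for row in exog:
        if len(row) != nb_cov:
            # Mirrors the documented RuntimeError on a wrong exog shape.
            raise RuntimeError("exog must have shape (_, nb_cov)")
        # zip(*coef) iterates over the columns of coef.
        predictions.append(
            [sum(x * c for x, c in zip(row, col)) for col in zip(*coef)]
        )
    return predictions


exog = [[1.0, 2.0], [1.0, 0.5]]
coef = [[0.5, -1.0, 2.0], [1.0, 0.0, 0.5]]
predicted = predict_sketch(exog, coef)
```

Each row of the result holds one predicted value per column of the endog, so two samples with three variables give a 2 by 3 result.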

predict_prob_inflation(exog: Tensor | ndarray | DataFrame)

Method for estimating the probability of a zero coming from the zero inflated component.

Parameters:: exog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The exog.
Returns:: The predicted values.
Return type:: torch.Tensor
Raises:: RuntimeError – If the shape of the exog is incorrect.

Notes

The mean sigmoid(exog @ coef_inflation) is returned.
exog should have the shape (_, nb_cov), where nb_cov is the number of exog variables.
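
The formula sigmoid(exog @ coef_inflation) can be sketched in the same spirit. The helper names are hypothetical, the inputs are plain nested lists, and `coef_inflation` is assumed to have shape (nb_cov, dim).

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prob_inflation_sketch(exog, coef_inflation):
    """Apply sigmoid elementwise to exog @ coef_inflation."""
    nb_cov = len(coef_inflation)
    probs = []
    for row in exog:
        if len(row) != nb_cov:
            # Mirrors the documented RuntimeError on a wrong exog shape.
            raise RuntimeError("exog must have shape (_, nb_cov)")
        logits = [
            sum(x * c for x, c in zip(row, col)) for col in zip(*coef_inflation)
        ]
        probs.append([sigmoid(v) for v in logits])
    return probs


exog = [[1.0, 0.0], [1.0, 2.0]]
coef_inflation = [[0.0, -1.0], [0.5, 0.5]]
probs = prob_inflation_sketch(exog, coef_inflation)
```

Because the sigmoid maps any real number into (0, 1), every entry of the result can be read as a probability; a linear predictor of 0 corresponds to a probability of 0.5.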

qq_plots()

save(path: str | None = None)

Save the model parameters to disk.

Parameters:: path (str, optional) – The path of the directory to save the parameters, by default “./”.

scatter_pca_matrix(n_components=None, color=None)

Generates a scatter matrix plot based on Principal Component Analysis (PCA).

Parameters:

n_components (int, optional) – The number of components to consider for plotting. If not specified, the maximum number of components will be used. Defaults to None.
color (str, np.ndarray) – An array with one label for each sample in the endog property of the object. Defaults to None.

Raises:

ValueError – If the number of components requested is greater than the number of variables in the dataset.

show(axes=None)

Show three plots: the covariance of the model, the stopping criterion plotted against runtime (on the x-axis), and the ELBO.

Parameters:: axes (numpy.ndarray, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.

sigma()

Method returning the covariance matrix.

Returns:: The covariance matrix or None.
Return type:: torch.Tensor or None

sk_PCA(n_components=None)

Perform the scikit-learn PCA on the latent variables.

Parameters:: n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.
Returns:: sklearn.decomposition.PCA object with all the features from sklearn.
Return type:: sklearn.decomposition.PCA
Raises:: ValueError – If the number of components asked is greater than the number of dimensions.

transform(return_latent_prob=False)

Method for transforming the endog. Can be seen as a normalization of the endog.

Parameters:: return_latent_prob (bool, optional) – Whether to also return the latent probability of zero inflation. Defaults to False.
Return type:: The latent mean if return_latent_prob is False; otherwise the tuple (latent_mean, latent_prob).

viz(*, ax=None, colors=None, show_cov: bool = False)

Visualize the latent variables with a classic PCA.

Parameters:

ax (Optional[matplotlib.axes.Axes], optional(keyword-only)) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[np.ndarray], optional(keyword-only)) – The colors to use for plotting, by default None.
show_cov (bool, optional(keyword-only)) – If True, displays ellipses with the corresponding covariances. Defaults to False.

Raises:

RuntimeError – If the rank is less than 2.

Returns:

The matplotlib axis.

Return type:

Any