ZIPln

class pyPLNmodels.ZIPln(endog: Tensor | ndarray | DataFrame | None, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, add_const: bool = True, use_closed_form_prob: bool = False)

Bases: _model

property AIC

Property representing the Akaike Information Criterion (AIC).

Returns:

The AIC value.

Return type:

float

property BIC

Property representing the Bayesian Information Criterion (BIC).

Returns:

The BIC value.

Return type:

float
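The sketch below shows how criteria of this kind combine a log-likelihood, a parameter count, and a sample size. It uses the textbook definitions; the sign and scaling convention used by pyPLNmodels may differ, so treat the function names and numeric inputs as illustrative assumptions rather than the library's implementation.

```python
import math


def aic(loglike: float, n_params: int) -> float:
    """Textbook AIC: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglike


def bic(loglike: float, n_params: int, n_samples: int) -> float:
    """Textbook BIC: k log(n) - 2 log L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * loglike


# Hypothetical values standing in for loglike, number_of_parameters and n_samples.
print(aic(-1200.0, 30))
print(bic(-1200.0, 30, 200))
```

Under either convention, the criteria are only comparable between models fitted on the same endog.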

__init__(endog: Tensor | ndarray | DataFrame | None, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, add_const: bool = True, use_closed_form_prob: bool = False)

Initializes the ZIPln class.

Parameters:
  • endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.

  • exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data. Defaults to None.

  • offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.

  • offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”. Overridden if offsets is not None.

  • dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.

  • take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.

  • add_const (bool, optional(keyword-only)) – Whether to add a column of ones to the exog. Defaults to True. If exog is None, add_const is set to True anyway and a warning is issued.

  • use_closed_form_prob (bool, optional) – Whether to use the closed form for the latent probability. Defaults to False.

Raises:

ValueError – If the batch_size is greater than the number of samples, or not int.

Return type:

A ZIPln object
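To make the offsets_formula parameter more concrete, here is a minimal sketch of what a “logsum” offset could look like. It assumes that “logsum” means the log of each sample's total count, repeated across the columns of that sample; this reading and the helper name logsum_offsets are assumptions, not taken from the library source. Plain Python lists are used so the sketch runs without torch.

```python
import math


def logsum_offsets(endog):
    """Return one offset per entry: the log of the total count of its row."""
    offsets = []
    for row in endog:
        total = sum(row)
        if total <= 0:
            raise ValueError("each sample needs a positive total count")
        # Same offset for every variable of the sample.
        offsets.append([math.log(total)] * len(row))
    return offsets


counts = [[1, 2, 3], [0, 10, 0]]
offsets = logsum_offsets(counts)
print(offsets)
```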

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> print(zi)

property batch_size: int

The batch size of the model. Should not be greater than the number of samples.

property closed_formula_latent_prob

The closed form for the latent probability.

property coef

Property representing the coefficients.

Returns:

The coefficients or None.

Return type:

torch.Tensor or None

property coef_inflation

Property representing the coefficients of the inflation.

Returns:

The coefficients or None.

Return type:

torch.Tensor or None

property components: Tensor

Property representing the components.

Returns:

The components.

Return type:

torch.Tensor

compute_elbo()

Compute the Evidence Lower BOund (ELBO) that will be maximized by PyTorch.

Returns:

The computed ELBO.

Return type:

torch.Tensor

property covariance: Tensor

Property representing the covariance of the latent variables.

Returns:

The covariance tensor or None if components are not present.

Return type:

Optional[torch.Tensor]
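The return value above ties the covariance to the components. A common parametrization, assumed here and not confirmed by this page, is covariance = components @ components.T. The sketch below illustrates that relation with plain Python lists; covariance_from_components is a hypothetical helper.

```python
def covariance_from_components(components):
    """Compute C @ C^T for a (dim, rank) matrix C given as a list of rows."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in components]
        for row_i in components
    ]


# Hypothetical components with dim = 3 and rank = 2.
C = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
sigma = covariance_from_components(C)
print(sigma)
```

A matrix built this way is symmetric and positive semi-definite by construction.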

property dict_data

Property representing the data dictionary.

Returns:

The dictionary of data.

Return type:

dict

property dim: int

The second dimension of the endog.

Returns:

The second dimension of the endog.

Return type:

int

display_covariance(ax=None, savefig=False, name_file='')

Display the covariance matrix.

Parameters:
  • ax (matplotlib.axes.Axes, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.

  • savefig (bool, optional) – Whether to save the figure. Defaults to False.

  • name_file (str, optional) – The name of the file to save. Defaults to “”.

property endog

Property representing the endog.

Returns:

The endog or None.

Return type:

torch.Tensor or None

property exog

Property representing the exog.

Returns:

The exog or None.

Return type:

torch.Tensor or None

fit(nb_max_iteration: int = 50000, *, lr: float = 0.01, tol: float = 0.001, do_smart_init: bool = True, verbose: bool = False, batch_size: int | None = None)

Fit the model. The lower the tolerance tol, the more accurate the fit.

Parameters:
  • nb_max_iteration (int, optional) – The maximum number of iterations. Defaults to 50000.

  • lr (float, optional(keyword-only)) – The learning rate. Defaults to 0.01.

  • tol (float, optional(keyword-only)) – The tolerance for convergence. Defaults to 0.001, as in the signature above.

  • do_smart_init (bool, optional(keyword-only)) – Whether to perform smart initialization. Defaults to True.

  • verbose (bool, optional(keyword-only)) – Whether to print training progress. Defaults to False.

  • batch_size (int, optional(keyword-only)) – The batch size when optimizing the elbo. If None, batch gradient descent will be performed (i.e. batch_size = n_samples).

Raises:

ValueError – If the batch_size is greater than the number of samples, or not int.
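The following sketch mirrors the batch_size rules described above: None means full-batch gradient descent, and a non-integer or too-large value raises a ValueError. The helper names, the extra lower-bound guard, and the rounding used for the number of batches are assumptions made for illustration.

```python
import math


def check_batch_size(batch_size, n_samples):
    """Return the effective batch size, or raise ValueError as documented."""
    if batch_size is None:
        return n_samples  # batch gradient descent: one batch with all samples
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise ValueError(f"batch_size should be an int, got {type(batch_size).__name__}")
    if batch_size < 1:
        # Extra guard, not stated in the documentation.
        raise ValueError("batch_size should be at least 1")
    if batch_size > n_samples:
        raise ValueError(f"batch_size ({batch_size}) is greater than n_samples ({n_samples})")
    return batch_size


def number_of_batches(batch_size, n_samples):
    """Assumes the last batch may be smaller than the others."""
    return math.ceil(n_samples / batch_size)


effective = check_batch_size(32, 100)
print(effective, number_of_batches(effective, 100))
```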

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> print(zi)

property fitted: bool

Whether the model is fitted.

Returns:

True if the model is fitted, False otherwise.

Return type:

bool

classmethod from_formula(formula: str, data: Dict[str, Tensor | ndarray | DataFrame], *, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, use_closed_form_prob: bool = False)

Create a ZIPln instance from a formula and data.

Parameters:
  • formula (str) – The formula.

  • data (dict) – The data dictionary. Each value can be a torch.Tensor, a np.ndarray or a pd.DataFrame.

  • offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”.

  • dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.

  • take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.

  • use_closed_form_prob (bool, optional) – Whether to use the closed form for the latent probability. Defaults to False.

Return type:

A ZIPln object

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog = get_real_count_data()
>>> data = {"endog": endog}
>>> zi = ZIPln.from_formula("endog ~ 1", data=data)

grad_C()

grad_M()

grad_S()

grad_rho()

grad_theta()

grad_theta_0()

gradients_closed_form_thetas(derivative)

property latent_mean

Property representing the latent mean.

Returns:

The latent mean or None if it has not yet been initialized.

Return type:

torch.Tensor or None

property latent_parameters

Property representing the latent parameters.

Returns:

The dictionary of latent parameters.

Return type:

dict

property latent_prob

property latent_sqrt_var

Property representing the square root of the latent variance.

Returns:

The square root of the latent variance, or None.

Return type:

torch.Tensor or None

property latent_variables: Tuple[Tensor, Tensor]

Property representing the latent variables. Two latent variables are available if exog is not None.

Returns:

The latent variables of a classic Pln model (size (n_samples, dim)) and zero inflated latent variables of size (n_samples, dim).

Return type:

tuple(torch.Tensor, torch.Tensor)
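To illustrate the role of the two latent variables, the sketch below simulates counts from a zero-inflated Poisson log-normal scheme: a Gaussian latent variable drives the Poisson rate, and a Bernoulli latent variable forces some entries to zero. It is a simplified reading of the model, assuming independent Gaussian coordinates and no offsets; all function names are hypothetical and nothing here reproduces the library's internals.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_zipln(latent_mean, latent_std, prob_inflation, rng):
    """Return (counts, gaussian_latent, bernoulli_latent) as lists of rows."""
    counts, gaussian, bernoulli = [], [], []
    for means, probs in zip(latent_mean, prob_inflation):
        z_row = [rng.gauss(m, latent_std) for m in means]
        w_row = [1 if rng.random() < p else 0 for p in probs]
        # A zero-inflated entry is forced to 0; otherwise draw Poisson(exp(z)).
        y_row = [0 if w else sample_poisson(math.exp(z), rng)
                 for z, w in zip(z_row, w_row)]
        counts.append(y_row)
        gaussian.append(z_row)
        bernoulli.append(w_row)
    return counts, gaussian, bernoulli


rng = random.Random(0)
mean = [[1.0, 0.5, 0.0]] * 4   # (n_samples=4, dim=3)
prob = [[0.3, 0.3, 0.3]] * 4   # probability of zero inflation per entry
counts, z, w = sample_zipln(mean, 0.5, prob, rng)
print(counts)
```

Both latent arrays have the same shape as the counts, which matches the (n_samples, dim) sizes stated above.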

Examples

>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog, labels = get_real_count_data(return_labels=True)
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> latent_mean, latent_inflated = zi.latent_variables
>>> print(latent_mean.shape)
>>> print(latent_inflated.shape)

property latent_variance: Tensor

Property representing the latent variance.

Returns:

The latent variance tensor.

Return type:

torch.Tensor

property loglike

Property representing the log-likelihood.

Returns:

The log-likelihood.

Return type:

float

property model_parameters: Dict[str, Tensor]

Property representing the model parameters.

Returns:

The dictionary of model parameters.

Return type:

dict

property n_samples: int

The number of samples, i.e. the first dimension of the endog.

Returns:

The number of samples.

Return type:

int

property nb_batches

property nb_cov: int

The number of covariates, i.e. the number of columns of the exog.

Returns:

The number of covariates.

Return type:

int

property nb_iteration_done: int

The number of iterations done.

Returns:

The number of iterations done.

Return type:

int

property number_of_parameters

Number of parameters of the model.

property offsets

Property representing the offsets.

Returns:

The offsets or None.

Return type:

torch.Tensor or None

property optim_parameters

Property representing the optimization parameters.

Returns:

The dictionary of optimization parameters.

Return type:

dict

pca_projected_latent_variables(n_components: int | None = None)

Perform PCA on the latent variables and project them onto a lower-dimensional space.

Parameters:

n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.

Returns:

The projected latent variables.

Return type:

numpy.ndarray

Raises:

ValueError – If the number of components asked is greater than the number of dimensions.

plot_expected_vs_true(ax=None, colors=None)

Plot the predicted value of the endog against the endog.

Parameters:
  • ax (Optional[matplotlib.axes.Axes], optional) – The matplotlib axis to use. If None, the current axis is used, by default None.

  • colors (Optional[Any], optional) – The colors to use for plotting, by default None.

Returns:

The matplotlib axis.

Return type:

matplotlib.axes.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from pyPLNmodels import ZIPln, get_real_count_data
>>> endog, labels = get_real_count_data(return_labels=True)
>>> zi = ZIPln(endog, add_const=True)
>>> zi.fit()
>>> zi.plot_expected_vs_true()
>>> plt.show()
>>> zi.plot_expected_vs_true(colors=labels)
>>> plt.show()

plot_pca_correlation_graph(variables_names, indices_of_variables=None)

Visualizes variables using PCA and plots a correlation graph.

Parameters:
  • variables_names (List[str]) – A list of variable names to visualize.

  • indices_of_variables (Optional[List[int]], optional) – A list of indices corresponding to the variables. If None, indices are determined based on column_endog, by default None

Raises:
  • ValueError – If indices_of_variables is None and column_endog is not set.

  • ValueError – If the length of indices_of_variables is different from the length of variables_names.

Return type:

None

predict(exog: Tensor | ndarray | DataFrame | None = None)

Method for making predictions.

Parameters:

exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional) – The exog, by default None.

Returns:

The predicted values or None.

Return type:

torch.Tensor or None

Raises:
  • AttributeError – If there are no exog in the model but some are provided.

  • RuntimeError – If the shape of the exog is incorrect.

Notes

  • If exog is not provided and there are no exog in the model, None is returned.

    If there are exog in the model, then the mean exog @ coef is returned.

  • If exog is provided, it should have the shape (_, nb_cov), where nb_cov is the number of exog.

  • The predicted values are obtained by multiplying the exog by the coefficients.
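The notes above amount to a shape check followed by a matrix product. Here is a minimal sketch with plain Python lists, assuming coef has shape (nb_cov, dim); predict_mean is a hypothetical stand-in and not the library method.

```python
def predict_mean(exog, coef):
    """Return exog @ coef, with exog of shape (_, nb_cov) and coef (nb_cov, dim)."""
    nb_cov = len(coef)
    if any(len(row) != nb_cov for row in exog):
        raise RuntimeError(f"exog should have shape (_, {nb_cov})")
    # zip(*coef) iterates over the columns of coef.
    return [[sum(x * c for x, c in zip(row, col)) for col in zip(*coef)]
            for row in exog]


exog = [[1.0, 0.5], [1.0, -1.0]]              # intercept column plus one covariate
coef = [[0.2, 1.0, -0.3], [2.0, 0.0, 1.0]]    # nb_cov = 2, dim = 3
pred = predict_mean(exog, coef)
print(pred)
```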

predict_prob_inflation(exog: Tensor | ndarray | DataFrame)

Method for estimating the probability of a zero coming from the zero inflated component.

Parameters:

exog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The exog.

Returns:

The predicted values.

Return type:

torch.Tensor

Raises:

RuntimeError – If the shape of the exog is incorrect.

Notes

  • The mean sigmoid(exog @ coef_inflation) is returned.

  • exog should have the shape (_, nb_cov), where nb_cov is the number of exog variables.
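The first note can be sketched as an element-wise sigmoid applied to exog @ coef_inflation. The example assumes coef_inflation has shape (nb_cov, dim); prob_inflation is a hypothetical helper written with plain Python lists, not the library method.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def prob_inflation(exog, coef_inflation):
    """Return sigmoid(exog @ coef_inflation), one probability per entry."""
    nb_cov = len(coef_inflation)
    if any(len(row) != nb_cov for row in exog):
        raise RuntimeError(f"exog should have shape (_, {nb_cov})")
    return [[sigmoid(sum(x * c for x, c in zip(row, col)))
             for col in zip(*coef_inflation)]
            for row in exog]


exog = [[1.0, 0.0], [1.0, 2.0]]
coef_inflation = [[0.0, -1.0], [1.0, 0.5]]   # nb_cov = 2, dim = 2
probs = prob_inflation(exog, coef_inflation)
print(probs)
```

Every entry lies strictly between 0 and 1, and a logit of 0 gives a probability of 0.5.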

qq_plots()

save(path: str | None = None)

Save the model parameters to disk.

Parameters:

path (str, optional) – The path of the directory to save the parameters, by default “./”.

scatter_pca_matrix(n_components=None, color=None)

Generates a scatter matrix plot based on Principal Component Analysis (PCA).

Parameters:
  • n_components (int, optional) – The number of components to consider for plotting. If not specified, the maximum number of components will be used. Defaults to None.

  • color (str, np.ndarray) – An array with one label for each sample in the endog property of the object. Defaults to None.

Raises:

ValueError – If the number of components requested is greater than the number of variables in the dataset.

show(axes=None)

Show three plots. The first is the covariance of the model. The second is the stopping criterion, with the runtime on the x-axis. The third is the ELBO.

Parameters:

axes (numpy.ndarray, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.

sigma()

Method returning the covariance matrix.

Returns:

The covariance matrix or None.

Return type:

torch.Tensor or None

sk_PCA(n_components=None)

Perform the scikit-learn PCA on the latent variables.

Parameters:

n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.

Returns:

sklearn.decomposition.PCA object with all the features from sklearn.

Return type:

sklearn.decomposition.PCA

Raises:

ValueError – If the number of components asked is greater than the number of dimensions.

transform(return_latent_prob=False)

Method for transforming the endog. Can be seen as a normalization of the endog.

Parameters:

return_latent_prob (bool, optional) – Whether to also return the latent probability of zero inflation. Defaults to False.

Return type:

The latent mean if return_latent_prob is False, and the tuple (latent_mean, latent_prob) otherwise.

viz(*, ax=None, colors=None, show_cov: bool = False)

Visualize the latent variables with a classic PCA.

Parameters:
  • ax (Optional[matplotlib.axes.Axes], optional(keyword-only)) – The matplotlib axis to use. If None, the current axis is used, by default None.

  • colors (Optional[np.ndarray], optional(keyword-only)) – The colors to use for plotting, by default None.

  • show_cov (bool, optional(keyword-only)) – If True, display ellipses representing the corresponding covariances. Defaults to False.

Raises:

RuntimeError – If the rank is less than 2.

Returns:

The matplotlib axis.

Return type:

Any