PlnPCA
- class pyPLNmodels.PlnPCA(endog: Tensor | ndarray | DataFrame, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', rank: int = 5, dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, add_const: bool = True)
Bases:
_model
PlnPCA object where the covariance has low rank.
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data, get_simulation_parameters, sample_pln >>> endog, labels = get_real_count_data(return_labels = True) >>> data = {"endog": endog} >>> pca = PlnPCA.from_formula("endog ~ 1", data = data, rank = 5) >>> pca.fit() >>> print(pca) >>> pca.viz(colors = labels)
>>> plnparam = get_simulation_parameters(n_samples =100, dim = 60, nb_cov = 2, rank = 8) >>> endog = sample_pln(plnparam) >>> data = {"endog": endog, "cov": plnparam.exog, "offsets": plnparam.offsets} >>> plnpca = PlnPCA.from_formula("endog ~ 0 + cov", data = data, rank = 5) >>> plnpca.fit() >>> print(plnpca)
See also
- property AIC
Property representing the Akaike Information Criterion (AIC).
- Returns:
The AIC value.
- Return type:
float
- property BIC
Property representing the Bayesian Information Criterion (BIC).
- Returns:
The BIC value.
- Return type:
float
- __init__(endog: Tensor | ndarray | DataFrame, *, exog: Tensor | ndarray | DataFrame | None = None, offsets: Tensor | ndarray | DataFrame | None = None, offsets_formula: str = 'logsum', rank: int = 5, dict_initialization: Dict[str, Tensor] | None = None, take_log_offsets: bool = False, add_const: bool = True)
Initializes the model class.
- Parameters:
endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data. Defaults to None.
offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.
offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”. Overriden if offsets is not None.
dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.
add_const (bool, optional(keyword-only)) – Whether to add a column of one in the exog. Defaults to True.
rank (int, optional(keyword-only)) – The rank of the approximation, by default 5.
- Return type:
See also
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog= get_real_count_data() >>> pca = PlnPCA(endog, add_const = True) >>> pca.fit() >>> print(pca)
- property batch_size: int
The batch size of the model. Should not be greater than the number of samples.
- property coef
Property representing the coefficients.
- Returns:
The coefficients or None.
- Return type:
torch.Tensor or None
- property components: Tensor
Property representing the components.
- Returns:
The components.
- Return type:
torch.Tensor
- compute_elbo() Tensor
Compute the Evidence Lower BOund (ELBO) that will be maximized by pytorch.
- Returns:
The computed ELBO.
- Return type:
torch.Tensor
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data(return_labels = False) >>> pca = PlnPCA(endog,add_const = True) >>> pca.fit() >>> elbo = pca.compute_elbo() >>> print("elbo", elbo) >>> print("loglike/n", pca.loglike/pca.n_samples)
- property covariance: Tensor
Property representing the covariance of the latent variables.
- Returns:
The covariance tensor or None if components are not present.
- Return type:
Optional[torch.Tensor]
- property covariance_a_posteriori: Tensor | None
Property representing the covariance a posteriori of the latent variables.
- Returns:
The covariance tensor or None if components are not present.
- Return type:
Optional[torch.Tensor]
- property dict_data
Property representing the data dictionary.
- Returns:
The dictionary of data.
- Return type:
dict
- property dim: int
The second dimension of the endog.
- Returns:
The second dimension of the endog.
- Return type:
int
- display_covariance(ax=None, savefig=False, name_file='')
Display the covariance matrix.
- Parameters:
ax (matplotlib.axes.Axes, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.
savefig (bool, optional) – Whether to save the figure. Defaults to False.
name_file (str, optional) – The name of the file to save. Defaults to “”.
- property endog
Property representing the endog.
- Returns:
The endog or None.
- Return type:
torch.Tensor or None
- property exog: Tensor
Property representing the exog.
- Returns:
The exog tensor.
- Return type:
torch.Tensor
- fit(nb_max_iteration: int = 50000, *, lr: float = 0.01, tol: float = 0.001, do_smart_init: bool = True, verbose: bool = False, batch_size=None)
Fit the model. The lower tol, the more accurate the model.
- Parameters:
nb_max_iteration (int, optional) – The maximum number of iterations. Defaults to 50000.
lr (float, optional(keyword-only)) – The learning rate. Defaults to 0.01.
tol (float, optional(keyword-only)) – The tolerance for convergence. Defaults to 1e-8.
do_smart_init (bool, optional(keyword-only)) – Whether to perform smart initialization. Defaults to True.
verbose (bool, optional(keyword-only)) – Whether to print training progress. Defaults to False.
batch_size (int, optional(keyword-only)) – The batch size when optimizing the elbo. If None, batch gradient descent will be performed (i.e. batch_size = n_samples).
- Raises:
ValueError – If the batch_size is greater than the number of samples, or not int.
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data() >>> plnpca = PlnPCA(endog,add_const = True, rank = 6) >>> plnpca.fit() >>> print(plnpca)
- property fitted: bool
Whether the model is fitted.
- Returns:
True if the model is fitted, False otherwise.
- Return type:
bool
- classmethod from_formula(formula: str, data: Dict[str, Tensor | ndarray | DataFrame], *, rank: int = 5, offsets_formula: str = 'logsum', dict_initialization: Dict[str, Tensor] | None = None)
Create a model instance from a formula and data.
- Parameters:
formula (str) – The formula.
data (dict) – The data dictionary. Each value can be either a torch.Tensor, a np.ndarray or pd.DataFrame
offsets_formula (str, optional(keyword-only)) – The formula for offsets. Defaults to “logsum”.
dict_initialization (dict, optional(keyword-only)) – The initialization dictionary. Defaults to None.
take_log_offsets (bool, optional(keyword-only)) – Whether to take the log of offsets. Defaults to False.
rank (int, optional(keyword-only)) – The rank of the approximation, by default 5.
- Return type:
See also
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data() >>> data = {"endog": endog} >>> pca = PlnPCA.from_formula("endog ~ 1", data = data, rank = 5)
- property latent_mean: Tensor
Property representing the latent mean.
- Returns:
The latent mean or None if it has not yet been initialized.
- Return type:
torch.Tensor or None
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data() >>> data = {"endog": endog} >>> plnpca = PlnPCA.from_formula("endog ~ 1", data = data) >>> plnpca.fit() >>> print(plnpca.latent_mean.shape)
- property latent_parameters
Property representing the latent parameters.
- Returns:
The dictionary of latent parameters.
- Return type:
dict
- property latent_sqrt_var
Property representing the latent variance.
- Returns:
The latent variance or None.
- Return type:
torch.Tensor or None
- property latent_variables: Tensor
Property representing the latent variables.
- Returns:
The latent variables of size (n_samples, dim).
- Return type:
torch.Tensor
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data(return_labels=False) >>> pca = PlnPCA(endog,add_const = True) >>> pca.fit() >>> print(pca.latent_variables.shape)
- property latent_variance: Tensor
Property representing the latent variance.
- Returns:
The latent variance tensor.
- Return type:
torch.Tensor
- property loglike
Property representing the log-likelihood.
- Returns:
The log-likelihood.
- Return type:
float
- property model_parameters: Dict[str, Tensor]
Property representing the model parameters.
- Returns:
The dictionary of model parameters.
- Return type:
dict
- property n_samples: int
The number of samples, i.e. the first dimension of the endog.
- Returns:
The number of samples.
- Return type:
int
- property nb_batches
- property nb_cov: int
The number of exog.
- Returns:
The number of exog.
- Return type:
int
- property nb_iteration_done: int
The number of iterations done.
- Returns:
The number of iterations done.
- Return type:
int
- property number_of_parameters: int
Property representing the number of parameters.
- Returns:
The number of parameters.
- Return type:
int
- property offsets
Property representing the offsets.
- Returns:
The offsets or None.
- Return type:
torch.Tensor or None
- property optim_parameters
Property representing the optimization parameters.
- Returns:
The dictionary of optimization parameters.
- Return type:
dict
- property ortho_components
Orthogonal components of the model.
- pca_projected_latent_variables(n_components: int | None = None)
Perform PCA on the latent variables and project them onto a lower-dimensional space.
- Parameters:
n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.
- Returns:
The projected latent variables.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the number of components asked is greater than the number of dimensions.
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data() >>> data = {"endog": endog} >>> plnpca = PlnPCA.from_formula("endog ~ 1", data = data) >>> plnpca.fit() >>> pca_proj = plnpca.pca_projected_latent_variables() >>> print(pca_proj.shape)
- plot_expected_vs_true(ax=None, colors=None)
Plot the predicted value of the endog against the endog.
- Parameters:
ax (Optional[matplotlib.axes.Axes], optional) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[Any], optional) – The colors to use for plotting, by default None.
- Returns:
matplotlib.axes.Axes – The matplotlib axis.
>>>
Examples
>>> import matplotlib.pyplot as plt >>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog, labels = get_real_count_data(return_labels = True) >>> plnpca = PlnPCA(endog,add_const = True) >>> plnpca.fit() >>> plnpca.plot_expected_vs_true() >>> plt.show() >>> plnpca.plot_expected_vs_true(colors = labels) >>> plt.show()
- plot_pca_correlation_graph(variables_names: List[str], indices_of_variables=None)
Visualizes variables using PCA and plots a correlation graph.
- Parameters:
variables_names (List[str]) – A list of variable names to visualize.
indices_of_variables (Optional[List[int]], optional) – A list of indices corresponding to the variables. If None, indices are determined based on column_endog, by default None
- Raises:
ValueError – If indices_of_variables is None and column_endog is not set.
ValueError – If the length of indices_of_variables is different from the length of variables_names.
- Return type:
None
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data() >>> data = {"endog": endog} >>> plnpca = PlnPCA.from_formula("endog ~ 1", data = data) >>> plnpca.fit() >>> plnpca.plot_pca_correlation_graph(["a","b"], indices_of_variables = [4,8])
- predict(exog: Tensor | ndarray | DataFrame | None = None)
Method for making predictions.
- Parameters:
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional) – The exog, by default None.
- Returns:
The predicted values or None.
- Return type:
torch.Tensor or None
- Raises:
AttributeError – If there are no exog in the model but some are provided.
RuntimeError – If the shape of the exog is incorrect.
Notes
- If exog is not provided and there are no exog in the model, None is returned.
If there are exog in the model, then the mean exog @ coef is returned.
If exog is provided, it should have the shape (_, nb_cov), where nb_cov is the number of exog.
The predicted values are obtained by multiplying the exog by the coefficients.
- property projected_latent_variables: Tensor
Property representing the projected latent variables.
- Returns:
The projected latent variables.
- Return type:
torch.Tensor
- qq_plots()
- property rank: int
Property representing the rank.
- Returns:
The rank.
- Return type:
int
- save(path: str | None = None)
Save the model parameters to disk.
- Parameters:
path (str, optional) – The path of the directory to save the parameters, by default “./”.
- scatter_pca_matrix(n_components=None, color=None)
Generates a scatter matrix plot based on Principal Component Analysis (PCA).
- Parameters:
(int (n_components) – If not specified, the maximum number of components will be used. Defaults to None.
optional) (The number of components to consider for plotting.) – If not specified, the maximum number of components will be used. Defaults to None.
(str (color) – sample in the endog property of the object. Defaults to None.
np.ndarray) (An array with one label for each) – sample in the endog property of the object. Defaults to None.
- Raises:
ValueError – If the number of components requested is greater than the number of variables in the dataset.:
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog = get_real_count_data() >>> data = {"endog": endog} >>> plnpca = PlnPCA.from_formula("endog ~ 1", data = data) >>> plnpca.fit() >>> plnpca.scatter_pca_matrix(n_components = 5)
- show(axes=None)
Show 3 plots. The first one is the covariance of the model. The second one is the stopping criterion with the runtime in abscisse. The third one is the elbo.
- Parameters:
axes (numpy.ndarray, optional) – The axes to plot on. If None, a new figure will be created. Defaults to None.
- sigma()
Method returning the covariance matrix.
- Returns:
The covariance matrix or None.
- Return type:
torch.Tensor or None
- sk_PCA(n_components=None)
Perform the scikit-learn PCA on the latent variables.
- Parameters:
n_components (int, optional) – The number of components to keep. If None, all components are kept. Defaults to None.
- Returns:
sklearn.decomposition.PCA object with all the features from sklearn.
- Return type:
sklearn.decomposition.PCA
- Raises:
ValueError – If the number of components asked is greater than the number of dimensions.
- transform(project: bool = False) Tensor
Method for transforming the endog. Can be seen as a normalization of the endog. :param project: Whether to project the latent variables, by default False. :type project: bool, optional
- Returns:
The transformed endog (latent variables of the model).
- Return type:
torch.Tensor
Examples
>>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog= get_real_count_data() >>> pca = PlnPCA(endog, add_const = True) >>> pca.fit() >>> transformed_endog_low_dim = pca.transform() >>> transformed_endog_high_dim = pca.transform(project = False) >>> print(transformed_endog_low_dim.shape) >>> print(transformed_endog_high_dim.shape)
- viz(ax: Axes | None = None, colors=None, show_cov: bool = False)
Visualize the latent variables with a classic PCA.
- Parameters:
ax (Optional[matplotlib.axes.Axes], optional(keyword-only)) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[np.ndarray], optional(keyword-only)) – The colors to use for plotting, by default None.
show_cov (bool, Optional(keyword-only)) – If True, will display ellipses with right covariances. Default is False.
- Raises:
RuntimeError – If the rank is less than 2.
- Returns:
The matplotlib axis.
- Return type:
Any
Examples
>>> import matplotlib.pyplot as plt >>> from pyPLNmodels import PlnPCA, get_real_count_data >>> endog, labels = get_real_count_data(return_labels = True) >>> plnpca = PlnPCA(endog,add_const = True) >>> plnpca.fit() >>> plnpca.viz() >>> plt.show() >>> plnpca.viz(colors = labels) >>> plt.show() >>> plnpca.viz(show_cov = True) >>> plt.show()