hyperspy.learn.svd_pca module
- hyperspy.learn.svd_pca.svd_flip_signs(u, v, u_based_decision=True)
Sign correction to ensure deterministic output from SVD.
Adjusts the columns of u and the rows of v such that the loadings in the columns in u that are largest in absolute value are always positive.
- Parameters:
u, v (numpy array) – Outputs of a singular value decomposition.
u_based_decision (bool, default True) – If True, use the columns of u as the basis for sign flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm dependent.
- Returns:
u, v – Adjusted outputs with same dimensions as inputs.
- Return type:
numpy array
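The sign convention described above can be sketched in plain NumPy. This is an illustration only, not the library's implementation: `flip_signs` is a hypothetical helper, and it assumes that `u` holds components as columns and `v` holds them as rows, as `numpy.linalg.svd` returns them.

```python
import numpy as np

def flip_signs(u, v, u_based_decision=True):
    """Hypothetical helper: make the largest-magnitude entry of each component positive."""
    if u_based_decision:
        # Row index of the largest-magnitude entry in each column of u.
        idx = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[idx, np.arange(u.shape[1])])
    else:
        # Column index of the largest-magnitude entry in each row of v.
        idx = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(v.shape[0]), idx])
    signs[signs == 0] = 1.0  # leave all-zero components untouched
    # Flip column k of u and row k of v together so the product is unchanged.
    return u * signs, v * signs[:, np.newaxis]

rng = np.random.default_rng(42)
x = rng.normal(size=(6, 4))
u, s, v = np.linalg.svd(x, full_matrices=False)
u_flipped, v_flipped = flip_signs(u, v)
```

Because each column of `u` and the matching row of `v` are multiplied by the same sign, the product `(u_flipped * s) @ v_flipped` still reconstructs `x`.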
- hyperspy.learn.svd_pca.svd_pca(data, output_dimension=None, svd_solver='auto', centre=None, auto_transpose=True, svd_flip=True, **kwargs)
Perform PCA using singular value decomposition (SVD).
Read more in the User Guide.
- Parameters:
data (numpy array) – MxN array of input data (M features, N samples)
output_dimension (None or int) – Number of components to keep/calculate
svd_solver ({"auto", "full", "arpack", "randomized"}, default "auto") –
- If auto:
The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient “randomized” method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.
- If full:
run exact SVD, calling the standard LAPACK solver via scipy.linalg.svd(), and select the components by postprocessing
- If arpack:
use truncated SVD, calling ARPACK solver via scipy.sparse.linalg.svds(). It requires strictly 0 < output_dimension < min(data.shape)
- If randomized:
use truncated SVD, calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components
centre ({None, "navigation", "signal"}, default None) –
If None, the data is not centered prior to decomposition.
If “navigation”, the data is centered along the navigation axis.
If “signal”, the data is centered along the signal axis.
auto_transpose (bool, default True) – If True, automatically transposes the data to boost performance.
svd_flip (bool, default True) – If True, adjusts the signs of the loadings and factors such that the loadings that are largest in absolute value are always positive. See svd_flip_signs() for more details.
- Returns:
factors (numpy array)
loadings (numpy array)
explained_variance (numpy array)
mean (numpy array or None (if centre is None))
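As a rough illustration of how a PCA can be obtained from an SVD, the sketch below uses only NumPy. It is not the library's code: `pca_via_svd` is a hypothetical helper, the centring axis is passed as a plain integer because the mapping of "navigation" and "signal" onto array axes is not stated here, and dividing the squared singular values by the number of rows is just one common normalisation for the explained variance.

```python
import numpy as np

def pca_via_svd(data, n_components, axis=None):
    """Hypothetical helper: PCA of a 2-D array using an exact SVD."""
    data = np.asarray(data, dtype=float)
    mean = None
    if axis is not None:
        # Centre the data by subtracting the mean taken along the chosen axis.
        mean = data.mean(axis=axis, keepdims=True)
        data = data - mean
    u, s, vh = np.linalg.svd(data, full_matrices=False)
    # Keep only the leading components (singular values are sorted descending).
    u, s, vh = u[:, :n_components], s[:n_components], vh[:n_components]
    loadings = u * s                           # shape (rows, n_components)
    factors = vh.T                             # shape (columns, n_components)
    explained_variance = s**2 / data.shape[0]  # assumed normalisation
    return factors, loadings, explained_variance, mean

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))  # rank-3 test data
factors, loadings, explained_variance, mean = pca_via_svd(data, 3, axis=0)
reconstruction = loadings @ factors.T + mean
```

Since the test data has rank 3, three components are enough for `reconstruction` to match `data` up to floating-point error.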
- hyperspy.learn.svd_pca.svd_solve(data, output_dimension=None, svd_solver='auto', svd_flip=True, u_based_decision=True, **kwargs)
Apply singular value decomposition to input data.
- Parameters:
data (numpy array, shape (m, n)) – Input data array
output_dimension (None or int) – Number of components to keep/calculate
svd_solver ({"auto", "full", "arpack", "randomized"}, default "auto") –
- If auto:
The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient “randomized” method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.
- If full:
run exact SVD, calling the standard LAPACK solver via scipy.linalg.svd(), and select the components by postprocessing
- If arpack:
use truncated SVD, calling ARPACK solver via scipy.sparse.linalg.svds(). It requires strictly 0 < output_dimension < min(data.shape)
- If randomized:
use truncated SVD, calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components
svd_flip (bool, default True) – If True, adjusts the signs of the loadings and factors such that the loadings that are largest in absolute value are always positive. See svd_flip_signs() for more details.
u_based_decision (bool, default True) – If True, and svd_flip is True, use the columns of u as the basis for sign-flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm dependent.
- Returns:
U, S, V – Output of SVD such that X = U*S*V.T
- Return type:
numpy array
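The "auto" policy described for svd_solver can be written as a small decision function. The following is a sketch under stated assumptions, not the library's code: `choose_solver` is a hypothetical name, "larger than 500x500" is read as "the larger dimension exceeds 500", and a missing output_dimension is assumed to mean that all components are wanted.

```python
def choose_solver(shape, output_dimension=None):
    """Hypothetical helper mirroring the documented "auto" solver policy."""
    m, n = shape
    if output_dimension is None:
        # Assumption: with no limit on components, the full SVD is needed.
        return "full"
    large_input = max(m, n) > 500  # assumed reading of "larger than 500x500"
    few_components = 1 <= output_dimension < 0.8 * min(m, n)
    if large_input and few_components:
        return "randomized"
    return "full"

small = choose_solver((100, 100), 10)
large_few = choose_solver((1000, 2000), 10)
large_many = choose_solver((1000, 1000), 900)
```

Under these assumptions only `large_few` selects the randomized solver: the first input is too small, and the third asks for more than 80% of the smallest dimension.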