Principal components analysis
Utility functions for running principal components analysis and plotting the results.
See also the examples at:
- anhima.pca.pca(gn, n_components=10, whiten=False)
Perform a principal components analysis of genotypes, treating each variant as a feature.
Parameters: gn : array_like, shape (n_variants, n_samples)
A 2-dimensional array where each element is a genotype call coded as a single integer counting the number of non-reference alleles.
n_components : int, None or string
Number of components to keep. If n_components is None, all components are kept, i.e., n_components == min(n_samples, n_features). If n_components == 'mle', Minka's MLE is used to estimate the dimension. If 0 < n_components < 1, the number of components is chosen so that the cumulative fraction of variance explained is greater than the value given by n_components.
whiten : bool
When True (False by default), the component vectors are multiplied by the square root of n_samples and then divided by the singular values, so that the outputs are uncorrelated and have unit component-wise variance.
Returns: model : sklearn.decomposition.PCA
The fitted model.
coords : ndarray, shape (n_samples, n_components)
The coordinates of each sample on the retained principal components, obtained by fitting the model to the genotypes and then applying the dimensionality reduction to the same genotypes.
See also
sklearn.decomposition.PCA, anhima.ld.ld_prune_pairwise
Notes
The function anhima.ld.ld_prune_pairwise() can be used to obtain a set of variants in approximate linkage equilibrium prior to running PCA.
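To make the "each variant is a feature" convention concrete, the following is a minimal NumPy sketch of a PCA computed by singular value decomposition. It is not anhima's implementation (the return type above indicates that anhima delegates to sklearn.decomposition.PCA); the function name pca_sketch, the toy genotype array, and the choice to scale by the square root of n_samples - 1 when whitening are all assumptions made for illustration.

```python
import numpy as np


def pca_sketch(gn, n_components=2, whiten=False):
    """Illustrative PCA via SVD (hypothetical helper, not part of anhima)."""
    gn = np.asarray(gn, dtype=float)
    # Variants are features, so samples become rows: (n_samples, n_variants).
    x = gn.T
    # Centre each variant (column) on its mean across samples.
    x = x - x.mean(axis=0)
    # Thin SVD: x = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    u, s, vt = u[:, :n_components], s[:n_components], vt[:n_components]
    if whiten:
        # Assumption: rescale so each component has unit sample variance.
        coords = u * np.sqrt(x.shape[0] - 1)
    else:
        coords = u * s
    return coords, vt


# Hypothetical toy data: 6 variants x 4 samples, coded 0/1/2.
gn = np.array([
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [1, 0, 2, 1],
    [2, 2, 0, 0],
    [0, 0, 1, 2],
    [2, 1, 0, 0],
])
coords, components = pca_sketch(gn, n_components=2)
coords_whitened, _ = pca_sketch(gn, n_components=2, whiten=True)
```

Here coords has one row per sample and one column per retained component, matching the (n_samples, n_components) shape documented above, and components has one row per component and one column per variant.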
- anhima.pca.plot_coords(model, coords, pcx=1, pcy=2, ax=None, colors=u'b', sizes=20, labels=None, scatter_kwargs=None, annotate_kwargs=None)
Scatter plot of transformed coordinates from principal components analysis.
Parameters: model : sklearn.decomposition.PCA
The fitted model.
coords : ndarray, shape (n_samples, n_components)
The transformed coordinates.
pcx : int, optional
The principal component to plot on the X axis. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.
pcy : int, optional
The principal component to plot on the Y axis. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
colors : color or sequence of color, optional
Can be a single color format string, or a sequence of color specifications of length n_samples.
sizes : scalar or array_like, shape (n_samples,), optional
Marker size in points^2.
labels : sequence of strings, optional
If provided, will be used to label points in the plot.
scatter_kwargs : dict-like, optional
Additional keyword arguments passed through to plt.scatter.
annotate_kwargs : dict-like, optional
Additional keyword arguments passed through to plt.annotate when labelling points.
Returns: ax : axes
The axes on which the plot was drawn.
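Because pcx and pcy are one-based while array columns are zero-based, it can help to see the index conversion spelled out. The sketch below only selects the two columns that such a plot would use; the helper name select_components and the toy coordinates are hypothetical, and the code is not taken from anhima.

```python
import numpy as np


def select_components(coords, pcx=1, pcy=2):
    """Return the X and Y columns for one-based component numbers.

    Hypothetical helper for illustration; not part of anhima.
    """
    coords = np.asarray(coords)
    n_components = coords.shape[1]
    for pc in (pcx, pcy):
        if not 1 <= pc <= n_components:
            raise ValueError("component %d is out of range" % pc)
    # One-based component number -> zero-based column index.
    return coords[:, pcx - 1], coords[:, pcy - 1]


# Hypothetical transformed coordinates: 3 samples x 3 components.
coords = np.array([
    [0.5, -1.0, 0.1],
    [-0.2, 0.3, 0.0],
    [-0.3, 0.7, -0.1],
])
x, y = select_components(coords, pcx=1, pcy=3)
```

The resulting x and y arrays could then be passed to a Matplotlib scatter call, with colors and sizes supplied through the c and s arguments.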
- anhima.pca.plot_variance_explained(model, bar_kwargs=None, ax=None)
Bar plot of the variance explained by each principal component.
Parameters: model : sklearn.decomposition.PCA
The fitted model.
bar_kwargs : dict-like, optional
Additional keyword arguments passed through to ax.bar().
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
Returns: ax : axes
The axes on which the plot was drawn.
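As a rough illustration of the quantity such a plot shows, the fraction of variance explained by each component can be derived from the singular values of the centred data. The sketch below also shows how a fractional n_components (0 < n_components < 1) could be turned into a component count. Both helper names and the singular values are hypothetical, and the ratios are only meaningful when all singular values are supplied.

```python
import numpy as np


def variance_explained(singular_values):
    """Fraction of total variance per component (hypothetical helper)."""
    s = np.asarray(singular_values, dtype=float)
    # Component variance is proportional to the squared singular value.
    var = s ** 2
    return var / var.sum()


def components_for_fraction(ratio, fraction):
    """Smallest number of components whose cumulative ratio exceeds fraction."""
    cumulative = np.cumsum(ratio)
    # side="right" counts the entries that are <= fraction, so adding one
    # gives the first component count whose cumulative ratio is greater.
    return int(np.searchsorted(cumulative, fraction, side="right") + 1)


# Hypothetical singular values, in descending order.
ratio = variance_explained([3.0, 2.0, 1.0])
k = components_for_fraction(ratio, 0.9)
```

With these toy values the squared singular values are 9, 4 and 1, so the first component accounts for 9/14 of the variance and two components are needed to exceed 0.9.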
- anhima.pca.plot_loadings(model, pc=1, pos=None, plot_kwargs=None, ax=None)
Plot loadings for the given principal component.
Parameters: model : sklearn.decomposition.PCA
The fitted model.
pc : int, optional
The principal component to plot loadings for. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.
pos : array_like, int, optional
An array of variant positions to use for the X axis. If not given, the variant index will be used for the X axis.
plot_kwargs : dict-like, optional
Additional keyword arguments passed through to ax.plot().
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
Returns: ax : axes
The axes on which the plot was drawn.
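The following sketch shows one way the X and Y values for a loadings plot could be assembled, assuming the loadings for component pc are row pc - 1 of a (n_components, n_variants) array such as the components_ attribute of a fitted sklearn.decomposition.PCA. The helper name loadings_for and the toy values are hypothetical; this is not anhima's code.

```python
import numpy as np


def loadings_for(components, pc=1, pos=None):
    """Return (x, y) for plotting loadings of a one-based component.

    Hypothetical helper for illustration; not part of anhima.
    """
    components = np.asarray(components)
    # One-based component number -> zero-based row index.
    y = components[pc - 1]
    # Fall back to the variant index when no positions are given.
    x = np.arange(y.shape[0]) if pos is None else np.asarray(pos)
    if x.shape[0] != y.shape[0]:
        raise ValueError("pos must have one entry per variant")
    return x, y


# Hypothetical loadings: 2 components x 4 variants.
components = np.array([
    [0.5, 0.5, -0.5, 0.5],
    [0.1, -0.7, 0.7, 0.1],
])
x_index, y2 = loadings_for(components, pc=2)
x_pos, _ = loadings_for(components, pc=2, pos=[100, 250, 900, 1200])
```

Passing genomic positions rather than indices spaces the points according to where the variants lie, which can make regions with unusually high loadings easier to locate.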