Principal components analysis
Utility functions for running principal components analysis and plotting the results.
See also the examples at:

anhima.pca.pca(gn, n_components=10, whiten=False)
Perform a principal components analysis of genotypes, treating each variant as a feature.
Parameters:
gn : array_like, shape (n_variants, n_samples)
A 2-dimensional array where each element is a genotype call coded as a single integer counting the number of non-reference alleles.
n_components : int, None or string
Number of components to keep. If n_components is None, all components are kept: n_components == min(n_samples, n_features). If n_components == 'mle', Minka's MLE is used to guess the dimension. If 0 < n_components < 1, the number of components is chosen so that the proportion of variance explained is greater than the fraction specified by n_components.
whiten : bool
When True (False by default), the component vectors are divided by n_samples times the singular values to ensure uncorrelated outputs with unit component-wise variances.
Returns:
model : sklearn.decomposition.PCA
The fitted model.
coords : ndarray, shape (n_samples, n_components)
The result of fitting the model with genotypes and applying dimensionality reduction to genotypes.
See also
sklearn.decomposition.PCA, anhima.ld.ld_prune_pairwise
Notes
The function anhima.ld.ld_prune_pairwise() can be used to obtain a set of variants in approximate linkage equilibrium prior to running PCA.
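The input array has variants as rows, while the returned coordinates have samples as rows, so a transpose is implied somewhere between input and output. The sketch below illustrates that shape handling with a plain NumPy SVD. It is not anhima's implementation: the function name pca_sketch and the toy genotype array are hypothetical, and the library itself returns a fitted sklearn.decomposition.PCA model rather than the raw component matrix.

```python
import numpy as np


def pca_sketch(gn, n_components=2):
    """Illustrative PCA via SVD; an assumption, not anhima's code."""
    gn = np.asarray(gn, dtype=float)
    # Variants are features, so samples become rows: (n_samples, n_variants).
    x = gn.T
    # Centre each variant (column) on its mean across samples.
    x = x - x.mean(axis=0)
    # Thin SVD: x = u @ diag(s) @ vt.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Project the samples onto the leading components.
    coords = u[:, :n_components] * s[:n_components]
    components = vt[:n_components]
    return coords, components


# Hypothetical toy data: 5 variants (rows) by 4 samples (columns).
gn = np.array([
    [0, 1, 2, 0],
    [0, 0, 1, 2],
    [2, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 2, 2, 1],
])
coords, components = pca_sketch(gn, n_components=2)
print(coords.shape)      # (4, 2)
print(components.shape)  # (2, 5)
```

Each row of coords corresponds to one sample, which matches the documented return shape (n_samples, n_components).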

anhima.pca.plot_coords(model, coords, pcx=1, pcy=2, ax=None, colors='b', sizes=20, labels=None, scatter_kwargs=None, annotate_kwargs=None)
Scatter plot of transformed coordinates from principal components analysis.
Parameters:
model : sklearn.decomposition.PCA
The fitted model.
coords : ndarray, shape (n_samples, n_components)
The transformed coordinates.
pcx : int, optional
The principal component to plot on the X axis. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.
pcy : int, optional
The principal component to plot on the Y axis. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
colors : color or sequence of color, optional
Can be a single color format string, or a sequence of color specifications of length n_samples.
sizes : scalar or array_like, shape (n_samples,), optional
Marker size in points^2.
labels : sequence of strings, optional
If provided, will be used to label points in the plot.
scatter_kwargs : dict-like, optional
Additional keyword arguments passed through to plt.scatter.
annotate_kwargs : dict-like, optional
Additional keyword arguments passed through to plt.annotate when labelling points.
Returns:
ax : axes
The axes on which the plot was drawn.
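Because pcx and pcy are one-based while array columns are zero-based, it may help to see the mapping written out. The following is a minimal sketch of that conversion only. The helper name select_axes and the sample coordinates are hypothetical, and how the library performs the lookup internally is an assumption.

```python
def select_axes(coords, pcx=1, pcy=2):
    """Return the x and y values for two one-based principal components."""
    if pcx < 1 or pcy < 1:
        raise ValueError("pcx and pcy are one-based and must be >= 1")
    # Column 0 holds PC1, so subtract one from each component number.
    x = [row[pcx - 1] for row in coords]
    y = [row[pcy - 1] for row in coords]
    return x, y


# Hypothetical coordinates for three samples on three components.
coords = [
    [0.5, -1.0, 0.1],
    [-0.2, 0.3, 0.0],
    [-0.3, 0.7, -0.1],
]
x, y = select_axes(coords, pcx=1, pcy=3)
print(x)  # [0.5, -0.2, -0.3]
print(y)  # [0.1, 0.0, -0.1]
```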

anhima.pca.plot_variance_explained(model, bar_kwargs=None, ax=None)
Parameters:
model : sklearn.decomposition.PCA
The fitted model.
bar_kwargs : dict-like, optional
Additional keyword arguments passed through to ax.bar().
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
Returns:
ax : axes
The axes on which the plot was drawn.
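A fitted sklearn.decomposition.PCA model exposes the share of variance per component through its explained_variance_ratio_ attribute. The sketch below shows how such shares relate to singular values. The helper name and the input values are hypothetical, the list is assumed to contain every singular value of the centred data, and whether the plot shows fractions or percentages is not stated in this reference.

```python
def variance_explained_percent(singular_values):
    """Percentage of total variance carried by each component."""
    # Variance along a component is proportional to its squared singular value.
    variances = [s ** 2 for s in singular_values]
    total = sum(variances)
    return [100.0 * v / total for v in variances]


# Hypothetical singular values, one per component.
singular_values = [4.0, 2.0, 2.0, 1.0]
percents = variance_explained_percent(singular_values)
print(percents)  # [64.0, 16.0, 16.0, 4.0]
```

These values are what a bar chart of variance explained would use as bar heights, one bar per component.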

anhima.pca.plot_loadings(model, pc=1, pos=None, plot_kwargs=None, ax=None)
Plot loadings for the given principal component.
Parameters:
model : sklearn.decomposition.PCA
The fitted model.
pc : int, optional
The principal component to plot loadings for. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.
pos : array_like, int, optional
An array of variant positions to use for the X axis. If not given, variant index will be used for the X axis.
plot_kwargs : dict-like, optional
Additional keyword arguments passed through to ax.plot().
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
Returns:
ax : axes
The axes on which the plot was drawn.
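In scikit-learn, the fitted model's components_ attribute has shape (n_components, n_features), so each row holds one loading per variant. The sketch below shows how the X and Y values for a loadings plot could be assembled, including the fallback to variant index when pos is not given. The helper name loading_series and the example values are hypothetical, and the library's internal handling is an assumption.

```python
def loading_series(components, pc=1, pos=None):
    """Return (x, y) values for the loadings of a one-based component."""
    # Row 0 holds PC1, so subtract one from the component number.
    y = list(components[pc - 1])
    # Fall back to the variant index when no positions are supplied.
    x = list(pos) if pos is not None else list(range(len(y)))
    if len(x) != len(y):
        raise ValueError("pos must have one entry per variant")
    return x, y


# Hypothetical loadings: 2 components by 3 variants.
components = [
    [0.6, -0.8, 0.0],
    [0.0, 0.0, 1.0],
]
pos = [1050, 2310, 4875]  # hypothetical variant positions

x, y = loading_series(components, pc=2, pos=pos)
print(x, y)  # [1050, 2310, 4875] [0.0, 0.0, 1.0]

x_idx, y_idx = loading_series(components, pc=1)
print(x_idx, y_idx)  # [0, 1, 2] [0.6, -0.8, 0.0]
```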