Principal components analysis

Utility functions for running principal components analysis and plotting the results.

See also the examples at:

anhima.pca.pca(gn, n_components=10, whiten=False)

Perform a principal components analysis of genotypes, treating each variant as a feature.

Parameters:

gn : array_like, shape (n_variants, n_samples)

A 2-dimensional array where each element is a genotype call coded as a single integer counting the number of non-reference alleles.

n_components : int, float, None or string

Number of components to keep. If n_components is None, all components are kept: n_components == min(n_samples, n_features), where n_features is the number of variants. If n_components == 'mle', Minka’s MLE is used to guess the dimension. If 0 < n_components < 1, the number of components is chosen so that the fraction of variance explained is greater than the value given by n_components.

whiten : bool

When True (False by default), the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
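To illustrate the input coding expected for gn, the following sketch converts diploid genotype calls into counts of non-reference alleles using NumPy. The array name `calls` and its (n_variants, n_samples, ploidy) layout are assumptions made for this example and are not part of the anhima API.

```python
import numpy as np

# Hypothetical diploid genotype calls, shape (n_variants, n_samples, ploidy);
# 0 is the reference allele, any value > 0 is a non-reference allele.
calls = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [0, 0], [0, 2]],
])

# Count non-reference alleles per call, giving shape (n_variants, n_samples).
gn = (calls > 0).sum(axis=2)

print(gn.shape)  # (2, 3)
print(gn)
```

Each row of the result is a variant and each column a sample, which is the orientation described for gn above.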

Returns:

model : sklearn.decomposition.PCA

The fitted model.

coords : ndarray, shape (n_samples, n_components)

The genotypes projected onto the principal components, i.e., the result of fitting the model to gn and applying the dimensionality reduction to it.

See also

sklearn.decomposition.PCA, anhima.ld.ld_prune_pairwise

Notes

The function anhima.ld.ld_prune_pairwise() can be used to obtain a set of variants in approximate linkage equilibrium prior to running PCA.
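The following is a minimal sketch of the kind of computation described for this function, written with plain NumPy rather than anhima or scikit-learn. It is an illustration under stated assumptions, not the library's implementation: the function name `pca_sketch` and the toy data are hypothetical. It transposes gn so that samples are rows and variants are features, centers each variant, and projects the samples onto the leading components.

```python
import numpy as np


def pca_sketch(gn, n_components=2):
    """Illustrative PCA of genotypes via SVD (not anhima's implementation)."""
    # Transpose so rows are samples and columns are variants (features).
    x = np.asarray(gn, dtype=float).T
    # Center each variant on its mean across samples.
    x = x - x.mean(axis=0)
    # Thin SVD: x = u @ diag(s) @ vt, with the rows of vt as components.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Sample coordinates on the leading components.
    coords = u[:, :n_components] * s[:n_components]
    # Fraction of total variance explained by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, vt[:n_components], explained[:n_components]


# Toy genotypes, shape (n_variants, n_samples) = (5, 4).
gn = np.array([
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 0, 2, 1],
    [1, 2, 0, 0],
])
coords, components, explained = pca_sketch(gn, n_components=2)
print(coords.shape)      # (4, 2): (n_samples, n_components)
print(components.shape)  # (2, 5): (n_components, n_variants)
```

The shape of `coords` in the sketch matches the (n_samples, n_components) shape documented for the returned coordinates.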

anhima.pca.plot_coords(model, coords, pcx=1, pcy=2, ax=None, colors=u'b', sizes=20, labels=None, scatter_kwargs=None, annotate_kwargs=None)

Scatter plot of transformed coordinates from principal components analysis.

Parameters:

model : sklearn.decomposition.PCA

The fitted model.

coords : ndarray, shape (n_samples, n_components)

The transformed coordinates.

pcx : int, optional

The principal component to plot on the X axis. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.

pcy : int, optional

The principal component to plot on the Y axis. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.

ax : axes, optional

The axes on which to draw. If not provided, a new figure will be created.

colors : color or sequence of color, optional

Can be a single color format string, or a sequence of color specifications of length n_samples.

sizes : scalar or array_like, shape (n_samples), optional

Marker size(s) in points^2.

labels : sequence of strings, optional

If provided, will be used to label points in the plot.

scatter_kwargs : dict-like, optional

Additional keyword arguments passed through to plt.scatter.

annotate_kwargs : dict-like, optional

Additional keyword arguments passed through to plt.annotate when labelling points.

Returns:

ax : axes

The axes on which the plot was drawn.
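Because pcx and pcy are one-based, they have to be shifted by one before indexing into coords. The sketch below shows that mapping with NumPy only; the coordinate values are made up for illustration, and the code is not taken from the anhima source.

```python
import numpy as np

# Hypothetical transformed coordinates, shape (n_samples, n_components) = (3, 4).
coords = np.array([
    [0.5, -1.0, 0.2, 0.0],
    [-0.3, 0.4, 0.1, 0.3],
    [-0.2, 0.6, -0.3, -0.3],
])

pcx, pcy = 1, 3  # one-based component numbers

# Convert to zero-based column indices before slicing.
x = coords[:, pcx - 1]
y = coords[:, pcy - 1]

print(x)  # values on the first principal component
print(y)  # values on the third principal component
```

The resulting `x` and `y` arrays are what a scatter plot of the two chosen components would draw, one point per sample.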

anhima.pca.plot_variance_explained(model, bar_kwargs=None, ax=None)

Bar plot of the variance explained by each principal component.

Parameters:

model : sklearn.decomposition.PCA

The fitted model.

bar_kwargs : dict-like, optional

Additional keyword arguments passed through to ax.bar().

ax : axes, optional

The axes on which to draw. If not provided, a new figure will be created.

Returns:

ax : axes

The axes on which the plot was drawn.
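As a rough illustration of the quantities such a bar plot is built from, the sketch below turns per-component variance ratios into percentages and also finds how many components are needed to exceed a target fraction of variance, which is the rule described for fractional n_components in pca(). The `ratios` array is a hypothetical stand-in for the explained_variance_ratio_ attribute of a fitted sklearn.decomposition.PCA, and the one-based numbering of the X axis is an assumption that mirrors the convention used elsewhere in this module.

```python
import numpy as np

# Hypothetical per-component variance ratios, standing in for the
# explained_variance_ratio_ attribute of a fitted sklearn.decomposition.PCA.
ratios = np.array([0.5, 0.25, 0.125, 0.0625])

# One-based component numbers for the X axis and percentages for the bars.
components = np.arange(1, len(ratios) + 1)
percent = ratios * 100

# Smallest number of components whose cumulative ratio exceeds a target.
target = 0.8
cumulative = np.cumsum(ratios)
n_needed = int(np.searchsorted(cumulative, target, side="right")) + 1

for pc, pct in zip(components, percent):
    print(f"PC{pc}: {pct:.2f}%")
print(n_needed)
```

With these made-up ratios, the first two components together explain 75% of the variance, so a third is needed to pass the 80% target.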

anhima.pca.plot_loadings(model, pc=1, pos=None, plot_kwargs=None, ax=None)

Plot loadings for the given principal component.

Parameters:

model : sklearn.decomposition.PCA

The fitted model.

pc : int, optional

The principal component to plot loadings for. N.B., this is one-based, so 1 is the first principal component, 2 is the second component, etc.

pos : array_like, int, optional

An array of variant positions to use for the X axis. If not given, variant index will be used for the X axis.

plot_kwargs : dict-like, optional

Additional keyword arguments passed through to ax.plot().

ax : axes, optional

The axes on which to draw. If not provided, a new figure will be created.

Returns:

ax : axes

The axes on which the plot was drawn.
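A minimal sketch of how the data for a loadings plot could be assembled, assuming the loadings are the rows of the components_ attribute of a fitted sklearn.decomposition.PCA (shape (n_components, n_features)). The `components` array below is a hypothetical stand-in with made-up values, and the code is not taken from the anhima source.

```python
import numpy as np

# Hypothetical loadings matrix, shape (n_components, n_variants), standing in
# for the components_ attribute of a fitted sklearn.decomposition.PCA.
components = np.array([
    [0.1, -0.7, 0.7, 0.0],
    [0.5, 0.5, 0.5, -0.5],
])

pc = 2      # one-based component number
pos = None  # optional variant positions, e.g. np.array([101, 250, 407, 980])

# Loadings for the requested component (zero-based row index).
y = components[pc - 1]

# Use variant positions for X if given, otherwise the variant index.
x = np.arange(y.shape[0]) if pos is None else np.asarray(pos)

print(x)  # [0 1 2 3]
print(y)
```

Passing genomic positions as `pos` instead of None would place each loading at its variant's coordinate rather than at its index.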