Doubleton sharing

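Doubleton sharing, also known as analysis of f2 variants. A doubleton (f2 variant) is a variant whose allele is observed exactly twice across all samples.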

anhima.f2.count_shared_doubletons(subpops_ac)

Count doubletons shared between pairs of subpopulations (that is, doubletons where one copy of the allele is observed in each subpopulation of the pair).

Parameters:
    subpops_ac : array_like, int
        An array of shape (n_variants, n_subpops) holding alternate allele counts for each subpopulation.
Returns:
    counts : ndarray, int or float
        A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
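
The counting rule described above can be sketched in plain Python. This is an illustrative re-implementation, not anhima's source code: the helper name `count_shared_doubletons_sketch` is hypothetical, it uses nested lists rather than NumPy arrays to stay dependency-free, and it assumes that the diagonal entry (i, i) counts doubletons where both copies fall in subpopulation i and that the matrix is symmetric.

```python
def count_shared_doubletons_sketch(subpops_ac):
    """Hypothetical helper illustrating the counting rule.

    subpops_ac: rows are variants, columns are subpopulations,
    values are alternate allele counts.
    """
    n_subpops = len(subpops_ac[0])
    counts = [[0] * n_subpops for _ in range(n_subpops)]
    for row in subpops_ac:
        # A doubleton has exactly two copies of the alternate allele overall.
        if sum(row) != 2:
            continue
        carriers = [i for i, ac in enumerate(row) if ac > 0]
        if len(carriers) == 1:
            # Assumption: both copies in one subpopulation go on the diagonal.
            i = carriers[0]
            counts[i][i] += 1
        else:
            # One copy in each of two subpopulations; fill both (i, j) and (j, i).
            i, j = carriers
            counts[i][j] += 1
            counts[j][i] += 1
    return counts


subpops_ac = [
    [2, 0, 0],  # doubleton private to subpopulation 0
    [1, 1, 0],  # doubleton shared by subpopulations 0 and 1
    [0, 1, 1],  # doubleton shared by subpopulations 1 and 2
    [1, 0, 0],  # singleton, ignored
    [3, 1, 0],  # allele count of 4, ignored
    [1, 0, 1],  # doubleton shared by subpopulations 0 and 2
]
counts = count_shared_doubletons_sketch(subpops_ac)
```

Only rows whose counts sum to two contribute, so the singleton and the more common variant in the example input leave the matrix unchanged.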

anhima.f2.normalise_doubleton_counts(counts, n_samples, ploidy=2)

Normalise doubleton counts by dividing by the number of distinct pairs of haplotypes in each population comparison.

Parameters:
    counts : array_like, ints
        A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
    n_samples : int or sequence of ints
        The number of samples in each subpopulation.
    ploidy : int, optional
        The sample ploidy.
Returns:
    normed_counts : ndarray, float
        Normalised counts of shared doubletons.

Notes

This function corrects for the fact that there are fewer pairs of haplotypes when looking for doubletons within a single subpopulation of size n than there are when comparing two different subpopulations of size n.

This function may also help to correct for unequal numbers of samples across subpopulations. Note, however, that in that case there may still be some bias in how doubletons were ascertained.
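
The pair counts behind this correction can be worked through with a short sketch. It assumes, as one plausible reading of "distinct pairs of haplotypes", that a subpopulation with h = n_samples * ploidy haplotypes has h * (h - 1) / 2 pairs within itself and h_i * h_j pairs against another subpopulation. The helper names are hypothetical, the code is not taken from anhima, and it accepts only a sequence for `n_samples`.

```python
def haplotype_pair_counts(n_samples, ploidy=2):
    """Hypothetical helper: haplotype pairs for every subpopulation comparison."""
    n_haps = [n * ploidy for n in n_samples]
    k = len(n_haps)
    pairs = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                # Within one subpopulation: unordered pairs, h choose 2.
                pairs[i][j] = n_haps[i] * (n_haps[i] - 1) // 2
            else:
                # Between two subpopulations: one haplotype from each.
                pairs[i][j] = n_haps[i] * n_haps[j]
    return pairs


def normalise_sketch(counts, n_samples, ploidy=2):
    """Divide each count by the assumed number of haplotype pairs."""
    pairs = haplotype_pair_counts(n_samples, ploidy)
    return [
        [c / p for c, p in zip(count_row, pair_row)]
        for count_row, pair_row in zip(counts, pairs)
    ]


# Two diploid subpopulations of 10 samples each: 20 haplotypes per subpopulation.
pairs = haplotype_pair_counts([10, 10], ploidy=2)
normed = normalise_sketch([[19, 40], [40, 38]], [10, 10])
```

With 20 haplotypes per subpopulation there are 190 pairs within a subpopulation but 400 pairs between two of them, which is why raw within-subpopulation counts would otherwise look deflated.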

anhima.f2.plot_shared_doubletons(counts, subpop_labels=None, subpop_colors=u'bgrcmyk', axs=None, figsize_factor=1, ylim=None, relative=False, flip=False)

Plot counts of doubleton sharing between subpopulations as a bar chart.

Parameters:
    counts : array_like, ints
        A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
    subpop_labels : sequence of strings, optional
        Labels for the subpopulations.
    subpop_colors : sequence of colors, optional
        Colors for the subpopulations.
    axs : sequence of axes, optional
        The axes to use. If not provided, a new figure will be created.
    figsize_factor : float, optional
        Figure size in inches per subpopulation. Only used if axs is None.
    ylim : pair of ints or floats, optional
        Limits for the Y axes of all subplots.
    relative : bool, optional
        If True, normalise counts by dividing by the sum along each row.
    flip : bool, optional
        If True, invert the Y axis.
Returns:
    axs : sequence of axes
        The axes on which the plot was drawn.
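
As a rough illustration of what the relative option computes, the row-wise normalisation can be sketched as follows. The helper name `relative_counts` is hypothetical, and returning zeros for a row that sums to zero is a choice made for this sketch, not documented anhima behaviour.

```python
def relative_counts(counts):
    """Hypothetical helper: divide each row by its own sum."""
    rel = []
    for row in counts:
        total = sum(row)
        # Assumption: an all-zero row stays all-zero instead of raising.
        rel.append([c / total if total else 0.0 for c in row])
    return rel


rel = relative_counts([[1, 1, 2], [1, 0, 1], [2, 1, 5]])
```

Each row then sums to one, so the bars for a subpopulation show the fraction of its doubletons shared with each partner instead of absolute counts.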

anhima.f2.plot_total_doubletons(counts, subpop_labels=None, width=0.8, orientation=u'vertical', n_samples=None, ax=None, bar_kwargs=None)

Plot total counts of doubletons per subpopulation as a bar chart.

Parameters:
    counts : array_like, ints
        A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
    subpop_labels : sequence of strings, optional
        Labels for the subpopulations.
    width : float, optional
        The relative width of each bar.
    orientation : {'vertical', 'horizontal'}
        The bar orientation.
    n_samples : int or sequence of ints
        The number of samples in each subpopulation.
    ax : axes, optional
        The axes on which to plot. If not provided, a new figure will be created.
    bar_kwargs : dict, optional
        Keyword arguments passed through to ax.bar().
Returns:
    ax : axes
        The axes on which the plot was drawn.

anhima.f2.plot_f2_fig(counts, subpop_labels=None, subpop_colors=u'bgrcmyk', fig=None, figsize_factor=1, relative=False, normed=False, n_samples=None, ploidy=2)

Plot a combined figure of shared doubleton counts and total counts per subpopulation.

Parameters:
    counts : array_like, ints
        A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
    subpop_labels : sequence of strings, optional
        Labels for the subpopulations.
    subpop_colors : sequence of colors, optional
        Colors for the subpopulations.
    fig : figure, optional
        The figure to use. If not provided, a new figure will be created.
    figsize_factor : float, optional
        Figure size in inches per subpopulation. Only used if fig is None.
    relative : bool, optional
        If True, plot counts relative to the sum along each row.
    normed : bool, optional
        If True, normalise counts by dividing by the number of possible pairs of haplotypes.
    n_samples : int or sequence of ints
        The number of samples in each subpopulation.
    ploidy : int, optional
        The sample ploidy. (Only relevant if normed is True.)
Returns:
    fig : figure
        The figure on which the plot was drawn.