Locating samples and variants
Utilities for locating samples and variants.
See also the examples at:
-
anhima.loc.view_sample(a, selection, all_samples=None)
View a single column from the array a corresponding to a selected sample.
Parameters: a : array_like
An array with 2 or more dimensions, where the second dimension corresponds to samples.
selection : int or object
A sample identifier or column index.
all_samples : sequence, optional
A sequence (e.g., list) of sample identifiers corresponding to the second dimension of a, used to map selection to a column index. If not given, assume selection is a column index.
Returns: b : ndarray
An array obtained from a by taking the column corresponding to the selected sample.
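The behaviour described above can be sketched with plain NumPy. This is a minimal illustration, not anhima's source: the helper name view_sample_sketch and the example data are hypothetical.

```python
import numpy as np

def view_sample_sketch(a, selection, all_samples=None):
    """Hypothetical stand-in for illustration; not anhima's implementation."""
    a = np.asarray(a)
    if all_samples is not None:
        # Map the sample identifier to its column index.
        selection = list(all_samples).index(selection)
    # Basic integer indexing on the second dimension returns a view.
    return a[:, selection, ...]

genotypes = np.arange(12).reshape(3, 4)  # 3 variants x 4 samples
samples = ["A", "B", "C", "D"]
col = view_sample_sketch(genotypes, "C", all_samples=samples)
```

Without all_samples, selection is treated directly as a column index.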
-
anhima.loc.take_samples(a, selection, all_samples=None)
Extract columns from the array a corresponding to selected samples.
Parameters: a : array_like
An array with 2 or more dimensions, where the second dimension corresponds to samples.
selection : sequence of ints or objects
A sequence of sample identifiers or column indices.
all_samples : sequence, optional
A sequence (e.g., list) of sample identifiers corresponding to the second dimension of a, used to map selection to column indices. If not given, assume selection is a sequence of column indices.
Returns: b : ndarray
An array obtained from a by taking columns corresponding to the selected samples.
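A rough NumPy equivalent, assuming the columns are gathered with numpy.take along the second dimension. The helper name take_samples_sketch is hypothetical and the code is only a sketch of the documented behaviour.

```python
import numpy as np

def take_samples_sketch(a, selection, all_samples=None):
    """Hypothetical stand-in for illustration; not anhima's implementation."""
    a = np.asarray(a)
    if all_samples is not None:
        # Map each sample identifier to its column index.
        lookup = {s: i for i, s in enumerate(all_samples)}
        selection = [lookup[s] for s in selection]
    # Fancy selection along axis 1 produces a new array (a copy).
    return np.take(a, selection, axis=1)

genotypes = np.arange(12).reshape(3, 4)  # 3 variants x 4 samples
samples = ["A", "B", "C", "D"]
subset = take_samples_sketch(genotypes, ["D", "A"], all_samples=samples)
```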
-
anhima.loc.query_variants(expression, variants)
Evaluate expression with respect to the given variants.
Parameters: expression : string
The query expression to apply. The expression will be evaluated by numexpr against the provided variants.
variants : dict-like
The variables to include in scope for the expression evaluation.
Returns: result : ndarray
The result of evaluating expression against variants.
-
anhima.loc.compress_variants(a, condition)
Extract rows from the array a corresponding to a boolean condition.
Parameters: a : array_like
An array to extract rows from (e.g., genotypes).
condition : array_like, bool
A 1-D boolean array of the same length as the first dimension of a.
Returns: b : ndarray
An array obtained from a by taking rows corresponding to the selected variants.
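As an illustration, row selection by a boolean condition can be written with numpy.compress. Whether anhima uses this exact call is an assumption; the data below are made up.

```python
import numpy as np

genotypes = np.array([[0, 1], [1, 1], [0, 0], [2, 1]])  # 4 variants x 2 samples
condition = np.array([True, False, False, True])        # one flag per variant

# Keep only the rows (variants) where condition is True.
kept = np.compress(condition, genotypes, axis=0)
```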
-
anhima.loc.take_variants(a, indices, mode='raise')
Extract rows from the array a corresponding to indices.
Parameters: a : array_like
An array to extract rows from (e.g., genotypes).
indices : sequence of integers
The variant indices to extract.
mode : {'raise', 'wrap', 'clip'}, optional
Specifies how out-of-bounds indices will behave.
Returns: b : ndarray
An array obtained from a by taking rows corresponding to the selected variants.
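The mode names match those of numpy.take, so the following sketch assumes the same semantics apply here. It shows how each mode treats an out-of-bounds index on made-up data.

```python
import numpy as np

genotypes = np.array([[0, 1], [1, 1], [0, 0], [2, 1]])  # 4 variants x 2 samples

rows = np.take(genotypes, [0, 3], axis=0, mode="raise")  # in-bounds indices
wrapped = np.take(genotypes, [5], axis=0, mode="wrap")   # 5 wraps around to row 1
clipped = np.take(genotypes, [5], axis=0, mode="clip")   # 5 is clipped to row 3

try:
    np.take(genotypes, [5], axis=0, mode="raise")
    raised = False
except IndexError:
    raised = True  # out-of-bounds index is an error under 'raise'
```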
-
anhima.loc.locate_position(pos, p)
Locate the index of coordinate p within sorted array of genomic positions pos.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig, with no duplicates.
p : int
The position to locate.
Returns: index : int or None
The index of p in pos if present, else None.
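Because pos is sorted and free of duplicates, a binary search is enough to find p. The sketch below uses numpy.searchsorted; the helper name locate_position_sketch is hypothetical and this is not necessarily how anhima does it.

```python
import numpy as np

def locate_position_sketch(pos, p):
    """Hypothetical stand-in for illustration; not anhima's implementation."""
    pos = np.asarray(pos)
    i = np.searchsorted(pos, p)  # leftmost insertion point for p
    if i < len(pos) and pos[i] == p:
        return int(i)
    return None  # p is not present in pos

pos = np.array([3, 7, 12, 20])
found = locate_position_sketch(pos, 12)
missing = locate_position_sketch(pos, 13)
```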
-
anhima.loc.view_position(a, pos, p)
View a slice along the first dimension of a corresponding to a genome position.
Parameters: a : array_like
The array to extract from.
pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig, with no duplicates.
p : int
The position to locate.
Returns: b : ndarray
A view of a obtained by slicing along the first dimension.
-
anhima.loc.locate_interval(pos, start_position=0, stop_position=None)
Locate the start and stop indices within the pos array that include all positions within the start_position and stop_position range.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
start_position : int
Start position of interval.
stop_position : int
Stop position of interval.
Returns: loc : slice
A slice object with the start and stop indices that include all positions within the interval.
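One way to obtain such a slice is with two binary searches. The sketch assumes the interval is inclusive at both ends, which the description suggests but does not state outright. The helper name locate_interval_sketch is hypothetical.

```python
import numpy as np

def locate_interval_sketch(pos, start_position=0, stop_position=None):
    """Hypothetical stand-in; assumes an interval inclusive at both ends."""
    pos = np.asarray(pos)
    # First index whose position is >= start_position.
    start = np.searchsorted(pos, start_position, side="left")
    if stop_position is None:
        stop = len(pos)
    else:
        # One past the last index whose position is <= stop_position.
        stop = np.searchsorted(pos, stop_position, side="right")
    return slice(int(start), int(stop))

pos = np.array([3, 7, 12, 20, 20, 31])
loc = locate_interval_sketch(pos, 7, 20)
selected = pos[loc]
```

The resulting slice can index any array whose first dimension is aligned with pos.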
-
anhima.loc.view_interval(a, pos, start_position, stop_position)
View a contiguous slice along the first dimension of a corresponding to a genome interval defined by start_position and stop_position.
Parameters: a : array_like
The array to extract from.
pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
start_position : int
Start position of interval.
stop_position : int
Stop position of interval.
Returns: b : ndarray
A view of a obtained by slicing along the first dimension.
-
anhima.loc.locate_positions(pos1, pos2)
Find the intersection of two sets of positions.
Parameters: pos1, pos2 : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig, with no duplicates.
Returns: cond1 : ndarray, bool
An array of the same length as pos1 where an element is True if the corresponding item in pos1 is also found in pos2.
cond2 : ndarray, bool
An array of the same length as pos2 where an element is True if the corresponding item in pos2 is also found in pos1.
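The two boolean arrays described above can be reproduced with numpy.isin. This is a sketch of the documented result on made-up data, not a claim about how anhima computes it.

```python
import numpy as np

pos1 = np.array([3, 7, 12, 20])
pos2 = np.array([7, 8, 20, 31])

cond1 = np.isin(pos1, pos2)  # True where an item of pos1 is also in pos2
cond2 = np.isin(pos2, pos1)  # True where an item of pos2 is also in pos1

# Applying each condition to its own array yields the shared positions.
shared1 = pos1[cond1]
shared2 = pos2[cond2]
```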
-
anhima.loc.locate_intervals(pos, start_positions, stop_positions)
Locate items within the pos array that fall within any of the intervals given by start_positions and stop_positions.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
start_positions : array_like, int
Start positions of intervals.
stop_positions : array_like, int
Stop positions of intervals.
Returns: cond1 : ndarray, bool
An array of the same length as pos where an element is True if the corresponding item in pos is also found in any of the intervals.
cond2 : ndarray, bool
An array of the same length as the number of intervals, where an element is True if the corresponding interval contains one or more positions in pos.
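A brute-force sketch of the two return values, assuming intervals are inclusive at both ends. The helper name locate_intervals_sketch is hypothetical. It builds a positions-by-intervals boolean matrix, which is simple to read but uses memory proportional to the product of the two lengths.

```python
import numpy as np

def locate_intervals_sketch(pos, start_positions, stop_positions):
    """Hypothetical stand-in; assumes intervals inclusive at both ends."""
    pos = np.asarray(pos)
    starts = np.asarray(start_positions)
    stops = np.asarray(stop_positions)
    # inside[i, j] is True when pos[i] lies within interval j.
    inside = (pos[:, None] >= starts[None, :]) & (pos[:, None] <= stops[None, :])
    # Reduce over intervals for cond1 and over positions for cond2.
    return inside.any(axis=1), inside.any(axis=0)

pos = np.array([3, 7, 12, 20, 31])
cond1, cond2 = locate_intervals_sketch(pos, [5, 13, 25], [12, 19, 40])
```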
-
anhima.loc.plot_variant_locator(pos, step=1, ax=None, start_position=None, stop_position=None, flip=False, line_args=None)
Plot lines indicating the physical genome location of variants. By default the top x axis is in variant index space, and the bottom x axis is in genome position space.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
step : int, optional
Plot a line for every step variants.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
flip : bool, optional
Flip the plot upside down.
line_args : dict-like
Additional keyword arguments passed through to plt.Line2D.
Returns: ax : axes
The axes on which the plot was drawn.
-
anhima.loc.windowed_variant_counts(pos, window_size, start_position=None, stop_position=None)
Count variants in non-overlapping windows over the genome.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
Returns: counts : ndarray, int
The number of variants in each window.
bin_edges : ndarray, int
The edge positions of each window. Note that this has length len(counts)+1. To determine bin centers use (bin_edges[:-1] + bin_edges[1:]) / 2. To determine bin widths use np.diff(bin_edges).
See also
plot_windowed_variant_counts, windowed_variant_density
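The relationship between counts and bin_edges can be illustrated with numpy.histogram. How anhima actually places the window edges is an assumption here: the sketch simply starts at start_position and steps by window_size.

```python
import numpy as np

pos = np.array([1, 4, 12, 15, 18, 29])
window_size = 10
start_position, stop_position = 1, 30

# Assumed edge placement: start_position, then one edge every window_size bp.
bin_edges = np.arange(start_position, stop_position + window_size, window_size)
counts, _ = np.histogram(pos, bins=bin_edges)

centers = (bin_edges[:-1] + bin_edges[1:]) / 2  # midpoint of each window
widths = np.diff(bin_edges)                     # size of each window in bp
```

Here bin_edges holds one more element than counts, as noted in the Returns description.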
-
anhima.loc.plot_windowed_variant_counts(pos, window_size, start_position=None, stop_position=None, ax=None, plot_kwargs=None)
Plot windowed variant counts.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
plot_kwargs : dict-like
Additional keyword arguments passed through to plt.plot.
Returns: ax : axes
The axes on which the plot was drawn.
See also
windowed_variant_counts, plot_windowed_variant_density
-
anhima.loc.windowed_variant_density(pos, window_size, start_position=None, stop_position=None)
Calculate per-base-pair density of variants in non-overlapping windows over the genome.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
Returns: density : ndarray, int
The density of variants in each window.
bin_edges : ndarray, int
The edge positions of each window. Note that this has length len(density)+1. To determine bin centers use (bin_edges[:-1] + bin_edges[1:]) / 2. To determine bin widths use np.diff(bin_edges).
See also
plot_windowed_variant_density, windowed_variant_counts
-
anhima.loc.plot_windowed_variant_density(pos, window_size, start_position=None, stop_position=None, ax=None, plot_kwargs=None)
Plot windowed variant density.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
plot_kwargs : dict-like
Additional keyword arguments passed through to plt.plot.
Returns: ax : axes
The axes on which the plot was drawn.
See also
windowed_variant_density, plot_windowed_variant_counts
-
anhima.loc.windowed_statistic(pos, values, window_size, start_position=None, stop_position=None, statistic='mean')
Calculate a statistic for values binned in non-overlapping windows over the genome.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
values : array_like
A 1-D array of the same length as pos.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
statistic : string or function
The function to apply to values in each bin.
Returns: stats : ndarray
The values of the statistic within each bin.
bin_edges : ndarray
The edge positions of each window. Note that this has length len(stats)+1. To determine bin centers use (bin_edges[:-1] + bin_edges[1:]) / 2. To determine bin widths use np.diff(bin_edges).
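To make the binning concrete, here is a NumPy-only sketch that applies a function to the values falling in each window. The helper name windowed_statistic_sketch is hypothetical, and it takes precomputed bin_edges rather than window_size; it is not anhima's implementation.

```python
import numpy as np

def windowed_statistic_sketch(pos, values, bin_edges, statistic=np.mean):
    """Hypothetical stand-in; uses half-open bins [edge_i, edge_i+1)."""
    pos = np.asarray(pos)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges)
    # Index of the window each position falls in. Positions at or beyond
    # the final edge get an index past the last window and are ignored.
    idx = np.searchsorted(bin_edges, pos, side="right") - 1
    n_bins = len(bin_edges) - 1
    stats = np.full(n_bins, np.nan)  # NaN marks empty windows
    for i in range(n_bins):
        in_bin = values[idx == i]
        if in_bin.size:
            stats[i] = statistic(in_bin)
    return stats

pos = np.array([1, 4, 12, 15, 18, 29])
values = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])
bin_edges = np.array([1, 11, 21, 31])
stats = windowed_statistic_sketch(pos, values, bin_edges)
```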
-
anhima.loc.evenly_downsample_variants(a, k)
Evenly downsample an array along the first dimension to length k (or as near as possible), assuming the first dimension corresponds to variants.
Parameters: a : array_like
The array to downsample.
k : int
The target number of variants.
Returns: b : array_like
A downsampled view of a.
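Even downsampling that returns a view can be done with a strided slice. The step calculation below is an assumption chosen for illustration, and the helper name evenly_downsample_sketch is hypothetical. It also shows why the result may only be near k in length.

```python
import numpy as np

def evenly_downsample_sketch(a, k):
    """Hypothetical stand-in; the step rule here is an assumption."""
    a = np.asarray(a)
    step = max(1, a.shape[0] // k)  # integer stride between kept variants
    return a[::step]                # strided slicing returns a view

a = np.arange(20).reshape(10, 2)    # 10 variants x 2 samples
b = evenly_downsample_sketch(a, 3)  # stride 3 keeps rows 0, 3, 6, 9
```

With 10 variants and k=3 the stride is 3, so 4 rows are kept rather than exactly 3.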
-
anhima.loc.randomly_downsample_variants(a, k)
Randomly downsample an array along the first dimension to length k, assuming the first dimension corresponds to variants.
Parameters: a : array_like
The array to downsample.
k : int
The target number of variants.
Returns: b : array_like
A downsampled copy of a.
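A sketch of random downsampling with NumPy, drawing k distinct row indices. Sorting the indices so that variants keep their original order is an assumption, as is the explicit rng argument. The helper name randomly_downsample_sketch is hypothetical.

```python
import numpy as np

def randomly_downsample_sketch(a, k, rng):
    """Hypothetical stand-in; keeps original variant order (an assumption)."""
    a = np.asarray(a)
    # Draw k distinct row indices, then sort them to preserve variant order.
    idx = np.sort(rng.choice(a.shape[0], size=k, replace=False))
    # Selecting by an index array produces a copy, not a view.
    return np.take(a, idx, axis=0)

rng = np.random.default_rng(42)
a = np.arange(20).reshape(10, 2)  # 10 variants x 2 samples
b = randomly_downsample_sketch(a, 4, rng)
```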