Miscellaneous utilities

anhima.util.block_apply(f, dataset, block_size=None, out=None)

Apply function f to dataset, split along the first axis into contiguous slices of block_size items. The result should be equivalent to calling f(dataset) directly, but may require less total memory, especially if dataset is an HDF5 dataset.

Parameters:

f : function

The function to apply.

dataset : array_like or HDF5 dataset

The input dataset.

block_size : int, optional

The size (in number of items along the first axis) of the blocks passed to f.

out : array_like or HDF5 dataset, optional

If given, used to store the output.

Returns:

out : ndarray

The result of applying f to dataset blockwise.
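The blockwise strategy described above can be illustrated with a minimal sketch. This is not anhima's implementation: the name block_apply_sketch is hypothetical, the default of a single block when block_size is None is an assumption, and the sketch assumes f maps each block to an output whose first axis lines up with the block's first axis.

```python
import numpy as np


def block_apply_sketch(f, dataset, block_size=None, out=None):
    """Hypothetical illustration of blockwise application (not anhima's code)."""
    n = dataset.shape[0]
    if n == 0:
        # Nothing to split; fall back to a direct call.
        return np.asarray(f(np.asarray(dataset)))
    if block_size is None:
        block_size = n  # assumption: process everything as one block
    results = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Slicing loads only this block; for an HDF5 dataset this is
        # where the memory saving comes from.
        block = np.asarray(dataset[start:stop])
        res = np.asarray(f(block))
        if out is not None:
            out[start:stop] = res  # store directly in the caller's array
        else:
            results.append(res)
    if out is not None:
        return out
    return np.concatenate(results, axis=0)


data = np.arange(20).reshape(10, 2)
row_sums = block_apply_sketch(lambda b: b.sum(axis=1), data, block_size=3)

# Same computation, writing into a preallocated output array.
buf = np.zeros(10, dtype=data.dtype)
block_apply_sketch(lambda b: b.sum(axis=1), data, block_size=4, out=buf)
```

Here row_sums and buf should both match data.sum(axis=1). A function that reduces along the first axis (for example, a column sum) would not give the same result blockwise, so the equivalence holds only for functions that treat items along the first axis independently.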

anhima.util.block_take2d(dataset, row_indices, col_indices=None, block_size=None)

Select rows and optionally columns by index from a NumPy array or HDF5 dataset with 2 or more dimensions.

Parameters:

dataset : array_like or HDF5 dataset

The input dataset.

row_indices : sequence of ints

The indices of the selected rows. N.B., the indices will be sorted in ascending order, so rows are returned in dataset order.

col_indices : sequence of ints, optional

The indices of the selected columns. If not provided, all columns will be returned.

block_size : int, optional

The size (in number of rows) of the block of data to process at a time.

Returns:

out : ndarray

An array containing the selected rows and columns.

Notes

This function is mainly a workaround for two h5py limitations: fancy indexing is currently slow, and fancy indexing along more than one axis is not supported. The function works by reading the entire dataset in blocks of block_size rows and processing each block in memory using NumPy.
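The approach in the note above can be sketched roughly as follows. The name block_take2d_sketch is hypothetical, and the single-block default when block_size is None is an assumption. It illustrates the block-then-index idea and is not the library's code.

```python
import numpy as np


def block_take2d_sketch(dataset, row_indices, col_indices=None, block_size=None):
    """Hypothetical illustration of blockwise row/column selection."""
    # Sort the row indices so each block's share is a contiguous run.
    row_indices = np.sort(np.asarray(row_indices, dtype=int))
    n_rows = dataset.shape[0]
    if block_size is None:
        block_size = max(1, n_rows)  # assumption: one block by default
    pieces = []
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        # Plain slicing is the only indexing applied to the dataset itself.
        block = np.asarray(dataset[start:stop])
        # Binary search for the selected indices that fall in [start, stop).
        lo, hi = np.searchsorted(row_indices, [start, stop])
        sel = block[row_indices[lo:hi] - start]
        if col_indices is not None:
            # Column selection happens in memory, on the NumPy block.
            sel = sel[:, col_indices]
        pieces.append(sel)
    return np.concatenate(pieces, axis=0)


data = np.arange(30).reshape(6, 5)
taken = block_take2d_sketch(data, [4, 1], col_indices=[0, 3], block_size=2)
```

Because the row indices are sorted first, taken holds rows 1 and 4 in that order, restricted to columns 0 and 3, regardless of the order in which the indices were passed.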

anhima.util.block_compress2d(dataset, row_condition, col_condition=None, block_size=None)

Select rows and optionally columns, using boolean conditions, from a NumPy array or HDF5 dataset with 2 or more dimensions.

Parameters:

dataset : array_like or HDF5 dataset

The input dataset.

row_condition : array_like, bool

A boolean array indicating the selected rows.

col_condition : array_like, bool, optional

A boolean array indicating the selected columns. If not provided, all columns will be returned.

block_size : int, optional

The size (in number of rows) of the block of data to process at a time.

Returns:

out : ndarray

An array containing the selected rows and columns.

Notes

This function is mainly a workaround for two h5py limitations: fancy indexing is currently slow, and fancy indexing along more than one axis is not supported. The function works by reading the entire dataset in blocks of block_size rows and processing each block in memory using NumPy.
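A boolean-mask version of the same blockwise idea might look like the sketch below. The name block_compress2d_sketch is hypothetical, and the single-block default when block_size is None is an assumption. It uses numpy.compress on each in-memory block and is not taken from the anhima source.

```python
import numpy as np


def block_compress2d_sketch(dataset, row_condition, col_condition=None,
                            block_size=None):
    """Hypothetical illustration of blockwise boolean selection."""
    row_condition = np.asarray(row_condition, dtype=bool)
    n_rows = dataset.shape[0]
    if block_size is None:
        block_size = max(1, n_rows)  # assumption: one block by default
    pieces = []
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        # Read one block of rows with a plain slice.
        block = np.asarray(dataset[start:stop])
        # Keep the rows whose flag is True within this block.
        sel = np.compress(row_condition[start:stop], block, axis=0)
        if col_condition is not None:
            # Then keep the flagged columns, still in memory.
            sel = np.compress(col_condition, sel, axis=1)
        pieces.append(sel)
    return np.concatenate(pieces, axis=0)


data = np.arange(12).reshape(4, 3)
rows = [True, False, False, True]
cols = [False, True, True]
compressed = block_compress2d_sketch(data, rows, cols, block_size=3)
```

In this example compressed keeps rows 0 and 3 and columns 1 and 2 of data. Only plain slices are applied to the dataset itself, which is the property the note relies on for HDF5 input.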