Generic_Util package#
Subpackages#
Submodules#
Generic_Util.benchmarking module#
Functions covering typical code-timing scenarios, such as a “with” statement context, an n-executions timer, and a convenient function for comparing and summarising n execution times of different implementations of the same function.
- Generic_Util.benchmarking.compare_implementations(fs_with_shared_args: dict[str, Callable], n=200, wait=1, verbose=True, fs_with_own_args: dict[str, tuple[Callable, list, dict]] | None = None, args: list | None = None, kwargs: dict | None = None)#
Benchmark multiple implementations of the same function, each called n times (with the same args and kwargs), with a break between functions. If verbose is False, a convenient later view of the returned table is print(table.to_markdown(index=False)).
- Parameters:
fs_with_own_args – alternative to the fs_with_shared_args, args and kwargs arguments: meant for additional functions taking different *args and **kwargs
- Generic_Util.benchmarking.merge_bench_tables(*tables, verbose=True)#
- Generic_Util.benchmarking.time_context(name: str = None)#
“with” statement context for timing the execution of the enclosed code block, i.e.
with time_context('Name of code block'): ...
- Generic_Util.benchmarking.time_n(f: Callable, n=2, *args, **kwargs)#
Run f (with the given arguments) n times and return the individual execution intervals.
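The intended behaviour can be sketched with time.perf_counter (name and return details are hypothetical; the library's actual implementation may differ):

```python
import time
from typing import Callable

def time_n_sketch(f: Callable, n: int = 2, *args, **kwargs) -> list[float]:
    # Run f n times with the given arguments, recording each run's duration in seconds
    intervals = []
    for _ in range(n):
        start = time.perf_counter()
        f(*args, **kwargs)
        intervals.append(time.perf_counter() - start)
    return intervals
```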
Generic_Util.iter module#
Iterable-focussed functions, covering multiple varieties of flattening, iterable combining, grouping and predicate/property-based processing (including topological sorting), element-comparison-based operations, value interspersal, and finally batching.
- Generic_Util.iter.all_combinations(xs: Sequence, min_size=1, max_size=None) list #
Produce all combinations of elements of xs between min_size and max_size subset size (i.e. the powerset elements of certain sizes).
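A minimal sketch of this powerset-slice behaviour using itertools (output type and ordering of the real function may differ):

```python
from itertools import combinations

def all_combinations_sketch(xs, min_size=1, max_size=None) -> list:
    # Combinations of every subset size from min_size to max_size inclusive
    max_size = len(xs) if max_size is None else max_size
    return [c for k in range(min_size, max_size + 1) for c in combinations(xs, k)]
```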
- Generic_Util.iter.batch_iter(n: int, xs: Iterable[_a]) Generator[_a, None, None] #
Batch an iterable in batches of size n (possibly except the last). If len(xs) is knowable use batch_seq instead.
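The semantics can be sketched with itertools.islice (a hypothetical equivalent, not the library's code):

```python
from itertools import islice
from typing import Generator, Iterable, TypeVar

_a = TypeVar('_a')

def batch_iter_sketch(n: int, xs: Iterable[_a]) -> Generator[list[_a], None, None]:
    # Consume the iterable n elements at a time; the last batch may be shorter
    it = iter(xs)
    while batch := list(islice(it, n)):
        yield batch
```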
- Generic_Util.iter.batch_iter_by(by: Callable[[_a], float], size: int, xs: Iterable[_a], keep_totals=False) Generator[_a, None, None] #
Batch an iterable by the sum of some weight/score/cost of each element, with no batch total exceeding size.
- Parameters:
keep_totals – if True, the function returns tuples of (batch_total_value, batch) instead of just batches
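A greedy one-pass sketch of this weight-capped batching (hypothetical; the library's tie-breaking and edge cases may differ, e.g. for single elements heavier than size):

```python
def batch_iter_by_sketch(by, size, xs, keep_totals=False):
    # Accumulate elements until adding the next one would exceed size,
    # then start a new batch; optionally yield each batch's total weight
    batch, total = [], 0.0
    for x in xs:
        w = by(x)
        if batch and total + w > size:
            yield (total, batch) if keep_totals else batch
            batch, total = [], 0.0
        batch.append(x)
        total += w
    if batch:
        yield (total, batch) if keep_totals else batch
```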
- Generic_Util.iter.batch_seq(n: int, xs: Sequence[_a], n_is_number_of_batches=False) Generator[_a, None, None] #
Batch an iterable of knowable length in batches of size n (possibly except the last).
- Parameters:
n_is_number_of_batches – if True, divide into n batches of equal length (possibly except the last) rather than into batches of n elements
- Generic_Util.iter.batch_seq_by_into(by: Callable[[_a], float], k: int, xs: Sequence[_a], keep_totals=False, optimal_but_reordered=False) Generator[_a, None, None] #
Batch an iterable of knowable length into k batches whose sums of some weight/score/cost (by) are roughly equal.
- Parameters:
keep_totals – if True, the function returns tuples of (batch_total_value, batch) instead of just batches
optimal_but_reordered – if True, the function produces the optimal (hence not order-preserving) batching of summable xs into the given number of batches with roughly equal totals. If False, the process retains order but may return more batches than requested.
- Generic_Util.iter.complement_intervals(intervals: ndarray[Any, dtype[int]], true_length: int, closed_interval_ends=True) ndarray[Any, dtype[int]] #
Invert a series of index intervals (in the form of an (n,2)-array), i.e. return an (m,2)-array of intervals starting after the ends of the given ones and ending before their starts.
- Parameters:
true_length – 1 more than the final index of the overall range these intervals lie within (which might be intervals[-1,1]); if coming from isin_sorted_intervals, then simply len(xs)
closed_interval_ends – whether BOTH input and output intervals are (and will be) closed, i.e. their end-index is INCLUDED in the interval, rather than being the first index after it
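For the closed-ends case, the inversion can be sketched with plain lists (hypothetical; the real function takes and returns (n,2) numpy arrays):

```python
def complement_intervals_sketch(intervals, true_length):
    # Closed-ends case only: each [start, end] has its end index included.
    # Emit the gaps before, between, and after the given intervals.
    out, next_start = [], 0
    for start, end in intervals:
        if start > next_start:
            out.append([next_start, start - 1])
        next_start = end + 1
    if next_start < true_length:
        out.append([next_start, true_length - 1])
    return out
```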
- Generic_Util.iter.deep_extract(xss_: Iterable[Iterable], *key_path) Generator #
Given a nested combination of iterables and a path of keys in it, return a deep_flatten-ed list of entries under said path. Note:
deep_extract(xss_, *key_path) == deep_flatten(Generic_Util.operator.get_nested(xss_, *key_path))
- Generic_Util.iter.deep_flatten(xss_: Iterable) Generator #
Flatten any nested combination of iterables to its leaves. The flattening ignores keys of Mappings and stops at tuples, strings, bytes or non-Iterables. Same as a recursive chain.from_iterable but handling values of Mappings instead of keys.
- Generic_Util.iter.diff(xs: Iterable[_a], ys: Iterable[_a]) list[_a] #
Difference of iterables.
Notes
- Not a set difference: exactly as many duplicate entries are removed from xs as appear in ys
- Preserves the order of xs
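A sketch of this multiset-style difference (hypothetical name; quadratic, for illustration only):

```python
def diff_sketch(xs, ys):
    # Remove one occurrence from xs per occurrence in ys, preserving xs order
    to_remove = list(ys)
    out = []
    for x in xs:
        if x in to_remove:
            to_remove.remove(x)
        else:
            out.append(x)
    return out
```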
- Generic_Util.iter.eq_elems(xs: Iterable[_a], ys: Iterable[_a]) bool #
Equality of iterables by their elements.
- Generic_Util.iter.first(c: Callable[[_a], bool], xs: Iterable[_a], default: _a | None = None) _a #
Return the first value in xs which satisfies condition c.
- Generic_Util.iter.first_i(c: Callable[[_a], bool], xs: Sequence[_a], default: _a | None = None) _a #
Return the index of the first value in xs which satisfies condition c.
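Both behaviours can be sketched with next over a generator (hypothetical equivalents):

```python
def first_sketch(c, xs, default=None):
    # First element satisfying c, or default if none does
    return next((x for x in xs if c(x)), default)

def first_i_sketch(c, xs, default=None):
    # Index of the first element satisfying c, or default if none does
    return next((i for i, x in enumerate(xs) if c(x)), default)
```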
- Generic_Util.iter.flatten(xss: Iterable[Iterable], as_generator=False) list | Generator #
Standard flattening (returning a list by default).
- Generic_Util.iter.foldq(f: Callable[[_b, _a], _b], g: Callable[[_b, _a, list[_a]], list[_a]], c: Callable[[_a], bool], xs: Sequence[_a], acc: _b) tuple[_b, list[_a]] #
Fold-like higher-order function where xs is traversed by consumption conditional on c, and remaining xs are updated by g (therefore consumption order is not known a priori):
the first/next item to be ingested is the first in the remaining xs to fulfil condition c
at every x ingestion, the item is removed from (a copy of) xs, and all the remaining ones are potentially modified by function g
this function always returns a tuple of (acc, remaining_xs), unlike the stricter foldq_, which raises an exception on leftover xs
Note: fold(f, xs, acc) == foldq(f, lambda acc, x, xs: xs, lambda x: True, xs, acc)
Sequence of suitable names leading to the current one: consumption_fold, condition_update_fold, cu_fold, q_fold, qfold or foldq
- Parameters:
f – ‘Traditional’ fold function :: acc -> x -> acc
g – ‘Update’ function for all remaining xs at every iteration :: acc -> x -> xs -> xs
c – ‘Condition’ function to select the next x (first which satisfies it) :: x -> Bool
xs – Structure to consume
acc – Starting value for the accumulator
- Returns:
(acc, remaining_xs)
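The consumption loop described above can be sketched as follows (a hypothetical equivalent, stopping when no remaining element satisfies c):

```python
def foldq_sketch(f, g, c, xs, acc):
    # Repeatedly: find the first x satisfying c, remove it, fold it into acc,
    # then let g update the remaining elements
    xs = list(xs)
    while xs:
        i = next((j for j, x in enumerate(xs) if c(x)), None)
        if i is None:
            break  # no consumable element left
        x = xs.pop(i)
        acc = f(acc, x)
        xs = g(acc, x, xs)
    return acc, xs
```

With g returning xs unchanged and c always true, this degenerates to an ordinary fold, matching the Note above.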
- Generic_Util.iter.foldq_(f: Callable[[_b, _a], _b], g: Callable[[_b, _a, list[_a]], list[_a]], c: Callable[[_a], bool], xs: list[_a], acc: _b) _b #
Stricter version of foldq (see its description for details); only returns the accumulator.
- Raises: ValueError – on leftover xs
- Generic_Util.iter.group_by(f: Callable[[_a], _b], xs: Iterable[_a]) dict[_b, list[_a]] #
Generalisation of partition to a key-function with any output type.
Notes
- ‘Retrieval’ functions from the operator package are typical f values (itemgetter(...), attrgetter(...) or methodcaller(...))
- This is NOT Haskell’s groupBy function
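The grouping can be sketched with collections.defaultdict (hypothetical equivalent):

```python
from collections import defaultdict

def group_by_sketch(f, xs):
    # Bucket each x under its key f(x), preserving encounter order within buckets
    groups = defaultdict(list)
    for x in xs:
        groups[f(x)].append(x)
    return dict(groups)
```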
- Generic_Util.iter.intersperse(xs: Sequence[_a], ys: list[_a], n: int, prepend=False, append=False) Sequence[_a] #
Intersperse elements of ys every n elements of xs.
- Generic_Util.iter.intersperse_val(xs: Sequence[_a], y: _a, n: int, prepend=False, append=False) Sequence[_a] #
Intersperse y every n elements of xs.
- Generic_Util.iter.isin_sorted(xs: ndarray[Any, dtype[_a]], ys: ndarray[Any, dtype[_a]]) ndarray[Any, dtype[bool_]] #
Optimised (10x faster) 1D version of np.isin assuming BOTH xs and ys are (ascendingly) sorted arrays of the same type (both int or both float): return a boolean array of the same length as xs indicating whether the corresponding element at that index is present in ys.
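The sortedness assumption enables a two-pointer scan, sketched here over plain lists (the real function operates on numpy arrays and is vectorised):

```python
def isin_sorted_sketch(xs, ys):
    # Both inputs must be ascending; advance a single cursor through ys
    out, j = [], 0
    for x in xs:
        while j < len(ys) and ys[j] < x:
            j += 1
        out.append(j < len(ys) and ys[j] == x)
    return out
```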
- Generic_Util.iter.isin_sorted_intervals(xs: ndarray[Any, dtype[_a]], ys: ndarray[Any, dtype[_a]], strict=True) ndarray[Any, dtype[int64]] #
Assuming BOTH xs and ys are (ascendingly) sorted arrays of the same type (both int or both float): return a 2D array of the intervals (in terms of indices of xs) of subsequences shared with ys.
Notes
- The interval-end indices are of the last matching value, not of the next (non-matching) one; run res[:,1] += 1 to switch the behaviour
- If the desired intervals are the opposite of the matching ones, call complement_intervals(intervals, len(xs), closed_interval_ends=True)
- Parameters:
strict – Whether xs and ys subsequences need to match exactly (e.g. 1223 will not match 123, and vice-versa, if strict). If the strictness assumption is not respected, the function will produce an extra length-1 interval for each duplicate.
- Generic_Util.iter.partition(p: Callable[[_a], bool], xs: Iterable[_a]) tuple[Iterable[_a], Iterable[_a]] #
Haskell’s partition function, partitioning xs by some boolean predicate p:
partition p xs == (filter p xs, filter (not . p) xs)
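In Python this can be sketched as (hypothetical equivalent; the real return type is a tuple of iterables):

```python
def partition_sketch(p, xs):
    # Single pass: route each element to the matching or non-matching list
    yes, no = [], []
    for x in xs:
        (yes if p(x) else no).append(x)
    return yes, no
```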
- Generic_Util.iter.topological_sort(nodes_incoming_edges_tuples: Iterable[tuple[_a, list[_b]]]) list[_a] #
Topological sort, i.e. sort (non-uniquely) DAG nodes by directed path, e.g. sort packages by dependency order.
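One way to realise this (not necessarily the library's) is Kahn's algorithm over the (node, incoming-edges) pairs; cycle handling here is an assumption:

```python
def topological_sort_sketch(nodes_incoming_edges_tuples):
    # Repeatedly emit nodes with no unresolved incoming edges,
    # then remove them from the remaining nodes' edge sets
    pending = {n: set(deps) for n, deps in nodes_incoming_edges_tuples}
    out = []
    while pending:
        ready = [n for n, deps in pending.items() if not deps]
        if not ready:
            raise ValueError("cycle detected")
        for n in ready:
            del pending[n]
            out.append(n)
        for deps in pending.values():
            deps.difference_update(ready)
    return out
```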
- Generic_Util.iter.unique(xs: Iterable[_a]) list[_a] #
Order-preserving uniqueness. If the values of xs are hashable, use unique_a instead.
- Generic_Util.iter.unique_a(xs: ndarray[Any, dtype[ScalarType]]) ndarray[Any, dtype[ScalarType]] #
Order-preserving uniqueness for an array (of hashable objects). Much faster than a compiled version of the non-hashing-using algorithm, even including the final sort.
- Generic_Util.iter.unique_by(f: Callable[[_a], Any], xs: Iterable[_a]) list[_a] #
Order-preserving uniqueness by some property f. If the values of xs are hashable, it is recommended to use unique_a.
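Both unique and unique_by can be sketched without hashing (which is what makes the hashing-based unique_a faster):

```python
def unique_sketch(xs):
    # Order-preserving dedup that does not require hashable values
    seen, out = [], []
    for x in xs:
        if x not in seen:
            seen.append(x)
            out.append(x)
    return out

def unique_by_sketch(f, xs):
    # Keep the first element seen for each distinct value of f(x)
    seen, out = [], []
    for x in xs:
        k = f(x)
        if k not in seen:
            seen.append(k)
            out.append(x)
    return out
```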
- Generic_Util.iter.unzip(list_of_ntuples: Iterable[Iterable], as_generator=False) list[list] | Generator[list, None, None] #
Standard unzip (i.e. zip(*…)) but with concretised outputs (as lists).
- Generic_Util.iter.update_dict_with(d0: dict, d1: dict, f: Callable[[_a, _b], _a | _b]) dict[Any, Union[_a, _b]] #
Update a dictionary’s entries with those of another using a given function, e.g. appending (operator.add is ideal for this). NOTE: This modifies d0, so a prior copy or deepcopy is advisable.
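A sketch of the merge behaviour, including the in-place caveat (hypothetical equivalent):

```python
import operator

def update_dict_with_sketch(d0, d1, f):
    # NOTE: mutates d0, matching the documented behaviour;
    # keys only in d1 are inserted as-is, shared keys are combined via f
    for k, v in d1.items():
        d0[k] = f(d0[k], v) if k in d0 else v
    return d0
```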
- Generic_Util.iter.zip_maps(*maps: list[Mapping[_a, _b]]) Generator[tuple[_a, tuple], None, None] #
Generic_Util.misc module#
Functions with a less generic purpose than those above; currently mostly to do with min/max-based operations.
- Generic_Util.misc.interval_overlap(ab: tuple[float, float], cd: tuple[float, float]) float #
Compute the overlap of two intervals (expressed as tuples of start and end values)
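The overlap is the (clamped) length of the intervals' intersection, sketched as:

```python
def interval_overlap_sketch(ab, cd):
    # Length of the intersection of [a, b] and [c, d]; 0 if disjoint
    (a, b), (c, d) = ab, cd
    return max(0.0, min(b, d) - max(a, c))
```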
- Generic_Util.misc.min_max(xs: Sequence[_a]) tuple[_a, _a] #
Mathematically most efficient joint identification of min and max (minimum comparisons = 3n/2 - 2).
Notes
- This function is numba-compilable, e.g. as njit(nTup(f8,f8)(f8[::1]))(min_max) (see Generic_Util.numba.types for the nTup shorthand)
- If using numpy arrays, min and max are cached for O(1) lookup, and one would imagine this is the algorithm used
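The 3n/2 bound comes from processing elements in pairs: one comparison within each pair, then one each against the running min and max. A sketch (hypothetical; the compiled version presumably differs in detail):

```python
def min_max_sketch(xs):
    # Pairwise min/max: ~3 comparisons per 2 elements instead of 4
    if len(xs) % 2:
        lo = hi = xs[0]
        start = 1
    else:
        lo, hi = (xs[0], xs[1]) if xs[0] <= xs[1] else (xs[1], xs[0])
        start = 2
    for i in range(start, len(xs) - 1, 2):
        a, b = (xs[i], xs[i + 1]) if xs[i] <= xs[i + 1] else (xs[i + 1], xs[i])
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi
```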
Generic_Util.operator module#
Functions regarding item retrieval, and syntactic sugar for patterns of function application.
- Generic_Util.operator.fst(ab: tuple[_a, _b]) _a #
Same as operator.itemgetter(0), but intended (and type-annotated) specifically for 2-tuple use
- Generic_Util.operator.get_nested(xss_: Iterable[Iterable], *key_path) Generator #
Follow a path of keys for a nested combination of iterables
- Generic_Util.operator.on(f: Callable, xs: Iterable[_a], g: Callable[[_a, ...], _b], *args, **kwargs)#
Transform xs by element-wise application of g and call f with them as its arguments. E.g.
on(operator.gt, (a, b), len)
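The application pattern (which on_a and on_m below specialise to attributes and methods) can be sketched as:

```python
import operator

def on_sketch(f, xs, g, *args, **kwargs):
    # Apply g element-wise to xs, then call f with the results as its arguments
    return f(*(g(x, *args, **kwargs) for x in xs))
```

For example, on_sketch(operator.gt, (a, b), len) compares the lengths of a and b.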
- Generic_Util.operator.on_a(f: Callable, xs: Iterable, a: str)#
Extract attribute a from xs elements and call f with them as its arguments. E.g.
on_a(operator.eq, [a, b], '__class__')
- Generic_Util.operator.on_m(f: Callable, xs: Iterable, m: str, *args, **kwargs)#
Call method m on xs elements and call f with their results as its arguments. E.g.
on_m(operator.gt, [a, b], 'count', 'hello')
- Generic_Util.operator.snd(ab: tuple[_a, _b]) _b #
Same as operator.itemgetter(1), but intended (and type-annotated) specifically for 2-tuple use
Module contents#
Copyright (c) 2023 Thomas Fletcher. All rights reserved.
Generic-Util: Convenient functions not found in Python’s Standard Library (regarding benchmarking, iterables, functional tools, operators and numba compilation)