mt.pandas.series
Extra functions to augment pandas.Series.
Functions
Series4json()
: Converts a json-like object into a pandas.Series.
json4Series()
: Converts a pandas.Series into a json-like object.
to_categorical()
: Converts a string series representing a categorical field into a zero-indexed categorical field and a one-hot field in json format.
series_apply()
: Applies a function to every item of a pandas.Series, optionally with a progress bar.
series_parallel_apply()
: Parallel-applies a function to every item of a pandas.Series, optionally with a progress bar.
stats()
: Computes basic statistics of a scalar series.
- mt.pandas.series.Series4json(obj)
Converts a json-like object into a pandas.Series.
- Parameters:
obj (json_like) – a json-like object
- Returns:
output Series
- Return type:
pandas.Series
Notes
A json-like object contains two array_likes, each of length K. The first array holds the index; the second holds the values.
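A minimal sketch of the decoding direction, assuming the json-like object is a two-element list [index_array, value_array] as described in the Notes. series_from_json is a hypothetical name used for illustration; it is not the mt implementation.

```python
import json

import pandas as pd


def series_from_json(obj):
    """Hypothetical sketch of Series4json: build a Series from [index, values].

    Assumes obj is a pair of equal-length array_likes, index array first.
    """
    index, values = obj
    if len(index) != len(values):
        raise ValueError("index and value arrays must both have length K")
    return pd.Series(list(values), index=list(index))


# A json-like object as it might come out of json.loads (here K = 3).
obj = json.loads('[["a", "b", "c"], [1.5, 2.0, 3.25]]')
s = series_from_json(obj)
print(s["b"])  # 2.0
```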
- mt.pandas.series.json4Series(obj)
Converts a pandas.Series into a json-like object.
- Parameters:
obj (pandas.Series) – input Series
- Returns:
a json-like object
- Return type:
json_like
Notes
A json-like object contains two array_likes, each of length K. The first array holds the index; the second holds the values.
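A minimal sketch of the encoding direction, under the same assumption that the json-like object is a list [index_array, value_array]. json_from_series is hypothetical; calling .tolist() is one way to obtain plain Python values that json.dumps accepts.

```python
import json

import numpy as np
import pandas as pd


def json_from_series(s: pd.Series) -> list:
    """Hypothetical sketch of json4Series: return [index_list, value_list]."""
    # .tolist() turns numpy scalars (e.g. np.int64) into plain Python numbers,
    # so the result can be passed straight to json.dumps.
    return [s.index.tolist(), s.tolist()]


s = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])
obj = json_from_series(s)
print(json.dumps(obj))  # [["x", "y", "z"], [10, 20, 30]]
```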
- mt.pandas.series.to_categorical(series, value_list, missing_values='raise_exception', logger=None)
Converts a string series representing a categorical field into a zero-indexed categorical field and a one-hot field in json format.
- Parameters:
series (pandas.Series) – an indexed series whose values are strings representing categories
value_list (list) – list of accepted values to be converted into integers. The first value is converted into 0, the second value into 1, and so on. The length of the list is the number of categories.
missing_values ({'raise_exception', 'remove_and_warn', 'remove_in_silence'}) – policy for treating missing values when they are encountered. If ‘raise_exception’ is specified, a ValueError is raised at the first missing value. If ‘remove_and_warn’ is specified, each missing value is reported to the logger as a warning and the records containing missing values are removed from the output. If ‘remove_in_silence’ is specified, all records containing missing values are removed from the output silently.
logger (logging.Logger or equivalent, optional) – logger for debugging purposes
- Returns:
df – indexed dataframe with the same index as series. However, some records may be removed because they contain missing values. Field cat_id contains the converted zero-indexed categorical id integer. Field one_hot contains a numpy 1d array representing the one-hot encoding of the categorical id. The number of components of the one-hot vector equals the length of value_list.
- Return type:
pandas.DataFrame(columns=[‘cat_id’, ‘one_hot’])
- Raises:
ValueError – if something has gone wrong, e.g. a missing value is encountered while missing_values is ‘raise_exception’
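The following sketch illustrates the documented contract of to_categorical: zero-indexed ids, one-hot vectors with len(value_list) components, and the three missing-value policies. to_categorical_sketch is hypothetical, and it assumes that a "missing value" means any value not found in value_list; the actual function may define it differently.

```python
import logging

import numpy as np
import pandas as pd


def to_categorical_sketch(series, value_list, missing_values="raise_exception", logger=None):
    """Hypothetical re-implementation of the documented to_categorical contract.

    Assumption: a "missing value" is any series value not present in value_list.
    """
    policies = ("raise_exception", "remove_and_warn", "remove_in_silence")
    if missing_values not in policies:
        raise ValueError(f"Unknown missing_values policy: {missing_values!r}")
    # First accepted value -> 0, second -> 1, and so on.
    lookup = {value: i for i, value in enumerate(value_list)}
    n_cats = len(value_list)

    kept_index, cat_ids, one_hots = [], [], []
    for idx, value in series.items():
        cat_id = lookup.get(value)
        if cat_id is None:
            if missing_values == "raise_exception":
                raise ValueError(f"Missing value {value!r} at index {idx!r}")
            if missing_values == "remove_and_warn" and logger is not None:
                logger.warning("Removing record %r with missing value %r", idx, value)
            continue  # the record is dropped from the output
        one_hot = np.zeros(n_cats)
        one_hot[cat_id] = 1.0
        kept_index.append(idx)
        cat_ids.append(cat_id)
        one_hots.append(one_hot)

    # Store the vectors in a 1-d object array so pandas keeps one array per cell.
    one_hot_col = np.empty(len(one_hots), dtype=object)
    for i, vec in enumerate(one_hots):
        one_hot_col[i] = vec
    return pd.DataFrame({"cat_id": cat_ids, "one_hot": one_hot_col}, index=kept_index)


s = pd.Series(["cat", "dog", "bird", "cat"], index=[10, 11, 12, 13])
df = to_categorical_sketch(s, ["cat", "dog"], missing_values="remove_in_silence")
print(df["cat_id"].tolist())  # [0, 1, 0]
```

The record at index 12 ("bird") is dropped, so the output index is a subset of the input index, matching the Returns description.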
- mt.pandas.series.series_apply(s: Series, func, bar_unit='it', logger: IndentedLoggerAdapter | None = None, func_uses_logger: bool = False) → Series
Applies a function to every item of a pandas.Series, optionally with a progress bar.
- Parameters:
s (pandas.Series) – a series
func (function) – a function to map each item of the series to something
bar_unit (str, optional) – unit name to be passed to the progress bar. If None is provided, no bar is displayed.
logger (mt.logg.IndentedLoggerAdapter, optional) – logger for debugging purposes. Only used if bar_unit is not None.
func_uses_logger (bool) – whether or not the function takes logger as an additional keyword argument. Valid only if logger is used.
- Returns:
the output series obtained by invoking s.apply, with a progress bar shown if requested.
- Return type:
pandas.Series
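A minimal sketch of how bar_unit, logger and func_uses_logger interact, as described above. series_apply_sketch and double are hypothetical; a plain logging.Logger stands in for mt.logg.IndentedLoggerAdapter, and per-item debug messages stand in for the progress bar.

```python
import logging

import pandas as pd


def series_apply_sketch(s, func, bar_unit="it", logger=None, func_uses_logger=False):
    """Hypothetical sketch of series_apply's parameter semantics.

    Stand-ins (assumptions): a plain logging.Logger replaces
    mt.logg.IndentedLoggerAdapter, and per-item debug messages replace the
    progress bar, so only pandas and the stdlib are needed.
    """
    if bar_unit is None:
        # No progress bar requested: the logger is not used at all.
        return s.apply(func)

    total = len(s)
    done = 0

    def wrapped(x):
        nonlocal done
        if func_uses_logger and logger is not None:
            result = func(x, logger=logger)  # logger passed as a keyword argument
        else:
            result = func(x)
        done += 1
        if logger is not None:
            logger.debug("%d/%d %s", done, total, bar_unit)
        return result

    return s.apply(wrapped)


def double(x, logger=None):
    if logger is not None:
        logger.debug("doubling %r", x)
    return 2 * x


logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("series_apply_demo")
out = series_apply_sketch(pd.Series([1, 2, 3]), double, logger=log, func_uses_logger=True)
print(out.tolist())  # [2, 4, 6]
```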
- mt.pandas.series.series_parallel_apply(s: Series, func, n_cores: int = -1, parallelism: str = 'multiprocess', logger: IndentedLoggerAdapter | None = None, scoped_msg: str | None = None) → Series
Parallel-applies a function to every item of a pandas.Series, optionally with a progress bar.
The function wraps pandas_parallel_apply.SeriesParallel. The progress bars are shown if and only if a logger is provided.
- Parameters:
s (pandas.Series) – a series
func (function) – a function to map each item of the series to something. It must be picklable for parallel processing.
n_cores (int) – number of CPUs to use. Passed as-is to pandas_parallel_apply.SeriesParallel.
parallelism ({'multithread', 'multiprocess'}) – multi-threading or multi-processing. Passed as-is to pandas_parallel_apply.SeriesParallel.
logger (mt.logg.IndentedLoggerAdapter, optional) – logger for debugging purposes.
scoped_msg (str, optional) – if provided, the message of a scoped_info block that wraps the progress bars; if None, the bars are not wrapped. Only valid if a logger is provided.
- Returns:
the output series, equivalent to invoking s.apply.
- Return type:
pandas.Series
See also
pandas_parallel_apply.SeriesParallel
the wrapped class that performs the parallel apply
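For intuition only, the sketch below parallel-applies a function with the standard library's concurrent.futures instead of pandas_parallel_apply.SeriesParallel, which is what the documented function actually wraps. series_parallel_apply_sketch is hypothetical, and treating n_cores=-1 as "all CPUs" is an assumption.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import pandas as pd


def series_parallel_apply_sketch(s, func, n_cores=-1, parallelism="multiprocess"):
    """Hypothetical stand-in for series_parallel_apply built on concurrent.futures.

    It does not use pandas_parallel_apply. Assumption: n_cores=-1 means
    "use every CPU reported by os.cpu_count()".
    """
    if n_cores is None or n_cores < 1:
        n_cores = os.cpu_count() or 1
    if parallelism == "multithread":
        executor_cls = ThreadPoolExecutor
    elif parallelism == "multiprocess":
        # func is pickled to reach the worker processes, hence "must be picklable".
        executor_cls = ProcessPoolExecutor
    else:
        raise ValueError(f"Unknown parallelism: {parallelism!r}")
    with executor_cls(max_workers=n_cores) as executor:
        # Executor.map yields results in input order, so they align with s.index.
        results = list(executor.map(func, s.tolist()))
    return pd.Series(results, index=s.index, name=s.name)


s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])
out = series_parallel_apply_sketch(s, math.sqrt, n_cores=2, parallelism="multithread")
print(out.tolist())  # [1.0, 2.0, 3.0]
```

Because Executor.map preserves input order, the results can be reattached to the original index. On platforms that start worker processes with the "spawn" method (Windows, and macOS by default), the 'multiprocess' path should be called under an `if __name__ == "__main__":` guard, with func defined at module level so that it can be pickled.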
- mt.pandas.series.stats(s: Series) → dict
Computes basic statistics of a scalar series.
- Parameters:
s (pandas.Series) – a scalar series of non-null numeric values
- Returns:
an output dictionary containing keys ‘min’, ‘max’, ‘mean’, ‘std’
- Return type:
dict
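A minimal sketch of the documented output of stats, using pandas' own reductions. stats_sketch is hypothetical, and whether std is the sample standard deviation (ddof=1, the pandas default used here) or the population one is an assumption not stated in the documentation.

```python
import pandas as pd


def stats_sketch(s: pd.Series) -> dict:
    """Hypothetical sketch of stats(): min, max, mean and std of a numeric Series.

    Assumption: std is pandas' default sample standard deviation (ddof=1).
    """
    return {
        "min": float(s.min()),
        "max": float(s.max()),
        "mean": float(s.mean()),
        "std": float(s.std()),
    }


print(stats_sketch(pd.Series([1.0, 2.0, 3.0])))
# {'min': 1.0, 'max': 3.0, 'mean': 2.0, 'std': 1.0}
```

Casting with float() returns plain Python numbers rather than numpy scalars, which keeps the dictionary easy to serialize.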