mt.pandas.convert
Functions
dfload_asyn()
: An asyn function that loads a dataframe file based on the file’s extension.
dfload()
: Loads a dataframe file based on the file’s extension.
dfsave_asyn()
: An asyn function that saves a dataframe to a file based on the file’s extension.
dfsave()
: Saves a dataframe to a file based on the file’s extension.
dfpack()
: Packs a dataframe into a more compact format.
dfunpack()
: Unpacks a compact dataframe into a more expanded format.
- async mt.pandas.convert.dfload_asyn(df_filepath, *args, show_progress=False, unpack=True, parquet_convert_ndarray_to_list=False, file_read_delayed: bool = False, max_rows: int | None = None, nrows: int | None = None, context_vars: dict = {}, **kwargs) DataFrame
An asyn function that loads a dataframe file based on the file’s extension.
- Parameters:
df_filepath (str) – local path to an existing dataframe. The file extension is used to determine the file type.
show_progress (bool) – show a progress spinner in the terminal
unpack (bool) – whether or not to unpack the dataframe after loading. Ignored for ‘.pdh5’ format.
parquet_convert_ndarray_to_list (bool) – whether or not to convert 1D ndarrays in the loaded parquet table into Python lists. Ignored for ‘.pdh5’ format.
file_read_delayed (bool) – whether or not some columns can be delayed for reading later. Only valid for ‘.pdh5’ format.
max_rows (int, optional) – maximum number of rows to read. Only valid for ‘.csv’, ‘.pdh5’ and ‘.parquet’ formats. Kept only for backward compatibility; use nrows instead.
nrows (int, optional) – maximum number of rows to read. Only valid for ‘.csv’, ‘.pdh5’ and ‘.parquet’ formats.
context_vars (dict) – a dictionary of context variables within which the function runs. It must include context_vars[‘async’] to tell whether to invoke the function asynchronously or not. Ignored for ‘.pdh5’ format.
*args (tuple) – list of positional arguments to pass to the corresponding reader. Ignored for ‘.pdh5’ format.
**kwargs (dict) – dictionary of keyword arguments to pass to the corresponding reader. Ignored for ‘.pdh5’ format.
- Returns:
loaded dataframe
- Return type:
pandas.DataFrame
Notes
For ‘.csv’ or ‘.csv.zip’ files, we use mt.pandas.csv.read_csv(). For ‘.parquet’ files, we use pandas.read_parquet(). For ‘.pdh5’ files, we use mt.pandas.pdh5.load_pdh5_asyn().
- Raises:
TypeError – if file type is unknown
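The context_vars[‘async’] flag can be illustrated with a minimal sketch. It does not call mt.pandas: read_text_asyn and read_text are hypothetical helpers that mimic the convention, assuming that a false flag means "run the blocking code inline" and a true flag means "offload it so the event loop stays free". How the library itself implements the two modes is not stated here.

```python
import asyncio
import os
import tempfile


def _read_text(path):
    # Plain blocking read, shared by both execution modes.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


async def read_text_asyn(path, context_vars):
    """Hypothetical asyn-style function: one body, two execution modes."""
    if context_vars["async"]:
        # Asynchronous mode: run the blocking read in a worker thread.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, _read_text, path)
    # Synchronous mode: do the work inline without yielding to the loop.
    return _read_text(path)


def read_text(path):
    """Hypothetical synchronous counterpart that forces the inline mode."""
    return asyncio.run(read_text_asyn(path, {"async": False}))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,b\n1,2\n")
    sync_text = read_text(path)
    async_text = asyncio.run(read_text_asyn(path, {"async": True}))
```

Both calls return the same text; only the way the blocking read is scheduled differs.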
- mt.pandas.convert.dfload(df_filepath, *args, show_progress=False, unpack=True, parquet_convert_ndarray_to_list=False, file_read_delayed: bool = False, max_rows: int | None = None, nrows: int | None = None, **kwargs) DataFrame
Loads a dataframe file based on the file’s extension.
- Parameters:
df_filepath (str) – local path to an existing dataframe. The file extension is used to determine the file type.
show_progress (bool) – show a progress spinner in the terminal
unpack (bool) – whether or not to unpack the dataframe after loading. Ignored for ‘.pdh5’ format.
parquet_convert_ndarray_to_list (bool) – whether or not to convert 1D ndarrays in the loaded parquet table into Python lists. Ignored for ‘.pdh5’ format.
file_read_delayed (bool) – whether or not some columns can be delayed for reading later. Only valid for ‘.pdh5’ format.
max_rows (int, optional) – maximum number of rows to read. Only valid for ‘.csv’, ‘.pdh5’ and ‘.parquet’ formats. Kept only for backward compatibility; use nrows instead.
nrows (int, optional) – maximum number of rows to read. Only valid for ‘.csv’, ‘.pdh5’ and ‘.parquet’ formats.
*args (tuple) – list of positional arguments to pass to the corresponding reader. Ignored for ‘.pdh5’ format.
**kwargs (dict) – dictionary of keyword arguments to pass to the corresponding reader. Ignored for ‘.pdh5’ format.
- Returns:
loaded dataframe
- Return type:
pandas.DataFrame
Notes
For ‘.csv’ or ‘.csv.zip’ files, we use mt.pandas.csv.read_csv(). For ‘.parquet’ files, we use pandas.read_parquet(). For ‘.pdh5’ files, we use mt.pandas.pdh5.load_pdh5_asyn().
- Raises:
TypeError – if file type is unknown
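The extension-based dispatch described in the Notes can be sketched as follows. This is not the library's code: pick_reader is a hypothetical helper that only returns the name of the reader the Notes associate with each extension, and the case-sensitive suffix match is an assumption.

```python
def pick_reader(df_filepath):
    """Return the name of the reader listed in the Notes for this extension.

    Hypothetical helper for illustration; the matching rule (case-sensitive
    suffix test) is an assumption, not documented behaviour.
    """
    if df_filepath.endswith(".csv") or df_filepath.endswith(".csv.zip"):
        return "mt.pandas.csv.read_csv"
    if df_filepath.endswith(".parquet"):
        return "pandas.read_parquet"
    if df_filepath.endswith(".pdh5"):
        return "mt.pandas.pdh5.load_pdh5_asyn"
    # Mirrors the documented error for unknown file types.
    raise TypeError(f"Unknown file type: '{df_filepath}'")


readers = [pick_reader(p) for p in ("a.csv", "a.csv.zip", "a.parquet", "a.pdh5")]

try:
    pick_reader("a.xlsx")
    unknown_raises = False
except TypeError:
    unknown_raises = True
```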
- async mt.pandas.convert.dfsave_asyn(df, df_filepath, file_mode=436, show_progress=False, pack=True, context_vars: dict = {}, file_write_delayed: bool = False, **kwargs)
An asyn function that saves a dataframe to a file based on the file’s extension.
- Parameters:
df (pandas.DataFrame) – a dataframe
df_filepath (str) – local path of the file to save the dataframe to. The file extension is used to determine the file type.
file_mode (int) – file mode to set on the saved file using os.chmod(). The default, 436, is 0o664 in octal (rw-rw-r--). If None is given, the file mode is left unchanged.
show_progress (bool) – show a progress spinner in the terminal
pack (bool) – whether or not to pack the dataframe before saving. Ignored for ‘.pdh5’ format.
context_vars (dict) – a dictionary of context variables within which the function runs. It must include context_vars[‘async’] to tell whether to invoke the function asynchronously or not. Ignored for ‘.pdh5’ format.
file_write_delayed (bool) – Only valid in asynchronous mode. If True, wraps the file write task into a future and returns the future. In all other cases, proceeds as usual. Ignored for ‘.pdh5’ format.
**kwargs (dict) – dictionary of keyword arguments to pass to the corresponding writer. Ignored for ‘.pdh5’ format.
- Returns:
either a future or the number of bytes written, depending on whether the file write task is delayed or not. For ‘.pdh5’ format, 1 is returned.
- Return type:
asyncio.Future or int
Notes
For ‘.csv’ or ‘.csv.zip’ files, we use mt.pandas.csv.to_csv(). For ‘.parquet’ files, we use pandas.DataFrame.to_parquet(). For ‘.pdh5’ files, we use mt.pandas.pdh5.save_pdh5().
- Raises:
TypeError – if file type is unknown or if the input is not a dataframe
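The return convention for file_write_delayed (a future when the write is delayed, a byte count otherwise) can be sketched without mt.pandas. The helper save_bytes_asyn below is hypothetical and writes raw bytes rather than a dataframe; using a worker thread for the write is an assumption made for the sketch.

```python
import asyncio
import os
import tempfile


def _write_bytes(path, data):
    # Blocking write; returns the number of bytes written.
    with open(path, "wb") as f:
        return f.write(data)


async def save_bytes_asyn(path, data, context_vars, file_write_delayed=False):
    """Hypothetical writer that mimics the documented return convention."""
    if not context_vars["async"]:
        # Synchronous mode: delaying does not apply, write inline.
        return _write_bytes(path, data)
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, _write_bytes, path, data)
    if file_write_delayed:
        # Hand the pending write back to the caller as a future.
        return future
    # Not delayed: wait for the write and return the byte count.
    return await future


async def main(tmp):
    ctx = {"async": True}
    n = await save_bytes_asyn(os.path.join(tmp, "a.bin"), b"hello", ctx)
    fut = await save_bytes_asyn(
        os.path.join(tmp, "b.bin"), b"hi", ctx, file_write_delayed=True
    )
    is_future = asyncio.isfuture(fut)
    m = await fut  # the caller decides when to wait for the write
    return n, is_future, m


with tempfile.TemporaryDirectory() as tmp:
    result = asyncio.run(main(tmp))
```

A caller that delays several writes can collect the returned futures and await them together, for example with asyncio.gather().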
- mt.pandas.convert.dfsave(df, df_filepath, file_mode=436, show_progress=False, pack=True, **kwargs)
Saves a dataframe to a file based on the file’s extension.
- Parameters:
df (pandas.DataFrame) – a dataframe
df_filepath (str) – local path of the file to save the dataframe to. The file extension is used to determine the file type.
file_mode (int) – file mode to set on the saved file using os.chmod(). The default, 436, is 0o664 in octal (rw-rw-r--). If None is given, the file mode is left unchanged.
show_progress (bool) – show a progress spinner in the terminal
pack (bool) – whether or not to pack the dataframe before saving
**kwargs (dict) – dictionary of keyword arguments to pass to the corresponding writer
- Returns:
whatever the corresponding writer returns
- Return type:
object
Notes
For ‘.csv’ or ‘.csv.zip’ files, we use mt.pandas.csv.to_csv(). For ‘.parquet’ files, we use pandas.DataFrame.to_parquet().
- Raises:
TypeError – if file type is unknown or if the input is not a dataframe
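The default file_mode=436 is written in decimal, which hides its meaning. The short check below, using only the standard library, shows what that value stands for and applies it to a temporary file with os.chmod(). The file name demo.csv is arbitrary, and the permission bits read back are only meaningful on POSIX systems.

```python
import os
import stat
import tempfile

FILE_MODE = 436  # same value as the documented default

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,b\n1,2\n")
    os.chmod(path, FILE_MODE)
    # Keep only the permission bits of the resulting file mode.
    applied = stat.S_IMODE(os.stat(path).st_mode)

mode_octal = oct(FILE_MODE)                          # octal spelling of 436
mode_text = stat.filemode(stat.S_IFREG | FILE_MODE)  # ls-style rendering
```

Here mode_octal is '0o664' and mode_text is '-rw-rw-r--': read and write for owner and group, read-only for everyone else.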
- mt.pandas.convert.dfpack(df, spinner=None)
Packs a dataframe into a more compact format.
At the moment, it converts each ndarray column into 3 columns, and each cv.Image column into a json column.
- Parameters:
df (pandas.DataFrame) – dataframe to be packed
spinner (Halo, optional) – spinner for tracking purposes
- Returns:
output dataframe
- Return type:
pandas.DataFrame
- mt.pandas.convert.dfunpack(df, spinner=None)
Unpacks a compact dataframe into a more expanded format.
This is the reverse function of dfpack().
- Parameters:
df (pandas.DataFrame) – dataframe to be unpacked
spinner (Halo, optional) – spinner for tracking purposes
- Returns:
output dataframe
- Return type:
pandas.DataFrame
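The text above says that packing turns each ndarray column into 3 columns but does not say which ones. The sketch below shows one plausible way a single array can be split into three plain values and restored. The (dtype, shape, raw bytes) layout and the helpers pack_array and unpack_array are assumptions made for illustration, not the library's actual scheme.

```python
import numpy as np


def pack_array(arr):
    """Split one ndarray into three plain values (hypothetical layout)."""
    return arr.dtype.str, arr.shape, arr.tobytes()


def unpack_array(dtype_str, shape, raw):
    """Rebuild the ndarray from the three values produced by pack_array()."""
    flat = np.frombuffer(raw, dtype=np.dtype(dtype_str))
    # Copy so the result is writable and independent of the byte buffer.
    return flat.reshape(shape).copy()


original = np.arange(6, dtype=np.float32).reshape(2, 3)
packed = pack_array(original)
restored = unpack_array(*packed)
```

The round trip preserves values, shape and dtype, which is the property a pack/unpack pair needs regardless of the concrete column layout.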
Classes
Pdh5Cell
: A read-only cell of a pdh5 column.
- class mt.pandas.convert.Pdh5Cell(col: Pdh5Column, row_id: int)
A read-only cell of a pdh5 column.
Inheritance
Inheritance diagram: a single node, Pdh5Cell, linking to mt.pandas.pdh5.Pdh5Cell; no base classes are shown.