`mt.pandas.word`

Custom word accessor for pandas.

Classes 

WordAccessor: Accessor for word fields.

class mt.pandas.word.WordAccessor(pandas_obj)

Accessor for word fields.
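For readers unfamiliar with pandas accessors, the following is a minimal sketch of how a class taking `pandas_obj` can be attached to a Series through pandas' extension API. The accessor name `word_demo` and the class `WordDemoAccessor` are hypothetical and used only for illustration; the name under which `WordAccessor` itself is registered is not stated on this page.

```python
import pandas as pd


# "word_demo" is a hypothetical accessor name chosen for this sketch.
@pd.api.extensions.register_series_accessor("word_demo")
class WordDemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes the Series the accessor was invoked on.
        self._obj = pandas_obj

    @property
    def letter(self):
        # Build one list of characters per word, keeping the original index.
        return pd.Series([list(w) for w in self._obj], index=self._obj.index)


s = pd.Series(["abc", "de"])
letters = s.word_demo.letter
print(letters.tolist())
```

This sketch splits words into plain characters; the real `letter` property may treat multi-character or Vietnamese letters differently.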


property bigram: Returns a list of letter bigrams for each word. See ngram().

property english: Returns which item is like an English word

property extract_vietnamese_tone: Extracts the tone marks {””’, “’”, “?”, “~”, “.”}` each Vietnamese word.

property letter: Returns a list of letters for each word. .

property move_vi_tone_to_last: Moves the first tone mark to the end of a word.

ngram(n)

Returns a list of letter n-grams for each word.

Parameters: n (int) – the n of the letter n-gram. Must be an integer greater than 1.
Returns: a series in which each element is the list of n-grams of the corresponding element in the input series
Return type: pandas.Series
Raises: ValueError – if an argument is invalid

Notes

You can use pandas’ explode() function to flatten the resulting lists for further processing.
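The combination of per-word n-gram lists and `explode()` can be sketched with plain pandas. The helper `letter_ngrams` below is a hypothetical stand-in written for this example, not the library's implementation, and it assumes n-grams are represented as substrings.

```python
import pandas as pd


def letter_ngrams(word, n):
    # Hypothetical helper: mirrors the documented contract (n must be an int > 1).
    if not isinstance(n, int) or n < 2:
        raise ValueError("n must be an integer greater than 1")
    # Slide a window of length n over the word.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


words = pd.Series(["pandas", "word"])

# One list of bigrams per word, aligned with the input index.
bigrams = pd.Series([letter_ngrams(w, 2) for w in words], index=words.index)

# explode() turns each list element into its own row, repeating the index.
exploded = bigrams.explode()
print(exploded.value_counts().head())
```

After exploding, each n-gram sits in its own row, so ordinary Series operations such as `value_counts()` or `groupby` apply directly.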

property remove_vietnamese_tone: Removes the tone marks in each Vietnamese word.

property split_vi_diacritical: Splits any untoned diacritical Vietnamese letter into its base letter followed by a symbol representing the diacritical mark, in each word.

property split_vi_tone: Splits any Vietnamese toned letter into its base letter followed by a symbol representing the tone mark (‘?~.)`, in each word.

sub_map(substr_map)

Substitutes substrings using a dictionary/map.

For each substring of a word, the substring is replaced with a replacement string.

Parameters: substr_map (dict) – a map from each substring to its replacement string
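The idea of dictionary-driven substitution can be sketched in plain Python and pandas. The helper `substitute` is hypothetical, and its single-pass, longest-key-first matching is an assumption of this sketch; how `sub_map` resolves overlapping keys is not specified here.

```python
import re

import pandas as pd


def substitute(word, substr_map):
    # Hypothetical helper; nothing to do for an empty map.
    if not substr_map:
        return word
    # Try longer keys first so overlapping keys resolve predictably (an assumption).
    keys = sorted(substr_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Replace every match in one pass, so replacements are never re-substituted.
    return pattern.sub(lambda m: substr_map[m.group(0)], word)


words = pd.Series(["photo", "phone"])
mapping = {"ph": "f", "o": "0"}
result = words.map(lambda w: substitute(w, mapping))
print(result.tolist())  # ['f0t0', 'f0ne']
```

A single regular-expression pass avoids the order dependence of chaining several `str.replace` calls.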

property trigram: Returns a list of letter trigrams for each word. See ngram().

property truncate_first_vi_mark: Truncates each word to the first occurence of a split Vietnamese mark.

property vietnamese: Returns which item is like a Vietnamese word
