mt.pandas.word
Custom word accessor for pandas.
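Custom pandas accessors are registered with `pandas.api.extensions.register_series_accessor`. The registered class receives the wrapped object in its constructor, like the `WordAccessor(pandas_obj)` constructor documented on this page. The minimal sketch below shows that pattern, assuming a Series accessor. The name `word_demo`, the class `WordDemoAccessor`, and its single `letter` property are hypothetical. This is not the actual `mt.pandas` registration, whose accessor name this page does not state.

```python
import pandas as pd


# Hypothetical accessor name "word_demo"; the real mt.pandas accessor name is not assumed here.
@pd.api.extensions.register_series_accessor("word_demo")
class WordDemoAccessor:
    """Minimal sketch of a word accessor, mirroring the WordAccessor(pandas_obj) constructor."""

    def __init__(self, pandas_obj: pd.Series) -> None:
        # pandas passes in the Series the accessor is attached to.
        self._obj = pandas_obj

    @property
    def letter(self) -> pd.Series:
        # One list of characters per word.
        return self._obj.map(list)


words = pd.Series(["cat", "dog"])
print(words.word_demo.letter.tolist())  # [['c', 'a', 't'], ['d', 'o', 'g']]
```

Once the class is registered, every Series exposes the accessor as an attribute. Properties such as `letter` then read like built-in pandas methods.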
Classes
WordAccessor: Accessor for word fields.
- class mt.pandas.word.WordAccessor(pandas_obj)
Accessor for word fields.
Inheritance: WordAccessor
- property bigram
Returns a list of letter bigrams for each word. See ngram().
- property english
Indicates, for each item, whether it looks like an English word.
- property extract_vietnamese_tone
Extracts the tone marks {"`", "'", "?", "~", "."} from each Vietnamese word.
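The tone symbols above appear to follow the VIQR convention: ` for the grave (huyền), ' for the acute (sắc), ? for the hook above (hỏi), ~ for the tilde (ngã), and . for the dot below (nặng). A minimal sketch, not the library's implementation, decomposes each word with Unicode NFD and looks for one of the five combining tone marks. It assumes each word carries at most one tone. The names `TONE_SYMBOLS` and `extract_tone` are hypothetical, and returning `""` for a toneless word is an assumption.

```python
import unicodedata

import pandas as pd

# Assumed mapping from Unicode combining tone marks to the ASCII tone symbols.
TONE_SYMBOLS = {
    "\u0300": "`",  # combining grave (huyền)
    "\u0301": "'",  # combining acute (sắc)
    "\u0309": "?",  # combining hook above (hỏi)
    "\u0303": "~",  # combining tilde (ngã)
    "\u0323": ".",  # combining dot below (nặng)
}


def extract_tone(word: str) -> str:
    """Return the first tone symbol found in the word, or '' if it has no tone mark."""
    # NFD splits precomposed letters such as "ệ" into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_SYMBOLS:
            return TONE_SYMBOLS[ch]
    return ""


words = pd.Series(["Việt", "Nam", "tiếng"])
print(words.map(extract_tone).tolist())  # ['.', '', "'"]
```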
- property letter
Returns a list of letters for each word.
- property move_vi_tone_to_last
Moves the first tone mark to the end of each word.
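A minimal sketch of this operation on a single string follows. It assumes the words are already in split form, with the tone written as one of the ASCII symbols ` ' ? ~ . after its letter, as split_vi_tone describes. Whether the library applies this to split or unsplit words is not stated here. The helper name `move_tone_to_last` is hypothetical.

```python
import pandas as pd

# Assumption: words are in split form, with tones written as these ASCII symbols.
TONE_SYMBOLS = "`'?~."


def move_tone_to_last(word: str) -> str:
    """Move the first tone symbol in the word to its end; leave toneless words unchanged."""
    for i, ch in enumerate(word):
        if ch in TONE_SYMBOLS:
            # Cut the symbol out of its position and append it.
            return word[:i] + word[i + 1:] + ch
    return word


words = pd.Series(["Viê.t", "tiê'ng", "nam"])
print(words.map(move_tone_to_last).tolist())  # ['Viêt.', "tiêng'", 'nam']
```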
- ngram(n)
Returns a list of letter n-grams for each word.
- Parameters:
n (int) – the length n of each letter n-gram. Must be an integer greater than 1.
- Returns:
a series in which each element is the list of n-grams of the corresponding element of the input series
- Return type:
pandas.Series
- Raises:
ValueError – if an argument is invalid, e.g. n is not an integer greater than 1
Notes
You can use pandas’ Series.explode() to flatten the result into one n-gram per row for further processing.
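A minimal sketch of letter n-grams with plain pandas follows. It assumes consecutive, unpadded n-grams, so a word shorter than n yields an empty list. The library may handle padding or short words differently. The helper `letter_ngrams` is hypothetical. The bigram and trigram properties correspond to n = 2 and n = 3.

```python
import pandas as pd


def letter_ngrams(word: str, n: int) -> list[str]:
    """Return the consecutive letter n-grams of a word (empty if the word is shorter than n)."""
    if not isinstance(n, int) or n <= 1:
        raise ValueError(f"n must be an integer greater than 1, got {n!r}")
    # Slide a window of width n over the word.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


words = pd.Series(["word", "at"])
bigrams = words.map(lambda w: letter_ngrams(w, 2))
print(bigrams.tolist())  # [['wo', 'or', 'rd'], ['at']]

# explode() gives one n-gram per row, repeating the original index.
print(bigrams.explode().tolist())  # ['wo', 'or', 'rd', 'at']
```

Note that `explode()` turns an empty list into a single NaN row. Drop those rows with `dropna()` if short words can occur.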
- property remove_vietnamese_tone
Removes the tone marks in each Vietnamese word.
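One way to remove tones while keeping the other Vietnamese diacritics (â, ă, ê, ô, ơ, ư) is to decompose with Unicode NFD, drop the five combining tone marks, and recompose with NFC. This is a minimal sketch under that assumption, not the library's implementation. The names `TONE_MARKS` and `remove_tone` are hypothetical.

```python
import unicodedata

import pandas as pd

# Assumption: the five Vietnamese tone marks as Unicode combining characters
# (grave, acute, hook above, tilde, dot below).
TONE_MARKS = {"\u0300", "\u0301", "\u0309", "\u0303", "\u0323"}


def remove_tone(word: str) -> str:
    """Drop tone marks but keep the other diacritics (â, ă, ê, ô, ơ, ư, đ)."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that, e.g., "e" + circumflex becomes the single letter "ê" again.
    return unicodedata.normalize("NFC", kept)


words = pd.Series(["Việt", "tiếng", "đường"])
print(words.map(remove_tone).tolist())  # ['Viêt', 'tiêng', 'đương']
```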
- property split_vi_diacritical
In each word, splits every untoned Vietnamese letter that carries a diacritical mark into its base letter followed by a symbol representing that mark.
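This page does not list the symbols used for the diacritical marks. The minimal sketch below therefore assumes VIQR-style symbols: ^ for the circumflex, ( for the breve, + for the horn, and dd for đ. It uses a `str.translate` table. Toned letters such as "ớ" are left untouched because only untoned letters are in the table. The names `DIACRITIC_TABLE` and `split_diacritical` are hypothetical.

```python
import unicodedata

import pandas as pd

# Assumption: VIQR-style symbols; the library's actual symbols may differ.
_LOWER = {"â": "a^", "ă": "a(", "ê": "e^", "ô": "o^", "ơ": "o+", "ư": "u+", "đ": "dd"}
# Add the uppercase letters (e.g. "Â" -> "A^", "Đ" -> "DD").
DIACRITIC_TABLE = str.maketrans(
    {**_LOWER, **{k.upper(): v.upper() for k, v in _LOWER.items()}}
)


def split_diacritical(word: str) -> str:
    """Split each untoned diacritical letter into its base letter followed by a mark symbol."""
    # NFC first, so the table's precomposed letters match decomposed input too.
    return unicodedata.normalize("NFC", word).translate(DIACRITIC_TABLE)


words = pd.Series(["Viêt", "Đông", "thương"])
print(words.map(split_diacritical).tolist())  # ['Vie^t', 'DDo^ng', 'thu+o+ng']
```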
- property split_vi_tone
In each word, splits every toned Vietnamese letter into its base letter followed by a symbol representing the tone mark (one of ` ' ? ~ .).
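A minimal sketch of the tone split follows, assuming NFC input in which each Vietnamese letter is a single precomposed character. Each letter is decomposed with NFD, its tone mark is removed and emitted as an ASCII symbol, and the rest is recomposed. As an assumption, the "base letter" here keeps its non-tone diacritics, so "ệ" becomes "ê.". The names `TONE_SYMBOLS` and `split_tone` are hypothetical.

```python
import unicodedata

import pandas as pd

# Assumed mapping from Unicode combining tone marks to the symbols ` ' ? ~ .
TONE_SYMBOLS = {"\u0300": "`", "\u0301": "'", "\u0309": "?", "\u0303": "~", "\u0323": "."}


def split_tone(word: str) -> str:
    """Rewrite each toned letter as its untoned letter followed by the tone symbol."""
    out = []
    for ch in unicodedata.normalize("NFC", word):
        parts = unicodedata.normalize("NFD", ch)  # e.g. "ệ" -> "e" + dot below + circumflex
        tones = "".join(TONE_SYMBOLS[p] for p in parts if p in TONE_SYMBOLS)
        rest = "".join(p for p in parts if p not in TONE_SYMBOLS)
        # Recompose the remaining diacritics ("e" + circumflex -> "ê"), then append the tone.
        out.append(unicodedata.normalize("NFC", rest) + tones)
    return "".join(out)


words = pd.Series(["Việt", "tiếng", "Nam"])
print(words.map(split_tone).tolist())  # ['Viê.t', "tiê'ng", 'Nam']
```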
- sub_map(substr_map)
Substitutes substrings using a dictionary/map.
Each occurrence in a word of a substring that is a key of the map is replaced with the corresponding replacement string.
- Parameters:
substr_map (dict) – a map from each substring to its replacement string
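This page does not say how overlapping keys are handled (for example "ng" and "ngh"). The minimal sketch below assumes a single left-to-right pass that prefers the longest key at each position. It builds one regular expression with the keys sorted longest first, because Python's regex alternation takes the first alternative that matches rather than the longest. The factory `make_substituter` is a hypothetical helper.

```python
import re
from collections.abc import Callable

import pandas as pd


def make_substituter(substr_map: dict[str, str]) -> Callable[[str], str]:
    """Build a function that replaces every key of substr_map in a word with its value."""
    if not substr_map:
        return lambda word: word  # nothing to substitute
    # Try longer keys first, so "ngh" wins over "ng" when both match at the same position.
    keys = sorted(substr_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # One left-to-right pass: replaced text is never rescanned.
    return lambda word: pattern.sub(lambda m: substr_map[m.group(0)], word)


substitute = make_substituter({"ngh": "N", "ng": "G"})
words = pd.Series(["nghiêng", "ngang"])
print(words.map(substitute).tolist())  # ['NiêG', 'GaG']
```

Compiling the pattern once in the factory avoids rebuilding it for every element of a long Series.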
- property trigram
Returns a list of letter trigrams for each word. See ngram().
- property truncate_first_vi_mark
Truncates each word at the first occurrence of a split Vietnamese mark.
- property vietnamese
Indicates, for each item, whether it looks like a Vietnamese word.