API Documentation
High-level API
The high-level API exposes functions that work on plain Unicode strings.
If you need to process other sources, or have implemented your own tokenizer, use the lower-level API described below instead.
- text_to_num.text2num(text, lang)
Convert text, a string containing an integer number written out in words in language lang, into an integer value.
Raises a ValueError if text does not describe a valid number. Returns an int.
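Because text2num reports invalid input by raising ValueError rather than returning a sentinel, code that tries many candidate strings may want a non-raising variant. The sketch below is a hypothetical helper, try_text2num, not part of the library. The converter is passed in as an argument (normally text_to_num.text2num) so the sketch runs without the library installed.

```python
from typing import Callable, Optional

# Shape of text_to_num.text2num as documented above: (text, lang) -> int,
# raising ValueError when text is not a valid spelled number.
Converter = Callable[[str, str], int]


def try_text2num(text: str, lang: str, convert: Converter) -> Optional[int]:
    """Hypothetical helper: return convert(text, lang), or None on ValueError."""
    try:
        return convert(text, lang)
    except ValueError:
        # The documented failure mode: text does not describe a valid number.
        return None
```

In real code you would write `from text_to_num import text2num` and call `try_text2num("eighty-one", "en", text2num)`, which should give 81, while a non-number yields None instead of an exception.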
- text_to_num.alpha2digit(text, lang, threshold=3.0)
Return text with all the numbers spelled out in language lang converted to digits. The function is punctuation aware.
Isolated numbers, that is, integers and ordinals that don't belong to a group of numbers, are converted if and only if their value is above threshold.
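The isolated-number rule can be stated as a one-line predicate. This is a toy model of the wording above, not the library's implementation: would_convert and its in_group flag (meaning "belongs to a group of numbers") are hypothetical, and "above" is read as strictly greater than.

```python
def would_convert(value: float, in_group: bool, threshold: float = 3.0) -> bool:
    """Toy model of alpha2digit's documented rule for a single spelled number.

    A number that belongs to a group of numbers is converted; an isolated
    integer or ordinal is converted only when its value is above threshold.
    """
    return in_group or value > threshold
```

Under this reading and the default threshold, an isolated "one" or "three" would stay spelled out, while an isolated "twenty-five" would become 25.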
Custom Token Processing
- class text_to_num.Occurence
An occurrence of a number found in the sequence of tokens.
An occurrence can span multiple consecutive tokens.
- end
Offset in the sequence of tokens where the number ends.
- is_ordinal
Whether the number is an ordinal.
- start
Offset in the sequence of tokens where the number starts.
- text
The text representation of the number.
- value
The value of the number, as a float.
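Since value is documented as a float, callers that expect integers may want to turn integral values back into int and separate cardinals from ordinals. A minimal sketch: OccurrenceStub is a hypothetical stand-in with the attributes documented above, used so the code runs on its own; split_by_kind only reads start, value and is_ordinal, so real Occurence objects should work the same way.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union

Number = Union[int, float]


@dataclass
class OccurrenceStub:
    """Hypothetical stand-in mirroring the documented Occurence attributes."""

    start: int
    end: int
    text: str
    value: float
    is_ordinal: bool = False


def as_number(value: float) -> Number:
    """Turn an integral float such as 42.0 into the int 42; keep other floats."""
    return int(value) if float(value).is_integer() else value


def split_by_kind(occurrences: Iterable) -> Tuple[List[Number], List[Number]]:
    """Return (cardinals, ordinals), each ordered by the occurrence start offset."""
    cardinals: List[Number] = []
    ordinals: List[Number] = []
    for occ in sorted(occurrences, key=lambda o: o.start):
        if occ.is_ordinal:
            ordinals.append(as_number(occ.value))
        else:
            cardinals.append(as_number(occ.value))
    return cardinals, ordinals
```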
- class text_to_num.Token(*args, **kwargs)
Protocol for natural language tokens suitable for find_numbers. The only mandatory method is self.text().
- not_a_number_part() → bool
Return True if, despite its form, there is evidence that this token is not part of a number.
The default implementation returns False.
- nt_separated(previous: Token) → bool
In some token streams (e.g. ASR output), there are no punctuation tokens to separate words that must be understood separately, but the tokens themselves may carry additional information that conveys this distinction (e.g. timing information that reveals pauses in speech). This method should return True if self and previous are unrelated.
The default implementation returns False.
- abstractmethod text() → str
Return the text content of the token.
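The nt_separated hook is easiest to picture with timed ASR output. The sketch below is a hypothetical token type, not part of the library: TimedToken, its tag field and the 0.5 s PAUSE_THRESHOLD are all assumptions. It implements all three methods itself instead of subclassing Token, on the assumption that find_numbers only calls these methods.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: a silence longer than this (in seconds) separates two words.
PAUSE_THRESHOLD = 0.5


@dataclass
class TimedToken:
    """Hypothetical ASR token exposing the three methods of the Token protocol."""

    word: str
    start: float  # time the word starts, in seconds
    end: float  # time the word ends, in seconds
    tag: Optional[str] = None  # hypothetical tag from an upstream NLP step

    def text(self) -> str:
        # The only mandatory method: the token's text content.
        return self.word

    def not_a_number_part(self) -> bool:
        # Assumption: a word tagged as a proper noun is never part of a number,
        # whatever its spelling.
        return self.tag == "PROPER_NOUN"

    def nt_separated(self, previous: "TimedToken") -> bool:
        # A long enough pause after the previous word means the two words
        # must be understood separately.
        return self.start - previous.end > PAUSE_THRESHOLD
```

With these tokens, "six" spoken 1.2 s after "five" is reported as separated, so the intent is that find_numbers does not merge the two words into a single number.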
- text_to_num.find_numbers(input, lang, threshold=3.0)
Find the numbers and their positions in input, a stream of Token objects. Return a list of Occurence instances.
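Unlike alpha2digit, find_numbers takes tokens rather than a string, so plain text must be tokenized first. A minimal sketch with a hypothetical WordToken type and tokenize function: it splits words from punctuation marks, since punctuation tokens are what separate numbers in a text stream. Keeping hyphenated compounds such as "twenty-five" whole is an assumption; whether the library expects them whole or split may depend on the language and version.

```python
import re
from dataclasses import dataclass
from typing import List

# A word (internal hyphens/apostrophes kept) or a single punctuation mark.
_TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


@dataclass
class WordToken:
    """Hypothetical minimal token satisfying the Token protocol structurally."""

    word: str

    def text(self) -> str:
        return self.word

    def not_a_number_part(self) -> bool:
        return False  # same behavior as the documented default

    def nt_separated(self, previous: "WordToken") -> bool:
        return False  # same as the default; punctuation tokens carry separation


def tokenize(sentence: str) -> List[WordToken]:
    """Split a sentence into word and punctuation tokens, in order."""
    return [WordToken(m.group(0)) for m in _TOKEN_RE.finditer(sentence)]
```

Keeping the token list around, e.g. `tokens = tokenize(s)` then `find_numbers(tokens, "en")`, lets you map each occurrence back to its first token with `tokens[occ.start]`, assuming the installed version accepts a list as its input stream.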