API Documentation¶

High-level API¶

The high-level API exposes functions that work on plain Unicode strings.

If you need to process other sources or have implemented your own tokenizer, use the lower-level API described under Custom Token Processing below.

text_to_num.text2num(text, lang)¶

Convert a text string containing an integer written out in words into an integer value.

Return an int. Raise a ValueError if text does not describe a valid number.
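A minimal sketch of calling text2num defensively, relying on the ValueError behaviour described above. The helper name parse_int_or_none and its converter parameter are hypothetical: converter only exists so the wrapper can be exercised without the package installed, and the language code you pass (for example "en") is assumed to be one the library supports.

```python
from typing import Callable, Optional


def parse_int_or_none(
    text: str,
    lang: str,
    converter: Optional[Callable[[str, str], int]] = None,
) -> Optional[int]:
    """Return the integer spelled out in `text`, or None if it is not a valid number."""
    if converter is None:
        # Imported lazily so this module still loads when text_to_num is absent.
        from text_to_num import text2num as converter
    try:
        return converter(text, lang)
    except ValueError:
        # text2num reports an invalid number by raising ValueError.
        return None
```

With the package installed, parse_int_or_none(some_text, "en") delegates to text2num and returns None instead of raising when the text is not a number.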

text_to_num.alpha2digit(text, lang, threshold=3.0)¶

Return a copy of text in which all numbers spelled out in the language lang are converted to digits.

The function is punctuation aware.

Isolated numbers, that is, integers and ordinals that don’t belong to a group of numbers, are converted if and only if their value is above the threshold.
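As a rough illustration of how the threshold argument might be passed through when converting several strings, here is a small sketch. The function name digitize_lines and its converter parameter are hypothetical (the latter is only a hook for trying the wrapper without the package); the call itself uses the alpha2digit(text, lang, threshold=3.0) signature documented above.

```python
from typing import Callable, Iterable, List, Optional


def digitize_lines(
    lines: Iterable[str],
    lang: str,
    threshold: float = 3.0,
    converter: Optional[Callable[..., str]] = None,
) -> List[str]:
    """Apply alpha2digit to each line, forwarding the same threshold."""
    if converter is None:
        # Imported lazily so this module still loads when text_to_num is absent.
        from text_to_num import alpha2digit as converter
    # Isolated numbers are only converted when their value is above `threshold`.
    return [converter(line, lang, threshold=threshold) for line in lines]
```

Lowering threshold makes the conversion of isolated small numbers more aggressive; leaving the default keeps short words such as a lone "one" untouched, per the rule above.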

Custom Token Processing¶

class text_to_num.Occurence¶

An occurrence of a number found in the sequence of tokens.

An occurrence can span multiple consecutive tokens.

end¶: Offset in the sequence of tokens where the number ends.

is_ordinal¶: Is this an ordinal?

start¶: Offset in the sequence of tokens where the number starts.

text¶: The text representation of the number.

value¶: The value of the number, as a float.

class text_to_num.Token(*args, **kwargs)¶

Protocol for natural language tokens suitable for find_numbers.

The only mandatory method is self.text().

not_a_number_part() → bool¶

Despite its form, we have evidence that this token is not a number part.

The default implementation returns False.

nt_separated(previous: Token) → bool¶

In some token streams (e.g. ASR output), there are no punctuation tokens to separate words that must be understood separately, but the tokens themselves may embed additional information that conveys the distinction (e.g. timing information that can reveal voice pauses). This method should return True if self and previous are unrelated.

The default implementation returns False.

abstractmethod text() → str¶: Return the text content of the token.

text_to_num.find_numbers(input, lang, threshold=3.0)¶: Find the numbers and their positions in a stream of Tokens (the input). Return a list of Occurence instances.
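The Token protocol and find_numbers can be combined roughly as follows. This is a sketch under stated assumptions, not the library's reference usage: TimedToken, its fields, the 0.5 second pause limit, and the spoken_numbers helper are all hypothetical names introduced for illustration. The class implements the three methods documented above directly, so it does not depend on inheriting the defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedToken:
    """Hypothetical ASR token: a word with start and end times in seconds."""

    word: str
    start: float
    end: float
    max_gap: float = 0.5  # assumed pause (seconds) that separates unrelated words

    def text(self) -> str:
        # The only mandatory method of the Token protocol.
        return self.word

    def not_a_number_part(self) -> bool:
        # No extra evidence available here, so mirror the documented default.
        return False

    def nt_separated(self, previous: "TimedToken") -> bool:
        # Treat a long silence between two tokens as a separator.
        return self.start - previous.end > self.max_gap


def spoken_numbers(tokens: List[TimedToken], lang: str):
    """Return the list of Occurence instances found in `tokens`."""
    # Imported lazily so the token class can be used without the package.
    from text_to_num import find_numbers

    return find_numbers(tokens, lang)
```

Each returned Occurence exposes start and end as offsets into the token sequence, so the caller can map a detected number back to the timed tokens it spans.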
