API Documentation

High-level API

The high-level API exposes functions that work on plain Unicode strings.

If you need to process other sources, or have implemented your own tokenizer, use the Custom Token Processing API below instead.

text_to_num.text2num(text, lang)

Convert text, a string containing an integer written out in words, into an integer value.

Raises a ValueError if text does not describe a valid number. Returns an int.
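
Because text2num signals invalid input with a ValueError, callers often wrap it. The sketch below is one possible wrapper, not part of the library: parse_or_none is a hypothetical helper name, and the converter is passed in as an argument so the helper can be exercised without the package. The demo at the bottom is skipped when text_to_num is not installed.

```python
from typing import Callable, Optional


def parse_or_none(
    text: str, lang: str, convert: Callable[[str, str], int]
) -> Optional[int]:
    """Call convert(text, lang) and map the documented ValueError to None."""
    try:
        return convert(text, lang)
    except ValueError:
        # Raised when the text does not describe a valid number.
        return None


try:
    from text_to_num import text2num  # the function documented above
except ImportError:
    print("text_to_num is not installed; skipping the demo")
else:
    for phrase in ("forty-two", "hello world"):
        print(repr(phrase), "->", parse_or_none(phrase, "en", text2num))
```

Returning None keeps the calling code free of try/except blocks; callers that need to distinguish error causes should catch ValueError themselves.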

text_to_num.alpha2digit(text, lang, threshold=3.0)

Return a copy of text in which all numbers spelled out in the language lang are converted to digits.

The function is punctuation aware.

Isolated numbers, that is, integers and ordinals that don’t belong to a group of numbers, are converted if and only if their value is above the threshold.
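
One way to see the effect of threshold is to convert the same sentence at several values and compare the results. The following is a minimal sketch under stated assumptions: compare_thresholds is a hypothetical helper, the sample sentence is arbitrary, and the threshold is passed positionally as the third argument, following the signature shown above. No expected output is listed, since the exact rendering depends on the installed version.

```python
from typing import Callable, Dict, Iterable


def compare_thresholds(
    text: str,
    lang: str,
    thresholds: Iterable[float],
    convert: Callable[[str, str, float], str],
) -> Dict[float, str]:
    """Run convert(text, lang, threshold) once per threshold and collect the results."""
    return {t: convert(text, lang, t) for t in thresholds}


try:
    from text_to_num import alpha2digit  # the function documented above
except ImportError:
    print("text_to_num is not installed; skipping the demo")
else:
    sentence = "I ate two apples and twenty five cherries"
    results = compare_thresholds(sentence, "en", (0.0, 3.0), alpha2digit)
    for threshold, result in results.items():
        print(threshold, "->", result)
```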

Custom Token Processing

class text_to_num.Occurence

An occurrence of a number found in the sequence of tokens.

An occurrence can span multiple consecutive tokens.

end

Offset in the sequence of tokens where the number ends.

is_ordinal

Whether the number is an ordinal.

start

Offset in the sequence of tokens where the number starts.

text

The text representation of the number.

value

The value of the number, as a float.

class text_to_num.Token(*args, **kwargs)

Protocol for natural language tokens suitable for find_numbers.

The only mandatory method is self.text().

not_a_number_part() -> bool

Return True when there is evidence that this token, despite its form, is not part of a number.

The default implementation returns False.

nt_separated(previous: Token) -> bool

In some token streams (e.g. ASR output), there are no punctuation tokens to separate words that must be understood separately, but the tokens themselves may embed additional information that conveys that distinction (e.g. timing information that can reveal voice pauses). This method should return True if self and previous are unrelated.

The default implementation returns False.

abstractmethod text() -> str

Return the text content of the token.
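
As an illustration of the protocol, here is a sketch of a token type for ASR output that carries timing information. Everything in it is an assumption made for the example: the class name TimedWord, the start_ms and end_ms fields, and the 100 ms pause limit are hypothetical. The class defines all three methods explicitly rather than relying on inherited defaults; whether your code must also subclass text_to_num.Token is not stated here, so check that against your installed version.

```python
from dataclasses import dataclass

# Hypothetical limit: a silence longer than this separates two words.
PAUSE_MS = 100


@dataclass
class TimedWord:
    """A decoded word with its start and end times in milliseconds."""

    word: str
    start_ms: int
    end_ms: int

    def text(self) -> str:
        # The only mandatory method of the protocol.
        return self.word

    def nt_separated(self, previous: "TimedWord") -> bool:
        # Treat a long silence since the previous word as a separation.
        return self.start_ms - previous.end_ms > PAUSE_MS

    def not_a_number_part(self) -> bool:
        # This sketch has no extra evidence, so it mirrors the documented default.
        return False


twenty = TimedWord("twenty", 0, 300)
five = TimedWord("five", 350, 600)
two = TimedWord("two", 900, 1100)
print(five.nt_separated(twenty))  # 50 ms gap
print(two.nt_separated(five))     # 300 ms gap
```

With these timings, "twenty" and "five" are reported as related, while "two" is reported as separated from "five".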

text_to_num.find_numbers(input, lang, threshold=3.0)

Find the numbers and their positions in a stream of Tokens (the input). Return a list of Occurence instances.
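
A possible end-to-end use, sketched under assumptions: Word is a hypothetical minimal token class, describe is a hypothetical formatting helper that reads only the Occurence attributes documented above, and a plain list is assumed to be an acceptable token stream. The demo is skipped when text_to_num is not installed, and no output is shown because it depends on the installed version.

```python
class Word:
    """Minimal token wrapping a single word of plain text."""

    def __init__(self, word: str) -> None:
        self._word = word

    def text(self) -> str:
        return self._word

    def nt_separated(self, previous: "Word") -> bool:
        return False  # no timing or punctuation evidence in this sketch

    def not_a_number_part(self) -> bool:
        return False


def describe(occ) -> str:
    """Format the documented attributes of an occurrence on one line."""
    return (
        f"start={occ.start} end={occ.end} text={occ.text!r} "
        f"value={occ.value} is_ordinal={occ.is_ordinal}"
    )


try:
    from text_to_num import find_numbers  # the function documented above
except ImportError:
    print("text_to_num is not installed; skipping the demo")
else:
    tokens = [Word(w) for w in "I bought twenty five apples".split()]
    for occ in find_numbers(tokens, "en", 3.0):
        print(describe(occ))
```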