API Documentation

High-level API

The high level API exposes functions that works on plain unicode strings.

If you need to process other source or have implemented your own tokenizer, you’d better use the lower level parser classes below.

text_to_num.text2num(text: str, relaxed: bool = False) → int

Convert the text string containing an integer number written in French into an integer value.

Set relaxed to True if you want to accept “quatre vingt(s)” as “quatre-vingt”.

Raises an AssertionError if text does not describe a valid number. Return an int.

text_to_num.alpha2digit(text: str, relaxed: bool = False) → str

Return the text of text with all the French spelled numbers converted to digits. Takes care of punctuation. Set relaxed to True if you want to accept “quatre vingt(s)” as “quatre-vingt”.

Parsers

The high-level API is build upon these parsers implemented as classes.

Those classes passively consume word tokens and thus can be easly integrated into your own tokenizer/framework.

Convert French spelled numbers into numeric values or digit strings.

class text_to_num.parsers.WordStreamValueParser(relaxed: bool = False)

The actual value builder engine.

The engine incrementaly recognize a stream of words as a valid number and build the corresponding numeric (interger) value.

The algorithm is based on the observation that humans gather the digits by group of three to more easily speak them out. And indeed, the language uses powers of 1000 to structure big numbers.

Public API:

  • self.push(word)
  • self.value: int
group_expects(word: str, update: bool = True) → bool

Does the current group expect word to complete it as a valid number? word should not be a multiplier; multiplier should be handled first.

is_coef_appliable(coef: int) → bool

Is this multiplier expected?

push(word: str, look_ahead: Optional[str] = None) → bool

Push next word from the stream.

Don’t push punctuation marks or symbols, only words. It is the responsability of the caller to handle punctuation or any marker of pause in the word stream. The best practice is to call self.close() on such markers and start again after.

Return True if word contributes to the current value else False.

The first time (after instanciating self) this function returns True marks the beginning of a number.

If this function returns False, and the last call returned True, that means you reached the end of a number. You can get its value from self.value.

Then, to parse a new number, you need to instanciate a new engine and start again from the last word you tried (the one that has just been rejected).

value

At any moment, get the value of the currently recognized number.

class text_to_num.parsers.WordToDigitParser(relaxed: bool = False)

Words to digit transcriber.

The engine incrementaly recognize a stream of words as a valid cardinal, ordinal, decimal or formal number (including leading zeros) and build the corresponding digit string.

Zeros are not treated as isolates but are considered as starting a new formal number and are concatenated to the following digit.

Public API:

  • self.push(word, look_ahead)
  • self.close()
  • self.value: str
at_start() → bool

Return True if nothing valid parsed yet.

at_start_of_seq() → bool

Return true if we are waiting for the start of the integer part or the start of the fraction part.

close() → None

Signal end of input if input stream ends while still in a number.

It’s safe to call it multiple times.

is_article(word: str, following: Optional[str]) → bool
push(word: str, look_ahead: Optional[str] = None) → bool

Push next word from the stream.

Return True if word contributes to the current value else False.

The first time (after instanciating self) this function returns True marks the beginning of a number.

If this function returns False, and the last call returned True, that means you reached the end of a number. You can get its value from self.value.

Then, to parse a new number, you need to instanciate a new engine and start again from the last word you tried (the one that has just been rejected).

value
text_to_num.parsers.not_numeric_word(word: Optional[str]) → bool
text_to_num.parsers.ord2card(word: str) → Optional[str]

Convert ordinal number to cardinal.

Return None if word is not an ordinal.

Misc.

text_to_num.transforms.look_ahead(sequence: Sequence[Any]) → Iterator[Tuple[Any, Any]]

Look-ahead iterator.

Iterate over a sequence by returning couples (current element, next element). The last couple returned before StopIteration is raised, is (last element, None).

Example:

>>> for elt, nxt_elt in look_ahead(sequence):
... # do something