API Documentation
High-level API
The high-level API exposes functions that work on plain Unicode strings.
If you need to process other sources, or have implemented your own tokenizer, you should use the lower-level parser classes below.
-
text_to_num.text2num(text: str, lang: str, relaxed: bool = False) → int
Convert text, a string containing an integer number written out in words in the language lang, into an integer value. Set relaxed to True if you want to accept “quatre vingt(s)” as “quatre-vingt”. Raises an AssertionError if text does not describe a valid number. Returns an int.
-
text_to_num.alpha2digit(text: str, lang: str, relaxed: bool = False, signed: bool = True) → str
Return text with all the spelled-out numbers (in the language lang) converted to digits. Takes care of punctuation. Set relaxed to True if you want to accept “quatre vingt(s)” as “quatre-vingt”. Set signed to False if you don't want to produce signed numbers, for example, if you prefer to get « moins 2 » instead of « -2 ».
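The contract of alpha2digit (spelled numbers replaced in place, punctuation preserved, signed deciding whether a "minus" word is folded into the number) can be illustrated with a deliberately tiny sketch. toy_alpha2digit, DIGITS and MINUS_WORD are hypothetical names, not part of text_to_num; the sketch only knows single English digit words and uses "minus" as a stand-in for "moins".

```python
import re

# Hypothetical toy vocabulary: single English digit words only. The real
# alpha2digit handles complete numbers in several languages; this does not.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
MINUS_WORD = "minus"  # hypothetical stand-in for the French "moins"


def toy_alpha2digit(text: str, signed: bool = True) -> str:
    """Replace spelled-out digits with digits, keeping punctuation and spacing."""
    # Alternate runs of word and non-word characters, so that joining the
    # tokens back together reproduces the original text exactly.
    tokens = re.findall(r"\w+|\W+", text)
    out = []
    i = 0
    while i < len(tokens):
        low = tokens[i].lower()
        if low in DIGITS:
            out.append(str(DIGITS[low]))
        elif (
            signed
            and low == MINUS_WORD
            and i + 2 < len(tokens)
            and tokens[i + 1] == " "
            and tokens[i + 2].lower() in DIGITS
        ):
            # "minus <digit>" becomes a signed number: "minus two" -> "-2".
            out.append("-" + str(DIGITS[tokens[i + 2].lower()]))
            i += 2  # skip the space and the digit word consumed here
        else:
            out.append(tokens[i])
        i += 1
    return "".join(out)
```

With these assumptions, toy_alpha2digit("minus two, then three.") gives "-2, then 3.", while passing signed=False gives "minus 2, then 3.".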
Parsers
The high-level API is built upon these parsers, implemented as classes.
These classes passively consume word tokens and can therefore easily be integrated into your own tokenizer or framework.
Convert spelled numbers into numeric values or digit strings.
-
class text_to_num.parsers.WordStreamValueParser(lang: text_to_num.lang.base.Language, relaxed: bool = False)
The actual value builder engine.
The engine incrementally recognizes a stream of words as a valid number and builds the corresponding numeric (integer) value.
The algorithm is based on the observation that humans group digits in threes to speak them out more easily. Indeed, the language uses powers of 1000 to structure big numbers.
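The grouping idea can be sketched as follows. For instance, 2 345 678 is spoken as "two million, three hundred forty-five thousand, six hundred seventy-eight", that is 2 × 1000² + 345 × 1000 + 678: each group stays small and is closed by its power of 1000. words_to_int and its English vocabulary are hypothetical, and unlike WordStreamValueParser this sketch does not validate word order (it would, for example, accept "two two" as 4).

```python
from typing import Iterable

# Hypothetical English vocabulary, for this sketch only.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
# Multipliers that close a group of (normally) three digits.
POWERS_OF_1000 = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def words_to_int(words: Iterable[str]) -> int:
    total = 0  # sum of the groups already closed by a power of 1000
    group = 0  # value of the group currently being built
    for word in words:
        if word in UNITS:
            group += UNITS[word]
        elif word == "hundred":
            group *= 100  # "three hundred" -> 300 within the current group
        elif word in POWERS_OF_1000:
            total += group * POWERS_OF_1000[word]
            group = 0  # start a new group of three digits
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + group
```

Here words_to_int("two million three hundred forty five thousand six hundred seventy eight".split()) returns 2345678.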
Public API:
self.push(word)
self.value: int
-
group_expects(word: str, update: bool = True) → bool
Does the current group expect word to complete it as a valid number? word should not be a multiplier; multipliers should be handled first.
-
is_coef_appliable(coef: int) → bool
Is this multiplier expected?
-
push(word: str, look_ahead: Optional[str] = None) → bool
Push the next word from the stream.
Don't push punctuation marks or symbols, only words. It is the responsibility of the caller to handle punctuation or any other marker of a pause in the word stream. The best practice is to call self.close() on such markers and start again afterwards.
Return True if word contributes to the current value, else False.
The first time (after instantiating self) this function returns True marks the beginning of a number.
If this function returns False, and the last call returned True, you have reached the end of a number. You can get its value from self.value.
Then, to parse a new number, you need to instantiate a new engine and start again from the last word you tried (the one that has just been rejected).
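The calling protocol described for push() can be sketched with a toy engine. ToyTwoDigitParser (it only understands "tens + unit" numbers such as "forty two") and extract_numbers are hypothetical names, not library code. What the sketch shows is the loop itself: push words until the engine rejects one after having accepted some, read value, then instantiate a fresh engine and push the rejected word again.

```python
from typing import List


class ToyTwoDigitParser:
    """Hypothetical engine: accepts an optional tens word then an optional unit."""

    TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}
    UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
             "six": 6, "seven": 7, "eight": 8, "nine": 9}

    def __init__(self) -> None:
        self.value = 0
        self._has_tens = False
        self._has_unit = False

    def push(self, word: str) -> bool:
        # A tens word is only valid at the very start of a number.
        if word in self.TENS and not (self._has_tens or self._has_unit):
            self._has_tens = True
            self.value += self.TENS[word]
            return True
        # A unit word is valid once, alone or after a tens word.
        if word in self.UNITS and not self._has_unit:
            self._has_unit = True
            self.value += self.UNITS[word]
            return True
        return False


def extract_numbers(words: List[str]) -> List[int]:
    numbers: List[int] = []
    parser = ToyTwoDigitParser()
    in_number = False  # True once push() has accepted at least one word
    i = 0
    while i < len(words):
        if parser.push(words[i]):
            in_number = True
            i += 1
        elif in_number:
            # End of a number: collect it, then retry the rejected word
            # with a fresh engine (i is deliberately not advanced).
            numbers.append(parser.value)
            parser = ToyTwoDigitParser()
            in_number = False
        else:
            i += 1  # the word is not part of any number
    if in_number:
        numbers.append(parser.value)  # the stream ended inside a number
    return numbers
```

For example, extract_numbers("forty two three apples".split()) returns [42, 3]: "three" is rejected by the first engine, which ends 42, and is then accepted by the second one.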
-
value
At any moment, get the value of the currently recognized number.
-
class text_to_num.parsers.WordToDigitParser(lang: text_to_num.lang.base.Language, relaxed: bool = False, signed: bool = True)
Words to digit transcriber.
The engine incrementally recognizes a stream of words as a valid cardinal, ordinal, decimal or formal number (including leading zeros) and builds the corresponding digit string.
Zeros are not treated as isolated numbers: a zero starts a new formal number and is concatenated with the following digit.
Public API:
self.push(word, look_ahead)
self.close()
self.value: str
-
at_start() → bool
Return True if nothing valid has been parsed yet.
-
at_start_of_seq() → bool
Return True if we are waiting for the start of the integer part or the start of the fraction part.
-
close() → None
Signal the end of input if the input stream ends while still in a number.
It is safe to call it multiple times.
-
is_article(word: str, following: Optional[str]) → bool
-
push(word: str, look_ahead: Optional[str] = None) → bool
Push the next word from the stream.
Return True if word contributes to the current value, else False.
The first time (after instantiating self) this function returns True marks the beginning of a number.
If this function returns False, and the last call returned True, you have reached the end of a number. You can get its value from self.value.
Then, to parse a new number, you need to instantiate a new engine and start again from the last word you tried (the one that has just been rejected).
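For WordToDigitParser the same loop applies, with close() called on punctuation and at the end of the stream. The sketch below uses a hypothetical ToyFormalDigitParser that only concatenates single digit words, which is enough to show why building a digit string (rather than an int) keeps leading zeros: "zero six" gives "06". transcribe, PUNCTUATION and the toy class are illustrative names, not the library's API; look_ahead is accepted but ignored.

```python
from typing import List, Optional

PUNCTUATION = {",", ".", ";", ":", "?", "!"}  # hypothetical pause markers


class ToyFormalDigitParser:
    """Hypothetical engine: concatenates digit words into a digit string."""

    DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3",
              "four": "4", "five": "5", "six": "6", "seven": "7",
              "eight": "8", "nine": "9"}

    def __init__(self) -> None:
        self._digits: List[str] = []
        self._closed = False

    def push(self, word: str, look_ahead: Optional[str] = None) -> bool:
        # look_ahead is ignored by this toy; nothing is accepted after close().
        if self._closed or word not in self.DIGITS:
            return False
        self._digits.append(self.DIGITS[word])
        return True

    def close(self) -> None:
        self._closed = True  # idempotent: calling it again changes nothing

    @property
    def value(self) -> str:
        return "".join(self._digits)


def transcribe(tokens: List[str]) -> str:
    out: List[str] = []
    parser = ToyFormalDigitParser()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in PUNCTUATION:
            parser.close()  # a pause marker ends any number in progress
            if parser.value:
                out.append(parser.value)
            out.append(token)
            parser = ToyFormalDigitParser()
            i += 1
        elif parser.push(token):
            i += 1
        elif parser.value:
            # Number finished: emit it and retry this token with a new engine.
            out.append(parser.value)
            parser = ToyFormalDigitParser()
        else:
            out.append(token)
            i += 1
    parser.close()  # end of input, possibly while still inside a number
    if parser.value:
        out.append(parser.value)
    return " ".join(out)
```

Under these assumptions, transcribe("call zero six one two , or seven".split()) returns "call 0612 , or 7".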
-
value
Misc.
-
text_to_num.transforms.look_ahead(sequence: Sequence[Any]) → Iterator[Tuple[Any, Any]]
Look-ahead iterator.
Iterate over a sequence by returning pairs (current element, next element). The last pair returned before StopIteration is raised is (last element, None).
Example:
>>> for elt, nxt_elt in look_ahead(["a", "b", "c"]):
...     print(elt, nxt_elt)
a b
b c
c None
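An index-based version is consistent with the Sequence[Any] type of the parameter, since it relies on len() and indexing rather than on a plain iterator. look_ahead_sketch is a hypothetical re-implementation for illustration; the library's own code may differ.

```python
from typing import Any, Iterator, Sequence, Tuple


def look_ahead_sketch(sequence: Sequence[Any]) -> Iterator[Tuple[Any, Any]]:
    """Yield (current, next) pairs; the last pair is (last element, None)."""
    last_index = len(sequence) - 1
    for i, current in enumerate(sequence):
        # Peek at the following element, or None once we reach the end.
        nxt = sequence[i + 1] if i < last_index else None
        yield current, nxt
```

Each pair maps naturally onto push(word, look_ahead). One limitation of this design: if the sequence itself contains None, a None "next element" cannot be told apart from the end of the sequence.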