Lexis

Lexical scanning is the initial stage of source code text analysis.

During this stage, the scanner iterates through the characters of a Unicode string, establishes token boundaries, and associates each fragment between adjacent boundaries with a token instance.
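As a rough illustration of this loop, here is a minimal sketch in Python. The `Token` type, the four token kinds, and the `scan` function are hypothetical names chosen for this example; they are not part of any particular library, and a real scanner would recognize a much richer token set.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str   # hypothetical kinds: "ident", "number", "space", "symbol"
    start: int  # index of the first character of the fragment
    end: int    # index one past the last character of the fragment


def classify(ch: str) -> str:
    """Pick the token kind from the first character of a fragment."""
    if ch.isalpha() or ch == "_":
        return "ident"
    if ch.isdigit():
        return "number"
    if ch.isspace():
        return "space"
    return "symbol"


def continues(kind: str, ch: str) -> bool:
    """Decide whether the next character still belongs to the current token."""
    if kind == "ident":
        return ch.isalnum() or ch == "_"
    if kind == "number":
        return ch.isdigit()
    if kind == "space":
        return ch.isspace()
    return False  # every symbol is a single-character token in this sketch


def scan(text: str) -> List[Token]:
    """Split the whole text into contiguous tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        kind = classify(text[pos])
        end = pos + 1
        # Look exactly one character ahead to decide where the token ends.
        while end < len(text) and continues(kind, text[end]):
            end += 1
        tokens.append(Token(kind, pos, end))
        pos = end
    return tokens
```

In this sketch, whitespace is kept as a token of its own, so the tokens cover the text without gaps and every boundary is shared by two neighboring tokens.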

The lexical scanner is a simple program that implements a finite-state automaton and looks at most one character ahead. Consequently, the scanner can be restarted at any character of the text, which is particularly beneficial for incremental rescanning. For instance, when an end user modifies a portion of the source code text, the scanner restarts shortly before the altered fragment and continues until its output converges with the tokens already known for the unchanged tail of the text.
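The restart-and-converge idea can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual implementation: the `Token`, `next_token`, `scan`, and `rescan` names are hypothetical, the toy scanner knows only four token kinds, and it always begins each token in the same initial state, which is what makes token boundaries safe restart points here.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str   # hypothetical kinds: "ident", "number", "space", "symbol"
    start: int  # index of the first character
    end: int    # index one past the last character


def next_token(text: str, pos: int) -> Token:
    """Scan one token starting at pos, looking one character ahead."""
    ch = text[pos]
    if ch.isalpha() or ch == "_":
        kind, more = "ident", lambda c: c.isalnum() or c == "_"
    elif ch.isdigit():
        kind, more = "number", str.isdigit
    elif ch.isspace():
        kind, more = "space", str.isspace
    else:
        kind, more = "symbol", lambda c: False
    end = pos + 1
    while end < len(text) and more(text[end]):
        end += 1
    return Token(kind, pos, end)


def scan(text: str) -> List[Token]:
    """Full scan, used to build the initial token list."""
    tokens, pos = [], 0
    while pos < len(text):
        tokens.append(next_token(text, pos))
        pos = tokens[-1].end
    return tokens


def rescan(old_text: str, old_tokens: List[Token], at: int, removed: int,
           inserted: str) -> Tuple[str, List[Token], int]:
    """Replace old_text[at:at + removed] with `inserted` and update the tokens.

    Returns the new text, the new token list, and the number of tokens that
    actually had to be scanned again.
    """
    new_text = old_text[:at] + inserted + old_text[at + removed:]
    delta = len(inserted) - removed
    # Tokens ending strictly before the edit never looked at a changed
    # character (one-character lookahead), so they are kept untouched.
    kept = [t for t in old_tokens if t.end < at]
    pos = kept[-1].end if kept else 0
    # Old token starts that lie after the removed span, in new coordinates.
    tail = {t.start + delta: i for i, t in enumerate(old_tokens)
            if t.start >= at + removed}
    fresh: List[Token] = []
    while pos < len(new_text):
        if pos in tail:
            # Converged: from here on the text and the scanner state match
            # the old scan, so the old tokens are reused with shifted offsets.
            shifted = [Token(t.kind, t.start + delta, t.end + delta)
                       for t in old_tokens[tail[pos]:]]
            return new_text, kept + fresh + shifted, len(fresh)
        fresh.append(next_token(new_text, pos))
        pos = fresh[-1].end
    return new_text, kept + fresh, len(fresh)


# Usage: rename `x1` to `value` and rescan only around the edit.
old_text = "let x1 = 42;"
new_text, tokens, rescanned = rescan(old_text, scan(old_text), 4, 2, "value")
assert tokens == scan(new_text)
```

In the usage example, only the whitespace token before the edit and the renamed identifier are scanned again. The first token is kept as is, and the five tokens after the identifier are reused with their offsets shifted by three characters.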

The resulting token stream serves as input for the syntax parser.