docutils.utils.code_analyzer module

Lexical analysis of formal languages (i.e. code) using Pygments.

exception LexerError

Bases: ApplicationError

class Lexer(code, language, tokennames='short')

Bases: object

Parse code lines and yield “classified” tokens.

Arguments

code – string of source code to parse,
language – formal language the code is written in,
tokennames – either ‘long’, ‘short’, or ‘none’ (see below).

Consecutive tokens of the same token type are merged.

Iterating over an instance yields the tokens as (tokentype, value) tuples. The value of tokennames configures the naming of the tokentype:

‘long’: downcased full token type name,
‘short’: short name defined by pygments.token.STANDARD_TYPES (= class argument used in pygments html output),
‘none’: skip lexical analysis.
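As a hedged illustration of the iteration interface described above, the sketch below wraps Lexer in a small helper. The helper name `classify` and its fallback behaviour are assumptions made for this example, not part of docutils; the fallback simply returns the code as one unclassified token when docutils is not importable or a LexerError is raised.

```python
try:
    from docutils.utils.code_analyzer import Lexer, LexerError
except ImportError:  # docutils itself is not installed
    Lexer = LexerError = None


def classify(code, language, tokennames="short"):
    """Return a list of (tokentype, value) tuples for `code`.

    Hypothetical helper: falls back to a single unclassified token
    when no lexical analysis is possible.
    """
    if Lexer is None:
        return [([], code)]
    try:
        return list(Lexer(code, language, tokennames))
    except LexerError:
        # Assumption: LexerError signals that the code could not be
        # analyzed (for instance, no usable Pygments lexer was found).
        return [([], code)]


# 'none' skips lexical analysis, so the code comes back as one token.
unanalyzed = classify("x = 1", "python", tokennames="none")

# With the default 'short' names, each value is paired with its token type.
for tokentype, value in classify("x = 1", "python"):
    print(tokentype, repr(value))
```

The exact token types printed by the loop depend on the installed Pygments version, so no expected output is given for it.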

__init__(code, language, tokennames='short')

Set up a lexical analyzer for code in language.

merge(tokens)

Merge consecutive tokens of the same token type.

Also strip the final newline (added by pygments).
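The merging behaviour can be sketched independently of docutils and Pygments. The following is a minimal re-implementation for illustration only, not the library's code: `merge_tokens` is a hypothetical name, plain strings stand in for Pygments token types, and dropping a final token that becomes empty after the newline is stripped is an assumption of this sketch.

```python
def merge_tokens(tokens):
    """Join consecutive tokens of the same type; strip one final newline."""
    tokens = iter(tokens)
    try:
        last_type, last_value = next(tokens)
    except StopIteration:
        return  # empty input: nothing to yield
    for token_type, value in tokens:
        if token_type == last_type:
            # Same type as the pending token: extend its value.
            last_value += value
        else:
            yield last_type, last_value
            last_type, last_value = token_type, value
    # Strip the trailing newline from the last token.
    if last_value.endswith("\n"):
        last_value = last_value[:-1]
    if last_value:  # assumption: skip a token left empty by the strip
        yield last_type, last_value


raw = [("name", "foo"), ("name", "bar"), ("text", " "),
       ("number", "1"), ("text", "\n")]
merged = list(merge_tokens(raw))
# merged == [("name", "foobar"), ("text", " "), ("number", "1")]
```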

__iter__()

Parse self.code and yield “classified” tokens.

class NumberLines(tokens, startline, endline)

Bases: object

Insert line-number tokens at the start of every code line.

Arguments

tokens – iterable of (classes, value) tuples,
startline – first line number,
endline – last line number.

Iterating over an instance yields the tokens with a (['ln'], '<the line number>') token added for every code line. Multi-line tokens are split.
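To make the splitting of multi-line tokens concrete, here is a standalone sketch of the behaviour described above. It is an illustrative re-implementation under stated assumptions, not the docutils code: `number_lines` is a hypothetical name, and padding the numbers to the width of `endline` followed by a space is an assumed formatting choice.

```python
def number_lines(tokens, startline, endline):
    """Yield tokens with a (['ln'], number) token before each code line."""
    # Assumption: right-align numbers to the width of `endline`.
    width = len(str(endline))
    lineno = startline
    yield ["ln"], f"{lineno:>{width}} "
    for classes, value in tokens:
        # Split multi-line tokens so a number can follow each newline.
        *complete, rest = value.split("\n")
        for line in complete:
            yield classes, line + "\n"
            lineno += 1
            yield ["ln"], f"{lineno:>{width}} "
        yield classes, rest


tokens = [(["k"], "def"), ([], " f():\n    "), (["k"], "pass")]
numbered = list(number_lines(tokens, 9, 10))
# numbered == [(["ln"], " 9 "), (["k"], "def"), ([], " f():\n"),
#              (["ln"], "10 "), ([], "    "), (["k"], "pass")]
```

The middle token spans two lines, so it is emitted in two parts with the line-number token for line 10 between them.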