docutils.parsers.rst package

This is the docutils.parsers.rst package. It exports a single class, Parser, the reStructuredText parser.

Usage

  1. Create a parser:

    parser = docutils.parsers.rst.Parser()
    

    Several optional arguments may be passed to modify the parser’s behavior. Please see Customizing the Parser below for details.

  2. Gather input (a multi-line string), by reading a file or the standard input:

    input = sys.stdin.read()
    
  3. Create a new empty docutils.nodes.document tree:

    document = docutils.utils.new_document(source, settings)
    

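    See docutils.utils.new_document() for parameter details. In short, source is the path to (or a description of) the input source, which is used in system messages, and settings is the runtime settings object.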

  4. Run the parser, populating the document tree:

    parser.parse(input, document)
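
The four steps can be combined into one script. This is a minimal sketch: a string literal stands in for sys.stdin.read(), and default_settings is a hypothetical helper (not part of Docutils) that assumes docutils.frontend.get_default_settings() from Docutils 0.19 or later, falling back to OptionParser on older releases.

```python
from docutils import nodes
from docutils.parsers.rst import Parser
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


# 1. Create a parser.
parser = Parser()

# 2. Gather input; a literal string stands in for sys.stdin.read().
text = """\
Title
=====

Hello, *world*!
"""

# 3. Create an empty document tree; "<string>" only labels the source in messages.
document = new_document("<string>", default_settings(Parser))

# 4. Run the parser, populating the document tree.
parser.parse(text, document)

# No transforms have run, so the title is still the title of a section.
section = document[0]
print(isinstance(section, nodes.section), section[0].astext())
```

Transforms (such as promoting a lone section title to the document title) are normally applied by a Publisher, for example through docutils.core.publish_doctree(); calling parse() alone does not run them.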
    

Parser Overview

The reStructuredText parser is implemented as a state machine, examining its input one line at a time. To understand how the parser works, please first become familiar with the docutils.statemachine module, then see the states module.

Customizing the Parser

Anything that isn’t already customizable is that way simply because that type of customizability hasn’t been implemented yet. Patches welcome!

When instantiating an object of the Parser class, two parameters may be passed: rfc2822 and inliner. Pass rfc2822=True to enable an initial RFC-2822 style header block, parsed as a “field_list” element (with “class” attribute set to “rfc2822”). Currently this is the only body-level element which is customizable without subclassing. (Tip: subclass Parser and change its “state_classes” and “initial_state” attributes to refer to new classes. Contact the author if you need more details.)
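
A minimal sketch of the rfc2822 option: an initial "Name: value" header block becomes a field_list with the "rfc2822" class, and parsing continues normally after the blank line. The header names and values are made up, and default_settings is the same hypothetical helper (assuming Docutils 0.19+ with an OptionParser fallback).

```python
from docutils.parsers.rst import Parser
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


text = """\
Author: Jane Doe
Status: Draft

Body text follows the header block.
"""

parser = Parser(rfc2822=True)  # start in the RFC-2822 header state
document = new_document("<string>", default_settings(Parser))
parser.parse(text, document)

header = document[0]  # the field_list built from the header block
print(header.tagname, header["classes"])
for field in header:  # each field holds a field_name and a field_body
    print(field[0].astext(), "=>", field[1].astext())
```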

The inliner parameter takes an instance of states.Inliner or a subclass. It handles inline markup recognition. A common extension is the addition of further implicit hyperlinks, like “RFC 2822”. This can be done by subclassing states.Inliner, adding a new method for the implicit markup, and adding a (pattern, method) pair to the “implicit_dispatch” attribute of the subclass. See states.Inliner.implicit_inline() for details. Explicit inline markup can be customized in a states.Inliner subclass via the patterns.initial and dispatch attributes (and new methods as appropriate).

class Parser(rfc2822=False, inliner=None)

Bases: docutils.parsers.Parser

The reStructuredText parser.

supported = ('rst', 'restructuredtext', 'rest', 'restx', 'rtxt', 'rstx')

Aliases this parser supports.

settings_spec = ('Generic Parser Options', None, <options>, 'reStructuredText Parser Options', None, <options>)

Each entry below gives the option string (with its metavar, if any), the help text, and the remaining keyword arguments passed to add_option.

Generic Parser Options (description: None)

  • --no-file-insertion: Disable directives that insert the contents of an external file; replaced with a "warning" system message. {'action': 'store_false', 'default': 1, 'dest': 'file_insertion_enabled', 'validator': validate_boolean}

  • --file-insertion-enabled: Enable directives that insert the contents of an external file. (default) {'action': 'store_true'}

  • --no-raw: Disable the "raw" directive; replaced with a "warning" system message. {'action': 'store_false', 'default': 1, 'dest': 'raw_enabled', 'validator': validate_boolean}

  • --raw-enabled: Enable the "raw" directive. (default) {'action': 'store_true'}

  • --line-length-limit <length>: Maximal number of characters in an input line. Default 10 000. {'default': 10000, 'type': 'int', 'validator': validate_nonnegative_int}

reStructuredText Parser Options (description: None)

  • --pep-references: Recognize and link to standalone PEP references (like "PEP 258"). {'action': 'store_true', 'validator': validate_boolean}

  • --pep-base-url <URL>: Base URL for PEP references (default "https://peps.python.org/"). {'default': 'https://peps.python.org/', 'validator': validate_url_trailing_slash}

  • --pep-file-url-template <URL>: Template for PEP file part of URL. (default "pep-%04d") {'default': 'pep-%04d'}

  • --rfc-references: Recognize and link to standalone RFC references (like "RFC 822"). {'action': 'store_true', 'validator': validate_boolean}

  • --rfc-base-url <URL>: Base URL for RFC references (default "https://tools.ietf.org/html/"). {'default': 'https://tools.ietf.org/html/', 'validator': validate_url_trailing_slash}

  • --tab-width <width>: Set number of spaces for tab expansion (default 8). {'default': 8, 'type': 'int', 'validator': validate_nonnegative_int}

  • --trim-footnote-reference-space: Remove spaces before footnote references. {'action': 'store_true', 'validator': validate_boolean}

  • --leave-footnote-reference-space: Leave spaces before footnote references. {'action': 'store_false', 'dest': 'trim_footnote_reference_space'}

  • --syntax-highlight <format>: Token name set for parsing code with Pygments: one of "long", "short", or "none" (no parsing). Default is "long". {'choices': ['long', 'short', 'none'], 'default': 'long'}

  • --smart-quotes <yes/no/alt>: Change straight quotation marks to typographic form: one of "yes", "no", "alt[ernative]" (default "no"). {'default': False, 'validator': validate_ternary}

  • --smartquotes-locales <language:quotes[,language:quotes,...]>: Characters to use as "smart quotes" for <language>. {'action': 'append', 'validator': validate_smartquotes_locales}

  • --word-level-inline-markup: Inline markup recognized at word boundaries only (adjacent to punctuation or whitespace). Force character-level inline markup recognition with "\ " (backslash + space). Default. {'action': 'store_false', 'dest': 'character_level_inline_markup'}

  • --character-level-inline-markup: Inline markup recognized anywhere, regardless of surrounding characters. Backslash-escapes must be used to avoid unwanted markup recognition. Useful for East Asian languages. Experimental. {'action': 'store_true', 'default': False, 'dest': 'character_level_inline_markup'}

Runtime settings specification. Override in subclasses.

Defines runtime settings and associated command-line options, as used by docutils.frontend.OptionParser. This is a tuple of:

  • Option group title (string or None which implies no group, just a list of single options).

  • Description (string or None).

  • A sequence of option tuples. Each consists of:

    • Help text (string)

    • List of option strings (e.g. ['-Q', '--quux']).

    • Dictionary of keyword arguments sent to the OptionParser/OptionGroup add_option method.

      Runtime setting names are derived implicitly from long option names ('--a-setting' becomes settings.a_setting) or explicitly from the 'dest' keyword argument.

      Most settings will also have a ‘validator’ keyword & function. The validator function validates setting values (from configuration files and command-line option arguments) and converts them to appropriate types. For example, the docutils.frontend.validate_boolean function, required by all boolean settings, converts true values (‘1’, ‘on’, ‘yes’, and ‘true’) to 1 and false values (‘0’, ‘off’, ‘no’, ‘false’, and ‘’) to 0. Validators need only be set once per setting. See the docutils.frontend.validate_* functions.

      See the optparse docs for more details.

  • More triples of group title, description, options, as many times as needed. Thus, settings_spec tuples can be simply concatenated.
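
Because the specification is a flat sequence of (title, description, options) triples, a subclass can extend it by concatenation. A minimal sketch, where HypotheticalParser and --hypothetical-feature are made-up names and default_settings is the same hypothetical helper as above; the runtime setting name hypothetical_feature is derived from the long option name.

```python
from docutils import frontend
from docutils.parsers.rst import Parser


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


class HypotheticalParser(Parser):
    """Parser subclass adding one made-up boolean setting."""

    settings_spec = Parser.settings_spec + (
        'Hypothetical Options',   # option group title
        None,                     # description
        (('Enable a hypothetical feature.',      # help text
          ['--hypothetical-feature'],            # option strings
          {'action': 'store_true', 'default': False,
           'validator': frontend.validate_boolean}),),
    )


settings = default_settings(HypotheticalParser)
print(settings.hypothetical_feature, settings.tab_width)
```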

config_section = 'restructuredtext parser'

The name of the config file section specific to this component (lowercase, no brackets). Override in subclasses.

config_section_dependencies = ('parsers',)

A list of names of config file sections that are to be applied before config_section, in order (from general to specific). In other words, the settings in config_section are to be overlaid on top of the settings from these sections. The “general” section is assumed implicitly. Override in subclasses.
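
The overlay order can be modelled with the standard library's configparser. This is a simplified model, not Docutils' own configuration reader: overlay() is a hypothetical function and the configuration text is made up. It applies "general" implicitly, then each dependency in order, then the component's own section, so later (more specific) sections win.

```python
import configparser

# Made-up configuration; section names match those described above.
CONFIG_TEXT = """\
[general]
tab_width = 8

[parsers]
tab_width = 4
raw_enabled = no

[restructuredtext parser]
tab_width = 2
"""


def overlay(config, section, dependencies):
    """Hypothetical model: merge sections from general to specific."""
    merged = {}
    for name in ("general", *dependencies, section):
        if config.has_section(name):
            merged.update(config.items(name))  # later sections override earlier ones
    return merged


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
settings = overlay(config, "restructuredtext parser", ("parsers",))
print(settings)
```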

get_transforms()

Transforms required by this class. Override in subclasses.
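
A subclass typically overrides get_transforms() by extending the base class's list. In this sketch HypotheticalMarker and MarkingParser are made-up names, the priority 900 is arbitrary, and default_settings is the same hypothetical helper as above. The transforms are applied by hand through document.transformer, which a Publisher would otherwise do.

```python
from docutils.parsers.rst import Parser
from docutils.transforms import Transform
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


class HypotheticalMarker(Transform):
    """Made-up transform that tags the document root with a class."""

    default_priority = 900  # arbitrary priority for this sketch

    def apply(self):
        self.document["classes"].append("hypothetical-marked")


class MarkingParser(Parser):
    def get_transforms(self):
        # Keep the base class's transforms and add our own.
        return super().get_transforms() + [HypotheticalMarker]


parser = MarkingParser()
document = new_document("<string>", default_settings(MarkingParser))
parser.parse("Some text.\n", document)

# A Publisher normally collects and applies component transforms; done by hand here.
document.transformer.add_transforms(parser.get_transforms())
document.transformer.apply_transforms()
print(document["classes"])
```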

parse(inputstring, document)

Parse inputstring and populate document, a document tree.

exception DirectiveError(level, message)

Bases: Exception

Store a message and a system message level.

To be thrown from inside directive code.

Do not instantiate directly – use Directive.directive_error() instead!

__init__(level, message)

Set error message and level

class Directive(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)

Bases: object

Base class for reStructuredText directives.

The following attributes may be set by subclasses. They are interpreted by the directive parser (which runs the directive class):

  • required_arguments: The number of required arguments (default: 0).

  • optional_arguments: The number of optional arguments (default: 0).

  • final_argument_whitespace: A boolean, indicating if the final argument may contain whitespace (default: False).

  • option_spec: A dictionary, mapping known option names to conversion functions such as int or float (default: None, meaning no options). Several conversion functions are defined in the directives/__init__.py module.

    Option conversion functions take a single parameter, the option argument (a string or None), validate it and/or convert it to the appropriate form. Conversion functions may raise ValueError and TypeError exceptions.

  • has_content: A boolean; True if content is allowed. Client code must handle the case where content is required but not supplied (an empty content list will be supplied).

Arguments are normally single whitespace-separated words. The final argument may contain whitespace and/or newlines if final_argument_whitespace is True.

If the form of the arguments is more complex, specify only one argument (either required or optional) and set final_argument_whitespace to True; the client code must do any context-sensitive parsing.
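
The effect of final_argument_whitespace can be seen with two hypothetical directives that differ only in that attribute. The directive names and classes are made up; they are registered with docutils.parsers.rst.directives.register_directive(), and default_settings/parse_rst are hypothetical helpers (report_level 5 only keeps the expected error off stderr).

```python
from docutils import nodes
from docutils.parsers.rst import Directive, Parser, directives
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


def parse_rst(text):
    """Hypothetical helper: parse text, keeping system messages off stderr."""
    settings = default_settings(Parser)
    settings.report_level = 5  # messages are still created as nodes, just not printed
    document = new_document("<string>", settings)
    Parser().parse(text, document)
    return document


class EchoDirective(Directive):
    """Made-up directive: one required argument, echoed as a paragraph."""

    required_arguments = 1
    final_argument_whitespace = True  # the single argument may contain spaces

    def run(self):
        return [nodes.paragraph(text=self.arguments[0])]


class StrictEchoDirective(EchoDirective):
    final_argument_whitespace = False  # several words now count as several arguments


directives.register_directive("hypothetical-echo", EchoDirective)
directives.register_directive("hypothetical-strict-echo", StrictEchoDirective)

ok = parse_rst(".. hypothetical-echo:: several words here\n")
bad = parse_rst(".. hypothetical-strict-echo:: several words here\n")
print(ok[0].astext())  # the whole phrase arrives as one argument
print(bad[0].tagname)  # an error reporting too many arguments
```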

When a directive implementation is being run, the directive class is instantiated, and the run() method is executed. During instantiation, the following instance variables are set:

  • name is the directive type or name (string).

  • arguments is the list of positional arguments (strings).

  • options is a dictionary mapping option names (strings) to values (type depends on option conversion functions; see option_spec above).

  • content is a list of strings, the directive content line by line.

  • lineno is the absolute line number of the first line of the directive.

  • content_offset is the line offset of the first line of the content from the beginning of the current input. Used when initiating a nested parse.

  • block_text is a string containing the entire directive.

  • state is the state which called the directive function.

  • state_machine is the state machine which controls the state which called the directive function.

  • reporter is the state machine’s reporter instance.

Directive functions return a list of nodes which will be inserted into the document tree at the point where the directive was encountered. This can be an empty list if there is nothing to insert.
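
A minimal sketch of a directive that uses its arguments, options and content and returns a list of body elements. ChecklistDirective and the directive name are hypothetical; directives.class_option is one of the conversion functions mentioned above, and default_settings/parse_rst are the same hypothetical helpers.

```python
from docutils import nodes
from docutils.parsers.rst import Directive, Parser, directives
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


def parse_rst(text):
    """Hypothetical helper: parse text with default settings."""
    document = new_document("<string>", default_settings(Parser))
    Parser().parse(text, document)
    return document


class ChecklistDirective(Directive):
    """Made-up directive: a rubric plus one bullet per content line."""

    required_arguments = 1
    final_argument_whitespace = True
    option_spec = {"class": directives.class_option}
    has_content = True

    def run(self):
        title = nodes.rubric(text=self.arguments[0])
        bullets = nodes.bullet_list(classes=self.options.get("class", []))
        for line in self.content:  # content lines, indentation removed
            item = nodes.list_item()
            item += nodes.paragraph(text=line)
            bullets += item
        return [title, bullets]  # inserted where the directive appeared


directives.register_directive("hypothetical-checklist", ChecklistDirective)

doc = parse_rst("""\
.. hypothetical-checklist:: Things to check
   :class: compact

   first item
   second item
""")
print(doc.pformat())
```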

For ordinary directives, the list must contain body elements or structural elements. Some directives are intended specifically for substitution definitions, and must return a list of Text nodes and/or inline elements (suitable for inline insertion, in place of the substitution reference). Such directives must verify substitution definition context, typically using code like this:

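from docutils import nodes
from docutils.parsers.rst import states

# Within the directive's run() method:
if not isinstance(self.state, states.SubstitutionDef):
    error = self.reporter.error(
        'Invalid context: the "%s" directive can only be used '
        'within a substitution definition.' % self.name,
        nodes.literal_block(self.block_text, self.block_text),
        line=self.lineno)
    return [error]
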
required_arguments = 0

Number of required directive arguments.

optional_arguments = 0

Number of optional arguments after the required arguments.

final_argument_whitespace = False

May the final argument contain whitespace?

option_spec = None

Mapping of option names to validator functions.

has_content = False

May the directive have content?

run()

directive_error(level, message)

Return a DirectiveError suitable for being thrown as an exception.

Call “raise self.directive_error(level, message)” from within a directive implementation to produce a single system message at the given level; the directive block and the line number are added automatically.

Preferably use the debug, info, warning, error, or severe wrapper methods; for example, “raise self.error(message)” generates an ERROR-level directive error. The wrappers return the exception, so it must still be raised.
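
A minimal sketch of raising an error through a wrapper method. LevelDirective and its accepted values are hypothetical, as are the default_settings/parse_rst helpers. The parser turns the raised DirectiveError into a system_message node placed where the directive appeared.

```python
from docutils import nodes
from docutils.parsers.rst import Directive, Parser, directives
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


def parse_rst(text):
    """Hypothetical helper: parse text, keeping system messages off stderr."""
    settings = default_settings(Parser)
    settings.report_level = 5
    document = new_document("<string>", settings)
    Parser().parse(text, document)
    return document


class LevelDirective(Directive):
    """Made-up directive accepting only "low" or "high" as its argument."""

    required_arguments = 1

    def run(self):
        level = self.arguments[0]
        if level not in ("low", "high"):
            # self.error() only builds the DirectiveError; raising it is up to us.
            raise self.error('unknown level "%s"' % level)
        return [nodes.paragraph(text="level: " + level)]


directives.register_directive("hypothetical-level", LevelDirective)

good = parse_rst(".. hypothetical-level:: high\n")
bad = parse_rst(".. hypothetical-level:: medium\n")
print(good[0].astext())
print(bad[0].astext())
```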

debug(message)
info(message)
warning(message)
error(message)
severe(message)
assert_has_content()

Throw an ERROR-level DirectiveError if the directive doesn’t have contents.
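
A sketch of a content-requiring directive: assert_has_content() rejects an empty block, and otherwise the content is parsed as reStructuredText with self.state.nested_parse(). BoxDirective and its name are hypothetical, as are the default_settings/parse_rst helpers.

```python
from docutils import nodes
from docutils.parsers.rst import Directive, Parser, directives
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


def parse_rst(text):
    """Hypothetical helper: parse text, keeping system messages off stderr."""
    settings = default_settings(Parser)
    settings.report_level = 5
    document = new_document("<string>", settings)
    Parser().parse(text, document)
    return document


class BoxDirective(Directive):
    """Made-up directive wrapping its parsed content in a container."""

    has_content = True

    def run(self):
        self.assert_has_content()  # raises an ERROR-level DirectiveError if empty
        box = nodes.container()
        # Parse the content lines as reStructuredText into the container.
        self.state.nested_parse(self.content, self.content_offset, box)
        return [box]


directives.register_directive("hypothetical-box", BoxDirective)

full = parse_rst(".. hypothetical-box::\n\n   Some *boxed* text.\n")
empty = parse_rst(".. hypothetical-box::\n")
print(full.pformat())
print(empty[0].tagname)
```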

add_name(node)

Append self.options[‘name’] to node[‘names’] if it exists.

Also normalize the name string and register it as explicit target.
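
A sketch of add_name() in use: a hypothetical directive accepts a :name: option (converted with directives.unchanged) and passes its output node to add_name(), which stores the normalized name and registers the node as an explicit target. LabelledDirective and its name are made up, as are the helpers.

```python
from docutils import nodes
from docutils.parsers.rst import Directive, Parser, directives
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


def parse_rst(text):
    """Hypothetical helper: parse text with default settings."""
    document = new_document("<string>", default_settings(Parser))
    Parser().parse(text, document)
    return document


class LabelledDirective(Directive):
    """Made-up directive whose paragraph can be named with :name:."""

    has_content = True
    option_spec = {"name": directives.unchanged}

    def run(self):
        para = nodes.paragraph(text=" ".join(self.content))
        self.add_name(para)  # normalizes :name: into para["names"], registers a target
        return [para]


directives.register_directive("hypothetical-labelled", LabelledDirective)

doc = parse_rst("""\
.. hypothetical-labelled::
   :name: My Label

   Labelled text.
""")
print(doc[0]["names"], doc[0]["ids"])
```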

convert_directive_function(directive_fn)

Define & return a directive class generated from directive_fn.

directive_fn uses the old-style, functional interface.
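
A sketch of converting an old-style directive function. The function shout and the directive name are hypothetical; the sketch assumes the functional interface's signature (name, arguments, options, content, lineno, content_offset, block_text, state, state_machine) and its arguments/options/content function attributes, where arguments is (required, optional, final_argument_whitespace).

```python
from docutils import nodes
from docutils.parsers.rst import (Directive, Parser, convert_directive_function,
                                  directives)
from docutils.utils import new_document


def default_settings(*components):
    """Hypothetical helper: default runtime settings for the given components."""
    try:
        from docutils.frontend import get_default_settings  # Docutils 0.19+
    except ImportError:  # older releases
        from docutils.frontend import OptionParser
        return OptionParser(components=components).get_default_values()
    return get_default_settings(*components)


def shout(name, arguments, options, content, lineno,
          content_offset, block_text, state, state_machine):
    """Made-up old-style directive function: upper-cases its argument."""
    return [nodes.paragraph(text=arguments[0].upper())]


# Old-style functions describe their interface through function attributes.
shout.arguments = (1, 0, True)  # (required, optional, final_argument_whitespace)
shout.options = None
shout.content = False

ShoutDirective = convert_directive_function(shout)
directives.register_directive("hypothetical-shout", ShoutDirective)

doc = new_document("<string>", default_settings(Parser))
Parser().parse(".. hypothetical-shout:: quiet words\n", doc)
print(doc[0].astext())
```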

Subpackages

Submodules