docutils.parsers.rst package
This is the docutils.parsers.rst package. It exports a single class, Parser,
the reStructuredText parser.
Usage
Create a parser:
parser = docutils.parsers.rst.Parser()
Several optional arguments may be passed to modify the parser’s behavior. Please see Customizing the Parser below for details.
Gather input (a multi-line string), by reading a file or the standard input:
input = sys.stdin.read()
Create a new empty docutils.nodes.document tree:
document = docutils.utils.new_document(source, settings)
See docutils.utils.new_document() for parameter details.
Run the parser, populating the document tree:
parser.parse(input, document)
Parser Overview
The reStructuredText parser is implemented as a state machine, examining its input one line at a time. To understand how the parser works, please first become familiar with the docutils.statemachine module, then see the states module.
Customizing the Parser
Anything that isn’t already customizable is that way simply because that type of customizability hasn’t been implemented yet. Patches welcome!
When instantiating an object of the Parser class, two parameters may be
passed: rfc2822 and inliner. Pass rfc2822=True to enable an
initial RFC-2822 style header block, parsed as a “field_list” element (with
“class” attribute set to “rfc2822”). Currently this is the only body-level
element which is customizable without subclassing. (Tip: subclass Parser
and change its “state_classes” and “initial_state” attributes to refer to new
classes. Contact the author if you need more details.)
The inliner parameter takes an instance of states.Inliner or a subclass.
It handles inline markup recognition. A common extension is the addition of
further implicit hyperlinks, like “RFC 2822”. This can be done by subclassing
states.Inliner, adding a new method for the implicit markup, and adding a
(pattern, method) pair to the “implicit_dispatch” attribute of the
subclass. See states.Inliner.implicit_inline() for details. Explicit
inline markup can be customized in a states.Inliner subclass via the
patterns.initial and dispatch attributes (and new methods as
appropriate).
- class Parser(rfc2822=False, inliner=None)[source]
Bases: Parser
The reStructuredText parser.
- supported = ('rst', 'restructuredtext', 'rest', 'restx', 'rtxt', 'rstx')
Aliases this parser supports.
- settings_spec = ('Generic Parser Options', None, (('Disable directives that insert the contents of an external file; replaced with a "warning" system message.', ['--no-file-insertion'], {'action': 'store_false', 'default': 1, 'dest': 'file_insertion_enabled', 'validator': <function validate_boolean>}), ('Enable directives that insert the contents of an external file. (default)', ['--file-insertion-enabled'], {'action': 'store_true'}), ('Disable the "raw" directive; replaced with a "warning" system message.', ['--no-raw'], {'action': 'store_false', 'default': 1, 'dest': 'raw_enabled', 'validator': <function validate_boolean>}), ('Enable the "raw" directive. (default)', ['--raw-enabled'], {'action': 'store_true'}), ('Maximal number of characters in an input line. Default 10 000.', ['--line-length-limit'], {'default': 10000, 'metavar': '<length>', 'type': 'int', 'validator': <function validate_nonnegative_int>})), 'reStructuredText Parser Options', None, (('Recognize and link to standalone PEP references (like "PEP 258").', ['--pep-references'], {'action': 'store_true', 'validator': <function validate_boolean>}), ('Base URL for PEP references (default "https://peps.python.org/").', ['--pep-base-url'], {'default': 'https://peps.python.org/', 'metavar': '<URL>', 'validator': <function validate_url_trailing_slash>}), ('Template for PEP file part of URL. 
(default "pep-%04d")', ['--pep-file-url-template'], {'default': 'pep-%04d', 'metavar': '<URL>'}), ('Recognize and link to standalone RFC references (like "RFC 822").', ['--rfc-references'], {'action': 'store_true', 'validator': <function validate_boolean>}), ('Base URL for RFC references (default "https://tools.ietf.org/html/").', ['--rfc-base-url'], {'default': 'https://tools.ietf.org/html/', 'metavar': '<URL>', 'validator': <function validate_url_trailing_slash>}), ('Set number of spaces for tab expansion (default 8).', ['--tab-width'], {'default': 8, 'metavar': '<width>', 'type': 'int', 'validator': <function validate_nonnegative_int>}), ('Remove spaces before footnote references.', ['--trim-footnote-reference-space'], {'action': 'store_true', 'validator': <function validate_boolean>}), ('Leave spaces before footnote references.', ['--leave-footnote-reference-space'], {'action': 'store_false', 'dest': 'trim_footnote_reference_space'}), ('Token name set for parsing code with Pygments: one of "long", "short", or "none" (no parsing). Default is "long".', ['--syntax-highlight'], {'choices': ['long', 'short', 'none'], 'default': 'long', 'metavar': '<format>'}), ('Change straight quotation marks to typographic form: one of "yes", "no", "alt[ernative]" (default "no").', ['--smart-quotes'], {'default': False, 'metavar': '<yes/no/alt>', 'validator': <function validate_ternary>}), ('Characters to use as "smart quotes" for <language>. ', ['--smartquotes-locales'], {'action': 'append', 'metavar': '<language:quotes[,language:quotes,...]>', 'validator': <function validate_smartquotes_locales>}), ('Inline markup recognized at word boundaries only (adjacent to punctuation or whitespace). Force character-level inline markup recognition with "\\ " (backslash + space). Default.', ['--word-level-inline-markup'], {'action': 'store_false', 'dest': 'character_level_inline_markup'}), ('Inline markup recognized anywhere, regardless of surrounding characters. 
Backslash-escapes must be used to avoid unwanted markup recognition. Useful for East Asian languages. Experimental.', ['--character-level-inline-markup'], {'action': 'store_true', 'default': False, 'dest': 'character_level_inline_markup'})))
Runtime settings specification. Override in subclasses.
Defines runtime settings and associated command-line options, as used by docutils.frontend.OptionParser. This is a tuple of:
Option group title (string or None which implies no group, just a list of single options).
Description (string or None).
A sequence of option tuples. Each consists of:
Help text (string)
List of option strings (e.g. ['-Q', '--quux']).
Dictionary of keyword arguments sent to the OptionParser/OptionGroup add_option method.
Runtime setting names are derived implicitly from long option names (‘--a-setting’ becomes settings.a_setting) or explicitly from the ‘dest’ keyword argument.
Most settings will also have a ‘validator’ keyword & function. The validator function validates setting values (from configuration files and command-line option arguments) and converts them to appropriate types. For example, the docutils.frontend.validate_boolean function, required by all boolean settings, converts true values (‘1’, ‘on’, ‘yes’, and ‘true’) to 1 and false values (‘0’, ‘off’, ‘no’, ‘false’, and ‘’) to 0. Validators need only be set once per setting. See the docutils.frontend.validate_* functions.
See the optparse docs for more details.
More triples of group title, description, options, as many times as needed. Thus, settings_spec tuples can be simply concatenated.
- config_section = 'restructuredtext parser'
The name of the config file section specific to this component (lowercase, no brackets). Override in subclasses.
- config_section_dependencies = ('parsers',)
A list of names of config file sections that are to be applied before config_section, in order (from general to specific). In other words, the settings in config_section are to be overlaid on top of the settings from these sections. The “general” section is assumed implicitly. Override in subclasses.
- exception DirectiveError(level, message)[source]
Bases: Exception
Store a message and a system message level.
To be thrown from inside directive code.
Do not instantiate directly – use Directive.directive_error() instead!
- class Directive(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]
Bases: object
Base class for reStructuredText directives.
The following attributes may be set by subclasses. They are interpreted by the directive parser (which runs the directive class):
required_arguments: The number of required arguments (default: 0).
optional_arguments: The number of optional arguments (default: 0).
final_argument_whitespace: A boolean, indicating if the final argument may contain whitespace (default: False).
option_spec: A dictionary, mapping known option names to conversion functions such as int or float (default: {}, no options). Several conversion functions are defined in the directives/__init__.py module.
Option conversion functions take a single parameter, the option argument (a string or None), validate it and/or convert it to the appropriate form. Conversion functions may raise ValueError and TypeError exceptions.
has_content: A boolean; True if content is allowed. Client code must handle the case where content is required but not supplied (an empty content list will be supplied).
Arguments are normally single whitespace-separated words. The final argument may contain whitespace and/or newlines if final_argument_whitespace is True.
If the form of the arguments is more complex, specify only one argument (either required or optional) and set final_argument_whitespace to True; the client code must do any context-sensitive parsing.
When a directive implementation is being run, the directive class is instantiated, and the run() method is executed. During instantiation, the following instance variables are set:
name is the directive type or name (string).
arguments is the list of positional arguments (strings).
options is a dictionary mapping option names (strings) to values (type depends on option conversion functions; see option_spec above).
content is a list of strings, the directive content line by line.
lineno is the absolute line number of the first line of the directive.
content_offset is the line offset of the first line of the content from the beginning of the current input. Used when initiating a nested parse.
block_text is a string containing the entire directive.
state is the state which called the directive function.
state_machine is the state machine which controls the state which called the directive function.
reporter is the state machine’s reporter instance.
Directive functions return a list of nodes which will be inserted into the document tree at the point where the directive was encountered. This can be an empty list if there is nothing to insert.
For ordinary directives, the list must contain body elements or structural elements. Some directives are intended specifically for substitution definitions, and must return a list of Text nodes and/or inline elements (suitable for inline insertion, in place of the substitution reference). Such directives must verify substitution definition context, typically using code like this:
if not isinstance(state, states.SubstitutionDef):
    error = self.reporter.error(
        'Invalid context: the "%s" directive can only be used '
        'within a substitution definition.' % (name),
        nodes.literal_block(block_text, block_text), line=lineno)
    return [error]
- required_arguments = 0
Number of required directive arguments.
- optional_arguments = 0
Number of optional arguments after the required arguments.
- final_argument_whitespace = False
May the final argument contain whitespace?
- option_spec = None
Mapping of option names to validator functions.
- has_content = False
May the directive have content?
- directive_error(level, message)[source]
Return a DirectiveError suitable for being thrown as an exception.
Call “raise self.directive_error(level, message)” from within a directive implementation to return one single system message at level level, which automatically gets the directive block and the line number added.
Preferably use the debug, info, warning, error, or severe wrapper methods, e.g. self.error(message) to generate an ERROR-level directive error.
- convert_directive_function(directive_fn)[source]
Define & return a directive class generated from directive_fn.
directive_fn uses the old-style, functional interface.