docutils.nodes module

Docutils document tree element class library.

Classes in CamelCase are abstract base classes or auxiliary classes. The one exception is Text, for a text (PCDATA) node; uppercase is used to differentiate from element classes. Classes in lower_case_with_underscores are element classes, matching the XML element generic identifiers in the DTD.

The position of each node (the level at which it can occur) is significant and is represented by abstract base classes (Root, Structural, Body, Inline, etc.). Certain transformations will be easier because we can use isinstance(node, base_class) to determine the position of the node in the hierarchy.
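A minimal sketch of such a check, assuming docutils is installed. The node contents are illustrative only.

```python
from docutils import nodes

para = nodes.paragraph("", "Hello")   # body-level element
emph = nodes.emphasis("", "world")    # inline-level element
sect = nodes.section()                # structural element

# The abstract category classes double as position markers.
para_is_body = isinstance(para, nodes.Body)
emph_is_inline = isinstance(emph, nodes.Inline)
para_is_inline = isinstance(para, nodes.Inline)
sect_is_structural = isinstance(sect, nodes.Structural)
```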

class Node[source]

Bases: object

Abstract base class of nodes in a document tree.

parent = None: Back-reference to the Node immediately containing this Node.

source = None: Path or description of the input source which generated this Node.

line = None: The line number (1-based) of the beginning of this Node in source.

_document = None

property document: Return the document root node of the tree containing this Node.

__bool__()[source]

Node instances are always true, even if they’re empty. A node is more than a simple container. Its boolean “truth” does not depend on having one or more subnodes in the doctree.

Use len() to check node length.
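As an illustrative sketch (assuming docutils is installed), an element without children is still true, so emptiness has to be tested with len():

```python
from docutils import nodes

empty = nodes.paragraph()      # element without any child nodes

truthy = bool(empty)           # nodes are always true
child_count = len(empty)       # number of direct children
has_children = len(empty) > 0  # the recommended emptiness test
```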

asdom(dom=None)[source]: Return a DOM fragment representation of this Node.

pformat(indent=' ', level=0)[source]

Return an indented pseudo-XML representation, for test purposes.

Override in subclasses.

copy()[source]: Return a copy of self.

deepcopy()[source]: Return a deep copy of self (also copying children).

astext()[source]: Return a string representation of this Node.

setup_child(child)[source]

walk(visitor)[source]

Traverse a tree of Node objects, calling the dispatch_visit() method of visitor when entering each node. (The walkabout() method is similar, except it also calls the dispatch_departure() method before exiting each node.)

This tree traversal supports limited in-place tree modifications. Replacing one node with one or more nodes is OK, as is removing an element. However, if the node removed or replaced occurs after the current node, the old node will still be traversed, and any new nodes will not.

Within visit methods (and depart methods for walkabout()), TreePruningException subclasses may be raised (SkipChildren, SkipSiblings, SkipNode, SkipDeparture).

Parameter visitor: A NodeVisitor object, containing a visit implementation for each Node subclass encountered.

Return true if we should stop the traversal.
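The traversal can be sketched as follows. This is an illustration under assumptions, not the canonical usage: the TextCollector class and the sample tree are hypothetical, SparseNodeVisitor is used so that only the methods of interest need to be written, and the document comes from docutils.utils.new_document() with default settings.

```python
from docutils import nodes
from docutils.utils import new_document

document = new_document("<example>")
document += nodes.paragraph(
    "", "", nodes.emphasis("", "Foo"), nodes.Text(" bar"))


class TextCollector(nodes.SparseNodeVisitor):
    """Hypothetical visitor: record text, but skip emphasis content."""

    def __init__(self, document):
        super().__init__(document)
        self.texts = []

    def visit_emphasis(self, node):
        # Prune the subtree: children of <emphasis> are not visited.
        raise nodes.SkipChildren

    def visit_Text(self, node):
        self.texts.append(node.astext())


collector = TextCollector(document)
stopped = document.walk(collector)
```

Only the text outside the emphasis element is collected, because SkipChildren prevents the traversal from descending into it.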

walkabout(visitor)[source]

Perform a tree traversal like Node.walk() (which see), but also call the dispatch_departure() method before exiting each node.

Parameter visitor: A NodeVisitor object, containing a visit and depart implementation for each Node subclass encountered.

Return true if we should stop the traversal.
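A minimal sketch of the visit/depart pairing, assuming docutils is installed. The Tracer class is hypothetical. It builds on GenericNodeVisitor, which routes every node to default_visit() and default_departure().

```python
from docutils import nodes
from docutils.utils import new_document

document = new_document("<example>")
para = nodes.paragraph("", "", nodes.strong("", "Foo"))
document += para


class Tracer(nodes.GenericNodeVisitor):
    """Hypothetical visitor: log entry to and exit from every node."""

    def __init__(self, document):
        super().__init__(document)
        self.events = []

    def default_visit(self, node):
        self.events.append("enter " + node.tagname)

    def default_departure(self, node):
        self.events.append("leave " + node.tagname)


tracer = Tracer(document)
para.walkabout(tracer)
```

The recorded events are properly nested: each node is left only after all of its children have been entered and left.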

_fast_findall(cls)[source]: Return iterator that only supports instance checks.

_superfast_findall()[source]: Return iterator that doesn’t check for a condition.

traverse(condition=None, include_self=True, descend=True, siblings=False, ascend=False)[source]

Return list of nodes following self.

For looping, Node.findall() is faster and more memory efficient.

findall(condition=None, include_self=True, descend=True, siblings=False, ascend=False)[source]

Return an iterator yielding nodes following self:

self (if include_self is true)
all descendants in tree traversal order (if descend is true)
the following siblings (if siblings is true) and their descendants (if also descend is true)
the following siblings of the parent (if ascend is true) and their descendants (if also descend is true), and so on.

If condition is not None, the iterator yields only nodes for which condition(node) is true. If condition is a node class cls, it is equivalent to a function consisting of return isinstance(node, cls).

If ascend is true, assume siblings to be true as well.

If the tree structure is modified during iteration, the result is undefined.

For example, given the following tree:

<paragraph>
    <emphasis>      <--- emphasis.traverse() and
        <strong>    <--- strong.traverse() are called.
            Foo
        Bar
    <reference name="Baz" refid="baz">
        Baz

Then tuple(emphasis.traverse()) equals

(<emphasis>, <strong>, <#text: Foo>, <#text: Bar>)

and list(strong.traverse(ascend=True)) equals

[<strong>, <#text: Foo>, <#text: Bar>, <reference>, <#text: Baz>]

next_node(condition=None, include_self=False, descend=True, siblings=False, ascend=False)[source]

Return the first node in the iterator returned by findall(), or None if the iterable is empty.

The parameter list is the same as that of traverse(). Note, however, that include_self defaults to False here.
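A short sketch of next_node() on an illustrative tree, assuming docutils is installed:

```python
from docutils import nodes

para = nodes.paragraph(
    "", "", nodes.emphasis("", "Foo"), nodes.Text(" bar"))

first = para.next_node()                    # self is excluded by default
first_text = para.next_node(nodes.Text)     # first Text descendant
missing = para.next_node(nodes.strong)      # no match
with_self = para.next_node(include_self=True)
```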

class reprunicode(s)[source]

Bases: str

Deprecated backwards compatibility stub. Use the standard str instead.

ensure_str(s)[source]: Deprecated backwards compatibility stub returning s.

unescape(text, restore_backslashes=False, respect_whitespace=False)[source]: Return a string with nulls removed or restored to backslashes. Backslash-escaped spaces are also removed.

class Text(data, rawsource=None)[source]

Bases: Node, str

Instances are terminal nodes (leaves) containing text only; no child nodes or attributes. Initialize by passing a string to the constructor.

Access the raw (null-escaped) text with str(<instance>) and unescaped text with <instance>.astext().
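The difference between the two accessors can be sketched as follows (assuming docutils is installed). The sample string is made up: it contains a null character of the kind the reStructuredText parser uses to mark a backslash escape.

```python
from docutils import nodes

text = nodes.Text("a\x00*b")   # null-escaped asterisk, illustrative input

raw = str(text)                # raw text, null character preserved
plain = text.astext()          # null character removed
restored = nodes.unescape(raw, restore_backslashes=True)
```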

tagname = '#text'

children = (): Text nodes have no children, and cannot have children.

static __new__(cls, data, rawsource=None)[source]: Assert that data is not an array of bytes and warn if the deprecated rawsource argument is used.

shortrepr(maxlen=18)[source]

_dom_node(domroot)[source]

astext()[source]: Return a string representation of this Node.

copy()[source]: Return a copy of self.

deepcopy()[source]: Return a deep copy of self (also copying children).

pformat(indent=' ', level=0)[source]

Return an indented pseudo-XML representation, for test purposes.

Override in subclasses.

rstrip(chars=None)[source]

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

lstrip(chars=None)[source]

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

class Element(rawsource='', *children, **attributes)[source]

Bases: Node

Element is the superclass to all specific elements.

Elements contain attributes and child nodes. They can be described as a cross between a list and a dictionary.

Elements emulate dictionaries for external [1] attributes, indexing by attribute name (a string). To set the attribute ‘att’ to ‘value’, do:

element['att'] = 'value'

There are two special attributes: ‘ids’ and ‘names’. Both are lists of unique identifiers: ‘ids’ conform to the regular expression [a-z](-?[a-z0-9]+)* (see the make_id() function for rationale and details). ‘names’ serve as user-friendly interfaces to IDs; they are case- and whitespace-normalized (see the fully_normalize_name() function).
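As a hedged illustration of the two normalizations, assuming docutils is installed (the input strings are arbitrary):

```python
import re

from docutils import nodes

# The pattern quoted in the text above.
ID_PATTERN = re.compile(r"[a-z](-?[a-z0-9]+)*")

identifier = nodes.make_id("Hello, World!")
name = nodes.fully_normalize_name("  Hello   WORLD ")
id_is_valid = ID_PATTERN.fullmatch(identifier) is not None
```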

Elements emulate lists for child nodes (element nodes and/or text nodes), indexing by integer. To get the first child node, use:

element[0]

To iterate over the child nodes (without descending), use:

for child in element:
    ...

Elements may be constructed using the += operator. To add one new child node to element, do:

element += node

This is equivalent to element.append(node).

To add a list of multiple child nodes at once, use the same += operator:

element += [node1, node2]

This is equivalent to element.extend([node1, node2]).
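Putting the dictionary-like and list-like interfaces together, a minimal sketch (assuming docutils is installed; the section content is illustrative):

```python
from docutils import nodes

section = nodes.section()

# Dictionary-style access to attributes.
section["ids"].append("intro")
section["classes"] = ["highlight"]

# List-style access to children.
section += nodes.title("", "Introduction")
section += [nodes.paragraph("", "First."), nodes.paragraph("", "Second.")]

child_tags = [child.tagname for child in section]
first_child = section[0]
```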

basic_attributes = ('ids', 'classes', 'names', 'dupnames'): Tuple of attributes which are defined for every Element-derived class instance and can be safely transferred to a different node.

local_attributes = ('backrefs',)

Tuple of class-specific attributes that should not be copied with the standard attributes when replacing a node.

NOTE: To prevent any of its own attributes from being copied, a derived class should override this value, adding to the value inherited from its parent class.

list_attributes = ('ids', 'classes', 'names', 'dupnames', 'backrefs'): Tuple of attributes that are automatically initialized to empty lists for all nodes.

known_attributes = ('ids', 'classes', 'names', 'dupnames', 'backrefs', 'source'): Tuple of attributes that are known to the Element base class.

child_text_separator = '\n\n': Separator for child nodes, used by astext() method.

rawsource

The raw text from which this element was constructed.

NOTE: some elements do not set this value (default ‘’).

children: List of child nodes (elements and/or Text).

attributes: Dictionary of attribute {name: value}.

tagname = None: The element generic identifier. If None, it is set as an instance attribute to the name of the class.

_dom_node(domroot)[source]

shortrepr()[source]

starttag(quoteattr=None)[source]

endtag()[source]

emptytag()[source]

__iadd__(other)[source]: Append a node or a list of nodes to self.children.

astext()[source]: Return a string representation of this Node.

non_default_attributes()[source]

attlist()[source]

get(key, failobj=None)[source]

hasattr(attr)[source]

delattr(attr)[source]

setdefault(key, failobj=None)[source]

has_key(attr)

get_language_code(fallback='')[source]

Return node’s language tag.

Look iteratively in self and parents for a class value starting with language- and return the remainder of it (which should be a BCP 47 language tag) or the fallback.
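A small sketch of the lookup on a single node, assuming docutils is installed. It covers only the node's own classes and the fallback, and the class values are illustrative.

```python
from docutils import nodes

french = nodes.inline("", "Bonjour", classes=["language-fr"])
untagged = nodes.inline("", "Hello")

french_code = french.get_language_code()
fallback_code = untagged.get_language_code("en")
default_code = untagged.get_language_code()
```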

append(item)[source]

extend(item)[source]

insert(index, item)[source]

pop(i=-1)[source]

remove(item)[source]

index(item, start=0, stop=9223372036854775807)[source]

previous_sibling()[source]: Return preceding sibling node or None.

is_not_default(key)[source]

update_basic_atts(dict_)[source]: Update basic attributes (‘ids’, ‘names’, ‘classes’, ‘dupnames’, but not ‘source’) from node or dictionary dict_.
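For illustration (assuming docutils is installed), only the basic attributes are carried over. The "align" attribute below is a hypothetical non-basic attribute.

```python
from docutils import nodes

source = nodes.paragraph("", "text", ids=["p1"], classes=["note"])
source["align"] = "left"      # hypothetical non-basic attribute

target = nodes.paragraph()
target.update_basic_atts(source)

copied_ids = target["ids"]
copied_classes = target["classes"]
copied_align = target.get("align")
```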

append_attr_list(attr, values)[source]

For each element in values, if it does not exist in self[attr], append it.

NOTE: Requires self[attr] and values to be sequence type and the former should specifically be a list.

coerce_append_attr_list(attr, value)[source]

First, convert both self[attr] and value to a non-string sequence type; if either is not already a sequence, convert it to a list of one element. Then call append_attr_list.

NOTE: self[attr] and value both must not be None.
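The two list helpers can be sketched like this, assuming docutils is installed. The "custom" attribute name is hypothetical.

```python
from docutils import nodes

node = nodes.paragraph()
node["classes"] = ["a", "b"]

# Existing values are not duplicated.
node.append_attr_list("classes", ["b", "c"])
after_append = list(node["classes"])

# A scalar attribute and a scalar value are both wrapped in lists first.
node["custom"] = "x"
node.coerce_append_attr_list("custom", "y")
after_coerce = node["custom"]
```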

replace_attr(attr, value, force=True)[source]: If self[attr] does not exist or force is True or omitted, set self[attr] to value, otherwise do nothing.
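A brief sketch of the effect of force, assuming docutils is installed (the attribute names and values are illustrative):

```python
from docutils import nodes

node = nodes.paragraph()
node["align"] = "left"

node.replace_attr("align", "right", force=False)   # exists: kept
kept = node["align"]

node.replace_attr("align", "right")                # force defaults to True
forced = node["align"]

node.replace_attr("width", "50%", force=False)     # missing: set
added = node["width"]
```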

copy_attr_convert(attr, value, replace=True)[source]

If attr is an attribute of self, set self[attr] to [self[attr], value], otherwise set self[attr] to value.

NOTE: replace is not used by this function and is kept only for compatibility with the other copy functions.

copy_attr_coerce(attr, value, replace)[source]: If attr is an attribute of self and either self[attr] or value is a list, convert all non-sequence values to a sequence of 1 element and then concatenate the two sequences, setting the result to self[attr]. If both self[attr] and value are non-sequences and replace is True or self[attr] is None, replace self[attr] with value. Otherwise, do nothing.

copy_attr_concatenate(attr, value, replace)[source]: If attr is an attribute of self and both self[attr] and value are lists, concatenate the two sequences, setting the result to self[attr]. If either self[attr] or value is a non-sequence and replace is True or self[attr] is None, replace self[attr] with value. Otherwise, do nothing.

copy_attr_consistent(attr, value, replace)[source]: If replace is True or self[attr] is None, replace self[attr] with value. Otherwise, do nothing.
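The differences between the copy functions can be sketched as follows, assuming docutils is installed. The attribute names and values are illustrative, and each call passes a value that is a different object from the stored one.

```python
from docutils import nodes

node = nodes.paragraph()
node["classes"] = ["a"]
node["align"] = "left"

# coerce: one side is a list, so the scalar is wrapped and appended.
node.copy_attr_coerce("classes", "b", replace=False)
coerced = list(node["classes"])

# coerce: both sides are scalars, so `replace` decides.
node.copy_attr_coerce("align", "right", replace=False)
kept = node["align"]

# concatenate: both sides are lists, so they are merged.
node.copy_attr_concatenate("classes", ["c"], replace=False)
concatenated = list(node["classes"])

# consistent: plain replacement when `replace` is true.
node.copy_attr_consistent("align", "center", replace=True)
replaced = node["align"]
```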

update_all_atts(dict_, update_fun=<function Element.copy_attr_consistent>, replace=True, and_source=False)[source]