rst-in-md: reStructuredText in Markdown
The Implementation
The implementation of this extension can be found in the GitHub repository sebpretzer/rst-in-md.
It follows the general structure described in this design doc, although improvements have been made since it was written. If you want the latest documentation for the extension, you can read rst-in-md's docs.
Background and Motivation
Markdown is an easy-to-read, easy-to-write plain text format that can be converted to HTML.1 This very website is written in Markdown and then converted to HTML using mkdocs.2 The price of that simplicity is that Markdown lacks some of the more powerful features available in other markup languages. When users find Markdown lacking, they often turn to writing HTML directly.3 That is a perfectly valid solution, but it can be cumbersome at times.
There are other markup languages that are more powerful than Markdown, such as reStructuredText,4 AsciiDoc,5 Markdoc,6 MDX,7 and LaTeX.8 Each of these has its own strengths and weaknesses. Some were even born out of frustration with Markdown. It would be useful to be able to use these languages in conjunction with Markdown, to get the best of both worlds. Users could write most of their content in Markdown and switch to a more powerful but more cumbersome language only when they need to, without having to fall back to HTML. Of course, this should not be a requirement, but an option for users who want it.
In my particular case, I wanted to be able to write a list-table, since Markdown tables are hard to read and write. Markdoc implemented a similar variation of this feature as well.9 List tables make it easy to write tables without having to worry much about the formatting.
Goals and Non-Goals
Goals
- Integrate a reStructuredText parser into Python-Markdown
- Make it as low friction as possible to write reStructuredText, as if it were Markdown
- Integrate with mkdocs and mkdocs-material themes10
Non-Goals
- Minimize configuration required to use the extension; it should be plug-and-play
Design
Note
We will refer to this extension as rst-in-md moving forward.
Fenced Code Blocks
With the goal of making it as low friction as possible, rst-in-md should work with existing Markdown elements. Fenced Code Blocks are a feature of Python-Markdown that allows users to specify a language for syntax highlighting. This feature is already familiar to users, and it provides a natural bounding box for any reStructuredText content. One of the maintainers of Python-Markdown has also suggested this method of overloading fenced code blocks to embed reStructuredText.11 MyST, a markdown parser, also does something similar with their ```{directive} syntax.12 Mermaid diagrams are also supported in mkdocs using this method.13 So you could imagine that the rst content in a markdown file could look something like this:
```rst
.. list-table::
    :widths: 10 10 10
    :header-rows: 1

    * - Name
      - Age
      - Height
    * - Alice
      - 20
      - 5'5"
    * - Bob
      - 25
      - 6'0"
```
rst-in-md should take in a fenced code block and, if its language is reStructuredText, parse the content as reStructuredText instead of Markdown. This can use docutils,4 the standard reStructuredText parser, to produce HTML. rst-in-md should then inject that HTML into the text and let Python-Markdown parse the rest. This means that rst-in-md should run before the fenced code block extension, so that the reStructuredText content is not treated as a normal fenced code block. The flow of the extension should look something like this:
flowchart TB
    md("markdown text")
    fenced_code("fenced_code extension")
    subgraph rst_in_md ["rst-in-md"]
        direction TB
        check("check if restructured text")
        return("return text")
        docutils("docutils parser")
        replace("replace rst markdown with rst html")
        check --> |"no rst"| return
        check --> |"rst exists"| docutils
        docutils --> |"rst html"| replace
        replace --> |"updated markdown text"| return
    end
    md --> |"pass raw markdown file to extension"| rst_in_md
    rst_in_md --> |"pass rst-parsed markdown file to other extensions"| fenced_code
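The flow above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the extension's actual code: `RST_FENCE`, `replace_rst_fences`, and `fake_rst_to_html` are hypothetical names, the list of accepted language names is an assumption, and the converter is a stand-in for a real docutils call (for example, one built on `docutils.core.publish_parts`).

```python
import html
import re
from typing import Callable

# Matches a fenced block whose language is one of the reStructuredText names.
# The accepted names here are an assumption made for illustration.
RST_FENCE = re.compile(
    r"^```(?:rst|rest|restructuredtext)[ \t]*\n(?P<body>.*?)^```[ \t]*$",
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)


def fake_rst_to_html(rst: str) -> str:
    """Hypothetical stand-in for docutils: wraps the escaped source in a div."""
    return '<div class="rst">' + html.escape(rst) + "</div>"


def replace_rst_fences(text: str, to_html: Callable[[str], str]) -> str:
    """Replace every rst fence in `text` with the HTML produced by `to_html`.

    Text without any rst fence is returned unchanged ("no rst" in the flow).
    """
    return RST_FENCE.sub(lambda match: to_html(match.group("body")), text)


source = (
    "# Title\n\n"
    "```rst\n*emphasis*\n```\n\n"
    "```python\nprint('hi')\n```\n"
)
result = replace_rst_fences(source, fake_rst_to_html)
print(result)
```

Because the converter is passed in as a callable, the fence detection can be exercised without docutils installed. A real preprocessor would also need to protect the generated HTML from later Markdown processing, which this sketch does not attempt.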
Python-Markdown Extension
Python-Markdown has a fantastic extensions API. Depending on when an extension should run, there are different types of extensions that can be used. For this particular use case, a preprocessor extension will be used, because fenced code blocks are parsed in that fashion and rst-in-md should mimic that behavior.
PyMdown Extensions Superfences
Since the extension should run in a similar manner to fenced code blocks, it should also be compatible with PyMdown Extensions Superfences. Superfences provide a lot of improvements over standard fenced code blocks, and are a popular extension. When a user installs Superfences, the extension should be able to run in the same way as it would with standard fenced code blocks.
Superfences provide a way to define custom fences,14 which is a relatively easy way for this extension to be integrated. If implemented correctly, the user should see no difference between using standard fenced code blocks and Superfences, with this extension installed.
To work with Superfences, nothing should change from a Markdown perspective, but users will have to add a custom formatter and validator to their config. When using mkdocs, it should look something like this:
markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        - name: rst
          class: rst
          format: !!python/name:rst_in_md.format_rst
          validator: !!python/name:rst_in_md.validate_rst
They may have to do something similar for each language name that can identify reStructuredText. So if they want to use rst-in-md with rst, rest, and reStructuredText, they will have to add a formatter and validator config entry for each of those.
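When the configuration is built in Python rather than YAML, the repeated entries could be generated instead of written by hand. The sketch below is only an illustration: `custom_fences_for`, `fake_format_rst`, and `fake_validate_rst` are hypothetical names standing in for the real `rst_in_md.format_rst` and `rst_in_md.validate_rst`, the argument lists follow my reading of the Superfences custom-fence documentation and should be checked against it, and reusing the `rst` class for every name is an assumption.

```python
from typing import Any


def fake_format_rst(source, language, css_class, options, md, **kwargs) -> str:
    """Hypothetical formatter stand-in; the real one would render rst to HTML."""
    return f'<div class="{css_class}">{source}</div>'


def fake_validate_rst(language, inputs, options, attrs, md) -> bool:
    """Hypothetical validator stand-in; accepts every fence it is offered."""
    return True


def custom_fences_for(names: list[str], css_class: str = "rst") -> list[dict[str, Any]]:
    """Build one custom-fence entry per language name.

    The keys mirror the YAML above: name, class, format, validator.
    """
    return [
        {
            "name": name,
            "class": css_class,  # assumption: every name shares one class
            "format": fake_format_rst,
            "validator": fake_validate_rst,
        }
        for name in names
    ]


entries = custom_fences_for(["rst", "rest", "reStructuredText"])
for entry in entries:
    print(entry["name"], entry["class"])
```

A list like `entries` would then be supplied as the `custom_fences` option of `pymdownx.superfences`, in the same position the YAML list occupies in the mkdocs config.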
Ignoring Code Blocks
While the goal is to convert any reStructuredText fenced code block to HTML, there may be cases where the user does not want this to happen. To prevent this, the user can add a flag attribute to the code block they do not wish to convert. The fenced code block will be treated as a normal code block, with whatever other extensions the user has installed.
The attribute flag can look something like this:
Drawbacks
- reStructuredText is a similar language to Markdown, so the additional functionality may not be worth the added complexity to the user
- The reStructuredText parser may behave in ways that make it incompatible with Python-Markdown's parser, or at least make it difficult to integrate
Alternatives Considered
- Write a mkdocs plugin instead of a Python-Markdown Extension: At this point, the fenced code blocks have already been parsed by Python-Markdown. Any mkdocs plugin would have to work with the HTML that has already been generated by reversing the parsing process and then re-parsing with docutils. Separately, the author of mkdocs has also dissuaded developers from writing a mkdocs plugin for this purpose, as plugins were not designed for this use case.15
- Extend a different markup language: There are many other markup languages that were considered. It should be relatively easy to repeat this process for other markup languages, if the need arises. reStructuredText was chosen because it is heavily used in the Python community, and fit my specific use case as well.
- Drop Markdown altogether: AsciiDoc argues that as soon as you need something more powerful than standard Markdown, you get on a slippery slope of needing more and more extensions.16 This is a valid point, and something I may personally consider in the future. Markdoc and I agree, however, that the ubiquity of Markdown makes it the best choice to build on top of, rather than avoid.17