Writing a Simple Markdown Parser Using JavaScript

Mar 8, 2021

Many developers opt to use Markdown, a markup language popular for writing blogs, README files, and many other forms of documentation related to web development. According to its creator, John Gruber, Markdown is a text-to-HTML conversion tool that lets you write in an easy-to-read, easy-to-write plain text format and then convert it to structurally valid XHTML (or HTML). The main philosophy behind Markdown is that plain text documents should stay readable without tags obscuring the content, while still offering ways to add formatting such as lists, bold, and italics. Services such as Facebook chat, Skype, and Reddit all let you use different variants of Markdown to format your messages.

Markdown is ubiquitous on the web primarily due to its ease of use. A quick example is making a word bold: you simply enclose it in double asterisks, so **bold word** is rendered as bold word. There is, of course, an official syntax, but much of the basic functionality involves surrounding words with symbols, such as double asterisks for bold text or a single underscore (_) for italics. Links are written like so: [anchor text](http://www.URL.com), and lists are created by placing a symbol such as -, *, or + in front of each list item. Thus the following:

- List item 1

- List item 2

- List item 3

When rendered, the dash syntax produces a bulleted list:

• List item 1
• List item 2
• List item 3

There are countless other ways to decorate and format text with Markdown, but covering the basic syntax is enough here to show how readable it is compared with other markup languages.

Conversion to HTML and Other Distinctions

Writing text in Markdown is typically fast and readable. Yet some more complicated elements are easier to produce in HTML (tables, certain input forms, etc.). Luckily, Markdown allows raw HTML inside a document, so you can write a complex element such as a table in HTML and return to Markdown in the same file. This makes Markdown a good fit for an email or a README file that needs some of HTML's formatting options but not its whole suite of features. The original Markdown distribution also includes software, written in Perl, that converts the plain text to HTML; in this respect Markdown is a text-to-HTML conversion tool in addition to being a markup language.

Markdown is a plain text format, so as long as plain text remains a standard, documents written in it will stay usable by modern programs. In contrast, a word processor such as Microsoft Word has used a multitude of file types over the years. By remaining plain text, Markdown is unlikely to become outdated. It does have its own filename extension, .md, but it was designed so that the raw file stays perfectly readable as plain text.

Markdown is the de facto standard on popular coding sites such as GitHub, and Markdown-style formatting is also supported by popular communication tools such as Skype, Slack, and (to a lesser extent) Facebook Messenger. Wikipedia uses a related but distinct lightweight markup language, termed wikitext.

Summary Visualization: Markdown and Regular Expressions

Before creating our Markdown parser, it is worthwhile to look at an example of a file written in Markdown, as well as at how regular expressions (regex) are used in JavaScript. First, let's look at a Markdown file:

An example Markdown file

For a handy reference of commonly used Markdown syntax, along with the output each element produces, see this ‘cheatsheet’: Markdown Cheatsheet.

In case the reader is unfamiliar with regular expressions: they are character sequences that describe search patterns, which can be used to find and capture matching text in user input. Virtually all programming languages support regular expressions, JavaScript included, although the exact syntax varies between languages. Here is a quick regular expression example in JavaScript:

Regular Expression example using JavaScript
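As a quick illustration (the pattern and sample strings here are our own, not the article's original example), a regular expression literal can test whether a pattern occurs in a string and extract every match:

```javascript
// A regular expression literal matching one or more digits.
const digits = /\d+/;

// test() returns true if the pattern occurs anywhere in the string.
console.log(digits.test("Order 66 has shipped")); // true
console.log(digits.test("No numbers here"));      // false

// With the g (global) flag, match() returns an array of every match.
const allDigits = /\d+/g;
console.log("3 apples and 12 pears".match(allDigits)); // an array holding "3" and "12"
```

The non-global digits pattern is used with test() on purpose: a global regex used with test() remembers its lastIndex between calls, which can produce surprising results.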

One more regular expression feature we will use is the capture group. A group, written as a pattern inside parentheses, remembers the text it matched so that the match can be referenced later by position, much like an index: in a replacement string passed to .replace(), $1 refers to the first group, $2 to the second, and so on. (The legacy static properties RegExp.$1 through RegExp.$9 expose the same values, but they are deprecated.) The following is an implementation of groups:

Referencing a group by ‘index’ $1
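A small sketch of referencing groups from a replacement string (the sample strings are made up for illustration). The second call previews how the parser will wrap captured text in HTML tags:

```javascript
// Two capture groups: the word before the space is $1, the word after it is $2.
const reordered = "World Hello".replace(/([A-Za-z]+) ([A-Za-z]+)/, "$2 $1");
console.log(reordered); // Hello World

// The same mechanism wraps matched text in HTML tags, which is how the
// parser later turns Markdown into markup.
const wrapped = "make this _stand out_".replace(/_(.*)_/, "<em>$1</em>");
console.log(wrapped); // make this <em>stand out</em>
```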

Building Our Customizable Parser

Now that we have covered some of the essential syntax and features of Markdown, we can build a JavaScript function that accepts a string of text written in Markdown and converts it into usable HTML. To keep the example small, we will limit our parser to converting only three Markdown features: bold text, italicized text, and headers. Once we go through the process of creating these features with JavaScript functions and methods, it should become clear how to add more. Here is our initial function structure:
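A minimal skeleton, assuming a hypothetical function name, parseMarkdown: it takes the Markdown text, will apply conversion rules to it, and returns the HTML. No rules exist yet at this stage, so the text comes back unchanged apart from trimmed whitespace.

```javascript
// Hypothetical name: parseMarkdown accepts Markdown text and returns HTML text.
function parseMarkdown(markdownText) {
  // Conversion rules (headers, bold, italics) will be chained onto this
  // value as calls to .replace(pattern, replacement).
  const htmlText = markdownText;

  // Strip leading and trailing whitespace from the result.
  return htmlText.trim();
}

console.log(parseMarkdown("  # Not converted yet  ")); // # Not converted yet
```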

The main JavaScript method we will use is .replace(), which we will combine with regular expressions to convert the text passed in. Beginning with headers, we can match a Markdown h1 header as follows:

const h1 = /^# (.*$)/gim

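Within this regular expression, the caret (^) anchors the match to the start of a line, so only lines that begin with a hash symbol followed by a space will match, and (.*$) captures everything after that to the end of the line as group $1. ‘g’, ‘i’, and ‘m’ are flags meaning global (replace every match, not just the first), case-insensitive, and multiline (^ and $ match at the start and end of each line rather than only at the start and end of the whole string), respectively. Next we can handle bold text: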

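const bold = /\*\*(.*?)\*\*/gim

The preceding regular expression captures any text surrounded by double asterisks. The ? after .* makes the match lazy, so it stops at the first closing ** rather than the last one on the line; without it, "**a** and **b**" would be turned into a single bold span. We use the same three flags as in the header example. Finally, we may write a regex for italic text:

const italics = /\*(.*?)\*/gim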

In Markdown, italic text can be marked by surrounding it with single asterisks or single underscores, and bold text by surrounding it with double asterisks or double underscores. In both of our patterns we handle only the asterisk forms.
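The asterisk-only patterns can be extended to cover the underscore forms as well. The sketch below is one possible approach, not the article's original code: it swaps in a backreference (\1), which requires the closing delimiter to be the same as the opening one, and uses a hypothetical helper named emphasize. It also shows why the bold rule has to run before the italic rule.

```javascript
// Hypothetical helper: converts bold and italic spans, in either the
// asterisk or the underscore form, to HTML.
function emphasize(text) {
  return text
    // Bold first: (\*\*|__) captures the opening delimiter as group 1, and \1
    // requires the same delimiter to close the span; the text is group 2.
    .replace(/(\*\*|__)(.*?)\1/g, "<b>$2</b>")
    // Italics second, using the same idea with single delimiters.
    .replace(/(\*|_)(.*?)\1/g, "<em>$2</em>");
}

console.log(emphasize("**one** and __two__, *three* and _four_"));
// <b>one</b> and <b>two</b>, <em>three</em> and <em>four</em>

// Running the italic rule first breaks bold text: each "**" is read as an
// empty italic span, so "**one**" becomes "<em></em>one<em></em>".
console.log("**one**".replace(/(\*|_)(.*?)\1/g, "<em>$2</em>"));
```

One limitation of this sketch: underscores inside identifiers such as snake_case_name are also treated as italics. The CommonMark specification has more detailed rules for emphasis inside words.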

Now we can put our code together, accounting for three header levels: h1 through h3. Using .replace(), we replace each instance of a regex pattern with the HTML tags we would like to render. Here is the result, which also uses the group syntax described earlier:
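A sketch that assembles the patterns above into a parseMarkdown function (a hypothetical name). It handles h1 through h3, bold, and italics, and uses the lazy .*? quantifier in the inline rules so that several bold or italic spans on one line stay separate.

```javascript
// Sketch of the complete parser; the function name is hypothetical.
function parseMarkdown(markdownText) {
  const htmlText = markdownText
    // Headers: each pattern requires a space right after its hashes, so the
    // h1 rule never matches "## " or "### " lines. $1 is the header text.
    .replace(/^### (.*$)/gim, "<h3>$1</h3>")
    .replace(/^## (.*$)/gim, "<h2>$1</h2>")
    .replace(/^# (.*$)/gim, "<h1>$1</h1>")
    // Bold must run before italics; otherwise the single-asterisk rule
    // would consume the double asterisks.
    .replace(/\*\*(.*?)\*\*/g, "<b>$1</b>")
    .replace(/\*(.*?)\*/g, "<em>$1</em>");

  return htmlText.trim();
}

console.log(parseMarkdown("## Shopping list\nBuy **milk** and *bread*"));
// <h2>Shopping list</h2>
// Buy <b>milk</b> and <em>bread</em>
```

The inline rules here drop the i and m flags: they have no effect on patterns that contain no letters and no ^ or $ anchors.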

Adding Features and Testing

As touched on above, the Markdown parser we created converts just three commonly used features into a total of five possible HTML tags (<h1>, <h2>, <h3>, <b>, and <em>), but we could extend it rather easily by adding more .replace() calls for additional features. A more ‘advanced’ parser might handle blockquotes, hyperlinks, images, line breaks, and ideally any element HTML provides. Testing a Markdown parser is also straightforward: create a sample string of Markdown, pass it to the parser, and print the result with console.log() to inspect the generated HTML.
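One way to make adding features easier, sketched here as an alternative to the chained calls rather than as the article's code, is to keep the rules in an array of [pattern, replacement] pairs and apply them in order with reduce(). Adding a feature then means adding one entry; the blockquote and link rules below are illustrative additions. The sample string and console.log() call at the end follow the testing approach just described.

```javascript
// Hypothetical rule table: each entry is [pattern, replacement], applied in order.
const rules = [
  [/^### (.*$)/gim, "<h3>$1</h3>"],
  [/^## (.*$)/gim, "<h2>$1</h2>"],
  [/^# (.*$)/gim, "<h1>$1</h1>"],
  [/^> (.*$)/gim, "<blockquote>$1</blockquote>"], // added feature: blockquotes
  [/\*\*(.*?)\*\*/g, "<b>$1</b>"],               // bold before italics
  [/\*(.*?)\*/g, "<em>$1</em>"],
  // Added feature: links written as [anchor text](URL).
  [/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>'],
];

function parseMarkdown(markdownText) {
  return rules
    .reduce((html, [pattern, replacement]) => html.replace(pattern, replacement), markdownText)
    .trim();
}

// A quick test: print the HTML generated for a sample Markdown string.
const sample = `
# A Header

**This whole line is bold**

Visit [Daring Fireball](https://daringfireball.net/projects/markdown/)
`;

console.log(parseMarkdown(sample));
```

Reusing the global regexes across calls is safe here because .replace() resets a global pattern's lastIndex before searching. Plain lines are not wrapped in <p> tags, and nothing escapes HTML special characters, so this sketch is only suitable for trusted input.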

Such a test should show the Markdown message converted into an <h1> element and a full line wrapped in <b> or <em> tags, depending on how the parser is written.

I hope that this short introduction to Markdown and the accompanying customizable parser example has been edifying and may be implemented with minimal difficulty. Thank you for reading!

Sources:

Markdown — by John Gruber: https://daringfireball.net/projects/markdown/
Groups and Ranges — Mozilla Developer Network (MDN): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges