Reference for MarkdownParser

The MarkdownParser processes markdown-formatted strings or files into a MarkdownDoc object that can be fed to a chunker. It ensures that the markdown formatting is what the chunker expects (ATX heading style, parsed metadata, etc.).

Bases: AbstractParser

convert_setext_to_atx(md_string) staticmethod

Converts headers from setext style to ATX style.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| md_string | str | the markdown string | required |

Returns:

str: the string with formatted headers
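
To illustrate what a setext-to-ATX conversion involves, here is a minimal sketch. It is not the library's implementation: the function name `setext_to_atx` and the regular expressions are assumptions made for illustration. It also assumes that any `---` metadata block has already been stripped, since a metadata line followed by `---` would otherwise look like a level-2 setext heading.

```python
import re

# Setext level-1 heading: a non-empty text line followed by a line of '='.
# Setext level-2 heading: a non-empty text line followed by a line of '-'.
# Both patterns are illustrative assumptions, not the library's own.
_SETEXT_H1 = re.compile(r"^(?P<title>[^\n]*\S[^\n]*)\n=+[ \t]*$", re.MULTILINE)
_SETEXT_H2 = re.compile(r"^(?P<title>[^\n]*\S[^\n]*)\n-+[ \t]*$", re.MULTILINE)


def setext_to_atx(md_string: str) -> str:
    """Rewrite setext headings as ATX headings ('# ...' and '## ...')."""
    md_string = _SETEXT_H1.sub(lambda m: "# " + m.group("title").strip(), md_string)
    md_string = _SETEXT_H2.sub(lambda m: "## " + m.group("title").strip(), md_string)
    return md_string


example = "Title\n=====\n\nSome text\n\nSection\n-------\nBody\n"
print(setext_to_atx(example))
```

The sketch ignores edge cases such as headings inside fenced code blocks, which a production converter would need to skip.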

parse_file(filepath)

Reads and parses a markdown file. Ensures that the formatting is suited to be passed to the MarkdownChunker.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filepath | FilePath | the path to a .md file | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| TypedString | MarkdownDoc | the typed string |

parse_metadata(md_string) staticmethod

Parses the metadata of a markdown string. Assumes the metadata is in YAML format, with '---' as the first line. Example:

---
metakey : metavalue
---

Content of document...

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| md_string | str | the string to get the metadata from |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| str | str | the content of the document, with the metadata section removed |
| dict[str, Any] | dict[str, Any] | the parsed metadata, as a dict |
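
The following sketch shows one way such a metadata split could work. It is only an illustration under stated assumptions: the function name `split_front_matter` is hypothetical, and instead of a real YAML parser it handles only flat `key : value` lines, so nested or typed YAML values are out of scope here.

```python
from typing import Any


def split_front_matter(md_string: str) -> tuple[str, dict[str, Any]]:
    """Return (content, metadata) for a string that may start with a '---' block.

    Assumption: metadata lines are flat 'key : value' pairs. A real
    implementation would hand the block to a YAML parser instead.
    """
    lines = md_string.splitlines()
    if not lines or lines[0].strip() != "---":
        return md_string, {}  # no metadata block on the first line

    # Locate the closing '---'; without one, treat the input as plain content.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return md_string, {}

    metadata: dict[str, Any] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()

    # Drop the metadata block and the blank lines that follow it.
    content = "\n".join(lines[end + 1:]).lstrip("\n")
    return content, metadata


content, metadata = split_front_matter(
    "---\nmetakey : metavalue\n---\n\nContent of document...\n"
)
print(metadata)
print(content)
```

The return order mirrors the documented one: the content string first, then the metadata dict.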

parse_string(string)

Parses a markdown-formatted string. Ensures that the formatting is suited to be passed to the MarkdownChunker.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| string | str | the markdown formatted string | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| TypedString | MarkdownDoc | the formatted markdown string |

read_file(filepath) staticmethod

Reads a markdown file and returns its content as a string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filepath | str | the path to the markdown file | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| str | str | the markdown string |
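
As a rough picture of what reading a markdown file into a string amounts to, here is a standard-library sketch. The function name `read_markdown` and the choice of UTF-8 encoding are assumptions. The reference above does not state which encoding the parser uses.

```python
from pathlib import Path


def read_markdown(filepath: str) -> str:
    """Read a markdown file and return its content as a string.

    Assumption: the file is UTF-8 encoded.
    """
    return Path(filepath).read_text(encoding="utf-8")
```

The returned string could then be passed to a string-based parsing step such as parse_string.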