Reference for MarkdownParser

The MarkdownParser converts markdown-formatted strings or files into a MarkdownDoc object that can be fed to a chunker. It ensures that the markdown formatting matches what the chunker expects (ATX heading style, parsed metadata, etc.).

Bases: AbstractParser[str]

cleanup_string(md_string) staticmethod

Cleans up the markdown string.

Parameters:

Name       Type  Description          Default
md_string  str   the markdown string  required

Returns:

Name  Type  Description
str   str   the cleaned-up string

convert_setext_to_atx(md_string) staticmethod

Converts headers from Setext style to ATX style.

Parameters:

Name       Type  Description          Default
md_string  str   the markdown string  required

Returns:

Name  Type  Description
str   str   the string with formatted headers
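
As an illustration of what a Setext-to-ATX conversion involves, here is a minimal sketch based on a regular expression. It is not the library's implementation: the function name `setext_to_atx_sketch` is hypothetical, and the sketch ignores edge cases such as underlines inside code blocks. It also assumes any `---` metadata block has already been removed, since a line followed by `---` would otherwise be read as a level-2 heading.

```python
import re

# A Setext heading is a non-empty text line followed by an underline made
# only of '=' (level 1) or only of '-' (level 2).
_SETEXT_RE = re.compile(
    r"^(?P<title>[^\n]*\S[^\n]*)\n(?P<underline>=+|-+)[ \t]*$",
    re.MULTILINE,
)


def setext_to_atx_sketch(md_string: str) -> str:
    """Rewrite Setext headings as ATX headings (hypothetical helper)."""

    def _replace(match: re.Match) -> str:
        # '=' underlines map to '#', '-' underlines map to '##'.
        level = 1 if match.group("underline").startswith("=") else 2
        return f"{'#' * level} {match.group('title').strip()}"

    return _SETEXT_RE.sub(_replace, md_string)
```

For example, the two lines `Title` and `=====` would become the single line `# Title`, while text that is not followed by an underline is left untouched.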

parse_file(filepath)

Reads and parses a markdown file. Ensures that the formatting is suited to be passed to the MarkdownChunker.

Parameters:

Name      Type  Description             Default
filepath  str   the path to a .md file  required

Returns:

Name         Type         Description
MarkdownDoc  MarkdownDoc  the parsed markdown document

parse_metadata(md_string) staticmethod

Parses the metadata of a markdown string. Assumes the metadata is in YAML format and that the first line of the string is '---'. Example:

---
metakey : metavalue
---

Content of document...

Parameters:

Name       Type  Description                          Default
md_string  str   the string to get the metadata from  required

Returns:

Name  Type            Description
str   str             the content of the document, with the metadata section removed
      dict[str, Any]  the parsed metadata, as a dict
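
To make the expected input and output concrete, the following sketch splits a leading `---` block from the rest of the document. It is only an approximation under stated assumptions: the name `split_front_matter_sketch` is hypothetical, the return order (content first, then metadata) is inferred from the Returns table above, and the block is parsed as flat `key: value` lines rather than with a real YAML parser.

```python
from typing import Any


def split_front_matter_sketch(md_string: str) -> tuple[str, dict[str, Any]]:
    """Split a leading '---' metadata block from the content (hypothetical helper)."""
    lines = md_string.splitlines()
    # The metadata block must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return md_string, {}

    # Locate the closing '---'; without one, treat the input as plain content.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return md_string, {}

    metadata: dict[str, Any] = {}
    for line in lines[1:end]:
        # Simplification: only flat 'key: value' pairs, values kept as strings.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    content = "\n".join(lines[end + 1:]).lstrip("\n")
    return content, metadata
```

Applied to the example above, this sketch would yield the content `Content of document...` and the metadata `{"metakey": "metavalue"}`. Nested values, lists, or typed scalars would need an actual YAML parser.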

parse_string(string)

Parses a markdown-formatted string. Ensures that the formatting is suited to be passed to the MarkdownChunker.

Parameters:

Name    Type  Description                    Default
string  str   the markdown formatted string  required

Returns:

Name         Type         Description
MarkdownDoc  MarkdownDoc  the formatted markdown document

read_file(filepath) staticmethod

Reads a markdown file.

Parameters:

Name      Type  Description                    Default
filepath  str   the path to the markdown file  required

Returns:

Name  Type  Description
str   str   the markdown string
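
For reference, reading a markdown file into a string can be sketched with the standard library alone. The helper name `read_markdown_sketch` is hypothetical, and UTF-8 is an assumption; the encoding the library actually uses is not documented here.

```python
from pathlib import Path


def read_markdown_sketch(filepath: str) -> str:
    """Return the text content of a markdown file (hypothetical helper)."""
    # Assumption: the file is UTF-8 encoded.
    return Path(filepath).read_text(encoding="utf-8")
```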