Reference for MarkdownParser
The MarkdownParser processes markdown-formatted strings or files into a MarkdownDoc object that can be fed to a chunker. It ensures that the markdown formatting matches what the chunker expects (ATX heading style, parsed metadata, etc.).
Bases: AbstractParser
convert_setext_to_atx(md_string)
staticmethod
Converts headings from setext style (text underlined with = or -) to ATX style (text prefixed with #).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
md_string | str | the markdown string | required |
Returns:
str: the string with ATX-formatted headings
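To make the setext-to-ATX conversion concrete, here is a minimal standalone sketch. It is not the library's implementation: the function name `setext_to_atx_sketch` and the regular expressions are assumptions chosen for illustration, and they only cover the simple case of a single-line heading followed by an underline of `=` or `-`.

```python
import re


def setext_to_atx_sketch(md_string: str) -> str:
    """Hypothetical helper: rewrite setext headings as ATX headings."""
    # Level 1: a text line followed by a line made only of '=' characters.
    md_string = re.sub(
        r"^(\S.*)\n=+[ \t]*$", r"# \1", md_string, flags=re.MULTILINE
    )
    # Level 2: a text line followed by a line made only of '-' characters.
    md_string = re.sub(
        r"^(\S.*)\n-+[ \t]*$", r"## \1", md_string, flags=re.MULTILINE
    )
    return md_string


converted = setext_to_atx_sketch("Title\n=====\n\nSub\n---\n\ntext")
print(converted)
```

One caveat with this sketch: a YAML metadata block closed by `---` looks like a level-2 setext heading to these patterns, so in a pipeline like this one the metadata would presumably need to be stripped before the headings are converted.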
parse_file(filepath)
Reads and parses a markdown file. Ensures that the formatting is suited to be passed to the MarkdownChunker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | FilePath | the path to a .md file | required |
Returns:
Name | Type | Description |
---|---|---|
TypedString | MarkdownDoc | the typed string |
parse_metadata(md_string)
staticmethod
Parses the metadata of a markdown string. Assumes the metadata is in YAML format, with '---' as the first line. Example:
---
metakey : metavalue
---
Content of document...
Returns:
Name | Type | Description |
---|---|---|
str | str | the content of the document, with the metadata section removed |
dict[str, Any] | dict[str, Any] | the parsed metadata, as a dict |
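The metadata split described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `split_front_matter_sketch` is a hypothetical name, and instead of a real YAML parser it handles only flat `key : value` lines and keeps every value as a string.

```python
from typing import Any, Dict, Tuple


def split_front_matter_sketch(md_string: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: separate a leading '---' block from the content."""
    lines = md_string.split("\n")
    # The metadata block must open on the very first line.
    if lines[0].strip() != "---":
        return md_string, {}
    # Look for the closing '---'; without one, treat the text as plain content.
    for idx in range(1, len(lines)):
        if lines[idx].strip() == "---":
            break
    else:
        return md_string, {}
    metadata: Dict[str, Any] = {}
    for line in lines[1:idx]:
        # Simplification: flat "key : value" pairs only, values kept as strings.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    content = "\n".join(lines[idx + 1:])
    return content, metadata


content, metadata = split_front_matter_sketch(
    "---\nmetakey : metavalue\n---\nContent of document..."
)
print(content)
print(metadata)
```

A full YAML parser would also handle nested structures and typed values (numbers, lists, booleans), which this sketch deliberately leaves out.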
parse_string(string)
Parses a markdown-formatted string. Ensures that the formatting is suited to be passed to the MarkdownChunker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string | str | the markdown-formatted string | required |
Returns:
Name | Type | Description |
---|---|---|
TypedString | MarkdownDoc | the formatted markdown string |
read_file(filepath)
staticmethod
Reads a Markdown file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | the path to the markdown file | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | the markdown string |