Reference for DocxParser

Bases: HTMLParser

parse_file(filepath)

Reads and parses the file at the given path. Ensures that the formatting is suitable for passing to the MarkdownChunker.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filepath | FilePath | the path to a .html file | required |

Returns:

| Name | Type | Description |
|---|---|---|
| MarkdownDoc | MarkdownDoc | the parsed document. Can be fed to the chunker. |
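As a rough illustration of how a file-level entry point can relate to a string-level one, the sketch below uses a stand-in class. `SketchParser` and `SketchDoc` are hypothetical names invented for this example, and the delegation from `parse_file` to `parse_string` is an assumption, not the documented behavior of DocxParser. UTF-8 encoding is also assumed.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchDoc:
    """Hypothetical stand-in for MarkdownDoc: it only holds the text."""
    content: str


class SketchParser:
    """Hypothetical stand-in parser, not the real DocxParser."""

    def parse_string(self, string: str) -> SketchDoc:
        # A real parser would convert the markup here; the sketch only strips whitespace.
        return SketchDoc(content=string.strip())

    def parse_file(self, filepath: str) -> SketchDoc:
        # Assumption: file parsing reads the file, then reuses parse_string.
        text = Path(filepath).read_text(encoding="utf-8")
        return self.parse_string(text)


def demo() -> SketchDoc:
    # Write a small file to a temporary directory and parse it.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.html"
        path.write_text("<h1>Title</h1>\n", encoding="utf-8")
        return SketchParser().parse_file(str(path))
```

Under this assumed design, the conversion logic lives in one place, so a file and an in-memory string with the same content yield the same document.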

parse_string(string)

Parses an HTML-formatted string. Ensures that the formatting is suitable for passing to the MarkdownChunker.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| string | str | the HTML-formatted string | required |

Returns:

| Name | Type | Description |
|---|---|---|
| MarkdownDoc | MarkdownDoc | the parsed document. Can be fed to the chunker. |
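To give a feel for what turning an HTML string into chunker-friendly Markdown can involve, here is a minimal sketch that maps headings and paragraphs to Markdown lines. It relies on Python's standard-library `html.parser.HTMLParser`, which is used only for illustration and is not assumed to be the HTMLParser base class named on this page. `HeadingToMarkdown` and `html_to_markdown` are hypothetical names, and the actual conversion performed by `parse_string` may differ considerably.

```python
from html.parser import HTMLParser


class HeadingToMarkdown(HTMLParser):
    """Hypothetical converter: turns <h1>..<h6> and <p> into Markdown lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = None  # Markdown prefix of the element being read, or None

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            # <h2> becomes "## ", <h3> becomes "### ", and so on.
            self._prefix = "#" * int(tag[1]) + " "
        elif tag == "p":
            self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        # Text outside headings and paragraphs is ignored in this sketch.
        if text and self._prefix is not None:
            self.lines.append(self._prefix + text)

    def handle_endtag(self, tag):
        self._prefix = None


def html_to_markdown(string):
    converter = HeadingToMarkdown()
    converter.feed(string)
    converter.close()
    # Separate blocks with a blank line, as Markdown expects.
    return "\n\n".join(converter.lines)
```

The sketch ignores nested inline tags, lists, and tables; it is meant to show the general shape of the transformation only.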

read_file(filepath) staticmethod

Reads an HTML file and returns its content as a string.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | the path to the HTML file. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | the HTML string. |
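Because `read_file` is marked as a static method, it can be called on the class itself without creating a parser instance. The sketch below shows that calling pattern on a stand-in class. `SketchParser` is a hypothetical name, and the UTF-8 encoding is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path


class SketchParser:
    """Hypothetical stand-in, not the real DocxParser."""

    @staticmethod
    def read_file(filepath: str) -> str:
        # Assumption: the file is UTF-8 encoded text.
        return Path(filepath).read_text(encoding="utf-8")


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "page.html"
        path.write_text("<p>Hello</p>", encoding="utf-8")
        # No instance is needed to call a static method.
        return SketchParser.read_file(str(path))
```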