Reference for HTMLParser

Bases: AbstractParser

apply_markdownify(html_string) staticmethod

Applies markdownify to the HTML text, converting it to Markdown.

Parameters:

Name Type Description Default
html_string str

an HTML-formatted string.

required

Returns:

Name Type Description
str str

the markdownified string.

cleanup_string(md_string) staticmethod

Cleans up the markdown string.

Parameters:

Name Type Description Default
md_string str

the markdown string, output from apply_markdownify()

required

Returns:

Name Type Description
str str

the cleaned-up string.
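
The reference does not say which clean-up rules are applied. As a rough sketch only, assuming clean-up means normalising line endings, dropping trailing whitespace, and collapsing runs of blank lines (all three are assumptions, and `cleanup_markdown` is a hypothetical stand-in, not the library's implementation):

```python
import re


def cleanup_markdown(md_string: str) -> str:
    """Hypothetical stand-in for cleanup_string(); the real rules may differ."""
    # Normalise Windows line endings to "\n".
    text = md_string.replace("\r\n", "\n")
    # Drop trailing spaces and tabs at the end of every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse three or more consecutive newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Return the text with exactly one trailing newline.
    return text.strip() + "\n"


if __name__ == "__main__":
    print(cleanup_markdown("# Title  \n\n\n\nBody text\t\n\n"))
```

Note that stripping trailing spaces also removes Markdown hard line breaks (two trailing spaces), so a real clean-up step may choose to keep them.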

parse_file(filepath)

Reads and parses an HTML file. Ensures that the formatting is suited to be passed to the MarkdownChunker.

Parameters:

Name Type Description Default
filepath FilePath

the path to a .html file

required

Returns:

Name Type Description
MarkdownDoc MarkdownDoc

the parsed document, which can be passed to the chunker.
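
The reference does not spell out how parse_file() combines the static methods. One plausible reading, sketched below, is read, then convert, then clean up. The order is an assumption; `MarkdownDocSketch` and `parse_html_file` are hypothetical names, and the converter and clean-up functions are passed in as arguments so the sketch does not depend on the real class.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class MarkdownDocSketch:
    """Hypothetical stand-in for MarkdownDoc; the real class is not shown here."""
    text: str


def parse_html_file(
    filepath: str,
    to_markdown: Callable[[str], str],
    cleanup: Callable[[str], str],
) -> MarkdownDocSketch:
    # Step 1: read the raw HTML (the role of read_file()).
    html_string = Path(filepath).read_text(encoding="utf-8")
    # Step 2: convert HTML to Markdown (the role of apply_markdownify()).
    md_string = to_markdown(html_string)
    # Step 3: tidy the Markdown (the role of cleanup_string()).
    return MarkdownDocSketch(text=cleanup(md_string))
```

In practice `to_markdown` could be the `markdownify` function from the markdownify package, and the resulting document would then go to the chunker.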

parse_string(string)

Parses an HTML-formatted string. Ensures that the formatting is suited to be passed to the MarkdownChunker.

Parameters:

Name Type Description Default
string str

the HTML-formatted string

required

Returns:

Name Type Description
MarkdownDoc MarkdownDoc

the parsed document, which can be passed to the chunker.

read_file(filepath) staticmethod

Reads an HTML file.

Parameters:

Name Type Description Default
filepath str

the path to the HTML file.

required

Returns:

Name Type Description
str str

the HTML string.
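
A minimal sketch of what such a reader might look like, assuming the file is UTF-8 encoded (the reference does not state the encoding, and `read_html` is a hypothetical stand-in for read_file()):

```python
from pathlib import Path


def read_html(filepath: str, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in for read_file(): return the file contents as text."""
    # The encoding default is an assumption; the reference does not specify one.
    return Path(filepath).read_text(encoding=encoding)
```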