Reference for HTMLParser
Bases: AbstractParser
apply_markdownify(html_string)
staticmethod
Converts an HTML string to Markdown by applying markdownify.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
html_string | str | An HTML-formatted string. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The markdownified string. |
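
To illustrate the kind of HTML-to-Markdown conversion described above, here is a minimal sketch built only on the Python standard library. It is a stand-in, not the markdownify library and not this method's implementation: `TinyMarkdownConverter` and `to_markdown_sketch` are hypothetical names, and only headings, paragraphs, bold and italic tags are handled.

```python
from html.parser import HTMLParser as StdlibHTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TinyMarkdownConverter(StdlibHTMLParser):
    """Hypothetical stand-in for markdownify; supports very few tags."""

    def __init__(self):
        super().__init__()
        self.parts = []  # Markdown fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # <h2> becomes "## ", <h3> becomes "### ", and so on.
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag in {"strong", "b"}:
            self.parts.append("**")
        elif tag in {"em", "i"}:
            self.parts.append("*")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            # Block-level elements end with a blank line.
            self.parts.append("\n\n")
        elif tag in {"strong", "b"}:
            self.parts.append("**")
        elif tag in {"em", "i"}:
            self.parts.append("*")

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown_sketch(html_string):
    converter = TinyMarkdownConverter()
    converter.feed(html_string)
    converter.close()
    return "".join(converter.parts)


markdown = to_markdown_sketch("<h1>Title</h1><p>Some <strong>bold</strong> text.</p>")
```

The trailing blank lines left after each block element are the sort of residue a later cleanup pass would typically remove.
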
cleanup_string(md_string)
staticmethod
Cleans up the Markdown string produced by apply_markdownify().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
md_string | str | The Markdown string, output from apply_markdownify(). | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The cleaned-up string. |
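
This reference does not say which cleanup rules are applied. As a rough illustration only, the sketch below assumes three common ones: stripping trailing whitespace, collapsing runs of blank lines, and trimming the ends of the string. The function name `cleanup_markdown_sketch` is hypothetical.

```python
import re


def cleanup_markdown_sketch(md_string: str) -> str:
    """Hypothetical cleanup pass; the library's actual rules may differ."""
    # Strip trailing spaces and tabs from every line.
    lines = [line.rstrip() for line in md_string.splitlines()]
    text = "\n".join(lines)
    # Collapse three or more consecutive newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop leading and trailing blank lines.
    return text.strip()


raw = "# Title   \n\n\n\nSome text.\t\n\n"
cleaned = cleanup_markdown_sketch(raw)
```
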
parse_file(filepath)
Reads an HTML file and parses its content. Ensures that the formatting is suitable for the MarkdownChunker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | FilePath | The path to a .html file. | required |
Returns:
Name | Type | Description |
---|---|---|
MarkdownDoc | MarkdownDoc | The parsed document. Can be fed to the chunker. |
parse_string(string)
Parses an HTML-formatted string. Ensures that the formatting is suitable for the MarkdownChunker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string | str | The HTML-formatted string. | required |
Returns:
Name | Type | Description |
---|---|---|
MarkdownDoc | MarkdownDoc | The parsed document. Can be fed to the chunker. |
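
The method descriptions suggest that parsing chains a conversion step and a cleanup step before wrapping the result in a document object. The sketch below shows that ordering under stated assumptions: `MarkdownDocSketch`, `parse_string_sketch` and `toy_convert` are hypothetical stand-ins, and the real MarkdownDoc class and internal call order are not described in this reference.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MarkdownDocSketch:
    """Hypothetical stand-in for MarkdownDoc; holds only the final text."""
    content: str


def parse_string_sketch(
    html_string: str,
    convert: Callable[[str], str],
    cleanup: Callable[[str], str],
) -> MarkdownDocSketch:
    # Step 1: HTML to Markdown (the role played by apply_markdownify).
    markdown = convert(html_string)
    # Step 2: tidy the Markdown (the role played by cleanup_string).
    return MarkdownDocSketch(content=cleanup(markdown))


def toy_convert(html_string: str) -> str:
    # Toy converter for the demo: handles <p> elements only.
    return html_string.replace("<p>", "").replace("</p>", "\n\n")


doc = parse_string_sketch("<p>Hello</p><p>World</p>", toy_convert, str.strip)
```

Passing the two steps in as callables keeps the sketch independent of any particular converter; a file-based variant would simply read the file first and hand the resulting string to the same function.
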
read_file(filepath)
staticmethod
Reads an HTML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | The path to the HTML file. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The HTML string. |
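
A file reader of this shape can be sketched with pathlib. The extension check and the UTF-8 encoding below are assumptions, not documented behavior, and `read_html_sketch` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def read_html_sketch(filepath: str) -> str:
    """Hypothetical stand-in for read_file(); returns the file content."""
    path = Path(filepath)
    # Assumption: only .html and .htm files are accepted.
    if path.suffix.lower() not in {".html", ".htm"}:
        raise ValueError(f"Expected an HTML file, got: {path.name}")
    # Assumption: the file is UTF-8 encoded.
    return path.read_text(encoding="utf-8")


# Usage: write a small HTML file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.html"
    sample_path.write_text("<h1>Title</h1>", encoding="utf-8")
    html_string = read_html_sketch(str(sample_path))
```
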