Reference for MarkdownDoc

The MarkdownDoc is the entity returned by chunknorris's parsers. Its main purpose is to be fed to the MarkdownChunker.

Bases: BaseModel

A parsed Markdown-formatted string, resulting in a list of MarkdownLine. Features:

  • ATX header formatting.
  • Removal of base64 images.

JSON schema:
{
  "$defs": {
    "MarkdownLine": {
      "properties": {
        "text": {
          "description": "the text content of the line",
          "title": "Text",
          "type": "string"
        },
        "line_idx": {
          "description": "the index of the line in the markdown string",
          "title": "Line Idx",
          "type": "integer"
        },
        "isin_code_block": {
          "description": "whether or not the line belongs to a code block",
          "title": "Isin Code Block",
          "type": "boolean"
        },
        "page": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "description": "the page the line belongs to (if markdown comes from converted paginated document)",
          "title": "Page"
        }
      },
      "required": [
        "text",
        "line_idx",
        "isin_code_block",
        "page"
      ],
      "title": "MarkdownLine",
      "type": "object"
    }
  },
  "description": "A parsed Markdown Formatted-String,\nresulting in a list of MarkdownLine.\nFeats :\n- ATX header formatting.\n- Remove base64 images",
  "properties": {
    "content": {
      "items": {
        "$ref": "#/$defs/MarkdownLine"
      },
      "title": "Content",
      "type": "array"
    },
    "metadata": {
      "additionalProperties": true,
      "default": {},
      "title": "Metadata",
      "type": "object"
    }
  },
  "required": [
    "content"
  ],
  "title": "MarkdownDoc",
  "type": "object"
}
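The shape described by the schema can be illustrated with a small stand-in built from standard-library dataclasses. This is only a sketch for reading the schema: LineSketch and DocSketch are hypothetical names, not chunknorris classes, and the real MarkdownDoc is a pydantic BaseModel with validation that this sketch does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LineSketch:
    """Hypothetical stand-in for MarkdownLine (the four required fields)."""

    text: str               # the text content of the line
    line_idx: int           # index of the line in the markdown string
    isin_code_block: bool   # whether the line belongs to a code block
    page: Optional[int]     # required, but may be None for unpaginated sources


@dataclass
class DocSketch:
    """Hypothetical stand-in for MarkdownDoc: required content, optional metadata."""

    content: List[LineSketch]
    metadata: Dict[str, Any] = field(default_factory=dict)  # schema default is {}


doc = DocSketch(
    content=[
        LineSketch(text="# Title", line_idx=0, isin_code_block=False, page=None),
        LineSketch(text="Some text.", line_idx=1, isin_code_block=False, page=None),
    ]
)
```

Note that page is listed under "required" in the schema even though it accepts null, so it has to be passed explicitly in this sketch as well.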

Config:

  • arbitrary_types_allowed: True

Fields:

  • content (list of MarkdownLine)
  • metadata (dict, defaults to {})

from_string(md_string) staticmethod

Get the MarkdownDoc object from a markdown-formatted string.

Parameters:

Name       Type  Description          Default
md_string  str   the markdown string  required

Returns:

Name         Type         Description
MarkdownDoc  MarkdownDoc  the markdown document
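To make the per-line fields more concrete, here is a rough sketch of how a markdown string could be split into lines while tracking code-block membership and dropping base64 images. It is not chunknorris's implementation: split_lines_sketch, BASE64_IMAGE and the fence-handling rule are assumptions made for illustration, and ATX header conversion is left out.

```python
import re
from typing import List, Tuple

# Assumed pattern: inline images whose target is a base64 data URI.
BASE64_IMAGE = re.compile(r"!\[[^\]]*\]\(data:image/[^)]*;base64,[^)]*\)")

# Three backticks, built at runtime so this example does not contain a literal fence.
FENCE = "`" * 3


def split_lines_sketch(md_string: str) -> List[Tuple[int, str, bool]]:
    """Return (line_idx, text, isin_code_block) for each line of md_string."""
    cleaned = BASE64_IMAGE.sub("", md_string)
    rows = []
    in_code_block = False
    for idx, text in enumerate(cleaned.split("\n")):
        is_fence = text.lstrip().startswith(FENCE)
        # Assumption: the fence lines themselves count as part of the code block.
        rows.append((idx, text, in_code_block or is_fence))
        if is_fence:
            in_code_block = not in_code_block
    return rows


sample = "\n".join(
    [
        "# Title",
        "![logo](data:image/png;base64,AAAA)",
        FENCE + "python",
        "x = 1",
        FENCE,
        "Done.",
    ]
)
rows = split_lines_sketch(sample)
```

In this sketch the image line is kept as an empty line so that line indices stay aligned with the input; whether the library keeps or drops such lines is not stated in this reference.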

to_string()

Get the markdown string corresponding to the document's content.
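Conceptually, this is the inverse of splitting a string into lines. The sketch below assumes the line texts are simply joined with newline characters; to_string_sketch is a hypothetical helper and the library's actual joining rule may differ.

```python
from typing import List


def to_string_sketch(line_texts: List[str]) -> str:
    """Join line texts back into a single markdown string (assumed newline join)."""
    return "\n".join(line_texts)


original = "# Title\n\nSome text."
line_texts = original.split("\n")
restored = to_string_sketch(line_texts)
```

With the real class, a round trip through from_string and to_string is not guaranteed to return the input unchanged, since parsing reformats headers to ATX style and removes base64 images.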