Reference for `Chunk`

The `Chunk` is the object returned by chunknorris's chunkers. It holds the elements that describe a chunk: its text content, its headers, the pages it comes from (for paginated documents), etc. In most cases you only need `Chunk.get_text()`, which returns the chunk's cleaned content as text, preceded by its headers.

Bases: BaseModel

`word_count: int` `property`

Gets the number of words in the chunk's content (headers not included).
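As a rough illustration of what `word_count` measures, the sketch below counts whitespace-separated tokens in a chunk's content string. The helper `count_words` is hypothetical, and treating any run of non-whitespace characters as a word is an assumption; the library may count words differently.

```python
def count_words(content: str) -> int:
    """Count whitespace-separated tokens (hypothetical stand-in for Chunk.word_count)."""
    # Assumption: a "word" is any run of non-whitespace characters.
    # Only the content is passed in, so headers are never counted.
    return len(content.split())


content = "Install the package with pip.\nThen import it."
print(count_words(content))  # 8
```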

`get_text(remove_links=False, prepend_headers=True)`

Gets the text of the chunk.

Parameters:

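| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `remove_links` | `bool` | If True, markdown links are removed (the link text is kept). Defaults to False. | `False` |
| `prepend_headers` | `bool` | If True, the chunk's headers are prepended to its content. Defaults to True. | `True` |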

Returns:

| Type | Description |
| --- | --- |
| `str` | The text of the chunk. |
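To show how the two flags interact, here is a minimal sketch that builds a chunk's text from a list of header strings and a content string. The function `render_chunk_text`, the blank-line separator between headers and content, and the link pattern are all assumptions for illustration, not the library's implementation; the pattern only handles inline `[label](target)` links.

```python
import re

# Hypothetical pattern for inline markdown links: [label](target).
_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def render_chunk_text(
    headers: list[str],
    content: str,
    remove_links: bool = False,
    prepend_headers: bool = True,
) -> str:
    """Hypothetical stand-in for Chunk.get_text(); not the library's code."""
    # Optionally put the headers before the content.
    parts = list(headers) if prepend_headers else []
    parts.append(content)
    # Assumption: headers and content are separated by blank lines.
    text = "\n\n".join(parts)
    if remove_links:
        # Keep only the label of each markdown link.
        text = _LINK_RE.sub(r"\1", text)
    return text


headers = ["# Guide", "## Install"]
content = "See the [docs](https://example.com) for details."
print(render_chunk_text(headers, content))
print(render_chunk_text(headers, content, remove_links=True, prepend_headers=False))
# See the docs for details.
```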

`remove_links(text)` `staticmethod`

Removes the markdown formatting of the links in the text (the link text is kept).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | The text in which to find the links. | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | The formatted text. |
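Because `remove_links` is a static method, it can be called on the class without building a chunk. The sketch below mirrors that shape with a hypothetical `LinkStripper` class and a regular expression for inline `[label](target)` links; the regex is an assumption, and the real method may handle other link forms (reference-style links, images) differently.

```python
import re


class LinkStripper:
    """Hypothetical helper mirroring the documented behaviour of Chunk.remove_links."""

    # Assumption: only inline links of the form [label](target) are handled.
    _LINK_PATTERN = re.compile(r"\[([^\]]*)\]\([^)]*\)")

    @staticmethod
    def remove_links(text: str) -> str:
        # Replace each [label](target) with its label, keeping surrounding text.
        return LinkStripper._LINK_PATTERN.sub(r"\1", text)


sample = "Read the [guide](https://example.com/guide) and the [FAQ](faq.md)."
print(LinkStripper.remove_links(sample))  # Read the guide and the FAQ.
```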
