Welcome to ChunkNorris' documentation!

What is `chunknorris`?

In a nutshell, `chunknorris` is a Python package that aims to drastically improve the chunking of documents from various sources (HTML, PDF, Markdown, ...) while keeping the use of computational resources to a minimum.

🧪 Try it out!

Why should I use it?

In the context of Retrieval Augmented Generation (RAG), an optimized chunking strategy leads to:

Better chunk relevance, and thus easier identification of useful chunks thanks to more expressive embeddings.
Fewer hallucinations from generation models, since less superfluous information ends up in the prompt.
Fewer errors caused by chunks exceeding the API's token limits.
Reduced cost, as prompts can be shorter.

As of today, many packages exist for parsing documents. However, the vast majority of them:

have high computational requirements
do not provide chunks out of the box; instead, they only parse the documents, leaving the user to build the chunking implementation on top.
