
Welcome to ChunkNorris' documentation!

What is chunknorris?

In a nutshell, chunknorris is a Python package that aims to drastically improve the chunking of documents from various sources (HTML, PDF, Markdown, ...) while keeping the use of computational resources to a minimum.

🧪 Try it out!

Why should I use it?

In the context of Retrieval Augmented Generation (RAG), an optimized chunking strategy leads to:

  • Better relevance of chunks, which makes useful chunks easier to identify through more expressive embeddings.
  • Fewer hallucinations from the generation model, since the prompt carries less superfluous information.
  • Fewer errors caused by chunks exceeding the API's token limits.
  • Reduced cost, since prompts can be smaller.
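
To make the token-limit point more concrete, here is a minimal, hypothetical sketch of header-aware chunking with a size budget. It is not chunknorris's API or algorithm: the function name `chunk_markdown_by_headers`, the `max_words` parameter, and the use of whitespace-separated words as a rough stand-in for model tokens are all assumptions made for illustration.

```python
import re


def chunk_markdown_by_headers(text: str, max_words: int = 200) -> list[str]:
    """Hypothetical sketch: split Markdown at headers, then cap chunk size.

    Word count is used as a crude proxy for a real tokenizer (assumption).
    """
    # Split just before every line that starts with 1 to 6 '#' and a space
    # (a Markdown ATX heading). The zero-width lookahead keeps each heading
    # attached to the section it introduces.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue  # skip the empty piece produced before a leading heading
        words = section.split()
        if len(words) <= max_words:
            # The whole section fits the budget: keep it intact.
            chunks.append(section)
        else:
            # Oversized section: cut it into consecutive windows of
            # at most max_words words so no chunk exceeds the budget.
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
    return chunks


doc = "# Intro\nChunkNorris splits documents.\n\n## Details\n" + "word " * 250
for chunk in chunk_markdown_by_headers(doc, max_words=200):
    print(len(chunk.split()), "words")
```

Splitting at headings first keeps semantically related text together, while the size cap guarantees that no chunk exceeds the budget. The naive fixed-size fallback drops the heading from every piece after the first, which is the kind of loss a dedicated chunking tool tries to avoid.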

Many packages already exist for parsing documents. However, the vast majority of them:

  • have high computational requirements.
  • do not provide chunks out of the box: they only parse the documents, leaving the user to build the chunking implementation on top.