Reference for CSVParser

The CSVParser parses comma-separated values (.csv) files. By default, it attempts to infer the delimiter used (comma, semicolon, ...). Alternatively, you can specify the delimiter it should use.

Bases: AbstractParser

Parser for comma-separated values (.csv) files.

__init__(csv_delimiter=None)

Initializes a CSV parser.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| csv_delimiter | str \| None | The delimiter to use when parsing .csv files. If None, the parser tries to guess the delimiter. | None |
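
As an illustration of what guessing the delimiter can look like, here is a minimal sketch built on the standard library's `csv.Sniffer`. It is not necessarily how CSVParser does it internally; the helper name `guess_delimiter`, the candidate set, and the comma fallback are all assumptions made for this example.

```python
import csv


def guess_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Return the most likely delimiter found in `sample` (hypothetical helper)."""
    try:
        # Sniffer inspects the sample and picks the candidate character
        # that splits the lines most consistently.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Assumption for this sketch: fall back to a comma when undecided.
        return ","


sample = "name;age;city\nAda;36;London\nAlan;41;Manchester\n"
delimiter = guess_delimiter(sample)
print(repr(delimiter))
```

Passing an explicit `csv_delimiter` to the parser avoids this guessing step altogether, which is useful for small or irregular files where inference may be unreliable.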

convert_df_to_markdown(df) staticmethod

Converts a DataFrame to markdown. Wraps pandas' pd.DataFrame.to_markdown() method (which relies on the tabulate package) between a pre-processing and a post-processing step.

Preprocess:
- Remove in text columns

Postprocess:
- Replace multiple spaces with 2 spaces.

   Args:
       df (pd.DataFrame): the dataframe to convert.

   Returns:
       str: a markdown formatted table.
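
The post-processing step described above can be sketched with a single regular expression. This is only an illustration under assumptions: the function name `collapse_spaces` is hypothetical, and the padded table is hand-written to resemble the column-aligned output that a markdown table generator typically produces.

```python
import re


def collapse_spaces(markdown_table: str) -> str:
    """Replace every run of two or more spaces with exactly two spaces."""
    return re.sub(r" {2,}", "  ", markdown_table)


# Hand-written sample with the kind of column padding a table generator adds.
padded = (
    "| name      | city          |\n"
    "|:----------|:--------------|\n"
    "| Ada       | London        |"
)
compact = collapse_spaces(padded)
print(compact)
```

Collapsing the padding keeps the table valid markdown while reducing the number of characters, which matters when the output is fed to a system with a limited input size.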

parse_file(filepath)

Parses a csv file to markdown.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filepath | str | the path to the csv file. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| MarkdownDoc | MarkdownDoc | the markdown-formatted csv. |

parse_string(string)

Parses a string representing a csv file to markdown.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| string | str | the csv-formatted string. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| MarkdownDoc | MarkdownDoc | the markdown-formatted csv. |
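
To make the CSV-to-markdown conversion concrete, the following standard-library sketch turns a CSV string into a markdown table. It is an assumption-laden stand-in, not the parser's implementation: the function `csv_string_to_markdown` is hypothetical, it returns a plain `str` rather than a `MarkdownDoc`, and it treats the first row as the header.

```python
import csv
import io


def csv_string_to_markdown(text: str, delimiter: str = ",") -> str:
    """Render CSV text as a markdown table (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    # Assumption: the first row holds the column names.
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


markdown = csv_string_to_markdown("name,age\nAda,36\nAlan,41\n")
print(markdown)
```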

read_file(filepath)

Reads the file at the provided filepath. For a list of handled file types, refer to https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filepath | str | path to the file. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| str | str | the csv file content as a string. |
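
Reading a file into a string, as read_file is documented to do, can be sketched with `pathlib`. The helper `read_text_file` and the UTF-8 default are assumptions for this example; the actual method may handle encodings and file types differently.

```python
import tempfile
from pathlib import Path


def read_text_file(filepath: str, encoding: str = "utf-8") -> str:
    """Return the content of `filepath` as a string (hypothetical helper)."""
    return Path(filepath).read_text(encoding=encoding)


# Write a small CSV file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "people.csv"
    csv_path.write_text("name,age\nAda,36\n", encoding="utf-8")
    content = read_text_file(str(csv_path))

print(content)
```

The resulting string has the same shape as the input that parse_string expects.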