Reference for CSVParser
The CSVParser is dedicated to parsing comma-separated values (.csv) files. By default, it attempts to infer the delimiter in use (comma, semicolon, ...). Alternatively, you can specify the delimiter it should use.
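This reference does not say how the delimiter is inferred. A common approach in Python is the standard library's csv.Sniffer, and the sketch below assumes that technique. The helper name guess_delimiter, the candidate set, and the comma fallback are hypothetical and not part of CSVParser.

```python
import csv


def guess_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Hypothetical helper: guess a CSV delimiter with the stdlib csv.Sniffer."""
    try:
        # Restrict detection to common delimiters so letters are never picked.
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        return dialect.delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to a comma (an assumption).
        return ","


print(guess_delimiter("name;age\nAda;36\nAlan;41\n"))  # prints ;
```

Restricting the candidates matters: without it, the sniffer may consider any character that appears consistently on every line.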
Bases: AbstractParser
Parser for comma-separated values (.csv) files.
__init__(csv_delimiter=None)
Initializes a CSV parser.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| csv_delimiter | str \| None | The delimiter used to parse the .csv files. If None, the parser tries to infer the delimiter. | None |
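Assuming the parser loads data through pandas (convert_df_to_markdown takes a pd.DataFrame), the csv_delimiter=None convention maps naturally onto pandas.read_csv(sep=None, engine="python"), which detects the separator with csv.Sniffer. The function load_csv_text below is a hypothetical sketch, not CSVParser's code.

```python
import io
from typing import Optional

import pandas as pd


def load_csv_text(text: str, csv_delimiter: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper mirroring the csv_delimiter=None convention."""
    buffer = io.StringIO(text)
    if csv_delimiter is None:
        # sep=None is only supported by the Python engine, which sniffs the separator.
        return pd.read_csv(buffer, sep=None, engine="python")
    # An explicit delimiter skips detection entirely.
    return pd.read_csv(buffer, sep=csv_delimiter)


df = load_csv_text("city;population\nLyon;522000\nNantes;320000\n")
print(df.columns.tolist())  # ['city', 'population']
```

Passing an explicit delimiter is the safer choice when detection is ambiguous, for example when a single-column file contains commas inside its values.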
convert_df_to_markdown(df)
staticmethod
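Converts a DataFrame to Markdown. Wraps the pandas method pd.DataFrame.to_markdown() (which relies on the tabulate package) with pre- and post-processing:
- Preprocessing: remove in text columns.
- Post-processing: replace runs of multiple spaces with two spaces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | The DataFrame to convert. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | A Markdown-formatted table. |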
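The wrapping pattern can be sketched as below. For illustration, this assumes the preprocessing step replaces line breaks inside text cells with spaces, since a raw newline would split a Markdown table row. The function names are hypothetical, and DataFrame.to_markdown needs the optional tabulate package installed.

```python
import re

import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed preprocessing: replace line breaks inside string cells with a space."""
    cleaned = df.copy()
    for column in cleaned.columns:
        cleaned[column] = cleaned[column].map(
            lambda v: re.sub(r"[\r\n]+", " ", v) if isinstance(v, str) else v
        )
    return cleaned


def postprocess(markdown: str) -> str:
    """Collapse every run of two or more spaces into exactly two spaces."""
    return re.sub(r" {2,}", "  ", markdown)


def convert_df_to_markdown_sketch(df: pd.DataFrame) -> str:
    # DataFrame.to_markdown delegates to the optional "tabulate" package.
    return postprocess(preprocess(df).to_markdown(index=False))
```

Collapsing spaces mainly shrinks the column padding that tabulate adds, which keeps the Markdown compact without changing how it renders.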
parse_file(filepath)
Parses a .csv file into Markdown.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The path to the .csv file. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| MarkdownDoc | MarkdownDoc | The Markdown-formatted CSV content. |
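A file-based version can open the file with newline="" (as the csv module documentation recommends), sniff a sample, rewind, and render the rows as a Markdown table. This stdlib-only sketch returns a plain str rather than a MarkdownDoc; csv_file_to_markdown and its comma fallback are hypothetical, not part of CSVParser.

```python
import csv
from typing import Optional


def csv_file_to_markdown(filepath: str, delimiter: Optional[str] = None) -> str:
    """Hypothetical stand-in for parse_file that returns a plain string."""
    with open(filepath, newline="", encoding="utf-8") as handle:
        if delimiter is None:
            sample = handle.read(4096)
            handle.seek(0)  # rewind so the reader sees the whole file
            try:
                delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
            except csv.Error:
                delimiter = ","  # assumed fallback when detection fails
        rows = list(csv.reader(handle, delimiter=delimiter))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    # First row becomes the header, followed by the Markdown separator row.
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```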
parse_string(string)
Parses a string containing CSV data into Markdown.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| string | str | The CSV-formatted string. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| MarkdownDoc | MarkdownDoc | The Markdown-formatted CSV content. |
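Cells parsed from a CSV string can contain characters that break a Markdown table: a literal | would start a new column, and a quoted field may span several lines. The sketch below escapes pipes and joins multi-line cells with spaces. It is a hypothetical illustration (csv_string_to_markdown is not CSVParser's API) and it assumes the delimiter is already known.

```python
import csv
import io


def csv_string_to_markdown(text: str, delimiter: str = ",") -> str:
    """Hypothetical stand-in for parse_string that returns a plain string."""

    def cell(value: str) -> str:
        # Join multi-line cells, then escape pipes so they stay inside the cell.
        return " ".join(value.splitlines()).replace("|", r"\|")

    # newline="" lets csv.reader handle line breaks inside quoted fields.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    out = ["| " + " | ".join(cell(c) for c in header) + " |", "|" + "---|" * len(header)]
    out += ["| " + " | ".join(cell(c) for c in row) + " |" for row in body]
    return "\n".join(out)
```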
read_file(filepath)
Reads the file at the provided path. For a list of handled file types, refer to https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The path to the file. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The content of the .csv file as a string. |
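For reading raw CSV text, a minimal sketch with pathlib is shown below. It assumes UTF-8 input and uses the "utf-8-sig" codec so that a leading byte-order mark, which some spreadsheet exports add, does not end up attached to the first header name. read_text_file is a hypothetical helper, not the documented read_file.

```python
from pathlib import Path


def read_text_file(filepath: str) -> str:
    """Hypothetical helper: return a CSV file's content as a string."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    # Reading in text mode also normalizes \r\n line endings to \n.
    return Path(filepath).read_text(encoding="utf-8-sig")
```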