Reference for ExcelParser
The ExcelParser enables parsing spreadsheets, such as .xlsx files. All sheets in the workbook are parsed.
Parser for spreadsheets, such as Excel workbooks (.xlsx). For the list of supported file types, refer to https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
convert_df_to_markdown(df)
staticmethod
Converts a DataFrame to markdown. Wraps pandas' pd.DataFrame.to_markdown() method (which relies on the tabulate package) between pre- and post-processing steps:
- Preprocessing: remove … in text columns.
- Postprocessing: replace runs of multiple spaces with two spaces.
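Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | the dataframe to convert. | required |
Returns:
Type | Description |
---|---|
str | a markdown formatted table. |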
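The postprocessing step can be pictured with a regular expression. This is a minimal sketch, assuming the collapse is done with a regex substitution; the helper name collapse_spaces is hypothetical and not part of ExcelParser. It shows how the padding that column alignment adds to a markdown table shrinks to two spaces.

```python
import re


def collapse_spaces(markdown: str) -> str:
    """Hypothetical helper: replace every run of two or more spaces with exactly two."""
    return re.sub(r" {2,}", "  ", markdown)


# Padded table text of the kind produced when columns are aligned.
padded = "| name        | value |\n|:------------|------:|\n| a           |     1 |"
print(collapse_spaces(padded))
```

Single spaces are left untouched, so cell text such as "two words" keeps its normal spacing.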
convert_sheets_to_markdown(sheets)
staticmethod
Converts the sheets returned by pandas.read_excel() to a single markdown string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sheets | dict[str, DataFrame] | the sheets returned from pd.read_excel(sheet_name=None). | required |
Returns:
Type | Description |
---|---|
str | the markdown formatted string. |
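The reference does not say how the per-sheet tables are laid out in the combined string. The sketch below assumes, hypothetically, that each sheet gets a level-2 heading with its name followed by its table, with sheets separated by a blank line. The function sheets_to_markdown and the stand-in converter rows_to_markdown are illustrative names; rows_to_markdown takes lists of rows only so the example runs without pandas.

```python
from typing import Callable, Dict, TypeVar

Table = TypeVar("Table")


def sheets_to_markdown(sheets: Dict[str, Table], table_to_markdown: Callable[[Table], str]) -> str:
    """Hypothetical sketch: render each sheet under its own heading and join them."""
    sections = []
    for sheet_name, table in sheets.items():
        # Assumed layout: a "## <sheet name>" heading, a blank line, then the table.
        sections.append(f"## {sheet_name}\n\n{table_to_markdown(table)}")
    # Separate consecutive sheets with a blank line.
    return "\n\n".join(sections)


def rows_to_markdown(rows):
    """Stand-in converter: the first row is the header, the rest are body rows."""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


doc = sheets_to_markdown({"Sales": [["region", "total"], ["north", "10"]]}, rows_to_markdown)
print(doc)
```

With real data, the dictionary would come from pd.read_excel(..., sheet_name=None) and the converter would be a DataFrame-to-markdown function such as convert_df_to_markdown.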
parse_file(filepath)
Parses an Excel-like file to markdown.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | the path to the Excel-like file. | required |
Returns:
Type | Description |
---|---|
MarkdownDoc | the markdown formatted Excel file. |
parse_string(string)
Parses a byte string representing an Excel file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string | bytes | the Excel file as a byte string. | required |
Returns:
Type | Description |
---|---|
MarkdownDoc | the markdown formatted Excel file. |
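parse_string takes the workbook as bytes rather than a path. How ExcelParser handles those bytes internally is not shown here; this sketch assumes the common pandas approach of wrapping the bytes in io.BytesIO, which gives pd.read_excel a file-like object (recent pandas releases warn when raw bytes are passed directly). The function name read_sheets_from_bytes is hypothetical, and writing or reading .xlsx with pandas requires the openpyxl package.

```python
import io
from typing import Dict

import pandas as pd


def read_sheets_from_bytes(data: bytes) -> Dict[str, pd.DataFrame]:
    """Hypothetical sketch: read every sheet of an in-memory workbook."""
    # Wrap the raw bytes in a file-like object; sheet_name=None loads all sheets.
    return pd.read_excel(io.BytesIO(data), sheet_name=None)


# Build a small two-sheet workbook in memory to stand in for uploaded bytes.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}).to_excel(writer, sheet_name="Stock", index=False)
    pd.DataFrame({"note": ["ok"]}).to_excel(writer, sheet_name="Notes", index=False)

sheets = read_sheets_from_bytes(buffer.getvalue())
print(list(sheets))  # sheet names, in workbook order
```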
read_file(filepath)
staticmethod
Reads all sheets of the file at the provided path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | path to the file. | required |
Returns:
Type | Description |
---|---|
dict[str, DataFrame] | a mapping of {sheet_name: corresponding_dataframe}. |
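The dictionary return type follows from asking pandas for every sheet. The sketch below contrasts sheet_name=None with the pd.read_excel default, which returns only the first sheet as a single DataFrame. The function read_sheets is a hypothetical stand-in for read_file, not its actual implementation, and the temporary workbook requires openpyxl to write.

```python
import os
import tempfile
from typing import Dict

import pandas as pd


def read_sheets(filepath: str) -> Dict[str, pd.DataFrame]:
    """Hypothetical stand-in for read_file: {sheet_name: DataFrame} for every sheet."""
    # sheet_name=None loads all sheets; the default (sheet_name=0) loads only the first.
    return pd.read_excel(filepath, sheet_name=None)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlsx")
    # Write a two-sheet workbook to read back.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="First", index=False)
        pd.DataFrame({"y": [3]}).to_excel(writer, sheet_name="Second", index=False)

    all_sheets = read_sheets(path)    # dict keyed by sheet name
    first_only = pd.read_excel(path)  # a single DataFrame for the first sheet

for name, df in all_sheets.items():
    print(name, df.shape)
```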