Reference for ExcelParser

The ExcelParser parses spreadsheets, such as .xlsx files. Every sheet in the workbook is parsed.

Parser for spreadsheets, such as Excel workbooks (.xlsx). For the list of supported file types, refer to https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

convert_df_to_markdown(df) staticmethod

Converts a DataFrame to markdown. Wraps the pandas method pd.DataFrame.to_markdown() (which relies on the tabulate package) between a preprocessing and a postprocessing step.

Preprocess:
- Remove in text columns.

Postprocess:
- Replace multiple spaces with 2 spaces.

   Args:
       df (pd.DataFrame): the dataframe to convert.

   Returns:
       str: a markdown formatted table.
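
As a rough illustration of the pre- and post-processing described above, the sketch below wraps pd.DataFrame.to_markdown() between two helpers. The helper names (strip_newlines, collapse_spaces, df_to_markdown_sketch) are hypothetical, and stripping line breaks is only an assumption about what the preprocessing step removes; none of this is taken from the ExcelParser source.

```python
import re

import pandas as pd


def strip_newlines(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed preprocessing: replace line breaks inside text cells with a space."""

    def clean(value):
        return value.replace("\n", " ") if isinstance(value, str) else value

    # Apply the cleaner cell by cell, one column at a time.
    return df.apply(lambda column: column.map(clean))


def collapse_spaces(markdown: str) -> str:
    """Postprocessing: shrink any run of three or more spaces to two spaces."""
    return re.sub(r" {3,}", "  ", markdown)


def df_to_markdown_sketch(df: pd.DataFrame) -> str:
    """Hypothetical stand-in for convert_df_to_markdown."""
    # DataFrame.to_markdown() needs the optional `tabulate` package installed.
    return collapse_spaces(strip_newlines(df).to_markdown(index=False))
```

Collapsing the padding that to_markdown() adds keeps wide tables compact, which can matter when the markdown is later split into chunks or sent to a model with a limited context size.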

convert_sheets_to_markdown(sheets) staticmethod

Handle the conversion of the sheets obtained from pandas.read_excel() method to markdown.

Parameters:

Name | Type | Description | Default
sheets | dict[str, DataFrame] | The sheets returned from pd.read_excel(sheet_name=None). | required

Returns:

Name | Type | Description
str | str | The markdown-formatted string.
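
One way to picture this step is a loop over the sheet mapping that converts each DataFrame and joins the results. The sketch below is an assumption about the layout, not the library's actual output: the function name sheets_to_markdown_sketch, the "## sheet name" heading, and the blank-line separator are all hypothetical.

```python
import pandas as pd


def sheets_to_markdown_sketch(sheets, convert=pd.DataFrame.to_markdown):
    """Join every sheet into one markdown string, one section per sheet.

    `convert` is a hypothetical hook: any callable that turns a DataFrame
    into a markdown table. The default needs the `tabulate` package.
    """
    sections = []
    for sheet_name, df in sheets.items():
        # Assumed layout: the sheet name as a heading, then its table.
        sections.append(f"## {sheet_name}\n\n{convert(df)}")
    return "\n\n".join(sections)
```

Passing the converter as an argument keeps the joining logic independent of how a single table is rendered.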

parse_file(filepath)

Parses an Excel-like file to markdown.

Parameters:

Name | Type | Description | Default
filepath | str | The path to the Excel-like file. | required

Returns:

Name | Type | Description
MarkdownDoc | MarkdownDoc | The markdown-formatted Excel file.

parse_string(string)

Parses a byte string representing an Excel file.

Parameters:

Name | Type | Description | Default
string | bytes | The Excel file as a byte string. | required

Returns:

Name | Type | Description
MarkdownDoc | MarkdownDoc | The markdown-formatted Excel file.
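
Since pandas.read_excel() accepts file-like objects, a byte string can be read without writing it to disk first. The following is a minimal sketch of that idea only; read_sheets_from_bytes is a hypothetical name, and the sketch stops at the DataFrames rather than building a MarkdownDoc, whose constructor is not documented here.

```python
import io

import pandas as pd


def read_sheets_from_bytes(data: bytes) -> dict:
    """Load every sheet of an in-memory Excel file into a DataFrame."""
    # Wrap the raw bytes in a file-like object that pandas can read from.
    buffer = io.BytesIO(data)
    # sheet_name=None asks pandas for all sheets, keyed by sheet name.
    return pd.read_excel(buffer, sheet_name=None)
```

Reading .xlsx content this way requires an Excel engine such as openpyxl to be installed alongside pandas.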

read_file(filepath) staticmethod

Read the provided filepath.

Parameters:

Name | Type | Description | Default
filepath | str | Path to the file. | required

Returns:

Type | Description
dict[str, DataFrame] | A mapping of the form {sheet_name: corresponding_dataframe}.
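
The returned mapping has the same shape as what pandas.read_excel() produces when called with sheet_name=None. The sketch below shows that call and a small helper for inspecting the result; both function names (read_sheets_sketch, describe_sheets) are hypothetical and are not part of ExcelParser.

```python
import pandas as pd


def read_sheets_sketch(filepath: str) -> dict:
    """Hypothetical stand-in for read_file: load all sheets of a workbook."""
    # sheet_name=None returns a dict of {sheet_name: DataFrame}.
    return pd.read_excel(filepath, sheet_name=None)


def describe_sheets(sheets: dict) -> list:
    """Summarise each sheet as 'name: R rows x C columns'."""
    return [
        f"{name}: {df.shape[0]} rows x {df.shape[1]} columns"
        for name, df in sheets.items()
    ]
```

A summary like this is a quick way to check that every sheet was picked up before converting anything to markdown.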