Reference for `ExcelParser`

The `ExcelParser` parses spreadsheets such as .xlsx files. All sheets in the workbook are parsed.

Bases: AbstractSheetParser[bytes]

Parser for spreadsheets, such as Excel workbooks (.xlsx). For the list of supported file types, refer to https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

`__init__(output_format='auto')`

Initializes an Excel parser.

Parameters:

Name	Type	Description	Default
`output_format`	`Literal["markdown_table", "json_lines", "auto"]`	The output format of the parsed document. `markdown_table`: uses tabula to build a Markdown-formatted table. `json_lines`: each row of the table is output as a JSON line; better for chunking because headers are preserved. `auto`: detects which format is more suitable; CSV-like sheets are converted to JSON lines. Defaults to "auto".	`'auto'`
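To make the difference between the two explicit formats concrete, here is a minimal standard-library sketch. The helpers `to_json_lines` and `to_markdown_table` and the sample rows are hypothetical and are not part of `ExcelParser`; the parser's actual output may differ in spacing and details.

```python
import json

# Illustrative rows standing in for one parsed sheet (made-up values).
rows = [
    {"sku": "A1", "qty": 3},
    {"sku": "B2", "qty": 7},
]


def to_json_lines(rows):
    """Hypothetical helper: one JSON object per line, column names repeated."""
    return "\n".join(json.dumps(row) for row in rows)


def to_markdown_table(rows):
    """Hypothetical helper: a pipe table with a single header row on top."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


json_lines = to_json_lines(rows)
markdown_table = to_markdown_table(rows)

print(json_lines)
print(markdown_table)
```

In the JSON lines form, a chunk containing only the second row still carries its column names, whereas a chunk cut from the middle of the Markdown table loses the header row.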

`convert_sheets_to_output_format(sheets)`

Converts the sheets obtained from `pandas.read_excel()` to the specified output format.

Parameters:

Name	Type	Description	Default
`sheets`	`dict[str, DataFrame]`	the sheets returned from pd.read_excel(sheet_name=None).	required

Returns:

Name	Type	Description
`str`	`str`	the formatted string
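As a rough illustration of the input and output shapes, the sketch below turns a `dict[str, DataFrame]` into a single string of JSON lines with pandas' `DataFrame.to_json`. The function name `sheets_to_json_lines` and the per-sheet `## name` heading are assumptions made for this example, not the library's actual implementation or layout.

```python
import json

import pandas as pd


def sheets_to_json_lines(sheets: dict[str, pd.DataFrame]) -> str:
    """Hypothetical helper: render every sheet as JSON lines under its name."""
    sections = []
    for name, frame in sheets.items():
        # orient="records" with lines=True emits one JSON object per row.
        body = frame.to_json(orient="records", lines=True).strip()
        # The "## name" heading is an assumed layout, chosen for readability.
        sections.append(f"## {name}\n\n{body}")
    return "\n\n".join(sections)


# Same shape as the result of pd.read_excel(..., sheet_name=None),
# built in memory here so that no workbook file is needed.
sheets = {
    "Prices": pd.DataFrame({"item": ["pen", "ink"], "price": [2, 5]}),
    "Stock": pd.DataFrame({"item": ["pen"], "count": [40]}),
}

formatted = sheets_to_json_lines(sheets)
print(formatted)
```

Building the sheets in memory keeps the example independent of any Excel engine; with a real workbook the same dictionary would come from `pd.read_excel(filepath, sheet_name=None)`.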

`parse_file(filepath)`

Parses an Excel-like file to Markdown.

Parameters:

Name	Type	Description	Default
`filepath`	`str`	the path to the excel-like file.	required

Returns:

Name	Type	Description
`MarkdownDoc`	`MarkdownDoc`	the markdown formatted excel file.

`parse_string(string)`

Parses a byte string representing an Excel file.

Parameters:

Name	Type	Description	Default
`string`	`bytes`	the excel as a byte string.	required

Returns:

Name	Type	Description
`MarkdownDoc`	`MarkdownDoc`	the markdown formatted excel file

`read_file(filepath)`

Reads the file at the provided path.

Parameters:

Name	Type	Description	Default
`filepath`	`str`	path to the file.	required

Returns:

Type	Description
`dict[str, DataFrame]`	A mapping of `{sheet_name: corresponding_dataframe}`.
