Word documents often contain valuable data in the form of tables, which can be used for reporting, data analysis, and record-keeping. However, manually extracting and transferring these tables to other formats is time-consuming and error-prone. Automating the process with Python saves time, reduces mistakes, and keeps the output consistent. Spire.Doc for Python provides APIs for reading table content from Word documents, making it straightforward to write that data to more accessible and manageable files. This article demonstrates how to use Spire.Doc for Python to extract tables from Word documents and write them to text files and Excel worksheets.
- Extract Tables from Word Documents to Text Files with Python
- Extract Tables from Word Documents to Excel Workbooks with Python
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows
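To confirm that both packages are present, and that plum-dispatch is at the version mentioned above, the standard library's importlib.metadata can report installed distribution versions. This is a minimal sketch: the helper name installed_versions is hypothetical, and the distribution names are assumed to be the ones used with pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a mapping of distribution name -> installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["Spire.Doc", "plum-dispatch"]).items():
        print(f"{name}: {ver or 'not installed'}")
```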
Extract Tables from Word Documents to Text Files with Python
Spire.Doc for Python offers the Section.Tables property to retrieve the collection of tables within a section of a Word document. Developers can then use the properties and methods of the ITable class to access the data in each table and write it to a text file, which provides a convenient way to convert Word document tables into text files.
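The examples in this article walk Spire collections (sections, tables, rows, cells, paragraphs) by index, using only a Count property and a get_Item() method. Those two members are all a small generator needs, which can flatten the nested index loops. This is a sketch under that assumption; the helper iter_items and the StubCollection class are hypothetical and exist only so the idea can be tried without a Word document.

```python
class StubCollection:
    """Hypothetical stand-in that mimics the Count/get_Item interface for testing."""

    def __init__(self, items):
        self._items = list(items)

    @property
    def Count(self):
        return len(self._items)

    def get_Item(self, index):
        return self._items[index]


def iter_items(collection):
    """Yield every item of a collection that exposes Count and get_Item(index)."""
    for index in range(collection.Count):
        yield collection.get_Item(index)


# With Spire.Doc, the same helper would (hypothetically) be used like:
#   for section in iter_items(doc.Sections):
#       for table in iter_items(section.Tables):
#           ...
```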
The detailed steps for extracting tables from Word documents to text files are as follows:
- Create an object of Document class and load a Word document using Document.LoadFromFile() method.
- Iterate through the sections in the document and get the table collection of each section through Section.Tables property.
- Iterate through the tables and create a string object for each table.
- Iterate through the rows in each table and the cells in each row, get the text of each paragraph in a cell through TableCell.Paragraphs.get_Item().Text, and add the combined cell text to the string.
- Save each string to a text file.
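The string-building steps above can be sketched without Spire.Doc by treating a table as nested Python lists: rows, each holding cells, each holding the text of its paragraphs (for example, values collected from cell.Paragraphs.get_Item(p).Text). The function name table_to_text is hypothetical. This sketch joins a cell's paragraphs with single spaces, so no space follows the last paragraph of a cell.

```python
def table_to_text(rows):
    """Serialize a table given as rows -> cells -> paragraph strings.

    Paragraphs inside a cell are joined with a space, cells with a tab,
    and each row ends with a newline.
    """
    lines = []
    for row in rows:
        cells = [" ".join(paragraphs) for paragraphs in row]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n" if lines else ""


table = [
    [["Name"], ["Age"]],
    [["Alice", "Smith"], ["30"]],
]
print(table_to_text(table), end="")
```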
import os

from spire.doc import *
from spire.doc.common import *

# Make sure the output folder exists before writing files into it
output_dir = 'output/Tables'
os.makedirs(output_dir, exist_ok=True)

# Create an instance of Document
doc = Document()
# Load a Word document
doc.LoadFromFile("Sample.docx")

# Loop through the sections
for s in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(s)
    # Get the tables in the section
    tables = section.Tables
    # Loop through the tables
    for i in range(tables.Count):
        # Get a table
        table = tables.get_Item(i)
        # Initialize a string to store the table data
        tableData = ''
        # Loop through the rows of the table
        for j in range(table.Rows.Count):
            row = table.Rows.get_Item(j)
            # Loop through the cells of the row
            for k in range(row.Cells.Count):
                # Get a cell
                cell = row.Cells.get_Item(k)
                # Collect the text of every paragraph in the cell, separated by spaces
                cellText = ' '.join(
                    cell.Paragraphs.get_Item(p).Text for p in range(cell.Paragraphs.Count)
                )
                # Add the cell text to the string, with a tab between cells
                tableData += cellText
                if k < row.Cells.Count - 1:
                    tableData += '\t'
            # End each row with a new line
            tableData += '\n'
        # Save the table data to a text file named after its section and table index
        with open(f'{output_dir}/WordTable_{s+1}_{i+1}.txt', 'w', encoding='utf-8') as f:
            f.write(tableData)

doc.Close()
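Joining cells with '\t' and rows with '\n' works as long as the cell text itself contains no tabs or line breaks. If it might, one option is to collect each table as a list of rows of cell strings and let Python's standard csv module write tab-separated values; it quotes any cell that contains the delimiter, a quote character, or a line break, so the layout survives. This is a sketch, not part of Spire.Doc: the helper name write_tsv is hypothetical, and whatever reads the output must understand quoted fields (csv.reader does).

```python
import csv
import io


def write_tsv(rows, stream):
    """Write rows of cell strings as tab-separated values.

    csv.writer quotes any cell containing a tab, a quote character, or a
    line break, so such cells no longer break the row/column layout.
    """
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


buffer = io.StringIO()
write_tsv([["Name", "Notes"], ["Alice", "line one\nline two"]], buffer)
print(buffer.getvalue(), end="")
```

When writing to a file instead of a StringIO, open it with newline='' as the csv documentation recommends, for example open(path, 'w', newline='', encoding='utf-8').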
Extract Tables from Word Documents to Excel Workbooks with Python
Developers can also utilize Spire.Doc for Python to retrieve table data and then use Spire.XLS for Python to write the table data into an Excel worksheet, thereby enabling the conversion of Word document tables into Excel workbooks.
Install Spire.XLS for Python via PyPI:
pip install Spire.XLS
The detailed steps for extracting tables from Word documents to Excel workbooks are as follows:
- Create an object of Document class and load a Word document using Document.LoadFromFile() method.
- Create an object of Workbook class and clear the default worksheets using Workbook.Worksheets.Clear() method.
- Iterate through the sections in the document and get the table collection of each section through Section.Tables property.
- Iterate through the tables and create a worksheet for each table using Workbook.Worksheets.Add() method.
- Iterate through the rows in each table and the cells in each row, get the text of each paragraph in a cell through TableCell.Paragraphs.get_Item().Text, and write the combined cell text to the worksheet using Worksheet.SetCellValue() method.
- Save the workbook using Workbook.SaveToFile() method.
import os

from spire.doc import *
from spire.doc.common import *
from spire.xls import *
from spire.xls.common import *

# Make sure the output folder exists before saving the workbook
output_dir = 'output/Tables'
os.makedirs(output_dir, exist_ok=True)

# Create an instance of Document
doc = Document()
# Load a Word document
doc.LoadFromFile('Sample.docx')

# Create an instance of Workbook and remove its default worksheets
wb = Workbook()
wb.Worksheets.Clear()

# Loop through sections in the document
for i in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(i)
    # Loop through tables in the section
    for j in range(section.Tables.Count):
        # Get a table
        table = section.Tables.get_Item(j)
        # Create a worksheet for the table
        ws = wb.Worksheets.Add(f'Table_{i+1}_{j+1}')
        # Write the table to the worksheet
        for row in range(table.Rows.Count):
            # Get a row
            tableRow = table.Rows.get_Item(row)
            # Loop through cells in the row
            for col in range(tableRow.Cells.Count):
                # Get a cell
                tableCell = tableRow.Cells.get_Item(col)
                # Collect the text of every paragraph in the cell, separated by spaces
                cellText = ' '.join(
                    tableCell.Paragraphs.get_Item(p).Text
                    for p in range(tableCell.Paragraphs.Count)
                )
                # Write the cell text to the worksheet (row and column numbers start at 1)
                ws.SetCellValue(row + 1, col + 1, cellText)

# Save the workbook in Excel 2016 format
# (FileFormat resolves to the spire.xls enum because spire.xls is imported after spire.doc)
wb.SaveToFile(f'{output_dir}/WordTableToExcel.xlsx', FileFormat.Version2016)
doc.Close()
wb.Dispose()
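The code above adds 1 to each zero-based loop index before calling Worksheet.SetCellValue(), because, as the +1 suggests, the worksheet's row and column numbers start at 1. The sketch below isolates that mapping using plain lists instead of Spire objects, and adds a helper that turns a 1-based column number into Excel column letters, which can help when checking where a given Word cell landed (for example, row 2, column 28 is cell AB2). Both function names are hypothetical.

```python
def to_cell_writes(rows):
    """Yield (row_number, column_number, text) triples with 1-based numbering."""
    for r, row in enumerate(rows, start=1):
        for c, text in enumerate(row, start=1):
            yield r, c, text


def column_letter(index):
    """Convert a 1-based column number to Excel column letters (1 -> 'A', 28 -> 'AB')."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


for r, c, text in to_cell_writes([["Name", "Age"], ["Alice", "30"]]):
    print(f"{column_letter(c)}{r}: {text}")
```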
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license for yourself.