Spire.Office for Android via Java 8.4.1 is released
We are excited to announce the release of Spire.Office for Android via Java 8.4.1. In this version, Spire.PDF for Android via Java supports a number of new features, including converting PDF to SVGZ and PPTX and comparing PDF files; Spire.Presentation for Android via Java enhances the conversion from PPTX to PDF. Moreover, some known issues are fixed successfully in this version. More details are listed below.
Here is a list of changes made in this release
Spire.PDF for Android via Java
Category | ID | Description |
New feature | - | Supports converting PDF to PPTX.
String input = "data/JavaPDFSample_1.pdf";
String output = "output/toPPTX.pptx";
// Load a PDF document
PdfDocument doc = new PdfDocument();
doc.loadFromFile(input);
// Convert to a PPTX file
doc.saveToFile(output, FileFormat.PPTX);
doc.close(); |
New feature | - | Supports image compression.
PdfCompressor compressor = new PdfCompressor(inputFile);
compressor.getOptions().getImageCompressionOptions().setCompressImage(true);
compressor.getOptions().getImageCompressionOptions().setResizeImages(true);
compressor.getOptions().getImageCompressionOptions().setImageQuality(ImageQuality.High);
compressor.compressToFile(outputFile); |
New feature | - | Adds a new method, pdf.getDocumentInformation(), to get document metadata, and deprecates XmpMetadata.
PdfDocument doc = new PdfDocument();
doc.loadFromFile(inputFile);
StringBuilder builder = new StringBuilder();
builder.append("Author:" + doc.getDocumentInformation().getAuthor() + "\r\n");
builder.append("Title: " + doc.getDocumentInformation().getTitle() + "\r\n");
builder.append("Creation Date: " + doc.getDocumentInformation().getCreationDate() + "\r\n");
builder.append("Subject: " + doc.getDocumentInformation().getSubject() + "\r\n");
builder.append("Producer: " + doc.getDocumentInformation().getProducer() + "\r\n");
builder.append("Creator: " + doc.getDocumentInformation().getCreator() + "\r\n");
builder.append("Keywords: " + doc.getDocumentInformation().getKeywords() + "\r\n");
builder.append("Modify Date: " + doc.getDocumentInformation().getModificationDate() + "\r\n");
builder.append("Customed Property's value: " + doc.getDocumentInformation().getCustomProperty("Field1"));
FileWriter fw = new FileWriter(new File(outputFile), true);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(builder.toString());
bw.flush();
bw.close();
fw.close(); |
New feature | SPIREPDF-6190 | Supports setting encryption options when encrypting PDF documents.
PdfDocument pdfdoc = new PdfDocument();
pdfdoc.loadFromFile(inputFile);
PdfSecurityPolicy securityPolicy = new PdfPasswordSecurityPolicy("123", "e-iceblue");
securityPolicy.setEncryptionAlgorithm(PdfEncryptionAlgorithm.AES_128);
securityPolicy.setEncryptMetadata(false);
pdfdoc.encrypt(securityPolicy);
pdfdoc.saveToFile(outputFile); |
New feature | - | Supports determining if a PDF document stream is encrypted.
FileInputStream stream_1 = new FileInputStream(new File(inputFile_1));
boolean isPasswordProtected_1 = PdfDocument.isPasswordProtected(stream_1); |
New feature | - | Adds a new method to convert PDF to Word.
PdfToWordConverter convert = new PdfToWordConverter(inputFile);
convert.saveToDocx(outputFile); |
New feature | - | Supports comparing PDF documents.
PdfDocument pdf1 = new PdfDocument(inputFile_1);
PdfDocument pdf2 = new PdfDocument(inputFile_2);
PdfComparer compare = new PdfComparer(pdf1, pdf2);
compare.getOptions().setPageRanges(0, pdf1.getPages().getCount() - 1, 0, pdf2.getPages().getCount() - 1);
compare.compare(outputFile); |
New feature | - | Supports converting PDF documents to SVGZ documents.
PdfDocument pdf = new PdfDocument(inputFile);
pdf.saveToFile(outputFile, FileFormat.SVGZ); |
Bug | SPIREPDF-5639 | Fixes the issue that adding new custom properties would cause loss of existing custom properties. |
Bug | - | Fixes the issue that the program did not prompt an error when the open password and the permission password were the same. |
Spire.Presentation for Android via Java
Category | ID | Description |
Bug | SPIREPPT-2404 SPIREPPT-2526 | Fixes the issue that the application threw an "AbstractMethodError" exception when converting PPTX documents to PDF. |
Spire.Presentation for Android via Java 8.1.2 enhances the conversion from PPTX to PDF
We are pleased to announce the release of Spire.Presentation for Android via Java 8.1.2. This version fixes the issue that the application threw an "AbstractMethodError" exception when converting PPTX documents to PDF. More details are listed below.
Here is a list of changes made in this release
Category | ID | Description |
Bug | SPIREPPT-2404 SPIREPPT-2526 | Fixes the issue that the application threw an "AbstractMethodError" exception when converting PPTX documents to PDF. |
Python: Extract Comments from Excel
Excel files often contain a wealth of comments that can provide valuable context and insights. These comments may include important text notes, instructions, or even embedded images that can be incredibly useful for various data analysis and reporting tasks. Extracting this information from the comments can be a valuable step in unlocking the full potential of the data. In this article, we will demonstrate how to effectively extract text and images from comments in Excel files in Python using Spire.XLS for Python.
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
Extract Text from Comments in Excel in Python
You can get the text of comments using the ExcelCommentObject.Text property. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using Workbook.LoadFromFile() method.
- Create a list to store the extracted comment text.
- Get the comments in the worksheet using Worksheet.Comments property.
- Traverse through the comments.
- Get the text of each comment using ExcelCommentObject.Text property and append it to the list.
- Save the content of the list to a text file.
- Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("Comments.xlsx")

# Get the first worksheet
worksheet = workbook.Worksheets[0]

# Create a list to store the comment text
comment_text = []

# Get all the comments in the worksheet
comments = worksheet.Comments

# Extract the text from each comment and add it to the list
for i, comment in enumerate(comments, start=1):
    comment_text.append(f"Comment {i}:")
    text = comment.Text
    comment_text.append(text)
    comment_text.append("")

# Write the comment text to a file
with open("comments.txt", "w", encoding="utf-8") as file:
    file.write("\n".join(comment_text))
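The list-building and joining steps above can be tried without an Excel file. The following is a minimal, library-free sketch: `format_comments` is a hypothetical helper (not part of Spire.XLS), and plain strings stand in for the values that would be read from each comment's Text property.

```python
def format_comments(texts):
    """Lay out comment strings the same way the example above does."""
    lines = []
    for i, text in enumerate(texts, start=1):
        lines.append(f"Comment {i}:")
        lines.append(text)
        lines.append("")  # blank line between comments
    return "\n".join(lines)


# Plain strings stand in for comment.Text values (assumed sample data)
sample = ["Check this total.", "Source: Q3 report"]
report = format_comments(sample)
print(report)
```

Keeping the layout logic in its own function makes it easy to change the output format later without touching the code that reads the workbook.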
Extract Images from Comments in Excel in Python
To get the images embedded in Excel comments, you can use the ExcelCommentObject.Fill.Picture property. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using Workbook.LoadFromFile() method.
- Get a specific comment in the worksheet using Worksheet.Comments[index] property.
- Get the embedded image in the comment using ExcelCommentObject.Fill.Picture property.
- Save the image to an image file.
- Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
workbook = Workbook()

# Load an Excel file
workbook.LoadFromFile("ImageComment.xlsx")

# Get the first worksheet
worksheet = workbook.Worksheets[0]

# Get a specific comment in the worksheet
comment = worksheet.Comments[0]

# Extract the image from the comment and save it to an image file
image = comment.Fill.Picture
image.Save("CommentImage/Comment.png")
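The example saves the image to CommentImage/Comment.png. Whether the save call creates a missing folder is not stated here, so it may be safer to create the folder first. The sketch below uses only the Python standard library; `ensure_parent_dir` is a hypothetical helper name, not a Spire.XLS function.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the folder that will hold `path` if it is missing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path


# Demonstrated in a temporary folder so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(os.path.join(tmp, "CommentImage", "Comment.png"))
    folder_exists = os.path.isdir(os.path.dirname(target))
    print(folder_exists)
```

In the extraction script, the returned path could then be passed to the save call in place of the literal string.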
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert Excel XLS to XLSX and Vice Versa
Excel has been a widely used tool for data organization and analysis for many years. Over time, Microsoft has introduced different file formats for storing Excel data, the most common being the older XLS format and the more modern XLSX format.
The XLS format, introduced in the late 1990s, has notable limitations: a worksheet can hold at most 65,536 rows and 256 columns, and a workbook supports only about 4,000 unique cell formats. The XLSX format, introduced in 2007, addressed these limitations by allowing for larger file sizes, more rows and columns, and expanded style capabilities. While XLSX is now the standard format, there are still many existing XLS files that need to be accessed and used, which makes the ability to convert between these formats an essential skill. In this article, we will explain how to convert Excel XLS to XLSX and vice versa in Python using Spire.XLS for Python.
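Because of the row and column limits, it can be worth checking whether a dataset fits in an XLS worksheet before converting. The sketch below is a minimal illustration: the XLS limits come from the paragraph above, the XLSX limits (1,048,576 rows and 16,384 columns) are Excel's documented worksheet limits, and `fits_in_xls` is a hypothetical helper name.

```python
# Worksheet size limits per format
XLS_MAX_ROWS, XLS_MAX_COLS = 65_536, 256
XLSX_MAX_ROWS, XLSX_MAX_COLS = 1_048_576, 16_384


def fits_in_xls(row_count, col_count):
    """Return True if a sheet of this size fits within the XLS limits."""
    return row_count <= XLS_MAX_ROWS and col_count <= XLS_MAX_COLS


print(fits_in_xls(50_000, 100))   # within both limits
print(fits_in_xls(100_000, 100))  # too many rows for XLS
```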
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
Convert XLSX to XLS in Python
To convert an XLSX file to XLS format, you can use the Workbook.SaveToFile(fileName, ExcelVersion.Version97to2003) method. The ExcelVersion.Version97to2003 parameter specifies that the workbook should be saved in the Excel 97-2003 (XLS) format. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an XLSX file using the Workbook.LoadFromFile() method.
- Save the XLSX file to XLS format using the Workbook.SaveToFile(fileName, ExcelVersion.Version97to2003) method.
- Python
from spire.xls import *
from spire.xls.common import *

# Specify the input and output file paths
inputFile = "Sample1.xlsx"
outputFile = "XlsxToXls.xls"

# Create a Workbook object
workbook = Workbook()

# Load the XLSX file
workbook.LoadFromFile(inputFile)

# Save the XLSX file to XLS format
workbook.SaveToFile(outputFile, ExcelVersion.Version97to2003)
workbook.Dispose()
Convert XLS to XLSX in Python
To convert an XLS file to XLSX format, you need to specify the target Excel version to a version higher than 97-2003, such as 2007 (ExcelVersion.Version2007), 2010 (ExcelVersion.Version2010), 2013 (ExcelVersion.Version2013), or 2016 (ExcelVersion.Version2016). The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an XLS file using the Workbook.LoadFromFile() method.
- Save the XLS file to an Excel 2016 (XLSX) file using the Workbook.SaveToFile(fileName, ExcelVersion.Version2016) method.
- Python
from spire.xls import *
from spire.xls.common import *

# Specify the input and output file paths
inputFile = "Sample2.xls"
outputFile = "XlsToXlsx.xlsx"

# Create a Workbook object
workbook = Workbook()

# Load the XLS file
workbook.LoadFromFile(inputFile)

# Save the XLS file to XLSX format
workbook.SaveToFile(outputFile, ExcelVersion.Version2016)
workbook.Dispose()
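When converting in both directions, the output file name and target version can be derived from the input extension. The following is a rough sketch using only the standard library: `plan_conversion` and `VERSION_FOR_TARGET` are hypothetical names, and the version strings are simply the names of the ExcelVersion members used in the two examples above.

```python
from pathlib import Path

# Names of the ExcelVersion members used in the examples above
VERSION_FOR_TARGET = {".xls": "Version97to2003", ".xlsx": "Version2016"}


def plan_conversion(input_path):
    """Return (output_path, version_name) for the opposite Excel format."""
    src = Path(input_path)
    ext = src.suffix.lower()
    if ext == ".xlsx":
        target_ext = ".xls"
    elif ext == ".xls":
        target_ext = ".xlsx"
    else:
        raise ValueError(f"Not an XLS/XLSX file: {input_path}")
    return str(src.with_suffix(target_ext)), VERSION_FOR_TARGET[target_ext]


print(plan_conversion("Sample1.xlsx"))
print(plan_conversion("Sample2.xls"))
```

In a real script, the returned name could be resolved with getattr(ExcelVersion, version_name) and passed to Workbook.SaveToFile() together with the output path.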
Python: Modify Content Controls in a Word Document
Content controls make parts of a Word document interactive: users can add, remove, or edit the content of designated regions without disturbing the rest of the document's structure, which makes it easier to revise and customize documents. This article shows how to use Spire.Doc for Python to modify content controls in Word documents within a Python project.
- Modify Content Controls in the Body using Python
- Modify Content Controls within Paragraphs using Python
- Modify Content Controls Wrapping Table Rows using Python
- Modify Content Controls Wrapping Table Cells using Python
- Modify Content Controls within Table Cells using Python
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed from the VS Code terminal with the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Modify Content Controls in the Body using Python
In Spire.Doc, the object type for the body content control is StructureDocumentTag. To modify these controls, one needs to traverse the Section.Body.ChildObjects collection to locate objects of type StructureDocumentTag. Below are the detailed steps:
- Create a Document object.
- Use the Document.LoadFromFile() method to load a Word document into memory.
- Retrieve the body of a section in the document using Section.Body.
- Traverse the collection of child objects within Body.ChildObjects, identifying those that are of type StructureDocumentTag.
- Within the StructureDocumentTag.ChildObjects sub-collection, perform modifications based on the type of each child object.
- Finally, utilize the Document.SaveToFile() method to save the changes back to the Word document.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new document object
doc = Document()

# Load the document content from a file
doc.LoadFromFile("Sample1.docx")

# Get the body of the document
body = doc.Sections.get_Item(0).Body

# Create lists for paragraphs and tables
paragraphs = []
tables = []
for i in range(body.ChildObjects.Count):
    obj = body.ChildObjects.get_Item(i)
    # If it is a StructureDocumentTag object
    if obj.DocumentObjectType == DocumentObjectType.StructureDocumentTag:
        sdt = (StructureDocumentTag)(obj)
        # If the tag is "c1" or the alias is "c1"
        if sdt.SDTProperties.Tag == "c1" or sdt.SDTProperties.Alias == "c1":
            for j in range(sdt.ChildObjects.Count):
                child_obj = sdt.ChildObjects.get_Item(j)
                # If it is a paragraph object
                if child_obj.DocumentObjectType == DocumentObjectType.Paragraph:
                    paragraphs.append(child_obj)
                # If it is a table object
                elif child_obj.DocumentObjectType == DocumentObjectType.Table:
                    tables.append(child_obj)

# Modify the text content of the first paragraph
if paragraphs:
    (Paragraph)(paragraphs[0]).Text = "Spire.Doc for Python is a totally independent Python Word class library which doesn't require Microsoft Office installed on system."

if tables:
    # Reset the cells of the first table
    (Table)(tables[0]).ResetCells(5, 4)

# Save the modified document to a file
doc.SaveToFile("ModifyBodyContentControls.docx", FileFormat.Docx2016)

# Release document resources
doc.Close()
doc.Dispose()
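The filter-by-type-then-by-tag pattern used above can be explored without a Word file. The sketch below is a library-free illustration only: `Node` is a stand-in data class (not a Spire.Doc type), and `collect_tagged_children` is a hypothetical helper that mirrors how the example sorts the children of a matching content control into paragraphs and tables.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a document object; not a Spire.Doc class."""
    kind: str  # e.g. "StructureDocumentTag", "Paragraph", "Table"
    tag: str = ""
    children: list = field(default_factory=list)


def collect_tagged_children(body, tag):
    """Group the children of matching content controls by their kind."""
    found = {}
    for obj in body.children:
        if obj.kind == "StructureDocumentTag" and obj.tag == tag:
            for child in obj.children:
                found.setdefault(child.kind, []).append(child)
    return found


# Assumed sample tree: one plain paragraph and two content controls
body = Node("Body", children=[
    Node("Paragraph"),
    Node("StructureDocumentTag", tag="c1",
         children=[Node("Paragraph"), Node("Table")]),
    Node("StructureDocumentTag", tag="c2", children=[Node("Paragraph")]),
])
result = collect_tagged_children(body, "c1")
counts = {kind: len(nodes) for kind, nodes in result.items()}
print(counts)
```

Only the children of the control tagged "c1" are collected; the top-level paragraph and the "c2" control are skipped.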
Modify Content Controls within Paragraphs using Python
In Spire.Doc, the object type for content controls within paragraphs is StructureDocumentTagInline. To modify these, you would traverse the Paragraph.ChildObjects collection to locate objects of type StructureDocumentTagInline. Here are the detailed steps:
- Instantiate a Document object.
- Load a Word document using the Document.LoadFromFile() method.
- Get the body of a section in the document via Section.Body.
- Retrieve the first paragraph of the text body using Body.Paragraphs.get_Item(0).
- Traverse the collection of child objects within Paragraph.ChildObjects, identifying those that are of type StructureDocumentTagInline.
- Within the StructureDocumentTagInline.ChildObjects sub-collection, execute modification operations according to the type of each child object.
- Save the changes back to the Word document using the Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new Document object
doc = Document()

# Load document content from a file
doc.LoadFromFile("Sample2.docx")

# Get the body of the document
body = doc.Sections.get_Item(0).Body

# Get the first paragraph in the body
paragraph = body.Paragraphs.get_Item(0)

# Iterate through child objects in the paragraph
for i in range(paragraph.ChildObjects.Count):
    obj = paragraph.ChildObjects.get_Item(i)
    # Check if the child object is StructureDocumentTagInline
    if obj.DocumentObjectType == DocumentObjectType.StructureDocumentTagInline:
        # Convert the child object to StructureDocumentTagInline type
        structure_document_tag_inline = (StructureDocumentTagInline)(obj)
        # Check if the Tag property is "text1"
        if structure_document_tag_inline.SDTProperties.Tag == "text1":
            # Iterate through child objects in the StructureDocumentTagInline object
            for j in range(structure_document_tag_inline.ChildObjects.Count):
                obj2 = structure_document_tag_inline.ChildObjects.get_Item(j)
                # Check if the child object is a TextRange object
                if obj2.DocumentObjectType == DocumentObjectType.TextRange:
                    # Convert the child object to TextRange type
                    # (named text_range so the built-in range() is not shadowed)
                    text_range = (TextRange)(obj2)
                    # Set the text content to a specified content
                    text_range.Text = "97-2003/2007/2010/2013/2016/2019"
        # Check if the Tag property is "logo1"
        if structure_document_tag_inline.SDTProperties.Tag == "logo1":
            # Iterate through child objects in the StructureDocumentTagInline object
            for j in range(structure_document_tag_inline.ChildObjects.Count):
                obj2 = structure_document_tag_inline.ChildObjects.get_Item(j)
                # Check if the child object is an image
                if obj2.DocumentObjectType == DocumentObjectType.Picture:
                    # Convert the child object to DocPicture type
                    doc_picture = (DocPicture)(obj2)
                    # Load a specified image
                    doc_picture.LoadImage("DOC-Python.png")
                    # Set the width and height of the image
                    doc_picture.Width = 100
                    doc_picture.Height = 100

# Save the modified document to a new file
doc.SaveToFile("ModifiedContentControlsInParagraph.docx", FileFormat.Docx2016)

# Release resources of the Document object
doc.Close()
doc.Dispose()
Modify Content Controls Wrapping Table Rows using Python
In Spire.Doc, the object type for content controls within table rows is StructureDocumentTagRow. To modify these controls, you need to traverse the Table.ChildObjects collection to find objects of type StructureDocumentTagRow. Here are the detailed steps:
- Create a Document object.
- Load a Word document using the Document.LoadFromFile() method.
- Retrieve the body of a section within the document using Section.Body.
- Obtain the first table in the text body via Body.Tables.get_Item(0).
- Traverse the collection of child objects within Table.ChildObjects, identifying those that are of type StructureDocumentTagRow.
- Access StructureDocumentTagRow.Cells collection to iterate through the cells within this controlled row, and then execute the appropriate modification actions on the cell contents.
- Lastly, use the Document.SaveToFile() method to persist the changes made to the document.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new document object
doc = Document()

# Load the document from a file
doc.LoadFromFile("Sample3.docx")

# Get the body of the document
body = doc.Sections.get_Item(0).Body

# Get the first table
table = body.Tables.get_Item(0)

# Iterate through the child objects in the table
for i in range(table.ChildObjects.Count):
    obj = table.ChildObjects.get_Item(i)
    # Check if the child object is of type StructureDocumentTagRow
    if obj.DocumentObjectType == DocumentObjectType.StructureDocumentTagRow:
        # Convert the child object to a StructureDocumentTagRow object
        structureDocumentTagRow = (StructureDocumentTagRow)(obj)
        # Check if the Tag property of the StructureDocumentTagRow is "row1"
        if structureDocumentTagRow.SDTProperties.Tag == "row1":
            # Clear the paragraphs in the first cell
            structureDocumentTagRow.Cells.get_Item(0).Paragraphs.Clear()
            # Add a paragraph in the cell and set the text
            textRange = structureDocumentTagRow.Cells.get_Item(0).AddParagraph().AppendText("Arts")
            textRange.CharacterFormat.TextColor = Color.get_Blue()

# Save the modified document to a file
doc.SaveToFile("ModifiedTableRowContentControl.docx", FileFormat.Docx2016)

# Release document resources
doc.Close()
doc.Dispose()
Modify Content Controls Wrapping Table Cells using Python
In Spire.Doc, the object type for content controls within table cells is StructureDocumentTagCell. To manipulate these controls, you need to traverse the TableRow.ChildObjects collection to locate objects of type StructureDocumentTagCell. Here are the detailed steps:
- Create a Document object.
- Load a Word document using the Document.LoadFromFile() method.
- Retrieve the body of a section in the document using Section.Body.
- Obtain the first table in the body using Body.Tables.get_Item(0).
- Traverse the collection of rows in the table.
- Within each TableRow, traverse its child objects TableRow.ChildObjects to identify those of type StructureDocumentTagCell.
- Access StructureDocumentTagCell.Paragraphs collection. This allows you to iterate through the paragraphs within the cell and apply the necessary modification operations to the content.
- Finally, use the Document.SaveToFile() method to save the modified document.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new document object
doc = Document()

# Load the document from a file
doc.LoadFromFile("Sample4.docx")

# Get the body of the document
body = doc.Sections.get_Item(0).Body

# Get the first table in the document
table = body.Tables.get_Item(0)

# Iterate through the rows of the table
for i in range(table.Rows.Count):
    row = table.Rows.get_Item(i)
    # Iterate through the child objects in each row
    for j in range(row.ChildObjects.Count):
        obj = row.ChildObjects.get_Item(j)
        # Check if the child object is a StructureDocumentTagCell
        if obj.DocumentObjectType == DocumentObjectType.StructureDocumentTagCell:
            # Convert the child object to StructureDocumentTagCell type
            structureDocumentTagCell = (StructureDocumentTagCell)(obj)
            # Check if the Tag property of structureDocumentTagCell is "cell1"
            if structureDocumentTagCell.SDTProperties.Tag == "cell1":
                # Clear the paragraphs in the cell
                structureDocumentTagCell.Paragraphs.Clear()
                # Add a new paragraph and add text to it
                textRange = structureDocumentTagCell.AddParagraph().AppendText("92")
                textRange.CharacterFormat.TextColor = Color.get_Blue()

# Save the modified document to a new file
doc.SaveToFile("ModifiedTableCellContentControl.docx", FileFormat.Docx2016)

# Dispose of the document object
doc.Close()
doc.Dispose()
Modify Content Controls within Table Cells using Python
This case demonstrates modifying content controls within paragraphs inside table cells. The process involves navigating to the paragraph collection TableCell.Paragraphs within each cell, then iterating through each paragraph's child objects (Paragraph.ChildObjects) to locate StructureDocumentTagInline objects for modification. Here are the detailed steps:
- Initiate a Document instance.
- Use the Document.LoadFromFile() method to load a Word document.
- Retrieve the body of a section in the document with Section.Body.
- Obtain the first table in the body via Body.Tables.get_Item(0).
- Traverse the table rows collection (Table.Rows), engaging with each TableRow object.
- For each TableRow, navigate its cells collection (TableRow.Cells), entering each TableCell object.
- Within each TableCell, traverse its paragraph collection (TableCell.Paragraphs), examining each Paragraph object.
- In each paragraph, traverse its child objects (Paragraph.ChildObjects), identifying StructureDocumentTagInline instances for modification.
- Within the StructureDocumentTagInline.ChildObjects collection, apply the appropriate edits based on the type of each child object.
- Finally, utilize Document.SaveToFile() to commit the changes to the document.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new Document object
doc = Document()

# Load document content from file
doc.LoadFromFile("Sample5.docx")

# Get the body of the document
body = doc.Sections.get_Item(0).Body

# Get the first table
table = body.Tables.get_Item(0)

# Iterate through the rows, cells, and paragraphs of the table
for r in range(table.Rows.Count):
    row = table.Rows.get_Item(r)
    for c in range(row.Cells.Count):
        cell = row.Cells.get_Item(c)
        for p in range(cell.Paragraphs.Count):
            paragraph = cell.Paragraphs.get_Item(p)
            for i in range(paragraph.ChildObjects.Count):
                obj = paragraph.ChildObjects.get_Item(i)
                # Check if the child object is of type StructureDocumentTagInline
                if obj.DocumentObjectType == DocumentObjectType.StructureDocumentTagInline:
                    # Convert to StructureDocumentTagInline object
                    structure_document_tag_inline = (StructureDocumentTagInline)(obj)
                    # Check if the Tag property of StructureDocumentTagInline is "test1"
                    if structure_document_tag_inline.SDTProperties.Tag == "test1":
                        # Iterate through the child objects of StructureDocumentTagInline
                        for j in range(structure_document_tag_inline.ChildObjects.Count):
                            obj2 = structure_document_tag_inline.ChildObjects.get_Item(j)
                            # Check if the child object is of type TextRange
                            if obj2.DocumentObjectType == DocumentObjectType.TextRange:
                                # Convert to TextRange object
                                textRange = (TextRange)(obj2)
                                # Set the text content
                                textRange.Text = "89"
                                # Set text color
                                textRange.CharacterFormat.TextColor = Color.get_Blue()

# Save the modified document to a new file
doc.SaveToFile("ModifiedContentControlInParagraphOfTableCell.docx", FileFormat.Docx2016)

# Dispose of the Document object resources
doc.Close()
doc.Dispose()
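The four nested loops above walk a fixed path (row, cell, paragraph, child object). As an alternative way to think about the same search, the sketch below uses a small recursive generator over plain dictionaries. It is not Spire.Doc code: the dictionaries are stand-ins for document objects, and `iter_nodes` is a hypothetical helper.

```python
def iter_nodes(node, kind):
    """Yield every descendant of `node` whose kind matches, depth-first."""
    for child in node.get("children", []):
        if child["kind"] == kind:
            yield child
        yield from iter_nodes(child, kind)


# Assumed sample tree: one row with two cells, each holding an inline control
table = {"kind": "Table", "children": [
    {"kind": "TableRow", "children": [
        {"kind": "TableCell", "children": [
            {"kind": "Paragraph", "children": [
                {"kind": "StructureDocumentTagInline", "tag": "test1",
                 "children": [{"kind": "TextRange", "text": "old"}]},
            ]},
        ]},
        {"kind": "TableCell", "children": [
            {"kind": "Paragraph", "children": [
                {"kind": "StructureDocumentTagInline", "tag": "other",
                 "children": [{"kind": "TextRange", "text": "keep"}]},
            ]},
        ]},
    ]},
]}

# Update only the text ranges inside the control tagged "test1"
for sdt in iter_nodes(table, "StructureDocumentTagInline"):
    if sdt.get("tag") == "test1":
        for text_range in iter_nodes(sdt, "TextRange"):
            text_range["text"] = "89"

texts = [t["text"] for t in iter_nodes(table, "TextRange")]
print(texts)
```

A recursive walk like this finds matching controls at any depth, whereas the explicit loops make the expected document layout visible in the code; which is preferable depends on how regular the documents are.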
Python: Create a Table Of Contents for a Newly Created Word Document
Creating a table of contents in a Word document significantly enhances its navigability and readability. It serves as a road map for the document, enabling readers to quickly overview the structure and grasp the content framework. This feature facilitates easy navigation for users to jump to any section within the document, which is particularly valuable for lengthy reports, papers, or manuals. It not only saves readers time in locating information but also augments the professionalism of the document and enhances the user experience. Moreover, a table of contents is easy to maintain and update; following any restructuring of the document, it can be swiftly revised to reflect the latest content organization, ensuring coherence and accuracy throughout the document. This article will demonstrate how to use Spire.Doc for Python to create a table of contents in a newly created Word document within a Python project.
- Python Create a Table Of Contents Using Heading Styles
- Python Create a Table Of Contents Using Outline Level Styles
- Python Create a Table Of Contents Using Image Captions
- Python Create a Table Of Contents Using Table Captions
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows
Python Create a Table Of Contents Using Heading Styles
Using heading styles is the default way to generate a table of contents in Word: you mark titles and subtitles with heading styles of different levels, and Word's table of contents feature then collects them automatically. Here are the detailed steps:
- Create a Document object.
- Add a section using the Document.AddSection() method.
- Add a paragraph using the Section.AddParagraph() method.
- Create a table of contents object using the Paragraph.AppendTOC(int lowerLevel, int upperLevel) method.
- Create a CharacterFormat object and set the font.
- Apply a heading style to the paragraph using the Paragraph.ApplyStyle(BuiltinStyle.Heading1) method.
- Add text content using the Paragraph.AppendText() method.
- Apply character formatting to the text using the TextRange.ApplyCharacterFormat() method.
- Update the table of contents using the Document.UpdateTableOfContents() method.
- Save the document using the Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new document object
doc = Document()

# Add a section to the document
section = doc.AddSection()

# Append a Table of Contents (TOC) paragraph
TOC_paragraph = section.AddParagraph()
TOC_paragraph.AppendTOC(1, 3)

# Create and set character format objects for font
character_format1 = CharacterFormat(doc)
character_format1.FontName = "Microsoft YaHei"
character_format2 = CharacterFormat(doc)
character_format2.FontName = "Microsoft YaHei"
character_format2.FontSize = 12

# Add a paragraph with Heading 1 style
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(BuiltinStyle.Heading1)

# Add text and apply character formatting
text_range1 = paragraph.AppendText("Overview")
text_range1.ApplyCharacterFormat(character_format1)

# Insert normal content
paragraph = section.Body.AddParagraph()
text_range2 = paragraph.AppendText("Spire.Doc for Python is a professional Python Word development component that enables developers to easily integrate Word document creation, reading, editing, and conversion functionalities into their own Python applications. As a completely standalone component, Spire.Doc for Python does not require the installation of Microsoft Word on the runtime environment.")

# Add a paragraph with Heading 1 style
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(BuiltinStyle.Heading1)
text_range1 = paragraph.AppendText("Main Functions")
text_range1.ApplyCharacterFormat(character_format1)

# Add a paragraph with Heading 2 style
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(BuiltinStyle.Heading2)
textRange1 = paragraph.AppendText("Only Spire.Doc, No Microsoft Office Automation")
textRange1.ApplyCharacterFormat(character_format1)

# Add regular content
paragraph = section.Body.AddParagraph()
textRange2 = paragraph.AppendText("Spire.Doc for Python is a totally independent Python Word class library which doesn't require Microsoft Office installed on system. Microsoft Office Automation is proved to be unstable, slow and not scalable to produce MS Word documents. Spire.Doc for Python is many times faster than Microsoft Word Automation and with much better stability and scalability.")
textRange2.ApplyCharacterFormat(character_format2)

# Add a paragraph with Heading 3 style
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(BuiltinStyle.Heading3)
textRange1 = paragraph.AppendText("Word Versions")
textRange1.ApplyCharacterFormat(character_format1)
paragraph = section.Body.AddParagraph()
textRange2 = paragraph.AppendText("Word97-03 Word2007 Word2010 Word2013 Word2016 Word2019")
textRange2.ApplyCharacterFormat(character_format2)

# Add a paragraph with Heading 2 style
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(BuiltinStyle.Heading2)
textRange1 = paragraph.AppendText("Convert File Documents with High Quality")
textRange1.ApplyCharacterFormat(character_format1)

# Add regular content
paragraph = section.Body.AddParagraph()
textRange2 = paragraph.AppendText("By using Spire.Doc for Python, users can save Word Doc/Docx to stream, save as web response and convert Word Doc/Docx to XML, Markdown, RTF, EMF, TXT, XPS, EPUB, HTML, SVG, ODT and vice versa. Spire.Doc for Python also supports to convert Word Doc/Docx to PDF and HTML to image.")
textRange2.ApplyCharacterFormat(character_format2)

# Add a paragraph with Heading 2 style
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(BuiltinStyle.Heading2)
textRange1 = paragraph.AppendText("Other Technical Features")
textRange1.ApplyCharacterFormat(character_format1)

# Add regular content
paragraph = section.Body.AddParagraph()
textRange2 = paragraph.AppendText("By using Spire.Doc for Python, developers can build any type of a 64-bit Python application to create and handle Word documents.")
textRange2.ApplyCharacterFormat(character_format2)

# Update the table of contents
doc.UpdateTableOfContents()

# Save the document
doc.SaveToFile("CreateTOCUsingHeadingStyles.docx", FileFormat.Docx2016)

# Release resources
doc.Dispose()
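To make the effect of the two level arguments in AppendTOC(1, 3) more concrete, the sketch below builds a plain-text outline from the headings used in the example. It is a conceptual illustration only, written without Spire.Doc: `build_toc` is a hypothetical helper, and a real table of contents also carries page numbers and field codes that are not modeled here.

```python
def build_toc(headings, lower_level, upper_level):
    """Return indented TOC lines for headings within the level range."""
    lines = []
    for level, title in headings:
        if lower_level <= level <= upper_level:
            indent = "  " * (level - lower_level)
            lines.append(f"{indent}{title}")
    return lines


# (level, title) pairs matching the headings in the example above
headings = [
    (1, "Overview"),
    (1, "Main Functions"),
    (2, "Only Spire.Doc, No Microsoft Office Automation"),
    (3, "Word Versions"),
    (2, "Convert File Documents with High Quality"),
    (2, "Other Technical Features"),
]
toc_all = build_toc(headings, 1, 3)
toc_shallow = build_toc(headings, 1, 2)
print("\n".join(toc_all))
```

With the range narrowed to levels 1 through 2, the Heading 3 entry "Word Versions" drops out of the outline while the other entries stay in place.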
Python Create a Table Of Contents Using Outline Level Styles
In a Word document, you can create a table of contents using outline level styles. You assign an outline level to a paragraph style through the ParagraphFormat.OutlineLevel property, and then map each style to a level in the table of contents using the TableOfContent.SetTOCLevelStyle() method. Here are the detailed steps:
- Create a Document object.
- Add a section using the Document.AddSection() method.
- Create a ParagraphStyle object and set the outline level using ParagraphStyle.ParagraphFormat.OutlineLevel = OutlineLevel.Level1.
- Add the created ParagraphStyle object to the document using the Document.Styles.Add() method.
- Add a paragraph using the Section.AddParagraph() method.
- Create a table of contents object using the Paragraph.AppendTOC(int lowerLevel, int upperLevel) method.
- Turn off the default behavior of building the table of contents from heading styles by setting TableOfContent.UseHeadingStyles = False.
- Apply the outline level style to the table of contents rules using the TableOfContent.SetTOCLevelStyle(int levelNumber, string styleName) method.
- Create a CharacterFormat object and set the font.
- Apply the style to the paragraph using the Paragraph.ApplyStyle(ParagraphStyle.Name) method.
- Add text content using the Paragraph.AppendText() method.
- Apply character formatting to the text using the TextRange.ApplyCharacterFormat() method.
- Update the table of contents using the Document.UpdateTableOfContents() method.
- Save the document using the Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a document object
doc = Document()

# Add a section to the document
section = doc.AddSection()

# Define Outline Level 1
titleStyle1 = ParagraphStyle(doc)
titleStyle1.Name = "T1S"
titleStyle1.ParagraphFormat.OutlineLevel = OutlineLevel.Level1
titleStyle1.CharacterFormat.Bold = True
titleStyle1.CharacterFormat.FontName = "Microsoft YaHei"
titleStyle1.CharacterFormat.FontSize = 18
titleStyle1.ParagraphFormat.HorizontalAlignment = HorizontalAlignment.Left
doc.Styles.Add(titleStyle1)

# Define Outline Level 2
titleStyle2 = ParagraphStyle(doc)
titleStyle2.Name = "T2S"
titleStyle2.ParagraphFormat.OutlineLevel = OutlineLevel.Level2
titleStyle2.CharacterFormat.Bold = True
titleStyle2.CharacterFormat.FontName = "Microsoft YaHei"
titleStyle2.CharacterFormat.FontSize = 16
titleStyle2.ParagraphFormat.HorizontalAlignment = HorizontalAlignment.Left
doc.Styles.Add(titleStyle2)

# Define Outline Level 3
titleStyle3 = ParagraphStyle(doc)
titleStyle3.Name = "T3S"
titleStyle3.ParagraphFormat.OutlineLevel = OutlineLevel.Level3
titleStyle3.CharacterFormat.Bold = True
titleStyle3.CharacterFormat.FontName = "Microsoft YaHei"
titleStyle3.CharacterFormat.FontSize = 14
titleStyle3.ParagraphFormat.HorizontalAlignment = HorizontalAlignment.Left
doc.Styles.Add(titleStyle3)

# Add a paragraph
TOCparagraph = section.AddParagraph()
toc = TOCparagraph.AppendTOC(1, 3)
toc.UseHeadingStyles = False
toc.UseHyperlinks = True
toc.UseTableEntryFields = False
toc.RightAlignPageNumbers = True
toc.SetTOCLevelStyle(1, titleStyle1.Name)
toc.SetTOCLevelStyle(2, titleStyle2.Name)
toc.SetTOCLevelStyle(3, titleStyle3.Name)

# Define character format
characterFormat = CharacterFormat(doc)
characterFormat.FontName = "Microsoft YaHei"
characterFormat.FontSize = 12

# Add a paragraph and apply outline level style 1
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(titleStyle1.Name)
paragraph.AppendText("Overview")

# Add a paragraph and set the text content
paragraph = section.Body.AddParagraph()
textRange = paragraph.AppendText("Spire.Doc for Python is a professional Word Python API specifically designed for developers to create, read, write, convert, and compare Word documents with fast and high-quality performance.")
textRange.ApplyCharacterFormat(characterFormat)

# Add a paragraph and apply outline level style 1
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(titleStyle1.Name)
paragraph.AppendText("Main Functions")

# Add a paragraph and apply outline level style 2
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(titleStyle2.Name)
paragraph.AppendText("Only Spire.Doc, No Microsoft Office Automation")

# Add a paragraph and set the text content
paragraph = section.Body.AddParagraph()
textRange = paragraph.AppendText("Spire.Doc for Python is a totally independent Python Word class library which doesn't require Microsoft Office installed on system. Microsoft Office Automation is proved to be unstable, slow and not scalable to produce MS Word documents. Spire.Doc for Python is many times faster than Microsoft Word Automation and with much better stability and scalability.")
textRange.ApplyCharacterFormat(characterFormat)

# Add a paragraph and apply outline level style 3
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(titleStyle3.Name)
paragraph.AppendText("Word Versions")

# Add a paragraph and set the text content
paragraph = section.Body.AddParagraph()
textRange = paragraph.AppendText("Word97-03 Word2007 Word2010 Word2013 Word2016 Word2019")
textRange.ApplyCharacterFormat(characterFormat)

# Add a paragraph and apply outline level style 2
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(titleStyle2.Name)
paragraph.AppendText("Convert File Documents with High Quality")

# Add a paragraph and set the text content
paragraph = section.Body.AddParagraph()
textRange = paragraph.AppendText("By using Spire.Doc for Python, users can save Word Doc/Docx to stream, save as web response and convert Word Doc/Docx to XML, RTF, EMF, TXT, XPS, EPUB, HTML, SVG, ODT and vice versa. Spire.Doc for Python also supports to convert Word Doc/Docx to PDF and HTML to image.")
textRange.ApplyCharacterFormat(characterFormat)

# Add a paragraph and apply outline level style 2
paragraph = section.Body.AddParagraph()
paragraph.ApplyStyle(titleStyle2.Name)
paragraph.AppendText("Other Technical Features")

# Add a paragraph and set the text content
paragraph = section.Body.AddParagraph()
textRange = paragraph.AppendText("By using Spire.Doc for Python, developers can build any type of a 64-bit Python application to create and handle Word documents.")
textRange.ApplyCharacterFormat(characterFormat)

# Update the table of contents
doc.UpdateTableOfContents()

# Save the document
doc.SaveToFile("CreateTOCUsingOutlineStyles.docx", FileFormat.Docx2016)

# Release resources
doc.Dispose()
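To make the level mapping easier to picture, here is a minimal sketch in plain Python that does not use Spire.Doc at all. It assumes a simplified model in which each paragraph carries an optional outline level, and it shows how a lower and upper level bound (like the two arguments passed to AppendTOC) would select and indent entries. The names Para and build_toc are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Para:
    """Hypothetical stand-in for a paragraph: text plus an optional outline level."""
    text: str
    outline_level: Optional[int]  # None means ordinary body text


def build_toc(paragraphs: List[Para], lower_level: int = 1, upper_level: int = 3) -> List[str]:
    """Return indented entries for paragraphs whose level lies in [lower_level, upper_level]."""
    entries = []
    for p in paragraphs:
        if p.outline_level is None:
            continue  # body text never appears in the table of contents
        if lower_level <= p.outline_level <= upper_level:
            indent = "  " * (p.outline_level - lower_level)
            entries.append(indent + p.text)
    return entries


paragraphs = [
    Para("Overview", 1),
    Para("Body text under Overview", None),
    Para("Main Functions", 1),
    Para("Only Spire.Doc, No Microsoft Office Automation", 2),
    Para("Word Versions", 3),
    Para("Convert File Documents with High Quality", 2),
]

for line in build_toc(paragraphs, 1, 3):
    print(line)
```

Narrowing the range, for example build_toc(paragraphs, 1, 2), drops the level 3 entry while leaving the rest untouched, which is the effect one would expect from a smaller level range in the real table of contents.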
Python Create a Table Of Contents Using Image Captions
Using the Spire.Doc library, you can create a table of contents based on image captions by constructing the table of contents with TableOfContent(Document, " \\h \\z \\c \"Picture\""). Below are the detailed steps:
- Create a Document object.
- Add a section using the Document.AddSection() method.
- Create a table of contents object with tocForImage = TableOfContent(Document, " \\h \\z \\c \"Picture\"") and specify the style of the table of contents.
- Add a paragraph using the Section.AddParagraph() method.
- Add the table of content object to the paragraph using the Paragraph.Items.Add(tocForImage) method.
- Add a field separator using the Paragraph.AppendFieldMark(FieldMarkType.FieldSeparator) method.
- Add the text content "TOC" using the Paragraph.AppendText("TOC") method.
- Add a field end mark using the Paragraph.AppendFieldMark(FieldMarkType.FieldEnd) method.
- Add an image using the Paragraph.AppendPicture() method.
- Add a caption paragraph for the image using the DocPicture.AddCaption() method, including product information and formatting.
- Update the table of contents to reflect changes in the document using the Document.UpdateTableOfContents(tocForImage) method.
- Save the document using the Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new document object
doc = Document()

# Add a section to the document
section = doc.AddSection()

# Create a table of content object for images
tocForImage = TableOfContent(doc, " \\h \\z \\c \"Picture\"")

# Add a paragraph to the section
tocParagraph = section.Body.AddParagraph()

# Add the TOC object to the paragraph
tocParagraph.Items.Add(tocForImage)

# Add a field separator
tocParagraph.AppendFieldMark(FieldMarkType.FieldSeparator)

# Add text content
tocParagraph.AppendText("TOC")

# Add a field end mark
tocParagraph.AppendFieldMark(FieldMarkType.FieldEnd)

# Add a blank paragraph to the section
section.Body.AddParagraph()

# Add a paragraph to the section
paragraph = section.Body.AddParagraph()

# Add an image
docPicture = paragraph.AppendPicture("images/DOC-Python.png")
docPicture.Width = 100
docPicture.Height = 100

# Add a caption paragraph for the image
obj = docPicture.AddCaption("Picture", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" Spire.Doc for Python product")
paragraph.Format.AfterSpacing = 20

# Continue adding paragraphs to the section
paragraph = section.Body.AddParagraph()
docPicture = paragraph.AppendPicture("images/PDF-Python.png")
docPicture.Width = 100
docPicture.Height = 100
obj = docPicture.AddCaption("Picture", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" Spire.PDF for Python product")
paragraph.Format.AfterSpacing = 20

paragraph = section.Body.AddParagraph()
docPicture = paragraph.AppendPicture("images/XLS-Python.png")
docPicture.Width = 100
docPicture.Height = 100
obj = docPicture.AddCaption("Picture", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" Spire.XLS for Python product")
paragraph.Format.AfterSpacing = 20

paragraph = section.Body.AddParagraph()
docPicture = paragraph.AppendPicture("images/PPT-Python.png")
docPicture.Width = 100
docPicture.Height = 100
obj = docPicture.AddCaption("Picture", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" Spire.Presentation for Python product")
paragraph.Format.AfterSpacing = 20

# Update the table of contents
doc.UpdateTableOfContents(tocForImage)

# Save the document to a file
doc.SaveToFile("CreateTOCWithImageCaptions.docx", FileFormat.Docx2016)

# Dispose of the document object
doc.Dispose()
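The string passed to TableOfContent is a set of Word TOC field switches: \h inserts the entries as hyperlinks, \z hides tab leaders and page numbers in Web layout view, and \c "Label" lists the items captioned with that label. The helper below is a small sketch in plain Python that assembles the same string; caption_toc_switches is a hypothetical name and is not part of Spire.Doc.

```python
def caption_toc_switches(label: str, hyperlinks: bool = True,
                         hide_in_web_layout: bool = True) -> str:
    """Build the TOC field switch string for a caption label (illustrative helper)."""
    if not label or '"' in label:
        raise ValueError("label must be non-empty and must not contain double quotes")
    parts = []
    if hyperlinks:
        parts.append("\\h")          # entries become hyperlinks
    if hide_in_web_layout:
        parts.append("\\z")          # hide tab leader and page numbers in Web layout view
    parts.append('\\c "{}"'.format(label))  # collect items captioned with this label
    # A leading space matches the string used in the example above
    return " " + " ".join(parts)


print(caption_toc_switches("Picture"))
print(caption_toc_switches("Table"))
```

Building the string in one place keeps the escaping consistent when the same approach is reused for other caption labels.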
Python Create a Table Of Contents Using Table Captions
Similarly, you can create a table of contents based on table captions by constructing the table of contents with TableOfContent(Document, " \\h \\z \\c \"Table\""). Here are the detailed steps:
- Create a Document object.
- Add a section using the Document.AddSection() method.
- Create a table of contents object with tocForTable = TableOfContent(Document, " \\h \\z \\c \"Table\"") and specify the style of the table of contents.
- Add a paragraph using the Section.AddParagraph() method.
- Add the table of content object to the paragraph using the Paragraph.Items.Add(tocForTable) method.
- Add a field separator using the Paragraph.AppendFieldMark(FieldMarkType.FieldSeparator) method.
- Add the text content "TOC" using the Paragraph.AppendText("TOC") method.
- Add a field end mark using the Paragraph.AppendFieldMark(FieldMarkType.FieldEnd) method.
- Add a table using the Section.AddTable() method and set the number of rows and columns using the Table.ResetCells(int rowsNum, int columnsNum) method.
- Add a table caption paragraph using the Table.AddCaption() method, including product information and formatting.
- Update the table of contents to reflect changes in the document using the Document.UpdateTableOfContents(tocForTable) method.
- Save the document using the Document.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *

# Create a new document
doc = Document()

# Add a section to the document
section = doc.AddSection()

# Create a TableOfContent object
tocForTable = TableOfContent(doc, " \\h \\z \\c \"Table\"")

# Add a paragraph in the section to place the TableOfContent object
tocParagraph = section.Body.AddParagraph()
tocParagraph.Items.Add(tocForTable)
tocParagraph.AppendFieldMark(FieldMarkType.FieldSeparator)
tocParagraph.AppendText("TOC")
tocParagraph.AppendFieldMark(FieldMarkType.FieldEnd)

# Add two empty paragraphs in the section
section.Body.AddParagraph()
section.Body.AddParagraph()

# Add a table in the section
table = section.Body.AddTable(True)
table.ResetCells(1, 3)

# Add a caption paragraph for the table
obj = table.AddCaption("Table", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" One row three columns")
paragraph.Format.AfterSpacing = 20

# Add a new table in the section
table = section.Body.AddTable(True)
table.ResetCells(3, 3)

# Add a caption paragraph for the second table
obj = table.AddCaption("Table", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" Three rows three columns")
paragraph.Format.AfterSpacing = 20

# Add another new table in the section
table = section.Body.AddTable(True)
table.ResetCells(5, 3)

# Add a caption paragraph for the third table
obj = table.AddCaption("Table", CaptionNumberingFormat.Number, CaptionPosition.BelowItem)
paragraph = Paragraph(obj)
paragraph.AppendText(" Five rows three columns")
paragraph.Format.AfterSpacing = 20

# Update the table of contents
doc.UpdateTableOfContents(tocForTable)

# Save the document to a specified file
doc.SaveToFile("CreateTOCUsingTableCaptions.docx", FileFormat.Docx2016)

# Dispose resources
doc.Dispose()
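As a rough mental model of what a caption-based table of contents collects, the following plain-Python sketch numbers captions per label in document order and then lists only the entries for one label. It assumes captions are numbered independently for each label, as Word does with caption sequences; number_captions and caption_toc are hypothetical names and do not come from Spire.Doc.

```python
from typing import Dict, List, Tuple


def number_captions(captions: List[Tuple[str, str]]) -> List[Tuple[str, int, str]]:
    """Assign a running number to each caption, counted separately per label."""
    counters: Dict[str, int] = {}
    numbered = []
    for label, text in captions:
        counters[label] = counters.get(label, 0) + 1
        numbered.append((label, counters[label], text))
    return numbered


def caption_toc(captions: List[Tuple[str, str]], label: str) -> List[str]:
    """Return the entries a caption-based table of contents would list for one label."""
    return [
        "{} {} {}".format(lbl, number, text)
        for lbl, number, text in number_captions(captions)
        if lbl == label
    ]


captions = [
    ("Table", "One row three columns"),
    ("Picture", "Spire.Doc for Python product"),
    ("Table", "Three rows three columns"),
    ("Table", "Five rows three columns"),
]

for entry in caption_toc(captions, "Table"):
    print(entry)
```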
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Rearrange Slides in a PowerPoint Document
Rearranging slides in a PowerPoint presentation is a simple but essential skill. Whether you need to change the order of your points, group related slides together, or move a slide to a different location, the ability to efficiently reorganize your slides can help you create a more coherent and impactful presentation.
In this article, you will learn how to rearrange slides in a PowerPoint document in Python using Spire.Presentation for Python.
Install Spire.Presentation for Python
This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your system through the following pip command.
pip install Spire.Presentation
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows
Rearrange Slides in a PowerPoint Document in Python
To reorder the slides in a PowerPoint document, create two Presentation objects: one for loading the original document and one for building the new document. Copying the slides from the original document into the new one in the desired sequence rearranges the slide order.
The following are the steps to rearrange slides in a PowerPoint document using Python.
- Create a Presentation object.
- Load a PowerPoint document using the Presentation.LoadFromFile() method.
- Specify the slide order within a list.
- Create another Presentation object for creating a new presentation.
- Add the slides from the original document to the new presentation in the specified order using the Presentation.Slides.AppendBySlide() method.
- Save the new presentation to a PPTX file using the Presentation.SaveToFile() method.
- Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint file
presentation.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.pptx")

# Specify the new slide order within a list
newSlideOrder = [4,2,1,3]

# Create another Presentation object
new_presentation = Presentation()

# Remove the default slide
new_presentation.Slides.RemoveAt(0)

# Iterate through the list
for i in range(len(newSlideOrder)):
    # Add the slides from the original PowerPoint file to the new PowerPoint document in the new order
    new_presentation.Slides.AppendBySlide(presentation.Slides[newSlideOrder[i] - 1])

# Save the new presentation to file
new_presentation.SaveToFile("output/NewOrder.pptx", FileFormat.Pptx2019)

# Dispose resources
presentation.Dispose()
new_presentation.Dispose()
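The order list in the example is 1-based, while the slide collection is indexed from 0, which is why each entry has 1 subtracted before it is used. The sketch below isolates that logic in plain Python with a list standing in for the slides; it also adds a range check that the example above does not perform. The function name reorder is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder(slides: Sequence[T], new_order: Sequence[int]) -> List[T]:
    """Return the slides arranged by a list of 1-based positions."""
    count = len(slides)
    for position in new_order:
        if not 1 <= position <= count:
            raise ValueError(
                "slide position {} is outside 1..{}".format(position, count)
            )
    # Convert each 1-based position to a 0-based index
    return [slides[position - 1] for position in new_order]


slides = ["Slide A", "Slide B", "Slide C", "Slide D"]
print(reorder(slides, [4, 2, 1, 3]))
```

Because the list is applied as given, repeating a position duplicates that slide and leaving one out drops it, so the list should contain every position exactly once when a pure rearrangement is intended.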
Spire.Office 9.5.0 is released
We're excited to announce the release of Spire.Office 9.5.0. In this version, Spire.Doc supports converting Word to PDF while retaining form fields, Spire.XLS supports finding cells based on regular expressions, and Spire.Presentation supports setting the spacing between columns. Meanwhile, many known issues have been fixed. More details are listed below.
In this version, the most recent versions of Spire.Doc, Spire.PDF, Spire.XLS, Spire.Presentation, Spire.Email, Spire.DocViewer, Spire.PDFViewer, Spire.Spreadsheet, Spire.OfficeViewer, Spire.DataExport, and Spire.Barcode are included.
DLL Versions:
- Spire.Doc.dll v12.5.5.0
- Spire.Pdf.dll v10.5.5.0
- Spire.XLS.dll v14.5.3.0
- Spire.Presentation.dll v9.5.3.0
- Spire.Barcode.dll v7.2.9.0
- Spire.Email.dll v6.5.10.0
- Spire.DocViewer.Forms.dll v8.7.10.0
- Spire.PdfViewer.Asp.dll v7.12.14.0
- Spire.PdfViewer.Forms.dll v7.12.14.0
- Spire.Spreadsheet.dll v7.4.6.0
- Spire.OfficeViewer.Forms.dll v8.7.12.0
- Spire.DataExport.dll v4.9.0.0
- Spire.DataExport.ResourceMgr.dll v2.1.0
Here is a list of changes made in this release
Spire.Doc
Category | ID | Description |
New feature | - | Supports preserving FormField (CheckBox, DropDown, TextFormField) and SDT (CheckBox, Text, RichText, DropDownList, ComboBox) data when converting Word to PDF.
ToPdfParameterList.PreserveFormFields = true; |
New feature | - | Supports setting three display modes for comments (Hide, ShowInBalloons, ShowInAnnotations) when converting Word to PDF.
Document.LayoutOptions.CommentDisplayMode = CommentDisplayMode.ShowInAnnotations; |
Bug | SPIREDOC-10152 | Fixes the issue that the table widths and fonts are not correct when converting HTML to Word. |
Bug | SPIREDOC-10363 | Fixes the issue that the program threw "ArgumentOutOfRangeException" when loading Word documents. |
Bug | SPIREDOC-10371 | Fixes the issue that headings were missing after converting Word to HTML. |
Bug | SPIREDOC-10376 | Fixes the issue that table borders were missing after converting Word to HTML. |
Bug | SPIREDOC-10402 | Fixes the issue that tables were missing when converting Word to HTML. |
Bug | SPIREDOC-10421 | Fixes the issue that the program threw "InvalidCastException" exception when comparing Word documents. |
Bug | SPIREDOC-10427 | Fixes the issue that the header was lost after adding a watermark to Word and saving it to PDF. |
Spire.PDF
Category | ID | Description |
Bug | SPIREPDF-6045 | Fixes the issue that the content was blurred when printing PDFs. |
Bug | SPIREPDF-6529 | Fixes the issue that the content was lost after replacing text. |
Bug | SPIREPDF-6654 | Fixes the issue that specific Chinese characters could not be drawn successfully. |
Bug | SPIREPDF-6655 | Fixes the issue that the background turned black after converting PDF to pictures. |
Bug | SPIREPDF-6657 | Fixes the issue that the standard validation failed after converting PDF to PDFA3A. |
Bug | SPIREPDF-6658 | Fixes the issue that some pages became blank after converting PDF to XPS and then converting it to PDF. |
Bug | SPIREPDF-6681 | Fixes the issue that the program threw System.NullReferenceException when converting OFD to PDF. |
Bug | SPIREPDF-6687 | Fixes the issue that the added signature was not visible in the signature menu bar on the left side. |
Bug | SPIREPDF-6697 | Fixes the issue that the content formatting was not correct after converting PDF to PowerPoint. |
Bug | SPIREPDF-6701 | Fixes the issue that the background color was incorrect after converting XPS to SVG. |
Bug | SPIREPDF-6707 | Fixes the issue that text replacement was incorrect. |
Bug | SPIREPDF-6714 | Fixes the issue that the content was lost after converting OFD to PDF. |
Bug | SPIREPDF-6715 | Fixes the issue that replacing text that crosses lines was incorrect. |
Bug | SPIREPDF-6727 | Fixes the issue that the program threw FileNotFoundException when converting OFD to PDF. |
Bug | SPIREPDF-6733 | Fixes the issue that the program threw InvalidOperationException when converting PDF to OFD in multiple threads. |
Bug | SPIREPDF-6734 | Fixes the issue that the color of stamps was not correct after converting OFD to PDF. |
Bug | SPIREPDF-6735 | Fixes the issue that replacing Chinese characters did not succeed. |
Spire.XLS
Category | ID | Description |
New feature | SPIREXLS-5128 | Supports adding images to the first page header and footer.
//Load image System.Drawing.Image bufferedImage = System.Drawing.Image.FromFile(inputFile_Img); //Set image on first page header and footer wb.Worksheets[0].PageSetup.FirstLeftHeaderImage = bufferedImage; wb.Worksheets[0].PageSetup.FirstLeftFooterImage = bufferedImage; wb.Worksheets[1].PageSetup.FirstCenterHeaderImage = bufferedImage; wb.Worksheets[1].PageSetup.FirstCenterFooterImage = bufferedImage; wb.Worksheets[2].PageSetup.FirstRightHeaderImage = bufferedImage; wb.Worksheets[2].PageSetup.FirstRightFooterImage = bufferedImage; |
New feature | SPIREXLS-5195 | Supports obtaining active selection range.
Worksheet worksheet = workbook.Worksheets[0]; string Information = null; foreach (CellRange range in worksheet.ActiveSelectionRange) { Information += "RangeAddressLocal:"+ range.RangeAddressLocal+"\r\n"; Information += "ColumnCount:" + range.ColumnCount + "\r\n"; Information += "ColumnWidth:" + range.ColumnWidth + "\r\n"; Information += "Column:" + range.Column + "\r\n"; Information += "RowCount:" + range.RowCount+ "\r\n"; Information += "RowHeight:" + range.RowHeight + "\r\n"; Information += "Row:" + range.Row + "\r\n"; } File.WriteAllText(outputFile_TXT,Information); |
New feature | SPIREXLS-5200 | Supports finding cells based on regular expressions.
CellRange[] ranges = sheet.FindAllString(".*test.", false, false, true); |
Bug | SPIREXLS-5075 | Fixes the issue that the image was lost after converting Excel to image. |
Bug | SPIREXLS-5151 | Fixes the issue that the content in the generated PDF document was lost after converting Excel to PDF on the Kirin system. |
Bug | SPIREXLS-5186 | Fixes the issue that the application threw the "System.NullPointerException" when converting sheet to image. |
Bug | SPIREXLS-5197 | Fixes the issue that the border obtained from merged area was incorrect. |
Bug | SPIREXLS-5198 | Fixes the issue that the text and alternative text obtained from checkboxes were incorrect. |
Bug | SPIREXLS-5214 | Fixes the issue that it failed to set the active cell using the SetActiveCell() method. |
Bug | SPIREXLS-5216 | Fixes the issue that the textboxes added to charts were not displayed. |
Bug | SPIREXLS-5218 | Fixes the issue that the name obtained from checkbox was incorrect. |
Bug | SPIREXLS-5225 | Fixes the issue that the mouse cursor position was incorrect after importing data into Excel using the InsertDataTable() method. |
Bug | SPIREXLS-5228 | Fixes the issue that some graphics and lines were lost after converting Excel document to PDF document. |
Bug | SPIREXLS-5234 | Fixes the issue that it failed to autofit columns using the AutoFitColumns() method. |
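For readers unfamiliar with the pattern used in the regular-expression search example above (".*test."), the sketch below shows which cell texts it would select, using Python's re module rather than Spire.XLS. It assumes case-sensitive matching anywhere in the cell text; whether the library anchors the pattern to the whole cell is not stated in these notes, so treat the results as an illustration of the pattern only.

```python
import re

# Same pattern as in the release note example: any prefix, the literal
# "test", then exactly one more character.
pattern = re.compile(r".*test.")

cell_texts = ["test", "testing", "a test!", "contest", "Test1"]

# Keep the texts in which the pattern is found (assumed search semantics)
matches = [text for text in cell_texts if pattern.search(text)]
print(matches)
```

The trailing dot is what excludes "test" and "contest": both end right after "test", so there is no extra character left for the dot to consume.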
Spire.Presentation
Category | ID | Description |
New feature | SPIREPPT-2497 | Adds the ColumnSpacing property to set the spacing between columns.
//The unit is point shape.TextFrame.ColumnSpacing = 20.50f; |
Bug | SPIREPPT-2493 | Fixes the issue that the application threw the "System.ArgumentException" when appending images to presentations. |
Bug | SPIREPPT-2498 | Fixes the issue that the shape was in opposite direction after converting PPTX to SVG. |
Bug | SPIREPPT-2500 | Fixes the issue that the gradient color of the shape was incorrect after converting PPTX to SVG. |
Python: Extract Tables from Word Documents
Word documents often contain valuable data in the form of tables, which can be used for reporting, data analysis, and record-keeping. However, manually extracting and transferring these tables to other formats can be a time-consuming and error-prone task. By automating this process using Python, we can save time, ensure accuracy, and maintain consistency. Spire.Doc for Python provides a seamless solution for the table extraction task, making it effortless to create accessible and manageable files with data from Word document tables. This article will demonstrate how to leverage Spire.Doc for Python to extract tables from Word documents and write them into text files and Excel worksheets.
- Extract Tables from Word Documents to Text Files with Python
- Extract Tables from Word Documents to Excel Workbooks with Python
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed on Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows
Extract Tables from Word Documents to Text Files with Python
Spire.Doc for Python offers the Section.Tables property to retrieve a collection of tables within a section of a Word document. Then, developers can use the properties and methods under the ITable class to access the data in the tables and write it into a text file. This provides a convenient solution for converting Word document tables into text files.
The detailed steps for extracting tables from Word documents to text files are as follows:
- Create an object of the Document class and load a Word document using the Document.LoadFromFile() method.
- Iterate through the sections in the document and get the table collection of each section through the Section.Tables property.
- Iterate through the tables and create a string object for each table.
- Iterate through the rows in each table and the cells in each row, get the text of each cell through the TableCell.Paragraphs[].Text property, and add the cell text to the string.
- Save each string to a text file.
- Python
from spire.doc import *
from spire.doc.common import *

# Create an instance of Document
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Loop through the sections
for s in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(s)
    # Get the tables in the section
    tables = section.Tables
    # Loop through the tables
    for i in range(0, tables.Count):
        # Get a table
        table = tables.get_Item(i)
        # Initialize a string to store the table data
        tableData = ''
        # Loop through the rows of the table
        for j in range(0, table.Rows.Count):
            # Loop through the cells of the row
            for k in range(0, table.Rows.get_Item(j).Cells.Count):
                # Get a cell
                cell = table.Rows.get_Item(j).Cells.get_Item(k)
                # Get the text in the cell
                cellText = ''
                for para in range(cell.Paragraphs.Count):
                    paragraphText = cell.Paragraphs.get_Item(para).Text
                    cellText += (paragraphText + ' ')
                # Add the text to the string
                tableData += cellText
                if k < table.Rows.get_Item(j).Cells.Count - 1:
                    tableData += '\t'
            # Add a new line
            tableData += '\n'

        # Save the table data to a text file
        with open(f'output/Tables/WordTable_{s+1}_{i+1}.txt', 'w', encoding='utf-8') as f:
            f.write(tableData)

doc.Close()
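The serialization logic in the example can be looked at separately from the Spire.Doc calls. The sketch below uses nested Python lists as a stand-in for a table (rows of cells, each cell a list of paragraph strings) and produces the same tab-separated, one-row-per-line layout. Unlike the example above, it joins paragraphs with a single space and leaves no trailing space in each cell; table_to_text is a hypothetical helper.

```python
from typing import List


def table_to_text(table: List[List[List[str]]]) -> str:
    """Serialize rows of cells (each a list of paragraph strings) as tab-separated lines."""
    lines = []
    for row in table:
        # Merge the paragraphs of each cell into one string
        cells = [" ".join(paragraphs) for paragraphs in row]
        # Separate cells with tabs, with no tab after the last cell
        lines.append("\t".join(cells))
    # End every row with a newline; an empty table yields an empty string
    return "".join(line + "\n" for line in lines)


table = [
    [["Name"], ["Quantity"]],
    [["Apple", "(red)"], ["3"]],
]

print(table_to_text(table), end="")
```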
Extract Tables from Word Documents to Excel Workbooks with Python
Developers can also utilize Spire.Doc for Python to retrieve table data and then use Spire.XLS for Python to write the table data into an Excel worksheet, thereby enabling the conversion of Word document tables into Excel workbooks.
Install Spire.XLS for Python via PyPI:
pip install Spire.XLS
The detailed steps for extracting tables from Word documents to Excel workbooks are as follows:
- Create an object of the Document class and load a Word document using the Document.LoadFromFile() method.
- Create an object of the Workbook class and clear the default worksheets using the Workbook.Worksheets.Clear() method.
- Iterate through the sections in the document and get the table collection of each section through the Section.Tables property.
- Iterate through the tables and create a worksheet for each table using the Workbook.Worksheets.Add() method.
- Iterate through the rows in each table and the cells in each row, get the text of each cell through the TableCell.Paragraphs[].Text property, and write the text to the worksheet using the Worksheet.SetCellValue() method.
- Save the workbook using the Workbook.SaveToFile() method.
- Python
from spire.doc import *
from spire.doc.common import *
from spire.xls import *
from spire.xls.common import *

# Create an instance of Document
doc = Document()

# Load a Word document
doc.LoadFromFile('Sample.docx')

# Create an instance of Workbook
wb = Workbook()
wb.Worksheets.Clear()

# Loop through sections in the document
for i in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(i)
    # Loop through tables in the section
    for j in range(section.Tables.Count):
        # Get a table
        table = section.Tables.get_Item(j)
        # Create a worksheet
        ws = wb.Worksheets.Add(f'Table_{i+1}_{j+1}')
        # Write the table to the worksheet
        for row in range(table.Rows.Count):
            # Get a row
            tableRow = table.Rows.get_Item(row)
            # Loop through cells in the row
            for cell in range(tableRow.Cells.Count):
                # Get a cell
                tableCell = tableRow.Cells.get_Item(cell)
                # Get the text in the cell
                cellText = ''
                for p in range(tableCell.Paragraphs.Count):
                    paragraph = tableCell.Paragraphs.get_Item(p)
                    cellText = cellText + (paragraph.Text + ' ')
                # Write the cell text to the worksheet (rows and columns are 1-based)
                ws.SetCellValue(row + 1, cell + 1, cellText)

# Save the workbook
wb.SaveToFile('output/Tables/WordTableToExcel.xlsx', FileFormat.Version2016)
doc.Close()
wb.Dispose()
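The example iterates with 0-based indices but writes with row + 1 and cell + 1, since the worksheet coordinates it passes to SetCellValue start at 1. The following plain-Python sketch shows that index shift on a nested list, yielding the (row, column, text) triples that would be written; cell_writes and sheet_name are hypothetical helpers, with sheet_name mirroring the naming scheme used in the example.

```python
from typing import Iterator, List, Tuple


def cell_writes(table: List[List[str]]) -> Iterator[Tuple[int, int, str]]:
    """Yield 1-based (row, column, text) triples for every cell of a table."""
    for row_number, row in enumerate(table, start=1):
        for column_number, text in enumerate(row, start=1):
            yield (row_number, column_number, text)


def sheet_name(section_index: int, table_index: int) -> str:
    """Build a worksheet name from 0-based section and table indices."""
    return "Table_{}_{}".format(section_index + 1, table_index + 1)


table = [["Name", "Quantity"], ["Apple", "3"]]

print(sheet_name(0, 0))
for write in cell_writes(table):
    print(write)
```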
Spire.Doc for Java 12.5.1 supports ignoring header and footer comparison options when comparing documents
We're pleased to announce the release of Spire.Doc for Java 12.5.1. This version supports ignoring headers and footers when comparing documents, and also fixes some known issues, such as incorrect paragraph alignment when converting HTML to Word and redundant content appearing when converting Word to PDF. More details are listed below.
Here is a list of changes made in this release
Category | ID | Description |
New feature | SPIREDOC-10156 | Supports ignoring headers and footers when comparing Word documents.
CompareOptions options=new CompareOptions(); options.IgnoreHeadersAndFooters=true;//Default is false |
Bug | SPIREDOC-9330, SPIREDOC-10446 | Fixes the issue that the text was garbled after converting a DOCX document to a PDF document. |
Bug | SPIREDOC-9309 | Fixes the issue that the content was messed up after converting a DOCX document to a PDF document. |
Bug | SPIREDOC-9349 | Fixes the issue that the content appeared different when it was opened with WPS tool after loading and saving the document. |
Bug | SPIREDOC-10137 | Fixes the issue that the text direction of the vertical text box was incorrect after converting a Word document to a PDF document. |
Bug | SPIREDOC-10373 | Fixes the issue that the program threw "cannot be cast to java.lang.Float" exception when comparing Word documents. |
Bug | SPIREDOC-10383 | Fixes the issue that the paragraph alignment was incorrect after converting HTML to Word documents. |
Bug | SPIREDOC-10408 | Fixes the issue that the program threw "Specified argument was out of the range of valid values" exception when loading Word documents. |
Bug | SPIREDOC-10455 | Fixes the issue that paging was incorrect after converting Word documents to PDF documents using WPS rules. |
Bug | SPIREDOC-10459 | Fixes the issue that images were rotated after converting Word documents to PDF documents. |
Bug | SPIREDOC-10466 | Fixes the issue that extra content appeared after converting Word documents to PDF documents. |
Bug | SPIREDOC-10481 | Fixes the issue that the program threw a "NullPointerException" when converting Word documents to PDF documents. |
Bug | SPIREDOC-10485 | Fixes the issue that extra blank pages appeared after converting Word documents to PDF documents using WPS rules. |
Bug | SPIREDOC-10513 | Fixes the issue that the content of the drop-down box was garbled after converting a Word document to a PDF document. |