I am currently working on a project that requires the Spire.PDF library (part of Spire.Office for Python) to convert a PDF document to HTML. I have run into some difficulties with the conversion and would greatly appreciate your assistance.
The specifications of my environment are as follows:
Python version: 3.9.10
Spire.Office for Python version: 9.1.0
I tried converting the PDF documents to HTML with both approaches, SVG and non-SVG, and ran into problems with each.
I have also attached the source document and the HTML files produced by each conversion.
SVG:
Source code:
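```python
from spire.pdf.common import *
from spire.pdf import *
pdf = PdfDocument()
pdf.LoadFromFile("Document3.pdf")
# useEmbeddedSvg=True, useEmbeddedImg=False, maxPageOneFile=1, useHighQualityEmbeddingSvg=False
pdf.ConvertOptions.SetPdfToHtmlOptions(True, False, 1, False)
pdf.SaveToFile("Document3.html", FileFormat.HTML)
pdf.Close()
```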
Issues:
The SVG-based output is not HTML-friendly: its content cannot easily be edited to add HTML input elements or CSS styling. Since our requirement is to embed such elements and styles, we cannot use the SVG-based conversion.
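For context, the kind of post-processing we have in mind is simple text-level editing of the generated HTML, as in the minimal stdlib-only sketch below. It adds a CSS rule before `</head>` and an input field before `</body>`. `inject_into_html`, `section2_field` and `section2-input` are hypothetical names for illustration only, not part of Spire, and the helper assumes the output contains literal lowercase `</head>` and `</body>` tags. This approach only works well when the page content consists of ordinary HTML elements rather than an embedded SVG drawing.

```python
def inject_into_html(html: str, css: str, fragment: str) -> str:
    """Insert a <style> block before </head> and an HTML fragment before </body>.

    Hypothetical helper: it does a plain, case-sensitive text search for the
    closing tags and falls back to prepending/appending when they are missing.
    """
    style_tag = f"<style>\n{css}\n</style>\n"
    if "</head>" in html:
        # Put the stylesheet just before the first </head>
        html = html.replace("</head>", style_tag + "</head>", 1)
    else:
        # No head section: place the style at the very start of the document
        html = style_tag + html
    if "</body>" in html:
        # Insert before the last </body> so the fragment stays inside the body
        idx = html.rfind("</body>")
        html = html[:idx] + fragment + "\n" + html[idx:]
    else:
        html = html + fragment + "\n"
    return html


sample = "<html><head></head><body><p>Section 2</p></body></html>"
result = inject_into_html(
    sample,
    ".section2-input { width: 12em; }",
    '<input type="text" name="section2_field" class="section2-input">',
)
print(result)
```

In practice the input would need to be placed next to the relevant grid cell rather than at the end of the body, which is exactly why we need the grid to come out as real HTML elements.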
Non-SVG:
Source code:
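```python
from spire.pdf.common import *
from spire.pdf import *
pdf = PdfDocument()
pdf.LoadFromFile("Document3.pdf")
# useEmbeddedSvg=False, useEmbeddedImg=False, maxPageOneFile=1, useHighQualityEmbeddingSvg=False
pdf.ConvertOptions.SetPdfToHtmlOptions(False, False, 1, False)
pdf.SaveToFile("Document3.html", FileFormat.HTML)
pdf.Close()
```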
Issues:
The HTML generated by the non-SVG conversion is not properly formatted; some elements are misplaced or not converted correctly:
a. The grid structures are still rendered as SVG, and their content is misplaced.
b. The grid under 'Section 2' is where we need to add input elements, but it is currently misplaced and rendered as SVG.
c. The flowchart in the PDF is not rendered properly.
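To gauge how much of the non-SVG output still falls back to SVG, a quick check is to count SVG content in the generated file with Python's built-in `html.parser`. This is a minimal sketch: `SvgCounter` and `count_svg` are hypothetical helper names, and it only detects inline `<svg>` elements and `<img>` tags whose `src` ends in `.svg`; other ways of referencing SVG (for example `<object>` or CSS backgrounds) are not counted.

```python
from html.parser import HTMLParser


class SvgCounter(HTMLParser):
    """Counts inline <svg> elements and <img> tags pointing at .svg files."""

    def __init__(self) -> None:
        super().__init__()
        self.inline_svg = 0
        self.svg_images = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method;
        # self-closing tags like <img/> are routed here as well.
        if tag == "svg":
            self.inline_svg += 1
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            if src.lower().endswith(".svg"):
                self.svg_images += 1


def count_svg(html: str) -> tuple:
    """Return (inline <svg> count, <img> with .svg src count)."""
    counter = SvgCounter()
    counter.feed(html)
    counter.close()
    return counter.inline_svg, counter.svg_images


sample = '<html><body><svg width="10"></svg><img src="page1.svg"/><p>text</p></body></html>'
print(count_svg(sample))  # (1, 1)
```

On the real output, the contents of `Document3.html` would be passed to `count_svg` instead of `sample`; a non-zero count around the 'Section 2' grid or the flowchart would confirm that those parts are still being emitted as SVG.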
I would greatly appreciate any guidance on how to convert the documents to proper HTML to which we can add our own elements and styling.
Thank you very much for your attention and assistance. I look forward to your prompt response.