PDF forms are commonly used to collect user information. Extracting form values programmatically lets you process submitted data automatically, which makes data collection and analysis less error-prone. After extraction, you can generate reports from the form field values or migrate them to other systems or databases. In this article, you will learn how to extract form field values from a PDF with Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
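To confirm that the installation worked, a short check like the one below can help. It is only a sketch: it assumes the distribution names are exactly the ones mentioned above ("Spire.PDF" and "plum-dispatch"), and the helper `installed_version` is a hypothetical name introduced for illustration. How names are matched can differ slightly between Python versions.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumed from the install step above.
for dist in ("Spire.PDF", "plum-dispatch"):
    print(dist, "->", installed_version(dist) or "not installed")
```

If either line prints "not installed", rerun the pip command in the same Python environment you use to run your scripts.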
Extract Form Field Values from PDF with Python
Spire.PDF for Python supports various types of PDF form fields, including:
- Text box field (represented by the PdfTextBoxFieldWidget class)
- Check box field (represented by the PdfCheckBoxWidgetFieldWidget class)
- Radio button field (represented by the PdfRadioButtonListFieldWidget class)
- List box field (represented by the PdfListBoxWidgetFieldWidget class)
- Combo box field (represented by the PdfComboBoxWidgetFieldWidget class)
Before extracting data from a PDF form, first determine the type of each form field. You can then use the properties of the corresponding form field class to read its value accurately. The detailed steps are as follows.
- Initialize an instance of the PdfDocument class.
- Load a PDF document using the PdfDocument.LoadFromFile() method.
- Get the form in the PDF document using the PdfDocument.Form property.
- Create a list to store the extracted form field values.
- Iterate through all fields in the PDF form.
- Determine the type of each form field, then get its name and value using the corresponding properties.
- Write the results to a text file.
from spire.pdf.common import *
from spire.pdf import *

inputFile = "Forms.pdf"
outputFile = "GetFormFieldValues.txt"

# Create a PdfDocument instance
pdf = PdfDocument()

# Load a PDF document
pdf.LoadFromFile(inputFile)

# Get the PDF form
pdfform = pdf.Form
formWidget = PdfFormWidget(pdfform)
sb = []

# Iterate through all fields in the form
for i in range(formWidget.FieldsWidget.Count):
    field = formWidget.FieldsWidget.get_Item(i)

    # Get the name and value of the textbox field
    if isinstance(field, PdfTextBoxFieldWidget):
        sb.append("Textbox Name: " + field.Name + "\n")
        sb.append("Textbox Value: " + field.Text + "\n\n")

    # Get the name, items and selected item of the listbox field
    if isinstance(field, PdfListBoxWidgetFieldWidget):
        sb.append("Listbox Name: " + field.Name + "\n")
        sb.append("Listbox Items: \n")
        items = field.Values
        for j in range(items.Count):
            sb.append(items.get_Item(j).Value + "\n")
        sb.append("Listbox Selected Value: " + field.SelectedValue + "\n\n")

    # Get the name, items and selected item of the combo box field
    if isinstance(field, PdfComboBoxWidgetFieldWidget):
        sb.append("Combobox Name: " + field.Name + "\n")
        sb.append("Combobox Items: \n")
        items = field.Values
        for j in range(items.Count):
            sb.append(items.get_Item(j).Value + "\n")
        sb.append("Combobox Selected Value: " + field.SelectedValue + "\n\n")

    # Get the name and selected item of the radio button field
    if isinstance(field, PdfRadioButtonListFieldWidget):
        sb.append("Radio Button Name: " + field.Name + "\n")
        sb.append("Radio Button Selected Value: " + field.SelectedValue + "\n\n")

    # Get the name and status of the checkbox field
    if isinstance(field, PdfCheckBoxWidgetFieldWidget):
        sb.append("Checkbox Name: " + field.Name + "\n")
        stateValue = "Yes" if field.Checked else "No"
        sb.append("If the checkBox is checked: " + stateValue + "\n\n")

# Write the results to a text file
with open(outputFile, "w", encoding="UTF-8") as f2:
    f2.writelines(sb)

pdf.Close()
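If the extracted values are meant for reports or migration to another system, structured records are often easier to work with than a text file. The sketch below is one possible approach, not part of Spire.PDF: `VALUE_ATTR`, `field_to_record` and `records_to_json` are hypothetical names. It assumes each field object exposes the `Name` property and the value property used in the example above (`Text`, `SelectedValue` or `Checked`), and it looks fields up by exact class name, so subclasses of these widgets would be skipped.

```python
import json

# Value property per widget class name, taken from the example above.
VALUE_ATTR = {
    "PdfTextBoxFieldWidget": "Text",
    "PdfListBoxWidgetFieldWidget": "SelectedValue",
    "PdfComboBoxWidgetFieldWidget": "SelectedValue",
    "PdfRadioButtonListFieldWidget": "SelectedValue",
    "PdfCheckBoxWidgetFieldWidget": "Checked",
}


def field_to_record(field):
    """Return a dict with type, name and value, or None for unknown field types."""
    kind = type(field).__name__
    attr = VALUE_ATTR.get(kind)
    if attr is None:
        return None
    return {"type": kind, "name": field.Name, "value": getattr(field, attr)}


def records_to_json(fields):
    """Serialize every recognized field in an iterable of fields to a JSON string."""
    records = [r for r in map(field_to_record, fields) if r is not None]
    return json.dumps(records, indent=2, ensure_ascii=False)
```

With the objects from the example above, the fields could be passed in as a generator, for instance `records_to_json(formWidget.FieldsWidget.get_Item(i) for i in range(formWidget.FieldsWidget.Count))`, and the resulting string written to a .json file.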
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents or lift the function limitations, please request a 30-day trial license.