Hyperlink

Python: Extract Hyperlinks from Word Documents

2024-11-15 01:16:37 Written by support iceblue

MS Word lets users view hyperlinks but has no built-in feature for extracting them in a single click, which makes pulling multiple links out of a document time-consuming. Python can streamline this process significantly. In this article, we'll show you how to use Spire.Doc for Python to extract hyperlinks from Word documents, either individually or in batches, saving you time and effort.

Extract Hyperlinks from Word Documents: Specified Links
Extract All Hyperlinks from Word Documents

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows.

Extract Hyperlinks from Word Documents: Specified Links

Whether you're looking to retrieve just one important link or filter out certain URLs, this section will guide you through the process step by step. Using the Field.FieldText and Field.Code properties provided by Spire.Doc, you can efficiently target and extract specified hyperlinks, making it easier to access the information you need.

Steps to extract specified hyperlinks from Word documents:

Create an instance of Document class.
Read a Word document from files using Document.LoadFromFile() method.
Iterate through elements to find all hyperlinks in this Word document.
Get a certain hyperlink from the hyperlink collection.
Retrieve the hyperlink text with Field.FieldText property.
Extract URLs from the hyperlink in the Word document using Field.Code property.

Here is the code example of extracting the first hyperlink in a Word document:

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("/sample.docx")

# Find all hyperlinks in the Word document
hyperlinks = []
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    for j in range(section.Body.ChildObjects.Count):
        sec = section.Body.ChildObjects.get_Item(j)
        if sec.DocumentObjectType == DocumentObjectType.Paragraph:
            for k in range((sec if isinstance(sec, Paragraph) else None).ChildObjects.Count):
                para = (sec if isinstance(sec, Paragraph) else None).ChildObjects.get_Item(k)
                if para.DocumentObjectType == DocumentObjectType.Field:
                    field = para if isinstance(para, Field) else None
                    if field.Type == FieldType.FieldHyperlink:
                        hyperlinks.append(field)

# Get the first hyperlink text and URL
if hyperlinks:
    first_hyperlink = hyperlinks[0]
    hyperlink_text = first_hyperlink.FieldText
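    # Trim surrounding whitespace first, then the quotes around the URL
    hyperlink_url = first_hyperlink.Code.split('HYPERLINK ')[1].strip().strip('"')

    # Save to a text file (UTF-8 so non-ASCII link text is written safely)
    with open("/FirstHyperlink.txt", "w", encoding="utf-8") as file:
        file.write(f"Text: {hyperlink_text}\nURL: {hyperlink_url}\n")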

# Close the document
doc.Close()
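
The `split('HYPERLINK ')[1]` expression in the example assumes the field code holds nothing but a quoted URL. Word's HYPERLINK field syntax also allows switches such as `\l` (a location inside the target) and `\o` (ScreenTip text), and with those present the simple split would leave the switch text attached to the URL. The sketch below is a plain-Python alternative. `parse_hyperlink_code` is a hypothetical helper, not part of Spire.Doc, and it assumes that Field.Code returns the raw field code string.

```python
import re

# A token is either a double-quoted string or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

# Switches that are followed by an argument in Word's HYPERLINK field syntax.
_SWITCHES_WITH_ARG = {"\\l", "\\o", "\\t"}


def parse_hyperlink_code(code):
    """Return (url, switches) parsed from a HYPERLINK field code string.

    Hypothetical helper for illustration; it is not part of Spire.Doc.
    """
    tokens = []
    for match in _TOKEN.finditer(code):
        quoted = match.group(1) is not None
        tokens.append((match.group(1) if quoted else match.group(2), quoted))

    if not tokens or tokens[0][0].upper() != "HYPERLINK":
        raise ValueError("not a HYPERLINK field code")

    url = ""
    switches = {}
    i = 1
    while i < len(tokens):
        value, quoted = tokens[i]
        if not quoted and value.startswith("\\"):
            if value in _SWITCHES_WITH_ARG and i + 1 < len(tokens):
                switches[value] = tokens[i + 1][0]
                i += 1  # skip the argument that was just consumed
            else:
                switches[value] = None
        elif not url:
            url = value
        i += 1
    return url, switches


# Example with an assumed field code string
url, switches = parse_hyperlink_code('HYPERLINK "https://example.com/page" \\l "intro"')
print(url, switches)
```

With such a helper, the URL line in the example could become `hyperlink_url, _ = parse_hyperlink_code(first_hyperlink.Code)`. Links that point only to a bookmark come back with an empty URL and the bookmark name under the `\l` key.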


Extract All Hyperlinks from Word Documents

After checking out how to extract specified hyperlinks, let's move on to extracting all hyperlinks from your Word documents. This is especially helpful when you need a list of all links, whether to check for broken ones or for other purposes. By automating this process with Spire.Doc (short for Spire.Doc for Python), you can save time and ensure accuracy. Let's take a closer look at the steps and code example.

Steps to extract all hyperlinks from Word documents:

Create a Document object.
Load a Word document from the local storage with Document.LoadFromFile() method.
Loop through elements to find all hyperlinks in the Word document.
Iterate through all hyperlinks in the collection.
Use Field.FieldText property to extract the hyperlink text from each link.
Use Field.Code property to get URLs from hyperlinks.

Below is a code example of extracting all hyperlinks from a Word document:

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("/sample.docx")

# Find all hyperlinks in the Word document
hyperlinks = []
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    for j in range(section.Body.ChildObjects.Count):
        sec = section.Body.ChildObjects.get_Item(j)
        if sec.DocumentObjectType == DocumentObjectType.Paragraph:
            for k in range((sec if isinstance(sec, Paragraph) else None).ChildObjects.Count):
                para = (sec if isinstance(sec, Paragraph) else None).ChildObjects.get_Item(k)
                if para.DocumentObjectType == DocumentObjectType.Field:
                    field = para if isinstance(para, Field) else None
                    if field.Type == FieldType.FieldHyperlink:
                        hyperlinks.append(field)

# Save the text and URL of every hyperlink to a text file
# (UTF-8 so non-ASCII link text is written safely)
with open("/AllHyperlinks.txt", "w", encoding="utf-8") as file:
    for i, hyperlink in enumerate(hyperlinks):
        hyperlink_text = hyperlink.FieldText
        # Trim surrounding whitespace first, then the quotes around the URL
        hyperlink_url = hyperlink.Code.split('HYPERLINK ')[1].strip().strip('"')
        file.write(f"Hyperlink {i+1}:\nText: {hyperlink_text}\nURL: {hyperlink_url}\n\n")

# Close the document
doc.Close()
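
Once the text and URL of every link have been collected, narrowing the list to certain URLs is ordinary Python. The following is a minimal sketch that keeps only links pointing at a given host. `filter_links_by_host` and the sample pairs are hypothetical, and the sketch assumes the links have already been extracted as `(text, url)` tuples of plain strings.

```python
from urllib.parse import urlsplit


def filter_links_by_host(links, host):
    """Return the (text, url) pairs whose host equals `host` or is a subdomain of it."""
    wanted = host.lower()
    matches = []
    for text, url in links:
        # hostname is None for schemes without a network location, such as mailto:
        found = (urlsplit(url).hostname or "").lower()
        if found == wanted or found.endswith("." + wanted):
            matches.append((text, url))
    return matches


# Sample data standing in for extracted hyperlinks
links = [
    ("Home Page", "https://www.e-iceblue.com/"),
    ("HTML", "https://en.wikipedia.org/wiki/HTML"),
    ("Mail Us", "mailto:support@e-iceblue.com"),
]
for text, url in filter_links_by_host(links, "wikipedia.org"):
    print(text, url)
```

Comparing parsed host names rather than searching the raw URL string avoids false matches when the host name merely appears in a path or query string.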


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Hyperlink

Tagged under

ppt Python Hyperlink

Python: Update or Change Hyperlinks in Word

2023-12-07 00:54:00 Written by support iceblue

Hyperlinks are a useful tool for connecting and navigating between different sections of your document or external resources such as websites or files. However, there may be instances where you need to modify the hyperlinks in your Word document. For example, you may need to update the text or URL of a hyperlink to ensure accuracy, or change the appearance of a hyperlink to improve visibility. In this article, you will learn how to update or change hyperlinks in a Word document in Python using Spire.Doc for Python.

Update a Hyperlink in a Word Document in Python
Change the Appearance of a Hyperlink in a Word Document in Python

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Update a Hyperlink in a Word Document in Python

Spire.Doc for Python represents a hyperlink as a Field object whose type is FieldType.FieldHyperlink. To modify a specific hyperlink, retrieve all hyperlinks in the document and get the desired one by its index. The display text and URL of the hyperlink can then be reset through the Field.FieldText and Field.Code properties. The following are the steps to update a hyperlink in a Word document using Spire.Doc for Python.

Create a Document object.
Load a Word file using Document.LoadFromFile() method.
Loop through the elements in the document to find all hyperlinks.
Get a specific hyperlink from the hyperlink collection.
Update the display text of the hyperlink through the Field.FieldText property.
Update the URL of the hyperlink through the Field.Code property.
Save the document to a different Word file using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:/Users/Administrator/Desktop/input.docx")

# Find all hyperlinks in the document
hyperlinks = []
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    for j in range(section.Body.ChildObjects.Count):
        sec = section.Body.ChildObjects.get_Item(j)
        if sec.DocumentObjectType == DocumentObjectType.Paragraph:
            for k in range((sec if isinstance(sec, Paragraph) else None).ChildObjects.Count):
                para = (sec if isinstance(sec, Paragraph)
                        else None).ChildObjects.get_Item(k)
                if para.DocumentObjectType == DocumentObjectType.Field:
                    field = para if isinstance(para, Field) else None
                    if field.Type == FieldType.FieldHyperlink:
                        hyperlinks.append(field)

# Get a specific hyperlink
hyperlink = hyperlinks[0]

# Update the display text of the hyperlink
hyperlink.FieldText = "HYPERTEXT MARKUP LANGUAGE"

# Update the URL of the hyperlink
hyperlink.Code = 'HYPERLINK "https://en.wikipedia.org/wiki/HTML"'

# Save the document to a docx file
doc.SaveToFile("output/UpdateHyperlink.docx", FileFormat.Docx)
doc.Close()
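
The value assigned to the Code property above follows Word's HYPERLINK field syntax: the keyword, then the target in double quotes. When several links need updating, building that string in one place keeps the quoting consistent. Below is a minimal sketch. `build_hyperlink_code` is a hypothetical helper, not part of Spire.Doc, and the optional `\l` switch is Word's syntax for a location inside the target; whether Spire.Doc accepts it unchanged is an assumption to verify.

```python
def build_hyperlink_code(url, bookmark=None):
    """Build a HYPERLINK field code string for a URL and an optional bookmark.

    Hypothetical helper for illustration; it is not part of Spire.Doc.
    """
    if '"' in url or (bookmark and '"' in bookmark):
        # A stray double quote would end the quoted argument early.
        raise ValueError("URL and bookmark must not contain double quotes")
    code = f'HYPERLINK "{url}"'
    if bookmark:
        code += f' \\l "{bookmark}"'
    return code


print(build_hyperlink_code("https://en.wikipedia.org/wiki/HTML"))
```

The result for a plain URL is the same string the example assigns by hand, so it could be used as `hyperlink.Code = build_hyperlink_code(new_url)`.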


Change the Appearance of a Hyperlink in a Word Document in Python

After a hyperlink is obtained, it's easy to change its appearance through the Field.CharacterFormat object. Specifically, the CharacterFormat object offers properties such as TextColor, FontName, FontSize, and UnderlineStyle for styling the characters of a hyperlink. The following are the detailed steps.

Create a Document object.
Load a Word file using Document.LoadFromFile() method.
Loop through the elements in the document to find all hyperlinks.
Get a specific hyperlink from the hyperlink collection.
Change the appearance of the hyperlink through the properties of the Field.CharacterFormat object.
Save the document to a different Word file using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:/Users/Administrator/Desktop/input.docx")

# Find all hyperlinks in the Word document
hyperlinks = []
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    for j in range(section.Body.ChildObjects.Count):
        sec = section.Body.ChildObjects.get_Item(j)
        if sec.DocumentObjectType == DocumentObjectType.Paragraph:
            for k in range((sec if isinstance(sec, Paragraph) else None).ChildObjects.Count):
                para = (sec if isinstance(sec, Paragraph)
                        else None).ChildObjects.get_Item(k)
                if para.DocumentObjectType == DocumentObjectType.Field:
                    field = para if isinstance(para, Field) else None
                    if field.Type == FieldType.FieldHyperlink:
                        hyperlinks.append(field)

# Get a specific hyperlink
hyperlink = hyperlinks[0]

# Change the appearance of the hyperlink
hyperlink.CharacterFormat.UnderlineStyle = UnderlineStyle.none
hyperlink.CharacterFormat.TextColor = Color.get_Purple()
hyperlink.CharacterFormat.Bold = True

# Save the document to a docx file
doc.SaveToFile("output/ChangeAppearance.docx", FileFormat.Docx)
doc.Close()


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.


Python: Add or Remove Hyperlinks in Word Documents

2023-09-20 01:11:32 Written by support iceblue

Hyperlinks are an essential component of creating dynamic and interactive Word documents. By linking specific text or objects to other documents, web pages, email addresses, or specific locations within the same document, hyperlinks allow users to navigate through information seamlessly. In this article, you will learn how to add or remove hyperlinks in a Word document in Python using Spire.Doc for Python.

Add Hyperlinks to Word in Python
Remove Hyperlinks from Word in Python

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Add Hyperlinks to Word in Python

Spire.Doc for Python offers the Paragraph.AppendHyperlink() method to add a web link, an email link, a file link, or a bookmark link to a piece of text or an image inside a paragraph. The following are the detailed steps.

Create a Document object.
Add a section and a paragraph to it.
Insert a hyperlink based on text using Paragraph.AppendHyperlink(link: str, text: str, type: HyperlinkType) method.
Add an image to the paragraph using Paragraph.AppendPicture() method.
Insert a hyperlink based on the image using Paragraph.AppendHyperlink(link: str, picture: DocPicture, type: HyperlinkType) method.
Save the result document using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Create a Word document
doc = Document()

# Add a section
section = doc.AddSection()

# Add a paragraph
paragraph = section.AddParagraph()

# Add a web link
paragraph.AppendHyperlink("https://www.e-iceblue.com/", "Home Page", HyperlinkType.WebLink)

# Append line breaks
paragraph.AppendBreak(BreakType.LineBreak)
paragraph.AppendBreak(BreakType.LineBreak)

# Add an email link
paragraph.AppendHyperlink("mailto:support@e-iceblue.com", "Mail Us", HyperlinkType.EMailLink)

# Append line breaks
paragraph.AppendBreak(BreakType.LineBreak)
paragraph.AppendBreak(BreakType.LineBreak)

# Add a file link
filePath = "C:\\Users\\Administrator\\Desktop\\report.xlsx"
paragraph.AppendHyperlink(filePath, "Click to open the report", HyperlinkType.FileLink)

# Append line breaks
paragraph.AppendBreak(BreakType.LineBreak)
paragraph.AppendBreak(BreakType.LineBreak)

# Add another section and create a bookmark
section2 = doc.AddSection()
bookmarkParagraph = section2.AddParagraph()
bookmarkParagraph.AppendText("Here is a bookmark")
start = bookmarkParagraph.AppendBookmarkStart("myBookmark")
# Move the bookmark start to the front so the bookmark wraps the text
bookmarkParagraph.Items.Insert(0, start)
bookmarkParagraph.AppendBookmarkEnd("myBookmark")

# Link to the bookmark
paragraph.AppendHyperlink("myBookmark", "Jump to a location inside this document", HyperlinkType.Bookmark)

# Append line breaks
paragraph.AppendBreak(BreakType.LineBreak)
paragraph.AppendBreak(BreakType.LineBreak)

# Add an image link
image = "C:\\Users\\Administrator\\Desktop\\logo.png"
picture = paragraph.AppendPicture(image)
paragraph.AppendHyperlink("https://www.e-iceblue.com/", picture, HyperlinkType.WebLink)

# Save to file
doc.SaveToFile("output/CreateHyperlinks.docx", FileFormat.Docx2019)
doc.Dispose()
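
The email link in the example uses a bare `mailto:` address. The mailto URI scheme also allows a subject and body to be supplied as percent-encoded query parameters. The sketch below builds such a link with the standard library. `build_mailto` is a hypothetical helper, and passing its result to Paragraph.AppendHyperlink() with HyperlinkType.EMailLink is an assumption to verify against your version of Spire.Doc.

```python
from urllib.parse import quote


def build_mailto(address, subject=None, body=None):
    """Build a mailto: link with an optional percent-encoded subject and body."""
    params = []
    if subject:
        params.append("subject=" + quote(subject, safe=""))
    if body:
        params.append("body=" + quote(body, safe=""))
    link = "mailto:" + address
    if params:
        link += "?" + "&".join(params)
    return link


print(build_mailto("support@e-iceblue.com", subject="Question about Spire.Doc"))
```

Percent-encoding matters here because spaces, ampersands, and question marks in the subject or body would otherwise be read as part of the link's structure.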


Remove Hyperlinks from Word in Python

To delete all hyperlinks in a Word document at once, you'll need to find all the hyperlinks in the document and then create a custom method FlattenHyperlinks() to flatten them. The following are the detailed steps.

Create a Document object.
Load a sample Word document using Document.LoadFromFile() method.
Find all the hyperlinks in the document using custom method FindAllHyperlinks().
Loop through the hyperlinks and flatten all of them using custom method FlattenHyperlinks().
Save the result document using Document.SaveToFile() method.

Python

from spire.doc import *
from spire.doc.common import *

# Find all the hyperlinks in a document
def FindAllHyperlinks(document):
    hyperlinks = []
    for i in range(document.Sections.Count):
        section = document.Sections.get_Item(i)
        for j in range(section.Body.ChildObjects.Count):
            sec = section.Body.ChildObjects.get_Item(j)
            if sec.DocumentObjectType == DocumentObjectType.Paragraph:
                for k in range((sec if isinstance(sec, Paragraph) else None).ChildObjects.Count):
                    para = (sec if isinstance(sec, Paragraph)
                            else None).ChildObjects.get_Item(k)
                    if para.DocumentObjectType == DocumentObjectType.Field:
                        field = para if isinstance(para, Field) else None
                        if field.Type == FieldType.FieldHyperlink:
                            hyperlinks.append(field)
    return hyperlinks

# Flatten the hyperlink fields
def FlattenHyperlinks(field):
    ownerParaIndex = field.OwnerParagraph.OwnerTextBody.ChildObjects.IndexOf(
        field.OwnerParagraph)
    fieldIndex = field.OwnerParagraph.ChildObjects.IndexOf(field)
    sepOwnerPara = field.Separator.OwnerParagraph
    sepOwnerParaIndex = field.Separator.OwnerParagraph.OwnerTextBody.ChildObjects.IndexOf(
        field.Separator.OwnerParagraph)
    sepIndex = field.Separator.OwnerParagraph.ChildObjects.IndexOf(
        field.Separator)
    endIndex = field.End.OwnerParagraph.ChildObjects.IndexOf(field.End)
    endOwnerParaIndex = field.End.OwnerParagraph.OwnerTextBody.ChildObjects.IndexOf(
        field.End.OwnerParagraph)

    FormatFieldResultText(field.Separator.OwnerParagraph.OwnerTextBody,
                           sepOwnerParaIndex, endOwnerParaIndex, sepIndex, endIndex)

    field.End.OwnerParagraph.ChildObjects.RemoveAt(endIndex)
    
    for i in range(sepOwnerParaIndex, ownerParaIndex - 1, -1):
        if i == sepOwnerParaIndex and i == ownerParaIndex:
            for j in range(sepIndex, fieldIndex - 1, -1):
                field.OwnerParagraph.ChildObjects.RemoveAt(j)

        elif i == ownerParaIndex:
            for j in range(field.OwnerParagraph.ChildObjects.Count - 1, fieldIndex - 1, -1):
                field.OwnerParagraph.ChildObjects.RemoveAt(j)

        elif i == sepOwnerParaIndex:
            for j in range(sepIndex, -1, -1):
                sepOwnerPara.ChildObjects.RemoveAt(j)
        else:
            field.OwnerParagraph.OwnerTextBody.ChildObjects.RemoveAt(i)

# Convert fields to text range and clear the text formatting
def FormatFieldResultText(ownerBody, sepOwnerParaIndex, endOwnerParaIndex, sepIndex, endIndex):
    for i in range(sepOwnerParaIndex, endOwnerParaIndex + 1):
        para = ownerBody.ChildObjects[i] if isinstance(
            ownerBody.ChildObjects[i], Paragraph) else None
        if i == sepOwnerParaIndex and i == endOwnerParaIndex:
            for j in range(sepIndex + 1, endIndex):
                if isinstance(para.ChildObjects[j], TextRange):
                    FormatText(para.ChildObjects[j])

        elif i == sepOwnerParaIndex:
            for j in range(sepIndex + 1, para.ChildObjects.Count):
                if isinstance(para.ChildObjects[j], TextRange):
                    FormatText(para.ChildObjects[j])
        elif i == endOwnerParaIndex:
            for j in range(0, endIndex):
                if isinstance(para.ChildObjects[j], TextRange):
                    FormatText(para.ChildObjects[j])
        else:
            for j, unusedItem in enumerate(para.ChildObjects):
                if isinstance(para.ChildObjects[j], TextRange):
                    FormatText(para.ChildObjects[j])

# Format text
def FormatText(tr):
    tr.CharacterFormat.TextColor = Color.get_Black()
    tr.CharacterFormat.UnderlineStyle = UnderlineStyle.none

# Create a Document object
doc = Document()

# Load a Word file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\test.docx")

# Get all hyperlinks
hyperlinks = FindAllHyperlinks(doc)

# Flatten all hyperlinks
for i in range(len(hyperlinks) - 1, -1, -1):
    FlattenHyperlinks(hyperlinks[i])

# Save to a different file
doc.SaveToFile("output/RemoveHyperlinks.docx", FileFormat.Docx)
doc.Close()
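
Both the loop over the hyperlinks and the RemoveAt() loops inside FlattenHyperlinks() count downward. A likely reason is that removing an item by index shifts every later item one position to the left, so deleting from the highest index first keeps the remaining indices valid. The sketch below shows the idea on a plain Python list. The strings are stand-ins for a paragraph's child objects, and `remove_at_indices` is a hypothetical helper, not part of Spire.Doc.

```python
def remove_at_indices(items, indices):
    """Remove the given positions from a list in place, highest index first."""
    for index in sorted(set(indices), reverse=True):
        del items[index]
    return items


# Stand-ins for the child objects of a paragraph that holds one hyperlink field
children = ["field start", "field code", "separator", "link text", "field end", "plain text"]

# Drop everything that belongs to the field except its visible result text
remove_at_indices(children, [0, 1, 2, 4])
print(children)  # ['link text', 'plain text']
```

Deleting the same positions in ascending order would remove the wrong items after the first deletion, because each removal renumbers everything behind it.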


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

