C#: Convert PDF to Markdown

2024-09-30 01:13:45 Written by  support iceblue
Rate this item
(0 votes)

The need to convert PDF documents into more flexible and editable formats, such as Markdown, has become a common task for developers and content creators. Converting PDFs to Markdown files facilitates easier editing and version control, and enhances content portability across different platforms and applications, making it particularly suitable for modern web publishing workflows. By utilizing Spire.PDF for .NET, developers can automate the conversion process, ensuring that the rich formatting and structure of the original PDFs are preserved in the resulting Markdown files.

This article will demonstrate how to use Spire.PDF for .NET to convert PDF documents to Markdown format with C# code.

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for.NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.PDF

Convert PDF Documents to Markdown Files

With the Spire.PDF for .NET library, developers can easily load any PDF file using the PdfDocument.LoadFromFile(string filename) method and then save the document in the desired format by calling the PdfDocument.SaveToFile(string filename, FileFormat fileFormat) method. To convert a PDF to Markdown format, simply specify the FileFormat.Markdown enumeration as a parameter when invoking the method.

The detailed steps for converting PDF documents to Markdown files are as follows:

  • Create an instance of PdfDocument class.
  • Load a PDF document using PdfDocument.LoadFromFile(string filename) method.
  • Convert the document to a Markdown file using PdfDocument.SaveToFile(string filename, FileFormat.Markdown) method.
  • C#
using Spire.Pdf;

namespace PDFToMarkdown
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of PdfDocument class
            PdfDocument pdf = new PdfDocument();

            // Load a PDF document
            pdf.LoadFromFile("Sample.pdf");

            // Convert the document to Markdown file
            pdf.SaveToFile("output/PDFToMarkdown.md", FileFormat.Markdown);

            // Release resources
            pdf.Close();
        }
    }
}

The PDF Document:

C#: Convert PDF to Markdown

The Result Markdown File:

C#: Convert PDF to Markdown

Convert PDF to Markdown by Streams

In addition to directly reading files for manipulation, Spire.PDF for .NET also supports loading a PDF document from a stream using PdfDocument.LoadFromStream() method and converting it to a Markdown file stream using PdfDocument.SaveToStream() method. Using streams reduces memory usage, supports large files, enables real-time data transfer, and simplifies data exchange with other systems.

The detailed steps for converting PDF documents to Markdown files by streams are as follows:

  • Create a Stream object of PDF documents by downloading from the web or reading from a file.
  • Load the PDF document from the stream using PdfDocument.LoadFromStream(Stream stream) method.
  • Create another Stream object to store the converted Markdown file.
  • Convert the PDF document to a Markdown file stream using PdfDocument.SaveToStream(Stream stream, FileFormat.Markdown) method.
  • C#
using Spire.Pdf;
using System.IO;
using System.Net.Http;

namespace PDFToMarkdownByStream
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Create an instance of PdfDocument class
            PdfDocument pdf = new PdfDocument();

            // Download a PDF document from a url as bytes
            using (HttpClient client = new HttpClient())
            {
                byte[] pdfBytes = await client.GetByteArrayAsync("http://example.com/Sample.pdf");

                // Create a MemoryStream using the bytes
                using (MemoryStream inputStream = new MemoryStream(pdfBytes))
                {
                    // Load the PDF document from the stream
                    pdf.LoadFromStream(inputStream);

                    // Create another MemoryStream object to store the Markdown file
                    using (MemoryStream outputStream = new MemoryStream())
                    {
                        // Convert the PDF document to a Markdown file stream
                        pdf.SaveToStream(outputStream, FileFormat.Markdown);
                        outputStream.Position = 0; // Reset the position of the stream for subsequent reads

                        // Upload the result stream or write it to a file
                        await client.PostAsync("http://example.com/upload", new StreamContent(outputStream));
                        File.WriteAllBytes("output.md", outputStream.ToArray());
                    }
                }
            }

            // Release resources
            pdf.Close();
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.