Extract text from PDF

Pull selectable text from a PDF to copy or download.

Processing happens in your browser; we do not upload your file.

Text extraction uses pdfjs-dist in the browser. The snippet includes the worker setup for Vite; the notes cover the alternatives (workerPort and CDN).

Extract selectable text from a PDF

Returns the text of each page plus all pages joined. Written for Vite and a modern browser.

```typescript
import * as pdfjsLib from "pdfjs-dist";
// Vite-specific: "?url" resolves the worker file to a served URL.
import workerUrl from "pdfjs-dist/build/pdf.worker.min.mjs?url";

pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl;

export async function extractPdfText(file: File): Promise<{ pages: string[]; text: string }> {
  const data = new Uint8Array(await file.arrayBuffer());
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages: string[] = [];

  try {
    // pdf.js pages are 1-indexed.
    for (let n = 1; n <= pdf.numPages; n += 1) {
      const page = await pdf.getPage(n);
      const content = await page.getTextContent();
      const pageText = content.items
        // Marked-content items carry no "str"; treat them as empty.
        .map((item) => ("str" in item ? item.str : ""))
        .join(" ")
        // Collapse runs of whitespace into single spaces.
        .replace(/\s+/g, " ")
        .trim();
      pages.push(pageText);
    }
  } finally {
    // Release the document's resources even if a page fails.
    await pdf.destroy();
  }

  return { pages, text: pages.join("\n\n") };
}
```

Dependencies

pdfjs-dist

Usage notes

The worker config above is for Vite (import with ?url).
Generic bundler alternative (no Vite): GlobalWorkerOptions.workerPort = new Worker(new URL('pdfjs-dist/build/pdf.worker.min.mjs', import.meta.url), { type: 'module' }).
CDN alternative: GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/<VERSION>/pdf.worker.min.mjs'.
Use result.pages for the per-page text or result.text for the joined text.
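
For the CDN alternative, one way to keep the worker in step with the installed library is to build the URL from the version string instead of typing it by hand. The sketch below is illustrative only: cdnWorkerUrl is a hypothetical helper, the three-part version pattern is an assumption, and whether cdnjs hosts a given release should be checked separately.

```typescript
// Hypothetical helper: fills the <VERSION> slot of the cdnjs worker URL.
function cdnWorkerUrl(version: string): string {
  // Assumption: versions look like three dot-separated numbers, e.g. "1.2.3".
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Unexpected pdfjs-dist version: "${version}"`);
  }
  return `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/pdf.worker.min.mjs`;
}

// In the app (assumes pdfjsLib is imported as in the snippet above):
// pdfjsLib.GlobalWorkerOptions.workerSrc = cdnWorkerUrl(pdfjsLib.version);
```

Reading the version from pdfjsLib.version means a dependency upgrade changes the worker URL automatically.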

Limitations

It does not perform OCR: it only extracts text that is already selectable. A scanned PDF (page images only) may return empty text.
Reading order and spacing are approximate: the snippet does not reconstruct columns or preserve the layout's line breaks.
The worker version must match the installed pdfjs-dist version.
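
The snippet above joins every item on a page with spaces. If approximate line breaks are wanted, a common heuristic is to start a new line whenever the vertical position of the text changes. The sketch below assumes each text item exposes a transform array whose sixth entry is the baseline y position, as pdf.js text items do; joinWithLineBreaks, PositionedText, and the tolerance value are hypothetical names and defaults, not part of pdfjs-dist.

```typescript
// Minimal shape of the fields this sketch reads from a pdf.js text item.
interface PositionedText {
  str: string;
  transform: number[]; // transform[5] is the baseline y position in page units
}

// Hypothetical helper: starts a new line whenever the baseline y
// moves by more than `tolerance` page units.
function joinWithLineBreaks(items: PositionedText[], tolerance = 2): string {
  const lines: string[] = [];
  let current: string[] = [];
  let lastY: number | null = null;

  for (const item of items) {
    const piece = item.str.trim();
    if (piece === "") continue; // skip whitespace-only items

    const y = item.transform[5] ?? 0;
    if (lastY !== null && Math.abs(y - lastY) > tolerance) {
      // The baseline moved: close the current line and start a new one.
      lines.push(current.join(" "));
      current = [];
    }
    current.push(piece);
    lastY = y;
  }

  if (current.length > 0) lines.push(current.join(" "));
  return lines.join("\n");
}
```

This remains approximate: superscripts, rotated text, and multi-column pages can all produce breaks in the wrong places.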

About this tool

Extract text from PDF recovers selectable text from a document so you can copy it or download it as a TXT file.

How to use

Upload the PDF.
Optional: turn on approximate line-break preservation to keep the output closer to the original layout.
Review the text extracted per page.
Copy the result or download it as TXT.
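
The last step can be sketched as two small helpers that prepare the TXT content and file name from the per-page text. This is only an illustration: buildTxt, txtFilename, and the "--- Page N ---" separator are hypothetical choices, not a description of how the tool formats its download.

```typescript
// Hypothetical helper: builds the TXT body with a header line per page.
function buildTxt(pages: string[]): string {
  return pages
    .map((text, index) => `--- Page ${index + 1} ---\n${text}`)
    .join("\n\n");
}

// Hypothetical helper: derives "report.txt" from "report.pdf" (case-insensitive).
function txtFilename(pdfName: string): string {
  return pdfName.replace(/\.pdf$/i, "") + ".txt";
}
```

In a browser, the resulting string can be wrapped in a Blob of type text/plain and offered through an object URL on a link with a download attribute.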

Use cases

Reuse text from a document without retyping it.
Quote or search content inside a PDF.
Convert a document to plain text.

Limits

Only selectable text is recovered; a scanned PDF (images) has no text to extract and the tool warns you about it.
Max PDF size: 50 MB.
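
A size limit like this is typically checked before the file is read into memory. The sketch below shows one possible pre-check; checkPdfFile and the error messages are hypothetical, and it assumes 50 MB means 50 × 1024 × 1024 bytes, which the page does not specify.

```typescript
// Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: returns an error message, or null when the file is acceptable.
// Accepts any object with a name and size, such as a browser File.
function checkPdfFile(file: { name: string; size: number }): string | null {
  if (!file.name.toLowerCase().endsWith(".pdf")) return "Not a PDF file.";
  if (file.size === 0) return "The file is empty.";
  if (file.size > MAX_PDF_BYTES) return "The PDF is larger than 50 MB.";
  return null;
}
```

Running the check on the File object before calling file.arrayBuffer() avoids loading an oversized document at all.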

Privacy

Processing happens in your browser — we don't upload your files.

Common errors

The PDF looks scanned and has no selectable text.
Strange symbols show up when the PDF uses non-standard fonts.

Technical notes

Uses pdf.js to read text content.
Detects whether the document looks scanned and warns about problematic symbols.
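
The page does not say how these checks are implemented. As a rough illustration, the sketch below flags a document as probably scanned when its pages average very little text, and counts characters that often indicate a font without a usable Unicode mapping. Both helpers, the 20-character threshold, and the choice of code points are assumptions.

```typescript
// Hypothetical heuristic: the document "looks scanned" when its pages
// average fewer than `minCharsPerPage` characters of extracted text.
function looksScanned(pages: string[], minCharsPerPage = 20): boolean {
  if (pages.length === 0) return true;
  const totalChars = pages.reduce((sum, page) => sum + page.trim().length, 0);
  return totalChars / pages.length < minCharsPerPage;
}

// Hypothetical check: counts replacement characters (U+FFFD) and
// private-use code points (U+E000 to U+F8FF) in the extracted text.
function countProblemSymbols(text: string): number {
  const matches = text.match(/[\uFFFD\uE000-\uF8FF]/g);
  return matches ? matches.length : 0;
}
```

Both functions take the pages and text values returned by extractPdfText, so they can run right after extraction to decide whether to show a warning.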

Technical details

ID: extract-pdf-text
Slug: /en/tools/extract-pdf-text
Backend: Not required
AI: Not required
API: Planned