pdf-lib Architecture: Pure JavaScript PDF Create, Edit, and Merge

Source Code Analysis (Updated May 16, 2026)

Why pdf-lib?

Browser PDF libraries can be compared on bundle size, capabilities (create, edit, merge, fonts), and maintenance status:

Library Size Create Edit Merge Fonts Maintenance
pdf-lib ~350KB Active
pdfjs-dist ~2MB Mozilla
jsPDF ~300KB Partial Active
PDFKit ~1MB Node-first

Of the libraries compared here, pdf-lib is the only pure JavaScript option that can both create new PDFs and modify existing ones in the browser.


PDF File Format Basics

Internal structure

%PDF-1.7                          ← version header
1 0 obj                           ← object 1
  << /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj                           ← object 2
  << /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj                           ← object 3
  << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>
endobj
xref                               ← cross-reference table
0 4
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
trailer
  << /Size 4 /Root 1 0 R >>
startxref
186
%%EOF

Core concepts

Concept                 Description
Indirect object         1 0 obj ... endobj, referenced by number
Dictionary              << /Key /Value >>, like a JSON object
Stream                  stream ... endstream, binary data (often FlateDecode compressed)
Cross-reference (xref)  Byte offset per object for O(1) random access
Page tree               Nested /Pages nodes forming a tree
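
To make the xref idea concrete, here is a minimal sketch that parses the four entries from the sample file into a lookup table. The names XrefEntry and parseXrefEntry are illustrative, not part of pdf-lib, and the sketch assumes the classic table format shown above rather than cross-reference streams.

```typescript
// One xref entry: 10-digit field, 5-digit generation, flag (n = in use, f = free).
// For in-use entries the first field is the byte offset of the object.
interface XrefEntry {
  offset: number;
  generation: number;
  inUse: boolean;
}

function parseXrefEntry(line: string): XrefEntry {
  const match = /^(\d{10}) (\d{5}) ([nf])/.exec(line);
  if (!match) throw new Error(`Malformed xref entry: ${line}`);
  return {
    offset: Number(match[1]),
    generation: Number(match[2]),
    inUse: match[3] === 'n',
  };
}

// Entries from the sample file; the array index is the object number.
const xrefTable: XrefEntry[] = [
  '0000000000 65535 f',
  '0000000009 00000 n',
  '0000000058 00000 n',
  '0000000115 00000 n',
].map(parseXrefEntry);

// Locating object 2 is a single array lookup, with no scan of the file body.
const object2 = xrefTable[2];
console.log(object2.offset); // 58
```

A reader would then seek straight to byte 58 and parse `2 0 obj` there, which is what makes random access cheap even in large files.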

pdf-lib Architecture

Module layers

PDFDocument (top-level API)
  ├── PDFPage (page operations)
  ├── PDFFont (fonts)
  ├── PDFImage (images)
  └── PDFCatalog (document structure)
       └── PDFContext (low-level object model)
            ├── PDFObject
            ├── PDFDict
            ├── PDFStream
            ├── PDFRef
            └── PDFCrossRefSection

Object model

pdf-lib centers on PDFContext, which holds the document object graph:

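// Simplified view of PDFContext (signatures abridged)
declare class PDFContext {
  // Registry of all indirect objects, keyed by reference
  indirectObjects: Map<PDFRef, PDFObject>;

  // Bind an object to an existing reference
  assign(ref: PDFRef, object: PDFObject): void;

  // Allocate the next object number and store the object under it
  register(object: PDFObject): PDFRef;

  // Lookup; undefined if the reference is not registered
  lookup(ref: PDFRef): PDFObject | undefined;

  // Delete; returns whether an entry was removed
  delete(ref: PDFRef): boolean;
}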
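
The registry idea can be sketched in a few lines. This is a toy model under stated assumptions, not pdf-lib's code: Ref, Obj, and Registry are hypothetical stand-ins, and the sketch interns references so that two lookups for the same object number hit the same Map key.

```typescript
// Hypothetical stand-ins for illustration only.
class Ref {
  private static pool = new Map<string, Ref>();
  readonly objectNumber: number;
  readonly generation: number;

  private constructor(objectNumber: number, generation: number) {
    this.objectNumber = objectNumber;
    this.generation = generation;
  }

  // Interning: the same (number, generation) pair always yields the same
  // instance, so a Ref can be used as a Map key.
  static of(objectNumber: number, generation = 0): Ref {
    const tag = `${objectNumber} ${generation} R`;
    let ref = Ref.pool.get(tag);
    if (!ref) {
      ref = new Ref(objectNumber, generation);
      Ref.pool.set(tag, ref);
    }
    return ref;
  }
}

type Obj = Record<string, unknown>;

class Registry {
  private objects = new Map<Ref, Obj>();
  private largestObjectNumber = 0;

  assign(ref: Ref, object: Obj): void {
    if (ref.objectNumber > this.largestObjectNumber) {
      this.largestObjectNumber = ref.objectNumber;
    }
    this.objects.set(ref, object);
  }

  // Allocate the next free object number and store the object under it.
  register(object: Obj): Ref {
    const ref = Ref.of(this.largestObjectNumber + 1);
    this.assign(ref, object);
    return ref;
  }

  lookup(ref: Ref): Obj | undefined {
    return this.objects.get(ref);
  }

  delete(ref: Ref): boolean {
    return this.objects.delete(ref);
  }
}

const registry = new Registry();
const pagesRef = registry.register({ Type: 'Pages', Count: 1 });
const found = registry.lookup(Ref.of(pagesRef.objectNumber));
```

Without interning, `Ref.of(1)` would create a fresh object each time and the Map lookup would miss, since Map compares object keys by identity.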

How PDF merge works

// Core logic of PDFDocument.copyPages (simplified)
async copyPages(srcDoc: PDFDocument, indices: number[]): Promise<PDFPage[]> {
  // One copier per call, so all pages share one source-to-target reference map
  const copier = PDFObjectCopier.for(srcDoc.context, this.context);
  const srcPages = srcDoc.getPages();
  const pages: PDFPage[] = [];

  for (const index of indices) {
    // 1. Get page object from source
    const srcPage = srcPages[index];

    // 2. Deep-copy the page and everything it references
    //    (content streams, resources, fonts, etc.), remapping each PDFRef
    const copiedPage = copier.copy(srcPage.node);

    // 3. Register the copied page dict in the target doc under a new object number
    const ref = this.context.register(copiedPage);

    pages.push(PDFPage.of(copiedPage, ref, this));
  }

  return pages;
}

Hard part: the deep copy must remap every indirect reference. If object A references B via 2 0 R, B may receive a different number in the target doc, so the copier needs a table that maps each source reference to its target reference.
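
The remapping problem can be shown with a toy object store. This is a minimal sketch, not pdf-lib's copier: Ref, Value, Store, and copyObject are hypothetical names, objects are plain JavaScript values, and the sketch follows every reference (including /Parent), which a production copier would likely treat more carefully.

```typescript
// Toy object model (hypothetical, not pdf-lib's types).
class Ref {
  readonly num: number;
  constructor(num: number) {
    this.num = num;
  }
}
type Value = number | string | Ref | Value[] | { [key: string]: Value };
type Store = Map<number, Value>;

// Deep-copy the object behind srcRef from src into dest, renumbering as needed.
function copyObject(srcRef: Ref, src: Store, dest: Store): Ref {
  const mapping = new Map<number, number>(); // source number -> target number
  let nextNumber = Math.max(0, ...Array.from(dest.keys())) + 1;

  function copyRef(ref: Ref): Ref {
    const known = mapping.get(ref.num);
    if (known !== undefined) return new Ref(known); // already copied or in progress
    const newNum = nextNumber++;
    mapping.set(ref.num, newNum); // record before recursing so cycles terminate
    const srcValue = src.get(ref.num);
    if (srcValue === undefined) throw new Error(`Missing object ${ref.num}`);
    dest.set(newNum, copyValue(srcValue));
    return new Ref(newNum);
  }

  function copyValue(value: Value): Value {
    if (value instanceof Ref) return copyRef(value);
    if (Array.isArray(value)) return value.map(copyValue);
    if (typeof value === 'object') {
      const out: { [key: string]: Value } = {};
      for (const [key, inner] of Object.entries(value)) out[key] = copyValue(inner);
      return out;
    }
    return value; // numbers and strings are copied as-is
  }

  return copyRef(srcRef);
}

// Source doc: page 3 points at its parent (2) and its content stream (4).
const src: Store = new Map<number, Value>([
  [2, { Type: 'Pages', Kids: [new Ref(3)], Count: 1 }],
  [3, { Type: 'Page', Parent: new Ref(2), Contents: new Ref(4) }],
  [4, 'BT /F1 12 Tf (Hello) Tj ET'],
]);
// Target doc already uses object number 1.
const dest: Store = new Map<number, Value>([[1, { Type: 'Catalog' }]]);

const newPageRef = copyObject(new Ref(3), src, dest);
console.log(newPageRef.num); // 2
```

Source object 3 becomes target object 2, and its /Parent now points at target object 3 rather than source object 2. Recording the mapping before recursing is what stops the Page/Pages cycle from looping forever.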


Stream Compression

PDF content streams often use FlateDecode (zlib/deflate):

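import pako from 'pako';

// Simplified shape of a stream object
declare class PDFStream {
  dict: PDFDict;          // stream dictionary (/Length, /Filter, ...)
  contents: Uint8Array;   // encoded bytes

  getContentsString(): string;
  getContentsSize(): number;
}

// Compressed write: deflate the raw bytes and declare the filter
const compressed = pako.deflate(rawBytes); // zlib-wrapped deflate
stream.dict.set(PDFName.of('Filter'), PDFName.of('FlateDecode'));
stream.contents = compressed;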

pdf-lib uses pako (pure JS zlib) for compress/decompress.
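
A round trip shows what FlateDecode amounts to. This sketch uses Node's built-in zlib module as a stand-in for pako (an assumption made so the example has no dependencies); both produce the zlib-wrapped deflate format. The streamDict object is an illustrative plain-object rendering of a stream dictionary, not a pdf-lib type.

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

// A repetitive content stream compresses well.
const raw = Buffer.from('BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n'.repeat(50), 'latin1');

const compressed = deflateSync(raw);      // what a writer stores in the stream
const restored = inflateSync(compressed); // what a reader does before parsing

// /Length is the size of the encoded bytes, not the original data.
const streamDict = { Filter: '/FlateDecode', Length: compressed.length };

console.log(raw.length, compressed.length, streamDict);
console.log(restored.equals(raw)); // true
```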


Font Embedding

Standard vs custom fonts

PDF defines 14 standard fonts (Helvetica, Times-Roman, etc.) that render without embedding. They have no CJK glyphs, so CJK fonts must be embedded.

ToolsKu font strategy

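import fontkit from '@pdf-lib/fontkit';

// Custom fonts need fontkit registered; embedFont() takes bytes, not a Response
pdfDoc.registerFontkit(fontkit);
const fontBytes = (url: string) => fetch(url).then((res) => res.arrayBuffer());

// Pre-bundled CJK fonts, subset to the glyphs actually used
const fonts = {
  sourceHanSans: await pdfDoc.embedFont(
    await fontBytes('/fonts/CN/SourceHanSansCN-Regular.otf'),
    { subset: true },
  ),
  sourceHanSansBold: await pdfDoc.embedFont(
    await fontBytes('/fonts/CN/SourceHanSansCN-Bold.otf'),
    { subset: true },
  ),
};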

Font subsetting: with the subset option enabled (embedFont(bytes, { subset: true })), pdf-lib embeds only the glyphs used in the document, which yields dramatically smaller files.

Case           Full font  Subset  Reduction
10 CJK chars   ~7MB       ~15KB   99.8%
100 CJK chars  ~7MB       ~80KB   98.9%
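
The size numbers follow from how few distinct characters a typical document uses. As a rough illustration, the sketch below collects the distinct code points in a set of strings; usedCodePoints is a hypothetical helper, and a real subsetter works on glyph IDs via the font's own tables rather than on code points directly.

```typescript
// Collect the distinct code points that a set of strings actually uses.
function usedCodePoints(texts: string[]): number[] {
  const seen = new Set<number>();
  for (const text of texts) {
    // for...of iterates a string by code point, not by UTF-16 unit
    for (const ch of text) {
      seen.add(ch.codePointAt(0)!);
    }
  }
  return Array.from(seen).sort((a, b) => a - b);
}

// Repeated characters count once, so the subset grows with vocabulary, not length.
const used = usedCodePoints(['abca', 'cd']);
console.log(used); // [97, 98, 99, 100]
```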

ToolsKu PDF Toolchain

20+ tools mapped to APIs

Tool          pdf-lib API                         Extra libs
Merge         copyPages() + addPage()             -
Split         New doc + copyPages()               -
Rotate        page.setRotation()                  -
Watermark     page.drawText() with opacity        -
Page numbers  page.drawText() loop                -
Encrypt       -                                   @pdfsmaller/pdf-encrypt-lite
Extract text  -                                   pdfjs-dist
PDF to image  -                                   pdfjs-dist + canvas
Compress      Strip metadata + optimize streams   -
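
For the Split tool, user input such as "1-3, 5" has to become the zero-based index array that copyPages() expects. The helper below is a hypothetical sketch of that step (parsePageRanges is not a pdf-lib function, and the range syntax is an assumption about the UI).

```typescript
// Hypothetical helper: turn a 1-based range string like "1-3, 5" into
// zero-based page indices suitable for copyPages().
function parsePageRanges(spec: string, pageCount: number): number[] {
  const indices: number[] = [];
  for (const part of spec.split(',')) {
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new Error(`Invalid range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Range out of bounds: "${part.trim()}"`);
    }
    for (let page = start; page <= end; page++) indices.push(page - 1);
  }
  return indices;
}

console.log(parsePageRanges('1-3, 5', 10)); // [0, 1, 2, 4]
```

The result can be passed to `await newDoc.copyPages(srcDoc, indices)`, after which each returned page is added with `newDoc.addPage(page)`.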

Encryption: beyond pdf-lib

pdf-lib does not support PDF encryption. ToolsKu uses @pdfsmaller/pdf-encrypt-lite:

import { encrypt } from '@pdfsmaller/pdf-encrypt-lite';

const encryptedPdf = await encrypt(pdfBytes, {
  userPassword: 'user123',
  ownerPassword: 'owner456',
  permissions: {
    printing: true,
    copying: false,
    modifying: false,
  },
});

Performance Practices

Large files

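import { PDFDocument } from 'pdf-lib';

// For 100+ page PDFs, work page by page and report progress.
// The whole document is still held in memory; this informs the user, it does not cap memory.
async function processLargePdf(
  file: File,
  updateProgress: (fraction: number) => void, // supplied by the app
) {
  const pdfDoc = await PDFDocument.load(await file.arrayBuffer());
  const totalPages = pdfDoc.getPageCount();

  for (let i = 0; i < totalPages; i++) {
    const page = pdfDoc.getPage(i);
    // per-page work...
    updateProgress((i + 1) / totalPages); // reaches 1 on the last page
  }
}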

Streaming load

pdf-lib does not stream: the entire PDF must be loaded into memory. Files of 100MB or more can strain browser memory.

Mitigation: Show progress for large files and recommend desktop tools for very large PDFs.
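
One way to apply this mitigation is a size check before loading. The sketch below is illustrative only: adviseForSize and its return values are hypothetical, and the 20MB threshold is an arbitrary assumption (only the 100MB figure comes from the text above).

```typescript
type SizeAdvice = 'ok' | 'show-progress' | 'recommend-desktop';

// Thresholds are illustrative assumptions, not measured limits.
const PROGRESS_THRESHOLD = 20 * 1024 * 1024;  // 20MB (assumed)
const DESKTOP_THRESHOLD = 100 * 1024 * 1024;  // 100MB, the figure mentioned above

// Decide how the UI should react before the file is loaded into memory.
function adviseForSize(byteLength: number): SizeAdvice {
  if (byteLength >= DESKTOP_THRESHOLD) return 'recommend-desktop';
  if (byteLength >= PROGRESS_THRESHOLD) return 'show-progress';
  return 'ok';
}

console.log(adviseForSize(150 * 1024 * 1024)); // recommend-desktop
```

In a browser the input would typically be `file.size`, which is available without reading the file.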


Summary

At ~350KB, pdf-lib delivers browser-side PDF creation and editing, a remarkable engineering feat. Its PDFContext object graph, indirect reference remapping, and stream compression reflect deep knowledge of the PDF spec.

ToolsKu builds merge, split, rotate, watermark, page numbering, and more on top of it: 20+ tools, all running in the browser. Combined with pdfjs-dist (rendering and text extraction) and @pdfsmaller/pdf-encrypt-lite (encryption), this forms a complete PDF toolchain.
