WikiPlus

How to Compress Large Scanned PDFs

Scanned PDFs are the most compressible type of PDF file. A 50-page document scanned at 300 DPI can easily be 100 to 200 megabytes — entirely impractical for email, slow to download, and wasteful of storage. Yet the same document compressed at medium quality typically produces a PDF of 10 to 30 megabytes that is visually indistinguishable from the original on any screen. This guide explains why scanned PDFs are so large, why they compress so well, and how to reduce them dramatically using a free browser-based tool.

Why Scanned PDFs Are So Much Larger Than Regular PDFs

A scanned PDF is fundamentally different from a PDF created in a word processor. When you create a PDF from a Word document, the text is stored as character codes plus embedded fonts whose glyphs are vector outlines, that is, mathematical descriptions of characters that take very little space. A 50-page text document might be 200 to 500 kilobytes as a PDF. A scanned PDF replaces all of that efficient representation with raster images: one image per page, captured by a scanner at a fixed DPI.

At 300 DPI, an A4 page (about 8.27 by 11.69 inches) is roughly 2,480 by 3,508 pixels, or approximately 8.7 million pixels. At 600 DPI, it is about 35 million pixels. Every one of these pixels has to be encoded in the PDF file. Most flatbed scanners and multi-function printers default to 300 DPI as their standard scanning resolution. This produces visually excellent scans but very large files. A 50-page document scanned at 300 DPI might be 150 MB or more, depending on the scanner's compression settings and the complexity of each page's content.

Some scanning software applies JPEG compression to the page images before embedding them in the PDF, which reduces size significantly. Other scanners embed uncompressed or minimally compressed page images, producing even larger files. The scanning application you use often has more impact on the resulting file size than the scanner hardware itself.
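
The pixel count of a scanned page follows directly from the page size and the scanning resolution. The following is a minimal Python sketch of that arithmetic, assuming an A4 page of 210 by 297 mm and 24-bit color for the uncompressed estimate. The function names are illustrative and not part of any scanning tool.

```python
A4_WIDTH_MM = 210
A4_HEIGHT_MM = 297
MM_PER_INCH = 25.4


def page_pixels(dpi, width_mm=A4_WIDTH_MM, height_mm=A4_HEIGHT_MM):
    """Return the number of pixels in one scanned page at the given DPI."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px


def uncompressed_megabytes(dpi, pages, bytes_per_pixel=3):
    """Estimate raw image data in MB (10**6 bytes) before any compression.

    bytes_per_pixel=3 assumes 24-bit color; use 1 for 8-bit grayscale.
    """
    return page_pixels(dpi) * bytes_per_pixel * pages / 1_000_000


if __name__ == "__main__":
    for dpi in (150, 300, 600):
        millions = page_pixels(dpi) / 1e6
        print(f"{dpi} DPI: {millions:.1f} million pixels per A4 page")
    raw = uncompressed_megabytes(300, pages=50)
    print(f"50 pages at 300 DPI, raw 24-bit color: {raw:.0f} MB")
```

The raw figure for 50 color pages comes out well above 1 GB, which suggests that even a 150 MB scan has already been compressed by the scanning software to some degree.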

How Compression Works on Scanned PDFs

PDF compression for scanned documents primarily involves two operations: downsampling (reducing the resolution of the page images) and re-encoding (applying more aggressive JPEG compression to the already-embedded images).

Downsampling works because most scanned documents are read on screens that display at roughly 72 to 144 pixels per inch. A page scanned at 300 DPI has two to four times the linear resolution of such a screen, which means roughly four to seventeen times as many pixels as the screen shows at actual size. Downsampling the images to 150 DPI, about twice the low end of that range, reduces the pixel count by 75 percent while producing a document that looks essentially the same at normal reading zoom.

Re-encoding with JPEG compression further reduces file size by applying lossy compression to the pixel data. High-quality JPEG compression (a quality setting of 80 to 90) is nearly indistinguishable from the original. Moderate JPEG compression (60 to 70) produces artifacts that are visible at high zoom but not noticeable at normal reading size. Low-quality JPEG (below 50) produces visible artifacts, particularly around text.

The combination of downsampling to 150 DPI and moderate JPEG compression, which is roughly what medium compression does, typically reduces a 300 DPI scanned PDF by 70 to 85 percent. A 100 MB scanned PDF commonly compresses to 15 to 30 MB at medium compression. This is one of the most dramatic file size reductions achievable without perceptible quality loss.
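
The 75 percent figure for downsampling from 300 to 150 DPI comes from the fact that pixel count scales with the square of the resolution. Below is a small Python sketch of that relationship. The `jpeg_factor` parameter is a hypothetical multiplier for the extra savings from re-encoding, and the size estimate assumes file size is proportional to pixel count, which is only an approximation.

```python
def downsample_pixel_reduction(source_dpi, target_dpi):
    """Fraction of pixels removed when resampling to a lower resolution."""
    if target_dpi >= source_dpi:
        return 0.0  # nothing is removed if the target is not lower
    return 1 - (target_dpi / source_dpi) ** 2


def estimated_size_mb(original_mb, source_dpi, target_dpi, jpeg_factor=1.0):
    """Rough size estimate after downsampling and optional re-encoding.

    jpeg_factor is an assumed multiplier (1.0 means no extra savings from
    re-encoding); real savings depend heavily on page content.
    """
    remaining = 1 - downsample_pixel_reduction(source_dpi, target_dpi)
    return original_mb * remaining * jpeg_factor


if __name__ == "__main__":
    reduction = downsample_pixel_reduction(300, 150)
    print(f"300 to 150 DPI removes {reduction:.0%} of the pixels")
    print(f"100 MB at 300 DPI -> about {estimated_size_mb(100, 300, 150):.0f} MB")
```

With downsampling alone, a 100 MB file lands near 25 MB in this estimate; any further reduction comes from the re-encoding step.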

Step-by-Step: Compressing a Large Scanned PDF

1. Open the PDF Compress tool in your browser.

2. Upload your scanned PDF by dragging it onto the drop zone or using the file picker. For large files (50 MB or more), loading may take a moment depending on your browser and storage speed.

3. Select medium compression. For scanned PDFs, medium compression is the ideal starting point because scanned documents benefit enormously from downsampling, and the quality reduction at medium compression is almost never noticeable for typical scanned pages.

4. Click Compress. MuPDF WebAssembly processes the PDF in your browser. For a large scanned PDF (100 MB), expect processing to take 15 to 60 seconds. The progress indicator will show activity during processing.

5. Review the size reduction shown after processing. For a typical 300 DPI scanned document, medium compression should achieve roughly 60 to 80 percent reduction, and sometimes a little more. If the result is, for example, a 100 MB document compressed to 18 MB, you have achieved an 82 percent reduction, which is at the upper end of what is typical for this document type.

6. Download the compressed PDF. Before distributing it, open the file and zoom in on a text-heavy page to verify that the text is still legible. For most scanned documents with normal-sized text (10 points or larger), medium compression produces fully legible results. If the text appears slightly blurry at high zoom but is clearly readable at normal size, medium compression is acceptable. If the text is difficult to read at normal viewing size, try low compression instead, which preserves more detail.
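
The reduction percentage reported after compression can also be checked by hand from the two file sizes. Here is a minimal Python sketch; the function names and any file paths passed to them are illustrative and are not part of the PDF Compress tool.

```python
import os


def reduction_percent(original_size, compressed_size):
    """Return the size reduction as a percentage of the original size."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return (1 - compressed_size / original_size) * 100


def file_reduction_percent(original_path, compressed_path):
    """Compare two files on disk using their sizes in bytes."""
    return reduction_percent(
        os.path.getsize(original_path), os.path.getsize(compressed_path)
    )


if __name__ == "__main__":
    # The two sizes only need to share a unit (MB here).
    print(f"{reduction_percent(100, 18):.0f} percent smaller")
```

For a 100 MB original and an 18 MB result this reports an 82 percent reduction.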

Getting the Best Results Before You Scan

If you regularly work with scanned documents and file size is a concern, the best time to manage that size is before the PDF is created, during scanning. A few adjustments to your scanning settings will produce smaller files from the outset, reducing or eliminating the need for compression afterward.

Set your scanner to 150 to 200 DPI for documents that will only be read on screen. This is sufficient for normal-sized text and captures roughly 2 to 16 times fewer pixels than 300 to 600 DPI scans (a 150 DPI scan has one quarter of the pixels of a 300 DPI scan and one sixteenth of a 600 DPI scan). The visual difference at normal screen viewing sizes is negligible.

Use black-and-white (monochrome) or grayscale scanning for text documents, forms, and documents without color information. An uncompressed color scan holds three times as much data as a grayscale scan at the same DPI (24 bits per pixel versus 8), and color files generally remain larger after compression. If your document is black text on white paper, there is no reason to scan it in color.

Enable PDF compression in your scanner's software. Most scanner apps have a setting for PDF quality or compression level when saving scans as PDF. Setting this to medium or normal reduces file size at creation time without requiring post-processing.

For documents that include photographs or important color graphics, 200 to 300 DPI in color is appropriate. But for the majority of office documents (text forms, letters, reports, invoices), 150 DPI grayscale produces small, highly readable PDFs that usually require no further compression.
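
The effect of scanner settings on raw data volume can be estimated before scanning anything. The Python sketch below compares DPI and color-mode combinations against a 300 DPI color baseline, assuming the common bit depths of 24 bits per pixel for color, 8 for grayscale, and 1 for monochrome. It describes uncompressed data only; actual PDF sizes depend on the compression the scanning software applies.

```python
BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "monochrome": 1}


def relative_raw_size(dpi, mode, baseline_dpi=300, baseline_mode="color"):
    """Raw data volume of a scan relative to the baseline settings."""
    pixel_ratio = (dpi / baseline_dpi) ** 2  # pixel count scales with DPI squared
    depth_ratio = BITS_PER_PIXEL[mode] / BITS_PER_PIXEL[baseline_mode]
    return pixel_ratio * depth_ratio


if __name__ == "__main__":
    for dpi, mode in [(300, "color"), (300, "grayscale"),
                      (150, "grayscale"), (150, "monochrome")]:
        ratio = relative_raw_size(dpi, mode)
        print(f"{dpi} DPI {mode}: {ratio:.1%} of 300 DPI color")
```

In this estimate, 150 DPI grayscale carries about one twelfth of the raw data of 300 DPI color, which is why adjusting settings at scan time can make later compression unnecessary.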

Frequently Asked Questions

How much can a scanned PDF be compressed?
Scanned PDFs typically achieve the highest compression ratios of any PDF type. At medium compression, a standard 300 DPI scanned document can be reduced by 60 to 80 percent. At high compression, reductions of 80 to 90 percent are common. A 100 MB scanned PDF therefore often compresses to 20 to 40 MB at medium compression and to 10 to 20 MB at high compression, which makes it far more practical for email and digital sharing.
Will text in a scanned PDF remain readable after compression?
Yes, for normal-sized text (10 points or larger). Medium compression downsamples images to 150 DPI, which is sufficient for reading standard document text on any screen. Small print (footnotes, fine print, disclaimer text under 8 points) may appear slightly softer after medium compression. If you have documents with very small print, use low compression to preserve fine text detail.
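
One way to reason about legibility is to estimate how many pixels tall the text is after downsampling. A point is 1/72 of an inch, so the nominal font size in pixels is the point size divided by 72 and multiplied by the DPI. The Python sketch below applies that formula; the function name is illustrative, and the result is the nominal font size rather than the height of individual letters, which are smaller.

```python
POINTS_PER_INCH = 72


def text_height_px(point_size, dpi):
    """Nominal font size in pixels for text scanned at the given DPI."""
    return point_size / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    for point_size in (6, 8, 10, 12):
        height = text_height_px(point_size, 150)
        print(f"{point_size} pt text at 150 DPI: about {height:.1f} px tall")
```

At 150 DPI, 10-point text is about 21 pixels tall while 6-point text is about 12 to 13, which is consistent with small print looking softer after medium compression.
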
What is the difference between compressing a scanned PDF and an OCR PDF?
A scanned PDF without OCR consists entirely of images — one raster image per page with no searchable text. An OCR PDF has had text recognition applied, adding a searchable text layer on top of the images. Both types can be compressed the same way. Compression affects the image layer, not the OCR text layer. After compression, the text remains searchable and the quality of the text recognition is unchanged.