Build-Time Image Preprocessing for Static Sites

Intended Audience: Web Developers, DevOps Engineers

Estimated Reading Time: 6 minutes

Your image gallery looks great in development, but your mobile users are downloading 31 megabytes of app store screenshots over cellular data. Those 4000-pixel PNG exports from your design tools have no business being served to a 375-pixel-wide phone screen.

I recently implemented a build-time image processing pipeline that reduced my site's image payload by 76%. The approach is simple, requires no cloud services, and integrates cleanly with git-based deployment workflows.

The Problem: Uncontrolled Image Assets

Images accumulate in web projects from various sources:

  - High-resolution exports from design tools
  - App store and product screenshots
  - Product photos and promotional images

Without discipline, these end up served directly to browsers. A product gallery showing five images at 300 pixels wide might be transferring 15 megabytes of data. Users on fast connections won't notice. Everyone else will.

The obvious solution — manually resizing and converting each image — doesn't scale. Someone forgets, someone uses the wrong settings, someone commits the high-res version because "we might need it later." You need automation.

The Architecture: Source and Derived

The core insight is separating source images from derived images:

project/
├── images-master/           # Source: any format, any resolution
│   ├── branding/
│   └── products/
│       └── widget.png       # 4000px PNG from design tool
│
└── client/public/
    └── processed_images/    # Derived: optimized for web
        ├── branding/
        └── products/
            └── widget.jpg   # 1200px JPG at 85% quality

Source images live in their own directory, preserved at full quality. You never touch the processed output directly — it's generated. This separation means:

  - Originals keep their full resolution, so output can always be regenerated from them.
  - Processing rules can change later; a single re-run rebuilds everything from the sources.
  - Nothing in the processed directory is ever edited by hand, so it stays consistent.

The Processing Rules

My pipeline applies consistent rules to every image:

Setting          Value
Max dimension    1200px (longest edge)
Output format    JPG
Quality          85%
Resampling       Lanczos (high quality)

The 1200-pixel cap handles most web use cases. Product images, hero banners, blog illustrations — they rarely need more than 1200 pixels on any edge, even on retina displays. If your layout shows an image at 600 CSS pixels, a 1200-pixel source provides 2x density.

Everything converts to JPG at 85% quality. This is aggressive, but the output is rarely distinguishable from the source by eye. PNGs with transparency lose that transparency — if you need alpha channels, you'll need to adjust the pipeline. For my use case (product photos, screenshots, promotional images), JPG handles everything.

The Lanczos resampling algorithm matters. Cheaper algorithms like nearest-neighbor produce visible artifacts when downscaling significantly. Lanczos takes more CPU time but produces noticeably sharper results, especially for images with fine detail or text.
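
These rules boil down to two constants, which the later snippets reference as MAX_SIZE and QUALITY. A minimal sketch:

const MAX_SIZE = 1200;  // longest edge, in pixels
const QUALITY = 85;     // JPEG quality
// No resampling option is needed: sharp's resize() uses Lanczos (lanczos3) by default.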

Incremental Processing

The naive approach reprocesses every image on every run. With 50+ images, that takes noticeable time, all of it spent redoing work that has already been done.

Instead, the pipeline compares modification timestamps:

const sourceStats = fs.statSync(sourcePath);
if (fs.existsSync(outputPath)) {
    const outputStats = fs.statSync(outputPath);
    if (outputStats.mtime >= sourceStats.mtime) {
        // Skip - output is up to date
        return;
    }
}

If the output exists and is at least as new as the source, skip processing. This makes the common case (running after adding one new image) nearly instant.

But what if you change processing settings? The sources haven't changed, so the timestamp check would skip everything. For this, add a --force flag:

const FORCE = process.argv.includes('--force');

// Later, in the processing logic:
if (!FORCE) {
    // ... timestamp comparison
}

Normal runs are incremental. After changing settings, run with --force to regenerate everything.
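
Putting the flag and the timestamp check together, the skip logic fits in one small helper. This is a sketch; the needsProcessing name is an assumption, not taken from the original script:

const fs = require('fs');

const FORCE = process.argv.includes('--force');

function needsProcessing(sourcePath, outputPath) {
    if (FORCE) return true;                        // regenerate everything
    if (!fs.existsSync(outputPath)) return true;   // never processed before
    const sourceStats = fs.statSync(sourcePath);
    const outputStats = fs.statSync(outputPath);
    return outputStats.mtime < sourceStats.mtime;  // reprocess only if the source is newer
}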

The Implementation

I use sharp for image processing. It's a Node.js binding to libvips, which is fast and produces high-quality output. Here's the core processing logic:

const sharp = require('sharp');
const fs = require('fs');      // used by the direct-copy optimization below
const path = require('path');  // used by the direct-copy optimization below

async function processImage(sourcePath, outputPath) {
    const metadata = await sharp(sourcePath).metadata();
    const longestEdge = Math.max(metadata.width, metadata.height);

    let pipeline = sharp(sourcePath);

    // Resize if needed (maintain aspect ratio)
    if (longestEdge > MAX_SIZE) {
        pipeline = pipeline.resize(MAX_SIZE, MAX_SIZE, {
            fit: 'inside',
            withoutEnlargement: true
        });
    }

    // Convert to JPG
    await pipeline
        .jpeg({ quality: QUALITY })
        .toFile(outputPath);
}

The fit: 'inside' option constrains to a bounding box while preserving aspect ratio. A 4000x3000 image becomes 1200x900; a 3000x4000 image becomes 900x1200.

One optimization: JPGs that are already under the size limit get copied directly without re-encoding. Re-encoding a JPG always loses some quality, so avoid it when unnecessary:

// Inside processImage, after reading the metadata:
const ext = path.extname(sourcePath).toLowerCase();
const isJpg = ext === '.jpg' || ext === '.jpeg';

if (isJpg && longestEdge <= MAX_SIZE) {
    fs.copyFileSync(sourcePath, outputPath);
    return;
}
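
The piece that ties these fragments together is a recursive walk over images-master/ that mirrors the directory structure into client/public/processed_images/. This is a sketch rather than the author's exact script; processAll, SOURCE_DIR, and OUTPUT_DIR are assumed names, while processImage and needsProcessing refer to the functions above:

// fs and path are required at the top of the script (shown earlier).
const SOURCE_DIR = 'images-master';
const OUTPUT_DIR = path.join('client', 'public', 'processed_images');

async function processAll(relativeDir = '') {
    const entries = fs.readdirSync(path.join(SOURCE_DIR, relativeDir), { withFileTypes: true });
    for (const entry of entries) {
        const relPath = path.join(relativeDir, entry.name);
        if (entry.isDirectory()) {
            await processAll(relPath);   // recurse into branding/, products/, ...
            continue;
        }
        // Mirror the directory structure, forcing a .jpg extension on the output.
        // (A real script would also filter for image extensions here.)
        const baseName = path.basename(entry.name, path.extname(entry.name));
        const outputPath = path.join(OUTPUT_DIR, relativeDir, baseName + '.jpg');
        fs.mkdirSync(path.dirname(outputPath), { recursive: true });

        const sourcePath = path.join(SOURCE_DIR, relPath);
        if (needsProcessing(sourcePath, outputPath)) {
            await processImage(sourcePath, outputPath);
        }
    }
}

processAll().catch((err) => {
    console.error(err);
    process.exit(1);
});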

Avoiding Cloud Build Dependencies

A tempting alternative is processing images during CI/CD. But this introduces dependencies:

  - The build server has to install sharp and its native libvips binaries.
  - Every deploy repeats image work that only needs to happen when images change.

Instead, I run processing locally and commit the results:

npm run process-images
git add client/public/processed_images/
git commit -m "Update processed images"
git push
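
The process-images command itself is just a plain Node script wired into package.json; the script path below is an assumption:

{
    "scripts": {
        "process-images": "node scripts/process-images.js"
    }
}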

The deployment receives pre-processed images and does nothing special with them. AWS Amplify, Netlify, Vercel, a static file server — all work identically. No native dependencies required on the build server.

This trades repository size for build simplicity. For most projects, an extra few megabytes in the repo is insignificant compared to the operational simplicity of pre-processed assets.

Deterministic Output

A concern with any generated content is churn. If running the processor twice produces different output bytes, git will see changes even when nothing meaningful changed.

In practice, sharp produces deterministic output for identical inputs. Running --force and checking git status shows no changes when sources haven't changed. This matters for pull request hygiene — you don't want image processing noise cluttering your diffs.
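
A quick way to spot-check this on your own pipeline (npm forwards flags placed after the -- separator to the script):

npm run process-images -- --force
git status --porcelain client/public/processed_images/    # no output means no churn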

The Results

Before optimization: 31MB of source images served directly.

After optimization: 7.3MB of processed images — a 76% reduction.

Most savings came from two sources:

  1. Resizing: App store screenshots at 2688x1242 became 1200x554
  2. Format conversion: 17 PNGs became JPGs, often 5-10x smaller for photographic content

Mobile users now download roughly a quarter of the bytes. Pages load faster. Bandwidth costs decrease. And I don't have to think about it — the pipeline enforces the constraints automatically.

Tuning for Your Needs

The 1200-pixel, 85%-quality settings work for my use case. Yours may differ:

  - Layouts that display images wider than 600 CSS pixels need a cap above 1200px to keep 2x density.
  - Assets that depend on transparency need an output format other than JPG.
  - Photography-heavy sites may justify a higher quality setting than 85%.

The point isn't the specific numbers — it's having a system that enforces consistency and can be adjusted project-wide by changing a few constants.

Beyond Images: The Preprocessing Pattern

This same pattern — source assets transformed into derived assets at build time — applies beyond images:

The common thread: keep sources in a form convenient for humans, transform them into a form optimized for machines, and automate the transformation so it can't be forgotten.

For images specifically, the wins are immediate and measurable. 76% less bandwidth for a few hours of setup is a trade I'll take every time.
