A 12-megapixel image measuring 4000 × 3000 contains 12 million pixels. At three bytes per pixel—one byte each for red, green, and blue—the simplest RGB representation is 36 MB before metadata or padding. A JPEG can often make a photograph dramatically smaller because it stores a compact recipe for rebuilding a close approximation, not a literal list of every original RGB value.
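The arithmetic above can be checked in a few lines. This is a minimal sketch; the helper name `raw_rgb_bytes` is made up for illustration, and it assumes no row padding or metadata.

```python
def raw_rgb_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of an unpadded, uncompressed bitmap."""
    return width * height * bytes_per_pixel


size = raw_rgb_bytes(4000, 3000)
print(size)                      # 36000000
print(size / 1_000_000)          # 36.0 decimal megabytes
print(round(size / 2**20, 1))    # 34.3 if you count in binary mebibytes
```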
Compress a generated test image
The browser draws the source scene, simulates the selected chroma sampling, then passes it through its JPEG encoder. Watch small text, saturated edges, gradients, and block boundaries as you change the parameters.
Browser encoders choose their own quantization tables and internal settings, so “quality 72” is not a universal JPEG recipe. File sizes are real for this generated 480 × 300 canvas.
The compression assembly line
There are variations, but the familiar lossy JPEG path can be understood as five transformations. Each one changes the question from “what color is this exact pixel?” toward “what visible structure is present in this small region?”
First separate brightness from color
JPEG commonly converts RGB into YCbCr: one channel for luma-like brightness detail and two channels for color differences. Human vision usually notices fine brightness edges more readily than equally fine color variation, so an encoder can keep luma at full detail while sampling chroma more sparsely. In common 4:2:0 sampling, each 2 × 2 group of four luma samples shares one sample from each chroma channel.
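As a rough sketch of that step, the snippet below uses the full-range conversion constants commonly associated with JFIF files and imitates 4:2:0 by averaging each 2 × 2 group. Averaging is an assumption made for illustration; real encoders may filter or position chroma samples differently, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple:
    """Full-range RGB (0..255) to YCbCr, with chroma centered on 128."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane: list) -> list:
    """Average each 2x2 group of a chroma plane (even width and height assumed)."""
    out = []
    for row in range(0, len(plane), 2):
        out.append([
            (plane[row][col] + plane[row][col + 1]
             + plane[row + 1][col] + plane[row + 1][col + 1]) / 4
            for col in range(0, len(plane[row]), 2)
        ])
    return out


# A neutral gray or white pixel carries no color difference: Cb and Cr sit at 128.
print(rgb_to_ycbcr(255, 255, 255))
# Four chroma samples collapse into one shared value.
print(subsample_420([[1, 3], [5, 7]]))  # [[4.0]]
```

The luma plane would be left at full resolution; only the two chroma planes pass through `subsample_420`, which is where the four-to-one sharing comes from.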
Then turn pixels into frequencies
The image is divided into 8 × 8 sample blocks. A discrete cosine transform (DCT) expresses each block as a weighted mixture of 64 patterns: one average value, followed by increasingly rapid changes across the block. Smooth skies concentrate their energy near the low-frequency corner. Hair, grass, and noise spread useful information farther across the grid.
An 8 × 8 patch is described as 64 brightness or color samples at fixed locations.
The same patch becomes 64 coefficients, usually strongest near the low-frequency corner.
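To make the pixels-to-frequencies idea concrete, here is a deliberately naive 2-D DCT-II for one 8 × 8 block. It is a sketch for reading, not for speed: production encoders use fast factorizations, and baseline JPEG typically shifts 8-bit samples by 128 before the transform, which this example assumes has already happened.

```python
import math

N = 8  # JPEG block size


def dct_2d(block: list) -> list:
    """Naive 2-D DCT-II of an N x N block with orthonormal scaling."""
    def scale(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A perfectly flat patch puts all of its energy in the top-left (DC) coefficient.
flat = [[10] * N for _ in range(N)]
result = dct_2d(flat)
print(round(result[0][0], 6))  # 80.0
```

Every other coefficient of the flat patch is zero up to floating-point error, which is the "smooth sky" case from the text in its most extreme form.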
Quantization is the bargain
Each DCT coefficient is divided by a value from a quantization table and rounded. Small high-frequency coefficients often become zero. Larger divisors make a smaller file but throw away more subtle variation. This is the irreversible step behind a typical JPEG “quality” control—and that quality number is an encoder-specific shortcut, not a universal percentage.
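The divide-and-round step is small enough to show directly. In this sketch the table produced by `make_table` is invented for illustration (its divisors simply grow with frequency) and is not one of the tables published with the JPEG standard or used by any particular encoder.

```python
def make_table(base: int = 8, step: int = 4, n: int = 8) -> list:
    """Hypothetical quantization table: larger divisors at higher frequencies."""
    return [[base + step * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs: list, table: list) -> list:
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(levels: list, table: list) -> list:
    """Decoder side: multiply back. Whatever rounding removed stays removed."""
    return [[lv * q for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]


table = make_table()
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 80.0, 5.0, 20.0

levels = quantize(coeffs, table)
print(levels[0][0], levels[0][1], levels[7][7])   # 10 0 0
print(dequantize(levels, table)[0][1])            # 0, the original 5.0 is gone
```

Raising `base` or `step` plays the role of lowering a quality slider: more coefficients round to zero, and the reconstructed block drifts further from the original.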
The remaining values are read in a zigzag order that tends to place long runs of zeros together. Run-length and entropy coding can then represent common values with fewer bits. Decompression reverses the coding, multiplies by the quantization values, applies the inverse DCT, and converts the result back toward RGB. The discarded detail cannot return; the decoder reconstructs its best available approximation.
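The zigzag scan and the zero-run idea can be sketched as follows. This is a simplification: a real baseline JPEG stream codes the first (DC) coefficient separately, limits run lengths, and then Huffman-codes the resulting symbols. The `"EOB"` marker here is just a stand-in for the end-of-block symbol.

```python
def zigzag_indices(n: int = 8) -> list:
    """(row, col) pairs in zigzag order, walking the anti-diagonals."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals run top-right to bottom-left; even ones run the other way.
        order.extend(diag if s % 2 else reversed(diag))
    return order


def run_length(values: list) -> list:
    """Encode as (zeros_skipped, value) pairs, then an end-of-block marker."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied, not stored
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 10, 3, -2, 1

scan = [block[r][c] for r, c in zigzag_indices()]
print(run_length(scan))  # [(0, 10), (0, 3), (0, -2), (1, 1), 'EOB']
```

Sixty-four quantized values shrink to four pairs and a marker, which is why quantization that creates zeros pays off so well in the coding stage.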
What the mistakes look like
Strong compression can reveal the 8 × 8 working grid.
High-contrast borders can grow faint ripples or halos.
Small colored text and saturated edges can look softer than their brightness detail.
Raw is not one thing
A camera RAW file is usually not a simple RGB bitmap. It may contain sensor mosaic values, metadata, previews, and sometimes lossless or lossy compression of its own. “Raw RGB” here means the useful thought experiment: storing final pixel samples directly. Likewise, JPEG is both a family of coding modes and, in everyday speech, a file carrying a familiar DCT-based JPEG stream.
Lossless takes a different bargain
Lossless compression must reproduce the exact original sample values. PNG filters each scanline to make neighboring values easier to compress, then uses the DEFLATE algorithm—a combination of repeated-string references and entropy coding. Flat graphics, text, repeated patterns, and transparency often compress well. Photographic noise does not repeat politely, so a lossless photograph can remain much larger than a visually similar lossy image.
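A small experiment shows why filtering helps. The sketch below applies PNG's Sub filter (each byte minus the byte to its left, modulo 256) to a single synthetic gradient row and compresses both versions with Python's `zlib` module, which implements DEFLATE. It handles one row with one byte per pixel only; a real PNG encoder chooses among several filter types per scanline.

```python
import zlib


def sub_filter(row: bytes, bpp: int = 1) -> bytes:
    """PNG filter type Sub: difference from the byte bpp positions to the left."""
    return bytes((row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
                 for i in range(len(row)))


def unsub_filter(filtered: bytes, bpp: int = 1) -> bytes:
    """Inverse of sub_filter; recovers the exact original bytes."""
    out = bytearray()
    for i, f in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out.append((f + left) & 0xFF)
    return bytes(out)


gradient = bytes(range(256))          # 0, 1, 2, ... 255: no repeated strings
filtered = sub_filter(gradient)       # 0, 1, 1, 1, ...: one long repetition

print(len(zlib.compress(gradient)), len(zlib.compress(filtered)))
print(unsub_filter(filtered) == gradient)  # True, nothing was lost
```

The filtered row compresses to far fewer bytes than the unfiltered one, yet the round trip is exact. Replace the gradient with random noise and the advantage largely disappears, which mirrors the point about photographic noise.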
One picture, several kinds of machinery
JPEG: DCT-based lossy coding excels at ordinary photographs and broad compatibility. Classic JPEG has no alpha channel.
PNG: Filtered scanlines plus DEFLATE preserve exact samples and full alpha. Strong for graphics, UI, and screenshots.
WebP: A RIFF container can carry VP8-derived lossy images, a separate lossless mode, alpha, metadata, and animation.
AVIF: Stores AV1-coded image items in an ISO base-media structure. Supports lossy or lossless coding, HDR, wide color, alpha, and sequences.
GIF: A palette format using LZW compression. Its 256-color limit and one-bit transparency are restrictive; simple frame animation made it culturally durable.
SVG: XML instructions describe paths, shapes, text, gradients, and filters. It scales cleanly because it stores a scene, not a fixed pixel grid.
RGB is only part of the picture
An alpha channel describes coverage or opacity. Straight alpha stores color independently from alpha; premultiplied alpha stores color already multiplied by coverage, which can make compositing numerically convenient. Formats and graphics APIs must agree on interpretation or translucent edges can grow dark or bright fringes.
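The difference between the two conventions is easiest to see in the compositing formula. This sketch works with components in the range 0 to 1 and uses the standard "over" operation for premultiplied colors; the function names are illustrative, not taken from any particular graphics API.

```python
def premultiply(rgba: tuple) -> tuple:
    """Straight (r, g, b, a) to premultiplied: scale color by coverage."""
    r, g, b, a = rgba
    return (r * a, g * a, b * a, a)


def over_premultiplied(src: tuple, dst: tuple) -> tuple:
    """'Over' compositing when both colors are premultiplied."""
    inv = 1.0 - src[3]
    return tuple(s + d * inv for s, d in zip(src, dst))


red_half = premultiply((1.0, 0.0, 0.0, 0.5))   # (0.5, 0.0, 0.0, 0.5)
white = (1.0, 1.0, 1.0, 1.0)                   # opaque, so identical in both forms

print(over_premultiplied(red_half, white))     # (1.0, 0.5, 0.5, 1.0)
```

If software mistook `red_half` for straight alpha and multiplied by coverage a second time, the red contribution would drop from 0.5 to 0.25 and the edge would come out too dark. That double multiplication is one way the dark fringes mentioned above appear.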
Bit depth and color space change what values mean
Eight bits per channel offer 256 code values per channel; ten or twelve bits provide finer steps. But bit depth alone does not define visible color. Color primaries describe the gamut, a transfer function maps encoded values to light, and metadata tells software how to interpret them. HDR combines greater range with appropriate color and transfer characteristics—not merely a larger file.
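Two short functions illustrate the split between how many code values exist and what they mean. The sRGB decoding curve is used here only as a familiar example of a transfer function; it is an assumption of this sketch, and HDR content uses different curves.

```python
def code_values(bits: int) -> int:
    """Number of distinct code values available per channel."""
    return 2 ** bits


def srgb_to_linear(v: float) -> float:
    """Map an sRGB-encoded value in 0..1 to linear light in 0..1."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


for bits in (8, 10, 12):
    print(bits, code_values(bits))   # 8 256, 10 1024, 12 4096

# The encoded midpoint is not half the light: it decodes to roughly 0.21.
print(round(srgb_to_linear(0.5), 2))  # 0.21
```

More bits give finer steps along the curve, but only the transfer function and primaries say what light each step stands for.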
Choose by what must survive
| Need | Useful starting point | Why |
|---|---|---|
| Broadly compatible photograph | JPEG | Simple delivery and mature decoding everywhere |
| Exact UI, screenshot, or transparency | PNG | Lossless samples and alpha |
| Modern mixed web imagery | WebP | Lossy, lossless, alpha, and animation in one family |
| High compression, HDR, wide gamut | AVIF | Modern AV1 image tools and rich color support |
| Logos and diagrams | SVG | Resolution-independent scene instructions |
| Editing latitude from a camera | Camera RAW | Sensor-oriented data and capture metadata |
References
The standard family and its DCT-based lossy core.
W3C. PNG specification. Image structure, filtering, compression, color, and transparency.
Google. WebP container specification. Lossy, lossless, alpha, animation, and metadata chunks.
Alliance for Open Media. AVIF specification. AV1 image items, sequences, HDR, alpha, and color capabilities.