What is an ETag? HTTP Caching Explained

You shipped a new endpoint a few weeks ago. Functionally, it is fine. The JSON is correct, tests pass, and clients can fetch data without issue.

Then the familiar problems show up. Mobile users say screens feel slow even when data rarely changes. Your origin keeps serving the same payload again and again. Logs show repeat requests for identical resources, but your server still sends the full body every time.

That is where many developers first ask what an ETag is, and why HTTP needs one when caches already exist.

An ETag solves a simple but expensive problem. A client often already has the latest copy of a resource, but without a reliable validator it has no clean way to ask, “Has this changed since last time?” So it downloads the whole thing again. Across APIs, static assets, CDNs, and browser caches, that waste adds up fast.

An ETag is an HTTP response header that gives a resource version an identifier. The client stores that identifier and sends it back later in a conditional request. If the server sees the resource has not changed, it can skip the body and return 304 Not Modified instead.

That sounds small. In production, it is not. In high-traffic environments, conditional requests with ETags let the server return minimal 304 responses for unchanged resources, which can cut bandwidth consumption for that resource by 95-99%, according to Fastly’s explanation of ETags and revalidation.

Introduction: Why Your API Wastes Bandwidth

A common source of backend inefficiency is simple: your API keeps sending data the client already has.

Suppose your mobile app requests /api/profile/123 every time a user opens the settings screen. The profile has not changed since yesterday. Your server still fetches the record, serializes JSON, compresses it, writes headers, and sends the same body again. One request is cheap. Thousands of repeat requests per minute are not.

That waste shows up all over a stack. Product listings, dashboard summaries, avatar images, JavaScript bundles, and CMS responses often change far less often than clients request them. If the server cannot prove that a cached copy is still valid, the safest fallback is to send the whole payload again.

Why basic caching is not enough

Cache-Control helps decide how long a response can be reused, but it does not answer a different question: how can a client check whether its stored copy still matches the current version on the server?

That question matters for APIs because data does not always change on a predictable schedule. You often cannot mark a response as immutable for hours or days. You need a way to stay correct without paying the cost of retransmitting the full body on every check.

ETags provide a middle path:

The client can ask, "Has this exact version changed?"
The server can reply with a small validation response instead of the full payload when nothing changed.
The cached body can be reused by the browser, mobile app, reverse proxy, or CDN.

An ETag works like a claim ticket for one specific representation of a resource. The client stores the ticket with the response. On the next request, it hands the ticket back and asks the server to compare it with the current version. If they match, the server sends 304 Not Modified and skips the body.

That is the beginning of the full ETag lifecycle. You generate a validator, return it to the client, compare it on later requests, and let each layer above your origin use that signal efficiently. The same mechanism that saves bandwidth on repeat GET requests also becomes important later when you configure CDNs and when you protect writes with optimistic concurrency control so one client does not overwrite another client's changes.

A simple symptom is easy to spot in logs. The same client asks for the same resource again and again, and your origin keeps sending identical bytes back.

ETags fix that by turning "send it all again" into "send it only if it changed." In production, that shift affects bandwidth, origin load, response times, cache behavior, and data safety across the full stack.

How ETags Enable Smart Caching

Open your API logs and you will often see the same pattern: GET /api/posts/42 from the same client, over and over, with your origin returning the same JSON each time. The waste is not in the request itself. The waste is in sending identical bytes again when the client already has them.

An ETag solves that by giving one response representation a validator the client can bring back later. The client is effectively saying, “I still have version X. Tell me if X is still current before you send the whole body again.”

The first request

On the first request, there is nothing to validate yet, so the server sends the full representation plus its validator.

A typical exchange looks like this:

Request: GET /api/posts/42
Response: 200 OK
Headers: ETag: "675af34563dc-tr34"
Body: full JSON payload

That ETag is an opaque value in double quotes. It might come from a hash, a revision number, a timestamp-based strategy, or a database version column. The client does not need to decode it. It only needs to store it alongside the response body.

The next request

When the client needs the same resource again, it can make a conditional request instead of a blind re-download. The validator goes into the If-None-Match header.

The flow looks like this:

Client sends a conditional GET with If-None-Match: "675af34563dc-tr34"
Server compares that value with the current validator for the resource
Server returns the smallest correct response

If the resource is unchanged:

Response: 304 Not Modified
Body: no response body
Result: client reuses its cached copy

If the resource changed:

Response: 200 OK
Headers: a new ETag
Body: the latest representation

That loop is the heart of smart caching. The client still checks. The origin just avoids paying to resend the body when nothing changed.
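The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a framework API: `make_etag` and `handle_get` are hypothetical helper names, the validator is a truncated SHA-256 of the serialized body, and the comparison assumes the client sends back exactly one ETag value.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Return a strong ETag: a hex digest wrapped in double quotes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(resource: dict, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on `resource`."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    if if_none_match == etag:
        # The client already holds this version: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body


post = {"id": 42, "title": "Hello"}
status1, headers1, body1 = handle_get(post, None)             # first request
status2, headers2, body2 = handle_get(post, headers1["ETag"])  # revalidation
post["title"] = "Hello, edited"
status3, headers3, body3 = handle_get(post, headers1["ETag"])  # after a change
print(status1, status2, status3)
```

The first call returns the full body, the second returns an empty 304, and the third returns a fresh body with a new validator because the resource changed.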

Why this works in practice

Age-based caching answers one question: “Is my cached copy still fresh enough to use without asking?” ETag validation answers a different one: “Is my cached copy still the same version?”

That distinction matters in real systems. Freshness rules such as max-age reduce requests for a period of time, but once that period expires, the client needs a way to revalidate efficiently. ETags give the browser, mobile app, reverse proxy, and CDN a precise comparison point. Instead of falling back to “download everything again,” each layer can ask the origin for confirmation first.
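To see how freshness and validation cooperate, here is a rough client-side sketch. Everything in it is an assumption made for illustration: `RevalidatingCache` is a hypothetical class, the freshness lifetime is a fixed `max_age` rather than a parsed Cache-Control header, and `fake_origin` stands in for a real HTTP call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical transport: (url, request_headers) -> (status, response_headers, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


@dataclass
class Entry:
    body: bytes
    etag: str
    expires_at: float


class RevalidatingCache:
    def __init__(self, fetch: Fetch, max_age: float, clock=time.monotonic):
        self.fetch, self.max_age, self.clock = fetch, max_age, clock
        self.entries: Dict[str, Entry] = {}

    def get(self, url: str) -> bytes:
        entry = self.entries.get(url)
        if entry and self.clock() < entry.expires_at:
            return entry.body  # still fresh: no request at all
        headers = {"If-None-Match": entry.etag} if entry else {}
        status, resp_headers, body = self.fetch(url, headers)
        if status == 304 and entry:
            # Same version on the server: mark the stored copy fresh again.
            entry.expires_at = self.clock() + self.max_age
            return entry.body
        self.entries[url] = Entry(body, resp_headers["ETag"], self.clock() + self.max_age)
        return body


calls = []  # records the If-None-Match value of every origin request


def fake_origin(url, headers):
    calls.append(headers.get("If-None-Match"))
    if headers.get("If-None-Match") == '"v1"':
        return 304, {"ETag": '"v1"'}, b""
    return 200, {"ETag": '"v1"'}, b'{"id": 42}'


now = [0.0]  # controllable clock for the demo
cache = RevalidatingCache(fake_origin, max_age=60, clock=lambda: now[0])
first = cache.get("/api/posts/42")   # full download
second = cache.get("/api/posts/42")  # fresh, served locally
now[0] = 61.0
third = cache.get("/api/posts/42")   # stale, revalidated with If-None-Match
```

Only two requests reach the origin here, and the second one carries the validator instead of triggering a full download.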

A useful comparison is source control. You do not re-clone a repository just to check whether the current commit changed. You compare identifiers. If the identifier matches, you keep what you already have. ETag validation applies the same idea to HTTP representations.

What developers often mix up

An ETag does not eliminate repeat requests. It makes repeat requests cheaper.

That confusion shows up a lot in backend work. A conditional request still reaches your server unless an upstream cache can answer it. The bandwidth savings come from skipping the response body, and the CPU savings depend on how expensive your ETag generation and comparison path is.

Another common mistake is treating the ETag like a security token. It is not an authentication mechanism or an integrity guarantee for hostile clients. Its job is much narrower. It tells caches and clients whether the current representation matches the one they already hold.

For backend teams, the bigger picture is what makes ETags worth learning well. The same validator that cuts bandwidth on repeated GET requests also affects how CDNs revalidate objects and how later write operations can avoid overwriting someone else’s changes. Smart caching is only the first stop in the ETag lifecycle, but it is the one that usually pays off first in production.

Strong Versus Weak ETags Explained

Not every resource needs the same level of precision. That is why HTTP supports strong and weak ETags.


The short version

A strong ETag means the representation matches byte for byte.

A weak ETag means the representations are equivalent enough for a given use case, even if the bytes are not identical. Weak ETags use the W/ prefix, such as W/"6868-184f35cde4a".

According to Wikipedia’s overview of HTTP ETag behavior, strong validation and weak validation exist to support different use cases, and weak validators are used in scenarios where semantic equivalence matters more than exact byte identity.

Side by side comparison

| Validator type | What it means | Best fit |
| --- | --- | --- |
| Strong ETag | Exact byte-for-byte match | File downloads, exact representation checks, concurrency-sensitive updates |
| Weak ETag | Semantically equivalent, not necessarily byte-identical | Content that may differ in insignificant ways, such as formatting or representation-level changes |
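HTTP defines two comparison functions for these validators. The sketch below follows the rules in RFC 9110: weak comparison ignores the W/ prefix, while strong comparison requires that neither tag is weak. The function names are my own, not part of any library.

```python
def _opaque(tag: str) -> str:
    """Strip the W/ weakness prefix, leaving the quoted opaque tag."""
    return tag[2:] if tag.startswith("W/") else tag


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are equal, weakness flags are ignored."""
    return _opaque(a) == _opaque(b)


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and both are identical."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b


pairs = [('W/"1"', 'W/"1"'), ('W/"1"', 'W/"2"'), ('W/"1"', '"1"'), ('"1"', '"1"')]
for a, b in pairs:
    print(a, b, "strong:", strong_match(a, b), "weak:", weak_match(a, b))
```

If-None-Match is evaluated with weak comparison and If-Match with strong comparison, which is why the choice of validator type matters differently for reads and writes.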

When to prefer strong ETags

Use strong ETags when exactness matters.

Examples include:

Binary assets: if a file download changes even slightly, the validator should change.
Precise API update checks: if you want to ensure a client updates exactly the version it fetched, strong comparison is safer.
Range requests and representation integrity: exact bytes matter here.

A strong ETag is your “same file, same bytes” promise.

When weak ETags are the right tool

Use weak ETags when the resource can change in ways that do not matter semantically.

That often happens with:

JSON formatting changes: whitespace or field ordering can differ while the meaning stays the same.
Compression or representation adjustments: the response may be encoded differently without meaningfully changing the content.
Derived API responses: where small representation details should not trigger unnecessary cache misses.

Wikipedia’s example notes that in fintech APIs serving dynamic data, weak ETags such as W/"6868-184f35cde4a" help verify that data has not been modified while still balancing consistency and performance in distributed systems.

The practical rule

If you are validating meaning, weak ETags can be enough.

If you are validating exact bytes or using ETags for strict write safety, use strong ones. If-Match is evaluated with strong comparison, so a weak ETag never satisfies it.

Developers often overuse strong validators on responses where tiny serialization differences create unnecessary cache misses. They also sometimes use weak validators in places where overwrite protection demands stricter guarantees. Both choices create pain. One hurts performance. The other hurts correctness.

Tip: For read-heavy JSON APIs, weak ETags can be a sensible default if your serialization pipeline can introduce harmless output differences. For write preconditions, be much stricter.
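One way to get a weak validator that ignores harmless serialization differences is to hash a canonical form of the JSON rather than the raw bytes. The sketch below assumes that whitespace and key order are the only differences you want to ignore; `weak_json_etag` is a hypothetical helper name.

```python
import hashlib
import json


def weak_json_etag(raw_json: str) -> str:
    """Weak ETag that ignores whitespace and key order in a JSON document."""
    canonical = json.dumps(json.loads(raw_json), sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f'W/"{digest}"'


compact = '{"id":42,"name":"Ava"}'
pretty = '{\n  "name": "Ava",\n  "id": 42\n}'
changed = '{"id":42,"name":"Eva"}'

print(weak_json_etag(compact) == weak_json_etag(pretty))   # same meaning
print(weak_json_etag(compact) == weak_json_etag(changed))  # different data
```

The W/ prefix is the honest choice here, because two responses with the same tag may differ byte for byte.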

Choosing Your ETag Generation Strategy

Choosing an ETag strategy is really choosing what kind of identity your resource should carry through its whole lifecycle.

If a client asks, "Do I already have the current version?", your ETag answers that for caching. If the same client later says, "Only apply my update if nobody changed this first," the very same validator may now protect data integrity. That is why this decision belongs to both performance and correctness.

Start with the question your ETag must answer

A useful way to frame the choice is to ask what your server is trying to prove:

"These exact response bytes are the same." Use a content-based validator tied to the final representation.
"This logical resource version is the same." Use a revision, row version, or update token from your data store.
"This representation is the same for this variant." Use a composite value that includes version plus representation-specific inputs such as language, format, or encoding.

The claim-ticket picture still applies here. The client hands the ticket back later, and your server decides whether it still matches the current state. The hard part is deciding what state the ticket should represent.

Common generation approaches

Teams usually choose one of four patterns:

Hash the response content. MD5 or SHA-family hashes are common. This fits cases where the final body is the source of truth.
Use a version number. A database revision, incrementing row version, or event sequence often maps cleanly to API resources.
Derive from modification metadata. Timestamps or last-update signals are simple, but they are only as good as their precision and consistency.
Build a composite ETag. For example, combine a resource revision with a content variant such as gzip versus br, or en versus fr.

Each option answers a slightly different question. That is why ETag design gets tricky in systems with app servers, background jobs, serializers, and CDNs all touching the same response path.
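To make the first two patterns concrete, here is a small sketch. `Profile` is a hypothetical record with a `version` column, and the tag format `profile-<id>-v<version>` is only an example of a stable identifier, not a convention from any framework.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Profile:  # hypothetical record with a version column
    id: int
    name: str
    version: int


def content_etag(body: bytes) -> str:
    """Content-based validator: needs the fully serialized body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def version_etag(record: Profile) -> str:
    """Version-based validator: needs only the record's id and version."""
    return f'"profile-{record.id}-v{record.version}"'


profile = Profile(id=123, name="Ava", version=7)
body = json.dumps({"id": profile.id, "name": profile.name}, sort_keys=True).encode("utf-8")

print(content_etag(body))
print(version_etag(profile))
```

The content hash changes whenever the serialized bytes change, while the version tag changes only when the stored record's version is bumped. Which behavior you want depends on what the validator is supposed to prove.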

Trade-offs in plain terms

A content hash is easy to reason about. Change the response body, and the ETag changes. That makes it attractive for static assets and APIs where the exact serialized output matters.

It also has a cost. Hashing large payloads on every request can add work, and two servers must serialize the response identically or they will produce different validators for the same logical data.

A version-based ETag often fits APIs better. If your users table already has a version column or updated_at plus a write token, every app instance can generate the same value without re-hashing a large JSON document. This is usually the cleanest path when the same validator will later be reused for If-Match on updates.

A timestamp-based ETag is simple, but it can fail in subtle ways. Two updates inside the same time unit may collapse into one apparent version. Different services may also disagree on clock timing or formatting.
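The collapse is easy to reproduce. In this sketch, `timestamp_etag` is a hypothetical helper that truncates the modification time to whole seconds, which is what happens whenever the stored or formatted timestamp has one-second resolution.

```python
from datetime import datetime, timezone


def timestamp_etag(updated_at: datetime) -> str:
    """ETag from a modification time truncated to whole seconds."""
    return f'"{int(updated_at.timestamp())}"'


# Two distinct writes, 800 ms apart, inside the same second.
first_write = datetime(2024, 1, 1, 12, 0, 0, 100_000, tzinfo=timezone.utc)
second_write = datetime(2024, 1, 1, 12, 0, 0, 900_000, tzinfo=timezone.utc)

collide = timestamp_etag(first_write) == timestamp_etag(second_write)
print(collide)
```

A client that cached the result of the first write would be told its copy is still current after the second one. A monotonically increasing version column does not have this problem.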

A composite ETag solves real production problems. Suppose your CDN stores both compressed and uncompressed variants, or your API serves localized content. If the ETag only reflects the database row version, clients may treat two different representations as interchangeable when they are not.
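A composite validator can be as simple as joining the inputs that distinguish one representation from another. The field choice and the tag layout below are assumptions for illustration only.

```python
def composite_etag(version: int, language: str, encoding: str) -> str:
    """Combine the record version with representation-specific inputs."""
    return f'"v{version}-{language}-{encoding}"'


english_gzip = composite_etag(7, "en", "gzip")
english_br = composite_etag(7, "en", "br")
french_gzip = composite_etag(7, "fr", "gzip")

print(english_gzip, english_br, french_gzip)
```

All three tags share the same underlying version, yet no cache can mistake one variant for another.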

ETag versus Last-Modified

You do not need to treat these as rivals. Many systems send both, then let clients and intermediaries use the strongest validator they support.

| Attribute | ETag | Last-Modified |
| --- | --- | --- |
| Precision | Can represent exact or semantic version identity | Based on modification time, with one-second resolution |
| Client check | `If-None-Match` | `If-Modified-Since` |
| Best fit | APIs, distributed systems, version-aware resources | Simpler resources where timestamp validation is enough |
| Consistency across representations | Stronger, because it can identify a specific representation version | Weaker, because time alone may not reflect representation details |
| Concurrency use | Works with `If-Match` for update safety | Works with `If-Unmodified-Since`, but one-second resolution makes it a weaker tool for optimistic concurrency |

If your team is refining resource versioning and conditional request behavior together, this guide to API design best practices pairs well with ETag planning.

The failure mode that shows up in distributed systems

Server-local file metadata looks convenient until you put a load balancer in front of multiple instances.

One node generates an ETag from a file inode or local mtime. Another node generates a different value for the same logical resource. The client revalidates, lands on a different instance, and gets a cache miss even though nothing meaningful changed. You lose the bandwidth savings ETags were supposed to provide, and debugging is frustrating because each individual server appears correct on its own.
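You can reproduce the effect on one machine. In this sketch, two temporary files with identical content stand in for the same asset deployed on two nodes. `metadata_etag` imitates a validator built from inode, mtime, and size, and `content_etag` hashes the bytes. Both helper names are hypothetical.

```python
import hashlib
import os
import tempfile


def metadata_etag(path: str) -> str:
    """Validator built from inode, mtime, and size: tied to one machine."""
    st = os.stat(path)
    return f'"{st.st_ino:x}-{st.st_mtime_ns:x}-{st.st_size:x}"'


def content_etag(path: str) -> str:
    """Validator built from the file's bytes: reproducible on any machine."""
    with open(path, "rb") as fh:
        return '"' + hashlib.sha256(fh.read()).hexdigest()[:16] + '"'


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for node in ("node-a", "node-b"):  # two files stand in for two servers
        path = os.path.join(tmp, node + "-logo.svg")
        with open(path, "wb") as fh:
            fh.write(b"<svg/>")
        paths.append(path)
    metadata_differs = metadata_etag(paths[0]) != metadata_etag(paths[1])
    content_same = content_etag(paths[0]) == content_etag(paths[1])

print(metadata_differs, content_same)
```

The metadata-based tags disagree even though the bytes are identical, which is exactly the cache miss a client sees when its revalidation lands on a different instance.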

CDNs make this even more important. If your origin produces unstable validators, the CDN cannot revalidate efficiently with the origin, and clients cannot rely on consistent cache behavior at the edge.

A practical decision framework

Use this checklist:

Should the ETag represent exact bytes or logical resource state?
Can every application instance reproduce the same value?
Will you use the same validator for both cache revalidation and write preconditions?
Does the response vary by language, media type, or compression?
Will a CDN or reverse proxy cache this response and revalidate it later?

A good default for many APIs is a version-based ETag tied to the underlying record or aggregate version, then expanded if representation variants matter. A good default for static or fully rendered content is often a hash of the final response bytes.

Choose the strategy that stays stable across the full stack, from database write to origin response to CDN cache to client update request. That is what turns ETags from a nice header into a reliable system contract.

Practical Implementation In Your Backend

Theory matters. Headers in a real response matter more.


Express example with explicit ETag generation

If you want direct control in Node.js, Express lets you set the header yourself.

The implementation pattern cited by Lightspark is:

res.set('ETag', require('crypto').createHash('md5').update(body).digest('hex'))

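Here is a simple example around that pattern, with one adjustment: the digest is wrapped in double quotes, because an ETag value must be a quoted string.

const express = require('express');
const crypto = require('crypto');

const app = express();

app.get('/api/user/:id', (req, res) => {
  const payload = {
    id: req.params.id,
    name: 'Ava',
    role: 'admin'
  };

  const body = JSON.stringify(payload);
  // Wrap the hex digest in double quotes to form a valid ETag value.
  const hash = crypto.createHash('md5').update(body).digest('hex');
  const etag = `"${hash}"`;

  // Send the validator on both 200 and 304 responses.
  res.set('ETag', etag);

  // Simplified check: assumes the client sends back exactly one ETag value.
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end();
  }

  res.set('Content-Type', 'application/json');
  res.send(body);
});

app.listen(3000);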

A few mentoring notes:

Hash the actual response body if your validator should reflect the final representation.
Keep generation consistent across instances.
Emit the value as a quoted string. RFC 9110 defines an entity tag as an opaque string in double quotes, optionally prefixed with W/, so a bare hex digest is not a valid ETag value.

Lightspark also notes that testing with curl -I -H 'If-None-Match: "..."' can verify 304 behavior, and in some real-world deployments this kind of optimization reduced latency from 200ms to <50ms by minimizing payload transfers, as described in their ETag glossary entry.

If you are building out a broader service from scratch, this practical guide on how to build a REST API is a useful companion.

Django example

Django offers a few ways to handle conditional responses. One clear approach is to compute an ETag from the serialized body and short-circuit when the client already has it.

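import hashlib
import json
from django.http import HttpResponse, HttpResponseNotModified

def user_detail(request, user_id):
    payload = {
        "id": user_id,
        "name": "Ava",
        "role": "admin",
    }

    # sort_keys keeps the serialized output, and therefore the hash, stable.
    body = json.dumps(payload, sort_keys=True)
    # An ETag value must be a quoted string.
    etag = '"%s"' % hashlib.md5(body.encode("utf-8")).hexdigest()

    # Simplified check: assumes the client sends back exactly one ETag value.
    if request.headers.get("If-None-Match") == etag:
        response = HttpResponseNotModified()
        response["ETag"] = etag
        return response

    response = HttpResponse(body, content_type="application/json")
    response["ETag"] = etag
    return response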

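This is intentionally simple. In a production Django app, you would usually tie the ETag to a model revision, update timestamp, or stable serializer output instead of hand-building it in every view. Django also ships the etag and condition decorators in django.views.decorators.http, plus ConditionalGetMiddleware, which handle the header comparison for you.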

Nginx and reverse proxy considerations

At the web server layer, ETag handling often depends on how you serve the resource.

For static files, Nginx emits an ETag by default (the etag directive is on), derived from the file's modification time and size. So the key architectural question is not “can Nginx add an ETag?” The better question is whether the validator is stable and meaningful across your delivery chain.

Use caution when:

Your app server already sets ETags
A CDN sits in front of Nginx
Compression changes the representation
Multiple upstream nodes might generate different values

In those setups, duplicated or inconsistent ETag generation can create hard-to-debug cache behavior.

How to test it with curl

Use curl before you trust browser devtools.

First, inspect the initial response:

curl -i http://localhost:3000/api/user/123

Look for the ETag header.

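Then send a conditional request. Replace the placeholder with the exact ETag value from the first response, including its double quotes:

curl -i -H 'If-None-Match: "<etag-from-first-response>"' http://localhost:3000/api/user/123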

If the resource is unchanged and your logic is correct, the server should return 304 Not Modified.
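If you would rather automate that check, the same two-step test can be scripted with Python's standard library. The sketch below starts a throwaway server on a random local port. The handler, the payload, and the `fetch` helper are stand-ins for your real endpoint, not part of the Express or Django examples.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BODY = b'{"id": "123", "name": "Ava", "role": "admin"}'
ETAG = '"' + hashlib.sha256(BODY).hexdigest()[:16] + '"'


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)  # validator matches: no body
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch(port, headers=None):
    """Return (status, ETag header, body) for one GET request."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/api/user/123", headers=headers or {})
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("ETag"), resp.read())
    conn.close()
    return result


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status1, etag, body1 = fetch(port)                           # plain GET
status2, _, body2 = fetch(port, {"If-None-Match": etag})     # conditional GET
status3, _, body3 = fetch(port, {"If-None-Match": '"old"'})  # stale validator

server.shutdown()
server.server_close()
print(status1, status2, status3)
```

The three requests cover the cases worth asserting in a test suite: a plain GET, a matching validator, and a stale one.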


Key takeaway: Framework defaults are convenient, but custom ETag logic is often worth it when you need stable behavior across app instances, serializers, CDNs, and write preconditions.

Advanced ETag Topics And Common Pitfalls

Most introductions stop at browser caching. That leaves out the part senior backend engineers care about most. ETags are also a data integrity tool.

That matters in distributed systems, especially when multiple users or services can update the same resource.

ETags at the CDN and proxy layer

CDNs and reverse proxies use validators to avoid unnecessary origin fetches and to coordinate cache freshness.

An origin can return a representation with an ETag. The CDN stores both. When the cached response becomes stale or needs validation, the CDN can revalidate instead of always pulling the full body from origin.

That gives you a full-stack lifecycle:

Origin generates the validator
CDN stores it with the cached representation
Client or edge revalidates with the validator later
Origin decides whether the body must travel again

This is one reason poor ETag generation hurts more than a single app endpoint. A poorly designed validator strategy can ripple through your browser cache, edge cache, and origin behavior.

Preventing mid-air collisions with If-Match

The concurrency use case is where ETags become more than a performance feature.

The problem is simple. Two users fetch the same resource. Both edit it. The first update succeeds. The second update arrives later with stale data and overwrites the first user’s work.

That is the classic mid-air collision.

ETags solve this with optimistic concurrency control. The client reads a resource and gets its current ETag. When it sends PUT or PATCH, it includes that value in If-Match. The server only applies the update if the current resource still matches that validator. If not, the server can reject the write, commonly with 412 Precondition Failed.

The critical point, which GeeksforGeeks notes is often underexplained, is that ETags do not just “help prevent simultaneous updates.” They provide a concrete precondition mechanism for detecting stale writes before they become silent data corruption.

A practical flow looks like this:

Client A reads /accounts/42 and gets ETag "v7"
Client B reads the same resource and also gets "v7"
Client A updates with If-Match: "v7" and the server stores the change
The resource now has a new validator
Client B updates with stale If-Match: "v7"
Server rejects the request because Client B edited an old version

That is optimistic locking without holding a database lock while the user thinks.
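The flow above can be sketched with an in-memory store. `AccountStore` and its methods are hypothetical, and the version counter stands in for whatever revision your data store provides.

```python
import copy
from typing import Tuple


class AccountStore:
    """In-memory stand-in for a table with a version column."""

    def __init__(self):
        self.rows = {42: {"data": {"owner": "Ava", "limit": 100}, "version": 7}}

    def etag(self, account_id: int) -> str:
        return f'"v{self.rows[account_id]["version"]}"'

    def read(self, account_id: int) -> Tuple[str, dict]:
        return self.etag(account_id), copy.deepcopy(self.rows[account_id]["data"])

    def update(self, account_id: int, new_data: dict, if_match: str) -> Tuple[int, str]:
        """Apply the write only if the caller's validator is still current."""
        if if_match != self.etag(account_id):
            return 412, self.etag(account_id)  # Precondition Failed
        row = self.rows[account_id]
        row["data"] = copy.deepcopy(new_data)
        row["version"] += 1
        return 200, self.etag(account_id)


store = AccountStore()
etag_a, data_a = store.read(42)  # Client A gets "v7"
etag_b, data_b = store.read(42)  # Client B also gets "v7"

data_a["limit"] = 200
status_a, new_etag = store.update(42, data_a, if_match=etag_a)  # accepted

data_b["owner"] = "Ava B."
status_b, current = store.update(42, data_b, if_match=etag_b)   # rejected as stale

print(status_a, new_etag, status_b, current)
```

In a real database the comparison and the write must happen atomically, for example with an UPDATE that includes the expected version in its WHERE clause. Otherwise two writers can both pass the check.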

Common pitfalls

A few mistakes show up repeatedly in production systems:

Leaking unstable server details: If your ETag depends on machine-specific metadata, validators can vary across nodes.
Hashing inconsistent output: If one instance serializes fields in a different order, your ETags churn unnecessarily.
Ignoring representation differences: Compression, formatting, and alternate encodings can affect what the validator should represent.
Using ETags only for GET: If you stop there, you miss one of their most valuable roles in write safety.
Failing to design the 412 path: Rejecting stale writes is correct, but your client needs a clear refetch-and-merge flow.

Tip: If you use ETags for updates, treat 412 Precondition Failed as part of the normal contract, not as an exceptional edge case.
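On the client side, a refetch-and-merge loop might look like the sketch below. `FakeServer` and `update_with_retry` are hypothetical names, and the "merge" is simply re-applying the user's change to the latest copy, which is the simplest possible strategy.

```python
from typing import Callable, Tuple


class FakeServer:
    """Stand-in for an API that enforces If-Match on writes."""

    def __init__(self):
        self.doc = {"name": "Ava", "role": "admin"}
        self.version = 1

    def get(self) -> Tuple[str, dict]:
        return f'"v{self.version}"', dict(self.doc)

    def put(self, doc: dict, if_match: str) -> int:
        if if_match != f'"v{self.version}"':
            return 412
        self.doc = dict(doc)
        self.version += 1
        return 200


def update_with_retry(server: FakeServer, change: Callable[[dict], dict], max_attempts: int = 3) -> dict:
    """Refetch, re-apply the change, and retry whenever the server answers 412."""
    for _ in range(max_attempts):
        etag, doc = server.get()  # always start from the latest version
        merged = change(doc)      # re-apply the edit on top of it
        if server.put(merged, if_match=etag) == 200:
            return merged
    raise RuntimeError("gave up after repeated 412 responses")


server = FakeServer()
attempts = []


def set_role(doc: dict) -> dict:
    attempts.append(dict(doc))
    if len(attempts) == 1:
        # Simulate another client writing between our read and our write.
        etag, other = server.get()
        server.put({**other, "name": "Ava B."}, if_match=etag)
    return {**doc, "role": "viewer"}


result = update_with_retry(server, set_role)
print(result, len(attempts))
```

The first attempt is rejected because of the competing write. The second starts from the refreshed document, so both the other client's rename and this client's role change survive.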

Where ETags fit with API versioning

ETags do not replace API versioning.

API versioning changes the contract shape. ETags identify a particular version of a particular resource representation within that contract. You still version your API when the schema or semantics change. You still use ETags when clients need to validate whether a specific representation changed since last fetch.

Those two tools solve different problems, and effective systems usually need both.

Frequently Asked Questions About ETags

Do ETags replace Cache-Control?

No.

Cache-Control tells clients and intermediaries how caching should behave. An ETag gives them a validator to use when revalidation is needed. They often work together.

Is an ETag always a hash?

No.

Many implementations use hashes such as MD5 or SHA-1, but the server chooses the generation method. A version number, revision token, or other stable identifier can work if it reliably tracks the resource version.

Should I use ETags on JSON APIs?

Often, yes.

They are especially useful when clients request the same resources repeatedly and those resources do not change on every request. They are also valuable when updates need stale-write protection.

What is the difference between If-None-Match and If-Match?

They serve different intents:

If-None-Match is typically used for cache validation on reads
If-Match is typically used for precondition checks on writes

The first asks, “send the body only if it changed.” The second says, “apply this update only if I still have the latest version.”
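Both headers can carry a comma-separated list of tags or a single `*`, and they use different comparison rules. The sketch below is a simplified evaluator under stated assumptions: it splits on commas (so it would mishandle a tag that itself contains a comma), it assumes the resource exists, and all function names are hypothetical.

```python
from typing import List, Optional


def parse_etags(header: str) -> List[str]:
    """Split a comma-separated header value into individual tags."""
    return [part.strip() for part in header.split(",") if part.strip()]


def _opaque(tag: str) -> str:
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_allows_body(header: Optional[str], current: str) -> bool:
    """True if a GET should send the full body (no tag matches weakly)."""
    if header is None:
        return True
    if header.strip() == "*":
        return False  # any existing representation counts as a match
    return all(_opaque(tag) != _opaque(current) for tag in parse_etags(header))


def if_match_allows_write(header: Optional[str], current: str) -> bool:
    """True if a write may proceed (a tag matches strongly, or no precondition)."""
    if header is None or header.strip() == "*":
        return True
    if current.startswith("W/"):
        return False  # a weak tag never satisfies strong comparison
    return current in parse_etags(header)


print(if_none_match_allows_body('"a", "b"', '"b"'))
print(if_match_allows_write('"v7"', '"v8"'))
```

Note the asymmetry: a weak tag is good enough to skip a body on a read, but it is never good enough to authorize a write.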

Can I use weak ETags for everything?

You can, but you should not.

Weak ETags are useful when semantic equivalence matters more than exact bytes. They cannot satisfy If-Match, which requires a strong match, and they are the wrong tool for exact representation checks.

Why am I getting cache misses even when data looks unchanged?

Usually one of these is happening:

Your serializer output is not stable
Different servers generate different ETags
Compression or representation changes are affecting the validator
The response includes dynamic fields that churn the hash

Check the full response body, not just the business fields you care about.
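If dynamic fields are the culprit, one option is to leave them out of the hash input. The field names below (`generated_at`, `request_id`) are hypothetical examples of values that change on every response. The tag is marked weak because the bodies it covers are not byte-identical.

```python
import hashlib
import json

VOLATILE_FIELDS = {"generated_at", "request_id"}  # hypothetical field names


def stable_etag(payload: dict) -> str:
    """Weak ETag computed from the payload without its volatile fields."""
    stable = {k: v for k, v in payload.items() if k not in VOLATILE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f'W/"{digest}"'


first = {"id": 42, "name": "Ava", "generated_at": "2024-01-01T12:00:00Z", "request_id": "a1"}
second = {"id": 42, "name": "Ava", "generated_at": "2024-01-01T12:00:05Z", "request_id": "b2"}
renamed = {"id": 42, "name": "Eva", "generated_at": "2024-01-01T12:00:05Z", "request_id": "c3"}

print(stable_etag(first) == stable_etag(second))   # only volatile fields differ
print(stable_etag(first) == stable_etag(renamed))  # business data differs
```

Whether it is acceptable for clients to keep a cached body with an older `generated_at` is a product decision, so treat this as a starting point rather than a default.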

Should I expose ETags in API documentation?

Yes.

If clients need to use If-None-Match or If-Match, document the headers, the expected response behavior, and what a 304 or 412 means. Good API docs reduce client-side guesswork. This guide on how to write API documentation is a helpful reference for structuring that clearly.

What is the simplest good starting point?

For many teams, a solid starting point looks like this:

Static assets: let your platform or server emit stable validators
Read-heavy JSON endpoints: use a deterministic content hash or revision-based ETag
Writable resources: support If-Match on PUT and PATCH
Client guidance: document 304 Not Modified and 412 Precondition Failed behavior

That gives you the full lifecycle. Generation, read validation, edge behavior, and safe updates.
