BCP 47

The IETF standard for language tags -- the format you put in HTML `lang=`, `hreflang=`, and HTTP `Content-Language` headers. Examples: `en`, `en-US`, `zh-Hans`, `de-CH`.

BCP 47 (Best Current Practice 47, whose tag syntax is defined in RFC 5646) is the standard for language identifiers used across the web. These tags go in the `lang` attribute of `<html>`, the `hreflang` attribute of `<link>`, the HTTP `Content-Language` header, and the `Accept-Language` request header.

Tag structure (in canonical form):

  • Primary language (required, lowercase): 2 letters per ISO 639-1 (en, fr, de), or 3 letters per ISO 639-2/639-3 when the language has no 2-letter code (haw for Hawaiian).
  • Script (optional, Title case): 4 letters per ISO 15924 (Hans for Simplified Chinese, Latn, Cyrl, Arab). Use only when needed to disambiguate.
  • Region (optional, UPPERCASE): 2 letters per ISO 3166-1 alpha-2 (US, GB, CA) or 3 digits per UN M.49 (419 for Latin America and the Caribbean, 001 for the world).
  • Variant (rare): registered codes like valencia for the Valencian dialect of Catalan.
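
The subtag order above can be sketched as a regular expression. This is a minimal illustration, not a full RFC 5646 parser: the names `TAG_RE` and `split_tag` are hypothetical, and the pattern deliberately ignores extlang, extension, private-use, and grandfathered tags. It checks only the shape of each subtag, not whether the code is actually registered.

```python
import re
from typing import Optional

# Simplified shape: language, optional script, optional region, then variants.
# Assumption: extlang, extensions, private use and grandfathered tags are out of scope.
TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"                       # 2 or 3 letters
    r"(?:-(?P<script>[a-z]{4}))?"                      # 4 letters
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"             # 2 letters or 3 digits
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,                                     # tags are case-insensitive
)


def split_tag(tag: str) -> Optional[dict]:
    """Split a tag into its subtags, or return None if the shape does not match."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        # The variants group looks like "-valencia-..."; drop the empty first piece.
        "variants": [v for v in m.group("variants").split("-") if v],
    }


print(split_tag("zh-Hans-CN"))
print(split_tag("ca-ES-valencia"))
print(split_tag("en_US"))  # None: underscores are not a valid separator
```

Because the check is purely structural, a string such as `US` still matches as a 2-letter "language"; telling it apart from a real language code requires a lookup against the IANA Language Subtag Registry.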

Subtags are joined by hyphens, never underscores: en_US is a common mistake and is not a valid tag. Tags are case-insensitive, so a parser must accept any casing, but the canonical form follows the casing rules above.
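
As a rough sketch of those casing rules, the hypothetical helper below (`normalize_case` is not part of any library) rewrites a hyphen-separated tag into the conventional casing. It assumes the input already uses hyphens and follows the RFC 5646 convention that 2-letter and 4-letter subtags are left lowercase when they come first or directly follow a single-character subtag such as `x`.

```python
def normalize_case(tag: str) -> str:
    """Apply conventional BCP 47 casing to a hyphen-separated tag."""
    subtags = tag.split("-")
    cased = [subtags[0].lower()]           # primary language: lowercase
    for prev, sub in zip(subtags, subtags[1:]):
        if len(prev) == 1:
            cased.append(sub.lower())      # after a singleton (e.g. x): lowercase
        elif len(sub) == 4 and sub.isalpha():
            cased.append(sub.title())      # script: Title case
        elif len(sub) == 2 and sub.isalpha():
            cased.append(sub.upper())      # region: UPPERCASE
        else:
            cased.append(sub.lower())      # variants, digits, everything else
    return "-".join(cased)


print(normalize_case("EN-us"))        # en-US
print(normalize_case("ZH-hans-cn"))   # zh-Hans-CN
```

The function only changes letter case. It does not validate the tag or convert underscores, so it is best applied after a separate well-formedness check.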

Common mistakes the BeaverCheck hreflang analyzer flags:

  • en_US (underscore -- must be hyphen)
  • english (full word -- must use the 2/3-letter code)
  • US (region only -- must include language: en-US)
  • en-us (lowercase region -- accepted but not canonical)
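
For illustration, the four checks above could be approximated as follows. This is a hedged sketch, not BeaverCheck's actual implementation: `check_tag`, `conventional_case`, and the two sample sets are hypothetical, and a real checker would look codes up in the IANA Language Subtag Registry instead of a hard-coded sample.

```python
import re
from typing import List

# Tiny illustrative samples (assumption); not the real registry.
SAMPLE_LANGUAGES = {"en", "fr", "de", "es", "zh", "ca"}
SAMPLE_REGIONS = {"US", "GB", "CA", "CH", "ES", "CN"}


def conventional_case(tag: str) -> str:
    """Lowercase language, Title-case 4-letter script, uppercase 2-letter region."""
    first, *rest = tag.split("-")
    cased = [first.lower()]
    for sub in rest:
        if len(sub) == 4 and sub.isalpha():
            cased.append(sub.title())
        elif len(sub) == 2 and sub.isalpha():
            cased.append(sub.upper())
        else:
            cased.append(sub.lower())
    return "-".join(cased)


def check_tag(tag: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if "_" in tag:
        problems.append("underscore separator: use a hyphen")
        tag = tag.replace("_", "-")        # keep checking the corrected form
    subtags = tag.split("-")
    first = subtags[0]
    # A lone subtag that is a known region but not a known language.
    if (len(subtags) == 1 and first.upper() in SAMPLE_REGIONS
            and first.lower() not in SAMPLE_LANGUAGES):
        problems.append("region only: prefix a language, e.g. en-US")
        return problems
    if not re.fullmatch(r"[A-Za-z]{2,3}", first):
        problems.append("primary language must be a 2- or 3-letter code")
        return problems
    expected = conventional_case(tag)
    if tag != expected:
        problems.append("non-canonical casing: expected " + expected)
    return problems


for candidate in ["en_US", "english", "US", "en-us", "en-US"]:
    print(candidate, check_tag(candidate))
```

The casing problem is reported separately from the structural ones because such a tag is still accepted; only the first three cases make the value unusable.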

For most sites, simple language codes (en, fr) suffice. Add region only when you genuinely target different content per region (e.g., en-US and en-GB because pricing or product availability differs).

