BCP 47 (Best Current Practice 47, currently defined by RFC 5646) is the standard for language tags used across the web. Its tags go in <html lang>, <link hreflang>, the HTTP Content-Language header, and the Accept-Language request header.
Tag structure (in canonical form):
- Primary language (required, lowercase): 2 letters per ISO 639-1 (en, fr, de) or 3 letters per ISO 639-3 for less common languages.
- Script (optional, Title case): 4 letters per ISO 15924 (Hans for Simplified Chinese, Latn, Cyrl, Arab). Use only when needed to disambiguate.
- Region (optional, UPPERCASE): 2 letters per ISO 3166-1 (US, GB, CA) or 3 digits for UN M.49 (419 for Latin America, 001 for the world).
- Variant (rare): registered codes like valencia for the Valencian dialect of Catalan.
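
The structure above can be sketched as a small parser. This is a simplified illustration, not a full RFC 5646 implementation: the names `LanguageTag` and `parse_tag` are hypothetical, and the pattern assumes at most one variant and ignores extensions, private-use subtags, extended language subtags, and grandfathered tags. It checks only the shape of each subtag, not whether the code is actually registered.

```python
import re
from typing import NamedTuple, Optional

# Simplified shape: language[-script][-region][-variant].
# Assumption: extensions, private-use subtags, extended language subtags,
# multiple variants and grandfathered tags are deliberately left out.
_TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-(?P<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))?"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variant: Optional[str]


def parse_tag(tag: str) -> Optional[LanguageTag]:
    """Split a tag into subtags, or return None if the shape is not recognised."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    # Optional groups that did not participate come back as None.
    return LanguageTag(**match.groupdict())
```

For example, `parse_tag("ca-ES-valencia")` yields the language `ca`, the region `ES`, and the variant `valencia`, with no script, while `parse_tag("en_US")` returns `None` because of the underscore.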
Subtags are joined by hyphens (NEVER underscores -- a common mistake; en_US is invalid). Tags are case-insensitive at parse time but the canonical form follows the casing rules above.
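
The casing rules can be applied by position and length alone, without a registry lookup. The sketch below follows the convention in RFC 5646 section 2.1.1: everything is lowercase, except that two-letter subtags are uppercased and four-letter subtags are title-cased when they are neither the first subtag nor located after a singleton (a one-character subtag such as `u` or `x`). The function name `canonicalize_case` is hypothetical, and the sketch fixes casing only: it does not convert underscores or replace deprecated subtags.

```python
def canonicalize_case(tag: str) -> str:
    """Apply conventional casing to each subtag of a hyphen-separated tag."""
    out = []
    after_singleton = False
    for index, part in enumerate(tag.split("-")):
        if index == 0 or after_singleton:
            # Primary language and anything following a singleton: lowercase.
            out.append(part.lower())
        elif len(part) == 2:
            # Two-character subtag in region position: uppercase.
            out.append(part.upper())
        elif len(part) == 4:
            # Four-character subtag in script position: Title case.
            out.append(part.capitalize())
        else:
            out.append(part.lower())
        if len(part) == 1:
            after_singleton = True
    return "-".join(out)
```

With this sketch, `canonicalize_case("zh-hans-cn")` returns `zh-Hans-CN` and `canonicalize_case("EN-us")` returns `en-US`.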
Common mistakes the BeaverCheck hreflang analyzer flags:
- en_US (underscore -- must be hyphen)
- english (full word -- must use the 2/3-letter code)
- US (region only -- must include language: en-US)
- en-us (lowercase region -- accepted but not canonical)
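
A rough sketch of how checks like these might be written follows. It is an illustration only and is not BeaverCheck's actual implementation: the function name `lint_language_tag` and the warning strings are hypothetical, and the region-only check rests on the assumption that a lone pair of capital letters is a region rather than a language code.

```python
import re
from typing import List


def lint_language_tag(tag: str) -> List[str]:
    """Return warnings for the four mistakes listed above (heuristic)."""
    warnings: List[str] = []

    if "_" in tag:
        warnings.append("underscore separator: must be a hyphen")
        tag = tag.replace("_", "-")  # normalise so the later checks still run

    parts = tag.split("-")
    primary = parts[0]

    if re.fullmatch(r"[A-Za-z]{4,}", primary):
        warnings.append("full word: use the 2/3-letter language code")
    elif len(parts) == 1 and re.fullmatch(r"[A-Z]{2}", primary):
        # Assumption: a lone all-caps pair is a region such as US.
        warnings.append("region only: include a language, e.g. en-US")

    for part in parts[1:]:
        # Assumption: any two-letter subtag after the first is a region.
        if re.fullmatch(r"[A-Za-z]{2}", part) and not part.isupper():
            warnings.append("lowercase region: accepted but not canonical")

    return warnings
```

A clean tag such as `en-US` produces an empty list, and each of the four examples above produces exactly one warning. A production checker would also validate subtags against the IANA Language Subtag Registry, which this sketch does not attempt.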
For most sites, simple language codes (en, fr) suffice. Add region only when you genuinely target different content per region (e.g., en-US and en-GB because pricing or product availability differs).