ascii-fold (JavaScript)
Best-effort ASCII folding and slug generation utilities extracted from a reusable snippet. It focuses on practical, predictable results:
- Removes diacritics using Unicode NFKD normalization and strips combining marks
- Handles common ligatures and special Latin letters (Æ/æ → AE/ae, ß → ss, etc.)
- Maps typographic quotes, dashes, ellipsis, spaces, and a few symbols to sensible ASCII
- Optional strict ASCII-only output or keep-non-ASCII-with-placeholder
- Includes a small toSlug helper built on top of toASCII
Quick usage
Minimal examples showing what the functions do. Adjust to your environment as needed.
// Assume you have the functions available in scope
const input = "Café™ — 50 °C";
// Basic ASCII folding (default: asciiOnly=true, marksStyle="plain")
const ascii = toASCII(input);
// => "Cafe tm - 50 C"
// Keep non-ASCII by substituting unknowns
const kept = toASCII("Emoji: 😀", { asciiOnly: false, unknown: "?" });
// => "Emoji: ?"
// Slugify
const slug = toSlug("Hello, World! © 2025");
// => "hello-world-c-2025"
Functions
toASCII(input, options)
Converts a string to a best-effort ASCII equivalent:
1) NFKD normalize, 2) strip combining marks, 3) map ligatures/letters and symbols, 4) optionally enforce ASCII-only.
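The four steps above can be sketched roughly as follows. This is an illustration only, not the snippet's actual source: `foldSketch` and `LETTER_MAP_SKETCH` are hypothetical names, the table covers just a handful of letters, and the Ø/ø targets are assumptions (only Æ/æ → AE/ae and ß → ss are stated in this README).

```javascript
// Hypothetical, heavily reduced mapping table (assumption: the real one is larger).
const LETTER_MAP_SKETCH = {
  "Æ": "AE", "æ": "ae", "ß": "ss",
  "Ø": "O", "ø": "o", // assumed targets, not confirmed by this README
};

function foldSketch(input) {
  return input
    // 1) NFKD: split accented letters into base letter + combining mark
    .normalize("NFKD")
    // 2) strip combining marks (Unicode general category M)
    .replace(/\p{M}/gu, "")
    // 3) map remaining non-ASCII characters that have a table entry
    .replace(/[^\x00-\x7F]/gu, (ch) => LETTER_MAP_SKETCH[ch] ?? ch)
    // 4) enforce ASCII-only by dropping whatever is still non-ASCII
    .replace(/[^\x00-\x7F]/gu, "");
}

console.log(foldSketch("naïve façade")); // "naive facade"
console.log(foldSketch("Ærø straße"));   // "AEro strasse"
```

Step 3 has to run after steps 1 and 2 here because letters such as Æ and ß have no Unicode decomposition, so normalization alone leaves them untouched.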
Options (AsciiFoldOptions):
- marksStyle: "plain" | "paren" (default "plain")
  - "plain": © ® ™ ℠ → c r tm sm
  - "paren": © ® ™ ℠ → (c) (r) (tm) (sm)
- asciiOnly: boolean (default true)
  - When true, removes any remaining non-ASCII after mapping
  - When false, keeps non-ASCII but replaces still-unknowns with unknown
- unknown: string (default "?")
  - Placeholder for non-ASCII characters that remain when asciiOnly=false
Examples:
toASCII("Äffin – ½ kg", { marksStyle: "paren" });
// => "Affin - 1/2 kg"
toASCII("naïve façade", {});
// => "naive facade"
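To make the marksStyle option concrete, here is a minimal sketch of a two-table lookup that reproduces the plain and paren mappings listed under Options. `MARKS_SKETCH` and `mapMarksSketch` are hypothetical names, and how the real toASCII orders this step relative to normalization is an assumption.

```javascript
// Hypothetical lookup tables mirroring the documented marksStyle mappings.
const MARKS_SKETCH = {
  plain: { "©": "c", "®": "r", "™": "tm", "℠": "sm" },
  paren: { "©": "(c)", "®": "(r)", "™": "(tm)", "℠": "(sm)" },
};

// Replace the four mark symbols according to the chosen style.
function mapMarksSketch(input, marksStyle = "plain") {
  const table = MARKS_SKETCH[marksStyle];
  return input.replace(/[©®™℠]/g, (ch) => table[ch]);
}

console.log(mapMarksSketch("© ® ™ ℠"));          // "c r tm sm"
console.log(mapMarksSketch("© ® ™ ℠", "paren")); // "(c) (r) (tm) (sm)"
```

In this sketch the marks are mapped on the raw input. NFKD decomposes ™ to "TM" and ℠ to "SM", so a version that normalizes first would no longer see those two symbols.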
toSlug(input, options)
Builds on toASCII and normalizes to a URL-friendly slug.
Options:
- separator: string (default "-")
- caseStyle: "lower" | "upper" | "none" (default "lower")
- strict: boolean (default false)
  - When true, removes everything except A–Z a–z 0–9 and the chosen separator
- toASCIIOptions: AsciiFoldOptions (passed to toASCII first)
Examples:
toSlug("Crème brûlée — ©", { separator: "-" });
// => "creme-brulee-c"
toSlug("Über cool", { caseStyle: "upper", separator: "_" });
// => "UBER_COOL"
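The slug step can be approximated as below, assuming the input has already been folded to ASCII. `slugSketch` is a hypothetical name; the sketch covers only separator and caseStyle and does not model strict or toASCIIOptions, whose exact behavior is not spelled out here.

```javascript
// Hypothetical slug builder for an already-ASCII string.
function slugSketch(ascii, { separator = "-", caseStyle = "lower" } = {}) {
  const joined = ascii
    .split(/[^A-Za-z0-9]+/) // break on any run of non-alphanumeric characters
    .filter(Boolean)        // drop empty pieces from leading/trailing runs
    .join(separator);
  if (caseStyle === "lower") return joined.toLowerCase();
  if (caseStyle === "upper") return joined.toUpperCase();
  return joined;            // "none": leave case untouched
}

console.log(slugSketch("Hello, World! c 2025")); // "hello-world-c-2025"
console.log(slugSketch("Uber cool", { caseStyle: "upper", separator: "_" })); // "UBER_COOL"
```

Splitting and re-joining, instead of replacing each character, collapses consecutive punctuation into a single separator and avoids leading or trailing separators.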
Notes on behavior
- Ligatures and special letters handled explicitly: ff/fi/fl/ffi/ffl, Æ/æ, Œ/œ, ß, Þ/þ, Ð/ð, Ł/ł, Ø/ø, Đ/đ
- Typographic punctuation mapped to ASCII: curly quotes → straight quotes, en/em dashes → -, ellipsis → ..., non-breaking and thin/figure spaces → normal space
- Some miscellaneous symbols mapped: ° → deg, × → x, ÷ → /, • → *, simple fraction glyphs like ½ ¼ ¾
- Zero-width marks (ZWNJ/ZWJ/BOM) are removed
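As an illustration of the punctuation and zero-width handling described in these notes, a small table-driven pass might look like this. `PUNCT_SKETCH`, `ZERO_WIDTH_SKETCH`, and `cleanPunctuationSketch` are hypothetical names, and the table is a reduced subset chosen for the example, not the snippet's full mapping.

```javascript
// Hypothetical subset of the typographic-punctuation mapping.
const PUNCT_SKETCH = new Map([
  ["\u2018", "'"], ["\u2019", "'"], // curly single quotes
  ["\u201C", '"'], ["\u201D", '"'], // curly double quotes
  ["\u2013", "-"], ["\u2014", "-"], // en dash, em dash
  ["\u2026", "..."],                // ellipsis
  ["\u00A0", " "], ["\u2009", " "], ["\u2007", " "], // NBSP, thin, figure space
]);

// ZWNJ (U+200C), ZWJ (U+200D), and BOM (U+FEFF).
const ZERO_WIDTH_SKETCH = /[\u200C\u200D\uFEFF]/g;

function cleanPunctuationSketch(input) {
  let out = "";
  // Remove zero-width marks first, then map each remaining character.
  for (const ch of input.replace(ZERO_WIDTH_SKETCH, "")) {
    out += PUNCT_SKETCH.get(ch) ?? ch;
  }
  return out;
}

console.log(cleanPunctuationSketch("\uFEFF\u201CHi\u201D \u2014 ok\u2026")); // "Hi" - ok...
```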
License
See the repository-level LICENSE file.