# ascii-fold (JavaScript)
Best-effort ASCII folding and slug generation utilities extracted from a reusable snippet. It focuses on practical, predictable results:
- Removes diacritics using Unicode NFKD normalization and strips combining marks
- Handles common ligatures and special Latin letters (Æ/æ → AE/ae, ß → ss, etc.)
- Maps typographic quotes, dashes, ellipsis, spaces, and a few symbols to sensible ASCII
- Optional strict ASCII-only output, or placeholder substitution for leftover non-ASCII characters
- Includes a small `toSlug` helper built on top of `toASCII`
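
NFKD normalization plus stripping combining marks already handles accented letters and compatibility ligatures such as ﬁ. It leaves letters like ß, Æ and Ø untouched, though, which is why the explicit mappings listed above are needed. Here is a minimal sketch using only the built-in `String.prototype.normalize`. `stripDiacritics` is a hypothetical helper name, not part of the snippet:

```js
// Hypothetical helper: NFKD-decompose, then delete every combining mark
// (Unicode general category M). The `u` flag is required for \p{...}.
function stripDiacritics(text) {
  return text.normalize("NFKD").replace(/\p{M}/gu, "");
}

console.log(stripDiacritics("Crème brûlée"));   // prints: Creme brulee
console.log(stripDiacritics("\uFB01ne"));        // prints: fine (ﬁ decomposes under NFKD)
console.log(stripDiacritics("Straße Æsir Øre")); // prints: Straße Æsir Øre (no decomposition)
```
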
## Quick usage
Minimal examples showing what the functions do. Adjust to your environment as needed.
```js
// Assume you have the functions available in scope
const input = "Café™ — 50°C";
// Basic ASCII folding (default: asciiOnly=true, marksStyle="plain")
const ascii = toASCII(input);
// => "Cafetm - 50degC"
// Replace leftover non-ASCII with a placeholder instead of dropping it
const kept = toASCII("Emoji: 😀", { asciiOnly: false, unknown: "?" });
// => "Emoji: ?"
// Slugify
const slug = toSlug("Hello, World! © 2025");
// => "hello-world-c-2025"
```
## Functions
### `toASCII(input, options)`
Converts a string to a best-effort ASCII equivalent:
1) NFKD-normalize the input
2) Strip combining marks
3) Map ligatures, special letters, and symbols
4) Remove any non-ASCII characters that remain (or, with `asciiOnly: false`, replace them with `unknown`)

Options (`AsciiFoldOptions`):
- `marksStyle`: `"plain" | "paren"` (default `"plain"`)
  - `"plain"`: © ® ™ ℠ → `c r tm sm`
  - `"paren"`: © ® ™ ℠ → `(c) (r) (tm) (sm)`
- `asciiOnly`: `boolean` (default `true`)
  - When `true`, removes any non-ASCII characters still present after mapping
  - When `false`, replaces each such leftover character with `unknown` instead of removing it
- `unknown`: `string` (default `"?"`)
  - Placeholder for leftover non-ASCII characters when `asciiOnly` is `false`

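
The pipeline and options above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the snippet's source. `toASCIISketch`, `replaceFromTable`, `MARKS` and `LETTERS` are made-up names, the tables are deliberately small, and a few mappings (such as Þ → TH) are this sketch's own choices. NFKD already rewrites ™ and ℠ as `TM` and `SM`, so this sketch substitutes the marks *before* normalizing. That way `marksStyle` still takes effect.

```js
// Hypothetical tables; the snippet's real tables may differ.
const MARKS = {
  plain: { "\u00A9": "c", "\u00AE": "r", "\u2122": "tm", "\u2120": "sm" },
  paren: { "\u00A9": "(c)", "\u00AE": "(r)", "\u2122": "(tm)", "\u2120": "(sm)" },
};

// Letters that NFKD does not decompose into ASCII.
const LETTERS = {
  "\u00C6": "AE", "\u00E6": "ae", "\u0152": "OE", "\u0153": "oe",
  "\u00DF": "ss", "\u00DE": "TH", "\u00FE": "th", "\u00D0": "D", "\u00F0": "d",
  "\u0141": "L", "\u0142": "l", "\u00D8": "O", "\u00F8": "o", "\u0110": "D", "\u0111": "d",
};

function replaceFromTable(text, table) {
  // Array.from iterates by code point, so astral characters stay intact.
  return Array.from(text, (ch) => table[ch] ?? ch).join("");
}

function toASCIISketch(input, options = {}) {
  const { marksStyle = "plain", asciiOnly = true, unknown = "?" } = options;
  // Marks first: NFKD would otherwise turn ™ and ℠ into "TM" and "SM".
  let s = replaceFromTable(input, MARKS[marksStyle] ?? MARKS.plain);
  s = s.normalize("NFKD");          // 1) compatibility decomposition
  s = s.replace(/\p{M}/gu, "");     // 2) strip combining marks
  s = replaceFromTable(s, LETTERS); // 3) explicit letter mappings
  // 4) drop each leftover code point, or swap it for `unknown`
  return s.replace(/[^\x00-\x7F]/gu, asciiOnly ? "" : unknown);
}

console.log(toASCIISketch("Straße Æsir ©"));                   // prints: Strasse AEsir c
console.log(toASCIISketch("Café™", { marksStyle: "paren" }));   // prints: Cafe(tm)
console.log(toASCIISketch("Emoji: 😀", { asciiOnly: false })); // prints: Emoji: ?
```

Both `Array.from` and the `u`-flagged regex work per code point. As a result, an emoji such as 😀 (two UTF-16 code units) is replaced by a single `unknown`, not two.
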
Examples:
```js
toASCII("Äffin ½ kg", { marksStyle: "paren" });
// => "Affin 1/2 kg"
toASCII("naïve façade", {});
// => "naive facade"
```
### `toSlug(input, options)`
Builds on `toASCII` and normalizes to a URL-friendly slug.
Options:
- `separator`: `string` (default `"-"`)
- `caseStyle`: `"lower" | "upper" | "none"` (default `"lower"`)
- `strict`: `boolean` (default `false`)
  - When `true`, removes everything except `A-Z`, `a-z`, `0-9`, and the chosen separator
- `toASCIIOptions`: `AsciiFoldOptions` (passed to `toASCII`, which runs first)

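
Because `toSlug` runs `toASCII` first, the remaining work operates on ASCII only. Below is a minimal sketch of that second stage. It assumes that each run of characters outside `A-Z`, `a-z`, `0-9` collapses to one separator and that separators are trimmed from both ends. This assumption is consistent with the examples below but is not taken from the snippet. `toSlugSketch` and `escapeRegExp` are hypothetical names. The sketch always behaves like `strict: true`, because this README does not say what non-strict mode keeps.

```js
// Escape regex metacharacters so any separator string can be used safely.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical second stage of toSlug; `ascii` is assumed to be toASCII output.
function toSlugSketch(ascii, { separator = "-", caseStyle = "lower" } = {}) {
  // Collapse every run of non-alphanumeric characters into one separator.
  // A replacer function avoids "$" patterns in the separator being expanded.
  let slug = ascii.replace(/[^A-Za-z0-9]+/g, () => separator);
  if (separator) {
    // Trim separators from both ends of the slug.
    const sep = escapeRegExp(separator);
    slug = slug.replace(new RegExp(`^(?:${sep})+|(?:${sep})+$`, "g"), "");
  }
  if (caseStyle === "lower") return slug.toLowerCase();
  if (caseStyle === "upper") return slug.toUpperCase();
  return slug; // "none": keep the original casing
}

console.log(toSlugSketch("Hello, World! c 2025"));                              // prints: hello-world-c-2025
console.log(toSlugSketch("Uber cool", { caseStyle: "upper", separator: "_" })); // prints: UBER_COOL
```

Under these assumptions, the full pipeline would be `toSlugSketch(toASCII(input))`.
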
Examples:
```js
toSlug("Crème brûlée — ©", { separator: "-" });
// => "creme-brulee-c"
toSlug("Über cool", { caseStyle: "upper", separator: "_" });
// => "UBER_COOL"
```
## Notes on behavior
- Ligatures and special letters handled explicitly: ﬀ/ﬁ/ﬂ/ﬃ/ﬄ (U+FB00 to U+FB04), Æ/æ, Œ/œ, ß, Þ/þ, Ð/ð, Ł/ł, Ø/ø, Đ/đ
- Typographic punctuation mapped to ASCII: curly quotes → straight quotes, en/em dashes → `-`, ellipsis → `...`, non-breaking and thin/figure spaces → normal space
- Some miscellaneous symbols mapped: `° → deg`, `× → x`, `÷ → /`, `• → *`, and simple fraction glyphs such as `½ ¼ ¾` → `1/2 1/4 3/4`
- Zero-width characters (ZWNJ U+200C, ZWJ U+200D, BOM U+FEFF) are removed
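
The punctuation and symbol mappings in these notes amount to a per-character lookup table. Below is a sketch of one. It assumes the snippet uses a simple table of this kind, and its real entries may differ. `TYPOGRAPHY` and `mapTypography` are hypothetical names. NFKD already turns `…` into `...` and turns `½` into `1⁄2` with U+2044 FRACTION SLASH, so the table also maps that slash to `/`.

```js
// Hypothetical per-character table modelled on the notes above.
const TYPOGRAPHY = new Map([
  ["\u2018", "'"], ["\u2019", "'"], // curly single quotes
  ["\u201C", '"'], ["\u201D", '"'], // curly double quotes
  ["\u2013", "-"], ["\u2014", "-"], // en dash, em dash
  ["\u2026", "..."],                // horizontal ellipsis
  ["\u00A0", " "], ["\u2009", " "], ["\u2007", " "], // no-break, thin, figure space
  ["\u00B0", "deg"], ["\u00D7", "x"], ["\u00F7", "/"], ["\u2022", "*"],
  ["\u2044", "/"],                  // fraction slash, produced by NFKD from ½ ¼ ¾
  ["\u200C", ""], ["\u200D", ""], ["\uFEFF", ""], // ZWNJ, ZWJ, BOM: removed
]);

function mapTypography(text) {
  // `??` (not `||`) so that entries mapping to "" really delete the character.
  return Array.from(text, (ch) => TYPOGRAPHY.get(ch) ?? ch).join("");
}

console.log(mapTypography("\u201CHi\u201D \u2014 wait\u2026")); // prints: "Hi" - wait...
console.log(mapTypography("\u00BD".normalize("NFKD")));        // prints: 1/2
```
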
## License
See the repository-level `LICENSE` file.