### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt of defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not any typographic considerations.

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
decision which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has avoided to do so far.
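
The class-to-width rules described above can be summarized in a short sketch. This is an illustration, not the port's actual code: `widthForClass` and the `ambiguousIsWide` option are hypothetical names, and the function assumes the caller has already looked up a character's East Asian Width class (F, W, H, Na, A, or N).

```javascript
// Map an East Asian Width class to a terminal cell width.
// Assumption: eaw is one of 'F', 'W', 'H', 'Na', 'A', 'N'.
function widthForClass(eaw, { ambiguousIsWide = false } = {}) {
  switch (eaw) {
    case 'F': // Fullwidth
    case 'W': // Wide
      return 2;
    case 'H': // Halfwidth
    case 'Na': // Narrow
      return 1;
    case 'A': // Ambiguous: single-width unless legacy CJK behavior is requested
      return ambiguousIsWide ? 2 : 1;
    case 'N': // Neutral: single-width for simplicity, as the text explains
      return 1;
    default:
      throw new RangeError(`unknown East Asian Width class: ${eaw}`);
  }
}

console.log(widthForClass('W'));                            // 2
console.log(widthForClass('A'));                            // 1
console.log(widthForClass('A', { ambiguousIsWide: true })); // 2
console.log(widthForClass('N'));                            // 1
```

Defaulting the ambiguous class to single-width follows the text's argument that this is the appropriate long-term choice, while the option leaves room for compatibility with historic CJK practice.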

http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c