### JavaScript port of Markus Kuhn's wcwidth() implementation
The following explanation comes from the original C implementation:
This is an implementation of wcwidth() and wcswidth() (defined in IEEE Std 1002.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html

http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
In fixed-width output devices, Latin characters all occupy a single "cell" position of equal width, whereas ideographic CJK characters occupy two such cells. Interoperability between terminal-line applications and (teletype-style) character terminals using the UTF-8 encoding requires agreement on which character should advance the cursor by how many cell positions. No established formal standards exist at present on which Unicode character shall occupy how many cell positions on character terminals. These routines are a first attempt of defining such behavior based on simple rules applied to data provided by the Unicode Consortium.
For some graphical characters, the Unicode standard explicitly defines a character-cell width via the definition of the East Asian FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes. In all these cases, there is no ambiguity about which width a terminal shall use. For characters in the East Asian Ambiguous (A) class, the width choice depends purely on a preference of backward compatibility with either historic CJK or Western practice. Choosing single-width for these characters is easy to justify as the appropriate long-term solution, as the CJK practice of displaying these characters as double-width comes from historic implementation simplicity (8-bit encoded characters were displayed single-width and 16-bit ones double-width, even for Greek, Cyrillic, etc.) and not any typographic considerations.
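The class-to-width rule described above can be sketched as a small lookup. This is only an illustration, not this library's API: `CELL_WIDTH_BY_CLASS`, `cellWidthForClass`, and the `ambiguousIsWide` option are hypothetical names, and the class letters are those used by the Unicode East Asian Width data.

```javascript
// Hypothetical lookup: East Asian Width class -> terminal cell count.
const CELL_WIDTH_BY_CLASS = {
  F: 2,  // Fullwidth
  W: 2,  // Wide
  H: 1,  // Halfwidth
  Na: 1, // Narrow
  A: 1,  // Ambiguous: single-width by default, as argued in the text
};

// Returns the number of cells for a given class.
// `ambiguousIsWide` (an assumed option) selects the historic CJK practice
// of rendering Ambiguous characters as double-width.
function cellWidthForClass(eawClass, { ambiguousIsWide = false } = {}) {
  if (eawClass === 'A' && ambiguousIsWide) return 2;
  const width = CELL_WIDTH_BY_CLASS[eawClass];
  if (width === undefined) {
    throw new RangeError(`unhandled East Asian Width class: ${eawClass}`);
  }
  return width;
}

console.log(cellWidthForClass('W'));
console.log(cellWidthForClass('A', { ambiguousIsWide: true }));
```

Keeping the Ambiguous case behind an explicit option keeps the backward-compatibility choice visible at the call site.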
Much less clear is the choice of width for the Not East Asian (Neutral) class. Existing practice does not dictate a width for any of these characters. It would nevertheless make sense typographically to allocate two character cells to characters such as for instance EM SPACE or VOLUME INTEGRAL, which cannot be represented adequately with a single-width glyph. The following routines at present merely assign a single-cell width to all neutral characters, in the interest of simplicity. This is not entirely satisfactory and should be reconsidered before establishing a formal standard in this area. At the moment, the decision which Not East Asian (Neutral) characters should be represented by double-width glyphs cannot yet be answered by applying a simple rule from the Unicode database content. Setting up a proper standard for the behavior of UTF-8 character terminals will require a careful analysis not only of each Unicode character, but also of each presentation form, something the author of these routines has avoided to do so far.
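As a rough illustration of how these rules combine, here is a minimal sketch of a width lookup. It is not the code shipped in this package: the names `WIDE_RANGES`, `sketchWcwidth`, and `sketchWcswidth` are hypothetical, the range table is a deliberately tiny subset of the wide ranges, and zero-width combining characters are not handled. The treatment of NUL (width 0) and control characters (width -1) is assumed to follow the original C routines.

```javascript
// Illustrative subset only: sorted, non-overlapping, inclusive ranges of
// double-width code points. A real table is much longer.
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo initial consonants
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xff00, 0xff60], // Fullwidth Forms
];

// Binary search over the sorted range table.
function isWide(codePoint) {
  let lo = 0;
  let hi = WIDE_RANGES.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [first, last] = WIDE_RANGES[mid];
    if (codePoint < first) hi = mid - 1;
    else if (codePoint > last) lo = mid + 1;
    else return true;
  }
  return false;
}

// Width of one code point: 0 for NUL, -1 for control characters,
// 2 for the wide ranges above, 1 for everything else (including all
// Neutral characters such as EM SPACE).
function sketchWcwidth(codePoint) {
  if (codePoint === 0) return 0;
  if (codePoint < 0x20 || (codePoint >= 0x7f && codePoint < 0xa0)) return -1;
  return isWide(codePoint) ? 2 : 1;
}

// Width of a string: sum of per-code-point widths, or -1 if any
// code point is a control character.
function sketchWcswidth(str) {
  let total = 0;
  for (const ch of str) { // iterates by code point, not by UTF-16 unit
    const width = sketchWcwidth(ch.codePointAt(0));
    if (width < 0) return -1;
    total += width;
  }
  return total;
}

console.log(sketchWcswidth('abc'));
console.log(sketchWcswidth('日本'));
```

Iterating with `for...of` walks the string by code point, so characters outside the Basic Multilingual Plane are measured once rather than once per surrogate.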
http://www.unicode.org/unicode/reports/tr11/
Markus Kuhn -- 2007-05-26 (Unicode 5.0)
Permission to use, copy, modify, and distribute this software for any purpose and without fee is hereby granted. The author disclaims all warranties with regard to this software.
Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c