| This directory contains a mechanism for GCC to have its own internal |
| implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.c), |
| as well as a mechanism to update the information about codepoints permitted in |
| identifiers, which is encoded in libcpp/ucnid.h, and mapping between Unicode |
| names and codepoints, which is encoded in libcpp/uname2c.h. |
| |
| The idea is to produce the necessary lookup tables |
| (../../libcpp/{ucnid.h,uname2c.h,generated_cpp_wcwidth.h}) in a reproducible |
| way, starting from the following files that are distributed by the Unicode |
| Consortium: |
| |
| ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt |
| |
| These files have been added to source control in this directory; |
| please see unicode-license.txt for the relevant copyright information. |
| |
| In order to keep in sync with glibc's wcwidth as much as possible, it is |
| desirable for the logic that processes the Unicode data to be the same as |
| glibc's. To that end, we also put in this directory, in the from_glibc/ |
| directory, the glibc python code that implements their logic. This code was |
| copied verbatim from glibc, and it can be updated at any time from the glibc |
| source code repository. The files copied from that respository are: |
| |
| localedata/unicode-gen/unicode_utils.py |
| localedata/unicode-gen/utf8_gen.py |
| |
| And the most recent versions added to GCC are from glibc git commit: |
| 71de3aead9fffe89556e80ebc94aa918d8ee7bca |
| |
| The script gen_wcwidth.py found here contains the GCC-specific code to |
| map glibc's output to the lookup tables we require. This script should not need |
| to change, unless there are structural changes to the Unicode data files or to |
| the glibc code. Similarly, makeucnid.cc in ../../libcpp contains the logic to |
| produce ucnid.h. |
| |
| The procedure to update GCC's Unicode support is the following: |
| |
| 1. Update the six Unicode data files from the above URLs. |
| |
| 2. Update the two glibc files in from_glibc/ from glibc's git. Update |
| the commit number above in this README. |
| |
| 3. Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h |
| (where X.Y is the version of the Unicode standard corresponding to the |
| Unicode data files being used, most recently, 15.1.0). |
| |
| 4. Update Unicode Copyright years in libcpp/makeucnid.cc and in |
| libcpp/makeuname2c.cc up to the year in which the Unicode |
| standard has been released. |
| |
| 5. Compile makeucnid, e.g. with: |
| g++ -O2 ../../libcpp/makeucnid.cc -o ../../libcpp/makeucnid |
| |
| 6. Generate ucnid.h as follows: |
| ../../libcpp/makeucnid ../../libcpp/ucnid.tab UnicodeData.txt \ |
| DerivedNormalizationProps.txt DerivedCoreProperties.txt \ |
| > ../../libcpp/ucnid.h |
| |
| 7. Read the corresponding Unicode's standard and update correspondingly |
| generated_ranges table in libcpp/makeuname2c.cc (in Unicode 15 all |
| the needed information was in Table 4-8). |
| |
| 8. Compile makeuname2c, e.g. with: |
| g++ -O2 ../../libcpp/makeuname2c.cc -o ../../libcpp/makeuname2c |
| |
| 9: Generate uname2c.h as follows: |
| ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \ |
| > ../../libcpp/uname2c.h |