Message369819
As I said above, one could argue that the bug is in glibc. In that case,
https://p.sipsolutions.net/6a4e9fce82dbbfa0.txt
could serve as a simple LD_PRELOAD wrapper to work around it, if only to illustrate the problem from that side.
Arguably, that puts glibc in violation of RFC 3629, which says:
3. UTF-8 definition
[...]
In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16
accessible range) are encoded using sequences of 1 to 4 octets.
[...]
   (hexadecimal)    | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
[...]
Implementations of the decoding algorithm above MUST protect against
decoding invalid sequences.
[...]
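To make concrete what "protect against decoding invalid sequences" implies, here is a rough Python sketch of a decoder that follows the table above and rejects anything outside it. This is only an illustration of the RFC's rules, not glibc's or CPython's code; the names decode_utf8_strict and is_valid_utf8 are made up for this example.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8 per the RFC 3629 table, raising ValueError on invalid input.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # Pick sequence length, payload bits of the lead byte, and the
        # smallest code point that legitimately needs this length.
        if b0 < 0x80:
            length, cp, minimum = 1, b0, 0x0
        elif 0xC0 <= b0 <= 0xDF:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            # Stray continuation byte (0x80..0xBF) or a 5/6-octet lead byte.
            raise ValueError(f"invalid lead byte 0x{b0:02x} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        if cp > 0x10FFFF:
            # 11110xxx can encode values past U+10FFFF; the RFC forbids them.
            raise ValueError(f"code point above U+10FFFF at offset {i}")
        out.append(chr(cp))
        i += length
    return "".join(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under the strict rules above."""
    try:
        decode_utf8_strict(data)
    except ValueError:
        return False
    return True
```

With this, is_valid_utf8(b"\xf4\x8f\xbf\xbf") is true (that is U+10FFFF), while b"\xf4\x90\x80\x80", which would be U+110000, is rejected even though it matches the 11110xxx bit pattern.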
Here's a simple test program:
https://p.sipsolutions.net/ac091b4ea4b7f742.txt

Date                | User  | Action | Args
--------------------+-------+--------+---------------------------------------------
2020-05-24 19:28:05 | jberg | set    | recipients: + jberg, ncoghlan, SilentGhost, eryksun, Neui
2020-05-24 19:28:05 | jberg | set    | messageid: <1590348485.34.0.0647952861096.issue35883@roundup.psfhosted.org>
2020-05-24 19:28:05 | jberg | link   | issue35883 messages
2020-05-24 19:28:05 | jberg | create |