Message407666
In file Lib/zipfile.py:
    flags = centdir[5]
    if flags & 0x800:
        # UTF-8 file names extension
        filename = filename.decode('utf-8')
    else:
        # Historical ZIP filename encoding
        filename = filename.decode('cp437')
ZipFile simply decodes every file name that is not flagged as UTF-8 using CP437.
In file Lib/zipfile.py:
    # This is used to ensure paths in generated ZIP files always use
    # forward slashes as the directory separator, as required by the
    # ZIP format specification.
    if os.sep != "/" and os.sep in filename:
        filename = filename.replace(os.sep, "/")
And it replaces every '\\' with '/' on Windows.
Consider a file named '\x97\x5c\x92\x9b', which is '予兆' in Japanese, encoded in SHIFT_JIS.
You may have noticed the problem:
'\x5c' is '\\' (backslash) in ASCII.
So ZipFile decodes the bytes as CP437 and then replaces every '\\' with '/'.
The second byte of the Japanese character '予' gets replaced, so the character is corrupted.
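
Here is a minimal sketch of the two steps above applied to that name. It does not call zipfile at all; the `replace('\\', '/')` is written out by hand (an assumption standing in for the `os.sep` replacement on Windows) so it behaves the same on any platform:

```python
# Raw name bytes as stored in the archive: '予兆' encoded in Shift_JIS.
raw = b'\x97\x5c\x92\x9b'

# Step 1: flag 0x800 is not set, so the name is decoded as CP437.
decoded = raw.decode('cp437')

# Step 2: on Windows os.sep is '\\'; the replacement is spelled out
# explicitly here instead of relying on os.sep.
sanitized = decoded.replace('\\', '/')

# The bytes that the sanitized name now stands for.
mangled = sanitized.encode('cp437')

print(raw)      # b'\x97\\\x92\x9b'
print(mangled)  # b'\x97/\x92\x9b'
```

The second byte has changed from 0x5c to 0x2f, so the result is no longer the Shift_JIS encoding of '予兆'.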
Someone might say we can replace '/' with '\\' again and encode the name as CP437 to get the raw bytes back.
But what if both '/' ('\x2f') and '\\' ('\x5c') appear in the raw filename?
Simply replacing '\\' in a byte stream without knowing the encoding is by no means a good approach.
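
To illustrate why the reverse replacement is ambiguous, here is a rough sketch. `sanitize` and `naive_restore` are hypothetical helpers written for this message, not zipfile functions; `sanitize` only imitates the decode-then-replace behaviour described above:

```python
def sanitize(raw: bytes) -> str:
    """Hypothetical stand-in: CP437 decode, then '\\' -> '/' as on Windows."""
    return raw.decode('cp437').replace('\\', '/')


def naive_restore(name: str) -> bytes:
    """The suggested workaround: turn every '/' back into '\\'."""
    return name.replace('/', '\\').encode('cp437')


# A real '/' separator followed by '予' in Shift_JIS (trail byte 0x5c).
raw = b'dir/\x97\x5c'
name = sanitize(raw)
restored = naive_restore(name)

print(restored == raw)  # False: the real separator was rewritten too

# A different raw name collapses to the same sanitized string,
# so no reverse mapping can tell the two apart.
other = b'dir\\\x97/'
print(sanitize(other) == name)  # True
```

Once both 0x2f and 0x5c have been folded into '/', the information needed to restore the original bytes is gone.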
Maybe we can provide a rawname field in the ZipInfo struct?

Date | User | Action | Args
2021-12-04 14:01:19 | accelerator0099 | set | recipients: + accelerator0099
2021-12-04 14:01:19 | accelerator0099 | set | messageid: <1638626479.32.0.427915201739.issue45981@roundup.psfhosted.org>
2021-12-04 14:01:19 | accelerator0099 | link | issue45981 messages
2021-12-04 14:01:19 | accelerator0099 | create |