Issue 40407: Zipfile couldn`t recognized character set rightly.


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/84587

classification

Title:	Zipfile couldn`t recognized character set rightly.
Type:	behavior
Stage:
Components:	Library (Lib)
Versions:	Python 3.9

process

Status:	open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List:	alanmcintyre, serhiy.storchaka, twouters, 김지훈
Priority:	normal
Keywords:

Created on 2020-04-27 15:17 by 김지훈, last changed 2022-04-11 14:59 by admin.

Messages (1)

msg367429

Author: 김지훈 (김지훈)

Date: 2020-04-27 15:17

Hi,

I am not a developer.
However, when I previously reported a problem with an open-source program,
I was told that it was caused by Python's zipfile module,
so I would like to ask about it here.

I'm Korean and a Windows user.
There are several widely used Windows compression programs in Korea.
For archives created with these programs, Debian's unzip utility decodes the filename character set correctly, but Python's zipfile module does not.
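A minimal sketch of the likely mechanism, under the assumption that the EUC-KR archivers store the raw EUC-KR bytes without setting ZIP general-purpose bit 11 (0x800), the flag that marks a name as UTF-8. When that bit is clear, zipfile decodes the name as CP437 by default, so Korean names come out garbled. `decode_member_name` and `repair_name` are hypothetical helpers for illustration, and CP949 (a superset of EUC-KR) is an assumed target encoding:

```python
# Sketch (assumption): a Korean archiver stored the member name as raw
# EUC-KR bytes and left ZIP general-purpose bit 11 (0x800, "UTF-8") clear.

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: name is encoded as UTF-8


def decode_member_name(raw: bytes, flag_bits: int) -> str:
    """Follow zipfile's default rule: UTF-8 if bit 11 is set, else CP437."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    return raw.decode("cp437")


def repair_name(name: str, encoding: str = "cp949") -> str:
    """Hypothetical helper: undo a CP437 decode and re-decode with `encoding`.

    Python's cp437 codec maps all 256 byte values, so encoding back to
    CP437 recovers the original raw bytes exactly. CP949 is a superset of
    EUC-KR, so it also decodes plain EUC-KR names.
    """
    try:
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # not a CP437-decoded name, or not valid in `encoding`


raw = "한글.txt".encode("euc-kr")             # bytes as the archiver wrote them
shown = decode_member_name(raw, flag_bits=0)  # what zipfile reports: mojibake
print(shown)
print(repair_name(shown))                     # → 한글.txt
```

The round trip is lossless only because CP437 assigns a character to every byte value; a name that was already decoded with a lossy codec could not be recovered this way.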

The attached file contains several archives, each created with a different compression program and named after that program.
(The file was too large to attach here, so it is hosted elsewhere: https://kutt.it/2F2Xec)

As far as I can tell, each program uses the following filename encoding with its default settings:
7zip : UTF-8
Alzip : UTF-8
BandiZip : EUC-KR
BreadZip : EUC-KR
PKZip : UTF-8
StarZip : EUC-KR
WinRAR : UTF-8
WinZIP : EUC-KR
Zipware : EUC-KR
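The list above can be checked per archive by reading each member's `flag_bits`. This sketch builds a hypothetical stand-in for an EUC-KR archive in memory (zipfile itself always writes non-ASCII names as UTF-8, so the raw name bytes are patched after writing) and reads it twice: once with the defaults, and once with the `metadata_encoding` argument that `zipfile.ZipFile` accepts in read mode since Python 3.11. The fixture and the choice of CP949 are assumptions, not part of the original report:

```python
import io
import sys
import zipfile

UTF8_FLAG = 0x800                 # ZIP general-purpose bit 11
KOREAN_NAME = "한글.txt"
PLACEHOLDER = "XXXX.txt"          # ASCII, same byte length as the EUC-KR name


def make_euckr_zip() -> bytes:
    """Hypothetical fixture imitating an archiver that writes EUC-KR names.

    zipfile always writes non-ASCII names as UTF-8 and sets bit 11, so we
    write an ASCII placeholder and then swap in the EUC-KR bytes. The name
    occurs in both the local file header and the central directory; the
    CRC covers only the file data, so the archive stays valid.
    """
    euckr = KOREAN_NAME.encode("euc-kr")
    assert len(euckr) == len(PLACEHOLDER)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(PLACEHOLDER, b"hello")
    return buf.getvalue().replace(PLACEHOLDER.encode("ascii"), euckr)


def report(data: bytes) -> None:
    # Default behaviour: names without bit 11 are decoded as CP437.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            utf8 = bool(info.flag_bits & UTF8_FLAG)
            print(f"utf8_flag={utf8} name={info.filename!r}")

    # Python 3.11+: declare the encoding used for names without bit 11.
    # When bit 11 is set it still takes precedence over metadata_encoding.
    if sys.version_info >= (3, 11):
        with zipfile.ZipFile(io.BytesIO(data), metadata_encoding="cp949") as zf:
            print(zf.namelist())  # → ['한글.txt']


report(make_euckr_zip())
```

`metadata_encoding` applies to the whole archive, so it helps only when the caller already knows which program (and therefore which encoding) produced the file.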

BandiZip and Alzip are the two main competing programs in Korea.
I use BandiZip because it shows few ads and supports multi-core compression.
StarZip is also a Korean program, but its market share is small.
BreadZip is another Korean program that was once widely used; it has been discontinued and now has only a few remaining users.

In short, compression software in Korea produces archives with filenames in both EUC-KR and UTF-8, but the zipfile module does not decode the EUC-KR names correctly.
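Because both encodings occur in practice, a reader that must handle archives from all of these programs needs some way to choose an encoding for each name. One possible heuristic, sketched here as an assumption (it is not documented behaviour of zipfile or of Debian's unzip), tries a strict UTF-8 decode first, then CP949, and finally falls back to CP437. `guess_member_name` is a hypothetical helper, and short EUC-KR names can occasionally also be valid UTF-8, so this can misfire:

```python
UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: name is encoded as UTF-8


def guess_member_name(raw: bytes, flag_bits: int,
                      candidates: tuple = ("utf-8", "cp949")) -> str:
    """Hypothetical heuristic for names whose encoding is not declared.

    Honour bit 11 when it is set; otherwise return the first candidate
    encoding that decodes `raw` without errors, and fall back to CP437,
    which the ZIP format uses when bit 11 is clear.
    """
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate
    return raw.decode("cp437")  # CP437 decodes every byte, so this cannot fail


# Unflagged names from an EUC-KR archiver and from a UTF-8 archiver.
for raw in ("한글.txt".encode("euc-kr"), "한글.txt".encode("utf-8")):
    print(guess_member_name(raw, flag_bits=0))  # → 한글.txt (both times)
```

Trying UTF-8 first is the safer order: multi-byte UTF-8 has a strict structure that EUC-KR text rarely satisfies by accident, whereas CP949 accepts many byte sequences.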

History
Date	User	Action	Args
2022-04-11 14:59:29	admin	set	github: 84587
2021-04-14 18:15:24	iritkatriel	set	nosy: + twouters, alanmcintyre, serhiy.storchaka
2020-09-20 21:53:48	iritkatriel	set	components: + Library (Lib), - 2to3 (2.x to 3.x conversion tool)
2020-04-27 15:17:58	김지훈	create