classification
Title: Python 3's shutil.make_archive is truncating filenames
Type: behavior Stage: resolved
Components: Versions: Python 3.4
process
Status: closed Resolution: duplicate
Dependencies: Superseder: tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each
View: 24838
Assigned To: Nosy List: Dan “locallycompact” Firth, Decorater, lars.gustaebel, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2016-11-29 11:50 by Dan “locallycompact” Firth, last changed 2016-11-29 12:04 by vstinner. This issue is now closed.

Messages (6)
msg281981 - (view) Author: Dan “locallycompact” Firth (Dan “locallycompact” Firth) Date: 2016-11-29 11:50
I have made an example of this bug here: https://github.com/locallycompact/py3_make_archive_bug

Using shutil.make_archive() in Python 2 works fine, whereas in Python 3 one of the filenames is truncated by five characters after unpacking the resulting archive. This happens every time.
msg281982 - (view) Author: Decorater (Decorater) * Date: 2016-11-29 11:58
hmm
This shows a bug in shutil.make_archive in python3.

Run ./test.sh to run shutil.make_archive in both python2 and python3
on the wdir and then extract each of them for comparison. The file called

'/usr/share/ca-certificates/mozilla/TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_-_Sürüm_3.crt'

has been truncated by five characters in the python3 archive, becoming:

'/usr/share/ca-certificates/mozilla/TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_-_Sürüm_'

This makes me wonder whether it counts the number of characters in the given file name and strips anything beyond a limit.
msg281983 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-29 12:02
This looks like a duplicate of issue24838.
msg281984 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-29 12:02
An entry in a TAR archive has a name. The name field has a size of 100 bytes. The field is padded with zero bytes. I don't know if it must or must not end with a zero byte.
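The name field can be inspected directly. This is a minimal sketch, assuming a current Python 3 and the USTAR format; the helper first_header_name_field and the member name hello.txt are made up for illustration. It only shows how the tarfile module pads a short name, not what the tar format itself requires.

```python
import io
import tarfile


def first_header_name_field(name, fmt=tarfile.USTAR_FORMAT):
    """Write a one-member archive in memory and return the raw name field.

    The name field is the first 100 bytes of the 512-byte header block.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name=name)  # empty member, no payload
        info.size = 0
        tar.addfile(info)
    return buf.getvalue()[:100]


field = first_header_name_field("hello.txt")
print(len(field))            # size of the name field in bytes
print(field.rstrip(b"\0"))   # the name without its NUL padding
```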

'/usr/share/ca-certificates/mozilla/TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_-_Sürüm_3.crt' string encoded to UTF-8 takes 104 bytes.

Python should emit a warning or even fail with an error if a name is longer than 100 *bytes* (not 100 *characters*).
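The difference between the two counts can be checked with a few lines. This is a sketch only; the constant TAR_NAME_FIELD_SIZE is a name chosen here for illustration, and the string is normalized to NFC on the assumption that the file name uses precomposed characters.

```python
import unicodedata

# Assumption: the path uses precomposed (NFC) characters.
NAME = unicodedata.normalize(
    "NFC",
    "/usr/share/ca-certificates/mozilla/"
    "TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_-_Sürüm_3.crt",
)

TAR_NAME_FIELD_SIZE = 100  # size of the name field, in bytes

n_chars = len(NAME)
n_bytes = len(NAME.encode("utf-8"))

print(n_chars, n_bytes)  # 95 104
print(n_chars <= TAR_NAME_FIELD_SIZE)  # True: a character-based check passes
print(n_bytes <= TAR_NAME_FIELD_SIZE)  # False: the encoded name does not fit
```

The nine non-ASCII characters each take two bytes in UTF-8, which accounts for the nine-byte difference.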
msg281985 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-29 12:02
Oh, Serhiy just closed the issue as a duplicate.
msg281987 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-29 12:04
FYI the first release including the fix 78ede2baa146 is Python 3.5.2.
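To check whether a given interpreter is affected, a round trip through an in-memory archive is enough. This is a sketch under assumptions: the helper roundtrip_member_name is hypothetical, the member is a dummy payload rather than the real certificate, and the path is written relative (no leading slash). On a Python that includes the fix, both formats should return the full name.

```python
import io
import tarfile

NAME = (
    "usr/share/ca-certificates/mozilla/"
    "TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_-_Sürüm_3.crt"
)


def roundtrip_member_name(name, fmt):
    """Store one member under `name`, reopen the archive, return the stored name."""
    payload = b"dummy"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()[0]


results = {
    "gnu": roundtrip_member_name(NAME, tarfile.GNU_FORMAT),
    "pax": roundtrip_member_name(NAME, tarfile.PAX_FORMAT),
}
for label, stored in results.items():
    print(label, "ok" if stored == NAME else "TRUNCATED: %r" % stored)
```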
History
Date User Action Args
2016-11-29 12:04:16 vstinner set messages: + msg281987
2016-11-29 12:02:44 vstinner set messages: + msg281985
2016-11-29 12:02:29 vstinner set nosy: + vstinner, lars.gustaebel; messages: + msg281984
2016-11-29 12:02:10 serhiy.storchaka set status: open -> closed; superseder: tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each; nosy: + serhiy.storchaka; messages: + msg281983; resolution: duplicate; stage: resolved
2016-11-29 11:58:26 Decorater set nosy: + Decorater; messages: + msg281982
2016-11-29 11:50:38 Dan “locallycompact” Firth create