
classification
Title: Tarfile.add with bytes path is failing
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Patrik Dufresne, ezio.melotti, lars.gustaebel, martin.panter, r.david.murray, vstinner
Priority: normal Keywords:

Created on 2016-01-02 19:53 by Patrik Dufresne, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg257355 - (view) Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-01-02 19:53
With Python 3.4, tarfile doesn't properly support adding files with a bytes path. Only unicode is supported. It fails with an exception similar to:

    tar.add(os.path.join(dirpath, filename), filename)
  File "/usr/lib/python3.4/tarfile.py", line 1907, in add
    tarinfo = self.gettarinfo(name, arcname)
  File "/usr/lib/python3.4/tarfile.py", line 1767, in gettarinfo
    arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object

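The code uses os.sep and u"/", which are both str, so the replace call fails on a bytes arcname. Instead, it should pick the separator based on the type of the path, something like posixpath.py:_get_sep(path).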
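
To illustrate the suggestion, here is a minimal sketch of a type-aware separator replacement. The helper name `normalize_arcname` is hypothetical and is not part of tarfile; it only shows the idea of choosing str or bytes literals to match the argument, in the spirit of `_get_sep`.

```python
import os


def normalize_arcname(arcname):
    """Replace the OS path separator with "/" for str or bytes input.

    Hypothetical helper for illustration only; tarfile does not expose it.
    """
    if isinstance(arcname, bytes):
        # Use bytes literals so bytes.replace() receives matching types.
        sep, slash = os.fsencode(os.sep), b"/"
    else:
        sep, slash = os.sep, "/"
    return arcname.replace(sep, slash)


str_result = normalize_arcname("a" + os.sep + "b")
bytes_result = normalize_arcname(b"a" + os.fsencode(os.sep) + b"b")
```

Both calls return the same kind of object they were given, which is the behavior the report asks for.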
msg257356 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-01-02 20:01
See also issue 21996.
msg257357 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-01-02 20:03
Does using a surrogateescape encoded filename work?  (You won't get the error you report...my question is, does that do the right thing when building the archive?)
msg257381 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-02 22:39
Is the tarfile module designed to support bytes for file names in general? The documentation doesn’t seem to mention bytes anywhere relevant. This seems more like a new feature rather than a bug to me.
msg257386 - (view) Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-01-02 23:39
> Is the tarfile module designed to support bytes for file names in general? The documentation doesn’t seem to mention bytes anywhere relevant. This seems more like a new feature rather than a bug to me.

I'm using bytes in Unix to represent a path. From `os.path` docs : The path parameters can be passed as either strings, or bytes. Applications are encouraged to represent file names as (Unicode) character strings. Unfortunately, some file names may not be representable as strings on Unix, so applications that need to support arbitrary file names on Unix should use bytes objects to represent path names. Vice versa, using bytes objects cannot represent all file names on Windows (in the standard mbcs encoding), hence Windows applications should use string objects to access all files.

As such, I'm expecting to use bytes to represent a path with tarfile.

Also, the tar file format doesn't define any specific encoding for filenames. I'm expecting to be able to put any kind of bytes data in a given filename, since this was working in tarfile with py2.

> Does using a surrogateescape encoded filename work?  (You won't get the error you report...my question is, does that do the right thing when building the archive?)

I will need to take a further look at surrogateescape. I read somewhere it was an experimental feature, so I didn't try it.


Thanks to you both for your quick feedback during the holidays.
msg257388 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-03 00:16
It looks like surrogate-escaped bytes should be supported thanks to Issue 8390, although this is not so useful if you use the “pax” format (which always uses UTF-8 internally).

To generate a surrogate-escaped string, you can “decode” it with the following error handler:

>>> b"non-as\xA9ii".decode("ascii", "surrogateescape")
'non-as\udca9ii'
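
As a rough sketch of how this can fit together with tarfile, the example below builds an archive in memory and checks that the original bytes end up in the header. It assumes the GNU format with a UTF-8 encoding and the "surrogateescape" error handler, and it uses `TarInfo` with `addfile` instead of `add` so that no real file is needed; the file name and payload are made up for illustration.

```python
import io
import tarfile

raw_name = b"non-as\xa9ii.txt"  # not valid UTF-8
# "Decode" the bytes into a str that carries the bad byte as a surrogate.
name = raw_name.decode("utf-8", "surrogateescape")

payload = b"hello"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="utf-8", errors="surrogateescape") as tar:
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

archive = buf.getvalue()
# The name field is the first 100 bytes of the header, NUL-padded.
header_name = archive[:100].rstrip(b"\0")

# Reading the archive back yields the surrogate-escaped str again.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r",
                  encoding="utf-8", errors="surrogateescape") as tar:
    stored = tar.getnames()[0]
recovered = stored.encode("utf-8", "surrogateescape")
```

Here `header_name` and `recovered` should both equal `raw_name`. For real paths on Unix, `os.fsdecode()` and `os.fsencode()` perform the same kind of conversion using the filesystem encoding.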
msg257422 - (view) Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-01-03 15:33
It's a bit tricky, but with the help of surrogateescape I get the expected result.

I'm closing this bug.

Thanks
History
Date | User | Action | Args
2022-04-11 14:58:25 | admin | set | github: 70185
2016-01-03 15:33:45 | Patrik Dufresne | set | status: open -> closed; messages: + msg257422
2016-01-03 00:16:35 | martin.panter | set | messages: + msg257388
2016-01-02 23:39:36 | Patrik Dufresne | set | messages: + msg257386
2016-01-02 22:39:07 | martin.panter | set | nosy: + martin.panter; messages: + msg257381; title: Tarfile.add with bytes path is failling -> Tarfile.add with bytes path is failing
2016-01-02 20:03:06 | r.david.murray | set | messages: + msg257357
2016-01-02 20:01:26 | r.david.murray | set | nosy: + r.david.murray; messages: + msg257356
2016-01-02 19:54:40 | SilentGhost | set | nosy: + lars.gustaebel; components: + Library (Lib), - Unicode; versions: + Python 3.5, Python 3.6, - Python 3.4
2016-01-02 19:53:25 | Patrik Dufresne | create |