Index: Doc/library/tarfile.rst
===================================================================
--- Doc/library/tarfile.rst	(revision 81870)
+++ Doc/library/tarfile.rst	(working copy)
@@ -185,8 +185,8 @@
 
 .. data:: ENCODING
 
-   The default character encoding i.e. the value from either
-   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.
+   The default character encoding i.e. the value from
+   :func:`sys.getfilesystemencoding`.
 
 
 .. seealso::
@@ -218,7 +218,7 @@
 .. versionadded:: 3.2
    Added support for the context manager protocol.
 
-.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)
+.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)
 
    All following arguments are optional and can be accessed as instance attributes
    as well.
@@ -264,11 +264,12 @@
 
    The *encoding* and *errors* arguments define the character encoding to be
    used for reading or writing the archive and how conversion errors are going
-   to be handled. The default settings will work for most users.
-   See section :ref:`tar-unicode` for in-depth information.
+   to be handled. The default settings will work for most users. See section
+   :ref:`tar-unicode` for in-depth information.
 
    .. versionchanged:: 3.2
-      Use ``'surrogateescape'`` as the default for the *errors* argument.
+      Use ``'surrogateescape'`` as the default for the *errors* argument if
+      *encoding* is not ``'mbcs'``.
 
    The *pax_headers* argument is an optional dictionary of strings which
    will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
@@ -452,13 +453,15 @@
    a :class:`TarInfo` object.
 
 
-.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
+.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors=None)
 
-   Create a string buffer from a :class:`TarInfo` object. For information on the
-   arguments see the constructor of the :class:`TarFile` class.
+   Create a string buffer from a :class:`TarInfo` object. For information on
+   the arguments see the constructor of the :class:`TarFile` class. See section
+   :ref:`tar-unicode` for the default *errors* value.
 
    .. versionchanged:: 3.2
-      Use ``'surrogateescape'`` as the default for the *errors* argument.
+      Use ``'surrogateescape'`` as the default for the *errors* argument if
+      *encoding* is not ``'mbcs'``.
 
 
 A ``TarInfo`` object has the following public data attributes:
@@ -708,9 +711,16 @@
 
 The *errors* argument defines how characters are treated that cannot be
 converted. Possible values are listed in section :ref:`codec-base-classes`.
-The default scheme is ``'surrogateescape'`` which Python also uses for its
-file system calls, see :ref:`os-filenames`.
+The default scheme depends on the *encoding*:
 
+  * If *encoding* is ``'mbcs'``, the default encoding on Windows: In read mode
+    the default scheme is ``'replace'``. This avoids unexpected
+    :exc:`UnicodeError` exceptions and guarantees that an archive can always be
+    read. In write mode the default value for *errors* is ``'strict'``. This
+    ensures that name information is not altered unnoticed.
+  * Otherwise: the default scheme is ``'surrogateescape'``, which Python also
+    uses for its file system calls, see :ref:`os-filenames`.
+
 In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
 because all the metadata is stored using *UTF-8*.
 *encoding* is only used in the rare cases when binary pax headers are decoded or when strings with
Index: Lib/tarfile.py
===================================================================
--- Lib/tarfile.py	(revision 81870)
+++ Lib/tarfile.py	(working copy)
@@ -167,6 +167,15 @@
 # Some useful functions
 #---------------------------------------------------------
 
+def choose_errors(encoding, mode):
+    if encoding == 'mbcs':
+        if mode == 'r':
+            return 'replace'
+        else:
+            return 'strict'
+    else:
+        return 'surrogateescape'
+
 def stn(s, length, encoding, errors):
     """Convert a string to a null-terminated bytes object.
     """
@@ -981,11 +990,12 @@
 
         return info
 
-    def tobuf(self, format=DEFAULT_FORMAT, encoding=ENCODING, errors="surrogateescape"):
+    def tobuf(self, format=DEFAULT_FORMAT, encoding=ENCODING, errors=None):
         """Return a tar header as a string of 512 byte blocks.
         """
         info = self.get_info()
-
+        if errors is None:
+            errors = choose_errors(encoding, 'w')
         if format == USTAR_FORMAT:
             return self.create_ustar_header(info, encoding, errors)
         elif format == GNU_FORMAT:
@@ -1173,7 +1183,8 @@
             if binary:
                 # Try to restore the original byte representation of `value'.
                 # Needless to say, that the encoding must match the string.
-                value = value.encode(encoding, "surrogateescape")
+                errors = choose_errors(encoding, 'w')
+                value = value.encode(encoding, errors)
             else:
                 value = value.encode("utf8")
 
@@ -1552,7 +1563,7 @@
 
     def __init__(self, name=None, mode="r", fileobj=None, format=None,
             tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
-            errors="surrogateescape", pax_headers=None, debug=None, errorlevel=None):
+            errors=None, pax_headers=None, debug=None, errorlevel=None):
         """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
            read from an existing archive, 'a' to append data to an existing
            file or 'w' to create a new file overwriting an existing one. `mode'
@@ -1593,7 +1604,10 @@
         self.ignore_zeros = ignore_zeros
         if encoding is not None:
             self.encoding = encoding
-        self.errors = errors
+        if errors is not None:
+            self.errors = errors
+        else:
+            self.errors = choose_errors(self.encoding, mode)
 
         if pax_headers is not None and self.format == PAX_FORMAT:
             self.pax_headers = pax_headers
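
The following is a minimal sketch of what the three defaults in the patch do in practice. It is an illustration only and not part of the patch. `choose_errors` restates the helper from the Lib/tarfile.py hunk so that the snippet runs on its own; `roundtrip`, `raw_name`, `replaced` and `strict_failed` are hypothetical names introduced here. The 'mbcs' codec is assumed to be available only on Windows, so the error handlers are demonstrated with 'utf-8' and 'ascii' instead.

```python
def choose_errors(encoding, mode):
    """Standalone restatement of the helper proposed in the patch."""
    if encoding == 'mbcs':
        # Reading should never fail; writing should never alter names silently.
        return 'replace' if mode == 'r' else 'strict'
    return 'surrogateescape'


def roundtrip(raw, encoding, errors):
    """Hypothetical helper: decode bytes, then encode with the same settings."""
    return raw.decode(encoding, errors).encode(encoding, errors)


raw_name = b'caf\xe9.txt'  # Latin-1 bytes; not valid ASCII or UTF-8

# 'surrogateescape' maps undecodable bytes to lone surrogates (PEP 383),
# so the original bytes survive a decode/encode round trip.
assert roundtrip(raw_name, 'utf-8', choose_errors('utf-8', 'r')) == raw_name

# 'replace' always succeeds when decoding, but the original byte is lost.
replaced = raw_name.decode('ascii', 'replace')
assert replaced == 'caf\ufffd.txt'

# 'strict' refuses to encode a name the target encoding cannot represent.
try:
    'caf\xe9.txt'.encode('ascii', 'strict')
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False
assert strict_failed
```

The sketch shows the trade-off the documentation hunk describes: 'replace' guarantees that an archive can always be read, 'strict' makes altered names visible as errors when writing, and 'surrogateescape' preserves the raw bytes in both directions.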