classification
Title: configparser doesn't support specifying encoding in read()
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.2

process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: brian.curtin, eric.smith, ezio.melotti, georg.brandl, lukasz.langa, michael.foord
Priority: normal
Keywords: patch

Created on 2010-07-29 08:08 by lukasz.langa, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
cfgparser.3 lukasz.langa, 2010-07-29 08:08 A data file required for the new unit test
issue9411.diff lukasz.langa, 2010-07-29 08:10 Patch that introduces encoding= argument to read() and a unit test
Messages (5)
msg111899 - Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2010-07-29 08:08
By default, configparser classes simply `open()` and `read()` the files named in the list passed to `.read()`. These calls therefore use the platform's default encoding, which makes parsing prone to breakage when a file was written in a different encoding.

The existing workaround is to open each file manually with a specific encoding and pass the files one by one to `readfp()`. This is needlessly complex and adds boilerplate.
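
For illustration, the manual workaround described above might look roughly like the sketch below. The helper name `read_with_encoding` and the file names are hypothetical. The sketch calls `read_file()`, which replaced `readfp()` from Python 3.2 onward (`readfp()` was removed in 3.12), so that it runs on current interpreters.

```python
import configparser
import os
import tempfile


def read_with_encoding(parser, filenames, encoding):
    """Hypothetical helper: open each file by hand and feed it to the parser.

    Returns the list of files that could be opened, similar to what
    ConfigParser.read() reports.
    """
    read_ok = []
    for filename in filenames:
        try:
            with open(filename, encoding=encoding) as fp:
                # read_file() is the modern name for readfp().
                parser.read_file(fp, source=filename)
        except OSError:
            # Silently skip files that cannot be opened.
            continue
        read_ok.append(filename)
    return read_ok


# Small demonstration with a temporary UTF-8 file (assumed sample data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.ini")
with open(path, "w", encoding="utf-8") as f:
    f.write("[section]\nname = Łukasz\n")

parser = configparser.ConfigParser()
missing = os.path.join(tmpdir, "missing.ini")
loaded = read_with_encoding(parser, [path, missing], "utf-8")
```

Every caller that needs a non-default encoding has to carry a loop like this, which is the boilerplate the report objects to.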

Please find attached a patch that adds an `encoding=` argument to the `read()` method. It defaults to `sys.getdefaultencoding()`, so the behaviour is backwards compatible. We might consider switching the default to 'UTF-8', but many INI files from the Windows world are encoded in Windows-specific code pages.

Anyway, the currently proposed implementation is compatible and enables specifying an `encoding` explicitly. The patch includes a new unit test and some minor fixes for behaviour exposed by this test.
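
As a rough usage sketch of the proposed argument, assuming a file written in the Windows code page cp1250 (the file name and contents are made up for the example):

```python
import configparser
import os
import tempfile

# Create a sample INI file in a Windows-specific code page (assumed data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "windows.ini")
with open(path, "w", encoding="cp1250") as f:
    f.write("[user]\nname = Łukasz\n")

parser = configparser.ConfigParser()
# read() opens the files itself; the encoding is passed through explicitly.
loaded = parser.read([path], encoding="cp1250")
name = parser["user"]["name"]
```

Compared with the manual approach, the caller no longer opens the files and only states the encoding once.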
msg111900 - Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2010-07-29 08:10
Patch included.
msg111903 - Author: Michael Foord (michael.foord) (Python committer) Date: 2010-07-29 09:28
Seems good to me (the feature - and also the patch after a cursory read through).
msg111907 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2010-07-29 10:10
The feature request seems reasonable to me, too.

I don't recall if sys.getdefaultencoding() can change while a program is running. If so, you might want to change:

def read(self, filenames, encoding=sys.getdefaultencoding()):

to:

def read(self, filenames, encoding=None):
    if encoding is None:
        encoding = sys.getdefaultencoding()
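
The difference between the two signatures comes from Python evaluating default argument values once, when the `def` statement runs. A minimal sketch of that effect follows; `get_default()` is a hypothetical stand-in for `sys.getdefaultencoding()` whose result is changed on purpose, and `read_eager` and `read_lazy` are illustrative names, not configparser functions.

```python
_current = "ascii"


def get_default():
    """Hypothetical stand-in for sys.getdefaultencoding()."""
    return _current


def read_eager(filenames, encoding=get_default()):
    # The default was computed once, when this def statement executed.
    return encoding


def read_lazy(filenames, encoding=None):
    # The default is looked up on every call instead.
    if encoding is None:
        encoding = get_default()
    return encoding


# Simulate the default changing while the program is running.
_current = "utf-8"

eager = read_eager([])  # still the value captured at definition time
lazy = read_lazy([])    # reflects the current value
```

With the `None` sentinel the function always sees the current default, at the cost of two extra lines.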

Also, what if the files have different encodings?
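
One possible answer, sketched here as an assumption rather than anything the patch prescribes, is to call `read()` once per file and pass the matching encoding each time. The file names and contents below are made up.

```python
import configparser
import os
import tempfile

tmpdir = tempfile.mkdtemp()

# (file name, encoding, contents) triples used as assumed sample data.
sources = [
    ("unix.ini", "utf-8", "[unix]\ncity = Łódź\n"),
    ("windows.ini", "cp1250", "[windows]\ncity = Kraków\n"),
]

parser = configparser.ConfigParser()
for name, enc, text in sources:
    path = os.path.join(tmpdir, name)
    with open(path, "w", encoding=enc) as f:
        f.write(text)
    # A single filename is accepted; each call can use its own encoding.
    parser.read(path, encoding=enc)

unix_city = parser["unix"]["city"]
windows_city = parser["windows"]["city"]
```

Successive `read()` calls accumulate into the same parser, so the result is the same as reading both files in one call.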

The patch seems to include two features: the encoding change and a comment parsing change. You should separate them into two patches.
msg111916 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-07-29 12:18
Applied in r83220. The **kwds change for some methods should be done separately.
History
Date User Action Args
2022-04-11 14:57:04 admin set github: 53657
2010-07-29 12:18:07 georg.brandl set status: open -> closed
    resolution: accepted
    messages: + msg111916
2010-07-29 10:10:19 eric.smith set nosy: + eric.smith
    messages: + msg111907
2010-07-29 09:28:29 michael.foord set messages: + msg111903
2010-07-29 08:10:10 lukasz.langa set files: + issue9411.diff
    type: enhancement
    components: + Library (Lib)
    versions: + Python 3.2
    keywords: + patch
    nosy: georg.brandl, ezio.melotti, michael.foord, brian.curtin, lukasz.langa
    messages: + msg111900
2010-07-29 08:08:17 lukasz.langa create