
Author vstinner
Recipients georg.brandl, larry, pitrou, sconseil, serhiy.storchaka, vstinner
Date 2013-05-07.12:06:06
Message-id <CAMpsgwagr_g7oJck_r9W990meZLZk9t5Gy-mQgG0GMV2tfS7EA@mail.gmail.com>
In-reply-to <1367923838.67.0.961197676851.issue17915@psf.upfronthosting.co.za>
Content
> Accepting of text streams in XMLGenerator should be deprecated in future versions.

I agree that the following pattern is strange:

with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
    xml = XMLGenerator(f, encoding='iso-8859-1')

Why would I specify a codec twice? What happens if I specify two
different codecs?

with codecs.open('/tmp/test.txt', 'w', encoding='utf-8') as f:
    xml = XMLGenerator(f, encoding='iso-8859-1')
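
A minimal sketch of what the mismatched case above appears to do on a recent Python 3. It assumes an in-memory codecs.getwriter('utf-8') stream in place of codecs.open(), so nothing touches /tmp. XMLGenerator passes str straight to a codecs writer, so the declaration names iso-8859-1 while the bytes on disk are UTF-8:

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator

# In-memory stand-in for codecs.open(..., encoding='utf-8'):
# a codecs.StreamWriter that encodes str to UTF-8 bytes.
raw = io.BytesIO()
f = codecs.getwriter('utf-8')(raw)

# The generator is told iso-8859-1, but the stream encodes UTF-8.
xml = XMLGenerator(f, encoding='iso-8859-1')
xml.startDocument()
xml.startElement('root', {'attr': '\u20ac'})
xml.endElement('root')
xml.endDocument()

data = raw.getvalue()
# The declaration claims iso-8859-1...
print(data.splitlines()[0])
# ...but the euro sign was written as the UTF-8 sequence E2 82 AC,
# which an iso-8859-1 parser would read as three unrelated characters.
print('\u20ac'.encode('utf-8') in data)
```

The document is silently mislabeled. No error is raised.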

It may be simpler (and safer?) to reject text files. If you cannot
detect that f is a text file, just make it explicit in the
documentation that f must be a binary file.
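
If f is required to be a binary file, the encoding is stated only once. The following is a minimal sketch that assumes io.BytesIO as a stand-in for a file opened with open(path, 'wb'). On recent Python 3 versions, XMLGenerator wraps a binary stream itself. Characters that the target encoding cannot represent appear to be written as character references instead of raising the UnicodeEncodeError seen in the quoted traceback:

```python
import io
from xml.sax.saxutils import XMLGenerator

# io.BytesIO stands in for open('/tmp/test.xml', 'wb'); the encoding
# is given once, to XMLGenerator, and nowhere else.
out = io.BytesIO()
xml = XMLGenerator(out, encoding='iso-8859-1')
xml.startDocument()
xml.startElement('root', {'attr': '\u20ac'})  # not representable in latin-1
xml.endElement('root')
xml.endDocument()

# U+20AC is emitted as the character reference &#8364; instead of
# raising UnicodeEncodeError.
print(out.getvalue().decode('iso-8859-1'))
```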

2013/5/7 Serhiy Storchaka <report@bugs.python.org>:
>
> Serhiy Storchaka added the comment:
>
> It does not work on Python 3.3.0.
>
>>>> with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
> ...     xml = XMLGenerator(f, encoding='iso-8859-1')
> ...     xml.startDocument()
> ...     xml.startElement('root', {'attr': u'\u20ac'})
> ...     xml.endElement('root')
> ...     xml.endDocument()
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 4, in <module>
>   File "/home/serhiy/py/cpython-3.Lib/xml/sax/saxutils.py", line 141, in startElement
>     self._write(' %s=%s' % (name, quoteattr(value)))
>   File "/home/serhiy/py/cpython-3.Lib/xml/sax/saxutils.py", line 96, in _write
>     self._out.write(text)
>   File "/home/serhiy/py/cpython-3.Lib/codecs.py", line 699, in write
>     return self.writer.write(data)
>   File "/home/serhiy/py/cpython-3.Lib/codecs.py", line 355, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)
>
> And it shouldn't. On Python 2, XMLGenerator works only with binary files and "works" with text files only because of the implicit str->unicode conversion. On Python 3, working with binary files was broken. Issue1470548 restores support for binary files (the only kind XMLGenerator can handle correctly), but text files are still accepted for backward compatibility. The problem is that there is no reliable way to determine whether a file-like object is binary or text.
>
> Accepting of text streams in XMLGenerator should be deprecated in future versions.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17915>
> _______________________________________
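
The quoted comment notes that there is no reliable way to tell a binary file-like object from a text one. The following is a minimal sketch of what a type-based check can do using only the standard io and codecs classes. The names classify_stream and Sink are hypothetical and exist only for illustration. An object outside these hierarchies that provides only write() can only be classified by actually writing to it:

```python
import codecs
import io


def classify_stream(f):
    """Hypothetical helper: guess whether f expects str or bytes.

    Returns 'text', 'binary', or None when the type gives no answer.
    """
    if isinstance(f, io.TextIOBase):
        return 'text'
    if isinstance(f, (codecs.StreamWriter, codecs.StreamReaderWriter)):
        return 'text'
    if isinstance(f, (io.RawIOBase, io.BufferedIOBase)):
        return 'binary'
    # A duck-typed object with only a write() method: nothing to inspect.
    return None


class Sink:
    """A file-like object that accepts anything, like many test doubles."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


print(classify_stream(io.StringIO()))                            # text
print(classify_stream(io.BytesIO()))                             # binary
print(classify_stream(codecs.getwriter('utf-8')(io.BytesIO())))  # text
print(classify_stream(Sink()))                                   # None
```

The None case is exactly the ambiguity described above, which is why documenting "f must be binary" may be simpler than detecting it.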
History
Date                 User      Action  Args
2013-05-07 12:06:06  vstinner  set     recipients: + vstinner, georg.brandl, pitrou, larry, serhiy.storchaka, sconseil
2013-05-07 12:06:06  vstinner  link    issue17915 messages
2013-05-07 12:06:06  vstinner  create