Author lambacck
Recipients belopolsky, doerwalter, eric.araujo, ezio.melotti, georg.brandl, gpolo, lambacck, rhettinger, ron_adam, vstinner
Date 2010-12-20.22:38:29
Message-id <1292884712.27.0.318943127633.issue10087@psf.upfronthosting.co.za>
In-reply-to
Content
Sorry in advance for the long-winded response.

Ron, have you looked at my patch?

The underlying issue is that the semantics of print() changed between Python 2 and 3. In Python 3, print() does not treat a bytes object as text to write. In Python 2, str was a "bytes" type, so print happily sent encoded strings to stdout.

This presents an issue for both --type=html and the text version if an encoding is asked for. Just using print() will result in repr being called on the byte string and you get either an invalid HTML file or a text file with extra junk in it (same junk in both).
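
To make the "extra junk" concrete, here is a minimal sketch. The HTML fragment is a made-up stand-in, not real calendar output, and stdout is captured only so the result can be inspected:

```python
import io
from contextlib import redirect_stdout

# Hypothetical stand-in for the encoded output (not real calendar output).
page = "<td>caf\u00e9</td>".encode("utf-8")

buf = io.StringIO()
with redirect_stdout(buf):
    # In Python 3, print() writes the repr of a bytes object:
    # the b'...' wrapper plus \x escapes, not the raw encoded bytes.
    print(page)

printed = buf.getvalue()
```

The captured text starts with b' and contains escaped byte values, which is exactly what ends up in the HTML or text file.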

If you ask for an encoding, you are going to get bytes. Changing it back into a string to mask that effect does not actually fix things for you because once you do print() you are back to a default encoding and therefore more broken because you are not doing what the user asked for (which is a particular encoding).

In order for:
    return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
                encoding=encoding)

to solve the issue, you would also need to take away the ability for the user to specify an encoding (at the command line and via the API). It's already a string, so why turn it into bytes and then back into a string? If you don't want to deal with encoding, then return a string and leave it up to the consumer of the API to handle the desired encoding (and the "xmlcharrefreplace", maybe with a note in the docs).
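
A small sketch of what that round trip does, using a made-up fragment and two example encodings (both are assumptions for illustration only):

```python
# Hypothetical fragment containing two non-ASCII characters.
text = "<td>caf\u00e9 \u2603</td>"

# With an encoding that can represent everything, the round trip is a no-op.
via_utf8 = str(text.encode("utf-8", "xmlcharrefreplace"), encoding="utf-8")

# With a narrower encoding, unencodable characters become character
# references, but the result is still a str with no encoding attached.
via_ascii = str(text.encode("ascii", "xmlcharrefreplace"), encoding="ascii")
```

Either way the result is a plain string again, so a later print() encodes it with whatever stdout uses, not with the encoding that was requested.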

If you do want to deal with encoding (which I think we are stuck with), then solve the real issue by not using print() (see my patch).
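
As a rough sketch of the general idea (not a copy of the patch), the encoded bytes can be written to the binary layer under stdout. The helper name write_encoded is hypothetical, and this assumes sys.stdout has a .buffer attribute, which is not the case if stdout has been replaced by something like io.StringIO:

```python
import io
import sys


def write_encoded(data, stream=None):
    """Write already-encoded bytes to a binary stream (hypothetical helper)."""
    if stream is None:
        # Assumption: sys.stdout is a regular text stream with a binary buffer.
        stream = sys.stdout.buffer
    stream.write(data)
    stream.flush()


# Encode once, in the encoding that was asked for, and write the bytes as-is.
page = "<td>caf\u00e9</td>".encode("latin-1", "xmlcharrefreplace")

sink = io.BytesIO()  # stands in for sys.stdout.buffer in this sketch
write_encoded(page, sink)
```

Calling write_encoded(page) without a stream would send the same bytes to the real stdout, with no repr and no second encoding step.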

I think the only reason that my patch was not accepted, and why this is still languishing, is that I said I would provide tests and have not had time to do so.

Please feel free to correct me if I am wrong about any of the above.
History
Date                 User      Action  Args
2010-12-20 22:38:32  lambacck  set     recipients: + lambacck, doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo
2010-12-20 22:38:32  lambacck  set     messageid: <1292884712.27.0.318943127633.issue10087@psf.upfronthosting.co.za>
2010-12-20 22:38:29  lambacck  link    issue10087 messages
2010-12-20 22:38:29  lambacck  create