Message273999
New patch attached (1602_2.patch - hopefully the review will work this time too).
I discovered while researching for the PEP that a decent amount of code expects to be able to write ASCII to sys.stdout.buffer (or sys.stdout.buffer.raw). As my first patch required utf-16-le at this point, it was going to cause havoc.
Rather than break that compatibility, I decided that exposing utf-8 and doing the reencoding at the latest possible stage was better. This is also more consistent with how other encoding issues are likely to be resolved, and shouldn't be any less performant, given that previously we were decoding to utf-16 anyway.
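To illustrate why requiring utf-16-le on the buffer would have caused trouble, here is a minimal sketch. It is not taken from the patch, and `payload` is a made-up example value. It shows that bytes written by ASCII-assuming code are already valid utf-8 but turn into something else entirely when read as utf-16-le.

```python
# Bytes that existing code might pass to sys.stdout.buffer.write(),
# assuming the buffer accepts ASCII. (Hypothetical example value.)
payload = b"ok"

# ASCII is a strict subset of utf-8, so a utf-8 buffer sees the intended text.
as_utf8 = payload.decode("utf-8")

# Read as utf-16-le, the same two bytes form a single unrelated code unit.
as_utf16 = payload.decode("utf-16-le")

print(repr(as_utf8), len(as_utf8))
print(repr(as_utf16), len(as_utf16))
```

Exposing utf-8 keeps the first interpretation valid for existing callers, which is the compatibility argument made above.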
The downsides are that read(n) can now only read up to n/4 characters, and write(n) has a much more complicated time dealing with large buffers (as we need to cap the number of utf-16-le bytes but return the number of utf-8 bytes - it's not a direct relationship, so there's more work and a little bit of guessing in some cases).
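The byte accounting described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: both function names are hypothetical, and the input is assumed to be valid utf-8. The read side reflects that a character may need up to 4 utf-8 bytes. The write side walks the utf-8 data, tracks how many utf-16-le bytes it would need, and reports how many utf-8 bytes fit under the cap.

```python
def max_chars_for_read(n: int) -> int:
    """Characters that are guaranteed to fit in an n-byte utf-8 buffer.

    Hypothetical helper: a single code point can need up to 4 utf-8 bytes.
    """
    return n // 4


def utf8_bytes_within_utf16_cap(data: bytes, max_utf16_bytes: int) -> int:
    """Return how many leading utf-8 bytes of `data` fit under a utf-16-le cap.

    Hypothetical helper for illustration; assumes `data` is valid utf-8.
    """
    utf16_bytes = 0  # utf-16-le bytes needed for the characters accepted so far
    i = 0            # utf-8 bytes accepted so far
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width, cost = 1, 2   # ASCII: 1 utf-8 byte, 2 utf-16-le bytes
        elif lead < 0xE0:
            width, cost = 2, 2   # 2 utf-8 bytes, 2 utf-16-le bytes
        elif lead < 0xF0:
            width, cost = 3, 2   # 3 utf-8 bytes, 2 utf-16-le bytes
        else:
            width, cost = 4, 4   # 4 utf-8 bytes, surrogate pair in utf-16
        # Stop at a truncated trailing character or when the cap would be exceeded.
        if i + width > len(data) or utf16_bytes + cost > max_utf16_bytes:
            break
        utf16_bytes += cost
        i += width
    return i


sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 1 + 2 + 3 + 4 = 10 bytes
print(max_chars_for_read(16))
print(utf8_bytes_within_utf16_cap(sample, 6))
```

The ratio between the two byte counts ranges from 2:1 (ASCII) to 2:3 (three-byte characters), which is why the returned utf-8 count cannot be derived from the utf-16-le cap by simple arithmetic.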
On the upside, the readline handling is simpler, as utf-8 is compatible with the existing interface, and sys.stdin.encoding is now accurate. I've rolled that fix into this patch (just the myreadline.c change) as they really ought to go in together.
Date | User | Action | Args
2016-08-31 04:28:35 | steve.dower | set | recipients: + steve.dower, lemburg, mhammond, terry.reedy, paul.moore, tzot, amaury.forgeotdarc, ncoghlan, pitrou, giampaolo.rodola, tim.golden, mark, ned.deily, christoph, ezio.melotti, v+python, hippietrail, ssbarnea, flox, davidsarah, santoso.wijaya, akira, David.Sankel, smerlin, lilydjwg, martin.panter, piotr.dobrogost, Drekin, wiz21, stijn, Jonitis, gurnec, escapewindow, dead1ne
2016-08-31 04:28:35 | steve.dower | set | messageid: <1472617715.13.0.291575858185.issue1602@psf.upfronthosting.co.za>
2016-08-31 04:28:35 | steve.dower | link | issue1602 messages
2016-08-31 04:28:34 | steve.dower | create |