Author terry.reedy
Recipients Arfrever, ezio.melotti, gvanrossum, jkloth, lemburg, mrabarnett, pitrou, r.david.murray, tchrist, terry.reedy, v+python, vstinner
Date 2011-09-08.18:56:10
Message-id <4E690F86.60500@udel.edu>
In-reply-to <1315470765.01.0.974721986343.issue12729@psf.upfronthosting.co.za>
Content
On 9/8/2011 4:32 AM, Ezio Melotti wrote:
> So to summarize a bit, there are different possible levels of strictness:
>    1) all the possible encodable values, including the ones > 10FFFF;
>    2) values in range 0..10FFFF;
>    3) values in range 0..10FFFF except surrogates (aka scalar values);
>    4) values in range 0..10FFFF except surrogates and noncharacters;
>
> and this is what is currently available in Python:
>    1) not available, probably it will never be;
>    2) available through the 'surrogatepass' error handler;
>    3) default behavior (i.e. with the 'strict' error handler);
>    4) currently not available.
>
> Now, assume that we don't care about option 1 and want to implement the missing option 4 (which I'm still not 100% sure about).  The possible options are:
>    * add a new codec (actually one for each UTF encoding);
>    * add a new error handler that explicitly disallows noncharacters;
>    * change the meaning of 'strict' to match option 4;

If 'strict' meant option 4, then 'scalarpass' could mean option 3.
'surrogatepass' would then mean 'pass surrogates also, in addition to
non-character scalars'.
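
To make the proposed naming concrete, here is a rough sketch of how the three handler names would map onto options 2 to 4 for UTF-8. It is only an illustration: 'scalarpass' does not exist in Python, 'strict' is given the proposed option 4 meaning rather than its current option 3 behavior, and the helper names (is_noncharacter, encode_utf8) are made up for this example.

```python
def is_noncharacter(cp):
    # Unicode noncharacters: U+FDD0..U+FDEF, plus the last two
    # code points of every plane (U+FFFE, U+FFFF, U+1FFFE, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def encode_utf8(text, errors="strict"):
    """Hypothetical model of the proposed handler names.

    'surrogatepass' -> option 2: any code point in 0..10FFFF
    'scalarpass'    -> option 3: scalar values (no surrogates)
    'strict'        -> option 4: scalar values minus noncharacters
    """
    if errors == "surrogatepass":
        # Existing handler: lets lone surrogates through.
        return text.encode("utf-8", "surrogatepass")
    if errors == "strict":
        # Proposed extra check; today's 'strict' does not do this.
        for i, ch in enumerate(text):
            if is_noncharacter(ord(ch)):
                raise UnicodeEncodeError(
                    "utf-8", text, i, i + 1, "noncharacter not allowed")
    elif errors != "scalarpass":
        raise LookupError("unknown error handler name %r" % errors)
    # Today's 'strict' already rejects surrogates, which is what
    # both option 3 and option 4 need.
    return text.encode("utf-8", "strict")


if __name__ == "__main__":
    for name in ("surrogatepass", "scalarpass", "strict"):
        for sample in ("\ud800", "\ufffe", "abc"):
            try:
                result = encode_utf8(sample, name)
            except UnicodeEncodeError as exc:
                result = "rejected (%s)" % exc.reason
            print(name, ascii(sample), result)
```

Under this reading each name accepts a strict superset of what the next stricter one accepts, so the three handlers line up as one ladder.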
History
Date                 User         Action  Args
2011-09-08 18:56:11  terry.reedy  set     recipients: + terry.reedy, lemburg, gvanrossum, pitrou, vstinner, jkloth, ezio.melotti, mrabarnett, Arfrever, v+python, r.david.murray, tchrist
2011-09-08 18:56:10  terry.reedy  link    issue12729 messages
2011-09-08 18:56:10  terry.reedy  create