Message143735
On 9/8/2011 4:32 AM, Ezio Melotti wrote:
> So to summarize a bit, there are different possible levels of strictness:
> 1) all the possible encodable values, including the ones > 10FFFF;
> 2) values in range 0..10FFFF;
> 3) values in range 0..10FFFF except surrogates (aka scalar values);
> 4) values in range 0..10FFFF except surrogates and noncharacters;
>
> and this is what is currently available in Python:
> 1) not available, probably it will never be;
> 2) available through the 'surrogatepass' error handler;
> 3) default behavior (i.e. with the 'strict' error handler);
> 4) currently not available.
>
> Now, assume that we don't care about option 1 and want to implement the missing option 4 (which I'm still not 100% sure about). The possible options are:
> * add a new codec (actually one for each UTF encoding);
> * add a new error handler that explicitly disallows noncharacters;
> * change the meaning of 'strict' to match option 4;
If 'strict' meant option 4, then 'scalarpass' could mean option 3.
'surrogatepass' would then mean 'pass surrogates also, in addition to
non-char scalars'.
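
To make the levels concrete, here is a minimal sketch of how they look from Python 3 code. The first part only uses the existing 'strict' and 'surrogatepass' handlers. The helpers `is_noncharacter` and `encode_level4` are hypothetical names invented for this illustration; they are not part of the codecs module and not a proposed patch, just one way option 4 could be approximated on top of 'strict'.

```python
def is_noncharacter(cp: int) -> bool:
    """Hypothetical helper: True for the 66 Unicode noncharacters.

    These are U+FDD0..U+FDEF plus the last two code points of every
    plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def encode_level4(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical option-4 encoder: 'strict', plus reject noncharacters."""
    for i, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise UnicodeEncodeError(
                encoding, text, i, i + 1, "noncharacter not allowed"
            )
    # 'strict' already rejects lone surrogates (option 3).
    return text.encode(encoding, "strict")


# Option 3: the default 'strict' handler refuses a lone surrogate ...
try:
    "\ud800".encode("utf-8")
    strict_rejects_surrogate = False
except UnicodeEncodeError:
    strict_rejects_surrogate = True

# ... option 2: 'surrogatepass' lets it through ...
surrogate_bytes = "\ud800".encode("utf-8", "surrogatepass")

# ... and 'strict' still accepts a noncharacter such as U+FFFF.
nonchar_bytes = "\uffff".encode("utf-8")

# Option 4 (sketch): the hypothetical wrapper refuses the noncharacter.
try:
    encode_level4("\uffff")
    level4_rejects_nonchar = False
except UnicodeEncodeError:
    level4_rejects_nonchar = True
```

A wrapper function is used here rather than a handler registered with `codecs.register_error`, because error handlers are only called for input the codec already rejects, and the UTF codecs accept noncharacters today.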

Date | User | Action | Args
2011-09-08 18:56:11 | terry.reedy | set | recipients: + terry.reedy, lemburg, gvanrossum, pitrou, vstinner, jkloth, ezio.melotti, mrabarnett, Arfrever, v+python, r.david.murray, tchrist
2011-09-08 18:56:10 | terry.reedy | link | issue12729 messages
2011-09-08 18:56:10 | terry.reedy | create |