Author ezio.melotti
Recipients ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date 2011-10-02.06:46:25
Message-id <1317537986.58.0.797304932484.issue12753@psf.upfronthosting.co.za>
Content
> The problem with official names is that they have things in them that 
> you are not expected in names.  Do you really and truly mean to tell 
> me you think it is somehow **good** that people are forced to write
>    \N{LINE FEED (LF)}
> Rather than the more obvious pair of 
>    \N{LINE FEED}
>    \N{LF}
> ??

Actually, Python doesn't seem to support \N{LINE FEED (LF)}, most likely because that's a Unicode 1.0 name; nowadays these code points are simply labeled '<control>'.
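
A quick way to see this from the interpreter is a minimal sketch like the one below. It relies only on unicodedata.name(), which raises ValueError for characters that have no name unless a default is given; the '<no name>' placeholder is just an illustrative value chosen here.

```python
import unicodedata

# Control characters carry no character name in the database, so
# unicodedata.name() falls back to the default passed as second argument.
control_names = {ch: unicodedata.name(ch, '<no name>') for ch in '\n\r\t'}
for ch, name in control_names.items():
    print(repr(ch), name)

# Without a default, the lookup raises ValueError instead.
try:
    unicodedata.name('\n')
    raised = False
except ValueError:
    raised = True
print('ValueError raised:', raised)
```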

> If so, then I don't understand that.  Nobody in their right 
> mind prefers "\N{LINE FEED (LF)}" over "\N{LINE FEED}" -- do they?

They probably don't, but they just write \n anyway.  I don't think we need to support any of these aliases, especially if they are not defined in the Unicode standard.

I'm also not sure people actually use \N{...}: you don't want to write
  'R\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}'
and you would need to look up the exact name somewhere before using it anyway (unless you know it by heart).
If 'R\xe9sum\xe9' or 'R\u00e9sum\u00e9' are too obscure and/or magic, you can always print() them and get 'Résumé' (or just write 'Résumé' directly in the source).
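
For reference, all of these spellings produce the same string, so the choice is purely about readability of the source. A small sketch (the variable names are only for illustration):

```python
# Four ways to spell the same six-character string.
by_name = 'R\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}'
by_hex = 'R\xe9sum\xe9'
by_u = 'R\u00e9sum\u00e9'
literal = 'Résumé'

all_equal = by_name == by_hex == by_u == literal
print(all_equal, len(literal))
# ascii() shows the escaped form regardless of the terminal encoding.
print(ascii(literal))
```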

> All of the standards documents *talk* about things like LRO and ZWNJ.
> I guess the standards aren't "readable" then, right? :)

Right, I had to read down to the table with the meanings before figuring out what they were (and I've already forgotten them).

> The most persuasive use-case for user-defined names is for private-use
> area code points.  These will never have an official name.  But it is
> just fine to use them.  Don't they deserve a better name, one that 
> makes sense within your own program that uses them?  Of course they do.
>
> For example, Apple has a bunch of private-use glyphs they use all the time.
> In the 8-bit MacRoman encoding, the byte 0xF0 represents the Apple corporate
> logo/glyph thingie of an apple with a bite taken out of it.  (Microsoft
> also has a bunch of these.)  If you upgrade MacRoman to Unicode, you will
> find that that 0xF0 maps to code point U+F8FF using the regular converter.
>
> Now what are you supposed to do in your program when you want a named character
> there?  You certainly do not want to make users put an opaque magic number
> as a Unicode escape.  That is always really lame, because the whole reason 
> we have \N{...} escapes is so we don't have to put mysterious unreadable magic
> numbers in our code!!
>
> So all you do is 
>    use charnames ":alias" => {
>        "APPLE LOGO" => 0xF8FF,
>    };
>
> and now you can use \N{APPLE LOGO} anywhere within that lexical scope.  The
> compiler will dutifully resolve it to U+F8FF, since all name lookups happen
> at compile-time.  And it cannot leak out of the scope.

This is actually a good use case for \N{...}.

One way to solve that problem is:
    apples = {
        'APPLE': '\uF8FF',
        'GREEN APPLE': '\U0001F34F',
        'RED APPLE': '\U0001F34E',
    }
and then:
    print('I like {GREEN APPLE} and {RED APPLE}, but not {APPLE}.'.format(**apples))

This requires a format() call for each string and it's a workaround, but at least it's readable (I hope you don't have too many apples in your strings).

I guess we could add some way to define a global list of names, and that would probably be enough for most applications.  Making it per-module would be more complicated and maybe not too elegant.
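
Just to make the idea concrete, here is a rough sketch of what a global list of names could look like in pure Python. Everything in it is hypothetical: _CUSTOM_NAMES, register_name() and expand() are names made up for this example, and it works on {NAME} placeholders at run time rather than on real \N{...} escapes, which are resolved by the compiler.

```python
import re
import unicodedata

# Hypothetical process-wide registry; nothing like this exists in Python.
_CUSTOM_NAMES = {}

def register_name(name, char):
    """Register a user-defined name for a character (hypothetical helper)."""
    _CUSTOM_NAMES[name.upper()] = char

def expand(text):
    """Replace {NAME} placeholders, trying custom names before official ones."""
    def repl(match):
        name = match.group(1).upper()
        if name in _CUSTOM_NAMES:
            return _CUSTOM_NAMES[name]
        # unicodedata.lookup() raises KeyError for unknown names.
        return unicodedata.lookup(name)
    return re.sub(r'\{([^{}]+)\}', repl, text)

register_name('APPLE LOGO', '\uf8ff')
result = expand('Logo: {APPLE LOGO}, e-acute: {LATIN SMALL LETTER E WITH ACUTE}')
print(ascii(result))
```

Since the registry is global, two modules registering the same name for different code points would silently clash, which is part of why a per-module variant would be more complicated.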

> People who write patterns without whitespace for cognitive chunking (plus
> comments for explanation) are wicked wicked wicked.  Frankly I'm surprised 
> Python doesn't require it. :)/2

I actually find those *less* readable.  If there's something fancy in the regex, a comment *before* it is welcome, but having to read a regex split over several lines and mentally strip out meaningless whitespace and redundant comments just makes the parsing more difficult for me.
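
To illustrate the two styles side by side, here is a small sketch; the date pattern is just an example picked for this comparison. Both compile to patterns that match the same strings, so the difference is only in how the source reads.

```python
import re

# Compact style, with a single comment before the pattern:
# ISO-style date such as 2011-10-02 (year, month, day).
compact = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Verbose style: whitespace and inline comments are ignored by re.VERBOSE.
verbose = re.compile(r'''
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
''', re.VERBOSE)

sample = '2011-10-02'
same = compact.fullmatch(sample).groups() == verbose.fullmatch(sample).groups()
print(same)
```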