
Author ezio.melotti
Recipients ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date 2011-10-02.06:46:25
> The problem with official names is that they have things in them that 
> you are not expected in names.  Do you really and truly mean to tell 
> me you think it is somehow **good** that people are forced to write
>    \N{LINE FEED (LF)}
> Rather than the more obvious pair of 
>    \N{LINE FEED}
>    \N{LF}
> ??

Actually Python doesn't seem to support \N{LINE FEED (LF)}, most likely because that's a Unicode 1 name, and nowadays these codepoints are simply marked as '<control>'.
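
This is easy to check from the interpreter with the unicodedata module. A minimal sketch follows; the helper names `describe` and `can_lookup` are made up for illustration and are not part of any library:

```python
import unicodedata

def describe(ch):
    """Return the official name of ch, or '<control>' if it has none."""
    # unicodedata.name() raises ValueError for unnamed characters
    # unless a default value is supplied as the second argument.
    return unicodedata.name(ch, '<control>')

def can_lookup(name):
    """Return True if unicodedata.lookup() accepts this name."""
    try:
        unicodedata.lookup(name)
    except KeyError:
        return False
    return True

print(describe('\n'))                # U+000A has no name of its own
print(describe('\xe9'))              # an ordinary named character
print(can_lookup('LINE FEED (LF)'))  # the Unicode 1 name is not accepted
```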

> If so, then I don't understand that.  Nobody in their right 
> mind prefers "\N{LINE FEED (LF)}" over "\N{LINE FEED}" -- do they?

They probably don't, but they just write \n anyway.  I don't think we need to support any of these aliases, especially if they are not defined in the Unicode standard.

I'm also not sure humans use \N{...}: you don't want to write
   'R\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}'
and you would need to look up the exact name somewhere anyway before using it (unless you know it by heart).
If 'R\xe9sum\xe9' or 'R\u00e9sum\u00e9' are too obscure and/or magic, you can always print() them and get 'Résumé' (or just write 'Résumé' directly in the source).
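
As a quick sanity check, the escape spellings discussed here all produce the same six-character string. This is only a sketch, and the variable names are arbitrary:

```python
# Three spellings of the same word, using different escape styles.
by_name = 'R\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}'
by_hex = 'R\xe9sum\xe9'
by_u = 'R\u00e9sum\u00e9'

# All escapes are resolved at compile time, so the resulting strings are equal.
print(by_name == by_hex == by_u)
print(by_hex)
```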

> All of the standards documents *talk* about things like LRO and ZWNJ.
> I guess the standards aren't "readable" then, right? :)

Right, I had to read down to the table with the meanings before figuring out what they were (and I have already forgotten them).

> The most persuasive use-case for user-defined names is for private-use
> area code points.  These will never have an official name.  But it is
> just fine to use them.  Don't they deserve a better name, one that 
> makes sense within your own program that uses them?  Of course they do.
> For example, Apple has a bunch of private-use glyphs they use all the time.
> In the 8-bit MacRoman encoding, the byte 0xF0 represents the Apple corporate
> logo/glyph thingie of an apple with a bite taken out of it.  (Microsoft
> also has a bunch of these.)  If you upgrade MacRoman to Unicode, you will
> find that that 0xF0 maps to code point U+F8FF using the regular converter.
> Now what are you supposed to do in your program when you want a named character
> there?  You certainly do not want to make users put an opaque magic number
> as a Unicode escape.  That is always really lame, because the whole reason 
> we have \N{...} escapes is so we don't have to put mysterious unreadable magic
> numbers in our code!!
> So all you do is 
>    use charnames ":alias" => {
>        "APPLE LOGO" => 0xF8FF,
>    };
> and now you can use \N{APPLE LOGO} anywhere within that lexical scope.  The
> compiler will dutifully resolve it to U+F8FF, since all name lookups happen
> at compile-time.  And it cannot leak out of the scope.

This is actually a good use case for \N{...}.

One way to solve that problem is:
    apples = {
        'APPLE': '\uF8FF',
        'GREEN APPLE': '\U0001F34F',
        'RED APPLE': '\U0001F34E',
    }
and then:
    print('I like {GREEN APPLE} and {RED APPLE}, but not {APPLE}.'.format(**apples))

This requires a format call for each string and it's a workaround, but at least it is readable (I hope you don't have too many apples in your strings).

I guess we could add some way to define a global list of names, and that would probably be enough for most applications.  Making it per-module would be more complicated and maybe not too elegant.
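
To give an idea of what a global list of names could look like, here is a rough sketch done entirely in user code. The `ALIASES` table and the `expand` function are hypothetical; nothing like them exists in the standard library, and the lookup happens at run time on raw strings rather than in the compiler:

```python
import re
import unicodedata

# Hypothetical application-wide alias table (an assumption for this sketch).
ALIASES = {
    'APPLE LOGO': '\uf8ff',  # private-use code point, has no official name
}

# Matches a literal backslash, 'N', and a name in braces.
_NAMED = re.compile(r'\\N\{([^}]+)\}')

def expand(text):
    """Expand named escapes in text: ALIASES first, then the Unicode database."""
    def repl(match):
        name = match.group(1)
        if name in ALIASES:
            return ALIASES[name]
        return unicodedata.lookup(name)  # raises KeyError for unknown names
    return _NAMED.sub(repl, text)

# A raw string keeps the compiler from resolving \N{...} itself.
print(expand(r'\N{APPLE LOGO} and \N{GREEN APPLE}'))
```

Because the table is a plain module-level dict, every caller sees the same names, which is the "global" behaviour described above; a per-module or lexically scoped variant would need more machinery.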

> People who write patterns without whitespace for cognitive chunking (plus
> comments for explanation) are wicked wicked wicked.  Frankly I'm surprised 
> Python doesn't require it. :)/2

I actually find those *less* readable.  If there's something fancy in the regex, a comment *before* it is welcome, but having to read a regex split across several lines, and to mentally strip out meaningless whitespace and redundant comments, just makes the parsing more difficult for me.
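
For comparison, here are the two styles side by side. The date pattern is just an illustrative example chosen for this sketch, not something from the discussion:

```python
import re

# Compact form: an ISO-style date (year-month-day), explained by one comment up front.
compact = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Verbose form: the same pattern spread over several lines with inline comments.
# With re.VERBOSE, whitespace and '#' comments inside the pattern are ignored.
verbose = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
''', re.VERBOSE)

sample = '2011-10-02'
print(compact.match(sample).groups() == verbose.match(sample).groups())
```

Both objects match the same strings; which one is easier to read is the matter of taste being argued here.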

Date                 User          Action  Args
2011-10-02 06:46:26  ezio.melotti  set     recipients: + ezio.melotti, lemburg, gvanrossum, loewis, terry.reedy, mrabarnett, tchrist
2011-10-02 06:46:26  ezio.melotti  set     messageid: <>
2011-10-02 06:46:26  ezio.melotti  link    issue12753 messages
2011-10-02 06:46:25  ezio.melotti  create