Author wdoekes
Recipients edlm10, eric.smith, georg.brandl, loewis, pitrou, waveform, wdoekes
Date 2008-11-14.09:46:33
Message-id <1226655997.98.0.831190416791.issue1222@psf.upfronthosting.co.za>
Content
@Antoine Pitrou

Regarding "# XXX is the trailing space a bug?": I'm inclined to believe
that it is. locale(7) does not say that p_sep_by_space applies only to
the non-international currency symbol, nor that it doesn't. However, it
does say:

""" char *int_curr_symbol; /* First three chars are a currency symbol
from ISO 4217.  Fourth char is the separator. Fifth char is ’ ’. */ """

I haven't seen that fifth character, and in the libc sources I can't
find one either:

glibc-2.7/localedata/locales/nl_NL:66:int_curr_symbol
"<U0045><U0055><U0052><U0020>"

I do however see a separator.
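
To make the locale-source notation concrete, here is a minimal Python sketch that decodes the <UXXXX> escapes from the nl_NL line above and splits the result into the three-letter code and the separator. The helper names decode_locale_string and split_int_curr_symbol are made up for illustration; they are not part of the locale module.

```python
import re

def decode_locale_string(raw):
    """Decode glibc locale-source escapes such as <U0045> into characters."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4})>",
        lambda m: chr(int(m.group(1), 16)),
        raw,
    )

def split_int_curr_symbol(symbol):
    """Return (iso_code, separator); separator is '' if there is no fourth char."""
    return symbol[:3], symbol[3:4]

# The value quoted from glibc-2.7/localedata/locales/nl_NL.
symbol = decode_locale_string("<U0045><U0055><U0052><U0020>")
code, sep = split_int_curr_symbol(symbol)
print(repr(symbol), repr(code), repr(sep))
```

The decoded string has four characters: the code followed by a single space, and nothing after it.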

The libc documentation I found at
http://www.chemie.fu-berlin.de/chemnet/use/info/libc/libc_19.html#SEC325
says the following:

""" In other words, treat all nonzero values alike in these members.
These members apply only to currency_symbol. When you use
int_curr_symbol, you never print an additional space, because
int_curr_symbol itself contains the appropriate separator. The POSIX
standard says that these two members apply to the int_curr_symbol as
well as the currency_symbol. But an example in the ANSI C standard
clearly implies that they should apply only to the
currency_symbol---that the int_curr_symbol contains any appropriate
separator, so you should never print an additional space. Based on what
we know now, we recommend you ignore these members when printing
international currency symbols, and print no extra space. """

This is probably not right either, because it forces you to use an
n_sign_posn and p_sign_posn that put the symbol on the same side of the
value. (Although that might not be such an awful assumption.) And, more
importantly, a grep through the sources reveals that every locale's
int_curr_symbol ends in a space; none has a preceding space. (But some
of them do, I assume, have *_sign_posn values that want a trailing
symbol.)

""" glibc-2.7/localedata/locales$ grep ^int_curr_symbol * | grep -vE
'(<U0020>| )"' | wc -l
0 """

That leaves us with two more options. Option three: the fourth character
is optional and defines what the separator is but not _where_ it should
be. I.e. you might have to move it according to what *_sign_posn says.
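
A rough sketch of what option three could look like in Python. Everything here is an assumption for illustration: the function name format_option_three is hypothetical, the placement is driven by a cs_precedes-style flag rather than by *_sign_posn, and sign handling is left out entirely.

```python
def format_option_three(number, int_curr_symbol, cs_precedes):
    """Place the code before or after the number, moving the separator with it.

    number          -- already-formatted numeric string, e.g. "1.234,56"
    int_curr_symbol -- three-letter code plus optional separator, e.g. "EUR "
    cs_precedes     -- true if the symbol goes before the number (assumed flag)
    """
    code = int_curr_symbol[:3]
    sep = int_curr_symbol[3:4]  # optional fourth character; '' if absent
    if cs_precedes:
        return code + sep + number
    # Symbol trails the number: the separator moves between number and code,
    # so no stray trailing space is left behind.
    return number + sep + code

print(format_option_three("1.234,56", "EUR ", True))
print(format_option_three("1.234,56", "EUR ", False))
```

The point of the sketch is only that the separator always ends up between the number and the code, never at the end of the string.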

And finally, option four: international formatting should ignore all of
*_cs_precedes, *_sep_by_space and *_sign_posn. locale(7) explicitly
mentions currency_symbol, not int_curr_symbol. Perhaps one should assume
that international notation is universal. (I would guess that the most
common form is:
<int_curr_symbol><space><optional_minus><num><mon_thousands_sep><num><mon_decimal_point><num>)
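
For comparison, the guessed universal layout above can be sketched in Python as follows. The function name format_option_four and its default separators are assumptions chosen for the example (they happen to match Dutch conventions); nothing here reflects what locale.currency() actually does.

```python
def format_option_four(amount, iso_code, thousands_sep=".", decimal_point=",",
                       frac_digits=2):
    """Format as <code><space><optional minus><grouped number>.

    Ignores *_cs_precedes, *_sep_by_space and *_sign_posn on purpose.
    """
    sign = "-" if amount < 0 else ""
    # Format with ',' grouping and '.' decimal point first, then swap in
    # the requested separator characters.
    text = f"{abs(amount):,.{frac_digits}f}"
    integer_part, _, fraction = text.partition(".")
    integer_part = integer_part.replace(",", thousands_sep)
    number = integer_part + (decimal_point + fraction if fraction else "")
    return f"{iso_code} {sign}{number}"

print(format_option_four(-1234.56, "EUR"))
print(format_option_four(5, "USD", thousands_sep=",", decimal_point="."))
```

With this layout the sign always sits directly in front of the digits and the code always leads, whatever the locale says.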


Personally I would go with option three. It has the least impact on
formatting because it only cleans up spaces. I'm guessing however that
option four is the Right One.