msg56200 - (view) |
Author: Éric de la Musse (edlm10) |
Date: 2007-09-30 17:17 |
locale.format function delete spaces in result which is a problem when
thousand separator is space (in French for example).
The problem seems in the code below (extract from locale.py):
145 while seps:
146 # If the number was formatted for a specific width, then it
147 # might have been filled with spaces to the left or right. If
148 # so, kill as much spaces as there where separators.
149 # Leading zeroes as fillers are not yet dealt with, as it is
150 # not clear how they should interact with grouping.
151 sp = result.find(" ")
152 if sp==-1:break
153 result = result[:sp]+result[sp+1:]
154 seps -= 1
Example :
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC)
'C'
>>> locale.setlocale(locale.LC_NUMERIC, "fr_FR.UTF-8")
'fr_FR.UTF-8'
>>> locale.format("%.2f", 12345.67, True)
'12345,67'
The correct result is '12 345,67' not '12345,67'
and if I call
>>> locale.format("%9.2f", 12345.67, True)
'12 345,67'
the result is correct
Is this behavior correct or a bug?
|
msg59829 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-01-12 17:18 |
Just tried on 2.5.1 and on trunk, I can't reproduce it. On both I get:
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "fr_FR.UTF-8")
'fr_FR.UTF-8'
>>> locale.format("%9.2f", 12345.67, True)
' 12345,67'
>>> locale.format("%.2f", 12345.67, True)
'12345,67'
>>> locale.localeconv()['thousands_sep']
''
Which means that there is no thousands separator for the French locale
on my system (a Mandriva Linux box).
|
msg60084 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-01-18 10:49 |
Ok, I manage to reproduce it on an Ubuntu box with both 2.5.2a0 and SVN
trunk.
IMO it's a bug since the expected behaviour would be to enforce the
locale settings for number formatting. That's the whole purpose of
locale.format() after all. (no to mention the discrepancy between %.2f
and %9.2f)
|
msg60093 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-01-18 14:18 |
Eric, your diagnostic looks right, format() gets confused when it tries
to remove padding characters to account for the added thousands
separators. It does not check that there were padding characters in the
first place, and it assumes that the thousands separator is not a space
character itself.
Since Georg Brandl and Martin von Loewis are listed as co-authors, I add
them to the cc list ;)
|
msg60136 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-01-19 09:52 |
The space removal was there before my change; assigning to Martin.
|
msg61929 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-01-31 21:16 |
Here is a patch. It is fairly big and I'm sorry about it, here is why:
1. it reworks the test suite using unittest, and makes it more comprehensive
2. it adds a feature in locale.py which allows overriding the localeconv
dictionary (this is necessary for proper testing)
3. it fixes the present bug, and also fixes integer formatting with
grouping and padding (which was broken, as well as untested)
Point 2 above may be controversial, it is just a dictionary (by default
empty) named _override_localeconv, if you prefer we can make it a real
API, e.g. _set_localeconv().
|
msg61956 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-02-01 10:55 |
Looks good to me.
|
msg65558 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-04-16 18:20 |
Here is an updated patch against trunk.
|
msg65582 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-04-17 10:12 |
I see that http://wiki.python.org/moin/PythonBugDay says "need to test
on non-MacOS platform". Actually, I've tested the patch under Mandriva
and Debian Linux.
|
msg65912 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-04-28 12:00 |
I've had the opportunity to test on a Windows box and there are various
failures in the TestStringMethods test case. If someone with more
knowledge of the Windows world could take a lookm it would be nice.
|
msg65914 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-04-28 12:30 |
Ok, here is an updated patch for Windows compatibility of the test suite.
|
msg70173 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-07-23 14:57 |
Would someone object to committing this before beta3?
For clarity I would first commit the rewrite of test_locale to use
unittest, and then the fix for the thousands separator bug.
|
msg70272 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2008-07-25 20:01 |
I'd recommend breaking the patch into the 3 parts mentioned in msg61929,
then committing the first 2 parts. That should make the 3rd part much
easier to evaluate.
|
msg70274 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-07-25 20:55 |
The first 2 parts have been committed in r65237. I'll soon provide a
patch for the third part.
|
msg70275 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-07-25 21:02 |
Here is the patch for the third part.
|
msg75854 - (view) |
Author: Walter Doekes (wdoekes) |
Date: 2008-11-14 09:46 |
@Antoine Pitrou
Regarding "# XXX is the trailing space a bug?": I'm inclined to believe
that it is. Man 7 locale does not mention that p_sep_by_space should be
used for non-int currency symbols, nor that it shouldn't. However, it
does say:
""" char *int_curr_symbol; /* First three chars are a currency symbol
from ISO 4217. Fourth char is the separator. Fifth char is ’ ’. */ """
I haven't seen that fifth character, and in the libc sources I can't
find one either:
glibc-2.7/localedata/locales/nl_NL:66:int_curr_symbol
"<U0045><U0055><U0052><U0020>"
I do however see a separator.
In libc documentation I found here (
http://www.chemie.fu-berlin.de/chemnet/use/info/libc/libc_19.html#SEC325
), it says the following:
""" In other words, treat all nonzero values alike in these members.
These members apply only to currency_symbol. When you use
int_curr_symbol, you never print an additional space, because
int_curr_symbol itself contains the appropriate separator. The POSIX
standard says that these two members apply to the int_curr_symbol as
well as the currency_symbol. But an example in the ANSI C standard
clearly implies that they should apply only to the
currency_symbol---that the int_curr_symbol contains any appropriate
separator, so you should never print an additional space. Based on what
we know now, we recommend you ignore these members when printing
international currency symbols, and print no extra space. """
This is probably not right either, because this forces you to use an
n_sign_posn and p_sign_posn that have the symbol on the same side of the
value. (Although that might not be such an awful assumption.) And, more
importantly, a grep through the sources reveal that no language has a
preceding space. (But they do, I assume, have *_sign_posn's that want a
trailing symbol.)
""" glibc-2.7/localedata/locales$ grep ^int_curr_symbol * | grep -vE
'(<U0020>| )"' | wc -l
0 """
That leaves us with two more options. Option three: the fourth character
is optional and defines what the separator is but not _where_ it should
be. I.e. you might have to move it according to what *_sign_posn says.
And finally, option four: international formatting should ignore all of
*_cs_precedes, *_sep_by_space and *_sign_posn. Locale(7) explicitly
mentions currency_symbol, not int_cur_symbol. Perhaps one should assume
that international notation is universal. (I would guess that most
common is:
<int_curr_symbol><space><optional_minus><num><mon_thousands_sep><num><mon_decimal_point><num>)
Personally I would go with option three. It has the least impact on
formatting because it only cleans up spaces. I'm guessing however that
option four is the Right One.
|
msg83549 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-03-14 00:16 |
Committed in r70356, r70357, r70358, r70359. I didn't try to change
anything about the trailing space since it's minor anyway.
|
msg83753 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2009-03-18 15:50 |
r70357 seems to have caused the sparc solaris py3k buildbot to start failing on
test_locale:
======================================================================
FAIL: test_integer_grouping_and_padding (test.test_locale.TestNumberFormatting)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_locale.py", line 181,
in test_integer_grouping_and_padding
out=('4%s200' % self.sep).rjust(10))
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_locale.py", line 143,
in _test_format
func=locale.format, **format_opts)
File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_locale.py", line 139,
in _test_formatfunc
func(format, value, **format_opts), out)
AssertionError: ' 4200' != ' 4200'
Any idea what's going on here?
|
msg83757 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-03-18 17:02 |
Hi Mark,
> AssertionError: ' 4200' != ' 4200'
>
> Any idea what's going on here?
The problem seems to be that the thousands separator on the Solaris
variant of en_US is an empty string (rather than a comma) (*), and
apparently it hits a bug in the padding mechanism (which perhaps assumes
that the thousands separator is always a 1-character string).
(*) (another reason not to use any C locale-based mechanism for
localization, by the way...)
|
msg83758 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-03-18 17:05 |
By the way, Martin, do you have any idea why the Solaris buildbot
doesn't seem to be enabled on trunk?
|
msg83761 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-03-18 17:11 |
Trying a fix in r70458.
|
msg83769 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2009-03-18 18:57 |
It looks like that did the trick. Thank you!
|
msg83779 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2009-03-18 21:29 |
Antoine Pitrou wrote:
> The problem seems to be that the thousands separator on the Solaris
> variant of en_US is an empty string (rather than a comma) (*), and
> apparently it hits a bug in the padding mechanism (which perhaps assumes
> that the thousands separator is always a 1-character string).
>
> (*) (another reason not to use any C locale-based mechanism for
> localization, by the way...)
I've come to believe this, too. I'm working on cleaning up the C
implementations so I can do all of the locale-based formating without
using the locale functions. I'll use the localeconv values, but that's it.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:27 | admin | set | github: 45563 |
2009-03-18 21:30:00 | eric.smith | set | messages:
+ msg83779 title: locale.format bug if thousand separator is space (french separator as example) -> locale.format bug if thousand separator is space (french separator as example) |
2009-03-18 18:57:53 | mark.dickinson | set | messages:
+ msg83769 |
2009-03-18 17:11:42 | pitrou | set | messages:
+ msg83761 |
2009-03-18 17:05:27 | pitrou | set | messages:
+ msg83758 |
2009-03-18 17:02:31 | pitrou | set | messages:
+ msg83757 |
2009-03-18 15:50:26 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg83753
|
2009-03-14 00:16:26 | pitrou | set | status: open -> closed
messages:
+ msg83549 resolution: fixed |
2008-11-14 09:46:36 | wdoekes | set | messages:
+ msg75854 |
2008-11-13 13:49:11 | wdoekes | set | nosy:
+ wdoekes |
2008-07-25 21:02:50 | pitrou | set | files:
- 1222.patch |
2008-07-25 21:02:46 | pitrou | set | files:
- bug1222.patch |
2008-07-25 21:02:41 | pitrou | set | files:
+ thousands_sep.patch messages:
+ msg70275 |
2008-07-25 20:55:45 | pitrou | set | messages:
+ msg70274 |
2008-07-25 20:01:50 | eric.smith | set | messages:
+ msg70272 |
2008-07-23 14:57:28 | pitrou | set | priority: normal messages:
+ msg70173 |
2008-04-28 12:30:33 | pitrou | set | files:
+ bug1222.patch messages:
+ msg65914 |
2008-04-28 12:01:02 | pitrou | set | messages:
+ msg65912 |
2008-04-23 16:33:51 | eric.smith | set | nosy:
+ eric.smith |
2008-04-17 10:12:33 | pitrou | set | messages:
+ msg65582 |
2008-04-16 18:20:52 | pitrou | set | files:
+ 1222.patch keywords:
+ patch messages:
+ msg65558 |
2008-04-16 18:20:06 | pitrou | set | files:
- 1222.patch |
2008-02-01 10:55:32 | georg.brandl | set | messages:
+ msg61956 |
2008-01-31 21:16:44 | pitrou | set | files:
+ 1222.patch messages:
+ msg61929 |
2008-01-19 09:52:59 | georg.brandl | set | assignee: loewis messages:
+ msg60136 |
2008-01-18 14:18:34 | pitrou | set | nosy:
+ loewis, georg.brandl messages:
+ msg60093 |
2008-01-18 10:49:30 | pitrou | set | versions:
+ Python 2.6, - Python 2.4 |
2008-01-18 10:49:10 | pitrou | set | messages:
+ msg60084 |
2008-01-12 17:18:20 | pitrou | set | nosy:
+ pitrou messages:
+ msg59829 |
2007-11-24 21:34:12 | waveform | set | nosy:
+ waveform |
2007-09-30 17:17:48 | edlm10 | set | nosy:
+ edlm10 messages:
+ msg56200 |
2007-09-30 14:15:46 | edlm10 | create | |