classification
Title: complex constructor doesn't accept string with nan and inf
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: cdavid, christian.heimes, mark.dickinson, rhettinger
Priority: low Keywords: patch

Created on 2008-02-15 08:15 by christian.heimes, last changed 2009-01-20 21:40 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
nan_parse.patch cdavid, 2008-12-31 11:34
Messages (21)
msg62422 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-02-15 08:15
This is a reminder.

The issue makes the complex(repr(...)) round-trip impossible when either
the real or imag part is nan or infinite.
msg63169 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-03-01 21:38
Signed zeros cause difficulties with round-tripping, too:

>>> z = complex(-0.0, -0.0); w = complex(repr(z))
>>> z
-0j
>>> w
-0j
>>> z.real
-0.0
>>> w.real
0.0
msg78592 - (view) Author: Cournapeau David (cdavid) Date: 2008-12-31 07:41
I started a patch against the trunk to handle nan/inf/infinite (I have
not yet tackled the problem of negative zero).

The patch is a bit big, because I found the function quite difficult to
follow, so I refactored it a bit first (replacing the state machine with
the big switch with a sequential parsing). One potential problem is that
I do some computation with inf to make signed infinite, I don't know if
it is safe. I don't know is setting a variable to Nan is safe either.

Although I tested the function in a simple main under valgrind for
various legit and bogus input, there is no tests in this patch. I would
like to add some, but I am not familiar with python testing, so I would
be glad to receive some indications on how to do it properly.
msg78593 - (view) Author: Cournapeau David (cdavid) Date: 2008-12-31 07:53
Of course, I notice two bugs just after sending the patch... New patch
to fix those two issues (no check for closing bracket if opening ones
are there and a bug when only imaginary part is given).
msg78594 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-31 09:05
-1 on this feature request.  IMO, it adds no value to any real world
applications and isn't worth the added code complexity, the on-going
maintenance burden, and the risk of introducing new bugs.
msg78595 - (view) Author: Cournapeau David (cdavid) Date: 2008-12-31 09:13
I disagree the feature is not needed, for several reasons:
 - complex(repr(..)) roundtrip should work, whatever the value of complex is
 - it is supported for float, so why not for complex ?
 - I believe on the contrary it solves a very real problem: incidently,
I got interested in this patch while working on numpy, and it is
certainly useful to be able to parse nan and inf (for example when we
create arrays from strings). Nan may be seen as non useful for so called
real usage of python, but for scientific people, it is a crucial to have
proper support of nan (which may mean missing data depending on the
context) and inf.
 - it does not add complexity: I would argue that independantly of
nan/inf support, my patch makes the function simpler to follow (no state
machine with done/sw_error/etc... state).
msg78596 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-31 10:05
> complex(repr(..)) roundtrip should work,

Nice-to-have but not a requirement that the entire input domain be
supported. 

> it is supported for float, so why not for complex ?

It made sense for floats because of prevalence of use cases and because
we wanted to match IEEE-754R as much as possible.  Unfortunately, the
support for Infs and NaNs has already negatively impacted the float
world by making the math module harder to maintain and test.  I would
not like to see that extended to cmath or complex() unless compelling
real-world use cases arise.

> ... for scientific people, it is a crucial to have proper support
> of nan (which may mean missing data depending on the context) and inf.

Mark, does Inf have a standard interpretation for complex numbers?  Do
all infinities meet or do they radiate, each with their own phase angle?

Also, do you want to stick with the 754 interpretation of NaNs as the
result of invalid operations or are you comfortable with the
MatLab/Octave notion of using NaNs to indicate missing values (something
they do as an optimization because it is the only way to flag a
non-numeric value in a floating point register or C double)?

> it does not add complexity: 

That depends on whether handling of NaNs and Infs creeps into cmath.  

Mainly, I'm just questioning whether there exist compelling use cases
for parsing NaNs and Infs in the context of complex numbers.
msg78597 - (view) Author: Cournapeau David (cdavid) Date: 2008-12-31 10:38
> Nice-to-have but not a requirement that the entire input domain be
> supported.

Ok.

> It made sense for floats because of prevalence of use cases and 
> because we wanted to match IEEE-754R as much as possible.

But why shouldn't this apply to complex numbers ? I am biased because I
mainly use python for scientific work, but complex numbers are not more
special than float in my mind.

> Unfortunately, the support for Infs and NaNs has already negatively 
> impacted the float world by making the math module harder to maintain
> and test.

Yes, it is difficult to handle nan and inf, there are a lot of corner
cases. But I fail to see how this applies here: my patch is essentially
a rewrote of the parsing, and the code to handle nan/inf is only 7
lines. This is independent of the handling of how nan/inf is handled by
math operations on it.

> Mark, does Inf have a standard interpretation for complex numbers?
> Do all infinities meet or do they radiate, each with their own phase >
angle?

Hm, not sure what you mean here by standard interpretation, but
infinities "do not meet", in the sense that (x,inf) and (inf,x) for
example can never been 'near' from each other (in the distance sense),
except if x is inf. It does not make more sense to talk about phase of
complex numbers with inf than for 0. But again, I don't see how this is
relevant to the proposed feature. 

> Mainly, I'm just questioning whether there exist compelling use cases
> for parsing NaNs and Infs in the context of complex numbers.

For any task where parsing complex makes sense. Since many numerical
operations may end up with nan or inf, this is relatively common I would
say.

It this can make the review easier, I can split the patch in two: first
the refactoring (+ tests once someone tells me how to), and then 
inf/nan handling (with additional tests).
msg78598 - (view) Author: Cournapeau David (cdavid) Date: 2008-12-31 11:34
Ok, I found out how to make tests, and I found some problems while using
this on numpy. A third version of the patch, with unit tests: all tests
in test_complex.py pass.
msg78824 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-02 15:57
[Raymond]
> I would
> not like to see that extended to cmath or complex() unless compelling
> real-world use cases arise.

Hmm.  Have you looked at the cmath module recently?  You may be in for a 
nasty surprise...

> Mark, does Inf have a standard interpretation for complex numbers?  Do
> all infinities meet or do they radiate, each with their own phase
> angle?

Shrug.  Mathematically, by far the most common and useful model is the 
complex plane plus a single extra point at infinity.  But when complexes 
are implemented as pairs of floats things look a little strange. Kahan, 
in his 'branch cuts' paper identifies 9 'different' complex infinities 
in this model, and C99 Annex G specifies exactly how functions should 
behave for various different infinities.

It's never really been clear to me how useful it is to be able to 
distinguish these infinities.  But the cmath module *does* make an 
effort to return the 'correct' (according to C99 Annex G) values for 
inputs with a special real or imaginary component.  On balance, I'd 
support making complex('nan + nan*j') do the right thing.

(Though I can't help feeling that the repr of complex(nan, nan)
should have been 'nan + nan*1j' rather than the current 'nan + nan*j'.)

> Also, do you want to stick with the 754 interpretation of NaNs as the
> result of invalid operations

I've never much liked the use of NaN to represent missing values, and
I don't think Python should embrace that usage.
msg78827 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-02 16:08
cdavid, in your application, how hard is it to work around the problem by 
simply printing and parsing pairs of floats rather than complex numbers?

E.g.,

<get real part and imaginary part from string> 
z = complex(float(real_part), float(imag_part))
msg78833 - (view) Author: Cournapeau David (cdavid) Date: 2009-01-02 16:26
It is not really for an application, but for numpy. Of course, one can
always get around the problem - but since this is really a limitation
which can be easily fixed, why not fixing the problem :) ?
msg79250 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-06 11:56
[Mark]
> inputs with a special real or imaginary component.  On balance, I'd 
> support making complex('nan + nan*j') do the right thing.

Having thought about this a bit more, I take this back.  Some reasons are 
given below.

[David]
> - complex(repr(..)) roundtrip should work, whatever the value of complex 
is

I disagree. *Why* do you think it should work?  It fails for many other 
types:

>>> def roundtrip(x): return type(x)(repr(x))
... 
>>> roundtrip("abc")
"'abc'"
>>> roundtrip([1,2,3])
['[', '1', ',', ' ', '2', ',', ' ', '3', ']']
>>> roundtrip((1,))
('(', '1', ',', ')')
>>> roundtrip(Fraction(1, 2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in roundtrip
  File "/Users/dickinsm/python_source/trunk/Lib/fractions.py", line 73, in 
__new__
    raise ValueError('Invalid literal for Fraction: %r' % input)
ValueError: Invalid literal for Fraction: 'Fraction(1, 2)'
>>> roundtrip(Decimal(1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in roundtrip
  File "/Users/dickinsm/python_source/trunk/Lib/decimal.py", line 545, in 
__new__
    "Invalid literal for Decimal: %r" % value)
  File "/Users/dickinsm/python_source/trunk/Lib/decimal.py", line 3721, in 
_raise_error
    raise error(explanation)
decimal.InvalidOperation: Invalid literal for Decimal: "Decimal('1')"


In general, type(repr(x)) == x only works for simple types (ints, floats), 
not compound types like lists, strings, tuples, ...  I think it's 
reasonable to regard complex as such a compound type.

There *is* a general Python guideline that eval(repr(x)) == x should work 
where possible.  Note that this fails even for floats:

>>> eval(repr(float('nan')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'nan' is not defined


The *real* problem here is that repr(complex) is a mess when it comes to 
niceties like signed zeros, infinities and nans:

>>> complex(-0., 0.)  # throws away sign of 0 on real part
0j
>>> eval(repr(complex(2., -0.)))  # throws away sign on imag. part
(2+0j)
>>> nan = float('nan')
>>> complex(0, nan)
nan*j
>>> nan*j  # can't even eval this...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'j' is not defined
>>> nan*1j # not much better: real part should be 0, not nan
(nan+nan*j)
>>> inf = float('inf')
>>> inf*1j  # ouch: nan in real part!
(nan+inf*j)


I think the *right* solution is to define repr of a complex number
z to be:

"complex(%r, %r)" % (z.real, z.imag)

but that's not possible before the mythical, compatibility-breaking, 
Python 4.0.  And even then your desired complex(repr(z)) roundtripping 
wouldn't work, though eval(repr(z)) would.

While we're daydreaming, another possibility is to follow C99's lead, and 
introduce a type of 'imaginary' numbers, and make 1j be an imaginary 
literal rather than a complex literal.  That solves some of the above 
eval(repr(.)) problems.  Again, I don't see how this could happen before 
4.0, and I'm not even sure that it's desirable.

In the circumstances, repr does the best it can, which is to output a 
whole expression which more-or-less describes the given complex number, 
and will usually eval back to the right thing.  I think it's not the 
business of the complex() constructor to parse such expressions.

To summarize:  it's a mess.  I'd recommend that you and/or numpy don't use 
repr for roundtrip-capable conversions to string.  Instead, output the 
real and imaginary parts separately, and use float-from-string and the 
complex-from-pair-of-floats constructors to reconstruct.
msg79251 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-06 12:05
Christian, any comments?  Is it okay to close this as a 'won't fix'?
msg79257 - (view) Author: Cournapeau David (cdavid) Date: 2009-01-06 13:16
> I disagree. *Why* do you think it should work?  It fails for many other 
> types:

I don't understand the rationale: why not making something work better
if possible ? Also, I don't understand the comparison with Decimal or
Fraction; there is a big difference between making the roundtrip not
possible at all for a type, and making it possible only for some values.

> I think the *right* solution is to define repr of a complex number
> z to be:

Why ? I don't understand those examples: maybe I am missing your
argument, but I understand it as we should not fix one bug because there
are another bugs related to inf and nan in complex. For me, something like:

a = complex(0)
a *= float('inf') * 1j
-> a = nan+nanj

Is really not desirable. That's actually a more serious bug than this
one. Are you saying that python does not care about inf/nan for complex
numbers ? If so, be it, but it would a bit unfortunate.
msg79261 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2009-01-06 13:30
Definitely! I don't see a reason the make the code more complex.

David, we are against the patch because we have to keep the code
maintainable. The code is already long and spreads over lots of lines.
Your use case isn't common enough to warrant even more complicated
parsing code.
msg79266 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-06 15:44
As Christian says, it's about not increasing code complexity without a 
good reason.  For me, it's also about mental complexity:  the set of valid 
inputs to the complex constructor should be small, and should be easy to 
understand and to describe.

The addition of the '*1j' part of the expression bothers me particularly,
as does allowing complex('j') to be valid.


I have a compromise proposal:

1. For nans and infs, get rid of the * on output:  change repr so that 
e.g., repr(complex(0, nan)) is simply 'nanj', and repr(complex(inf,-inf)) 
is '(inf-infj)'.  (Yes, I know it looks ugly, but bear with me.)

2. On input, allow strings of the form "complex-string" or "(complex-
string)" where (in pseudo BNF):

sign ::= '+' | '-'
real-part ::= float-string
imag-part ::= float-string 'j'
complex-string ::= real-part [sign imag-part] | imag-part

and float-string is any string currently accepted by float
(excluding leading or trailing whitespace).  Nothing else would be 
permitted.  (Well, okay, we still have to allow whitespace around the 
complex string, both inside and outside the parentheses.)

I think this would simplify the parsing, and remove need for special 
casing of nans and infs.  It might even allow the code to become simpler, 
by sharing some of it with stuff in floatobject.c.

The above would allow double signs:  e.g. '2+-1j', with the first sign 
being the one explicitly described above and the second being part of the 
float-string.  I don't think this is a terrible thing, if it simplifies 
parsing.  It might even be helpful.  For example: "%r+%rj" % (z.real, 
z.imag) would always be valid input.

The current 'nan*j' is certainly more attractive than 'nanj', but it's 
just too suggestive of the actual expression float('nan')*1j, which it 
turns out doesn't produce the same thing.

If you could come up with a patch that does something like this I'd take a 
look.


By the way, "0 + inf*1j" giving nan + inf*j isn't really a bug;  it's 
pretty much unavoidable:  1j is really complex(0, 1), and when multiplying 
by inf we get complex(inf*0, inf*1).  With the usual IEEE 754 semantics 
inf*0 is nan.

I think it's exactly to fix this sort of behaviour that C99 introduced its 
imaginary type (see Annex G of the standard).  It doesn't seem to have 
caught on, though:  I don't think gcc implements this, and I'd be very 
surprised if Visual Studio does.
msg79268 - (view) Author: Cournapeau David (cdavid) Date: 2009-01-06 16:05
@ Mark

Concerning float('inf') * 1j: you're right, my rambling did not make any
sense, sorry.

I agree that adding complexity may be a good reason to warrant an
arbitrary feature; actually, I did not manage to handle nan/inf at first
because of the complexity :) But as I mentioned in a previous comment, I
think my patch simplifies the parsing as well. I suggested to split the
patch in to: one whithout any new feature (no handling of nan/inf), and
then anoter handling nan/inf on the top of it. Would that be acceptable ?

Also, how should I proceed to share implementation between floatobject.c
and complexobject.c ? Would creating a new file for the common code be
acceptable ?
msg79272 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-06 16:28
David, at this point, I recommend dropping this one.  It has become a
time waster.  Possibly, there is a refactoring of the code that makes
parsing a bit simpler or shares some common code but it is not worth
destabilizing the battle-tested code and risking the introduction of
subtle new parsing errors.
msg79273 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-06 16:35
Mark, I'm somewhat uncomfortable with your proposal also.  It changes
what the parser will accept and the potential benefits are almost nil. 
Also, putting NaNs and Infs in complex numbers is probably not something
that should be pursued.  I see no utility in pairs like (NaN, 3.0) or
(-2.0, Inf).  The whole request is use case challenged.  AFAICT, the
current implementation suffice for real cases and there is no compelling
need to rearrange and redesign this code.
msg80284 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-20 21:40
Closing this one as won't fix.
History
Date User Action Args
2009-01-20 21:40:12mark.dickinsonsetstatus: open -> closed
resolution: wont fix
messages: + msg80284
2009-01-06 16:35:42rhettingersetmessages: + msg79273
2009-01-06 16:28:37rhettingersetpriority: normal -> low
messages: + msg79272
versions: + Python 3.1, Python 2.7, - Python 2.6, Python 3.0
2009-01-06 16:05:00cdavidsetmessages: + msg79268
2009-01-06 15:44:30mark.dickinsonsetmessages: + msg79266
2009-01-06 13:30:07christian.heimessetmessages: + msg79261
2009-01-06 13:16:41cdavidsetmessages: + msg79257
2009-01-06 12:05:01mark.dickinsonsetmessages: + msg79251
2009-01-06 11:56:28mark.dickinsonsetmessages: + msg79250
2009-01-02 16:26:11cdavidsetmessages: + msg78833
2009-01-02 16:08:18mark.dickinsonsetmessages: + msg78827
2009-01-02 15:57:20mark.dickinsonsetmessages: + msg78824
2008-12-31 11:35:12cdavidsetfiles: - nan_parse.patch
2008-12-31 11:35:08cdavidsetfiles: - nan_parse.patch
2008-12-31 11:34:56cdavidsetfiles: + nan_parse.patch
messages: + msg78598
2008-12-31 10:38:24cdavidsetmessages: + msg78597
2008-12-31 10:05:16rhettingersetassignee: mark.dickinson
messages: + msg78596
2008-12-31 09:14:01cdavidsetmessages: + msg78595
2008-12-31 09:05:25rhettingersettype: enhancement
messages: + msg78594
nosy: + rhettinger
2008-12-31 07:53:18cdavidsetfiles: + nan_parse.patch
messages: + msg78593
2008-12-31 07:41:56cdavidsetfiles: + nan_parse.patch
keywords: + patch
messages: + msg78592
nosy: + cdavid
2008-03-01 21:38:49mark.dickinsonsetnosy: + mark.dickinson
messages: + msg63169
2008-02-15 08:15:16christian.heimescreate