Issue 2121: complex constructor doesn't accept string with nan and inf

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46374

classification

Title:	complex constructor doesn't accept string with nan and inf
Type:	enhancement	Stage:
Components:	Interpreter Core	Versions:	Python 3.1, Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	cdavid, christian.heimes, mark.dickinson, rhettinger
Priority:	low	Keywords:	patch

Created on 2008-02-15 08:15 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
nan_parse.patch	cdavid, 2008-12-31 11:34

Messages (21)
msg62422 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2008-02-15 08:15
This is a reminder. The issue makes the complex(repr(...)) round-trip impossible when either the real or imag part is nan or infinite.
msg63169 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-03-01 21:38
Signed zeros cause difficulties with round-tripping, too: >>> z = complex(-0.0, -0.0); w = complex(repr(z)) >>> z -0j >>> w -0j >>> z.real -0.0 >>> w.real 0.0
msg78592 - (view)	Author: Cournapeau David (cdavid)	Date: 2008-12-31 07:41
I started a patch against the trunk to handle nan/inf/infinite (I have not yet tackled the problem of negative zero). The patch is a bit big, because I found the function quite difficult to follow, so I refactored it a bit first (replacing the state machine with the big switch with a sequential parsing). One potential problem is that I do some computation with inf to make signed infinite, I don't know if it is safe. I don't know is setting a variable to Nan is safe either. Although I tested the function in a simple main under valgrind for various legit and bogus input, there is no tests in this patch. I would like to add some, but I am not familiar with python testing, so I would be glad to receive some indications on how to do it properly.
msg78593 - (view)	Author: Cournapeau David (cdavid)	Date: 2008-12-31 07:53
Of course, I notice two bugs just after sending the patch... New patch to fix those two issues (no check for closing bracket if opening ones are there and a bug when only imaginary part is given).
msg78594 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-12-31 09:05
-1 on this feature request. IMO, it adds no value to any real world applications and isn't worth the added code complexity, the on-going maintenance burden, and the risk of introducing new bugs.
msg78595 - (view)	Author: Cournapeau David (cdavid)	Date: 2008-12-31 09:13
I disagree the feature is not needed, for several reasons: - complex(repr(..)) roundtrip should work, whatever the value of complex is - it is supported for float, so why not for complex ? - I believe on the contrary it solves a very real problem: incidently, I got interested in this patch while working on numpy, and it is certainly useful to be able to parse nan and inf (for example when we create arrays from strings). Nan may be seen as non useful for so called real usage of python, but for scientific people, it is a crucial to have proper support of nan (which may mean missing data depending on the context) and inf. - it does not add complexity: I would argue that independantly of nan/inf support, my patch makes the function simpler to follow (no state machine with done/sw_error/etc... state).
msg78596 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-12-31 10:05
> complex(repr(..)) roundtrip should work, Nice-to-have but not a requirement that the entire input domain be supported. > it is supported for float, so why not for complex ? It made sense for floats because of prevalence of use cases and because we wanted to match IEEE-754R as much as possible. Unfortunately, the support for Infs and NaNs has already negatively impacted the float world by making the math module harder to maintain and test. I would not like to see that extended to cmath or complex() unless compelling real-world use cases arise. > ... for scientific people, it is a crucial to have proper support > of nan (which may mean missing data depending on the context) and inf. Mark, does Inf have a standard interpretation for complex numbers? Do all infinities meet or do they radiate, each with their own phase angle? Also, do you want to stick with the 754 interpretation of NaNs as the result of invalid operations or are you comfortable with the MatLab/Octave notion of using NaNs to indicate missing values (something they do as an optimization because it is the only way to flag a non-numeric value in a floating point register or C double)? > it does not add complexity: That depends on whether handling of NaNs and Infs creeps into cmath. Mainly, I'm just questioning whether there exist compelling use cases for parsing NaNs and Infs in the context of complex numbers.
msg78597 - (view)	Author: Cournapeau David (cdavid)	Date: 2008-12-31 10:38
> Nice-to-have but not a requirement that the entire input domain be > supported. Ok. > It made sense for floats because of prevalence of use cases and > because we wanted to match IEEE-754R as much as possible. But why shouldn't this apply to complex numbers ? I am biased because I mainly use python for scientific work, but complex numbers are not more special than float in my mind. > Unfortunately, the support for Infs and NaNs has already negatively > impacted the float world by making the math module harder to maintain > and test. Yes, it is difficult to handle nan and inf, there are a lot of corner cases. But I fail to see how this applies here: my patch is essentially a rewrote of the parsing, and the code to handle nan/inf is only 7 lines. This is independent of the handling of how nan/inf is handled by math operations on it. > Mark, does Inf have a standard interpretation for complex numbers? > Do all infinities meet or do they radiate, each with their own phase > angle? Hm, not sure what you mean here by standard interpretation, but infinities "do not meet", in the sense that (x,inf) and (inf,x) for example can never been 'near' from each other (in the distance sense), except if x is inf. It does not make more sense to talk about phase of complex numbers with inf than for 0. But again, I don't see how this is relevant to the proposed feature. > Mainly, I'm just questioning whether there exist compelling use cases > for parsing NaNs and Infs in the context of complex numbers. For any task where parsing complex makes sense. Since many numerical operations may end up with nan or inf, this is relatively common I would say. It this can make the review easier, I can split the patch in two: first the refactoring (+ tests once someone tells me how to), and then inf/nan handling (with additional tests).
msg78598 - (view)	Author: Cournapeau David (cdavid)	Date: 2008-12-31 11:34
Ok, I found out how to make tests, and I found some problems while using this on numpy. A third version of the patch, with unit tests: all tests in test_complex.py pass.
msg78824 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-01-02 15:57
[Raymond] > I would > not like to see that extended to cmath or complex() unless compelling > real-world use cases arise. Hmm. Have you looked at the cmath module recently? You may be in for a nasty surprise... > Mark, does Inf have a standard interpretation for complex numbers? Do > all infinities meet or do they radiate, each with their own phase > angle? Shrug. Mathematically, by far the most common and useful model is the complex plane plus a single extra point at infinity. But when complexes are implemented as pairs of floats things look a little strange. Kahan, in his 'branch cuts' paper identifies 9 'different' complex infinities in this model, and C99 Annex G specifies exactly how functions should behave for various different infinities. It's never really been clear to me how useful it is to be able to distinguish these infinities. But the cmath module does make an effort to return the 'correct' (according to C99 Annex G) values for inputs with a special real or imaginary component. On balance, I'd support making complex('nan + nanj') do the right thing. (Though I can't help feeling that the repr of complex(nan, nan) should have been 'nan + nan1j' rather than the current 'nan + nan*j'.) > Also, do you want to stick with the 754 interpretation of NaNs as the > result of invalid operations I've never much liked the use of NaN to represent missing values, and I don't think Python should embrace that usage.
msg78827 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-01-02 16:08
cdavid, in your application, how hard is it to work around the problem by simply printing and parsing pairs of floats rather than complex numbers? E.g., <get real part and imaginary part from string> z = complex(float(real_part), float(imag_part))
msg78833 - (view)	Author: Cournapeau David (cdavid)	Date: 2009-01-02 16:26
It is not really for an application, but for numpy. Of course, one can always get around the problem - but since this is really a limitation which can be easily fixed, why not fixing the problem :) ?
msg79250 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-01-06 11:56
[Mark] > inputs with a special real or imaginary component. On balance, I'd > support making complex('nan + nanj') do the right thing. Having thought about this a bit more, I take this back. Some reasons are given below. [David] > - complex(repr(..)) roundtrip should work, whatever the value of complex is I disagree. Why* do you think it should work? It fails for many other types: >>> def roundtrip(x): return type(x)(repr(x)) ... >>> roundtrip("abc") "'abc'" >>> roundtrip([1,2,3]) ['[', '1', ',', ' ', '2', ',', ' ', '3', ']'] >>> roundtrip((1,)) ('(', '1', ',', ')') >>> roundtrip(Fraction(1, 2)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in roundtrip File "/Users/dickinsm/python_source/trunk/Lib/fractions.py", line 73, in __new__ raise ValueError('Invalid literal for Fraction: %r' % input) ValueError: Invalid literal for Fraction: 'Fraction(1, 2)' >>> roundtrip(Decimal(1)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in roundtrip File "/Users/dickinsm/python_source/trunk/Lib/decimal.py", line 545, in __new__ "Invalid literal for Decimal: %r" % value) File "/Users/dickinsm/python_source/trunk/Lib/decimal.py", line 3721, in _raise_error raise error(explanation) decimal.InvalidOperation: Invalid literal for Decimal: "Decimal('1')" In general, type(repr(x)) == x only works for simple types (ints, floats), not compound types like lists, strings, tuples, ... I think it's reasonable to regard complex as such a compound type. There is a general Python guideline that eval(repr(x)) == x should work where possible. Note that this fails even for floats: >>> eval(repr(float('nan'))) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 1, in <module> NameError: name 'nan' is not defined The real problem here is that repr(complex) is a mess when it comes to niceties like signed zeros, infinities and nans: >>> complex(-0., 0.) # throws away sign of 0 on real part 0j >>> eval(repr(complex(2., -0.))) # throws away sign on imag. part (2+0j) >>> nan = float('nan') >>> complex(0, nan) nanj >>> nanj # can't even eval this... Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'j' is not defined >>> nan1j # not much better: real part should be 0, not nan (nan+nanj) >>> inf = float('inf') >>> inf1j # ouch: nan in real part! (nan+infj) I think the right solution is to define repr of a complex number z to be: "complex(%r, %r)" % (z.real, z.imag) but that's not possible before the mythical, compatibility-breaking, Python 4.0. And even then your desired complex(repr(z)) roundtripping wouldn't work, though eval(repr(z)) would. While we're daydreaming, another possibility is to follow C99's lead, and introduce a type of 'imaginary' numbers, and make 1j be an imaginary literal rather than a complex literal. That solves some of the above eval(repr(.)) problems. Again, I don't see how this could happen before 4.0, and I'm not even sure that it's desirable. In the circumstances, repr does the best it can, which is to output a whole expression which more-or-less describes the given complex number, and will usually eval back to the right thing. I think it's not the business of the complex() constructor to parse such expressions. To summarize: it's a mess. I'd recommend that you and/or numpy don't use repr for roundtrip-capable conversions to string. Instead, output the real and imaginary parts separately, and use float-from-string and the complex-from-pair-of-floats constructors to reconstruct.
msg79251 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-01-06 12:05
Christian, any comments? Is it okay to close this as a 'won't fix'?
msg79257 - (view)	Author: Cournapeau David (cdavid)	Date: 2009-01-06 13:16
> I disagree. Why do you think it should work? It fails for many other > types: I don't understand the rationale: why not making something work better if possible ? Also, I don't understand the comparison with Decimal or Fraction; there is a big difference between making the roundtrip not possible at all for a type, and making it possible only for some values. > I think the right solution is to define repr of a complex number > z to be: Why ? I don't understand those examples: maybe I am missing your argument, but I understand it as we should not fix one bug because there are another bugs related to inf and nan in complex. For me, something like: a = complex(0) a = float('inf') 1j -> a = nan+nanj Is really not desirable. That's actually a more serious bug than this one. Are you saying that python does not care about inf/nan for complex numbers ? If so, be it, but it would a bit unfortunate.
msg79261 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2009-01-06 13:30
Definitely! I don't see a reason the make the code more complex. David, we are against the patch because we have to keep the code maintainable. The code is already long and spreads over lots of lines. Your use case isn't common enough to warrant even more complicated parsing code.
msg79266 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-01-06 15:44
As Christian says, it's about not increasing code complexity without a good reason. For me, it's also about mental complexity: the set of valid inputs to the complex constructor should be small, and should be easy to understand and to describe. The addition of the '1j' part of the expression bothers me particularly, as does allowing complex('j') to be valid. I have a compromise proposal: 1. For nans and infs, get rid of the on output: change repr so that e.g., repr(complex(0, nan)) is simply 'nanj', and repr(complex(inf,-inf)) is '(inf-infj)'. (Yes, I know it looks ugly, but bear with me.) 2. On input, allow strings of the form "complex-string" or "(complex- string)" where (in pseudo BNF): sign ::= '+' \| '-' real-part ::= float-string imag-part ::= float-string 'j' complex-string ::= real-part [sign imag-part] \| imag-part and float-string is any string currently accepted by float (excluding leading or trailing whitespace). Nothing else would be permitted. (Well, okay, we still have to allow whitespace around the complex string, both inside and outside the parentheses.) I think this would simplify the parsing, and remove need for special casing of nans and infs. It might even allow the code to become simpler, by sharing some of it with stuff in floatobject.c. The above would allow double signs: e.g. '2+-1j', with the first sign being the one explicitly described above and the second being part of the float-string. I don't think this is a terrible thing, if it simplifies parsing. It might even be helpful. For example: "%r+%rj" % (z.real, z.imag) would always be valid input. The current 'nanj' is certainly more attractive than 'nanj', but it's just too suggestive of the actual expression float('nan')1j, which it turns out doesn't produce the same thing. If you could come up with a patch that does something like this I'd take a look. By the way, "0 + inf1j" giving nan + infj isn't really a bug; it's pretty much unavoidable: 1j is really complex(0, 1), and when multiplying by inf we get complex(inf0, inf1). With the usual IEEE 754 semantics inf*0 is nan. I think it's exactly to fix this sort of behaviour that C99 introduced its imaginary type (see Annex G of the standard). It doesn't seem to have caught on, though: I don't think gcc implements this, and I'd be very surprised if Visual Studio does.
msg79268 - (view)	Author: Cournapeau David (cdavid)	Date: 2009-01-06 16:05
@ Mark Concerning float('inf') * 1j: you're right, my rambling did not make any sense, sorry. I agree that adding complexity may be a good reason to warrant an arbitrary feature; actually, I did not manage to handle nan/inf at first because of the complexity :) But as I mentioned in a previous comment, I think my patch simplifies the parsing as well. I suggested to split the patch in to: one whithout any new feature (no handling of nan/inf), and then anoter handling nan/inf on the top of it. Would that be acceptable ? Also, how should I proceed to share implementation between floatobject.c and complexobject.c ? Would creating a new file for the common code be acceptable ?
msg79272 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-01-06 16:28
David, at this point, I recommend dropping this one. It has become a time waster. Possibly, there is a refactoring of the code that makes parsing a bit simpler or shares some common code but it is not worth destabilizing the battle-tested code and risking the introduction of subtle new parsing errors.
msg79273 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-01-06 16:35
Mark, I'm somewhat uncomfortable with your proposal also. It changes what the parser will accept and the potential benefits are almost nil. Also, putting NaNs and Infs in complex numbers is probably not something that should be pursued. I see no utility in pairs like (NaN, 3.0) or (-2.0, Inf). The whole request is use case challenged. AFAICT, the current implementation suffice for real cases and there is no compelling need to rearrange and redesign this code.
msg80284 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-01-20 21:40
Closing this one as won't fix.

History
Date	User	Action	Args
2022-04-11 14:56:30	admin	set	github: 46374
2009-01-20 21:40:12	mark.dickinson	set	status: open -> closed resolution: wont fix messages: + msg80284
2009-01-06 16:35:42	rhettinger	set	messages: + msg79273
2009-01-06 16:28:37	rhettinger	set	priority: normal -> low messages: + msg79272 versions: + Python 3.1, Python 2.7, - Python 2.6, Python 3.0
2009-01-06 16:05:00	cdavid	set	messages: + msg79268
2009-01-06 15:44:30	mark.dickinson	set	messages: + msg79266
2009-01-06 13:30:07	christian.heimes	set	messages: + msg79261
2009-01-06 13:16:41	cdavid	set	messages: + msg79257
2009-01-06 12:05:01	mark.dickinson	set	messages: + msg79251
2009-01-06 11:56:28	mark.dickinson	set	messages: + msg79250
2009-01-02 16:26:11	cdavid	set	messages: + msg78833
2009-01-02 16:08:18	mark.dickinson	set	messages: + msg78827
2009-01-02 15:57:20	mark.dickinson	set	messages: + msg78824
2008-12-31 11:35:12	cdavid	set	files: - nan_parse.patch
2008-12-31 11:35:08	cdavid	set	files: - nan_parse.patch
2008-12-31 11:34:56	cdavid	set	files: + nan_parse.patch messages: + msg78598
2008-12-31 10:38:24	cdavid	set	messages: + msg78597
2008-12-31 10:05:16	rhettinger	set	assignee: mark.dickinson messages: + msg78596
2008-12-31 09:14:01	cdavid	set	messages: + msg78595
2008-12-31 09:05:25	rhettinger	set	type: enhancement messages: + msg78594 nosy: + rhettinger
2008-12-31 07:53:18	cdavid	set	files: + nan_parse.patch messages: + msg78593
2008-12-31 07:41:56	cdavid	set	files: + nan_parse.patch keywords: + patch messages: + msg78592 nosy: + cdavid
2008-03-01 21:38:49	mark.dickinson	set	nosy: + mark.dickinson messages: + msg63169
2008-02-15 08:15:16	christian.heimes	create