
Author G..Scott.Johnston
Recipients G..Scott.Johnston
Date 2013-08-28.06:28:30
Message-id <1377671311.74.0.613410259035.issue18863@psf.upfronthosting.co.za>
Content
I've come up with the following series of minimal examples to demonstrate my bug. 


>>> unicode("")
u''
>>> unicode("", errors="ignore")
u''


>>> unicode("abcü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>> unicode("abcü", errors="ignore")
u'abc'


>>> unicode(3)
u'3'
>>> unicode(3, errors="ignore")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to Unicode: need string or buffer, int found


>>> unicode(unicode(""))
u''
>>> unicode(unicode(""), errors="ignore")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported


The first two pairs of mini-programs behave reasonably.  With the errors parameter set to "ignore", no additional errors are raised, and bytes that cannot be decoded are skipped in the output, as expected.

The third pair of mini-programs can be worked around by writing unicode(str(3), errors="ignore") instead.  This conversion should arguably happen automatically, given that unicode(3) behaves as expected and properly converts between types.  Because the conversion is automatic without the errors parameter, I believe there is a logic problem in the code: setting errors="ignore" changes the path of execution by more than just skipping bytes that cause decoding errors.
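
The workaround above can be wrapped in a small helper. This is only a sketch: the name to_text, its defaults, and the text_type/bytes_type aliases are hypothetical and not part of Python. The aliases exist so the same code also runs on Python 3, where unicode is spelled str.

```python
import sys

# Hypothetical aliases so the sketch runs on both Python 2 and Python 3.
if sys.version_info[0] >= 3:
    text_type, bytes_type = str, bytes
else:
    text_type, bytes_type = unicode, str  # Python 2 names


def to_text(obj, encoding="ascii", errors="ignore"):
    """Coerce obj to text, applying `errors` only when bytes are decoded."""
    if isinstance(obj, text_type):
        # Already text: nothing to decode.
        return obj
    if isinstance(obj, bytes_type):
        # Byte string: decode, skipping undecodable bytes if errors="ignore".
        return obj.decode(encoding, errors)
    # Anything else: go through str() first, as in unicode(str(3), ...).
    s = str(obj)
    if isinstance(s, bytes_type):
        s = s.decode(encoding, errors)
    return s
```

With this helper, to_text(3), to_text(u"") and to_text("abc\xc3\xbc") all succeed, which is the behaviour the third and fourth pairs were expected to show.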

The fourth pair of mini-programs is simply baffling.  The first mini-program clearly demonstrates that decoding a Unicode object is in fact supported.  The fact that the second mini-program claims it's not supported further demonstrates that the logic depends on the errors="ignore" parameter more than it should.
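
For reference, the Python 2 documentation for unicode() describes two separate behaviours: with no optional parameters it mimics str(), and with encoding and/or errors it decodes an 8-bit string or buffer. The toy model below sketches that split under that assumption. The name model_unicode and the _MISSING sentinel are hypothetical, buffer objects are left out, and this is not the interpreter's actual implementation.

```python
import sys

# Hypothetical aliases so the sketch runs on both Python 2 and Python 3.
if sys.version_info[0] >= 3:
    text_type, bytes_type = str, bytes
else:
    text_type, bytes_type = unicode, str  # Python 2 names

_MISSING = object()  # sentinel: "argument was not passed"


def model_unicode(obj, encoding=_MISSING, errors=_MISSING):
    """Toy model of the two call paths; not the real unicode()."""
    if encoding is _MISSING and errors is _MISSING:
        # Conversion path: behaves like str(), but returns text.
        return text_type(obj)
    # Decoding path: entered as soon as encoding or errors is given.
    if isinstance(obj, text_type):
        raise TypeError("decoding Unicode is not supported")
    if not isinstance(obj, bytes_type):
        raise TypeError("coercing to Unicode: need string or buffer, "
                        "%s found" % type(obj).__name__)
    return obj.decode("ascii" if encoding is _MISSING else encoding,
                      "strict" if errors is _MISSING else errors)
```

In this model, passing errors="ignore" selects the decoding path before any characters are examined, which would account for both TypeErrors in the third and fourth pairs.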
History
Date                 User               Action  Args
2013-08-28 06:28:31  G..Scott.Johnston  set     recipients: + G..Scott.Johnston
2013-08-28 06:28:31  G..Scott.Johnston  set     messageid: <1377671311.74.0.613410259035.issue18863@psf.upfronthosting.co.za>
2013-08-28 06:28:31  G..Scott.Johnston  link    issue18863 messages
2013-08-28 06:28:30  G..Scott.Johnston  create