classification
Title: Python is missing alternative for common quoting character
Type: behavior Stage: resolved
Components: Unicode Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, keul, mrabarnett, r.david.murray
Priority: normal Keywords:

Created on 2012-07-16 16:04 by keul, last changed 2012-07-16 17:49 by pitrou. This issue is now closed.

Messages (3)
msg165630 - Author: Luca Fabbri (keul) Date: 2012-07-16 16:04
Using the unicodedata.decomposition function on characters like \u201c and \u201d I didn't get back the classic quote character (").

This is a very common problem when text is taken from Microsoft Word (where, in Italian, a pair of quote characters in a sentence like "foo" is automatically changed to “foo”).
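The reported behavior can be reproduced with a short sketch. The variable names are chosen here for illustration and are not from the report; unicodedata.decomposition returns an empty string when a character has no decomposition mapping in the Unicode database.

```python
import unicodedata

# The two characters named in the report.
LEFT_QUOTE = u"\u201c"   # LEFT DOUBLE QUOTATION MARK
RIGHT_QUOTE = u"\u201d"  # RIGHT DOUBLE QUOTATION MARK

# decomposition() returns the decomposition mapping as a string of hex
# code points, or an empty string if the character has no mapping.
left_decomp = unicodedata.decomposition(LEFT_QUOTE)
right_decomp = unicodedata.decomposition(RIGHT_QUOTE)

print(repr(left_decomp), repr(right_decomp))
```

Both calls yield an empty string, so no ASCII quote (U+0022) is returned.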
msg165636 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2012-07-16 16:43
I don't understand why you would expect to get a ". These Unicode characters aren't "s. As far as I can see (from, for example, http://codepoints.net/U+201C), Python is behaving as expected here.
msg165644 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2012-07-16 17:48
A code point such as "é" ("\N{LATIN SMALL LETTER E WITH ACUTE}") can be decomposed to "\u0065\u0301" ("\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}"), but "\u201c" ("\N{LEFT DOUBLE QUOTATION MARK}") and "\u201d" ("\N{RIGHT DOUBLE QUOTATION MARK}") cannot be decomposed.
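The contrast described above can be checked with a minimal sketch. The ASCII_QUOTES table at the end is a hypothetical helper added here for illustration, not something proposed in this issue; it shows one way to replace curly quotes explicitly, since normalization leaves them unchanged.

```python
import unicodedata

# "é" has a canonical decomposition: base letter plus combining accent.
e_acute = u"\u00e9"
e_decomp = unicodedata.decomposition(e_acute)  # space-separated hex code points
e_nfd = unicodedata.normalize("NFD", e_acute)  # the decomposed string itself

# The curly quotes have no decomposition, so even compatibility
# normalization (NFKD) leaves them untouched.
quoted = u"\u201cfoo\u201d"
quoted_nfkd = unicodedata.normalize("NFKD", quoted)

# Hypothetical explicit mapping (an assumption, not part of unicodedata):
# translate() takes a dict keyed by code point.
ASCII_QUOTES = {0x201C: u'"', 0x201D: u'"'}
plain = quoted.translate(ASCII_QUOTES)

print(e_decomp, repr(e_nfd), quoted_nfkd == quoted, plain)
```

An explicit table keeps the replacement a deliberate choice by the application, since Unicode itself does not treat the curly quotes as equivalent to the ASCII quotation mark.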
History
Date User Action Args
2012-07-16 17:49:29 pitrou set status: open -> closed
2012-07-16 17:48:26 mrabarnett set status: pending -> open
    nosy: + mrabarnett
    messages: + msg165644
2012-07-16 16:43:42 r.david.murray set status: open -> pending
    nosy: + r.david.murray
    messages: + msg165636
    resolution: not a bug
    stage: resolved
2012-07-16 16:04:22 keul create