email package and Unicode strings handling #43960

Closed
manlioperillo mannequin opened this issue Sep 10, 2006 · 4 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@manlioperillo
Mannequin

manlioperillo mannequin commented Sep 10, 2006

BPO 1555842
Nosy @warsaw, @devdanzin, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/bitdancer'
closed_at = <Date 2010-06-02.02:18:53.682>
created_at = <Date 2006-09-10.16:04:26.000>
labels = ['type-bug', 'library']
title = 'email package and Unicode strings handling'
updated_at = <Date 2010-06-02.02:18:53.681>
user = 'https://bugs.python.org/manlioperillo'

bugs.python.org fields:

activity = <Date 2010-06-02.02:18:53.681>
actor = 'r.david.murray'
assignee = 'r.david.murray'
closed = True
closed_date = <Date 2010-06-02.02:18:53.682>
closer = 'r.david.murray'
components = ['Library (Lib)']
creation = <Date 2006-09-10.16:04:26.000>
creator = 'manlioperillo'
dependencies = []
files = []
hgrepos = []
issue_num = 1555842
keywords = []
message_count = 4.0
messages = ['29793', '29794', '84471', '106873']
nosy_count = 5.0
nosy_names = ['barry', 'manlioperillo', 'ajaksu2', 'r.david.murray', 'bgamari']
pr_nums = []
priority = 'normal'
resolution = 'duplicate'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue1555842'
versions = ['Python 2.6']

@manlioperillo
Mannequin Author

manlioperillo mannequin commented Sep 10, 2006

The support for Unicode strings in the email package
(notably MIMEText and Header class) is not uniform.

The behaviour with Unicode strings in Header is
documented but the interface is not good.

This code works, but it should not:

>>> from email import Header, Message
>>> h = Header.Header(u"àèìòù", charset="us-ascii")
>>> m = Message.Message()
>>> m["Subject"] = h
>>> print m.as_string()

Allowing this to work can cause confusion: I am saying
that the charset is us-ascii, not utf-8.
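
For comparison, the fallback described above can be observed with the Python 3 `email.header.Header` class. This is only a minimal sketch, assuming a current Python 3 standard library rather than the Python 2.4 modules used in this report; the exact output may differ between versions.

```python
from email.header import Header

# Non-ASCII text declared as us-ascii, as in the report.
h = Header("àèìòù", charset="us-ascii")

# The text cannot be encoded as us-ascii, so the header is emitted
# as an RFC 2047 encoded word that names a different charset.
encoded = h.encode()
print(encoded)
```

The printed value names the charset that was actually used, which is not the one passed to the constructor.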

With MIMEText I obtain:

>>> from email import MIMEText
>>> m = MIMEText.MIMEText(u"àèìòù", _charset="us-ascii")
>>> print m.as_string()

[ exception ]

I think that the correct behaviour (for all functions
accepting strings) is:

  • Do not accept plain str strings (8-bit);
    accept them only if they are plain ASCII (7-bit).
  • The charset specified should not be considered a
    hint, but the charset I want to be used.
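
The two rules above could be sketched roughly as follows. This is an illustration in Python 3 terms, where `bytes` plays the role of the old 8-bit `str`; `encode_strict` is a hypothetical helper and is not part of the email package.

```python
def encode_strict(text, charset):
    """Encode text with exactly the requested charset (hypothetical helper)."""
    if isinstance(text, bytes):
        # Byte strings are accepted only when they are pure 7-bit ASCII;
        # anything else raises UnicodeDecodeError.
        text = text.decode("ascii")
    # str.encode() is strict by default, so a character outside the
    # requested charset raises UnicodeEncodeError instead of silently
    # switching to another charset.
    return text.encode(charset)


print(encode_strict("abc", "us-ascii"))
try:
    encode_strict("àèìòù", "us-ascii")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

Under this reading the charset is a requirement rather than a hint: the call either produces bytes in that charset or fails loudly.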

Regards, Manlio Perillo

@manlioperillo manlioperillo mannequin assigned warsaw Sep 10, 2006
@manlioperillo manlioperillo mannequin added the stdlib Python modules in the Lib dir label Sep 10, 2006
@manlioperillo
Mannequin Author

manlioperillo mannequin commented Sep 10, 2006

The last example is not right.
Here is the correct one:

>>> m = MIMEText.MIMEText(u"àèìòù", _charset="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    self.set_payload(_text, _charset)
  File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

So it seems that email.Message does not handle Unicode strings.

The code works if I set the charset to latin-1.
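
As a point of comparison, the same construction can be tried against the Python 3 email package. This is a minimal sketch, assuming a current Python 3 standard library where `MIMEText` lives in `email.mime.text`; it says nothing about how the Python 2.4 code in the traceback behaves.

```python
from email.mime.text import MIMEText

text = "àèìòù"
m = MIMEText(text, _charset="utf-8")

# The transfer encoding chosen for the utf-8 body.
print(m["Content-Transfer-Encoding"])

# Decoding the payload should round-trip to the original text.
decoded = m.get_payload(decode=True).decode("utf-8")
print(decoded == text)
```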

@devdanzin
Mannequin

devdanzin mannequin commented Mar 30, 2009

Confirmed on trunk.

@devdanzin devdanzin mannequin added type-bug An unexpected behavior, bug, or error labels Mar 30, 2009
@warsaw warsaw assigned bitdancer and unassigned warsaw May 5, 2010
@bitdancer
Member

It took me a while to figure out why latin-1 works. It turns out to be an accident: latin-1 uses quoted-printable encoding, and the email quoprimime module accidentally manages to quote unicode characters in the latin-1 range.
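
The difference in body encoding between the two charsets can be inspected through the charset registry. The sketch below uses the Python 3 `email.charset` module, on the assumption that its per-charset defaults mirror those of the older `email.Charset` module discussed here.

```python
from email import charset

for name in ("latin-1", "utf-8"):
    cs = charset.Charset(name)
    # get_body_encoding() reports the Content-Transfer-Encoding
    # that would be used for a message body in this charset.
    print(name, "->", cs.get_body_encoding())

latin1_is_qp = charset.Charset("latin-1").body_encoding == charset.QP
utf8_is_base64 = charset.Charset("utf-8").body_encoding == charset.BASE64
```

This matches the traceback above, where the utf-8 payload goes through the base64 path while latin-1 takes the quoted-printable path.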

The Header example, as noted by the OP, is working as documented. This confusing interface isn't going to get fixed in the current email package. The equivalent email6 API will be cleaner.

The MIMEText portion is a duplicate of bpo-1368247.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022