This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Inconsistent behaviour between BytesParser.parse and Parser.parse
Type: behavior Stage: resolved
Components: email Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, python-dev, r.david.murray, serhiy.storchaka, vajrasky, Łukasz.Kucharski
Priority: normal Keywords: patch

Created on 2014-05-12 02:58 by Łukasz.Kucharski, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
mail_test.py Łukasz.Kucharski, 2014-05-12 02:58 Simple program that depicts the issue presented.
bytes_parser_dont_close_file.patch vajrasky, 2014-05-30 16:15 review
bytes_parser_dont_close_file_v2.patch vajrasky, 2014-05-31 04:04 review
bytes_parser_dont_close_file_v3.patch vajrasky, 2014-05-31 15:37 review
bytes_parser_dont_close_file_v4.patch vajrasky, 2014-06-01 06:12 review
bytes_parser_dont_close_file_v5.patch vajrasky, 2014-06-08 16:17 review
Messages (16)
msg218309 - (view) Author: Łukasz Kucharski (Łukasz.Kucharski) Date: 2014-05-12 02:58
The behaviour of the method for both classes seem to be a little different. Executing `Parser.parse(fp)` does not close the file pointer passed while Executing `BytesParser.parse` does.

I think this caused by fact that `BytesParser.parse` implementation is using `with` statement. Writing this

    fp = TextIOWrapper(file_pointer, encoding='ascii', errors='surrogateescape')
    with fp:
      return self.parser.parse(fp, headersonly)
    file_pointer.seek(0)

The original `file_pointer` gets closed and the call to `seek` fails. 

I am not sure whether such behaviour is intended, and whether, the `with` behaves badly, or the `TextIOWrapper`, or the `BytesParser`, thus I am unable to suggest or provide a patch. But I think the behaviour should be consistent and/or documented.

I attached a file that depicts the issue. The problem originated from SO:
http://stackoverflow.com/questions/23599457/how-to-parse-an-email-in-python-without-closing-the-file
I think it's a minor issue, but it did cause a code to fail with no apparent reason.
msg219395 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-05-30 15:23
I think the code should be using 'detach' after passing the fp to parser.parse, instead of using 'with'.
msg219398 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-05-30 16:15
Here is the simple patch based on David Murray's thought.
msg219406 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-30 17:33
Use try/finally to detach even if an exception is raised during parsing.
msg219439 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-05-31 04:04
Thank you, Serhiy, for the review! Here is the updated patch.
msg219449 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-31 12:16
Could you please add a test with parse() raising an exception?

Yet one nitpick. Instead of

        fp = openfile('msg_02.txt', 'rb')
        self.addCleanup(fp.close)
        ...

you can write

        with openfile('msg_02.txt', 'rb') as fp:
            ...

as in other tests.
msg219452 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-05-31 15:37
Serhiy, here is the latest patch incorporating your request.
msg219453 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-31 15:45
Sorry, I meant to test parser with invalid message, so that parse() raises an exception, and file shouldn't be closed after this.
msg219461 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-05-31 16:50
The Parse class does not throw exception if given invalid message:

Assume /tmp/a.txt contains garbage, such as: "&&&&&"

With this code:

    with open("/tmp/a.txt", "r") as fp:
        msg = email.parser.Parser().parse(fp) # does not throw exception
        print(msg) # => &&&&&
        msg['from'] # => None

It is just you can not get useful information, such as msg['to'].
msg219469 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-05-31 19:08
Right, part of the parser contract is to not throw exceptions.  Traditionally, a bug could result in an exception, but not an invalid message.

However, using the new email policies, it is possible to *request* that it raise exceptions instead of registering defects.  See policy.raise_on_defect.
msg219471 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-31 19:15
If the parser itself doesn't raise exceptions, we should test with input stream raising an exception.
msg219489 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-06-01 06:12
Ok, here is the updated patch based on R. David Murray's help. Thanks!
msg219841 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-05 20:09
I believe there are msg_NN files that have defects.  I'd rather use one of those in the exception test.
msg220038 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-06-08 16:17
Here is the patch based on R. David Murray's nitpick.
msg221623 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-06-26 17:33
New changeset 0a16756dfcc0 by R David Murray in branch '3.4':
#21476: Unwrap fp in BytesParser so the file isn't unexpectedly closed.
http://hg.python.org/cpython/rev/0a16756dfcc0

New changeset a3ee325fd489 by R David Murray in branch 'default':
Merge #21476: Unwrap fp in BytesParser so the file isn't unexpectedly closed.
http://hg.python.org/cpython/rev/a3ee325fd489
msg221624 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-26 17:34
Thanks, Vajrasky.  And to the reviewers as well.
History
Date User Action Args
2022-04-11 14:58:03adminsetgithub: 65675
2014-06-26 17:34:41r.david.murraysetstatus: open -> closed
resolution: fixed
messages: + msg221624

stage: commit review -> resolved
2014-06-26 17:33:32python-devsetnosy: + python-dev
messages: + msg221623
2014-06-25 10:32:01Claudiu.Popasettitle: Inconsitent behaviour between BytesParser.parse and Parser.parse -> Inconsistent behaviour between BytesParser.parse and Parser.parse
stage: commit review
2014-06-08 16:17:31vajraskysetfiles: + bytes_parser_dont_close_file_v5.patch

messages: + msg220038
2014-06-05 20:09:51r.david.murraysetmessages: + msg219841
2014-06-01 06:12:30vajraskysetfiles: + bytes_parser_dont_close_file_v4.patch

messages: + msg219489
2014-05-31 19:15:46serhiy.storchakasetmessages: + msg219471
2014-05-31 19:08:17r.david.murraysetmessages: + msg219469
2014-05-31 16:50:26vajraskysetmessages: + msg219461
2014-05-31 15:45:36serhiy.storchakasetmessages: + msg219453
2014-05-31 15:37:30vajraskysetfiles: + bytes_parser_dont_close_file_v3.patch

messages: + msg219452
2014-05-31 12:16:24serhiy.storchakasetmessages: + msg219449
2014-05-31 04:04:13vajraskysetfiles: + bytes_parser_dont_close_file_v2.patch

messages: + msg219439
2014-05-30 17:33:05serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg219406
2014-05-30 16:15:30vajraskysetfiles: + bytes_parser_dont_close_file.patch

nosy: + vajrasky
messages: + msg219398

keywords: + patch
2014-05-30 15:23:24r.david.murraysetmessages: + msg219395
versions: + Python 3.5, - Python 3.3
2014-05-12 02:58:05Łukasz.Kucharskicreate