diff --git a/Doc/library/email.contentmanager.rst b/Doc/library/email.contentmanager.rst new file mode 100644 --- /dev/null +++ b/Doc/library/email.contentmanager.rst @@ -0,0 +1,222 @@ +:mod:`email.contentmanager`: Managing MIME Content +-------------------------------------------------- + +.. module:: email.contentmanager + :synopsis: Storing and Retrieving Content from MIME Parts + +.. moduleauthor:: R. David Murray +.. sectionauthor:: R. David Murray + + +.. note:: + + The contentmanager module has been included in the standard library on a + :term:`provisional basis <provisional package>`. Backwards incompatible + changes (up to and including removal of the module) may occur if deemed + necessary by the core developers. + +.. versionadded:: 3.4 + as a :term:`provisional module <provisional package>`. + +The :mod:`~email.message` module provides a class that can represent an +arbitrary email message, regardless of whether it is a non-MIME format message, +or is in MIME format (has a MIME-Version header). That basic message model has +a useful and flexible API, but it only knows about the general structure of a +message. Actual MIME messages and message subparts can have additional +structure and semantics. This module provides classes and tools for handling +various specific types of content in a flexible and extensible fashion, +including the ability to retrieve the content of the message as a specialized +object type rather than as a simple bytes object. The module takes care of the +RFC-specified MIME details for the various common content types, and support +for additional types can be added by an application using the extension +mechanisms. + +To provide an API that makes content management simpler, we define a +subclass of :class:`~email.message.Message` named :class:`.MIMEMessage`. +Note: a ``MIMEMessage`` can still be used to represent a non-MIME, +text-body-only message during parsing, but that is a case that happens only +rarely in today's world.
+ +This module defines the eponymous "Content Manager" classes. The base +:class:`.ContentManager` class defines an API for registering content +management functions, creating mappings between MIME content types and other +representations (files or Python objects). Three subclasses of +:class:`.ContentManager` provide concrete implementations of the content +management protocol: :class:`RawDataManager` maps between content types and +``str`` or ``bytes`` data (and requires that you manage all of the MIME +parameters by hand), :class:`FileManager` uses the :mod:`mimetypes` module to +map between MIME content types and file objects, and :class:`ObjectManager` +maps between MIME content types and specific Python object types that represent +that content. This module also defines a few helper classes that can be used +when creating MIME parts from content when using the :class:`ObjectManager`. + +.. note:: + + Although :class:`.MIMEMessage` is currently documented in this module + because of the provisional nature of the code, the implementation lives + in the :mod:`email.message` module. + + +.. class:: MIMEMessage(policy=default) + + The *policy* argument determines the :mod:`~email.policy` that will be used + to update the message model. The default value, :class:`default`, follows + the rules of the email RFCs except for line endings: instead of the RFC + mandated ``\r\n``, it uses the Python standard ``\n`` line endings. For + more information see the :mod:`~email.policy` documentation. + + This class is a subclass of :class:`~email.message.Message`. It adds + the following methods: + + + .. method:: get_body(preferencelist=('related', 'html', 'plain')) + + Return the MIME part that contains the notional ``body`` of the message. + + *preferencelist* is a sequence of strings from the set ``related``, + ``html``, and ``plain``, and indicates the order of preference for the + content type of the part returned.
If ``html`` is included in the list + and ``related`` is not, returns the ``html`` part of a ``related`` part + if one is found. If there is no part that matches at least one of the + types in the preference list, returns ``None``. Note that for most + applications the only combinations that really make sense are + ``('plain',)``, ``('html', 'plain')``, and the default, ``('related', + 'html', 'plain')``. + + If called on a non-``multipart`` message, returns the part on which it + was called if that part is of a type that matches one of the preferences, + otherwise it returns ``None``. Recall that if a part does not have + an explicit type it defaults to ``text/plain``. + + If a part has a :mailheader:`Content-Disposition` header, it is + only considered a body candidate if the value is ``inline``. + + + .. method:: iter_attachments() + + Returns an iterator over all of the parts of the message that are not + candidate "body" parts. That is, the first occurrence of each of + ``text/plain``, ``text/html``, ``multipart/related``, or + ``multipart/alternative`` is skipped, and all remaining parts are + returned. When applied directly to a ``multipart/related``, returns an + iterator over all the related parts except an initial ``text`` part, + if there is one. When applied directly to a ``multipart/alternative`` or + a non-``multipart``, returns an empty iterator. + + + .. method:: iter_parts() + + Returns an iterator over all of the parts of the message, which will be + empty for a non-``multipart``. + + + .. method:: get_content(*args, content_manager=None, **kw) + + Calls the ``get_content`` method of the *content_manager*, passing itself + as the message object, and passing along any other arguments or keywords + as additional arguments. If *content_manager* is not specified, it + defaults to the ``content_manager`` specified by the current + :mod:`~email.policy`. + + + ..
method:: set_content(*args, headers=None, content_manager=None, **kw) + + Calls the ``set_content`` method of the *content_manager*, passing itself + as the message object, and passing along any other arguments or keywords + as additional arguments. *headers* is a list of header objects to be + added to the message, which is done before the *content_manager* is + called. If *content_manager* is not specified, it defaults to the + ``content_manager`` specified by the current :mod:`~email.policy`. + + + .. method:: make_related() + + Convert a non-``multipart`` message into a ``multipart/related`` message, + moving the existing content into the (new) first part of the ``multipart``. + + + .. method:: make_alternative() + + Convert a non-``multipart`` or a ``multipart/related`` into a + ``multipart/alternative``, moving the existing content into the (new) + first part of the ``multipart``. + + + .. method:: make_mixed() + + Convert a non-``multipart``, a ``multipart/related``, or a + ``multipart/alternative`` into a ``multipart/mixed``, moving the + existing content into the (new) first part of the ``multipart``. + + + .. method:: add_related(*args, **kw) + + If the message is a ``multipart/related``, create a new message + object, pass all of the arguments to its :meth:`set_content` method, + and :meth:`~email.message.Message.attach` it to the ``multipart``. If + the message is a non-``multipart``, call :meth:`make_related` and then + proceed as above. If the message is any other type of ``multipart``, + raise a :exc:`TypeError`. + + + .. method:: add_alternative(*args, **kw) + + If the message is a ``multipart/alternative``, create a new message + object, pass all of the arguments to its :meth:`set_content` method, and + :meth:`~email.message.Message.attach` it to the ``multipart``. If the + message is a non-``multipart`` or ``multipart/related``, call + :meth:`make_alternative` and then proceed as above.
If the message is + any other type of ``multipart``, raise a :exc:`TypeError`. + + + .. method:: add_attachment(*args, **kw) + + If the message is a ``multipart/mixed``, create a new message object, + pass all of the arguments to its :meth:`set_content` method, and + :meth:`~email.message.Message.attach` it to the ``multipart``. If the + message is a non-``multipart``, ``multipart/related``, or + ``multipart/alternative``, call :meth:`make_mixed` and then proceed as + above. + + +.. class:: ContentManager() + + Base class for content managers. Provides the standard registry mechanisms that + map MIME content types to other representations, as well as the ``get_content`` + and ``set_content`` dispatch methods. + + .. method:: get_content(msg, *args, **kw) + + Extract the payload from *msg* and return an object that encodes information + about the extracted data. + + + .. method:: set_content(msg, obj, *args, **kw) + + Transform and store *obj* into *msg*, possibly making other changes to *msg* + as well, such as adding various MIME headers to encode information needed + to interpret the stored data. + + .. method:: add_get_handler(mimetype, handler) + + Record *handler* as the function to call when :meth:`get_content` is + called on a message whose ``mimetype`` is *mimetype*. *mimetype* is a + string of the form ``maintype[/subtype]``. That is, ``subtype`` is + optional; if only a ``maintype`` is given, then if there is no more + specific match for the ``mimetype`` of a message, the handler + corresponding to its ``maintype`` is called. + + .. method:: add_set_handler(typekey, handler) + + Record *handler* as the function to call when an object of a type + matching *typekey* is passed to :meth:`set_content`. *typekey* may be + one of three things: an actual ``type`` object (e.g. ``str``), the + ``__qualname__`` of a type (e.g. ``email.message.Message``), or the + ``__name__`` of a type (e.g. ``Message``).
The preceding is the order in + which ``set_content`` does lookup attempts when passed an object. That + is, given an object ``obj``, ``set_content`` will first try to look the + object up by ``obj.__class__``, then by ``obj.__class__.__qualname__``, + then by ``obj.__class__.__name__``. If no match is found, the lookup + sequence is repeated for each class in ``obj.__mro__`` until a match is + found. To register a default handler, therefore, register it under the + *typekey* ``object`` or ``"object"``. diff --git a/Lib/email/message.py b/Lib/email/message.py --- a/Lib/email/message.py +++ b/Lib/email/message.py @@ -8,8 +8,6 @@ import re import uu -import base64 -import binascii from io import BytesIO, StringIO # Intrapackage imports @@ -903,3 +901,51 @@ # I.e. def walk(self): ... from email.iterators import walk + + +class MIMEMessage(Message): + + def __init__(self, policy=None): + if policy is None: + from email.policy import default + policy = default + Message.__init__(self, policy) + + def get_body(self, preferencelist=('related', 'html', 'plain')): + found = [None] * len(preferencelist) + for part in self.walk(): + maintype, subtype = part.get_content_type().split('/') + if (subtype not in preferencelist or + not (maintype == 'text' and + subtype in ('html', 'plain')) and + not (maintype == 'multipart' and subtype == 'related') or + part.get('content-disposition') not in (None, 'inline')): + continue + priority = preferencelist.index(subtype) + if priority == 0: + # Short circuit, don't need to check the rest. 
+ return part + found[priority] = part + # If #18652 is added, use this instead: + #return first_true(found, pred=lambda x: x is not None) + return next(filter(lambda x: x is not None, found), None) + + def iter_attachments(self): + seen = [] + maintype, subtype = self.get_content_type().split('/') + if maintype != 'multipart' or subtype == 'alternative': + return + for part in self.get_payload(): + maintype, subtype = part.get_content_type().split('/') + if ((maintype == 'text' and subtype in ('html', 'plain') or + maintype == 'multipart' and + subtype in ('related', 'alternative')) and + part.get('content-disposition') in (None, 'inline') and + subtype not in seen): + seen.append(subtype) + continue + yield part + + def iter_parts(self): + if self.is_multipart(): + yield from self.get_payload() diff --git a/Lib/test/test_email/__init__.py b/Lib/test/test_email/__init__.py --- a/Lib/test/test_email/__init__.py +++ b/Lib/test/test_email/__init__.py @@ -42,6 +42,8 @@ # here we make minimal changes in the test_email tests compared to their # pre-3.3 state. policy = compat32 + # Likewise, the default message object is Message. 
+ message = Message def __init__(self, *args, **kw): super().__init__(*args, **kw) @@ -54,9 +56,11 @@ with openfile(filename) as fp: return email.message_from_file(fp, policy=self.policy) - def _str_msg(self, string, message=Message, policy=None): + def _str_msg(self, string, message=None, policy=None): if policy is None: policy = self.policy + if message is None: + message = self.message return email.message_from_string(string, message, policy=policy) def _bytes_repr(self, b): diff --git a/Lib/test/test_email/test_message.py b/Lib/test/test_email/test_message.py --- a/Lib/test/test_email/test_message.py +++ b/Lib/test/test_email/test_message.py @@ -1,6 +1,13 @@ import unittest +import textwrap from email import policy -from test.test_email import TestEmailBase +from email.message import MIMEMessage +from test.test_email import TestEmailBase, parameterize + + +# Helper. +def first(iterable): + return next(filter(lambda x: x is not None, iterable), None) class Test(TestEmailBase): @@ -14,5 +21,288 @@ m['To'] = 'xyz@abc' +@parameterize +class TestMIMEMessage(TestEmailBase): + + policy = policy.default + message = MIMEMessage + + # The first argument is a triple (related, html, plain) of indices into the + # list returned by 'walk' called on a Message constructed from the third. + # The indices indicate which part should match the corresponding part-type + # when passed to get_body (ie: the "first" part of that type in the + # message). The second argument is a list of indices into the 'walk' list + # of the attachments that should be returned by a call to + # 'iter_attachments'. The third argument is a list of indices into 'walk' + # that should be returned by a call to 'iter_parts'. Note that the first + # item returned by 'walk' is the Message itself. 
+ + message_params = { + + 'empty_message': ( + (None, None, 0), + (), + (), + ""), + + 'non_mime_plain': ( + (None, None, 0), + (), + (), + textwrap.dedent("""\ + To: foo@example.com + + simple text body + """)), + + 'mime_non_text': ( + (None, None, None), + (), + (), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: image/jpg + + bogus body. + """)), + + 'plain_html_alternative': ( + (None, 2, 1), + (), + (1, 2), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/alternative; boundary="===" + + preamble + + --=== + Content-type: text/plain + + simple body + + --=== + Content-Type: text/html + +
<p>simple body</p>
+ --===-- + """)), + + 'plain_html_mixed': ( + (None, 2, 1), + (), + (1, 2), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="===" + + preamble + + --=== + Content-type: text/plain + + simple body + + --=== + Content-Type: text/html + +
<p>simple body</p>
+ + --===-- + """)), + + 'plain_html_attachment_mixed': ( + (None, None, 1), + (2,), + (1, 2), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="===" + + --=== + Content-type: text/plain + + simple body + + --=== + Content-Type: text/html + Content-Disposition: attachment + +
<p>simple body</p>
+ + --===-- + """)), + + 'html_text_attachment_mixed': ( + (None, 2, None), + (1,), + (1, 2), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="===" + + --=== + Content-type: text/plain + Content-Disposition: attachment + + simple body + + --=== + Content-Type: text/html + +
<p>simple body</p>
+ + --===-- + """)), + + 'html_text_attachment_inline_mixed': ( + (None, 2, 1), + (), + (1, 2), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="===" + + --=== + Content-type: text/plain + Content-Disposition: inline + + simple body + + --=== + Content-Type: text/html + Content-Disposition: inline + +
<p>simple body</p>
+ + --===-- + """)), + + + 'related': ( + (0, 1, None), + (2,), + (1, 2), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/related; boundary="===" + + --=== + Content-type: text/html + +
<p>simple body</p>
+ + --=== + Content-Type: image/jpg + Content-ID: + + bogus body + + --===-- + """)), + + + 'mixed_alternative_plain_related': ( + (3, 4, 2), + (6, 7), + (1, 6, 7), + textwrap.dedent("""\ + To: foo@example.com + MIME-Version: 1.0 + Content-Type: multipart/mixed; boundary="===" + + --=== + Content-Type: multipart/alternative; boundary="+++" + + --+++ + Content-Type: text/plain + + simple body + + --+++ + Content-Type: multipart/related; boundary="___" + + --___ + Content-Type: text/html + +
<p>simple body</p>
+ + --___ + Content-Type: image/jpg + Content-ID: + + bogus jpg body + + --___-- + + --+++-- + + --=== + Content-Type: image/jpg + Content-Disposition: attachment + + bogus jpg body + + --=== + Content-Type: image/jpg + Content-Disposition: attachment + + another bogus jpg body + + --===-- + """)), + + } + + def message_as_body_source(self, body_parts, attachments, parts, msg): + m = self._str_msg(msg) + allparts = list(m.walk()) + expected = [None if n is None else allparts[n] for n in body_parts] + related = 0; html = 1; plain = 2 + self.assertEqual(m.get_body(), first(expected)) + self.assertEqual(m.get_body(preferencelist=('related', 'html', 'plain')), + first(expected)) + self.assertEqual(m.get_body(preferencelist=('related', 'html')), + first(expected[related:html+1])) + self.assertEqual(m.get_body(preferencelist=('related', 'plain')), + first([expected[related], expected[plain]])) + self.assertEqual(m.get_body(preferencelist=('html', 'plain')), + first(expected[html:plain+1])) + self.assertEqual(m.get_body(preferencelist=['related']), expected[related]) + self.assertEqual(m.get_body(preferencelist=['html']), expected[html]) + self.assertEqual(m.get_body(preferencelist=['plain']), expected[plain]) + self.assertEqual(m.get_body(preferencelist=('plain', 'html')), + first(expected[plain:html-1:-1])) + self.assertEqual(m.get_body(preferencelist=('plain', 'related')), + first([expected[plain], expected[related]])) + self.assertEqual(m.get_body(preferencelist=('html', 'related')), + first(expected[html::-1])) + self.assertEqual(m.get_body(preferencelist=('plain', 'html', 'related')), + first(expected[::-1])) + self.assertEqual(m.get_body(preferencelist=('html', 'plain', 'related')), + first([expected[html], + expected[plain], + expected[related]])) + + def message_as_attachment_source(self, body_parts, attachments, parts, msg): + m = self._str_msg(msg) + allparts = list(m.walk()) + attachments = [allparts[n] for n in attachments] + 
self.assertEqual(list(m.iter_attachments()), attachments) + + def message_as_parts_source(self, body_parts, attachments, parts, msg): + m = self._str_msg(msg) + allparts = list(m.walk()) + parts = [allparts[n] for n in parts] + self.assertEqual(list(m.iter_parts()), parts) + + if __name__ == '__main__': unittest.main()
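
Usage sketch (not part of the patch): the snippet below walks through the 'plain_html_attachment_mixed' case from the test table above, to show how get_body, iter_attachments and iter_parts relate to the indices returned by walk. It assumes a released Python 3 standard library, where this API is exposed as email.message.EmailMessage rather than under the MIMEMessage name used in this patch. The html payload in SOURCE is illustrative only.

```python
from email import message_from_string, policy
from email.message import EmailMessage  # assumption: released name of the patch's MIMEMessage

# multipart/mixed with an inline text/plain part and a text/html part
# that is explicitly marked as an attachment.
SOURCE = """\
To: foo@example.com
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==="

--===
Content-Type: text/plain

simple body

--===
Content-Type: text/html
Content-Disposition: attachment

<p>simple body</p>

--===--
"""

msg = message_from_string(SOURCE, EmailMessage, policy=policy.default)

# walk() yields the message itself first, then each subpart in order.
parts = list(msg.walk())

# The plain part is the only body candidate; the html part is excluded
# because its Content-Disposition is 'attachment'.
body = msg.get_body()
html_body = msg.get_body(preferencelist=('html',))

# Parts that are not body candidates are reported as attachments.
attachments = list(msg.iter_attachments())

# iter_parts() returns the direct subparts regardless of their role.
subparts = list(msg.iter_parts())

print(body.get_content_type())
print([part.get_content_type() for part in attachments])
```

In terms of the test table, body corresponds to index 1 of the walk list, the attachment list to (2,), and the parts list to (1, 2).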