Message332290
While RFC2047 clearly states that an encoder MUST NOT split multi-byte encodings in the middle of a character (section 5, "Each 'encoded-word' MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded-word's."), it also states that to fit length restrictions, CRLF SPACE is used as a delimiter between encoded words (section 2, "If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used."). In section 6.2 it states:
When displaying a particular header field that contains multiple
'encoded-word's, any 'linear-white-space' that separates a pair of
adjacent 'encoded-word's is ignored. (This is to allow the use of
multiple 'encoded-word's to represent long strings of unencoded text,
without having to separate 'encoded-word's where spaces occur in the
unencoded text.)
(linear-white-space is the RFC822 term for foldable whitespace).
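To make the section 6.2 rule concrete, here is a minimal, self-contained sketch of a decoder that drops white space only when it sits between two adjacent encoded-words. The names `decode_rfc2047`, `_decode_word`, `_EW`, `_EW_PARTS` and `_EW_GAP` are hypothetical and chosen for illustration; this is not how the email package implements it, and the regular expression is a simplification of the RFC grammar.

```python
import base64
import re

# Hypothetical helpers for illustration; not the email package's code.
# Simplified encoded-word pattern: =?charset?encoding?payload?=
_EW = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?="
_EW_PARTS = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")
# Linear white space sitting between two adjacent encoded-words.
_EW_GAP = re.compile(rf"({_EW})[ \t\r\n]+(?={_EW})")


def _decode_word(match):
    """Decode a single encoded-word match to text."""
    charset, encoding, payload = match.groups()
    if encoding.lower() == "b":
        raw = base64.b64decode(payload)
    else:
        # Q encoding: "_" stands for a space, "=XX" is one hex-encoded octet.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def decode_rfc2047(value):
    """Decode encoded-words, ignoring white space between adjacent ones."""
    joined = _EW_GAP.sub(r"\1", value)  # section 6.2: drop the separator
    return _EW_PARTS.sub(_decode_word, joined)
```

White space between an encoded-word and ordinary text is left alone, so `decode_rfc2047("=?utf-8?q?foo?= bar")` keeps its space, while a folded pair of encoded-words is joined without one.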
The parser is leaving spaces between two encoded-word tokens in place, where it must remove them instead. It already removes them correctly for unstructured headers, just not in get_bare_quoted_string, get_atom and get_dot_atom.
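One way to compare the two code paths is to feed the same pair of encoded-words through an unstructured header (Subject) and a structured one (the display name of a From address). The sample header values and the address `user@example.com` below are made up for illustration. What the display name comes out as depends on the Python version in use, so the sketch only prints it rather than claiming a result.

```python
import email
from email import policy

# Made-up sample message: the same two encoded-words, separated by a space,
# in an unstructured header and in the display name of an address header.
raw = (
    "Subject: =?utf-8?q?caf=C3=A9?= =?utf-8?q?_au_lait?=\n"
    "From: =?utf-8?q?caf=C3=A9?= =?utf-8?q?_au_lait?= <user@example.com>\n"
    "\n"
)

msg = email.message_from_string(raw, policy=policy.default)

# Unstructured header: the separating space is expected to be dropped.
subject = str(msg["Subject"])
# Structured header: parsed through the atom / quoted-string code paths.
display_name = msg["From"].addresses[0].display_name

# ascii() keeps the output readable even on non-UTF-8 terminals.
print("Subject:     ", ascii(subject))
print("display_name:", ascii(display_name))
```

If the two printed values differ only by an extra space between the decoded words, the structured-header path is keeping the separator that section 6.2 says to ignore.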
Then there is Postel's law (*be liberal in what you accept from others*), and the email package already applies that principle to RFC2047 elsewhere: RFC2047 also states that "An 'encoded-word' MUST NOT appear within a 'quoted-string'.", yet email._header_value_parser's handling of quoted-string will process encoded-word (EW) sections.
Date | User | Action | Args
2018-12-21 11:34:21 | mjpieters | set | recipients: + mjpieters, barry, r.david.murray, era
2018-12-21 11:34:21 | mjpieters | set | messageid: <1545392061.32.0.788709270274.issue35547@psf.upfronthosting.co.za>
2018-12-21 11:34:21 | mjpieters | link | issue35547 messages
2018-12-21 11:34:21 | mjpieters | create |