Message 295126 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	gdr@garethrees.org
Recipients	ShiftedBit, gdr@garethrees.org
Date	2017-06-04.15:28:17
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1496590097.26.0.84732812742.issue30564@psf.upfronthosting.co.za>
In-reply-to

Content

RFC 4648 section 3.5 says:

   The padding step in base 64 and base 32 encoding can, if improperly
   implemented, lead to non-significant alterations of the encoded data.
   For example, if the input is only one octet for a base 64 encoding,
   then all six bits of the first symbol are used, but only the first
   two bits of the next symbol are used.  These pad bits MUST be set to
   zero by conforming encoders, which is described in the descriptions
   on padding below.  If this property do not hold, there is no
   canonical representation of base-encoded data, and multiple base-
   encoded strings can be decoded to the same binary data.  If this
   property (and others discussed in this document) holds, a canonical
   encoding is guaranteed.

   In some environments, the alteration is critical and therefore
   decoders MAY chose to reject an encoding if the pad bits have not
   been set to zero.
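The ambiguity the RFC describes can be seen with Python's standard `base64` module. This is a minimal sketch, assuming CPython's `base64.b64decode`, which (as discussed in this issue) does not check the pad bits. It takes the canonical encoding of the single octet `b"A"` and builds every string that differs from it only in the four pad bits of the second symbol. `ALPHABET`, `variants` and `decoded` are illustrative names, not part of any API.

```python
import base64

# Standard base64 alphabet from RFC 4648, Table 1.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# One input octet: b"A" is 0x41 == 0b01000001.
# The first symbol carries bits 010000 ("Q"); the second carries the
# remaining bits 01 followed by four pad bits, which a conforming
# encoder sets to 0000 (value 0b010000, also "Q").
canonical = base64.b64encode(b"A")
assert canonical == b"QQ=="

second_value = ALPHABET.index(canonical[1])  # 16 == 0b010000

# Keep the two data bits of the second symbol and try every value
# of the four pad bits (symbols "Q" through "Z", then "a" through "f").
variants = []
for pad_bits in range(16):
    symbol = ALPHABET[(second_value & 0b110000) | pad_bits]
    variants.append(bytes([canonical[0], symbol]) + b"==")

# All sixteen strings decode to the same single byte.
decoded = {base64.b64decode(v) for v in variants}
print(len(variants), decoded)  # → 16 {b'A'}
```

Only `b"QQ=="` is canonical; the other fifteen strings are the "multiple base-encoded strings" that decode "to the same binary data".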

If decoders may choose to reject non-canonical encodings, then they may also
choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think
Python's behaviour (accepting encodings whose pad bits are not set to zero)
conforms to the standard.
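For comparison, a caller that needs the stricter option the RFC permits could layer it on top of `base64` itself. This is a sketch under stated assumptions: `b64decode_canonical` is a hypothetical helper, not part of the `base64` module, and it assumes standard-alphabet, padded input. It accepts a string only if `base64.b64encode` reproduces it byte-for-byte, which rules out non-zero pad bits.

```python
import base64
import binascii
from typing import Union


def b64decode_canonical(encoded: Union[bytes, str]) -> bytes:
    """Decode standard, padded base64 and reject non-canonical input.

    Hypothetical helper, not part of the base64 module.
    """
    if isinstance(encoded, str):
        encoded = encoded.encode("ascii")
    # validate=True makes b64decode raise binascii.Error on characters
    # outside the base64 alphabet instead of silently discarding them.
    decoded = base64.b64decode(encoded, validate=True)
    # A canonical encoding is exactly what b64encode produces, so input
    # with non-zero pad bits (or any other deviation from b64encode's
    # output) fails this round-trip comparison.
    if base64.b64encode(decoded) != encoded:
        raise binascii.Error("non-canonical base64 encoding")
    return decoded
```

With this helper, `b64decode_canonical(b"QQ==")` returns `b"A"`, while `b64decode_canonical(b"QR==")` raises `binascii.Error`, even though `base64.b64decode(b"QR==")` returns `b"A"`. Comparing against a re-encoding is simpler than masking the low bits of the final data symbol, at the cost of one extra encode per call.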

History
Date	User	Action	Args
2017-06-04 15:28:17	gdr@garethrees.org	set	recipients: + gdr@garethrees.org, ShiftedBit
2017-06-04 15:28:17	gdr@garethrees.org	set	messageid: <1496590097.26.0.84732812742.issue30564@psf.upfronthosting.co.za>
2017-06-04 15:28:17	gdr@garethrees.org	link	issue30564 messages
2017-06-04 15:28:17	gdr@garethrees.org	create