Message 264065 - Python tracker

Author	serhiy.storchaka
Recipients	Ronan.Lamy, gvanrossum, larry, pitrou, pjenvey, serhiy.storchaka, vstinner
Date	2016-04-23.18:15:32
Message-id	<1461435333.18.0.124129358247.issue26800@psf.upfronthosting.co.za>
In-reply-to

Content
Side comment about "bytes-like" and "byte string". As for the language from PEP 484, I think it is too permissive. bytes and bytearray are "byte strings" because they are not only sequences of bytes but also have a lot of str-like methods: lower(), split(), startswith(), strip(), etc. Many Python functions that work with "byte strings" expect support for some of these methods. memoryview has none of these methods, and it is not even always a sequence of bytes. Other objects that support the buffer protocol may not even be sequences. This is a problem when we want to support all objects with the buffer protocol in functions written in Python: we need to wrap them in memoryview and cast to the 'B' format. But in many cases the term "bytes-like" means only that bytes and bytearray are accepted. There are different levels of "byte-likelity", and unfortunately there is no good terminology.
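
A minimal sketch of the wrap-and-cast step mentioned above, for illustration only. The helper name as_byte_view is hypothetical, and the cast assumes the underlying buffer is C-contiguous:

```python
from array import array

def as_byte_view(obj):
    """Hypothetical helper: view any C-contiguous buffer as one byte per item."""
    return memoryview(obj).cast('B')

data = array('H', [1, 2, 3])   # a buffer object whose items are not single bytes
raw = memoryview(data)         # items are ints of data.itemsize bytes each
flat = as_byte_view(data)      # items are single bytes

# str-like methods exist on bytes and bytearray, but not on memoryview
has_methods = [hasattr(t, 'startswith') for t in (bytes, bytearray, memoryview)]

print(len(raw), len(flat), has_methods)
```

Even after the cast, the result is still a memoryview, so code that needs startswith() or split() has to copy the data with bytes(flat) first.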

As for moving away from accepting non-bytes paths, I think the arguments are similar to the arguments about why Path is not a str subclass, or why we don't convert any path argument to a string by calling str(): it can hide errors and cause unexpected behavior instead of raising an exception. For example, on Windows array('u', '扡摣晥') represents not the Unicode name '扡摣晥' but the bytes name b'abcdef'.
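
A small illustration of both points, under stated assumptions. The functions lenient_path and strict_path are hypothetical and not part of any stdlib API. Encoding to UTF-16-LE is used here as a portable stand-in for the buffer contents of array('u', ...) on Windows, where the 'u' item is a 2-byte wchar_t:

```python
def lenient_path(p):
    # Hypothetical: coerce anything to str, which silently accepts bad input
    return str(p)

def strict_path(p):
    # Hypothetical: accept only real str/bytes names and fail loudly otherwise
    if not isinstance(p, (str, bytes)):
        raise TypeError(f"expected str or bytes, not {type(p).__name__}")
    return p

# Coercion hides the mistake: the "path" becomes the literal name 'None'
hidden = lenient_path(None)

try:
    strict_path(None)
    rejected = False
except TypeError:
    rejected = True

# Bytes of the buffer as they would appear with a 2-byte little-endian item
name = '扡摣晥'
raw = name.encode('utf-16-le')   # b'abcdef'

print(hidden, rejected, raw)
```

A function that treats any buffer object as a bytes path would see raw, not name, which is the kind of surprise the message warns about.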

History
Date	User	Action	Args
2016-04-23 18:15:33	serhiy.storchaka	set	recipients: + serhiy.storchaka, gvanrossum, pitrou, vstinner, larry, pjenvey, Ronan.Lamy
2016-04-23 18:15:33	serhiy.storchaka	set	messageid: <1461435333.18.0.124129358247.issue26800@psf.upfronthosting.co.za>
2016-04-23 18:15:33	serhiy.storchaka	link	issue26800 messages
2016-04-23 18:15:32	serhiy.storchaka	create