Issue 35131: Cannot access to customized paths within .pth file



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/79312

classification

Title:	Cannot access to customized paths within .pth file
Type:	behavior	Stage:
Components:	Windows	Versions:	Python 3.8, Python 3.7, Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Valentin Zhao, Windson Yang, brett.cannon, jaraco, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority:	normal	Keywords:	easy

Created on 2018-11-01 09:56 by Valentin Zhao, last changed 2022-04-11 14:59 by admin.

Files
File name	Uploaded	Description	Edit
IMG_20181101_173328_[B@ae031df.jpg	Valentin Zhao, 2018-11-01 09:56

Messages (11)
msg329050 - (view)	Author: Valentin Zhao (Valentin Zhao)	Date: 2018-11-01 09:56
I want to manage all the packages that I install, so every time I add a package I set "--target" so that the package is downloaded there. Then I wrote that directory into a .pth file located in "/Python36/Lib/site-packages", so I can still access all the packages even though they are not located within the "Python36" folder.

However, my current Windows user name is a Chinese name, which means the customized path mentioned above has Chinese characters in it, and thus the .pth file will also be encoded with 'gbk'. Every time I try to import these packages I get "UnicodeDecodeError: 'gbk' can't decode byte xxx...".

Fortunately I have found the reason and worked around the problem: Python reads .pth files without setting any encoding. The code is located in "Python36/Lib/site.py":

    def addpackage(sitedir, name, known_paths):
        if known_paths is None:
            known_paths = _init_pathinfo()
            reset = True
        else:
            reset = False
        fullname = os.path.join(sitedir, name)
        try:
            # the open() call here should also pass encoding='utf-8'
            f = open(fullname, "r")
        except OSError:
            return
        # other code

After I made this change, everything works.
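The workaround described above amounts to passing an explicit encoding when the .pth file is opened. Here is a minimal sketch of that idea, outside of site.py. `read_pth_entries` is a hypothetical helper name, and skipping blank and `#` comment lines is a simplification of what `addpackage` does.

```python
import locale


def read_pth_entries(fullname, encoding="utf-8"):
    """Return the non-blank, non-comment lines of a .pth file.

    Hypothetical helper, not part of site.py. Because the encoding is
    passed explicitly, the result does not depend on the machine's locale.
    """
    entries = []
    with open(fullname, "r", encoding=encoding) as f:
        for line in f:
            line = line.rstrip()
            if not line or line.startswith("#"):
                continue
            entries.append(line)
    return entries


# The locale-dependent encoding that open(fullname, "r") falls back to when
# no encoding is given; on a Chinese-locale Windows this is typically a GBK
# code page.
print(locale.getpreferredencoding(False))
```

With this helper, a .pth file saved as UTF-8 is read the same way whatever the system code page is.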
msg329172 - (view)	Author: Steve Dower (steve.dower) *	Date: 2018-11-02 23:40
Can you save your file in gbk encoding? That will be an immediate fix. I don't know that we can/should change the encoding we read without checking with everyone who writes out .pth files. (+Jason as a start here, but I suspect there are more tools that write them.) We could add a handler for UnicodeDecodeError that falls back on utf-8? I think that's reasonable.
msg329173 - (view)	Author: Steve Dower (steve.dower) *	Date: 2018-11-02 23:40
I'll mark this easy as well, since adding that handler is straightforward. Unless someone knows a reason we shouldn't do that either.
msg329178 - (view)	Author: Windson Yang (Windson Yang) *	Date: 2018-11-03 03:47
Hello Valentin Zhao, do you have time to fix it? Otherwise I can create a PR.
msg329198 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2018-11-03 14:05
I'm only aware of one tool that writes .pth files, and that's setuptools, and it always writes ASCII (assuming package names are ASCII), so any encoding handling should be fine there.

> We could add a handler for UnicodeDecodeError that falls back on utf-8?

Yes, reasonable, but maybe we should consider instead _preferring_ UTF-8 and falling back to default encodings. That would be my preference.
msg329199 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2018-11-03 14:12
Also, I would argue that this is an enhancement request and not a bug - that the prior expectation was that the .pth file is encoded in whatever encoding the system expects by default, and that adding support for a standardized encoding for .pth files is a new feature. As another aside: Valentin, the technique you're using to manage packages is likely to run into issues with certain packages - in particular any packages that rely on their own `.pth` files to invoke behavior, such as future_fstrings (https://pypi.org/project/future-fstrings/). I learned about this issue in (https://github.com/jaraco/rwt/issues/29), which is why the rwt project adds a `sitecustomize.py` to the target directory that ensures .pth files are run. Just FYI.
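The `sitecustomize.py` approach mentioned above could look roughly like the following. This is a minimal sketch, not the rwt implementation. `activate_pth_files` is a hypothetical name, and the only real API it relies on is `site.addsitedir`, which appends a directory to `sys.path` and processes the `.pth` files found in it.

```python
import os
import site


def activate_pth_files(target_dir):
    """Treat target_dir as a site directory so its .pth files are processed.

    Hypothetical helper: site.addsitedir() adds target_dir to sys.path and
    then handles every *.pth file directly inside it.
    """
    site.addsitedir(target_dir)


# In a sitecustomize.py placed inside the --target directory, the call
# would be (assuming that directory is already importable):
#     activate_pth_files(os.path.dirname(os.path.abspath(__file__)))
```

This only helps if the target directory is already on `sys.path`, so that Python can find and import `sitecustomize` in the first place.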
msg329497 - (view)	Author: Valentin Zhao (Valentin Zhao)	Date: 2018-11-09 06:42
I would rather just wait for you to fix it, since it is not urgent.
msg329498 - (view)	Author: Windson Yang (Windson Yang) *	Date: 2018-11-09 06:58
I tried to create a PR for it. However, I don't know how to handle the code at https://github.com/python/cpython/blob/d4c76d960b/Lib/site.py#L159. How can we check for UnicodeDecodeError when we have only opened the file? I use readlines(), but it may use more memory than before (I'm not sure whether that matters in this case).

    try:
        f = open(fullname, "r")
        data = f.readlines()
    except UnicodeDecodeError:
        f = open(fullname, "r", encoding="utf-8")
        data = f.readlines()
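The eager-read idea above can be sketched so that each file handle is closed properly and the order of candidate encodings is configurable. This is only an illustration, not a proposed patch. `read_pth_lines` is a hypothetical name, and the default order (UTF-8 first, then the locale encoding) is an assumption that follows the preference voiced earlier in the thread.

```python
import locale


def read_pth_lines(fullname, encodings=None):
    """Read all lines of a .pth file, trying each encoding in turn.

    Hypothetical helper. The whole file is read eagerly, because a decoding
    error may only show up partway through the file.
    """
    if encodings is None:
        # Assumed default order: UTF-8 first, then the locale encoding.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    if not encodings:
        raise ValueError("at least one candidate encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(fullname, "r", encoding=encoding) as f:
                return f.readlines()
        except UnicodeDecodeError as exc:
            last_error = exc
    # Every candidate failed: re-raise the most recent decoding error.
    raise last_error
```

.pth files are normally only a few lines long, so reading one fully into memory should cost little.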
msg330058 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2018-11-18 18:42
The problem you've encountered is that previously the file was assumed to be one encoding and would fail if it was not that encoding... so it was possible to lazy-load the file and process each line. In the new model, where you need to evaluate the viability of the file in one of two candidate encodings, you'll necessarily need to read the entire file once before processing its contents.

Therefore, I recommend one of these options:

1. Always read the file in binary mode, ascertain the "best" encoding, then rewind the file and wrap it in a TextIOWrapper for that encoding. Presumably this logic is common--perhaps there's already a routine that does just that.
2. In a try/except block, read the entire content, decoded, into another iterable ... and then have the logic below rely on that content. i.e. `f = list(f)`.
3. Always assume UTF-8 instead of the system encoding. This change would be backward incompatible, so probably isn't acceptable without at least an interim release with a deprecation warning.

I recommend a combination of (1) and then (3) in the future. That is:

    def determine_best_encoding(f, encodings=('utf-8', sys.getdefaultencoding())):
        """
        Attempt to read and decode all of stream f using the encodings
        and return the first one that succeeds. Rewinds the file.
        """

    f = open(..., 'rb')
    encoding = determine_best_encoding(f)
    if encoding != 'utf-8':
        warnings.warn("Detected pth file with unsupported encoding", DeprecationWarning)
    f = io.TextIOWrapper(f, encoding)

Then, in a future version, dropping support for local encodings, all of that code can be replaced with `f = open(..., encoding='utf-8')`.
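A runnable version of the outline above might look like this. It is a sketch under stated assumptions, not an agreed design. The body of `determine_best_encoding` is filled in from its docstring, and `open_pth` is a hypothetical wrapper name. `locale.getpreferredencoding(False)` replaces `sys.getdefaultencoding()` as the fallback, because the latter always returns 'utf-8' on Python 3 and so would never select a local encoding.

```python
import io
import locale
import warnings


def determine_best_encoding(f, encodings=None):
    """Return the first encoding that decodes all of binary stream f.

    The stream is rewound to the start before returning.
    """
    if encodings is None:
        # Assumption: the locale encoding is the intended fallback.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    data = f.read()
    f.seek(0)
    for encoding in encodings:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of the candidate encodings can decode the stream")


def open_pth(raw, encodings=None):
    """Wrap a binary stream in a text wrapper using the detected encoding.

    Hypothetical helper; warns when the file is not valid UTF-8.
    """
    encoding = determine_best_encoding(raw, encodings)
    if encoding != "utf-8":
        warnings.warn("Detected pth file with unsupported encoding",
                      DeprecationWarning)
    return io.TextIOWrapper(raw, encoding=encoding)
```

In practice `raw` would come from `open(fullname, "rb")`, and the returned wrapper can be iterated line by line just like the original text-mode file.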
msg330113 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2018-11-19 19:52
There is no "find best encoding" code, which is why so much code out there uses chardet. :) This might also tie into issue #33944 and the idea of rethinking .pth files.
msg330201 - (view)	Author: Windson Yang (Windson Yang) *	Date: 2018-11-21 13:23
I will fix this issue once we have consensus on the future of .pth files in #33944.

History
Date	User	Action	Args
2022-04-11 14:59:07	admin	set	github: 79312
2018-11-29 14:57:52	vstinner	set	nosy: + vstinner
2018-11-21 13:23:17	Windson Yang	set	messages: + msg330201
2018-11-19 19:52:53	brett.cannon	set	nosy: + brett.cannon messages: + msg330113
2018-11-18 18:42:29	jaraco	set	messages: + msg330058
2018-11-09 06:58:59	Windson Yang	set	messages: + msg329498
2018-11-09 06:42:13	Valentin Zhao	set	messages: + msg329497
2018-11-03 14:12:33	jaraco	set	messages: + msg329199
2018-11-03 14:05:00	jaraco	set	messages: + msg329198
2018-11-03 03:47:09	Windson Yang	set	nosy: + Windson Yang messages: + msg329178
2018-11-02 23:40:58	steve.dower	set	keywords: + easy messages: + msg329173 versions: + Python 3.7, Python 3.8
2018-11-02 23:40:16	steve.dower	set	nosy: + jaraco messages: + msg329172
2018-11-02 20:09:11	ned.deily	set	nosy: + paul.moore, tim.golden, zach.ware, steve.dower components: + Windows, - Library (Lib)
2018-11-01 09:56:37	Valentin Zhao	create