
Author jaraco
Recipients docs@python, jaraco
Date 2021-07-30.14:21:04
Message-id <1627654865.8.0.668589058029.issue44779@roundup.psfhosted.org>
Content
In [this comment](https://github.com/python/cpython/pull/27436#issuecomment-889815333), I learned that it's possible to get repo clones into a bad state by:

- committing a text file to main (merging a PR)
- customizing the newline handling for that file in .gitattributes (in a separate PR)

Users (including buildbots) who pulled the code between these two steps will be stuck with the file in the state it was checked out in after the first step.

Example (must be run on Windows):

PS C:\> git clone https://github.com/python/cpython --depth 100
Cloning into 'cpython'...
remote: Enumerating objects: 5946, done.
remote: Counting objects: 100% (5946/5946), done.
remote: Compressing objects: 100% (5079/5079), done.
Receiving objects: 100% (5946/5946), 24.99 MiB | 9.18 MiB/s, done.
remote: Total 5946 (delta 1382), reused 2314 (delta 758), pack-reused 0
Resolving deltas: 100% (1382/1382), done.
Updating files: 100% (4699/4699), done.
PS C:\> cd cpython
PS C:\cpython> git checkout aaa83cd^1
HEAD is now at 851cca8 Add missing gdbm dependencies to the UNIX CI (GH-27467)
PS C:\cpython> # simulate as if this rev was the initial checkout
PS C:\cpython> git rm -r :/ > $null ; git checkout HEAD -- :/
PS C:\cpython> python -c "import pathlib; print(repr(pathlib.Path('Lib/test/test_importlib/namespacedata01/utf-8.file').read_bytes()))"
b'Hello, UTF-8 world!\r\n'
PS C:\cpython> git checkout -q aaa83cd
HEAD is now at aaa83cd bpo-44771: Apply changes from importlib_resources 5.2.1 (GH-27436)
PS C:\cpython> python -c "import pathlib; print(repr(pathlib.Path('Lib/test/test_importlib/namespacedata01/utf-8.file').read_bytes()))"
b'Hello, UTF-8 world!\r\n'
PS C:\cpython> git rm -r :/ > $null ; git checkout HEAD -- :/
PS C:\cpython> python -c "import pathlib; print(repr(pathlib.Path('Lib/test/test_importlib/namespacedata01/utf-8.file').read_bytes()))"
b'Hello, UTF-8 world!\n'
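
The `python -c` one-liners above print the raw bytes so the line ending can be read by eye. As a rough sketch of the same check in reusable form, the helper below classifies the line endings in a file. The function names `newline_style` and `file_newline_style` are made up for illustration and are not part of CPython or its test suite.

```python
from pathlib import Path


def newline_style(data: bytes) -> str:
    """Classify the line endings found in raw file bytes.

    Returns "crlf", "lf", "mixed", or "none" (no newline at all).
    Bare CR characters (classic Mac style) are not considered here.
    """
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get the bare LFs.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lf:
        return "lf"
    return "none"


def file_newline_style(path) -> str:
    """Read a file in binary mode (no newline translation) and classify it."""
    return newline_style(Path(path).read_bytes())
```

Run against `Lib/test/test_importlib/namespacedata01/utf-8.file` in the session above, `file_newline_style` would be expected to report `crlf` for the stale checkout and `lf` after the working tree is re-created.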

This issue doesn't exist on other repos (the file has Unix newlines in all checkouts):

PS C:\> git clone https://github.com/python/importlib_resources
Cloning into 'importlib_resources'...
remote: Enumerating objects: 2811, done.
remote: Counting objects: 100% (732/732), done.
remote: Compressing objects: 100% (400/400), done.
remote: Total 2811 (delta 456), reused 556 (delta 309), pack-reused 2079
Receiving objects: 100% (2811/2811), 446.21 KiB | 7.44 MiB/s, done.
Resolving deltas: 100% (1796/1796), done.
PS C:\> cd importlib_resources
PS C:\importlib_resources> python -c "import pathlib; print(repr(pathlib.Path('importlib_resources/tests/namespacedata01/utf-8.file').read_bytes()))" 
b'Hello, UTF-8 world!\n'


I'm not sure there's much this project can do, except maybe consider minimizing the number of files that need customization.

As a former Windows enthusiast, I found the CRLF changes to be annoying and not particularly useful, so I sought to use LF for newlines wherever possible, for simplicity and consistency. Some editors (notably Notepad) would not handle these newlines well, but almost all other editors would handle them just fine.

This project could consider standardizing on Unix newlines with a small number of exceptions rather than allowing files by default to be converted to platform-specific newlines.

I'm not yet sure which setting in the CPython repo causes newlines to be customized per platform when they are not in importlib_resources.
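
One way to investigate might be to ask Git which attributes it applies to the file in each repo, using `git check-attr -a -- <path>`. The sketch below wraps that command and parses its `<path>: <attribute>: <value>` output lines. The helper names `parse_check_attr` and `attributes_for` are hypothetical, and the attribute values shown in the usage comment are made-up examples, not the actual CPython configuration.

```python
import subprocess


def parse_check_attr(output: str) -> dict:
    """Parse `git check-attr` output into an {attribute: value} mapping."""
    attrs = {}
    for line in output.splitlines():
        # Each line looks like "<path>: <attribute>: <value>".
        # Split from the right so a ": " inside the path does not confuse us.
        parts = line.rsplit(": ", 2)
        if len(parts) == 3:
            _path, name, value = parts
            attrs[name] = value
    return attrs


def attributes_for(path: str, repo: str = ".") -> dict:
    """Return the attributes Git reports for `path` inside the clone at `repo`."""
    result = subprocess.run(
        ["git", "check-attr", "-a", "--", path],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_check_attr(result.stdout)


# Hypothetical usage, run from the parent directory of both clones:
#   attributes_for("Lib/test/test_importlib/namespacedata01/utf-8.file", "cpython")
#   attributes_for("importlib_resources/tests/namespacedata01/utf-8.file",
#                  "importlib_resources")
```

Comparing the two results, together with each clone's `git config core.autocrlf` value, could help narrow down where the difference comes from.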