
Author mbrijun@gmail.com
Recipients eryksun, mbrijun@gmail.com, paul.moore, steve.dower, tim.golden, zach.ware
Date 2020-03-28.20:13:28
Message-id <CANie7rgO9=8BB3VSjwiqKgvpg2ANns4p2EssWn9tMZwX8rZ9hw@mail.gmail.com>
In-reply-to <1585407427.66.0.678568192152.issue40095@roundup.psfhosted.org>
Content
Hi Steve, Eryk,

Thank you very much for looking into this. I was looking into "st_ino"
as a potential substitute for the full path of a file when it comes to
uniquely identifying that file in a database.

> ReFS uses a 128-bit file ID, which I gather consists of a 64-bit directory ID and a 64-bit relative ID. (Take this with a grain of salt. AFAIK, Microsoft hasn't published a spec for ReFS.) The latter is 0 for the directory itself and increments by 1 for each file created in the directory, with no reuse of previous values if a file is deleted or moved. If that's correct, and if "test.jpg" was created in "\test", then the directory ID of "\test" is 0x29d5, and the relative file ID is 0x4ae.

This assumption seems to be correct. All files within the same
directory share the same first half of their ID, as reported by
"fsutil".

U:\test>fsutil file queryfileid test.jpg
File ID is 0x00000000000029d500000000000004ae

U:\test>fsutil file queryfileid test.nef
File ID is 0x00000000000029d50000000000000483

U:\test>fsutil file queryfileid test.ARW
File ID is 0x00000000000029d50000000000000484

U:\test>fsutil file queryfileid test.db
File ID is 0x00000000000029d50000000000000495
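
To make the comparison explicit, here is a minimal Python sketch that splits each 128-bit ID reported by "fsutil" above into two 64-bit halves. The helper name split_file_id is hypothetical, and reading the halves as "directory ID" and "relative ID" is only the assumption quoted above, not a documented ReFS layout.

```python
from typing import Tuple

# File IDs copied from the fsutil output in this message.
FILE_IDS = {
    "test.jpg": 0x00000000000029d500000000000004ae,
    "test.nef": 0x00000000000029d50000000000000483,
    "test.ARW": 0x00000000000029d50000000000000484,
    "test.db": 0x00000000000029d50000000000000495,
}


def split_file_id(file_id: int) -> Tuple[int, int]:
    """Split a 128-bit ID into (upper 64 bits, lower 64 bits).

    Assumption: upper half = directory ID, lower half = relative ID.
    """
    mask64 = (1 << 64) - 1
    return file_id >> 64, file_id & mask64


for name, file_id in FILE_IDS.items():
    directory_id, relative_id = split_file_id(file_id)
    print(name, hex(directory_id), hex(relative_id))

# Every file in the directory should report the same upper half.
upper_halves = {split_file_id(fid)[0] for fid in FILE_IDS.values()}
print("distinct upper halves:", [hex(v) for v in upper_halves])
```

For these four files the upper half is the same value in every case and only the lower half differs.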

>
> > >>> from pathlib import Path
> > >>> hex(Path('U:/test/test.jpg').stat().st_ino)
> > '0x4000000004ae29d5'
>
> os.stat calls WINAPI GetFileInformationByHandle, which returns a 64-bit file ID. It appears that ReFS generates this ID by concatenating the relative ID and directory ID in a way that is "not guaranteed to be unique" according to the BY_HANDLE_FILE_INFORMATION [1] docs.

The feedback from "st_ino" appears to be in total sync with "fsutil".
The only real difference (apart from the missing leading zeros
in each half) is the inclusion of a hex "4" at the very beginning of
the hex sequence. But even that is consistent, as the "4" is present in
all cases.

>>> hex(Path('U:/test/test.jpg').stat().st_ino)
'0x4000000004ae29d5'
>>> hex(Path('U:/test/test.nef').stat().st_ino)
'0x40000000048329d5'
>>> hex(Path('U:/test/test.arw').stat().st_ino)
'0x40000000048429d5'
>>> hex(Path('U:/test/test.db').stat().st_ino)
'0x40000000049529d5'
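
The relationship between the two sets of values can be checked with a short Python sketch. The function name guess_st_ino and the bit layout it uses (a leading hex "4", the lower half of the "fsutil" ID shifted left by 16 bits, and the upper half in the low 16 bits) are assumptions fitted to the four samples in this message only. They are not taken from any ReFS or Windows documentation, and they may not hold for larger IDs.

```python
# Observed pairs from this message: (fsutil 128-bit file ID, st_ino).
SAMPLES = {
    "test.jpg": (0x00000000000029d500000000000004ae, 0x4000000004ae29d5),
    "test.nef": (0x00000000000029d50000000000000483, 0x40000000048329d5),
    "test.arw": (0x00000000000029d50000000000000484, 0x40000000048429d5),
    "test.db": (0x00000000000029d50000000000000495, 0x40000000049529d5),
}


def guess_st_ino(file_id_128: int) -> int:
    """Rebuild the 64-bit value from the 128-bit ID.

    ASSUMED layout, inferred only from the samples above:
    top nibble 0x4, then the lower 64-bit half of the ID,
    then the upper 64-bit half in the lowest 16 bits.
    """
    upper_half = file_id_128 >> 64
    lower_half = file_id_128 & ((1 << 64) - 1)
    return (0x4 << 60) | (lower_half << 16) | upper_half


for name, (file_id, st_ino) in SAMPLES.items():
    guessed = guess_st_ino(file_id)
    print(name, hex(guessed), "matches st_ino:", guessed == st_ino)
```

Note that under this reading the two halves appear in the opposite order in "st_ino" compared to "fsutil", and the 16-bit boundary is a guess that four samples from one directory cannot confirm.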
History
Date User Action Args
2020-03-28 20:13:28  mbrijun@gmail.com  set  recipients: + mbrijun@gmail.com, paul.moore, tim.golden, zach.ware, eryksun, steve.dower
2020-03-28 20:13:28  mbrijun@gmail.com  link  issue40095 messages
2020-03-28 20:13:28  mbrijun@gmail.com  create