digitalmars.D.learn - File size
- harakim (31/31) Aug 21 2023 I have been doing some backups and I wrote a utility that
- FeepingCreature (5/24) Aug 21 2023 Can you print some of the wrong sizes? D's DirEntry iteration
- harakim (2/6) Aug 21 2023 Yes! I will get that information tomorrow.
- harakim (20/24) Aug 22 2023 Thanks for the suggestion. I was working on getting the list for
- FeepingCreature (2/27) Aug 23 2023 That's hilarious! I'm happy you found it.
- harakim (4/5) Aug 24 2023 Me too! Thanks for the support.
I have been doing some backups and I wrote a utility that determines if files are an exact match. As a shortcut, I check the file size. So far so good on this with millions of files until I found something odd: getSize() and DirEntry's .size are producing different values.

This is the relevant code:

```
if (sourceFile.size != getSize(destinationFilename))
{
    if (getSize(sourceFile.name) != getSize(destinationFilename))
        writeln("Also did not match");
    else
        writeln("Did match so this is odd");
    return ArchivalStatus.SizeDidNotMatch;
}
```

Whereas before it just returned SizeDidNotMatch, now it also prints "Did match so this is odd". It seems really odd that getSize(sourceFile.name) is returning a different number than sourceFile.size.

The drive being read is an external HDD formatted as NTFS, and the utility is running on Windows. I believe I originally wrote the files to the file system in Windows, but then today I cut and pasted them (within the same drive) in Linux. However, this is the first time this has happened after millions of comparisons, and it only happened for about 6 files. It does happen consistently though. I have verified that the actual file size is the one reported by getSize, not sourceFile.size, and that the files open correctly.

This is my compiler version: DMD32 D Compiler v2.104.2-dirty

If this is actually a problem and I'm not missing something, I would not mind trying to fix this whenever I have some time.
Aug 21 2023
On Monday, 21 August 2023 at 07:52:28 UTC, harakim wrote:
I have been doing some backups and I wrote a utility that determines if files are an exact match. As a shortcut, I check the file size. So far so good on this with millions of files until I found something odd: getSize() and DirEntry's .size are producing different values. ... It seems really odd that getSize(sourceFile.name) is returning a different number than sourceFile.size. This is an external HDD on windows formatted in ntfs that it is reading. I believe I originally wrote the files to the file system in Windows, but then today I cut and paste them (in the same drive) in Linux. However, this is the first time this has happened after millions of comparisons and it only happened for about 6 files. It does happen consistently though. I have verified that the file size is that reported by getSize and not sourceFile.size and that the files open correctly. ...

Can you print some of the wrong sizes? D's DirEntry iteration code just calls `FindFirstFileW`/`FindNextFileW`, so this *shouldn't* be a D-specific issue, and it should be possible to reproduce this in C.
Aug 21 2023
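One way to collect the wrong sizes asked for above is to walk the tree and print both values wherever they disagree. This is only a minimal sketch: `sizesAgree` and `reportSizeMismatches` are hypothetical helper names, not part of Phobos or of the original utility, and it assumes the mismatch shows up when the size cached in the `DirEntry` is compared against a fresh `getSize` lookup by path.

```d
import std.file : DirEntry, dirEntries, getSize, SpanMode;
import std.stdio : writefln;

/// Hypothetical helper: true when the size stored in the DirEntry
/// equals the size obtained by looking the file up again by name.
bool sizesAgree(DirEntry entry)
{
    return entry.size == getSize(entry.name);
}

/// Hypothetical helper: prints both sizes for every regular file
/// under `dir` where the two lookups disagree.
void reportSizeMismatches(string dir)
{
    foreach (entry; dirEntries(dir, SpanMode.depth))
    {
        if (entry.isFile && !sizesAgree(entry))
            writefln("%s: DirEntry.size=%s getSize=%s",
                     entry.name, entry.size, getSize(entry.name));
    }
}
```

Calling `reportSizeMismatches` on the source directory would list only the affected files, which should make it easier to compare them with what another tool reports.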
On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature wrote:
Can you print some of the wrong sizes? D's DirEntry iteration code just calls `FindFirstFileW`/`FindNextFileW`, so this *shouldn't* be a D-specific issue, and it should be possible to reproduce this in C.

Yes! I will get that information tomorrow.
Aug 21 2023
On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature wrote:
Can you print some of the wrong sizes? D's DirEntry iteration code just calls `FindFirstFileW`/`FindNextFileW`, so this *shouldn't* be a D-specific issue, and it should be possible to reproduce this in C.

Thanks for the suggestion. I was working on getting the list for you when I decided to first try to reproduce this on Linux. I was not able to do so. Then I opened the Linux file explorer and went to one of the files. There were two files by that name, with names differing only by case.

In Windows, I only saw one, because Windows Explorer only shows one file per directory for any set of names that are identical when case is ignored. Unsurprisingly, that is also the one that was selected by getSize(filename). The underlying Windows functions must ignore case as well and select the same way as Explorer (which makes sense). That explains why Windows Explorer reported the same size as getSize(name) in every case, while DirEntry.size matched for the file whose case Windows resolved the name to and not for the file with the different case.

I was able to get into this state because I copied the files (merged directories) in Linux. It was interesting to look into. It seems everything is working as designed. It shouldn't be an issue for me going forward either, as I move more and more towards Linux.
Aug 22 2023
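The situation described above (two names in one directory that differ only by case) can be checked for before comparing sizes. The following is a rough sketch under stated assumptions: `caseCollisions` is a hypothetical helper, and lower-casing with `std.uni.toLower` is only an approximation of how NTFS and the Windows APIs fold case, so it may not flag exactly the same pairs Windows would treat as equal.

```d
import std.path : baseName;
import std.uni : toLower;

/// Hypothetical helper: groups paths by their lower-cased base name
/// and returns only the groups with more than one member, i.e. the
/// names that differ from each other only by case.
/// Assumes all paths come from the same directory.
string[][] caseCollisions(string[] paths)
{
    string[][string] groups;
    foreach (path; paths)
    {
        // toLower approximates case-insensitive matching (assumption).
        immutable key = path.baseName.toLower;
        groups[key] ~= path;
    }

    string[][] collisions;
    foreach (names; groups)
    {
        if (names.length > 1)
            collisions ~= names;
    }
    return collisions;
}
```

To run it over a real directory from Linux, the names could be gathered with something like `dirEntries(dir, SpanMode.shallow).map!(e => e.name).array` and passed in. Any group it returns is a candidate for the DirEntry.size versus getSize mismatch when the same drive is later read on Windows.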
On Tuesday, 22 August 2023 at 16:22:52 UTC, harakim wrote:
On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature wrote:
Can you print some of the wrong sizes? D's DirEntry iteration code just calls `FindFirstFileW`/`FindNextFileW`, so this *shouldn't* be a D-specific issue, and it should be possible to reproduce this in C.

Thanks for the suggestion. I was working on getting the list for you when I decided to first try and reproduce this on Linux. I was not able to do so. Then I opened the Linux File Explorer and went to one of the files. There were two files by that name, with names differing only by case. In windows, I only saw one, because Windows Explorer only supports one file with an identical case-insensitive name per directory. Unsurprisingly, that is also the one that was selected by getSize(filename). The underlying windows functions must ignore case as well and select the same way as Explorer (which makes sense). That explains why Windows Explorer reported the same size as getsize(name) in every case, while DirEntry.size would match for the file with the same case as windows recognized and not for the file with a different case. I was able to get into this state because I copied the files (merged directories) in Linux. It was interesting to look into. It seems everything is working as designed. It shouldn't be an issue for me going forward either as I move more and more towards Linux.

That's hilarious! I'm happy you found it.
Aug 23 2023
On Wednesday, 23 August 2023 at 08:48:26 UTC, FeepingCreature wrote:
That's hilarious! I'm happy you found it.

Me too! Thanks for the support. (PS: I've already reformatted that drive to ext4.)
Aug 24 2023