digitalmars.D - Why?
- Paolo Invernizzi (3/3) Apr 04 https://github.com/dlang/dmd/pull/16348
- Richard (Rikki) Andrew Cattermole (4/9) Apr 04 I can certainly see it being used with shared libraries.
- Paolo Invernizzi (6/13) Apr 04 If you mean shared code for libraries, I don't see how. Isn't dub
- Richard (Rikki) Andrew Cattermole (6/20) Apr 04 No I meant shared libraries.
- Paolo Invernizzi (8/29) Apr 04 Aren't 'di' sources the target solution for library API? What's
- Richard (Rikki) Andrew Cattermole (10/45) Apr 04 You can and probably will still zip them up.
- Paolo Invernizzi (6/28) Apr 04 Package manager should take care of moving files around following
- Hipreme (9/42) Apr 04 The .di feature is a big flop of D. They are nearly useless,
- Paolo Invernizzi (7/39) Apr 04 Let's put aside the implementation of the automatic way to
- Richard (Rikki) Andrew Cattermole (6/9) Apr 04 I intend to see this happen yes.
- Hipreme (9/49) Apr 04 On the case of it being opaque, I could not care less. Yes, .di
- Adam Wilson (4/8) Apr 05 Now ... to convince Walter that loading the whole archive into
- Walter Bright (8/12) Apr 09 There's a switch in the source code to read it all at once or use memory...
- Richard (Rikki) Andrew Cattermole (5/9) Apr 09 That doesn't sound right.
- Richard (Rikki) Andrew Cattermole (12/23) Apr 09 It appears to have been true as of Windows 2000.
- Richard (Rikki) Andrew Cattermole (1/1) Apr 09 https://issues.dlang.org/show_bug.cgi?id=24494
- Walter Bright (3/5) Apr 09 Code generation has changed to be PIC (Position Independent Code) so thi...
- Richard (Rikki) Andrew Cattermole (9/16) Apr 09 From Windows Internals 5 and WinAPI docs, it seems as though it does
- Walter Bright (3/17) Apr 09 Instead of using pointers, use offsets from the beginning of the file.
- ryuukk_ (6/20) Apr 09 Who managed to convince you to spend time working on this?
- Walter Bright (9/10) Apr 09 Nobody. I've wanted to do it for decades, just never got around to it. W...
- Adam Wilson (12/20) Apr 10 Great.
- Paulo Pinto (5/18) Apr 10 Not only we don't care about file access times to JAR/WAR/EAR and
- Hipreme (18/23) Apr 04 Rationale:
- Paolo Invernizzi (3/10) Apr 04 What you can't do with `dmd -I` that you can do with sar?
- Walter Bright (2/3) Apr 09 Unfortunately, some things cannot be benchmarked until they are built.
- Paolo Invernizzi (3/7) Apr 09 Exactly, mine It's a bet ... but hey, I'll be happy to lost it,
- Steven Schveighoffer (21/29) Apr 09 I will also bet that any difference in compile time will be
- Walter Bright (28/45) Apr 09 On my timing on compiling hello world, a 1.412s build becomes 1.375s, 35...
- Steven Schveighoffer (31/70) Apr 10 Yes, the nice thing is knowing you will not have to ask the
- Paulo Pinto (9/17) Apr 10 C++ compilers are already on the next level, past PCH, with C++
- Walter Bright (9/16) Apr 10 That's more or less what my C++ compiler did back in the 1990s. The symb...
- Walter Bright (15/15) Apr 10 We certainly could do more with .sar files, we just have to start somewh...
- Steven Schveighoffer (25/44) Apr 11 No. tar programs would work fine with it. We could indicate they
- Nick Treleaven (11/30) Apr 11 Sounds like a good solution. Users would be able to use e.g. any
- Walter Bright (16/20) Apr 13 We use standard object formats because we don't have a linker. I've spen...
- Steven Schveighoffer (33/57) Apr 14 Exactly, we don't need to be responsible for all the things.
https://github.com/dlang/dmd/pull/16348 *sigh* /P
Apr 04
On 04/04/2024 10:55 PM, Paolo Invernizzi wrote:
> https://github.com/dlang/dmd/pull/16348 *sigh* /P

I can certainly see it being used with shared libraries. However, there will need to be some changes, or at least acknowledgement of CLI args per file, and support for passing it via the import path switch.
Apr 04
On Thursday, 4 April 2024 at 10:18:10 UTC, Richard (Rikki) Andrew Cattermole wrote:
> I can certainly see it being used with shared libraries.

If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for libraries? What's the problem in revamping the way dub downloads / organises / handles source distributions?
Apr 04
On 05/04/2024 12:07 AM, Paolo Invernizzi wrote:
> If you mean shared code for libraries, I don't see how. Isn't dub the *official* tool to use for libraries? What's the problem in revamping the way dub downloads / organises / handles source distributions?

No, I meant shared libraries. Specifically, the distribution of the source files that act as the interface to one. That way you have the .sar file and the .dll, and that's everything you need to use it.
Apr 04
On Thursday, 4 April 2024 at 11:10:04 UTC, Richard (Rikki) Andrew Cattermole wrote:
> No, I meant shared libraries. Specifically, the distribution of the source files that act as the interface to one. That way you have the .sar file and the .dll, and that's everything you need to use it.

Aren't '.di' sources the target solution for a library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need; what's the burden in making a zip with the shared library binary and the '.di' directories?

/P
Apr 04
On 05/04/2024 1:27 AM, Paolo Invernizzi wrote:
> Aren't '.di' sources the target solution for a library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need; what's the burden in making a zip with the shared library binary and the '.di' directories?

You can and probably will still zip them up. Getting the number of files down makes it a bit easier to work with when moving them around or using them.

I may like the idea of it, but not enough to be arguing for it, so my main concern is making sure Walter isn't simplifying it down to a point where we will have issues with it. I.e. I'm recommending the spec be a little more complicated than what the code he has written so far implements (my improved spec): https://gist.github.com/rikkimax/d75bdd1cb9eb9aa7bacb532b69ed398c
Apr 04
On Thursday, 4 April 2024 at 12:33:58 UTC, Richard (Rikki) Andrew Cattermole wrote:
> You can and probably will still zip them up. Getting the number of files down makes it a bit easier to work with when moving them around or using them.
> [...]

The package manager should take care of moving files around following a given configuration, not users. Let the package manager be cool enough to handle the specific problem that SAR files are supposed to solve.
Apr 04
On Thursday, 4 April 2024 at 12:27:00 UTC, Paolo Invernizzi wrote:
> Aren't '.di' sources the target solution for a library API? What's the problem in distributing a zip or tar? In C++ you usually have a specific "include" directory with all you need; what's the burden in making a zip with the shared library binary and the '.di' directories?

The .di feature is a big flop of D. They are nearly useless; their generation is dumb, since no code analysis is done for removing useless imports. It should analyze the code and check whether each imported type is used or not. Also, public imports are always kept. There's no reason nowadays to use automatic .di generation. One must either completely ignore this feature or just create their own .di files.
Apr 04
On Thursday, 4 April 2024 at 13:15:50 UTC, Hipreme wrote:
> The .di feature is a big flop of D. They are nearly useless; their generation is dumb, since no code analysis is done for removing useless imports.
> [...]

Let's put aside the implementation of the automatic way to generate '.di', and rephrase: are .di files intended to be the correct way to expose the _public_ API of an opaque library binary?

What's the problem SAR aims to solve that a package manager can't solve? Why does D need it?
Apr 04
On 05/04/2024 3:11 AM, Paolo Invernizzi wrote:
> Let's put aside the implementation of the automatic way to generate '.di', and rephrase: are .di files intended to be the correct way to expose the _public_ API of an opaque library binary?

I intend to see this happen, yes.

If we don't do this, language features like inference simply make it impossible to link against code that has had it applied. It will also mean shared libraries can hide their internal details, which is something I care about greatly.
Apr 04
On Thursday, 4 April 2024 at 14:11:35 UTC, Paolo Invernizzi wrote:
> Are .di files intended to be the correct way to expose the _public_ API of an opaque library binary?
>
> What's the problem SAR aims to solve that a package manager can't solve? Why does D need it?

On the case of it being opaque, I could not care less. Yes, .di files are the correct way to expose a public API. When they are used correctly, you can reduce the compilation time required by your files by a lot.

SAR solves exactly zero things. It wasn't even put to the test, and Dennis even tested SAR vs `dmd -i` in the thread discussion; `dmd -i` was faster. Walter said that a cold run could be way faster though, since the files won't need to be mem-mapped.
Apr 04
On Thursday, 4 April 2024 at 15:36:31 UTC, Hipreme wrote:
> SAR solves exactly zero things. It wasn't even put to the test, and Dennis even tested SAR vs `dmd -i` in the thread discussion; `dmd -i` was faster. Walter said that a cold run could be way faster though, since the files won't need to be mem-mapped.

Now ... to convince Walter that loading the whole archive into RAM once will be better than mem-mapping... RAM is cheap and source code is not a big memory hit.
Apr 05
On 4/5/2024 5:20 PM, Adam Wilson wrote:
> Now ... to convince Walter that loading the whole archive into RAM once will be better than mem-mapping... RAM is cheap and source code is not a big memory hit.

There's a switch in the source code to read it all at once or use memory mapping, so I could benchmark which is faster. There's no significant difference, probably because the files weren't large enough.

BTW, I recall that executable files are not read into memory and then jumped to. They are memory-mapped files; this is so the executable can start up much faster. Pieces of the executable are loaded in on demand, although the OS will speculatively load in pieces, too.
Apr 09
On 10/04/2024 5:18 AM, Walter Bright wrote:
> BTW, I recall that executable files are not read into memory and then jumped to. They are memory-mapped files; this is so the executable can start up much faster. Pieces of the executable are loaded in on demand, although the OS will speculatively load in pieces, too.

That doesn't sound right.

Address randomization, and Windows' remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.
Apr 09
On 10/04/2024 5:22 AM, Richard (Rikki) Andrew Cattermole wrote:
> That doesn't sound right.
>
> Address randomization, and Windows' remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.

It appears to have been true as of Windows 2000.

https://learn.microsoft.com/en-us/archive/msdn-magazine/2002/march/windows-2000-loader-what-goes-on-inside-windows-2000-solving-the-mysteries-of-the-loader

See: LdrpMapDll

However, I don't think those two features existed at the time.

As of Windows Internals 5, the cache manager uses 256kb blocks as part of memory mapping (very useful information, that!). It would be worth double-checking that this is the default for std.mmap.

So it seems I'm half right: there is no way Windows could be memory-mapping binaries when address randomization is turned on for a given block that has rewrites for symbol locations, but it may be memory-mapping large blocks of data if it doesn't.
Apr 09
On 4/9/2024 10:22 AM, Richard (Rikki) Andrew Cattermole wrote:
> Address randomization, and Windows' remapping of symbols at runtime (with state that is kept around so you can do it later), all suggest it isn't like that now.

Code generation has changed to be PIC (Position Independent Code), so this is workable.
Apr 09
On 10/04/2024 7:04 AM, Walter Bright wrote:
> Code generation has changed to be PIC (Position Independent Code), so this is workable.

From Windows Internals 5 and the WinAPI docs, it seems as though it does memory-map initially. But the patching will activate CoW, so in effect it isn't memory-mapped if you need to patch.

Which is quite useful information for me while working with Unicode tables. If you need to patch? That won't be shared. If you don't need to patch? Who cares how much ROM is used! Don't be afraid to use 256kb in a single table. Just don't use pointers, no matter what, and it'll be shared.
Apr 09
On 4/9/2024 12:11 PM, Richard (Rikki) Andrew Cattermole wrote:
> From Windows Internals 5 and the WinAPI docs, it seems as though it does memory-map initially. But the patching will activate CoW, so in effect it isn't memory-mapped if you need to patch.

Right, so the executable is designed to not need patching.

> If you don't need to patch? Who cares how much ROM is used! Don't be afraid to use 256kb in a single table. Just don't use pointers, no matter what, and it'll be shared.

Instead of using pointers, use offsets from the beginning of the file.
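To illustrate the offsets-instead-of-pointers layout being described, here is a minimal D sketch (not from the thread; the `Node` layout and names are made up for illustration). Because nothing stored in the blob is an address, it needs no relocation at load time and can sit in read-only, shareable memory:

```d
import std.stdio;

struct Node
{
    uint value;
    uint next; // byte offset of the next node from the start of the blob; 0 = end
}

// Follow the chain using offsets only -- nothing in the data is a pointer.
void walk(const(ubyte)[] blob, uint offset)
{
    for (;;)
    {
        auto node = cast(const(Node)*)(blob.ptr + offset);
        writeln(node.value);
        if (node.next == 0)
            break;
        offset = node.next;
    }
}

void main()
{
    // Lay out two nodes back to back; the first links to the second by offset.
    Node[2] nodes = [Node(10, cast(uint) Node.sizeof), Node(20, 0)];
    auto blob = (cast(ubyte*) nodes.ptr)[0 .. nodes.sizeof];
    walk(blob, 0);
}
```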
Apr 09
On Tuesday, 9 April 2024 at 17:18:02 UTC, Walter Bright wrote:
> There's a switch in the source code to read it all at once or use memory mapping, so I could benchmark which is faster. There's no significant difference, probably because the files weren't large enough.
> [...]

Who managed to convince you to spend time working on this?

The solution to poor Phobos build times is to stop abusing templates for code that doesn't exist, for problems that nobody has: https://github.com/dlang/phobos/tree/master/phobos/sys
Apr 09
On 4/9/2024 10:58 AM, ryuukk_ wrote:
> Who managed to convince you to spend time working on this?

Nobody. I've wanted to do it for decades, just never got around to it. What triggered it was my proposal to Adam to split Phobos modules into a much more granular structure, which would increase the number of files in it by a factor of 5 or more. (The current structure has each module being a grab bag of marginally related functions.) A more granular structure would hopefully reduce the "every module imports every other module" problem Phobos has. But lots more modules increases aggregate file lookup times.

One cannot really tell how well it works without trying it.
Apr 09
On Tuesday, 9 April 2024 at 19:11:28 UTC, Walter Bright wrote:
> Nobody. I've wanted to do it for decades, just never got around to it. What triggered it was my proposal to Adam to split Phobos modules into a much more granular structure, which would increase the number of files in it by a factor of 5 or more.
> [...]

Great. Now everybody is going to think that I started this. For the record, I did **not** start this. Walter sent me this idea out of the blue after I pointed out that working with hundreds (or thousands) of files in Phobos was going [...]. This wasn't the problem I was thinking of, because frankly, nobody [...] certain advantages from a distribution standpoint. Although honestly, we're going to end up unpacking the files for other tools to use anyways.
Apr 10
On Wednesday, 10 April 2024 at 10:17:53 UTC, Adam Wilson wrote:
> Great. Now everybody is going to think that I started this. For the record, I did **not** start this.
> [...]

Not only do we not care about file access times to JAR/WAR/EAR files and DLLs, we happily ship binary libraries instead of parsing source code all the time.

This looks to me like yet another distraction.
Apr 10
On Thursday, 4 April 2024 at 09:55:32 UTC, Paolo Invernizzi wrote:
> https://github.com/dlang/dmd/pull/16348 *sigh* /P

Rationale: https://github.com/dlang/dmd/blob/8e94bc644fc72dc3f72a00791eb52b40230ceb26/changelog/dmd.source-archive.dd#L79

The part I'm very interested in is the compilation time. This may reduce by a lot the compilation time required by dub libraries. But there would be a requirement of doing a synchronized change between all the compilers and our existing tools :)

For development time, this feature might be useless. I also will need to start thinking now about how to support those.

> 2. To compile all the source files at once with DMD, the command line can get extremely long, and certainly unwieldy. With .sar files, you may not even need a makefile or builder, just:

This is actually a like. It is possible to do that with `dmd -i -i=std` or something like that. The main feature one gains from makefiles or builders isn't declaring the files you're using, it is defining the version configuration.

Also, I'll be renaming the thread, since its current name doesn't open up for any discussion.
Apr 04
On Thursday, 4 April 2024 at 10:48:50 UTC, Hipreme wrote:
> Rationale: https://github.com/dlang/dmd/blob/8e94bc644fc72dc3f72a00791eb52b40230ceb26/changelog/dmd.source-archive.dd#L79
> [...]

My 2 cents: there will be NO advantage in compilation time.

What can't you do with `dmd -I` that you can do with sar?
Apr 04
On 4/4/2024 4:12 AM, Paolo Invernizzi wrote:
> My 2 cents: there will be NO advantage in compilation time.

Unfortunately, some things cannot be benchmarked until they are built.
Apr 09
On Tuesday, 9 April 2024 at 17:19:41 UTC, Walter Bright wrote:
> Unfortunately, some things cannot be benchmarked until they are built.

Exactly, mine is a bet ... but hey, I'll be happy to lose it, of course!
Apr 09
On Tuesday, 9 April 2024 at 18:49:21 UTC, Paolo Invernizzi wrote:
> Exactly, mine is a bet ... but hey, I'll be happy to lose it, of course!

I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.

I did reduce stats semi-recently for DMD, and while it saved a significant percentage of stats, I don't really think it saved insane amounts of time. It was more of an "oh, I thought of a better way to do this". I think at the time there was some resistance to adding more stats to the compiler due to the same misguided optimization beliefs, and so I started looking at it. If reducing stats by 90% wasn't significant, reducing them again likely isn't going to be noticed. See https://github.com/dlang/dmd/pull/14582

The only benefit I might see in this is to *manage* the source as one item. But I don't really know that we need a new custom format. `tar` is pretty simple. ARSD has a tar implementation that I lifted for my raylib-d installer, which allows reading tar files with about [100 lines of code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).

-Steve
Apr 09
On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
> I will also bet that any difference in compile time will be extremely insignificant. I don't bet against decades of filesystem read optimizations. Saving e.g. microseconds on a 1.5 second build isn't going to move the needle.

On my timing of compiling hello world, a 1.412s build becomes 1.375s, 35 milliseconds faster. Most of the savings appear to come from the fact that when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.

> I did reduce stats semi-recently for DMD, and while it saved a significant percentage of stats, I don't really think it saved insane amounts of time.
> [...]
> See https://github.com/dlang/dmd/pull/14582

Nice. I extended it so files in an archive are tracked.

> The only benefit I might see in this is to *manage* the source as one item.

The convenience of being able to distribute a "header only" library as one file may be significant.

I've always liked things that didn't need an installation program. An install should be "copy the file onto your system" and an uninstall should be "delete the file"!

Back in the days of CD software, my compiler was set up so no install was necessary: just put the CD in the drive and run it. You didn't even have to set the environment variables, as the compiler would look for its files relative to where the executable file was (argv[0]). You can see vestiges of that still in today's dmd. Of course, to get it to run faster you'd XCOPY it onto the hard drive. Though some users were flummoxed by the absence of INSTALL.EXE and I'd have to explain how to use XCOPY.

> But I don't really know that we need a new custom format. `tar` is pretty simple. ARSD has a tar implementation that I lifted for my raylib-d installer, which allows reading tar files with about [100 lines of code](https://github.com/schveiguy/raylib-d/blob/9906279494f1f83b2c4c9550779d46962af7c342/install/source/app.d#L22-L132).

Thanks for the code.

A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems, where data is simply appended). The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters.

Sar files have a table of contents at the beginning, and unlimited filespec sizes.

P.S. The code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.
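For illustration, a minimal sketch of what reading such a front-loaded table of contents could look like in D. The field layout and helper names here are assumptions made up for this example, not the actual .sar format from the PR:

```d
import std.bitmanip : littleEndianToNative;

struct Member
{
    string name;   // full module file name; no length limit, unlike tar
    ulong offset;  // where this member's source text starts in the archive
    ulong length;  // size of the source text in bytes
}

private uint readU32(const(ubyte)[] data, ref size_t pos)
{
    ubyte[4] buf = data[pos .. pos + 4];
    pos += 4;
    return littleEndianToNative!uint(buf);
}

private ulong readU64(const(ubyte)[] data, ref size_t pos)
{
    ubyte[8] buf = data[pos .. pos + 8];
    pos += 8;
    return littleEndianToNative!ulong(buf);
}

// Parse only the table of contents; member contents are sliced out lazily,
// on demand, when a module is actually imported.
Member[] readToc(const(ubyte)[] data)
{
    size_t pos = 4;                   // skip an assumed 4-byte magic number
    auto count = readU32(data, pos);  // number of members in the archive
    auto members = new Member[count];
    foreach (ref m; members)
    {
        auto nameLen = readU32(data, pos);
        m.name = (cast(const(char)[]) data[pos .. pos + nameLen]).idup;
        pos += nameLen;
        m.offset = readU64(data, pos);
        m.length = readU64(data, pos);
    }
    return members;
}
```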
Apr 09
On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
> On my timing of compiling hello world, a 1.412s build becomes 1.375s, 35 milliseconds faster. Most of the savings appear to come from the fact that when the archive is first accessed, its table of contents is loaded into the path cache and file cache that you developed. Then, no stats are done on the filesystem.

Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient.

> The convenience of being able to distribute a "header only" library as one file may be significant.
> [...]

Consider that Java archives (`.jar` files) are distributed as a package instead of individual `.class` files. And Microsoft (and other C compilers) can produce "pre-compiled headers", which take away some of the initial steps of compilation.

I think there would be enthusiastic support for D archive files that reduce some of the compilation steps, or provide extra features (e.g. predetermined inference or matching compile-time switches). Especially if you aren't going to directly edit these archive files, and you will be mechanically generating them, why not do more inside there?

> A tar file is serial, meaning one has to read the entire file to see what is in it (because it was designed for tape systems, where data is simply appended).

You can index a tar file easily. Each file is preceded by a header with information about the file (including size). So you can determine the catalog by seeking to each header.

Note also that we can work with tar files to add indexes that are backwards compatible with existing tools. Remember, we are generating this *from a tool that we control*. Prepending an index "file" is trivial.

> The tar file doesn't have a table of contents, the filename is limited to 100 characters, and the path is limited to 155 characters.

I'm not too worried about such things. I've never run into filename length problems with tar. But also, most modern tar formats do not have these limitations: https://www.gnu.org/software/tar/manual/html_section/Formats.html

> Sar files have a table of contents at the beginning, and unlimited filespec sizes.
>
> P.S. The code that actually reads the .sar file is about 20 lines! (Excluding checking for corrupt files, and the header structure definition.) The archive reader and writer can be encapsulated in a separate module, so anyone can replace it with a different format.

I would suggest we replace it with a modern tar format for maximum compatibility with existing tools. We already have seen the drawbacks of using the abandoned `sdl` format for dub packages. We should not repeat that mistake.

-Steve
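As a sketch of the catalog-by-seeking approach described above (assuming the classic 512-byte ustar header layout; a real reader would also handle long-name extensions, checksums, and so on):

```d
import std.algorithm.searching : countUntil;

struct Entry
{
    string name;
    ulong size;
    ulong offset; // where this member's data starts within the archive
}

// Walk the archive header by header: record each name and size, then skip
// over the data (rounded up to 512-byte blocks) to reach the next header.
Entry[] catalogTar(const(ubyte)[] data)
{
    Entry[] entries;
    size_t pos = 0;
    while (pos + 512 <= data.length)
    {
        auto header = data[pos .. pos + 512];
        if (header[0] == 0) // an all-zero block marks the end of the archive
            break;

        // Name: NUL-terminated string in the first 100 bytes of the header.
        auto nameField = cast(const(char)[]) header[0 .. 100];
        auto nul = nameField.countUntil('\0');
        auto name = (nul < 0 ? nameField : nameField[0 .. nul]).idup;

        // Size: octal ASCII digits starting at offset 124.
        size_t size = 0;
        foreach (c; cast(const(char)[]) header[124 .. 136])
        {
            if (c < '0' || c > '7')
                break;
            size = size * 8 + (c - '0');
        }

        entries ~= Entry(name, size, pos + 512);

        // Header block plus data padded to a multiple of 512 bytes.
        pos += 512 + ((size + 511) / 512) * 512;
    }
    return entries;
}
```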
Apr 10
On Wednesday, 10 April 2024 at 16:42:53 UTC, Steven Schveighoffer wrote:
> Yes, the nice thing is knowing you will not have to ask the filesystem for something you know doesn't exist. Pre-loading the directory structure could do the same thing, but I think that's definitely not as efficient.
> [...]

C++ compilers are already on the next level, past PCH, with C++ modules.

VC++ uses a database format for the BMI (Binary Module Interface), has open sourced it, and there are some people trying to champion it as a means to have C++ tooling similar to what Java and .NET IDEs can do with JVM/CLR metadata.

https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/
Apr 10
On 4/10/2024 9:54 AM, Paulo Pinto wrote:
> C++ compilers are already on the next level, past PCH, with C++ modules.
>
> VC++ uses a database format for the BMI (Binary Module Interface), has open sourced it, and there are some people trying to champion it as a means to have C++ tooling similar to what Java and .NET IDEs can do with JVM/CLR metadata.
>
> https://devblogs.microsoft.com/cppblog/open-sourcing-ifc-sdk-for-cpp-modules/

That's more or less what my C++ compiler did back in the 1990s. The symbol table and AST were created in a memory-mapped file, which could be read back in to jump-start the next compilation. Yes, it was faster.

But the problem C++ has is that compiling it is inherently slow due to the design of the language. My experience with that led to D being fast to compile, because I knew what to get rid of.

With a language that compiles fast, it isn't worthwhile to have a binary precompiled module.
Apr 10
We certainly could do more with .sar files, we just have to start somewhere.

If we're going to add features to a .tar file, like an index, aren't we then creating our own format, and won't we be unable to use existing .tar programs?

Yes, one can skip through a .tar archive, indexing as one goes. The problem is one winds up reading the whole .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in unless actually needed.

.tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)

Many archive formats also include optional compression, and various compression methods at that. All that support would have to be added to the compiler, as otherwise I'll get the bug reports "dmd failed with my .zip file!"

Still, the concept of presenting things as a single file is completely distinct from the file format used. The archive format being pluggable is certainly an option.
Apr 10
On Thursday, 11 April 2024 at 01:36:57 UTC, Walter Bright wrote:
> We certainly could do more with .sar files, we just have to start somewhere.
>
> If we're going to add features to a .tar file, like an index, aren't we then creating our own format, and won't we be unable to use existing .tar programs?

No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things; it is in the spec.

> Yes, one can skip through a .tar archive, indexing as one goes. The problem is one winds up reading the whole .tar archive. With the .sar format, the index is at the beginning and none of the rest of the file is read in unless actually needed.
>
> .tar is the only archive format I'm aware of that does not have an index section, and that's because it's designed for append-only magtapes. (Talk about ancient obsolete technology!)

This would be a fallback, for when an index isn't provided as the first file. So normal tar source files could be supported.

> Many archive formats also include optional compression, and various compression methods at that. All that support would have to be added to the compiler, as otherwise I'll get the bug reports "dmd failed with my .zip file!"

The tar format doesn't have compression, though the tar executable supports it. I wouldn't recommend zip files as a supported archive format, and using compressed tarballs would definitely result in reading the whole file (you can't skip N bytes when you don't know the compressed size).

> Still, the concept of presenting things as a single file is completely distinct from the file format used. The archive format being pluggable is certainly an option.

I stress again, we should not introduce esoteric formats that are mostly equivalent to existing formats without a good reason. The first option should be to use existing formats, seeing if we can fit our use case into them. If that is impossible or prevents certain features, then we can consider using a new format.

It should be a high bar to add new file formats to the toolchain, as this affects all the tools that people depend on and use. Think of why we use standard object formats instead of our own format (which would allow much tighter integration with the language).

-Steve
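To make the "index as an ordinary first member" idea concrete, a rough sketch (purely illustrative, not from the thread): since tar member offsets are fully determined by the 512-byte block layout, the generating tool can compute an index up front and emit it as a normal first file, here assuming a fixed space reservation for the index member itself:

```d
import std.array : appender;
import std.format : formattedWrite;

struct Source
{
    string name;
    const(char)[] text;
}

// Each tar member occupies one 512-byte header block plus its data rounded
// up to a whole number of 512-byte blocks.
ulong memberBlocks(ulong dataSize)
{
    return 1 + (dataSize + 511) / 512;
}

// Reserve a fixed amount of space for the index member itself (header plus
// padded data), so the offsets it records don't depend on the index's length.
enum indexReservedBytes = 8 * 512;

// Build the index text: one "<data offset> <size> <name>" line per source file.
string buildIndex(const Source[] sources)
{
    auto app = appender!string();
    ulong offset = indexReservedBytes; // header of the first real member
    foreach (src; sources)
    {
        app.formattedWrite("%s %s %s\n", offset + 512, src.text.length, src.name);
        offset += memberBlocks(src.text.length) * 512;
    }
    return app.data;
}
```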
Apr 11
On Thursday, 11 April 2024 at 15:28:34 UTC, Steven Schveighoffer wrote:
> No. tar programs would work fine with it. We could indicate they are normal files, and normal tar programs would just extract an "index" file when expanding, or we could indicate they are vendor-specific extensions, which should be ignored or processed as normal files by other tar programs. We are not the first ones to think of these things; it is in the spec.

Sounds like a good solution. Users would be able to use e.g. any GUI program that supports tar to extract a file from the archive. The advantage is for reading; D-specific tools should be used to write the file.

If there is any concern about this, it could even have a different extension, so long as the file format is standard tar - users who know this can still benefit from tar readers. There seems to be precedent for this - apparently .jar files are .zip files.

> This would be a fallback, for when an index isn't provided as the first file. So normal tar source files could be supported.

Or just error if a tar file doesn't have the expected index file.
Apr 11
On 4/11/2024 8:28 AM, Steven Schveighoffer wrote:
> Think of why we use standard object formats instead of our own format (which would allow much tighter integration with the language).

We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.

I mentioned that the archive support can be pluggable. It's only two functions with a generic interface to them. If we aren't going to move forward with source archives, it would be a giant waste of time to learn .tar and all its variations.

I chose to invent the .sar format because it's 20 lines of code to read them, and about the same to write them. Even doing a survey of the top 10 archive formats would have taken more time than the entire PR, let alone the time spent debating them.

The source archive PR is a proof of concept. The actual archive format is irrelevant.

> or we could indicate they are vendor-specific extensions

Wouldn't that defeat the purpose of being a .tar format?

> It should be a high bar to add new file formats to the toolchain, as this affects all the tools that people depend on and use.

Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.
Apr 13
On Sunday, 14 April 2024 at 06:04:02 UTC, Walter Bright wrote:
> We use standard object formats because we don't have a linker. I've spent a lot of time trying to understand their byzantine structure. It's not fun work.

Exactly, we don't need to be responsible for all the things. Using standard object formats means we don't have to write our own linker.

> I mentioned that the archive support can be pluggable. It's only two functions with a generic interface to them. If we aren't going to move forward with source archives, it would be a giant waste of time to learn .tar and all its variations.

Fair point. If this doesn't fly, then learning all the variations of tar might not be applicable (though I can say I personally "learned" tar in about 15 minutes, it's really simple).

> I chose to invent the .sar format because it's 20 lines of code to read them, and about the same to write them. Even doing a survey of the top 10 archive formats would have taken more time than the entire PR, let alone the time spent debating them.

This misses the point. It's not that it's easy to add to the compiler. Both are easy, both are straightforward; one might be easier than the other, but it's probably a wash (maybe 2 hours vs 4 hours?). The problem is *all the other tools* that people might want to use. And specifically, I'm talking about IDEs. You have a 20-line solution in D; how does that help an IDE written in Java? However, Java has `tar` support that is tried and tested, and probably already in the IDE codebase itself.

Writing 20 lines of code isn't "mission accomplished". We now have to ask all IDE providers to support this for symbol lookup. That's what I'm talking about.

> The source archive PR is a proof of concept. The actual archive format is irrelevant.

This is good, and I understand what you are trying to say. As long as it remains a PoC, with the expectation that if it turns out to be useful we address these ecosystem issues, then I have no objections.

> > or we could indicate they are vendor-specific extensions
>
> Wouldn't that defeat the purpose of being a .tar format?

No, vendor-specific sections are in the spec. Existing tar programs would still read these just fine. But even if we wanted to avoid that, adding an index can be done by including a specific filename that the D compiler recognizes as the index.

> > It should be a high bar to add new file formats to the toolchain, as this affects all the tools that people depend on and use.
>
> Using a .tar format would affect all the dlang source code tools just as much as using the .sar format would.

Yes, of course. It's just: will there be a ready-made library available for whatever language/libraries the IDEs are using? With .sar, the answer is no (it hasn't been invented yet). With .tar, it's likely yes.

-Steve
Apr 14