
digitalmars.D.learn - Why GNU coreutils/dd is creating a dummy file more efficiently than

BoQsc <vaidas.boqsc gmail.com> writes:
This D code creates a dummy 47.6 MB text file filled with NUL
characters in about 9 seconds:

import std.stdio, std.process;

void main() {
    writeln("Creating a dummy file");
    File file = File("test.txt", "w");

    for (int i = 0; i < 50000000; i++)
    {
        file.write("\x00");
    }
    file.close();
}


Meanwhile, GNU coreutils dd can create a 500 MB dummy NUL file in a
second.
https://github.com/coreutils/coreutils/blob/master/src/dd.c

What are the explanations for this?
May 23 2019
Cym13 <cpicard purrfect.fr> writes:
On Thursday, 23 May 2019 at 09:09:05 UTC, BoQsc wrote:
 This code of D creates a dummy 47,6 MB text file filled with 
 Nul characters in about 9 seconds

 import std.stdio, std.process;

 void main() {

 	writeln("Creating a dummy file");
 	File file = File("test.txt", "w");

    for (int i = 0; i < 50000000; i++)
 	{
 		file.write("\x00");
 	}
    file.close();

 }


 While GNU coreutils dd can create 500mb dummy Nul file in a 
 second.
 https://github.com/coreutils/coreutils/blob/master/src/dd.c

 What are the explanations for this?
If you're benchmarking, it's important to provide both the source code and how you compile and run it. In this case, though, I think I can already point you in the right direction. I'll assume you used something like this:

    dd if=/dev/zero of=testfile bs=1M count=500

Note the block size argument in particular. I set it to 1M, but the default is 512 bytes. If you run the command above under strace, you'll see a series of write() calls, each writing 1M of null bytes to testfile. That's the main difference between your code and what dd does: dd doesn't write 1 byte at a time. This results in far fewer system calls, and system calls are very expensive.

To go fast, read/write bigger chunks.

I may be wrong, though. Maybe you tested with a bs of 1 byte, so test for yourself and, if necessary, provide all the information rather than just pieces, so that we can reproduce your test :)
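The "bigger chunks" advice can be sketched in D roughly as follows. This is an illustration rather than code from the thread: `writeZeros` and its parameters are hypothetical names, and the 1 MiB block size simply mirrors the `bs=1M` used in the dd command above.

```d
import std.stdio : File, writeln;

/// Hypothetical helper: writes `total` NUL bytes to `path`,
/// passing at most `blockSize` bytes to each rawWrite call.
void writeZeros(string path, size_t total, size_t blockSize)
{
    auto block = new ubyte[blockSize];   // ubyte elements default to 0
    auto file = File(path, "wb");

    size_t remaining = total;
    while (remaining > 0)
    {
        // The last block may be shorter than blockSize.
        immutable n = remaining < blockSize ? remaining : blockSize;
        file.rawWrite(block[0 .. n]);
        remaining -= n;
    }
    file.close();
}

void main()
{
    writeZeros("test.txt", 50_000_000, 1024 * 1024);
    writeln(File("test.txt", "rb").size);   // size in bytes of the result
}
```

Changing the third argument makes it easy to compare block sizes on your own machine, for example under strace as suggested above.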
May 23 2019
kdevel <kdevel vogtner.de> writes:
On Thursday, 23 May 2019 at 09:44:15 UTC, Cym13 wrote:

[...]

 Note in particular the blocksize argument. I set it to 1M but 
 by default it's 512 bytes. If you use strace with the command 
 above you'll see a series of write() calls, each writing 1M of 
 null bytes to testfile. That's the main difference between your 
 code and what dd does: it doesn't write 1 byte at a time.
His code doesn't write 1 byte at a time either. strace on my machine reports a blocksize of 4096. If I use this blocksize with dd it still takes only a fraction of a second to complete.
 This results in way less system calls and system calls are very 
 expensive.
His program and dd with bs=4K both have the same number of syscalls.
 To go fast, read/write bigger chunks.
Or use rawWrite instead of write (reduces the runtime to about 1.6 s). When using write, the time is IMHO spent in Unicode processing and/or locking. Or write more characters at a time. The code below takes 60 ms to complete.

y.d
```
import std.stdio, std.process;

void main()
{
   writeln("Creating a dummy file");
   File file = File("test.txt", "w");
   ubyte [4096] nuls;
   // Integer division: this writes 12_207 blocks (49_999_872 bytes),
   // so the last 128 of the 50_000_000 bytes are not written.
   for (int i = 0; i < 50_000_000 / nuls.sizeof; ++i)
      file.write(cast (char[nuls.sizeof]) nuls);
   file.close();
}
```
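To check the write versus rawWrite difference on your own machine, a small timing harness along these lines may help. It is a sketch, not the code that produced the numbers quoted above: `elapsedMsecs`, the output file names, and the reduced count of 5_000_000 are all assumptions made for illustration, and the results will vary with compiler, flags, and platform.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : File, writefln;

/// Hypothetical helper: runs `work` once and returns the elapsed
/// wall-clock time in milliseconds.
long elapsedMsecs(void delegate() work)
{
    auto sw = StopWatch(AutoStart.yes);
    work();
    return sw.peek.total!"msecs";
}

void main()
{
    enum count = 5_000_000;          // smaller than the thread's 50_000_000
    immutable ubyte[1] nul = [0];

    auto viaWrite = elapsedMsecs({
        auto f = File("write.bin", "w");
        foreach (i; 0 .. count)
            f.write("\x00");         // text-oriented output, one char per call
    });

    auto viaRawWrite = elapsedMsecs({
        auto f = File("rawwrite.bin", "w");
        foreach (i; 0 .. count)
            f.rawWrite(nul[]);       // raw bytes, one byte per call
    });

    writefln("write: %s ms, rawWrite: %s ms", viaWrite, viaRawWrite);
}
```

Both loops still make one library call per byte, so any gap between the two numbers reflects per-call overhead inside the library rather than a difference in the number of system calls.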
May 23 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 23, 2019 at 06:20:23PM +0000, kdevel via Digitalmars-d-learn wrote:
 On Thursday, 23 May 2019 at 09:44:15 UTC, Cym13 wrote:
[...]
 To go fast, read/write bigger chunks.
Or use rawWrite instead of write (reduces the runtime to about 1.6 s). When using write time is IMHO spent in unicode processing and/or locking. Or write more characters at a time. The code below takes 60 ms to complete.
If you're on Linux, writing a bunch of zeroes just to create a large file is a waste of time. Just use the kernel's sparse file feature:

https://www.systutorials.com/136652/handling-sparse-files-on-linux/

The blocks won't actually get allocated until you write something to them, so this beats any write-based method of creating a file filled with zeroes -- probably by several orders of magnitude. :-P

T

-- 
It is not the employer who pays the wages. Employers only handle the money. It is the customer who pays the wages. -- Henry Ford
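One common way to get a sparse file from D is to seek past the end of an empty file and write a single byte. The sketch below assumes that approach; `createSparse` is a hypothetical name, and whether the gap is actually left unallocated depends on the filesystem (on one without sparse-file support, the gap is typically filled with real zero blocks).

```d
import std.stdio : File, writeln;

/// Hypothetical helper: creates `path` with a logical size of `size`
/// bytes by seeking to the last byte position and writing one NUL there.
void createSparse(string path, long size)
{
    assert(size > 0);
    auto file = File(path, "wb");
    file.seek(size - 1);             // origin defaults to SEEK_SET
    immutable ubyte[1] last = [0];
    file.rawWrite(last[]);
    file.close();
}

void main()
{
    createSparse("sparse.bin", 500L * 1024 * 1024);
    writeln(File("sparse.bin", "rb").size);   // logical size in bytes
}
```

On Linux, comparing `ls -l sparse.bin` with `du sparse.bin` shows the difference between the logical size and the space actually allocated.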
May 23 2019
Daniel Kozák <kozzi11 gmail.com> writes:
On Thursday, 23 May 2019 at 18:37:17 UTC, H. S. Teoh wrote:
 On Thu, May 23, 2019 at 06:20:23PM +0000, kdevel via 
 Digitalmars-d-learn wrote:
 On Thursday, 23 May 2019 at 09:44:15 UTC, Cym13 wrote:
[...]
 To go fast, read/write bigger chunks.
Or use rawWrite instead of write (reduces the runtime to about 1.6 s). When using write time is IMHO spent in unicode processing and/or locking. Or write more characters at a time. The code below takes 60 ms to complete.
If you're on Linux, writing a bunch of zeroes just to create a large file is a waste of time. Just use the kernel's sparse file feature: https://www.systutorials.com/136652/handling-sparse-files-on-linux/ The blocks won't actually get allocated until you write something to them, so this beats any write-based method of creating a file filled with zeroes -- probably by several orders of magnitude. :-P T
Yes, using sparse files is good, but only for this case. If you need to write something other than null bytes, it is not so useful. And AFAIK not all filesystems support sparse files anyway.
May 23 2019
Daniel Kozak <kozzi11 gmail.com> writes:
On Thu, May 23, 2019 at 11:10 AM BoQsc via Digitalmars-d-learn <
digitalmars-d-learn puremagic.com> wrote:

 This code of D creates a dummy 47,6 MB text file filled with Nul
 characters in about 9 seconds

 import std.stdio, std.process;

 void main() {

         writeln("Creating a dummy file");
         File file = File("test.txt", "w");

     for (int i = 0; i < 50000000; i++)
         {
                 file.write("\x00");
         }
     file.close();

 }


 While GNU coreutils dd can create 500mb dummy Nul file in a
 second.
 https://github.com/coreutils/coreutils/blob/master/src/dd.c

 What are the explanations for this?
https://matthias-endler.de/2017/yes/
May 23 2019
Daniel Kozak <kozzi11 gmail.com> writes:
On Thu, May 23, 2019 at 11:19 PM Daniel Kozak <kozzi11 gmail.com> wrote:

Fixed version without decode to dchar

void main()
{
    import std.range : array, cycle, take;
    import std.stdio;
    import std.utf;
    immutable buf_size = 8192;
    immutable buf = "\x00".byCodeUnit.cycle.take(buf_size).array;
    auto cnt = 50_000_000 / buf_size;
    immutable tail = "\x00".byCodeUnit.cycle.take(50_000_000 % buf_size).array;
    File file = File("test.txt", "w");
    while(cnt--)
        file.rawWrite(buf);
    file.rawWrite(tail);
}
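
The "decode to dchar" remark refers to D's auto-decoding of strings in range algorithms: a plain string iterated as a range yields 4-byte dchar elements, while byCodeUnit yields 1-byte char elements. The following sketch, which is not from the thread, prints the element sizes of both variants so the effect on the buffer handed to rawWrite is visible.

```d
import std.range : array, cycle, take;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main()
{
    // Plain string: range primitives decode to dchar.
    auto decoded = "\x00".cycle.take(8).array;

    // byCodeUnit: elements stay as individual char code units.
    auto units = "\x00".byCodeUnit.cycle.take(8).array;

    writeln(typeof(decoded[0]).sizeof);   // bytes per element, decoded variant
    writeln(typeof(units[0]).sizeof);     // bytes per element, code-unit variant
}
```

Since rawWrite writes the array's bytes as they are, a buffer of 8192 dchar elements would put four times as many bytes on disk as a buffer of 8192 char elements.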
May 23 2019
Daniel Kozak <kozzi11 gmail.com> writes:
On Thu, May 23, 2019 at 11:06 PM Daniel Kozak <kozzi11 gmail.com> wrote:

 On Thu, May 23, 2019 at 11:10 AM BoQsc via Digitalmars-d-learn <
 digitalmars-d-learn puremagic.com> wrote:
 https://matthias-endler.de/2017/yes/
So this should do it:

void main()
{
    import std.range : array, cycle, take;
    import std.stdio;
    immutable buf_size = 8192;
    immutable buf = "\x00".cycle.take(buf_size).array;
    auto cnt = 50_000_000 / buf_size;
    immutable tail = "\x00".cycle.take(50_000_000 % buf_size).array;
    File file = File("test.txt", "w");
    while(cnt--)
        file.rawWrite(buf);
    file.rawWrite(tail);
}
May 23 2019