digitalmars.D.learn - mmap file performance
- Andy Valencia (94/94) Apr 10 I wrote a "count newlines" based on mapped files. It used about
- Steven Schveighoffer (6/13) Apr 11 For a repeatable comparison, you should provide the code which
- Andy Valencia (43/45) Apr 11 With pleasure:
- kdevel (13/16) Apr 17 ^^
- Patrick Schluter (9/17) Apr 15 The setup of a memory mapped file is relatively costly. For
- Andy Valencia (10/13) Apr 15 Interestingly, this performance deficit is present even when run
- Patrick Schluter (12/26) Apr 24 Indeed, my statement concerning file size is misleading. It's the
I wrote a "count newlines" based on mapped files. It used about twice the CPU of the version which just read 1 meg at a time. I thought something was amiss (needless slice indirection or something), so I wrote the code in C. It had the same CPU usage as the D version. So...mapped files, not so much. Not D's fault. And writing it in C made me realize how much easier it is to code in D!

The D version:

import std.stdio : writeln;
import std.mmfile : MmFile;

const uint CHUNKSZ = 65536;

size_t countnl(ref shared char[] data)
{
    size_t res = 0;
    foreach (c; data) {
        if (c == '\n') {
            res += 1;
        }
    }
    return res;
}

void usage(in string progname)
{
    import core.stdc.stdlib : exit;
    import std.stdio : stderr;
    stderr.writeln("Usage is: ", progname, " <file> ...");
    exit(1);
}

public:

void main(string[] argv)
{
    if (argv.length < 2) {
        usage(argv[0]);
    }
    foreach (mn; argv[1 .. $]) {
        auto mf = new MmFile(mn);
        auto data = cast(shared char[])mf.opSlice();
        size_t res;

        res = countnl(data);
        writeln(mn, ": ", res);
    }
}

And the C one (no performance gain over D):

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

static unsigned long countnl(int fd, char *nm)
{
    char *buf, *p;
    struct stat st;
    size_t cnt;
    unsigned long res;

    if (fstat(fd, &st) < 0) {
        perror(nm);
        return(0);
    }
    cnt = st.st_size;
    buf = mmap(0, cnt, PROT_READ, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror(nm);
        return(0);
    }
    res = 0L;
    for (p = buf; cnt; cnt -= 1) {
        if (*p++ == '\n') {
            res += 1L;
        }
    }
    munmap(buf, st.st_size);
    return(res);
}

int main(int argc, char **argv)
{
    int x;

    for (x = 1; x < argc; ++x) {
        unsigned long res;
        char *nm = argv[x];
        int fd = open(nm, O_RDONLY);

        if (fd < 0) {
            perror(nm);
            continue;
        }
        res = countnl(fd, nm);
        close(fd);
        printf("%s: %lu\n", nm, res);
    }
    return 0;
}
Apr 10
On Thursday, 11 April 2024 at 00:24:44 UTC, Andy Valencia wrote:

> I wrote a "count newlines" based on mapped files. It used about twice the CPU of the version which just read 1 meg at a time. I thought something was amiss (needless slice indirection or something), so I wrote the code in C. It had the same CPU usage as the D version. So...mapped files, not so much. Not D's fault. And writing it in C made me realize how much easier it is to code in D!

For a repeatable comparison, you should provide the code which does 1MB reads. I have found that mmapped files are faster than reading buffered files, but maybe only for large files?

-Steve
Apr 11
On Thursday, 11 April 2024 at 14:54:36 UTC, Steven Schveighoffer wrote:

> For a repeatable comparison, you should provide the code which does 1MB reads.

With pleasure:

import std.stdio : writeln, File, stderr;

const uint BUFSIZE = 1024*1024;

private uint countnl(File f)
{
    uint res = 0;
    char[BUFSIZE] buf;

    while (!f.eof) {
        auto sl = f.rawRead(buf);
        foreach (c; sl) {
            if (c == '\n') {
                res += 1;
            }
        }
    }
    return res;
}

private uint procfile(in string fn)
{
    import std.exception : ErrnoException;
    File f;

    try {
        f = File(fn, "r");
    } catch (ErrnoException e) {
        stderr.writeln("Can't open: ", fn);
        return 0;
    }
    uint res = countnl(f);
    f.close();
    return res;
}

void main(in string[] argv)
{
    foreach (fn; argv[1 .. $]) {
        uint res;

        res = procfile(fn);
        writeln(fn, ": ", res);
    }
}
Apr 11
On Thursday, 11 April 2024 at 16:23:44 UTC, Andy Valencia wrote:

> [...]
> void main(in string[] argv)

^^ What if you want to use

    bool memorymapped;
    getopt(argv, "m", &memorymapped);

inside main? [1]

Have you tried using "rm" [2] instead of "r" as stdioOpenmode under Linux for a "no code" method of employing mmap for reading?

[1] const main args? [2011] https://forum.dlang.org/post/mailman.2277.1313189911.14074.digitalmars-d-learn@puremagic.com
[2] section "NOTES" in fopen(3) https://man7.org/linux/man-pages/man3/fopen.3.html
Apr 17
On Thursday, 11 April 2024 at 00:24:44 UTC, Andy Valencia wrote:

> I wrote a "count newlines" based on mapped files. It used about twice the CPU of the version which just read 1 meg at a time. I thought something was amiss (needless slice indirection or something), so I wrote the code in C. It had the same CPU usage as the D version. So...mapped files, not so much. Not D's fault. And writing it in C made me realize how much easier it is to code in D!
> [...]

The setup of a memory mapped file is relatively costly. For smaller files it is a net loss and read/write beats it hands down. Furthermore, sequential access is not the best way to exploit the advantages of mmap. Full random access is the strong suit of mmap, as it replaces kernel syscalls (lseek, read, write, or pread, pwrite) with user-land processing.

You could try the MAP_POPULATE option in the mmap call, as it enables read-ahead on the file, which may help sequential code.
Apr 15
On Monday, 15 April 2024 at 08:05:25 UTC, Patrick Schluter wrote:

> The setup of a memory mapped file is relatively costly. For smaller files it is a net loss and read/write beats it hands down.

Interestingly, this performance deficit is present even when run against the largest conveniently available file on my system--libQt6WebEngineCore.so.6.4.2 at 148 megs. But since this reproduces in its C counterpart, it is not at all a reflection of D. As you say, truly random access might play to mmap's strengths.

My real point is that, whichever API I use, coding in D was far less tedious; I like the resulting code, and it showed no meaningful performance cost.
Apr 15
On Monday, 15 April 2024 at 16:13:41 UTC, Andy Valencia wrote:

> On Monday, 15 April 2024 at 08:05:25 UTC, Patrick Schluter wrote:
>> The setup of a memory mapped file is relatively costly. For smaller files it is a net loss and read/write beats it hands down.
> Interestingly, this performance deficit is present even when run against the largest conveniently available file on my system--libQt6WebEngineCore.so.6.4.2 at 148 megs. But since this reproduces in its C counterpart, it is not at all a reflection of D. As you say, truly random access might play to mmap's strengths.

Indeed, my statement concerning file size is misleading. It's the number of operations done on the file that matters; bigger files normally just see more operations. I have measurements from our system (Linux servers), where we have big index files representing a ternary tree that are generally memory mapped. These files are several hundred megabytes big and the access is almost random. The files still grow, but the growing parts are not memory mapped; they are accessed with pread() and pwrite() calls. For reads of exactly 64 bytes (the size of a record), access via pread() takes exactly twice the time of a memory copy.

> My real point is that, whichever API I use, coding in D was far less tedious; I like the resulting code, and it showed no meaningful performance cost.
Apr 24