digitalmars.D.bugs - [Issue 17294] New: Incorrect -profile=gc data

https://issues.dlang.org/show_bug.cgi?id=17294

          Issue ID: 17294
           Summary: Incorrect -profile=gc data
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P1
         Component: druntime
          Assignee: nobody puremagic.com
          Reporter: mihails.strasuns.contractor sociomantic.com

The existing implementation of -profile=gc is somewhat naive: it assumes that
each relevant runtime call results in exactly one immediate allocation of
exactly the amount of data requested. The reported numbers can differ a lot
from the real GC stats. A simple example:

====
void main ( )
{
    void[] buffer;
    buffer.length = 20;
    buffer.length = 60;
    buffer.length = 10;
    buffer ~= "abcd".dup;
}
====

The currently reported trace looks like this:

             60                  1    void[] D main ./sample.d:7
             20                  1    void[] D main ./sample.d:6
             10                  1    void[] D main ./sample.d:8
              4                  1    void[] D main ./sample.d:9

This is wrong for a variety of reasons:

1) the runtime allocates more data than was requested (32 and 64 bytes for the
first two length assignments)
2) the third length assignment shrinks the array and thus does not result in
any allocation, despite being reported in the log
3) the last append reallocates the array and thus allocates more than just the
4 bytes for "abcd"
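
The three points above can be checked from user code. The following is a minimal sketch, assuming that for arrays this small `buffer.ptr` is the start of its GC block (so that `GC.sizeOf` returns the block size rather than 0); the helper name `blockSize` is made up for this illustration:

```d
import core.memory : GC;
import std.stdio : writefln;

/// Size of the GC block backing `arr`, or 0 if `arr.ptr` is not the
/// start of a GC-allocated block (hypothetical helper for this sketch).
size_t blockSize(const void[] arr)
{
    return GC.sizeOf(arr.ptr);
}

void main()
{
    void[] buffer;

    buffer.length = 20;
    // The block may be larger than the 20 bytes that were requested.
    writefln("length = 20: block size %s", blockSize(buffer));

    buffer.length = 60;
    writefln("length = 60: block size %s", blockSize(buffer));

    const before = buffer.ptr;
    buffer.length = 10;
    // Shrinking is expected to re-slice the existing block, not allocate.
    writefln("length = 10: same block? %s", buffer.ptr is before);

    buffer ~= "abcd".dup;
    // If the append reallocated, the pointer differs from the old block.
    writefln("append: same block? %s, block size %s",
        buffer.ptr is before, blockSize(buffer));
}
```

Comparing this output with the `-profile=gc` log for the same program shows where the reported byte counts and the real block sizes disagree.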

There are other similar issues, all of which come from the fact that
`-profile=gc` does not track real GC allocations. One idea for fixing this
without major changes to the runtime API is to rely on `synchronized` +
`GC.stats`:

```
extern (C) void[] _d_arraysetlengthTTrace(string file, int line,
    string funcname, const TypeInfo ti, size_t newlength, void[]* p)
{
    import core.memory;

    // global_rt_lock is a placeholder name for some runtime-wide lock; it
    // keeps other threads from allocating between the two snapshots.
    synchronized (global_rt_lock)
    {
        auto oldstats = GC.stats();
        auto result = _d_arraysetlengthT(ti, newlength, p);
        auto newstats = GC.stats();
        // Only record calls that actually increased GC memory usage.
        if (newstats.usedSize > oldstats.usedSize)
        {
            accumulate(file, line, funcname, ti.toString(),
                newstats.usedSize - oldstats.usedSize);
        }
        return result;
    }
}
```
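
The same pattern can be tried outside of druntime. The following is only a sketch under stated assumptions: `traceLock`, `bytesBySite` and `traced` are hypothetical stand-ins for the runtime's lock and `accumulate`, which are internal and not used here.

```d
import core.memory : GC;
import std.stdio : writefln;

// Hypothetical stand-ins for the runtime-wide lock and the profile table.
__gshared Object traceLock;
__gshared size_t[string] bytesBySite;

shared static this()
{
    traceLock = new Object;
}

/// Runs `op` under the lock and charges any growth of
/// GC.stats().usedSize to `site`. Returns the number of bytes charged.
size_t traced(string site, scope void delegate() op)
{
    synchronized (traceLock)
    {
        const before = GC.stats().usedSize;
        op();
        const after = GC.stats().usedSize;
        // A collection during `op` can make usedSize shrink; charge 0 then.
        const grown = after > before ? after - before : 0;
        if (grown)
            bytesBySite[site] += grown;
        return grown;
    }
}

void main()
{
    void[] buffer;
    traced("length = 20", { buffer.length = 20; });
    traced("length = 60", { buffer.length = 60; });
    traced("length = 10", { buffer.length = 10; });
    // Note: the `.dup` runs inside the window, so it is charged here too.
    traced("append", { buffer ~= "abcd".dup; });

    foreach (site, bytes; bytesBySite)
        writefln("%s: %s bytes", site, bytes);
}
```

Because each measurement window is taken while holding a single lock, every traced operation in every thread is serialized on that lock.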

This reports allocations with perfect precision, but this simple solution
comes at the cost of considerably changing the scheduling of multi-threaded
programs run with `-profile=gc`, because every traced call is serialized on
one lock. I would be interested to hear if there are any other ideas for
fixing the problem.

--
Apr 03 2017