
digitalmars.D.learn - general questions on reference types versus value types...

reply "WhatMeWorry" <kheaser gmail.com> writes:
Is it correct to say that D reference types (classes, dynamic 
arrays, etc.) are always allocated on the heap; whereas D value 
types (structs, static arrays, etc.) are always allocated on the 
stack?  Or is this a gross oversimplification?

Because can't structures contain classes, and classes contain 
structures?
If so, how should one think about these hybrid types?  Does the 
outermost type take precedence over whatever it contains?

Can't resist this. Maybe I should just write some play code, but 
could a structure contain a class that contained a structure that 
contained a class that...  Not sure why one would ever need to, 
so this is just a theoretical question.

thanks.
Nov 30 2014
next sibling parent Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, December 01, 2014 04:42:36 WhatMeWorry via Digitalmars-d-learn wrote:
 Is it correct to say that D reference types (classes, dynamic
 arrays, etc.) are always allocated on the heap; whereas D value
 types (structs, static arrays, etc.) are always allocated on the
 stack?  Or is this a gross oversimplification?

 Because can't structures contain classes and classes contain
 structures?
 If so, how should one think about these hybrid types?  Does the
 outermost type take precedence over whatever it contains?

 Can't resist this. Maybe I should just create a play code, but
 could a Structure contain a class that contained a structure that
 contained a class that...  Not sure why one would ever need to,
 so just a theoretical question.
Structs can go on the heap. e.g.

    MyStruct* s = new MyStruct("foo");

And structs can be reference types. e.g.

    struct MyStruct
    {
        Object o;
    }

So, ultimately, while it's true on the surface that structs are value types,
classes reference types, etc., the reality is a bit more complicated. Even
arrays aren't really one or the other, because while slices refer to the same
data, mutating the array object itself of one slice does not alter the array
object of another slice.

So, strictly speaking, something is a value type if assigning it to another
variable or initializing another variable with it does a full, deep copy. e.g.

    Type t;
    Type u = t;

But even that gets a bit fuzzy once stuff like immutable gets involved,
because if you have

    struct MyStruct
    {
        string s;
    }

    auto s = MyStruct("foo");
    auto t = s;

then t is a copy of s, because the reference portion is immutable and thus
can't be mutated.

Really, talking value types and reference types at the basic level is pretty
straightforward and fairly clear, but once you get down into it, things get a
_lot_ more complicated.

- Jonathan M Davis
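A minimal sketch of that "structs can be reference types" point (type names
are made up for illustration): assigning a struct that holds a mutable class
reference copies only the reference, so both copies end up sharing one heap
object.

    class Payload
    {
        int n;
    }

    struct Wrapper
    {
        Payload p;   // a class reference embedded in a value type
    }

    void main()
    {
        auto a = Wrapper(new Payload());
        auto b = a;           // copies the struct, i.e. the reference, not the object
        b.p.n = 42;
        assert(a.p.n == 42);  // both wrappers see the same heap object
    }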
Nov 30 2014
prev sibling parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Mon, Dec 01, 2014 at 04:42:36AM +0000, WhatMeWorry via Digitalmars-d-learn
wrote:
 
 Is it correct to say that D reference types (classes, dynamic arrays,
 etc.) are always allocated on the heap; whereas D value types
 (structs, static arrays, etc.) are always allocated on the stack?  Or
 is this a gross oversimplification?
It's somewhat an oversimplification, as you *can* allocate by-value types on
the heap, e.g., `MyStruct* ptr = new MyStruct(...)`. But it's rare to want to
do that; usually if you need to do that, you should just use a class instead.

There's also emplace, which can place class objects on the stack (or structs
on the heap), and a static array that's a member of a class is obviously
allocated on the heap along with the class instance, since there is no stack
frame to place it on.
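For instance, a rough sketch of both directions, using std.typecons.scoped
(an emplace-based helper) to construct a class instance in stack storage, and
new to put a struct on the GC heap:

    import std.typecons : scoped;

    class C
    {
        int x;
        this(int x) { this.x = x; }
    }

    struct S
    {
        int y;
    }

    void main()
    {
        // class instance constructed in stack storage owned by scoped()
        auto c = scoped!C(42);
        assert(c.x == 42);

        // struct allocated on the GC heap instead of the stack
        S* p = new S(7);
        assert(p.y == 7);
    }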
 Because can't structures contain classes and classes contain
 structures?  If so, how should one think about these hybrid types?
 Does the outermost type take precedence over whatever it contains?
That's not a very useful way to think about it. A better way to think about
it is, value type == "allocated right here", and reference type == "allocated
elsewhere".

For example, if you have a struct:

    struct S { int x; }

Then when you declare a variable of type S in main(), the contents of S is
allocated "right here", that is, on the stack as the function executes:

    void main() {
        S s;    // allocated "right here", i.e., on the stack
    }

If you declare a member variable of type S, the contents of S are embedded in
the surrounding scope:

    class C {
        S s;    // s is part of the contents of C
    }

    struct T {
        S s;    // s is part of the contents of T
    }

You could illustrate it with a diagram:

    +---class C---+
    | +---S s---+ |
    | | int x;  | |
    | +---------+ |
    +-------------+

The member s is part of the block of memory that an instance of C resides in.

A class, OTOH, is allocated "elsewhere", so when you declare a variable of
type C, the variable is not the object itself, but a reference pointing to
somewhere else:

    void main() {
        C c = new C();
    }

    On the stack:            On the heap:

    +------C c------+        +---class C---+
    | <reference> ---------> | +---S s---+ |
    +---------------+        | | int x;  | |
                             | +---------+ |
                             +-------------+

As you can see, the variable c is actually on the stack, but it doesn't
contain the actual object. Instead, it points elsewhere -- to the heap where
the object is allocated.

Now what happens if we put a class inside a struct?

    struct U {
        int y;
        C c;
    }

    void main() {
        U u;
    }

    On the stack:             On the heap:

    +----U u------------+     +---class C---+
    | int y;            |     | +---S s---+ |
    | +---C c---------+ |     | | int x;  | |
    | | <reference> --------->| +---------+ |
    | +---------------+ |     +-------------+
    +-------------------+

So you see, u is an interesting kind of object; it is allocated *both* on the
stack and on the heap! The 'int y' part of it is on the stack, as well as the
reference part of 'c', but the contents of 'c' is not on the stack, but on
the heap.

The stack part of U has value semantics, while the c part has reference
semantics -- for example, when you do this:

    U v = u;

v will actually contain a *copy* of 'int y', but its 'C c' member will
actually point to the *same* instance of C as u:

    On the stack:             On the heap:

    +----U u------------+     +---class C---+
    | int y;            |     | +---S s---+ |
    | +---C c---------+ |     | | int x;  | |
    | | <reference> --------->| +---------+ |
    | +---------------+ |     +-------------+
    +-------------------+            ^
                                     |
    +----U v------------+            |
    | int y;            |            |
    | +---C c---------+ |            |
    | | <reference> -----------------+
    | +---------------+ |
    +-------------------+

So now, modifying u.y and v.y will not interfere with each other, but
modifying u.c will affect v.c, and vice versa, because they are actually
referencing the same object on the heap.

You might be wondering why you'd want to do something like this, but this is
exactly what makes D slices so handy: a dynamic array is actually internally
implemented as a struct that has a by-value member, and a reference member:

    struct _d_array(T) {
        size_t length;
        T* ptr;
    }

When you take a slice of an array, it just copies the struct and modifies the
.length and .ptr fields appropriately, but the two copies of the struct
references the same underlying array on the heap. The important thing is that
the value semantics of _d_array ensures that the *original* slice is
unchanged, yet the reference semantics of _d_array lets you modify the
original array through the slice. For example:

    void main() {
        int[] a = [1,2,3];     // original array
        int[] b = a[0 .. 1];   // slice of original array
        assert(a == [1,2,3]);  // original slice is not modified
        b[0] = 4;              // but you can modify the array contents via the new slice
        assert(a[0] == 4);     // and it shows up in the original array
    }
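A quick way to see those two fields in action is through the built-in .length
and .ptr properties; here is a minimal sketch: copying or re-slicing a slice
only changes that slice's own length/ptr pair, while the underlying heap data
stays shared.

    void main() {
        int[] a = [1, 2, 3];
        int[] b = a;                // copies only the (length, ptr) pair
        assert(a.ptr is b.ptr);     // both slices point at the same heap block
        b = b[1 .. $];              // re-slicing changes b's ptr and length...
        assert(b.ptr is a.ptr + 1);
        assert(a.length == 3);      // ...but a's own fields are untouched
    }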
 Can't resist this. Maybe I should just create a play code, but could a
 Structure contain a class that contained a structure that contained a
 class that...  Not sure why one would ever need to, so just a
 theoretical question.
[...]

I assure you it's not just a theoretical question. You use it every day, in
the form of array slices, you just didn't realize it. :-)

Consider, for example, the structure of an array of arrays:

    int[][] arr;

The variable 'arr' itself is a _d_array struct containing a by-value field
.length, and a .ptr field that references an array of _d_array structs, each
of which contains a by-value .length field (which is on the heap, btw!) and a
.ptr field that references yet another place on the heap where the subarray
contents are kept.

That is to say, the .length fields are "allocated right here" (which could be
on the stack for the arr variable itself, or on the heap when it's one of the
subarray slices), and the .ptr fields point to data which is "allocated
elsewhere" (i.e., on the heap by default, but it doesn't have to be -- you
could emplace() it somewhere else; the point is that it's not embedded into
the surrounding context).


T

-- 
Live a century, learn a century - and you'll still die a fool.
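A tiny sketch of that sharing (purely illustrative): copying the outer slice
copies only its own length/ptr pair, while the inner slices -- and the data
they point to -- stay shared.

    void main() {
        int[][] arr = [[1, 2], [3, 4]];

        int[][] copy = arr;        // copies only the outer (length, ptr) pair
        copy[0][0] = 99;           // the inner slices and their data are shared
        assert(arr[0][0] == 99);

        copy ~= [5, 6];            // growing the copy may move its outer storage...
        assert(arr.length == 2);   // ...but arr's own length field is unaffected
    }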
Nov 30 2014
next sibling parent reply "WhatMeWorry" <kheaser eapl.org> writes:
  Wow. This is great stuff! And diagrams are always appreciated. 
You should write a book.  I'm off to play with emplace.
Dec 01 2014
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 12/01/2014 12:06 AM, WhatMeWorry wrote:
   Wow. This is great stuff! And diagrams are always appreciated. You
 should write a book.
Agreed! H. S. Teoh, should I translate this post for the Turkish audience or should I wait for an article version of it first? ;) We can even reddit it then... Ali
Dec 01 2014
parent reply "Suliman" <evermind live.ru> writes:
Could anybody explain why there is an opinion that the stack is fast and 
the heap is slow? All of them are located in the same memory, so 
the access time should be equal.
Dec 01 2014
next sibling parent ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Mon, 01 Dec 2014 20:22:59 +0000
Suliman via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 Could anybody explain why there is an opinion that the stack is fast and
 the heap is slow? All of them are located in the same memory, so
 the access time should be equal.
could anybody explain why there is an opinion that monorails are fast and bicycles are slow? all of them use the same thing -- the wheel, so the speed should be equal.
Dec 01 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Suliman:

 Could anybody explain why there is an opinion that the stack is fast 
 and the heap is slow? All of them are located in the same 
 memory, so the access time should be equal.
Often the access speed is not exactly the same, because stack memory is
usually hotter, meaning it's more often kept in one of the CPU caches.

But that's not the speed difference people usually talk about. They refer to
the time needed to allocate and deallocate heap memory, which is often higher
(and unpredictable) compared to allocating some stack memory. If you have a
good GC like the OracleVM ones, allocating memory is very fast (sometimes
little more than a comparison and a pointer increment, thanks to the young
generation being handled like a stack), but still not as fast as allocating
and freeing stack memory.

The situation is quite a bit more complex than what I have explained. There
are even CPUs with programmer-controlled scratch memory, instead of just an
automatically managed hierarchy of caches.

Bye,
bearophile
Dec 01 2014
prev sibling next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/1/14 3:22 PM, Suliman wrote:
 Could anybody explain why there is an opinion that the stack is fast and the
 heap is slow? All of them are located in the same memory, so the access
 time should be equal.
Measure it :)

But the short answer is twofold:

1. The stack is usually hot in the local processor cache, so it's almost
   always the fastest to access.

2. The *access* to heap memory isn't necessarily slower, as heap memory can
   just as easily be in the cache; it's *allocating* heap memory that is
   slower (and vastly so) than allocating stack memory.

-Steve
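To actually measure it, here is a rough sketch using
std.datetime.stopwatch.benchmark; the numbers will vary wildly with compiler
optimizations and GC state, so treat it only as a starting point.

    import std.datetime.stopwatch : benchmark;
    import std.stdio : writeln;

    struct S { int[16] data; }

    int useStack()
    {
        S s;                 // stack allocation: effectively a pointer bump
        s.data[0] = 1;
        return s.data[0];
    }

    int useHeap()
    {
        S* s = new S;        // GC heap allocation: bookkeeping, possible collection
        s.data[0] = 1;
        return s.data[0];
    }

    void main()
    {
        // note: an aggressive optimizer may elide the stack version entirely
        auto results = benchmark!(useStack, useHeap)(100_000);
        writeln("stack: ", results[0]);
        writeln("heap:  ", results[1]);
    }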
Dec 01 2014
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Mon, Dec 01, 2014 at 08:22:59PM +0000, Suliman via Digitalmars-d-learn wrote:
 Could anybody explain why there is an opinion that the stack is fast and the
 heap is slow? All of them are located in the same memory, so the
 access time should be equal.
That may have been true 15 years ago; it's not true today with multilevel CPU
caches. The stack is basically always "hot" in the CPU cache because it's
very frequently accessed (function calls, function returns, temporaries,
local variables, etc.), so accessing stuff on the stack almost always hits
the cache. Accessing stuff on the heap is likely to incur a cache miss, so
the CPU has to go and fetch it from RAM (sloooow).

Not to mention, allocating stuff on the heap incurs a lot of overhead to keep
track of which parts of memory are in use or free, whereas allocating stuff
on the stack is just bumping a pointer (and compilers usually combine several
stack allocations into a single instruction that allocates all of them at
once -- optimizers will even elide pointer bumps if the total required size
for local variables is already known in advance and the end of the allocated
region doesn't need to be used, e.g. the function never calls another
function). Also, allocating on the heap generates garbage for the GC to
collect.

Also, heap-allocated objects tend to require pointer dereferences to access,
which means a higher chance of a CPU cache miss (the pointer and the target
data may be in two different RAM pages, so the CPU may have to do the RAM
round trip *twice* -- this is esp. true for virtual function calls).


T

-- 
You only live once.
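A rough illustration of that last point (a toy sketch, not a benchmark): an
array of structs keeps its elements contiguous in memory, while an array of
class references forces an extra dereference -- and a potential cache miss --
per element.

    struct PointS { double x, y; }
    class  PointC { double x, y; }

    double sumStructs(PointS[] pts)
    {
        double total = 0;
        foreach (ref p; pts)       // elements are contiguous in memory
            total += p.x + p.y;
        return total;
    }

    double sumClasses(PointC[] pts)
    {
        double total = 0;
        foreach (p; pts)           // each element is a reference into the heap
            total += p.x + p.y;
        return total;
    }

    void main()
    {
        auto structs = new PointS[](1_000);
        auto classes = new PointC[](1_000);
        foreach (ref p; structs) { p.x = 1; p.y = 2; }
        foreach (ref p; classes) { p = new PointC; p.x = 1; p.y = 2; }

        assert(sumStructs(structs) == 3_000);
        assert(sumClasses(classes) == 3_000);
    }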
Dec 01 2014
prev sibling parent "Ola Fosheim Grøstad" writes:
On Monday, 1 December 2014 at 20:23:00 UTC, Suliman wrote:
 Could anybody explain why there is an opinion that the stack is fast 
 and the heap is slow? All of them are located in the same 
 memory, so the access time should be equal.
Yes, the problem is that if you load from a memory area (of 64 bytes) that
has not been visited in a while, then the CPU will have to wait for
approximately the same time it takes to execute 100 instructions. So if every
20 instructions you touch a 64-byte block that needs to be loaded from RAM,
you end up with the CPU being 80% underutilized…

The throughput is actually quite nice, but the latency is the problem. It is
possible to tell the CPU to prefetch new areas of memory ahead of time, but
then you have to do it 100 instructions before you actually access it… Which
is hard to get right and might be slower if the memory was already loaded
into the caches anyway. On some CPUs the prefetch instructions can be
counterproductive…
Dec 01 2014
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 you *can* allocate by-value
 types on the heap, e.g., `MyStruct* ptr = new MyStruct(...)`. 
 But it's rare to want to do that; usually if you need to do
 that, you should just use a class instead.
If I create data structures that contain many pointers, like some kinds of trees, I usually use heap-allocated structs, to reduce memory overhead and simplify the whole structure. Bye, bearophile
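For example, a sketch of what such a structure might look like (hypothetical
names): a binary search tree built from heap-allocated struct nodes, avoiding
the per-object overhead that class instances carry.

    // hypothetical example: a binary search tree from heap-allocated structs
    struct Node
    {
        int value;
        Node* left;
        Node* right;
    }

    Node* insert(Node* root, int value)
    {
        if (root is null)
            return new Node(value);   // struct allocated on the GC heap
        if (value < root.value)
            root.left = insert(root.left, value);
        else
            root.right = insert(root.right, value);
        return root;
    }

    void main()
    {
        Node* root = null;
        foreach (v; [5, 3, 8])
            root = insert(root, v);
        assert(root.value == 5 && root.left.value == 3 && root.right.value == 8);
    }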
Dec 01 2014