digitalmars.D.learn - Taking arguments by value or by reference

Anonymouse <zorael gmail.com> writes:
I'm passing structs around (collections of strings) whose .sizeof 
returns 432.

The readme for 2.094.0 includes the following:

 This release reworks the meaning of in to properly support all 
 those use cases. in parameters will now be passed by reference 
 when optimal, [...]

 * Otherwise, if the type's size requires it, it will be passed
 by reference. Currently, types which are over twice the machine
 word size will be passed by reference, however this is controlled
 by the backend and can be changed based on the platform's ABI.

However, I asked in #d a while ago and was told to always pass by value until it breaks, and only then resort to ref.
 [18:32:16] <zorael> at what point should I start passing my 
 structs by ref rather than by value? some are nested in others, 
 so sizeofs range between 120 and 620UL
 [18:33:43] <Herringway> when you start getting stack overflows
 [18:39:09] <zorael> so if I don't need ref for the references, 
 there's no inherent merit to it unless I get in trouble without 
 it?
 [18:39:20] <Herringway> pretty much
 [18:40:16] <Herringway> in many cases the copying is merely 
 theoretical and doesn't actually happen when optimized
I've so far just been using const parameters. What should I be using?
Oct 03 2020
Max Haughton <maxhaton gmail.com> writes:
On Saturday, 3 October 2020 at 23:00:46 UTC, Anonymouse wrote:
 [...]
Firstly, the new in semantics are very new and possibly subtly broken (take a look at the current thread in general).

Secondly, as to the more specific question of how to pass a big struct around, it may be helpful to look at this quick godbolt example (https://d.godbolt.org/z/nPvTWz). Pay attention to the instructions writing to stack memory (or not). A struct that big will be passed around on the stack; whether it gets copied or not depends on the semantics of the struct etc.

The guiding principle for your function parameters should be correctness. If I am passing a big struct around and want to take ownership of it, I probably want to take it by value, but if I want to modify it I should take it by reference (or by pointer, but don't overcomplicate; notice in the previous example they lower to the same thing). If I just want to look at it, it should be taken by const ref if possible (D const isn't the same as C++ const, this may catch you out).

Const-correctness is a rule to live by, especially with a big unwieldy struct.

I would avoid the new in for now, but I would go with const ref from what you've described so far.
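The three cases above (take ownership, modify, just look) can be sketched roughly as follows. This is only an illustration: `Big`, `consume`, `update` and `countFilled` are made-up names, and the struct is a stand-in sized to resemble the one in the question, not the original poster's actual type.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the struct in the question.
// On a typical 64-bit target a string is 16 bytes, so this is 27 * 16 bytes.
struct Big
{
    string[27] fields;
}

// By value: the callee gets its own copy and may keep or change it
// without affecting the caller.
void consume(Big b)
{
    b.fields[0] = "changed locally";
}

// By ref: the callee modifies the caller's object in place.
void update(ref Big b)
{
    b.fields[0] = "changed in caller";
}

// By const ref: a read-only view, with no copy at the language level.
size_t countFilled(const ref Big b)
{
    size_t n;
    foreach (f; b.fields)
        if (f.length) ++n;
    return n;
}

void main()
{
    Big big;

    consume(big);
    assert(big.fields[0] is null);        // caller's copy is untouched

    update(big);
    assert(big.fields[0] == "changed in caller");

    assert(countFilled(big) == 1);
    writeln(Big.sizeof);
}
```

Whether the by-value call actually copies anything in an optimized build is a separate question that only the disassembly or a profiler can answer; the signatures only state intent.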
Oct 03 2020
Anonymouse <zorael gmail.com> writes:
On Saturday, 3 October 2020 at 23:47:32 UTC, Max Haughton wrote:
 The guiding principle to your function parameters should be 
 correctness - if I am passing a big struct around, if I want to 
 take ownership of it I probably want to take it by value but if 
 I want to modify it I should take it by reference (or by 
 pointer but don't overcomplicate, notice in the previous 
 example they lower to the same thing). If I just want to look 
 at it, it should be taken by const ref if possible (D const 
 isn't the same as C++ const, this may catch you out).

 Const-correctness is a rule to live by especially with an big 
 unwieldy struct.

 I would avoid the new in for now, but I would go with const ref 
 from what you've described so far.
I mostly only want a read-only view of the struct, and whether a copy was done or not is academic. However, profiling showed (what I interpret as) a lot of copying being done in release builds specifically.

https://i.imgur.com/JJzh4Zc.jpg

Naturally, in a situation where I need ref I'd use ref, and in the rare cases where it actually helps to have a mutable copy directly I take it mutable. But if I understand what you're saying, and ignoring --preview=in, you'd recommend I use const ref where I would otherwise use const?

Are there some criteria I can go by when making this decision, or does it always reduce to looking at the disassembly?
Oct 04 2020
Mathias LANG <geod24 gmail.com> writes:
On Sunday, 4 October 2020 at 14:26:43 UTC, Anonymouse wrote:
 [...]

 I mostly really only want a read-only view of the struct, and 
 whether a copy was done or not is academic. However, profiling 
 showed (what I interpret as) a lot of copying being done in 
 release builds specifically.

 https://i.imgur.com/JJzh4Zc.jpg

 Naturally a situation where I need ref I'd use ref, and in the 
 rare cases where it actually helps to have a mutable copy 
 directly I take it mutable. But if I understand what you're 
 saying, and ignoring --preview=in, you'd recommend I use const 
 ref where I would otherwise use const?

 Is there some criteria I can go by when making this decision, 
 or does it always reduce to looking at the disassembly?
If the struct adds overhead to copy, use `const ref`. But if you do, you might end up with another set of problems. Aliasing is one of them, and the dangers of it are discussed at length in the thread about `-preview=in` in general.

The other issue is that `const ref` means you cannot pass rvalues. This is when people usually turn towards `auto ref`. Unfortunately, it requires you to use templates, which is not always possible.

So, in short: `auto ref const` if it's a template and aliasing is not a concern, `const ref` if the copy adds overhead, and add a `const` non-`ref` overload to deal with rvalues if needed. If you want to be a bit more strict, throwing `scope` in the mix is good practice, too.

----------

Now, about `-preview=in`: The aim of this switch is to address *exactly* this use case. While it is still experimental and I don't recommend using it in critical projects just yet, giving it a try should be straightforward and any feedback is appreciated.

What I mean by "should be straightforward" is that the only thing `-preview=in` will complain about is `in ref` (it triggers an error). The main issue at the moment is that, if you use `dub`, you need to have control over the dependencies to add a configuration, or use `DFLAGS="-preview=in" dub` in order for it to work. I am working on a fix for that right now.

For reference, this is what adapting code to use `-preview=in` feels like in my project: https://github.com/Geod24/agora/commit/a52419851a7e6e4ef241c4617ebe0c8cc0ebe5cc

You can see that I added it pretty much everywhere the type `Hash` was used, because `Hash` is a 64-byte struct but I needed to support rvalues.
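A minimal sketch of the non-`-preview=in` options summarized above, assuming a hypothetical 64-byte `Hash` (this is not the type from the linked project, and `isZero` and `sameHash` are invented for illustration):

```d
// Hypothetical 64-byte struct, for illustration only.
struct Hash
{
    ubyte[64] data;
}

// Non-template function: `const ref` accepts lvalues only.
bool isZero(const ref Hash h)
{
    foreach (b; h.data)
        if (b != 0) return false;
    return true;
}

// A `const` non-`ref` overload picks up rvalues and forwards them.
// Inside this function `h` is a named parameter, hence an lvalue,
// so the call below resolves to the `const ref` overload.
bool isZero(const Hash h)
{
    return isZero(h);
}

// Template function (note the empty template parameter list):
// `auto ref` becomes `ref` for lvalue arguments and by-value for rvalues.
bool sameHash()(auto ref const Hash a, auto ref const Hash b)
{
    return a.data == b.data;
}

void main()
{
    Hash h;
    assert(isZero(h));        // lvalue argument
    assert(isZero(Hash()));   // rvalue argument

    h.data[0] = 1;
    assert(!isZero(h));
    assert(!sameHash(h, Hash())); // one lvalue and one rvalue argument
}
```

The overload pair costs one extra function per signature, which is the boilerplate that `auto ref` (for templates) and `-preview=in` are meant to remove.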
Oct 04 2020
Max Haughton <maxhaton gmail.com> writes:
On Sunday, 4 October 2020 at 14:26:43 UTC, Anonymouse wrote:
 [...]
This is a skill you only really hone with experience, but it's not too bad once you're used to it.

For a big struct, I would just stick to expressing what you want it to *do* rather than how you want it to perform. If you want to take ownership you basically have to take by value, but if you (as you said) want a read-only view, definitely const ref. If I was reading your code, ref immediately tells me not to think about ownership, and const ref immediately tells me you just want to look at the goods.

One thing I haven't mentioned so far is that some types have non-trivial semantics when it comes to passing them around by value, so if you are writing generic code it is often best to avoid passing by value.
Oct 04 2020
IGotD- <nise nise.com> writes:
On Saturday, 3 October 2020 at 23:00:46 UTC, Anonymouse wrote:
 [...]
I don't agree with this, especially if the struct is 432 bytes. It takes time and memory to copy such a structure. I always use "const ref" when I pass structures because that's only a pointer. Classes are references by themselves so it's not applicable there. I use plain "ref" only when I want to modify the contents.

However, there are some exceptions to this rule in D, as D supports slice parameters. In this case you want a copy of the slice of the array, often because the slice is cast from something else. Basically the array slice parameter becomes an lvalue.

This copying of parameters to the stack is an abomination in computer science and only useful in some cases, but mostly not. The best would be if the compiler itself could determine what is the most efficient. Nim does this, and it was suggested not long ago that the "in" keyword should have a new life as such an optimization. Is that the change that has entered in 2.094.0? Why wasn't this a DIP?

I even see this in some C++ program code where strings are passed by value, which means that the string is copied, including a possible memory allocation, which certainly slows things down.

Do not listen to people who say "pass everything by value", because that is in general not ideal in imperative languages.
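On the slice exception: a small sketch, assuming an ordinary D dynamic array, of what passing a slice by value actually copies. Only the pointer-and-length pair is copied; the elements are shared with the caller. `fill` is a made-up function for illustration.

```d
// The parameter is a by-value copy of the slice (pointer + length),
// not a copy of the elements it refers to.
void fill(int[] s)
{
    s[] = 7;       // writes through to the caller's elements
    s = s[0 .. 1]; // rebinds only the local copy of the slice
}

void main()
{
    // A slice is two machine words regardless of how many elements it covers.
    static assert((int[]).sizeof == 2 * size_t.sizeof);

    int[] a = new int[4];
    fill(a);

    assert(a == [7, 7, 7, 7]); // elements were modified in place
    assert(a.length == 4);     // the caller's slice itself is unchanged
}
```

This is why slices are normally passed by value even under a "const ref for structs" rule: the thing being copied is already as small as a reference plus a length.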
Oct 04 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 4 October 2020 at 15:30:48 UTC, IGotD- wrote:
 I don't agree with this, especially if the struct is 432 bytes. 
 It takes time and memory to copy such structure.
If the compiler chooses to inline the function (which happens quite frequently with optimizations turned on), no copy takes place regardless of how you write it, provided the compiler can see it is unnecessary.

Returning a struct by value rarely means a copy either, since the compiler actually passes a pointer to where it wants the result up front, so it is constructed in place.

So "pass by value" in the language does not necessarily mean big copies in the generated binary. That's why the IRC folks were advising not to worry about it unless you see a problem coming up and the profiler points here.
Oct 04 2020