digitalmars.D - Binding rvalues to const ref in D

reply Atila Neves <atila.neves gmail.com> writes:
New post since the last one was already off-topic. Continued 
from:

http://forum.dlang.org/post/nu7mv8$mqu$1 digitalmars.com

I get the feeling that people are talking past each other. I'm 
going to give my view of the situation and everybody can correct 
me if I'm wrong / throw tomatoes at me.

On the one hand some people want rvalues to bind to const ref. I 
can only assume that they want this because they want to pass 
rvalues to a function efficiently - i.e. put a pointer in a 
register. It might also be due to familiarity with C++ but I 
speculate. If indeed I'm right, then I wonder if it's by instinct 
or if it's been measured versus passing a struct by value. I just 
wrote this:

struct Vector { float x, y, z; }
float silly(Vector v) { return v.x * 5; }

float test() {
     Vector v;
     return silly(Vector(1, 2, 3)) * 7;
}

Yes, it's a stupid example. But ldc2 -O3 gives me this for 
`silly`:

movq   rax,xmm0
movd   xmm0,eax
mulss  xmm0,DWORD PTR [rip+0x0]
ret

It's a bit longer than if I passed in a float directly:

mulss  xmm0,DWORD PTR [rip+0x0]
ret

But... there's no copying or moving of the entire struct. C++ 
(also passing by value, I just hand-translated the code) is 
similar but for some reason was better at optimising:

mulss  xmm0,DWORD PTR [rip+0x0]
ret
nop

Again, no copying or moving. Which is what I expected. Granted, 
real-life code might be complicated enough to make matters a lot 
worse. I'm just wondering out loud how likely that is to happen 
and how big of an impact on total performance that'll have. My 
question is: do you _really_ need rvalues to bind to const ref 
for performance? If not, what _do_ you need it for? Is it an 
instinctive reaction against passing structs by value from C++98 
days?

It's been mentioned that one might not get a say on how a 
function is declared if calling, say, C++ from D. That's a fair 
argument, and one I've not heard a solution for yet. Maybe allow 
rvalues to bind to const ref in `extern(C++)`? I don't know, I'm 
thinking out "loud".


On the other hand we have the "rvalues binding to const ref => 
rvalue references or other complicated mechanisms for figuring 
out whether or not the const ref is an rvalue". This seems to 
have not been explained correctly. I'm not blaming anyone, I just 
tried yesterday and failed as well.

The situation is this: if one wants move semantics, one must know 
when one can move. Because rvalues bind to const& in C++, you 
never know whether the const& is an lvalue or rvalue. The 
solution to this was rvalue references, which are refs that can 
_only_ bind to rvalues. That way you know that the origin was an 
rvalue and, wahey, move semantics. They complicated the language 
significantly. Did you know there's more than one kind of rvalue 
in C++? Oh yes:

http://en.cppreference.com/w/cpp/language/value_category

Do we want that? I don't.

Summary:

* rvalues don't bind to const ref because if they did there'd be 
ambiguity, and solving that problem would make the language 
more complicated.
* Knowing when passed-in parameters were actually rvalues turns 
out to be something compilers want to do because performance.
* It'd be nice if D could call C++ functions that take const& 
with rvalues

Tomato time?

Atila
Oct 19 2016
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 19 Oct 2016 15:18:36 +0000, Atila Neves wrote:

 The situation is this: if one wants move semantics, one must know when
 one can move. Because rvalues bind to const& in C++, you never know
 whether the const& is an lvalue or rvalue.
To clarify: You copy lvalues instead of moving them. You move rvalues instead of copying them. This has an impact when you assign or pass a ref struct variable to a non-ref variable.

As a practical matter, whenever I come up against a case where I need to pass something by reference but it's an rvalue, I assign a temporary variable (after scratching my head at an inscrutable message because, hey, everyone's using templates right now and the error message just tells me that I can't pass a DateTime to something expecting a ref T).

So it seems like the compiler could take care of this by only providing lvalue references but automatically creating those temporary variables for me. It's going to create an extra copy that might not be needed if there were a special rvalue reference type modifier, but why should I care? It's exactly as efficient as the code the compiler forces me to write.

This is what Ethan Watson has suggested, too.
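A minimal sketch of the workaround described above, for readers who have not hit it. `Stamp`, `yearOf`, and `makeStamp` are hypothetical stand-ins (not from any library), and the `static assert` assumes default compiler settings, i.e. no preview switch that relaxes the rule. The usage lives in a `unittest` block, so it can be run with something like `dmd -unittest -main -run file.d`.

```d
// Hypothetical stand-in for a type like DateTime.
struct Stamp { int year, month, day; }

// Stand-in for a library function whose signature we do not control.
int yearOf(ref const Stamp s) { return s.year; }

// Produces an rvalue at the call site.
Stamp makeStamp() { return Stamp(2016, 10, 19); }

unittest
{
    // Assumption: default settings, under which an rvalue does not bind to `ref`.
    static assert(!__traits(compiles, yearOf(makeStamp())));

    // What the programmer writes today: name a temporary, pass the lvalue.
    auto tmp = makeStamp();
    assert(yearOf(tmp) == 2016);
}
```

The named temporary here is the same copy the compiler would have to create on the programmer's behalf under the proposal.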
Oct 19 2016
parent reply Atila Neves <atila.neves gmail.com> writes:
On Wednesday, 19 October 2016 at 15:58:23 UTC, Chris Wright wrote:
 On Wed, 19 Oct 2016 15:18:36 +0000, Atila Neves wrote:

 The situation is this: if one wants move semantics, one must 
 know when one can move. Because rvalues bind to const& in C++, 
 you never know whether the const& is an lvalue or rvalue.
 To clarify: You copy lvalues instead of moving them. You move rvalues 
 instead of copying them. This has an impact when you assign or pass a 
 ref struct variable to a non-ref variable.
Then there's this:

void foo(ref Foo); //doesn't copy lvalues
void foo(Foo);

What's a ref struct variable?
 As a practical matter, whenever I come up against a case where 
 I need to pass something by reference but it's an rvalue, I 
 assign a temporary variable (after scratching my head at an 
 inscrutable message because, hey, everyone's using templates 
 right now and the error message just tells me that I can't pass 
 a DateTime to something expecting a ref T).
I'm assuming this happens because you don't control the signature of the function you're calling and it takes by ref?
 So it seems like the compiler could take care of this by only 
 providing lvalue references but automatically creating those 
 temporary variables for me. It's going to create an extra copy 
 that might not be needed if there were a special rvalue 
 reference type modifier, but why should I care? It's exactly as 
 efficient as the code the compiler forces me to write.

 This is what Ethan Watson has suggested, too.
Interesting. Also, I must have missed that suggestion.

Atila
Oct 19 2016
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 19 Oct 2016 21:19:03 +0000, Atila Neves wrote:

 On Wednesday, 19 October 2016 at 15:58:23 UTC, Chris Wright wrote:
 On Wed, 19 Oct 2016 15:18:36 +0000, Atila Neves wrote:

 The situation is this: if one wants move semantics, one must know when
 one can move. Because rvalues bind to const& in C++,
 you never know whether the const& is an lvalue or rvalue.
 To clarify: You copy lvalues instead of moving them. You move rvalues
 instead of copying them. This has an impact when you assign or pass a
 ref struct variable to a non-ref variable.
 Then there's this:

 void foo(ref Foo); //doesn't copy lvalues
 void foo(Foo);

 What's a ref struct variable?
A variable whose type is a struct and which has the `ref` modifier.
 As a practical matter, whenever I come up against a case where I need
 to pass something by reference but it's an rvalue, I assign a temporary
 variable (after scratching my head at an inscrutable message because,
 hey, everyone's using templates right now and the error message just
 tells me that I can't pass a DateTime to something expecting a ref T).
I'm assuming this happens because you don't control the signature of the function you're calling and it takes by ref?
Right. For instance, binding query parameters with mysql-native. The thing you're binding is passed by reference and I'm not sure why.
Oct 19 2016
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/19/2016 07:04 PM, Chris Wright wrote:
 Right. For instance, binding query parameters with mysql-native. The
 thing you're binding is passed by reference and I'm not sure why.
It's been like that since mysql-native's original release, by the original author, some years ago. I suspect the idea was a rudimentary ORM-like approach: to have the prepared statement params semi-permanently tied to actual variables (ie, "bound" to them). Ie, so you could re-execute the same prepared statement with different values just by changing the values and calling `execPrepared` again, without calling any of the bind functions again. I'd have to check whether or not that usage pattern currently works though.
Oct 19 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/20/16 2:38 AM, Nick Sabalausky wrote:
 On 10/19/2016 07:04 PM, Chris Wright wrote:
 Right. For instance, binding query parameters with mysql-native. The
 thing you're binding is passed by reference and I'm not sure why.
It's been like that since mysql-native's original release, by the original author, some years ago. I suspect the idea was a rudimentary ORM-like approach: to have the prepared statement params semi-permanently tied to actual variables (ie, "bound" to them). Ie, so you could re-execute the same prepared statement with different values just by changing the values and calling `execPrepared` again, without calling any of the bind functions again. I'd have to check whether or not that usage pattern currently works though.
Yes, it does work. However, one thing that I *sorely* miss is the ability to simply bind an individual value.

At the moment, in order to bind a value, you have to pass an array of Variant for all the values. I currently have a whole wrapper around this library to make it more palatable, and to fix the lifetime issues.

-Steve
Oct 20 2016
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/20/2016 09:33 AM, Steven Schveighoffer wrote:
 Yes, it does work. However, one thing that I *sorely* miss is the
 ability to simply bind an individual value.

 At the moment, in order to bind a value, you have to pass an array of
 Variant for all the values. I currently have a whole wrapper around this
 library to make it more palatable, and to fix the lifetime issues.
You can't bind individual values? Is there something wrong with "bindParameter(value, paramIndex)"? (I mean, besides the fact that it takes a ref, and, like the rest of the lib, isn't really documented anywhere outside of the code itself.)

I do agree though, mysql-native *definitely* needs an API refresh. (In fact, I just happened to post several issues regarding that yesterday, and another person posted one as well. I want to take care of this ASAP, especially b/c it makes sense to do so before fixing the near-total lack of docs, which is already in desperate need of addressing.)

Since you've found the need to wrap the API, would you mind taking a look through the current list of issues I've tagged "api" (although I see several of them are yours), and post any thoughts or add any additional issues you might have? I'd like to address these things ASAP, and input from people who use the lib and have issues with the API would be highly valuable:

https://github.com/mysql-d/mysql-native/issues
Oct 20 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/20/16 12:50 PM, Nick Sabalausky wrote:
 On 10/20/2016 09:33 AM, Steven Schveighoffer wrote:
 Yes, it does work. However, one thing that I *sorely* miss is the
 ability to simply bind an individual value.

 At the moment, in order to bind a value, you have to pass an array of
 Variant for all the values. I currently have a whole wrapper around this
 library to make it more palatable, and to fix the lifetime issues.
You can't bind individual values? Is there something wrong with "bindParameter(value, paramIndex)"? (I mean, besides the fact that it takes a ref, and, like the rest of the lib, isn't really documented anywhere outside of the code itself.)
Yes, because bindParameter(myint + 5, idx) doesn't work. And this is even worse:

if(x == 5)
{
    int y = 6;
    cmd.bindParameter(y, 2);
} // oops, y is now gone!

Or maybe this:

foreach(i, j; someRange)
{
    cmd.bindParameter(j, i);
} // now all are bound to reference the same non-existent memory

In order for Command struct to legitimately keep references to arbitrary value types, you need to put the storage somewhere. This isn't very conducive to how D programs are written.

Now, there is bindParameters(Variant[]), which binds the *value* stored in each parameter to the fields. This was the only way I could do it without having to allocate space for individual values. But you must bind everything at once!
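To make the trade-off concrete, here is a small sketch that contrasts binding by reference with binding by value. `MockCommand`, `bindRef`, and `bindValue` are invented for illustration and are not mysql-native's API; the sketch deliberately keeps `x` alive so that nothing dangles.

```d
// Hypothetical stand-in for a prepared statement; not mysql-native's API.
struct MockCommand
{
    private const(int)*[] byRefParams; // addresses of caller-owned ints
    private int[] byValueParams;       // copies owned by the command

    // Binds by reference: the caller must keep `value` alive afterwards.
    void bindRef(ref const int value) { byRefParams ~= &value; }

    // Binds by value: accepts rvalues, and there is nothing to dangle.
    void bindValue(int value) { byValueParams ~= value; }

    // Reads the current contents of every by-reference binding.
    int[] refSnapshot() const
    {
        int[] result;
        foreach (p; byRefParams)
            result ~= *p;
        return result;
    }

    const(int)[] valueSnapshot() const { return byValueParams; }
}

unittest
{
    MockCommand cmd;
    int x = 5;
    cmd.bindRef(x);       // needs an lvalue that outlives the command
    cmd.bindValue(x + 5); // an rvalue is fine here
    x = 6;                // a re-execution would now see the new value
    assert(cmd.refSnapshot() == [6]);
    assert(cmd.valueSnapshot() == [10]);
}
```

The by-reference form gives the "change the variable and re-execute" behaviour described earlier, at the cost of making the caller responsible for the variable's lifetime.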
 I do agree though, mysql-native *definitely* needs an API refresh. (In
 fact, I just happened to post several issues regarding that yesterday,
 and another person posted one as well. I want to take care of this ASAP,
 especially b/c it makes sense to do so before fixing the near-total lack
 of docs, which is already in desperate need of addressing.)

 Since you've found the need to wrap the API, would you mind taking a
 look through the current list of issues I've tagged "api" (although I
 see several of them are yours), and post any thoughts or add any
 additional issues you might have? I'd like to address these things ASAP,
 and input from people who use the lib and have issues with the API would
 be highly valuable:

 https://github.com/mysql-d/mysql-native/issues
Honestly, the most egregious issue is the lifetime management. In some cases, if you pass or copy resource wrappers, the destructor will close the connection, or the above thing about having to allocate a place for values so you can bind parameters without worrying about their lifetimes going away. Wrapping mysql-native (which should be concerned mostly with low-level stuff) so I can make more suitable ranges out of the data was really hard, I ended up having to use RefCounted to make sure all the resource handles didn't go away!

I'll take a look when I can. One other thing API-wise that is horrendous is the handling of null parameters (especially when you have to insert multiple rows with the same prepared statement, and sometimes you have some fields that should be null). Nullable!T works awesome for vibe, I think mysql-native should use that model.

-Steve
Oct 20 2016
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/20/2016 04:32 PM, Steven Schveighoffer wrote:
 On 10/20/16 12:50 PM, Nick Sabalausky wrote:
 You can't bind individual values? Is there something wrong with
 "bindParameter(value, paramIndex)"? (I mean, besides the fact that it
 takes a ref, and, like the rest of the lib, isn't really documented
 anywhere outside of the code itself.)
[...examples involving out-of-scope data...]
 Now, there is bindParameters(Variant[]), which binds the *value* stored
 in each parameter to the fields. This was the only way I could do it
 without having to allocate space for individual values. But you must
 bind everything at once!
Ok, I see. Right. Actually I hit the same problem myself yesterday adding a test for a PR that added support for setting null via Variant(null) instead of setNullParam. The bindParameters(Variant[]) was the only one I could use because you can't pass a null literal by ref.
 Honestly, the most egregious issue is the lifetime management. In some
 cases, if you pass or copy resource wrappers, the destructor will close
 the connection, or the above thing about having to allocate a place for
 values so you can bind parameters without worrying about their lifetimes
 going away. Wrapping mysql-native (which should be concerned mostly with
 low-level stuff) so I can make more suitable ranges out of the data was
 really hard, I ended up having to use RefCounted to make sure all the
 resource handles didn't go away!
Right, gotcha. I hadn't really hit that much myself in the past because for a while I hadn't really been using the prepared statements much, nor using it without vibe's connection pool. But you're right, this stuff definitely needs fixed.
 I'll take a look when I can. One other thing API-wise that is horrendous
 is the handling of null parameters (especially when you have to insert
 multiple rows with the same prepared statement, and sometimes you have
 some fields that should be null). Nullable!T works awesome for vibe, I
 think mysql-native should use that model.
Yea, nulls were kind of always an awkward thing in the lib. I think the lib's original design might predate Nullable, which, I too am a fan of.

I think I'm going to try to put out a first-try pass at a new API in a separate branch, try to get that out as soon as I can, and post it for experimentation/feedback.
Oct 20 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/20/16 5:07 PM, Nick Sabalausky wrote:
 I think I'm going to try to put out a first-try pass at a new API in a
 separate branch, try to get that out as soon as I can, and post it for
 experimentation/feedback.
Awesome! Looking forward to it. -Steve
Oct 20 2016
prev sibling parent reply Ethan Watson <gooberman gmail.com> writes:
On Wednesday, 19 October 2016 at 21:19:03 UTC, Atila Neves wrote:
 On Wednesday, 19 October 2016 at 15:58:23 UTC, Chris Wright 
 wrote:
 So it seems like the compiler could take care of this by only 
 providing lvalue references but automatically creating those 
 temporary variables for me. It's going to create an extra copy 
 that might not be needed if there were a special rvalue 
 reference type modifier, but why should I care? It's exactly 
 as efficient as the code the compiler forces me to write.

 This is what Ethan Watson has suggested, too.
Interesting. Also, I must have missed that suggestion.
It actually went a bit further than my suggestion, if I'm reading the summary correctly. For example, right now we go:

Vector3 vSomeTempName = v1 + v2;
someVectorFunc( vSomeTempName );

This will keep the vSomeTempName entirely in scope and living on the stack for as long as that code block is active. A simplification step would be to store rvalues on the stack without needing to name them, and only destroy them once the enclosing block goes out of scope.

It still provides an easy escape from a C++ function though. For example:

D code:

return someVectorFunc( v1 + v2 );

C++ code:

const Vector3& someVectorFunc( const Vector3& someVector ) { return someVector; }

You'd still want to insert some sanity checking code in the code gen to throw an exception if the C++ function is returning a reference to the current stack and your D function is also returning by reference.
Oct 20 2016
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 20 October 2016 at 20:16, Ethan Watson via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Wednesday, 19 October 2016 at 21:19:03 UTC, Atila Neves wrote:

 On Wednesday, 19 October 2016 at 15:58:23 UTC, Chris Wright wrote:

 So it seems like the compiler could take care of this by only providing
 lvalue references but automatically creating those temporary variables for
 me. It's going to create an extra copy that might not be needed if there
 were a special rvalue reference type modifier, but why should I care? It's
 exactly as efficient as the code the compiler forces me to write.

 This is what Ethan Watson has suggested, too.
Interesting. Also, I must have missed that suggestion.
 It actually went a bit further than my suggestion, if I'm reading the summary correctly. For example, right now we go:

 Vector3 vSomeTempName = v1 + v2;
 someVectorFunc( vSomeTempName );

 This will keep the vSomeTempName entirely in scope and living on the stack for as long as that code block is active. A simplification step would be to store rvalues on the stack without needing to name them, and only destroying them once the block's scope goes out of scope.

 It still provides an easy escape from a C++ function though. For example:

 D code:

 return someVectorFunc( v1 + v2 );

 C++ code:

 const Vector3& someVectorFunc( const Vector3& someVector ) { return someVector; }

 You'd still want to insert some sanity checking code in the code gen to throw an exception if the C++ function is returning a reference to the current stack and your D function is also returning by reference.
DIP25 introduced return ref to address this issue. Just annotate it correctly?
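As a sketch of what that annotation looks like, assuming a plain D function with the same shape as the C++ one (for a real binding the same `return ref` would presumably go on the `extern(C++)` declaration instead). With it, the compiler is meant to treat the result as living no longer than the argument, so returning it past a local or temporary is something it can diagnose.

```d
struct Vector3 { float x, y, z; }

// D-side analogue of the C++ function above. `return ref` states that the
// returned reference refers to (and must not outlive) the argument.
ref const(Vector3) someVectorFunc(return ref const(Vector3) someVector)
{
    return someVector;
}

unittest
{
    Vector3 v = Vector3(1, 2, 3);
    // The result aliases the argument, so it is only valid while `v` lives.
    assert(&someVectorFunc(v) is &v);
    assert(someVectorFunc(v).x == 1);
}
```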
Oct 20 2016
parent reply Ethan Watson <gooberman gmail.com> writes:
On Thursday, 20 October 2016 at 10:36:16 UTC, Manu wrote:
 DIP25 introduced return ref to address this issue. Just 
 annotate it correctly?
I mean, it'll work, but it's not the most secure method to rely on the programmer remembering to do it.
Oct 20 2016
parent Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 20 October 2016 at 21:07, Ethan Watson via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Thursday, 20 October 2016 at 10:36:16 UTC, Manu wrote:

 DIP25 introduced return ref to address this issue. Just annotate it
 correctly?
I mean, it'll work, but it's not the most secure method to rely on the programmer remembering to do it.
True, but isn't that just the case for any extern function? I mean, extern functions are just like that; gotta type the signature right :) Not sure it's worth runtime logic to attempt to check that someone typed the signature incorrectly...? It's certainly not the only way users could bugger up the extern declaration and cause any number of similar problems.
Oct 20 2016
prev sibling next sibling parent Johan Engelen <j j.nl> writes:
On Wednesday, 19 October 2016 at 15:18:36 UTC, Atila Neves wrote:
 Yes, it's a stupid example. But ldc2 -O3 gives me this for 
 `silly`:
Great example, thanks, please more of that :)

https://github.com/ldc-developers/ldc/issues/1842

cheers,
  Johan
Oct 19 2016
prev sibling next sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 19 October 2016 at 15:18:36 UTC, Atila Neves wrote:
 My question is: do you _really_ need rvalues to bind to const 
 ref for performance? If not, what _do_ you need it for? Is it 
 an instinctive reaction against passing structs by value from 
 C++98 days?
imho it's the compiler's job to pass by value or ref. For a function that is inlined it should be able to make its own choice.
Oct 19 2016
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 10/19/2016 08:18 AM, Atila Neves wrote:

 Did you know there's more
 than one kind of rvalue in C++? Oh yes:

 http://en.cppreference.com/w/cpp/language/value_category

 Do we want that?
NO!

My off-topic contribution to this thread: I won't be surprised when C++ will eventually be classified as a case of mass hysteria.

Ali
Oct 19 2016
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 10/19/2016 04:50 PM, Ali Çehreli wrote:
 My off-topic contribution to this thread: I won't be surprised when C++
 will eventually be classified as a case of mass hysteria.
That'll happen at the same time modern web technology stacks are classified similarly. Much as I'd love to see that day, I'm not holding my breath... But seriously, every time I look at anything going on in C++ the last several years, it looks more and more like it just simply wants to *be* D, takes a couple drunken steps in that direction, and falls flat on its face.
Oct 19 2016
parent Ali Çehreli <acehreli yahoo.com> writes:
On 10/19/2016 10:53 PM, Nick Sabalausky wrote:

 [C++] just simply wants to *be*
 D, takes a couple drunken steps in that direction, and falls flat on its
 face.
That's too funny! :D Ali
Oct 19 2016
prev sibling parent bitwise <bitwise.pvt gmail.com> writes:
On Wednesday, 19 October 2016 at 15:18:36 UTC, Atila Neves wrote:
 [...]
 On the one hand some people want rvalues to bind to const ref. 
 I can only assume that they want this because they want to pass 
 rvalues to a function efficiently
 [...]

 struct Vector { float x, y, z; }
In games/real-time simulations, we have to deal with 4x4 float/double matrices. They're not big enough to warrant a heap alloc (like OpenCV's cv::Mat) but not small enough that you want to arbitrarily copy them around either. When you have a scene with thousands of nodes, all the extra copying will be a huge waste.

I cringe every time I see someone getting all religious about profilers. There isn't always one big thing that's responsible for your slowdown.
 The situation is this: if one wants move semantics, one must 
 know when one can move. Because rvalues bind to const& in C++, 
 you never know whether the const& is an lvalue or rvalue. The 
 solution to this was rvalue references, which are refs that can 
 _only_ bind to rvalues. That way you know that the origin was 
 an rvalue an wahey, move semantics. They complicated the 
 language significantly. Did you know there's more than one kind 
 of rvalue in C++? Oh yes:
I don't understand the situation completely here.

void foo(ref Bar bar){}
void foo(Bar bar){}

Why do these have to be ambiguous? Can't the compiler just prefer the second overload for rvalues?

In C++, you _must_ differentiate between move constructors and by-value constructors because of the eager copying that happens when you pass things like std::vector by value. I suggested a similar convention for D containers though, and Andrei was strongly opposed to the idea of eager-copying value-type containers. If things go this way, then aren't the above two overloads enough?

Bit
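For what it's worth, a small sketch of how these two overloads resolve in D today: the `ref` overload is picked for lvalues and the by-value overload for rvalues. `Bar`'s field and the returned strings are assumptions added only to make the choice observable.

```d
struct Bar { int payload; }

// Selected when the argument is an lvalue: no copy is made.
string foo(ref Bar bar) { return "ref"; }

// Selected when the argument is an rvalue, which cannot bind to `ref`.
string foo(Bar bar) { return "value"; }

unittest
{
    Bar b;
    assert(foo(b) == "ref");        // lvalue
    assert(foo(Bar(1)) == "value"); // rvalue
}
```

So the pair is not ambiguous as written; the cost is that every such function has to be declared twice.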
Oct 20 2016