
digitalmars.D - Best interface for memcpy() (and the string.h family of functions)

reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
I'm a GSoC student (I'll post an update this week) working on
the project "Independency of D from the C Standard Library".
Part of this project is a D implementation of the memcpy(),
memset() etc. family of functions.

What do you think is the best interface for say memcpy()?

My initial pick was void memcpyD(T)(T* dst, const T* src), but it 
was proposed
that `ref` instead of pointers might be better.

Thanks,
Stefanos
May 29
next sibling parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 29 May 2019 at 11:46:28 UTC, Stefanos Baziotis 
wrote:
 I'm a GSoC student (I'll post this week an update) in
 the project "Independency of D from the C Standard Library".
 Part of this project is a D implementation of the family of 
 functions
 memcpy(), memset() etc.

 What do you think is the best interface for say memcpy()?

 My initial pick was void memcpyD(T)(T* dst, const T* src), but 
 it was proposed
 that `ref` instead of pointers might be better.

 Thanks,
 Stefanos
The default memcpy signature is still pretty useful in many cases. The original signature should still be implemented and available as a non-template function:

void memcpy(void* dst, void* src, size_t length);

For D, you should also create a template so developers don't have to cast to `void*` all the time, but it just forwards all calls to the real memcpy function like this:

void memcpy(T,U)(T* dst, U* src, size_t length)
{
    pragma(inline, true);
    memcpy(cast(void*)dst, cast(void*)src, length);
}

And there's no need to have a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead so you can leverage other people's work when you can.

However, we also want to get type-safety and bounds-checking when we can. So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.

/**
acopy - Array Copy
*/
void acopy(T,U)(T dst, U src) @trusted
if (isArrayLike!T && isArrayLike!U && dst[0].sizeof == src[0].sizeof)
in { assert(dst.length >= src.length, "copyFrom source length larger than destination"); } do
{
    pragma(inline, true);
    static assert (!__traits(isStaticArray, T), "acopy does not accept static arrays since they are passed by value");
    import whereever_memcpy_is: memcpy;
    memcpy(dst.ptr, src.ptr, src.length * ElementSizeForCopy!dst);
}
/// ditto
void acopy(T,U)(T dst, U src) @system
if (isArrayLike!T && isPointerLike!U && dst[0].sizeof == src[0].sizeof)
{
    pragma(inline, true);
    static assert (!__traits(isStaticArray, T), "acopy does not accept static arrays since they are passed by value");
    import whereever_memcpy_is: memcpy;
    memcpy(dst.ptr, src, dst.length * ElementSizeForCopy!dst);
}
/// ditto
void acopy(T,U)(T dst, U src) @system
if (isPointerLike!T && isArrayLike!U && dst[0].sizeof == src[0].sizeof)
{
    pragma(inline, true);
    import whereever_memcpy_is: memcpy;
    memcpy(dst, src.ptr, src.length * ElementSizeForCopy!dst);
}
/// ditto
void acopy(T,U)(T dst, U src, size_t size) @system
if (isPointerLike!T && isPointerLike!U && dst[0].sizeof == src[0].sizeof)
{
    pragma(inline, true);
    import whereever_memcpy_is: memcpy;
    memcpy(dst, src, size * ElementSizeForCopy!dst);
}

Note that isArrayLike, isPointerLike and ElementSizeForCopy would probably look something like:

template isArrayLike(T)
{
    enum isArrayLike =
           is(typeof(T.init.length))
        && is(typeof(T.init.ptr))
        && is(typeof(T.init[0]));
}
template isPointerLike(T)
{
    enum isPointerLike =
           T.sizeof == (void*).sizeof
        && is(typeof(T.init[0]));
}
// The size of each array element. If the actual size is 0, then it
// is assumed to be 1.
template ElementSizeForCopy(alias Array)
{
    static if (Array[0].sizeof == 0)
        enum ElementSizeForCopy = 1;
    else
        enum ElementSizeForCopy = Array[0].sizeof;
}

Note that everything here is an inline template, so everything gets reduced to a single memcpy call and some bounds checks.
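
For readers who want to try the idea without the helper templates, here is a minimal self-contained sketch of one such wrapper. The name `copySlice` is hypothetical and not part of the proposal above; it assumes libc's memcpy is available through core.stdc.string and leaves out `pragma(inline, true)` to keep the sketch short.

```d
import core.stdc.string : memcpy;

/// Hypothetical wrapper: checks the lengths, then forwards to the single
/// untyped memcpy, so the wrapper itself stays tiny.
void copySlice(T)(T[] dst, const(T)[] src) @trusted
{
    // The destination must have room for every source element.
    assert(dst.length >= src.length, "source longer than destination");
    memcpy(dst.ptr, src.ptr, src.length * T.sizeof);
}
```

Calling `copySlice(b[], a[])` with an `int[4] a` and an `int[6] b` copies the four elements of `a` into the front of `b` and leaves the last two elements of `b` untouched.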
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 15:41:42 UTC, Jonathan Marler wrote:
 The default memcpy signature is still pretty useful in many 
 cases.  The original signature should still be implemented and 
 available as a non-template function:

 void memcpy(void* dst, void* src, size_t length);

 For D, you should also create a template so developer's don't 
 have to cast to `void*` all the time, but it just forwards all 
 calls to the real memcpy function like this:

 void memcpy(T,U)(T* dst, U* src, size_t length)
 {
     pragma(inline, true);
     memcpy(cast(void*)dst, cast(void*)src, length);
 }

 And there's no need to have a different name like `memcpyD`. 
 The function behaves the same as libc's memcpy, and when you 
 have libc available, you should use that implementation instead 
 so you can leverages other people's work when you can.
I'm not sure about that. Does it really make sense to have such an interface in the case where you don't have libc memcpy available? Although, there is a discussion about such fallback functions. But I don't know, I feel like it will encourage bad practices. In the same way, I don't know about whether it should accept two different types.
 However, we also want to get type-safety and bounds-checking 
 when when can.  So we should also provide a set of templates 
 that accept D arrays, verifies type-safety and bounds checking, 
 then forwards the call to memcpy.
Those are good ideas. But I think all this could be done explicitly with (ref T[] dst, ref T[] source). This makes a specific-to-arrays version, and again I'm unsure whether making specific cases is a good idea. Generally, all those things are up for discussion, I don't pretend to have some definitive answer.

The thing with all this code depending on libc memcpy is that, to my understanding, the prospect is that libc will be removed. And this project is a step towards that by making some better D versions (meaning, leveraging D features). If the better version calls libc, then when libc is finally removed, all this code will break. And because we encouraged this bad practice, _a lot_ of code will break. That will then force people to write their own D version of memcpy(void *dst, const void *src, size_t len); which of course is bad, because suddenly we have lost all the D benefits and also all the work that has been put into libc.

Best regards,
Stefanos
May 29
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 29 May 2019 at 17:35:03 UTC, Stefanos Baziotis 
wrote:
 On Wednesday, 29 May 2019 at 15:41:42 UTC, Jonathan Marler 
 wrote:
 The default memcpy signature is still pretty useful in many 
 cases.  The original signature should still be implemented and 
 available as a non-template function:

 void memcpy(void* dst, void* src, size_t length);

 For D, you should also create a template so developer's don't 
 have to cast to `void*` all the time, but it just forwards all 
 calls to the real memcpy function like this:

 void memcpy(T,U)(T* dst, U* src, size_t length)
 {
     pragma(inline, true);
     memcpy(cast(void*)dst, cast(void*)src, length);
 }

 And there's no need to have a different name like `memcpyD`. 
 The function behaves the same as libc's memcpy, and when you 
 have libc available, you should use that implementation 
 instead so you can leverages other people's work when you can.
I'm not sure about that. Does it really make sense to have such an interface in the case where you don't have libc memcpy available?
Sure. Any time you have a buffer whose type isn't known at compile-time and you need to copy between them. For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
 Although, there is a discussion about such fallback functions. 
 But I don't
 know, I feel like it will encourage bad practices.

 In the same way, I don't know about whether it should accept 
 two different types.
Well that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior. But you don't want to instantiate a new version of memcpy for every type variation, that's why they all just forward the call to the real memcpy.
 However, we also want to get type-safety and bounds-checking 
 when when can.  So we should also provide a set of templates 
 that accept D arrays, verifies type-safety and bounds 
 checking, then forwards the call to memcpy.
Those are good ideas. But I think all this could be done explicitly with (ref T[] dst, ref T[] source). This makes a specific-to-arrays version, which again I'm unsure if it is good to make specific cases.
Yes, it could be done, but then you end up with N copies of your memcpy implementation, one for every combination of types. Your code size is going to explode. You can certainly support the signature you provided; I just wouldn't put the implementation inside that template. Instead you should cast and forward to memcpy.
 The thing with all this code depending on libc memcpy is that 
 to my understanding,
 the prospect is that libc will be removed. And this project is 
 a step towards that
 by making some better D versions (meaning, leveraging D 
 features).
Right, which is why you use the libc version by default, and only use your own when libc is disabled. This is what I do in my standard library https://github.com/marler8997/mar which works with or without libc. I went through several designs for how to go about this memcpy solution and what I've provided you is the result of that.
 If the better version calls libc, then when libc
 is finally removed, all this code will break. And because we 
 encouraged
 this bad practice, _a lot_ of code will break.
How would it break? If you remove libc, your module should now enable your implementation of memcpy. And all the code that calls memcpy doesn't care whether it came from libc or from a D module.
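
A minimal sketch of that selection, for illustration only. The version identifier `NoLibc` is hypothetical (it would be set with `-version=NoLibc`), and the byte-wise fallback is a deliberately naive stand-in for a real implementation. Callers see the same `memcpy` name and signature either way.

```d
version (NoLibc) // hypothetical version identifier, not a predefined one
{
    /// Naive byte-by-byte fallback, used only when libc is disabled.
    void* memcpy(void* dst, const(void)* src, size_t n) nothrow @nogc
    {
        auto d = cast(ubyte*) dst;
        auto s = cast(const(ubyte)*) src;
        foreach (i; 0 .. n)
            d[i] = s[i];
        return dst;
    }
}
else
{
    // Default: re-export libc's implementation under the same name.
    public import core.stdc.string : memcpy;
}
```

One caveat with this kind of fallback: optimizing back ends can recognize such a copy loop and turn it back into a call to the C `memcpy` symbol, so a real libc-free build needs to account for that.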
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 17:45:59 UTC, Jonathan Marler wrote:
 I'm not sure about that. Does it really make sense to have 
 such an
 interface in the case where you don't have libc memcpy 
 available?
Sure. Any time you have a buffer whose type isn't known at compile-time and you need to copy between them. For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
So, you copy ubyte*.
 Well that's why you have memcpy (for those who know what 
 they're doing) and you have other functions for safe behavior.  
 But you don't want to instantiate a new version of memcpy for 
 every type variation, that's why they all just forward the call 
 to the real memcpy.
You do want that, because instantiation and inlining for specific types is what makes D memcpy fast. It is also what I hope will give better error messages and instrumentation. But that's yet to be seen; most important is the performance.
 Yes it could be done, but then you end up with N copies of your 
 memcpy implementation, one for every combination of types.  
 You're code size is going to explode.  You can certainly 
 support the signature you provided, I just wouldn't have the 
 implementation inside of that template, instead you should cast 
 and forward to memcpy.
Actually, code size for arrays is a very good reminder, thanks.
 The thing with all this code depending on libc memcpy is that 
 to my understanding,
 the prospect is that libc will be removed. And this project is 
 a step towards that
 by making some better D versions (meaning, leveraging D 
 features).
Right, which is why you use the libc version by default, and only use your own when libc is disabled. This is what I do in my standard library https://github.com/marler8997/mar which works with or without libc. I went through several designs for how to go about this memcpy solution and what I've provided you is the result of that.
 If the better version calls libc, then when libc
 is finally removed, all this code will break. And because we 
 encouraged
 this bad practice, _a lot_ of code will break.
How would it break? If you remove libc, your module should now enable your implementation of memcpy. And all the code that calls memcpy doesn't care whether it came from libc or from a D module.
My point is that you will write code differently depending on what memcpy you have, that's why this "new memcpy" will have different signature. To have the best of both worlds, we would have to write our own memcpy(void*, void*, size_t);. And so, if you encourage the use of this interface (because hey, even if you don't have libc eventually, your code will not crash), when libc is not present, the code will be slow.
May 29
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 29 May 2019 at 17:55:49 UTC, Stefanos Baziotis 
wrote:
 On Wednesday, 29 May 2019 at 17:45:59 UTC, Jonathan Marler 
 wrote:
 I'm not sure about that. Does it really make sense to have 
 such an
 interface in the case where you don't have libc memcpy 
 available?
Sure. Any time you have a buffer whose type isn't known at compile-time and you need to copy between them. For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
So, you copy ubyte*.
It doesn't make a difference whether the final memcpy is `void*` or `byte*`. The point is that it's one function, not a template, and you might as well use the same type that the real memcpy uses so you don't change the signature when you're not using libc.
 Well that's why you have memcpy (for those who know what 
 they're doing) and you have other functions for safe behavior.
  But you don't want to instantiate a new version of memcpy for 
 every type variation, that's why they all just forward the 
 call to the real memcpy.
You want, because instantiation and inlining of specific types is what makes D memcpy fast. And also, what I hope will make better error messages and instrumentation. But that's yet to be seen, most important is the performance.
You don't want to inline the memcpy implementation. What makes you think that would be faster?
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 18:00:57 UTC, Jonathan Marler wrote:
 It doesn't make a difference whether the final memcpy is 
 `void*` or `byte*`.
Yes.
 The point is that it's one function, not a template, and you 
 might as well use the same type that the real memcpy uses so 
 you don't change the signature when you're not using libc.
This is what will prevent doing anything really useful in D. This is what I meant that to have that, you have to implement the D version of libc memcpy.
 You don't want to inline the memcpy implementation.  What makes 
 you think that would be faster?
CTFE / introspection I hope and currently, benchmarks.
May 29
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 29 May 2019 at 18:04:07 UTC, Stefanos Baziotis 
wrote:
 You don't want to inline the memcpy implementation.  What 
 makes you think that would be faster?
CTFE / introspection I hope and currently, benchmarks.
You didn't answer the question. How would inlining the implementation of memcpy be faster? The implementation of memcpy doesn't need to know which types it is copying, so every call to it can have the exact same implementation. You only need one instance of the implementation. This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying. All it needs is 2 pointers and a size. That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation, and those wrappers should be inlined but not the memcpy implementation itself.

If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.
May 29
next sibling parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
 You didn't answer the question.
I don't know how "benchmarks" does not answer a question. For me, it's the most important answer.
 How would inlining the implementation of memcpy be faster? The 
 implementation of memcpy doesn't need to know which types it is 
 copying, so every call to it can have the exact same 
 implementation.  You only need one instance of the 
 implementation.  This means you can fine-tune it, many libc 
 implementations will implement it in assembly because it's used 
 so often and again, it doesn't need to know what types it is 
 copying.  All it needs is 2 pointers a size.  That's why in D, 
 you should only create wrappers that ensure type-safety and 
 bounds checking and then forward to the real implementation, 
 and those wrappers should be inlined but not the memcpy 
 implementation itself.

 If you want to provide you own implementation of memcpy you 
 can, but inlining your implementation into every call, when the 
 implementation is truly type agnostic just results in code 
 bloat with no benefit.
It is typed currently, with benefits. It's not the same for every type, and our idea is not to just forward the size. By inlining, you can get considerably better performance, precisely because the type information is known at the call site instead of just a size being forwarded. Check this:

https://github.com/JinShil/memcpyD/blob/master/memcpyd.d

And preferably, run it and look at the generated asm. Also consider that types give you info about alignment, which allows different implementations depending on that alignment.
May 29
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 29 May 2019 at 19:06:43 UTC, Stefanos Baziotis 
wrote:
 On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler 
 wrote:
 You didn't answer the question.
I don't know how "benchmarks" does not answer a question. For me, it's the most important answer.
Yes that would be an answer, I guess I got confused when you mentioned CTFE and introspection, I wasn't sure if "benchmarks" was referring to those features or to runtime benchmarks. And looks like Mike posted the benchmarks on that github link you sent.
 How would inlining the implementation of memcpy be faster? The 
 implementation of memcpy doesn't need to know which types it 
 is copying, so every call to it can have the exact same 
 implementation.  You only need one instance of the 
 implementation.  This means you can fine-tune it, many libc 
 implementations will implement it in assembly because it's 
 used so often and again, it doesn't need to know what types it 
 is copying.  All it needs is 2 pointers a size.  That's why in 
 D, you should only create wrappers that ensure type-safety and 
 bounds checking and then forward to the real implementation, 
 and those wrappers should be inlined but not the memcpy 
 implementation itself.

 If you want to provide you own implementation of memcpy you 
 can, but inlining your implementation into every call, when 
 the implementation is truly type agnostic just results in code 
 bloat with no benefit.
It is typed currently, with benefits. It's not the same for every type and our idea is not to just forward the size. By inlining, you can get quite better performance exactly because you inline and you don't just forward the size and because you know info about the type. Check this: https://github.com/JinShil/memcpyD/blob/master/memcpyd.d And preferably, run it and see the asm generated. Also, what should be considered is that types give you the info about alignment and different implementations depending on this alignment.
It's true that if you can assume pointers are aligned on a particular boundary, you can be faster than a memcpy that works with any alignment. This must be what Mike is doing, though I would then create only a few instances of memcpy that assume alignment on boundaries like 4, 8, 16. And if you have a pointer or an array of a particular type, you can probably assume that the pointer/array is aligned on that type's "alignof" property. I think I will use this in my library.
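
A rough sketch of the "few instances" idea, under stated assumptions. The names `copyAligned` and `copyTyped` are hypothetical and not taken from memcpyD or mar. The sketch assumes the pointers really are aligned to `T.alignof`, and that `T.sizeof` is a multiple of `T.alignof`, so the byte count always divides evenly into words.

```d
/// Hypothetical back end: instantiated once per alignment class
/// (1, 4 or 8), not once per element type.
void copyAligned(size_t alignment)(void* dst, const(void)* src, size_t bytes)
{
    static if (alignment == 8)
        alias Word = ulong;
    else static if (alignment == 4)
        alias Word = uint;
    else
        alias Word = ubyte;

    auto d = cast(Word*) dst;
    auto s = cast(const(Word)*) src;
    // Copy one Word at a time; `bytes` is assumed to be a multiple of Word.sizeof.
    foreach (i; 0 .. bytes / Word.sizeof)
        d[i] = s[i];
}

/// Typed front end: clamps T.alignof to 1, 4 or 8 and forwards,
/// so many element types share the same three back-end instances.
void copyTyped(T)(T* dst, const(T)* src, size_t count)
{
    enum size_t a = T.alignof >= 8 ? 8 : (T.alignof >= 4 ? 4 : 1);
    copyAligned!a(dst, src, count * T.sizeof);
}
```

The typed wrapper is small enough to inline, while the copying loops exist at most three times in the binary regardless of how many types are copied.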
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 19:35:36 UTC, Jonathan Marler wrote:
 Yes that would be an answer, I guess I got confused when you 
 mentioned CTFE and introspection, I wasn't sure if "benchmarks" 
 was referring to those features or to runtime benchmarks.  And 
 looks like  Mike posted the benchmarks on that github link you 
 sent.
Great, you can see that in the benchmarks, memcpyD is faster than libc memcpy except for sizes larger than 32768. We hope that we can surpass those as well, as yesterday I did some simple inline SIMD things and got better performance in 32768. But previous work is of course responsibility of Mike and those benchmarks are in part because of inlining.
 It's true that if you can assume pointers are aligned on a 
 particular boundary that you can be faster than memcpy which 
 works with any alignment.  This must be what Mike is doing, 
 though, I would then create only a few instances of memcpy that 
 assume alignment on boundaries like 4, 8, 16.  And if you have 
 a pointer or an array to a particular type, you can probably 
 assume that pointer/array is aligned on that types's "alignof" 
 property.
This is, as I said, the alignment guarantee. I hope that I can get other benefits from types as well. Also, hopefully we will do LDC / GDC specific things, leveraging the intrinsics for example. I will post an update shortly, like the other students, explaining some of that, but I thought since we started it.. :p
 I think I will use this in my library.
Great! We hope that it will be useful and any feedback is appreciated!
May 29
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Wednesday, 29 May 2019 at 20:28:18 UTC, Stefanos Baziotis 
wrote:
 On Wednesday, 29 May 2019 at 19:35:36 UTC, Jonathan Marler 
 wrote:
[...]
Great, you can see that in the benchmarks, memcpyD is faster than libc memcpy except for sizes larger than 32768. We hope that we can surpass those as well, as yesterday I did some simple inline SIMD things and got better performance in 32768. But previous work is of course responsibility of Mike and those benchmarks are in part because of inlining.
[...]
This is, as I said, the alignment guarrantee. I hope that I can get other benefits from types also. Also, hopefully we will do LDC / GDC specific things. Leverage the intrinsics for example. I will put an update shortly, as the other students, explaining some of that, but I thought since we started it.. :p
 [...]
Great! We hope that it will be useful and any feedback is appreciated!
I haven't benchmarked it yet, but here are the changes I've made to my standard library to also take advantage of alignment guarantees from typed pointers and arrays. https://github.com/dragon-lang/mar/commit/bb096d2d4f489d47177f6a678b1d9bab756e3dc7
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 23:27:35 UTC, Jonathan Marler wrote:
 I haven't benchmarked it yet but here's the changes I've made 
 to my standard library to also take advantage of alignment 
 guarantees from typed pointers and arrays.

 https://github.com/dragon-lang/mar/commit/bb096d2d4f489d47177f6a678b1d9bab756e3dc7
Good, this week I'm also working on alignment (more specifically, misalignment). Since you took the time anyway to play with alignment, you might find SIMD instructions useful. Take a look at Mike's memcpyD. My toy SIMD from yesterday that surpassed libc memcpy was as simple as:

static foreach(i; 0 .. T.sizeof/32)
{
    // Assuming RDI is 'dst' and RSI 'src'
    asm pure nothrow @nogc
    {
        vmovdqa YMM0, [RSI+i*32]; // load 32 bytes from src
        vmovdqa [RDI+i*32], YMM0; // store them to dst
    }
}
/* instead of
static foreach(i; 0 .. T.sizeof/32)
{
    memcpyD((cast(S!32*)dst) + i, (cast(const S!32*)src) + i);
}
*/

Again, really simple and dumb, but effective. A couple of notes, so that you don't have the headaches I had:

1) You can use `vmovdqu` (notice the 'u' at the end) for unaligned memory and skip note 2.
2) `vmovdqa` assumes 32-byte aligned memory. Now, `align()` is kind of buggy, so if you have a normal buffer on the stack that you want to align, this:

align(32) ubyte[32768] buf;

won't work. One solution is to allocate memory on the heap and do slight pointer arithmetic to have it aligned.

Last minute discovery: Haha, the compiler flags I used were:
-mcpu=avx -inline
With these flags, memcpyD is faster. _Removing_ -inline resulted in faster code for libc memcpy. I'll have to take a closer look tomorrow.
(Oh, and the libc memcpy, it seems from the disassembly, achieves these results with sse3, so 128-bit instructions. I mean.. impressive at least.)
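
The heap workaround mentioned in note 2 could look roughly like the sketch below. The helper names `alignUp` and `allocAligned32` are hypothetical; the sketch over-allocates by 31 bytes with malloc and rounds the address up to the next multiple of 32. The raw pointer has to be kept, since that is what must be passed to free().

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: rounds `p` up to the next multiple of `alignment`,
/// which is assumed to be a power of two.
void* alignUp(void* p, size_t alignment)
{
    immutable addr = cast(size_t) p;
    return cast(void*) ((addr + alignment - 1) & ~(alignment - 1));
}

/// Returns `size` usable bytes aligned to 32, or null on failure.
/// `raw` receives the original malloc pointer, to be passed to free() later.
ubyte* allocAligned32(size_t size, out void* raw)
{
    raw = malloc(size + 31); // over-allocate by alignment - 1
    return raw is null ? null : cast(ubyte*) alignUp(raw, 32);
}
```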
May 29
parent kinke <noone nowhere.com> writes:
On Thursday, 30 May 2019 at 00:55:54 UTC, Stefanos Baziotis wrote:
 Now, `align()` is kind of buggy
It works fine with LDC, and I guess with GDC too.
May 29
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
 and then forward to the real implementation
With D you can forward to the best-suited implementation. What libc does is perform various runtime checks in order to figure out the best way of copying the provided input. With D it should be possible to make certain checks at compile time. Secondly, C's memcpy is a big function not because that's best for performance but because of convenience. With D we can have many smaller functions, selected by template magic.
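
As a small illustration of moving such checks to compile time, here is a sketch in which the size and alignment tests are `static if` conditions, so each instantiation contains only the branch that applies. The name `copyObject` and the particular size classes are assumptions made for this example, not part of any proposal in this thread.

```d
import core.stdc.string : memcpy;

/// Hypothetical single-object copy: the branch is picked at compile time
/// from T.sizeof and T.alignof, so no runtime size check remains.
void copyObject(T)(ref T dst, ref const T src) @trusted
{
    static if (T.sizeof == 4 && T.alignof >= 4)
        *cast(uint*) &dst = *cast(const(uint)*) &src;   // one 32-bit move
    else static if (T.sizeof == 8 && T.alignof >= 8)
        *cast(ulong*) &dst = *cast(const(ulong)*) &src; // one 64-bit move
    else
        memcpy(&dst, &src, T.sizeof);                   // general case
}
```

For a `float` this compiles down to the 32-bit branch only; for an `int[5]` only the general memcpy call is generated.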
May 29
prev sibling next sibling parent reply kinke <noone nowhere.com> writes:
On Wednesday, 29 May 2019 at 11:46:28 UTC, Stefanos Baziotis 
wrote:
 My initial pick was void memcpyD(T)(T* dst, const T* src), but 
 it was proposed
 that `ref` instead of pointers might be better.
ref would only work when copying one instance at a time. Many times, you'll want to copy a contiguous array of a length only known at runtime (and definitely NOT invoke memcpy in a loop, so that the implementation can e.g. use SIMD streaming when copying gazillions of 32-bit pixels).

I'd suggest a structure similar to this, minimizing bloat:

// int a, b; memcpyD(&a, &b);
// int[4] a, b; memcpyD(&a, &b);
// int[16] a; int[4] b; memcpyD!4(&a[8], b.ptr);
void memcpyD(size_t length = 1, T)(T* dst, const T* src)
{
    pickBestImpl!(T.alignof, length * T.sizeof)(dst, src);
}

void memcpyD(T)(T* dst, const T* src, size_t length)
{
    pickBestImpl!(T.alignof)(dst, src, length * T.sizeof);
}

private:

/* These 2 will probably share most logic, the first one just exploiting a
 * static size. A common mixin might come in handy (e.g., switching from
 * runtime-if to static-if).
 */
void pickBestImpl(size_t alignment, size_t size)(void* dst, const void* src);
void pickBestImpl(size_t alignment)(void* dst, const void* src, size_t size);
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Wednesday, 29 May 2019 at 20:50:45 UTC, kinke wrote:
 ref would only work when copying one instance at a time. Many 
 times, you'll want to copy a contiguous array of a length only 
 known at runtime (and definitely NOT invoke memcpy in a loop, 
 so that the implementation can e.g. use SIMD streaming when 
 copying gazillions of 32-bit pixels).
The current state is that we think slices should be enough for this need, meaning you don't need the third size parameter. In this case, ref is better. On the other hand, in other cases I think that pointers are more intuitive. Again, of course, the fact that _I_ think so is of little importance. That post was primarily made so that you, the community, can give feedback on this.

Apart from that, I'm still sceptical about whether we should provide a version with size..
May 29
parent reply kinke <noone nowhere.com> writes:
On Thursday, 30 May 2019 at 00:18:06 UTC, Stefanos Baziotis wrote:
 The current state is that we think that slices should be enough 
 for this need.
 Meaning, you don't need the third size parameter. In this case, 
 ref is better. On the other, in other cases I think that 
 pointers
 are more intuitive.
In D, there's no ugly and unsafe need to pass slices to memcpy, as a simple `dst[] = src[]` can do the job much better, boiling down to a memcpy (with 3rd param) if T is a POD (and the two slices don't overlap, have the same length etc. if bounds checks are enabled). Taking a slice by ref, if I understand you correctly, would firstly only work with slice lvalues (i.e., no `ptr[0..$-1]` rvalues), and secondly IMO be very confusing and bad for generic code, as I would expect the slice itself to be memcopied then, not its contents.
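
For reference, a tiny sketch of that built-in slice assignment. The function name `copyInto` is made up for this example; the slicing and the element-wise assignment are plain language features.

```d
/// Copies `src` into the front of `dst` using slice assignment and
/// returns the part of `dst` that was written.
T[] copyInto(T)(T[] dst, const(T)[] src) @safe
{
    // Slicing dst is bounds-checked; both sides then have the same length.
    dst[0 .. src.length] = src[];
    return dst[0 .. src.length];
}
```

With an `int[5] d` and an `int[3] s`, `copyInto(d[], s[])` overwrites the first three elements of `d` and never touches a raw pointer or an explicit byte count.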
May 29
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Thursday, 30 May 2019 at 01:19:54 UTC, kinke wrote:

 In D, there's no ugly and unsafe need to pass slices to memcpy, 
 as a simple `dst[] = src[]` can do the job much better, boiling 
 down to a memcpy (with 3rd param) if T is a POD (and the two 
 slices don't overlap, have the same length etc. if bounds 
 checks are enabled).
This is an important observation. My vision for the GSoC project was targeted primarily at druntime. D memcpy would rarely, if ever, be invoked directly by most users. Expressions like `dst[] = src[]` and other assignment expressions that require memcpy as part of their behavior would be lowered by the compiler to the runtime memcpy template.

Mike
May 29
parent reply Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Thursday, 30 May 2019 at 01:35:05 UTC, Mike Franklin wrote:
 On Thursday, 30 May 2019 at 01:19:54 UTC, kinke wrote:

 In D, there's no ugly and unsafe need to pass slices to 
 memcpy, as a simple `dst[] = src[]` can do the job much 
 better, boiling down to a memcpy (with 3rd param) if T is a 
 POD (and the two slices don't overlap, have the same length 
 etc. if bounds checks are enabled).
This is an important observation. My vision for the GSoC project was targeted primarily at druntime. D memcpy would rarely, if ever, be invoked directly by most users.
If we don't really target users, then that makes this:
 Apart from that, I'm still sceptical about whether we should 
 provide
 a version with size..
Not important. Because my thought was that a lot of users would have some pointers a, b and somehow want to do: memcpy(a, b, for_some_size); What I'm thinking is that yes, we decouple D from libc _on D Runtime_. But in general, users may still want that.
May 30
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Thursday, 30 May 2019 at 08:28:50 UTC, Stefanos Baziotis wrote:

 If we don't really target users, then that makes this:

 Apart from that, I'm still sceptical about whether we should 
 provide
 a version with size..
Not important. Because my thought was that a lot of users would have some pointers a, b and somehow want to do: memcpy(a, b, for_some_size); What I'm thinking is that yes, we decouple D from libc _on D Runtime_. But in general, users may will still want that.
If users need to copy blocks of memory they should first prefer those D features that were added to improve upon C, so users don't have to resort to raw pointers, pointer arithmetic, managing sizes outside of arrays, etc. See Walter's article "C's biggest mistake" for some perspective on that http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625

It's important when designing a D replacement for a C feature to not repeat C's mistakes.

I wouldn't rule out a public interface in the future, but, at the moment, I don't see a compelling use case given that D has first-class arrays. Regardless, a public interface should be required to achieve the goals of the GSoC project and could introduce controversy and other design complications.

Mike
May 30
next sibling parent Mike Franklin <slavo5150 yahoo.com> writes:
On Thursday, 30 May 2019 at 09:10:11 UTC, Mike Franklin wrote:
Regardless, a public interface should be
 required to achieve the goals of the GSoC project and could 
 introduce controversy and other design complications.
should --> shouldn't
May 30
prev sibling parent Stefanos Baziotis <sdi1600105 di.uoa.gr> writes:
On Thursday, 30 May 2019 at 09:10:11 UTC, Mike Franklin wrote:
 If users need to copy blocks of memory they should first prefer 
 those D features that were added to improve upon C, so users 
 don't have to resort to raw pointers, pointer arithmetic, 
 managing sizes outside of arrays, etc.  See Walter's article 
 "C's biggest mistake" for some perspective on that 
 http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625
It's important when designing a D replacement for a C feature to not repeat C's
mistakes.
I agree with Walter on that. I don't think, though, that ref, dynamic arrays as they are now and the GC are the solution to that, or to low-level memory management in general.

I think that people fall into 2 categories:
1) People that use these D features will probably never want to use memcpy() (directly) anyway.
2) People that use D more as a betterC will probably want to use a memcpy() with pointers and possibly one more optional parameter, in which they will give the size.

But, some important notes:
a) D moves in a certain direction, away from C, pointers etc., and towards ref and dynamic arrays. Agreeing with that is not important, but helping is.
b) If memcpy() targets (possibly only) the D Runtime, then it doesn't really matter for the users in category 1) or 2), as they are on the user side.

So, I think the best option in this regard, especially given note a), is to use refs, unless there are serious implementation obstacles (which I doubt).

- Stefanos
May 30
prev sibling parent Kagamin <spam here.lot> writes:
IME partial copy primitives are lacking, so I use this:

/// Copy only as much as possible, return the copied data
T[] CopyHead(T)(T[] dst, in T[] src) pure
{
	if(dst.length>=src.length)return CopyAll(dst, src);
	CopyOverlap(dst, src[0..dst.length]);
	return dst;
}

/// Copy all input data, return the copied data
T[] CopyAll(T)(T[] dst, in T[] src) pure
{
	assert(dst.length>=src.length);
	dst=dst[0..src.length];
	CopyOverlap(dst, src);
	return dst;
}

/// Copy overlapping slices
void CopyOverlap(T)(T[] dst, in T[] src) pure
{
	import core.stdc.string:memmove;
	assert(dst.length==src.length,"same lengths required");
	byte[] dstBytes=cast(byte[])dst;
	memmove(dstBytes.ptr, src.ptr, dstBytes.length);
}
May 30