
digitalmars.D.announce - utiliD: A library with absolutely no dependencies for bare-metal

reply Mike Franklin <slavo5150 yahoo.com> writes:
In an attempt to put some impetus behind an idea that I've 
proposed multiple times on the forum, I've resurrected my utiliD 
repository:  https://github.com/JinShil/utiliD  I decided to 
resurrect the repository after this brief discussion:  
https://forum.dlang.org/post/bjycwgrifumsfrhprjho forum.dlang.org

The idea behind the library is that it would not depend on 
druntime, phobos, C standard library, or anything else but would 
still offer many of the features that those libraries provide.  
To utilize the library, one would only need a D compiler. It 
could be used in bare-metal programming, -betterC builds, or as a 
fundamental utility library for implementing DMD, druntime, and 
phobos themselves.

It's what I envision as a potential seed for Andrei's opt-in 
continuum 
(https://forum.dlang.org/post/q7j4sl$17pe$1 digitalmars.com).

I don't know what will ultimately happen with this library, if 
anything, but even if all it does is facilitate brainstorming of 
ideas, or serve as the genesis of some better idea, it will be a 
success.

I don't have much time to work on it right now, as I'm currently 
preoccupied fixing the compiler and the druntime to make 
something like utiliD possible, but if others grok the idea and 
want to help make it a reality, your help is most welcome.

Also, if any of you have already started something with the same 
goals, I'll be happy to drop this repository and join you.

You can find me on Slack and Discord using the handle JinShil if 
you wish to have a dialog about this.

Mike
May 04 2019
next sibling parent reply Eugene Wissner <belka caraus.de> writes:
On Sunday, 5 May 2019 at 03:45:41 UTC, Mike Franklin wrote:
 [...]

Hi Mike,

you may remember that I'm working on a library named "tanya" (https://github.com/caraus-ecms/tanya). It is now almost Phobos-free, and I have reimplemented some routines from libc for x86-64 Linux. Ideally I'd like to get rid of libc on some platforms. While the library isn't interesting for you, since it's too high-level, it could be based on something like utiliD. So, as for me, I'd be very much interested in a collective effort in this direction and can contribute.
May 04 2019
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Sunday, 5 May 2019 at 05:23:26 UTC, Eugene Wissner wrote:

 you may remember that I'm working on a library named "tanya" 
 (https://github.com/caraus-ecms/tanya). It is now almost 
 Phobos-free, and I have reimplemented some routines from libc 
 for x86-64 Linux. Ideally I'd like to get rid of libc on some 
 platforms. While the library isn't interesting for you, since 
 it's too high-level, it could be based on something like utiliD. 
 So, as for me, I'd be very much interested in a collective 
 effort in this direction and can contribute.
Excellent! Yes, I remember seeing tanya. As you can tell, I have very few details worked out with regard to utiliD. You obviously have more experience creating such a library.

I see that https://github.com/caraus-ecms/tanya/tree/master/arch/x64/linux/memory has what appears to be something equivalent to memcpy, memcmp, and memset. I am very interested in having D implementations of those (inline assembly counts as D), but I also want to explore the idea of keeping things strongly typed (at least as long as possible) and utilizing design-by-introspection to branch the implementation.

I'd be interested in hearing more about what you have in mind.

Thanks,
Mike
May 05 2019
parent reply Eugene Wissner <belka caraus.de> writes:
- memcmp, memcpy, memmove and memset are named equal, copy, 
copyBackward and fill, respectively. I just wanted to create 
native implementations that are a bit safer than their C 
counterparts, so they do the same job but accept void[] instead 
of pointers. There are also templated functions with the same 
names that work with ranges.
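To make that concrete, here is a minimal sketch of what such 
slice-based signatures could look like (this is not tanya's 
actual code; the bodies are only illustrative):

void fill(void[] buffer, ubyte value) @nogc nothrow pure @trusted
{
    // The length travels with the pointer, so no separate size argument.
    foreach (ref b; cast(ubyte[]) buffer)
        b = value;
}

void copy(const(void)[] source, void[] target) @nogc nothrow pure @trusted
{
    assert(source.length == target.length);
    auto src = cast(const(ubyte)[]) source;
    auto dst = cast(ubyte[]) target;
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
}

Taking void[] keeps the "untyped memory" semantics of the C 
functions while letting the implementation check the lengths.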

I'm not very comfortable with GCC's inline asm, and it doesn't 
support naked asm as DMD does, so I put the asm in .S files and 
compile them separately. But I'm fine with inline asm too. A 
problem with inline asm is that it has to be written in several 
versions, since DMD uses different calling conventions (unless 
we use extern(C)) and GDC and LDC use a different asm syntax.

Tanya contains quite a lot of stuff now, and I'm thinking of 
splitting it into smaller parts (of a reasonable size) that 
might be interesting to other people who are ready to 
contribute, so I don't have to maintain everything myself. I 
don't know exactly what goes into this more "low-level" library; 
we can always talk about it.

- OS API

I'm not sure whether this belongs in the scope of utiliD. Some 
time ago it became clear to me that while C has functions for 
dynamic memory management, it uses them internally very seldom. 
Instead it lets the user allocate the memory. So there are 
functions like:

char *if_indextoname(unsigned int ifindex, char *ifname);

that take an output buffer as the last argument. The same can be 
done with output ranges in D, so these system functions can be 
rewritten in D with a better interface. I should mention, 
though, that tanya's range definitions differ from Phobos's.
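As a rough illustration of that interface style (not tanya's 
actual API; interfaceName, the buffer size constant, and the 
put-based output range are assumptions of this sketch), a 
wrapper over the libc call above could look like:

extern (C) char* if_indextoname(uint ifindex, char* ifname) @nogc nothrow;

enum ifNameSize = 16; // IF_NAMESIZE from <net/if.h>

// Writes the interface name into any output range with a put(char) member
// and returns false if the index is invalid.
bool interfaceName(Range)(uint ifindex, ref Range output)
{
    char[ifNameSize] buffer = '\0';
    if (if_indextoname(ifindex, buffer.ptr) is null)
        return false;

    foreach (c; buffer[])
    {
        if (c == '\0')
            break;
        output.put(c);
    }
    return true;
}

The caller decides whether the name ends up in a static array, a 
growing buffer, or somewhere else entirely; the wrapper itself 
never allocates.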

- meta

Another thing probably interesting for a utiliD library is 
metaprogramming. Tanya has a "tanya.meta" package which contains 
templates similar to std.traits and std.meta, plus some nice 
extras like Union/Intersection/Difference working on sets of 
types, inspired by Boost Hana. This part is completely 
independent (from Phobos and the rest of tanya) and could even 
be a separate library.
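For a flavour of what such a dependency-free meta package can be 
built from, here is a minimal sketch (not tanya's actual 
implementation) of a compile-time set-membership test; 
Union/Intersection/Difference can then be written as recursive 
templates that filter one set of types through a test like this:

// A compile-time membership test over a set of types, using only language
// features -- no std.meta, no druntime, no TypeInfo.
template Contains(T, Set...)
{
    static if (Set.length == 0)
        enum Contains = false;
    else
        enum Contains = is(T == Set[0]) || Contains!(T, Set[1 .. $]);
}

static assert( Contains!(int, long, int, char));
static assert(!Contains!(float, long, int, char));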
May 09 2019
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Friday, 10 May 2019 at 05:20:59 UTC, Eugene Wissner wrote:
 - memcmp, memcpy, memmove and memset are named equal, copy, 
 copyBackward and fill, respectively. I just wanted to create 
 native implementations that are a bit safer than their C 
 counterparts, so they do the same job but accept void[] instead 
 of pointers. There are also templated functions with the same 
 names that work with ranges.

 I'm not very comfortable with GCC's inline asm, and it doesn't 
 support naked asm as DMD does, so I put the asm in .S files and 
 compile them separately. But I'm fine with inline asm too. A 
 problem with inline asm is that it has to be written in several 
 versions, since DMD uses different calling conventions (unless 
 we use extern(C)) and GDC and LDC use a different asm syntax.
Yeah, that is indeed unfortunate, and something I'll have to consider. I have had to write 3 different inline-asm implementations for some of my explorations and didn't find it to be too bad. I much prefer reading D with inline asm to a straight assembly file. I've studied the ARM implementation of memcpy a little, and it's quite hard to follow. I'd like the D implementations to make such code easier to understand and maintain.
 Tanya contains quite a lot of stuff now, and I'm thinking of 
 splitting it into smaller parts (of a reasonable size) that 
 might be interesting to other people who are ready to 
 contribute, so I don't have to maintain everything myself. I 
 don't know exactly what goes into this more "low-level" library; 
 we can always talk about it.
Yes, I'm still working that out too. If you apply the rule that it should not require anything from druntime, the C standard library, or dynamic memory allocation, it eliminates quite a bit and narrows the scope. What I'm trying to do now is just focus on the obvious, and hopefully with that out of the way the rest will begin to reveal itself.
 - OS API

 I'm not sure whether this belongs in the scope of utiliD. Some 
 time ago it became clear to me that while C has functions for 
 dynamic memory management, it uses them internally very seldom. 
 Instead it lets the user allocate the memory. So there are 
 functions like:

 char *if_indextoname(unsigned int ifindex, char *ifname);

 that take an output buffer as the last argument. The same can be 
 done with output ranges in D, so these system functions can be 
 rewritten in D with a better interface. I should mention, 
 though, that tanya's range definitions differ from Phobos's.
Yes, I like that. The buffer and memory management is then delegated outside of the library, which, IMO, makes the library more broadly useful. Fundamental algorithms (e.g. from std.algorithm, std.range, etc.) that can operate on memory buffers/ranges in this way would be good candidates for utiliD. But I'd want them to meet the criteria of being fundamental and broadly useful, not too specialized. Algorithms designed for specific problem domains should probably go in a library designed for that problem domain.
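As a hypothetical example of the kind of fundamental, 
allocation-free algorithm I mean (this is not actual utiliD 
code), something like a find over a slice needs nothing from 
druntime, libc, or the GC:

// Returns the slice starting at the first match, or an empty slice at the
// end if there is none. The caller owns the memory; nothing is allocated.
inout(T)[] find(T)(inout(T)[] haystack, const T needle)
    @safe @nogc nothrow pure
{
    foreach (i; 0 .. haystack.length)
        if (haystack[i] == needle)
            return haystack[i .. $];
    return haystack[$ .. $];
}

static assert(find([1, 2, 3, 4], 3) == [3, 4]);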
 - meta

 Another thing probably interesting for a utiliD library is 
 metaprogramming. Tanya has a "tanya.meta" package which contains 
 templates similar to std.traits and std.meta, plus some nice 
 extras like Union/Intersection/Difference working on sets of 
 types, inspired by Boost Hana. This part is completely 
 independent (from Phobos and the rest of tanya) and could even 
 be a separate library.
I'm thinking metaprogramming modules and packages are good candidates for utiliD as long as they are broadly useful. I see them more as extensions of the language than as a library. Though, in a way, that's basically what libraries are too. I'll have to think about this some more, but at the moment I'm leaning towards inclusion in utiliD.

Mike
May 10 2019
next sibling parent reply Johan Engelen <j j.nl> writes:
On Friday, 10 May 2019 at 17:16:24 UTC, Mike Franklin wrote:
 On Friday, 10 May 2019 at 05:20:59 UTC, Eugene Wissner wrote:
 - memcmp, memcpy, memmove and memset are named equal, copy, 
 copyBackward and fill, respectively. I just wanted to create 
 native implementations that are a bit safer than their C 
 counterparts, so they do the same job but accept void[] instead 
 of pointers. There are also templated functions with the same 
 names that work with ranges.

 I'm not very comfortable with GCC's inline asm, and it doesn't 
 support naked asm as DMD does, so I put the asm in .S files 
 and compile them separately. But I'm fine with inline asm too.
Why would you use inline assembly? (Generalizing, but: extremely bad portability, bad performance, bad readability.)

There was a recent discussion on the LLVM mailing list about the problem of the optimizer recognizing a memcmp implementation and substituting it with a call to memcmp: 
https://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html

cheers,
Johan
May 10 2019
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Friday, 10 May 2019 at 17:55:53 UTC, Johan Engelen wrote:

 Why would you use inline assembly? (Generalizing, but: 
 extremely bad portability, bad performance, bad readability.)
The only reason to use inline assembly is to achieve something that can't be achieved directly with D. For example, prior to the introduction of `volatileLoad` and `volatileStore`, inline assembly was required to achieve `volatile` semantics (see the sketch below).

For memcpy and memcmp, one would first attempt to write a good implementation in straight D, but if the compiler doesn't generate good code for it, it would be appropriate to take control and provide an implementation in inline assembly.

I don't know how a proper assembly implementation would not be performant. Perhaps you could elaborate.
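For reference, a minimal sketch of how those intrinsics are used 
in place of inline asm (the register address is invented for 
illustration; the functions live in core.bitop at the time of 
writing, later in core.volatile):

// Volatile MMIO access via druntime intrinsics rather than inline assembly.
import core.bitop : volatileLoad, volatileStore;

void toggleBit0()
{
    auto reg = cast(uint*) 0x4002_0014;  // hypothetical peripheral register
    uint value = volatileLoad(reg);      // not elided or reordered relative to other volatile ops
    volatileStore(reg, value ^ 1);       // the store always reaches the device
}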
 There was a recent discussion on the LLVM mailing list about the 
 problem of the optimizer recognizing a memcmp implementation and 
 substituting it with a call to memcmp: 
 https://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html
Yes, that bit me a while back when I was doing some bare-metal ARM Cortex-M development: 
https://forum.dlang.org/post/clptjumxigcozfcyhzzx forum.dlang.org

For compilers that already provide an optimized intrinsic implementation for memcpy, none of this is necessary; one could simply add a naive implementation, and the compiler would recognize it and replace it with its optimized version. DMD, to my understanding, is not one of those compilers.

One of the goals is to no longer require a C toolchain to build D programs. If the compiler already provides intrinsics without needing a C standard library, great! The other goal is to explore what D could improve upon with its design-by-introspection features and compiler guarantees (e.g. `pure` and `@safe`). My initial exploration into that can be found at https://github.com/JinShil/memcpyD

I find it much easier to read D code like that than something like this: 
https://github.com/opalmirror/glibc/blob/c38d0272b7c621078d84593c191e5ee656dbf27c/sysdeps/arm/memcpy.S

Mike
May 10 2019
parent Johan Engelen <j j.nl> writes:
On Friday, 10 May 2019 at 23:58:37 UTC, Mike Franklin wrote:
 I don't know how a proper assembly implementation would not be 
 performant.  Perhaps you could elaborate.
Inline assembly prevents a lot of optimizations that give large performance gains, such as constant propagation. Say you implement a memcpy with a different signature than C's memcpy (because of slices instead of pointers): the optimizer does not know what the semantics of that function are and needs the function to be transparent (not assembly) to do such optimizations. But I'm sure you know all that, so that's not your question. :)

In the case of reimplementing memcpy/mem* in a function with the same signature as libc, one that is not supposed to be inlined (like the current libc functions), then I also think the use of inline asm will not give a perf penalty. Be careful to recreate the exact same semantics as those libc functions, because the optimizer is going to _assume_ it knows _exactly_ what those functions are doing.

cheers,
Johan
May 11 2019
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, May 10, 2019 at 05:16:24PM +0000, Mike Franklin via
Digitalmars-d-announce wrote:
[...]
 I've studied the ARM implementation of memcpy a little, and it's quite
 hard to follow.  I'd like for the D implementations to make such code
 easier to understand and maintain.
[...] I'm not 100% sure it's a good idea to implement memcpy in D just to prove that it can be done / just to say that we're independent of libc. Libc implementations of fundamental operations, esp. memcpy, are usually optimized to next week and back for the target architecture, taking advantage of the target arch's quirks to maximize performance. Not to mention that advanced compiler backends recognize calls to memcpy and can optimize it in ways they can't optimize a generic D function they fail to recognize as being equivalent to memcpy. I highly doubt a generic D implementation could hope to beat that, and it's a little unrealistic, given our current manpower situation, for us to be able to optimize it for each target arch ourselves.
 On Friday, 10 May 2019 at 05:20:59 UTC, Eugene Wissner wrote:
[...]
 I should mention, though, that tanya's range definitions differ 
 from Phobos's.
[...] I'm a bit uncomfortable with having multiple, incompatible range definitions. While it can be argued whether the Phobos definition is the best, shouldn't we instead be focusing on improving the *standard* definition of ranges, rather than balkanizing the situation by introducing multiple, incompatible definitions just because? It's one thing for Andrei to propose a std.v2 that, ostensibly, might have a new, hopefully improved, range API, deprecating the current definition; it's another thing to have multiple alternative, competing definitions in libraries that user code can choose from. That would be essentially inviting the Lisp Curse.

T

-- 
Life would be easier if I had the source code. -- YHL
May 10 2019
next sibling parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Friday, 10 May 2019 at 23:51:56 UTC, H. S. Teoh wrote:

 I'm not 100% sure it's a good idea to implement memcpy in D 
 just to prove that it can be done / just to say that we're 
 independent of libc. Libc implementations of fundamental 
 operations, esp. memcpy, are usually optimized to next week and 
 back for the target architecture, taking advantage of the 
 target arch's quirks to maximize performance. Not to mention 
 that advanced compiler backends recognize calls to memcpy and 
 can optimize it in ways they can't optimize a generic D 
 function they fail to recognize as being equivalent to memcpy. 
 I highly doubt a generic D implementation could hope to beat 
 that, and it's a little unrealistic, given our current manpower 
 situation, for us to be able to optimize it for each target 
 arch ourselves.
I understand that point of view. Indeed we have to demonstrate benefit. One benefit is not having to obtain a C toolchain when building D programs. That is actually quite an inconvenient barrier to entry when cross-compiling (e.g. for developing microcontroller firmware on a PC).

I'm also hoping that a D implementation would be easier to comprehend than something like this: 
https://github.com/opalmirror/glibc/blob/c38d0272b7c621078d84593c191e5ee656dbf27c/sysdeps/arm/memcpy.S

The D implementation still has to handle all of those corner cases, but I'd rather read D code with inline assembly sprinkled here and there than read the entire thing in assembly. The goal with the D implementation would be to minimize the assembly.

For compilers that already do something special with memcpy and don't require a C standard library, there's no reason to do anything. My initial exploration into this has shown that DMD is not one of those compilers.
 On Friday, 10 May 2019 at 05:20:59 UTC, Eugene Wissner wrote:
[...]
 I should mention, though, that tanya's range definitions differ 
 from Phobos's.
 [...] I'm a bit uncomfortable with having multiple, incompatible 
 range definitions. While it can be argued whether the Phobos 
 definition is the best, shouldn't we instead be focusing on 
 improving the *standard* definition of ranges, rather than 
 balkanizing the situation by introducing multiple, incompatible 
 definitions just because? It's one thing for Andrei to propose a 
 std.v2 that, ostensibly, might have a new, hopefully improved, 
 range API, deprecating the current definition; it's another 
 thing to have multiple alternative, competing definitions in 
 libraries that user code can choose from. That would be 
 essentially inviting the Lisp Curse.
Agreed. We should decide on one consistent definition. I don't know what that looks like yet; I'm more focused on low-level details right now. I do, however, like the idea of delegating the memory management (allocation/deallocation) outside of the library. If that's not feasible for some reason, then I would suggest it not be included in utiliD. I don't want dynamic memory allocation in utiliD; that should go into a higher-level library that may import utiliD.

Mike
May 10 2019
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Saturday, 11 May 2019 at 00:09:08 UTC, Mike Franklin wrote:
 [...]

Also, take a look at this data: 
https://forum.dlang.org/post/jdfiqpronazgglrkmwfq forum.dlang.org

Why is DMD making 48,000 runtime calls to memcpy to copy 8 bytes of data? Many of those calls should be inlined. I see opportunity for improvement there.

Mike
May 10 2019
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, May 11, 2019 at 12:23:31AM +0000, Mike Franklin via
Digitalmars-d-announce wrote:
[...]
 Also, take a look at this data:
 https://forum.dlang.org/post/jdfiqpronazgglrkmwfq forum.dlang.org  Why
 is DMD making 48,000 runtime calls to memcpy to copy 8 bytes of data?
 Many of those calls should be inlined.  I see opportunity for
 improvement there.
[...] When it comes to performance, I've essentially given up looking at DMD output. DMD's inliner gives up far too easily, leading to a lot of calls that aren't inlined when they really should be, and DMD's optimizer does not have loop unrolling, which excludes a LOT of subsequent optimizations that could have been applied. I wouldn't base any performance decisions on DMD output. If LDC or GDC produces non-optimal code, then we have cause to do something. Otherwise, IMO we're just uglifying D code and making it unmaintainable for no good reason.

T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
May 10 2019
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Saturday, 11 May 2019 at 00:32:54 UTC, H. S. Teoh wrote:

 When it comes to performance, I've essentially given up looking 
 at DMD output. DMD's inliner gives up far too easily, leading 
 to a lot of calls that aren't inlined when they really should 
 be, and DMD's optimizer does not have loop unrolling, which 
 excludes a LOT of subsequent optimizations that could have been 
 applied.  I wouldn't base any performance decisions on DMD 
 output. If LDC or GDC produces non-optimal code, then we have 
 cause to do something. Otherwise, IMO we're just uglifying D 
 code and making it unmaintainable for no good reason.
I think this thread is beginning to lose sight of the larger picture. What I'm trying to achieve is the opt-in continuum that Andrei mentioned elsewhere on this forum. We can't do that with the way the compiler and runtime currently interact. So, the first task, which I'm trying to get around to, is to convert runtime hooks to templates. Using the compile-time type information will allow us to avoid `TypeInfo`, therefore classes, therefore the entire D runtime. We're now much closer to the opt-in continuum Andrei mentioned previously on this forum. Now let's assume that's done...

Those new templates will eventually call a very few functions from the C standard library, memcpy being one of them. Because the runtime hooks are now templates, we have type information that we can use in the call to memcpy. Therefore, I want to explore implementing `void memcpy(T)(ref T dst, const ref T src) @safe nothrow pure @nogc` rather than `void* memcpy(void*, const void*, size_t)`. There are some issues here such as template bloat and compile times, but I want to explore it anyway. I'm trying to imagine what memcpy in D would look like if we didn't have a C implementation narrowing our imagination. I don't know how that will turn out, but I want to explore it. For LDC we can just do something like this...

void memcpy(T)(ref T dst, const ref T src) @safe nothrow @nogc pure
{
    version (LDC)
    {
        // after casting dst and src to byte arrays of length T.sizeof...
        // (the casts probably need to go in a @trusted block)
        for (size_t i = 0; i < T.sizeof; i++)
            dstArray[i] = srcArray[i];
    }
}

LDC is able to see that as memcpy and do the right thing. Also, if the LDC developers want to do their own thing altogether, more power to them. I don't see anything ugly about it. However, DMD won't do the right thing.

I guess others are thinking that we'd just re-implement `void* memcpy(void*, const void*, size_t)` in D and we'd throw in a runtime call to `memcpy(&dstArray[0], &srcArray[0], T.sizeof)`. That's ridiculous. What I want to do is use the type information to generate an optimal implementation (considering size and alignment) that DMD will be forced to inline with `pragma(inline)`. That implementation can also take into consideration target features such as SIMD. I don't believe the code will be complex, and I expect it to perform at least as well as the C implementation. My initial tests show that it will actually outperform the C implementation, but that could be a problem with my tests. I'm still researching it.

Now assuming that's done, we have language runtime implementations that are isolated from heavier runtime features (like the `TypeInfo` classes) and that can easily be used in -betterC builds, bare-metal systems programming, etc. simply by importing them as a header-only library; it doesn't require first compiling (or cross-compiling) a runtime for linking with your program; you just import and go. We're now much closer to the opt-in continuum.

Now what about development of druntime itself? Well, wouldn't it be nice if we could utilize things like `std.traits`, `std.meta`, `std.conv`, and a bunch of other stuff from Phobos? Wouldn't it also be nice if we could use that stuff in DMD itself without importing Phobos? So let's take the stuff in Phobos that doesn't need druntime and put it in a library that doesn't require druntime (i.e. utiliD). Now druntime can import utiliD and have more idiomatic-D implementations.
But the benefits don't stop there: bare-metal developers, microcontroller developers, kernel driver developers, OS developers, etc. can all use the runtime-less library to bootstrap their own implementations without having to re-invent or copy code out of Phobos and druntime.

I'm probably not articulating this vision well. I'm sorry. Maybe we'll just have to hope I can find the time and energy to do it myself and then others will finally see from the results. Or maybe I'll go have a nice helping of crow.

Mike
May 10 2019
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, May 11, 2019 at 01:45:08AM +0000, Mike Franklin via
Digitalmars-d-announce wrote:
[...]
 I think this thread is beginning to lose sight of the larger picture.
 What I'm trying to achieve is the opt-in continuum that Andrei
 mentioned elsewhere on this forum.  We can't do that with the way the
 compiler and runtime currently interact.  So, the first task, which
 I'm trying to get around to, is to convert runtime hooks to templates.
 Using the compile-time type information will allow us to avoid
 `TypeInfo`, therefore classes, therefore the entire D runtime.  We're
 now much closer to the opt-in continuum Andrei mentioned previously on
 this forum.  Now let's assume that's done...
Yes, that's definitely a direction we want to head in. I think it will be very beneficial.
 Those new templates will eventually call a very few functions 
 from the C standard library, memcpy being one of them. Because 
 the runtime hooks are now templates, we have type information 
 that we can use in the call to memcpy. Therefore, I want to 
 explore implementing `void memcpy(T)(ref T dst, const ref T src) 
 @safe nothrow pure @nogc` rather than `void* memcpy(void*, const 
 void*, size_t)`. There are some issues here such as template 
 bloat and compile times, but I want to explore it anyway. I'm 
 trying to imagine what memcpy in D would look like if we didn't 
 have a C implementation narrowing our imagination. I don't know 
 how that will turn out, but I want to explore it.
Put this way, I think that's a legitimate area to explore. But copying a block of memory from one place to another is simply just that: copying a block of memory from one place to another. It just boils down to how to copy N bytes from A to B in the fastest way possible. For that, you just reduce it to moving K words (the size of which depends only on the target machine, not the incoming type) of memory from A to B, plus or minus a few bytes at the end for non-aligned data. The type T only matters if you need to do type-specific operations like call default ctors / dtors, but at the memcpy level that should already have been taken care of by higher-level code, and it isn't memcpy's concern what ctors/dtors to invoke.

The one thing knowledge of T can provide is whether or not T[] can be unaligned. If T.sizeof < machine word size, then you need extra code to take care of the start/end of the block; otherwise, you can just go straight to the main loop of copying K words from A to B. So that's one small thing we can take advantage of. It could save a few cycles by avoiding a branch hazard at the start/end of the copy, and making the code smaller for inlining.

Anything else you optimize on copying K words from A to B would be target-specific, like using vector ops, specialized CPU instructions, and the like. But once you start getting into that, you start getting into the realm of whether all the complex setup needed for, e.g., a vector op is worth the trouble if T.sizeof is small. Perhaps here's another area where knowledge of T can help (if T is small, just use a naïve for-loop; if T is sufficiently large, it could be worth incurring the overhead of setting up vector copy registers, etc., because it makes copying the large body of T faster).

So potentially a D-based memcpy could have multiple concrete implementations (copying strategies) that are statically chosen based on the properties of T, like alignment and size.
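For instance, a rough sketch of what that static selection could 
look like in D (hypothetical code, not what memcpyD actually 
does):

// The copying strategy is chosen at compile time from T.sizeof. Real code
// would also branch on alignment and target features (SIMD, etc.).
pragma(inline, true)
void memcpy(T)(ref T dst, const ref T src) @trusted nothrow @nogc pure
{
    auto d = (cast(ubyte*) &dst)[0 .. T.sizeof];
    auto s = (cast(const(ubyte)*) &src)[0 .. T.sizeof];

    static if (T.sizeof <= size_t.sizeof)
    {
        // Register-sized or smaller: a trivial loop the backend can
        // collapse into a single load/store pair.
        foreach (i; 0 .. T.sizeof)
            d[i] = s[i];
    }
    else
    {
        // Larger objects: copy word by word, then the remaining bytes.
        enum words = T.sizeof / size_t.sizeof;
        auto dw = cast(size_t*) d.ptr;
        auto sw = cast(const(size_t)*) s.ptr;
        foreach (i; 0 .. words)
            dw[i] = sw[i];
        foreach (i; words * size_t.sizeof .. T.sizeof)
            d[i] = s[i];
    }
}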
 However, DMD won't do the right thing.
Honestly, at this point I don't even care.
 I guess others are thinking that we'd just re-implement `void*
 memcpy(void*, const void*, size_t)` in D and we'd throw in a runtime
 call to `memcpy(&dstArray[0], &srcArray[0], T.sizeof)`.  That's
 ridiculous.  What I want to do is use the type information to generate
 an optimal implementation (considering size and alignment) that DMD
 will be forced to inline with `pragma(inline)`.
It could be possible to select multiple different memcpy implementations by statically examining the properties of T. I think that might be one advantage D could have over just calling libc's memcpy. But you have to be very careful not to outdo the compiler's optimizer so that it doesn't recognize it as memcpy and fails to apply what would otherwise be a routine optimization pass.
 That implementation can also take into consideration target features
 such as SIMD.  I don't believe the code will be complex, and I expect
 it to perform at least as well as the C implementation.  My initial
 tests show that it will actually outperform the C implementation, but
 that could be a problem with my tests.  I'm still researching it.
Actually, if you want to compete with the C implementation, you might find that things could get quite hairy. Maybe not with memcpy, but other functions like memchr have very clever hacks to speed it up that you probably wouldn't think of without reading C library source code. There may also be subtle differences that change depending on the target; it used to be that `rep movsd` was faster in spite of requiring more overhead setting up; but last I read, newer CPUs seem to have `rep movsd` perform rather poorly whereas a plain ole for-loop actually outperforms `rep movsd`.

At a certain point, this just begs the question "should I just let the compiler's backend do its job by telling it plainly that I mean memcpy, or should I engage in asm-hackery because I'm confident I can outdo the compiler's codegen?".

One thing that might be worth considering is for the *compiler* to expose a memcpy intrinsic, and then let the compiler decide how best to implement it (using its intimate knowledge of the target machine arch), rather than trying to do it manually in library code.
 Now assuming that's done, we now have language runtime implementations
 that are isolated from heavier runtime features (like the `TypeInfo`
 classes) that can easily be used in -betterC builds, bare-metal
 systems programming, etc. simply by importing them as a header-only
 library; it doesn't require first compiling (or cross-compiling) a
 runtime for linking with your program; you just import and go.  We're
 now much closer to the opt-in continuum.
 
 Now what about development of druntime itself.  Well wouldn't it be
 nice if we could utilize things like `std.traits`, `std.meta`,
 `std.conv`, and a bunch of other stuff from Phobos?
Based on what Andrei has voiced, the way to go would be to merge Phobos and druntime into one, by making Phobos completely opt-in so that you don't pay for what you don't use from the heavier / higher-level parts of Phobos. At a certain point it becomes clear that the division between Phobos and druntime is artificial, the result of historical accident, and not a logical necessity that we have to keep. If Phobos is made completely pay-as-you-go, the distinction becomes completely irrelevant and the two might as well be merged into one.
 Wouldn't it also be nice if we could use that stuff in DMD itself
 without importing Phobos?  So let's take that stuff in Phobos that
 doesn't need druntime and put them in a library that doesn't require
 druntime (i.e. utiliD).  Now druntime can import utiliD and have more
 idiomatic-D implementations.
See, this trouble is caused by the artificial boundary between Phobos and druntime. We should look into breaking down this barrier, not enforcing it.
 But the benefits don't stop there: bare-metal developers, 
 microcontroller developers, kernel driver developers, OS 
 developers, etc. can all use the runtime-less library to 
 bootstrap their own implementations without having to re-invent 
 or copy code out of Phobos and druntime.
[...] I think the logical goal is to make Phobos completely pay-as-you-go. IOW, an actual *library*, as opposed to a tangled hairball of dependencies that always comes with strings attached (can't import one small thing without pulling in the rest of the hairball). A library is supposed to be a set of resources which you can draw from as needed. Pulling out one book (module) should not require pulling out half the library along with it.

T

-- 
Once the bikeshed is up for painting, the rainbow won't suffice. -- Andrei Alexandrescu
May 10 2019
parent Mike Franklin <slavo5150 yahoo.com> writes:
On Saturday, 11 May 2019 at 05:39:12 UTC, H. S. Teoh wrote:

 So potentially a D-based memcpy could have multiple concrete 
 implementations (copying strategies) that are statically chosen 
 based on the properties of T, like alignment and size.
Exactly.
 [...]
 However, DMD won't do the right thing.
Honestly, at this point I don't even care.
Personally I'd be fine with just killing off DMD's backend and investing in LDC and GDC, but I don't think that's going to happen, and because of that, we have to care. DMD is where policy and precedent are set for D. To influence the direction of D, it must be done through DMD.
 It could be possible to select multiple different memcpy 
 implementations by statically examining the properties of T.  I 
 think that might be one advantage D could have over just 
 calling libc's memcpy.  But you have to be very careful not to 
 outdo the compiler's optimizer so that it doesn't recognize it 
 as memcpy and fails to apply what would otherwise be a routine 
 optimization pass.
I understand. That's why I'm calling it an "exploration" at this time. I want to see what can and can't be done.
 At a certain point, this just begs the question "should I just 
 let the compiler's backend do its job by telling it plainly 
 that I mean memcpy, or should I engage in asm-hackery because 
 I'm confident I can outdo the compiler's codegen?".
I get that, but DMD is not the kind of backend that does that stuff. If I could rely on DMD's, LDC's, and GDC's backends to just insert an optimized compiler intrinsic, without the C standard library, I would just leverage that. But that doesn't seem to be the world we're currently in.
 One thing that might be worth considering is for the *compiler* 
 to expose a memcpy intrinsic, and then let the compiler decide 
 how best to implement it (using its intimate knowledge of the 
 target machine arch), rather than trying to do it manually in 
 library code.
I would love for the backends to just know how to copy memory efficiently for all of their targets without me having to do anything, and without linking in the C standard library, but that's not what I'm seeing from the compilers right now.
 Based on what Andrei has voiced, the way to go would be to 
 merge Phobos and druntime into one, by making Phobos completely 
 opt-in so that you don't pay for what you don't use from the 
 heavier / higher-level parts of Phobos.  At a certain point it 
 becomes clear that the division between Phobos and druntime is 
 artificial, the result of historical accident, and not a 
 logical necessity that we have to keep. If Phobos is made 
 completely pay-as-you-go, the distinction becomes completely 
 irrelevant and the two might as well be merged into one.
Yes, but is making Phobos pay-as-you-go a real possibility? I don't see it that way, because all of Phobos has been developed under the assumption that all language features are implemented and available. utiliD would be usable in an environment where only a subset of D's language features is available.

Also, Phobos has been developed under the assumption that any module in Phobos or druntime can be used as a dependency in any other module. That has created a dependency mess in Phobos, and I don't see how it can be disentangled without breaking everyone's code. Furthermore, there is no hierarchy in Phobos that makes it clear at the API level what language features each module/function requires. With utiliD, it is much clearer where the line is drawn in the hierarchy of language features. Phobos will never be pay-as-you-go if you can't see what you're paying for as you go.
 See, this trouble is caused by the artificial boundary between 
 Phobos and druntime.  We should look into breaking down this 
 barrier, not enforcing it.
I agree. We could actually merge druntime and Phobos into a single library today. I also find the divide between Phobos and druntime artificial, but my goal with utiliD is different. I'm trying to create a library that does not require runtime language features; I'm not reproducing the artificial division that currently exists. I'm trying to build something equivalent to a stack, where you start at a very low level (utiliD) and add layers of increasing capability. That's not what we have with Phobos and druntime today.
 I think the logical goal is to make Phobos completely 
 pay-as-you-go. IOW, an actual *library*, as opposed to a 
 tangled hairball of dependencies that always comes with strings 
 attached (can't import one small thing without pulling in the 
 rest of the hairball). A library is supposed to be a set of 
 resources which you can draw from as needed. Pulling out one 
 book (module) should not require pulling out half the library 
 along with it.
I agree, but that hairball is exactly what Phobos is right now. I don't see any way to start from that mess and achieve the pay-as-you-go opt-in continuum. In a way, I'm starting over with utiliD, but I believe there is still value in druntime and Phobos that can be salvaged to start building an opt-in, pay-as-you-go stack of increasing features, sophistication, and capability in D, where you know, by what you're importing, what you're getting and what it costs.

Mike
May 11 2019
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Friday, 10 May 2019 at 23:51:56 UTC, H. S. Teoh wrote:
 Libc implementations of fundamental operations, esp. memcpy, 
 are usually optimized to next week and back for the target 
 architecture, taking advantage of the target arch's quirks to 
 maximize performance
Yeah, about that...

Level1 Diagnostic: Fixing our Memcpy Troubles (for Looking Glass) 
https://www.youtube.com/watch?v=idauoNVwWYE
May 20 2019
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2019-05-05 05:45, Mike Franklin wrote:

 The idea behind the library is that it would not depend on druntime, 
 phobos, C standard library, or anything else but would still offer many 
 of the features that those libraries provide. To utilize the library, 
 one would only need a D compiler. It could be used in bare-metal 
 programming, -betterC builds, or as a fundamental utility library for 
 implementing DMD, druntime, and phobos themselves.
Might be interesting to write a tool that enforces the rules. It would use DMD as a library.

-- 
/Jacob Carlborg
May 05 2019