digitalmars.D.announce - Re: GoingNative 6: The D Episode with Walter Bright and Andrei Alexandrescu

bearophile <bearophileHUGS lycos.com> Feb 26 2012
bearophile <bearophileHUGS lycos.com> writes:
Sorry for the slow reply, I was busy.

Walter:

 Using the JVM forces your program into Java semantics.


Right, a JVM doesn't give all the freedom needed by a systems language.


 For example, there are no
 structs in the JVM bytecode. No pointers, either. Nor unsigned types.


The little emscripten compiler compiles LLVM "bytecode" (IR code) to
JavaScript, and although JavaScript has no structs, it's able to compile them
efficiently, lowering each one of them to a bunch of single variables. The
result seems efficient enough (taking into account that the target language is
JS):
https://github.com/kripken/emscripten
This can't be used to return structs from a function, but I think it allows
avoiding a few heap allocations (when the struct is a function argument, or
when it's used locally). The Oracle JavaVM nowadays performs escape analysis,
I presume with similar effects.
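To make the lowering idea concrete, here is a small hand-written sketch in
Python. It is not emscripten output; the names Vec2, length_sq_struct and
length_sq_lowered are invented for illustration. The first function receives
an aggregate, the second receives the same data as separate scalars, which is
roughly what "lowering a struct to a bunch of single variables" means:

```python
from dataclasses import dataclass


@dataclass
class Vec2:
    """Hypothetical two-field struct used only for this illustration."""
    x: float
    y: float


def length_sq_struct(v: Vec2) -> float:
    # Version that takes the aggregate: the caller has to build a Vec2 object.
    return v.x * v.x + v.y * v.y


def length_sq_lowered(v_x: float, v_y: float) -> float:
    # Lowered version: each field has become its own scalar parameter,
    # so no aggregate object is needed for the call.
    return v_x * v_x + v_y * v_y


if __name__ == "__main__":
    print(length_sq_struct(Vec2(3.0, 4.0)))
    print(length_sq_lowered(3.0, 4.0))
```

The limitation mentioned above shows up here too: a function that must hand
back both fields at once has no single scalar to return, so this trick covers
arguments and locals only.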

Although there are no unsigned integers in the JVM, something is moving:
https://blogs.oracle.com/darcy/entry/unsigned_api
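One common way to get unsigned behaviour without unsigned types is to keep the
signed representation and add helper functions that reinterpret the same bits.
The sketch below assumes that approach and simulates 32-bit values in Python;
the helper names (to_signed32, to_unsigned32, divide_unsigned32,
compare_unsigned32) are hypothetical and are not taken from the linked page:

```python
MASK32 = 0xFFFFFFFF  # low 32 bits


def to_signed32(bits: int) -> int:
    """Interpret the low 32 bits as a two's complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def to_unsigned32(value: int) -> int:
    """Interpret a signed 32-bit value as an unsigned one (same bit pattern)."""
    return value & MASK32


def divide_unsigned32(a: int, b: int) -> int:
    """Divide two signed 32-bit values as if both were unsigned."""
    return to_signed32(to_unsigned32(a) // to_unsigned32(b))


def compare_unsigned32(a: int, b: int) -> int:
    """Return -1, 0 or 1, comparing the operands as unsigned values."""
    ua, ub = to_unsigned32(a), to_unsigned32(b)
    return (ua > ub) - (ua < ub)


if __name__ == "__main__":
    print(to_unsigned32(-1))
    print(divide_unsigned32(-1, 2))
    print(compare_unsigned32(-1, 1))
```

Addition, subtraction and multiplication give the same bit patterns for signed
and unsigned two's complement operands, so only operations like division,
comparison and conversion to text need such helpers.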


 Your new language is fairly boxed in to being a rehash of Java semantics.


The semantics of languages like JRuby and Scala are different from Java
semantics. Scala's type system is not comparable to Java's. But I understand
what you are saying. I think there is more than one meaning for "language
semantics".


 That works if your language is expressible as C, because LLVM is a C/C++ back
 end. If your language has different semantics (like how Go does stacks), using
 LLVM can be a large problem.


LLVM is not infinitely flexible, I agree. But LLVM already covers a lot of
ground and is growing, so even for unusual languages it seems able to implement
a large percentage of the semantics out of the box. Recently a Haskell
back-end was written using LLVM, and LLVM was not able to express everything
needed. So the Haskell devs wrote a small patch (2000 lines or so) to implement
the missing semantics, and the LLVM developers added it to the main LLVM code.

In the discussion Andrei was talking about the amount of code needed to
implement the efficient base of a new language. Even if LLVM is sometimes not
able to implement 100% of the semantics of your language, I think writing just
the few missing parts greatly reduces the work needed.


 I looked into this years ago. Very little of array bounds checking can be
 optimized away.


This paper (which is about Java) shows nice results in terms of the percentage
of array bounds checks eliminated (but as expected the speedup is visible only
on code that uses arrays heavily, like scientific-style code):
http://ssw.jku.at/Research/Papers/Wuerthinger07/
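As a rough illustration of what eliminating a bounds check means (a hand-made
sketch, not the algorithm from the paper), the two hypothetical functions below
make the checks explicit. The first tests the index on every access; the second
relies on the fact that i only ranges over [0, n), so a single test of n before
the loop covers every access:

```python
def sum_checked(data, n):
    """Sum data[0:n] with an explicit bounds check on every access."""
    total = 0
    for i in range(n):
        if not (0 <= i < len(data)):  # per-iteration check
            raise IndexError(i)
        total += data[i]
    return total


def sum_hoisted(data, n):
    """Same result, but one check before the loop replaces all the others."""
    # range(n) only yields 0 <= i < n, so n <= len(data) implies that
    # every data[i] in the loop is in bounds.
    if n > len(data):
        raise IndexError(len(data))
    total = 0
    for i in range(n):
        total += data[i]  # no explicit check needed here
    return total


if __name__ == "__main__":
    print(sum_checked([1, 2, 3], 3))
    print(sum_hoisted([1, 2, 3], 3))
```

Python still performs its own internal check on data[i], so this shows only the
reasoning, not a speedup. Note also that the hoisted version fails before doing
any work, while the checked one fails at the first bad index; a real compiler
has to preserve the original point of failure.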

So is there any low-hanging fruit here?


 I've been working on optimizers for 25 years now, including a
 native code generating Java compiler, and I do know a few things about how to
 do arrays.


You are one of my few programming heroes :-) But there's always some more to
invent and learn.


 Clang has some pretty good ideas, like the spell checker on undefined
 identifiers.


This is another idea I'd like in D:
http://d.puremagic.com/issues/show_bug.cgi?id=5004
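The spell checker on undefined identifiers that Walter mentions can be sketched
with a plain edit distance: when a name is not found, suggest the known name
that is closest to it, provided it is close enough. This is a minimal sketch
under that assumption, not Clang's actual implementation; edit_distance,
suggest and the max_distance threshold are names chosen for illustration:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(unknown: str, known: list, max_distance: int = 2):
    """Return the closest known identifier, or None if nothing is close."""
    best = min(known, key=lambda name: edit_distance(unknown, name),
               default=None)
    if best is not None and edit_distance(unknown, best) <= max_distance:
        return best
    return None


if __name__ == "__main__":
    symbols = ["writeln", "write", "readln"]
    guess = suggest("writln", symbols)
    if guess is not None:
        print(f"undefined identifier 'writln', did you mean '{guess}'?")
```

The threshold keeps the compiler from proposing unrelated names; a real
implementation would probably scale it with the identifier length.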

Bye,
bearophile