digitalmars.D.announce - Re: GoingNative 6: The D Episode with Walter Bright and Andrei Alexandrescu

Sorry for the slow reply, I was busy.

Walter:

 Using the JVM forces your program into Java semantics.

Right, a JVM doesn't give all the freedom needed by a system language.
 For example, there are no
 structs in the JVM bytecode. No pointers, either. Nor unsigned types.

The little emscripten compiler compiles LLVM "bytecode" (IR code) to JavaScript. Although JavaScript has no structs, emscripten is able to compile them efficiently, lowering each struct to a group of separate variables. The result seems efficient enough (taking into account that the target language is JS): https://github.com/kripken/emscripten This can't be used to return structs from a function, but I think it avoids a few heap allocations when the struct is a function argument or is used only locally. (The Oracle JVM nowadays performs escape analysis, I presume with similar effects.)

And although there are no unsigned integers in the JVM, something is moving: https://blogs.oracle.com/darcy/entry/unsigned_api
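To make the two points above concrete, here is a small hand-written sketch in Java. It is not what emscripten or the JVM generate; the class and method names (StructLowering, Vec2, dotBoxed, dotLowered, unsignedDemo) are made up for illustration. The first part shows a struct-like aggregate next to its "lowered" form, where each field is passed as a separate scalar. The second part uses the unsigned helper methods that java.lang.Integer has offered since Java 8, which I assume is roughly the kind of API the link above describes.

```java
// Illustrative sketch only: names are hypothetical, not from emscripten or the JVM.
final class StructLowering {
    // "Struct" version: a small aggregate that normally lives on the heap.
    static final class Vec2 {
        final double x;
        final double y;

        Vec2(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Takes two aggregates; the caller has to allocate them
    // (unless escape analysis manages to remove the allocation).
    static double dotBoxed(Vec2 a, Vec2 b) {
        return a.x * b.x + a.y * b.y;
    }

    // Lowered version: every field is its own scalar argument,
    // so no aggregate has to exist at run time.
    static double dotLowered(double ax, double ay, double bx, double by) {
        return ax * bx + ay * by;
    }

    // Treats the bits of a signed int as an unsigned 32-bit value,
    // using the static helpers on java.lang.Integer (Java 8 and later).
    static String unsignedDemo() {
        int allBits = -1; // bit pattern 0xFFFFFFFF
        return Integer.toUnsignedString(allBits)
                + " " + Integer.divideUnsigned(allBits, 2)
                + " " + (Integer.compareUnsigned(allBits, 1) > 0);
    }

    public static void main(String[] args) {
        System.out.println(dotBoxed(new Vec2(1, 2), new Vec2(3, 4))); // 11.0
        System.out.println(dotLowered(1, 2, 3, 4));                   // 11.0
        System.out.println(unsignedDemo()); // 4294967295 2147483647 true
    }
}
```

The lowered form shows the limitation mentioned above: a method can take the fields as separate arguments, but it has no way to return two scalars at once, so returning a struct still needs some kind of aggregate.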
 Your new language is fairly boxed in to being a rehash of Java semantics.

The semantics of languages like JRuby and Scala are different from Java semantics. Scala's type system is not comparable to Java's. But I understand what you are saying. I think there is more than one meaning for "language semantics".
 That works if your language is expressible as C, because LLVM is a C/C++ back
 end. If your language has different semantics (like how Go does stacks), using
 LLVM can be a large problem.

LLVM is not infinitely flexible, I agree. But LLVM is already quite wide and still growing, so even for strange languages it seems able to implement a large percentage of the semantics out of the box. Recently a Haskell back-end was implemented using LLVM, and LLVM was not able to express everything needed. So the Haskell devs wrote a small patch (2000 lines or so) to implement the missing semantics, and the LLVM developers added it to the main LLVM code. In the discussion Andrei was talking about the amount of code needed to implement the efficient base of a new language. Even if LLVM is sometimes not able to implement 100% of the semantics of your language, I think writing just the few missing parts reduces the work needed a lot.
 I looked into this years ago. Very little of array bounds checking can be
 optimized away.

This paper (about Java) shows nice results in terms of the percentage of array bounds checks eliminated (but, as expected, the speedup is visible only in code that uses arrays heavily, like scientific-style code): http://ssw.jku.at/Research/Papers/Wuerthinger07/ So are there any low-hanging fruits here?
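As a rough illustration of which bounds checks are candidates for removal, here is a sketch in Java with two loops. The names (BoundsChecks, sumAll, sumIndirect) are hypothetical, and whether a particular JIT really removes the check in the first loop is an assumption on my part; the code only shows where the check is provably redundant and where it is not.

```java
// Illustrative sketch only: names are hypothetical.
final class BoundsChecks {
    // Here i starts at 0, grows by 1, and the loop condition guarantees
    // i < a.length, so the range check on a[i] is provably redundant.
    static long sumAll(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // The check on idx[i] is redundant for the same reason as above,
    // but idx[i] is arbitrary data, so the check on a[idx[i]] generally
    // has to stay unless the compiler knows something about idx's contents.
    static long sumIndirect(int[] a, int[] idx) {
        long sum = 0;
        for (int i = 0; i < idx.length; i++) {
            sum += a[idx[i]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(sumAll(data));                       // 60
        System.out.println(sumIndirect(data, new int[] {2, 0})); // 40
    }
}
```

Scientific-style code tends to look like the first loop, which would fit the observation that the speedup shows up mostly there.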
 I've been working on optimizers for 25 years now, including a
 native code generating Java compiler, and I do know a few things about how to
 do arrays.

You are one of my few programming heroes :-) But there's always some more to invent and learn.
 Clang has some pretty good ideas, like the spell checker on undefined
 identifiers.

This is another idea I'd like in D: http://d.puremagic.com/issues/show_bug.cgi?id=5004

Bye,
bearophile
Feb 26 2012