The D Programming Language
by Walter Bright
D is an advanced systems programming language.
It is designed to appeal to C and C++ programmers who
need a more powerful language that has a much lower
complexity and hence is easier to master.
D is multiparadigm, and looks and feels
very much like C and C++. It offers opportunities for
advanced compilers to generate more efficient code than
is possible for C/C++, while supporting facilities that
reduce the probability of program bugs.
Why D?
Refactoring
C++ has been around for 20 years now. C++ has largely succeeded
in adding enormous capability to C while retaining backwards
compatibility with it. But with 20 years of experience comes the
opportunity to reflect on how one might engineer a language
that retains C++'s strengths, adds modern features, and removes its
weaknesses and more troublesome aspects.
Difficulty in adding modern new features
The longer a language has been evolving, the harder it gets
to add new features. Each new feature adds an unanticipated layer
on top of old ones, and must do so in a way that breaks no legacy code.
Eventually, it takes forever to add an insignificant improvement.
The C++ 'export' is an extreme example of this effect, taking a
reported 3
man years to implement and delivering little apparent benefit.
A more mundane indication of this problem is that C++ was standardized
5 years ago and conformant compilers are only now emerging.
While C++ is pioneering generic programming practice, it lags
behind in other modern techniques such as
Contract Programming, modules, automated testing,
and automatic memory management.
It's very difficult to add these while still supporting legacy
code.
Brief Tour
D looks a great deal like C and C++, so much so that the
canonical hello world program is nearly identical:
import std.c.stdio;
int main()
{
    printf("Hello world\n");
    return 0;
}
Look and feel is very much like C and C++
Many years ago in grammar school, we were shown a film about
a researcher who wore special goggles that turned the world
upside down. He wore them continuously such that
his brain never saw the world right side up. After 2 weeks,
his brain suddenly righted that upside down view. Then, the
researcher took the goggles off. The film darkly warned us
not to try that ourselves!
I am so, so used to C/C++ syntax that I feel like that poor guy when
faced with a new and improved language that also turns the syntax
inside out (or so it looks to me). Frankly, I rarely give such
languages a chance even when the feature set looks intriguing.
D doesn't take that route; its syntax is as comfortable to
C/C++ programmers as an old shoe. Functions, statements, expressions,
operator precedence, integral promotions: it's all there, pretty
much unchanged.
The world is right side up,
it's just got brighter colors and sharper focus!
Since D is so similar to C/C++, the rest of the article will focus
on characteristics that distinguish it.
Binary compatibility with C
C is the lingua franca of the programming world. Most new languages
accept the inevitable and reluctantly provide some sort of
(usually barbaric) interface to C tacked on as an afterthought.
Not so with D. D supports every one of C's data types as a native type
(and not even all C compilers support all C99 types).
D can directly call any C function, pass any C type, and accept
any C return value as naturally as doing it in C. Additionally,
D can access any function in the C library, including malloc and
free. There are no shims, compatibility layers, or separately compiled
DLLs needed.
There is no direct D interface to C++, but the two languages
can talk to each other because they both support C interfaces.
Retains and even expands access to low level programming
Real programs sometimes need to get
down to the metal. D offers the usual methods for that with
pointers, casting, and unions. It extends access by having
an inline assembler as part of the language, rather than a
vendor specific add-on that is incompatible with every other
vendor's. If you're not convinced of the need for this,
take a tour through the Linux kernel source code.
Clean separation between lexical analysis, syntactic analysis,
and semantic analysis.
At first blush, this feature would seem to be irrelevant to
D programmers, seeming to be just compiler implementation arcana.
But the positive effects of it do filter down to many indirect
benefits to the programmer such as: fewer compiler bugs, faster
development of compilers, ease of building syntax-directed editors,
ease of building source code analysis tools, fewer funky
exceptional rules in learning the language, etc.
For the kinds of troubles this avoids, see the infamous
"template angle bracket hack" in C++.
Add up all the time highly paid professionals have spent reading about
it, writing about it,
explaining it to other programmers, proposing fixes for it,
and multiply by their hourly cost.
Modules
Every source file in D is a module with its own namespace.
Other modules can be imported, which is a symbolic include
rather than a textual one. Modules can be precompiled.
There is no need to craft a 'header' file for a corresponding
source file; the module serves both functions.
No forward declarations needed!
Names at the global level can be forward referenced without
needing a special declaration:
void foo() { bar(); } // forward reference OK
void bar() { }
Oddly, C++ allows such forward referencing for class members but
not outside classes.
No preprocessor
C++ has added many features that obsolete parts of the preprocessor,
but the language is still heavily dependent on it.
Even Boost, representative of modern C++ code and thinking,
is remarkably tied to
the arcana of advanced preprocessor tricks.
D provides
syntactical elements that obsolete the rest of it - such as modules,
nested functions, version statements, debug statements,
and foreach iteration.
Unicode
Let's face it, the future is not with ASCII, EBCDIC, or
code pages. It's Unicode. D can handle Unicode from front to
back - source code, string literals, and library code
all speak Unicode. There isn't any vagueness about what a multibyte
character is in D, or what size a wchar_t is. UTF-8, UTF-16, and UTF-32
strings are all properly supported. Other encoding schemes are handled
by putting a filter on input and output.
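As a rough illustration of the three encodings, here is a minimal sketch. It assumes a present-day D compiler, where string, wstring, and dstring are the standard aliases for arrays of char, wchar, and dchar; countCodePoints is a made-up helper name.

```d
// The same five-character text in each encoding. The escape \u00E9
// ('e' with an acute accent) needs two code units in UTF-8 but only
// one in UTF-16 and UTF-32, and .length counts code units.
static assert("h\u00E9llo".length  == 6);   // UTF-8
static assert("h\u00E9llo"w.length == 5);   // UTF-16
static assert("h\u00E9llo"d.length == 5);   // UTF-32

// Hypothetical helper: a foreach with a dchar loop variable decodes
// the UTF-8 text one code point at a time.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}
```

Here countCodePoints("h\u00E9llo") gives 5 even though the UTF-8 array holds 6 code units.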
Interfaces
The object-oriented inheritance model is single inheritance
enhanced with interfaces. Interfaces are essentially classes
with only abstract virtual functions as members. Users of Microsoft's
COM programming model will recognize interfaces as being a
natural fit with COM classes.
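A small sketch of single inheritance plus an interface, written for a present-day D compiler. Shape, Base, Square, and totalArea are invented names for illustration only.

```d
interface Shape
{
    double area();                  // interface members have no bodies
}

class Base
{
    string name() { return "base"; }
}

// One base class, plus any number of interfaces.
class Square : Base, Shape
{
    double side;
    this(double side) { this.side = side; }

    double area() { return side * side; }           // implements Shape
    override string name() { return "square"; }     // overrides Base
}

// Code written against the interface works with any implementor.
double totalArea(Shape[] shapes)
{
    double sum = 0;
    foreach (s; shapes)
        sum += s.area();
    return sum;
}
```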
Function in, out, inout parameters
Function in parameters are values passed in to a function.
out parameters are references to variables that are to be initialized
and assigned a value by the function. inout parameters are
references to variables whose values can be read and modified
by the function.
void ham()

{ int x = 1, y = 2, z = 3;
eggs(x, y, z);
}
void eggs(in int a, out int b, inout int c)
{
a += 1; // x is not affected
b = 4; // y is set to 4
c *= 3; // z is set to 9
}
Being able to specify all three variations makes for more
self-documenting code, better code generation, and fewer
bugs.
Automatic memory management
Also known as garbage collection, this enables programmers to
focus on algorithms rather than the tedious details of determining
the owner of a block of memory, when that block can be deleted,
and finding/plugging all the memory leaks.
Contrary to common wisdom, automatic memory management can result
in smaller and faster programs than explicit memory management.
It certainly results in higher programmer productivity.
The current D memory manager uses a classic conservative mark/sweep
collector, although more advanced ones like a generational collector
are certainly possible.
RAII
Automatic memory management techniques are good for managing a resource
that is relatively cheap and plentiful, like memory, but are not
good for managing scarce and expensive resources, such as file handles.
For those, RAII (Resource Acquisition Is Initialization) techniques
are supported in an equivalent manner to the way destructors in
C++ are executed when a variable goes out of scope.
RAII can be either attached to all instances of a particular class,
or to a particular instance of any class.
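The scope-exit behavior can be sketched as follows. This is not the class-based mechanism described above; it swaps in a struct destructor, which in present-day D runs deterministically when the variable goes out of scope. The counter openHandles stands in for a scarce resource, and every name here is hypothetical.

```d
int openHandles = 0;            // stand-in for a scarce resource

struct Handle
{
    bool live;
    @disable this(this);        // no copies, so each handle is released once
    ~this() { if (live) --openHandles; }
}

Handle acquire()
{
    ++openHandles;
    return Handle(true);
}

int useHandle()
{
    auto h = acquire();
    return openHandles;         // the handle is still held at this point
}                               // h's destructor releases it here
```

With no other handles open, useHandle() returns 1 and openHandles is back to 0 once it has returned.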
Explicit memory allocation
Automatic memory management isn't a panacea for every memory
management problem. Explicit memory allocation is available
either by manually calling C's malloc/free, or by overloading
new and delete operators for a class.
Stack based allocation via alloca() is available as well.
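For instance, a block obtained from C's malloc can be used and released by hand. The sketch below assumes a present-day D compiler, where the C library bindings live under core.stdc rather than the std.c package used in the hello world above; sumSquares is a made-up function.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical example: sums 0*0 + 1*1 + ... using a manually managed buffer.
int sumSquares(size_t n)
{
    // Memory from malloc is not managed by the garbage collector.
    int* p = cast(int*) malloc(n * int.sizeof);
    if (p is null)
        return -1;              // allocation failed
    scope (exit) free(p);       // released on every path out of the function

    int[] a = p[0 .. n];        // slice the raw block to get a sized array view
    foreach (i, ref v; a)
        v = cast(int) (i * i);

    int sum = 0;
    foreach (v; a)
        sum += v;
    return sum;
}
```

sumSquares(4) adds 0 + 1 + 4 + 9 and returns 14.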
Arrays
Arrays are enhanced from being little more than an alternative
syntax for a pointer into first class objects.
Key to making this work is that arrays always know their
length, and the contents of the arrays are managed by the
automatic memory manager.
Arrays can be sliced, sorted, and reversed.
Both rectangular and jagged arrays are represented.
Array bounds checking is performed, eliminating
a very common and mundane cause of bugs (infamously called
'buffer overflow' bugs). Bounds checking, being a
runtime check, can be expensive, and so can be optionally
removed for release builds.
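A brief sketch of these array operations, assuming a present-day D compiler in which sorting and reversing are library functions in std.algorithm. The function names are invented for illustration.

```d
import std.algorithm : reverse, sort;

int[] sortedDescending(int[] a)
{
    sort(a);                    // in place, ascending
    reverse(a);                 // in place, so now descending
    return a;
}

void arrayTour()
{
    int[] a = [5, 3, 9, 1, 7];
    assert(a.length == 5);              // arrays carry their length

    int[] middle = a[1 .. 4];           // slice of elements 1, 2, 3 (no copy)
    assert(middle == [3, 9, 1]);

    int[3][2] rect;                     // rectangular: 2 rows of 3 ints
    int[][] jagged = [[1], [2, 3, 4]];  // jagged: rows of differing length
    assert(rect.length == 2 && rect[0].length == 3);
    assert(jagged[0].length == 1 && jagged[1].length == 3);

    assert(sortedDescending(a) == [9, 7, 5, 3, 1]);
}
```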
Associative Arrays
Also known as dictionaries or maps, associative arrays
are key/value pairs where the key and the value can be
of arbitrary types. For example, a keyword/value setup
would look like:
int[ char[] ] keywords;
char[] abc;
...
keywords["foo"] = 3; // initialize table
...
if (keywords[abc] == 3) // look up keyword
...
Or, a sparse array of integers would be:
int[int] sa;
sa[1] = 3;
sa[1000] = 16;
Symbol table
Symbols are looked up using natural lexical scoping rules.
All module level symbols
are available, even if they only appear lexically after the
reference. There is no separate tag name space, there is no
two phase lookup, there is no ADL (Argument Dependent Lookup),
and there is no separation between point of definition
and point of instantiation.
D has true modules,
each module has its own symbol table.
Nested functions
A lexically nested function is nothing more than one
function embedded inside another:
int foo()
{
    int x = 3;
    int bar(int z) { return x + z; }
    return bar(2) * bar(3);
}
Nested functions can access local variables in the
lexically enclosing scope. Nested functions can be inline expanded
by a competent compiler. They have a surprising number of uses,
starting with eliminating another common use of the C++ preprocessor
to factor out common code within a function.
Delegates
Function pointers are pointers that contain the address of a function.
Delegates are function pointers combined with a context pointer.
A delegate for a class member function would contain the address
of the member function and the 'this' pointer.
A delegate for a nested function contains the address of the
nested function and a pointer to the stack frame of the lexically
enclosing function.
Delegates are a simpler and more powerful replacement for
the C++ pointer-to-member.
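Both kinds of delegate can be handed to the same function. The following is a minimal sketch with made-up names (Counter, applyTwice, delegateTour), written for a present-day D compiler.

```d
class Counter
{
    int count;
    void bump(int by) { count += by; }
}

// Accepts any delegate with this signature, whatever its context is.
void applyTwice(void delegate(int) dg, int arg)
{
    dg(arg);
    dg(arg);
}

int delegateTour()
{
    auto c = new Counter;
    applyTwice(&c.bump, 5);             // member function plus 'this' (c)

    int total = 0;
    void add(int n) { total += n; }     // nested function
    applyTwice(&add, 3);                // nested function plus enclosing frame

    return c.count + total;             // 10 + 6
}
```

delegateTour() returns 16.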
Function literals
Extending the idea of nested functions brings us function literals,
which are just nested functions without a name.
Function literals are a handy way to pass a function to some
generic function, such as an integrator:
double parabola(double x, double y)
{
    // a delegate literal, since it reads y from the enclosing function
    return integrate(0.0, x,
        delegate double (double x) { return y * x; });
}
double integrate(double x1, double x2, double delegate(double dx) f)
{
    double sum = 0;
    double dx = 0.0001;
    for (double x = x1; x < x2; x += dx)
        sum += f(x) * dx;
    return sum;
}
Exception handling
D adopts the try-catch-finally paradigm for exception handling.
Having a finally clause means that the occasional create-a-dummy-
class-just-to-get-a-destructor programming pattern is no longer
necessary. The try-catch works much as it does in C++, except that
catches are restricted to catching only class references, not
arbitrary types.
Although not required by the semantics of the language, D adopts
the pattern of using exceptions to signal error conditions rather
than using error return codes.
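A short sketch of the three clauses working together, assuming a present-day D compiler. ParseError, parseDigit, and classify are hypothetical names.

```d
class ParseError : Exception
{
    this(string msg) { super(msg); }
}

int parseDigit(char c)
{
    if (c < '0' || c > '9')
        throw new ParseError("not a digit");    // signal the error by throwing
    return c - '0';
}

// cleanups counts how often the finally clause has run.
string classify(char c, ref int cleanups)
{
    try
    {
        parseDigit(c);
        return "digit";
    }
    catch (ParseError e)        // only class references can be caught
    {
        return e.msg;
    }
    finally
    {
        ++cleanups;             // runs whether or not an exception was thrown
    }
}
```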
Contract Programming
Contract Programming (abbreviated DbC, for Design by Contract) is a
technique pioneered by Bertrand Meyer.
Contracts are assertions that must be true at specified points
in a program. Contracts range from simple asserts to class
invariants, function entry preconditions, function exit postconditions,
and how they are affected by polymorphism.
Typical documentation for code is either wrong, out of date,
misleading, or absent. Contracts in many ways substitute
for documentation, and since they cannot be ignored and
are verified automatically, they have to be kept right and
up to date.
DbC is a significant cornerstone of improving the reliability
of programs.
DbC can be done in C++, but to do it fully winds up looking a lot
like implementing polymorphism in C. Building the constructs for
it into the language makes it easy and natural, and hence much more
likely to be used.
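To give a flavor of the built-in constructs, here is a minimal sketch of a precondition, a postcondition, and a class invariant. It assumes a present-day D compiler, which introduces the function body after the contracts with do (older compilers spell it body). The names isqrt and Account are invented, and isqrt is only meant for small arguments.

```d
// Integer square root with an entry and an exit contract.
int isqrt(int x)
in
{
    assert(x >= 0, "isqrt needs a non-negative argument");
}
out (r)
{
    assert(r * r <= x && (r + 1) * (r + 1) > x);
}
do
{
    int r = 0;
    while ((r + 1) * (r + 1) <= x)
        ++r;
    return r;
}

class Account
{
    private int balance;

    // Checked around calls to public member functions.
    invariant()
    {
        assert(balance >= 0);
    }

    void deposit(int amount)
    in { assert(amount > 0); }
    do { balance += amount; }

    int current() { return balance; }
}
```

The contract checks are compiled in by default and are normally dropped from release builds.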
Unit testing
Like DbC, building unit testing capability into the language makes
it easier to use, and hence more likely that it will be used.
Putting the unit tests for a module right in the same source as
the module has great benefits in verifying that tests were actually
written, keeps the tests from getting lost, and helps ensure that
the tests actually get run.
In my experience, the places where I've used unit tests have wound up
being much more reliable than the places where I haven't,
even in the presence of an external test suite. This may seem
obvious, but if it's so obvious, why do we rarely see unit tests
in production code? D's presumption is that the problem is the lack of
a consistent,
portable, easy, language supported unit test system.
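A unit test sits right next to the code it checks. The sketch below uses a made-up gcd function; with the reference compiler, the unittest block is compiled in and run only when the -unittest switch is given.

```d
int gcd(int a, int b)
{
    while (b != 0)
    {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Lives in the same module as gcd; runs before main() in a unittest build.
unittest
{
    assert(gcd(12, 18) == 6);
    assert(gcd(7, 1) == 1);
    assert(gcd(0, 5) == 5);
}
```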
Minor feature improvements
D fleshes out the major features with a number of minor ones
that serve to just smooth things out:
No need for -> operator
There is no ambiguity between a pointer to an object and
the object itself, so there is no need to use a separate
operator for the former:
struct X { int a; }
...
X x;
X* px;
...
x.a; // access X.a
px.a; // access X.a
Anonymous structs/unions for member layout
It's not necessary to provide dummy names for struct/union
members when laying out a complex struct:
struct Foo
{   int x;
    union
    {   int a;
        struct { int b, c; }
        double d;
    }
}
Standard C++ permits anonymous unions but not anonymous structs,
so the equivalent layout there usually ends up
with named members, as in:
struct Foo
{   int x;
    union
    {   int a;
        struct { int b, c; } s;
        double d;
    } u;
};
Struct member alignment control
Controlling alignment is a common issue with mapping structs
onto existing data structures. D provides an alignment
attribute obviating the need for incompatible compiler extensions.
struct Foo
{
align (4) char c;
int a;
}
Easier declaration syntax
Ever tried to declare in C a pointer to a pointer to an array
of pointers to ints?
int *(**p)[];
This gets even more complex when adding function pointers in
to the mix. D adopts a simple right-to-left declaration syntax:
int*[]** p;
Unsigned right shift >>>
To get an unsigned right shift in C, as opposed to a signed
right shift, the left operand must be an unsigned type. This
is accomplished by casting the left operand to an equivalently
unsigned type.
int x;
...
int i = (unsigned)x >> 3;
The problem comes in when dealing with a
typedef'd type for the left operand: in C there's no way to
reliably determine the correct unsigned type to
cast it to. (In C++ one could write a traits template library
to do it, but it seems a weighty workaround for something so
simple.)
Having a separate unsigned right shift operator eliminates this subtle
source of bugs.
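The difference between the two shifts shows up on negative values. In this sketch Offset is a hypothetical alias standing in for a typedef'd type, and the alias syntax is that of a present-day D compiler.

```d
alias Offset = int;     // hypothetical typedef'd type

Offset logicalShift(Offset x, int n)
{
    return x >>> n;     // zero-fills from the left, no cast needed
}

Offset arithmeticShift(Offset x, int n)
{
    return x >> n;      // sign-extends for signed types
}
```

For x = -16 and n = 2, arithmeticShift gives -4, while logicalShift gives 1073741820 (0x3FFFFFFC).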
Embedded _ in numeric literals
Ever been faced with a numeric literal like 18446744073709551615?
Quick, how big is it? If you're like me, you put a pencil point
on the literal and carefully count out the digits. But there's
a better way. Taking a page from the usual way of dealing with
this, putting commas at every 3 digits, D allows _ to be
embedded into numeric literals, yielding
18_446_744_073_709_551_615. 18 quintillion. Not a big feature,
but it helps expose subtle transcription errors
like a dropped digit.
Imaginary suffixes
Imaginary floating point literals naturally have an 'i' suffix,
as in:
cdouble c = 6 + 7i; // initialize complex double c
as opposed to the C99:
double complex c = 6 + 7 * I;
or the C++:
complex<double> c = complex<double>(6, 7);
WYSIWYG strings
While embedded escape sequences are a must,
What-You-See-Is-What-You-Get is a nice thing to have for string
literals. D offers both kinds, the traditional escaped "" string
literal and the r"" WYSIWYG literal. The latter is particularly
useful when entering regular expressions:
"y\\B\\w" // regular strings
r"y\B\w" // WYSIWYG strings
and for Windows filesystem names:
file("\\dm\\include\\stdio.h") // regular strings
file(r"\dm\include\stdio.h") // WYSIWYG strings
X strings
Hex dumps often come in the form of:
00 0A E3 DC 86 73 7E 7E
Putting them into a form acceptable to C:
0x00, 0x0A, 0xE3, 0xDC, 0x86, 0x73, 0x7E, 0x7E,
or:
"\x00\x0A\xE3\xDC\x86\x73\x7E\x7E"
This can get tedious and error prone when there's a lot of it.
D has the x string, where hex data can be simply wrapped with
double quotes, leaving the whitespace intact:
x"00 0A E3 DC 86 73 7E 7E"
Debug conditionals
Most non-trivial applications can be built with a 'debug'
build and a 'release' build. The debug build often adds
in extra code for printing things, extra checks, etc.
The debug conditional is a simple way to turn these
extra statements and code on and off:
debug (FooLogging) printf("checking foo\n");
means if the 'FooLogging' debug version is being built, compile
in the printf statement.
Versioning
It's a rare application that doesn't have some
ability to generate multiple versions. Since D has
eliminated the preprocessor, whose #if is normally used to generate
multiple versions, that role is taken over by the version statement.
version (Windows)
{
    ... windows version ...
}
else version (linux)
{
    ... linux version ...
}
else
{
    static assert(0); // unsupported system
}
Deprecation
Library routines in active use inevitably evolve, and inevitably
some of the routines will become obsolete. But existing code
may still rely on them. This is a constant problem.
D offers a 'deprecated' attribute for
declarations:
deprecated int foo() { ... }
If foo() is referenced in the code, the compiler can (optionally)
diagnose it as an error.
This makes it easy for program maintainers to purge code
of obsolete and deprecated dependencies without requiring tedious
manual inspection.
Deprecated is an ideal tool for library vendors to use to mark
functions that are obsolete and may be removed in future versions.
Switch strings
Switch statements are extended to be able to select a case
based on string contents:
int main(char[][] args)
{
    foreach (char[] arg; args)
    {   switch (arg)
        {   case "-h": printHelpMessage(); break;
            case "-x": setXOption(); break;
            default: printf("Unrecognized option %.*s\n", arg); break;
        }
    }
    ...
}
The case values can be string literals or string constants but, like
integer cases, they cannot be string variables.
Module initializers
Module initializers are code that needs to get executed before
control is passed to main(). They are like static constructors,
except at the module level:
module foo;
static this()
{
... initialization code ...
}
All the module initializers are collected together by the runtime
and executed upon program startup. The order they are run
is controlled by how modules import other modules - the module
initializers of all the imported modules must be completed
before the importing module initializer can be run. The runtime
detects any cycles in this and will abort if the rule cannot
be followed.
Static asserts
Static asserts are like regular asserts, except that they are
evaluated at compile time. If they fail, then the compilation
is stopped with an error:
T x;
...
static assert(x.size == (int*).size);
Default initializers
All local variables are initialized to their default values
if an initializer is not provided:
int x = 3; // x is initialized to 3
int y; // y is initialized to y.init, which is 0
double d; // d is initialized to d.init, which is NaN
Typedefs can have their own unique default initializer:
typedef int T = 4;
T t; // t is initialized to T.init, which is 4
Even class and struct members are initialized to their
defaults. No more bugs with forgetting to add an initializer
to the constructor.
class Foo
{   int x;
    int y = 4;
    int z;
    this() { z = 5; }
}
Foo f = new Foo();
printf("%d, %d, %d\n", f.x, f.y, f.z); // prints 0, 4, 5
Synchronized
Since multiprocessor and multithreaded computing environments
are becoming ubiquitous, improved support for multithreading
in the language is helpful. D offers synchronized methods,
synchronization on a per object basis, and synchronized
critical sections as language primitives:
synchronized
{
    ... // critical section - only one thread at a time here
}
synchronized (o)
{
    ... // only one thread using Object o at a time here
}
class C
{
    // only one thread can use this.foo() at a time
    synchronized int foo();
    // only one thread at a time can execute bar()
    synchronized static int bar();
}
Nested comments
Ever want to comment out a block of code regardless of
whether it contains comments or not? Comments that can
nest can do it. They are delineated by /+ +/.
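A tiny sketch of the idea, with a made-up function: the /+ +/ pair disables a region that already contains both an ordinary block comment and another nesting comment.

```d
int answer()
{
    /+ disabled for now:
       int x = 1; /* an ordinary block comment */
       /+ nesting comments can nest again +/
       return x;
    +/
    return 42;
}
```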
Const means constant
Const is not a type modifier in D, it is a storage class.
Hence, the value of a const cannot change.
A const declaration can be put in read-only storage,
and the optimizer can assume its value never changes.
This is unlike C/C++, where since const is a type modifier,
the value of a reference to a const can legally change.
Advanced Features
Operator overloading
Operator overloading in D is based on the idea of enabling the
use of operators with their ordinary meaning on user defined types.
To that end, operator overloads keep the same semantics of
the operator on built-in types. For example, the operator overload
for + retains its commutativity; (a + b) is the same as (b + a).
This means that only one operator overload can be used for both
(a + b) and (b + a), if a and b are different types.
For another example, operator overload opCmp() takes care
of (a<b), (b<a), (a<=b), (b<=a), (a>b), (b>a), (a>=b), (b>=a).
Similarly, opEquals() takes care of (a==b), (b==a), (a!=b), and
(b!=a).
Hence, far less code needs to be written to create a user defined
arithmetic type, with the added appeal of not having to worry that
>= may be overloaded in a bizarrely different way than <.
Operator overloads for non-commutative operators like / have a special
'reverse' operator overload. This means that the C++ asymmetry
of having member forward overloads and global reverse overloads is
not necessary. Global operator overloads are not necessary, and
eliminating them removes the requirement for ADL. Not having ADL
implies that templates can be in imported modules without having
complicated symbol lookup semantics.
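The comparison operators make a compact illustration. This is a sketch for a present-day D compiler, using an invented Meters type; it shows only opCmp and opEquals, since the spelling of the arithmetic overloads has varied between versions of the language.

```d
struct Meters
{
    double value;

    // One opCmp covers <, <=, > and >=.
    int opCmp(const Meters rhs) const
    {
        return value < rhs.value ? -1 : (value > rhs.value ? 1 : 0);
    }

    // One opEquals covers == and !=.
    bool opEquals(const Meters rhs) const
    {
        return value == rhs.value;
    }
}
```

With these two functions, expressions such as Meters(1) < Meters(2) and Meters(3) != Meters(4) both work, and the four ordering operators cannot disagree with one another.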
Foreach iteration
Foreach is a generalized way to access each element of a collection.
Many languages implement some form of foreach, falling into two
categories: the first requires some form of linearized access
to the collection, the second relies on two coroutines.
D takes a unique third route. The elements of a collection can
be accessed in a sequence defined by an opApply function in the class;
no need for linearization, creation of specialized iterators, or
the problems of coroutines.
Furthermore, the body of the foreach does not have to be a separate
function. It can be a collection of arbitrary statements just like
the body of a standard for loop or a while loop. It looks like:
foreach (T v; collection)
{
... statements ...
}
where T is the type of the objects in the collection class object
instance collection. The foreach body is iterated once for each
element of the collection, iteratively assigning them to v.
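As a sketch of how a class might define its own traversal order, the made-up Countdown class below hands its elements to the loop body from highest to lowest. The delegate signature shown is the one used by present-day D compilers.

```d
class Countdown
{
    int from;
    this(int from) { this.from = from; }

    // foreach calls opApply, passing the loop body in as the delegate dg.
    int opApply(int delegate(ref int) dg)
    {
        for (int i = from; i > 0; --i)
        {
            int result = dg(i);
            if (result)         // non-zero: the body left the loop early
                return result;
        }
        return 0;
    }
}

int sumDown(int n)
{
    int sum = 0;
    foreach (int v; new Countdown(n))
        sum += v;               // ordinary statements, not a separate function
    return sum;
}
```

sumDown(4) visits 4, 3, 2, 1 and returns 10.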
Templates
Generic programming is a huge advance in programming methodology,
and much, if not nearly all of it, was pioneered by C++.
Some simple generic programming has filtered out into other
languages, but taking its cue from C++, D fully embraces
the trail blazed by its older brother.
In particular, D supports class templates, function templates,
partial specialization, explicit specialization, and partial
ordering. Template parameters can be types, values, or other
templates.
But it isn't just a boring clone of C++ templates. D corrects
some of the root problems and extends template technology in
several directions.
Template problems corrected:
Angle brackets are not used to enclose template arguments:
Foo<int> // C++ template instantiation
Foo!(int) // D template instantiation
The angle brackets cause much internal grief in the lexical,
syntactic, and semantic phases of compilation because of the
ambiguity with the less than, greater than, and right shift
operators. Using an unambiguous syntax eliminates a raft of
problems, special case rules, conflicting interpretations of
those rules, incompatible extensions, and plain old bugs.
It is no longer necessary to insert spaces, typename and template
keywords in strategic spots to get the parser to parse it
correctly.
Since D has a module system rather than #include, separate
compilation of templates follows naturally from the symbol
table rules of imports. There is no need for an export
keyword or any of the grief trying to implement it. It comes
for free.
Another aspect of the module system is that there are no
template declarations, there are only template definitions.
Just import the module containing the template definition
needed, and the language takes care of the rest.
D templates do not recognize a difference
between point of instantiation and point of definition when
dealing with forward references. All forward references are
visible.
Closely related templates, rather than being defined
separately with no obvious connection between them,
can be declared as one template with its
own scope:
template Foo(T)
{
    struct bar { T t; }
    void func(T t) { ... }
}
There is no need to provide a declaration of a member
template within a class, and then provide the definition
of it outside the class, and its associated complex and
tedious syntax. Member templates are always defined in
place.
Template Extensions:
A template can be its own scope, with all declarations within
that scope being 'templated'. This means that anything that
can be declared can be templated, not just classes and functions
but typedefs and enums.
Class templates (in fact, all templates) are overloadable.
The much discussed 'typeof' is supported as a standard
part of the language. Similar facilities for type
detection and manipulation are first class aspects of D,
rather than needing macros or trait templates.
For example, common properties of types
can be accessed
directly:
T t;
t.max; // maximum value
t.min; // minimum value
t.init; // default initializer
Any non-local symbol can be passed as a template parameter.
This includes templates, specific template instances,
module names, typedefs, aliases, functions, etc.
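A compact sketch of a template used as a scope, together with typeof and the built-in type properties. Pair, Both, larger, and templateTour are hypothetical names, and the code targets a present-day D compiler.

```d
// One template scope holding two related declarations.
template Pair(T)
{
    struct Both { T first, second; }

    T larger(T a, T b) { return a > b ? a : b; }
}

// typeof and type properties are evaluated at compile time.
static assert(is(typeof(1 + 2.5) == double));
static assert(int.max == 2_147_483_647);
static assert(int.init == 0);
static assert(ubyte.min == 0 && ubyte.max == 255);

int templateTour()
{
    Pair!(int).Both p;          // members start out as int.init
    p.second = 7;
    return Pair!(int).larger(p.first, p.second);
}
```

templateTour() returns 7.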
Inline assembler
The ultimate in performance programming is only achievable with
inline assembler, so it's fitting that D rounds out its support
for bare metal programming with a standardized inline assembler.
Inline assembly also provides access to specialized CPU
instructions like LOCK, makes it easy to do multi-precision
arithmetic that requires access to the carry and overflow flags,
etc.
Standardizing the inline assembler means that one's inline assembler
code will be portable from compiler to compiler (as long as it's
on the same CPU). This is quite unlike the C/C++ world, where every
compiler vendor extends the language with an inline assembler
incompatible with all the others.
The D runtime library proves the worth of a standardized inline
assembler by implementing many routines in inline assembly;
the code is identical between the Windows and Linux versions.
Advantages of D
Simpler and faster to learn
Refactoring C and C++ enables D to offer the equivalent power
while eliminating a great many of the special case rules and
awkward syntax required for legacy code compatibility.
D offers much more available power built in to the core language,
while being a far less complicated language. This makes for short
learning curves, and quicker mastery of the language.
There's no need for multiple warning levels in D,
a "D Puzzle Book", a lint program, a "Bug Of The Month",
or a world's leading expert on preprocessor token concatenation.
Retains investment in existing C code
Rather than trying to compile existing C code, which would require
D to carry on with all the legacy decisions of C, instead D
assumes existing C code will be compiled with a C compiler,
and connects directly to it at the object code level. D supports
all C types (more than C++ does!), common calling conventions
(including printf's),
and runtime library functions,
so C code need never realize it is interfacing to D code.
There's no need to convert C code to D; new functionality can be
written in D and linked in to the existing C object code.
Indeed, some parts of the standard library - e.g. std.zlib,
std.recls - are widely used C and C++ libraries that have been
incorporated with no changes; only D declaration files were
created for the
libraries' C interfaces.
More portable
D locks down many undefined or implementation defined behaviors
of C and C++, thereby making code more predictable and portable.
Some of these are:
- char is unsigned. With C/C++, it is hard to verify that code
using chars is portable without actually trying it on one
compiler where chars are signed and another where they are
unsigned. D doesn't have that problem; chars are unsigned.
- Integral types are fixed sizes. No more need for those
endless typedef'ing schemes one sees in C/C++ code to
get a fixed integral type size.
- Floating point is IEEE 754. This means that NaNs and
infinities
work, rounding is well behaved, and comparisons when one
operand is a NaN are handled properly.
- Order of module initialization is specified.
- Source text is unicode, not some unspecified multibyte
encoding.
- wchar is a 2 byte UTF-16 character, not 2 bytes on one machine
and 4 bytes on another.
- It's far easier to write a compiler for D, meaning that
there will be better conformance of various D implementations
to the D standard.
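Several of these guarantees can be written down as compile-time checks. The sketch below assumes a present-day D compiler; nanBehaves is a made-up helper.

```d
// Fixed integral sizes, in bytes.
static assert(byte.sizeof == 1 && short.sizeof == 2);
static assert(int.sizeof == 4 && long.sizeof == 8);

// char is unsigned; wchar is a 2 byte UTF-16 code unit.
static assert(char.min == 0 && char.max == 0xFF);
static assert(wchar.sizeof == 2);

// IEEE 754 behavior: every ordered comparison with a NaN is false.
bool nanBehaves()
{
    double d;                   // default-initialized to NaN
    return d != d && !(d < 1.0) && !(d > 1.0);
}
```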
More robust
With the ever increasing size of programs, and ever increasing cost
of bugs in shipping code, anything that can be done to improve
program robustness before it ships will pay big dividends.
D starts with ensuring that no variables or object members
get left uninitialized - a common source of random bugs
in C/C++. It follows up with the replacement of arbitrary
pointers with arrays and array bounds checking. (No more
buffer overflows overwriting the stack.)
Next, and most important, are Contract Programming and unit
testing. Let's face it, test and verification of code is
often clumsily done at the last minute, or not done at all.
(How many times have you seen the source to a shipping
program with no tests or verification code at all?).
The answer isn't to force programmers to write test and
verification code, it's to make it easy to do it and manage it.
Easy enough to tip the balance and make adding such code as much
a matter of course as adding comments. Having the test and verify
code right there in the source along with the algorithm code
also brings to bear peer pressure and management pressure to
add it in. Having test (unit test) and verify (Contract Programming)
code in with the algorithm will become as normal and expected
as adding in explanatory comments.
With the difference that the test and verify code actually runs.
Pointers
While D supports generalized pointer operations, pointers
are not necessary for nearly all routine programming tasks.
They are replaced by function out and inout parameters,
first class arrays, automatic memory management,
and implicit class object dereferencing.
While under the hood these constructs still use pointers,
the notorious brittleness of pointer code isn't there anymore.
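A minimal sketch of these replacements follows, assuming a current D compiler. All of the names (divide, increment, Counter, sum) are invented for the illustration. One spelling differs from the text above: the parameter mode written inout in this article is spelled ref by current compilers.

```d
import std.stdio : writeln;

// An out parameter returns a second result without a pointer argument.
bool divide(int a, int b, out int quotient)
{
    if (b == 0)
        return false;   // quotient keeps its default value, 0
    quotient = a / b;
    return true;
}

// A ref parameter lets the callee modify the caller's variable.
void increment(ref int n)
{
    ++n;
}

// Class objects are always handled by reference; no * or -> is needed.
class Counter
{
    int value;
    void bump() { ++value; }
}

// An array carries its own length, so no separate count is passed.
int sum(int[] xs)
{
    int total;
    foreach (x; xs)
        total += x;
    return total;
}

void main()
{
    int q;
    assert(divide(7, 2, q) && q == 3);
    assert(!divide(7, 0, q) && q == 0);

    int n = 1;
    increment(n);
    assert(n == 2);

    Counter c = new Counter;
    Counter same = c;       // a second reference to the same object
    c.bump();
    same.bump();
    assert(c.value == 2);

    int[] data = [1, 2, 3, 4];
    assert(sum(data[1 .. 3]) == 5);     // slice holds elements 2 and 3
    writeln("ok");
}
```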
Better code generation
D compilers can potentially generate better code than equivalent
C/C++ code because:
- Compiler has semantic access to modules and potentially the
entire program, rather than being restricted to a single
source file and its #include'd headers.
- Compiler can look at forward references. For example, a
function appearing lexically after its use can still be inlined.
Any function is potentially inlineable - it doesn't need to be
declared as inline.
- Higher level constructs that enable better optimization
with simpler compilers. For example, D's foreach is built in
to the language making it simple for the compiler to emit
an optimized loop traversal. C and C++ compilers, on the
other hand, need to do some fairly advanced analysis to
detect loops, determine the loop counters, etc.
- Contracts (from asserts, Contract Programming) must evaluate
to true, so optimizers can mine them for more information
about data. Other language guarantees about the state of
data, such as array bounds
checking, can be used by the optimizer.
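Two of the points above, forward references and the built in foreach, can be seen in a short sketch. It assumes a current D compiler; square and the sample data are hypothetical. Whether the call is actually inlined is up to the compiler, and the sketch only shows that the language does not stand in the way.

```d
import std.stdio : writeln;

void main()
{
    int[] samples = [3, 1, 4, 1, 5];

    // square() is declared after main(), with no prototype and no
    // inline keyword; the compiler still sees its body here.
    int total;
    foreach (s; samples)
        total += square(s);
    assert(total == 52);            // 9 + 1 + 16 + 1 + 25

    // foreach states the traversal directly: index, element, collection.
    // There is no loop counter for the optimizer to rediscover.
    foreach (i, ref s; samples)
        s += cast(int) i;
    assert(samples == [3, 2, 6, 4, 9]);

    writeln(total);                 // prints 52
}

int square(int x)
{
    return x * x;
}
```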
Straightforward symbol table
This is one of the indirect advantages of D. Having a simple,
straightforward symbol lookup scheme results in better, more
accurate, and more quickly produced compilers. It also makes
new features easier to add. Implementing a correct
D compiler is not meant to be a test of programming virtuosity.
Getting correct and reliable compilers into the hands of
programmers quickly
benefits both the compiler vendors and the programmers.
Open source reference implementation
The source code to the D front end implementation is available
under both the GPL and Artistic License.
Example
This D program reads a text file, and counts the number of
occurrences of each word. It illustrates some features of D:
- Its close look and feel similarity to C.
- Use of imports rather than #include.
- New declaration syntax.
- Default initialization.
- Ability to seamlessly call C functions such as printf.
- Use of an associative array (dictionary) as a simple
symbol table.
- Using foreach to iterate through different kinds of
collections.
- Use of arrays.
- Use of array slicing (args[1 .. args.length]).
import std.c.stdio;
import std.file;

int main (char[][] args)
{
    int w_total;                // totals across all files; default-initialized to 0
    int l_total;
    int c_total;
    int[char[]] dictionary;     // word -> number of occurrences

    printf("   lines   words   bytes file\n");
    foreach (char[] arg; args[1 .. args.length])    // skip the program name
    {
        char[] input;
        int w_cnt, l_cnt, c_cnt;
        int inword;             // nonzero while scanning inside a word
        int wstart;             // index where the current word began

        input = cast(char[])std.file.read(arg);

        for (int j = 0; j < input.length; j++)
        {   char c;

            c = input[j];
            if (c == '\n')
                ++l_cnt;
            if (c >= '0' && c <= '9')
            {
                // a digit neither starts nor ends a word
            }
            else if (c >= 'a' && c <= 'z' ||
                     c >= 'A' && c <= 'Z')
            {
                if (!inword)
                {
                    wstart = j;
                    inword = 1;
                    ++w_cnt;
                }
            }
            else if (inword)
            {   char[] word = input[wstart .. j];   // slice, no copy

                dictionary[word]++;
                inword = 0;
            }
            ++c_cnt;
        }
        if (inword)             // file ended in the middle of a word
        {   char[] w = input[wstart .. input.length];

            dictionary[w]++;
        }
        // the counts are int, so the conversion is %d
        printf("%8d%8d%8d %.*s\n", l_cnt, w_cnt, c_cnt, arg);
        l_total += l_cnt;
        w_total += w_cnt;
        c_total += c_cnt;
    }

    if (args.length > 2)        // more than one file: print the totals
    {
        printf("--------------------------------------\n%8d%8d%8d total\n",
            l_total, w_total, c_total);
    }

    printf("--------------------------------------\n");
    foreach (char[] word1; dictionary.keys.sort)
    {
        printf("%3d %.*s\n", dictionary[word1], word1);
    }
    return 0;
}
D Community
There is a large and active D community. Discussion groups
are at news.digitalmars.com. Many open source D projects are
linked to from www.digitalmars.com/d/dlinks.html.
References
- The D Programming Language specification: www.digitalmars.com/d/
- D newsgroups: news.digitalmars.com
- Design By Contract: "Object-Oriented Software Construction"
by Bertrand Meyer
- C++ Boost: www.boost.org
Acknowledgements
The following people are just a few of the many who have contributed to
the D language project with ideas, code, expertise, inspiration
and moral support:
Bruce Eckel,
Eric Engstrom,
Jan Knepper,
Helmut Leitner,
Lubomir Litchev,
Pavel Minayev,
Paul Nash,
Pat Nelson,
Burton Radons,
Tim Rentsch,
Fabio Riccardi,
Bob Taniguchi,
John Whited,
Matthew Wilson,
Peter Zatloukal
Copyright (c) 2004 by Walter Bright, All Rights Reserved