Last update Sat Oct 7 16:49:27 2023

Arrays: Core or Library Type?

written by Walter Bright

May 28, 2008

As programming languages provide ever more powerful ways to specify user defined types, where do arrays fit in? Are arrays a fundamental type that must be in the core language, or are they a type that can be built up from other core types, and so be a user defined library type?

In languages that have pointers, there are several ways in which an array of type T could be represented, such as (in D programming language notation):

  1. length and pointer to the array data:
    struct Array { size_t length; T* begin; }
  2. pointer pair representing the start and end of the array:
    struct Array { T* begin; T* pastEnd; }
  3. length prefixed array data:
    struct Array { size_t length; ... data ... }
  4. slice out of memory block:
    struct Array { void* datablock; T* begin; size_t length; }

Add in some operator overloading for [ ] and other boilerplate, and there’s a nice, servicable user defined array type.

Or is it?

Consider it from the perspective of a tool that is analyzing the types. That tool could be the compiler, the optimizer, a debugger, the reflection/introspection facilities, the garbage collector, the security auditor, etc. Some of these tools will collect information at compile time (static analysis) while others at runtime (dynamic analysis). The tool will see the pointer(s), and that’s it. It will have no idea that it is pointing to an array, how many elements are in that array, or even if the pointer is pointing off the end of an array and is not pointing at any valid data at all. It hits an impermeable wall in its attempt to discover things about the program both at compile time and at run time.

There are a couple of ways one might break through that barrier. The first is for the tool to recognize the struct Array as something special that the tool has special knowledge of. While this works, it flies in the face of the whole point of a user defined array type. The user cannot change how the Arrays work, they are essentially fixed and become core types anyway.

The second way is to establish conventions for a user defined aggregate type like begin(), end(), and an iterator type. Then the tool looks for those names and uses them to walk the array type. The problem is that the code for begin(), end(), etc., can be arbitrarily complex, and so the tool must now be able to execute code. This requires a massive increase in the implementation complexity of the tool, as compared with the pretty simple case of just having a built in array type. Worse, the tool will have to use this strategy for every aggregate because it cannot tell the difference between an array and any other collection type. Dealing with this will also unacceptably slow down things like type aware garbage collectors that need to quickly walk a runtime data structure.

In addition, consider the problems an introspection facility must face:

Another problem with user defined arrays is that their implementations require pointer arithmetic. This means that the array implementations must necessarily be outside any memory safe subset of the language. The industry and academia both agree that the Trusted Computing Base (i.e. code that is not demonstrably safe and must be considered safe “by faith”) should be minimal in size so as to reduce bug incidence. See for example Daniel Bernstein’s notes about the exceptionally safe qmail program or Nickolai Zeldovich’s work.

Other collection types, such as linked lists, binary trees, etc., do not have this issue because they can be built up out of safe references and arrays. Arrays cannot be, and so they should properly be put into the core language.


This article was inspired and contributed to by Andrei Alexandrescu.

Home | Runtime Library | IDDE Reference | STL | Search | Download | Forums