Choosing a Memory Model

Digital Mars C++ is a comprehensive development system for the Intel 8086 family of processors. This chapter explains how to choose an appropriate memory model, so that you can create everything from small command line utilities to the largest and most complex applications.

Memory models for DOS programs under 640KB.
Memory models for DOS programs over 640KB.
Memory models for Win16 programs.
Memory models for Win32 programs.
How Digital Mars C++ stores program data.
How to mix memory models within a program by using type modifiers.

Overview of Memory Models

Choosing a memory model means making choices among meeting minimum system requirements, maximizing code efficiency, and gaining access to every available memory location. If you don't specify any particular memory model, the Digital Mars compilers use the Win32 model. To compile for DOS or Win16, a memory model must be selected.

To use the Small memory model, you don't need to know anything about compiler switches or configuring the IDDE. And when you use a debugger, small model addresses are easy to interpret. However, if your program requires more than 640KB of memory to store its code or data, you must choose a different memory model.

For Programs Under 640KB

If your program's total size is under 640KB, you should choose one of the memory models in Table 7-1 below. These are the real mode memory models. Since all the processors used in IBM PCs and compatibles can run in real mode, programs compiled with these models can run on all PCs.

The Tiny memory model creates .com programs. The Small model creates .exe programs.

Table 7-1 Choosing a real mode memory model
Code	Data	Use this model
under 64KB	under 64KB	Small (-ms) or Tiny (-mt)
over 64KB	under 64KB	Medium (-mm)
under 64KB	over 64KB	Compact (-mc)
over 64KB	over 64KB	Large (-ml)
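
As a rough illustration, the decision in Table 7-1 can be written as a small C helper. The function name real_mode_model and the 65,536-byte reading of "64KB" are assumptions made for this sketch. It returns the letter that follows -m on the command line and does not consider the Tiny model.

```c
/* Assumed threshold: "64KB" is taken to mean 65,536 bytes. */
#define SEGMENT_BYTES 65536UL

/* Hypothetical helper: returns the model letter used with the -m switch
   ('s' = Small, 'm' = Medium, 'c' = Compact, 'l' = Large),
   following the rows of Table 7-1. */
char real_mode_model(unsigned long code_bytes, unsigned long data_bytes)
{
    int big_code = code_bytes >= SEGMENT_BYTES;
    int big_data = data_bytes >= SEGMENT_BYTES;

    if (big_code && big_data)
        return 'l';
    if (big_code)
        return 'm';
    if (big_data)
        return 'c';
    return 's';
}
```

For example, a program with about 100,000 bytes of code and 10,000 bytes of data maps to 'm', that is, the Medium model (-mm).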

For faster and more efficient code, use the memory model that best fits your program. For example, using the Large memory model when another model would suffice makes your program slower than it has to be, because more data is referenced using both a segment and an offset. For information on how data is stored in the various memory models, see the section "How Data Is Stored" later in this chapter. For information on making your program as efficient as possible, see the section "Fine-Tuning with Mixed Model Programming" later in this chapter.

For Programs Over 640KB

If your program is over 640KB, that restricts what machines will be able to run it. Ultimately, for large DOS programs, you must choose between performance and portability to older systems.

Running on 8086/8088 machines and later

When using DOS on a machine with an 8086 or 8088 CPU, no more than 640KB of memory is accessible. If your program has a large amount of data, consider using handle pointers, described in Using Handle Pointers.

Note: Digital Mars C++ no longer supports the Virtual Code Management system or the -v memory model.

For users compiling for an 80286 "minimum configuration," the Digital Mars C++ compiler still supports the Rational Systems 16-bit DOS extender (-mr), available separately from Rational Systems.

Running under DOS on 80386 machines and later

If your DOS program will run only on the 80386 and 80486, it can operate in 32-bit protected mode, which lets you access up to 4GB of RAM.

Note: 4GB is a theoretical maximum. Under DOS or XMS, the maximum extended memory is limited by BIOS function calls to just under 64MB. The DOSX 32-bit DOS extender can handle up to 3GB, because it allocates 3/4 of the available extended memory. There is no system that supports 4GB of memory at this time.

To run in 32-bit protected mode, you need a 32-bit DOS extender. The DOSX memory model (-mx) is compatible with the DOSX 386 DOS extender. The Phar Lap memory model (-mp) is compatible with the Phar Lap 32-bit DOS extender, available from Phar Lap.

Note: Digital Mars's OPTLINK linker can link DOSX programs, but does not support linking Phar Lap programs.

For related information, see DOS32 (DOSX) Programming Guidelines.

Memory Models for Windows 3.x Programs

Since Windows is itself a form of DOS extender, the DOSX or Phar Lap memory models cannot be used for Windows programs. You can compile a Windows 3 application with the Small, Medium, Compact, or Large memory models. Digital Mars recommends compiling Windows applications with the Large model because it minimizes the problems associated with mixed model programming. Windows 3.0 and later eliminate any advantage to using the Medium memory model.

Compile all the files in a Windows application with the same (preferably Large) memory model if possible, or explicitly declare a type for each pointer in a function prototype. If you are mixing near and far data references, make sure that all declarations match their corresponding definitions, or hard-to-find bugs can result. For more information see "Fine-Tuning with Mixed Model Programming" below.

Note: Since Digital Mars's Large memory model does not place data into far data segments by default, Large model programs compiled with Digital Mars C++ can be multiple-instance applications.

Memory Models for Windows 95 and NT

To create a program for a 32-bit operating system like Windows NT, you need a memory model that can reference a flat 32-bit address space (where CS, DS, SS, and ES all map onto a single memory area). Digital Mars C++ supports a 32-bit flat address space with the NT (-mn) memory model. For more information, see Win32 Programming Guidelines.

Note: The compiler ignores the keywords __far, __huge, __interrupt, __loadds, and __handle when compiling with the -mn memory model. You can tell the compiler to ignore these keywords for any compilation with the -NF compiler option.

How Data Is Stored

How your program stores data depends on whether it is a 16- or 32-bit program.

16-bit programs

Real mode programs can run on the 8086 and 8088 processors that were in the original IBM PCs and compatibles. In real mode DOS programs, code and data are stored in 64KB segments. DOS limits programs to 640KB of memory, including both code and data.

For the most part, programs written for the 8086 architecture use two types of references: near and far.

Near code and data references

A near reference refers to a function or data object (or a pointer to a function or data object) that is within the current segment. It is 16 bits long and contains an offset into the current data segment (or the stack segment) if it's a data pointer, or into the current code segment if it's a function pointer.

Far code and data references

A far reference refers to a function or data object (or a pointer to a function or data object) that is in a different segment than the current one. It is 32 bits long and contains a 16-bit quantity called the segment, which identifies the memory segment where the code or data is stored, and a 16-bit quantity called the offset, which contains the location of the code or data in that segment. (The 8088 and 8086 have a 20-bit address bus. Therefore, they actually use a 20-bit segment address, which is obtained by shifting the 16-bit segment value four bits to the left. This 20-bit value is added to the offset to form the actual memory address.)
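
The address calculation described above can be sketched in portable C. The function name linear_address is made up for this illustration, and the final mask models the 20-bit address bus of the 8086 and 8088, where sums past 1MB wrap around.

```c
#include <stdint.h>

/* Hypothetical helper: real-mode translation of a segment:offset pair.
   The segment is shifted left four bits and the offset is added.
   The mask keeps the result within the 20-bit bus of the 8086/8088. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    uint32_t base = (uint32_t)segment << 4;
    return (base + offset) & 0xFFFFFUL;
}
```

Note that different segment and offset pairs can name the same location: 1234:0010 and 1235:0000 both translate to 0x12350.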

Choosing a memory model changes how the compiler stores addresses to functions and data. If the model can handle less than a segment's worth of code or data, it uses near pointers to reference them. If the model can handle more than a segment's worth of code or data, it uses far pointers to reference them.

Accessing code or data with a near reference is much quicker than accessing it with a far reference. When you use a far reference, your program must first find the segment and then find the code or data within that segment. When you use a near reference, your program only needs to find the code or data. For a faster program, choose the memory model that lets you make as many near references as possible.

Memory models and segmentation

Choosing a memory model does not change how the compiler segments your code. You choose the segment in which to store code and data with the __far and __huge keywords, as described in "Fine-Tuning with Mixed Model Programming" later in this chapter. The compiler and linker automatically segment your code. You can fine-tune how the compiler and linker segment your code with the techniques described in Compiling Code.

32-bit programs

In 32-bit protected mode programs (those compiled with the DOSX, Phar Lap, or NT memory model), near pointers are 32 bits long and far pointers are 48 bits long. With these models, your programs can access up to 4GB of RAM, all through near references.

In 32-bit applications, far pointers are used only for special purposes like accessing video memory. Therefore, you should not typically use pointer modifiers in 32-bit programs.

Sizes of data types and pointer types

The table below lists the sizes of the base data types and pointer types in all Digital Mars C++ memory models.

Table 7-2 Data and pointer types and sizes
Data/PointerType	Size in 16-bit compilations (T, S, M, C, L, and V models)	Size in 32-bit compilations (X, P, F, and N models)
char	signed 8 bits	signed 8 bits
signed char	signed 8 bits	signed 8 bits
unsigned char	unsigned 8 bits	unsigned 8 bits
short	signed 16 bits	signed 16 bits
unsigned short	unsigned 16 bits	unsigned 16 bits
int	signed 16 bits	signed 32 bits
unsigned	unsigned 16 bits	unsigned 32 bits
long	signed 32 bits	signed 32 bits
unsigned long	unsigned 32 bits	unsigned 32 bits
float	32 bits floating	32 bits floating
double	64 bits floating	64 bits floating
long double	64 bits floating	64 bits floating, 80 bits N model
__near pointer	16-bit segment offset	32-bit segment offset
__far pointer	16-bit segment and 16-bit offset	16-bit segment and 32-bit offset
__huge pointer	16-bit segment and 16-bit offset	16-bit segment and 32-bit offset
__ss pointer	16-bit segment offset	32-bit segment offset
__cs pointer	16-bit segment offset	32-bit segment offset
__handle pointer	16-bit segment and 16-bit offset	16-bit segment and 32-bit offset
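
To check the sizes in Table 7-2 under the memory model you are actually compiling with, a short portable routine along these lines may help. The names size_in_bits and print_type_sizes are hypothetical, and only the default pointer type is reported, since the pointer modifiers are compiler extensions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert a sizeof result (in bytes) to bits. */
unsigned size_in_bits(size_t bytes)
{
    return (unsigned)(bytes * CHAR_BIT);
}

/* Print the sizes of the base types for the current compilation. */
void print_type_sizes(void)
{
    printf("short        %u bits\n", size_in_bits(sizeof(short)));
    printf("int          %u bits\n", size_in_bits(sizeof(int)));
    printf("long         %u bits\n", size_in_bits(sizeof(long)));
    printf("float        %u bits\n", size_in_bits(sizeof(float)));
    printf("double       %u bits\n", size_in_bits(sizeof(double)));
    printf("long double  %u bits\n", size_in_bits(sizeof(long double)));
    /* Default data pointer: near or far depending on the memory model. */
    printf("void *       %u bits\n", size_in_bits(sizeof(void *)));
}
```

Calling print_type_sizes() from main() in a program built with each model of interest shows how int and the default pointer size change between 16-bit and 32-bit compilations.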

Fine-Tuning with Mixed Model Programming

Digital Mars C++ lets you mix memory models within a program by using the __near, __far, __cs, __ss, and __huge keywords. These keywords permit you to fine-tune how your program uses memory.

Note: The __near, __far, and __huge keywords are not part of the ANSI C language and are used only in operating systems with segmented memory. Code that uses them is not portable. In addition, they are of limited usefulness when creating 32-bit applications.

Creating large data structures with far data in 16-bit programs

In all the 16-bit memory models, the compiler puts all static and global variables into a single data segment (called DGROUP) that can only contain 64KB. With far data, you can put a particular data structure into a data segment of its own. However, that data structure cannot be larger than 64KB.

To declare a data structure to be far, put the __far keyword immediately before the identifier, like this:

int __far array[10000];
struct ABC __far table[600] = { .... };

Access far data with array syntax:

array[301] = 32;
table[258] = an_abc_struct;

The compiler creates a segment name for the data structure from the source file name and the variable name.

By default, the compiler uses far data in the Compact and Large memory models. When you use the __far keyword with a data declaration, the compiler starts a new data segment and puts the rest of the data in the file into that segment.
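
Because a far data structure is limited to a single segment, it can be useful to work out an array's size before declaring it __far. The sketch below does that arithmetic in portable C. The names array_bytes and fits_in_segment are hypothetical, and the 65,536-byte limit is an assumed reading of "64KB".

```c
/* Assumed limit: one 64KB segment, taken as 65,536 bytes. */
#define SEGMENT_LIMIT 65536UL

/* Size in bytes of an array of `count` elements of `elem_size` bytes each. */
unsigned long array_bytes(unsigned long count, unsigned long elem_size)
{
    return count * elem_size;
}

/* Nonzero if such an array fits in a single segment. */
int fits_in_segment(unsigned long count, unsigned long elem_size)
{
    return array_bytes(count, elem_size) <= SEGMENT_LIMIT;
}
```

With 16-bit ints, an array of 10,000 ints needs 20,000 bytes and fits in one segment. An array of 100,000 chars, or 10,000 doubles (80,000 bytes), does not.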

Portably declaring large arrays in 16-bit compilations

It is frequently necessary to declare arrays larger than 64K in size. For instance:

char array[100000];     // 100K bytes
double values[10][1000]; // 10*1000*8 = 80K bytes

To portably declare arrays greater than 64K in 16-bit compilations, you can construct an array of pointers to arrays, where each unit is less than 64K in size. Using this technique, the above arrays would be declared as:

char *array[2];
#define array(i) (array[(i) & 1][(i) >> 1])

double *values[10];

Code that declares large arrays using pointers must be compiled in one of the large data models (Compact or Large). Storage for the arrays themselves cannot be allocated statically; call calloc() to allocate each piece, which also initializes it to all zeros:

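#include <stdlib.h>  /* calloc() */

int main(void)
{
  int i;
  /* Two halves of 50000 bytes each hold the 100000-byte array */
  for (i = 0; i < 2; i++)
    array[i] = (char *) calloc(100000/2,sizeof(char));

  /* One row of 1000 doubles per pointer */
  for (i = 0; i < sizeof(values)/sizeof(values[0]); i++)
    values[i] = (double *) calloc(1000, sizeof(double));
  . . .
}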

To access an element of array[], instead of array[i], use this syntax:

long i;
array(i) = array(i + 10);

Note that the macro can be used as either an lvalue or an rvalue. Similarly, for values:

int i, j;
values[i][j] = values[i][j] + 6.7;

Most of the time you won't need to deallocate the memory used for the arrays, if they are used for the duration of program execution; the operating system will deallocate the storage when the program terminates.

The methods described above are not only portable to ANSI C and to 32 bits, they can also be faster than using __huge.

Declaring class objects as far data

In the Small, Tiny, and Medium memory models, you cannot declare as far data any class objects that you create with new. In this example, the first declaration causes an error, but the second does not:

AClassA __far *a1 = new AClassA; // ERROR
AClassA __far a2;               // OK

In the other 16-bit memory models, you can declare any class object as far data. In the 32-bit models, you cannot declare class objects as far data.

Using near and far functions

When you compile a program with the Medium or Large memory models, by default the compiler uses far pointers for function addresses. However, if you know that a function is used only by other functions that are in the same code segment, you can declare it __near, so that the compiler will access it with near pointers.

The __near keyword is especially useful with static functions, that is, functions that are used only within the file where they're defined. Since the compiler by default puts all of a file's functions into the same code segment, you can declare any static function as near. However, you should not declare global functions as __near.

In the example below, walktree() is a recursive static near function. Because it is near, each call pushes only a 16-bit return address on the stack instead of the 32-bit segment and offset that a far call requires, which saves a significant amount of time in a heavily recursive routine.

typedef struct NODE
{
  int value;
  struct NODE *left;
  struct NODE *right;
} node;

/* Use a near function for the recursive part */
static int __near walktree(node *n)
{
  if (!n)       /* an empty subtree ends the recursion */
    return 0;
  return (n->value + walktree(n->left) + walktree(n->right));
}


/* Calculate sum of all nodes in the tree */
int calcsum(node *n)
{
  return walktree(n);
}

Note: If you declare a static function as near and take its address, you cannot then attempt to call it as a far function.

You rarely need to declare functions with the __far keyword. Programs that use the Medium or Large memory models use far pointers by default and programs that use the Small and Compact memory models don't contain multiple code segments. The only exception is a Small, Tiny, or Compact program that runs under Windows and uses a dynamic link library (or DLL). The functions in the DLL are in a separate code segment and must be declared far.

Using huge pointers

A huge pointer is similar to a far pointer. It is 32 bits long and can point to any location in memory. You declare data to be huge by substituting the __huge keyword for __far.

Huge pointers offer three advantages over far pointers:

A huge pointer's segment value can change as the offset value "wraps around," unlike a far pointer. A huge pointer therefore can point to a data object greater than 64KB in size.
When you perform logical comparisons on huge pointers, both the segment and offset are compared. For far pointers, only the offsets are compared.
You can change a huge pointer's segment value with pointer arithmetic or array indexing.

Because of the extra overhead associated with huge pointer arithmetic, you should use huge pointers only for data objects larger than 64KB. Do not use huge pointers in 32-bit code. The keyword __huge is ignored in compilations using the NT (-mn) memory model.
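
The difference between far and huge pointer behavior can be modeled in portable C. This is only a sketch of the idea, not the compiler's implementation: the SegOff type and the function names are hypothetical, and it assumes real mode, where adding 0x1000 to the segment value advances the address by 64KB.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit segment:offset pointer. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} SegOff;

/* Far arithmetic: only the offset changes, so it wraps within 64KB. */
SegOff far_add(SegOff p, uint32_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Huge arithmetic: a carry out of the offset moves the segment on.
   Assumes real mode and n small enough that off + n fits in 32 bits. */
SegOff huge_add(SegOff p, uint32_t n)
{
    uint32_t sum = (uint32_t)p.off + n;
    p.off = (uint16_t)sum;
    p.seg = (uint16_t)(p.seg + (sum >> 16) * 0x1000u);
    return p;
}

/* Far comparison looks at offsets only; huge compares both parts. */
int far_equal(SegOff a, SegOff b)  { return a.off == b.off; }
int huge_equal(SegOff a, SegOff b) { return a.seg == b.seg && a.off == b.off; }
```

Starting from 2000:FFFF, adding 1 with far_add gives 2000:0000, which is 64KB below the intended address, while huge_add gives 3000:0000. Likewise, 2000:0010 and 3000:0010 compare equal under far_equal but not under huge_equal.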

Note: Digital Mars C++ does not support the Huge memory model. That is, a pointer whose type is unspecified cannot be made huge by default.

Using handle pointers

Handle pointers are a Digital Mars C++ extension to the far pointer type that support virtual memory management. You use handle pointers to access expanded memory (EMS or LIMS) in 16-bit programs.

Like far pointers, handle pointers are 32 bits long in 16-bit applications. They let a data structure use as much as 16KB of memory, and let your program use as much as 16MB.

Note: The keyword __handle is ignored in compilations using the NT (-mn) memory model.

See Using Handle Pointers for more information.

Using __ss pointers

You use __ss pointers to point to objects on the stack. In the Tiny, Small, Compact, Medium, and Large memory models, __ss is a 16-bit offset. In the DOSX and Phar Lap memory models, it is a 32-bit offset.

__ss pointers work like near pointers; the difference is that their segment address is set to the stack segment instead of the data segment. Thus __ss pointers are relative to the SS segment register, while near pointers are relative to the DS segment register.

If SS == DS (which is true in the Tiny, Small, and Medium memory models), there is no difference between __ss pointers and near pointers. In the Compact and Large models, or whenever you set SS != DS with the w qualifier to the -m compiler option (as for DLLs or ROM-based code), __ss can only be used to point to parameters and automatic variables, while near pointers can only point to static and global data.

Storing data in the code segment

Digital Mars C++ lets you store data in the code segment with the keyword __cs. Use __cs as you do __far. For example:

int __cs x = 3;         // x in code segment
char __cs ca[] = "abc"; // ca[] in code segment
char __cs *pc = "abc";  // "abc" in code segment, pc in data segment

char __cs * __cs pc2 = "def"; // "def" and pc2 both go in code segment
char __cs *cpa[] = {"abc", "def"};
                        // "abc" and "def" are in code segment
                        // array of pointers cpa is in data segment

void func ()
{ char __cs *p;
  p = "xyz";          // "xyz" in code segment
}

Advantages to storing data in the code segment

Some of the significant advantages to storing data in the code segment are:

Data placed in the code segment does not take up room in the default data segment (DGROUP). This is especially important in Windows programs, where DGROUP space is at a premium.
For Windows programs, applications that require far data segments cannot be multiple-instance applications. Storing read-only data in the code segment helps to avoid this problem.
Data in the code segment can be accessed more efficiently than data in far data segments.
Putting read-only data in the code segment, and organizing your program so that such data is only accessed by functions in the same code segment, can result in improved performance on Windows.
In protected mode, the code segment is read-only, so data stored there will not be corrupted.
In overlay systems, data can be swapped in along with code. If you write to the data, however, changes are lost.

In 32-bit memory models, placing data in the code segment is rarely advantageous unless read-only protection is desired.

Potential problems

When using the __cs keyword, keep the following potential problems in mind:

For any program, declare all data stored in the code segment as const (or as static const if possible) to explicitly tell the compiler not to write to it. If you modify data in the code segment, the modified value is not saved if the segment is swapped out.
Since Microsoft object format does not allow COMDEF records in the code segment, int __cs x; is treated as int __cs x = 0;.
Data in a code segment cannot be accessed if the CS register does not contain the value for that code segment. Therefore, in programs with multiple code segments, the code and the associated data must exist in the same segment.
In real mode Windows 3.0 programs, make sure that the code segment is not relocated while a far pointer is pointing to data in that code segment. If the code segment is relocated, its contents will be corrupted.
When storing data in the code segment, you must disable the /FARCALLTRANSLATION option to OPTLINK. This linker optimization assumes that code segments contain only code.
The OBJ2ASM utility, as well as many debuggers, has problems handling arbitrary data stored in the code segment.
