c++.dos.32-bits - X32 bug???
- Laurentiu Pancescu (37/37) Jan 31 2002 I'm using NASM to assemble an external function, then I link the OBJ fil...
- Walter (7/44) Jan 31 2002 Try a simple hello world program with -mx and verify that works on your
- Laurentiu Pancescu (13/15) Jan 31 2002 It does, and even pretty large programs, both in C and C++. It's no pro...
- Jan Knepper (3/18) Feb 01 2002 Check http://www.dosextender.com/
- Laurentiu Pancescu (6/8) Feb 01 2002 I used the latest version, when I saw those problems... it's downloaded ...
- Walter (8/23) Feb 01 2002 I use assembler files with x all the time. You can view them at
- Laurentiu Pancescu (11/13) Feb 02 2002 again
- Walter (7/18) Feb 02 2002 there,
- Walter (5/18) Feb 02 2002 Your solution is now in the FAQ! Thanks, -Walter
- Laurentiu Pancescu (14/15) Feb 03 2002 Great, thanks! And I'm also glad because my MMX code works now fine wit...
- Walter (9/23) Feb 03 2002 weaker
- Laurentiu Pancescu (22/27) Feb 04 2002 PXOR and
- Walter (12/39) Feb 04 2002 Since you said the critical loop is in the assembler code, it cannot be ...
- Heinz Saathoff (7/11) Feb 05 2002 Right. The code might fit into the processor cache in one case
- Laurentiu Pancescu (18/19) Feb 05 2002 You'd win the bet... almost! It was an alignment problem, indeed, not ...
I'm using NASM to assemble an external function, then I link the OBJ file with the rest of DMC compiled files, but the EXE crashes (GPF, 0DH) at run. Only -mx is affected by this, -mn works fine, also tests made with Borland C++ 5.5.1, Cygwin, MinGW and DJGPP (same assembly code, NASM can generate all suitable formats). Here's an example:

    ; test.asm
    ; use "nasm -f obj test.asm -o test.obj"
    segment code public use32
    global _get_value

    _get_value:
        push ebp
        mov ebp, esp
        mov eax, [ebp + 8]
        add eax, eax
        leave
        retn

    /* main.c */
    /* sc -mx main.c test.obj x32.lib */
    #include <stdio.h>

    unsigned get_value(unsigned);

    int main(void)
    {
        printf("Result is %u\n", get_value(9));
        return 0;
    }

On all other configurations, the displayed value is 18, as expected. I looked in the DMC generated code, and everything is okay (duh!). Maybe this is a problem with the DOS extender? Did anyone else encounter this problem? If you don't use NASM, I think my asm example should be straightforward to convert to TASM or MASM syntax.

My environment is Win2k SP2, DMC 8.26, X32 from May 15th (latest version, you know what I'm talking about). Any feedback would be appreciated - thanks!

Is there a debugger for DOSX programs? WUDEBUG only debugs WDOSX programs... :(

Laurentiu
Jan 31 2002
Try a simple hello world program with -mx and verify that works on your system.

-Walter
Jan 31 2002
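One way to act on the suggestion above and separate the C side from the NASM object is to link a pure C stand-in for the routine first. The following is a minimal sketch, not anything posted in the thread: `get_value` keeps the name and contract from the original post (it returns twice its argument), while `report_value` is a hypothetical helper added only for illustration.

```c
#include <stdio.h>

/* Hypothetical C stand-in for the NASM routine. It has the same name
   and contract as get_value in the original post: it returns twice
   its argument. */
unsigned get_value(unsigned x)
{
    return x + x;
}

/* Hypothetical helper: prints the same line as main.c in the original
   post and returns the value so a caller can check it. */
unsigned report_value(unsigned x)
{
    unsigned r = get_value(x);
    printf("Result is %u\n", r);
    return r;
}
```

If a build that links this stand-in runs correctly under -mx while the build that links test.obj crashes, the problem is more likely in the object file or the way it is linked than in the C code or the compiler switches.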
"Walter" <walter digitalmars.com> wrote in message news:a3csdi$4ac$1 digitaldaemon.com...
> Try a simple hello world program with -mx and verify that works on your system.
> -Walter

It does, and even pretty large programs, both in C and C++. It's no problem when it's only high-level source code. Problems arise when I try to use externally defined functions (I use NASM for portability reasons, and because of its cleaner syntax: I can use the same ASM file for any 32bit compiler, and virtually any operating system! It's very suitable for portable MMX or 3dnow! optimizations).

It works with DMC in Win32 mode, so I think the problem is in X32, not in my code. I "heard" about some stack alignment problems when X32 runs under real-mode DOS: maybe they're not the only ones? Do you think I should contact Mr. Doug Huffman about this issue?

Laurentiu
Jan 31 2002
Check http://www.dosextender.com/ I think Doug Huffman put out a new version...
Feb 01 2002
I was using the latest version when I saw those problems... I downloaded it 2 days ago, but got the same result! It may be related to NTVDM bugs, I don't know... I'll try to boot with a DOS disk, and see if it still crashes.

Laurentiu

"Jan Knepper" <jan smartsoft.cc> wrote in message news:3C5AAA54.88A9D7CA smartsoft.cc...
> Check http://www.dosextender.com/ I think Doug Huffman put out a new version...
Feb 01 2002
I use assembler files with x all the time. You can view them at \dm\src\core32\*.asm and \dm\src\dos32\*.asm.

Can I suggest taking your asm file and assembling it with nasm. Try it again using dmc's inline assembler. Obj2asm the results and compare!
Feb 01 2002
"Walter" <walter digitalmars.com> wrote in message news:a3fcef$rir$1 digitaldaemon.com...
> Can I suggest taking your asm file and assembling it with nasm. Try it again using dmc's inline assembler. Obj2asm the results and compare!

I tried to obj2asm the object generated by NASM: I only saw db lines there, instead of actual assembly code. So, I added 'class=CODE' in the segment declaration, and it's fine now. Probably X32 got GPF when calling code inside of a DATA segment. I don't understand why this is okay with all Windows compilers, including DMC, and also DJGPP (which also uses a DOS extender). Probably it's related to how the linker and the OS loader work??

Thanks,
Laurentiu
Feb 02 2002
"Laurentiu Pancescu" <lpancescu fastmail.fm> wrote in message news:a3gdvj$1i17$1 digitaldaemon.com...
> I tried to obj2asm the object generated by NASM: I only saw db lines there, instead of actual assembly code. So, I added 'class=CODE' in the segment declaration, and it's fine now. Probably X32 got GPF when calling code inside of a DATA segment. I don't understand why this is okay with all Windows compilers, including DMC, and also DJGPP (which also uses a DOS extender). Probably it's related to how the linker and the OS loader work??

Glad you found what was going wrong. The reason you got the crash is X32 marks the code segment as execute only, and the data as not executable. Other dos extenders apparently don't do that.
Feb 02 2002
Your solution is now in the FAQ! Thanks,

-Walter
Feb 02 2002
"Walter" <walter digitalmars.com> wrote in message news:a3hkt8$232q$2 digitaldaemon.com...
> Your solution is now in the FAQ! Thanks,
> -Walter

Great, thanks! And I'm also glad because my MMX code now works fine with DMC. However, I notice that the performance of my loop is about 20% weaker than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and POR!). I expected it to be the same for any compiler, since they don't touch it. Then, I tried to force an alignment to a paragraph border for my assembly function, but this only made things worse by an additional 10% - I guess OPTLINK knows better about alignments... :)

Is it possible that the way different runtime libraries initialize the FPU affects the MMX performance (since both MMX and FPU instructions use the same physical registers)??? There's also a slight difference between Borland and gcc generated EXEs, about 2-3% - I don't see another reason.

Laurentiu
Feb 03 2002
"Laurentiu Pancescu" <lpancescu fastmail.fm> wrote in message news:a3jgvm$2u58$2 digitaldaemon.com...
> Great, thanks! And I'm also glad because my MMX code works now fine with DMC. However, I notice that the performance of my loop is about 20% weaker than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and POR!). I expect this to be the same for any compiler, since they don't touch it. Then, I tried to force an alignment to a paragraph border for my assembly function, but this only made things worse by an additional 10% - I guess OPTLINK knows better about alignments... :)

Alignment probably is the issue. Try putting in NOPs one at a time before your loop, and time each time.

> Is it possible that the way different runtime libraries initialize the FPU affects the MMX performance (since both MMX and FPU instructions use the same physical registers)??? There's also a slight difference between Borland and gcc generated EXEs, about 2-3% - I don't see another reason.

I can't imagine how that would affect things. If it does, please let me know!
Feb 03 2002
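The "time each time" step in the suggestion above needs a repeatable measurement. A minimal sketch of such a harness in standard C follows; it is an illustration only, and the names `time_calls` and `busy_work` are hypothetical. In practice `busy_work` would be replaced by a call into the assembly routine being tuned.

```c
#include <time.h>

/* Written to through a volatile object so the compiler cannot
   optimize the placeholder loop away. */
static volatile unsigned sink;

/* Hypothetical placeholder workload; swap in a call to the routine
   under test. */
void busy_work(void)
{
    unsigned i;
    for (i = 0; i < 1000; i++)
        sink ^= i;
}

/* Hypothetical helper: calls fn() reps times and returns the elapsed
   processor time in seconds, as reported by the standard clock(). */
double time_calls(void (*fn)(void), long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++)
        fn();

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Since clock() can be coarse, a repetition count high enough to give runs of a second or more makes differences of a few percent easier to see. Rebuilding with one more NOP before the loop and comparing the values returned by `time_calls` follows the procedure described in the post.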
"Walter" <walter digitalmars.com> wrote in message news:a3knen$dig$1 digitaldaemon.com...
>> I notice that the performance of my loop is about 20% weaker than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and POR).
> Alignment probably is the issue. Try putting in NOPs one at a time before your loop, and time each time.

I did some testing, with very interesting results: when I specified -o+space for the compiling of the C source files, the C code performance dropped slightly, but the MMX loop performance is the same as in the EXEs generated by BCC or gcc (even slightly better). I'm really confused about this, since NASM handles my MMX loop in the same way each time, and I called OPTLINK directly, so that it doesn't know about requirements to do space optimization (just in case it cares about SC's -o+space). Even more, I got used to the fact that the corresponding DOSX program, compiled from the same source, runs about 5-10% slower than its Win32 counterpart, but now, with -o+space, it runs faster!!!

I also did another test, using a source with a simple C loop, seen on one of BCC's newsgroups some months ago:

- -o, -o+speed, -o+all: execution time is 13 seconds
- no optimization flags specified: execution time is 4 seconds
- -o+space: execution time is 3 seconds

I thought -o+all is *always* the best to use, but it proves not to be the case... I can send you the sources for those two tests, if you want - perhaps it could help improving the optimizer?

Laurentiu
Feb 04 2002
Since you said the critical loop is in the assembler code, it cannot be the optimizer. The optimizer does not affect the assembler. I bet it's alignment. Try the NOP suggestion.

-Walter
Feb 04 2002
Walter wrote...
> Since you said the critical loop is in the assembler code, it cannot be the optimizer. The optimizer does not affect the assembler. I bet it's alignment. Try the NOP suggestion.
> -Walter

Right. The code might fit into the processor cache in one case and not in the other depending on the starting address of the critical code. Due to optimization the assembly part can move to a base address that is not optimal for caching.

Just a guess,
Heinz
Feb 05 2002
"Walter" <walter digitalmars.com> wrote in message news:a3nvlq$2f0q$2 digitaldaemon.com...
> I bet it's alignment. Try the NOP suggestion.
> -Walter

You'd win the bet... almost! It was an alignment problem, indeed, but not of the code: it was the data that the MMX instructions access. Playing with NOP only improved performance by 2%, not significant when compared to a boost from 2.5 seconds to 1.8 (execution time). One of the operands of my instructions cannot be aligned, but the other one could. I used an automatic vector (char p[48]), declared in main(), and passed the pointer to that. The option "-o+all" causes p to be aligned at a 4-byte boundary, while "-o+space" gives p 8-byte alignment, which is vital for MMX performance. Both BCC and GCC align automatic vectors at 8 or 16 bytes by default, so this is where the performance penalty came from!

I did more tests related to alignment in code generated by DMC and other compilers, but I will post a separate message in c++, since we're already pretty far away from the original crash of NASM generated code... :)

Many thanks for your help and suggestions!

Laurentiu
Feb 05 2002
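The data alignment issue described in the last post can be checked, and worked around without depending on compiler switches, by reserving a few spare bytes and rounding the pointer up. This is a minimal sketch, assuming a compiler that provides the C99 header <stdint.h> (which a 2002-era compiler may not); the names `is_aligned` and `align_up` are hypothetical and do not come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero when p is a multiple of
   alignment. alignment must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical helper: rounds p up to the next multiple of alignment
   (a power of two). The caller must have reserved at least
   alignment - 1 spare bytes after the data it needs. */
void *align_up(void *p, size_t alignment)
{
    uintptr_t mask = (uintptr_t)(alignment - 1);
    uintptr_t addr = ((uintptr_t)p + mask) & ~mask;
    return (void *)addr;
}
```

For the 48-byte buffer from the post, declaring `char raw[48 + 7];` in main() and passing `align_up(raw, 8)` to the MMX routine would give an 8-byte-aligned operand whatever alignment the compiler picks for `raw`. Calling `is_aligned(p, 8)` on the original `p` is a quick way to confirm which optimization settings produce the 4-byte case.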