digitalmars.D - Fascinating new switch mechanism in assembler
- Dan Lewis (49/49) Mar 17 2006 Hi guys,
 - Sean Kelly (6/15) Mar 17 2006 Sounds like you're describing a Jump Table. Here's a PDF I dug up on
 - Dan Lewis (6/10) Mar 18 2006 Oh. I figured it was a little too simple to be a new idea. From the ma...
 - Dan Lewis (18/18) Mar 18 2006 Hi folks. Here's a gimme for anyone out there writing a lexer. Give it...
 - Walter Bright (44/54) Mar 18 2006 Given:
 - Lionello Lunesu (39/63) Mar 20 2006 In fact, it's even better. I got this code (dmd -O -release -inline):
 - Walter Bright (3/7) Mar 21 2006 Not necessarilly. It depends on which version of the CPU.
 - Dan Lewis (22/64) Apr 15 2006 Walter - I *think* your jump gate is jumping *into* the table. If I'm w...
 - Lionello Lunesu (6/8) Apr 15 2006 It's not jumping to "ebx" but to "[ebx*4], which means the value (dword)...
 - Dan (4/12) Apr 21 2006 Lionello, I have no idea what "dd offset FLAT:?test@@YAHH@Z[014h]" means...
 - Lionello Lunesu (6/21) Apr 26 2006 LOL, well, it breaks down to segment:symbol[offset], segment being FLAT
 
Hi guys,
I've been working on a lexical analyzer for my new scripting engine, and I
stumbled upon an interesting algorithm.
Essentially:
goto ((charCode << 2)+subroutinePointerArray);
The advantage of this strategy is the elimination of some 90% of the conditional
testing for each character in a string being interpreted (Yay for code that runs
in 40% the time!).  The disadvantage is that you need a table of pointers; if
your cache requirements get too big you can suffer cache thrashing in the
scanner loop which will slow it down even more (-900%?) so you need to keep the
rest of the scanner relatively compact.
Provided with a table of 256 pointers, we would have a single-jump unconditional
branch to get to the correct handler for each character.  Further refining the
process, I'm skipping characters >0x7F, and masking 0x60-0x7F onto 0x40-0x5F;
which gives you essentially the same handler functions (provided you still
preserve the original character somewhere, it makes no difference)  Then instead
of taking 1kb of cache, it only takes 380 bytes for the table at a cost of
introducing two conditional branches; one of which is often mispredicted.
So I have:
// ...
asm
{
naked;
cld;
mov	ECX,	len;
mov	ESI,	p;
xor	EAX,	EAX;
L1:	lodsb;
jecxz	short LX;
mov	EBX,	EAX;
and	EBX,	0xFFFF_FF80;
jnz	BAIL;
mov	EDX,	EAX;
mov	EBX,	EAX;
shr	EDX,	5;
cmp	EDX,	3;
je	short J1;
J2:	shl	EBX,	2;
add	EBX,	jumpGateP;
jmp	[EBX];
J1:	sub	EBX,	byte 0x20;
jmp	short J2; // ...
I'm hoping for some feedback and progression of the idea; and if it's already
been out there for ages, perhaps a link or two on the subject?
I'm just waddling through D, so right now I'm struggling to figure out how to
get my jumpGate table variables to point to functions.  :p  Once I'm done that,
I'll write the rest of it.
(c) Dan Lewis, 2006 (contact me, I'm typically cooperative)
http://murpsoft.com
 Mar 17 2006
Dan Lewis wrote:Hi guys, I've been working on a lexical analyzer for my new scripting engine, and I stumbled upon an interesting algorithm. Essentially: goto ((charCode << 2)+subroutinePointerArray);...I'm hoping for some feedback and progression of the idea; and if it's already been out there for ages, perhaps a link or two on the subject?Sounds like you're describing a Jump Table. Here's a PDF I dug up on the subject: http://www.inf.ethz.ch/personal/muellren/sysprog/ws05/session2-ifwd42rm2.pdf Sean
 Mar 17 2006
Sounds like you're describing a Jump Table. Here's a PDF I dug up on the subject: http://www.inf.ethz.ch/personal/muellren/sysprog/ws05/session2-ifwd42rm2.pdf SeanOh. I figured it was a little too simple to be a new idea. From the material I've read on the 'net, I'm getting the idea that I should let the 128 bytes sit there to avoid the unpredictable branch; but that my jump table is otherwise alot faster than what most C compilers would generate. : ) I was thinking I might be able to squeeze an extra few cycles out of each character if I knew what I was doing... but I'll save that for version 1.0.
 Mar 18 2006
Hi folks.  Here's a gimme for anyone out there writing a lexer.  Give it three
variables, and you have a tight jump table switch.  I'm about to port this over
to nasm to avoid the GC/Phobos overhead.  :p
asm
{
naked;
even;
cld;
mov	ECX,	strLength;
mov	ESI,	charP;
xor	EAX,	EAX;
L1:	lodsb;
jecxz	short EXIT_SUCCESS;
test	AL,	0x80;
jnz	short EXIT_BAD_CHAR;
lea	EBX,	[AL*4+jumpTableP]
jmp	EBX;
}
 Mar 18 2006
"Dan Lewis" <Dan_member pathlink.com> wrote in message news:dvfnur$a9$1 digitaldaemon.com...Hi guys, I've been working on a lexical analyzer for my new scripting engine, and I stumbled upon an interesting algorithm. Essentially: goto ((charCode << 2)+subroutinePointerArray); The advantage of this strategy is the elimination of some 90% of the conditional testing for each character in a string being interpreted (Yay for code that runs in 40% the time!).Given: int test(int i) { switch (i) { case 1: case 2: case 3: case 4: case 5: case 6: case 7: return i; default: return i + 1; } } The assembler produced is: ?test YAHH Z: push EBX mov EBX,8[ESP] sub EBX,1 cmp EBX,6 ja L1A jmp dword ptr FLAT:_DATA[00h][EBX*4] mov EAX,8[ESP] pop EBX ret L1A: mov EAX,8[ESP] inc EAX pop EBX ret _TEXT ends _DATA segment dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] _DATA ends
 Mar 18 2006
?test YAHH Z: push EBX mov EBX,8[ESP] sub EBX,1 cmp EBX,6 ja L1A jmp dword ptr FLAT:_DATA[00h][EBX*4] mov EAX,8[ESP] pop EBX ret L1A: mov EAX,8[ESP] inc EAX pop EBX ret _TEXT ends _DATA segment dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] dd offset FLAT:?test YAHH Z[014h] _DATA endsIn fact, it's even better. I got this code (dmd -O -release -inline): 00402010 push ebx 00402011 mov ebx,eax 00402013 sub ebx,1 00402016 mov ecx,eax 00402018 cmp ebx,6 0040201B ja 00402026 0040201D jmp dword ptr [ebx*4+411080h] 00402024 pop ebx 00402025 ret 00402026 pop ebx 00402027 lea eax,[ecx+1] 0040202A ret Although the point is clear (the compiler already uses jump tables) the code does not seem optimal. Of course, my assembly knowledge is kind-of rusty (instruction pairing for the pentium 1 is the latest optimization I know of). First of all the jmp is useless (in fact, the jump table is useless). The code already compares the "default:" case, so there's no need to further differentiate between the actual values. I would have written the switch as follows: push ebx mov ebx,eax dec ebx mov ecx,eax cmp ebx,6 jbe L1A lea eax,[ecx+1] L1A: pop ebx ret What's the reason for the "sub ebx, 1", intead of "dec ebx"? Isn't an instruction using an immediate value slower (larger instruction => less instructions in cache) than one without? (At least that's what I remember from the pentium 1). Would the usage of the instructions setbe/seta improve the above code even more? It'll get rid of the conditional jump, but I have no idea what the performance of those set* instructions are. L.
 Mar 20 2006
"Lionello Lunesu" <lio remove.lunesu.com> wrote in message news:dvm0dd$2bhp$1 digitaldaemon.com...What's the reason for the "sub ebx, 1", intead of "dec ebx"? Isn't an instruction using an immediate value slower (larger instruction => less instructions in cache) than one without? (At least that's what I remember from the pentium 1).Not necessarilly. It depends on which version of the CPU.
 Mar 21 2006
Given:
int test(int i)
{
    switch (i)
    {
 case 1:
 case 2:
 case 3:
 case 4:
 case 5:
 case 6:
 case 7:
     return i;
 default:
     return i + 1;
    }
}
The assembler produced is:
?test  YAHH Z:
  push EBX
  mov EBX,8[ESP]
  sub EBX,1
  cmp EBX,6
  ja L1A
  jmp dword ptr FLAT:_DATA[00h][EBX*4]
  mov EAX,8[ESP]
  pop EBX
  ret
L1A:  mov EAX,8[ESP]
  inc EAX
  pop EBX
  ret
_TEXT ends
_DATA segment
 dd offset FLAT:?test  YAHH Z[014h]
 dd offset FLAT:?test  YAHH Z[014h]
 dd offset FLAT:?test  YAHH Z[014h]
 dd offset FLAT:?test  YAHH Z[014h]
 dd offset FLAT:?test  YAHH Z[014h]
 dd offset FLAT:?test  YAHH Z[014h]
 dd offset FLAT:?test  YAHH Z[014h]
_DATA ends
Walter - I *think* your jump gate is jumping *into* the table.  If I'm wrong,
then disregard all this...
~~~
This is bad because you're flushing the code cash twice to get where you're
going, and the jump instruction itself is being stored repeatedly.  Instead you
want to load the 'final' jump address into a register and jump to that address.
Then the table is essentially a void*[].
so:
mov eax, table[x];
jmp [eax];
dd a;
dd b;
is better than
mov eax, table[x];
jmp eax;
table:
jmp a;
jmp b;
This will roughly half the memory size.  That said, I plan to use a D switch and
trust you'll optimize it well enough.  :)  That way my code is more portable and
clean.  Thanks for showing that!
Sincerely, Dan.
 Apr 15 2006
I'm no Walter, but I think he had it right.It's not jumping to "ebx" but to "[ebx*4], which means the value (dword) at (ebx*4) is read first. The *4 also shows that the table contains dwords: the offsets of the code jumped to for each case.jmp dword ptr FLAT:_DATA[00h][EBX*4]L.dd offset FLAT:?test YAHH Z[014h]
 Apr 15 2006
In article <e1qf9o$1me0$1 digitaldaemon.com>, Lionello Lunesu says...I'm no Walter, but I think he had it right.Lionello, I have no idea what "dd offset FLAT:?test YAHH Z[014h]" means. I can read assembly, even a touch of hex, but I skipped gibberish class. Anyways, [EBX*4] makes sense to me, so I know you're right. : )It's not jumping to "ebx" but to "[ebx*4], which means the value (dword) at (ebx*4) is read first. The *4 also shows that the table contains dwords: the offsets of the code jumped to for each case.jmp dword ptr FLAT:_DATA[00h][EBX*4]L.dd offset FLAT:?test YAHH Z[014h]
 Apr 21 2006
Dan wrote:In article <e1qf9o$1me0$1 digitaldaemon.com>, Lionello Lunesu says...LOL, well, it breaks down to segment:symbol[offset], segment being FLAT (from 0 to infinity and beyond), the nearest symbol being ?test YAHH Z (the label at the start) and 014h the offset from that label, ?test YAHH Z+0x14 points to the mov right after the jmp, I think :) L.I'm no Walter, but I think he had it right.Lionello, I have no idea what "dd offset FLAT:?test YAHH Z[014h]" means. I can read assembly, even a touch of hex, but I skipped gibberish class. Anyways, [EBX*4] makes sense to me, so I know you're right. : )It's not jumping to "ebx" but to "[ebx*4], which means the value (dword) at (ebx*4) is read first. The *4 also shows that the table contains dwords: the offsets of the code jumped to for each case.jmp dword ptr FLAT:_DATA[00h][EBX*4]L.dd offset FLAT:?test YAHH Z[014h]
 Apr 26 2006








 
 
 
 Dan Lewis <Dan_member pathlink.com> 