digitalmars.D - misaligned read handling on various processors
- Andrei Alexandrescu (12/12) Oct 06 2009 Consider:
- Don (6/18) Oct 06 2009 Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86
- Andrei Alexandrescu (3/23) Oct 06 2009 Thanks! Are there some online docs that discuss that in detail?
- Jb (8/30) Oct 06 2009 http://www.intel.com/products/processor/manuals/
- Sean Kelly (4/22) Oct 06 2009 By default, an unaligned read on SPARC will cause a bus error. The trap...
- Michel Fortin (43/55) Oct 06 2009 Wikipedia:
Consider:

    struct A { char a; align(1) int b; }

Accesses to b will be rather slow because it's a misaligned read. My question is, how exactly is that handled on various processors? I seem to recall various anecdotes (including that misaligned reads on Intel cause a trap that does the needed double reading, shifting, and masking), but Google search has surprisingly little on the matter.

Thanks,

Andrei
Oct 06 2009
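The layout in question can be checked directly. Below is a minimal D sketch. It assumes a D2 compiler that honors `align(1)` on a field by packing that field immediately after the previous member. `isAligned` is a hypothetical helper introduced for illustration, not a Phobos function.

```d
import std.stdio : writefln;

struct A
{
    char a;
    align(1) int b; // field alignment forced to 1, so b follows a directly
}

/// Hypothetical helper: true if `offset` is a multiple of `alignment`.
bool isAligned(size_t offset, size_t alignment)
{
    return offset % alignment == 0;
}

void main()
{
    writefln("A.a at offset %s, A.b at offset %s", A.a.offsetof, A.b.offsetof);
    writefln("int.alignof = %s, b naturally aligned: %s",
             int.alignof, isAligned(A.b.offsetof, int.alignof));
}
```

The offsets are relative to the start of `A`. Whether `b` actually lands on a misaligned address at run time also depends on where each `A` instance is placed in memory.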
Andrei Alexandrescu wrote:
> Consider:
>
>     struct A { char a; align(1) int b; }
>
> Accesses to b will be rather slow because it's a misaligned read. My question is, how exactly is that handled on various processors? I seem to recall various anecdotes (including that misaligned reads on Intel cause a trap that does the needed double reading, shifting, and masking), but Google search has surprisingly little on the matter.

Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86 doesn't hurt much at all, except for doubles and reals. For the case you mention there'll probably be no misalignment penalty at all, the latency gets hidden in the early stages of the pipeline. Although there may be a penalty if you cross a cache line boundary.
Oct 06 2009
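Don's caveat about cache lines can be expressed as a simple address test. The sketch below is hypothetical: `crossesCacheLine` is not a library function, and the 64-byte default line size is an assumption. That size is common on x86, but the real value should be queried for the target CPU.

```d
import std.stdio : writeln;

/// Hypothetical helper: does an access of `size` bytes starting at byte
/// address `addr` span two cache lines? The 64-byte default is an
/// assumption, not something the architecture guarantees.
bool crossesCacheLine(size_t addr, size_t size, size_t lineSize = 64)
{
    assert(size > 0 && size <= lineSize);
    // The first and last bytes touched fall in different lines.
    return addr / lineSize != (addr + size - 1) / lineSize;
}

void main()
{
    // The int in struct A sits at offset 1: misaligned, but still inside
    // one line if the struct starts at the beginning of a line.
    writeln(crossesCacheLine(1, int.sizeof));  // false
    // The same int at offset 62 would straddle bytes 62..65.
    writeln(crossesCacheLine(62, int.sizeof)); // true
}
```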
Don wrote:
> Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86 doesn't hurt much at all, except for doubles and reals. For the case you mention there'll probably be no misalignment penalty at all, the latency gets hidden in the early stages of the pipeline. Although there may be a penalty if you cross a cache line boundary.

Thanks! Are there some online docs that discuss that in detail?

Andrei
Oct 06 2009
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:hafmb2$15lj$1 digitalmars.com...
> Thanks! Are there some online docs that discuss that in detail?

http://www.intel.com/products/processor/manuals/
Check the optimization manual at the bottom, chapter 3.6.3.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
Chapter 5.2.

http://www.agner.org/optimize/optimizing_assembly.pdf
Chapter: optimizing memory access.
Oct 06 2009
== Quote from Don (nospam nospam.com)'s article
> Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86 doesn't hurt much at all, except for doubles and reals. For the case you mention there'll probably be no misalignment penalty at all, the latency gets hidden in the early stages of the pipeline. Although there may be a penalty if you cross a cache line boundary.

By default, an unaligned read on SPARC will cause a bus error. The trap is enabled via a compiler switch and, as one might expect, is ridiculously slow. I believe unaligned ops are nearly as fast as aligned ops on x86, as you say.
Oct 06 2009
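One portable way to avoid the SPARC bus error Sean describes is to never dereference a misaligned pointer and instead copy the bytes into a properly aligned local. Below is a minimal D sketch. `loadUnaligned` and `storeUnaligned` are hypothetical names, not a Phobos API. Whether the `memcpy` is lowered to a single load or to byte loads is up to the compiler and target.

```d
import core.stdc.string : memcpy;

/// Hypothetical helper: read a T from a possibly misaligned address by
/// copying its bytes into an aligned local instead of dereferencing `p`.
T loadUnaligned(T)(const(void)* p)
{
    T value = void;              // aligned storage, filled by memcpy
    memcpy(&value, p, T.sizeof);
    return value;
}

/// Hypothetical helper: write a T to a possibly misaligned address.
void storeUnaligned(T)(void* p, T value)
{
    memcpy(p, &value, T.sizeof);
}

void main()
{
    ubyte[8] buf;
    // buf.ptr + 1 is not 4-byte aligned; a direct *cast(int*) access
    // there could fault on a strict-alignment CPU.
    storeUnaligned!int(buf.ptr + 1, 0x11223344);
    int x = loadUnaligned!int(buf.ptr + 1);
    assert(x == 0x11223344);
}
```

The design choice is to let the compiler pick the access sequence. On x86 it may emit an ordinary load, while on a strict-alignment target the copy is still correct because `memcpy` makes no alignment assumptions about its arguments.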
On 2009-10-06 09:58:42 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:

> My question is, how exactly is that handled on various processors?

Wikipedia:
<http://en.wikipedia.org/wiki/Data_structure_alignment#Architectures>

RISC

Most RISC processors will generate an alignment fault when a load or store instruction accesses a misaligned address. This allows the operating system to emulate the misaligned access using other instructions. For example, the alignment fault handler might use byte loads or stores (which are always aligned) to emulate a larger load or store instruction.

Some architectures like MIPS have special unaligned load and store instructions. One unaligned load instruction gets the bytes from the memory word with the lowest byte address and another gets the bytes from the memory word with the highest byte address. Similarly, store-high and store-low instructions store the appropriate bytes in the higher and lower memory words respectively.

The Alpha architecture has a two-step approach to unaligned loads and stores. The first step is to load the upper and lower memory words into separate registers. The second step is to extract or modify the memory words using special low/high instructions similar to the MIPS instructions. An unaligned store is completed by storing the modified memory words back to memory. The reason for this complexity is that the original Alpha architecture could only read or write 32-bit or 64-bit values. This proved to be a severe limitation that often led to code bloat and poor performance.

To address this limitation, an extension called the Byte Word Extensions (BWX) was added to the original architecture. It consisted of instructions for byte and word loads and stores. Because these instructions are larger and slower than the normal memory load and store instructions they should only be used when necessary.

Most C and C++ compilers have an “unaligned” attribute that can be applied to pointers that need the unaligned instructions.

x86 and x86-64

While the x86 architecture originally did not require aligned memory access and still works without it, SSE2 and x86-64 instructions on x86 CPUs do require the data to be 128-bit (16-byte) aligned and there can be substantial performance advantages from using aligned data on these architectures.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 06 2009
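The Alpha-style two-step approach in the quoted text (and the "double reading, shifting, and masking" Andrei recalled) can be simulated in a few lines. This D sketch is illustrative only. It models memory as a byte array and assumes little-endian byte order and a machine that permits only 4-byte-aligned 32-bit loads. Its function names are hypothetical, and it is not the code any particular OS trap handler or compiler actually emits.

```d
import std.stdio : writefln;

/// Simulated hardware load: only 4-byte-aligned addresses are allowed,
/// mimicking a CPU that faults on misaligned access. `mem` stands in for
/// main memory, and bytes are combined in little-endian order.
uint loadAligned32(const(ubyte)[] mem, size_t addr)
{
    assert(addr % 4 == 0, "misaligned 32-bit load would fault here");
    return cast(uint) mem[addr]
         | (cast(uint) mem[addr + 1] << 8)
         | (cast(uint) mem[addr + 2] << 16)
         | (cast(uint) mem[addr + 3] << 24);
}

/// Emulate a misaligned 32-bit load with at most two aligned loads,
/// then shift and merge the two words (little-endian byte order).
uint loadMisaligned32(const(ubyte)[] mem, size_t addr)
{
    immutable size_t lo = addr & ~cast(size_t) 3;      // aligned word holding the first byte
    immutable uint shift = cast(uint) (addr - lo) * 8; // bit offset of that byte
    immutable uint loWord = loadAligned32(mem, lo);
    if (shift == 0)
        return loWord; // already aligned: one load is enough
    immutable uint hiWord = loadAligned32(mem, lo + 4);
    // Wanted bytes: the upper part of loWord and the lower part of hiWord.
    return (loWord >> shift) | (hiWord << (32 - shift));
}

void main()
{
    immutable ubyte[8] mem = [0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77];
    writefln("%08X", loadMisaligned32(mem[], 1)); // prints 44332211
}
```

The second aligned load touches the following word. On real hardware that word may lie in a different cache line or page, which is part of why emulated misaligned accesses are slow.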