digitalmars.D - Determining cache sizes -- request for testing
- Don (19/19) Sep 10 2008 To implement efficient memory-intensive operations (memcpy, array
- Denis Koroskin (11/30) Sep 10 2008 Vendor string: AuthenticAMD
- Tomas Lindquist Olsen (10/10) Sep 10 2008 Vendor string: AuthenticAMD
- Tomas Lindquist Olsen (10/10) Sep 10 2008 Vendor string: GenuineIntel
- Don (3/3) Sep 10 2008 That's already showed up a heap of bugs. Aargh.
- torhu (12/12) Sep 10 2008 This is a thunderbird 1.4. According to cpu-z, the L1 cache is really
- Manuel König (69/92) Sep 10 2008 Should the Pentium4 HyperThreading Technology count as two cores?
- bearophile (17/20) Sep 10 2008 Your code is surely useful, but take a look at "cache oblivious data str...
- bearophile (4/5) Sep 10 2008 I don't know if my CPU has HyperThreading, I presume not.
- Don (12/22) Sep 10 2008 Correct.
- bearophile (11/13) Sep 10 2008 I don't understand. And I have L2 cache.
- Don (12/26) Sep 10 2008 (1) Unfortunately, I don't think it's possible to determine the amount
- Manfred_Nowak (7/9) Sep 10 2008 Because without help of the OS, one will only get the available virtual
- TomD (10/10) Sep 10 2008 Vendor string: GenuineIntel
- dsimcha (264/286) Sep 10 2008 Seems to get things wrong for quad core Intels. No, I'm not rich enough...
- "Jérôme M. Berger" (75/78) Sep 10 2008 -----BEGIN PGP SIGNED MESSAGE-----
- Julio César Carrascal Urquijo (12/21) Sep 10 2008 Vendor string: AuthenticAMD
- Craig Black (10/10) Sep 10 2008 Don,
- Don (10/20) Sep 11 2008 I don't think that's possible.
- Bruno Medeiros (14/37) Sep 23 2008 Vendor string: AuthenticAMD
To implement efficient memory-intensive operations (memcpy, array operations, matrix multiplication, etc.), you really need to know the sizes of the data caches. Although most modern CPUs provide methods to determine the sizes of their built-in caches, it's a complete pig's breakfast. There are multiple complicated methods, and documentation is scant.

I've written some code to make this mess usable, and provide what you really want. For each level of cache, the code provides size in KB, ways of associativity, and the cache line size.

The attached code should eventually become part of std.cpuid, and an equivalent module in Tango. But it needs significant further testing. Please compile and run the code, and report the results. Any results would be useful, but particularly valuable would be:
(1) Multicore AMD machines;
(2) Early AMD machines (K6 or earlier);
(3) Early Intel machines;
(4) anything from another manufacturer;
(5) any crashes or obvious bugs.

Public domain.
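As a rough illustration of how the three reported values relate to each other, here is a minimal sketch. The `CacheLevel` record and its field names are hypothetical (they are not taken from the attached code); the only fact relied on is that a set-associative cache holds sets * ways * line size bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    """Hypothetical record holding the three values reported per cache level."""
    size_kb: int    # total size in KB
    ways: int       # ways of associativity
    line_size: int  # cache line size in bytes

    def num_sets(self) -> int:
        # capacity = sets * ways * line_size, so sets = capacity / (ways * line_size)
        if self.ways <= 0 or self.line_size <= 0:
            raise ValueError("ways and line_size must be positive")
        return (self.size_kb * 1024) // (self.ways * self.line_size)


# Values as reported for one of the Intel L1 caches in this thread.
l1 = CacheLevel(size_kb=32, ways=8, line_size=64)
print(l1.num_sets())  # 64
```

A report with linesize=0, as some of the replies show, would raise here instead of dividing by zero, which is one cheap way to flag a detection bug.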
Sep 10 2008
On Wed, 10 Sep 2008 13:41:40 +0400, Don <nospam nospam.com.au> wrote:
> [...] Please compile and run the code, and report the results. [...]

Vendor string: AuthenticAMD
Processor string: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
Signature: Family=15 Model=35 Stepping=2
Features: MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading: 2 threads / 2 cores
Family=F Model=3 Stepping=2
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=512K, ways=16 linesize=0
Level 3 size=4194303K, ways=1 linesize=0
Sep 10 2008
Vendor string: AuthenticAMD
Processor string: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
Signature: Family=15 Model=43 Stepping=1
Features: MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading: 2 threads / 2 cores
Family=F Model=B Stepping=1
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=512K, ways=16 linesize=0
Level 3 size=4194303K, ways=1 linesize=0
Sep 10 2008
Vendor string: GenuineIntel
Processor string: Intel(R) Celeron(R) CPU 550 2.00GHz
Signature: Family=6 Model=22 Stepping=1
Features: MMX FXSR SSE SSE2 SSE3 SSSE3 AMD64
Multithreading: 1 threads / 1 cores
Family=6 Model=6 Stepping=1
Data caches:
Level 1 size=32K, ways=8 linesize=64
Level 2 size=1024K, ways=4 linesize=64
Level 3 size=4194303K, ways=1 linesize=64
Sep 10 2008
That's already shown up a heap of bugs. Aargh. Here's an improved version. But I still don't understand why it's reporting no L3 cache on the AMD64 machines.
Sep 10 2008
This is a Thunderbird 1.4. According to CPU-Z, the L1 cache is really 64+64 kB, with 64-byte line size. The rest looks correct.

Vendor string: AuthenticAMD
Processor string: AMD Athlon(tm) processor
Signature: Family=6 Model=4 Stepping=4
Features: MMX FXSR 3DNow! 3DNow!+ MMX+
Multithreading: 1 threads / 1 cores
Family=6 Model=4 Stepping=4
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=256K, ways=16 linesize=64
Level 3 size=4194303K, ways=1 linesize=64
Sep 10 2008
On Wed, 10 Sep 2008 11:41:40 +0200, Don <nospam nospam.com.au> wrote:
> [...] Please compile and run the code, and report the results. [...]

Should the Pentium4 HyperThreading Technology count as two cores?

$ ./cache
Vendor string: GenuineIntel
Processor string: Intel(R) Pentium(R) 4 CPU 3.00GHz
Signature: Family=15 Model=2 Stepping=9
Features: MMX FXSR SSE SSE2 HTT
Multithreading: 1 threads / 1 cores
Family=F Model=2 Stepping=9
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=512K, ways=8 linesize=64
Level 3 size=4194303K, ways=1 linesize=64

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 9
cpu MHz : 2992.567
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 5990.83
clflush size : 64
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 9
cpu MHz : 2992.567
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 5986.89
clflush size : 64
power management:
Sep 10 2008
Don:
> To implement efficient memory-intensive operations (memcpy, array operations, matrix multiplication, etc), you really need to know the sizes of the data caches.

Your code is surely useful, but take a look at "cache oblivious data structures":
http://en.wikipedia.org/wiki/Cache-oblivious_algorithm

I presume your code doesn't work at compile time, so I presume you have to run your code once, save the results on disk, and then re-start the compilation to use those values to compute the tuning compile-time constants of the data structures :-)

On this Intel the data looks almost correct (I have used the second version of your code), but there isn't level 3 cache, and the RAM size is 2 GB.

Vendor string: GenuineIntel
Processor string: Intel(R) Pentium(R) Dual CPU E2180 2.00GHz
Signature: Family=6 Model=15 Stepping=13
Features: MMX FXSR SSE SSE2 SSE3 SSSE3 AMD64 HTT
Multithreading: 2 threads / 2 cores
Family=6 Model=F Stepping=D
Data caches:
Level 1 size=16K, ways=8 linesize=64
Level 2 size=512K, ways=4 linesize=64
Level 3 size=4194303K, ways=1 linesize=64

Bye,
bearophile
Sep 10 2008
bearophile:
> Features: MMX FXSR SSE SSE2 SSE3 SSSE3 AMD64 HTT

I don't know if my CPU has HyperThreading, I presume not.

Bye,
bearophile
Sep 10 2008
bearophile wrote:
> Don:
>> To implement efficient memory-intensive operations (memcpy, array operations, matrix multiplication, etc), you really need to know the sizes of the data caches.
> Your code is surely useful, but take a look at "cache oblivious data structures":
> http://en.wikipedia.org/wiki/Cache-oblivious_algorithm

Yes, it's a good approach, but it doesn't help for something like memcpy.

> I presume your code doesn't work at compile time,

Correct.

> so I presume you have to run your code once, save the results on disk, and then re-start the compilation to use those values to compute the tuning compile-time constants of the data structures :-)

No. That wouldn't be much use. The cache size is simply used as a parameter at run-time. It's only the linesize which has a major impact on optimal code -- but it's 32 or 64 bytes on every system which I know of. So it's possible to deal with it at compile time, too.
You can, in fact, just plug the L1 cache size into your cache-oblivious algorithm as the cut-off level, significantly improving performance.

> On this Intel the data looks almost correct (I have used the second version of your code), but there isn't level 3 cache, and the RAM size is 2 GB.

The value shown for L3 should be greater than the memory size, if there is no L2 cache. (you never fall out of the L3 cache). So it's correct.
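The cut-off idea can be sketched with a recursive (cache-oblivious style) matrix transpose that stops subdividing once a tile is small enough for L1. This is only an illustration of the control flow, not code from the thread: the function names are hypothetical, the choice of two tiles (source and destination) sharing L1 is an assumption, and Python lists do not have the memory layout of a real array of doubles.

```python
def cutoff_elements(l1_size_kb, elem_size=8, arrays=2):
    """Largest tile, in elements, such that `arrays` tiles fit in L1 together."""
    return (l1_size_kb * 1024) // (elem_size * arrays)


def transpose_into(a, b, r0, r1, c0, c1, cutoff):
    """Store the transpose of rows r0..r1, columns c0..c1 of `a` in `b`,
    halving the larger dimension until a tile has at most `cutoff` elements."""
    rows, cols = r1 - r0, c1 - c0
    if rows * cols <= cutoff or (rows <= 1 and cols <= 1):
        # Tile is small enough: plain double loop.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        transpose_into(a, b, r0, mid, c0, c1, cutoff)
        transpose_into(a, b, mid, r1, c0, c1, cutoff)
    else:
        mid = c0 + cols // 2
        transpose_into(a, b, r0, r1, c0, mid, cutoff)
        transpose_into(a, b, r0, r1, mid, c1, cutoff)


a = [[10 * i + j for j in range(5)] for i in range(7)]  # 7 x 5 input
b = [[None] * 7 for _ in range(5)]                       # 5 x 7 output
transpose_into(a, b, 0, 7, 0, 5, cutoff=4)               # tiny cutoff to force recursion
print(cutoff_elements(32))                               # 2048 for a 32K L1
```

In real use the cutoff would come from the detected L1 size at run-time, which matches the point above that the cache size is just a run-time parameter.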
Sep 10 2008
Don:
> The value shown for L3 should be greater than the memory size, if there is no L2 cache. (you never fall out of the L3 cache). So it's correct.

I don't understand. And I have L2 cache.

The results I expect from your code running on my PC are:
L1: 32 + 32 KB
L2: 1024 KB
L3: 0 MB
RAM: 2 GB

Or if you want an output more usable by an algorithm, it can output a dynamic array of longs:
Memory levels ==> [65536, 1048576, 2147483648]

Bye,
bearophile
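One way an algorithm might consume such an array is to look up the smallest level that can hold a given working set. A minimal sketch, assuming the three byte counts from the post above and hypothetical level names:

```python
from bisect import bisect_left

# Byte sizes from the post: L1 (32 + 32 KB), L2 (1024 KB), RAM (2 GB).
MEMORY_LEVELS = [65536, 1048576, 2147483648]
LEVEL_NAMES = ["L1", "L2", "RAM"]  # hypothetical labels for illustration


def smallest_level_holding(n_bytes):
    """Return the name of the first level whose size is >= n_bytes, or None."""
    i = bisect_left(MEMORY_LEVELS, n_bytes)
    return LEVEL_NAMES[i] if i < len(LEVEL_NAMES) else None


print(smallest_level_holding(40 * 1024))   # L1
print(smallest_level_holding(300 * 1024))  # L2
```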
Sep 10 2008
bearophile wrote:
> Don:
>> The value shown for L3 should be greater than the memory size, if there is no L2 cache. (you never fall out of the L3 cache). So it's correct.
> I don't understand. And I have L2 cache.

Oops, that should have been "if there is no L3 cache".

> The results I expect from your code running on my PC are:
> L1: 32 + 32 KB
> L2: 1024 KB
> L3: 0 MB
> RAM: 2 GB

(1) Unfortunately, I don't think it's possible to determine the amount of RAM without help from the OS. So I give the last value uint.max bytes. Perhaps that is too confusing.
(2) The cache values are per core.

> Or if you want an output more usable by an algorithm, it can output a dynamic array of longs:
> Memory levels ==> [65536, 1048576, 2147483648]

Perhaps. I haven't decided on a final interface. Note, though, that the relevant size of the cache depends on what you are doing. For example, if you are operating on 3 arrays, you need to divide the cache size by 3. But if the cache level has an associativity less than 3, you have the risk of cache thrashing. So generally you need to make your own table of cache sizes anyway.
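Two bits of arithmetic from this reply can be checked quickly. The first is an inference, not something stated outright in the thread: the 4194303K shown for a missing level is what uint.max bytes (D's uint is 32 bits) looks like when printed in KB. The second is the "divide by the number of arrays, watch the associativity" rule; the helper name is hypothetical.

```python
UINT_MAX = 2**32 - 1     # value of D's uint.max
print(UINT_MAX // 1024)  # 4194303, the "size" reported for an absent level


def per_array_budget(size_kb, ways, n_arrays):
    """Bytes of one cache level available to each of n_arrays streams, plus a
    flag that is True when associativity < n_arrays (the thrashing risk)."""
    budget = (size_kb * 1024) // n_arrays
    thrash_risk = ways < n_arrays
    return budget, thrash_risk


# Using L1 figures reported in this thread: 32K 8-way versus 8K 2-way.
print(per_array_budget(32, 8, 3))  # (10922, False)
print(per_array_budget(8, 2, 3))   # (2730, True)
```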
Sep 10 2008
Don wrote:
> I don't think it's possible to determine the amount of RAM without help from the OS.

Because without help of the OS, one will only get the available virtual memory, which is provided by the OS?

-manfred
--
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)
Sep 10 2008
Vendor string: GenuineIntel
Processor string: Intel(R) Core(TM)2 CPU T5500 1.66GHz
Signature: Family=6 Model=15 Stepping=2
Features: MMX FXSR SSE SSE2 SSE3 SSSE3 AMD64 HTT
Multithreading: 2 threads / 2 cores
Family=6 Model=F Stepping=2
Data caches:
Level 1 size=16K, ways=8 linesize=64
Level 2 size=1024K, ways=8 linesize=64
Level 3 size=4194303K, ways=1 linesize=64
Sep 10 2008
== Quote from Don (nospam nospam.com.au)'s article
> [...] Please compile and run the code, and report the results. [...]

Seems to get things wrong for quad core Intels. No, I'm not rich enough to own one myself, this is my work computer.

Vendor string: GenuineIntel
Processor string: Intel(R) Core(TM)2 Quad CPU 2.66GHz
Signature: Family=6 Model=15 Stepping=7
Features: MMX FXSR SSE SSE2 SSE3 SSSE3 AMD64 HTT
Multithreading: 4 threads / 4 cores
Family=6 Model=F Stepping=7
Data caches:
Level 1 size=8K, ways=8 linesize=64
Level 2 size=1024K, ways=16 linesize=64
Level 3 size=4194303K, ways=1 linesize=64

According to CPU-Z, the L2 cache is 4096 KB * 2. I'm pretty sure that the quad-core Intels are basically two dual-cores stuck together, meaning that they have 2 L2 caches, each shared between two of the cores. Also, the L1 cache info is wrong if CPU-Z is right. I actually have 32K of L1 cache per core.
For your convenience, I've attached my CPU-Z HTML output (cpuz.htm).
Sep 10 2008
I get:

$ ./cache
unknown CPU
Family=0 Model=0 Stepping=0
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=4194303K, ways=1 linesize=32
Level 3 size=4194303K, ways=1 linesize=32

$ gdc --version
gdc (GCC) 4.1.2 20070214 ( gdc 0.24, using dmd 1.030)

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 75
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping : 2
cpu MHz : 1800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 3618.77
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 75
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
stepping : 2
cpu MHz : 1800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 3618.77
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

--
Jerome M. BERGER
mailto:jeberger free.fr | ICQ: 238062172
http://jeberger.free.fr/ | Jabber: jeberger jabber.fr
Sep 10 2008
Hello Don,

> Please compile and run the code, and report the results. [...]

Vendor string: AuthenticAMD
Processor string: AMD Athlon(tm) 64 X2 Dual-Core Processor TK-55
Signature: Family=15 Model=104 Stepping=1
Features: MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading: 2 threads / 2 cores
Family=F Model=8 Stepping=1
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=256K, ways=16 linesize=0
Level 3 size=4194303K, ways=1 linesize=0

The same data is reported by CPU-Z.
Sep 10 2008
Don, Very good work on all this stuff!! I didn't realize that this was possible to do. Once you get the cache sizes it would be beneficial to know what regions of memory are currently loaded into cache and RAM. Do you know if this can be done? If such information was available, it would be possible to write a "memory optimizer" that would work on special data structures that could be moved around on the heap. Objects that are accessed frequently could be moved around to improve locality of reference. -Craig
Sep 10 2008
Craig Black wrote:
> Don,
> Very good work on all this stuff!! I didn't realize that this was possible to do. Once you get the cache sizes it would be beneficial to know what regions of memory are currently loaded into cache and RAM. Do you know if this can be done?

I don't think that's possible. One thing you can do, though, is use the performance counters to measure how many cache misses you're getting. (There are performance counters for the caches, etc.) It requires a small kernel mode driver, though, so it can't be used for client code. But it's what I use for development -- you can learn a lot with it.

> If such information was available, it would be possible to write a "memory optimizer" that would work on special data structures that could be moved around on the heap. Objects that are accessed frequently could be moved around to improve locality of reference.

Nice idea. Still, the most important things can be done at compile time (especially, making sure that arrays of structs are sensibly arranged).
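One reading of "sensibly arranged" (an assumption, the post does not spell it out) is ordering struct fields so that alignment padding does not inflate each array element. The sketch below uses Python's ctypes, which follows the platform's C layout rules, to compare two hypothetical layouts of the same three fields; the exact sizes depend on the ABI, so none are hard-coded.

```python
import ctypes


class Scattered(ctypes.Structure):
    # small, large, small: padding is typically inserted around `value`
    _fields_ = [("flag", ctypes.c_uint8),
                ("value", ctypes.c_double),
                ("tag", ctypes.c_uint8)]


class Grouped(ctypes.Structure):
    # largest field first, small fields together: less padding per element
    _fields_ = [("value", ctypes.c_double),
                ("flag", ctypes.c_uint8),
                ("tag", ctypes.c_uint8)]


LINE_SIZE = 64  # bytes; one of the two line sizes mentioned in this thread

for layout in (Scattered, Grouped):
    size = ctypes.sizeof(layout)
    print(layout.__name__, size, "bytes per element,",
          LINE_SIZE // size, "whole elements per cache line")
```

With fewer bytes per element, more elements of an array of structs fit in each cache line, which is the kind of decision that can be fixed at compile time.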
Sep 11 2008
Don wrote:
> [...] Please compile and run the code, and report the results. [...]

Vendor string: AuthenticAMD
Processor string: AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
Signature: Family=15 Model=107 Stepping=2
Features: MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading: 2 threads / 2 cores
Family=F Model=6B Stepping=2
Data caches:
Level 1 size=8K, ways=2 linesize=32
Level 2 size=512K, ways=16 linesize=64
Level 3 size=4194303K, ways=1 linesize=64

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sep 23 2008