digitalmars.D.learn - float[] → Vertex[] – decreases performa
- David (24/24) Jul 24 2012 I am writing a game engine, well I was using a float[] array to store my...
- bearophile (5/19) Jul 24 2012 Aligning floats to 1 byte doesn't seem a good idea. Try to remove
- David (2/9) Jul 24 2012 This makes no difference.
- Simon (11/22) Jul 24 2012 Could be that your structs are getting default initialised so you will
- David (2/9) Jul 24 2012 No. The vertices are just created once (with a call to the default ctor)...
- H. S. Teoh (5/17) Jul 24 2012 Hmm. Could this be a GC-related issue?
- David (3/4) Jul 24 2012 Actually this could be. They are stored inside a Vertex* array which is
- David (4/8) Jul 24 2012 import core.memory;
- H. S. Teoh (6/17) Jul 24 2012 This is strange. You said that you profiled the program and the extra
- David (5/7) Jul 24 2012 This is a damn good question. I tried to debug it manually with
- Jonathan M Davis (4/13) Jul 24 2012 dmd comes with a profile built into it. Just compile -profile, and you'l...
- Simen Kjaeraas (4/13) Jul 24 2012 As long as you're using malloc, the GC should leave it alone.
- H. S. Teoh (9/29) Jul 24 2012 [...]
- David (2/7) Jul 24 2012 Removing the `align(1)` changes nothing, not 1ms slower or faster,
- Era Scarecrow (16/29) Jul 24 2012 [quote]
- David (8/23) Jul 25 2012 Also not the problem, I returned the whole array at once and it didn't
- David (1/2) Jul 25 2012 It's the same issue with ldc
- Andrea Fontana (2/6) Jul 25 2012
- David (3/9) Jul 25 2012 They didn't change (of course I changed the args which are different for...
- Andrea Fontana (7/19) Jul 25 2012 I had a performance problem with std.xml some month ago. It takes me a
- David (6/6) Jul 25 2012 Ok here we go:
- Dmitry Olshansky (4/10) Jul 25 2012 Would be cool to have before/after graph.
- David (5/18) Jul 25 2012 I don't know how to make comparisons with perf.data but here is the
- Dmitry Olshansky (13/33) Jul 25 2012 It looks like a syscall/opengl issue. You somehow managed to hit a dark
- David (2/12) Jul 25 2012 I don't care about speed much, but 1000% less performance is just too ba...
- Dmitry Olshansky (6/19) Jul 25 2012 Been there once. In any case I'd try to split coordinates into 2 or 3
- David (5/20) Jul 25 2012 Well the interesting question is, why is it slower? I checked it twice,
- bearophile (5/9) Jul 25 2012 It's not easy to answer similar general questions. Why don't you
- David (1/3) Jul 25 2012 My assembly is pretty rusty and actually, I have no idea what to look fo...
- Ali Çehreli (3/4) Jul 25 2012 Random guess: CPU cache misses?
- David (2/6) Jul 25 2012 You're the 2nd one mentioning this, any ideas how to check this?
- Ali Çehreli (5/15) Jul 25 2012 I have no experience. Pages like this look promising:
- David (4/19) Jul 25 2012 From what I've seen everything is ok (I used `perf top -e
- David (6/6) Jul 26 2012 Ok, interesting thing.
- Dmitry Olshansky (5/11) Jul 26 2012 Hm. Do you ever do pointer arithmetic on Vertex*? Is the size and
- David (4/6) Jul 26 2012 No, yes. I really have no idea why this happens, I saved the contents of...
- dennis luehring (6/12) Jul 26 2012 can you create a version of you code thats allows switching
- Benjamin Thaut (10/34) Aug 24 2012 Check the disassembly view of this line:
- David (3/12) Aug 24 2012 That's not the problem. The problem has nothing to do with the
- Sean Kelly (6/8) Aug 27 2012 tessellation, since the *rendering* is also 1000% slower (when all data ...
- David (3/7) Aug 28 2012 The arrays are 100% identical (I dumped a Vertex()-array and a raw
- bearophile (7/9) Aug 28 2012 I hope some people are realizing how much time is being wasted in
- David (6/14) Aug 28 2012 You're right, but I also said, that I don't care anylonger, I found a
- bearophile (16/19) Aug 28 2012 I understand you don't care much anymore for the discussed
- David (14/25) Aug 28 2012 And that's the problem, I tried to track down a few of the bugs I hit.
- Timon Gehr (3/20) Aug 28 2012 Use this to create a minimal test case with minimal user interaction:
- David (1/3) Aug 28 2012 Doesn't help if dmd doesn't crash, or?
- Timon Gehr (3/6) Aug 28 2012 It doesn't help a lot if compilation succeeds, but you stated that you
- Brad Roberts (15/24) Aug 28 2012 It's more generally useful than that. It can reduce for any set of
I am writing a game engine. I was using a float[] array to store my vertices, and this worked well, but I have to send more and more uv coordinates (and other information) which needn't be stored as `float`s, so I moved from a float array to a Vertex array: https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30

    align(1) struct Vertex {
        float x; float y; float z;
        float nx; float ny; float nz;
        float u_terrain; float v_terrain;
        float u_biome; float v_biome;
    }

Everything is still a float, so it's easier. Nothing wrong with that, or? Well, this change decreases my performance by 1000%. My frame time goes from ~12ms per frame to ~120ms per frame. I tried to find the bottleneck with `perf`, but got no results (the time is not spent in the game/engine). The commit: https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32

I hope you can see something wrong. I have no idea!
Jul 24 2012
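Editor's sketch: the post assumes that a struct of ten floats has the same memory layout as ten consecutive floats. A minimal way to check that assumption at compile time is shown below. `PackedVertex`, `PlainVertex` and `asFloats` are hypothetical names for illustration; they are not taken from the BraLa source.

```d
// Hypothetical copies of the struct from the post, with and without align(1).
align(1) struct PackedVertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

struct PlainVertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Ten 4-byte floats leave no room for padding in either variant.
static assert(PackedVertex.sizeof == 10 * float.sizeof);
static assert(PlainVertex.sizeof == 10 * float.sizeof);

// The last field sits where the tenth float of a flat array would be.
static assert(PlainVertex.v_biome.offsetof == 9 * float.sizeof);

/// Reinterprets a vertex slice as the flat float slice it replaces.
/// The cast only changes the element type; no data is copied.
float[] asFloats(PlainVertex[] vertices)
{
    return cast(float[]) vertices;
}
```

If these assertions hold on the target compiler, the two representations should be interchangeable byte for byte, which matches what David reports later in the thread.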
David:
> align(1) struct Vertex { float x; float y; float z; float nx; float ny; float nz; float u_terrain; float v_terrain; float u_biome; float v_biome; }
> Everything is still a float, so it's easier. Nothing wrong with that or? Well this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1).

Bye,
bearophile
Jul 24 2012
On 24.07.2012 20:57, bearophile wrote:
> David:
>> Everything is still a float, so it's easier. Nothing wrong with that or? Well this change decreases my performance by 1000%.
> Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1). Bye, bearophile

This makes no difference.
Jul 24 2012
On 24/07/2012 20:08, David wrote:
> On 24.07.2012 20:57, bearophile wrote:
>> David:
>>> Everything is still a float, so it's easier. Nothing wrong with that or? Well this change decreases my performance by 1000%.
>> Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1). Bye, bearophile
> This makes no difference.

Could be that your structs are getting default initialised, so you will be getting a constructor called for every instance of a Vertex. This will be a lot slower than a float array. Try void initialising your Vertex arrays. http://dlang.org/declaration.html See the bit "Void Initializations" near the bottom. Also make sure that you are passing fixed size arrays by reference.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
Jul 24 2012
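Editor's sketch of the suggestion above, assuming a shortened, hypothetical three-field `Vertex`. In D, float fields are default-initialised to NaN, and `= void` skips that write. The helper names are made up for illustration.

```d
// Hypothetical, shortened vertex type for illustration.
struct Vertex
{
    float x, y, z;
}

/// Default initialisation: every float field starts out as NaN.
Vertex defaultVertex()
{
    Vertex v;
    return v;
}

/// Void initialisation skips the default write. The contents are undefined
/// until assigned, so every element is filled before the array is returned.
Vertex[4] filledQuad(float height)
{
    Vertex[4] quad = void;
    foreach (i, ref v; quad)
        v = Vertex(cast(float) i, height, 0.0f);
    return quad;
}
```

Void initialisation only pays off when every element is overwritten anyway, as in `filledQuad`; reading an element before assigning it would be a bug.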
> Could be that your structs are getting default initialised so you will be getting a constructor called for every instance of a Vertex. This will be a lot slower than a float array. Try void initialising your Vertex arrays. http://dlang.org/declaration.html See the bit Void Initializations near the bottom. Also make sure that you are passing fixed size arrays by reference.

No. The vertices are just created once (with a call to the default ctor) and immediately added to the Vertex* but they are never instantiated.
Jul 24 2012
On Tue, Jul 24, 2012 at 09:08:10PM +0200, David wrote:
> On 24.07.2012 20:57, bearophile wrote:
>> David:
>>> Everything is still a float, so it's easier. Nothing wrong with that or? Well this change decreases my performance by 1000%.
>> Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1). Bye, bearophile
> This makes no difference.

Hmm. Could this be a GC-related issue?

T

--
No! I'm not in denial!
Jul 24 2012
> Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`; maybe the GC scans all of the created vertex structs? Could this be?
Jul 24 2012
On 24.07.2012 21:46, David wrote:
>> Hmm. Could this be a GC-related issue?
> Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?

    import core.memory;
    GC.disable();

directly when entering main didn't help, so I guess it's not the GC.
Jul 24 2012
On Tue, Jul 24, 2012 at 10:53:05PM +0200, David wrote:
> On 24.07.2012 21:46, David wrote:
>>> Hmm. Could this be a GC-related issue?
>> Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?
> import core.memory; GC.disable(); directly when entering main didn't help, so I guess it's not the GC

This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?

T

--
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
Jul 24 2012
> This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?

This is a damn good question. I tried to debug it manually with writefln's; it showed that glfwSwapBuffers needed the time (which, I looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing, the time was used in some unresolved calls. I will make new tests with perf tomorrow.
Jul 24 2012
On Wednesday, July 25, 2012 00:12:19 David wrote:
>> This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?
> This is a damn good question. I tried to debug it manually with writefln's, it showed that glfwSwapBuffers needed the time (which, I looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing, the time was used in some unresolved calls. I will make new tests with perf tomorrow.

dmd comes with a profiler built into it. Just compile with -profile, and you'll get profile information when you run your program.

- Jonathan M Davis
Jul 24 2012
On Tue, 24 Jul 2012 22:53:05 +0200, David <d dav1d.de> wrote:
> On 24.07.2012 21:46, David wrote:
>>> Hmm. Could this be a GC-related issue?
>> Actually this could be. They are stored inside a Vertex* array which is allocated with `malloc`, maybe the GC scans all of the created vertex structs? Could this be?
> import core.memory; GC.disable(); directly when entering main didn't help, so I guess it's not the GC

As long as you're using malloc, the GC should leave it alone.

--
Simen
Jul 24 2012
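Editor's sketch of the point above: memory obtained from C `malloc` lives outside the GC heap, so the collector does not scan it unless it is registered explicitly (for example with `GC.addRange`). The check uses `GC.addrOf` from `core.memory`, which returns null for pointers the GC does not own. `Vertex` is shortened and the helper names are hypothetical.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Hypothetical, shortened vertex type for illustration.
struct Vertex
{
    float x, y, z;
}

/// Allocates room for `count` vertices outside the GC heap.
/// The returned memory is uninitialised.
Vertex[] allocVertices(size_t count)
{
    auto p = cast(Vertex*) malloc(count * Vertex.sizeof);
    assert(p !is null, "malloc failed");
    return p[0 .. count];
}

/// Releases a slice obtained from allocVertices.
void freeVertices(Vertex[] vertices)
{
    free(vertices.ptr);
}

/// True if the pointer refers to a block owned by the GC.
bool isGcManaged(void* p)
{
    return GC.addrOf(p) !is null;
}
```

This is consistent with David's observation that `GC.disable()` made no difference: a malloc'ed vertex buffer was never the collector's concern in the first place.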
On Tue, Jul 24, 2012 at 08:57:08PM +0200, bearophile wrote:
> David:
>> align(1) struct Vertex { float x; float y; float z; float nx; float ny; float nz; float u_terrain; float v_terrain; float u_biome; float v_biome; }
>> Everything is still a float, so it's easier. Nothing wrong with that or? Well this change decreases my performance by 1000%.
> Aligning floats to 1 byte doesn't seem a good idea. Try to remove the align(1).
[...]

I agree. I don't know how the CPU handles misaligned floats, but from what I understand, it will do two loads to fetch the two word-aligned parts of the float, and then assemble it together. This may be what's causing the slowdown.

T

--
Little kids, little troubles.
Jul 24 2012
> I agree. I don't know how the CPU handles misaligned floats, but from what I understand, it will do two loads to fetch the two word-aligned parts of the float, and then assemble it together. This may be what's causing the slowdown. T

Removing the `align(1)` changes nothing, not 1ms slower or faster, unfortunately.
Jul 24 2012
On Tuesday, 24 July 2012 at 19:42:34 UTC, David wrote:
>> I agree. I don't know how the CPU handles misaligned floats, but from what I understand, it will do two loads to fetch the two word-aligned parts of the float, and then assemble it together. This may be what's causing the slowdown. T
> Removing the `align(1)` changes nothing, not 1ms slower or faster, unfortunately.

[quote]
[code]
Vertex[] data;
foreach(i; 0..6) {
    data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
[/code]
[/quote]

Try using reserve? The new structure size looks like it's about 40 bytes, and aside from resizing I'm not sure why it would have issues.

[code]
Vertex[] data;
data.reserve(6);
//following foreach...
[/code]
Jul 24 2012
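Editor's sketch of the `reserve` suggestion as a complete function. The quoted snippet is truncated in the thread, so this uses a hypothetical three-field `Vertex` and a hypothetical `buildFace` helper rather than guessing at the remaining constructor arguments.

```d
// Hypothetical, shortened vertex type for illustration.
struct Vertex
{
    float x, y, z;
}

/// Builds one vertex per position. Reserving the capacity up front lets the
/// appends reuse a single allocation instead of growing the array repeatedly.
Vertex[] buildFace(const float[3][] positions)
{
    Vertex[] data;
    data.reserve(positions.length);
    foreach (p; positions)
        data ~= Vertex(p[0], p[1], p[2]);
    return data;
}
```

For six elements the saving is small, which fits David's reply that this was not the bottleneck.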
On 25.07.2012 01:10, Era Scarecrow wrote:
>> Removing the `align(1)` changes nothing, not 1ms slower or faster, unfortunately.
> [quote] [code] Vertex[] data; foreach(i; 0..6) { data ~= Vertex(positions[i][0], positions[i][1], positions[i][2], [/code] [/quote]
> Try using reserve? The new structure size looks like it's about 40 bytes, and aside from resizing I'm not sure why it would have issues.
> [code] Vertex[] data; data.reserve(6); //following foreach... [/code]

Also not the problem: I returned the whole array at once and it didn't help. But thanks for your idea.

The strange thing is, these tessellation functions are just run once and then the data is passed to the GPU. So my comment shouldn't have a direct impact on the speed (e.g. a GC issue would explain it, but unfortunately it isn't the GC).

I'll try a different compiler, too.
Jul 25 2012
> I'll try a different compiler, too.

It's the same issue with ldc
Jul 25 2012
Have you checked your default compiler/linker args?

On Wed, 25/07/2012 at 15.23 +0200, David wrote:
>> I'll try a different compiler, too.
> It's the same issue with ldc
Jul 25 2012
On 25.07.2012 15:44, Andrea Fontana wrote:
> Have you checked your default compiler/linker args?
>
> On Wed, 25/07/2012 at 15.23 +0200, David wrote:
>>> I'll try a different compiler, too.
>> It's the same issue with ldc

They didn't change (of course I changed the args which are different for ldc), what exactly do you mean?
Jul 25 2012
I had a performance problem with std.xml some months ago. It took me a long time to figure out that there was a default linker param (in gdc & dmd under linux) that slowed down the whole thing.

So maybe it's not a code-related issue, I mean :)

On Wed, 25/07/2012 at 15.53 +0200, David wrote:
> On 25.07.2012 15:44, Andrea Fontana wrote:
>> Have you checked your default compiler/linker args?
>>
>> On Wed, 25/07/2012 at 15.23 +0200, David wrote:
>>>> I'll try a different compiler, too.
>>> It's the same issue with ldc
> They didn't change (of course I changed the args which are different for ldc), what do you exactly mean?
Jul 25 2012
Ok here we go:

perf.data: http://dav1d.de/perf.data
and a fancy image (showing the results of perf): http://dav1d.de/output.png

I hope someone knows where the time is spent. Most time spent:

+ 53,14% bralad [unknown] [k] 0xc01e5d2b
Jul 25 2012
On 25-Jul-12 17:54, David wrote:
> Ok here we go: perf.data: http://dav1d.de/perf.data and a fancy image (showing the results of perf): http://dav1d.de/output.png I hope anyone knows where the time is spent. Most time spent: + 53,14% bralad [unknown] [k] 0xc01e5d2b

Would be cool to have before/after graph.

--
Dmitry Olshansky
Jul 25 2012
On 25.07.2012 16:23, Dmitry Olshansky wrote:
> On 25-Jul-12 17:54, David wrote:
>> Ok here we go: perf.data: http://dav1d.de/perf.data and a fancy image (showing the results of perf): http://dav1d.de/output.png I hope anyone knows where the time is spent. Most time spent: + 53,14% bralad [unknown] [k] 0xc01e5d2b
> Would be cool to have before/after graph.

I don't know how to make comparisons with perf.data, but here is the captured data of the "working" version: http://dav1d.de/output_before.png
perf.data: http://dav1d.de/perf_before.data
Jul 25 2012
On 25-Jul-12 19:32, David wrote:
> On 25.07.2012 16:23, Dmitry Olshansky wrote:
>> On 25-Jul-12 17:54, David wrote:
>>> Ok here we go: perf.data: http://dav1d.de/perf.data and a fancy image (showing the results of perf): http://dav1d.de/output.png I hope anyone knows where the time is spent. Most time spent: + 53,14% bralad [unknown] [k] 0xc01e5d2b
>> Would be cool to have before/after graph.
> I don't know how to make comparisons with perf.data but here is the captured data of the "working" version: http://dav1d.de/output_before.png perf.data: http://dav1d.de/perf_before.data

It looks like a syscall/opengl issue. You somehow managed to hit a dark corner of the GL driver. It's either a (partial) fallback to software or some extra translation layer. I once had a cool table that showed which GL calls are direct to hardware and which are not for various nvidia cards.

Now the trick is to get an idea why. The best idea to debug driver related stuff is to test on some other computer (like different version of OS, video card etc.).

Can't quite decipher the output, but I find it strange that it mentions _d_invariant. You'd better compile with -release if you care for speed.

--
Dmitry Olshansky
Jul 25 2012
> It looks like a syscall/opengl issue. You somehow managed to hit a dark corner of GL driver. It's either a fallback to software (partial) or some extra translation layer. I once had a cool table that showed which GL calls are direct to hardware and which are not for various nvidia cards. Now the trick is to get an idea why. The best idea to debug driver related stuff is to test on some other computer (like different version of OS, video card etc.).

Worst case scenario ... driver issue.

> Can't quite decipher output but I find it strange that it mentions _d_invariant. You'd better compile with -release if you care for speed.

I don't care about speed much, but 1000% less performance is just too bad.
Jul 25 2012
On 26-Jul-12 00:52, David wrote:
>> It looks like a syscall/opengl issue. You somehow managed to hit a dark corner of GL driver. It's either a fallback to software (partial) or some extra translation layer. I once had a cool table that showed which GL calls are direct to hardware and which are not for various nvidia cards. Now the trick is to get an idea why. The best idea to debug driver related stuff is to test on some other computer (like different version of OS, video card etc.).
> Worst case scenario ... driver issue.

Been there once. In any case I'd try to split coordinates into 2 or 3 interleaved arrays (like vertex+norm and separately 2 UV). It's usually slower but not 10x ;)

>> Can't quite decipher output but I find it strange that it mentions _d_invariant. You'd better compile with -release if you care for speed.
> I don't care about speed much, but 1000% less performance is just too bad.

--
Dmitry Olshansky
Jul 25 2012
On 25.07.2012 23:03, Dmitry Olshansky wrote:
> On 26-Jul-12 00:52, David wrote:
>>> It looks like a syscall/opengl issue. You somehow managed to hit a dark corner of GL driver. It's either a fallback to software (partial) or some extra translation layer. I once had a cool table that showed which GL calls are direct to hardware and which are not for various nvidia cards. Now the trick is to get an idea why. The best idea to debug driver related stuff is to test on some other computer (like different version of OS, video card etc.).
>> Worst case scenario ... driver issue.
> Been there once. In any case I'd try to split coordinates into 2 or 3 interleaved arrays. (like vertex+norm and separately 2 UV). It's usually slower but not 10x ;)

Well, the interesting question is, why is it slower? I checked it twice: the data passed to the GPU is 100% the same, no difference. The only difference is the stored format on the CPU (and that's just a matter of casting).
Jul 25 2012
David:
> Well the interesting question is, why is it slower? I checked it twice, the data passed to the GPU is 100% the same, no difference, the only difference is the stored format on the CPU (and that's just a matter of casting).

It's not easy to answer similar general questions. Why don't you list the assembly of the two versions and compare?

Bye,
bearophile
Jul 25 2012
> It's not easy to answer similar general questions. Why don't you list the assembly of the two versions and compare?

My assembly is pretty rusty and actually, I have no idea what to look for.
Jul 25 2012
On 07/24/2012 11:38 AM, David wrote:
> Well this change decreases my performance by 1000%.

Random guess: CPU cache misses?

Ali
Jul 25 2012
On 26.07.2012 00:12, Ali Çehreli wrote:
> On 07/24/2012 11:38 AM, David wrote:
>> Well this change decreases my performance by 1000%.
> Random guess: CPU cache misses?
> Ali

You're the 2nd one mentioning this, any ideas how to check this?
Jul 25 2012
On 07/25/2012 03:26 PM, David wrote:
> On 26.07.2012 00:12, Ali Çehreli wrote:
>> On 07/24/2012 11:38 AM, David wrote:
>>> Well this change decreases my performance by 1000%.
>> Random guess: CPU cache misses?
>> Ali
> You're the 2nd one mentioning this, any ideas how to check this?

I have no experience. Pages like this look promising: http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses

Ali
Jul 25 2012
On 26.07.2012 00:37, Ali Çehreli wrote:
> On 07/25/2012 03:26 PM, David wrote:
>> On 26.07.2012 00:12, Ali Çehreli wrote:
>>> On 07/24/2012 11:38 AM, David wrote:
>>>> Well this change decreases my performance by 1000%.
>>> Random guess: CPU cache misses?
>>> Ali
>> You're the 2nd one mentioning this, any ideas how to check this?
> I have no experience. Pages like this look promising: http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses
> Ali

From what I've seen everything is ok (I used `perf top -e L1-dcache-load-misses -e L1-dcache-loads` to see the hotspots, nothing too bad)
Jul 25 2012
Ok, interesting thing. I switched my buffer from Vertex* to void* and I cast every Vertex I get to void[] and add it to the buffer (slice → memcopy) and everything works fine now. I can live with that (once the basic functions are implemented it's not even a pain to use), but still, I wonder where the problem is.
Jul 26 2012
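Editor's sketch of the workaround described above: an untyped, malloc'ed buffer that receives each vertex as raw bytes through a slice copy. This is a guess at the general shape, not the BraLa implementation. `RawBuffer`, `addBytes` and `addVertex` are hypothetical names, and the storage is typed `ubyte*` rather than `void*` so that the slice copy is plainly a byte-wise copy.

```d
import core.stdc.stdlib : free, malloc;

// Same field layout as the struct in the original post.
struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// Hypothetical untyped vertex buffer backed by malloc.
struct RawBuffer
{
    ubyte* data;     // malloc'ed storage
    size_t used;     // bytes written so far
    size_t capacity; // bytes allocated

    this(size_t capacityInBytes)
    {
        data = cast(ubyte*) malloc(capacityInBytes);
        assert(data !is null, "malloc failed");
        capacity = capacityInBytes;
    }

    /// Appends raw bytes with a slice copy.
    void addBytes(const(void)[] bytes)
    {
        assert(used + bytes.length <= capacity, "buffer full");
        data[used .. used + bytes.length] = cast(const(ubyte)[]) bytes;
        used += bytes.length;
    }

    /// Views one vertex as a one-element slice and appends its bytes.
    void addVertex(ref const Vertex v)
    {
        addBytes((&v)[0 .. 1]);
    }

    /// Frees the storage and resets the buffer.
    void release()
    {
        free(data);
        data = null;
        used = capacity = 0;
    }
}
```

The bytes that end up in the buffer are the same as with a typed `Vertex*` buffer, which is why the reported speed difference remained a puzzle in the thread.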
On 26-Jul-12 14:14, David wrote:
> Ok, interesting thing. I switched my buffer from Vertex* to void* and I cast every Vertex I get to void[] and add it to the buffer (slice → memcopy) and everything works fine now. I can live with that (once the basic functions are implemented it's not even a pain to use), but still, I wonder where the problem is.

Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and offsets correct (like in Vertex vs float)?

--
Dmitry Olshansky
Jul 26 2012
> Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and offsets correct (like in Vertex vs float)?

No, yes. I really have no idea why this happens. I saved the contents of my buffers and compared them with the buffers of the `float[]` version (thanks to `git checkout`) and they were exactly 100% the same. It's a mystery.
Jul 26 2012
On 26.07.2012 21:18, David wrote:
>> Hm. Do you ever do pointer arithmetic on Vertex*? Is the size and offsets are correct (like in Vertex vs float)?
> No, yes. I really have no idea why this happens, I saved the contents of my buffers and compared them with the buffers of the `float[]` version (thanks to `git checkout`) and they were exactly 100% the same. It's a mystery.

can you create a version of your code that allows switching (version(Vertex) else ...) between array and Vertex? or provide both versions here again

you checked dmd and ldc output, so it can't be a backend thing (maybe frontend or GC) - or mysterious GL bugs
Jul 26 2012
On 24.07.2012 20:38, David wrote:
> I am writing a game engine, well I was using a float[] array to store my vertices, this worked well, but I have to send more and more uv coordinates (and other information) which needn't be stored as `float`'s so I moved from a float-Array to a Vertex Array: https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30
> align(1) struct Vertex { float x; float y; float z; float nx; float ny; float nz; float u_terrain; float v_terrain; float u_biome; float v_biome; }
> Everything is still a float, so it's easier. Nothing wrong with that or? Well this change decreases my performance by 1000%. My frame rate drops from ~12ms per frame to ~120ms per frame. I tried to find the bottleneck with `perf` but no results (the time is not spent in the game/engine). The commit: https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32
> I hope you can see anything wrong. I have no idea!

Check the disassembly view of this line:

    buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

If you are using an old version of dmd, it will allocate a block of memory which has the size of Vertex, then it will fill the data into that block of memory, and then memcpy it to your buffer array. You could try working around this by doing:

    buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

Kind Regards
Benjamin Thaut
Aug 24 2012
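Editor's sketch of the same idea without relying on `__ctor`: take a pointer to the next free slot and write the fields directly, so that no temporary `Vertex` has to be built and copied. The five-field `Vertex` and the `put` helper are hypothetical and shortened for illustration. Whether a given compiler version actually emits the extra copy would need to be confirmed in the disassembly.

```d
// Hypothetical, shortened vertex type for illustration.
struct Vertex
{
    float x, y, z;
    float u, v;
}

/// Writes the fields straight into the next free slot of `buffer`
/// and advances the element counter.
void put(Vertex[] buffer, ref size_t elements,
         float x, float y, float z, float u, float v)
{
    Vertex* slot = &buffer[elements++];
    slot.x = x;
    slot.y = y;
    slot.z = z;
    slot.u = u;
    slot.v = v;
}
```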
> Check the disassembly view of this line: buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome); If you are using an old version of dmd it will allocate a block of memory which has the size of Vertex, then it will fill the data into that block of memory, and then memcpy it to your buffer array. You could try working around this by doing: buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome); Kind Regards Benjamin Thaut

That's not the problem. The problem has nothing to do with the tessellation, since the *rendering* is also 1000% slower (when all data is already processed).
Aug 24 2012
On Aug 24, 2012, at 1:16 PM, David <d dav1d.de> wrote:
> That's not the problem. The problem has nothing to do with the tessellation, since the *rendering* is also 1000% slower (when all data is already processed).

Is the alignment different between one and the other? I wouldn't think so since it's dynamic memory, but the performance difference suggests that it might be.
Aug 27 2012
On 28.08.2012 01:53, Sean Kelly wrote:
> On Aug 24, 2012, at 1:16 PM, David <d dav1d.de> wrote:
>> That's not the problem. The problem has nothing to do with the tessellation, since the *rendering* is also 1000% slower (when all data is already processed).
> Is the alignment different between one and the other? I wouldn't think so since it's dynamic memory, but the performance difference suggests that it might be.

The arrays are 100% identical (I dumped a Vertex()-array and a raw float-array, they were 100% identical).
Aug 28 2012
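Editor's sketch of how such a comparison could be done in code instead of by dumping files. Comparing the raw bytes (rather than the floats) also treats NaN values as equal when their bit patterns match. `sameBytes` is a hypothetical helper; the struct repeats the layout from the original post.

```d
// Same field layout as the struct in the original post.
struct Vertex
{
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

/// True when the vertex array and the flat float array hold
/// bit-for-bit identical data.
bool sameBytes(const Vertex[] vertices, const float[] floats)
{
    return cast(const(ubyte)[]) vertices == cast(const(ubyte)[]) floats;
}
```

Note that this only checks contents. It says nothing about where each buffer lives in memory, so it cannot answer Sean Kelly's alignment question on its own.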
David:
> The arrays are 100% identical (I dumped a Vertex()-array and a raw float-array, they were 100% identical).

I hope some people are realizing how much time is being wasted in this thread. Taking a look at the asm is my suggestion still. If someone is rusty in asm, it's time to brush away the rust with a steel brush.

Bye,
bearophile
Aug 28 2012
On 28.08.2012 17:41, bearophile wrote:
> David:
>> The arrays are 100% identical (I dumped a Vertex()-array and a raw float-array, they were 100% identical).
> I hope some people are realizing how much time is being wasted in this thread. Taking a look at the asm is my suggestion still. If someone is rusty in asm, it's time to brush away the rust with a steel brush. Bye, bearophile

You're right, but I also said that I don't care any longer: I found a workaround, I can live with it. I generally tend to ignore dmd bugs and just work around them; I don't have the time to track down every stupid bug from a ~8k codebase. Thanks anyway for your help.
Aug 28 2012
David:
> I generally tend to ignore dmd bugs and just work around them, I don't have the time to track down every stupid bug from a ~8k codebase.

I understand you don't care much anymore for the discussed problem, and I know that localizing D/DMD bugs requires some time and work. But I'd like you to not ignore all the bugs you find, and instead minimize some of them and submit them to Bugzilla. Despite thousands of open bugs and about a hundred open patches, many bugs do get fixed at every release. If you submit bugs, D/DMD will improve, in the future you will find fewer bugs to work around in your D code, and you will help other present and future D programmers avoid hitting them. This is important because D is young and its community is small. The idea is: they give you a compiler/language for free, and you give something back to the community by submitting some bugs :-)

Bye and thank you,
bearophile
Aug 28 2012
> But I'd like you to not ignore all the bugs you find, and instead minimize some of them and submit them to Bugzilla. Despite thousands of open bugs and about a hundred of open patches, many bugs do get fixed at every release. If you submit bugs, D/DMD will improve, in your future you will find less bugs to work around in your D code, and you will help other present and future D programmers avoid hitting them. This is important because D is young and its community is small. The idea is: they give you a compiler/language for free, and you give something back to the community submitting some bugs :-)

I totally agree.

> I understand you don't care much anymore for the discussed problem, and I know that localizing D/DMD bugs requires some time and work.

And that's the problem: I tried to track down a few of the bugs I hit. 50% vanished when I changed unrelated code (cool, huh? getting a segfault in std.net.curl → std.regex → std.functional.memoize when changing your ResourceManager, which has really nothing to do with either curl, regex or std.functional, nor with the module which calls std.net.curl), then I wasn't able to reproduce a few others. In the end, I think, I was able to track down a single dmd bug. That was with a relatively small code-base (maybe 1-2k?); now I have around 8k and I just don't have the time and maybe the knowledge. At least I can fix phobos/druntime bugs.

Not sure why I wrote that, I don't want to whine. D is great/buggy and I knew it when I started that project. And I am glad there are people like you, Kenji and lots of others who keep on improving D in their free time (not to forget Walter and Andrei).
Aug 28 2012
On 08/28/2012 06:35 PM, David wrote:
> On 28.08.2012 17:41, bearophile wrote:
>> David:
>>> The arrays are 100% identical (I dumped a Vertex()-array and a raw float-array, they were 100% identical).
>> I hope some people are realizing how much time is being wasted in this thread. Taking a look at the asm is my suggestion still. If someone is rusty in asm, it's time to brush away the rust with a steel brush. Bye, bearophile
> You're right, but I also said that I don't care any longer, I found a workaround, I can live with it. I generally tend to ignore dmd bugs and just work around them, I don't have the time to track down every stupid bug from a ~8k codebase. Thanks anyway for your help.

Use this to create a minimal test case with minimal user interaction: https://github.com/CyberShadow/DustMite
Aug 28 2012
> Use this to create a minimal test case with minimal user interaction: https://github.com/CyberShadow/DustMite

Doesn't help if dmd doesn't crash, or?
Aug 28 2012
On 08/29/2012 01:26 AM, David wrote:
>> Use this to create a minimal test case with minimal user interaction: https://github.com/CyberShadow/DustMite
> Doesn't help if dmd doesn't crash, or?

It doesn't help a lot if compilation succeeds, but you stated that you generally tend to ignore dmd bugs. Most dmd bugs make compilation fail.
Aug 28 2012
On Wed, 29 Aug 2012, Timon Gehr wrote:
> On 08/29/2012 01:26 AM, David wrote:
>>> Use this to create a minimal test case with minimal user interaction: https://github.com/CyberShadow/DustMite
>> Doesn't help if dmd doesn't crash, or?
> It doesn't help a lot if compilation succeeds, but you stated that you generally tend to ignore dmd bugs. Most dmd bugs make compilation fail.

It's more generally useful than that. It can reduce for any set of commands that together produce a binary decision: pass or fail. The key problem is that it does need to be deterministic. It doesn't matter if it's dmd that fails, or an execution of the output code, or really anything that determines pass or fail.

The basic pattern is:

    while (progress can be made)
        try a reduction
        if reduction still reproduces the error
            continue
        else
            revert
    done

(it's obviously more complex and there's tons of magic inside "try a reduction")
Aug 28 2012
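Editor's sketch of the basic pattern above as a tiny, greedy line-removal reducer. This is not how DustMite is implemented; it only illustrates the loop, under the assumption that a reduction means deleting one line and that the caller supplies a deterministic pass/fail test. `reduceLines` and `stillFails` are hypothetical names.

```d
/// Repeatedly tries to delete single lines, keeping each deletion only if
/// `stillFails` reports that the problem is still reproduced.
/// Stops when a full pass over the lines makes no progress.
string[] reduceLines(string[] lines, bool delegate(const string[]) stillFails)
{
    bool progress = true;
    while (progress)
    {
        progress = false;
        for (size_t i = 0; i < lines.length;)
        {
            // Try a reduction: everything except line i.
            auto candidate = lines[0 .. i] ~ lines[i + 1 .. $];
            if (stillFails(candidate))
            {
                lines = candidate; // keep the reduction
                progress = true;
            }
            else
            {
                ++i; // revert: line i is needed, move on
            }
        }
    }
    return lines;
}
```

In practice `stillFails` would write the candidate to disk and run a compiler or the compiled program, returning true when the original symptom appears again. As the post stresses, that check has to be deterministic for the loop to converge on something meaningful.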