digitalmars.D.learn - float[] → Vertex[] – decreases performa


David <d dav1d.de> writes:

I am writing a game engine. I was using a float[] array to store my 
vertices, which worked well, but I have to send more and more UV 
coordinates (and other information) which needn't be stored as `float`s, 
so I moved from a float array to a Vertex array:
https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30


align(1) struct Vertex {
     float x;
     float y;
     float z;
     float nx;
     float ny;
     float nz;
     float u_terrain;
     float v_terrain;
     float u_biome;
     float v_biome;
}

Everything is still a float, so it's easier. Nothing wrong with that, right? 
Well, this change decreases my performance by 1000%: my frame time goes 
from ~12 ms per frame to ~120 ms per frame. I tried to find the bottleneck 
with `perf`, but got no results (the time is not spent in the game/engine).

The commit:
https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32

I hope you can see something wrong. I have no idea!
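
A minimal sketch of how the "everything is still a float" claim could be checked, assuming a struct with the same ten fields as the one posted. The `align(1)` attribute is left out on purpose, since ten 4-byte floats need no padding either way; the sample values are made up for illustration.

```d
import std.stdio : writeln;

// Same fields as the posted Vertex struct (align(1) omitted, see above).
struct Vertex {
    float x, y, z;
    float nx, ny, nz;
    float u_terrain, v_terrain;
    float u_biome, v_biome;
}

// Compile-time checks: the struct is exactly ten packed floats.
static assert(Vertex.sizeof == 10 * float.sizeof);
static assert(Vertex.v_biome.offsetof == 9 * float.sizeof);

void main() {
    Vertex[] verts = [Vertex(1, 2, 3, 0, 1, 0, 0.5f, 0.5f, 0.25f, 0.75f)];

    // Reinterpret the same memory as a flat float[] (no copy is made).
    float[] flat = cast(float[]) verts;

    assert(flat.length == 10);
    assert(flat[0] == 1.0f && flat[9] == 0.75f);
    writeln("Vertex.sizeof = ", Vertex.sizeof, ", floats = ", flat.length);
}
```

If the static asserts pass, a `Vertex[]` and the equivalent `float[]` occupy the same bytes, so any difference has to come from somewhere other than the data layout.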

Jul 24 2012

"bearophile" <bearophileHUGS lycos.com> writes:

David:

 align(1) struct Vertex {
     float x;
     float y;
     float z;
     float nx;
     float ny;
     float nz;
     float u_terrain;
     float v_terrain;
     float u_biome;
     float v_biome;
 }

 Everything is still a float, so it's easier. Nothing wrong with 
 that, right? Well, this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove 
the align(1).

Bye,
bearophile

Jul 24 2012

David <d dav1d.de> writes:

On 24.07.2012 20:57, bearophile wrote:
 David:
 Everything is still a float, so it's easier. Nothing wrong with that,
 right? Well, this change decreases my performance by 1000%.

 Aligning floats to 1 byte doesn't seem a good idea. Try to remove the
 align(1).

 Bye,
 bearophile

This makes no difference.

Jul 24 2012

Simon <s.d.hammett gmail.com> writes:

On 24/07/2012 20:08, David wrote:
 On 24.07.2012 20:57, bearophile wrote:
 David:
 Everything is still a float, so it's easier. Nothing wrong with that,
 right? Well, this change decreases my performance by 1000%.

 Aligning floats to 1 byte doesn't seem a good idea. Try to remove the
 align(1).

 Bye,
 bearophile

 This makes no difference.

Could be that your structs are getting default initialised so you will 
be getting a constructor called for every instance of a Vertex.

This will be a lot slower than a float array.
Try void initialising your Vertex arrays.

http://dlang.org/declaration.html

See the "Void Initializations" section near the bottom.

Also make sure that you are passing fixed size arrays by reference.
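
A small sketch of the two suggestions above, assuming a simplified three-float `Vertex`; the helper name `fillQuad` is hypothetical. It shows default initialization (floats start as NaN in D), void initialization of a fixed-size array, passing that array by `ref`, and `std.array.uninitializedArray` for the dynamic-array case.

```d
import std.array : uninitializedArray;
import std.math : isNaN;

// Simplified stand-in for the Vertex struct in this thread.
struct Vertex {
    float x, y, z;
}

// Fixed-size arrays are value types in D, so take them by `ref`
// to avoid copying the whole array on every call.
void fillQuad(ref Vertex[6] quad) {
    foreach (i, ref v; quad)
        v = Vertex(cast(float) i, 0, 0);
}

void main() {
    // Default initialization: every float member starts out as NaN.
    Vertex a;
    assert(isNaN(a.x));

    // Void initialization: contents are garbage until written.
    Vertex[6] quad = void;
    fillQuad(quad);
    assert(quad[5].x == 5);

    // For dynamic arrays, uninitializedArray skips the element init.
    auto many = uninitializedArray!(Vertex[])(1000);
    assert(many.length == 1000);
}
```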

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk

Jul 24 2012

David <d dav1d.de> writes:

 Could be that your structs are getting default initialised so you will
 be getting a constructor called for every instance of a Vertex.

 This will be a lot slower than a float array.
 Try void initialising your Vertex arrays.

 http://dlang.org/declaration.html

 See the bit Void Initializations near the bottom.

 Also make sure that you are passing fixed size arrays by reference.

No. The vertices are just created once (with a call to the default ctor) 
and immediately added to the Vertex* buffer, but they are never instantiated.

Jul 24 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 24, 2012 at 09:08:10PM +0200, David wrote:
 On 24.07.2012 20:57, bearophile wrote:
David:
Everything is still a float, so it's easier. Nothing wrong with that,
right? Well, this change decreases my performance by 1000%.

Aligning floats to 1 byte doesn't seem a good idea. Try to remove the
align(1).

Bye,
bearophile

 
 This makes no difference.

Hmm. Could this be a GC-related issue?


T

-- 
No! I'm not in denial!

Jul 24 2012

David <d dav1d.de> writes:

 Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is 
allocated with `malloc`; maybe the GC scans all of the created Vertex 
structs? Could this be?

Jul 24 2012

David <d dav1d.de> writes:

On 24.07.2012 21:46, David wrote:
 Hmm. Could this be a GC-related issue?

 Actually this could be. They are stored inside a Vertex* array which is
 allocated with `malloc`, maybe the GC scans all of
 the created vertex structs? Could this be?

     import core.memory;
     GC.disable();

directly when entering main didn't help, so I guess it's not the GC

Jul 24 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 24, 2012 at 10:53:05PM +0200, David wrote:
 On 24.07.2012 21:46, David wrote:
Hmm. Could this be a GC-related issue?

Actually this could be. They are stored inside a Vertex* array which is
allocated with `malloc`, maybe the GC scans all of
the created vertex structs? Could this be?

 
     import core.memory;
     GC.disable();
 
 directly when entering main didn't help, so I guess it's not the GC

This is strange. You said that you profiled the program and the extra
time spent is not in user code? Where is it spent then?


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be
algorithms.

Jul 24 2012

David <d dav1d.de> writes:

 This is strange. You said that you profiled the program and the extra
 time spent is not in user code? Where is it spent then?

This is a damn good question. I tried to debug it manually with 
writefln's, it showed that glfwSwapBuffers needed the time (which, I 
looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me 
nothing, the time was used in some unresolved calls.

I will make new tests with perf tomorrow.
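
As an alternative to sprinkling `writefln` calls, a single call can be timed with `std.datetime.stopwatch`. This is only a sketch: `swapBuffersStandIn` is a hypothetical placeholder that sleeps, standing in for the real `glfwSwapBuffers` call, and `timeIt` is an illustrative helper, not part of any library.

```d
import core.thread : Thread;
import core.time : Duration, msecs;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical stand-in for the real buffer swap; it just sleeps.
void swapBuffersStandIn() {
    Thread.sleep(5.msecs);
}

// Run `fn` once and return how long it took.
Duration timeIt(void delegate() fn) {
    auto sw = StopWatch(AutoStart.yes);
    fn();
    sw.stop();
    return sw.peek();
}

void main() {
    const elapsed = timeIt(() { swapBuffersStandIn(); });
    writefln("swap took %s ms", elapsed.total!"msecs");
}
```

Wall-clock time around a buffer swap includes whatever work the driver has queued up, so a slow swap usually points at the GPU or driver side rather than at the call itself.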

Jul 24 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, July 25, 2012 00:12:19 David wrote:
 This is strange. You said that you profiled the program and the extra
 time spent is not in user code? Where is it spent then?

 
 This is a damn good question. I tried to debug it manually with
 writefln's, it showed that glfwSwapBuffers needed the time (which, I
 looked it up, is just a wrapper around glXSwapBuffers). `perf` showed me
 nothing, the time was used in some unresolved calls.
 
 I will make new tests with perf tomorrow.

dmd comes with a profiler built into it. Just compile with -profile, and you'll get 
profile information when you run your program.

- Jonathan M Davis

Jul 24 2012

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On Tue, 24 Jul 2012 22:53:05 +0200, David <d dav1d.de> wrote:

 On 24.07.2012 21:46, David wrote:
 Hmm. Could this be a GC-related issue?

 Actually this could be. They are stored inside a Vertex* array which is
 allocated with `malloc`, maybe the GC scans all of
 the created vertex structs? Could this be?

      import core.memory;
      GC.disable();

 directly when entering main didn't help, so I guess it's not the GC

As long as you're using malloc, the GC should leave it alone.
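
A short sketch of that point, assuming a simplified three-float `Vertex`: memory obtained from C's `malloc` is not managed by the D garbage collector, which `GC.addrOf` can confirm by returning null for it.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Simplified stand-in for the Vertex struct in this thread.
struct Vertex {
    float x, y, z;
}

void main() {
    enum count = 1024;

    // C-heap allocation: the GC neither owns nor scans this block
    // unless it is registered explicitly with GC.addRange.
    auto p = cast(Vertex*) malloc(count * Vertex.sizeof);
    assert(p !is null);
    scope (exit) free(p);

    // Slice the raw pointer to get bounds-checked access.
    Vertex[] buffer = p[0 .. count];
    buffer[0] = Vertex(1, 2, 3);

    // GC.addrOf returns null for memory the GC does not manage.
    assert(GC.addrOf(p) is null);
    assert(GC.addrOf(new int) !is null);
}
```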

-- 
Simen

Jul 24 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 24, 2012 at 08:57:08PM +0200, bearophile wrote:
 David:
 
align(1) struct Vertex {
    float x;
    float y;
    float z;
    float nx;
    float ny;
    float nz;
    float u_terrain;
    float v_terrain;
    float u_biome;
    float v_biome;
}

Everything is still a float, so it's easier. Nothing wrong with
that, right? Well, this change decreases my performance by 1000%.

 
 Aligning floats to 1 byte doesn't seem a good idea. Try to remove
 the align(1).

[...]

I agree. I don't know how the CPU handles misaligned floats, but from
what I understand, it will do two loads to fetch the two word-aligned
parts of the float, and then assemble it together. This may be what's
causing the slowdown.
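
Whether a given address is actually misaligned for a `float` can be checked directly. A minimal sketch, with a hypothetical helper `isAligned`; the one-byte offset is made up purely to produce a misaligned address.

```d
import core.stdc.stdlib : free, malloc;

// True if `p` satisfies the natural alignment of T.
bool isAligned(T)(const(void)* p) {
    return (cast(size_t) p) % T.alignof == 0;
}

void main() {
    auto raw = cast(ubyte*) malloc(64);
    assert(raw !is null);
    scope (exit) free(raw);

    // malloc returns memory suitably aligned for any basic type.
    assert(isAligned!float(raw));

    // One byte further in, the address is misaligned for a float.
    assert(!isAligned!float(raw + 1));
}
```

Since the posted struct holds only floats, every member of a `Vertex` in a `malloc`ed buffer already sits on a 4-byte boundary, with or without `align(1)`.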


T

-- 
Small children, small troubles.

Jul 24 2012

David <d dav1d.de> writes:

 I agree. I don't know how the CPU handles misaligned floats, but from
 what I understand, it will do two loads to fetch the two word-aligned
 parts of the float, and then assemble it together. This may be what's
 causing the slowdown.


 T

Removing the `align(1)` changes nothing, not 1ms slower or faster, 
unfortunately.

Jul 24 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 24 July 2012 at 19:42:34 UTC, David wrote:
 I agree. I don't know how the CPU handles misaligned floats, 
 but from
 what I understand, it will do two loads to fetch the two 
 word-aligned
 parts of the float, and then assemble it together. This may be 
 what's
 causing the slowdown.


 T

 Removing the `align(1)` changes nothing, not 1ms slower or 
 faster, unfortunately.


[quote]
[code]
  Vertex[] data;
  foreach(i; 0..6) {
    data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
[/code]
[/quote]

Try using reserve? The new structure size looks like it's about 
40 bytes, and aside from resizing I'm not sure why it would have 
issues.

[code]
  Vertex[] data;
  data.reserve(6); //following foreach...
[/code]
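
A fuller sketch of the `reserve` suggestion, assuming a simplified three-float `Vertex` and made-up values. It checks that, once capacity has been reserved, the appends stay inside the same allocation.

```d
// Simplified stand-in for the Vertex struct in this thread.
struct Vertex {
    float x, y, z;
}

void main() {
    Vertex[] data;

    // Ask the runtime for room for six elements up front.
    const cap = data.reserve(6);
    assert(cap >= 6);

    data ~= Vertex(0, 0, 0);
    auto start = data.ptr;

    // The remaining appends fit in the reserved block, so the
    // array is never moved to a new allocation.
    foreach (i; 1 .. 6)
        data ~= Vertex(cast(float) i, 0, 0);

    assert(data.length == 6);
    assert(data.ptr is start);
}
```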

Jul 24 2012

David <d dav1d.de> writes:

On 25.07.2012 01:10, Era Scarecrow wrote:
 Removing the `align(1)` changes nothing, not 1ms slower or faster,
 unfortunately.


 [quote]
 [code]
   Vertex[] data;
   foreach(i; 0..6) {
     data ~= Vertex(positions[i][0], positions[i][1], positions[i][2],
 [/code]
 [/quote]

 Try using reserve? The new structure size looks like it's about 40
 bytes, and aside from resizing I'm not sure why it would have issues.

 [code]
   Vertex[] data;
   data.reserve(6); //following foreach...
 [/code]

Also not the problem; I returned the whole array at once and it didn't 
help. But thanks for your idea.


The strange thing is, these tessellation functions are just run once 
and then the data is passed to the GPU.
So my commit shouldn't have a direct impact on the speed (e.g. a GC issue 
would explain it, but unfortunately it isn't the GC).

I'll try a different compiler, too.

Jul 25 2012

David <d dav1d.de> writes:

 I'll try a different compiler, too.

It's the same issue with ldc

Jul 25 2012

Andrea Fontana <nospam example.com> writes:

Have you checked your default compiler/linker args?

On Wed, 25/07/2012 at 15.23 +0200, David wrote:

 I'll try a different compiler, too.

 It's the same issue with ldc

Jul 25 2012

David <d dav1d.de> writes:

On 25.07.2012 15:44, Andrea Fontana wrote:
 Have you checked your default compiler/linker args?

 On Wed, 25/07/2012 at 15.23 +0200, David wrote:
 I'll try a different compiler, too.

 It's the same issue with ldc


They didn't change (of course I changed the args which are different for 
ldc). What exactly do you mean?

Jul 25 2012

Andrea Fontana <nospam example.com> writes:

I had a performance problem with std.xml some months ago. It took me a
long time to find out that there was a default linker param (in gdc & dmd
under linux) that slowed down the whole thing.
So maybe it's not a code-related issue, I mean :)

On Wed, 25/07/2012 at 15.53 +0200, David wrote:

 On 25.07.2012 15:44, Andrea Fontana wrote:
 Have you checked your default compiler/linker args?

 On Wed, 25/07/2012 at 15.23 +0200, David wrote:
 I'll try a different compiler, too.

 It's the same issue with ldc

 They didn't change (of course I changed the args which are different for
 ldc), what do you exactly mean?

Jul 25 2012

David <d dav1d.de> writes:

Ok here we go:

perf.data: http://dav1d.de/perf.data

and a fancy image (showing the results of perf): http://dav1d.de/output.png

I hope anyone knows where the time is spent.

Most time spent:
+  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

Jul 25 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 25-Jul-12 17:54, David wrote:
 Ok here we go:

 perf.data: http://dav1d.de/perf.data

 and a fancy image (showing the results of perf): http://dav1d.de/output.png

 I hope anyone knows where the time is spent.

 Most time spent:
 +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

Would be cool to have before/after graph.

-- 
Dmitry Olshansky

Jul 25 2012

David <d dav1d.de> writes:

On 25.07.2012 16:23, Dmitry Olshansky wrote:
 On 25-Jul-12 17:54, David wrote:
 Ok here we go:

 perf.data: http://dav1d.de/perf.data

 and a fancy image (showing the results of perf):
 http://dav1d.de/output.png

 I hope anyone knows where the time is spent.

 Most time spent:
 +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

 Would be cool to have before/after graph.

I don't know how to make comparisons with perf.data but here is the 
captured data of the "working" version:

http://dav1d.de/output_before.png
perf.data: http://dav1d.de/perf_before.data

Jul 25 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 25-Jul-12 19:32, David wrote:
 On 25.07.2012 16:23, Dmitry Olshansky wrote:
 On 25-Jul-12 17:54, David wrote:
 Ok here we go:

 perf.data: http://dav1d.de/perf.data

 and a fancy image (showing the results of perf):
 http://dav1d.de/output.png

 I hope anyone knows where the time is spent.

 Most time spent:
 +  53,14%  bralad  [unknown]                   [k] 0xc01e5d2b

 Would be cool to have before/after graph.

 I don't know how to make comparisons with perf.data but here is the
 captured data of the "working" version:

 http://dav1d.de/output_before.png
 perf.data: http://dav1d.de/perf_before.data


It looks like a syscall/OpenGL issue. You somehow managed to hit a dark 
corner of the GL driver. It's either a (partial) fallback to software or 
some extra translation layer.
I once had a cool table that showed which GL calls go directly to 
hardware and which do not for various nvidia cards.

Now the trick is to get an idea why. The best way to debug driver-related 
stuff is to test on some other computer (like a different version 
of the OS, video card etc.).

I can't quite decipher the output, but I find it strange that it mentions 
_d_invariant. You'd better compile with -release if you care about speed.


-- 
Dmitry Olshansky

Jul 25 2012

David <d dav1d.de> writes:

 It looks like a syscall/opengl issue. You somehow managed to hit a dark
 corner of GL driver. It's either a fallback to software (partial) or
 some extra translation layer.
 I once had a cool table that showed which GL calls  are direct to
 hardware and which are not for various nvidia cards.

 Now the trick is to get an idea why. The best idea to debug driver
 related stuff is to test on some other computer (like different version
 of OS, video card etc.).

Worst case scenario ... driver issue.


 Can't quite decipher output but I find it strange that it mentions
 _d_invariant. You'd better compile with -release if you care for speed.

I don't care about speed much, but 1000% less performance is just too bad.

Jul 25 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 26-Jul-12 00:52, David wrote:
 It looks like a syscall/opengl issue. You somehow managed to hit a dark
 corner of GL driver. It's either a fallback to software (partial) or
 some extra translation layer.
 I once had a cool table that showed which GL calls  are direct to
 hardware and which are not for various nvidia cards.

 Now the trick is to get an idea why. The best idea to debug driver
 related stuff is to test on some other computer (like different version
 of OS, video card etc.).

 Worst case scenario ... driver issue.

Been there once. In any case I'd try to split coordinates into 2 or 3 
interleaved arrays (like vertex+norm and separately 2 UV). It's usually 
slower but not 10x ;)
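
One way such a split could look, as a sketch only: the struct names `PosNormal` and `TexCoords` and the helper `split` are hypothetical, and the field grouping (position+normal in one array, both UV pairs in another) is just one reading of the suggestion.

```d
// Same fields as the posted Vertex struct.
struct Vertex {
    float x, y, z, nx, ny, nz;
    float u_terrain, v_terrain, u_biome, v_biome;
}

// Hypothetical split: position + normal in one array, UVs in another.
struct PosNormal { float x, y, z, nx, ny, nz; }
struct TexCoords { float u_terrain, v_terrain, u_biome, v_biome; }

void split(const Vertex[] src, out PosNormal[] pn, out TexCoords[] uv) {
    pn = new PosNormal[](src.length);
    uv = new TexCoords[](src.length);
    foreach (i, ref v; src) {
        pn[i] = PosNormal(v.x, v.y, v.z, v.nx, v.ny, v.nz);
        uv[i] = TexCoords(v.u_terrain, v.v_terrain, v.u_biome, v.v_biome);
    }
}

void main() {
    auto verts = [Vertex(1, 2, 3, 0, 1, 0, 0.5f, 0.5f, 0.25f, 0.75f)];
    PosNormal[] pn;
    TexCoords[] uv;
    split(verts, pn, uv);
    assert(pn[0].z == 3 && uv[0].v_biome == 0.75f);
}
```

Each array would then be uploaded to its own buffer, which makes it easier to see whether a particular attribute layout triggers the slow path.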

 Can't quite decipher output but I find it strange that it mentions
 _d_invariant. You'd better compile with -release if you care for speed.

 I don't care about speed much, but 1000% less performance is just too bad.


-- 
Dmitry Olshansky

Jul 25 2012

David <d dav1d.de> writes:

On 25.07.2012 23:03, Dmitry Olshansky wrote:
 On 26-Jul-12 00:52, David wrote:
 It looks like a syscall/opengl issue. You somehow managed to hit a dark
 corner of GL driver. It's either a fallback to software (partial) or
 some extra translation layer.
 I once had a cool table that showed which GL calls  are direct to
 hardware and which are not for various nvidia cards.

 Now the trick is to get an idea why. The best idea to debug driver
 related stuff is to test on some other computer (like different version
 of OS, video card etc.).

 Worst case scenario ... driver issue.

 Been there once. In any case I'd try to split coordinates into 2 or 3
 interleaved arrays. (like vertex+norm and separately 2 UV). It's usually
 slower but not 10x ;)

Well, the interesting question is, why is it slower? I checked it twice, 
the data passed to the GPU is 100% the same, no difference, the only 
difference is the stored format on the CPU (and that's just a matter of 
casting).

Jul 25 2012

"bearophile" <bearophileHUGS lycos.com> writes:

David:

 Well the interesting question is, why is it slower? I checked it 
 twice, the data passed to the GPU is 100% the same, no 
 difference, the only difference is the stored format on the CPU 
 (and that's just a matter of casting).

It's not easy to answer similar general questions. Why don't you 
list the assembly of the two versions and compare?

Bye,
bearophile

Jul 25 2012

David <d dav1d.de> writes:

 It's not easy to answer similar general questions. Why don't you list
 the assembly of the two versions and compare?

My assembly is pretty rusty and actually, I have no idea what to look for.

Jul 25 2012

Ali Çehreli <acehreli yahoo.com> writes:

On 07/24/2012 11:38 AM, David wrote:

 Well this change decreases my performance by 1000%.

Random guess: CPU cache misses?

Ali

Jul 25 2012

David <d dav1d.de> writes:

On 26.07.2012 00:12, Ali Çehreli wrote:
 On 07/24/2012 11:38 AM, David wrote:

  > Well this change decreases my performance by 1000%.

 Random guess: CPU cache misses?

 Ali

You're the 2nd one mentioning this, any ideas how to check this?

Jul 25 2012

Ali Çehreli <acehreli yahoo.com> writes:

On 07/25/2012 03:26 PM, David wrote:
 On 26.07.2012 00:12, Ali Çehreli wrote:
 On 07/24/2012 11:38 AM, David wrote:

 Well this change decreases my performance by 1000%.

 Random guess: CPU cache misses?

 Ali

 You're the 2nd one mentioning this, any ideas how to check this?

I have no experience. Pages like this look promising:

 
http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses

Ali

Jul 25 2012

David <d dav1d.de> writes:

On 26.07.2012 00:37, Ali Çehreli wrote:
 On 07/25/2012 03:26 PM, David wrote:
 On 26.07.2012 00:12, Ali Çehreli wrote:
 On 07/24/2012 11:38 AM, David wrote:

 Well this change decreases my performance by 1000%.

 Random guess: CPU cache misses?

 Ali

 You're the 2nd one mentioning this, any ideas how to check this?

 I have no experience. Pages like this look promising:


 http://stackoverflow.com/questions/2486840/linux-c-how-to-profile-time-wasted-due-to-cache-misses


 Ali

 From what I've seen everything is ok (I used `perf top -e 
L1-dcache-load-misses -e L1-dcache-loads` to see the hotspots, nothing 
too bad)

Jul 25 2012

David <d dav1d.de> writes:

Ok, interesting thing.

I switched my buffer from Vertex* to void* and I cast every Vertex I get 
to void[] and add it to the buffer (slice → memcopy) and everything 
works fine now. I can live with that (once the basic functions are 
implemented it's not even a pain to use), but still, I wonder where the 
problem is.
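
A rough sketch of what the described workaround might look like. The `RawBuffer` type and its `add` method are hypothetical names, the three-float `Vertex` is a simplification, and the actual BraLa code may differ.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Simplified stand-in for the Vertex struct in this thread.
struct Vertex {
    float x, y, z;
}

// Hypothetical untyped buffer: it only ever sees raw bytes.
struct RawBuffer {
    void* ptr;
    size_t length;    // bytes in use
    size_t capacity;  // bytes allocated

    void add(const(void)[] bytes) {
        assert(length + bytes.length <= capacity);
        memcpy(cast(ubyte*) ptr + length, bytes.ptr, bytes.length);
        length += bytes.length;
    }
}

void main() {
    RawBuffer buf;
    buf.capacity = 4 * Vertex.sizeof;
    buf.ptr = malloc(buf.capacity);
    assert(buf.ptr !is null);
    scope (exit) free(buf.ptr);

    auto v = Vertex(1, 2, 3);
    // View the struct as a byte slice, then copy it in.
    buf.add((cast(ubyte*) &v)[0 .. Vertex.sizeof]);

    assert(buf.length == Vertex.sizeof);
    assert((cast(Vertex*) buf.ptr)[0] == v);
}
```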

Jul 26 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 26-Jul-12 14:14, David wrote:
 Ok, interesting thing.

 I switched my buffer from Vertex* to void* and I cast every Vertex I get
 to void[] and add it to the buffer (slice → memcopy) and everything
 works fine now. I can live with that (once the basic functions are
 implemented it's not even a pain to use), but still, I wonder where the
 problem is.

Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and 
offsets correct (like in Vertex vs float)?

-- 
Dmitry Olshansky

Jul 26 2012

David <d dav1d.de> writes:

 Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and
 offsets correct (like in Vertex vs float)?

No, yes. I really have no idea why this happens, I saved the contents of 
my buffers and compared them with the buffers of the `float[]` version 
(thanks to `git checkout`) and they were exactly 100% the same.
It's a mystery.

Jul 26 2012

dennis luehring <dl.soluz gmx.net> writes:

On 26.07.2012 21:18, David wrote:
 Hm. Do you ever do pointer arithmetic on Vertex*? Are the size and
 offsets correct (like in Vertex vs float)?

 No, yes. I really have no idea why this happens, I saved the contents of
 my buffers and compared them with the buffers of the `float[]` version
 (thanks to `git checkout`) and they were exactly 100% the same.
 It's a mystery.

Can you create a version of your code that allows switching 
(version(Vertex) else ...) between array and Vertex? Or provide both 
versions here again.

You checked dmd and ldc output so it can't be a backend thing (maybe 
frontend or GC) - or mysterious GL bugs.
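
A sketch of such a switch, under stated assumptions: the version identifier `UseVertexStruct`, the alias `Element`, and the helper `makeTriangle` are all hypothetical, and `Vertex` is cut down to three floats to keep the example short.

```d
// Simplified stand-in for the Vertex struct in this thread.
struct Vertex {
    float x, y, z;
}

// Build with -version=UseVertexStruct to select the struct path;
// without it, the plain float[] path is compiled instead.
version (UseVertexStruct) {
    alias Element = Vertex;
    enum floatsPerElement = 3;
} else {
    alias Element = float;
    enum floatsPerElement = 1;
}

Element[] makeTriangle() {
    version (UseVertexStruct)
        return [Vertex(0, 0, 0), Vertex(1, 0, 0), Vertex(0, 1, 0)];
    else
        return [0f, 0, 0, 1, 0, 0, 0, 1, 0];
}

void main() {
    auto data = makeTriangle();
    assert(data.length * floatsPerElement == 9);

    // Either way, the bytes handed to the GPU are the same nine floats.
    auto flat = cast(float[]) data;
    assert(flat.length == 9 && flat[3] == 1);
}
```

With both paths in one source file, the two builds can be profiled and diffed without switching commits.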

Jul 26 2012

Benjamin Thaut <code benjamin-thaut.de> writes:

On 24.07.2012 20:38, David wrote:
 I am writing a game engine. I was using a float[] array to store my
 vertices, which worked well, but I have to send more and more UV
 coordinates (and other information) which needn't be stored as `float`s,
 so I moved from a float array to a Vertex array:
 https://github.com/Dav1dde/BraLa/blob/master/brala/dine/builder/tessellator.d#L30


 align(1) struct Vertex {
      float x;
      float y;
      float z;
      float nx;
      float ny;
      float nz;
      float u_terrain;
      float v_terrain;
      float u_biome;
      float v_biome;
 }

 Everything is still a float, so it's easier. Nothing wrong with that, right?
 Well, this change decreases my performance by 1000%: my frame time goes
 from ~12 ms per frame to ~120 ms per frame. I tried to find the bottleneck
 with `perf`, but got no results (the time is not spent in the game/engine).

 The commit:
 https://github.com/Dav1dde/BraLa/commit/02a37a0e46f195f5a46404747d659d26490e6c32


 I hope you can see something wrong. I have no idea!

Check the disassembly view of this line:
buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

If you are using an old version of dmd it will allocate a block of 
memory which has the size of Vertex, then it will fill the data into 
that block of memory, and then memcpy it to your buffer array.

You could try working around this by doing:

buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

Kind Regards
Benjamin Thaut

Aug 24 2012

David <d dav1d.de> writes:

 Check the disassembly view of this line:
 buffer[elements++] = Vertex(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

 If you are using an old version of dmd it will allocate a block of
 memory which has the size of Vertex, then it will fill the data into
 that block of memory, and then memcpy it to your buffer array.

 You could try working around this by doing:

 buffer[elements++].__ctor(x, y, z, nx, ny, nz, u, v, u_biome, v_biome);

 Kind Regards
 Benjamin Thaut

That's not the problem. The problem has nothing to do with the 
tessellation, since the *rendering* is also 1000% slower (when all data 
is already processed).

Aug 24 2012

Sean Kelly <sean invisibleduck.org> writes:

On Aug 24, 2012, at 1:16 PM, David <d dav1d.de> wrote:
 That's not the problem. The problem has nothing to do with the
 tessellation, since the *rendering* is also 1000% slower (when all data
 is already processed).

Is the alignment different between one and the other? I wouldn't think so
since it's dynamic memory, but the performance difference suggests that
it might be.

Aug 27 2012

David <d dav1d.de> writes:

On 28.08.2012 01:53, Sean Kelly wrote:
 On Aug 24, 2012, at 1:16 PM, David <d dav1d.de> wrote:
 That's not the problem. The problem has nothing to do with the tessellation,
 since the *rendering* is also 1000% slower (when all data is already processed).

 Is the alignment different between one and the other? I wouldn't think so since
 it's dynamic memory, but the performance difference suggests that it might be.

The arrays are 100% identical (I dumped a Vertex()-array and a raw 
float-array, they were 100% identical).

Aug 28 2012

"bearophile" <bearophileHUGS lycos.com> writes:

David:

 The arrays are 100% identical (I dumped a Vertex()-array and a 
 raw float-array, they were 100% identical).

I hope some people are realizing how much time is being wasted in 
this thread. Taking a look at the asm is my suggestion still. If 
someone is rusty in asm, it's time to brush away the rust with a 
steel brush.

Bye,
bearophile

Aug 28 2012

David <d dav1d.de> writes:

On 28.08.2012 17:41, bearophile wrote:
 David:

 The arrays are 100% identical (I dumped a Vertex()-array and a raw
 float-array, they were 100% identical).

 I hope some people are realizing how much time is being wasted in this
 thread. Taking a look at the asm is my suggestion still. If someone is
 rusty in asm, it's time to brush away the rust with a steel brush.

 Bye,
 bearophile

You're right, but I also said that I don't care any longer. I found a 
workaround and I can live with it. I generally tend to ignore dmd bugs and 
just work around them; I don't have the time to track down every stupid 
bug from a ~8k codebase.

Thanks anyways for your help.

Aug 28 2012

"bearophile" <bearophileHUGS lycos.com> writes:

David:

 I generally tend to ignore dmd bugs and just work around them, I 
 don't have the time to track down every stupid bug from a ~8k 
 codebase.

I understand you don't care much anymore for the discussed 
problem, and I know that localizing D/DMD bugs requires some time 
and work.

But I'd like you to not ignore all the bugs you find, and instead 
minimize some of them and submit them to Bugzilla. Despite 
thousands of open bugs and about a hundred open patches, many 
bugs do get fixed at every release. If you submit bugs, D/DMD 
will improve, in the future you will find fewer bugs to work 
around in your D code, and you will help other present and future 
D programmers avoid hitting them. This is important because D is 
young and its community is small. The idea is: they give you a 
compiler/language for free, and you give something back to the 
community by submitting some bugs :-)

Bye and thank you,
bearophile

Aug 28 2012

David <d dav1d.de> writes:

 But I'd like you to not ignore all the bugs you find, and instead
 minimize some of them and submit them to Bugzilla. Despite thousands of
 open bugs and about a hundred of open patches, many bugs do get fixed at
 every release. If you submit bugs, D/DMD will improve, in your future
 you will find less bugs to work around in your D code, and you will help
 other present and future D programmers avoid hitting them. This is
 important because D is young and its community is small. The idea is:
 they give you a compiler/language for free, and you give something back
 to the community submitting some bugs :-)

I totally agree

 I understand you don't care much anymore for the discussed problem, and
 I know that localizing D/DMD bugs requires some time and work.

And that's the problem. I tried to track down a few of the bugs I hit. 
50% vanished when I changed unrelated code (cool, huh? Getting a 
segfault in std.net.curl → std.regex → std.functional.memoize when 
changing your ResourceManager, which has really nothing to do with either 
curl, regex or std.functional, nor the module which calls std.net.curl), 
then I wasn't able to reproduce a few others. In the end, I think, I was 
able to track down a single dmd bug. That was with a relatively small 
code base (maybe 1-2k?); now I have around 8k and I just don't have the 
time and maybe the knowledge. At least I can fix phobos/druntime bugs.

Not sure why I wrote that. I don't wanna whine; D is great/buggy and I 
knew it when I started that project. And I am glad there are people 
like you, Kenji and lots of others who keep on improving D in their free 
time (not to forget Walter and Andrei).

Aug 28 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 08/28/2012 06:35 PM, David wrote:
 On 28.08.2012 17:41, bearophile wrote:
 David:

 The arrays are 100% identical (I dumped a Vertex()-array and a raw
 float-array, they were 100% identical).

 I hope some people are realizing how much time is being wasted in this
 thread. Taking a look at the asm is my suggestion still. If someone is
 rusty in asm, it's time to brush away the rust with a steel brush.

 Bye,
 bearophile

 You're right, but I also said that I don't care any longer. I found a
 workaround and I can live with it. I generally tend to ignore dmd bugs and
 just work around them; I don't have the time to track down every stupid
 bug from a ~8k codebase.

 Thanks anyways for your help.

Use this to create a minimal test case with minimal user interaction:
https://github.com/CyberShadow/DustMite

Aug 28 2012

David <d dav1d.de> writes:

 Use this to create a minimal test case with minimal user interaction:
 https://github.com/CyberShadow/DustMite

Doesn't help if dmd doesn't crash, right?

Aug 28 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 08/29/2012 01:26 AM, David wrote:
 Use this to create a minimal test case with minimal user interaction:
 https://github.com/CyberShadow/DustMite

 Doesn't help if dmd doesn't crash, right?

It doesn't help a lot if compilation succeeds, but you stated that you
generally tend to ignore dmd bugs. Most dmd bugs make compilation fail.

Aug 28 2012

Brad Roberts <braddr puremagic.com> writes:

On Wed, 29 Aug 2012, Timon Gehr wrote:

 On 08/29/2012 01:26 AM, David wrote:
 Use this to create a minimal test case with minimal user interaction:
 https://github.com/CyberShadow/DustMite

 
 Doesn't help if dmd doesn't crash, right?
 

 
 It doesn't help a lot if compilation succeeds, but you stated that you
 generally tend to ignore dmd bugs. Most dmd bugs make compilation fail.

It's more generally useful than that.  It can reduce for any set of 
commands that together produce a binary decision: pass or fail.  The key 
problem is that it does need to be deterministic.  It doesn't matter if 
it's dmd that fails, or an execution of the output code, or really 
anything that determines pass or fail.  The basic pattern is:

while (progress can be made)
   try a reduction

   if reduction still reproduces the error
      continue
   else
      revert
done

(it's obviously more complex and there's tons of magic inside try a 
reduction)
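
The basic pattern above can be sketched in D as follows. This is a toy, line-based reducer and not how DustMite itself is implemented; the function name `reduceLines`, the input lines, and the pass/fail predicate are all made up for illustration.

```d
import std.algorithm.searching : canFind;

// Hypothetical line-based reducer: drop one line at a time and keep the
// removal whenever the (deterministic) test still reports a failure.
string[] reduceLines(string[] lines,
                     bool delegate(const string[]) stillFails) {
    bool progress = true;
    while (progress) {                  // while progress can be made
        progress = false;
        size_t i = 0;
        while (i < lines.length) {
            // Try a reduction: the input without line i.
            auto candidate = lines[0 .. i] ~ lines[i + 1 .. $];
            if (stillFails(candidate)) {
                lines = candidate;      // still reproduces: keep it
                progress = true;
            } else {
                ++i;                    // revert and move on
            }
        }
    }
    return lines;
}

void main() {
    auto input = ["import a;", "bug1", "int x;", "bug2", "void f() {}"];

    // Toy predicate: the "error" needs both marker lines to be present.
    auto result = reduceLines(input, (const string[] ls) =>
        ls.canFind("bug1") && ls.canFind("bug2"));

    assert(result == ["bug1", "bug2"]);
}
```

The predicate is the only part that knows what "fail" means, which is why the same loop works for compiler crashes, wrong output, or any other deterministic check.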

Aug 28 2012
