D - dmd 30 bugs
- Patrick Down (18/18) May 09 2002 Walter,
- Walter (4/21) May 09 2002 Posting them here is fine. I thought the banner did put out the version
- Patrick Down (39/41) May 09 2002 You are right it does print the version it has no arguments.
- Pavel Minayev (24/28) May 10 2002 Nah, it's not my fault. Let's see how it goes. writeString()
- Patrick Down (5/6) May 10 2002 Aggg! I'm still thinking C++! Sorry for the
- Pavel Minayev (3/6) May 10 2002 Hah! I didn't notice it as well! =) Shame on me!
- MicroWizard (7/14) May 10 2002 Well... well...
- Pavel Minayev (3/7) May 10 2002 We're discussing it for the last few days already =)
- Patrick Down (13/13) May 10 2002 Is this right?
- Pavel Minayev (4/6) May 11 2002 There can be no pointers to bits. Since out parameters are
- OddesE (9/15) May 11 2002 So we do need a bool type, byte sized!
- Patrick Down (7/10) May 11 2002 typedef ubyte bool;
- Walter (3/13) May 11 2002 Where the bit type comes into its own is in very large arrays of bits.
- Sean L. Palmer (161/179) May 11 2002 If there's one thing a computer is good at, it's moving, scanning, and
- Walter (5/5) May 12 2002 You're far beyond me in graphics code. But one possibility for easilly
- Sean L. Palmer (6/11) May 12 2002 This array manipulation is just the basics of graphics code. Everything
- Russell Borogove (26/44) May 12 2002 This seems like such a small incremental win (an
- OddesE (12/18) May 14 2002 *snif*
- OddesE (10/30) May 14 2002 I am forking this of into a new thread, because
- Sean L. Palmer (21/27) May 11 2002 That sucks. The compiler should do some behind-the-scenes manipulation
- Patrick Down (25/25) May 11 2002 The following causes the compiler to generate:
- Walter (4/24) May 11 2002 Thanks for reporting this. The bit type is turning out to be a constant
- Sean L. Palmer (32/64) May 11 2002 Of course. It's what the other types are built out of. THE most
- Sean L. Palmer (60/125) May 12 2002 I've been thinking about this and it seems that it might be better to ex...
- Walter (10/32) May 12 2002 if
- Patrick Down (7/10) May 12 2002 I understand your reasons for disallowing overlapping ranges but
- Sean L. Palmer (12/22) May 12 2002 That's how memcpy is usually implemented. If everything about the addre...
- Russell Borogove (6/16) May 12 2002 ITYM memmove(); C standard memcpy() has undefined behavior
- Walter (20/24) May 14 2002 itching
- Sean L. Palmer (35/59) May 14 2002 of
- Walter (14/21) May 14 2002 that
- Patrick Down (5/17) May 14 2002 So is
- Walter (4/21) May 14 2002 Good question. It's not an overlapped copy because each array element is...
- Sean L. Palmer (20/41) May 14 2002 And if you can't guarantee that they don't overlap, then the optimizatio...
- Sean L. Palmer (33/40) May 12 2002 could
- Stephen Fuld (17/32) May 12 2002 extend
- Russell Borogove (10/28) May 12 2002 Eeek! I would beg for a distinct keyword like "foreach" in
Walter, If you have a preferred bug reporting method let me know. Also it might be nice if you would put a version or build number in the compiler banner to refer to.

=================

The following causes my dmd.exe to exception with a null pointer reference.

import stream; // http://int19h.tamb.ru/files.html

void Bar(int input)
{
}

int main(char[][] args)
{
  Bar(stream);
  return 0;
}
May 09 2002
Posting them here is fine. I thought the banner did put out the version number. Execute DMD with no arguments.

"Patrick Down" <pat codemoon.com> wrote in message news:Xns920A3EEB2579patcodemooncom 63.105.9.61...
> Walter, If you have a preferred bug reporting method let me know. Also
> it might be nice if you would put a version or build number in the
> compiler banner to refer to.
>
> The following causes my dmd.exe to exception with a null pointer
> reference.
>
> import stream; // http://int19h.tamb.ru/files.html
> void Bar(int input) { }
> int main(char[][] args) { Bar(stream); return 0; }
May 09 2002
"Walter" <walter digitalmars.com> wrote in news:abfp0m$1qa8$1 digitaldaemon.com:
> Posting them here is fine. I thought the banner did put out the version
> number. Execute DMD with no arguments.

You are right, it does print the version when it has no arguments.

===================

The next one's a little weird. When I run the following code it prints...

here1
here2
err2

Then it hangs after printing err2. I don't know if this is yours or Pavel's. I suspect that MemoryStream.writeBlock is throwing the error but I don't know why the program hangs.

import stream;

int main(char[][] args)
{
  try
  {
    printf("here1\n");
    char[] foo = "variable=value\n"
                 "var=value\n";
    MemoryStream strm;
    printf("here2\n");
    strm.writeString(foo);
    printf("here3\n");
  }
  catch(WriteError e)
  {
    printf("err1\n");
  }
  catch(Exception e)
  {
    printf("err2\n");
  }
  return 0;
}
May 09 2002
"Patrick Down" <pat codemoon.com> wrote in message news:Xns920A143BCE8FCpatcodemooncom 63.105.9.61...
> Then it hangs after printing err2. I don't know if this is yours or
> Pavel's. I suspect that MemoryStream.writeBlock is throwing the error
> but I don't know why the program hangs.

Nah, it's not my fault. Let's see how it goes. writeString() calls writeExact(), which in turn calls writeBlock(). Since MemoryStream is just a wrapper around OutBuffer, it calls OutBuffer.write() - which throws an exception "overlapping array copy". You can check it yourself. Try:

import outbuffer, c.stdio;

int main(char[][] args)
{
  try
  {
    printf("start\n");
    char[] s = "foo";
    OutBuffer strm;
    printf("writing...");
    strm.write(s);
    printf("done\n");
  }
  catch (Exception e)
    printf("\nError: %.*s\n", e.msg);
  return 0;
}
May 10 2002
Patrick Down <pat codemoon.com> wrote in news:Xns920A143BCE8FCpatcodemooncom 63.105.9.61:
> MemoryStream strm;

Aggg! I'm still thinking C++! Sorry for the false alarm.

MemoryStream strm = new MemoryStream;
May 10 2002
"Patrick Down" <pat codemoon.com> wrote in message news:Xns920A56844ACEFpatcodemooncom 63.105.9.61...
> Aggg! I'm still thinking C++! Sorry for the false alarm.
>
> MemoryStream strm = new MemoryStream;

Hah! I didn't notice it either! =) Shame on me!
May 10 2002
In article <abgl73$2jq4$1 digitaldaemon.com>, Pavel Minayev says...
> Hah! I didn't notice it either! =) Shame on me!

Well... well...

What about having a run-time check against calling null references, instead of having the system hang... Frankly, it would be very useful, since I'm a stupid C++ programmer and I always forget to initialize object instances with an explicit 'new'.

Tamas Nagy
May 10 2002
"MicroWizard" <MicroWizard_member pathlink.com> wrote in message news:abh43c$30k2$1 digitaldaemon.com...
> What about having a run-time check against calling null references,
> instead of having the system hang... Frankly, it would be very useful.

We've been discussing it for the last few days already =)
May 10 2002
Is this right?

err.d(2): cannot have out parameter of type bit

void foo(out bit bar)
{
  bar = true;
}

int main(char[][] args)
{
  bit a;
  foo(a);
  return 0;
}
May 10 2002
"Patrick Down" <pat codemoon.com> wrote in message news:Xns920B30A21022patcodemooncom 63.105.9.61...
> Is this right?
>
> err.d(2): cannot have out parameter of type bit

There can be no pointers to bits. Since out parameters are implicitly dereferenced pointers, they cannot be bit.
May 11 2002
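Pavel's point - that an ordinary machine pointer cannot address an individual bit - is the crux of the restriction. A hedged illustration in C++ (the type and its names are mine, not from the thread): a workable "pointer to bit" has to carry a byte address plus a bit index, making it a fatter object than a normal pointer, with shift-and-mask code in place of a plain dereference.

```cpp
#include <cstdint>

// A "pointer to bit" must pair a byte address with a bit index, so it
// cannot be an ordinary pointer. BitPtr is a hypothetical sketch of
// such a fat pointer.
struct BitPtr {
    std::uint8_t* byte;  // address of the byte holding the bit
    unsigned      index; // which bit within that byte (0..7)

    bool get() const { return (*byte >> index) & 1u; }

    void set(bool v) {
        if (v) *byte |= static_cast<std::uint8_t>(1u << index);
        else   *byte &= static_cast<std::uint8_t>(~(1u << index));
    }
};
```

An `out bit` parameter would require the callee to write its result back through something like this instead of through a plain address - which is the behind-the-scenes machinery Sean argues for later in the thread.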
"Pavel Minayev" <evilone omen.ru> wrote in message news:abiv29$2lhf$1 digitaldaemon.com...
> There can be no pointers to bits. Since out parameters are
> implicitly dereferenced pointers, they cannot be bit.

So we do need a bool type, byte sized!

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
May 11 2002
"OddesE" <OddesE_XYZ hotmail.com> wrote in news:abjvsl$fem$1 digitaldaemon.com:
> So we do need a bool type, byte sized!

typedef ubyte bool;

I am sort of doubting the utility of the bit type for flags. I think I may go back to schemes like this.

enum flags
{
  f1 = 1,
  f2 = 2,
  f3 = 4,
  f4 = 8,
  f5 = 16
}

flags a = flags.f1 | flags.f4;
May 11 2002
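Patrick's power-of-two scheme works identically in any C-family language. A small C++ transcription (the `hasFlag` helper is my addition for illustration, not part of his post):

```cpp
// Each enumerator occupies a distinct bit, so a set of flags fits in
// one integer: combine with |, test with &. Mirrors Patrick's D enum.
enum Flags { f1 = 1, f2 = 2, f3 = 4, f4 = 8, f5 = 16 };

// Returns true when flag f is present in the set.
inline bool hasFlag(unsigned set, Flags f) { return (set & f) != 0; }
```

With `unsigned a = f1 | f4;`, `hasFlag(a, f4)` is true and `hasFlag(a, f2)` is false - the whole set still occupies a single machine word, which is Patrick's point about not needing the bit type here.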
Where the bit type comes into its own is in very large arrays of bits.

"Patrick Down" <pat codemoon.com> wrote in message news:Xns920BB36E7D600patcodemooncom 63.105.9.61...
> typedef ubyte bool;
>
> I am sort of doubting the utility of the bit type for flags. I think
> I may go back to schemes like this.
>
> enum flags { f1=1, f2=2, f3=4, f4=8, f5=16 }
> flags a = flags.f1 | flags.f4;
May 11 2002
If there's one thing a computer is good at, it's moving, scanning, and otherwise manipulating bits. Sucks that most languages don't extend down to the bit type... the operations on bit arrays are like "detailed" versions of the standard memcpy, memset, memmove, strchr, etc. functions, but at the bit level. And bit arrays are fairly powerful things. I'd like to see more attention to language features that deal with that kind of thing... maybe for a D 2.0 feature.

The next area I'd focus on is color pixel processing. The concept of color is fundamental to our interface to the computer and the display subsystem; it almost deserves to be a builtin type. Some way of specifying operations on a range of array entries, and preferably even extending slicing to rectangular slices of a 2D array and cubic slices of a 3D array... that could get powerful pretty fast. The compiler could generate SIMD code for that stuff pretty nicely. Color arrays are palettes, 2D arrays are like images or textures, and 3D arrays are like movies or voxels.

I wonder if C++ has ever considered a "lossy" modifier ("typedef unsigned lossy long color"), which would represent a data type that will be dealing with potentially 64-bit values, but if it couldn't do that, or it wasn't fast to use longs and you'd selected high optimization settings, it could substitute a smaller type.

Standardizing filters and such (with wrapping!) could make D a great image processing language, wonderful for writing things that manipulate bitmaps. One doesn't have to know much about modern 3D graphics to see that a lot of what goes on is specifying operations on millions of pixels by filling triangles with gradients of values and merging that with the existing image data somehow. D could become the next Renderman or OpenGL. Maybe too tight a focus, but the graphics would sure make a great library. If we can chain together operations on a source, one could parse the language itself for the operations:

source * x + dest * (1 - x)

I'd actually like to see bounded int types in the language. Saturating int, call it whatever you want. Something that the hardware "tops out" or clamps to the allowed range before storing into. A lot of CPUs support this concept, but no language that I know of ever has. It's something the compiler could do for you automatically (using software emulation if necessary), and it would sure be a convenient timesaver. Pixels usually fall into the category of saturated values:

struct Pixel
{
  saturated(0 <= byte <= 255) r,g,b,a;
  // or
  saturated(0, 255) byte r,g,b,a;
}

That leads me to another idea: optimizing only areas of the code that the compiler can tell are obvious inner loops (the part that needs optimizing the most). Maybe some kind of depth-first search could traverse the code, optimizing for a given amount of time, and when it's time to run it packs everything it has so far together into an .exe and runs it.

Actually, pixel could be a basic type. There could be pixel, which is the ubiquitous 32-bit ARGB/RGBA whatever. D could also have "named size" types such as pixelARGB1555, pixelRGB565, pixelRGBA4444, etc. Actually this is leading back to SIMD support, because it's far easier to code this yourself rather than rely on typedefs for the thousands of kinds of pixels that are possible. So I'd tackle the language feature at a lower level: saturating scalar types, and chains of math on values of 2D/3D arrays.

How would one describe mathematically what's going on in a typical SIMD situation, such as compositing (blending) one black-and-white image onto another based on the value in a third image?

void composite(in byte[][] image1, in byte[][] image2, in byte[][] mask, out byte[][] result)
{
  // make sure all arrays are the same size. Note use of introspection
  // capability to inquire the dimensions of an array.
  assert(image1.dim(1) == image2.dim(1) == mask.dim(1) &&
         image1.dim(2) == image2.dim(2) == mask.dim(2));
  assert(image1.dim == image2.dim); // shorthand for the above

  // make the result the same size as the others
  result.resize(image1.dim(1), image1.dim(2));

  // for each value in result (recursively in the case of nested arrays)
  // do the mixing. Writing it this way allows the compiler to write the
  // operation using SIMD.
  result.foreach[[x]]
  {
    result[[x]] = saturate_cast(byte) (image1[[x]] * mask[[x]] + image2[[x]] * ~mask[[x]]);
  }

  // do the same thing but now just for a slice of the arrays, and using
  // only the first column of the mask
  result.foreach[[x]][[y]] in [result.dim(1)/4 .. (result.dim(1)/4)*3][result.dim(2)/4 .. (result.dim(2)/4)*3]
  {
    result[[x]][[y]] = saturate_cast(byte) (image1[[x]][[y]] * mask[[x]][[0]] + image2[[x]][[y]] * ~mask[[x]][[0]]);
  }
}

The compiler can either transform that into

for (int __i = 0; __i < result.dim(1); ++__i)
  for (int __j = 0; __j < result.dim(2); ++__j)
  {
    int temp = image1[__i][__j] * mask[__i][__j] + image2[__i][__j] * ~mask[__i][__j];
    result[__i][__j] = (temp < 0 ? 0 : temp > 255 ? 255 : temp);
  }

or into an optimized version of that (pulling those unchanging values out of the loop helps a lot), or even directly into SIMD asm code. That SIMD stuff would kick ass if it was integrated as tightly into the compiler and optimizer as, say, the int type is. ;)

Anyone who has ever written a 2D memory move or, say, a highly optimized memcpy, knows how difficult it is to write this stuff without bugs and fast, *and* in a general way. It could however be entirely automated by the compiler. I think the compiler will have a much easier time changing

result.foreach[[x]][[y]] in [result.dim(1)/4 .. (result.dim(1)/4)*3][result.dim(2)/4 .. (result.dim(2)/4)*3]
{
  result[[x]][[y]] = saturate_cast(byte) (image1[[x]][[y]] * mask[[x]][[0]] + image2[[x]][[y]] * ~mask[[x]][[0]]);
}

into really tight SIMD code than it would trying to optimize the following (which is the only way to express the idea to the compiler in languages such as C):

for (int __i = result.dim(1)/4; __i < (result.dim(1)/4)*3; ++__i)
  for (int __j = result.dim(2)/4; __j < (result.dim(2)/4)*3; ++__j)
  {
    int temp = image1[__i][__j] * mask[__i][__j] + image2[__i][__j] * ~mask[__i][__j];
    result[__i][__j] = (temp < 0 ? 0 : temp > 255 ? 255 : temp);
  }

If the loop limits are known at compile time, or god forbid the address is known at compile time, it could output some nice custom-crafted, highly optimized code utilizing whatever instructions and methods are available. Doing that stuff isn't easy, but I've done it and I know how repetitive and error-prone it is; it wouldn't be incredibly hard to write a program that writes traversal loops that are well optimized, if you start out with the concept that the optimizer would otherwise have to infer: that you're going to do the exact same operation to a whole bunch of data. With a language extension, you say this directly. If you assume the compiler can figure it out for you, it could, but it's much easier for you to just tell it what you mean instead of writing it yourself.

One might be able to approach that level of sophisticated code generation if you had a really badass inline assembler and really badass template system that let you manipulate values and execute logic at compile time, but making those is a lot harder than just correcting a basic language deficiency: the ability to specify a set of operations on many datums as a single execution unit, leaving the traversal code entirely up to the compiler. For multidimensional arrays, Walter's work now will save many a programmer countless man-hours of work writing this type of code.

Code would be built custom for each loop probably, and that's ok with me. Now if we can find a way to integrate the concepts of "brush" and "polygon" and "mask" and "blend" and "filter" into the language, we're well on the way to being a useful software rendering tool.

Have you ever written code to do bilinear filtering? Conceptually, given 2 inputs and a 2D array, it does 4 lookups, 2 "frac" modulo-1.0 operations, and blends the results together using 3 linear interpolations, of the form:

byte BilinearFilter(byte[][] image, int x_times_16, int y_times_16)
{
  int ix = x_times_16 >> 4;
  int iy = y_times_16 >> 4;
  int fx = x_times_16 & ((1 << 4) - 1);
  int fy = y_times_16 & ((1 << 4) - 1);
  byte ul = image[iy][ix],   ur = image[iy][ix+1],
       bl = image[iy+1][ix], br = image[iy+1][ix+1];
  byte terp1 = (ul * fx + ur * (15 - fx)) >> 4;
  byte terp2 = (bl * fx + br * (15 - fx)) >> 4;
  return (terp1 * fy + terp2 * (15 - fy)) >> 4;
}

Not sure what kind of language feature would make that kind of mess easier to deal with.

Sean

"Walter" <walter digitalmars.com> wrote in message news:abk9iv$n2j$3 digitaldaemon.com...
> Where the bit type comes into its own is in very large arrays of bits.
May 11 2002
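Sean's `saturate_cast(byte)` is speculative syntax, but the clamp-before-store behavior it names is easy to pin down. A minimal C++ sketch of the software-emulation fallback he mentions (the function names here are mine, for illustration only):

```cpp
#include <algorithm>

// Clamp an intermediate int result into the 0..255 range of a byte -
// the "tops out" behavior Sean describes for saturated pixel math.
inline unsigned char saturate_to_byte(int v) {
    return static_cast<unsigned char>(std::min(255, std::max(0, v)));
}

// One pixel of his composite() example: blend a and b through mask m
// (0..255), saturating the result instead of letting it wrap.
inline unsigned char blend(unsigned char a, unsigned char b,
                           unsigned char m) {
    return saturate_to_byte((a * m + b * (255 - m)) / 255);
}
```

On x86, MMX/SSE instructions such as PADDUSB perform this clamping in hardware, which is the CPU support Sean alludes to; a compiler with a built-in saturated type could pick between the two forms.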
You're far beyond me in graphics code. But one possibility for easily generating loop code would be to, say, build a simple compiler for a specialized language that generates D code. I use similar kinds of things for building tables. This would be easily modifiable to generate custom code.
May 12 2002
"Walter" <walter digitalmars.com> wrote in message news:abl940$1h50$1 digitaldaemon.com...
> You're far beyond me in graphics code. But one possibility for easily
> generating loop code would be to, say, build a simple compiler for a
> specialized language that generates D code.

This array manipulation is just the basics of graphics code. Everything today uses polygons, not rectangles. I have some ideas for representing triangles in D, but I'll save them till the right time.

Sean
May 12 2002
Sean L. Palmer wrote:
> I wonder if C++ has ever considered a "lossy" modifier ("typedef
> unsigned lossy long color"), which would represent a data type that
> will be dealing with potentially 64-bit values, but if it couldn't do
> that, or it wasn't fast to use longs and you'd selected high
> optimization settings, it could substitute a smaller type.

This seems like such a small incremental win (an occasional increase in portability for certain kinds of code) for the effort it would take to specify and implement the feature that I don't see any language doing it anytime soon. Of course, you could put it in a class via operator overloading; then it's up to the library implementor to port it instead of the compiler implementor.

> I'd actually like to see bounded int types in the language. Saturating
> int, call it whatever you want. Something that the hardware "tops out"
> or clamps to the allowed range before storing into. A lot of CPUs
> support this concept,

Not very many architectures do, and not very generally, though...

> but no language that I know of ever has. It's something the compiler
> could do for you automatically (using software emulation if necessary),
> and it would sure be a convenient timesaver.

Yeah, I called for this one a while ago. You should be able to use it on floats and in arbitrary non-power-of-two ranges as well. Besides arbitrary saturating ranges, arbitrary wrapping ranges would be nice as well.

typedef saturated( 1, 5 ) char review_rating;  // 1 to 5 stars
typedef wrapped( 0.0, 2.0*3.14159265358 ) float phase_angle;

Wrapped types would effectively do a % or fmod every time they were evaluated, unless the compiler could guarantee they were already in-range.

> Have you ever written code to do bilinear filtering? Conceptually it's
> given 2 inputs and a 2d array, does 4 lookups, 2 "frac" modulo 1.0
> operations, and blend the results together using 3 linear
> interpolations...
>
> Not sure what kind of language feature would make that kind of mess
> easier to deal with.

I've been playing with audio DSP stuff lately, and (usually 1D) interpolations and fmod(x,1.0) or f-int(f) are also heavily used therein. I have no trouble with this stuff as part of a standard library, but I'm less comfortable with making it part of the language.

-Russell B
May 12 2002
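The fmod that Russell says a wrapped type would perform can be sketched directly. A hedged C++ version (the `wrap` helper is mine, not from his post; note that plain `fmod` keeps the sign of the dividend, so negative inputs need one correction):

```cpp
#include <cmath>

// Fold v into the half-open range [lo, hi) - the operation a
// hypothetical wrapped(lo, hi) type would apply on every store.
double wrap(double v, double lo, double hi) {
    double span = hi - lo;
    double r = std::fmod(v - lo, span);
    if (r < 0) r += span; // fmod's result has the sign of v - lo
    return lo + r;
}
```

A `phase_angle` like Russell's would call this with lo = 0 and hi = 2π; with small integral arguments the results are exact in double arithmetic, e.g. wrap(7, 0, 5) is 2 and wrap(-1, 0, 5) is 4.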
"Patrick Down" <pat codemoon.com> wrote in message news:Xns920BB36E7D600patcodemooncom 63.105.9.61...
> typedef ubyte bool;

*snif*

And we are back in the bad old C days. This totally sucks and we all know why! We need a byte sized *bool*!

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
May 14 2002
"OddesE" <OddesE_XYZ hotmail.com> wrote in message news:abrr5k$1acl$1 digitaldaemon.com...
> And we are back in the bad old C days. This totally sucks and we all
> know why! We need a byte sized *bool*!

I am forking this off into a new thread, because I think it is important...

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
May 14 2002
That sucks. The compiler should do some behind-the-scenes manipulation there so that it *can* store into that bit, instead of telling the programmer (with an error!) that he shouldn't be returning bits from functions.

Either extend the concept of pointer for pointers to bit, such that the pointer is not just the pointer to the memory but also a combined bit index, so the result bit can be shifted during the exit code for a function from the return value buffer into the correct bit of whatever it points to - a slightly larger pointer. Or have it implemented as a *byte and have the calling code deal with putting the value there and stuffing it back into its rightful place. But don't just disallow it. Exceptions such as this will ruin a language. Just have the compiler Do The Right Thing, which is exactly what you asked, nay commanded, it to do. The "how" is an implementation question. Good programmers avoid using features which are too slow, but whether a particular implementation is acceptable is up to the programmer. I suppose Walter's D compiler could even implement it to return an error; I just don't think it should be part of the language spec that you can't use bits as out parameters.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message news:abiv29$2lhf$1 digitaldaemon.com...
> There can be no pointers to bits. Since out parameters are
> implicitly dereferenced pointers, they cannot be bit.
May 11 2002
The following causes the compiler to generate:

Internal error: ..\ztc\cgcs.c 348

void foo(inout bit[] test)
{
  test[0] = 1;
  test[1] = 0;
  test[2] = 1;
}

int main(char[][] args)
{
  bit[] a;
  a.length = 32;

  for(int i = 0; i < a.length; ++i)
    a[i] = 0;

  foo(a[3..6]); // foo(a) works ok

  for(int i = 0; i < a.length; ++i)
    printf("%d", a[i]);
  printf("\n");

  return 0;
}
May 11 2002
Thanks for reporting this. The bit type is turning out to be a constant special case in the compiler.

"Patrick Down" <pat codemoon.com> wrote in message news:Xns920B8C31915A4patcodemooncom 63.105.9.61...
> The following causes the compiler to generate:
>
> Internal error: ..\ztc\cgcs.c 348
May 11 2002
Of course. It's what the other types are built out of. THE most fundamental type that has storage. I'm not surprised that it's a special case. All the implementation has to do is deal with it as an array contained in a series of fixed-size arrays (the machine word), not as one array. I think you'll find that if you can write bit-packing code such as a dynamic bit array, then SIMD actually uses the same concept; in fact it's just an extension of the concept of ints being fixed-size containers of bits with special operations. SIMD types are just fixed-size containers of bytes or ints or floats with special operations.

Wouldn't it be nice to generalize this entire concept into some kind of SIMD type? You could declare

typedef bit[[32]] machine_word;

or

typedef float[[4]] vector;

then write shorthand

vector a,b,c;
a[[]] += b[[]] * c[[]];

I'm using [[]] as shorthand for "everything in not only this dimension but all smaller dimensions as well". The compiler would be able to generate pretty optimal code using SIMD even if the type you declare isn't exactly available as a basic SIMD register type. For instance, if you declare

typedef float[[6]] vector6;

vector6 a,b,c;
a[[]] += b[[]] * c[[]];

and the compiler only had a 4-float SIMD register available, it could still use that for part of the operation and either another 4-float SIMD register or 2 FPU registers to do the remainder of the work.

Sean

"Walter" <walter digitalmars.com> wrote in message news:abk9iu$n2j$2 digitaldaemon.com...
> Thanks for reporting this. The bit type is turning out to be a constant
> special case in the compiler.
May 11 2002
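Sean's a[[]] += b[[]] * c[[]] shorthand denotes an elementwise multiply-add over the whole array. The scalar loop it would lower to is worth seeing, because modern compilers will typically auto-vectorize a C++ loop of exactly this shape at higher optimization levels - which is the SIMD code generation he wants the language to express directly:

```cpp
#include <cstddef>

// Elementwise a[i] += b[i] * c[i] over n floats: the scalar lowering
// of Sean's proposed whole-array shorthand. A vectorizing compiler can
// map this onto 4-wide SIMD registers, with a scalar remainder loop
// for any n not divisible by the register width (his vector6 case).
void fused_multiply_add(float* a, const float* b, const float* c,
                        std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        a[i] += b[i] * c[i];
}
```

The design tension in the thread is visible here: the information "same operation on every element" is implicit in the loop and must be rediscovered by the optimizer, whereas Sean's notation states it outright.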
I've been thinking about this and it seems that it might be better to extend the slicing mechanism D already has. D already has:

typedef float[4] vector;

vector a,b,c;
a[] += b[] * c[];

which the compiler can easily tell what you intend; but what are the rules for what happens when the sizes of the arrays don't match? What happens if they're multidimensional?

D can also already do this:

a[1..3] += b[1..3] * c[1..3];

If one defined a range like so:

range int r = 1..3;
a[r] += b[r] * c[r];

would work as convenient shorthand. For multidimensional slicing, it could work as follows:

typedef float[4][4] matrix;

matrix a,b,c;
range(int) i = 1..3;
range(int) j = 1..3;
a[i][j] = 2.0f * sqrt(b[i][j] + c[i][j+1]);

So I guess for this to work, ranges would have to be a first-class type in the D language. They could be compile-time constants or variables, or a mixture of the two. The generated code would obviously benefit from the ranges being compile-time constants, but that shouldn't be necessary.

If one writes an expression with side effects using a range in place of a value inside an array access, there should be a few rules: Only one range may be included in any pair of brackets, along with an optional integer offset or scale. The entire expression is evaluated once for each permutation of possibilities in each of the included ranges. If you have a range R from 0..2 (two possibilities, 0 and 1) and a 3D array int x[][][], then x[R][R][R] += 1 would evaluate to eight increments.

The order of evaluation is undefined; however, it would be nice if one could control this behavior so that overlapping moves do the right thing. It should be possible to figure out whether to go forward or backward for each range by evaluating the dependencies in the graph of information flow.

I can't see how it would handle, say, a matrix transpose gracefully though:

typedef float[4][4] matrix;

matrix a;
range(int) i = 0..4;
range(int) j = 0..4;
swap(a[i][j], a[j][i]);

Writing memcpy() is hard on today's processors. This job will be far more difficult, as what's needed is a program that generates optimized code to execute a loop that deals with alignment and direction and multiple operations, instead of writing the one program yourself which only does copying. I know memset is about half as difficult to write as memcpy... this sounds like 4 or 5 times harder than writing an optimized memcpy. The unoptimized version is still useful though, and it's not hard to write at all. Walter already has most of what's needed.

Sean

"Sean L. Palmer" <spalmer iname.com> wrote in message news:abkl59$vpu$1 digitaldaemon.com...
> Wouldn't it be nice to generalize this entire concept into some kind of
> SIMD type? You could declare
>
> typedef bit[[32]] machine_word;
>
> or
>
> typedef float[[4]] vector;
>
> then write shorthand
>
> vector a,b,c;
> a[[]] += b[[]] * c[[]];
May 12 2002
"Sean L. Palmer" <spalmer iname.com> wrote in message news:abl6cj$1f1g$1 digitaldaemon.com...
> typedef float[4] vector;
>
> vector a,b,c;
> a[] += b[] * c[];
>
> which the compiler can easily tell what you intend, but what are the
> rules for what happens when the sizes of the arrays don't match? What
> happens if they're multidimensional?

That would be a runtime error.

> D can also already do this:
>
> a[1..3] += b[1..3] * c[1..3];
>
> If one defined a range like so:
>
> range int r = 1..3;
> a[r] += b[r] * c[r];
>
> would work as convenient shorthand.

I never thought of that.

> For multidimensional slicing, it could work as follows:
>
> typedef float[4][4] matrix;
>
> matrix a,b,c;
> range(int) i = 1..3;
> range(int) j = 1..3;
> a[i][j] = 2.0f * sqrt(b[i][j] + c[i][j+1]);

D doesn't currently allow multidimensional slicing, though it's an obvious future extension.

> The order of evaluation is undefined however it would be nice if one
> could control this behavior so that overlapping moves do the right
> thing. It should be possible to figure out whether to go forward or
> backward for each range by evaluating the dependencies in the graph of
> information flow.

The best thing to do with overlapping ranges is disallow them. That cleans up the semantics, and enables better code generation.
"Walter" <walter digitalmars.com> wrote in news:abm4c0$25l6$1 digitaldaemon.com:

> The best thing to do with overlapping ranges is disallow them. That
> cleans up the semantics, and enables better code generation.

I understand your reasons for disallowing overlapping ranges but it seems to make the language less consistent. Would it be a problem to have a small check at the beginning of a copy to check for overlapping ranges and switch between two different copy routines based on the result?
May 12 2002
"Patrick Down" <pat codemoon.com> wrote in message news:Xns920C89BAF31DApatcodemooncom 63.105.9.61...

> "Walter" <walter digitalmars.com> wrote in
> news:abm4c0$25l6$1 digitaldaemon.com:
>
>> The best thing to do with overlapping ranges is disallow them. That
>> cleans up the semantics, and enables better code generation.
>
> I understand your reasons for disallowing overlapping ranges but it
> seems to make the language less consistent. Would it be a problem to
> have a small check at the beginning of a copy to check for overlapping
> ranges and switch between two different copy routines based on the
> result?

That's how memcpy is usually implemented. If everything about the addresses involved is known at compile time, the compiler can select between several different code snippet templates to generate the code, ranging from completely unoptimized all the way down to just completely unrolling the loop.

I know this feels like one of the basics, Walter, and I know you're itching to move on to bigger and better things, but the fact that moving blocks of data is one of the basics of computing indicates that it needs robust language support, not special cases.

Sean
May 12 2002
Sean L. Palmer wrote:

> "Patrick Down" <pat codemoon.com> wrote in message
> news:Xns920C89BAF31DApatcodemooncom 63.105.9.61...
>
>> I understand your reasons for disallowing overlapping ranges but it
>> seems to make the language less consistent. Would it be a problem to
>> have a small check at the beginning of a copy to check for overlapping
>> ranges and switch between two different copy routines based on the
>> result?
>
> That's how memcpy is usually implemented.

ITYM memmove(); C standard memcpy() has undefined behavior for overlapping ranges, and I'd consider it a violation of the implicit performance contract for memcpy() to check the ranges.

-RB
May 12 2002
"Sean L. Palmer" <spalmer iname.com> wrote in message news:abmetg$2e7i$1 digitaldaemon.com...

> I know this feels like one of the basics, Walter, and I know you're
> itching to move on to bigger and better things, but the fact that
> moving blocks of data is one of the basics of computing indicates that
> it needs robust language support, not special cases.

You're right, it is a basic thing, and I have not just blown it off. I use memcpy() very regularly, and I depend on it. With a few observations, I think you'll see my reasoning for disallowing overlapping copies, even if you don't agree with it:

1) 99% of memory copies are not overlapping.
2) Overall program performance tends to be sensitive to memcpy speed.
3) Putting the test in for each copy will sacrifice performance and add bloat to satisfy only 1% of the cases.
4) I've seen memcpy code that *relied* on the wrong-way overlapping copy behavior.
5) Because of (4), I think if one needs to do an overlapping copy, one does need to decide just what it means to do an overlapping copy, and explicitly code appropriately. Leaving it implicit can be misleading.
6) In debug builds, checks for overlapping are inserted, so one will not inadvertently fall prey to copy direction bugs.
7) And most importantly, it gets rid of the "aliasing" problem C has that prevents array operations from being compiled as efficiently as FORTRAN.
May 14 2002
"Walter" <walter digitalmars.com> wrote in message news:abqfqk$574$1 digitaldaemon.com...

> You're right, it is a basic thing, and I have not just blown it off. I
> use memcpy() very regularly, and I depend on it. With a few
> observations, I think you'll see my reasoning for disallowing
> overlapping copies, even if you don't agree with it:
>
> 1) 99% of memory copies are not overlapping.
> 2) overall program performance tends to be sensitive to memcpy speed.
> 3) putting the test in for each copy will sacrifice performance and add
> bloat to satisfy only 1% of the cases.

You are right. I suppose. <g>

> 4) I've seen memcpy code that *relied* on the wrong way overlapping
> copy behavior.

I still don't like special cases, but I have actually seen code that used a wrong-way memcpy before too (though why they didn't just write a pattern fill routine I don't know...). When writing overlapping memcpy, 99% of the time you'll want to do it the "right" way. Leaving it explicit leaves open a possibility for the programmer to accidentally write bugs in what would otherwise be a perfectly formed compiler-generated memory shift. Sample bug: If I scroll up it works, but if I scroll down it destroys all entries with copies of the first entry.

> 5) Because of (4), I think if one needs to do an overlapping copy, one
> does need to decide just what it means to do an overlapping copy, and
> explicitly code appropriately. Leaving it implicit can be misleading.

They can always code an explicit loop if they want explicit control, i.e. if they want to copy the "wrong" way.

> 6) In debug builds, checks for overlapping are inserted, so one will
> not inadvertently fall prey to copy direction bugs.

Won't inadvertently fall prey to some bugs at least. I hope your user's program doesn't die when his client tries to copy a record to clipboard, then later tries to paste it back onto the original data... if the cut is lazy you may very well end up with a "copying range exactly onto itself" no-op which is caught by a runtime check and causes the program to die. Maybe if the start and end of the ranges of memory are identical it should fall thru as a no-op instead of a runtime assert? But that's more runtime checks needed in the debug build.

> 7) And most importantly, it gets rid of the "aliasing" problem C has
> that prevents array operations from being compiled as efficiently as
> FORTRAN.

I've never tried to code a highly optimized backend for arrays. Can you describe this problem in more detail? I'm curious to know what the compiler is probably doing under there... I know aliasing more in terms of bugs that happen when you pass the same argument in as both an input and an output (via two different parameters). If the implementation relies on the two objects not being the same, you get a bug. It prevents certain optimizations from being 100% safe. I prefer to code without aliasing when possible so I can turn up the optimization more. There are many ways to have aliasing; I'm just curious how potential aliasing affects the array optimization problem.

Sean
May 14 2002
"Sean L. Palmer" <spalmer iname.com> wrote in message news:abqkno$9cv$1 digitaldaemon.com...

> I've never tried to code a highly optimized backend for arrays. Can you
> describe this problem in more detail? I'm curious to know what the
> compiler is probably doing under there...

Let's say you are trying to compute a=b+c, where a, b and c are arrays. In C, you'd write:

    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];

If a CPU has parallel execution capabilities, they can be exploited to optimize code (such as a vector add instruction). Also, multiple iterations of the loop can be interleaved with the stores delayed. But in C these optimizations cannot be done, because if the arrays overlap then the result of one iteration of the loop can be dependent on the result of the previous iteration.
May 14 2002
"Walter" <walter digitalmars.com> wrote in news:abrcj6$tsq$1 digitaldaemon.com:

> Let's say you are trying to compute a=b+c, where a, b and c are arrays.
> In C, you'd write:
>
>     for (i = 0; i < n; i++)
>         a[i] = b[i] + c[i];
>
> If a CPU has parallel execution capabilities, they can be exploited to
> optimize code (such as a vector add instruction). Also, multiple
> iterations of the loop can be interleaved with the stores delayed. But
> in C these optimizations cannot be done, because if the arrays overlap
> then the result of one iteration of the loop can be dependent on the
> result of the previous iteration.

So is

    a[] = a[] + b[];

also going to be classified as an overlapped copy?
May 14 2002
"Patrick Down" <pat codemoon.com> wrote in message news:Xns920E81D566F62patcodemooncom 63.105.9.61...

> "Walter" <walter digitalmars.com> wrote in
> news:abrcj6$tsq$1 digitaldaemon.com:
>
>> If a CPU has parallel execution capabilities, they can be exploited to
>> optimize code (such as a vector add instruction). Also, multiple
>> iterations of the loop can be interleaved with the stores delayed. But
>> in C these optimizations cannot be done, because if the arrays overlap
>> then the result of one iteration of the loop can be dependent on the
>> result of the previous iteration.
>
> So is
>
>     a[] = a[] + b[];
>
> also going to be classified as an overlapped copy?

Good question. It's not an overlapped copy, because each array element is not dependent on any other array element in the same array.
May 14 2002
And if you can't guarantee that they don't overlap, then the optimization can't be done safely. Gotcha.

Usually there'll be one of two situations: either the compiler can tell that they don't overlap at compile time, or it can't. If it can't, it's worth spending a few instructions checking so it can do the optimized inner loop in the likely situation that they don't in fact overlap. It's probably gotta check for alignment anyway to use SIMD, so it may as well check overlap as well. If they overlap, or if the alignment sucks, do the unoptimized loop. If the array sizes are large, the overhead of the checks will be negligible in comparison. If the array sizes are small, well, too bad, there's nothing we can do to help.

Or just disallow overlap. Problem solved, albeit with a sledgehammer.

Sean

"Walter" <walter digitalmars.com> wrote in message news:abrcj6$tsq$1 digitaldaemon.com...

> Let's say you are trying to compute a=b+c, where a, b and c are arrays.
> In C, you'd write:
>
>     for (i = 0; i < n; i++)
>         a[i] = b[i] + c[i];
>
> If a CPU has parallel execution capabilities, they can be exploited to
> optimize code (such as a vector add instruction). Also, multiple
> iterations of the loop can be interleaved with the stores delayed. But
> in C these optimizations cannot be done, because if the arrays overlap
> then the result of one iteration of the loop can be dependent on the
> result of the previous iteration.
May 14 2002
"Walter" <walter digitalmars.com> wrote in message news:abm4c0$25l6$1 digitaldaemon.com...

>> The order of evaluation is undefined however it would be nice if one
>> could control this behavior so that overlapping moves do the right
>> thing. It should be possible to figure out whether to go forward or
>> backward for each range by evaluating the dependencies in the graph of
>> information flow.
>
> The best thing to do with overlapping ranges is disallow them. That
> cleans up the semantics, and enables better code generation.

I'd weigh carefully before making it illegal. People use memcpy and strcpy with overlapping all the time. I'd strongly suggest trying to make the compiler "do the right thing". I suppose someone could always write their own (probably non-optimal) move function that deals with this.

There are simple rules for it for 1d overlapping array copies:

If source doesn't overlap destination, do it however you please, i.e. whatever's fastest.
If source begin is equal to destination begin, nothing needs to be done so long as you're just copying (math ops would still need to be done).
If source begin is less than destination begin, iterate from the end to the start.
If source begin is greater than destination begin, iterate from the start to the end.

I think these rules could be easily extended to multiple dimensions, by applying the same rule separately for each dimension.

If you ignore these rules, and do it the other way, what it actually accomplishes is a "pattern fill" where the pattern size is equal to the offset between the starts of the ranges. But that's not what the user asked for; they wanted the entire range copied intact.

One could see problems arise when the copy was to be done using a larger memory unit than the size of the type, such as copy using 128-bit blocks. But those kinds of copies usually have to be aligned anyway, so the compiler would have to know the addresses at compile time, or a "copy dispatch" wrapper would have had to decide that the large block move is ok. If not, it does the slower "one item at a time" move.

Here again we have an arbitrary decision to limit a language feature just because it makes the compiler implementation a bit easier.
At least with your solution we don't end up with a language that doesn't specify the behavior ("implementation-defined" ala C++) ;) Seancontrol this behavior so that overlapping moves do the right thing. It should be possible to figure out whether to go forward or backward foreachrange by evaluating the dependencies in the graph of information flow.The best thing to do with overlapping ranges is disallow them. That cleans up the semantics, and enables better code generation.
May 12 2002
"Sean L. Palmer" <spalmer iname.com> wrote in message news:abl6cj$1f1g$1 digitaldaemon.com...

> I've been thinking about this and it seems that it might be better to
> extend the slicing mechanism D already has. D already has:
>
> typedef float[4] vector;
> vector a,b,c;
> a[] += b[] * c[];
>
> which the compiler can easily tell what you intend, but what are the
> rules for what happens when the sizes of the arrays don't match? What
> happens if they're multidimensional?
>
> D can also already do this:
>
> a[1..3] += b[1..3] * c[1..3];
>
> If one defined a range like so:
>
> range int r = 1..3;
> a[r] += b[r] * c[r];
>
> would work as convenient shorthand.

Yup. That is one of the nice things you can do to extend the syntax once you have ranges. Another is to simplify the common case of loop variables incrementing by 1:

range int r = 1..10;
for (r)
. . .

etc. And, since they are defined at compile time, the range variable name can be used to specify the size of an array. There are a lot of advantages in maintainability, since you can change the range in one place and it automatically gets applied in all the places.

-- 
- Stephen Fuld
e-mail address disguised to prevent spam
May 12 2002
Stephen Fuld wrote:

> "Sean L. Palmer" <spalmer iname.com> wrote in message
> news:abl6cj$1f1g$1 digitaldaemon.com...
>
>> If one defined a range like so:
>>
>> range int r = 1..3;
>> a[r] += b[r] * c[r];
>>
>> would work as convenient shorthand.
>
> Yup. That is one of the nice things you can do to extend the syntax
> once you have ranges. Another is to simplify the common case of loop
> variables incrementing by 1:
>
> range int r = 1..10;
> for (r)
> . . .

Eeek! I would beg for a distinct keyword like "foreach" in both those cases, to make it totally clear that these were vectorized/iterated/looped operations in cases where the range definition was far away from the range usage:

range int r = 1..3;
foreach (r)
{
    a[r] += b[r] * c[r];
}

Other than that, it's good.

-RB
May 12 2002