digitalmars.D - RFC: std.uuid
- Johannes Pfau (30/30) Dec 22 2011 I've finished the port of boost.uuid to D and I'd hope to get some initi...
- Vladimir Panteleev (4/11) Dec 22 2011 Would it be hard to reimplement the ragel part using string
- Johannes Pfau (8/20) Dec 22 2011 It wouldn't be too difficult. Ragel is only used to parse the text form ...
- Vladimir Panteleev (8/30) Dec 22 2011 Using 3rd-party technology such as source preprocessors makes
- Andrei Alexandrescu (4/33) Dec 22 2011 Agreed. I think it would be wonderful to achieve ragel capabilities with...
- Johannes Pfau (16/26) Dec 23 2011 I must be missing something. Your talking of mixins? Ragel is not a
- Andrei Alexandrescu (4/12) Dec 23 2011 I'm talking about ragel. D should have enough capabilities to generate
- Johannes Pfau (5/23) Dec 23 2011 I meant especially this sentence: "Considering that the language provide...
- Johannes Pfau (4/16) Dec 23 2011 I uploaded a new version at
- Piotr Szturmaj (22/33) Dec 23 2011 I want to contribute it to Phobos. I will be working on a project which
- Johannes Pfau (8/52) Dec 23 2011 I read this discussion as well. But std.uuid really benefits from having...
- Piotr Szturmaj (9/10) Dec 23 2011 Weakly pure, yes - but for what?. Btw. I just gave a try at compile time...
- Jonathan M Davis (4/8) Dec 23 2011 In general, if a function _can_ be pure, it _should_ be pure. If it can ...
- Piotr Szturmaj (3/10) Dec 23 2011 Yes, Johannes probably want to mark uuid hash gen as pure. I just wanted...
- Jonathan M Davis (7/9) Dec 23 2011 We really need to find a way for the C memory functions to be considered...
- Vladimir Panteleev (7/22) Dec 24 2011 Where does your code use memcpy? I see one mention in the
- Piotr Szturmaj (3/24) Dec 24 2011 I converted memcpy calls to array copy but it become about 1 Mbps slower...
- Vladimir Panteleev (4/15) Dec 24 2011 I guess array copy is currently a runtime call rather than an
- Vladimir Panteleev (4/11) Dec 24 2011 That's strange. I've tried optimizing some of my code today, and
- Piotr Szturmaj (4/13) Dec 26 2011 Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is
- Vladimir Panteleev (7/9) Dec 28 2011 I'd be more interested in seeing the code.
- Piotr Szturmaj (4/12) Jan 06 2012 Sorry for late answer. For memcpy cases code is the same as in my github...
- Vladimir Panteleev (9/20) Jan 06 2012 I haven't looked at the disassembly yet, but I'd suggest to
I've finished the port of boost.uuid to D and I'd hope to get some initial feedback.

Quoting the module documentation:

This is a port of boost.uuid from the boost project with some minor additions and API changes for a more D-like API. A UUID, or Universally unique identifier, is intended to uniquely identify information in a distributed environment without significant central coordination. It can be used to tag objects with very short lifetimes, or to reliably identify very persistent objects across a network [...]

Documentation: http://dl.dropbox.com/u/24218791/d/src/uuid.html
Source Code:
https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.rl
https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d

There's one special thing about this module: It uses the ragel (http://www.complang.org/ragel/) state machine compiler. I hope this is not a problem for phobos, ragel doesn't introduce any runtime dependencies and has no effect on licensing. There's also no need to integrate ragel in the build process. We can publish a pregenerated .d file, which should be updated manually whenever changes to uuid.rl are made.

This module also depends on Piotr Szturmaj's crypto library to generate version 3 and 5 UUIDs. The code for this is written, but wouldn't be included in phobos until official SHA1 and MD5 implementations are in phobos. Swapping the MD5/SHA1 implementations for a different implementation should be very easy.

Some things I'd especially like feedback for:
* I'd really like to get suggestions for type/function names. Should the UUID struct be UUID/uuid/Uuid ?
* the names nameMD5UUID/nameSHAUUID look especially ugly. ideas?
* comments on typos/language etc
Dec 22 2011
On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau wrote:
> There's one special thing about this module: It uses the ragel (http://www.complang.org/ragel/) state machine compiler. I hope this is not a problem for phobos, ragel doesn't introduce any runtime dependencies and has no effect on licensing. There's also no need to integrate ragel in the build process. We can publish a pregenerated .d file, which should be updated manually whenever changes to uuid.rl are made.

Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.
Dec 22 2011
Vladimir Panteleev wrote:
> On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau wrote:
>> There's one special thing about this module: It uses the ragel (http://www.complang.org/ragel/) state machine compiler. I hope this is not a problem for phobos, ragel doesn't introduce any runtime dependencies and has no effect on licensing. There's also no need to integrate ragel in the build process. We can publish a pregenerated .d file, which should be updated manually whenever changes to uuid.rl are made.
>
> Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.

It wouldn't be too difficult. Ragel is only used to parse the text form of uuids. As that's a pretty simple task, such a parser could be written manually. But I also have an almost complete HTTP header parser/formatter implementation using ragel which I wanted to propose for phobos eventually. And I'm not going to rewrite those, so I hoped std.uuid could clear the way for ragel based code in phobos ;-)
Dec 22 2011
On Thursday, 22 December 2011 at 23:30:06 UTC, Johannes Pfau wrote:
> Vladimir Panteleev wrote:
>> On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau wrote:
>>> There's one special thing about this module: It uses the ragel (http://www.complang.org/ragel/) state machine compiler. I hope this is not a problem for phobos, ragel doesn't introduce any runtime dependencies and has no effect on licensing. There's also no need to integrate ragel in the build process. We can publish a pregenerated .d file, which should be updated manually whenever changes to uuid.rl are made.
>>
>> Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.
>
> It wouldn't be too difficult. Ragel is only used to parse the text form of uuids. As that's a pretty simple task, such a parser could be written manually. But I also have a almost complete HTTP header parser/formatter implementation using ragel which I wanted to propose for phobos eventually. And I'm not going to rewrite those, so I hoped std.uuid could clear the way for ragel based code in phobos ;-)

Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature, it seems like a redundant complication. Parsing/formatting HTTP headers isn't exactly rocket-science, either.
Dec 22 2011
On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
> On Thursday, 22 December 2011 at 23:30:06 UTC, Johannes Pfau wrote:
>> Vladimir Panteleev wrote:
>>> On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau wrote:
>>>> There's one special thing about this module: It uses the ragel (http://www.complang.org/ragel/) state machine compiler. I hope this is not a problem for phobos, ragel doesn't introduce any runtime dependencies and has no effect on licensing. There's also no need to integrate ragel in the build process. We can publish a pregenerated .d file, which should be updated manually whenever changes to uuid.rl are made.
>>>
>>> Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.
>>
>> It wouldn't be too difficult. Ragel is only used to parse the text form of uuids. As that's a pretty simple task, such a parser could be written manually. But I also have a almost complete HTTP header parser/formatter implementation using ragel which I wanted to propose for phobos eventually. And I'm not going to rewrite those, so I hoped std.uuid could clear the way for ragel based code in phobos ;-)
>
> Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature, it seems like a redundant complication. Parsing/formatting HTTP headers isn't exactly rocket-science, either.

Agreed. I think it would be wonderful to achieve ragel capabilities with compile-time execution. If we need ragel in D we've failed.

Andrei
Dec 22 2011
Andrei Alexandrescu wrote:
> On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
>> Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature

I must be missing something. You're talking of mixins? Ragel is not a macro/mixin system, it's a state machine compiler, compiling a 20-line machine definition into 700 lines of fast, goto-based code? Or are you talking about the new regex? I seriously doubt that any regex implementation can compare with ragel generated code in terms of performance. (Not important for the uuid parser, but if you ever want to implement a fast http server...)

>> it seems like a redundant complication. Parsing/formatting HTTP headers isn't exactly rocket-science, either.

Nope, but it gets exhausting if you write all parsers by hand (and it's more probable that you introduce bugs). Mongrel (ruby http server) uses ragel, lighttpd2 (small, fast webserver) does as well. Also, how can writing a state machine definition which looks exactly like the BNF in the spec be more difficult than writing a parser manually?

> Agreed. I think it would be wonderful to achieve ragel capabilities with compile-time execution. If we need ragel in D we've failed.
>
> Andrei

I agree that a state machine compiler in D using ctfe would be awesome, but currently there is none. If ragel based code isn't acceptable for phobos at all, I'll just rewrite the uuid parsers using for, switch and to!ubyte.
Dec 23 2011
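For readers wondering what a hand-written replacement for the ragel parser might look like, here is a minimal sketch using only `foreach` and `switch`. It is not the code from uuid.d: the names `hexValue` and `parseUuidText` are hypothetical, it accepts only the canonical 8-4-4-4-12 text form, and it converts hex digits with its own helper rather than `to!ubyte`.

```d
/// Hypothetical helper: value of a single hexadecimal digit.
ubyte hexValue(char c) pure @safe
{
    switch (c)
    {
        case '0': .. case '9': return cast(ubyte)(c - '0');
        case 'a': .. case 'f': return cast(ubyte)(c - 'a' + 10);
        case 'A': .. case 'F': return cast(ubyte)(c - 'A' + 10);
        default: throw new Exception("invalid hex digit");
    }
}

/// Hypothetical parser for the canonical 36-character text form.
ubyte[16] parseUuidText(const(char)[] text) pure @safe
{
    if (text.length != 36)
        throw new Exception("expected 36 characters");

    ubyte[16] result;
    size_t pos = 0;
    foreach (i; 0 .. 16)
    {
        // In the 8-4-4-4-12 layout a dash precedes bytes 4, 6, 8 and 10.
        if (i == 4 || i == 6 || i == 8 || i == 10)
        {
            if (text[pos] != '-')
                throw new Exception("expected '-'");
            ++pos;
        }
        // Two hex digits make one byte.
        result[i] = cast(ubyte)((hexValue(text[pos]) << 4) | hexValue(text[pos + 1]));
        pos += 2;
    }
    return result;
}

// The parser uses no casts or C calls, so it can also run at compile time.
enum ubyte[16] sample = parseUuidText("8ab3060e-2cba-4f23-b74c-b52db3bdfb46");
static assert(sample[0] == 0x8a && sample[15] == 0x46);
```

A fixed-layout format like this needs no real state machine, which is presumably why dropping ragel for this one module was cheap; the HTTP header case mentioned above is a different matter.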
On 12/23/11 2:40 AM, Johannes Pfau wrote:
> Andrei Alexandrescu wrote:
>> On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
>>> Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature
>
> I must be missing something. You're talking of mixins? Ragel is not a macro/mixin system, it's a state machine compiler, compiling a 20-line machine definition into 700 lines of fast, goto-based code?

I'm talking about ragel. D should have enough capabilities to generate 700 lines of fast code from a 20-line spec.

Andrei
Dec 23 2011
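To make the idea of generating code from a short spec concrete, here is a small sketch of the CTFE plus string mixin technique. It is far simpler than what ragel does and is only an illustration: `generateMatcher`, the one-letter pattern language ('h' for a hex digit, anything else a literal character) and `isUuidText` are all invented for this example, and the generator does not escape quote or backslash characters in the pattern.

```d
import std.conv : to;

/// Runs at compile time and returns the source code of a validator
/// function with one switch case per input position.
string generateMatcher(string name, string pattern)
{
    string code = "bool " ~ name ~ "(const(char)[] s) pure nothrow @safe\n{\n"
        ~ "    if (s.length != " ~ to!string(pattern.length) ~ ") return false;\n"
        ~ "    foreach (i, c; s)\n    {\n        switch (i)\n        {\n";
    foreach (i, p; pattern)
    {
        code ~= "        case " ~ to!string(i) ~ ": ";
        if (p == 'h')
            code ~= "if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')"
                ~ " || (c >= 'A' && c <= 'F'))) return false; break;\n";
        else
            code ~= "if (c != '" ~ p ~ "') return false; break;\n";
    }
    code ~= "        default: return false;\n        }\n    }\n    return true;\n}\n";
    return code;
}

// The generated source is compiled as if it had been written by hand.
mixin(generateMatcher("isUuidText", "hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh"));

static assert(isUuidText("8ab3060e-2cba-4f23-b74c-b52db3bdfb46"));
static assert(!isUuidText("8ab3060e+2cba-4f23-b74c-b52db3bdfb46"));
```

The one-line pattern expands into a 36-case switch, which shows the mechanism being discussed. It does not show that a full state machine compiler with ragel's feature set or performance is easy to build this way.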
Andrei Alexandrescu wrote:
> On 12/23/11 2:40 AM, Johannes Pfau wrote:
>> Andrei Alexandrescu wrote:
>>> On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
>>>> Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature
>>
>> I must be missing something. You're talking of mixins? Ragel is not a macro/mixin system, it's a state machine compiler, compiling a 20-line machine definition into 700 lines of fast, goto-based code?
>
> I'm talking about ragel. D should have enough capabilities to generate 700 lines of fast code from a 20-line spec.
>
> Andrei

I meant especially this sentence: "Considering that the language provides nearly the same thing as a fully-supported feature". What is this feature? I can only think of string mixins, which certainly allow implementing a state machine compiler in ctfe, but that's not "nearly the same".
Dec 23 2011
Vladimir Panteleev wrote:
> On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau wrote:
>> There's one special thing about this module: It uses the ragel (http://www.complang.org/ragel/) state machine compiler. I hope this is not a problem for phobos, ragel doesn't introduce any runtime dependencies and has no effect on licensing. There's also no need to integrate ragel in the build process. We can publish a pregenerated .d file, which should be updated manually whenever changes to uuid.rl are made.
>
> Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.

I uploaded a new version at https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d which gets rid of the ragel dependency.
Dec 23 2011
Johannes Pfau wrote:
> I've finished the port of boost.uuid to D and I'd hope to get some initial feedback.

Very nice. I will need UUIDs in one of my D projects :)

> This module also depends on Piotr Szturmaj's crypto library to generate version 3 and 5 UUIDs. The code for this is written, but wouldn't be included in phobos until official SHA1 and MD5 implementations are in phobos. Swapping the MD5/SHA1 implementations for a different implementation should be very easy.

I want to contribute it to Phobos. I will be working on a project which will make extensive use of cryptography. So if I'm about to write D crypto code anyway, I thought it might be better to contribute it to std (if everyone would like it). There are a couple of issues though:
* there is a pull request with a SHA1 implementation using SSSE3. But it is only SHA1. My implementation contains all SHA flavors up to SHA-512, without SHA-0 (which is broken). I think we should combine these implementations to get the best of both.
* comments about side-channel vulnerability. I think each crypto primitive should have a note in the docs saying whether it's vulnerable or not. That should be enough IMHO. It is impractical to make it safe on all platforms - no single general purpose crypto library is 100% safe against side channel attacks.
* it is not finished yet. Currently there are no ciphers, only hashes.
* after reading some posts in the "Early std.crypto" thread I don't know if it is still welcome in Phobos. I need a "green light" first.

> Some things I'd especially like feedback for:
> * I'd really like to get suggestions for type/function names. Should the UUID struct be UUID/uuid/Uuid ?

UUID is the standard name. It is an abbreviation similar to "UTF", which in Phobos is uppercase.

> * the names nameMD5UUID/nameSHAUUID look especially ugly. ideas?

uuidMD5 / uuidSHA1 ?
Dec 23 2011
Piotr Szturmaj wrote:
> Johannes Pfau wrote:
>> I've finished the port of boost.uuid to D and I'd hope to get some initial feedback.
>
> Very nice. I will need UUIDs in one of my D projects :)
>
>> This module also depends on Piotr Szturmaj's crypto library to generate version 3 and 5 UUIDs. The code for this is written, but wouldn't be included in phobos until official SHA1 and MD5 implementations are in phobos. Swapping the MD5/SHA1 implementations for a different implementation should be very easy.
>
> I want to contribute it to Phobos. I will be working on a project which will make extensive use of cryptography. So if I'm about to write D crypto code anyway, I thought it might be better to contribute it to std (if everyone would like it). There are a couple of issues though:
> * there is a pull request with a SHA1 implementation using SSSE3. But it is only SHA1. My implementation contains all SHA flavors up to SHA-512, without SHA-0 (which is broken). I think we should combine these implementations to get the best of both.
> * comments about side-channel vulnerability. I think each crypto primitive should have a note in the docs saying whether it's vulnerable or not. That should be enough IMHO. It is impractical to make it safe on all platforms - no single general purpose crypto library is 100% safe against side channel attacks.
> * it is not finished yet. Currently there are no ciphers, only hashes.
> * after reading some posts in the "Early std.crypto" thread I don't know if it is still welcome in Phobos. I need a "green light" first.

I read this discussion as well. But std.uuid really benefits from having sha and md5 at compile time, so using a C library as proposed in that thread would be bad for std.uuid. I hope you'll get your crypto code into phobos ;-)

Related question to the SHA/MD5 hash functions: could those be pure?

>> Some things I'd especially like feedback for:
>> * I'd really like to get suggestions for type/function names. Should the UUID struct be UUID/uuid/Uuid ?
>
> UUID is the standard name. It is an abbreviation similar to "UTF", which in Phobos is uppercase.

OK

>> * the names nameMD5UUID/nameSHAUUID look especially ugly. ideas?
>
> uuidMD5 / uuidSHA1 ?

that's definitely better. I think I'll use that.
Dec 23 2011
Johannes Pfau wrote:
> Related question to the SHA/MD5 hash functions: could those be pure?

Weakly pure, yes - but for what? Btw. I just gave compile-time evaluation of these hashes a try. I got rid of memcpy, but then endianness issues arise. According to the CTFE docs: "non-portable casts (eg, from int[] to float[]), including casts which depend on endianness, are not permitted." The sad part is that the hash source code involves endianness. So there will be no compile-time hash support unless the compiler allows casting from uint[] to ubyte[] or uint* to ubyte*.
Dec 23 2011
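One way around the cast restriction quoted above is to avoid reinterpreting memory at all and to assemble words from bytes with shifts. The sketch below illustrates that technique only; it is not taken from the crypto library under discussion, and the names `loadBigEndian` and `storeBigEndian` are made up. Whether a real hash implementation stays fast enough when written this way is a separate question.

```d
/// Packs four bytes into a 32-bit word, most significant byte first.
/// Only arithmetic is used, so the result does not depend on host endianness.
uint loadBigEndian(const(ubyte)[] bytes) pure nothrow @safe
{
    return (cast(uint) bytes[0] << 24)
         | (cast(uint) bytes[1] << 16)
         | (cast(uint) bytes[2] << 8)
         |  cast(uint) bytes[3];
}

/// Splits a 32-bit word into four bytes, most significant byte first.
ubyte[4] storeBigEndian(uint value) pure nothrow @safe
{
    ubyte[4] result;
    result[0] = cast(ubyte)(value >> 24);
    result[1] = cast(ubyte)(value >> 16);
    result[2] = cast(ubyte)(value >> 8);
    result[3] = cast(ubyte) value;
    return result;
}

// No uint[] to ubyte[] cast is involved, so both functions run in CTFE.
enum uint word = loadBigEndian([0x12, 0x34, 0x56, 0x78]);
enum ubyte[4] expectedBytes = [0x12, 0x34, 0x56, 0x78];
static assert(word == 0x12345678);
static assert(storeBigEndian(word) == expectedBytes);
```

Because the same code path is valid at run time and at compile time, a hash built on helpers like these would not need a separate CTFE branch.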
On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
> Johannes Pfau wrote:
>> Related question to the SHA/MD5 hash functions: could those be pure?
>
> Weakly pure, yes - but for what?

In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.

- Jonathan M Davis
Dec 23 2011
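The distinction between weakly and strongly pure functions mentioned in this exchange can be shown with a small sketch. Both functions below are hypothetical examples, not code from std.uuid or the crypto library.

```d
/// Weakly pure: it writes through the `dst` slice it was given,
/// but reads and writes no global state.
void xorInto(ubyte[] dst, const(ubyte)[] src) pure nothrow
{
    foreach (i, b; src)
        dst[i] ^= b;
}

/// Strongly pure: it takes only a value parameter, so equal arguments
/// always give equal results. It may still call the weakly pure helper,
/// because the helper only mutates data local to this function.
ubyte[4] masked(ubyte key) pure nothrow
{
    ubyte[4] result = [1, 2, 3, 4];
    ubyte[4] keys = key;            // every element is set to `key`
    xorInto(result[], keys[]);
    return result;
}

enum ubyte[4] expectedMask = [0xFE, 0xFD, 0xFC, 0xFB];
static assert(masked(0xFF) == expectedMask);
```

If `xorInto` were not marked `pure`, `masked` could not be marked `pure` either, which is the restriction on callers described above.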
Jonathan M Davis wrote:
> On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
>> Johannes Pfau wrote:
>>> Related question to the SHA/MD5 hash functions: could those be pure?
>>
>> Weakly pure, yes - but for what?
>
> In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.

Yes, Johannes probably wants to mark uuid hash gen as pure. I just wanted to know if it's something important, as my code used memcpy, which is impure.
Dec 23 2011
On Saturday, December 24, 2011 01:31:46 Piotr Szturmaj wrote:
> Yes, Johannes probably wants to mark uuid hash gen as pure. I just wanted to know if it's something important, as my code used memcpy, which is impure.

We really need to find a way for the C memory functions to be considered pure like the D ones are, short of having to use casts inside functions using them. Appender has the same problem for the same reason, I believe, which prevents a lot of functions in Phobos from being pure when they should be able to be.

- Jonathan M Davis
Dec 23 2011
On Saturday, 24 December 2011 at 00:31:43 UTC, Piotr Szturmaj wrote:
> Jonathan M Davis wrote:
>> On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
>>> Johannes Pfau wrote:
>>>> Related question to the SHA/MD5 hash functions: could those be pure?
>>>
>>> Weakly pure, yes - but for what?
>>
>> In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.
>
> Yes, Johannes probably wants to mark uuid hash gen as pure. I just wanted to know if it's something important, as my code used memcpy, which is impure.

Where does your code use memcpy? I see one mention in the comments, but none in the code.

Anyway, I believe you can do without memcpy by using array copy? Array copy might even be faster, since memcpy is not a DMD compiler intrinsic like in many C/C++ compilers.
Dec 24 2011
Vladimir Panteleev wrote:
> On Saturday, 24 December 2011 at 00:31:43 UTC, Piotr Szturmaj wrote:
>> Jonathan M Davis wrote:
>>> On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
>>>> Johannes Pfau wrote:
>>>>> Related question to the SHA/MD5 hash functions: could those be pure?
>>>>
>>>> Weakly pure, yes - but for what?
>>>
>>> In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.
>>
>> Yes, Johannes probably wants to mark uuid hash gen as pure. I just wanted to know if it's something important, as my code used memcpy, which is impure.
>
> Where does your code use memcpy? I see one mention in the comments, but none in the code.

See putArray() in base.d

> Anyway, I believe you can do without memcpy by using array copy? Array copy might even be faster, since memcpy is not a DMD compiler intrinsic like in many C/C++ compilers.

I converted memcpy calls to array copy but it became about 1 Mbps slower.
Dec 24 2011
On Sunday, 25 December 2011 at 01:08:04 UTC, Piotr Szturmaj wrote:
>> Where does your code use memcpy? I see one mention in the comments, but none in the code.
>
> See putArray() in base.d

Sorry, I lost track of the conversation. I was looking at uuid.d.

>> Anyway, I believe you can do without memcpy by using array copy? Array copy might even be faster, since memcpy is not a DMD compiler intrinsic like in many C/C++ compilers.
>
> I converted memcpy calls to array copy but it became about 1 Mbps slower.

I guess array copy is currently a runtime call rather than an intrinsic, then...
Dec 24 2011
On Sunday, 25 December 2011 at 01:08:04 UTC, Piotr Szturmaj wrote:
>> Anyway, I believe you can do without memcpy by using array copy? Array copy might even be faster, since memcpy is not a DMD compiler intrinsic like in many C/C++ compilers.
>
> I converted memcpy calls to array copy but it became about 1 Mbps slower.

That's strange. I've tried optimizing some of my code today, and changing slice copies to memcpy had the opposite effect. You are benchmarking with -O -release -inline, right?
Dec 24 2011
Vladimir Panteleev wrote:
> On Sunday, 25 December 2011 at 01:08:04 UTC, Piotr Szturmaj wrote:
>>> Anyway, I believe you can do without memcpy by using array copy? Array copy might even be faster, since memcpy is not a DMD compiler intrinsic like in many C/C++ compilers.
>>
>> I converted memcpy calls to array copy but it became about 1 Mbps slower.

That should be MBps.

> That's strange. I've tried optimizing some of my code today, and changing slice copies to memcpy had the opposite effect. You are benchmarking with -O -release -inline, right?

Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is observed only with Windows DMD.
Dec 26 2011
On Monday, 26 December 2011 at 17:37:17 UTC, Piotr Szturmaj wrote:
> Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is observed only with Windows DMD.

I'd be more interested in seeing the code.

I've done some more research on this. In release builds, DMD on Windows emits a memcpy call for a slice copy. However, the auto-generated memcpy call has slightly less overhead (register/stack shuffling) than a manual memcpy call, which explains the performance difference I was seeing.
Dec 28 2011
Vladimir Panteleev wrote:
> On Monday, 26 December 2011 at 17:37:17 UTC, Piotr Szturmaj wrote:
>> Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is observed only with Windows DMD.
>
> I'd be more interested in seeing the code.

Sorry for the late answer. For the memcpy cases the code is the same as in my github Phobos fork. Here is the change to slice copying: http://pastebin.com/EteqEper

> I've done some more research on this. In release builds, DMD on Windows emits a memcpy call for a slice copy. However, the auto-generated memcpy call has slightly less overhead (register/stack shuffling) than a manual memcpy call, which explains the performance difference I was seeing.
Jan 06 2012
On Friday, 6 January 2012 at 21:10:50 UTC, Piotr Szturmaj wrote:
> Vladimir Panteleev wrote:
>> On Monday, 26 December 2011 at 17:37:17 UTC, Piotr Szturmaj wrote:
>>> Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is observed only with Windows DMD.
>>
>> I'd be more interested in seeing the code.
>
> Sorry for the late answer. For the memcpy cases the code is the same as in my github Phobos fork. Here is the change to slice copying: http://pastebin.com/EteqEper

I haven't looked at the disassembly yet, but I'd suggest rewriting your code so that the left side of the assignment is a slice expression beginning with 0. I think DMD will generate optimal code (memcpy with slightly less overhead than a manual call) if you make it clear to the compiler that the left-hand slice and the right-hand slice have the same length. Also, it looks like the slice version wastes an extra variable (bw).
Jan 06 2012
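For reference, the two copying styles compared in this sub-thread look roughly like the sketch below. The function names are hypothetical and this is not the code from the pastebin links. It shows only the shape of the slice assignment suggested above (left-hand slice starting at 0, both sides visibly the same length) and makes no claim about which version is faster on a given compiler.

```d
import core.stdc.string : memcpy;

/// Manual call into the C runtime.
void copyWithMemcpy(ubyte[] dst, const(ubyte)[] src)
{
    assert(dst.length >= src.length);
    memcpy(dst.ptr, src.ptr, src.length);
}

/// Built-in slice copy. The left-hand slice starts at 0 and is exactly
/// as long as the source, so the lengths are guaranteed to match.
void copyWithSlice(ubyte[] dst, const(ubyte)[] src)
{
    dst[0 .. src.length] = src[];
}
```

Slice assignment requires both sides to have equal length and not to overlap, and it is bounds checked unless checks are disabled, so the two functions are interchangeable only when those conditions hold.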