digitalmars.D - The review of std.hash package
- Dmitry Olshansky (28/28) Aug 07 2012 Since the review queue has been mostly silent again I've decided to
- David (19/19) Aug 07 2012 Is the API already set in stone?
- David (2/5) Aug 07 2012 Well, it's there to implement the Range interface, but still, put
- Jonathan M Davis (5/6) Aug 07 2012 No. That's the main point of the review process. The API needs to be rev...
- Johannes Pfau (66/94) Aug 07 2012 No, as Jonathan already mentioned reviewing the API is an important
- David (45/134) Aug 07 2012 Ok this point with the one above makes sense (I implemented my OpenSSL
- Johannes Pfau (50/145) Aug 08 2012 You mean an external function which constructs the digest context and
- Piotr Szturmaj (14/15) Aug 07 2012 I tried it before but I wanted to create whole crypto package at once,
- Johannes Pfau (43/61) Aug 08 2012 I'm sorry, I didn't want to conceal your work. What I meant with
- Piotr Szturmaj (21/68) Aug 08 2012 Yes, there should be bcrypt, scrypt and PBKDF2.
- Johannes Pfau (25/53) Aug 08 2012 Great! I always tried the *endianToNative and nativeTo*Endian functions.
- Jonathan M Davis (5/16) Aug 08 2012 What's wrong with the *endianToNative and nativeTo*Endian functions? The...
- Johannes Pfau (6/12) Aug 08 2012 in CTFE?
- Jonathan M Davis (15/30) Aug 08 2012 No. It wouldn't work in CTFE, because it uses a union. But what it's try...
- Johannes Pfau (7/34) Aug 09 2012 I completely agree, but this is true for hashes. Once the final hash
- Johannes Pfau (3/5) Aug 08 2012 Wow, I didn't know about scrypt. Seems to be pretty cool.
- Ary Manzana (10/21) Aug 07 2012 I think "std.crypto" is a better name for the package. At first I
- deadalnix (6/31) Aug 07 2012 You'll find very hard to convince anyone that crc32 is a cryptographic
- Johannes Pfau (15/20) Aug 07 2012 And there will hopefully be more hashes in std.hash at some point.
- Jonathan M Davis (4/7) Aug 07 2012 That doesn't fly, because crc32 is going to be in there, and while it's ...
- Regan Heath (6/12) Aug 08 2012 std.digest then?
- H. S. Teoh (7/18) Aug 08 2012 [...]
- Tobias Pankrath (7/27) Aug 08 2012 -1
- Regan Heath (11/34) Aug 08 2012 That's exactly what it's supposed to suggest. The algorithm does digest...
- Tobias Pankrath (9/17) Aug 08 2012 So at least this implies that hash function is the more general
- Regan Heath (14/31) Aug 08 2012 I could have, but I didn't read that far :p I knew what I was looking f...
- travert phare.normalesup.org (Christophe Travert) (5/9) Aug 08 2012 I think the question is: is std.hash going to contain only
- Chris Cain (6/12) Aug 08 2012 Even if that were the case, I'd say they should be kept separate.
- Regan Heath (12/22) Aug 08 2012 I don't think there is any reason to separate them. People should know ...
- travert phare.normalesup.org (Christophe Travert) (14/27) Aug 08 2012 They should not be categorized the same. I don't expect a regular hash
- Chris Cain (38/46) Aug 08 2012 In this case, I'm not suggesting keep them separate to not
- travert phare.normalesup.org (Christophe Travert) (23/23) Aug 08 2012 "Chris Cain" , dans le message (digitalmars.D:174477), a écrit :
- Regan Heath (30/51) Aug 08 2012 t
- Chris Cain (17/33) Aug 08 2012 Actually, maybe I'm the one not doing a good job of explaining.
- Johannes Pfau (4/11) Aug 08 2012 std.hash.digest doesn't sound too bad. We could have std.hash.func (or
- Jonathan M Davis (3/14) Aug 08 2012 I say just keep at simple and leave it at std.hash. It's plenty clear IM...
- Andrei Alexandrescu (5/6) Aug 08 2012 Not clear to quite a few of us. IMHO it just makes us seem (to the
- Jonathan M Davis (7/14) Aug 08 2012 I prefer std.hash to std.digest, but I don't necessarily care all that m...
- RivenTheMage (11/14) Aug 15 2012 Three basic types of hash functions are:
- David Nadlinger (6/13) Aug 15 2012 Why? 1) might have a different interface than the others, but 2)
- RivenTheMage (10/15) Aug 15 2012 The "only" difference between 2) and 3) is a big difference.
- RivenTheMage (6/6) Aug 15 2012 Another example is a systematic error-correcting codes. The
- José Armando García Sancio (14/19) Aug 15 2012 Some people's point is that MD5 was considered a cryptographic digest
- ReneSac (10/31) Aug 15 2012 I agree that MD5 isn't cryptographically secure anymore, but it
- José Armando García Sancio (13/15) Aug 15 2012 That's because it is a "password module" and nobody or a small
- RivenTheMage (19/23) Aug 15 2012 In turn, that's because CRC is not a cryptographic hash and
- RivenTheMage (2/5) Aug 15 2012 I forgot that this case is already covered by reduce!(...)
- Dmitry Olshansky (11/38) Aug 08 2012 You still can use say crc32 as normal hash function for some binary
- Dmitry Olshansky (4/13) Aug 08 2012 Damned spellcheckers: desperate -> disparate
- Andrei Alexandrescu (3/4) Aug 08 2012 Yes please.
- Dmitry Olshansky (11/13) Aug 07 2012 There is std.container, so unambiguous for me.
- Walter Bright (3/6) Aug 07 2012 The hash functions must use a Range interface, not a file interface.
- Johannes Pfau (7/15) Aug 08 2012 I guess this is meant as a general statement and not specifically
- Walter Bright (20/35) Aug 08 2012 It should accept an input range. But using an Output Range confuses me. ...
- Piotr Szturmaj (6/10) Aug 08 2012 Suppose you have a callback that will give you blocks of bytes to hash.
- Walter Bright (2/13) Aug 08 2012 Have the callback supply a range interface to call the hash with.
- Martin Nowak (17/18) Aug 08 2012 --------
- Walter Bright (2/5) Aug 08 2012 See the discussion on using reduce().
- Johannes Pfau (39/47) Aug 09 2012 I just don't understand it. Let's take the example by Martin Nowak and
- Walter Bright (5/52) Aug 09 2012 That isn't a problem, the internal state can be private data for the str...
- deadalnix (3/50) Aug 16 2012 I'm pretty sure it is possible to pad and finish when a result is
- Regan Heath (13/72) Aug 17 2012 d
- deadalnix (2/40) Aug 08 2012 That is a really good point. +1
- Martin Nowak (3/16) Aug 08 2012 I think sha1Of/digest!SHA1 should do this.
- Walter Bright (4/25) Aug 08 2012 Take a look at the reduce function in
- Johannes Pfau (17/24) Aug 08 2012 This can only work if the final state is valid as an initial state.
- Walter Bright (5/7) Aug 08 2012 The idea is to have hash act like a component - not with special added c...
- Johannes Pfau (32/43) Aug 09 2012 Please explain that. Nobody's going to simply replace a call to reduce
- David Nadlinger (18/22) Aug 09 2012 Sorry, but I think this is a meaningless statement without
- Regan Heath (51/71) Aug 09 2012 On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger ...
- Dmitry Olshansky (20/30) Aug 09 2012 struct ShaState
- Dmitry Olshansky (6/22) Aug 09 2012 Too fast.. should have been:
- David Nadlinger (8/11) Aug 09 2012 I have been thinking about using AliasThis as well, but the
- Jonathan M Davis (10/22) Aug 09 2012 Yeah. alias this can be very useful, but it's very dangerous when it com...
- Walter Bright (15/35) Aug 09 2012 It is not a meaningless statement in that components have a predictable ...
- travert phare.normalesup.org (Christophe Travert) (21/21) Aug 09 2012 If a has is a range, it's an output range, because it's something you
- Walter Bright (3/3) Aug 11 2012 See the new thread Andrei started entitled "finish function for output r...
- Kagamin (3/9) Aug 15 2012 An example of stateless hash in .net:
- Dmitry Olshansky (5/14) Aug 15 2012 AFAIK it's a method of the HashAlgorithm object.
- Kagamin (6/7) Aug 15 2012 It's a minor design detail, see the example: the method is called
- David Nadlinger (10/14) Aug 15 2012 No, it's not a »minor design detail«, at least not regarding
- Dmitry Olshansky (6/12) Aug 15 2012 Brrr. It's how convenience wrapper works :)
- Kagamin (10/15) Aug 15 2012 Well there was a wish for stateless hash, Walter even posted the
- David Nadlinger (5/8) Aug 15 2012 And our point is that such an interface is trivial to implement
- Dmitry Olshansky (6/17) Aug 15 2012 auto result = file.byChunk(4096 * 1025).joiner.digest();
- David Nadlinger (3/10) Aug 15 2012 http://msdn.microsoft.com/en-us/library/system.security.cryptography.has...
- Kagamin (5/16) Aug 15 2012 Ok, but HashAlgorithm still supports stateless interface which
- Johannes Pfau (69/98) Aug 08 2012 auto crc = crc32Of(data);
- Johannes Pfau (16/19) Aug 08 2012 I implemented the function, it's actually quite simple:
- travert phare.normalesup.org (Christophe Travert) (3/5) Aug 08 2012 Don't overload the function taking a void[][]. Replace it. void[][] is
- Walter Bright (3/22) Aug 08 2012 I don't know what you mean, it takes a range, not a void[][] as input.
- Johannes Pfau (6/37) Aug 08 2012 So the post in D.learn for a detailed description. Yes the code I
- Walter Bright (3/7) Aug 08 2012 Have the templated version with overloads simply call the single version...
- Johannes Pfau (10/21) Aug 09 2012 Well that's possible, but I don't like the template bloat it causes.
- Walter Bright (2/3) Aug 09 2012 It's more the user API that matters, not how it works under the hood.
- Walter Bright (4/5) Aug 09 2012 The Range argument - is it an InputRange, an OutputRange? While it's jus...
- Johannes Pfau (7/14) Aug 09 2012 It's an InputRange (of bytes) or an InputRange of some byte buffer
- Andrei Alexandrescu (6/7) Aug 09 2012 What have you measured, and what is your dislike based upon?
- Johannes Pfau (16/25) Aug 09 2012 What annoys me is that as long as the function only supported arrays, it
- Jacob Carlborg (7/14) Aug 09 2012 A workaround is to make the non-template function to a template, with no...
- Denis Shelomovskij (9/9) Aug 08 2012 The question about module names.
- Johannes Pfau (12/21) Aug 08 2012 They're supposed to contain more implementations in the future. I
- travert phare.normalesup.org (Christophe Travert) (32/32) Aug 08 2012 I'm not familiar with hash functions in general.
- Vladimir Panteleev (6/9) Aug 09 2012 Is it too late to ask to include MurmurHash 2 and/or 3? It's
- Johannes Pfau (8/20) Aug 09 2012 To be honest I didn't even know that MurmurHash can be used
- Regan Heath (6/26) Aug 09 2012 Once the API is formalised I can contribute the hashes I have also :)
- Johannes Pfau (4/7) Aug 09 2012 great! with all those contributions we'll probably have a rather
- Johannes Pfau (30/30) Aug 10 2012 I implemented some of the suggestions, here's the list of changes:
- Johannes Pfau (15/15) Aug 20 2012 Changelog:
- Jesse Phillips (10/10) Aug 28 2012 All this discussion on the use of auto in the docs made me notice
- Johannes Pfau (11/25) Sep 04 2012 I had a look at how std.range documents the range interfaces, but the
Since the review queue has been mostly silent again I've decided to jump in and manage the one that's ready to go :) Today starts the review of the std.hash package by Johannes Pfau. We go with the usual cycle of two weeks for review and one week for voting. Thus the review ends on the 22nd of August, followed by voting that ends on the 29th of August.

Description: std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API. The std.hash package also includes:
- an MD5 implementation deprecating std.md5 (in std.hash.md, adapted from std.md5);
- a new SHA1 implementation by redstar (in std.hash.sha);
- a CRC32 implementation (in std.hash.crc) based on and deprecating the crc32 module (that's shipped with phobos but not documented).

It only covers hashes which can process data incrementally (in smaller buffers as opposed to all data at once).

Code:
https://github.com/jpf91/phobos/tree/newHash/std/hash
https://github.com/jpf91/phobos/compare/master...newHash

Docs:
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_crc.html

-- Dmitry Olshansky
Aug 07 2012
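[Editorial note: for readers who want to see the API under review in action, here is a minimal usage sketch pieced together from the linked docs and the discussion below. The member names (start, put, finish) and helpers (digest!Hash, md5Of, toHexString) are the ones described in this thread; exact signatures were still subject to review.]

```d
import std.hash.md, std.hash.hash;  // module layout as proposed in this review

void main()
{
    // One-shot convenience helper: md5Of is just an alias for digest!MD5
    ubyte[16] hash = md5Of("hello");

    // Incremental struct API: start() is required because D structs
    // cannot have default constructors
    MD5 ctx;
    ctx.start();
    ctx.put(cast(const(ubyte)[])"hel");
    ctx.put(cast(const(ubyte)[])"lo");
    assert(ctx.finish() == hash);   // same result as the one-shot helper

    // Hex formatting is a separate generic helper
    auto hex = toHexString(hash);
}
```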
Is the API already set in stone? Using .start and .finish feels like using OpenSSL. Why not make a Hash object non-"restartable" (since e.g. MD5.start() sets `this` just to MD5.init), make finish private, and implement a digest and hexdigest property which calls finish and returns a ubyte[] array (or a string for hexdigest)? This would also eliminate the need for the `digest` wrapper (e.g. you can mix in these properties with template mixins, so no code duplication).

Also I am not sure if we want to use `hash.put` instead of `hash.update` (which is used by python and openssl). But that's just a minor point (e.g. perl uses .add).

Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions, why not have a static method? e.g. MD5.hexdigest("input"), which returns the hexdigest and not a MD5 object.

One more thing, `.reset` does the same as `start`? If so, why do both exist? (I also find the examples quite confusing, why do you want to reset the hash object? I would document the function but wouldn't use it in the examples.)

Well, that's it from my side :)
Aug 07 2012
Also I am not sure if we want to use `hash.put` instead of `hash.update` (which is used by python and openssl). But that's just a minor point (e.g. perl uses .add)

Well, it's there to implement the Range interface, but still, put doesn't make too much sense for me (maybe personal preference).
Aug 07 2012
On Tuesday, August 07, 2012 20:07:07 David wrote:

Is the API already set in stone?

No. That's the main point of the review process. The API needs to be reviewed and revised as appropriate (as does the implementation, but the API is much harder to fix later, since that tends to cause breaking changes).

- Jonathan M Davis
Aug 07 2012
Am Tue, 07 Aug 2012 20:07:07 +0200 schrieb David <d dav1d.de>:

Is the API already set in stone?

No, as Jonathan already mentioned reviewing the API is an important part of the review. (Actually the most important part for this review, as we define one API for many implementations.)

Using .start and .finish feels like using OpenSSL. Why not make a Hash object non-"restartable" (since e.g. MD5.start() sets `this` just to MD5.init)?

I'm not an expert in the hash/digest area. We had incompatible APIs for CRC32, MD5 and SHA1 and I proposed we should have a unique interface. As I had some free time and nobody else stepped up I implemented it, asking for expert help in the newsgroup. This probably also explains why this API isn't revolutionary, it's inspired by other designs (tango, .net, ...). BTW: making it possible to wrap OpenSSL and other C hash libraries was also a goal. This is one reason why a start() function is necessary. OK, that was the intro ;-)

start is here as structs can't have default constructors. For all current hashes it's really mostly useless (except it can be used as a simple way to reset a hash, which is also why the hash structs don't have a reset member), but I just don't know if there are hashes which would require a more complex (runtime) initialization. Classes don't have a start method as they can do that initialization in the default constructor. But the default constructor can't be used as a cheap way to reset a hash, therefore the reset function was added to the OOP interface.

and making finish private and implementing a digest and hexdigest property

A property is not a good idea, as finish must reset the internal state for some objects, so you can't access the property repeatedly. It'd have to be a function.

which calls finish and returns a ubyte[] array (or string for hexdigest)

This could be done and is probably a matter of taste.
I prefer the free function in this case, it makes clear that digest() really is a generic function which can be used with any hash. It's also one function less which must be implemented by hash writers. I'd like to keep the number of members required to implement a hash as low as possible. (This is however unrelated to the actual naming of the functions. If you think 'digest' is not a good name, please let me know.)

this would also eliminate the need for the `digest` wrapper (e.g. you can mix in these properties with template mixins, so no code duplication)

There's actually no code duplication. 'digest' is the only implementation, sha1Of, md5Of are just aliases.

Also I am not sure if we want to use `hash.put` instead of `hash.update` (which is used by python and openssl). But that's just a minor point (e.g. perl uses .add)

I initially called it update, I agree it's a better name. But phobos is all about ranges, so we need the put member anyway. Providing an 'update' alias just for the name isn't worth it, imho (see above, keeping the number of hash members low).

Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions, why not have a static method? e.g. MD5.hexdigest("input"), which returns the hexdigest and not a MD5 object.

Same reason as above, sha1Of, md5Of are just convenience aliases, it could be argued you don't need those at all. You can use 'digest' directly and it will provide exactly the same result. So hash writers don't have to implement anything to support digest, providing an alias is optional and just 1 line of code. A static method would have to be implemented by every hash. Yes you could use mixins to make that painless, but I think it's still too much work.
(And mixins can't be documented in DDOC, but that's an unrelated compiler issue) BTW: it is implemented as a member for the OOP API, as we can just implement it in the base interface, so the actual implementations don't have to care about it.

One more thing, `.reset` does the same as `start`? If so, why do both exist?

See above. It's because start is used in the template/struct API and structs can't have default constructors, so we need start. The OOP/class API can use a default constructor. But unlike the start function it can't work as a reset function, so there's an explicit reset function.

(I also find the examples quite confusing, why do you want to reset the hash object?

My approach was to document how to use the functions, not as much when it makes sense to use them. Maybe I should care more about that? It's more efficient than allocating a new hash with 'new'. (Or do you mean why use reset if finish resets anyway? It's useful if you put some data into a hash, then decide you don't want the hash (because the user aborts the operation, network connection is lost, ...) but you still need the hash object later on.) You could just call finish and disregard the result, but the reset implementation is faster. I agree there's probably no _strong_ need for reset, but I think it doesn't do any harm.

I would document the function but wouldn't use it in the examples) Well, that's it from my side :)

Thanks for your review!
Aug 07 2012
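[Editorial note: Johannes's point that `digest` is one generic implementation with sha1Of/md5Of as mere aliases might be sketched like this. This is a simplified illustration, not the actual Phobos source; `DigestType` is assumed here to compute the return type from the hash's finish().]

```d
// Simplified sketch of the generic one-shot helper
DigestType!Hash digest(Hash)(scope const(void)[] data)
    if (isDigest!Hash)
{
    Hash ctx;
    ctx.start();                        // struct API: explicit initialization
    ctx.put(cast(const(ubyte)[])data);  // feed the whole buffer at once
    return ctx.finish();
}

// Hash authors get the convenience names for free -- one alias each,
// no per-hash implementation work
alias md5Of  = digest!MD5;
alias sha1Of = digest!SHA1;
```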
Am 07.08.2012 21:53, schrieb Johannes Pfau:

Am Tue, 07 Aug 2012 20:07:07 +0200 schrieb David <d dav1d.de>:

Ok this point with the one above makes sense (I implemented my OpenSSL hashing wrapper as a class, initialization is done in the constructor), it still doesn't feel right if you have to call .start first. What about a struct constructor which calls .start internally, so you get a bit more of a modern API (imo) and you're still able to implement the same interface for any kind of wrapper/own implementation/whatever.

Is the API already set in stone?

No, as Jonathan already mentioned reviewing the API is an important part of the review. (Actually the most important part for this review, as we define one API for many implementations.)

Using .start and .finish feels like using OpenSSL. Why not make a Hash object non-"restartable" (since e.g. MD5.start() sets `this` just to MD5.init)?

I'm not an expert in the hash/digest area. We had incompatible APIs for CRC32, MD5 and SHA1 and I proposed we should have a unique interface. As I had some free time and nobody else stepped up I implemented it, asking for expert help in the newsgroup. This probably also explains why this API isn't revolutionary, it's inspired by other designs (tango, .net, ...). BTW: making it possible to wrap OpenSSL and other C hash libraries was also a goal. This is one reason why a start() function is necessary. OK, that was the intro ;-)

start is here as structs can't have default constructors. For all current hashes it's really mostly useless (except it can be used as a simple way to reset a hash, which is also why the hash structs don't have a reset member), but I just don't know if there are hashes which would require a more complex (runtime) initialization.

Classes don't have a start method as they can do that initialization in the default constructor. But the default constructor can't be used as a cheap way to reset a hash, therefore the reset function was added to the OOP interface.

Ok.
(more below)Well, you could store the result internally, property generates on the first call the digest, stores it (that's not too much, 16 byte for a md5) and returns the already stored value on the 2nd call.and making finish private and implementing a digest and hexdigest propertyproperty is not a good idea, as finish must reset the internal state for some objects, so you can't access the property repeatedly. It'd have to be a function.With digest you mean: http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ? You normally always want the hexdigest (you barely need "real" digest), so it's a matter of convenience. Well, thanks to UFCS, you can call it like a method.which calls finish and returns an ubyte[] array (or string for hexdigest)This could be done and is probably a matter of taste. I prefer the free function in this case, it makes clear that digest() really is a generic function which can be used with any hash. It's also one function less which must be implemented by hash writers. I'd like to keep the number of members required to implement a hash as low as possible.(This is however unrelated to the actual naming of the functions. If you think 'digest' is not a good name, please let me know)I think that's a fitting name, but I am not a hashing expert.Yeah, I meant, you would have to implement these properties for each hash-type and in the end, all properties do the same (calling finish and maybe setting some internal flags, buffers), so you can put them into a template mixin and mixin them for each hash-type., this would also eliminate the need of the `digest` wrapper (e.g. you can mixin these properties with template mixins, so no code duplication)there's actually no code duplication. 
'digest' is the only implementation, sha1Of, md5Of are just aliases.

I have to agree, an alias would produce more confusion (what, two methods which do the same?)

Also I am not sure if we want to use `hash.put` instead of `hash.update` (which is used by python and openssl). But that's just a minor point (e.g. perl uses .add)

I initially called it update, I agree it's a better name. But phobos is all about ranges, so we need the put member anyway. Providing an 'update' alias just for the name isn't worth it, imho (see above, keeping the number of hash members low).

Yes, you would have to implement it for every hash, but that's 3 lines:

static string hexdigest(void[] data)
{
    return toHexString(digest!(typeof(this))(data));
}

Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions, why not have a static method? e.g. MD5.hexdigest("input"), which returns the hexdigest and not a MD5 object.

Same reason as above, sha1Of, md5Of are just convenience aliases, it could be argued you don't need those at all. You can use 'digest' directly and it will provide exactly the same result. So hash writers don't have to implement anything to support digest, providing an alias is optional and just 1 line of code. A static method would have to be implemented by every hash. Yes you could use mixins to make that painless, but I think it's still too much work. (And mixins can't be documented in DDOC, but that's an unrelated compiler issue)

BTW: it is implemented as a member for the OOP API, as we can just implement it in the base interface, so the actual implementations don't have to care about it.

I've seen that, but I am wondering why it's not a static method, the creation of the Hash Object could be hidden inside. You might say, this would allocate a class instance just for a single use, without the possibility to reuse it.
Correct, but I think that's the use of such a function, if you just need a Hash, nothing else, otherwise you could use the class "directly", with all its other functionality.

I am not sure what to think about that. On the one side it's useful if you really need that speed, but then I think, is it really worth it resetting the state hidden in a function? If you really want to avoid another allocation you could use emplace (if I didn't misunderstand the use of emplace). To quote from the Python Zen: "Explicit is better than implicit."

One more thing, `.reset` does the same as `start`? If so, why do both exist?

See above. It's because start is used in the template/struct API and structs can't have default constructors, so we need start. The OOP/class API can use a default constructor. But unlike the start function it can't work as a reset function, so there's an explicit reset function.

When I saw the documentation/examples, I was confused, why do you want to reset a hash object, so does it really reset the state? So I had to take a look into the code before I was sure, it surely does reset the whole thingy (is it called context?).
I agree there's probably no _strong_ need for reset, but I think it doesn't do any harm."Explicit is better than implicit."Thanks for writing std.hash, I think lots of people need it.Well, that's it from my side :)Thanks for your review!
Aug 07 2012
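[Editorial note: David's mixin idea, i.e. letting every hash type pull in a static hexdigest convenience method with one line, could look roughly like the following. The names are hypothetical, and this sketch passes the input through to digest rather than hardcoding anything.]

```d
// Hypothetical mixin implementing the suggested static convenience method
mixin template HexDigest()
{
    static string hexdigest(scope const(void)[] data)
    {
        // typeof(this) resolves to the aggregate the mixin is placed into
        return toHexString(digest!(typeof(this))(data)).idup;
    }
}

struct MD5
{
    // ... start() / put() / finish() as before ...
    mixin HexDigest;   // adds MD5.hexdigest("input")
}
```

With this, each hash author pays one `mixin HexDigest;` line instead of re-implementing the method, which was the trade-off being debated above.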
Am Wed, 08 Aug 2012 00:55:47 +0200 schrieb David <d dav1d.de>:

Am 07.08.2012 21:53, schrieb Johannes Pfau:

You mean an external function which constructs the digest context and calls start? That's similar to the appender function in std.array and probably a good idea. We'd need a good name for it though:
* initContext
* initializeContext
* context
* startContext
* startHash / startDigest

Am Tue, 07 Aug 2012 20:07:07 +0200 schrieb David <d dav1d.de>:

start is here as structs can't have default constructors. For all current hashes it's really mostly useless (except it can be used as a simple way to reset a hash, which is also why the hash structs don't have a reset member), but I just don't know if there are hashes which would require a more complex (runtime) initialization.

Ok this point with the one above makes sense (I implemented my OpenSSL hashing wrapper as a class, initialization is done in the constructor), it still doesn't feel right, if you have to call .start first. What about a struct-constructor which calls .start internally, so you get a bit more of a modern API (imo) and you're still able to implement the same interface for any kind of wrapper/own implementation/whatever.

Yes, that could be done but to be honest I don't think it's worth it.

property is not a good idea, as finish must reset the internal state for some objects, so you can't access the property repeatedly. It'd have to be a function.

Well, you could store the result internally, property generates on the first call the digest, stores it (that's not too much, 16 byte for a md5) and returns the already stored value on the 2nd call.

Really? I thought it's the other way round and comparing/verifying hashes is the common case?

With digest you mean: http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ?
You normally always want the hexdigest (you barely need "real" digest),which calls finish and returns an ubyte[] array (or string for hexdigest)This could be done and is probably a matter of taste. I prefer the free function in this case, it makes clear that digest() really is a generic function which can be used with any hash. It's also one function less which must be implemented by hash writers. I'd like to keep the number of members required to implement a hash as low as possible.so it's a matter of convenience. Well, thanks to UFCS, you can call it like a method.a hexDigest which works like digest could probably be useful, although it's very easy to implement this in user code as well: string hexDigest(Hash, Order order = Order.increasing)(scope const(void[])[] data...) if(isDigest!Hash) { return digest!Hash(data).toHexString!order(); }The implementation is indeed quite simple, so it probably is a matter of taste whether hexDigest is implemented as a member or as a free function.A static method would have to be implemented by every hash.Yes, you would have to implement it for every hash, but that's 3 lines: static string hexdigest(void[] data) { return toHexString(digest!(typeof(this))("yay")); }I have to think about this. It's too bad we can't have static & member functions with the same name in D. I'd probably argue if you want the behavior of a static function, use the struct API: digest!MD5(data); Doesn't this do everything you need, without the GC allocation the OOP api would cause? I mean the OOP API is mostly about polymorphism (and about reference semantics). 
If you use a static method as proposed, you can't use polymorphism and reference semantics are not useful either, as you never get a reference to the context?BTW: it is implemented as a member for the OOP API, as we can just implement it in the base interface, so the actual implementations don't have to care about it.I've seen that, but I am wondering why it's not a static method, the creation of the Hash Object could be hidden inside. You might say, this would allocate a class instance just for a single use, without the possibility to reuse it. Correct, but I think that's the use of such a function, if you just need a Hash, nothing else, otherwise you could use the class "directly", with all its other functionality.Yes, it should be possible to use emplace for that. Emplace is not well-known though and I'm not sure how well it works now. Also I'm not sure if emplace is enough in this case, I guess you'd have to destroy the old instance as well (so destructors are run)? But I'm not really sure about the importance of reset either. As long as the main implementation is in the struct API and WrapperDigest is used to wrap it into the OOP API there's no extra work to implement reset. If the main implementation is in the OOP interface, the reset function probably looks a lot like the constructor, so simply calling reset from the constructor should avoid code duplication.The OOP/class api can use a default constructor. But unlike the start function it can't work as a reset function, so there's an explicit reset function.I am not sure what do think about that. On the one side it's useful if you really need that speed, but then I think, is it really worth it resetting the state hidden in a function. If you really want to avoid another allocation you could use emplace (if I didn't misunderstand the use of emplace). To quote from the Python Zen: "Explicit is better than implicit."Yes, the whole terminology is probably a little confusing. 
The old API called the contexts like this: MD5_CTX, but that doesn't fit the phobos naming conventions. MD5CTX, MD5Ctx, Md5Ctx and Md5CTX all look ugly (and only the first fits our naming conventions). So MD5Context is the next best choice, but it's a little long. It'd be more precise though.

When I saw the documentation/examples, I was confused, why do you want to reset a hash object, so does it really reset the state? So I had to take a look into the code before I was sure, it surely does reset the whole thingy (is it called context?).

I'm just not sure if reset can be implemented in a reasonable way outside of the class in all cases. But then it's probably not important enough to justify an extra member...

It's more efficient than allocating a new hash with 'new' (or do you mean why use reset if finish resets anyway? It's useful if you put some data into a hash, then decide you don't want the hash (because the user aborts the operation, network connection is lost,...) but you still need the hash object later on.) You could just call finish and disregard the result, but the reset implementation is faster. I agree there's probably no _strong_ need for reset, but I think it doesn't do any harm.

"Explicit is better than implicit."
Aug 08 2012
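The reset pattern discussed above could be sketched in D roughly as follows. This is a sketch only: it assumes a struct digest context named MD5 with the start/put/finish/reset members mentioned in this thread, not a final Phobos API.

```d
// Sketch of the reset pattern under discussion. Assumes a struct
// context with start/put/finish/reset as described in this thread.
void sketch()
{
    MD5 ctx;                                  // hypothetical struct context
    ctx.start();                              // initialize internal state
    ctx.put(cast(const(ubyte)[])"partial data");

    // The user aborts: discard the partial state without allocating
    // a new context. Cheaper than calling finish and ignoring the result.
    ctx.reset();

    ctx.put(cast(const(ubyte)[])"real data");
    ubyte[16] hash = ctx.finish();            // finish also resets the state
}
```

The point of reset here is exactly the one argued in the thread: reusing the same context object is cheaper than allocating a fresh one or computing a throwaway finish.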
Johannes Pfau wrote:

As I had some free time and nobody else stepped up I implemented it,

I tried it before but I wanted to create the whole crypto package at once, and I guess it was a huge mistake. I know you have used my code in your uuid module; you can use it to add it to std.hash if you want. There are all SHA implementations, MD5 and a Tiger (both versions). They all support bit hashing and work with CTFE.

I took a quick look at your proposal, but currently don't have time to do a deep investigation :). One note though: I don't think it's necessary to maintain two different API sets. Sure, classes need heap allocation, but usually that's only one allocation; it's more important not to allocate during algorithm computations. Fortunately crypto algorithms don't do this. So struct allocation efficiency is negligible here. I think a separate struct-based API set is not worth it (or I'm missing something not related to allocation cost).
Aug 07 2012
Am Wed, 08 Aug 2012 01:27:45 +0200 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

Johannes Pfau wrote: As I had some free time and nobody else stepped up I implemented it,

I tried it before but I wanted to create whole crypto package at once, and I guess it was a huge mistake.

I'm sorry, I didn't want to conceal your work. What I meant with 'nobody stepped up' is this: We had a pull request which moved the crc code into std.crc and it was already merged in for 2.060. I complained that it had yet another API and that we should have a common API. In the end that pull request was reverted, but I felt kinda guilty, as that left us without a public crc module. In that discussion nobody else had time to implement that API, so I thought we'd need a solution soon and started implementing it.

We still need your crypto work ;-) I'd personally like to see a bcrypt implementation. There's no widespread bcrypt C library which could be used, so there's no painless way to use bcrypt in D. (Although there _is_ public domain C source code.)

I know you have used my code in your uuid module, you can use it to add it to std.hash if you want. There are all SHA implementations, MD5 and a Tiger (both versions). They all support bit hashing and work with CTFE.

Great, I'll have a look if this review works out (or as soon as we have a standardized hash API in Phobos). BTW: How does it work in CTFE? Don't you have to do endianness conversions at some time? According to Don that's not really supported. Another problem which prevents CTFE for my proposal is that the internal state is currently implemented as an array of uints, but the API uses ubyte[] as a return type. That sort of reinterpret cast is not supposed to work in CTFE though. I wonder how you avoided that issue? And another problem is that void[][] (as used in the 'digest' function) doesn't work in CTFE (and it isn't supposed to work).
But that's a problem specific to this API.

I made a quick look at your proposal, but currently don't have time to do a deep investigation :). One note though. I don't think it's necessary to maintain two different API sets. Sure, classes need heap allocation but usually that's only one allocation, it's more important to not allocate during algorithm computations. Fortunately crypto algorithms don't do this. So, struct allocation efficiency is negligible here. I think a separate struct-based API set is not worth it (or I'm missing something not related to allocation cost).

I'm not sure, I guess some people would disagree. I initially asked whether we should use a struct-based API or a class-based API (+emplace where necessary). At that time the struct API was favored. It's about the allocation cost (and the additional GC work), although people could probably also complain about the indirection classes bring along.

There's one important example where you really don't want any allocation to happen (and especially no GC allocation): you have a function which always calculates the same type of hash (e.g. md5) and you don't care about the implementation. You only care about the final hash; the hash context is never used outside of that function. This is a perfect fit for stack allocation, especially if the function is used a lot (GC). With classes you'd have to do some caching to do it efficiently (which then gives problems with const). It would be possible to use a class API + emplace, but that was deemed to be too cumbersome.
Aug 08 2012
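The no-allocation case described above could be sketched like this in D. The struct name MD5 and the start/put/finish members are assumptions based on this thread's proposed struct API, not a fixed signature.

```d
// Sketch: a function that always computes MD5. The context lives on
// the stack, so no GC allocation happens per call.
ubyte[16] md5Of(const(ubyte)[] data)
{
    MD5 ctx;        // hypothetical struct context, stack-allocated
    ctx.start();    // initialize the internal state
    ctx.put(data);  // feed the input
    return ctx.finish();
}
```

With a class-based API, the equivalent function would either allocate a new object on every call or need some cached instance, which is the caching/const problem mentioned above.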
Johannes Pfau wrote:

Am Wed, 08 Aug 2012 01:27:45 +0200 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

Hey, I didn't say you did :-) I'm fine with your work :-)

Johannes Pfau wrote: I'm sorry, I didn't want to conceal your work. What I meant with 'nobody stepped up' is this: We had a pull request which moved the crc code into std.crc and it was already merged in for 2.060. I complained that it had yet another API and that we should have a common API. In the end that pull request was reverted but I felt kinda guilty as that left us without a public crc module. In that discussion, nobody else had time to implement that API, so I thought we'd need a solution soon and started implementing it.

As I had some free time and nobody else stepped up I implemented it, I tried it before but I wanted to create whole crypto package at once,

Yes, there should be bcrypt, scrypt and PBKDF2.

and I guess it was a huge mistake.

We still need your crypto work ;-) I'd personally like to see a bcrypt implementation. There's no widespread bcrypt C library which could be used, so there's no painless way to use bcrypt in D. (Although there _is_ public domain C source code.)

std.bitmanip.swapEndian() works for me

I know you have used my code in your uuid module, you can use it to add it to std.hash if you want. There are all SHA implementations, MD5 and a Tiger (both versions). They all support bit hashing and work with CTFE.

Great, I'll have a look if this review works out (or as soon as we have a standardized hash API in Phobos). BTW: How does it work in CTFE? Don't you have to do endianness conversions at some time? According to Don that's not really supported.

Another problem which prevents CTFE for my proposal is that the internal state is currently implemented as an array of uints, but the API uses ubyte[] as a return type. That sort of reinterpret cast is not supposed to work in CTFE though.
I wonder how you avoided that issue?

There is a set of functions that abstract some operations to work with CTFE and at runtime: https://github.com/pszturmaj/phobos/blob/master/std/crypto/hash/base.d#L66. Particularly memCopy().

And another problem is that void[][] (as used in the 'digest' function) doesn't work in CTFE (and it isn't supposed to work). But that's a problem specific to this API.

Yes, that's why I use ubyte[]. I think if it's possible to do it all with CTFE by using hacks, it should rather be implemented in the compiler, assuming the endianness of the CPU it's running on.

[...cut...]

There's one important example where you really don't want any allocation to happen (and especially no GC allocation): You have a function which always calculates the same type of hash (e.g. md5) and you don't care about the implementation. You only care about the final hash; the hash context is never used outside of that function. This is a perfect fit for stack allocation, especially if the function is used a lot (GC). With classes you'd have to do some caching to do it efficiently (which then gives problems with const). It would be possible to use a class API + emplace, but that was deemed to be too cumbersome.

I don't think std.typecons.scoped is cumbersome:

    auto sha = scoped!SHA1(); // allocates on the stack
    auto digest = sha.digest("test");

Why I think classes should be supported is the need for polymorphism. One should be able to accept the Digest base class as a function parameter or a field/property. Without polymorphism we need to resort to C-like switches, ifs and so on. Templates are a no-go, because the hash function may be determined at runtime (it may depend on a security protocol handshake).
Aug 08 2012
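The runtime-polymorphism argument above could be sketched like this. Digest is the OOP base interface from the proposal; the put/finish member names and the wrapper class names are assumptions taken from this thread, not a final API.

```d
// Sketch: the concrete digest is chosen at runtime (e.g. after a
// security protocol handshake), so the function accepts the base
// interface instead of a compile-time template parameter.
ubyte[] hashWith(Digest d, const(ubyte)[] data)
{
    d.put(data);        // feed the input through the interface
    return d.finish();  // get the final hash value
}

// Possible usage, assuming OOP wrappers as discussed in this thread:
//   auto md5  = hashWith(new MD5Digest(), data);
//   auto sha1 = hashWith(new SHA1Digest(), data);
```

A template-based version would force the caller to know the algorithm at compile time, which is exactly what the handshake scenario rules out.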
Am Wed, 08 Aug 2012 11:27:49 +0200 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:Great! I always tried the *endianToNative and nativeTo*Endian functions. So I didn't expect swapEndian to work.BTW: How does it work in CTFE? Don't you have to do endianness conversions at some time? According to Don that's not really supported.std.bitmanip.swapEndian() works for meI should definitely look at this later. Would be great if hashes worked in CTFE.Another problem with prevents CTFE for my proposal would be that the internal state is currently implemented as an array of uints, but the API uses ubyte[] as a return type. That sort of reinterpret cast is not supposed to work in CTFE though. I wonder how you avoided that issue?There is set of functions that abstract some operations to work with CTFE and at runtime: https://github.com/pszturmaj/phobos/blob/master/std/crypto/hash/base.d#L66. Particularly memCopy().But then you can't even hash a string in CTFE. I wanted to special case strings, but for various reasons it didn't work out in the end.And another problem is that void[][] (as used in the 'digest' function) doesn't work in CTFE (and it isn't supposed to work). But that's a problem specific to this API.Yes, that's why I use ubyte[].I don't think std.typecons.scoped is cumbersome: auto sha = scoped!SHA1(); // allocates on the stack auto digest = sha.digest("test");Yes I'm not sure about this. But a class only based interface probably hasn't high chances of being accepted into phobos. And I think the struct interface+wrappers approach isn't bad.Why I think classes should be supported is the need of polymorphism.And ABI compatibility and switching the backend (OpenSSL, native D, windows crypto) at runtime. I know it's very useful, this is why we have the OOP api. It's very easy to wrap the OOP api onto the struct api. 
These are the implementations of MD5Digest, CRC32Digest and SHA1Digest:

    alias WrapperDigest!CRC32 CRC32Digest;
    alias WrapperDigest!MD5 MD5Digest;
    alias WrapperDigest!SHA1 SHA1Digest;

With the support code in std.hash.hash, one line is enough to implement the OOP interface if a struct interface is available, so I don't think maintaining two APIs is a problem. A bigger problem is that the real implementation must be the struct interface, so you can't use polymorphism there. I hope alias this is enough.
Aug 08 2012
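Roughly, a WrapperDigest along the lines discussed above could forward the struct API into the OOP interface like this. This is a simplified sketch, not the actual std.hash.hash support code, and the Digest interface members are assumptions based on this thread.

```d
// Simplified sketch of wrapping a struct digest into the OOP Digest
// interface; the real WrapperDigest implementation may differ.
class WrapperDigest(T) : Digest
{
    private T impl;   // the struct implementation being wrapped

    this() { impl.start(); }

    void put(scope const(ubyte)[] data) { impl.put(data); }

    ubyte[] finish() { return impl.finish().dup; }  // heap copy of the static array

    void reset() { impl.start(); }  // for structs, start doubles as reset
}
```

This is why a single alias per algorithm suffices: all the OOP plumbing lives in the template, and each struct digest only has to provide the struct API.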
On Wednesday, August 08, 2012 18:12:23 Johannes Pfau wrote:Am Wed, 08 Aug 2012 11:27:49 +0200 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:What's wrong with the *endianToNative and nativeTo*Endian functions? They work just fine as far as I know. swapEndian works too if you want it to use that, but there should be nothing wrong with the endian-specific ones. - Jonathan M DavisGreat! I always tried the *endianToNative and nativeTo*Endian functions. So I didn't expect swapEndian to work.BTW: How does it work in CTFE? Don't you have to do endianness conversions at some time? According to Don that's not really supported.std.bitmanip.swapEndian() works for me
Aug 08 2012
Am Wed, 08 Aug 2012 14:36:32 -0400 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:What's wrong with the *endianToNative and nativeTo*Endian functions? They work just fine as far as I know. swapEndian works too if you want it to use that, but there should be nothing wrong with the endian-specific ones. - Jonathan M Davisin CTFE? http://dpaste.dzfl.pl/0503b8af According to Don reinterpret casts (even if done through unions) won't be supported in CTFE. So you can't convert from uint-->ubyte[4]
Aug 08 2012
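One CTFE-friendly workaround for the uint-->ubyte[4] problem mentioned above is to avoid the reinterpret cast entirely and build the bytes with shifts, which do work at compile time. A sketch (the function name is made up for illustration):

```d
// CTFE-safe big-endian conversion: shifts and masks instead of a
// union-based reinterpret cast, which CTFE rejects.
ubyte[4] toBigEndianBytes(uint x)
{
    return [cast(ubyte)(x >> 24),
            cast(ubyte)(x >> 16),
            cast(ubyte)(x >> 8),
            cast(ubyte)(x)];
}

// Evaluated at compile time:
static assert(toBigEndianBytes(0x01020304) == [1, 2, 3, 4]);
```

Note this sidesteps the endianness-of-the-host question entirely: the result is defined purely by arithmetic, not by how the compiling machine lays out a uint in memory.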
On Wednesday, August 08, 2012 20:55:19 Johannes Pfau wrote:

Am Wed, 08 Aug 2012 14:36:32 -0400 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:

What's wrong with the *endianToNative and nativeTo*Endian functions? They work just fine as far as I know. swapEndian works too if you want it to use that, but there should be nothing wrong with the endian-specific ones. - Jonathan M Davis

in CTFE? http://dpaste.dzfl.pl/0503b8af According to Don, reinterpret casts (even if done through unions) won't be supported in CTFE. So you can't convert from uint-->ubyte[4]

No. It wouldn't work in CTFE, because it uses a union. But what it's trying to do doesn't really make sense in CTFE in most cases anyway, because the endianness of the target machine may not be the same endianness as the machine doing the compilation. Any computations which care about endianness must be in a state where they don't care about endianness anymore once CTFE has completed, or you're going to have bugs.

Though if the issue is std.hash being CTFEable, I don't know why anyone would even care. It's cool if it's CTFEable, but the sorts of things that you hash pretty much always require user or file input of some kind (which you can't do with CTFE). You'd have to have a use case where something within the program itself needed to be hashed for some reason for it to matter whether std.hash was CTFEable or not, and it wouldn't surprise me at all if it were typical in hash functions to do stuff that isn't CTFEable anyway.

- Jonathan M Davis
Aug 08 2012
Am Wed, 08 Aug 2012 16:44:03 -0400 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:I completely agree, but this is true for hashes. Once the final hash value is produced it doesn't depend on the endianness.in CTFE? http://dpaste.dzfl.pl/0503b8af According to Don reinterpret casts (even if done through unions) won't be supported in CTFE. So you can't convert from uint-->ubyte[4]No. It wouldn't work in CTFE, because it uses a union.But what it's trying to doesn't really make sense in CTFE in most cases anyway, because the endianness of the target machine may not be the same endianness as the machine doing the compilation. Any computations which cared about endianness must be in a state where they don't care about endianness anymore once CTFE has completed, or you're going to have bugs.Though if the issue is std.hash being CTFEable, I don't know why anyone would even care. It's cool if it's CTFEable, but the sorts of things that you hash pretty much always require user or file input of some kind (which you can't do with CTFE).Yeah it's not that useful, that's why I didn't care about CTFE support right now. The only usecase I can think of is to hash a string in CTFE, for example UUID could use it to support name based UUID literals.You'd have to have a use case where something within the program itself needed to be hashed for some reason for it to matter whether std.hash was CTFEable or not, and it wouldn't surprise me at all if it were typical in hash functions to do stuff that isn't CTFEable anyway. - Jonathan M Davis
Aug 09 2012
Am Wed, 08 Aug 2012 11:27:49 +0200 schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:Yes, there should be bcrypt, scrypt and PBKDF2.Wow, I didn't know about scrypt. Seems to be pretty cool.
Aug 08 2012
On 8/7/12 14:39 , Dmitry Olshansky wrote:

Since the review queue has been mostly silent again I've decided to jump in and manage the one that's ready to go :) Today starts the review of the std.hash package by Johannes Pfau. We go with the usual cycle of two weeks for review and one week for voting. Thus the review ends on the 22nd of August, followed by voting that ends on the 29th of August. Description: std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API. The std.hash package also includes:

I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a hash table. Also note these entries in Wikipedia:

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Cryptographic_hash_function

Your package provides the latter: not just any hash functions, but *crypto*graphic hash functions. :-)

(And yes, I know I'm just discussing the name here, but names *are* important.)
Aug 07 2012
Le 07/08/2012 20:31, Ary Manzana a écrit :

On 8/7/12 14:39 , Dmitry Olshansky wrote: Since the review queue has been mostly silent again I've decided to jump in and manage the one that's ready to go :) Today starts the review of the std.hash package by Johannes Pfau. We go with the usual cycle of two weeks for review and one week for voting. Thus the review ends on the 22nd of August, followed by voting that ends on the 29th of August. Description: std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API. The std.hash package also includes:

I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a hash table. Also note these entries in Wikipedia: http://en.wikipedia.org/wiki/Hash_function http://en.wikipedia.org/wiki/Cryptographic_hash_function Your package provides the latter: not just any hash functions, but *crypto*graphic hash functions. :-) (and yes, I know I'm just discussing the name here, but names *are* important)

You'll find it very hard to convince anyone that crc32 is a cryptographic hash function. And this API is suited for both cryptographic hashes and regular hashes. Many of them can be added in the future if the need is met. I am definitely for std.hash.
Aug 07 2012
Am Tue, 07 Aug 2012 20:46:44 +0200 schrieb deadalnix <deadalnix gmail.com>:

You'll find it very hard to convince anyone that crc32 is a cryptographic hash function.

And there will hopefully be more hashes in std.hash at some point. BTW: I also considered splitting hashes into cryptographic and non-cryptographic/checksums. But as we also have some generic parts (currently in std.hash.hash), this would pose the question of where to put the generic part. Put it in std.checksum and std.crypto users will complain; put it in std.crypto and std.checksum users won't be happy. And as was discussed previously, what's considered a safe cryptographic hash might change as time goes by.

And this API is suited for both cryptographic hashes and regular hashes. Many of them can be added in the future if the need is met. I am definitely for std.hash.

We had this package name discussion a few times on the newsgroup and I think on github as well. I personally don't care about the package name and I'll just choose what the majority thinks is best. Last time it seemed std.hash was the favorite (although hash and digest can probably be used interchangeably).
Aug 07 2012
On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography. - Jonathan M DavisThe std.hash package also includes:I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a Hash table.
Aug 07 2012
On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg gmx.com> wrote:On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:std.digest then? R -- Using Opera's revolutionary email client: http://www.opera.com/mail/That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography.The std.hash package also includes:I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a Hash table.
Aug 08 2012
On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg gmx.com> wrote:[...] +1. I think std.hash is needlessly confusing (I thought it was another hashtable implementation until I read this thread more carefully). T -- Two wrongs don't make a right; but three rights do make a left...On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:std.digest then?I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a Hash table.That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography.
Aug 08 2012
On Wednesday, 8 August 2012 at 12:00:42 UTC, H. S. Teoh wrote:

On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:

-1. std.digest lets me think of http://en.wikipedia.org/wiki/Digestion — digest is just not common if you mean hash in my circles. I didn't think of a hash table implementation; maybe you are spoiled by writing one at the moment? (no offence).

On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg gmx.com> wrote: [...] +1. I think std.hash is needlessly confusing (I thought it was another hashtable implementation until I read this thread more carefully). T

On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote: std.digest then?

I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a hash table.

That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography.
Aug 08 2012
On Wed, 08 Aug 2012 13:11:43 +0100, Tobias Pankrath <tobias pankrath.net> wrote:

On Wednesday, 8 August 2012 at 12:00:42 UTC, H. S. Teoh wrote:

That's exactly what it's supposed to suggest. The algorithm does digest the input (AKA message) and output .. something else :p

On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote: -1. std.digest lets me think of http://en.wikipedia.org/wiki/Digestion

On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg gmx.com> wrote: On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote: [...] +1. I think std.hash is needlessly confusing (I thought it was another hashtable implementation until I read this thread more carefully). T

std.digest then?

I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a hash table.

That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography.

digest is just not common if you mean hash in my circles.

Like it or not, Digest is the correct term: http://en.wikipedia.org/wiki/MD5 "The MD5 Message-Digest Algorithm .."

I didn't think of a hash table implementation, maybe you are spoiled by writing one at the moment? (no offence).

"Hash" has too many meanings, we should avoid it. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Aug 08 2012
On Wednesday, 8 August 2012 at 12:55:04 UTC, Regan Heath wrote:

Like it or not, Digest is the correct term: http://en.wikipedia.org/wiki/MD5 "The MD5 Message-Digest Algorithm .."

You could have cited the whole sentence:

The MD5 Message-Digest Algorithm is a widely used cryptographic hash function

So at least this implies that hash function is the more general term here, and the corresponding wiki article is named "hash function" and does not even mention digest.

At least hash table does not use a different meaning of the term hash. But I'm not that deep into it; I'd just say that digest is not clearly better than hash.

I didn't think of a hash table implementation, maybe you are spoiled by writing one at the moment? (no offence).

"Hash" has too many meanings, we should avoid it.
Aug 08 2012
On Wed, 08 Aug 2012 14:03:32 +0100, Tobias Pankrath <tobias pankrath.net> wrote:On Wednesday, 8 August 2012 at 12:55:04 UTC, Regan Heath wrote:I could have, but I didn't read that far :p I knew what I was looking for and I copy/pasted it.Like it or not, Digest is the correct term: http://en.wikipedia.org/wiki/MD5 "The MD5 Message-Digest Algorithm .."You could have cited the hole sentence"Message-Digest Algorithm" is the proper term, "hash" is another, correct, more general term. "hash" has other meanings, "Message-Digest Algorithm" does not. std.message-digest-algorithm is a bit wordy. std.digest is not. std.digest cannot be confused with anything else.The MD5 Message-Digest Algorithm is a widely used cryptographic hash functionSo at least this implies that hash function is the more general term here and the corresponding wiki article is named "hash function" and does not even mention digest.I think it is. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/At least hash table does not use a different meaning of the term hash. But I'm not that deep into it, I'd just say that digest is not clearly better than hash.I didn't think of an hash table implementation, maybe you are spoiled by writing one at the moment? (no offence)."Hash" has too many meanings, we should avoid it.
Aug 08 2012
"Regan Heath" , dans le message (digitalmars.D:174462), a écrit :"Message-Digest Algorithm" is the proper term, "hash" is another, correct, more general term. "hash" has other meanings, "Message-Digest Algorithm" does not.I think the question is: is std.hash going to contain only message-digest algorithm, or could it also contain other hash functions? I think there is enough room in a package to have both message-digest algorithm and other kinds of hash functions.
Aug 08 2012
On Wednesday, 8 August 2012 at 13:38:26 UTC, travert phare.normalesup.org (Christophe Travert) wrote:I think the question is: is std.hash going to contain only message-digest algorithm, or could it also contain other hash functions? I think there is enough room in a package to have both message-digest algorithm and other kinds of hash functions.Even if that were the case, I'd say they should be kept separate. Cryptographic hash functions serve extremely different purposes from regular hash functions. There is no reason they should be categorized the same.
Aug 08 2012
On Wed, 08 Aug 2012 14:50:22 +0100, Chris Cain <clcain uncg.edu> wrote:On Wednesday, 8 August 2012 at 13:38:26 UTC, travert phare.normalesup.org (Christophe Travert) wrote:I don't think there is any reason to separate them. People should know which digest algorithm they want, they're not going to pick one at random and assume it's "super secure!"(tm). And if they do, well tough, they deserve what they get. "std.digest" can encompass all message digest algorithms, whether secure or not. We could create a 2nd level below "secure" or "crypto" or similar if we really want, but I don't see much point TBH. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/I think the question is: is std.hash going to contain only message-digest algorithm, or could it also contain other hash functions? I think there is enough room in a package to have both message-digest algorithm and other kinds of hash functions.Even if that were the case, I'd say they should be kept separate. Cryptographic hash functions serve extremely different purposes from regular hash functions. There is no reason they should be categorized the same.
Aug 08 2012
"Chris Cain", dans le message (digitalmars.D:174466), a écrit :

On Wednesday, 8 August 2012 at 13:38:26 UTC, travert phare.normalesup.org (Christophe Travert) wrote:

They should not be categorized the same. I don't expect a regular hash function to pass the isDigest predicate. But they have many similarities, which explains why they are all called hash functions. There is enough room in a package to put several related concepts! Here, we have a package of 4 files, with a total number of lines that is about one third of the single std.algorithm file (which is probably too big, I concede). There aren't hundreds of message-digest functions to add here. If it were me, I would have the presently reviewed module std.hash.hash be called std.hash.digest, and leave room here for regular hash functions. In any case, I think regular hashes HAVE to be in a std.hash module or package, because people looking for a regular hash function will look here first.

I think the question is: is std.hash going to contain only message-digest algorithms, or could it also contain other hash functions? I think there is enough room in a package to have both message-digest algorithms and other kinds of hash functions.

Even if that were the case, I'd say they should be kept separate. Cryptographic hash functions serve extremely different purposes from regular hash functions. There is no reason they should be categorized the same.
Aug 08 2012
On Wednesday, 8 August 2012 at 14:14:29 UTC, Regan Heath wrote:

I don't think there is any reason to separate them. People should know which digest algorithm they want, they're not going to pick one at random and assume it's "super secure!"(tm). And if they do, well tough, they deserve what they get.

In this case, I'm not suggesting keeping them separate to avoid confusing those who don't know better. They're simply disparate in actual use. What do you use a traditional hash function for? Usually to turn a large multibyte stream into some finite size so that you can use a lookup table, or maybe to decrease wasted time in comparisons. What do you use a cryptographic hash function for? Almost always it's to verify the integrity of some data (usually files) or protect the original form from prying eyes (passwords ... though there are better approaches for that now). You'd _never_ use a cryptographic hash function in place of a traditional hash function and vice versa, because they're designed for completely different purposes. At a cursory glance, they bear only one similarity, and that's the fact that they turn a big chunk of data into a smaller form that has a fixed size.

On Wednesday, 8 August 2012 at 14:16:40 UTC, travert phare.normalesup.org (Christophe Travert) wrote:

function to pass the isDigest predicate. But they have many similarities, which explains why they are all called hash functions. There is enough room in a package to put several related concepts!

Cryptographic hash functions are also known as "one-way compression functions." They also have similarities to file compression algorithms. After all, both of them turn large files into smaller data. However, the actual use of them is completely different, and you wouldn't use one in place of the other. I wouldn't put the Burrows-Wheeler transform in the same package. It's just my opinion of course, but I just feel it wouldn't be right to intermingle normal hash functions and cryptographic hash functions in the same package.
If we had to make a compromise and group them with something else, I'd really like to see cryptographic hash functions put in the same place we'd put other cryptography (such as AES) ... in a std.crypto package. But std.digest is good if they can exist in their own package. It also occurs to me that a lot of people are confounding cryptographic hash functions and normal hash functions enough that they think that a normal hash function has a "digest" ... I'm 99% sure that's exclusive to the cryptographic hash functions (at least, I've never heard of a normal hash function producing a digest).
Aug 08 2012
"Chris Cain", dans le message (digitalmars.D:174477), a écrit :

I think you misunderstood me (and it's probably my fault, since I don't know much of hash functions), I wanted to compare two kinds of concepts:

1/ message digest functions, like md5 or sha1, used on large files, which is what is covered by this std.hash proposal.

2/ small hash functions, like what is used in an associative array, and are called toHash when used as a member function.

And I didn't think of:

3/ cryptographic hash functions

My opinion was that in a module or package called hash, I expect tools would rather have it named std.hash.digest, leaving room in the hash package for other concepts, like small hash functions that can be used in a crypto package makes sense.

-- Christophe
Aug 08 2012
On Wed, 08 Aug 2012 18:33:01 +0100, Christophe Travert <travert phare.normalesup.org> wrote:

"Chris Cain", dans le message (digitalmars.D:174477), a écrit : I think you misunderstood me (and it's probably my fault, since I don't know much of hash functions), I wanted to compare two kinds of concepts: 1/ message digest functions, like md5 or sha1, used on large files, which is what is covered by this std.hash proposal. 2/ small hash functions, like what is used in an associative array, and are called toHash when used as a member function. And I didn't think of: 3/ cryptographic hash functions. My opinion was that in a module or package called hash, I expect tools would rather have it named std.hash.digest, leaving room in the hash package for other concepts, like small hash functions that can be used in a crypto package makes sense.

Here is a perfect example of why we need to avoid using "hash"; it has too many meanings to different people. I suggest:

    std.digest <- cryptographic "hash" algorithms
    std.crc <- crc "hash" algorithms
    std.uuid <- identity "hash" algorithms

This is assuming we cannot have more levels of depth in the package/module tree, otherwise you could group them all under the package "hash":

    std.hash.digest
    std.hash.crc
    std.hash.uuid

Some people are going to argue it should be: std.crypto.digest or.. std.crypto.hash. But that leads us to something like:

    std.crypto.hash
    std.crc.hash
    std.uuid.hash

And that seems back-to-front to me, and more importantly would assume/suggest/require we have more packages to put in std.crc and std.uuid, which I suspect we won't.

R

-- Using Opera's revolutionary email client: http://www.opera.com/mail/
Aug 08 2012
On Wednesday, 8 August 2012 at 17:33:01 UTC, travert phare.normalesup.org (Christophe Travert) wrote:

I think you misunderstood me (and it's probably my fault, since I don't know much about hash functions). I wanted to compare two kinds of concepts: 1/ message digest functions, like md5 or sha1, used on large files, which is what is covered by this std.hash proposal. 2/ small hash functions, like what is used in an associative array and called toHash when used as a member function. And I didn't think of: 3/ cryptographic hash functions

Actually, maybe I'm the one not doing a good job of explaining. 1 and 3 are the same thing (what you're calling "message digest" functions are cryptographic hash functions). I'm saying that even though similar in name, cryptographic hash functions really can't (IMO, I suppose I should make clear) be put in the same place as normal hash functions, because they barely have anything in common. You can't use one in place of the other, nor are they really used in similar manners.

My opinion was that in a module or package called hash, I expect tools ...

I agree. I'd think similarly (I'd assume std.hash has something to do with hash tables or hash functions used for hash tables). If I were looking to use a cryptographic hash function like SHA1 or (eh) MD5, I'd look for std.crypto first, and probably pick std.digest if I saw that. As a last resort I'd look in std.hash and vomit profusely after seeing it grouped with the "times 33" hash.
Aug 08 2012
On Wed, 8 Aug 2012 14:16:40 +0000 (UTC), travert phare.normalesup.org (Christophe Travert) wrote:

If it were me, I would have the presently reviewed module std.hash.hash be called std.hash.digest, and leave room here for regular hash functions. In any case, I think regular hashes HAVE to be in a std.hash module or package, because people looking for a regular hash function will look here first.

std.hash.digest doesn't sound too bad. We could have std.hash.func (or a better named module ;-) for general hash functions later.
Aug 08 2012
On Wednesday, August 08, 2012 18:47:34 Johannes Pfau wrote:

On Wed, 8 Aug 2012 14:16:40 +0000 (UTC), travert phare.normalesup.org (Christophe Travert) wrote:

If it were me, I would have the presently reviewed module std.hash.hash be called std.hash.digest, and leave room here for regular hash functions. In any case, I think regular hashes HAVE to be in a std.hash module or package, because people looking for a regular hash function will look here first.

std.hash.digest doesn't sound too bad. We could have std.hash.func (or a better named module ;-) for general hash functions later.

I say just keep it simple and leave it at std.hash. It's plenty clear IMHO.

- Jonathan M Davis
Aug 08 2012
On 8/8/12 4:34 PM, Jonathan M Davis wrote:

I say just keep it simple and leave it at std.hash. It's plenty clear IMHO.

Not clear to quite a few of us. IMHO it just makes us seem (to the larger community) clever about a petty point. There's plenty of other better names, and std.digest is very adequate.

Andrei
Aug 08 2012
On Wednesday, August 08, 2012 18:47:04 Andrei Alexandrescu wrote:

On 8/8/12 4:34 PM, Jonathan M Davis wrote:

I say just keep it simple and leave it at std.hash. It's plenty clear IMHO.

Not clear to quite a few of us. IMHO it just makes us seem (to the larger community) clever about a petty point. There's plenty of other better names, and std.digest is very adequate.

I prefer std.hash to std.digest, but I don't necessarily care all that much. What I was objecting to in particular was the suggestion to split it into std.hash.digest and std.hash.func. I think that all of the hashing algorithms should just go in the one package. Adding another layer is an unnecessary complication IMHO.

- Jonathan M Davis
Aug 08 2012
On Wednesday, 8 August 2012 at 16:47:35 UTC, Johannes Pfau wrote:std.hash.digest doesn't sound too bad. We could have std.hash.func (or a better named module ;-) for general hash functions later.Three basic types of hash functions are: 1) Hash - for fast searching and indexing in data structures 2) Checksum - detects the accidental errors in files, archives, streams 3) Message digest code - prevents the intentional modification of data They should not be mixed IMHO. 1) should go into std.container or (maybe) std.algorithm 2) std.checksum 3) std.crypto.mdc or std.crypto.digest
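The three categories can be seen side by side in a quick sketch. Python is used here only because its standard library happens to ship all three kinds; the distinction maps directly onto the D proposal:

```python
import hashlib
import zlib

data = b"hello world"

# 1) Indexing hash: cheap, only needs to spread keys across buckets
bucket = hash("hello world") % 64

# 2) Checksum: detects accidental corruption (bit flips, truncation)
crc = zlib.crc32(data)

# 3) Message digest / cryptographic hash: designed to withstand
#    intentional tampering, and much more expensive to compute
digest = hashlib.sha1(data).hexdigest()

print(bucket, hex(crc), digest)
```

Note how the three live in different places even in Python: the builtin `hash`, the `zlib` module, and `hashlib`, which is essentially the split being argued for here.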
Aug 15 2012
On Wednesday, 15 August 2012 at 08:49:26 UTC, RivenTheMage wrote:

Three basic types of hash functions are:
1) Hash - for fast searching and indexing in data structures
2) Checksum - detects the accidental errors in files, archives, streams
3) Message digest code - prevents the intentional modification of data
They should not be mixed IMHO.

Why? 1) might have a different interface than the others, but 2) and 3) only differ in their cryptological properties; the interface will likely be just the same – or what are you thinking about?

David
Aug 15 2012
On Wednesday, 15 August 2012 at 08:55:30 UTC, David Nadlinger wrote:

Why? 1) might have a different interface than the others, but 2) and 3) only differ in their cryptological properties; the interface will likely be just the same – or what are you thinking about?

David

The "only" difference between 2) and 3) is a big difference. CRC32, Adler, etc. are NOT cryptographic hash functions. Their purpose is to detect accidental errors caused by malfunction of hardware or software, nothing more. For me, it's weird and confusing to mix checksums and MDCs. It's about organizing the standard library for better usability. That is the whole point of modules. After all, you could place the whole standard library in one module, why not? :-)
Aug 15 2012
Another example is systematic error-correcting codes. The "only" difference between them and checksums is the ability to correct errors, not just detect them. CRC or MD5 can be viewed as a systematic code with zero error-correcting ability. Should we mix Reed-Solomon codes and MD5 in one module? I don't think so.
Aug 15 2012
On Wed, Aug 15, 2012 at 2:40 AM, RivenTheMage <riven-mage id.ru> wrote:

Another example is systematic error-correcting codes. The "only" difference between them and checksums is the ability to correct errors, not just detect them. CRC or MD5 can be viewed as a systematic code with zero error-correcting ability. Should we mix Reed-Solomon codes and MD5 in one module? I don't think so.

Some people's point is that MD5 was considered a cryptographic digest function 16 years ago. It is not considered cryptographically secure today. So why make any design assumption today on how the landscape will look tomorrow? Especially in a field that is always changing. Why not lump them all together and explain the current situation and recommendation in the comments. Look at Python's passlib module for example. They enumerate every password encoding scheme under the sun (except for scrypt :() and give a recommendation on the appropriate algorithm to use in the current computing landscape.

http://packages.python.org/passlib/lib/passlib.hash.html#module-passlib.hash

Thanks,
-Jose
Aug 15 2012
On Wednesday, 15 August 2012 at 14:36:00 UTC, José Armando García Sancio wrote:

Some people's point is that MD5 was considered a cryptographic digest function 16 years ago. It is not considered cryptographically secure today. So why make any design assumption today on how the landscape will look tomorrow? Especially in a field that is always changing. Why not lump them all together and explain the current situation and recommendation in the comments. Look at Python's passlib module for example. They enumerate every password encoding scheme under the sun (except for scrypt :() and give a recommendation on the appropriate algorithm to use in the current computing landscape.

http://packages.python.org/passlib/lib/passlib.hash.html#module-passlib.hash

Thanks,
-Jose

I agree that MD5 isn't cryptographically secure anymore, but it was designed as a cryptographic hash algorithm, and it shows. Its statistical and performance properties are completely different from CRCs, and no matter how broken, it still has a little cryptographic strength (no practical preimage attack has been found to this date, for example). Note that in Python's passlib there is no mention of CRC, FNV, ROT13, etc. Their place is different.
Aug 15 2012
On Wed, Aug 15, 2012 at 8:11 AM, ReneSac <reneduani yahoo.com.br> wrote:

Note that in Python's passlib there is no mention of CRC, FNV, ROT13, etc. Their place is different.

That's because it is a "password module" and nobody, or a small percentage of the population, uses CRC for password digests. Note that the Python passlib module also has archaic plaintext encodings, mainly for interacting with legacy systems.

The basic point is that std.digest/std.hash (whatever people decide) should probably just have generic digest algorithms. The user can decide which one to use given their requirements. Also, it would be beneficial if the module also included a section where it recommends digests based on the current landscape of computing. High-level documentation and suggestions are easy to change; APIs are not.

Thanks,
-Jose
Aug 15 2012
On Wednesday, 15 August 2012 at 19:38:34 UTC, José Armando García Sancio wrote:

That's because it is a "password module" and nobody, or a small percentage of the population, uses CRC for password digests.

In turn, that's because CRC is not a cryptographic hash and not suited for password hashing :)

The basic point is that std.digest/std.hash (whatever people decide) should probably just have generic digest algorithms.

Generic digest algorithms should probably go into std.algorithm. It could be used like that:

------------
import std.algorithm;
import std.checksum;
import std.crypto.mdc;

ushort num = 1234;
string str = "abcd";

auto hash1 = hash!("(a >>> 20) ^ (a >>> 12) ^ (a >>> 7) ^ (a >>> 4) ^ a")(num); // indexing hash
auto hash2 = hash!(CRC32)(str); // checksum
auto hash3 = hash!(MD5)(str);   // cryptographic hash
------------

CRC32 and MD5 are ranges and/or classes derived from a HashAlgorithm interface.
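For comparison only (this is not part of the proposal): the generic-dispatch idea sketched above exists in other libraries. Python's hashlib selects the digest by name at runtime, which is roughly the dynamic analogue of instantiating hash!(MD5):

```python
import hashlib

def digest_of(algo: str, data: bytes) -> str:
    # hashlib.new dispatches on the algorithm name - the runtime
    # analogue of a hash!(MD5)(str) template instantiation.
    return hashlib.new(algo, data).hexdigest()

print(digest_of("md5", b"abcd"))
print(digest_of("sha1", b"abcd"))
```

The D template version would pick the algorithm at compile time instead, with no lookup cost; the user-facing call shape is the same.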
Aug 15 2012
On Thursday, 16 August 2012 at 03:02:59 UTC, RivenTheMage wrote:

ushort num = 1234;
auto hash1 = hash!("(a >>> 20) ^ (a >>> 12) ^ (a >>> 7) ^ (a >>> 4) ^ a")(num); // indexing hash

I forgot that this case is already covered by reduce!(...)
Aug 15 2012
On 08-Aug-12 18:16, Christophe Travert wrote:

"Chris Cain", in message (digitalmars.D:174466), wrote:

On Wednesday, 8 August 2012 at 13:38:26 UTC, travert phare.normalesup.org (Christophe Travert) wrote:

I think the question is: is std.hash going to contain only message-digest algorithms, or could it also contain other hash functions? I think there is enough room in a package to have both message-digest algorithms and other kinds of hash functions.

Even if that were the case, I'd say they should be kept separate. Cryptographic hash functions serve extremely different purposes from regular hash functions. There is no reason they should be categorized the same.

They should not be categorized the same. I don't expect a regular hash function to pass the isDigest predicate. But they have many similarities, which explains why they are all called hash functions. There is enough room in a package to put several related concepts!

You still can use, say, crc32 as a normal hash function for some binary object. The notions are not as desperate as some designers would want them to be.

Here, we have a package of 4 files, with a total number of lines that is about one third of the single std.algorithm file (which is probably too big, I concede). There aren't hundreds of message-digest functions to add here.

I'd rather see clean separation by family, as importing one huge digest module only to use SHA is kind of creepy. On the other hand, as all of the code is templated, it's not a big deal.

If it were me, I would have the presently reviewed module std.hash.hash be called std.hash.digest, and leave room here for regular hash functions. In any case, I think regular hashes HAVE to be in a std.hash module or package, because people looking for a regular hash function will look here first.

One thing concerns me: if incremental digest hashes are all in one module, what are the (would-be) other modules in std.hash?

-- Dmitry Olshansky
Aug 08 2012
On 08-Aug-12 21:00, Dmitry Olshansky wrote:

They should not be categorized the same. I don't expect a regular hash function to pass the isDigest predicate. But they have many similarities, which explains why they are all called hash functions. There is enough room in a package to put several related concepts!

You still can use, say, crc32 as a normal hash function for some binary object. The notions are not as desperate as some designers would want them to be.

Damned spellcheckers: desperate -> disparate

-- Dmitry Olshansky
Aug 08 2012
On 8/8/12 8:54 AM, Regan Heath wrote:

"Hash" has too many meanings, we should avoid it.

Yes please.

Andrei
Aug 08 2012
On 07-Aug-12 22:31, Ary Manzana wrote:

I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a hash table.

There is std.container, so it's unambiguous for me. As for std.crypto, it's been discussed to death before. Short answer - no, because std.hash is assumed to include other useful hash functions (like crc32; later I expect adler and whatnot). Also keep in mind that being a cryptographic hash is more of a status than a permanent property. Now that md5 has been cracked, it is typically used only as a normal hash (e.g. checksum). The same thing will one day happen to the SHA family.

-- Dmitry Olshansky
Aug 07 2012
On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:

std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API.

The hash functions must use a Range interface, not a file interface. This is extremely important.
Aug 07 2012
On Tue, 07 Aug 2012 17:39:15 -0700, Walter Bright <newshound2 digitalmars.com> wrote:

On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:

std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API.

The hash functions must use a Range interface, not a file interface. This is extremely important.

I guess this is meant as a general statement and not specifically targeted at my std.hash proposal? I'm a little confused, as all hashes already are OutputRanges in my proposal. It's probably not explicit enough in the documentation, but it's mentioned in one example and in the documentation for 'put'.
Aug 08 2012
On 8/8/2012 1:44 AM, Johannes Pfau wrote:

I guess this is meant as a general statement and not specifically targeted at my std.hash proposal?

Both.

I'm a little confused, as all hashes already are OutputRanges in my proposal. It's probably not explicit enough in the documentation, but it's mentioned in one example and in the documentation for 'put'.

It should accept an input range. But using an Output Range confuses me. A hash function is a reduce algorithm - it accepts a sequence of input values, and produces a single value. You should be able to write code like:

ubyte[] data;
...
auto crc = data.crc32();

For example, the hash example given is:

foreach (buffer; file.byChunk(4096 * 1024))
    hash.put(buffer);
auto result = hash.finish();

Instead it should be something like:

auto result = file.byChunk(4096 * 1024).joiner.hash();

The magic is that any input range that produces bytes could be used, and that byte-producing input range can be hooked up to the input of any reducing function. The use of a member finish() is not what any other reduce algorithm has, and so the interface is not a general component interface.

I know the documentation on ranges in Phobos is incomplete and confusing. I appreciate the effort and care you're putting into this.
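For what it's worth, the two styles being debated here can coexist, as they do in Python's hashlib (shown purely as a point of comparison, with update() playing the role of put() and hexdigest() the role of finish()); the one-shot, reduce-like form is trivially layered on top of the incremental one:

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

# One-shot, reduce-like style: the whole input is available up front.
oneshot = hashlib.md5(data).hexdigest()

# Incremental put/finish style: data arrives in chunks, e.g. from byChunk.
ctx = hashlib.md5()
for chunk in (data[:16], data[16:]):
    ctx.update(chunk)          # the equivalent of hash.put(buffer)
incremental = ctx.hexdigest()  # the equivalent of hash.finish()

assert oneshot == incremental
```

The proposal's digest!MD5/md5Of helpers correspond to the one-shot form; the put/finish context is what the convenience function is built from.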
Aug 08 2012
Walter Bright wrote:

auto result = file.byChunk(4096 * 1024).joiner.hash();

The magic is that any input range that produces bytes could be used, and that byte-producing input range can be hooked up to the input of any reducing function.

Suppose you have a callback that will give you blocks of bytes to hash. Blocks of bytes come from a socket, but not a blocking one. Instead, the socket uses an eventing mechanism (libevent) to get notifications about its readiness. How would you use the hash API in this situation?
Aug 08 2012
On 8/8/2012 3:12 AM, Piotr Szturmaj wrote:

Suppose you have a callback that will give you blocks of bytes to hash. Blocks of bytes come from a socket, but not a blocking one. Instead, the socket uses an eventing mechanism (libevent) to get notifications about its readiness. How would you use the hash API in this situation?

Have the callback supply a range interface to call the hash with.
Aug 08 2012
Have the callback supply a range interface to call the hash with.

That hardly works for event based programming without using coroutines. It's the classical inversion-of-control dilemma of event based programming that forces you to save/restore your state with every event.

--------
Hash hash;

void onData(void[] data)
{
    hash.put(data);
}

void main()
{
    hash.start();
    auto stream = new EventTcpStream("localhost", 80);
    stream.onData = &onData;
    hash.finish();
}
--------
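Martin's point, that event-driven code needs a persistent hashing context rather than a range pipeline, is easy to reproduce in any language with an incremental hash API. A sketch in Python for comparison (the loop below is only a stand-in for events delivered by a non-blocking socket, just as EventTcpStream above is a hypothetical type):

```python
import hashlib

ctx = hashlib.sha256()  # persistent state surviving across callbacks

def on_data(chunk: bytes) -> None:
    # Invoked by the event loop whenever data arrives; no input range
    # exists here, only the chunk at hand and the saved context.
    ctx.update(chunk)

# Stand-in for events delivered by a non-blocking socket:
for chunk in (b"GET / HTTP/1.1\r\n", b"Host: localhost\r\n\r\n"):
    on_data(chunk)

result = ctx.hexdigest()
```

A pure input-range/reduce interface would have to suspend and resume the reduction between events, which is exactly the inversion-of-control problem described above.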
Aug 08 2012
On 8/8/2012 12:14 PM, Martin Nowak wrote:

That hardly works for event based programming without using coroutines. It's the classical inversion-of-control dilemma of event based programming that forces you to save/restore your state with every event.

See the discussion on using reduce().
Aug 08 2012
On Wed, 08 Aug 2012 12:31:29 -0700, Walter Bright <newshound2 digitalmars.com> wrote:

On 8/8/2012 12:14 PM, Martin Nowak wrote:

That hardly works for event based programming without using coroutines. It's the classical inversion-of-control dilemma of event based programming that forces you to save/restore your state with every event.

See the discussion on using reduce().

I just don't understand it. Let's take the example by Martin Nowak and port it to reduce: (The code added as comments is the same code for hashes, working with the current API)

int state; //Hash state;

void onData(void[] data)
{
    state = reduce(state, data); //copy(data, state);
    //state = copy(data, state); //also valid, but not necessary
    //state.put(data); //simple way, doesn't work for ranges
}

void main()
{
    state = 0; //state.start();
    auto stream = new EventTcpStream("localhost", 80);
    stream.onData = &onData;
    //auto result = hash.finish();
}

There are only 2 differences:

1: The order of the arguments passed to copy and reduce is swapped. This kinda makes sense (if copy is interpreted as copyTo). Solution: Provide a method copyInto with swapped arguments if consistency is really so important.

2: We need an additional call to finish. I can't say it often enough, I don't see a sane way to avoid it. Hashes work on blocks; if you didn't pass enough data, finish will have to fill the rest of the block with zeros before you can get the hash value. This operation can't be undone. To get a valid result with every call to copy, you'd have to always call finish. This is

* inefficient, you calculate intermediate values you don't need at all
* you have to copy the hash's state, as you can't continue hashing after finish has been called, and both the state and the result would have to fit into the one value (called seed for reduce).

But then it's still not 100% consistent, as reduce will return a single value, not some struct including internal state.
Aug 09 2012
On 8/9/2012 2:48 AM, Johannes Pfau wrote:

I just don't understand it. Let's take the example by Martin Nowak and port it to reduce: (The code added as comments is the same code for hashes, working with the current API)

int state; //Hash state;

void onData(void[] data)
{
    state = reduce(state, data); //copy(data, state);
    //state = copy(data, state); //also valid, but not necessary
    //state.put(data); //simple way, doesn't work for ranges
}

void main()
{
    state = 0; //state.start();
    auto stream = new EventTcpStream("localhost", 80);
    stream.onData = &onData;
    //auto result = hash.finish();
}

There are only 2 differences:

1: The order of the arguments passed to copy and reduce is swapped. This kinda makes sense (if copy is interpreted as copyTo). Solution: Provide a method copyInto with swapped arguments if consistency is really so important.

Consistency is important so that disparate components can fit together.

2: We need an additional call to finish. I can't say it often enough, I don't see a sane way to avoid it. Hashes work on blocks; if you didn't pass enough data, finish will have to fill the rest of the block with zeros before you can get the hash value. This operation can't be undone. To get a valid result with every call to copy, you'd have to always call finish. This is

* inefficient, you calculate intermediate values you don't need at all
* you have to copy the hash's state, as you can't continue hashing after finish has been called, and both the state and the result would have to fit into the one value (called seed for reduce).

But then it's still not 100% consistent, as reduce will return a single value, not some struct including internal state.

That isn't a problem, the internal state can be private data for the struct, and the "finish value" can be the result of overloading operator() on that struct. I'm not sure if that would work, but it's worth investigating.
Aug 09 2012
On 09/08/2012 11:48, Johannes Pfau wrote:

On Wed, 08 Aug 2012 12:31:29 -0700, Walter Bright <newshound2 digitalmars.com> wrote:

On 8/8/2012 12:14 PM, Martin Nowak wrote:

That hardly works for event based programming without using coroutines. It's the classical inversion-of-control dilemma of event based programming that forces you to save/restore your state with every event.

See the discussion on using reduce().

I just don't understand it. Let's take the example by Martin Nowak and port it to reduce: (The code added as comments is the same code for hashes, working with the current API)

int state; //Hash state;

void onData(void[] data)
{
    state = reduce(state, data); //copy(data, state);
    //state = copy(data, state); //also valid, but not necessary
    //state.put(data); //simple way, doesn't work for ranges
}

void main()
{
    state = 0; //state.start();
    auto stream = new EventTcpStream("localhost", 80);
    stream.onData = &onData;
    //auto result = hash.finish();
}

There are only 2 differences:

1: The order of the arguments passed to copy and reduce is swapped. This kinda makes sense (if copy is interpreted as copyTo). Solution: Provide a method copyInto with swapped arguments if consistency is really so important.

2: We need an additional call to finish. I can't say it often enough, I don't see a sane way to avoid it. Hashes work on blocks; if you didn't pass enough data, finish will have to fill the rest of the block with zeros before you can get the hash value. This operation can't be undone. To get a valid result with every call to copy, you'd have to always call finish. This is

* inefficient, you calculate intermediate values you don't need at all
* you have to copy the hash's state, as you can't continue hashing after finish has been called, and both the state and the result would have to fit into the one value (called seed for reduce).

But then it's still not 100% consistent, as reduce will return a single value, not some struct including internal state.

I'm pretty sure it is possible to pad and finish when a result is required without messing up the internal state.
Aug 16 2012
On Thu, 16 Aug 2012 21:25:55 +0100, deadalnix <deadalnix gmail.com> wrote:

On 09/08/2012 11:48, Johannes Pfau wrote:

I just don't understand it. Let's take the example by Martin Nowak and port it to reduce: (The code added as comments is the same code for hashes, working with the current API)

int state; //Hash state;

void onData(void[] data)
{
    state = reduce(state, data); //copy(data, state);
    //state = copy(data, state); //also valid, but not necessary
    //state.put(data); //simple way, doesn't work for ranges
}

void main()
{
    state = 0; //state.start();
    auto stream = new EventTcpStream("localhost", 80);
    stream.onData = &onData;
    //auto result = hash.finish();
}

There are only 2 differences:

1: The order of the arguments passed to copy and reduce is swapped. This kinda makes sense (if copy is interpreted as copyTo). Solution: Provide a method copyInto with swapped arguments if consistency is really so important.

2: We need an additional call to finish. I can't say it often enough, I don't see a sane way to avoid it. Hashes work on blocks; if you didn't pass enough data, finish will have to fill the rest of the block with zeros before you can get the hash value. This operation can't be undone. To get a valid result with every call to copy, you'd have to always call finish. This is

* inefficient, you calculate intermediate values you don't need at all
* you have to copy the hash's state, as you can't continue hashing after finish has been called, and both the state and the result would have to fit into the one value (called seed for reduce).

But then it's still not 100% consistent, as reduce will return a single value, not some struct including internal state.

I'm pretty sure it is possible to pad and finish when a result is required without messing up the internal state.

Without copying it? AFAICR padding/finishing mutates the state, I mean, that's the whole point of it.

R
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Aug 17 2012
Le 08/08/2012 11:49, Walter Bright a crit :On 8/8/2012 1:44 AM, Johannes Pfau wrote:That is a really good point. +1Am Tue, 07 Aug 2012 17:39:15 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:Both.On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:I guess this is meant as a general statement and not specifically targeted at my std.hash proposal?std.hash.hash is a new module for Phobos defining an uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API.The hash functions must use a Range interface, not a file interface. This is extremely important.I'm a little confused as all hashes already are OutputRanges in my proposal. It's probably not explicit enough in the documentation, but it's mentioned in one example and in the documentation for 'put';It should accept an input range. But using an Output Range confuses me. A hash function is a reduce algorithm - it accepts a sequence of input values, and produces a single value. You should be able to write code like: ubyte[] data; ... auto crc = data.crc32(); For example, the hash example given is: foreach (buffer; file.byChunk(4096 * 1024)) hash.put(buffer); auto result = hash.finish(); Instead it should be something like: auto result = file.byChunk(4096 * 1025).joiner.hash(); The magic is that any input range that produces bytes could be used, and that byte producing input range can be hooked up to the input of any reducing function. The use of a member finish() is not what any other reduce algorithm has, and so the interface is not a general component interface. I know the documentation on ranges in Phobos is incomplete and confusing. I appreciate the effort and care you're putting into this.
Aug 08 2012
It should accept an input range. But using an Output Range confuses me. A hash function is a reduce algorithm - it accepts a sequence of input values, and produces a single value. You should be able to write code like:

ubyte[] data;
...
auto crc = data.crc32();

For example, the hash example given is:

foreach (buffer; file.byChunk(4096 * 1024))
    hash.put(buffer);
auto result = hash.finish();

Instead it should be something like:

auto result = file.byChunk(4096 * 1024).joiner.hash();

I think sha1Of/digest!SHA1 should do this. It's also important to have a stateful hash implementation that can be updated incrementally, e.g. from a callback.
Aug 08 2012
On 8/8/2012 5:13 AM, Martin Nowak wrote:Take a look at the reduce function in http://dlang.org/phobos/std_algorithm.html#reduce It has provision for an initial state that can be the current running total.It should accept an input range. But using an Output Range confuses me. A hash function is a reduce algorithm - it accepts a sequence of input values, and produces a single value. You should be able to write code like: ubyte[] data; ... auto crc = data.crc32(); For example, the hash example given is: foreach (buffer; file.byChunk(4096 * 1024)) hash.put(buffer); auto result = hash.finish(); Instead it should be something like: auto result = file.byChunk(4096 * 1025).joiner.hash();I think sha1Of/digest!SHA1 should do this. It's also important to have a stateful hash implementation that can be updated incrementally, e.g. from a callback.
Aug 08 2012
On Wed, 08 Aug 2012 11:40:10 -0700, Walter Bright <newshound2 digitalmars.com> wrote:

Take a look at the reduce function in http://dlang.org/phobos/std_algorithm.html#reduce It has provision for an initial state that can be the current running total.

This can only work if the final state is valid as an initial state. This is just not true for some hash algorithms.

---
auto sum = reduce!("a + b")(0, range);
auto sum2 = reduce!("a + b")(sum, range2);
---

---
MD5 hash;
hash.start();
auto sum = copy(range, hash);
auto sum2 = copy(range2, sum);
auto result = hash.finish();
---

Now where's the difference, except that for hashes the context ('hash') has to be set up and finished manually?
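The contrast Johannes describes can be made concrete with two real APIs (Python used only for illustration). zlib's crc32 takes a running value, so it genuinely composes the way reduce does, while a block-based digest like MD5 cannot hand its state back as a plain seed:

```python
import zlib
from functools import reduce

chunks = [b"abc", b"def", b"ghi"]

# CRC32 is reduce-shaped: the previous result *is* the whole state,
# so it can be fed back in as the seed for the next chunk.
crc = reduce(lambda running, chunk: zlib.crc32(chunk, running), chunks, 0)
assert crc == zlib.crc32(b"abcdefghi")

# An MD5 digest, by contrast, is not a valid resumption point: the
# 16-byte result lacks the internal block buffer and length counter,
# so there is no md5(chunk, previous_digest) equivalent - hence the
# start/put/finish shape of the proposed API.
```

This is why CRC-style checksums fit Walter's running-total model while padded block digests need an explicit context.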
Aug 08 2012
On 8/8/2012 12:08 PM, Johannes Pfau wrote:

Now where's the difference, except that for hashes the context ('hash') has to be set up and finished manually?

The idea is to have hash act like a component - not with special added code the user has to write. In this case, it needs to work like a reduce algorithm, because it is a reduce algorithm. Need to find a way to make this work.
Aug 08 2012
On Wed, 08 Aug 2012 12:27:39 -0700, Walter Bright <newshound2 digitalmars.com> wrote:

On 8/8/2012 12:08 PM, Johannes Pfau wrote:

Now where's the difference, except that for hashes the context ('hash') has to be set up and finished manually?

The idea is to have hash act like a component - not with special added code the user has to write.

Please explain that. Nobody's going to simply replace a call to reduce with a call to a fictional 'hashReduce'. Why is it so important that the reduce and hash APIs match 100% even if the API doesn't fit hashes?

In this case, it needs to work like a reduce algorithm, because it is a reduce algorithm. Need to find a way to make this work.

We could do some ugly, performance-killing hacks to make it possible, but I just don't see why this is necessary.

----
struct InterHash
{
    MD5 ctx;
    ubyte[16] finished;
    alias finished this;
}

InterHash hashReduce(Range)(Range data)
{
    InterHash hash;
    hash.ctx.start();
    return hashReduce(hash, data);
}

InterHash hashReduce(Range)(InterHash hash, Range data)
{
    copy(data, hash.ctx);
    auto ctxCopy = hash.ctx;
    hash.finished = ctxCopy.finish();
    return hash;
}

auto a = hashReduce([1,2,3]);
auto b = hashReduce(a, [3,4]);
----

However, a and b are still not really valid hash values. I just don't see why we should force an interface onto hashes which just doesn't fit.
Aug 09 2012
On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:

> The idea is to have hash act like a component - not with special
> added code the user has to write.

Sorry, but I think this is a meaningless statement without specifying
what kind of interface the component should adhere to. In my opinion,
the proposed std.hash design would be a perfectly valid interface for
»accumulate stuff and at some point get a result«-type components.

> In this case, it needs to work like a reduce algorithm, because it is
> a reduce algorithm. Need to find a way to make this work.

Hash functions are _not_ analogous to reduce(), because the operation
performed by reduce() is stateless, whereas hash functions generally
have some internal state. »Continuing« a reduce() operation by
repeatedly calling it with the last partial result as the starting
value is only possible because there is no additional state to carry
over.

To make this work with hashes, you'd have to return something
encapsulating the internal state from your hash function. But then,
you again need to obtain the actual result from that return value
somehow, which defeats the original intent of making it work like
reduce – and incidentally is what finish() does.

David
Aug 09 2012
On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger
<see klickverbot.at> wrote:

> On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
>> The idea is to have hash act like a component - not with special
>> added code the user has to write.
>
> Sorry, but I think this is a meaningless statement without specifying
> what kind of interface the component should adhere to. In my opinion,
> the proposed std.hash design would be a perfectly valid interface for
> »accumulate stuff and at some point get a result«-type components.
>
>> In this case, it needs to work like a reduce algorithm, because it
>> is a reduce algorithm. Need to find a way to make this work.
>
> Hash functions are _not_ analogous to reduce(), because the operation
> performed by reduce() is stateless, whereas hash functions generally
> have some internal state.
>
> »Continuing« a reduce() operation by repeatedly calling it with the
> last partial result as the starting value is only possible because
> there is no additional state to carry over. To make this work with
> hashes, you'd have to return something encapsulating the internal
> state from your hash function.

This isn't necessarily a problem.

> But then, you again need to obtain the actual result from that return
> value somehow, which defeats the original intent of making it work
> like reduce – and incidentally is what finish() does.

But, this is a problem. finish in most cases pads the remaining data
to a boundary of the internal state size, then completes one more
iteration of the algorithm to produce the final result.

So, like David has just said, you can have one or the other. Either
you can chain hashReduce operations together, but you have to perform
a manual finish step to get the actual result, or you cannot chain
hashReduce operations together and finish is done automatically when
the input range is consumed.

Wild thought.. and I have to admit I've not followed the proposed or
suggested API closely, nor have I used ranges extensively, so this may
not be possible..

If the range/hash object stores the current state and returns this as
the result of hashReduce, it would be chainable. If it also had a
"Digest" property/method which performed finish on a /temporary copy/
of the state, it would almost be as automatic as reduce. There would
still be a manual step to get the result, but it would be analogous to
calling toString on any range object to output its "value". The Digest
property/method would not modify the internal state, and could be
called at any time between (not sure there is a point to this) or
after chained hashReduce operations.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Aug 09 2012
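[Editor's note: the »finish() on a temporary copy« idea above can be sketched as a small wrapper. This is a hypothetical illustration, not part of the reviewed module: `PeekableMD5` and its `digest` property are invented names, and the import path assumes the package as it later shipped in Phobos.]

```d
import std.digest.md : MD5;

// Hypothetical wrapper sketching the idea: accumulation stays
// chainable, and an intermediate digest is computed by finishing a
// temporary value copy of the state, so the original context keeps
// accumulating afterwards.
struct PeekableMD5
{
    MD5 ctx;

    void start() { ctx.start(); }

    void put(scope const(ubyte)[] data...) { ctx.put(data); }

    // Non-destructive "Digest" property: finish() a copy of the state.
    ubyte[16] digest()
    {
        MD5 tmp = ctx;   // MD5 is a value type, so this copies the state
        return tmp.finish();
    }
}
```

Because `digest()` never touches `ctx` itself, it can be called between chained operations and again after more data has been fed in.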
On 09-Aug-12 14:15, Regan Heath wrote:

> If the range/hash object stores the current state and returns this as
> the result of hashReduce, it would be chainable. If it also had a
> "Digest" property/method which performed finish on a /temporary copy/
> of the state, it would almost be as automatic as reduce. There would
> still be a manual step to get the result, but it would be analogous
> to calling toString on any range object to output its "value". The
> Digest property/method would not modify the internal state, and could
> be called at any time between (not sure there is a point to this) or
> after chained hashReduce operations.

struct ShaState
{
    ...
    alias ubyte[16] getDidgest();
}

Problem is: too much magic, and it spoils the

auto x = reduce(...);

idiom.

To be brutally honest, I don't see what in std.hash doesn't fit the
"range component" model:
- it works on input ranges via a convenient adaptor (or would work
  soon)
- it has output _ranges_.

What's not to like about it? That it doesn't fit the general reduce
algorithm? I'm not convinced it's important. In the same vein you may
try to shoehorn symmetric ciphers into the map interface because
conceptually they do a 1:1 conversion.

Now an important thing: just look at other output ranges. They either
require an extra call (see Appender's .data) or are horrendously slow
and ugly, see lockingTextWriter/Reader.

-- Dmitry Olshansky
Aug 09 2012
On 09-Aug-12 20:32, Dmitry Olshansky wrote:

> struct ShaState
> {
>     ...
>     alias ubyte[16] getDidgest();
> }

Too fast.. should have been:

ubyte[16] getDigest();
alias getDigest this;

-- Dmitry Olshansky
Aug 09 2012
On Thursday, 9 August 2012 at 16:37:57 UTC, Dmitry Olshansky wrote:Too fast.. should have been: ubyte[16] getDidgest(); alias getDigest this;I have been thinking about using AliasThis as well, but the problem is that precisely the use case this is meant to enable (i.e. »snapping together components«, like Walter said) tends to get broken in subtle ways due to the use of template functions/type inference. David
Aug 09 2012
On Thursday, August 09, 2012 18:46:59 David Nadlinger wrote:On Thursday, 9 August 2012 at 16:37:57 UTC, Dmitry Olshansky wrote:Yeah. alias this can be very useful, but it's very dangerous when it comes to templated functions, because it's very easy to make assumptions when writing those functions which hold great when using the actual type but do funny things when alias this comes into play. And unfortunately, I don't think that it's something that's at all well understood. It's probably one of those things that we as a community need to get a better grip on. Just imagine how bad it would be though if D allowed as many implicit conversions as C++ does... - Jonathan M DavisToo fast.. should have been: ubyte[16] getDidgest(); alias getDigest this;I have been thinking about using AliasThis as well, but the problem is that precisely the use case this is meant to enable (i.e. »snapping together components«, like Walter said) tends to get broken in subtle ways due to the use of template functions/type inference.
Aug 09 2012
On 8/9/2012 2:59 AM, David Nadlinger wrote:

> Sorry, but I think this is a meaningless statement without specifying
> what kind of interface the component should adhere to.

It is not a meaningless statement in that components have a
predictable set of methods and properties. That's all a range is.
Requiring extra methods means there's either an error in the component
interface design or an error in the component instance design.

What I'm trying to get away from is the C library style where every
library lives in its own world, and when the user tries to connect
them he's got a fair amount of work to do building a scaffolding
between them. With component programming, the interfaces between
disparate things are standardized. It does not have unique methods for
different instances. For example, one component has a finish() method,
another has a getResult() method, and a third has no method at all.
This is the situation I wish to avoid.

> To make this work with hashes, you'd have to return something
> encapsulating the internal state from your hash function.

Wouldn't that be simply the handle to the hash?

> But then, you again need to obtain the actual result from that return
> value somehow, which defeats the original intent of making it work
> like reduce – and incidentally is what finish() does.

I understand what finish() does. The interesting part is trying to
figure a way out of needing that method. Or perhaps the reduce
component design is incorrect.
Aug 09 2012
If a hash is a range, it's an output range, because it's something you
feed data to. Output ranges have only one method: put. Johannes used
this method. But it's not sufficient: you need something to start and
to finish the hash.

To bring consistency to the library, we should not remove the start
and finish methods. We should make all output ranges of the library
use the same functions.

In the library, we have really few output ranges. We have writable
input ranges and we have Appender. Is there more? There should be
files, sockets, maybe even signals, but IIRC these don't implement
output range at the moment. What did I miss?

Appender doesn't use a finish method, but we have to 'get the result'
of the appender, and for this we use appender.data. This name is not
appropriate for generically getting a result or terminating an output
range.

So we need a name that fits most output range uses. start/finish
sounds not bad. open/close fits files and sockets, but maybe not all
output ranges. Relying solely on constructors, opCall or alias this
seems dangerous to me.

-- Christophe
Aug 09 2012
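[Editor's note: the Appender parallel mentioned above is easy to see side by side. A minimal sketch; the digest module path assumes the package as it later shipped in Phobos.]

```d
import std.array : appender;
import std.digest.md : MD5;

void main()
{
    ubyte[] data = [1, 2, 3];

    // Appender: put() to accumulate, .data to get the result out.
    auto app = appender!(ubyte[])();
    app.put(data);
    auto collected = app.data;

    // Digest: put() to accumulate, finish() to get the result out.
    MD5 hash;
    hash.start();
    hash.put(data);
    auto sum = hash.finish();   // ubyte[16]
}
```

Both are output ranges driven through put(); only the name of the »get the result« step differs, which is exactly the inconsistency discussed here.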
See the new thread Andrei started entitled "finish function for output ranges". I think this discussion has clearly discovered a shortcoming in the current range design, and Andrei has a proposed solution.
Aug 11 2012
On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger wrote:

>> In this case, it needs to work like a reduce algorithm, because it
>> is a reduce algorithm. Need to find a way to make this work.
>
> Hash functions are _not_ analogous to reduce(), because the operation
> performed by reduce() is stateless, whereas hash functions generally
> have some internal state.

An example of a stateless hash in .NET:
http://msdn.microsoft.com/en-us/library/xa627k19.aspx
Aug 15 2012
On 15-Aug-12 11:41, Kagamin wrote:

> An example of a stateless hash in .NET:
> http://msdn.microsoft.com/en-us/library/xa627k19.aspx

AFAIK it's a method of the HashAlgorithm object:
http://msdn.microsoft.com/en-us/library/c06s9c55

It also includes TransformBlock & TransformFinalBlock. It does contain
state of course.
Aug 15 2012
On Wednesday, 15 August 2012 at 08:13:27 UTC, Dmitry Olshansky wrote:

> AFAIK it's a method of the HashAlgorithm object.

It's a minor design detail, see the example: the method is called on
each file without any explicit preparations and without calls to
methods like TransformBlock. That's how stateless computation usually
looks - it's just done and that's all.
Aug 15 2012
On Wednesday, 15 August 2012 at 08:17:14 UTC, Kagamin wrote:On Wednesday, 15 August 2012 at 08:13:27 UTC, Dmitry Olshansky wrote:No, it's not a »minor design detail«, at least not regarding what has been the topic of the discussion here – you can always provide a simple wrapper function in the proposed design and call it »stateless« as well (in fact, an implementation has already been posted, IIRC). The point is that the ability to execute a hashing operation block by block is necessary, and that this operation is not analogous to reduce() because it potentially needs internal state. DavidAFAIK it'a method of HashAlgorithm Object.It's a minor design detail, see the example […]
Aug 15 2012
On 15-Aug-12 12:17, Kagamin wrote:

> It's a minor design detail, see the example: the method is called on
> each file without any explicit preparations and without calls to
> methods like TransformBlock. That's how stateless computation usually
> looks - it's just done and that's all.

Brrr. It's how a convenience wrapper works :) And I totally expect
this to call the same code and keep the same state during the work.
E.g. see the std.digest.digest functions digest or hexDigest - you
could call them stateless in the same vein.
Aug 15 2012
On Wednesday, 15 August 2012 at 08:25:51 UTC, Dmitry Olshansky wrote:

> Brrr. It's how a convenience wrapper works :) And I totally expect
> this to call the same code and keep the same state during the work.

Well, there was a wish for a stateless hash; Walter even posted the
required interface:

auto result = file.byChunk(4096 * 1025).joiner.hash();

I just pointed out that a possibly stateful implementation doesn't
prevent a stateless interface. Can one even say that the
implementation is stateful given just a stateless interface? One can
even call reduce stateful, because it does keep track of the result,
which is a part of its state.
Aug 15 2012
On Wednesday, 15 August 2012 at 08:45:35 UTC, Kagamin wrote:I just pointed out, that possibly stateful implementation doesn't prevent stateless interface. Can one even say that the implementation is stateful given just a stateless interface?And our point is that such an interface is trivial to implement over a »stateful« interface, and even already exists. Maybe you want to peruse some of the old posts? David
Aug 15 2012
On 15-Aug-12 12:45, Kagamin wrote:On Wednesday, 15 August 2012 at 08:25:51 UTC, Dmitry Olshansky wrote:auto result = file.byChunk(4096 * 1025).joiner.digest(); and is already supported in the proposal, peek at updated docs. There is no need for additional methods and whatnot. -- Olshansky DmitryBrrr. It's how convenience wrapper works :) And I totally expect this to call the same code and keep the same state during the work. E.g. see std.digest.digest functions digest or hexDigest you could call it stateless in the same vane.Well there was a wish for stateless hash, Walter even posted the required interface: auto result = file.byChunk(4096 * 1025).joiner.hash();
Aug 15 2012
On Wednesday, 15 August 2012 at 07:41:20 UTC, Kagamin wrote:On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger wrote:http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.state DavidHash functions are _not_ analogous to reduce(), because the operation performed by reduce() is stateless, whereas hash functions generally have some internal state.An example of stateless hash in .net: http://msdn.microsoft.com/en-us/library/xa627k19.aspx
Aug 15 2012
On Wednesday, 15 August 2012 at 08:17:01 UTC, David Nadlinger wrote:On Wednesday, 15 August 2012 at 07:41:20 UTC, Kagamin wrote:Ok, but HashAlgorithm still supports stateless interface which consists of a single method with a couple of overloads, the example in the ComputeHash article speaks for itself.On Thursday, 9 August 2012 at 09:59:48 UTC, David Nadlinger wrote:http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.state DavidHash functions are _not_ analogous to reduce(), because the operation performed by reduce() is stateless, whereas hash functions generally have some internal state.An example of stateless hash in .net: http://msdn.microsoft.com/en-us/library/xa627k19.aspx
Aug 15 2012
Am Wed, 08 Aug 2012 02:49:00 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

> It should accept an input range. But using an Output Range confuses
> me. A hash function is a reduce algorithm - it accepts a sequence of
> input values, and produces a single value. You should be able to
> write code like:
>
> ubyte[] data;
> ...
> auto crc = data.crc32();

auto crc = crc32Of(data);
auto crc = data.crc32Of(); //ufcs

This doesn't work with every InputRange, and that needs to be fixed.
But that's a quite simple fix (max 10 lines of code, one new overload)
and not an inherent problem of the API (see below for more).

> For example, the hash example given is:
>
> foreach (buffer; file.byChunk(4096 * 1024))
>     hash.put(buffer);
> auto result = hash.finish();
>
> Instead it should be something like:
>
> auto result = file.byChunk(4096 * 1025).joiner.hash();

But it also says this:
//As digests implement OutputRange, we could use std.algorithm.copy
//Let's do it manually for now

You can basically do this with a range interface in one line:
----
import std.algorithm : copy;
auto result = copy(file.byChunk(4096 * 1024), hash).finish();
----
or with ufcs:
----
auto result = file.byChunk(4096 * 1024).copy(hash).finish();
----

OK, you have to initialize hash and you have to call finish. With a
new overload for digest it's as simple as this:
----
auto result = file.byChunk(4096 * 1024).digest!CRC32();
auto result = file.byChunk(4096 * 1024).crc32Of(); //with alias
----

The digests are OutputRanges, you can write data to them. There's
absolutely no need to make them InputRanges, as they only produce one
value, and the hash sum is produced at once, so there's no way to
receive the result in a partial way. A digest is very similar to
Appender and its .data property in this regard.

The put function could accept an InputRange, but I think there was a
thread recently which said this is evil for OutputRanges, as the same
feature can be achieved with copy. There's also no big benefit in
doing it that way. If your InputRange is really unbuffered, you could
avoid double buffering. But then you transfer data byte by byte, which
will be horribly slow. If your InputRange has an internal buffer, copy
should just copy from that internal buffer to the 64 byte buffer used
inside the digest implementation. This double buffering could only be
avoided if the put function accepted an InputRange and could supply a
buffer for that InputRange, so the InputRange could write directly
into the 64 byte buffer. But there's nothing like that in Phobos, so
this is all speculation. (Also, the internal buffer is only used for
the first 64 bytes (or less) of the supplied data. The rest is
processed without copying. It could probably be optimized so that
there's absolutely no copying as long as the input buffer length is a
multiple of 64.)

> The magic is that any input range that produces bytes could be used,
> and that byte producing input range can be hooked up to the input of
> any reducing function.

See above. Every InputRange with a byte element type does work. You
just have to use copy.

> The use of a member finish() is not what any other reduce algorithm
> has, and so the interface is not a general component interface.

It's a struct with state, not a simple reduce function, so it needs
that finish member. It works that way in every other language (and
that's not because those languages don't have ranges; streams and
iterators fill a similar role there).

Let's take a real world example: You want to download a huge file with
std.net.curl and hash it on the fly. Completely reading it into a
buffer is not possible (large file!). Now std.net.curl has a callback
interface (which is forced on us by libcurl). How would you map that
into an InputRange? (The byLine range in std.net.curl is eager,
byLineAsync needs an additional thread.) A newbie trying to do that
will despair, as it would work just fine in every other language, but
D forces that InputRange interface. Implementing it as an OutputRange
is much better. The described scenario works fine, and hashing an
InputRange also works fine - just use copy. OutputRange is much more
universal for this use case.

However, I do agree digest!Hash, md5Of, sha1Of should have an
additional overload which takes an InputRange. It would be implemented
with copy and be a nice convenience function.

> I know the documentation on ranges in Phobos is incomplete and
> confusing.

Especially for copy, as the documentation doesn't indicate the line I
posted could work in any way ;-)
Aug 08 2012
Am Wed, 8 Aug 2012 17:50:33 +0200
schrieb Johannes Pfau <nospam example.com>:

> However, I do agree digest!Hash, md5Of, sha1Of should have an
> additional overload which takes an InputRange. It would be
> implemented with copy and be a nice convenience function.

I implemented the function, it's actually quite simple:

----
digestType!Hash digestRange(Hash, Range)(Range data)
    if (isDigest!Hash && isInputRange!Range
        && __traits(compiles, digest!Hash(ElementType!(Range).init)))
{
    Hash hash;
    hash.start();
    copy(data, hash);
    return hash.finish();
}
----

but I don't know how to make it an overload. See the thread
"overloading a function taking a void[][]" in D.learn for details.
Aug 08 2012
Johannes Pfau wrote (in message digitalmars.D:174478):

> but I don't know how to make it an overload. See the thread
> "overloading a function taking a void[][]" in D.learn for details.

Don't overload the function taking a void[][]. Replace it. void[][] is
a range of void[].
Aug 08 2012
On 8/8/2012 9:47 AM, Johannes Pfau wrote:

> I implemented the function, it's actually quite simple:
> [code snipped]

The finish() should be implicit when the range ends.

> but I don't know how to make it an overload. See the thread
> "overloading a function taking a void[][]" in D.learn for details.

I don't know what you mean, it takes a range, not a void[][], as input.
Aug 08 2012
Am Wed, 08 Aug 2012 11:47:38 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

> The finish() should be implicit when the range ends.
>
> I don't know what you mean, it takes a range, not a void[][], as
> input.

See the post in D.learn for a detailed description. Yes, the code I
posted takes a range, but digest (as it is now) takes void[][] to
accept all kinds of types _without_ template bloat. The difficulty is
to combine those two overloads without causing unnecessary template
bloat.
Aug 08 2012
On 8/8/2012 12:05 PM, Johannes Pfau wrote:So the post in D.learn for a detailed description. Yes the code I posted takes a range, but digest (as it is now) takes void[][] to accept all kind of types _without_ template bloat. The difficulty is to combine those two overloads without causing unnecessary template bloat.Have the templated version with overloads simply call the single version (with a different name) with void[][].
Aug 08 2012
Am Wed, 08 Aug 2012 12:30:31 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

> Have the templated version with overloads simply call the single
> version (with a different name) with void[][].

Well, that's possible, but I don't like the template bloat it causes.
AFAIK a function taking a void[][] is just one instance; with that
redirecting approach we'll have one instance per array type. This
seems unnecessary (and maybe the compiler can merge such template
instances in the future), but I can't seem to find a way to avoid it,
so we'll probably have to live with that.

http://dpaste.dzfl.pl/f86717f7

I guess a second function digestRange is not acceptable?
Aug 09 2012
On 8/9/2012 2:05 AM, Johannes Pfau wrote:I guess a second function digestRange is not acceptable?It's more the user API that matters, not how it works under the hood.
Aug 09 2012
On 8/9/2012 2:05 AM, Johannes Pfau wrote:http://dpaste.dzfl.pl/f86717f7The Range argument - is it an InputRange, an OutputRange? While it's just a type name, the name should reflect what kind of range it is from the menagerie of ranges in std.range.
Aug 09 2012
Am Thu, 09 Aug 2012 02:13:10 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:On 8/9/2012 2:05 AM, Johannes Pfau wrote:It's an InputRange (of bytes) or an InputRange of some byte buffer (ElementType == ubyte[] || ElementType == ubyte[num]). We get the second version for free, so I just included it ;-) The documentation would have to make that clear of course. I could also change the name, it's just a proof of concept right now.http://dpaste.dzfl.pl/f86717f7The Range argument - is it an InputRange, an OutputRange? While it's just a type name, the name should reflect what kind of range it is from the menagerie of ranges in std.range.
Aug 09 2012
On 8/9/12 5:05 AM, Johannes Pfau wrote:

> Well, that's possible, but I don't like the template bloat it causes.

What have you measured, and what is your dislike based upon?

The library function must be generic. Then users worried about
bloating may use it with a limited number of types. A digest function
only dealing with void[][] is unacceptable.

Andrei
Aug 09 2012
Am Thu, 09 Aug 2012 08:48:37 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

> What have you measured, and what is your dislike based upon?

What annoys me is that as long as the function only supported arrays,
it didn't need templates _at all_. So template bloat for arrays = 0.
But adding range support means the version dealing with arrays now has
to be a template as well (which is probably a bug: you can't overload
a template and a non-template function) and will produce extra code
for every array type. I just think adding range support shouldn't
cause the array code to change in any way. But that's why I said I
don't like it, not that it's a show stopper. The overhead is probably
negligible, but in theory it shouldn't be there at all.

> The library function must be generic. Then users worried about
> bloating may use it with a limited number of types. A digest function
> only dealing with void[][] is unacceptable.

Sure, I agree that range support is necessary, I just forgot to
implement it initially. I'm not against range support / ranges in
general / the template instances needed for ranges. I just dislike
that it affects the array implementation in this specific case.
Aug 09 2012
On 2012-08-09 15:02, Johannes Pfau wrote:

> What annoys me is that as long as the function only supported arrays,
> it didn't need templates _at all_. So template bloat for arrays = 0.
> But adding range support means the version dealing with arrays now
> has to be a template as well (which is probably a bug: you can't
> overload a template and a non-template function) and will produce
> extra code for every array type. I just think adding range support
> shouldn't cause the array code to change in any way.

A workaround is to make the non-template function a template with no
arguments. This should only cause one instantiation:

void foo (T) (T t) if (/* some constraint making it not match "int" */);
void foo () (int x);

--
/Jacob Carlborg
Aug 09 2012
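[Editor's note: applied to the digest case, the workaround above might look as follows. This is a hedged sketch: `digestOf` is a hypothetical name, MD5 stands in for a generic hash parameter, and overload resolution between the two templates is only sketched, not verified against every argument type.]

```d
import std.digest.md : MD5;
import std.range : ElementType, isInputRange;

// Sketch of the zero-parameter-template workaround: the array overload
// is a template with an empty parameter list, so it is instantiated at
// most once no matter how many array types callers pass (they all
// convert to const(void)[]), while the range overload is a real
// template constrained to ranges of buffers.
ubyte[16] digestOf(R)(R data)
    if (isInputRange!R && is(ElementType!R : const(void)[]))
{
    MD5 hash;
    hash.start();
    foreach (chunk; data)
        hash.put(cast(const(ubyte)[]) chunk);
    return hash.finish();
}

ubyte[16] digestOf()(scope const(void)[][] data...)
{
    MD5 hash;
    hash.start();
    foreach (chunk; data)
        hash.put(cast(const(ubyte)[]) chunk);
    return hash.finish();
}
```

A plain array argument falls through to the variadic overload (one instantiation), while `file.byChunk(4096)` picks the range overload.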
The question about module names. Is it supposed that e.g. std.hash.crc module will contain many CRC implementations, not only one CRC-32? If not, it will be better to call it std.hash.crc32 because other CRC variants are also in use. Or even std.hash.crc.crc32. Same with std.hash.sha and std.hash.md modules. -- Денис В. Шеломовский Denis V. Shelomovskij
Aug 08 2012
Am Wed, 08 Aug 2012 11:48:54 +0400
schrieb Denis Shelomovskij <verylonglogin.reg gmail.com>:

> The question about module names. Is it supposed that e.g. the
> std.hash.crc module will contain many CRC implementations, not only
> one CRC-32? If not, it will be better to call it std.hash.crc32
> because other CRC variants are also in use. Or even
> std.hash.crc.crc32. Same with the std.hash.sha and std.hash.md
> modules.

They're supposed to contain more implementations in the future. I
basically just wrote the new API, then ported what we already had in
Phobos (and in a pull request by redstar) to the new API. The rest of
the SHA functions could probably follow soon, as Piotr Szturmaj
already has a properly licensed implementation. The Boost project has
a templated CRC implementation; we could port that as well.

One problem is that the current MD5 and SHA implementations have been
manually optimized for dmd's inliner though, so those probably won't
be replaced by a more generic version until we have benchmarks in
Phobos which ensure that there won't be performance regressions.
Aug 08 2012
I'm not familiar with hash functions in general. I think the core of
std.hash is the digest function:

digestType!Hash digest(Hash)(scope const(void[])[] data...)
    if (isDigest!Hash)
{
    Hash hash;
    hash.start();
    foreach (datum; data)
        hash.put(cast(const(ubyte[])) datum);
    return hash.finish();
}

That seems to be too restrictive: you can only provide a void[][] or
one or several void[], but you should be able to give it any range of
void[] or of ubyte[], like:

auto dig = file.byChunk.digest!MD5;

That's the point of the range interface. This can be done by
templatizing the function, something like (untested):

template digest(Hash) if (isDigest!Hash)
{
    auto digest(R)(R data)
        if (isInputRange!R && is(ElementType!R : void[]))
    {
        Hash hash;
        hash.start();
        data.copy(hash);
        return hash.finish();
    }
}

An interesting overload for a range of single ubytes could be
provided. This overload would fill a buffer with data from the range,
feed the hash, and start again.
Aug 08 2012
On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
> std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API.

Is it too late to ask to include MurmurHash 2 and/or 3? It's public domain, and great for things like hash tables. You can steal some code from here:

https://github.com/CyberShadow/ae/blob/master/utils/digest.d
https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d
Aug 09 2012
On Thu, 09 Aug 2012 11:32:34 +0200, "Vladimir Panteleev" <vladimir thecybershadow.net> wrote:
> Is it too late to ask to include MurmurHash 2 and/or 3? It's public domain, and great for things like hash tables. You can steal some code from here:
> https://github.com/CyberShadow/ae/blob/master/utils/digest.d
> https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d

To be honest, I didn't even know that MurmurHash could be used incrementally. I could port that code soon, but I think it's best to do it after the review. After we have formalized a common API, adding new hashes probably won't require a full review. It should be possible to do that as a pull request.
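Incremental use is exactly what the common API is designed for: a ported MurmurHash would be driven the same way as any other digest. A sketch using CRC32 as a stand-in, assuming the module layout at the time of this post:

```d
import std.hash.crc; // std.digest.crc after the later rename

void main()
{
    // Feed the digest chunk by chunk; finish() finalizes and resets.
    CRC32 hash;
    hash.start();
    hash.put(cast(const(ubyte)[]) "first chunk, ");
    hash.put(cast(const(ubyte)[]) "second chunk");
    ubyte[4] result = hash.finish();
}
```

Any type satisfying isDigest slots into this pattern unchanged, which is why new hashes can land as plain pull requests once the API is fixed.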
Aug 09 2012
On Thu, 09 Aug 2012 10:58:10 +0100, Johannes Pfau <nospam example.com> wrote:
> To be honest, I didn't even know that MurmurHash could be used incrementally. I could port that code soon, but I think it's best to do it after the review. After we have formalized a common API, adding new hashes probably won't require a full review. It should be possible to do that as a pull request.

Once the API is formalised I can contribute the hashes I have also :)

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Aug 09 2012
On Thu, 09 Aug 2012 11:16:36 +0100, "Regan Heath" <regan netmail.co.nz> wrote:
> Once the API is formalised I can contribute the hashes I have also :)

Great! With all those contributions we'll probably have a rather complete set of digests soon.
Aug 09 2012
I implemented some of the suggestions, here's the list of changes:

Changelog:
* Add a new overload of the 'digest' function which accepts an InputRange
* Add a new convenience function 'hexDigest' which works just like 'digest', but returns a string (also works with InputRanges). An open question is whether md5StringOf/md5HexOf aliases should be added (similar to md5Of)?
* Add a new convenience function 'startDigest' which returns an initialized digest
* New example for file hashing in idiomatic D
* Documented that digests are always OutputRanges
* Added new examples using std.algorithm.copy & the OutputRange interface
* Small optimization in toHexString & hexDigest: do not allocate if possible

TODO:
* Move the package to std.digest (unless there are objections):
  std.hash.hash --> std.digest.digest
  std.hash.md   --> std.digest.md
  std.hash.sha  --> std.digest.sha
  std.hash.crc  --> std.digest.crc
* Make sure the docs are consistent regarding names (digest vs. hash)

Code:
https://github.com/jpf91/phobos/tree/newHash/std/hash
https://github.com/jpf91/phobos/compare/master...newHash

Docs:
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_hash_crc.html
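Taken together, the three new conveniences from the changelog could be used like this. An untested sketch against the module names at the time of this post ('startDigest' was later renamed, so treat the identifiers as the changelog gives them):

```d
import std.hash.hash;
import std.hash.md;
import std.stdio : File;

void main()
{
    // New InputRange overload of 'digest': hash a file chunk by chunk,
    // without ever loading the whole file into memory.
    auto md5 = File("filename").byChunk(4096).digest!MD5;

    // 'hexDigest' does the same but returns the hash as a hex string.
    auto hex = File("filename").byChunk(4096).hexDigest!MD5;

    // 'startDigest' returns an already initialized digest,
    // saving the explicit start() call.
    auto ctx = startDigest!MD5();
    ctx.put(cast(const(ubyte)[]) "some data");
    ubyte[16] result = ctx.finish();
}
```

The byChunk range feeds the digest through the OutputRange interface documented in the same changelog, which is what makes the one-liner possible.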
Aug 10 2012
Changelog:
* Moved the package to std.digest:
  std.hash.hash --> std.digest.digest
  std.hash.md   --> std.digest.md
  std.hash.sha  --> std.digest.sha
  std.hash.crc  --> std.digest.crc
* Made sure the docs are consistent regarding names (digest vs. hash)

Code: (location changed!)
https://github.com/jpf91/phobos/tree/newHash/std/digest
https://github.com/jpf91/phobos/compare/master...newHash

Docs: (location changed!)
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_digest.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_crc.html
Aug 20 2012
All this discussion on the use of auto in the docs made me notice something else about the docs I missed.

I like how ranges are documented and think digest could do the same: instead of an ExampleDigest, just write the details under isDigest. I don't see a need for the template constraint example (the D idiom). This would require changing the examples which use ExampleDigest, but maybe that should happen anyway, since it doesn't exist.

I don't see a reason to change my vote because of this; it's all documentation.
Aug 28 2012
On Wed, 29 Aug 2012 04:57:32 +0200, "Jesse Phillips" <jessekphillips+D gmail.com> wrote:
> I like how ranges are documented and think digest could do the same: instead of an ExampleDigest, just write the details under isDigest.

I had a look at how std.range documents the range interfaces, but the std.digest API forces more details on the implementation (@trusted, exact parameter types for put, return type of finish, ...), so I think simply writing a text paragraph could get confusing. But if someone posts a pull request which replaces the ExampleDigest with something else, I'm all for it.

> I don't see a need for the template constraint example (the D idiom). This would require changing the examples which use ExampleDigest, but maybe that should happen anyway, since it doesn't exist.

Yes, I'll change the examples (this also makes them runnable in theory, although I have not found any documentation about making examples runnable on dlang.org).

> I don't see a reason to change my vote because of this; it's all documentation.
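The "template constraint example" being debated is the idiom where generic code accepts any conforming digest via an isDigest constraint rather than a concrete type. A small illustrative sketch (the digestTwice function is made up for this example, assuming the post-rename std.digest.digest module):

```d
import std.digest.digest; // isDigest, digestType

// Generic over any conforming digest: the constraint, not inheritance,
// is what ties Hash to the uniform API.
digestType!Hash digestTwice(Hash)(scope const(ubyte)[] data)
    if (isDigest!Hash)
{
    Hash hash;
    hash.start();
    hash.put(data);
    hash.put(data);  // feed the same data twice, then finalize
    return hash.finish();
}
```

Documenting this idiom once under isDigest, as suggested above, would let every such function's docs simply point there.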
Sep 04 2012