digitalmars.D - Checking function parameters in Phobos
- Andrei Alexandrescu (31/31) Nov 19 2013 There's been recent discussion herein about what parameter validation
- growler (26/60) Nov 19 2013 I'm not a Phobos dev. but as a user of Phobos and coming from
- bearophile (32/52) Nov 19 2013 I think Phobos should rely much more on Contract Programming
- Brad Anderson (3/15) Nov 19 2013 Is that not what Phobos's AsciiString is?
- Walter Bright (3/6) Nov 20 2013 Which ones? The ones I coded up originally were designed so they weren't...
- Jacob Carlborg (10/39) Nov 19 2013 Would we accompany the assumeSorted with an assert in the function
- Marco Leise (7/13) Nov 20 2013 That is what LDC does and with the -defaultlib switch it is
- Timon Gehr (9/11) Nov 20 2013 We do in any case:
- Jacob Carlborg (5/13) Nov 20 2013 I don't understand what this is supposed to show. That the type is
- Timon Gehr (2/16) Nov 20 2013 Yes, hence SortedRange being sorted is just a convention in any case.
- Andrei Alexandrescu (6/7) Nov 20 2013 That's right. In particular we can't have assumeSorted check for
- Meta (6/28) Nov 20 2013 Couldn't we have an overload of each of the mutating functions in
- Meta (3/33) Nov 20 2013 That is, a mutating function that takes a sorted range strips the
- Andrei Alexandrescu (3/28) Nov 20 2013 That wouldn't help much - people have access to the underlying range any...
- Meta (8/11) Nov 20 2013 You're right, I forgot about that. However, people generally
- Joseph Rushton Wakeling (7/17) Nov 19 2013 Regarding enforce() vs. assert(), a good rule that I remember having sug...
- Walter Bright (15/17) Nov 20 2013 Important is deciding upon the notions of "validated data" and "untruste...
- Jacob Carlborg (8/23) Nov 20 2013 How should we accomplish this? We can't replace:
- Jonathan M Davis (13/45) Nov 20 2013 You'd do it the other way around by having something like
- Jacob Carlborg (7/18) Nov 20 2013 If not just if the string is valid UTF-8. There can be many other types
- Marco Leise (15/36) Nov 20 2013 None of that is feasible. We can only hope that we simply
- Jacob Carlborg (6/17) Nov 20 2013 I don't know how getopt behaves but using them as a filename will most
- Simen Kjærås (21/44) Nov 20 2013 May I suggest:
- Meta (4/61) Nov 20 2013 I was having the exact same thought. I think this could be very
- Dicebot (6/6) Nov 20 2013 I also think this is very powerful and under-explored approach
- Dmitry Olshansky (13/19) Nov 20 2013 I think the obstacles are mostly:
- Dicebot (7/13) Nov 20 2013 This is the very reason why I am saying it makes much more sense
- inout (19/76) Nov 21 2013 What if you have more than just one validation, e.g. Positive and
- Meta (11/17) Nov 21 2013 Allow multiple validation functions. Then a Validated type is
- Simen Kjærås (10/22) Nov 21 2013 I believe inout's point was this, though:
- Marco Leise (10/35) Nov 25 2013
- Simen Kjærås (15/47) Nov 25 2013 Do you mean this?
- inout (11/65) Nov 26 2013 I find this to be too verbose to be useful. And you also need to
- Simen Kjærås (19/30) Nov 26 2013 This I understand. It is actually the best argument I can find in favor
- Meta (4/6) Nov 26 2013 It isn't surprising that any operation that expects int will get
- Marco Leise (6/32) Nov 26 2013
- Simen Kjærås (12/34) Nov 24 2013 I've created a version of Validated now that takes 1 or more
- Meta (84/114) Nov 24 2013 Awesome, I was messing around with something similar but you beat
- Meta (2/11) Nov 24 2013 "//Fails" should be "//Passes" as well.
- Simen Kjærås (24/99) Nov 25 2013 Even better - test if 'if (fn(value)) {}' compiles. Fixed.
- Meta (7/31) Nov 25 2013 What about a version flag, then, that can be passed to specify
- Simen Kjærås (5/11) Nov 26 2013 That's already in:
- Simen Kjærås (5/53) Nov 20 2013 Uh-hm. Add this:
- Dmitry Olshansky (6/31) Nov 20 2013 And it decays to the naked type in a blink of an eye. And some function
- Meta (8/39) Nov 20 2013 Yes. It is very important not to allow direct access to the
- Jonathan M Davis (5/11) Nov 20 2013 It's arguably pretty pointless to put a nullable type in
- Meta (5/18) Nov 20 2013 See the discussion from the other thread for why it can be useful
- Jonathan M Davis (5/27) Nov 20 2013 I know. And I still think that it's pointless - and it incurs extra over...
- Jacob Carlborg (6/12) Nov 20 2013 In that case all string functionality needs to be provided inside the
- Jonathan M Davis (6/17) Nov 20 2013 You could use alias this and alias the Validated struct to the underlyin...
- Jacob Carlborg (5/9) Nov 21 2013 Yeah, that's what needs to be avoided and is the reason "alias this" or
- Meta (21/36) Nov 20 2013 This is tricky business. Unfortunately, having the wrapper be
- Simen Kjærås (25/42) Nov 20 2013 And guess what? That's (often) ok. It's better to do the validation once...
- Jacob Carlborg (4/19) Nov 20 2013 It's still accessible via "value".
- Simen Kjærås (10/30) Nov 21 2013 Indeed it is. If we want to make it perfectly impossible to get at the
- Daniel Davidson (6/8) Nov 21 2013 Not if that function down the road only accepted validated in the
- Walter Bright (8/17) Nov 20 2013 Utf validation isn't the only form of validation for strings. You could,...
- Jonathan M Davis (29/48) Nov 20 2013 Yes, but we seemed to be discussing the possibility of having some kind ...
- Marco Leise (15/23) Nov 25 2013 A checked type for database access goes a bit beyond the scope
- Walter Bright (3/8) Nov 20 2013 Use a different type for the validated string, validated means your prog...
- Jonathan M Davis (74/75) Nov 20 2013 In general, I favor using defensive programming in library APIs and usin...
- Jacob Carlborg (16/38) Nov 20 2013 I think Walter suggestion requires the use of asserts:
- Timon Gehr (4/10) Nov 20 2013 void process(Data data)in{ assert(isValid(data)); }body{
- Jacob Carlborg (4/7) Nov 20 2013 Right, forgot about contracts.
- Lars T. Kyllingstad (30/33) Nov 20 2013 I think it is fair to always assume that a char[] is a valid
- Jonathan M Davis (12/27) Nov 20 2013 That doesn't work when strings are being created via concatenation and t...
- Dmitry Olshansky (9/25) Nov 20 2013 Sadly it's horrifically slow to do so. Above all practicality must take
- Lionello Lunesu (9/16) Nov 26 2013 +1
- Jonathan M Davis (25/51) Nov 20 2013 When an assertion fails, it's a bug in your code. Assertions should _nev...
There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to. Currently we are using a mix of approaches:

1. Some functions enforce()
2. Some functions just assert()
3. Some (fewer I think) functions assert(0)
4. Some functions don't do explicit checking, relying instead on lower-level enforcement such as null dereference and bounds checking to ensure safety.

Each method has its place. The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.

A second, just as interesting topic, is how to design abstractions for speed and safety. There are cases in which spurious checking is prohibitively expensive, so it should be avoided where it is not needed. Examples:

(a) FracSecs(long x) validates x to be within range. The cost of the validation itself is about as high as the payload itself (which is one assignment).

(b) sort() offers a SortedRange with its goodies. We also have assumeSorted that also offers a SortedRange, but relies on the user to validate that assumption.

(c) A variety of text functions currently suffer because we don't distinguish between validated UTF strings and potentially invalid ones.

Walter and I are thinking of fostering the idiom in which types (or attributes?) are used as information about validation, similar to how assumeSorted works. Building on that, we'd have a function like "static FracSecs assumeValid(long)" inside FracSecs (no need for a different type here). Then, we'd have a CleanUTF type or something that would guarantee the string stored within has been validated.

Please chime in with ideas!

Andrei
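The split between a checked constructor and an unchecked "static FracSecs assumeValid(long)" factory might look roughly like this. It is a minimal sketch, not the Phobos type: the field name `hnsecs`, the `limit` constant and the valid range [0, one second) are assumptions chosen for illustration.

```d
import std.exception : enforce;

// Hypothetical stand-in for the FracSecs type discussed above; not the
// Phobos definition. Field name and valid range are assumptions.
struct FracSecs
{
    enum long limit = 10_000_000; // hnsecs (100 ns units) in one second

    private long hnsecs;

    // Checked construction for untrusted input: throws on bad data,
    // and the check survives -release.
    this(long x)
    {
        enforce(x >= 0 && x < limit, "FracSecs: value out of range");
        hnsecs = x;
    }

    // Unchecked construction for values the caller has already validated.
    // The assert only documents the assumption; -release removes it.
    static FracSecs assumeValid(long x)
    {
        assert(x >= 0 && x < limit, "FracSecs.assumeValid: invalid value");
        FracSecs result;   // default-initialized, bypasses the constructor
        result.hnsecs = x;
        return result;
    }

    @property long value() const { return hnsecs; }
}
```

The type stays the same in both paths, as the post suggests; only the caller's claim about prior validation differs.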
Nov 19 2013
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
[snip]

I'm not a Phobos dev, but as a user of Phobos coming from C/C++ I'd like to see less enforce and more debug-only contracts in the std lib, with opt-in run-time checks for release builds.

That way I can decide on a function-by-function basis, or globally at compile time, whether the run-time checks occur in release builds. For example, given:

1. FracSecs(long x)
2. FracSecs!Args.verify(long x)

In debug builds, 1. would always have full run-time checking enabled. In release builds, 1. would only have essential run-time checks, preferably none. I can then opt in to run-time checks in release builds using 2.

There would also be a version(ArgsVerify) so I can turn on run-time checks globally at compile time in release builds (maybe the -debug flag allows this already, not sure).

Of course this unfortunately requires even more work from Phobos devs, and I'm not a D expert, so I don't know how viable it would be. Whatever is decided, I'm looking forward to seeing what you guys come up with, because I'm currently using Phobos as my "Idiomatic D" reference guide.

Thanks
G.
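The global version(ArgsVerify) switch could be sketched like this. Both `checkArg` and `halve` are hypothetical names, not Phobos APIs; the sketch only shows how a version identifier can turn an assert-style check into an enforce-style one.

```d
import std.exception : enforce;

// Hypothetical helper: by default an argument check is an assert (so
// -release removes it); building with -version=ArgsVerify turns every
// check into an enforce that survives -release.
void checkArg(bool condition, lazy string msg)
{
    version (ArgsVerify)
        enforce(condition, msg);
    else
        assert(condition, msg);
}

// Example client function using the helper.
long halve(long x)
{
    checkArg(x % 2 == 0, "halve: argument must be even");
    return x / 2;
}
```

One caveat: version identifiers are fixed when the code is compiled, so this only affects code the user actually compiles (templates, or a Phobos rebuilt with the flag), not a precompiled library.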
Nov 19 2013
Andrei Alexandrescu:
> There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to. [snip] The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.

I think Phobos should rely much more on Contract Programming based on asserts. This could mean dmd automatically using a Phobos compiled with asserts when you compile your D code normally, and automatically using an assert-stripped version of the Phobos libs when you compile with -release and similar. In other situations enforce and exceptions are still useful.

> (b) sort() offers a SortedRange with its goodies. We also have assumeSorted that also offers a SortedRange, but relies on the user to validate that assumption.

I'd like another function, which could be named validateSorted(), that returns a SortedRange and always fully verifies that its range argument is actually sorted, throwing an exception otherwise. So it doesn't assume its input is sorted. It's like isSorted + assumeSorted.

> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.

Often I have genomic data or other text data that is surely ASCII (and I can accept a run-time exception at loading time if it's not ASCII). Once such text is in memory I'd like to not pay for UTF on it. Sometimes you can do this with std.string.representation, but there is no opposite function (http://d.puremagic.com/issues/show_bug.cgi?id=10162 ). Also in Phobos there are several string/char functions that could be made faster if the input is assumed to be ASCII. To solve this problem, languages such as Haskell usually introduce a new type like AsciiString. In the past I have suggested introducing such a string wrapper in Phobos.

> Then, we'd have a CleanUTF type or something that would guarantee the string stored within has been validated.

In recent talks Bjarne Stroustrup has been advocating such usage of types for safety in C++11/C++14 a lot, and functional programmers have used it for a long time. OCaml programmers use this style of coding to write "safer" code all the time. Too many types make the code harder (also because D doesn't have destructuring syntax in function signatures and so on), but a few strategically designed structs can help.

Bye,
bearophile
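The proposed validateSorted() ("isSorted + assumeSorted") could be sketched as below. The function name comes from the post and does not exist in Phobos; `isSorted` and `assumeSorted` are the real Phobos functions it combines.

```d
import std.algorithm.sorting : isSorted;
import std.exception : enforce;
import std.range : assumeSorted;

// Sketch of the proposed validateSorted(): like assumeSorted, but it
// verifies the claim in O(n) and throws if the input is not sorted.
auto validateSorted(alias less = "a < b", R)(R r)
{
    enforce(isSorted!less(r), "validateSorted: range is not sorted");
    return assumeSorted!less(r);
}
```

The O(n) check is paid once, at the point where untrusted data becomes a SortedRange; later lookups such as `contains` stay logarithmic.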
Nov 19 2013
On Wednesday, 20 November 2013 at 00:48:40 UTC, bearophile wrote:
> [snip] To solve this problem in languages as Haskell they usually introduce a new type like AsciiString. In past I have suggested to introduce such string wrapper in Phobos.

Is that not what Phobos's AsciiString is?
Nov 19 2013
On 11/19/2013 4:48 PM, bearophile wrote:
> Also in Phobos there are several string/char functions that could be made faster if the input is assumed to be ASCII.

Which ones? The ones I coded up originally were designed so they weren't degraded by UTF.
Nov 20 2013
On 2013-11-20 01:01, Andrei Alexandrescu wrote:
> [snip] Then, we'd have a CleanUTF type or something that would guarantee the string stored within has been validated.

Would we accompany the assumeSorted with an assert in the function assuming something is sorted? We probably don't want to rely on convention.

What about distributing a version of druntime and Phobos with asserts enabled that is used by default (or with the -debug flag)? Then a version with asserts disabled is used when the -release flag is used. We probably also want it to be possible to use Phobos with asserts enabled even in release mode.

--
/Jacob Carlborg
Nov 19 2013
On Wed, 20 Nov 2013 08:49:28 +0100, Jacob Carlborg <doob me.com> wrote:
> What about distributing a version of druntime and Phobos with asserts enabled that is used by default (or with the -debug flag). Then a version with asserts disabled is used when the -release flag is used. We probably also want it to be possible to use Phobos with asserts enabled even in release mode.

That is what LDC does, and with the -defaultlib switch it is easy to use the debug Phobos in release builds. Currently this flag is mostly used to link against the shared phobos2.so.

--
Marco
Nov 20 2013
On 11/20/2013 08:49 AM, Jacob Carlborg wrote:
> Would we accompany the assumeSorted with an assert in the function assuming something is sorted? We probably don't want to rely on convention.

We do in any case:

    import std.algorithm, std.range;

    void main(){
        auto a = [1,2,3,4,5];
        auto s = sort(a);     // s is a SortedRange viewing a
        swap(a[0],a[$-1]);    // mutate a behind s's back
        // s still has type SortedRange, yet it is no longer sorted
        assert(is(typeof(s)==SortedRange!(int[])) && !s.isSorted());
    }
Nov 20 2013
On 2013-11-20 13:56, Timon Gehr wrote:
> We do in any case: [code snipped]

I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted?

--
/Jacob Carlborg
Nov 20 2013
On 11/20/2013 02:52 PM, Jacob Carlborg wrote:
> On 2013-11-20 13:56, Timon Gehr wrote:
>> We do in any case: [code snipped]
> I don't understand what this is supposed to show. That the type is "SortedRange" but it's actually not sorted?

Yes, hence SortedRange being sorted is just a convention in any case.
Nov 20 2013
On 11/20/13 6:14 AM, Timon Gehr wrote:
> Yes, hence SortedRange being sorted is just a convention in any case.

That's right. In particular we can't have assumeSorted check for isSorted even at the point of creation, and even with debug-only asserts. This is because checking would change the complexity of binary search and related algorithms (isSorted is O(n), while a binary search is O(log n)), which is often prohibitive.

Andrei
Nov 20 2013
On Wednesday, 20 November 2013 at 14:14:28 UTC, Timon Gehr wrote:
> Yes, hence SortedRange being sorted is just a convention in any case.

Couldn't we have an overload of each of the mutating functions in std.algorithm that takes a SortedRange and does static assert(0, "Cannot modify a sorted range")? I suppose there are cases where we *want* to mutate a sorted range... Unwrap the inner type, maybe?
Nov 20 2013
On Wednesday, 20 November 2013 at 17:56:22 UTC, Meta wrote:
> Couldn't we have an overload of each of the mutating functions in std.algorithm that takes a SortedRange and does static assert(0, "Cannot modify a sorted range")? I suppose there are cases where we *want* to mutate a sorted range... Unwrap the inner type, maybe?

That is, a mutating function that takes a sorted range strips the SortedRange wrapper and returns the underlying type.
Nov 20 2013
On 11/20/13 9:56 AM, Meta wrote:
> Couldn't we have an overload of each of the mutating functions in std.algorithm that takes a SortedRange and does static assert(0, "Cannot modify a sorted range")? I suppose there are cases where we *want* to mutate a sorted range... Unwrap the inner type, maybe?

That wouldn't help much - people have access to the underlying range anyway.

Andrei
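Andrei's point can be checked directly: SortedRange hands back the wrapped range through its `release()` member, so overloads that reject SortedRange arguments would not stop a caller from unwrapping first. This is also what Meta's "unwrap the inner type" variant amounts to. The function name below is only illustrative.

```d
import std.algorithm.mutation : swap;
import std.algorithm.sorting : isSorted, sort;

// Illustrative only: unwrap a SortedRange and break its ordering.
int[] unwrapAndShuffle(int[] data)
{
    auto sorted = sort(data);       // SortedRange!(int[], "a < b")
    int[] raw = sorted.release();   // plain int[] again, same memory
    swap(raw[0], raw[$ - 1]);       // a mutation the type system cannot see
    return raw;
}
```

For the input `[3, 1, 2]` this returns `[3, 2, 1]`, which is no longer sorted.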
Nov 20 2013
On Wednesday, 20 November 2013 at 20:06:47 UTC, Andrei Alexandrescu wrote:
> That wouldn't help much - people have access to the underlying range anyway.

You're right, I forgot about that. However, people generally won't be modifying a SortedRange in place, will they? Even if they do, it'll probably be using one of the mutating functions in std.algorithm.

Also, somewhat related, couldn't std.algorithm.sort simply return the passed-in range if that range is already wrapped with SortedRange?
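The last suggestion could be prototyped as a separate function rather than a change to sort() itself. `sortOnce` is a hypothetical name. The sketch uses `std.traits.isInstanceOf` to detect an existing SortedRange and returns it untouched.

```d
import std.algorithm.sorting : sort;
import std.range : SortedRange;
import std.traits : isInstanceOf;

// Hypothetical variant of sort(): if the argument is already a SortedRange,
// return it as-is instead of sorting again. It trusts the wrapper, so it
// inherits the "sorted is only a convention" caveat, and it ignores whether
// the wrapper's predicate matches the one the caller wanted.
auto sortOnce(R)(R r)
{
    static if (isInstanceOf!(SortedRange, R))
        return r;
    else
        return sort(r);
}
```

A real version inside sort() would at least need to compare predicates before skipping the work.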
Nov 20 2013
On 20/11/13 01:01, Andrei Alexandrescu wrote:
> [snip] Each method has its place. The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.

Regarding enforce() vs. assert(), a good rule that I remember having suggested to me was that enforce() should be used for actual runtime checking (e.g. checking that the input to a public API function has correct properties), while assert() should be used to test logical failures (i.e. checking that cases which should never arise really don't arise). I've followed that as a rule of thumb ever since.
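That rule of thumb can be illustrated with a public function and a private helper. Both functions are made-up examples, not Phobos code: the public entry point validates caller input with enforce, while the internal helper asserts a condition its module-internal callers guarantee.

```d
import std.exception : enforce;

// Public entry point: the argument comes from the caller, so a bad value
// is an input error, reported with an exception that survives -release.
double average(const double[] values)
{
    enforce(values.length > 0, "average: empty input");
    return sumOf(values) / values.length;
}

// Internal helper: callers in this module guarantee a non-empty slice, so
// a failure here would be a logic bug. assert documents that and costs
// nothing in -release builds.
private double sumOf(const double[] values)
{
    assert(values.length > 0, "sumOf: caller broke its contract");
    double total = 0;
    foreach (v; values)
        total += v;
    return total;
}
```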
Nov 19 2013
On 11/19/2013 4:01 PM, Andrei Alexandrescu wrote:
> There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to.

What is important is deciding upon the notions of "validated data" and "untrusted data":

1. Validated data should get asserts if it is found to be invalid.
2. Untrusted data should get exceptions thrown if it is found to be invalid (or return errors).

For example, consider a UTF string. If it has passed a validation check, then it becomes trusted data. Further processing on it should assert if it turns out to be invalid (because then you've got a programming bug).

File open failures should always throw, and never assert, because the file is not part of the program and so is inherently not trusted.

One way to distinguish validated from untrusted data is by using different types (or a naming convention, see Joel Spolsky's http://www.joelonsoftware.com/articles/Wrong.html).

It is of major importance in a program to think about what APIs get validated arguments and what APIs get untrusted arguments.
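Walter's UTF example can be sketched with a distinct type for validated strings. `ValidUtf`, `validateUtf`, `isValidUtf` and `countCodePoints` are hypothetical names; only `std.utf.validate` (which throws a UTFException on malformed input) is real Phobos.

```d
import std.utf : validate;

// Hypothetical wrapper marking a string that has passed UTF-8 validation.
// Outside this module it can only be obtained from validateUtf(), so
// holding one is evidence that the check ran.
struct ValidUtf
{
    private string data;
    @property string get() const { return data; }
}

// Boundary for untrusted data: std.utf.validate throws a UTFException
// (an Exception) on malformed input.
ValidUtf validateUtf(string untrusted)
{
    validate(untrusted);
    return ValidUtf(untrusted);
}

// Non-throwing check, used only to state the trusted side's assumption.
bool isValidUtf(string s)
{
    try { validate(s); return true; }
    catch (Exception) { return false; }
}

// Trusted side: invalid data here would be a program bug, so it is
// asserted rather than thrown. Knowing the input is valid also lets us skip
// decoding: each code point has exactly one non-continuation byte.
size_t countCodePoints(ValidUtf s)
{
    assert(isValidUtf(s.get), "ValidUtf holds invalid UTF-8: program bug");
    size_t n;
    foreach (char c; s.get)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

The byte-counting shortcut is only correct because the type guarantees validity, which is the kind of speedup the thread is after.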
Nov 20 2013
On 2013-11-20 09:50, Walter Bright wrote:
> [snip] One way to distinguish validated from untrusted data is by using different types (or a naming convention, see Joel Spolsky's http://www.joelonsoftware.com/articles/Wrong.html). It is of major importance in a program to think about what APIs get validated arguments and what APIs get untrusted arguments.

How should we accomplish this? We can't replace:

    void main (string[] args)

with

    void main (UnsafeString[] args)

and break every application out there.

--
/Jacob Carlborg
Nov 20 2013
On Wednesday, November 20, 2013 11:49:32 Jacob Carlborg wrote:
> On 2013-11-20 09:50, Walter Bright wrote:
>> [snip]
> How should we accomplish this? We can't replace void main (string[] args) with void main (UnsafeString[] args) and break every application out there.

You'd do it the other way around by having something like

    ValidatedString!char s = validateString("hello world");

ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be, given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to specialize on ValidatedString as well. So, while I agree with the idea in concept, I'd propose that we benchmark decoding and striding with and without the checks and see if there actually is much difference. If there isn't, then I don't think it's worth going to the trouble of adding something like ValidatedString.

- Jonathan M Davis
Nov 20 2013
On 2013-11-20 12:16, Jonathan M Davis wrote:
> You'd do it the other way around by having something like
> ValidatedString!char s = validateString("hello world");

Right.

> ValidatedString would then avoid any extra validation when iterating over the characters, [snip] Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString.

It's not just about whether the string is valid UTF-8. There can be many other kinds of valid strings, or rather other functions that have additional requirements, like sanitized filenames, HTML/SQL-escaped strings and so on.

--
/Jacob Carlborg
Nov 20 2013
On Wed, 20 Nov 2013 12:49:20 +0100, Jacob Carlborg <doob me.com> wrote:
> [snip] There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.

None of that is feasible. We can only hope that we catch every case of user input (or untrusted data) and check it before passing it to Phobos APIs. That's why Phobos has functions to validate, and also to sanitize, UTF strings on a best-effort basis. So in my opinion Phobos should continue with assert instead of enforce. I/O functions, of course, have to use exceptions.

That said, I never thought of validating args[] before passing it to getopt or using the arguments as a filename. Lesson learned, I guess?

--
Marco
Nov 20 2013
On 2013-11-20 13:22, Marco Leise wrote:
> [snip] That said, I never thought of validating args[] before passing it to getopt or using them as a filename. Lesson learned, I guess?

I don't know how getopt behaves, but using them as a filename will most likely end up calling a system function, which will hopefully take care of the checking.

--
/Jacob Carlborg
Nov 20 2013
On 20.11.2013 12:49, Jacob Carlborg wrote:
> [snip] There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.

May I suggest:

    struct Validated(alias fn, T) {
        private T value;
        // inout on both the function and the return type, so this also
        // works for T with mutable indirections
        @property inout(T) get() inout { return value; }
    }

    Validated!(fn, T) validate(alias fn, T)(T value) {
        Validated!(fn, T) result;
        fn(value);  // fn is expected to throw on invalid input
        result.value = value;
        return result;
    }

    void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) {
        // Do stuff
    }

--
Simen
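Below is a self-contained, compilable variant of Simen's suggestion. The validator `sanitizeFileName` (here: reject path separators) and the consumer `useName` are hypothetical placeholders. The key property is that a plain string no longer type-checks where a validated one is required.

```d
import std.algorithm.searching : canFind;
import std.exception : enforce;

struct Validated(alias fn, T)
{
    private T value;
    @property inout(T) get() inout { return value; }
}

// Runs the validator (which throws on bad input) and wraps the value.
Validated!(fn, T) validate(alias fn, T)(T value)
{
    fn(value);
    Validated!(fn, T) result;
    result.value = value;
    return result;
}

// Hypothetical validator: rejects names containing path separators.
void sanitizeFileName(string name)
{
    enforce(!name.canFind('/') && !name.canFind('\\'),
            "not a plain file name: " ~ name);
}

// Hypothetical consumer that only accepts validated names.
size_t useName(Validated!(sanitizeFileName, string) path)
{
    return path.get.length;
}
```

Passing `"report.txt"` directly to `useName` is a compile-time error; it must first go through `validate!sanitizeFileName`.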
Nov 20 2013
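Simen's suggestion compiles with small adjustments, sketched below under two assumptions: the checking function throws on invalid input (his `validate` ignores `fn`'s return value), and `get` returns `inout(T)`. `checkUtf8` and `byteLength` are hypothetical stand-ins for `sanitizeFileName` and `functionThatTakesSanitizedFileNames`, built on the real `std.utf.validate`. Because `private` in D is module-level, only code in the wrapper's own module can fill in `value` directly.

```d
static import std.utf;

struct Validated(alias fn, T)
{
    private T value;

    // Read-only access; there is deliberately no setter.
    @property inout(T) get() inout { return value; }
}

// Runs fn first (assumed to throw on invalid input), then wraps the value.
Validated!(fn, T) validate(alias fn, T)(T value)
{
    fn(value);
    Validated!(fn, T) result;
    result.value = value;
    return result;
}

// Hypothetical checker: std.utf.validate throws UTFException on bad UTF-8.
void checkUtf8(string s)
{
    std.utf.validate(s);
}

// Hypothetical consumer that only accepts already-checked strings.
size_t byteLength(Validated!(checkUtf8, string) s)
{
    return s.get.length;
}
```

Calling `byteLength("hello")` with a plain string does not compile; the caller has to go through `validate!checkUtf8` first.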
On Wednesday, 20 November 2013 at 17:45:43 UTC, Simen Kjærås wrote: On 20.11.2013 12:49, Jacob Carlborg wrote: I was having the exact same thought. I think this could be very powerful if done correctly. On 2013-11-20 12:16, Jonathan M Davis wrote: May I suggest: struct Validated(alias fn, T) { private T value; @property inout T get() { return value; } } Validated!(fn, T) validate(alias fn, T)(T value) { Validated!(fn, T) result; fn(value); result.value = value; return result; } void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) { // Do stuff } You'd do it the other way around by having something like ValidatedString!char s = validateString("hello world"); Right. ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to then specialize on ValidatedString as well. So, while I agree with the idea in concept, I'd propose that we benchmark the difference in decoding and striding without the checks and see if there actually is much difference. Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString. If not just if the string is valid UTF-8. There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.
Nov 20 2013
I also think this is a very powerful and under-explored approach, but it belongs in a particular domain framework rather than in the stdlib. One example I keep thinking about is to re-declare vibe.d string functions in terms of EscapedString!(SQL), EscapedString!(HTML) and so on for better application safety and correctness. No idea how that may work in practice though.
Nov 20 2013
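A sketch of how Dicebot's `EscapedString!(SQL)` / `EscapedString!(HTML)` idea could look. `Lang`, `escapeHtml` and `renderParagraph` are hypothetical names (not vibe.d or Phobos API), and only HTML escaping of `&`, `<`, `>` and `"` is shown.

```d
import std.array : replace;

// Hypothetical tag for the kind of escaping that has been applied.
enum Lang { html, sql }

// The template parameter records which escaping the text has been through.
struct EscapedString(Lang lang)
{
    private string text;

    @property string get() const { return text; }
}

EscapedString!(Lang.html) escapeHtml(string raw)
{
    // Replace '&' first so the entities added below are not escaped twice.
    auto s = raw.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    return EscapedString!(Lang.html)(s);
}

// Only accepts text already escaped for HTML; a plain string or an
// EscapedString!(Lang.sql) is rejected at compile time.
string renderParagraph(EscapedString!(Lang.html) escaped)
{
    return "<p>" ~ escaped.get ~ "</p>";
}
```

Because `private` is module-level, the guarantee only holds if the module that defines `EscapedString` is the only place that constructs it.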
20-Nov-2013 22:28, Dicebot wrote: I also think this is a very powerful and under-explored approach, but it belongs in a particular domain framework rather than in the stdlib. One example I keep thinking about is to re-declare vibe.d string functions in terms of EscapedString!(SQL), EscapedString!(HTML) and so on for better application safety and correctness. No idea how that may work in practice though.
I think the obstacles are mostly:
1. There is a non-zero intersection between validated subsets. Some kind of NiceStringWithNoPunctuation fits practically every EscapedString!(XYZ). There must be a way to cascade and mix/match these classes.
2. Template bloatZ! It would be real hard to fight the IFTI duping function bodies behind your back. Or if you dumb down these escaped types to not fit most templates, it may become a usability problem.
3. This kind of thing is viral. With an escape hatch though, it may be done step by step.
-- Dmitry Olshansky
Nov 20 2013
On Wednesday, 20 November 2013 at 20:19:28 UTC, Dmitry Olshansky wrote: 2. Template bloatZ! It would be real hard to fight the IFTI duping function bodies behind your back. Or if you dumb down these escaped types to not fit most templates, it may become a usability problem. 3. This kind of thing is viral. With an escape hatch though, it may be done step by step. This is the very reason why I am saying it makes much more sense as part of a particular application framework, as those tend to have a clearer separation between internal and external infrastructure and strict usage API expectations. So it is not a usability problem, it is a usability feature :)
Nov 20 2013
On Wednesday, 20 November 2013 at 17:45:43 UTC, Simen Kjærås wrote: On 20.11.2013 12:49, Jacob Carlborg wrote: What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? I feel that it might be better to use attributes here instead. Something like: @positive int validatePositive(int value) { assert(value > 0); return value; } @lessThan42 int validateLessThan42(int value) { assert(value < 42); return value; } Now you can have @positive @lessThan42 int value = validatePositive(validateLessThan42(x)); It also doesn't involve creating new types. On 2013-11-20 12:16, Jonathan M Davis wrote: May I suggest: struct Validated(alias fn, T) { private T value; @property inout T get() { return value; } } Validated!(fn, T) validate(alias fn, T)(T value) { Validated!(fn, T) result; fn(value); result.value = value; return result; } void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) { // Do stuff } You'd do it the other way around by having something like ValidatedString!char s = validateString("hello world"); Right. ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to then specialize on ValidatedString as well. So, while I agree with the idea in concept, I'd propose that we benchmark the difference in decoding and striding without the checks and see if there actually is much difference. Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString. If not just if the string is valid UTF-8. There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on.
Nov 21 2013
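inout's attribute idea maps onto D's user-defined attributes (written `@positive`, `@lessThan42`). A small sketch, with `positive` and `lessThan42` as hypothetical marker structs, shows what UDAs do and do not provide: they can be queried at compile time, but they attach to the declaration rather than to the type, so `typeof(input)` is still plain `int`.

```d
import std.meta : AliasSeq;

// Hypothetical marker types used as user-defined attributes.
struct positive {}
struct lessThan42 {}

// The attributes annotate this declaration...
@positive @lessThan42 int input = 13;

// ...and can be inspected at compile time.
alias inputAttrs = AliasSeq!(__traits(getAttributes, input));
static assert(inputAttrs.length == 2);
static assert(is(inputAttrs[0] == positive));

// The type itself is unchanged, so any function taking int accepts
// `input` without the compiler looking at the attributes.
static assert(is(typeof(input) == int));
```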
On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote: What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt Or just pass a function that validates that the int is both positive and less than 42, which would be much simpler.... It also doesn't involve creating new types. Creating new types is what allows us to provide static, compiler-verified guarantees.
Nov 21 2013
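Meta's multi-predicate form can be sketched with a variadic template. Two assumptions differ from his snippet: the value type comes first (`Validated!(T, Fns...)`, the order Simen's later version uses) so the predicate list can be variadic, and each predicate returns `bool` and is checked with `std.exception.enforce`.

```d
import std.exception : enforce;

// Sketch: holds a T that has passed every predicate in Fns.
struct Validated(T, Fns...)
{
    private T value_;

    @property inout(T) value() inout { return value_; }
}

template validate(Fns...)
{
    // Checks every predicate in order and throws on the first failure.
    Validated!(T, Fns) validate(T)(T value)
    {
        static foreach (i; 0 .. Fns.length)
        {
            enforce(Fns[i](value), "validation failed");
        }
        Validated!(T, Fns) result;
        result.value_ = value;
        return result;
    }
}

bool isPositive(int x) { return x > 0; }
bool lessThan42(int x) { return x < 42; }
```

With this encoding `Validated!(int, isPositive, lessThan42)` and `Validated!(int, lessThan42, isPositive)` are distinct types, which is the ordering issue Simen raises in reply.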
On 22.11.2013 00:50, Meta wrote: On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote: I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first. What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt Or just pass a function that validates that the int is both positive and less than 42, which would be much simpler. -- Simen
Nov 21 2013
On Fri, 22 Nov 2013 02:55:44 +0100, Simen Kjærås <simen.kjaras gmail.com> wrote: On 22.11.2013 00:50, Meta wrote: On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote: I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first. What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = empty set. -- Marco
Nov 25 2013
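Marco's set-difference idea can be sketched without an assignment operator, using the explicit form Simen prefers (`a = validated!(isPositive, lessThan42)(b);`): when `validate` is given a value that is already `Validated`, it re-runs only the predicates missing from the source. The helper `isIn`, the members `Type`, `Predicates` and `isValidated`, and the call counter `positiveChecks` are hypothetical, added only so the example can show which checks actually run.

```d
import std.exception : enforce;

// True if `fn` is one of the symbols in Fns.
template isIn(alias fn, Fns...)
{
    static if (Fns.length == 0)
        enum isIn = false;
    else static if (__traits(isSame, fn, Fns[0]))
        enum isIn = true;
    else
        enum isIn = isIn!(fn, Fns[1 .. $]);
}

struct Validated(T, Fns...)
{
    alias Type = T;
    alias Predicates = Fns;
    enum isValidated = true;

    private T value_;

    @property inout(T) value() inout { return value_; }
}

template validate(Fns...)
{
    // Plain value: run every predicate.
    Validated!(T, Fns) validate(T)(T value)
        if (!is(typeof(T.isValidated)))
    {
        static foreach (i; 0 .. Fns.length)
        {
            enforce(Fns[i](value), "validation failed");
        }
        Validated!(T, Fns) result;
        result.value_ = value;
        return result;
    }

    // Already-validated value: run only the predicates in Fns that are
    // not in V.Predicates (the set difference).
    Validated!(V.Type, Fns) validate(V)(V source)
        if (is(typeof(V.isValidated)))
    {
        static foreach (i; 0 .. Fns.length)
        {
            static if (!isIn!(Fns[i], V.Predicates))
                enforce(Fns[i](source.value), "validation failed");
        }
        Validated!(V.Type, Fns) result;
        result.value_ = source.value;
        return result;
    }
}

// Counts calls so the sketch can show which checks run.
int positiveChecks;

bool isPositive(int x) { ++positiveChecks; return x > 0; }
bool lessThan42(int x) { return x < 42; }
```

Converting `Validated!(int, isPositive)` to `Validated!(int, isPositive, lessThan42)` this way calls `lessThan42` only; going the other direction (dropping a predicate) checks nothing.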
On 2013-11-25 13:00, Marco Leise wrote: On Fri, 22 Nov 2013 02:55:44 +0100, Simen Kjærås <simen.kjaras gmail.com> wrote: Do you mean this? Validated!(int, isPositive, lessThan42) a = validated!(isPositive, lessThan42)(13); Validated!(int, isPositive) b = a; a = b; // Only tests lessThan42 If so, you're mostly right that this should be done. I am however of the opinion that conversions that may throw should be marked appropriately, so this will be the right way: a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42 New version now available on GitHub: http://git.io/hEe0MA http://git.io/QEP-kQ -- Simen On 22.11.2013 00:50, Meta wrote: Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = empty set. On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote: I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first. What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt
Nov 25 2013
On Monday, 25 November 2013 at 13:01:43 UTC, Simen Kjærås wrote: On 2013-11-25 13:00, Marco Leise wrote: I find this to be too verbose to be useful. And you also need to be very careful not to discard any existing qualifiers on input and carry them over. This will essentially make any function that uses them templated, while all the instances will be the same (yet have a different body since no D compiler merges identical functions). I still find wrapping int with some type to add a tag to it without adding any methods is not a great idea - it doesn't scale well with composition and tag propagation. Any operation that expects int will essentially discard all the qualifiers. On Fri, 22 Nov 2013 02:55:44 +0100, Simen Kjærås <simen.kjaras gmail.com> wrote: Do you mean this? Validated!(int, isPositive, lessThan42) a = validated!(isPositive, lessThan42)(13); Validated!(int, isPositive) b = a; a = b; // Only tests lessThan42 If so, you're mostly right that this should be done. I am however of the opinion that conversions that may throw should be marked appropriately, so this will be the right way: a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42 New version now available on GitHub: http://git.io/hEe0MA http://git.io/QEP-kQ -- Simen On 22.11.2013 00:50, Meta wrote: Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = empty set. On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote: I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first. What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt
Nov 26 2013
On 26.11.2013 21:14, inout wrote: I find this to be too verbose to be useful. This I understand. It is actually the best argument I can find in favor of doing constraint checking upon construction, rather than in a separate construction function. This allows you to use one alias instead of two. And you also need to be very careful not to discard any existing qualifiers on input and carry them over. This will essentially make any function that uses them templated, while all the instances will be the same (yet have a different body since no D compiler merges identical functions). Could you give an example of this? It's a bit unclear to me what you mean. Is it this sort of thing: auto doPrimeStuff(Validated!(int, isPrime) a){return a;} auto doLessThan42Stuff(Validated!(int, lessThan42) a){return a;} Validated!(int, isPrime, lessThan42) i = 13; i.doPrimeStuff().doLessThan42Stuff(); Where the second chained function call fails due to lessThan42 being removed from the constraints? (There's also the problem that this wouldn't work in the first place due to D's lack of implicit conversions.) I still find wrapping int with some type to add a tag to it without adding any methods is not a great idea - it doesn't scale well with composition and tag propagation. Any operation that expects int will essentially discard all the qualifiers. And any operation of the kind you describe is likely to change the value so the constraints need to be checked again. abs(Validated!(int, isNegative)) cannot possibly return the same type it received. -- Simen
Nov 26 2013
On Tuesday, 26 November 2013 at 20:14:15 UTC, inout wrote: Any operation that expects int will essentially discard all the qualifiers. It isn't surprising that any operation that expects int will get int. To take advantage of Validated, an operation has to expect Validated.
Nov 26 2013
On Mon, 25 Nov 2013 14:01:28 +0100, Simen Kjærås <simen.kjaras gmail.com> wrote: On 2013-11-25 13:00, Marco Leise wrote: Can you write a templated assignment operator that accepts any Validated!* instance and builds the set difference of validation functions that are missing on the assigned value? E.g. in the case of n = i: {isPositive} / {isPositive, lessThan42} = empty set. Do you mean this? Validated!(int, isPositive, lessThan42) a = validated!(isPositive, lessThan42)(13); Validated!(int, isPositive) b = a; a = b; // Only tests lessThan42 If so, you're mostly right that this should be done. I am however of the opinion that conversions that may throw should be marked appropriately, so this will be the right way: a = validated!(isPositive, lessThan42)(b); // Only tests lessThan42 New version now available on GitHub: http://git.io/hEe0MA http://git.io/QEP-kQ -- Simen Yes, that is what I had in mind. -- Marco
Nov 26 2013
On 22.11.2013 02:55, Simen Kjærås wrote: On 22.11.2013 00:50, Meta wrote: I've created a version of Validated now that takes 1 or more constraints, and where a type whose constraints are a superset of another's, is implicitly convertible to that. Sadly, because of D's lack of certain implicit conversions, there are limits. Attached is source (validation.d), and some utility functions that are necessary for it to compile (utils.d). Is this worth working more on? Should it be in Phobos? Other critique? Oh, sorry about those stupid questions, we have a term for that: Destroy! -- Simen On Thursday, 21 November 2013 at 22:51:43 UTC, inout wrote: I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first. What if you have more than just one validation, e.g. Positive and LessThan42? Is Positive!LessThan42!int the same type as LessThan42!Positive!int? Implicitly convertible? Allow multiple validation functions. Then a Validated type is only valid if validationFunction1(val) && validationFunction2(val) &&... Validated!(isPositive, lessThan42, int) validatedInt = validate!(isPositive, lessThan42)(34); //Do stuff with validatedInt Or just pass a function that validates that the int is both positive and less than 42, which would be much simpler.
Nov 24 2013
On Sunday, 24 November 2013 at 17:35:51 UTC, Simen Kjærås wrote: Awesome, I was messing around with something similar but you beat me to the punch. A couple of things: - The function validated would probably be better named validate, since it actually performs validation and returns a validated type. The struct's name is fine. - I think it'd be better to change "static if (is(typeof(fn(value)) == bool))" to "static if (is(typeof(fn(value)) : bool))", which, rather than checking that the return type is exactly bool, only checks that it's implicitly convertible to bool, AKA "truthy". - It might be a good idea to have a version(AlwaysValidate) block in assumeValidated for people who don't care about code speed and want maximum safety, that would always run the validation functions. Also, it might be a good idea to mark assumeValidated @system, because it blatantly breaks the underlying assumptions being made in the first place. Code that wants to be rock-solid safe will be restricted to using only validate. Or maybe that's going too far. - Validated doesn't work very well with reference types. The following fails: class CouldBeNull { } bool notNull(T)(T t) if (is(T == class)) { return t !is null; } //Error: cannot implicitly convert expression (this._value) of type inout(CouldBeNull) to f505.CouldBeNull void takesNonNull(Validated!(CouldBeNull, notNull) validatedT) { } - On the subject of reference types, I don't think Validated handles them quite correctly. This is a problem I ran into, and it's not an easy one.
Assume for a second that there's a class FortyTwo that *does* work with Validated: class FortyTwo { int i = 42; } bool containsFortyTwo(FortyTwo ft) { return ft.i == 42; } void mutateFortyTwo(Validated!(FortyTwo, containsFortyTwo) fortyTwo) { fortyTwo.i = 43; } auto a = validated!containsFortyTwo(new FortyTwo()); auto b = a; //Passes assert(a.i == 42); assert(b.i == 42); mutateFortyTwo(a); //Fails assert(a.i == 43); assert(b.i == 43); This is an extremely contrived example, but it illustrates the problem of using reference types with Validated. It gets even hairier if i itself were a reference type, like a slice: void mutateCopiedValue(Validated!(FortyTwo, containsFortyTwo) fortyTwo) { //We're not out of the woods yet int[] arr = fortyTwo.i; arr[0] += 1; } //Continuing from previous example, //except i is now an array mutateCopiedValue(b); assert(a.i[0] == 44); assert(b.i[0] == 44); Obviously in this case you could just .dup i, but what if i were a class itself? It'd be extremely easy to accidentally invalidate every Validated!(FortyTwo, ...) in the program in a single swipe. It gets even worse if i were some class reference to which other, non-validated references existed. Changing those naked references would change i, and possibly invalidate it. I believe inout's point was this, though: Validated!(isPositive, lessThan42, int) i = foo(); Validated!(isPositive, int) n = i; // Fails. Validated!(lessThan42, isPositive, int) r = i; // Fails. This is of course less than optimal. If a type such as Validate is to be added to Phobos, these problems need to be fixed first. I've created a version of Validated now that takes 1 or more constraints, and where a type whose constraints are a superset of another's, is implicitly convertible to that. Sadly, because of D's lack of certain implicit conversions, there are limits. Attached is source (validation.d), and some utility functions that are necessary for it to compile (utils.d). Is this worth working more on?
Should it be in Phobos? Other critique? Oh, sorry about those stupid questions, we have a term for that: Destroy! Or just pass a function that validates that the int is both positive and less than 42, which would be much simpler.
Nov 24 2013
On Monday, 25 November 2013 at 07:24:10 UTC, Meta wrote: auto a = validated!containsFortyTwo(new FortyTwo()); auto b = a; //Passes assert(a.i == 42); assert(b.i == 42); mutateFortyTwo(a); //Fails assert(a.i == 43); assert(b.i == 43); "//Fails" should be "//Passes" as well.
Nov 24 2013
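Meta's reference-type problem (with the `//Passes` correction) can be reproduced in a few lines. The single-predicate `Validated` here is a hypothetical minimal version, not Simen's code; the point is that copying the wrapper copies only the class reference, so any other reference to the same object can invalidate every copy, and a read-only getter does not prevent mutation through the returned reference.

```d
import std.exception : enforce;

// Minimal single-predicate wrapper, just enough to show the problem.
struct Validated(T, alias fn)
{
    private T value_;

    @property inout(T) value() inout { return value_; }
}

Validated!(T, fn) validate(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    Validated!(T, fn) result;
    result.value_ = value;
    return result;
}

class FortyTwo { int i = 42; }

bool containsFortyTwo(FortyTwo ft) { return ft !is null && ft.i == 42; }
```

After `auto a = validate!containsFortyTwo(obj); auto b = a;`, the assignment `obj.i = 43;` leaves both `a` and `b` holding an object that no longer satisfies `containsFortyTwo`.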
On 2013-11-25 08:24, Meta wrote: - The function validated would probably be better named validate, since it actually performs validation and returns a validated type. The struct's name is fine. Yeah, I was somewhat torn there, but I think you're right. Fixed. - I think it'd be better to change "static if (is(typeof(fn(value)) == bool))" to "static if (is(typeof(fn(value)) : bool))", which, rather than checking that the return type is exactly bool, only checks that it's implicitly convertible to bool, AKA "truthy". Even better - test if 'if (fn(value)) {}' compiles. Fixed. - It might be a good idea to have a version(AlwaysValidate) block in assumeValidated for people who don't care about code speed and want maximum safety, that would always run the validation functions. Also, it might be a good idea to mark assumeValidated @system, because it blatantly breaks the underlying assumptions being made in the first place. Code that wants to be rock-solid safe will be restricted to using only validate. Or maybe that's going too far. @safe is only for memory safety, which this is not. I agree it would be nice to mark assumeValidated as 'warning, may not do what it claims', but @safe is not really the correct indicator of that. - Validated doesn't work very well with reference types. The following fails: class CouldBeNull { } bool notNull(T)(T t) if (is(T == class)) { return t !is null; } //Error: cannot implicitly convert expression (this._value) of type inout(CouldBeNull) to f505.CouldBeNull void takesNonNull(Validated!(CouldBeNull, notNull) validatedT) { } Yeah, found that. It's a bug in value(), which should return inout(T), not T. Fixed. - On the subject of reference types, I don't think Validated handles them quite correctly. This is a problem I ran into, and it's not an easy one.
Assume for a second that there's a class FortyTwo that *does* work with Validated: class FortyTwo { int i = 42; } bool containsFortyTwo(FortyTwo ft) { return ft.i == 42; } void mutateFortyTwo(Validated!(FortyTwo, containsFortyTwo) fortyTwo) { fortyTwo.i = 43; } auto a = validated!containsFortyTwo(new FortyTwo()); auto b = a; //Passes assert(a.i == 42); assert(b.i == 42); mutateFortyTwo(a); //Fails assert(a.i == 43); assert(b.i == 43); This is an extremely contrived example, but it illustrates the problem of using reference types with Validated. It gets even hairier if i itself were a reference type, like a slice: void mutateCopiedValue(Validated!(FortyTwo, containsFortyTwo) fortyTwo) { //We're not out of the woods yet int[] arr = fortyTwo.i; arr[0] += 1; } //Continuing from previous example, //except i is now an array mutateCopiedValue(b); assert(a.i[0] == 44); assert(b.i[0] == 44); Obviously in this case you could just .dup i, but what if i were a class itself? It'd be extremely easy to accidentally invalidate every Validated!(FortyTwo, ...) in the program in a single swipe. It gets even worse if i were some class reference to which other, non-validated references existed. Changing those naked references would change i, and possibly invalidate it. This is a known shortcoming for which I see no good workaround. It would be possible to use std.traits.hasAliasing to see which types can be safely .dup'ed and only allow those types, but this is not a solution I like. I guess it could print a warning when used with unsafe types. If I were to do that, I would still want some way to turn that message off. Eh. Maybe there is no good solution. What else is new? - Better error messages for invalid constraints (testing if an int is null, a string is divisible by 3 or an array has a database connection, e.g.) - Fixed a bug in opCast (I love that word - in Norwegian it [oppkast] means puke. ...anyways...) when converting to an incompatible wrapped value. -- Simen
Nov 25 2013
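Three ideas from this exchange combined in one sketch: restricting the wrapped type with `std.traits.hasAliasing` (the option Simen mentions), accepting any predicate whose result works as an `if` condition (his "test if `if (fn(value)) {}` compiles"), and a `version(AlwaysValidate)` switch inside `assumeValidated` as Meta suggests. This is an illustration under those assumptions, not the code behind the linked GitHub versions; `isUsablePredicate` and `nonZero` are hypothetical names.

```d
import std.exception : enforce;
import std.traits : hasAliasing;

// Reject types whose contents could be changed through another
// reference after validation (the reference-type problem above).
struct Validated(T, alias fn)
    if (!hasAliasing!T)
{
    private T value_;

    @property inout(T) value() inout { return value_; }
}

// Accept any predicate whose result can be used as an if-condition.
enum isUsablePredicate(alias fn, T) =
    __traits(compiles, (T v) { if (fn(v)) {} });

Validated!(T, fn) validate(alias fn, T)(T value)
    if (isUsablePredicate!(fn, T))
{
    enforce(fn(value), "validation failed");
    Validated!(T, fn) result;
    result.value_ = value;
    return result;
}

// Trusts the caller; building with -version=AlwaysValidate turns the
// check back on.
Validated!(T, fn) assumeValidated(alias fn, T)(T value)
{
    version (AlwaysValidate)
        enforce(fn(value), "assumeValidated: value is not valid");
    Validated!(T, fn) result;
    result.value_ = value;
    return result;
}

bool isPositive(int x) { return x > 0; }
int nonZero(int x) { return x; } // int result, usable as a condition
```

Rejecting aliased types outright also rules out `int[]` and every class, which is the trade-off Simen says he does not like.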
On Monday, 25 November 2013 at 08:52:14 UTC, Simen Kjærås wrote: @safe is only for memory safety, which this is not. I agree it would be nice to mark assumeValidated as 'warning, may not do what it claims', but @safe is not really the correct indicator of that. What about a version flag, then, that can be passed to specify that the user wants assumeValidated() to run the validation functions as well? This is a known shortcoming for which I see no good workaround. It would be possible to use std.traits.hasAliasing to see which types can be safely .dup'ed and only allow those types, but this is not a solution I like. It's a hard problem. This is a case where a Unique!T type would be really useful. What else is new? - Better error messages for invalid constraints (testing if an int is null, a string is divisible by 3 or an array has a database connection, e.g.) - Fixed a bug in opCast (I love that word - in Norwegian it [oppkast] means puke. ...anyways...) when converting to an incompatible wrapped value. -- Simen Keep up the good work!
Nov 25 2013
On 2013-11-26 06:37, Meta wrote: On Monday, 25 November 2013 at 08:52:14 UTC, Simen Kjærås wrote: That's already in: http://git.io/EdHw8A -- Simen @safe is only for memory safety, which this is not. I agree it would be nice to mark assumeValidated as 'warning, may not do what it claims', but @safe is not really the correct indicator of that. What about a version flag, then, that can be passed to specify that the user wants assumeValidated() to run the validation functions as well?
Nov 26 2013
On 20.11.2013 18:45, Simen Kjærås wrote: On 20.11.2013 12:49, Jacob Carlborg wrote: Uh-hm. Add this: alias get this; On 2013-11-20 12:16, Jonathan M Davis wrote: May I suggest: struct Validated(alias fn, T) { private T value; @property inout T get() { return value; } You'd do it the other way around by having something like ValidatedString!char s = validateString("hello world"); Right. ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to then specialize on ValidatedString as well. So, while I agree with the idea in concept, I'd propose that we benchmark the difference in decoding and striding without the checks and see if there actually is much difference. Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString. If not just if the string is valid UTF-8. There can be many other types of valid strings. Or rather other functions that have additional requirements. Like sanitized filenames, HTML/SQL escaped strings and so on. } Validated!(fn, T) validate(alias fn, T)(T value) { Validated!(fn, T) result; fn(value); result.value = value; return result; } void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) { // Do stuff } -- Simen
Nov 20 2013
20-Nov-2013 22:01, Simen Kjærås wrote: On 20.11.2013 18:45, Simen Kjærås wrote: [snip] And it decays to the naked type in the blink of an eye. And some function down the road will do the validation again... May I suggest: struct Validated(alias fn, T) { private T value; @property inout T get() { return value; } Uh-hm. Add this: alias get this; -- Dmitry Olshansky } Validated!(fn, T) validate(alias fn, T)(T value) { Validated!(fn, T) result; fn(value); result.value = value; return result; } void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) { // Do stuff }
Nov 20 2013
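Dmitry's objection is easy to demonstrate. In this sketch (assuming, unlike Simen's snippet, that `fn` returns `bool`), `alias get this` makes the wrapper implicitly convertible to `string`, so it is accepted by a function that knows nothing about validation and can be stored in a plain `string`, after which the type no longer records that the check happened. `nonEmpty` and `plainLength` are hypothetical.

```d
import std.exception : enforce;

struct Validated(alias fn, T)
{
    private T value;

    @property inout(T) get() inout { return value; }

    // The addition under discussion: lets the wrapper stand in for a T.
    alias get this;
}

Validated!(fn, T) validate(alias fn, T)(T value)
{
    enforce(fn(value), "validation failed");
    Validated!(fn, T) result;
    result.value = value;
    return result;
}

bool nonEmpty(string s) { return s.length > 0; }

// A function that knows nothing about validation.
size_t plainLength(string s) { return s.length; }
```

Writing `string naked = validate!nonEmpty("abc");` compiles, and from then on `naked` is an ordinary `string`.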
On Wednesday, 20 November 2013 at 18:30:58 UTC, Dmitry Olshansky wrote: 20-Nov-2013 22:01, Simen Kjærås wrote: Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for). On 20.11.2013 18:45, Simen Kjærås wrote: [snip] And it decays to the naked type in the blink of an eye. And some function down the road will do the validation again... May I suggest: struct Validated(alias fn, T) { private T value; @property inout T get() { return value; } Uh-hm. Add this: alias get this; } Validated!(fn, T) validate(alias fn, T)(T value) { Validated!(fn, T) result; fn(value); result.value = value; return result; } void functionThatTakesSanitizedFileNames(Validated!(sanitizeFileName, string) path) { // Do stuff }
Nov 20 2013
On Wednesday, November 20, 2013 19:53:43 Meta wrote: Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for). It's arguably pretty pointless to put a nullable type in std.typecons.Nullable. If you want a nullable type to be null, just set it to null. - Jonathan M Davis
Nov 20 2013
On Wednesday, 20 November 2013 at 19:23:32 UTC, Jonathan M Davis wrote: On Wednesday, November 20, 2013 19:53:43 Meta wrote: See the discussion from the other thread for why it can be useful to wrap a nullable reference in an option type (nullable is a pseudo-option type). Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for). It's arguably pretty pointless to put a nullable type in std.typecons.Nullable. If you want a nullable type to be null, just set it to null. - Jonathan M Davis
Nov 20 2013
On Wednesday, November 20, 2013 20:40:40 Meta wrote: On Wednesday, 20 November 2013 at 19:23:32 UTC, Jonathan M Davis wrote: I know. And I still think that it's pointless - and it incurs extra overhead to boot, making it _worse_ than pointless. But clearly there's disagreement on the matter. - Jonathan M Davis On Wednesday, November 20, 2013 19:53:43 Meta wrote: See the discussion from the other thread for why it can be useful to wrap a nullable reference in an option type (nullable is a pseudo-option type). Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for). It's arguably pretty pointless to put a nullable type in std.typecons.Nullable. If you want a nullable type to be null, just set it to null. - Jonathan M Davis
Nov 20 2013
On 2013-11-20 19:53, Meta wrote:
> Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for).

In that case all string functionality needs to be provided inside the Validated struct. In addition to that, we lose the beauty of UFCS, at least for functions expecting plain "string".

--
/Jacob Carlborg
Nov 20 2013
On Thursday, November 21, 2013 08:36:37 Jacob Carlborg wrote:
> On 2013-11-20 19:53, Meta wrote:
>> Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for).
>
> In that case all string functionality needs to be provided inside the Validated struct. In addition to that, we lose the beauty of UFCS, at least for functions expecting plain "string".

You could use alias this and alias the Validated struct to the underlying string, but if you did that, you'd probably end up having it escape the struct and used as a naked string the vast majority of the time, which would essentially defeat the purpose of the Validated struct.

- Jonathan M Davis
Nov 20 2013
On 2013-11-21 08:46, Jonathan M Davis wrote:
> You could use alias this and alias the Validated struct to the underlying string, but if you did that, you'd probably end up having it escape the struct and used as a naked string the vast majority of the time, which would essentially defeat the purpose of the Validated struct.

Yeah, that's what needs to be avoided, and it is the reason "alias this" or a property returning the raw string cannot be used.

--
/Jacob Carlborg
Nov 21 2013
On Thursday, 21 November 2013 at 07:36:38 UTC, Jacob Carlborg wrote:
> On 2013-11-20 19:53, Meta wrote:
>> Yes. It is very important not to allow direct access to the underlying value. This is important for ensuring that it is not put in an invalid state. This is a mistake that was made with std.typecons.Nullable, making it useless for anything other than giving a non-nullable type a null state (which, in fairness, is probably all that it was originally intended for).
>
> In that case all string functionality needs to be provided inside the Validated struct. In addition to that, we lose the beauty of UFCS, at least for functions expecting plain "string".

This is tricky business. Unfortunately, having the wrapper be able to degrade to its base type is at odds with providing compiler-enforced guarantees. We can't allow direct access to the underlying string, because the user could purposely or inadvertently put it in an invalid state. On the other hand, these opaque wrapper types can no longer be transparently substituted into existing code. One solution is copying the validated string to do arbitrary operations on, leaving the original validated string unchanged.

    auto validatedString = validate!isValidUTF(someString);

    // Doesn't work; Validated!string does not expose the string interface
    //auto invalidString = validatedString.map!(c => c - cast(char)int.max);

    // Also doesn't work
    //validatedString ~= cast(char)0xFFFF;

    auto validatedCopy = validatedString.duplicate();
    // Do bad things with validatedCopy. validatedString remains unchanged and valid.
Nov 20 2013
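Meta's `validate!isValidUTF` and `duplicate()` are hypothetical names. A minimal sketch of an opaque wrapper along those lines, assuming UTF-8 validity is the invariant: the only way in is a validating constructor, there is no `alias this` or accessor returning the buffer, and `duplicate()` hands out a mutable copy. `ValidatedString` is not a Phobos type, and because D's `private` is module-scoped, the guarantee only holds against code in other modules.

```d
import std.utf : validate;

/// Hypothetical opaque wrapper (not a Phobos type).
struct ValidatedString
{
    private string data;   // private to this module, not to the struct

    /// The only way in: validate once, up front.
    this(string s)
    {
        validate(s);   // throws std.utf.UTFException on malformed UTF-8
        data = s;
    }

    /// Read-only queries cannot break the invariant.
    size_t length() const { return data.length; }

    /// Meta's escape hatch: a mutable copy the caller may do anything with.
    char[] duplicate() const { return data.dup; }
}
```

Appending to the wrapper or assigning it to a `string` does not compile, which is exactly the loss of UFCS and drop-in substitution that Jacob points out.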
On 20.11.2013 19:30, Dmitry Olshansky wrote:
> 20-Nov-2013 22:01, Simen Kjærås writes:
>> On 20.11.2013 18:45, Simen Kjærås wrote:
>> [snip]
>>> May I suggest:
>>>
>>> struct Validated(alias fn, T) {
>>>     private T value;
>>>     @property inout T get() {
>>>         return value;
>>>     }
>>
>> Uh-hm. Add this:
>>
>>     alias get this;
>
> And it decays to the naked type in a blink of an eye. And some function down the road will do the validation again...

And guess what? That's (often) ok. It's better to do the validation once too many than to miss it once. The point (at least in the cases I've used it) is to enforce that only validated values are passed to functions that require validated strings, not that validated values never be passed to functions that don't really care. Doing it like this also lets you call functions that take the unadorned type, because that might be just as important.

The result of re-validating is performance loss. The result of missed validation is a bug.

Also, in just a few lines, you can make a version that will *not* decay to the original type:

    struct Validated(alias fn, T) {
        private T _value;
        @property inout T value() {
            return _value;
        }
    }
    // validated() is identical to before.

Sure, using it is a bit more verbose than using the unadorned type, which is why I chose to make the original version automatically decay. This is a judgment where sensible people may disagree, even with themselves on a case-by-case basis.

--
Simen
Nov 20 2013
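To make the trade-off in Simen's decaying version concrete, here is a minimal sketch with a hypothetical `validated` factory and a hypothetical `allAscii` predicate standing in for whatever check a program needs (Simen's own `validated()` is not shown in this excerpt). A function that needs the guarantee asks for `Validated!(allAscii, string)`, while the value still converts to a plain `string` everywhere else through `alias get this`.

```d
import std.algorithm.searching : all;
import std.ascii : isASCII;
import std.exception : enforce;

/// Hypothetical reconstruction in the spirit of Simen's post: the value is
/// tagged with the predicate that checked it and decays via `alias get this`.
struct Validated(alias pred, T)
{
    private T value;
    @property inout(T) get() inout { return value; }
    alias get this;
}

/// Hypothetical factory: the check runs exactly once, at construction.
Validated!(pred, T) validated(alias pred, T)(T input)
{
    enforce(pred(input), "validation failed");
    return Validated!(pred, T)(input);
}

/// Hypothetical program-level predicate.
bool allAscii(string s) { return s.all!isASCII; }

/// Byte count equals column count only for ASCII, so this demands the
/// validated type; `s.length` resolves through alias this.
size_t asciiWidth(Validated!(allAscii, string) s) { return s.length; }
```

The decay is also the weakness Jacob and Jonathan describe: once converted to `string`, the tag is gone, and anything derived from the plain string is no longer known to be valid.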
On 2013-11-21 01:16, Simen Kjærås wrote:
> The result of re-validating is performance loss. The result of missed validation is a bug.
>
> Also, in just a few lines, you can make a version that will *not* decay to the original type:
>
>     struct Validated(alias fn, T) {
>         private T _value;
>         @property inout T value() {
>             return _value;
>         }
>     }
>     // validated() is identical to before.
>
> Sure, using it is a bit more verbose than using the unadorned type, which is why I chose to make the original version automatically decay. This is a judgment where sensible people may disagree, even with themselves on a case-by-case basis.

It's still accessible via "value".

--
/Jacob Carlborg
Nov 20 2013
On 2013-11-21 08:38, Jacob Carlborg wrote:
> On 2013-11-21 01:16, Simen Kjærås wrote:
>> The result of re-validating is performance loss. The result of missed validation is a bug.
>>
>> Also, in just a few lines, you can make a version that will *not* decay to the original type:
>>
>>     struct Validated(alias fn, T) {
>>         private T _value;
>>         @property inout T value() {
>>             return _value;
>>         }
>>     }
>>     // validated() is identical to before.
>>
>> Sure, using it is a bit more verbose than using the unadorned type, which is why I chose to make the original version automatically decay. This is a judgment where sensible people may disagree, even with themselves on a case-by-case basis.
>
> It's still accessible via "value".

Indeed it is. If we want to make it perfectly impossible to get at the contents, so as to hinder all possible use of the data, I suggest this solution:

    struct Validated {}

    Validated validate() {
        return Validated.init;
    }

--
Simen
Nov 21 2013
On Wednesday, 20 November 2013 at 18:30:58 UTC, Dmitry Olshansky wrote:
> And it decays to the naked type in a blink of an eye. And some function down the road will do the validation again...

Not if that function down the road only accepts the validated type in the first place, because that is what it needs. Follow the rule: if you need a validated instance, only accept the validated type; do not try to validate.
Nov 21 2013
On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
> ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to then specialize on ValidatedString as well. So, while I agree with the idea in concept, I'd propose that we benchmark the difference in decoding and striding without the checks and see if there actually is much difference. Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString.

UTF validation isn't the only form of validation for strings. You could, for example, validate that the string doesn't contain SQL injection code, or contains a correctly formatted date, or has a name that is guaranteed to be in your employee database, or is a valid phone number, or is a correct email address, etc.

Again, validation is not defined by D; it is defined by the constraints YOUR PROGRAM puts on it.
Nov 20 2013
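Following Walter's point that the program defines what "valid" means, and the earlier rule that a function needing a validated value should accept only the validated type, here is a minimal sketch of a program-specific type. The rule (a phone number is 7 to 15 ASCII digits) and all names are hypothetical.

```d
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.exception : enforce;

/// Hypothetical program-specific type: 7 to 15 ASCII digits.
struct PhoneNumber
{
    private string digits;   // private to this module

    /// The only intended way in: check the program's rule once.
    static PhoneNumber parse(string input)
    {
        enforce(input.length >= 7 && input.length <= 15 && input.all!isDigit,
                "not a phone number: " ~ input);
        PhoneNumber p;
        p.digits = input;
        return p;
    }

    string toString() const { return digits; }
}

/// A consumer that relies on the rule asks for the type, not a string,
/// so it never has to re-validate.
string dialUri(PhoneNumber n)
{
    return "tel:" ~ n.toString();
}
```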
On Wednesday, November 20, 2013 16:26:59 Walter Bright wrote:
> On 11/20/2013 3:16 AM, Jonathan M Davis wrote:
>> ValidatedString would then avoid any extra validation when iterating over the characters, though I don't know how much of an efficiency gain that would actually be given that much of the validation occurs naturally when decoding or using stride. It would have the downside that any function which specializes on strings would likely have to then specialize on ValidatedString as well. So, while I agree with the idea in concept, I'd propose that we benchmark the difference in decoding and striding without the checks and see if there actually is much difference. Because if there isn't, then I don't think that it's worth going to the trouble of adding something like ValidatedString.
>
> UTF validation isn't the only form of validation for strings. You could, for example, validate that the string doesn't contain SQL injection code, or contains a correctly formatted date, or has a name that is guaranteed to be in your employee database, or is a valid phone number, or is a correct email address, etc.
>
> Again, validation is not defined by D; it is defined by the constraints YOUR PROGRAM puts on it.

Yes, but we seemed to be discussing the possibility of having some kind of type in Phobos which indicated that the string had been validated for UTF correctness. I wouldn't expect other types of string validation to end up in Phobos.

And without the type for UTF validation being in Phobos and specialized on in Phobos functions, I don't think that I would ever want to use it, because in such a case, you lose out on all of the specialization that Phobos does for strings and are stuck with a range of dchar. That forces a lot of extra decoding even if some of the validation can be skipped (since the string was already validated), whereas a number of Phobos functions are able to specialize on narrow strings and avoid decoding altogether. That performance boost would be lost if a string were wrapped in a UTFValidatedString without Phobos specializing on UTFValidatedString. And based on how decode and stride work, it looks to me like the decoding costs way more than the little bit of extra validation that is currently done as part of it, so avoiding the decoding is likely to be a much greater performance boost than avoiding those checks. If that is indeed the case, I don't see much point to something like UTFValidatedString unless Phobos specializes for it like it specializes for narrow strings.

Other types of string validation might very well be worth doing without Phobos knowing about them, but the wrapper type which indicates that the validation has been done still needs to be worth more than the performance hit of not being able to use naked strings anymore and losing the performance gains that come from the functions which specialize for narrow strings. That's probably true for strings that just get passed around, but probably isn't true for strings that end up being processed by range-based functions a lot.

- Jonathan M Davis
Nov 20 2013
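Jonathan's cost argument rests on how Phobos treats narrow strings: as ranges they auto-decode to `dchar`, while `std.string.representation` exposes the raw UTF-8 code units without decoding. A minimal sketch of the difference; `countCodePoints` is a hypothetical helper, and "café" is 4 code points but 5 UTF-8 bytes.

```d
import std.range.primitives : ElementType, walkLength;
import std.string : representation;

/// Hypothetical helper: counting code points requires decoding every
/// sequence, which is where the implicit validation happens (a malformed
/// sequence makes the decoding foreach throw).
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// As a range, a narrow string yields decoded code points ...
static assert(is(ElementType!string == dchar));
// ... while representation() reinterprets it as code units, with no decoding.
static assert(is(typeof("x".representation) == immutable(ubyte)[]));
```

A wrapper that only exposes the decoded `dchar` view would lose the code-unit fast paths, which is why Jonathan argues Phobos would have to specialize on it.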
On Wed, 20 Nov 2013 16:26:59 -0800, Walter Bright <newshound2 digitalmars.com> wrote:
> UTF validation isn't the only form of validation for strings. You could, for example, validate that the string doesn't contain SQL injection code, or contains a correctly formatted date, or has a name that is guaranteed to be in your employee database, or is a valid phone number, or is a correct email address, etc.
>
> Again, validation is not defined by D; it is defined by the constraints YOUR PROGRAM puts on it.

A checked type for database access goes a bit beyond the scope of the proposal. You'd need to encapsulate a transaction that works on a snapshot of the database state and fails if data changed in another transaction. Otherwise you could validate a name against the database just before someone else deletes it, thus invalidating the string. With a DB transaction wrapped in the validation, assignment between two "validated" strings becomes a pretty sophisticated runtime action, while the original proposals revolved around validation functions that can be pure. /This allows us to assign one validated string type to another with no runtime overhead./

--
Marco
Nov 25 2013
On 11/20/2013 2:49 AM, Jacob Carlborg wrote:
> How should we accomplish this? We can't replace:
>
>     void main (string[] args)
>
> With
>
>     void main (UnsafeString[] args)
>
> And break every application out there.

Use a different type for the validated string. "Validated" means your program has guaranteed that the string has a certain form defined by that program.
Nov 20 2013
On Tuesday, November 19, 2013 16:01:00 Andrei Alexandrescu wrote:
> Please chime in with ideas!

In general, I favor using defensive programming in library APIs and using enforce to validate the input to functions. Doing so makes it much harder to misuse the library and makes it much less likely that programs will run into weird and/or undefined behavior or other types of bugs. I then favor using DbC within a library or application for its own code and asserting that input is valid in those cases, because there the caller is essentially part of the same code that's doing the asserting and is maintained by the same people.

The problem with that is of course that there are cases where performance degrades when you use defensive programming and always check input, especially when the caller can know that the data is valid without having to check it first. So, having a way to use an API that doesn't involve it always defensively checking its input can be useful for the sake of efficiency.

Unfortunately, I don't think that it scales at all to take the approach that Walter has suggested of having the API normally assert on input and provide helper functions which the caller can use to validate input when they deem appropriate. That has the advantage of giving the caller control over what is and isn't checked and avoiding unnecessary checks, but it also makes it much easier to misuse the API, and I would expect the average programmer to skip the checks in most cases. It very quickly becomes like using error codes instead of exceptions, except that in this case, instead of an error code being ignored, the data's validity wouldn't have even been checked in the first place, resulting in the function being called doing who-knows-what. And the resulting bugs could be very obvious, or they could be insidiously hard to detect.

So, if we can find a way to default to checking validity and throwing on bad input but still provide a way for the caller to avoid the checks when appropriate, I think that that would be ideal. That way, we default to correctness and user-friendliness (in that the API is harder to silently use incorrectly), but we still provide a more performant route for those who know what they're doing, are willing to take the time to make sure that they truly do know how to use the API correctly, and take responsibility for ensuring that they don't feed bad input to the API.

Now, how we do that, I don't know. In some cases, creating a wrapper type would solve the problem (e.g. some kind of wrapper for strings which guaranteed UTF-correctness). But I don't think that it scales to use wrapper types for all such situations.

One alternative is to essentially duplicate a lot of functions, with one function validating the input for you and throwing on failure, and the other asserting that the input is valid. But that could result in a lot of code duplication, which isn't terribly desirable either. The assumeSorted or FracSec.assumeValid solutions seem to go either with a wrapper type or with essentially being a second function which does the same thing but without the validation, depending on the types involved and what the function is doing.

Another alternative would be to provide an argument (probably a template argument, though it could be a function argument if that makes more sense) which told the function whether it should assert or enforce on its input. That would at least localize the code duplication, but again, that could get a bit verbose, and I do like how assumeXYZ makes it abundantly clear that the caller is taking responsibility for the correctness in that case.

And in some situations, I think that it would clearly make no sense to do anything other than enforce on the input (e.g. string parsing functions tend to have to do almost the same work in the validation function as in the actual parsing function, making it almost pointless to have a separate validation function). So, I think that what we end up doing is definitely going to depend on what the code in question is for and what it's doing, but I agree that it would be valuable to come up with some common idioms for handling validation and error checking, and assumeXYZ would be one such idiom, and one which documents things nicely when it can be used.

Still, the most important point that I'd like to make is that I think we should lean towards validating input with enforce by default and then provide alternative means to avoid that validation, rather than using assertions and DbC by default, because leaving the validation up to the caller in release and asserting in debug is going to lead to _far_ more bugs in code using Phobos, particularly when the result isn't immediately and obviously wrong when bad input is given. And the fact that by default, the assertions in Phobos won't be hit in calling code unless the Phobos function is templatized (because Phobos will have been compiled in release) makes using assertions that much worse.

But I'll definitely have to think about idioms that we could use to do separate validation where appropriate and yet validate arguments via enforce by default.

- Jonathan M Davis
Nov 20 2013
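Jonathan's "template argument" alternative can be sketched as follows: enforce by default, with an explicit opt-out where the caller vouches for the input, and `assumeSorted` from std.range doing the lookup. `Checking` and `sortedContains` are hypothetical names, not Phobos APIs.

```d
import std.algorithm.sorting : isSorted;
import std.exception : enforce;
import std.range : assumeSorted;

/// Hypothetical policy flag (not a Phobos API).
enum Checking { validate, trust }

/// Default: bad input is an expected condition, so it is checked and throws.
/// Checking.trust: the caller takes responsibility; only an assert remains,
/// and it disappears in -release builds.
bool sortedContains(Checking checking = Checking.validate)(const(int)[] haystack, int needle)
{
    static if (checking == Checking.validate)
        enforce(haystack.isSorted, "haystack must be sorted");
    else
        assert(haystack.isSorted, "caller promised a sorted haystack");

    return haystack.assumeSorted.contains(needle);   // binary search
}
```

The checked path costs an O(n) scan in front of an O(log n) search, which is the kind of overhead that motivates the opt-out; spelling it `Checking.trust` at the call site keeps the caller's responsibility visible, much like `assumeXYZ`.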
On 2013-11-20 11:38, Jonathan M Davis wrote:
> Unfortunately, I don't think that it scales at all to take the approach that Walter has suggested of having the API normally assert on input and provide helper functions which the caller can use to validate input when they deem appropriate. That has the advantage of giving the caller control over what is and isn't checked and avoiding unnecessary checks, but it also makes it much easier to misuse the API, and I would expect the average programmer to skip the checks in most cases. It very quickly becomes like using error codes instead of exceptions, except that in this case, instead of an error code being ignored, the data's validity wouldn't have even been checked in the first place, resulting in the function being called doing who-knows-what. And the resulting bugs could be very obvious, or they could be insidiously hard to detect.

I think Walter's suggestion requires the use of asserts:

    bool isValid (Data data);

    void process (Data data)
    {
        assert(isValid(data));
        // process
    }

The asserts should be on by default and removed in release builds. This would require DMD to ship two versions of Phobos, one with asserts enabled and one where they're disabled. The version of Phobos with disabled asserts would then be used only when the -release flag is given.

> Still, the most important point that I'd like to make is that I think we should lean towards validating input with enforce by default and then provide alternative means to avoid that validation, rather than using assertions and DbC by default, because leaving the validation up to the caller in release and asserting in debug is going to lead to _far_ more bugs in code using Phobos, particularly when the result isn't immediately and obviously wrong when bad input is given. And the fact that by default, the assertions in Phobos won't be hit in calling code unless the Phobos function is templatized (because Phobos will have been compiled in release) makes using assertions that much worse.

DMD needs to ship with two versions of Phobos, one with assertions on and one with them disabled.

--
/Jacob Carlborg
Nov 20 2013
On 11/20/2013 12:57 PM, Jacob Carlborg wrote:
> bool isValid (Data data);
>
> void process (Data data)
> {
>     assert(isValid(data));
>     // process
> }

    void process(Data data)
    in { assert(isValid(data)); }
    body
    {
        // process
    }
Nov 20 2013
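For reference, a minimal sketch of Timon's contract in current D syntax, with `int` slices standing in for the hypothetical `Data` type and a hypothetical non-emptiness rule standing in for `isValid`. Newer compilers accept `do` in place of `body`, as well as an expression-style `in (condition, message)` form. Like asserts, contracts are compiled out with `-release`, which is why Jacob's point about shipping a non-release Phobos matters.

```d
/// Hypothetical validity rule standing in for Jacob's isValid(Data).
bool isValid(const(int)[] data) { return data.length > 0; }

/// Expression-style in-contract.
int first(const(int)[] data)
in (isValid(data), "data must not be empty")
{
    return data[0];
}

/// Block-style in-contract; `do` is the newer spelling of `body`.
int last(const(int)[] data)
in { assert(isValid(data)); }
do
{
    return data[$ - 1];
}
```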
On 2013-11-20 14:01, Timon Gehr wrote:
> void process(Data data)
> in { assert(isValid(data)); }
> body
> {
>     // process
> }

Right, forgot about contracts.

--
/Jacob Carlborg
Nov 20 2013
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.

I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source.

Take std.file.read() as an example; it returns void[], but has a validating counterpart in std.file.readText().

I think we should use ubyte[] to a greater extent for data which is potentially *not* valid UTF. Examples include interfacing with C functions, where I think there is a tendency towards always translating C char to D char, when they are in fact not equivalent. Another example is, again, std.file.read(), which currently returns void[]. I guess it is a matter of taste, but I think ubyte[] would be more appropriate here, since you can actually use it for something without casting it first.

The transition from string to ubyte[] is already made simple by std.string.representation. We should offer an equally simple and convenient way to do the opposite transformation. In one of my current projects, I am using this function:

    inout(char)[] asString(inout(ubyte)[] data) @safe pure
    {
        auto s = cast(typeof(return)) data;
        import std.utf: validate;
        validate(s);
        return s;
    }

This could easily be written as a template, to accept wider encodings as well, and I think it would be a nice addition to Phobos.

Lars
Nov 20 2013
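Lars notes that his `asString` could be generalized to wider encodings. One way to do that, sketched here with the hypothetical names `CharFor` and `asText`, maps each code-unit type (`ubyte`, `ushort`, `uint`) to the matching character type and validates with `std.utf.validate`, which throws `UTFException` on malformed input. Data from `std.file.read`, which returns `void[]`, would first need a cast to `ubyte[]`.

```d
import std.utf : validate;

/// Hypothetical mapping from a code-unit type to its character type.
template CharFor(U)
{
    static if (is(U == ubyte))       alias CharFor = char;
    else static if (is(U == ushort)) alias CharFor = wchar;
    else static if (is(U == uint))   alias CharFor = dchar;
    else static assert(0, "unsupported code unit type " ~ U.stringof);
}

/// Hypothetical templated asString: reinterpret raw code units as text,
/// keeping the input's mutability, and reject malformed UTF.
inout(CharFor!U)[] asText(U)(inout(U)[] data) @safe pure
{
    auto s = cast(inout(CharFor!U)[]) data;
    validate(s);   // throws std.utf.UTFException on malformed input
    return s;
}
```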
On Wednesday, November 20, 2013 11:45:57 Lars T. Kyllingstad wrote:
> On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
>> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.
>
> I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source.

That doesn't work when strings are being created via concatenation and the like inside the program rather than simply coming from outside the program.

> Take std.file.read() as an example; it returns void[], but has a validating counterpart in std.file.readText().
>
> I think we should use ubyte[] to a greater extent for data which is potentially *not* valid UTF.

Well, we've already discussed the possibility of using ubyte[] to indicate ASCII strings, and that makes a lot more sense IMHO, because then no decoding occurs (which is precisely what you want for ASCII), whereas with a string that's potentially invalid UTF, it's not that we don't want to decode it. It's just that we need to validate it when decoding it. So, I'd argue that ubyte[] should be used when you want to operate on code units rather than code points, and that it shouldn't have anything to do with validating code points.

- Jonathan M Davis
Nov 20 2013
20-Nov-2013 14:45, Lars T. Kyllingstad writes:
> On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
>> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.
>
> I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source.
>
> Take std.file.read() as an example; it returns void[], but has a validating counterpart in std.file.readText().

Sadly it's horrifically slow to do so. Above all, practicality must take precedence. Would you like to validate the whole file just to later rescan it anew to, say, tokenize a source file?

> I think we should use ubyte[] to a greater extent for data which is potentially *not* valid UTF. Examples include interfacing with C functions, where I think there is a tendency towards always translating C char to D char, when they are in fact not equivalent. Another example is, again, std.file.read(), which currently returns void[]. I guess it is a matter of taste, but I think ubyte[] would be more appropriate here, since you can actually use it for something without casting it first.

Otherwise I think it's a good idea to encode high-level invariants in types. The only problem then is inadvertent template bloat.

[snip]

--
Dmitry Olshansky
Nov 20 2013
On 11/20/13, 18:45, Lars T. Kyllingstad wrote:
> I think we should use ubyte[] to a greater extent for data which is potentially *not* valid UTF. Examples include interfacing with C functions, where I think there is a tendency towards always translating C char to D char, when they are in fact not equivalent. Another example is, again, std.file.read(), which currently returns void[]. I guess it is a matter of taste, but I think ubyte[] would be more appropriate here, since you can actually use it for something without casting it first.

+1

Especially the Windows APIs: they never take UTF-8(*) but consistently get translated to taking D char :(

In fact, if we want a good translation from C to D, we should be using D byte. On most platforms I've run into, C char is signed. (To be honest, you don't see 'byte' much in D code, so it would make the ported code stand out even more.)

(*) except for MultiByteToWideChar
Nov 26 2013
On Wednesday, November 20, 2013 08:51:16 Joseph Rushton Wakeling wrote:
> On 20/11/13 01:01, Andrei Alexandrescu wrote:
>> There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to.
>>
>> Currently we are using a mix of approaches:
>>
>> 1. Some functions enforce()
>>
>> 2. Some functions just assert()
>>
>> 3. Some (fewer I think) functions assert(0)
>>
>> 4. Some functions don't do explicit checking, relying instead on lower-level enforcement such as null dereference and bounds checking to ensure safety.
>>
>> Each method has its place. The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.
>
> Regarding enforce() vs. assert(), a good rule that I remember having suggested to me was that enforce() should be used for actual runtime checking (e.g. checking that the input to a public API function has correct properties), and assert() should be used to test logical failures (i.e. checking that cases which should never arise really don't arise).
>
> I've always followed that as a rule of thumb ever since.

When an assertion fails, it's a bug in your code. Assertions should _never_ be used for validating user input. So, if your function is asserting on the state of its input, then it is requiring that the caller give input which follows that contract, and it's a bug in the caller when they violate that contract by passing in bad input.

When your function uses enforce to validate its input, it is _not_ considered a bug when bad input is given. It _could_ be a bug in the caller, but they are not required to give valid input. When they give invalid input, they then get to react to the exception that was thrown and handle the error appropriately. This works when the input came from outside the program (e.g. a user or a file), as well as when it doesn't make sense for the caller to have validated the input before calling the function (e.g. because the validator function and the function doing the work would have to do almost the same work, making it cheaper to just have the function validate its input and not have a separate validator function). It also means that the function will _never_ have to operate on invalid input, as invalid input will always be checked and rejected, which makes it much harder to use the function incorrectly.

But ultimately, whether you use assertions or exceptions comes down to whether it's considered to always be a bug in the caller if the input is bad. DbC uses assertions and considers it a bug in the caller (since they violated their part of the contract), whereas defensive programming has the function protect itself and always check and throw on invalid input rather than assuming that the caller is going to provide valid input.

- Jonathan M Davis
Nov 20 2013
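The rule of thumb Joseph and Jonathan describe can be sketched as follows: the public entry point treats bad input as an expected condition and uses `enforce` (or lets `std.conv.to` throw), while the internal helper treats a bad argument as a bug in its caller and uses `assert`. `parsePercent` and `scale` are hypothetical names.

```d
import std.conv : to;
import std.exception : enforce;

/// Public entry point: the text comes from outside the program, so bad
/// input is not a bug. It is rejected with an exception.
int parsePercent(string input)
{
    immutable value = input.to!int;   // throws std.conv.ConvException if not a number
    enforce(value >= 0 && value <= 100, "percentage out of range: " ~ input);
    return scale(value);
}

/// Internal helper: every caller in this module guarantees the range, so a
/// violation is a bug in the caller. The assert is compiled out by -release.
private int scale(int percent)
{
    assert(percent >= 0 && percent <= 100, "scale: caller broke the contract");
    return percent * 255 / 100;   // map 0..100 onto 0..255
}
```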