digitalmars.D - Ranges longer than size_t.max
- Peter Alexander (28/28) Dec 28 2012 There was a discussion a while ago about the type returned by
- Simen Kjaeraas (26/51) Dec 28 2012 64 bits ought to be enough for anyone. :p
- Peter Alexander (9/21) Dec 28 2012 This doesn't solve the issue. For example, even with your update,
- Jonathan M Davis (8/43) Dec 28 2012 I'd be very strongly inclined to go with #4 and just say that anyone who...
- Peter Alexander (7/19) Dec 28 2012 Sorry, I don't think I was clear on what the issue is.
- Andrei Alexandrescu (3/19) Dec 28 2012 I think it would complicate a lot of things for the benefit of a few cas...
- Peter Alexander (7/15) Dec 29 2012 I agree, but the range of permutations will be longer than 2^64
- Jonathan M Davis (7/26) Dec 29 2012 I expect that it'll be silently overflow in most cases, because that's a...
- Andrei Alexandrescu (3/16) Dec 29 2012 I'd say give it a different name, i.e. bigLength.
- Peter Alexander (7/30) Dec 29 2012 But then the range doesn't have a length, even though it would be
- Andrei Alexandrescu (4/6) Dec 29 2012 I'd think you convinced yourself it's a bad idea while writing it. I did...
- Mehrdad (8/11) Dec 28 2012 Uhm, 64-bit?
- Jonathan M Davis (21/34) Dec 29 2012 You bring up a good point, but not even arrays support that, so stuff li...
- Dmitry Olshansky (9/43) Dec 29 2012 s/array/slice
- Mehrdad (4/6) Dec 29 2012 I didn't mean mmfile, I meant random access to the bytes of a
- SomeDude (2/8) Dec 30 2012 AFAIK, no 32-bit OS supports files larger than 4Gb anyway.
- Dmitry Olshansky (9/20) Dec 30 2012 False. It's solely a question of FS used.
- Era Scarecrow (9/15) Dec 31 2012 I was writing some code to go through the FAT12/16/32, and some
- Stewart Gordon (14/29) Dec 31 2012 Sector? Do you mean cluster?
- Era Scarecrow (27/41) Dec 31 2012 Wouldn't have been practical. Historically the FAT table was a
- Jonathan M Davis (8/20) Dec 30 2012 Um. No. There's a reason that things like the macro _LARGE_FILE_API exis...
- SomeDude (3/16) Jan 02 2013 That's true, I stand corrected.
- Marco Leise (11/13) Dec 31 2012 Anachronism!!!
- Stewart Gordon (20/30) Dec 29 2012
- Peter Alexander (11/25) Dec 29 2012 It's also three, for large values of three :-)
- Stewart Gordon (12/23) Dec 30 2012 Yes, you have a point there. Any facility for creating a range by
- monarch_dodra (20/21) Dec 29 2012 I still think there is a fundamental difference between:
- Jonathan M Davis (10/19) Dec 29 2012 It's now enforced that finite random access ranges have length, so we're...
- Era Scarecrow (4/9) Dec 31 2012 Option 5: Incorporate (emulation code?) for cent/ucent, then
- Peter Alexander (5/15) Dec 31 2012 Two problems with this:
- Era Scarecrow (13/30) Dec 31 2012 Perhaps, but having cent/ucent available would make a few people
- monarch_dodra (4/6) Dec 31 2012 Not to change the subject, but couldn't we just implement Cent
- Era Scarecrow (7/14) Dec 31 2012 Yes we could. Just that they are already defined built-in types
There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t. My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set has 21 or more elements the number of permutations is at least 21 factorial, which is over 2^64. As far as I see, there are three options:

1. Allow ranges to return BigInt for length.
2. Allow ranges like this but assert when .length overflows.
3. Allow ranges like this and just allow .length to overflow dangerously.
4. Do not allow ranges like this.

Option 1 solves the problem, but significantly complicates Phobos development for the rare case. Option 2 works, and is practical, although runtime asserts are undesirable. Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely.

--------------------------------
auto a = iota(size_t.max / 2 + 1);
auto b = chain(a, a);
writeln(a.length); // 9223372036854775808
writeln(b.length); // 0
--------------------------------

Option 4 works, but it would be a shame to lose these ranges. Thoughts?
Dec 28 2012
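For concreteness, the arithmetic behind both claims in the post can be checked mechanically. The sketch below uses Python, since its arbitrary-precision integers make the comparison against 2^64 trivial; the modulo operation stands in for D's wrapping size_t arithmetic.

```python
import math

# 21! is the first factorial that no longer fits in 64 bits, so a
# permutations range over a set of 21 or more elements overflows a
# size_t length.
assert math.factorial(20) < 2**64
assert math.factorial(21) > 2**64

# chain(a, a) adds the two lengths in size_t arithmetic. With
# a.length == size_t.max / 2 + 1 == 2^63, the sum wraps to 0,
# matching the writeln output in the example above.
a_len = (2**64 - 1) // 2 + 1
assert a_len == 2**63
assert (a_len + a_len) % 2**64 == 0
```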
On 2012-12-29 02:18, Peter Alexander <peter.alexander.au gmail.com> wrote:
> There was a discussion a while ago about the type returned by .length
> for ranges. I believe the conclusion was that it should always be
> size_t. My question now is what to do with really large ranges?
> <snip>

64 bits ought to be enough for anyone. :p

This is a bit of a tough one, and I believe the root problem is that user-defined types are second-class citizens. Specifically, CommonType is incapable of finding the common type of BigInt and int. This makes it hard to find the correct return type for e.g. chain. Behind the scenes, CommonType is using ?: to figure out the correct type.

If D supported opImplicitCastFrom (as suggested in WalterAndrei.pdf), ?: could figure out the correct type, and chain's .length could be:

    private template LengthType(Range) {
        alias typeof(Range.length) LengthType;
    }

    @property auto length() {
        CommonType!(staticMap!(LengthType, R)) result = 0;
        foreach (i, Unused; R) {
            result += source[i].length;
        }
        return result;
    }

This would thus support BigInt .length just as well as size_t, byte, or whatever.

--
Simen
Dec 28 2012
On Saturday, 29 December 2012 at 01:56:04 UTC, Simen Kjaeraas wrote:
> @property auto length() {
>     CommonType!(staticMap!(LengthType, R)) result = 0;
>     foreach (i, Unused; R) {
>         result += source[i].length;
>     }
>     return result;
> }
>
> This would thus support BigInt .length just as well as size_t, byte, or
> whatever.

This doesn't solve the issue. For example, even with your update, the code snippet I provided in the original post with iota and chain would still break (assuming iota(size_t.max).length returns a size_t).

Also, if we allow BigInt, even with better support in CommonType it would significantly complicate Phobos with LengthType!Range popping up everywhere.
Dec 28 2012
On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> There was a discussion a while ago about the type returned by .length
> for ranges. I believe the conclusion was that it should always be
> size_t. My question now is what to do with really large ranges?
> <snip>

I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit. Needing ranges of length greater than uint.max is definitely not the norm, and I would expect people caring that much about computational stuff to be using 64-bit anyway, since odds are they'll need the memory. Allowing for length to be anything other than size_t is extremely annoying for generic code.

- Jonathan M Davis
Dec 28 2012
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> I'd be very strongly inclined to go with #4 and just say that anyone who
> actually cares about numbers that large should use 64-bit. Needing
> ranges of length greater than uint.max is definitely not the norm, and I
> would expect people caring that much about computational stuff to be
> using 64-bit anyway, since odds are they'll need the memory. Allowing
> for length to be anything other than size_t is extremely annoying for
> generic code.

Sorry, I don't think I was clear on what the issue is.

I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.
Dec 28 2012
On 12/28/12 9:21 PM, Peter Alexander wrote:
> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
> isn't even enough (for example in the case of a permutations range).
> I'm asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

Andrei
Dec 28 2012
On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
> I think it would complicate a lot of things for the benefit of a few
> cases.

I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

* Don't provide .length at all
* Provide it as size_t, and assert on overflow
* Provide it as size_t, and silently overflow
Dec 29 2012
On Saturday, December 29, 2012 10:00:44 Peter Alexander wrote:
> I agree, but the range of permutations will be longer than 2^64 on
> occasion, so what do we do?
>
> * Don't provide .length at all
> * Provide it as size_t, and assert on overflow
> * Provide it as size_t, and silently overflow

I expect that it'll silently overflow in most cases, because that's a lot simpler and the problem is almost never an issue. Checking also incurs a performance penalty in non-release mode where there's almost never any benefit from it. But if you want to provide pull requests with assertions for overflow in various Phobos algorithms, then they'll probably be merged in.

- Jonathan M Davis
Dec 29 2012
On 12/29/12 4:00 AM, Peter Alexander wrote:
> I agree, but the range of permutations will be longer than 2^64 on
> occasion, so what do we do?

I'd say give it a different name, i.e. bigLength.

Andrei
Dec 29 2012
On Saturday, 29 December 2012 at 14:23:09 UTC, Andrei Alexandrescu wrote:
> I'd say give it a different name, i.e. bigLength.

But then the range doesn't have a length, even though it would be useful to have it in a lot of cases.

How about, have .length, and allow it to dangerously overflow, but also provide .bigLength, which returns a BigInt for when people need it?
Dec 29 2012
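As a sketch of the .length/.bigLength split being floated here, the toy class below models a permutations range. Python stands in for D, with its unbounded int playing the role of BigInt; the class and property names are hypothetical, not from any actual Phobos API.

```python
import math

class Permutations:
    """Toy model of a permutations range over a set of n elements.
    big_length plays the role of the proposed .bigLength (exact,
    BigInt-like); length is the size_t-style value, silently
    truncated modulo 2^64."""

    def __init__(self, n):
        self.n = n

    @property
    def big_length(self):
        return math.factorial(self.n)   # exact, arbitrary precision

    @property
    def length(self):
        return self.big_length % 2**64  # wraps like a 64-bit size_t

p = Permutations(21)
assert p.big_length == 51090942171709440000  # 21!, which exceeds 2^64
assert p.length == p.big_length % 2**64      # the truncated view
assert p.length != p.big_length              # overflow has occurred
```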
On 12/29/12 3:38 PM, Peter Alexander wrote:
> How about, have .length, and allow it to dangerously overflow, but also
> provide .bigLength, which returns a BigInt for when people need it?

I'd think you convinced yourself it's a bad idea while writing it. I did the same :o).

Andrei
Dec 29 2012
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> anyone who actually cares about numbers that large should use 64-bit.

Uhm, 64-bit?

What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?)

And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits for length...
Dec 28 2012
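Mehrdad's bit-vector figure checks out: 600 MiB of bits is already past what a 32-bit length can count.

```python
MiB = 2**20
bits = 600 * MiB * 8   # a 600 MiB bit vector, measured in bits
assert bits == 5_033_164_800
assert bits > 2**32    # 2^32 == 4_294_967_296, so a uint length overflows
```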
On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
> Uhm, 64-bit?
>
> What about large files (> 4 GiB) on 32-bit systems? (Say, if a range
> wants to provide random access to the bytes of an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.

Plenty of stuff in Phobos isn't going to work right now if you try and have a length of 64 bits on a 32-bit system. size_t is used pretty much everywhere. And it's going to make generic code a _lot_ less pleasant if we have to worry about using anything other than size_t. It's already been causing us problems that iota does, and we've discussed making it so that it doesn't. But even if we allow ulong for length and indexing on 32-bit systems, that's a completely different ball game from permitting BigInt.

So, I'd _very_ much like to always restrict length and indices to size_t for ranges. It will make all of the generic handling of that stuff far more pleasant. But we may very well be forced to permit ulong on 32-bit systems due to practical concerns such as large file handling on 32-bit systems.

Oh, how I wish that 32-bit systems would just die out. It would make so many things so much easier. No such luck on that yet though, much as most PC processors are 64-bit now.

- Jonathan M Davis
Dec 29 2012
12/29/2012 1:30 PM, Jonathan M Davis wrote:
> You bring up a good point, but not even arrays support that, so stuff
> like std.mmfile can't possibly work with large files on 32-bit systems,
> not with the current design anyway (as it gives you an array to operate
> on). It may be that you just need to do something different if you want
> to operate on large files on 32-bit systems. I don't know.

s/array/slice

It maps a view of a part of the file. Obviously one can't map parts greater than the address space.

> But we may very well be forced to permit ulong on 32-bit systems due to
> practical concerns such as large file handling on 32-bit systems.

I think the 32-bit version has to bite the bullet and try ulong/uint deduction.

> Oh, how I wish that 32-bit systems would just die out. It would make so
> many things so much easier. No such luck on that yet though, much as
> most PC processors are 64-bit now.

Yeah, but sadly they are practical ;) As 32-bit is now dominating the MCU world...

--
Dmitry Olshansky
Dec 29 2012
On Saturday, 29 December 2012 at 09:31:02 UTC, Jonathan M Davis wrote:
> not even arrays support that, so stuff like std.mmfile can't possibly
> work with large files on 32-bit systems

I didn't mean mmfile, I meant random access to the bytes of a regular file. No need for mapping it to memory.
Dec 29 2012
On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
> Uhm, 64-bit?
>
> What about large files (> 4 GiB) on 32-bit systems? (Say, if a range
> wants to provide random access to the bytes of an 8 GiB file?)
>
> And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits
> for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.
Dec 30 2012
12/30/2012 10:23 PM, SomeDude wrote:
> AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

False. It's solely a question of the FS used.

NTFS supports files larger than 4 GB regardless of the version of Windows (Win2K+ or even earlier). It doesn't matter what the bitness of the OS in question is. I suspect 32-bit linux also has > 4 GB files even with ext2, no problem.

And e.g. FAT32 remarkably can't handle 4 GB+ files with any OS.

--
Dmitry Olshansky
Dec 30 2012
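The FAT32 limit Dmitry mentions comes from the 32-bit file-size field in the directory entry, independent of OS bitness; a quick arithmetic check:

```python
# FAT32 records a file's size in a 32-bit field, so the largest
# representable file is 2^32 - 1 bytes (one byte short of 4 GiB),
# no matter which OS mounts the volume.
fat32_max_file = 2**32 - 1
assert fat32_max_file == 4_294_967_295

# NTFS and ext2 impose no such 32-bit size field, which is why even
# a 32-bit OS can hold, say, an 8 GiB file on those filesystems.
eight_gib = 8 * 2**30
assert eight_gib > fat32_max_file
```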
On Sunday, 30 December 2012 at 19:11:51 UTC, Dmitry Olshansky wrote:
> False. It's solely a question of FS used.
> <snip>
> And e.g. FAT32 remarkably can't handle 4 Gb+ files with any OS.

I was writing some code to go through the FAT12/16/32, and some interesting information was found (although there was so much overhead I kinda stopped in the middle of it). FAT32 actually uses something like 28/29 bits for the sector id; the other 3-4 bits were flags like bad sectors, last sector and other minor data. At least that's what I remember off hand.

http://en.wikipedia.org/wiki/Fat32
Dec 31 2012
On 31/12/2012 17:50, Era Scarecrow wrote:
> I was writing some code to go through the FAT12/16/32, and some
> interesting information was found (Although there was so much overhead I
> kinda stopped in the middle of it). FAT32 actually uses something like
> 28/29 bits for the sector id, the other 3-4 bits were flags like bad
> sectors, last sector and other minor data. At least that's what I
> remember off hand.

Sector? Do you mean cluster?

I would have thought it used the whole 32 bits for cluster number, with magic values for "unused", "end of chain" and "bad". In each case you don't need to point to the next cluster as well. Unless it supports something like marking an in-use cluster as bad but leaving until another day the task of moving the data from it into a good cluster.

But looking through http://en.wikipedia.org/wiki/Fat32 there are indeed a handful of magic values for things like this. Anyway, that states that it uses 28 bits for the cluster number, but nothing about what the other 4 bits are for. But a possibility I can see is that these 4 bits were reserved for bit flags that may be added in the future.

Stewart.
Dec 31 2012
On Monday, 31 December 2012 at 18:55:04 UTC, Stewart Gordon wrote:
> Sector? Do you mean cluster?

Probably. They mean the same thing to me.

> I would have thought it used the whole 32 bits for cluster number, with
> magic values for "unused", "end of chain" and "bad". In each case you
> don't need to point to the next cluster as well. Unless it supports
> something like marking an in-use cluster as bad but leaving until
> another day the task of moving the data from it into a good cluster.

Wouldn't have been practical. Historically the FAT table was a layout of all the clusters that pointed to the next cluster, and used the largest number to denote EOF (there were 8 codes or so reserved, I don't remember exactly). Taking the math, if you were to lay it all out via FAT32 using the same scheme, you'd end up with 2^34 bytes for a single table. FAT by default had 2 tables (a backup), meaning 2^35 would be needed, and that is just overhead (assuming you needed it all). Back then you had 8 Gig drives at the most, and with space being sparse, 28 bits makes more sense (1-2 Gigs vs 16-32 Gigs). Of course it wouldn't make the table(s) bigger if the drive didn't support above X clusters.

> But looking through http://en.wikipedia.org/wiki/Fat32 there are indeed
> a handful of magic values for things like this. Anyway, that states that
> it uses 28 bits for the cluster number, but nothing about what the other
> 4 bits are for.

I don't remember where I read it, but I was certain they were used for overhead/flags; course I was also reading up on Long File Name (LFN) and directory structures. Likely lost somewhere in those texts.

FAT32 would/could have supported files over 4 Gig, however the Folder/FS max filesize was 32-bit, and anything over would have likely been more complex to incorporate, along with programs depending on them all being 32-bit. Guess it was easier to make the hard limit rather than make it extend to further sizes. Plus programmers are going to be lazy and prefer int whenever possible. Hmmm, actually back then long long's weren't supported (except maybe by gcc), so I don't think that was much of an option.
Dec 31 2012
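Era's 2^34 figure can be reproduced: a hypothetical FAT addressing clusters with full 32-bit numbers would need a 16 GiB allocation table, while the actual 28-bit scheme caps it at 1 GiB per table.

```python
ENTRY_BYTES = 4                   # each FAT32 table entry is 4 bytes

# Full 32-bit cluster numbers: one entry per possible cluster.
full_table = 2**32 * ENTRY_BYTES
assert full_table == 2**34        # 16 GiB for a single table
assert 2 * full_table == 2**35    # FAT keeps a backup copy of the table

# Actual FAT32: 28-bit cluster numbers (the top 4 bits are reserved).
real_table = 2**28 * ENTRY_BYTES
assert real_table == 2**30        # 1 GiB worst case per table
```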
On Sunday, December 30, 2012 19:23:13 SomeDude wrote:
> AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Um. No. There's a reason that things like the macro _LARGE_FILE_API exist. While some file systems won't allow larger file sizes, that's generally an FS limitation, not an OS limitation. The OSes provide ways to operate on 64-bit files on 32-bit systems. If they didn't, it would be impossible to do something like read a Blu-ray on a 32-bit system.

http://en.wikipedia.org/wiki/Large_file_support

- Jonathan M Davis
Dec 30 2012
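Jonathan's point is easy to demonstrate: offsets beyond 4 GiB are a question of the API's offset width, not of pointer size. The sketch below (Python, whose file offsets are 64-bit regardless of build bitness) seeks past 2^32 and writes one byte, yielding a sparse file whose logical size exceeds 4 GiB while occupying almost no disk blocks.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.seek(2**32)       # an offset this large needs a 64-bit off_t
        f.write(b"\0")
    # The logical size now exceeds 4 GiB; the hole before the
    # written byte is sparse on common filesystems.
    assert os.path.getsize(path) == 2**32 + 1
finally:
    os.remove(path)
```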
On Sunday, 30 December 2012 at 22:08:59 UTC, Jonathan M Davis wrote:
> Um. No. There's a reason that things like the macro _LARGE_FILE_API
> exist. While some file systems won't allow larger file sizes, that's
> generally an FS limitation, not an OS limitation. The OSes provide ways
> to operate on 64-bit files on 32-bit systems.
> <snip>

That's true, I stand corrected.
Jan 02 2013
On Sun, 30 Dec 2012 19:23:13 +0100, "SomeDude" <lovelydear mailmetrash.com> wrote:
> AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Anachronism!!!

4.7 GB DVD burners: ~1997
64-bit personal computers: 2003

(Also backups of large partitions into disk images were made.)

That said, my father owns a stand-alone, multi-media, USB disk drive that cannot store files > 4GB because of the FAT32 it uses.

--
Marco
Dec 31 2012
On 29/12/2012 01:18, Peter Alexander wrote:
> My question now is what to do with really large ranges? For example, if
> we have a range of permutations of the elements in a set, when the set
> is larger than 21 the number of permutations is 21 factorial, which is
> over 2^64. As far as I see, there are three options:

They look like four to me.

> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow dangerously.
> 4. Do not allow ranges like this.
<snip>

5. Don't allow such ranges to define a .length property.

From the std.range documentation:

    template hasLength(R)
    Returns true if R has a length member that returns an integral type.
    R does not have to be a range. Note that length is an optional
    primitive as no range must implement it. Some ranges do not store
    their length explicitly, some cannot compute it without actually
    exhausting the range (e.g. socket streams), and some other ranges may
    be infinite.

There you go. For reasons such as this, ranges aren't required to define a length property. As well as I/O ranges and infinite ranges, ranges where the length is liable to be greater than ulong.max are among those that naturally won't have this property.

If we define a length that may overflow, then sooner or later some code will be called on it that calls hasLength on the range, finds it to be true, and then relies on length and malfunctions because the value returned doesn't match the actual length.

Stewart.
Dec 29 2012
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon wrote:
> They look like four to me.

It's also three, for large values of three :-)

> If we define a length that may overflow, then sooner or later some code
> will be called on it that calls hasLength on the range, finds it to be
> true, and then relies on length and malfunctions because the value
> returned doesn't match the actual length.

Using this logic, chain shouldn't have length either (it can overflow, as I demonstrated in the original post), but it does, and it should. There are many ranges in Phobos that may overflow in length.

The problem with permutations is that, for the kinds of permutations you are likely to use often, length will not overflow. It would be really useful to have length available when this is the case.
Dec 29 2012
On 29/12/2012 21:13, Peter Alexander wrote:
> Using this logic, chain shouldn't have length either (it can overflow,
> as I demonstrated in the original post), but it does, and it should.
> There are many ranges in Phobos that may overflow in length.

Yes, you have a point there. Any facility for creating a range by combining ranges is liable to overflow if it defines a length property. But I would expect that in practice 99% of finite ranges would be nowhere near 2^64 in size, so an overflow would be an exceptional case.

> The problem with permutations is that, for the kinds of permutations you
> are likely to use often, length will not overflow. It would be really
> useful to have length available when this is the case.

Maybe the solution is to define two permutation range classes: one with length that asserts that the number of elements <= 20, and one without length that places no bound. If the set size is a template parameter rather than being set at run time, then it can define length conditionally.

Stewart.
Dec 30 2012
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon wrote:
> 5. Don't allow such ranges to define a .length property.

I still think there is a fundamental difference between:

5a: Don't allow such ranges to define a .length property.

and

5b: Don't guarantee *support* for such ranges.

The current status quo in phobos is "5b" BTW. There are a few select algorithms that do support it, but only because the implementers decided to go the extra mile, and it usually makes things more complicated than it needs to be. BTW iota, in certain cases, will have a length returned in ulong. This is used to test *certain* algorithms, but really, nothing is guaranteed.

At the very least, there was a discussion some time ago that for a (finite) RA range, range[range.length] must compile. Ditto for slicing. This was never enforced though... Which is a shame (IMO)...

...Well, I was the one to suggest that actually, but I never did it, so blame falls on me :/ I'll put it back on my todo list.
Dec 29 2012
On Saturday, December 29, 2012 22:18:53 monarch_dodra wrote:
> At the very least, there was a discussion some time ago that for a
> (finite) RA range, range[range.length] must compile. Ditto for slicing.
> This was never enforced though... Which is a shame (IMO)...

It's now enforced that finite random access ranges have length, so we're part of the way there.

http://d.puremagic.com/issues/show_bug.cgi?id=7177

If that gets implemented, then we can require that random access ranges and ranges with slicing fully support $. We'd probably still need to check range[range.length] though, just to check that length and indexing are properly compatible.

- Jonathan M Davis
Dec 29 2012
On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander wrote:
> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow dangerously.
> 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BigInt.
Dec 31 2012
On Monday, 31 December 2012 at 17:43:34 UTC, Era Scarecrow wrote:
> Option 5: Incorporate (emulation code?) for cent/ucent, then consider
> that the same as 1 but returning cent rather than BigInt.

Two problems with this:

1. Still complicates range code (have to accommodate both size_t and ucent return values for .length).
2. ucent probably isn't enough either for these sorts of ranges.
Dec 31 2012
On Monday, 31 December 2012 at 17:47:36 UTC, Peter Alexander wrote:
> 1. Still complicates range code (have to accommodate both size_t and
> ucent return values for .length).
> 2. ucent probably isn't enough either for these sorts of ranges.

Perhaps, but having cent/ucent available would make a few people happy at least, although except for a few cases with databases and encryption it may not be that important right now.

I remember trying to learn C/C++ and trying to do some very simple code to get the number 1 doubled something like 100 times. After 31 it went to garbage (hey, I was 14 at the time, god that was a long time ago). It took me a while to understand the problem. I ended up re-writing the whole thing in assembly that took 101 bytes (COM file) and could double the number as many times as I wanted.
Dec 31 2012
On Monday, 31 December 2012 at 17:56:37 UTC, Era Scarecrow wrote:
> Perhaps, but having cent/ucent available would make a few people happy
> at least

Not to change the subject, but couldn't we just implement Cent and UCent as library types, such as Complex, or HalfFloat? I'd be tempted to open an ER and see if walter is up for it...
Dec 31 2012
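A library UCent of the kind suggested here is mostly two-limb 64-bit arithmetic. Below is a minimal sketch of the addition primitive (Python, with explicit masking standing in for D's wrapping ulong; the function name and limb representation are hypothetical, not from any actual proposal):

```python
MASK64 = 2**64 - 1

def ucent_add(a_hi, a_lo, b_hi, b_lo):
    """Add two unsigned 128-bit values stored as (hi, lo) 64-bit limbs,
    wrapping modulo 2^128 the way a built-in ucent presumably would."""
    lo = (a_lo + b_lo) & MASK64
    carry = 1 if lo < a_lo else 0          # low limb wrapped around
    hi = (a_hi + b_hi + carry) & MASK64
    return hi, lo

# ulong.max + 1 carries into the high limb:
assert ucent_add(0, MASK64, 0, 1) == (1, 0)

# 21! (the overflowing permutation count) fits easily in 128 bits:
f21 = 51090942171709440000
hi, lo = divmod(f21, 2**64)
assert ucent_add(hi, lo, hi, lo) == divmod(2 * f21, 2**64)
```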
On Monday, 31 December 2012 at 18:18:08 UTC, monarch_dodra wrote:
> Not to change the subject, but couldn't we just implement Cent and UCent
> as library types, such as Complex, or HalfFloat? I'd be tempted to open
> an ER and see if walter is up for it...

Yes we could. Just that they are already defined built-in types according to the specs. *shrugs* But changing your code from Cent to cent after it's properly implemented seems like a minimal change.

If Walter says go ahead (and I'm sure he won't have a problem with it), then I guess it's up to whoever to make it.
Dec 31 2012