digitalmars.D.learn - Output range with custom string type
- Jacob Carlborg (36/36) Aug 28 2017 I'm working on some code that sanitizes and converts values of different...
- Moritz Maxeiner (36/48) Aug 28 2017 If you want the caller to be just in charge of allocation, that's
- Jacob Carlborg (8/48) Aug 29 2017 I guess that would work.
- Moritz Maxeiner (75/81) Aug 29 2017 Certainly, that's what dynamic arrays (aka vectors, e.g.
- Jacob Carlborg (4/94) Aug 29 2017 Thanks.
- Jacob Carlborg (5/23) Aug 31 2017 What's the reason to use "moveEmplace" instead of just assigning to the
- Moritz Maxeiner (8/29) Aug 31 2017 The `move` part is to support non-copyable types (i.e. T with
- Cecil Ward (8/44) Aug 28 2017 Q is it an option to let the caller provide all the storage in an
I'm working on some code that sanitizes and converts values of different types to strings. I thought it would be a good idea to wrap the sanitized string in a struct to have some type safety. Ideally it should not be possible to create this type without going through the sanitizing functions.

The problem I have is that I would like these functions to push the allocation decision up to the caller. Internally these functions use formattedWrite. I thought the natural design would be for the sanitize functions to take an output range and pass it to formattedWrite. Here's a really simple example:

---
import std.stdio : writeln;

struct Range
{
    void put(char c)
    {
        writeln(c);
    }
}

void sanitize(OutputRange)(string value, OutputRange range)
{
    import std.format : formattedWrite;
    range.formattedWrite!"'%s'"(value);
}

void main()
{
    Range range;
    sanitize("foo", range);
}
---

The problem now is that the data is passed to the range one char at a time. That means that if the user implements a custom output range, the user is in full control of the data. It would be very easy for the user to make a mistake or to manipulate the data on purpose, which makes the whole idea of the sanitized type pointless.

Any suggestions on how to fix this, or a better idea?

-- 
/Jacob Carlborg
Aug 28 2017
On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
> I'm working on some code that sanitizes and converts values of different types to strings. I thought it would be a good idea to wrap the sanitized string in a struct to have some type safety. Ideally it should not be possible to create this type without going through the sanitizing functions.
>
> The problem I have is that I would like these functions to push up the allocation decision to the caller. Internally these functions use formattedWrite. I thought the natural design would be that the sanitize functions take an output range and pass that to formattedWrite.
>
> [...]
>
> Any suggestions how to fix this or a better idea?

If you want the caller to be just in charge of allocation, that's what std.experimental.allocator provides. In this case, I would polish up the old "format once to get the length, allocate, format second time into allocated buffer" method used with snprintf for D:

--- test.d ---
import std.stdio;
import std.experimental.allocator;

// Output range that discards its input and only counts the chars it receives
struct CountingOutputRange
{
private:
    size_t _count;

public:
    size_t count() { return _count; }
    void put(char c) { _count++; }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    // First pass: determine the length of the formatted output
    CountingOutputRange r;
    (&r).formattedWrite!"'%s'"(value); // do not copy the range

    // Allocate exactly that many chars from the caller's allocator
    auto s = alloc.makeArray!char(r.count);
    scope (failure) alloc.dispose(s);

    // Second pass: format into the allocated buffer.
    // This should only throw if the user provided allocator returned less
    // memory than was requested
    return s.sformat!"'%s'"(value);
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---
Aug 28 2017
On 2017-08-28 23:45, Moritz Maxeiner wrote:
> If you want the caller to be just in charge of allocation, that's what std.experimental.allocator provides. In this case, I would polish up the old "format once to get the length, allocate, format second time into allocated buffer" method used with snprintf for D:
>
> [...]

I guess that would work. But if I keep the range internal, can't I just do the allocation inside the range and only use "formattedWrite", instead of using both formattedWrite and sformat and going through the data twice? Then of course the final size is not known before allocating.

-- 
/Jacob Carlborg
Aug 29 2017
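A minimal sketch of the "keep the range internal" idea from the post above, combined with the wrapper struct from the original question. The names `Sanitized` and `value` are hypothetical, and `std.array.appender` (which allocates from the GC) is swapped in here only as a stand-in for an allocator-backed range, so this does not yet hand the allocation decision to the caller:

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical wrapper type; name and layout are assumptions for illustration.
struct Sanitized
{
    private string _value;  // note: `private` is module-scoped in D

    @disable this();        // no default construction
    private this(string value) { _value = value; }

    string value() const { return _value; }
}

Sanitized sanitize(string value)
{
    // The output range lives inside the function, so the caller never
    // sees (and cannot tamper with) the individual chars.
    auto buffer = appender!string();
    buffer.formattedWrite!"'%s'"(value);
    return Sanitized(buffer.data);
}

void main()
{
    import std.stdio : writeln;

    auto s = sanitize("foo");
    writeln(s.value); // prints 'foo'
}
```

Because `private` only restricts access from other modules, the type and the sanitizing functions would have to live in their own module for the constructor to be out of reach; `Sanitized.init` also remains available despite `@disable this()`.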
On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote:
> [...] But if I keep the range internal, can't I just do the allocation inside the range and only use "formattedWrite"? Instead of using both formattedWrite and sformat and go through the data twice. Then of course the final size is not known before allocating.

Certainly, that's what dynamic arrays (aka vectors, e.g. std::vector in C++ STL) are for:

---
import core.exception;
import std.stdio;
import std.experimental.allocator;
import std.algorithm;

struct PoorMansVector(T)
{
private:
    T[] store;
    size_t length;
    IAllocator alloc;

public:
    @disable this(this);

    this(IAllocator alloc)
    {
        this.alloc = alloc;
    }

    ~this()
    {
        if (store)
        {
            alloc.dispose(store);
            store = null;
        }
    }

    void put(T t)
    {
        if (!store)
        {
            // Allocate only once for "small" vectors
            store = alloc.makeArray!T(8);
            if (!store) onOutOfMemoryError();
        }
        else if (length == store.length)
        {
            // Growth factor of 1.5
            auto expanded = alloc.expandArray(store, store.length / 2);
            if (!expanded) onOutOfMemoryError();
        }
        assert (length < store.length);
        moveEmplace(t, store[length++]);
    }

    // Hands the written elements over to the caller, who must dispose them
    T[] release()
    {
        auto elements = store[0 .. length];
        store = null;
        length = 0;
        return elements;
    }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite;

    auto r = PoorMansVector!char(alloc);
    (&r).formattedWrite!"'%s'"(value); // do not copy the range
    return r.release();
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---

Do be aware that the above vector is named "poor man's vector" for a reason: it is a hasty write-down from memory and is sure to contain bugs. For better vector implementations you can look at collection libraries such as EMSI containers; my own attempt at a DbI vector container can be found here [1].

[1] https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d
Aug 29 2017
On 2017-08-29 19:35, Moritz Maxeiner wrote:
> [...]

Thanks.

-- 
/Jacob Carlborg
Aug 29 2017
On 2017-08-29 19:35, Moritz Maxeiner wrote:
> void put(T t)
> {
>     [...]
>     assert (length < store.length);
>     moveEmplace(t, store[length++]);
> }

What's the reason to use "moveEmplace" instead of just assigning to the array: "store[length++] = t"?

-- 
/Jacob Carlborg
Aug 31 2017
On Thursday, 31 August 2017 at 07:06:26 UTC, Jacob Carlborg wrote:
> On 2017-08-29 19:35, Moritz Maxeiner wrote:
>> void put(T t)
>> {
>>     [...]
>>     assert (length < store.length);
>>     moveEmplace(t, store[length++]);
>> }
>
> What's the reason to use "moveEmplace" instead of just assigning to the array: "store[length++] = t"?

The `move` part is to support non-copyable types (i.e. T with `@disable this(this)`), such as another owning container (assigning would generally try to create a copy). The `emplace` part is because the destination `store[length]` has been default-initialized by either makeArray or expandArray and doesn't need to be destroyed (a pure move would destroy `store[length]` if T has a destructor).
Aug 31 2017
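The difference described in the post above can be seen in a small sketch. `Owner` is a hypothetical non-copyable type introduced only for illustration; the fixed-size array stands in for the vector's default-initialized `store`:

```d
import std.algorithm.mutation : moveEmplace;

/// Hypothetical non-copyable type, used only for this example.
struct Owner
{
    int id;
    @disable this(this); // copying is forbidden
}

void main()
{
    Owner[2] store;      // slots start out default-initialized (Owner.init)
    auto t = Owner(42);

    // store[0] = t;     // would not compile: Owner is not copyable

    // Moves t into the slot and treats the slot as raw storage,
    // so no destructor is run on the slot's previous contents.
    moveEmplace(t, store[0]);

    assert(store[0].id == 42);
}
```

The state of `t` after the move is deliberately not inspected here; only the destination slot is relied upon.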
On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
> [...] The problem I have is that I would like these functions to push up the allocation decision to the caller. Internally these functions use formattedWrite. I thought the natural design would be that the sanitize functions take an output range and pass that to formattedWrite.
>
> [...]
>
> Any suggestions how to fix this or a better idea?

Q: Is it an option to let the caller provide all the storage in an oversized fixed-length buffer? You could add a second helper function that computes and returns a safely pessimistic, over-the-top maximum for the required length. It could be called once beforehand to establish the required buffer size (or to check it). This is the technique I am using right now. My sizing function is ridiculously fast, as I am lucky in my particular use case.
Aug 28 2017
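A rough sketch of the fixed-buffer technique suggested in the post above, applied to the `'%s'` format used throughout this thread. The helper name `maxSanitizedLength` and the buffer-taking `sanitize` overload are assumptions for illustration, not code from any of the posters:

```d
import std.format : sformat;

/// Hypothetical sizing helper: an upper bound on the formatted length.
/// For the "'%s'" format the bound happens to be exact (value plus two quotes);
/// a real sanitizer that escapes characters would need a more pessimistic bound.
size_t maxSanitizedLength(string value)
{
    return value.length + 2;
}

/// Formats into caller-provided storage and returns the slice actually used.
/// sformat reports an error if the buffer is too small.
char[] sanitize(string value, char[] buffer)
{
    return buffer.sformat!"'%s'"(value);
}

void main()
{
    import std.stdio : writeln;

    char[64] storage; // oversized, fixed-length, owned by the caller
    assert(maxSanitizedLength("foo") <= storage.length);

    auto s = sanitize("foo", storage[]);
    writeln(s); // prints 'foo'
}
```

This keeps allocation entirely with the caller and formats in a single pass, at the cost of having to maintain a sizing function that never underestimates.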