
digitalmars.D.learn - Output range with custom string type

reply Jacob Carlborg <doob me.com> writes:
I'm working on some code that sanitizes and converts values of different 
types to strings. I thought it would be a good idea to wrap the 
sanitized string in a struct to have some type safety. Ideally it should 
not be possible to create this type without going through the sanitizing 
functions.

The problem I have is that I would like these functions to push up the 
allocation decision to the caller. Internally these functions use 
formattedWrite. I thought the natural design would be that the sanitize 
functions take an output range and pass that to formattedWrite.

Here's a really simple example:

import std.stdio : writeln;

struct Range
{
     void put(char c)
     {
         writeln(c);
     }
}

void sanitize(OutputRange)(string value, OutputRange range)
{
     import std.format : formattedWrite;
     range.formattedWrite!"'%s'"(value);
}

void main()
{
     Range range;
     sanitize("foo", range);
}

The problem now is that the data is passed to the range one char at a 
time. This means that if the user implements a custom output range, the 
user is in full control of the data. It is then very easy for the user 
to make a mistake or to manipulate the data on purpose, which makes the 
whole idea of the sanitized type pointless.

Any suggestions how to fix this or a better idea?

-- 
/Jacob Carlborg
Aug 28 2017
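
The concern in the post above can be made concrete with a small sketch. `TamperingRange` is a hypothetical caller-supplied range, not something from the thread; it simply drops every quote character it is handed, so the output no longer matches what the sanitizer intended. The range is passed by pointer here only so that the example does not depend on whether formattedWrite copies its writer argument.

```d
import std.format : formattedWrite;

// Hypothetical caller-supplied output range (an assumption for
// illustration): it silently discards the quotes added by sanitize.
struct TamperingRange
{
    char[] data;

    void put(char c)
    {
        if (c != '\'')
            data ~= c;
    }
}

// Same shape as the sanitize function in the post, but taking the
// range by reference so the caller can inspect it afterwards.
void sanitize(OutputRange)(string value, ref OutputRange range)
{
    (&range).formattedWrite!"'%s'"(value);
}

void main()
{
    TamperingRange range;
    sanitize("foo", range);

    // The quoting never reaches the final data.
    assert(range.data == "foo");
}
```

Because the range sees every character, nothing in the type system can stop it from rewriting them, which is why the replies below keep the range internal to the sanitizing function.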
next sibling parent reply Moritz Maxeiner <moritz ucworks.org> writes:
On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
 I'm working on some code that sanitizes and converts values of 
 different types to strings. I thought it would be a good idea 
 to wrap the sanitized string in a struct to have some type 
 safety. Ideally it should not be possible to create this type 
 without going through the sanitizing functions.

 The problem I have is that I would like these functions to push 
 up the allocation decision to the caller. Internally these 
 functions use formattedWrite. I thought the natural design 
 would be that the sanitize functions take an output range and 
 pass that to formattedWrite.

 [...]

 Any suggestions how to fix this or a better idea?
If you want the caller to be just in charge of allocation, that's what 
std.experimental.allocator provides. In this case, I would polish up the 
old "format once to get the length, allocate, format second time into 
allocated buffer" method used with snprintf for D:

--- test.d ---
import std.stdio;
import std.experimental.allocator;

struct CountingOutputRange
{
private:
    size_t _count;
public:
    size_t count() { return _count; }
    void put(char c) { _count++; }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    CountingOutputRange r;
    (&r).formattedWrite!"'%s'"(value); // do not copy the range

    auto s = alloc.makeArray!char(r.count);
    scope (failure) alloc.dispose(s);

    // This should only throw if the user provided allocator returned less
    // memory than was requested
    return s.sformat!"'%s'"(value);
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
--------------
Aug 28 2017
parent reply Jacob Carlborg <doob me.com> writes:
On 2017-08-28 23:45, Moritz Maxeiner wrote:

 If you want the caller to be just in charge of allocation, that's what 
 std.experimental.allocator provides. In this case, I would polish up the 
 old "format once to get the length, allocate, format second time into 
 allocated buffer" method used with snprintf for D:

 --- test.d ---
 import std.stdio;
 import std.experimental.allocator;
 
 struct CountingOutputRange
 {
 private:
      size_t _count;
 public:
      size_t count() { return _count; }
      void put(char c) { _count++; }
 }
 
 char[] sanitize(string value, IAllocator alloc)
 {
      import std.format : formattedWrite, sformat;
 
      CountingOutputRange r;
      (&r).formattedWrite!"'%s'"(value); // do not copy the range
 
      auto s = alloc.makeArray!char(r.count);
      scope (failure) alloc.dispose(s);
 
      // This should only throw if the user provided allocator returned less
      // memory than was requested
      return s.sformat!"'%s'"(value);
 }
 
 void main()
 {
      auto s = sanitize("foo", theAllocator);
      scope (exit) theAllocator.dispose(s);
      writeln(s);
 }
 --------------
I guess that would work.

But if I keep the range internal, can't I just do the allocation inside 
the range and only use "formattedWrite", instead of using both 
formattedWrite and sformat and going through the data twice? Then of 
course the final size is not known before allocating.

-- 
/Jacob Carlborg
Aug 29 2017
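
A minimal sketch of the "keep the range internal" idea from the post above, combined with the wrapper type from the opening post. `Sanitized` and its members are hypothetical names, and the sketch uses the GC-backed `std.array.appender` as the internal range purely as an assumption to stay short; it does not hand the allocation decision to the caller.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical wrapper type (not from the thread). Note that `private`
// in D is module-level, so this only protects against code living in
// other modules.
struct Sanitized
{
    private string _value;

    @disable this();                        // no default construction
    private this(string value) { _value = value; }

    string value() const { return _value; }
}

Sanitized sanitize(string value)
{
    // The range never leaves this function, so callers cannot tamper
    // with the characters written to it.
    auto buffer = appender!string();        // GC allocation: an assumption
    (&buffer).formattedWrite!"'%s'"(value);
    return Sanitized(buffer.data);
}

void main()
{
    auto s = sanitize("foo");
    assert(s.value == "'foo'");
}
```

The data is formatted only once, at the price of not knowing the final size up front, which is the trade-off named in the post.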
parent reply Moritz Maxeiner <moritz ucworks.org> writes:
On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote:
 [...]

 But if I keep the range internal, can't I just do the 
 allocation inside the range and only use "formattedWrite"? 
 Instead of using both formattedWrite and sformat and go through 
 the data twice. Then of course the final size is not known 
 before allocating.
Certainly, that's what dynamic arrays (aka vectors, e.g. std::vector in 
C++ STL) are for:

---
import core.exception;

import std.stdio;
import std.experimental.allocator;
import std.algorithm;

struct PoorMansVector(T)
{
private:
    T[]        store;
    size_t     length;
    IAllocator alloc;
public:
    @disable this(this);
    this(IAllocator alloc)
    {
        this.alloc = alloc;
    }
    ~this()
    {
        if (store)
        {
            alloc.dispose(store);
            store = null;
        }
    }
    void put(T t)
    {
        if (!store)
        {
            // Allocate only once for "small" vectors
            store = alloc.makeArray!T(8);
            if (!store) onOutOfMemoryError();
        }
        else if (length == store.length)
        {
            // Growth factor of 1.5
            auto expanded = alloc.expandArray!char(store, store.length / 2);
            if (!expanded) onOutOfMemoryError();
        }
        assert (length < store.length);
        moveEmplace(t, store[length++]);
    }
    char[] release()
    {
        auto elements = store[0..length];
        store = null;
        return elements;
    }
}

char[] sanitize(string value, IAllocator alloc)
{
    import std.format : formattedWrite, sformat;

    auto r = PoorMansVector!char(alloc);
    (&r).formattedWrite!"'%s'"(value); // do not copy the range
    return r.release();
}

void main()
{
    auto s = sanitize("foo", theAllocator);
    scope (exit) theAllocator.dispose(s);
    writeln(s);
}
---

Do be aware that the above vector is named "poor man's vector" for a 
reason, that's a hasty write down from memory and is sure to contain 
bugs.
For better vector implementations you can look at collection libraries 
such as EMSI containers; my own attempt at a DbI vector container can be 
found here [1]

[1] https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d
Aug 29 2017
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-08-29 19:35, Moritz Maxeiner wrote:
 On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote:
 [...]

 But if I keep the range internal, can't I just do the allocation 
 inside the range and only use "formattedWrite"? Instead of using both 
 formattedWrite and sformat and go through the data twice. Then of 
 course the final size is not known before allocating.
 [...]
Thanks.

-- 
/Jacob Carlborg
Aug 29 2017
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2017-08-29 19:35, Moritz Maxeiner wrote:

      void put(T t)
      {
          if (!store)
          {
              // Allocate only once for "small" vectors
              store = alloc.makeArray!T(8);
              if (!store) onOutOfMemoryError();
          }
          else if (length == store.length)
          {
              // Growth factor of 1.5
              auto expanded = alloc.expandArray!char(store,
                  store.length / 2);
              if (!expanded) onOutOfMemoryError();
          }
          assert (length < store.length);
          moveEmplace(t, store[length++]);
      }
What's the reason to use "moveEmplace" instead of just assigning to the 
array: "store[length++] = t" ?

-- 
/Jacob Carlborg
Aug 31 2017
parent Moritz Maxeiner <moritz ucworks.org> writes:
On Thursday, 31 August 2017 at 07:06:26 UTC, Jacob Carlborg wrote:
 On 2017-08-29 19:35, Moritz Maxeiner wrote:

      void put(T t)
      {
          if (!store)
          {
              // Allocate only once for "small" vectors
              store = alloc.makeArray!T(8);
              if (!store) onOutOfMemoryError();
          }
          else if (length == store.length)
          {
              // Growth factor of 1.5
              auto expanded = alloc.expandArray!char(store, 
 store.length / 2);
              if (!expanded) onOutOfMemoryError();
          }
          assert (length < store.length);
          moveEmplace(t, store[length++]);
      }
 What's the reason to use "moveEmplace" instead of just assigning to 
 the array: "store[length++] = t" ?
The `move` part is to support non-copyable types (i.e. T with 
`@disable this(this)`), such as another owning container (assigning 
would generally try to create a copy).
The `emplace` part is because the destination `store[length]` has been 
default initialized either by makeArray or expandArray and it doesn't 
need to be destroyed (a pure move would destroy `store[length]` if T has 
a destructor).
Aug 31 2017
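
The `move` half of that explanation can be illustrated with a short sketch. `NoCopy` is a hypothetical element type introduced only for this example; it disables its postblit, so plain assignment from an lvalue is rejected by the compiler while `moveEmplace` still works.

```d
import std.algorithm.mutation : moveEmplace;
import std.traits : isCopyable;

// Hypothetical non-copyable element type (an assumption for
// illustration), standing in for something like an owning container.
struct NoCopy
{
    int payload;
    @disable this(this);
}

// Copying is disabled, so `store[0] = item` would not compile.
static assert(!isCopyable!NoCopy);

void main()
{
    NoCopy[2] store;          // default-initialized slots, as in the vector
    auto item = NoCopy(42);

    // Moves the bits of `item` into the slot without running a
    // destructor on the slot's previous (default) contents.
    moveEmplace(item, store[0]);

    assert(store[0].payload == 42);
}
```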
prev sibling parent Cecil Ward <d cecilward.com> writes:
On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
 I'm working on some code that sanitizes and converts values of 
 different types to strings. I thought it would be a good idea 
 to wrap the sanitized string in a struct to have some type 
 safety. Ideally it should not be possible to create this type 
 without going through the sanitizing functions.

 The problem I have is that I would like these functions to push 
 up the allocation decision to the caller. Internally these 
 functions use formattedWrite. I thought the natural design 
 would be that the sanitize functions take an output range and 
 pass that to formattedWrite.

 Here's a really simple example:

 import std.stdio : writeln;

 struct Range
 {
     void put(char c)
     {
         writeln(c);
     }
 }

 void sanitize(OutputRange)(string value, OutputRange range)
 {
     import std.format : formattedWrite;
     range.formattedWrite!"'%s'"(value);
 }

 void main()
 {
     Range range;
     sanitize("foo", range);
 }

 The problem now is that the data is passed one char at the time 
 to the range. Meaning that if the user implements a custom 
 output range, the user is in full control of the data. It will 
 now be very easy for the user to make a mistake or manipulate 
 the data on purpose. Making the whole idea of the sanitized 
 type pointless.

 Any suggestions how to fix this or a better idea?
Q: Is it an option to let the caller provide all the storage in an 
oversized fixed-length buffer? You could add a second helper function 
that computes and returns a suitably pessimistic (even over-the-top) 
maximum for the required length; the caller invokes it once beforehand 
to establish the required buffer size (or to check it). This is the 
technique I am using right now. My sizing function is ridiculously fast, 
as I am lucky in my particular use case.
Aug 28 2017
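
A rough sketch of the fixed-buffer technique described above, assuming the same `'%s'` format used earlier in the thread. `maxSanitizedLength` and `sanitizeInto` are hypothetical names; for this trivial format the bound happens to be exact, while a real sizing function would normally overestimate.

```d
import std.format : sformat;

// Hypothetical sizing helper: an upper bound on the formatted length.
// For "'%s'" that is the input length plus the two quote characters.
size_t maxSanitizedLength(const(char)[] value) pure nothrow @nogc @safe
{
    return value.length + 2;
}

// Formats into caller-provided storage and returns the used slice.
// sformat throws if the buffer turns out to be too small.
char[] sanitizeInto(char[] buffer, const(char)[] value)
{
    return buffer.sformat!"'%s'"(value);
}

void main()
{
    char[64] storage;         // oversized buffer owned by the caller
    auto input = "foo";

    assert(maxSanitizedLength(input) <= storage.length);
    auto result = sanitizeInto(storage[], input);
    assert(result == "'foo'");
}
```

The caller keeps full control over where the memory comes from, but the result is still a plain `char[]`, so this addresses the allocation question rather than the tamper-proof wrapper type.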