digitalmars.D - A case for opImplicitCast: making string search work better


downs <default_357-line yahoo.de> writes:

Consider this type:

struct StringPosition {
  size_t pos;
  void opImplicitCast(out size_t sz) {
    sz = pos;
  }
  void opImplicitCast(out bool b) {
    b = pos != -1;
  }
}

Wouldn't that effectively sidestep most problems people have with find
returning -1?

Or am I missing something?

Of course, this would require a way to resolve ambiguities, i.e.
functions/statements with preferences - for instance, if() would "prefer" bool
over int. I don't know if this is possible.
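
A minimal sketch of how the same idea can be approximated in D2 without opImplicitCast, assuming `alias this` for the size_t conversion and `opCast!bool` for the truth test. The name `findPos` and the `notFound` sentinel are made up for illustration; `std.string.indexOf` is the only library call used.

```d
import std.string : indexOf;

struct StringPosition {
    size_t pos;
    enum size_t notFound = size_t.max; // plays the role of -1

    // Implicit conversion to size_t (stand-in for opImplicitCast(out size_t)).
    alias pos this;

    // Consulted by if(), !, && and || (stand-in for opImplicitCast(out bool)).
    bool opCast(T : bool)() const { return pos != notFound; }
}

// Hypothetical wrapper; std.string.indexOf returns -1 when nothing is found.
StringPosition findPos(string haystack, string needle) {
    auto i = haystack.indexOf(needle);
    return StringPosition(i < 0 ? StringPosition.notFound : cast(size_t) i);
}

void main() {
    auto p = findPos("hello world", "world");
    if (p) {              // uses opCast!bool
        size_t i = p;     // uses alias this
        assert("hello world"[i .. $] == "world");
    }
    assert(!findPos("hello", "xyz"));
}
```

Here the condition context picks the bool conversion on its own, which is the kind of "preference" asked about above; this sketch does not settle ambiguity for arbitrary function overloads.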

May 15 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 15 May 2009 06:07:10 -0400, downs <default_357-line yahoo.de>  
wrote:

 Consider this type:

 struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }

 Wouldn't that effectively sidestep most problems people have with find  
 returning -1?

 Or am I missing something?

 Of course, this would require a way to resolve ambiguities, i.e.  
 functions/statements with preferences - for instance, if() would  
 "prefer" bool over int. I don't know if this is possible.

No, I want the length of the string if it is not found, not -1.

It's not a question of -1 vs. false, it's a question of usability.  -1 can  
be tested as well as string.length, but -1 cannot be seamlessly forwarded  
to slicing operations.  Most of the time, you want to USE the index  
returned, not just check if it is valid.
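
A rough sketch of that difference, assuming a hypothetical helper `indexOrLength` built on `std.string.indexOf`: a "not found" result of `haystack.length` can be passed straight into a slice, while -1 would first have to be checked.

```d
import std.string : indexOf;

// Hypothetical helper: returns haystack.length when needle is absent.
size_t indexOrLength(string haystack, string needle) {
    auto i = haystack.indexOf(needle);
    return i < 0 ? haystack.length : cast(size_t) i;
}

void main() {
    string s = "key=value";
    // Found: slice up to the match.
    assert(s[0 .. s.indexOrLength("=")] == "key");
    // Not found: the same expressions still slice without a range error.
    assert(s[0 .. s.indexOrLength("#")] == s);
    assert(s[s.indexOrLength("#") .. $] == "");
}
```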

-Steve

May 15 2009

grauzone <none example.net> writes:

downs wrote:
 Consider this type:
 
 struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }
 
 Wouldn't that effectively sidestep most problems people have with find
returning -1?
 
 Or am I missing something?

Could work, but it looks overcomplicated. It could be intuitive, but 
even then someone new would not be able to figure out what is actually 
going on, without digging deep into the internals of the library (or the 
D language).

I like my way better (returning two slices for search). Also, it 
wouldn't require this:

 Of course, this would require a way to resolve ambiguities, i.e.
functions/statements with preferences - for instance, if() would "prefer" bool
over int. I don't know if this is possible.

...and with my way, it's very simple to check if the search was successful.

e.g.

void myfind(char[] text, char[] search_for, out char[] before, out char[] after);

char[] before, after;
myfind(text, something, before, after);

//was it found?
bool was_found = !!after.length;
//where was it found?
size_t at = before.length;

Both operations are frequently needed and don't require you to reference 
text or something again, which means they can be returned by other 
functions, and you don't need to break the "flow" by putting them into 
temporary variables.

With multiple return values, the signature of myfind() could become 
nicer, too:

auto before, after = myfind(text, something);

(Or at least allow static arrays as return values for functions.)
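
For comparison, a small sketch using `findSplitBefore` from Phobos' `std.algorithm.searching`, which returns a pair of slices much like the one described here. It is assumed, as in the myfind example, that the second slice starts at the match.

```d
import std.algorithm.searching : findSplitBefore;

void main() {
    auto r = "hello world".findSplitBefore("wor");
    assert(r[0] == "hello "); // everything before the match
    assert(r[1] == "world");  // the match and everything after it

    bool wasFound = r[1].length != 0; // was it found?
    size_t at = r[0].length;          // where was it found?
    assert(wasFound && at == 6);

    // Not found: the first slice is the whole text, the second is empty.
    auto miss = "hello".findSplitBefore("xyz");
    assert(miss[0] == "hello" && miss[1].length == 0);
}
```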

Am _I_ missing something?

May 15 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 15 May 2009 09:36:51 -0400, grauzone <none example.net> wrote:

 downs wrote:
 Consider this type:
  struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }
  Wouldn't that effectively sidestep most problems people have with find  
 returning -1?
  Or am I missing something?

 Could work, but it looks overcomplicated. It could be intuitive, but  
 even then someone new would not be able to figure out what is actually  
 going on, without digging deep into the internals of the library (or the  
 D language).

 I like my way better (returning two slices for search). Also, it  
 wouldn't require this:

 Of course, this would require a way to resolve ambiguities, i.e.  
 functions/statements with preferences - for instance, if() would  
 "prefer" bool over int. I don't know if this is possible.

 ...and with my way, it's very simple to check if the search was  
 successful.

 e.g.

 void myfind(char[] text, char[] search_for, out char[] before, out char[] after);

 char[] before, after;
 myfind(text, something, before, after);

 //was it found?
 bool was_found = !!after.length;
 //where was it found?
 size_t at = before.length;

 Both operations are frequently needed and don't require you to reference  
 text or something again, which means they can be returned by other  
 functions, and you don't need to break the "flow" by putting them into  
 temporary variables.

 With multiple return values, the signature of myfind() could become  
 nicer, too:

 auto before, after = myfind(text, something);

 (Or at least allow static arrays as return values for functions.)

 Am _I_ missing something?

Your solution actually goes in the opposite direction from what I'd like.
That is, it looks more complicated than simply returning an index or a
slice.  I don't want to have to declare return values ahead of time, and
I'm not holding my breath for multiple return values.  You may be able to
return a pair struct, but still, what could be simpler than returning an
index?  It's easy to construct the value you want (before or after), and
if you need both values, that is also possible (and probably results in
simpler code).

-Steve

May 15 2009

grauzone <none example.net> writes:

 to return a pair struct, but still, what could be simpler than returning 
 an index?  It's easy to construct the value you want (before or after), 
 and if you need both values, that is also possible (and probably 
 results in simpler code).

All you can do with the index is:
1. compare it against the length of the searched string to test if the 
search was successful
2. slice the searched string
3. do something rather special

What else would you do? For 1. and 2. you need the searched string 
again, either to compare the index against its length or to slice it, 
so you always have to keep the searched string in a temporary. That's 
rather impractical. Oh sure, if you _really_ need the index (for 3.), 
then directly returning an index is of course the best way.

With my approach, you don't need to grab the passed searched string 
again. All of these can be done in a single, trivial expression (for 3. 
getting the index only). Actually, compared to your approach, this would 
just eliminate the trivial but annoying slicing code after the search 
call, that you'd type in... what, 90% of all cases?

The thing about multiple return values is true (sadly), but in this 
case, you could simply return a static array (char[][2]). At least that 
should be possible in D2 at some point.

Maybe a struct would work fine too. But I don't like it, because the 
programmer would have to look up the struct members first. He would have 
to memorize the struct members, and couldn't tell what the function 
returns just by looking at the function signature.

(Yay bikeshed issues.)

May 15 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 15 May 2009 10:30:17 -0400, grauzone <none example.net> wrote:

 to return a pair struct, but still, what could be simpler than  
 returning an index?  It's easy to construct the value you want (before  
 or after), and if you need both values, that is also possible (and  
 probably results in simpler code).

 All you can do with the index is:
 1. compare it against the length of the searched string to test if the  
 search was successful
 2. slice the searched string
 3. do something rather special

 What else would you do? For 1. and 2. you need the searched string  
 again, either to compare the index against its length or to slice it,  
 so you always have to keep the searched string in a temporary. That's  
 rather impractical. Oh sure, if you _really_ need the index (for 3.),  
 then directly returning an index is of course the best way.

 With my approach, you don't need to grab the passed searched string  
 again. All of these can be done in a single, trivial expression (for 3.  
 getting the index only). Actually, compared to your approach, this would  
 just eliminate the trivial but annoying slicing code after the search  
 call, that you'd type in... what, 90% of all cases?

I hadn't thought of the case where you are calling *on* a temporary; I
always had in mind that the source string was already declared.  This is
a good point.  The only drawback in this case is that you are
constructing information you sometimes do not need or care about.  If all
you want is whether it succeeded or not, then you don't need two ranges
constructed and returned.  But therein lies a fundamental tradeoff that
cannot be avoided.  The most basic information you get is the index, and
with that you can construct any larger pieces you need, but not always
easily, and not without repeating identifiers.

I like your approach, but with the single return type, not out  
parameters.  Having out parameters would be a deal breaker.

I'd prefer not to have two strings but a string that has an identified  
pivot point.  You could generate the desired left and right hand sides  
dynamically, and it would work without any changes to the current syntax.

for example:

struct partition(R)
{
    R range;
    size_t pivot; // index of the match, or range.length if not found

    R lhs() {return range[0..pivot];}
    R rhs() {return range[pivot..$];}
    bool found() {return pivot < range.length;}
}

partition!string find(string haystack, string needle);

usage:

string s = str.find("hi").rhs; // or .lhs or .found or .pivot
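
A self-contained sketch of the pivot idea, with a body filled in for the search function. The names `Partition` and `findPivot` are chosen here for illustration (the latter to avoid clashing with `std.algorithm.find`), and `std.string.indexOf` does the actual search; treat it as one possible reading, not a definitive implementation.

```d
import std.string : indexOf;

struct Partition(R) {
    R range;
    size_t pivot; // == range.length when nothing was found

    R lhs() { return range[0 .. pivot]; }
    R rhs() { return range[pivot .. $]; }
    bool found() { return pivot < range.length; }
}

// Hypothetical search function returning the string plus its pivot point.
Partition!string findPivot(string haystack, string needle) {
    auto i = haystack.indexOf(needle);
    return Partition!string(haystack,
                            i < 0 ? haystack.length : cast(size_t) i);
}

void main() {
    string str = "say hi there";
    assert(str.findPivot("hi").rhs == "hi there");
    assert(str.findPivot("hi").lhs == "say ");
    assert(str.findPivot("hi").found);
    // Not found: lhs is the whole string, rhs is empty.
    assert(!str.findPivot("bye").found);
    assert(str.findPivot("bye").lhs == str);
    assert(str.findPivot("bye").rhs == "");
}
```

Only the pivot is computed up front; the two slices are built on demand, so a caller that only wants `.found` does not pay for them.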

 Maybe a struct would work fine too. But I don't like it, because the  
 programmer would have to look up the struct members first. He would have  
 to memorize the struct members, and couldn't tell what the function  
 returns just by looking at the function signature.

If this were implemented, the return type would be very common.  At some  
point you have to look up everything (what's a "range"?).

-Steve

May 15 2009

grauzone <none example.net> writes:

 a good point.  The only drawback in this case is you are constructing 
 information you sometimes do not need or care about.  If all you want is 
 whether it succeeded or not, then you don't need two ranges constructed 
 and returned.  But therein lies a fundamental tradeoff that cannot be 
 avoided.  The very basic information you get is the index, and with 
 that, you can construct any larger pieces from the pieces you have, but 
 not always easily, and not without repeating identifiers.

The whole point of the search function is to make programming easier, 
isn't it? Its implementation is rather trivial. You call it because it 
makes your life easier. I don't see why constructing this "additional 
information" is a problem.

Anyway, you always could move this to a second function. I just think 
that returning a tuple of slices is the most useful way.

 I like your approach, but with the single return type, not out 
 parameters.  Having out parameters would be a deal breaker.

I just wanted to show something that works on D1 without memory 
allocation, and without returning a struct.

 If this were implemented, the return type would be very common.  At some 
 point you have to look up everything (what's a "range"?).

I think multiple return values are simpler, and more versatile, elegant 
and intuitive. In contrast, having to define structs for return values of 
(almost) trivial functions is not a good sign. You could as well pass 
all in-parameters of a function as a struct, claiming this is more 
practical, because then you can have named arguments and arbitrary 
default arguments. Huh.

May 15 2009

Christopher Wright <dhasenan gmail.com> writes:

downs wrote:
 Consider this type:
 
 struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }
 
 Wouldn't that effectively sidestep most problems people have with find
returning -1?
 
 Or am I missing something?
 
 Of course, this would require a way to resolve ambiguities, i.e.
functions/statements with preferences - for instance, if() would "prefer" bool
over int. I don't know if this is possible.

Just use two functions: find and contains.
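
A short sketch of that split using functions that exist in Phobos today: `canFind` from `std.algorithm.searching` for the yes/no question and `indexOf` from `std.string` for the position. These stand in for the "contains" and "find" named above; the thread itself does not specify them.

```d
import std.algorithm.searching : canFind;
import std.string : indexOf;

void main() {
    string s = "hello world";

    // "contains": only answers whether the needle is present.
    assert(s.canFind("world"));
    assert(!s.canFind("xyz"));

    // "find": answers where it is (-1 if absent).
    assert(s.indexOf("world") == 6);
    assert(s.indexOf("xyz") == -1);
}
```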

May 15 2009

bearophile <bearophileHUGS lycos.com> writes:

Christopher Wright:
 Just use two functions: find and contains.

Or better, define a built-in operator, you may call it "in" :-)

'e' in "hello" => true
(The compiler may even cache the resulting position somewhere, so a
subsequent find can be very fast).
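
D's built-in `in` only covers associative arrays, so as a rough user-level sketch the operator can be overloaded on a wrapper type via `opBinaryRight`. The wrapper name `Text` is invented for this example, and no caching of the position is attempted.

```d
import std.algorithm.searching : canFind;

// Hypothetical wrapper so that `c in Text(...)` can be overloaded.
struct Text {
    string data;

    // Called for `c in text`, with the Text value on the right-hand side.
    bool opBinaryRight(string op)(dchar c) const if (op == "in") {
        return data.canFind(c);
    }
}

void main() {
    assert('e' in Text("hello"));
    assert(!('z' in Text("hello")));
}
```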

Bye,
bearophile

May 15 2009
