digitalmars.D.learn - Trouble with regex backreferencing

I was playing around with regex, trying to match text that repeats 
before and after a space, and I came across some unexpected 
behavior.

	import std.regex, std.stdio; //needed for regex, replaceAll and writeln

	writeln("ABC ABC CBA".replaceAll(regex(r"([A-Z]) ([A-Z])"), "D"));
	//ABDBDBA
	//Makes sense, replaces the 3 characters surrounding a space with a single D
	writeln("ABC ABC CBA".replaceAll(regex(r"([A-Z]) \1"), "D"));
	//ABC ABDBA
	//Same idea, but this time only if the 2 surrounding letters are the same
	writeln("ABC ABC CBA".replaceAll(regex(r"([A-Z]+) \1"), "D"));
	//D CBA
	//Same idea again, but this time match any number of characters as long as they are in the same order
	writeln("ABCABC ABC CBA".replaceAll(regex(r"([A-Z]+) \1"), "D"));
	//ABCABC ABC CBA
	//Hold on, shouldn't this be "ABCD CBA"?
	writeln("ABC ABCABC CBA".replaceAll(regex(r"([A-Z]+) \1"), "D"));
	//DABC CBA
	//Works the other way

The problem, as far as I can tell, is that the regex should capture 
the largest substring that works for both the group and its 
backreference. Instead, the group seems to grab as much as it can 
on its own, with no regard for the later \1, so the match only 
succeeds when the entire first word appears at the start of the 
second word. It should work both ways.
Is there any gross hack I can use to get around this? And if this 
is for some reason intended behavior, why?
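
For reference, the matching described above can be sketched as a plain scan that does not use std.regex at all. This is only an illustration of the expected semantics under some assumptions: replaceRepeats is a hypothetical helper (not a Phobos function), it handles ASCII uppercase letters only, and it says nothing about how std.regex works internally. Because the captured word has to be followed directly by a space, it must run to the end of the uppercase run, so the only thing to retry is the starting position.

```d
import std.ascii : isUpper;
import std.stdio : writeln;

// Hypothetical helper, not part of std.regex: replaces every "X X" pair,
// where X is a run of uppercase ASCII letters, with `replacement`.
string replaceRepeats(string s, string replacement)
{
    string result;
    size_t i = 0;
    while (i < s.length)
    {
        // Extend the candidate word to the end of the uppercase run.
        size_t len = 0;
        while (i + len < s.length && isUpper(s[i + len]))
            ++len;

        immutable sp = i + len;        // index where the space must be
        immutable end = sp + 1 + len;  // one past the repeated word

        if (len > 0 && end <= s.length && s[sp] == ' '
            && s[sp + 1 .. end] == s[i .. sp])
        {
            result ~= replacement;
            i = end;                   // resume after the whole match
        }
        else
        {
            result ~= s[i];            // no match here: retry one character later
            ++i;
        }
    }
    return result;
}

void main()
{
    writeln(replaceRepeats("ABC ABC CBA", "D"));    // D CBA
    writeln(replaceRepeats("ABCABC ABC CBA", "D")); // ABCD CBA
}
```

Note that retrying from every position also picks up the "C C" left over in the last example, so this sketch turns "ABC ABCABC CBA" into "DABDBA" instead of "DABC CBA".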
Jun 12 2017