digitalmars.D - UTF8 + SIMD = win
- deadalnix (3/3) Jul 30 2012 http://woboq.com/blog/utf-8-processing-using-simd.html
- bearophile (4/5) Jul 30 2012 So many things to do, so little time to do them :-)
- Guillaume Chatelet (2/6) Jul 30 2012 Very interesting, thx for sharing. This NG definitely is a horn of plent...
- Walter Bright (4/7) Jul 31 2012 If someone wants to fix std.utf
- bearophile (4/5) Jul 31 2012 I think in D the most needed UTF operation is UTF8 -> UTF32.
- Bernard Helyer (2/7) Jul 31 2012 Where is UTF-32 actually used?
- bearophile (5/6) Jul 31 2012 I think all std.algorithm and std.range yield UTF-32 dchars, when
- Jakob Ovrum (3/9) Jul 31 2012 In addition, foreach over a string with a dchar loop variable
- Walter Bright (3/15) Jul 31 2012 SIMD isn't going to speed things up at all for decoding one character. I...
- Jakob Ovrum (2/21) Jul 31 2012 Duh, good point, I totally forgot the context.
- Tobias Pankrath (2/21) Jul 31 2012 You could decode them in advance.
- jerro (3/25) Jul 31 2012 The problem is you don't know how much you are going to need.
- bearophile (10/12) Jul 31 2012 Right.
http://woboq.com/blog/utf-8-processing-using-simd.html

All in the article. As D includes Unicode as a language feature, I think it is interesting to mention here.
Jul 30 2012
deadalnix:
> http://woboq.com/blog/utf-8-processing-using-simd.html

So many things to do, so little time to do them :-)

Bye,
bearophile
Jul 30 2012
On 07/30/12 21:13, deadalnix wrote:
> http://woboq.com/blog/utf-8-processing-using-simd.html
>
> All in the article. As D includes Unicode as a language feature, I think it is interesting to mention here.

Very interesting, thx for sharing. This NG definitely is a horn of plenty :)
Jul 30 2012
On 7/30/2012 12:13 PM, deadalnix wrote:
> http://woboq.com/blog/utf-8-processing-using-simd.html
>
> All in the article. As D includes Unicode as a language feature, I think it is interesting to mention here.

If someone wants to fix std.utf

http://dlang.org/phobos/std_utf.html

to use SIMD instructions, that would be cool!
Jul 31 2012
Walter Bright:
> to use SIMD instructions, that would be cool!

I think in D the most needed UTF operation is UTF8 -> UTF32.

Bye,
bearophile
Jul 31 2012
On Tuesday, 31 July 2012 at 10:57:23 UTC, bearophile wrote:
> Walter Bright:
>> to use SIMD instructions, that would be cool!
>
> I think in D the most needed UTF operation is UTF8 -> UTF32.
>
> Bye,
> bearophile

Where is UTF-32 actually used?
Jul 31 2012
Bernard Helyer:
> Where is UTF-32 actually used?

I think all std.algorithm and std.range functions yield UTF-32 dchars when you give them a string as input.

Bye,
bearophile
Jul 31 2012
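bearophile's remark about ranges can be illustrated with a minimal sketch. It assumes a Phobos version in which narrow strings are auto-decoded by the range primitives; `codePointCount` is a hypothetical helper name, not part of Phobos.

```d
import std.range : ElementType, walkLength;

// A string is an array of UTF-8 code units, but viewed as a range
// its element type is dchar (a UTF-32 code point).
static assert(is(ElementType!string == dchar));

/// Hypothetical helper: counts code points by walking the string as a range.
size_t codePointCount(string s)
{
    // Narrow strings do not count as having a range length, so this decodes.
    return s.walkLength;
}

void main()
{
    string s = "naïve";
    assert(s.length == 6);          // 'ï' occupies two UTF-8 code units
    assert(codePointCount(s) == 5); // the range view yields five dchars
}
```

The difference between `s.length` and the walked length is the decoding work that every range-based algorithm pays for when given a string.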
On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
> Bernard Helyer:
>> Where is UTF-32 actually used?
>
> I think all std.algorithm and std.range functions yield UTF-32 dchars when you give them a string as input.
>
> Bye,
> bearophile

In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
Jul 31 2012
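The implicit decoding Jakob Ovrum mentions can be seen by iterating the same string with two loop variable types. This is a small sketch; the function names `countCodeUnits` and `countCodePoints` are made up for the illustration.

```d
import std.stdio : writeln;

/// Iterates raw UTF-8 code units: no decoding takes place.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

/// Iterates code points: the dchar loop variable makes foreach decode UTF-8.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "héllo"; // 'é' occupies two UTF-8 code units
    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);
    writeln(countCodeUnits(s), " code units, ", countCodePoints(s), " code points");
}
```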
On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>> Bernard Helyer:
>>> Where is UTF-32 actually used?
>>
>> I think all std.algorithm and std.range functions yield UTF-32 dchars when you give them a string as input.
>>
>> Bye,
>> bearophile
>
> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.

SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
Jul 31 2012
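Walter Bright's distinction between decoding one character and transcoding a large array corresponds to two different call shapes in std.utf. The sketch below only contrasts those shapes; it does not claim that `toUTF32` uses SIMD, only that a whole-array call is the kind of entry point where a SIMD implementation could apply. `transcodeOneByOne` and `transcodeBulk` are hypothetical names.

```d
import std.utf : decode, toUTF32;

/// One code point per call: the shape of work done by foreach and by ranges.
dstring transcodeOneByOne(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past the sequence it read
    return result.idup;
}

/// Whole array in a single call: the implementation sees all the input at once.
dstring transcodeBulk(string s)
{
    return toUTF32(s);
}

void main()
{
    string s = "UTF8 + SIMD = win, naïve café";
    // Both paths must produce the same code points.
    assert(transcodeOneByOne(s) == transcodeBulk(s));
    // Two characters are two-byte sequences, so there are fewer code points than code units.
    assert(transcodeBulk(s).length == s.length - 2);
}
```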
On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
> On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
>> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>>> Bernard Helyer:
>>>> Where is UTF-32 actually used?
>>>
>>> I think all std.algorithm and std.range functions yield UTF-32 dchars when you give them a string as input.
>>>
>>> Bye,
>>> bearophile
>>
>> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
>
> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.

Duh, good point, I totally forgot the context.
Jul 31 2012
On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
> On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
>> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>>> Bernard Helyer:
>>>> Where is UTF-32 actually used?
>>>
>>> I think all std.algorithm and std.range functions yield UTF-32 dchars when you give them a string as input.
>>>
>>> Bye,
>>> bearophile
>>
>> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
>
> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.

You could decode them in advance.
Jul 31 2012
On Tuesday, 31 July 2012 at 19:41:02 UTC, Tobias Pankrath wrote:
> On Tuesday, 31 July 2012 at 19:28:03 UTC, Walter Bright wrote:
>> On 7/31/2012 5:24 AM, Jakob Ovrum wrote:
>>> On Tuesday, 31 July 2012 at 12:11:25 UTC, bearophile wrote:
>>>> Bernard Helyer:
>>>>> Where is UTF-32 actually used?
>>>>
>>>> I think all std.algorithm and std.range functions yield UTF-32 dchars when you give them a string as input.
>>>>
>>>> Bye,
>>>> bearophile
>>>
>>> In addition, foreach over a string with a dchar loop variable does implicit UTF-8 decoding.
>>
>> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.
>
> You could decode them in advance.

The problem is you don't know how much you are going to need. This would actually hurt performance in some cases.
Jul 31 2012
Walter Bright:
> SIMD isn't going to speed things up at all for decoding one character. It is for transcoding a large array.

Right. Maybe you remember my two or three posts about vectorized laziness and related matters (which was later partly implemented in the half-eager map of std.parallelism). Introducing some vectorized laziness in std.algorithm when the iterable is a UTF-8 (or, rarely, UTF-16) string allows the use of SIMD and probably leads to higher performance.

Bye,
bearophile
Jul 31 2012
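One possible reading of "vectorized laziness" is a range that decodes a bounded block of input ahead of the consumer and then hands out dchars from that block. The sketch below is only an illustration of that idea, not bearophile's proposal and not anything in Phobos: `ChunkedDecoder` and its `chunkSize` parameter are hypothetical, and the scalar `decode` loop in `refill` merely marks the place where a SIMD kernel could process the block.

```d
import std.algorithm : equal;
import std.utf : decode;

/// Hypothetical input range: decodes UTF-8 in blocks, yields dchars lazily.
struct ChunkedDecoder
{
    private string src;   // input that has not been decoded yet
    private dchar[] buf;  // code points decoded from the current block
    private size_t pos;   // read position inside buf
    private size_t chunk; // minimum number of code units consumed per refill

    this(string s, size_t chunkSize = 16)
    {
        assert(chunkSize > 0);
        src = s;
        chunk = chunkSize;
        refill();
    }

    // Decodes the next block. A SIMD implementation would replace this loop.
    private void refill()
    {
        buf = null;
        pos = 0;
        size_t i = 0;
        while (i < src.length && i < chunk)
            buf ~= decode(src, i); // never splits a multi-byte sequence
        src = src[i .. $];
    }

    @property bool empty() const { return pos == buf.length; }
    @property dchar front() const { return buf[pos]; }

    void popFront()
    {
        ++pos;
        if (pos == buf.length)
            refill(); // leaves buf empty once the input is exhausted
    }
}

void main()
{
    string s = "naïve café, déjà vu";
    // The chunked range yields the same code points as plain string iteration.
    assert(equal(ChunkedDecoder(s, 4), s));
}
```

The block size bounds the cost jerro points out: a consumer that stops early wastes at most one block of decoding, while a consumer that reads everything gets the input processed in array-sized pieces.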