Skip to content

Improve performance of string interpolation#1626

Closed
south37 wants to merge 5 commits into
ruby:trunkfrom
south37:minami/string_concat_literals
Closed

Improve performance of string interpolation#1626
south37 wants to merge 5 commits into
ruby:trunkfrom
south37:minami/string_concat_literals

Conversation

@south37

@south37 south37 commented May 21, 2017

Copy link
Copy Markdown

This patch will add pre-allocation in string interpolation.
By this, unnecessary capacity resizing is avoided.

For small strings, optimized rb_str_resurrect operation is faster, so pre-allocation is done only when concatenated strings are large.
MIN_PRE_ALLOC_SIZE was decided by experimenting with my local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version 8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

  • Before
Calculating -------------------------------------
Large string interpolation
                          1.276M (± 5.9%) i/s -      6.358M in   5.002022s
Small string interpolation
                          5.156M (± 5.5%) i/s -     25.728M in   5.005731s
  • After
Calculating -------------------------------------
Large string interpolation
                          2.201M (± 5.8%) i/s -     11.063M in   5.043724s   <- 72% faster!!
Small string interpolation
                          5.192M (± 5.7%) i/s -     25.971M in   5.020516s   <- no degradation
  • Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Large string interpolation" do |t|
    a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
    b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

    t.times do
      "#{a}, #{b}!"
    end
  end

  x.report "Small string interpolation" do |t|
    a = "Hello"
    b = "World"

    t.times do
      "#{a}, #{b}!"
    end
  end
end

Issue
https://bugs.ruby-lang.org/issues/13587

This patch will add pre-allocation in string interpolation.
By this, unecessary capacity resizing is avoided.

For small strings, optimized `rb_str_resurrect` operation is faster, so pre-allocation is done only when concatenated strings are large.
MIN_PRE_ALLOC_SIZE was decided by experimenting with local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version 8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

* Before
Calculating -------------------------------------
Large string interpolation
                          1.276M (± 5.9%) i/s -      6.358M in   5.002022s
Small string interpolation
                          5.156M (± 5.5%) i/s -     25.728M in   5.005731s

* After
Calculating -------------------------------------
Large string interpolation
                          2.201M (± 5.8%) i/s -     11.063M in   5.043724s
Small string interpolation
                          5.192M (± 5.7%) i/s -     25.971M in   5.020516s

* Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Large string interpolation" do |t|
    a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
    b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

    t.times do
      "#{a}, #{b}!"
    end
  end

  x.report "Small string interpolation" do |t|
    a = "Hello"
    b = "World"

    t.times do
      "#{a}, #{b}!"
    end
  end
end

@nobu nobu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix compile error

Comment thread string.c Outdated
if (UNLIKELY(!num)) return rb_str_new(0, 0);
if (UNLIKELY(num == 1)) return rb_str_resurrect(strary[0]);

long len = 1;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://travis-ci.org/ruby/ruby/builds/234558585#L1498

string.c:2887:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. 😄
I fixed it ! 👍

Comment thread string.c Outdated
if (LIKELY(len < MIN_PRE_ALLOC_SIZE)) {
str = rb_str_resurrect(strary[0]);
s = 1;
} else {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust indentation and style to the rest.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. 😄
I fixed it ! 👍

This patch fixes wrong coding style.
@south37

south37 commented Jul 26, 2017

Copy link
Copy Markdown
Author

@nobu I fixed coding style! 😄

Comment thread string.c
long len = 1;

if (UNLIKELY(!num)) return rb_str_new(0, 0);
if (UNLIKELY(num == 1)) return rb_str_resurrect(strary[0]);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would num ever be 0 or 1?

@south37 south37 Jul 27, 2017

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment! 😄

If rb_str_concat_literals is called from concatstrings(YARV insn) and it is executed as YARV insns generated by Ruby compiler, num seems not to be 0 nor 1.
But, if rb_str_concat_literals is called directly in ruby internal, num may be 0 or 1.
I considered that situation.

But, rb_str_concat_literals is used only in concatstrings now, so it may be better to reject them 💡

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation. I've just found that :"#{}" actually produces a concatstrings call with num==1 (possibly there are more cases that I didn't find). I think the concatstrings call can be eliminated in that case, but it would be out of scope of this PR.

Comment thread string.c
s = 1;
}
else {
str = rb_str_buf_new(len);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding is not preserved. s="."*50; p "#{s}x".encoding would result in ASCII-8BIT.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't notice it...
Thanks for pointing it out!

I 'll fix it and add the test case! 💡

@south37 south37 Jul 27, 2017

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhenium

I fixed it by 1f6ad9b ! 💡

I referenced the similar operation in array.c. https://github.com/ruby/ruby/blob/trunk/array.c#L1974

`s="."*50; p "#{s}x".encoding` should be UTF-8, but resulted in
ASCII-8bit.
This patch fixes it.
Comment thread test/ruby/test_string.rb Outdated

def test_concat_literals
s="."*50; "#{s}x".encoding
assert_equal("#{s}x".encoding, Encoding::UTF_8)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument order is "expected" and "actual"

@south37 south37 Jul 27, 2017

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry.... 🙈

I fixed it! 💡 ce64637

@south37

south37 commented Sep 9, 2017

Copy link
Copy Markdown
Author

@rhenium ping

Comment thread test/ruby/test_string.rb Outdated
end

def test_concat_literals
s="."*50; "#{s}x".encoding

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test repeats "#{s}x".encoding, which I believe to be unnecessary.
I would also add spaces for readability, so s="."*50; "#{s}x".encoding would become s = "." * 50.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment! 😄
I fixed it! abc4f2b

matzbot pushed a commit that referenced this pull request Sep 17, 2017
* compile.c (iseq_peephole_optimize): optimize away unnecessary
  concatenation of single string, following tostring which always
  puts a String instance.
  #1626 (comment)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59945 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
@matzbot matzbot closed this in 80c5030 Oct 21, 2017
mrkn pushed a commit to mrkn/ruby that referenced this pull request Dec 1, 2017
This patch will add pre-allocation in string interpolation.
By this, unecessary capacity resizing is avoided.

For small strings, optimized `rb_str_resurrect` operation is
faster, so pre-allocation is done only when concatenated strings
are large.  `MIN_PRE_ALLOC_SIZE` was decided by experimenting with
local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version
8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

* Before
  ```
  Calculating -------------------------------------
  Large string interpolation
                            1.276M (± 5.9%) i/s -      6.358M in   5.002022s
  Small string interpolation
                            5.156M (± 5.5%) i/s -     25.728M in   5.005731s
  ```

* After
  ```
  Calculating -------------------------------------
  Large string interpolation
                            2.201M (± 5.8%) i/s -     11.063M in   5.043724s
  Small string interpolation
                            5.192M (± 5.7%) i/s -     25.971M in   5.020516s
  ```

* Test code
  ```ruby
  require 'benchmark/ips'

  Benchmark.ips do |x|
    x.report "Large string interpolation" do |t|
      a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
      b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

      t.times do
        "#{a}, #{b}!"
      end
    end

    x.report "Small string interpolation" do |t|
      a = "Hello"
      b = "World"

      t.times do
        "#{a}, #{b}!"
      end
    end
  end
  ```

[Fix rubyGH-1626]
From: Nao Minami <south37777@gmail.com>

git-svn-id: svn+ssh://svn.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants