Wednesday, October 20, 2004

The StringBuffer Myth

Charles Miller writes about his assessment of StringBuffer usage in Java:

One of my pet Java peeves is that some people religiously avoid the String concatenation operators, + and +=, because they are less efficient than the alternatives.

The theory goes like this. Strings are immutable. Thus, when you are concatenating "n" strings together, there must be "n - 1" intermediate String objects created in the process (including the final, complete String). Thus, to avoid dumping a bunch of unwanted String objects onto the garbage-collector, you should use the StringBuffer object instead.

So, by this theory, String a = b + c + d; is bad code, while String a = new StringBuffer(b).append(c).append(d).toString() is good code, despite the fact that the former is about a thousand times more readable than the latter.

For as long as I have been using Java, this has not been true. If you look at StringBuffer handling, you'll see the bytecodes that a Java compiler actually produces in those two cases. In most simple string-concatenation cases, the compiler will automatically convert a series of operations on Strings into a series of StringBuffer operations, and then pop the result back into a String.

The only time you need to switch to an explicit StringBuffer is in more complex cases, for example if the concatenation is occurring within a loop (see StringBuffer handling in loops).


Charles compares these two approaches:

return a + b + c;

VS.

StringBuffer s = new StringBuffer(a);
s.append(b);
s.append(c);
return s.toString();


As Charles correctly points out, the Java compiler internally replaces the string concatenation operators with StringBuffer operations, and the result is converted back to a String at the end. That looks equivalent to using StringBuffer directly.

But the Java bytecode that Charles analyzed in detail tells only half of the story. What he did not look at closely is what happens inside the call to the StringBuffer constructor that the compiler inserted. And that is where the real performance problem lies:

public StringBuffer(String str) {
    this(str.length() + 16);
    append(str);
}


The constructor only allocates a buffer large enough to hold the original String plus 16 characters. No more than that. In addition, StringBuffer.append() expands the StringBuffer's capacity only at the moment the next appended String no longer fits:

public synchronized StringBuffer append(String str) {
    if (str == null) {
        str = String.valueOf(str);
    }

    int len = str.length();
    int newcount = count + len;
    if (newcount > value.length)
        expandCapacity(newcount);

    str.getChars(0, len, value, count);
    count = newcount;
    return this;
}


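That means a re-allocation, plus a copy of the characters already in the buffer, every time a call to StringBuffer.append() exceeds the current capacity. With only 16 characters of headroom, that happens quickly.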
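To make the effect visible, here is a minimal sketch that watches StringBuffer.capacity() while appending. The class name CapacityDemo and the helper countReallocations are made up for this illustration, and the sample strings are arbitrary. It assumes that a change in capacity() corresponds to the internal character array being replaced.

```java
public class CapacityDemo {

    // Hypothetical helper: appends each part and counts the appends
    // after which capacity() changed, i.e. the buffer had to grow.
    static int countReallocations(StringBuffer buf, String... parts) {
        int reallocations = 0;
        int capacity = buf.capacity();
        for (String part : parts) {
            buf.append(part);
            if (buf.capacity() != capacity) {
                reallocations++;
                capacity = buf.capacity();
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        String a = "abc";
        String b = "12345678901234567890"; // 20 characters, more than the 16 spare
        String c = "xyz";

        // Default sizing: capacity is a.length() + 16.
        StringBuffer defaultSized = new StringBuffer(a);
        System.out.println("default capacity:  " + defaultSized.capacity());
        System.out.println("reallocations:     " + countReallocations(defaultSized, b, c));

        // Presized: capacity covers all three strings up front.
        StringBuffer presized = new StringBuffer(a.length() + b.length() + c.length());
        System.out.println("presized capacity: " + presized.capacity());
        System.out.println("reallocations:     " + countReallocations(presized, a, b, c));
    }
}
```

The growth policy inside expandCapacity() is not shown here and may differ between JDK versions, so the sketch reports what it observes instead of assuming particular numbers.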

You - the programmer - know better than that. You might know exactly how big the buffer is going to be in its final state - or if you don't know the exact number, you may at least apply a decent approximation. You can then construct your StringBuffer like this:

return new StringBuffer(a.length() + b.length() + c.length())
    .append(a).append(b).append(c).toString();


No repeated reallocation is necessary, which means better performance and less work for the garbage collector. And that is where the real benefit of using StringBuffer instead of the String concatenation operators lies.
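The same idea can be generalized to any number of parts. The following is only a sketch: the class PresizedConcat and its concat method are hypothetical names, and it assumes none of the parts is null.

```java
public class PresizedConcat {

    // Hypothetical helper: sums the lengths first, then appends into a
    // StringBuffer that was allocated with exactly that capacity.
    // Assumes every element of parts is non-null.
    static String concat(String... parts) {
        int total = 0;
        for (String part : parts) {
            total += part.length();
        }
        StringBuffer buf = new StringBuffer(total);
        for (String part : parts) {
            buf.append(part);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello", ", ", "world"));
    }
}
```

The extra pass over the parts costs only a few length() calls, which is cheap compared to allocating and copying character arrays.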