Generating garbage…

At JCrete the last couple of years I’ve had the pleasure to be socializing with, among others, HFT people like Peter Lawrey and Martin Thompson.

Those HFT dudes really makes me thinking when I’m implementing stuff. Thinking about how much garbage I create.

In High Frequency Trading systems you cannot afford garbage collection and therefore they are doing lots of tricks to get around generating garbage.

With the stuff I’m doing a gc pause is generally not critical, so generating garbage is no big deal. Writing easy to read- and maintain code is more important.

But then again, it does not harm to think about garbage.

So, here I was, implementing the good-old escapeXml function to be used in some internal HTML generating library.

I more or less copied the first attempt from the ‘net:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
/**
 * XML escape input string
 * 
 * @param input
 * @return
 */
public static String escapeXml_1(String input) {
	if (input == null || input.isEmpty()) {
		return input;
	}
	StringBuilder sb = null;
	final char[] charr = input.toCharArray();
	for (int i = 0; i < charr.length; i++) {
		final String repl;
		final char ch = charr[i];
		switch (ch) {
			case '&': repl = "&amp;"; break;
			case '<': repl = "&lt;"; break;
			case '>': repl = "&gt;"; break;
			case '"': repl = "&#034;"; break;
			case '\'': repl = "&#039;"; break;
			default: repl = null; break;
		}
		if (repl != null) {
			if (sb == null) {
				sb = new StringBuilder(xmllen(charr));
				if (i > 0) {
					sb.append(charr, 0, i);
				}
			}
			sb.append(repl);
		} else if (sb != null) {
			sb.append(ch);
		}
	}
	if (sb != null) {
		return sb.toString();
	}
	return input;
}
 
private static int xmllen(char[] charr) {
	int len = charr.length;
	for (int i = 0; i < charr.length; i++) {
		final char ch = charr[i];
		switch (ch) {
			case '&': len += 4; break;
			case '<': len += 3; break;
			case '>': len += 3; break;
			case '"': len += 5; break;
			case '\'': len += 5; break;
		}
	}
	return len;
}

Does it generate garbage? Yes, it does, in lines 12, 26 and 37. In line 12 a character array is generated from the input string. In line 26 a StringBuilder and again in line 37 the StringBuilder generates a String object, which again creates a new character-array. Lots of garbage actually.

So, can we reduce some of this garbage, still have maintainable code and not compromise performance too much?

Well, we can get rid of the garbage in line 12 by not doing a toCharArray and just charAt into the string.

Here it goes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
/**
 * XML escape input string
 * 
 * @param input
 * @return
 */
public static String escapeXml_2(String input) {
	if (input == null || input.isEmpty()) {
		return input;
	}
	StringBuilder sb = null;
	for (int i = 0; i < input.length(); i++) {
		final String repl;
		final char ch = input.charAt(i);
		switch (ch) {
			case '&': repl = "&amp;"; break;
			case '<': repl = "&lt;"; break;
			case '>': repl = "&gt;"; break;
			case '"': repl = "&#034;"; break;
			case '\'': repl = "&#039;"; break;
			default: repl = null; break;
		}
		if (repl != null) {
			if (sb == null) {
				sb = new StringBuilder(xmllen(input));
				if (i > 0) {
					sb.append(input, 0, i);
				}
			}
			sb.append(repl);
		} else if (sb != null) {
			sb.append(ch);
		}
	}
	if (sb != null) {
		return sb.toString();
	}
	return input;
}
 
private static int xmllen(String str) {
	int len = str.length();
	for (int i = 0; i < str.length(); i++) {
		final char ch = str.charAt(i);
		switch (ch) {
			case '&': len += 4; break;
			case '<': len += 3; break;
			case '>': len += 3; break;
			case '"': len += 5; break;
			case '\'': len += 5; break;
		}
	}
	return len;
}

Just a tiny bit less garbage. How about performance? Well, my simple test-harness show that version 2 is about 9% slower than 1, which is not ideal.

So I remembered a session where we discussed sun.misc.Unsafe and somewhere in the discussion Peter mentioned that the String has a secret constructor that can be used for generating strings without creating new character arrays. And there was (at least) 3 different ways to get access to that ctor.

I was thinking that I just might go for the simplest one – not trying anything fancy with Unsafe and stuff. This way I can get rid of the StringBuilder and the inherent garbage created in lines 26 (25) and 37 (36).

Here it goes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
/**
 * XML escape input string
 * 
 * @param input
 * @return
 */
public static String escapeXml_3(String input) {
	if (input == null || input.isEmpty()) {
		return input;
	}
	char[] out = null;
	int x = 0;
	final char[] charr = input.toCharArray();
	for (int i = 0; i < charr.length; i++) {
		final String repl;
		final char ch = charr[i];
		switch (ch) {
			case '&': repl = "&amp;"; break;
			case '<': repl = "&lt;"; break;
			case '>': repl = "&gt;"; break;
			case '"': repl = "&#034;"; break;
			case '\'': repl = "&#039;"; break;
			default: repl = null; break;
		}
		if (repl != null) {
			if (out == null) {
				out = new char[xmllen(charr)];
				if (i > 0) {
					System.arraycopy(charr, 0, out, 0, i);
				}
				x = i;
			}
			final int len = repl.length();
			repl.getChars(0, len, out, x);
			x += len;
		} else if (out != null) {
			out[x++] = ch;
		}
	}
	if (out != null) {
		return charArrayToString(out);
	}
	return input;
}
 
private static final Constructor<String> SECRET_STRING_CTOR;
 
static {
	Constructor<String> secret = null;
	try {
		Constructor<String> c = String.class.getDeclaredConstructor(char[].class, boolean.class);
		c.setAccessible(true);
		String s = c.newInstance(new char[]{'c'}, true);
		if ("c".equals(s)) {
			secret = c;
		}
	} catch (Exception e) {
	}
	SECRET_STRING_CTOR = secret;
}
 
private static final String charArrayToString(char[] chars) {
	if (SECRET_STRING_CTOR != null) {
		try {
			return SECRET_STRING_CTOR.newInstance(chars, true);
		} catch (Exception ignored) {
		}
	}
	return new String(chars);
}

In the static initalizer (lines 48+) we try to get access to the package-protected (char[], boolean) constructor, and if it succeeds, we use that for creating new strings from character arrays in the charArrayToString function. (Doing this actually might generate a bit of garbage, since the newInstance might involve generating an Object array to hold the arguments)

If we do not get access to that ctor for whatever reason, we just go with the normal public (char[]) ctor.

This seemed to work much better, in my test-harness this version 3 is about 25% faster than version 1. Awesome!

Perhaps we could combine the efforts and use the charAt method and not generate any superflous garbage?

Here it goes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
/**
 * XML escape input string
 * 
 * @param input
 * @return
 */
public static String escapeXml_4(String input) {
	if (input == null || input.isEmpty()) {
		return input;
	}
	char[] out = null;
	int x = 0;
	final int ilen = input.length();
	for (int i = 0; i < ilen; i++) {
		final String repl;
		final char ch = input.charAt(i);
		switch (ch) {
			case '&': repl = "&amp;"; break;
			case '<': repl = "&lt;"; break;
			case '>': repl = "&gt;"; break;
			case '"': repl = "&#034;"; break;
			case '\'': repl = "&#039;"; break;
			default: repl = null; break;
		}
		if (repl != null) {
			if (out == null) {
				out = new char[xmllen(input)];
				if (i > 0) {
					input.getChars(0, i, out, 0);
				}
				x = i;
			}
			final int len = repl.length();
			repl.getChars(0, len, out, x);
			x += len;
		} else if (out != null) {
			out[x++] = ch;
		}
	}
	if (out != null) {
		return charArrayToString(out);
	}
	return input;
}

Nice and simple. Still readable 🙂 And performance is still some 19% better than the first version. This is going into my codebase!

If there are any manager-material-kind-of people reading this: Please do remember to send your employees on conferences…

Happy Easter.

test-harness

About Jesper Udby

I'm a freelance computer Geek living in Denmark with my wife and 3 kids. I've done professional software development since 1994 and JAVA development since 1998.
This entry was posted in Fun, Java and tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *