Saturday, April 17, 2010

Detecting binary data in Java

Sometimes you want to show data only when it is printable. I use the following trick to do so.

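import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

String value = "a string";
// Latin-1 encoder; canEncode() reports whether every character exists in the charset.
CharsetEncoder encoder =
    Charset.forName("ISO-8859-1").newEncoder();

if (encoder.canEncode(value)) {
    System.out.println(value);
} else {
    System.out.println("<BINARY DATA>");
}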

Before I print data, I check whether it can be encoded as ISO-8859-1 (Latin-1), a charset that covers most characters used throughout the Americas, Western Europe, Oceania, and a large part of Africa. When the check fails, I know the string contains characters that are not defined in the charset, which suggests the data might be binary.

This method works most of the time, but it is a heuristic, not a guarantee. It fails in two scenarios: when the binary data happens to contain only characters that Latin-1 can encode, so it is printed as if it were text, and when text data uses characters from a different charset, so it is reported as binary.
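The first failure case is easy to reproduce, because Latin-1 also defines the control characters. The sketch below is an illustration rather than part of the original trick: the class name Latin1CheckLimits and the stricter looksPrintable helper are hypothetical names introduced here, and rejecting control characters other than tab, newline, and carriage return is an added assumption about what "printable" should mean.

```java
import java.nio.charset.Charset;

class Latin1CheckLimits {

    // The check from the post: can every character be encoded as Latin-1?
    static boolean canEncodeLatin1(String value) {
        return Charset.forName("ISO-8859-1").newEncoder().canEncode(value);
    }

    // Hypothetical stricter variant (an assumption, not the post's method):
    // additionally reject control characters except common whitespace.
    static boolean looksPrintable(String value) {
        if (!canEncodeLatin1(value)) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean allowedWhitespace = c == '\n' || c == '\r' || c == '\t';
            if (Character.isISOControl(c) && !allowedWhitespace) {
                return false;
            }
        }
        return true;
    }
}
```

A string such as "\u0000\u0001\u00FF" passes canEncodeLatin1 even though it is clearly not readable text, while looksPrintable rejects it. Whether the extra filter is worth it depends on how much control-character noise the data is expected to contain.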

To print text in other scripts, such as Chinese or Arabic, use an encoding that covers that script instead of Latin-1 to make this trick work.
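Swapping the charset could look roughly like the sketch below. The class CharsetCheck and the method isEncodable are hypothetical names chosen for illustration, and the charset name is simply passed through to Charset.forName, so which names work depends on what the running JRE supports.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class CharsetCheck {

    // Same trick as in the post, with the charset as a parameter.
    // Charset.forName throws UnsupportedCharsetException when the JRE
    // does not provide the requested charset.
    static boolean isEncodable(String value, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(value);
    }
}
```

For example, isEncodable("caf\u00E9", "US-ASCII") returns false, while the same string checked against "ISO-8859-1" returns true. For Arabic text, a charset such as "ISO-8859-6" may be a candidate if the JRE ships it. Note that a universal encoding like UTF-8 can encode almost any string, so it tells you very little about whether the data is binary.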

1 comment:

  1. This is one of those solutions that are really simple and beautiful.
