Monday, October 31, 2011

Base64 encoding and decoding in Java

What is the Base64 Encoding Scheme?
Wiki says:
"Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The Base64 term originates from a specific MIME content transfer encoding."

In simple terms Base64 encoding is a Binary to ASCII text conversion technique. Other such techniques available are quoted-printable, hexadecimal, BinHex, etc.

Binary-to-text encoding is used when the transmission protocol or channel may not handle non-textual data reliably; the best-known examples are email sent via MIME and Usenet. For example, many browsers convert a space in a URL to %20 or a similar escape sequence, and you may be in trouble if your server doesn't convert it back. You can't possibly track down every such case, so it is better to use an encoding scheme that converts your data to plain text characters.

your data => [ Base64 Encoding System ] => Textual Data
Image      => [ Base64 Encoding System ] => Image converted to text data
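To make the motivation concrete, here is a minimal sketch of what can go wrong when raw bytes are pushed through a text-only channel. It assumes Java 8 or later and uses the JDK's java.util.Base64 rather than any library discussed in this post; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: compares two ways of sending bytes through a text channel.
class BinarySafeTransport {

    // Naive approach: treat the raw bytes as UTF-8 text and read them back.
    static byte[] roundTripAsText(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Base64 approach: encode to ASCII text first, decode on the receiving side.
    static byte[] roundTripAsBase64(byte[] data) {
        String asText = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(asText);
    }

    public static void main(String[] args) {
        // 0xFF and a lone 0x80 are not valid UTF-8, so the text round trip alters them.
        byte[] data = {(byte) 0xFF, 0x00, (byte) 0x80, 0x41};
        System.out.println("text round trip intact:   "
                + Arrays.equals(data, roundTripAsText(data)));
        System.out.println("base64 round trip intact: "
                + Arrays.equals(data, roundTripAsBase64(data)));
    }
}
```

The text round trip replaces the invalid bytes, so the first check reports false, while the Base64 round trip returns the original bytes unchanged.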

Is the Base64 encoding scheme an encryption technique? NO, don't even think of it that way. It simply converts data from one form to another so that nothing gets modified due to the limitations of the transmission protocol/channel. There is no key and no cryptography involved at all; anyone can decode the data.

How does Base64 encoding work? Look at the Wiki example; it is pretty straightforward and clear.
The same example is given below:
Text content    M         a         n
ASCII           77        97        110
Bit pattern     010011 010110 000101 101110
Index           19     22     5      46
Base64-encoded  T      W      F      u
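The table above can be reproduced by hand in a few lines of Java. This is only a sketch of the idea for one full 3-byte block (no padding handling), using the standard alphabet; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: encodes exactly one 3-byte block by hand.
class ManualBase64Block {

    // Standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Packs three bytes into 24 bits and splits them into four 6-bit indices.
    static int[] toIndices(byte[] threeBytes) {
        if (threeBytes.length != 3) {
            throw new IllegalArgumentException("expects exactly 3 bytes");
        }
        int bits = ((threeBytes[0] & 0xFF) << 16)
                 | ((threeBytes[1] & 0xFF) << 8)
                 |  (threeBytes[2] & 0xFF);
        return new int[] {
            (bits >> 18) & 0x3F,
            (bits >> 12) & 0x3F,
            (bits >> 6) & 0x3F,
            bits & 0x3F
        };
    }

    // Looks up each 6-bit index in the alphabet.
    static String encodeBlock(byte[] threeBytes) {
        StringBuilder out = new StringBuilder(4);
        for (int index : toIndices(threeBytes)) {
            out.append(ALPHABET.charAt(index));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] man = "Man".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(toIndices(man))); // [19, 22, 5, 46]
        System.out.println(encodeBlock(man));                // TWFu
    }
}
```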

Since Base64 uses 64 distinct characters, the output can contain characters only from the set below:
A-Z, a-z, 0-9, +, /
Note: the last two characters may change depending on the implementation. For example, the Base64 variant with the URL and Filename Safe Alphabet (RFC 4648 'base64url' encoding) uses - (hyphen) and _ (underscore) instead.
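As a quick illustration of the two alphabets, the sketch below encodes the same two bytes with the standard and the URL-safe encoder. It assumes Java 8 or later (java.util.Base64), which is not one of the libraries covered in this post; the class name is made up.

```java
import java.util.Base64;

// Hypothetical demo class: same input, two Base64 alphabets.
class AlphabetVariants {

    // Standard alphabet: indices 62 and 63 map to '+' and '/'.
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL and filename safe alphabet: indices 62 and 63 map to '-' and '_'.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // These two bytes were chosen so that the first two output characters
        // use indices 62 and 63.
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8=
    }
}
```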

Many variants of Base64 are available. Some are listed below:
  1. RFC 1421
  2. RFC 2045 - the org.apache.commons.codec.binary.Base64 class encodes and decodes as per this standard.
  3. RFC 3548
See the more exhaustive list on Wiki.

The last thing we will cover in theory is padding. Padding means adding extra bits/bytes/characters when required, and Base64 encoding sometimes requires it too.

When is padding done in Base64 encoding, and what characters are inserted for padding?
I will not explain in detail how padding is done; this is clearly explained on the Wiki page for Base64 (see Relevant References). Padding is done when the number of bytes to encode is not divisible by 3, that is, when there are only one or two bytes of input for the last block. A trailing '==' indicates that the last group contained only 1 byte, and '=' indicates that it contained 2 bytes (you'll like that there are implementations available which don't require padding at all).
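The padding rule is easy to see with inputs of one, two, and three bytes. The sketch below assumes Java 8 or later and uses java.util.Base64, including its withoutPadding() option as one example of an implementation that can skip padding; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: shows how input length affects padding.
class PaddingDemo {

    // Default encoder: pads the output to a multiple of 4 characters.
    static String padded(String input) {
        return Base64.getEncoder()
                .encodeToString(input.getBytes(StandardCharsets.US_ASCII));
    }

    // Same encoder with padding switched off.
    static String unpadded(String input) {
        return Base64.getEncoder().withoutPadding()
                .encodeToString(input.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(padded("M"));     // TQ==  (1 byte in the last block)
        System.out.println(padded("Ma"));    // TWE=  (2 bytes in the last block)
        System.out.println(padded("Man"));   // TWFu  (full block, no padding)
        System.out.println(unpadded("M"));   // TQ
        System.out.println(unpadded("Ma"));  // TWE
    }
}
```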

There are various implementations available, as I mentioned earlier, but as a good developer you should keep your eyes open and choose an implementation based on your requirements. Some common applications of Base64 are given below:
  1. URL:
    • To encode the URL query parameters
    • To encode data in hidden form fields
    • Just to obfuscate the URL data
  2. File names: the "modified Base64 for filenames" variant uses '-' instead of '/', because Unix and Windows file names cannot contain '/'.
Check out the other applications as well.

Let us jump to Base64 encoding in Java. There are many implementations available. I will list some of them first (not all):
  • Apache Commons: org.apache.commons.codec.binary.Base64. This implementation is recommended.
  • Sun misc package: sun.misc.BASE64Decoder. Avoid it. For one thing, the sun.misc package was never officially released by Sun as a supported API. For another, Oracle might be tempted to remove the packages with Sun in their names, and your code would suddenly stop working in a future release of Java.
  • Base64 class shipped with the JAXB framework. Use this if your project is already using JAXB.
  • javax.mail.internet.MimeUtility: This class also has encode and decode methods and can do the job for you. Use it only when your code deals with email, as this implementation follows the MIME guidelines.
There are many others available as well. You can also write your own (all the best if you do).
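One option not in the list above, because it did not exist when this post was written: from Java 8 onwards the JDK itself ships java.util.Base64, with basic, URL-safe and MIME flavours. The sketch below assumes Java 8 or later and compares the basic and MIME encoders; the class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: basic vs. MIME encoders from java.util.Base64.
class JdkMimeEncoding {

    // Basic encoder: one continuous line of output.
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // MIME encoder: output is broken into lines of at most 76 characters,
    // separated by "\r\n".
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 zero bytes, enough to exceed one MIME line
        System.out.println(basic(data));
        System.out.println(mime(data));
        // The MIME decoder ignores the line separators when decoding.
        byte[] decoded = Base64.getMimeDecoder().decode(mime(data));
        System.out.println(Arrays.equals(data, decoded));
    }
}
```

The line wrapping is the main visible difference: it matters for email bodies, and it is usually unwanted in URLs or form fields.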

Now it's time for demo code:
import java.nio.charset.StandardCharsets;

import org.apache.commons.codec.binary.Base64;

/**
 * @author java-espresso
 * This class has two methods: one encodes a string in Base64
 * and the other decodes it.
 */
public class CommonUtil {

 public static String base64Encode(String stringToEncode){
  // Use an explicit charset so the bytes do not depend on the platform default
  byte [] stringToEncodeBytes = stringToEncode.getBytes(StandardCharsets.UTF_8);
  return Base64.encodeBase64String(stringToEncodeBytes);
 }
 
 public static String base64Decode(String stringToDecode){
  byte [] decodedBytes = Base64.decodeBase64(stringToDecode);
  // Decode with the same charset that was used for encoding
  return new String(decodedBytes, StandardCharsets.UTF_8);
 }
 
}
Test class for the above code:
public class TestBase64 {

 /**
  * @param args
  */
 public static void main(String[] args) {
  String testString1 = "test String"; // 11 bytes: 11 % 3 leaves 2 bytes in the last block, so '=' is appended to the encoded string
  String testString2 = "TestString"; // 10 bytes: 10 % 3 leaves 1 byte in the last block, so '==' is appended to the encoded string
  String testString3 = "Test string."; // 12 bytes: 12 % 3 leaves 0 bytes, so nothing is appended
  
  String encoded1= CommonUtil.base64Encode(testString1);
  String encoded2= CommonUtil.base64Encode(testString2);
  String encoded3= CommonUtil.base64Encode(testString3);
  
  System.out.println("====== Encoded Strings =======================================");
  System.out.println(testString1+" > Encoded > "+encoded1);
  System.out.println(testString2+" > Encoded > "+encoded2);
  System.out.println(testString3+" > Encoded > "+encoded3);
  
  System.out.println("====== Decoded Strings =======================================");
  System.out.println(encoded1+" > Decoded > "+CommonUtil.base64Decode(encoded1));
  System.out.println(encoded2+" > Decoded > "+CommonUtil.base64Decode(encoded2));
  System.out.println(encoded3+" > Decoded > "+CommonUtil.base64Decode(encoded3));
 }

}
Output on executing the above code:
====== Encoded Strings =======================================
test String > Encoded > dGVzdCBTdHJpbmc=
TestString > Encoded > VGVzdFN0cmluZw==
Test string. > Encoded > VGVzdCBzdHJpbmcu
====== Decoded Strings =======================================
dGVzdCBTdHJpbmc= > Decoded > test String
VGVzdFN0cmluZw== > Decoded > TestString
VGVzdCBzdHJpbmcu > Decoded > Test string.
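If you cannot add Commons Codec to your project, a dependency-free equivalent of CommonUtil can be sketched with java.util.Base64. This assumes Java 8 or later; the class name JdkCommonUtil is made up, and this is an alternative to the demo above, not the code attached to this post.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical JDK-only counterpart of the CommonUtil class above.
class JdkCommonUtil {

    // Encodes the UTF-8 bytes of the string with the standard Base64 alphabet.
    static String base64Encode(String stringToEncode) {
        byte[] bytes = stringToEncode.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Decodes Base64 text and interprets the resulting bytes as UTF-8.
    static String base64Decode(String stringToDecode) {
        byte[] decodedBytes = Base64.getDecoder().decode(stringToDecode);
        return new String(decodedBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = {"test String", "TestString", "Test string."};
        for (String sample : samples) {
            String encoded = base64Encode(sample);
            System.out.println(sample + " > Encoded > " + encoded);
            System.out.println(encoded + " > Decoded > " + base64Decode(encoded));
        }
    }
}
```

For these three ASCII inputs the encoded strings match the Commons Codec output shown above.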
Please find the source of the application attached here.
Relevant References

3 comments:

  1. Thank you very much,Nice Blog

  2. Can anyone suggest how to convert the .wav (audio file) to a text file and vice-versa in Java?
    I want to convert the audio file to a text file in java i.e first the audio file will be played and after it is played it will copy the lyrics into a text file, and vice versa, in java. But, I don't know from where I will start it, so, can anyone tell me how to do that?
