How to Encode Special Characters in Java’s URI Class


You would think adding query parameters with special characters to a URI would be easy in Java, but you’d be wrong. The java.net.URI class does some URL encoding of its own, but it runs into trouble with characters like ampersands, question marks, and slashes. Here’s a quick workaround that doesn’t rely on third-party libraries.

Failure 1 – No Encoding

Your first attempt to encode special characters in a query string might be to pass it to the URI’s constructor just as you received it.

URI is smart enough to encode some characters, like spaces and percent signs, but it leaves other symbols untouched. If you try this first approach, an ampersand inside a parameter value stays a literal & in the resulting URL, where it is read as a parameter separator, instead of being encoded as %26.
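
As a minimal sketch of this first failure, the snippet below passes a raw query string to URI's seven-argument constructor. The host example.com, the path /search, and the parameter names are made up for illustration and do not come from the original listing.

```java
import java.net.URI;
import java.net.URISyntaxException;

class NoEncodingDemo {

    // Builds a URI from a raw, unencoded query string (hypothetical host and path).
    static String build(String rawQuery) throws URISyntaxException {
        URI uri = new URI("http", null, "example.com", -1, "/search", rawQuery, null);
        return uri.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // The space and the percent sign are quoted by the constructor:
        // prints http://example.com/search?q=50%25%20off
        System.out.println(build("q=50% off"));

        // The ampersand inside the value "AT&T" is left alone, so a server
        // would see company=AT plus a second, empty parameter named T:
        // prints http://example.com/search?company=AT&T
        System.out.println(build("company=AT&T"));
    }
}
```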

Failure 2 – Double Encoding

For the next attempt, you’d reasonably try to encode the query parameters with java.net.URLEncoder first, before passing them on to URI.

Makes sense. Unfortunately, URI then encodes each percent sign (%) produced by the first encoding as %25, so an ampersand that URLEncoder turned into %26 ends up as %2526. Definitely not what we want.
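
The double encoding can be reproduced with a few lines. This is an illustrative sketch rather than the article's original listing; the host, path, and parameter name are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class DoubleEncodingDemo {

    // Encodes the value with URLEncoder, then hands the query to URI's
    // multi-argument constructor (hypothetical host and path).
    static String build(String name, String value)
            throws UnsupportedEncodingException, URISyntaxException {
        String query = name + "=" + URLEncoder.encode(value, "UTF-8"); // AT&T -> AT%26T
        return new URI("http", null, "example.com", -1, "/search", query, null).toString();
    }

    public static void main(String[] args) throws Exception {
        // The '%' from the first encoding is quoted again as %25:
        // prints http://example.com/search?company=AT%2526T
        System.out.println(build("company", "AT&T"));
    }
}
```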

Bypass URI’s Constructor with Reflection

That last attempt came really close. If only there were a way to bypass the encoding step in URI’s constructor and set the query string to your properly encoded value directly.

Well, there is a way: reflection. This approach won’t be blessed by any of the high OOP priests, but using reflection to set URI’s private fields directly does get the job done. It doesn’t require any extra libraries and works in Oracle’s JDK 6, 7, and 8.

The important points with this approach are to:

  1. Call Class.getDeclaredField(String) instead of Class.getField(String), since the latter only looks for public fields while query is declared private.
  2. Call field.setAccessible(true) so that you can modify the value of this private field.
  3. Force the URI to rebuild its string on the next toString() call by setting its string field to null.
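
The three points above can be sketched roughly as follows. This is a reconstruction under assumptions, not the article's original code: the helper name withRawQuery, the host, and the path are hypothetical, and the field names "query" and "string" are private details of the JDK's URI implementation that could change. On JDK 16 and later, setAccessible is expected to throw InaccessibleObjectException unless the JVM is started with --add-opens java.base/java.net=ALL-UNNAMED.

```java
import java.lang.reflect.Field;
import java.net.URI;
import java.net.URLEncoder;

class UriReflectionSketch {

    // Hypothetical helper: overwrites the private query field of an existing
    // URI with an already-encoded value. Note that this mutates the instance.
    static URI withRawQuery(URI uri, String encodedQuery) throws ReflectiveOperationException {
        // 1. getDeclaredField finds private fields; getField would not.
        Field queryField = URI.class.getDeclaredField("query");
        // 2. Lift the access check so the private field can be written.
        queryField.setAccessible(true);
        queryField.set(uri, encodedQuery);

        // 3. Clear the cached string so toString() rebuilds it from the fields.
        Field stringField = URI.class.getDeclaredField("string");
        stringField.setAccessible(true);
        stringField.set(uri, null);
        return uri;
    }

    public static void main(String[] args) throws Exception {
        String query = "company=" + URLEncoder.encode("AT&T", "UTF-8"); // company=AT%26T
        URI uri = new URI("http", null, "example.com", -1, "/search", null, null);
        // On a JDK that permits the reflective access, the query should come
        // out encoded exactly once.
        System.out.println(withRawQuery(uri, query));
    }
}
```

Because URI is documented as immutable, it seems safest to apply this only to an instance you have just created and not yet shared.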


About Dele Taylor

Dele Taylor is the founder of StackHunter.com -- a tool to track Java exceptions. You can follow him on Twitter, G+, and LinkedIn.

10 Responses to “How to Encode Special Characters in Java’s URI Class”

  1. I think I found a better way, using the java.net.URL.toURI() method:

    // double encoded java.net.URI
    System.out.println(new URI("https", null, "foo.bar", -1, "/baz", "fuz=a%26b&q=w", null));
    // good java.net.URL
    System.out.println(new URL("https", "foo.bar", -1, "/baz?fuz=a%26b&q=w"));
    // good java.net.URI
    System.out.println(new URL("https", "foo.bar", -1, "/baz?fuz=a%26b&q=w").toURI());

    • stribika,

      You are right! Creating a URL object first and then converting it to a URI object seems to honor the encodings already performed. This avoids using Reflection as the author of this article suggests. Great job!

  2. Great article!
    This is exactly what I was looking for in order to get “?” string value

  3. Thank God for this! …err I mean Thank you good sir for this!!!

    I needed to add two fragments (parameters with #) to a URI and this is exactly what I needed. It would not be possible any other way, since two URI fragments are completely non-standard. The only difference was that I used URI.class.getDeclaredField("fragment") instead of the query field.

    Now why would anyone need such a twistedness you may ask – Robohelp –
    it uses a special linking to its webhelp files with two URI fragments to open help chapters and topics within them, they are some twisted m*** f***rs.

  4. abhishek bhatia, July 5, 2018 at 4:03 AM

    Hi, this solution does not work when there is a space character in the value of a query parameter, because the space is converted to + instead of %20.

    For example New York in your code above will get converted to New+York instead of New%20York.

    Any solution for this?

  5. I ran into the same problem today. After studying it all morning, I found that the URI constructors follow different encoding rules by design (ref: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html):

    *The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

    *The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character (‘%’) is always quoted by these constructors. Any other characters are preserved.

    That is why the "Double Encoding" case occurs and why URL.toURI() works.
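
The two constructor rules quoted in the last comment can be compared side by side. The sketch below reuses the foo.bar values from the first comment; the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ConstructorRulesDemo {

    // Single-argument constructor: existing escaped octets are preserved.
    static String singleArg() throws URISyntaxException {
        return new URI("https://foo.bar/baz?fuz=a%26b&q=w").toString();
    }

    // Multi-argument constructor: the percent character is always quoted.
    static String multiArg() throws URISyntaxException {
        return new URI("https", null, "foo.bar", -1, "/baz", "fuz=a%26b&q=w", null).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(singleArg()); // prints https://foo.bar/baz?fuz=a%26b&q=w
        System.out.println(multiArg());  // prints https://foo.bar/baz?fuz=a%2526b&q=w
    }
}
```

If the full URL string has already been encoded correctly, the single-argument constructor may therefore be enough on its own, without reflection.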
