How to Encode Special Characters in Java’s URI Class
You would think adding query parameters with special characters to a URI would be easy in Java, but you’d be wrong. The java.net.URI class tries to do some URL encoding, but runs into trouble with characters like ampersands, question marks, and slashes. Here’s a quick URI workaround that doesn’t rely on third-party libraries.
Failure 1 – No Encoding
Your first attempt to encode special characters in a query string might be to pass it to the URI’s constructor just as you received it.
String queryString = "a=Frick&Frack"
        + "&b=New York"
        + "&c=US/Eastern"
        + "&d=when?"
        + "&e=20%"
        + "&f=#1";

URI uri = new URI("http", null, "example.com", -1,
        "/accounts", queryString, null);

System.out.println(uri);
URI is smart enough to encode some characters like spaces and percent signs, but it leaves other symbols untouched. If you try this first approach, you’ll get this garbled URL:
http://example.com/accounts?a=Frick&Frack&b=New%20York&c=US/Eastern&d=when?&e=20%25&f=%231
What you actually wanted is the properly encoded version:
http://example.com/accounts?a=Frick%26Frack&b=New+York&c=US%2FEastern&d=when%3F&e=20%25&f=%231
Failure 2 – Double Encoding
For the next attempt, you’d reasonably try to encode the query parameters with java.net.URLEncoder first, before passing them on to URI.
final String ENCODING = "UTF-8";

String queryString = "a=" + URLEncoder.encode("Frick&Frack", ENCODING)
        + "&b=" + URLEncoder.encode("New York", ENCODING)
        + "&c=" + URLEncoder.encode("US/Eastern", ENCODING)
        + "&d=" + URLEncoder.encode("when?", ENCODING)
        + "&e=" + URLEncoder.encode("20%", ENCODING)
        + "&f=" + URLEncoder.encode("#1", ENCODING);

URI uri = new URI("http", null, "example.com", -1,
        "/accounts", queryString, null);

System.out.println(uri);
Makes sense. Unfortunately, URI then re-encodes every percent sign (%) produced by the first encoding as %25, so %26 becomes %2526. Definitely not what we want.
http://example.com/accounts?a=Frick%2526Frack&b=New+York&c=US%252FEastern&d=when%253F&e=20%2525&f=%25231
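One way to see the double encoding for yourself is to compare URI.getRawQuery() with URI.getQuery(). The sketch below is only an illustration: the class name DoubleEncodingDemo and the build helper are made up for this example and are not part of the article's code. If the value had been encoded once, decoding it once would give back the original ampersand; here, one round of decoding still leaves an escaped %26.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class, named only for this illustration.
final class DoubleEncodingDemo {

    // Pre-encodes the value with URLEncoder, then hands the result to the
    // multi-argument URI constructor, as in the "Failure 2" attempt.
    static URI build(String value)
            throws URISyntaxException, UnsupportedEncodingException {
        String query = "a=" + URLEncoder.encode(value, "UTF-8");
        return new URI("http", null, "example.com", -1, "/accounts", query, null);
    }

    public static void main(String[] args) throws Exception {
        URI uri = build("Frick&Frack");

        // The raw query holds what goes over the wire: the '%' from the
        // first encoding has been escaped again as %25.
        System.out.println(uri.getRawQuery()); // a=Frick%2526Frack

        // getQuery() decodes exactly once, so an escape sequence remains.
        System.out.println(uri.getQuery());    // a=Frick%26Frack
    }
}
```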
Bypass URI’s Constructor with Reflection
That last attempt came really close. If only there was a way to bypass the encoding step in URI’s constructor and set the query string to your properly encoded value directly.
Well there is a way — using reflection. This approach won’t be blessed by any of the high OOP priests, but using reflection to set URI’s private fields directly does get the job done. It doesn’t require any extra libraries and works in Oracle’s JDK 6, 7, and 8.
final String ENCODING = "UTF-8";

String queryString = "a=" + URLEncoder.encode("Frick&Frack", ENCODING)
        + "&b=" + URLEncoder.encode("New York", ENCODING)
        + "&c=" + URLEncoder.encode("US/Eastern", ENCODING)
        + "&d=" + URLEncoder.encode("when?", ENCODING)
        + "&e=" + URLEncoder.encode("20%", ENCODING)
        + "&f=" + URLEncoder.encode("#1", ENCODING);

// Build the URI without a query so the constructor has nothing to re-encode.
URI uri = new URI("http", null, "example.com", -1,
        "/accounts", null, null);

// Set the private "query" field to the already-encoded string.
Field field = URI.class.getDeclaredField("query");
field.setAccessible(true);
field.set(uri, queryString);

// Clear the cached "string" field so toString() rebuilds the URI.
field = URI.class.getDeclaredField("string");
field.setAccessible(true);
field.set(uri, null);

System.out.println(uri);
The important points with this approach are to:
- Call Class.getDeclaredField(String) instead of Class.getField(String) since the latter only looks for public fields while query is defined as private.
- Call field.setAccessible(true) to allow you to modify the value in this private field.
- Force the URI to rebuild on the next toString() call by setting its string field to null.
I think I found a better way, using the java.net.URL.toURI() method:
// double encoded java.net.URI
System.out.println(new URI("https", null, "foo.bar", -1, "/baz", "fuz=a%26b&q=w", null));
// good java.net.URL
System.out.println(new URL("https", "foo.bar", -1, "/baz?fuz=a%26b&q=w"));
// good java.net.URI
System.out.println(new URL("https", "foo.bar", -1, "/baz?fuz=a%26b&q=w").toURI());
stribika,
You are right! Creating a URL object first and then converting it to a URI object seems to honor the encodings already performed. This avoids using Reflection as the author of this article suggests. Great job!
Great article!
This is exactly what I was looking for in order to get a “?” string value.
I’m glad it was helpful 🙂 Cheers.
Thank God for this! …err I mean Thank you good sir for this!!!
I needed to add two fragments (parameters with #) to a URI, and this is exactly what I needed. It would not be possible any other way, since two URI fragments are completely non-standard. The only difference was that I changed the field lookup to URI.class.getDeclaredField("fragment").
Now why would anyone need something so twisted, you may ask? Robohelp. It uses a special kind of link to its webhelp files, with two URI fragments, to open help chapters and the topics within them. They are some twisted m*** f***rs.
Hi, this solution does not work when the value of a query parameter contains a space character, because the space is converted to + instead of %20.
For example, New York in your code above is converted to New+York instead of New%20York.
Any solution for this?
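One common workaround, sketched here as an assumption rather than something from the article, is to post-process the URLEncoder output and replace each + with %20. This is safe because URLEncoder escapes a literal plus sign as %2B, so any + left in its output can only have come from a space. The class and method names below (PercentEncoder, encodeComponent) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, named only for this illustration.
final class PercentEncoder {

    // Encodes a single query value, emitting %20 for spaces instead of '+'.
    static String encodeComponent(String value) throws UnsupportedEncodingException {
        // URLEncoder turns ' ' into '+' and a literal '+' into %2B, so every
        // '+' remaining in the result stands for a space.
        return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeComponent("New York")); // New%20York
        System.out.println(encodeComponent("1+1=2"));    // 1%2B1%3D2
    }
}
```

The result can be used anywhere the article's code calls URLEncoder.encode directly.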
I ran into the same problem today. After studying it all morning, I found that the URI constructors follow different encoding rules by design (ref: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html):
- The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.
- The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character (‘%’) is always quoted by these constructors. Any other characters are preserved.
That is why the “Double Encoding” case occurs, and why URL.toURI() works.
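The difference between the two constructor families can be seen side by side in a small sketch. It reuses the foo.bar example from the earlier comment; the class and method names (ConstructorComparison, multiArg, singleArg) are made up for this illustration. It assumes you have already encoded the query yourself and can assemble the whole URI as one string.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, named only for this illustration.
final class ConstructorComparison {

    // A query string that has already been percent-encoded by the caller.
    static final String ENCODED_QUERY = "fuz=a%26b&q=w";

    // Multi-argument constructor: always quotes '%', so %26 becomes %2526.
    static URI multiArg() throws URISyntaxException {
        return new URI("https", null, "foo.bar", -1, "/baz", ENCODED_QUERY, null);
    }

    // Single-argument constructor: preserves escaped octets as given.
    static URI singleArg() throws URISyntaxException {
        return new URI("https://foo.bar/baz?" + ENCODED_QUERY);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(multiArg());  // https://foo.bar/baz?fuz=a%2526b&q=w
        System.out.println(singleArg()); // https://foo.bar/baz?fuz=a%26b&q=w
    }
}
```

Because the single-argument constructor expects illegal characters to be quoted already, it throws URISyntaxException if the string still contains, say, a raw space.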