The importance of hashCode()

Interviewers like to ask questions about the importance of the hashCode() method, such as “what happens if hashCode() returns the same value for different objects?” and, vice versa, “what happens if it returns a random number every time; can it break anything?”

I think the best way to remember the answers to such questions is to understand how it actually works in real-life code. Simply remember the following implementation of get() from java.util.HashMap (this is how it looked in older JDK releases):

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    // Spread the bits of the key's hash code, then pick the bucket
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        // Same hash is not enough: the keys must also be identical or equal
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

Even just these two lines:

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
    return e.value;

As we see, two different keys can have the same hash, which is why the code also compares the keys themselves with equals(). If we make hashCode() always return the same integer, it won’t break anything, but every entry ends up in the same bucket, so a lookup has to walk through all of them and searching for a key gets slower. The HashSet collection is built on top of HashMap and works the same way when we use its contains(Object key) method (basically it just delegates the task to HashMap’s getEntry(Object key) method, which uses the algorithm above).
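As a small illustration of colliding hashes, here is a minimal sketch using HashSet. It relies on the fact that the strings "Aa" and "BB" happen to produce the same String.hashCode() value; the class and method names (HashCollisionDemo, collide, buildSet) are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

public class HashCollisionDemo {

    /** Returns true when the two strings are different but share a hash code. */
    public static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    /** Builds a set that contains two strings with the same hash code. */
    public static Set<String> buildSet() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        return set;
    }

    public static void main(String[] args) {
        System.out.println(collide("Aa", "BB"));  // true: both hash to 2112
        Set<String> set = buildSet();
        System.out.println(set.size());           // 2: equals() keeps them apart
        System.out.println(set.contains("BB"));   // true
        System.out.println(set.contains("Bb"));   // false
    }
}
```

Both strings are stored and found correctly: the equal hash only decides where to look, and equals() decides whether the key really matches.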

So, the answers are: if we use a custom object as a key, we need to provide our own hashCode() that is consistent with equals() and spreads the keys well, so that lookups are both correct and fast. If hashCode() returns a random integer every time, the HashMap will almost never find the key you’ve put into it.
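Both answers can be tried out with a short sketch. The two key classes below are hypothetical and exist only for this demo; the random generator uses a fixed seed purely so that the run is repeatable, which is an assumption of the example and not something a real key would do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class HashCodeDemo {

    /** Hypothetical key with a constant hashCode(): legal, but all keys share one bucket. */
    static final class ConstantHashKey {
        private final String name;
        ConstantHashKey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof ConstantHashKey && ((ConstantHashKey) o).name.equals(name);
        }
        @Override public int hashCode() { return 42; }
    }

    /** Hypothetical key whose hashCode() returns a different number on every call. */
    static final class RandomHashKey {
        // Fixed seed only so that the demo behaves the same on every run.
        private final Random random = new Random(42);
        private final String name;
        RandomHashKey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof RandomHashKey && ((RandomHashKey) o).name.equals(name);
        }
        @Override public int hashCode() { return random.nextInt(); }
    }

    /** Constant hash: the lookup still works, it just scans one crowded bucket. */
    public static String lookupWithConstantHash() {
        Map<ConstantHashKey, String> map = new HashMap<>();
        map.put(new ConstantHashKey("a"), "first");
        map.put(new ConstantHashKey("b"), "second");
        return map.get(new ConstantHashKey("b"));
    }

    /** Changing hash: even the very same key instance is not found again. */
    public static String lookupWithRandomHash() {
        Map<RandomHashKey, String> map = new HashMap<>();
        RandomHashKey key = new RandomHashKey("a");
        map.put(key, "value");
        return map.get(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupWithConstantHash()); // second
        System.out.println(lookupWithRandomHash());   // null
    }
}
```

With the constant hash, the map degrades to a linear search but stays correct. With the changing hash, the map looks for a hash that differs from the one stored at put() time, so the e.hash == hash check fails before equals() is ever consulted.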

HTML parsing in Java is really simple with Jsoup

The other day I needed to write an application that parses various data from a pile of web pages. I found an excellent library that parses HTML with jQuery-like selection methods. You don’t need any headache-inducing regexes.

For example, suppose you need to get the text and URL of an anchor from the HTML code below:

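// The inner quotes are escaped, and the div carries the class that the selector looks for
String html = "<div class=\"some-class\"><a href=\"http://blog.romanvlasenko.com/\">Click here...</a></div>";
Document doc = Jsoup.parse(html);
String title = doc.select("div.some-class a").text();       // "Click here..."
String url = doc.select("div.some-class a").attr("href");   // "http://blog.romanvlasenko.com/"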

Voila! You have what you needed.

If you need to parse a real web page, use the Jsoup.connect() method instead of Jsoup.parse(). Jsoup has its own HttpConnection that takes care of the extra work you would otherwise have to do with Java’s HttpURLConnection.

Document webPage = Jsoup.connect("http://blog.romanvlasenko.com/").userAgent("Mozilla").get();

Now you have the downloaded web page ready for processing. Jsoup supports both POST and GET requests. For more details, see the Jsoup project homepage.

GWT: client code optimization

I’d like to share a tip that I learned only yesterday, even though you may already know it.

Our GWT project consists of many modules that initialize their child modules in their constructors, which in turn initialize many other modules in their constructors, and so on. Some modules load data for their initial rendering even though the user may never open them in the current session. This data and these asynchronous calls force the application server to do useless work, and the heavy client code delays application startup in the user’s browser.

Here is the solution! Recently I read about the RunAsyncCallback interface, which makes it easy to defer calls right in the client code. Just imagine: it lets you make a deferred method call in client code the same way you make requests to your remote services:

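GWT.runAsync(new RunAsyncCallback() {
    public void onSuccess() {
        // Called once the code needed for SomeModule has been loaded
        new SomeModule().show();
    }
    public void onFailure(Throwable ohNoes) {
        // Called if the code could not be loaded; handle or report the error here
    }
});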

This really simple call tells the compiler to split your code into separately loaded parts at that point, if it can do so without any bad consequences.

To learn more about the details and limitations of this feature, read the official documentation.