HTML parsing in Java is really simple with Jsoup

The other day I needed to write an application that extracts various data from a pile of web pages. I found an excellent library that parses HTML and offers jQuery-like selector methods, so you do not need any headache-inducing regexes.

For example, suppose you need to get the title (the link text) and the URL of the anchor in the HTML below:

String html = "<div class=\"some-class\"><a href=\"http://blog.romanvlasenko.com/\">Click here...</a></div>";
Document doc = Jsoup.parse(html);
String title = doc.select("div.some-class a").text();
String url = doc.select("div.some-class a").attr("href");

Voila! You have what you needed.
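
When a page holds more than one matching element, select() returns the whole set and you can loop over it. Here is a minimal sketch, assuming the Jsoup jar is on the classpath; the class name LinkLister, the sample HTML and the example.com URLs are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Jsoup.
final class LinkLister {

    /** Returns one "text -> href" line for every anchor that has an href attribute. */
    static List<String> listLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // "a[href]" matches only anchors that actually carry an href attribute.
        for (Element link : doc.select("a[href]")) {
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup: two real links and one anchor without href.
        String html = "<ul>"
                + "<li><a href=\"http://example.com/one\">First</a></li>"
                + "<li><a href=\"http://example.com/two\">Second</a></li>"
                + "<li><a>No href here</a></li>"
                + "</ul>";
        for (String line : listLinks(html)) {
            System.out.println(line);
        }
    }
}
```

The third anchor is skipped because the selector asks for the href attribute, so the loop never sees an empty URL.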

If you need to parse a real web page, use the Jsoup.connect() method instead of Jsoup.parse(). Jsoup has its own HttpConnection that takes care of the extra work you would otherwise have to do yourself with Java’s HttpURLConnection.

Document webPage = Jsoup.connect("http://blog.romanvlasenko.com/").userAgent("Mozilla").get();

Now you have the downloaded web page ready for processing. Jsoup supports both GET and POST requests. For more details, see the Jsoup project homepage.
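
For comparison, this is roughly the kind of plumbing that a plain HttpURLConnection asks for when you only want the page body as a string. It is a sketch under assumptions, not a drop-in replacement: the class name ManualFetch, the 5-second timeouts and the hard-coded UTF-8 decoding are my own choices, and the result is still raw text that has to be parsed afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class showing the manual approach.
final class ManualFetch {

    /** Downloads a page with a GET request and returns its body as text. */
    static String fetch(String address, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", userAgent);
            conn.setConnectTimeout(5000); // assumed value, in milliseconds
            conn.setReadTimeout(5000);    // assumed value, in milliseconds

            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }

            // Copy the response stream into memory by hand.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                // Assumption: the page is UTF-8 encoded.
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as ManualFetch.fetch("http://blog.romanvlasenko.com/", "Mozilla") would then mirror the one-line Jsoup.connect() call above, minus the parsing.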

GWT: client code optimization

I’d like to share a tip that I learned only yesterday, even though you may already know it.

Our GWT project consists of many modules whose constructors initialize child modules, which in turn initialize further modules in their own constructors, and so on. Some modules load data for their initial rendering even though the user may never see them in the current session. These data loads and asynchronous calls force the application server to do useless work, and the heavy client code delays application startup in the user’s browser.

Here is the solution! I recently read about the RunAsyncCallback interface, which makes it easy to defer calls right in the client code. Just imagine: you can make a deferred method call in client code the same way you make requests to your remote services:

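GWT.runAsync(new RunAsyncCallback() {
    public void onSuccess() {
        // Runs once the code needed for SomeModule is available.
        new SomeModule().show();
    }
    public void onFailure(Throwable ohNoes) {
        // The deferred code could not be loaded; report the error here instead of ignoring it.
    }
});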

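This really simple method tells the compiler that it may split your code at this point, which it does wherever that is possible without any bad consequences. The part that is reachable only through the callback can then be downloaded later, when it is actually needed.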

To learn more about the details and limitations of this feature, read the official documentation.