maneo (2)
I know it's a bit late, but I didn't manage to sit down and read AIW earlier. Below are some of my thoughts after reading the version I downloaded 60 days ago (most of these remarks have probably been fixed already).

The diagrams and some of the illustrations in the PDF I reviewed were of rather poor quality. Below are a few thoughts about the book's content:

A few words about the authors, and maybe also a foreword, could add some value to the book.

You should add more information about the basic programming skills a reader needs in order to understand the examples.

There is also no information about how to run the examples. I didn't notice any URL from which a ZIP archive with additional materials (examples) could be downloaded.

BeanShell is a solid and well-established technology, but maybe you could think about examples written in sexier languages like Groovy or JRuby :). More and more Web 2.0 applications are written in frameworks like Grails or Ruby on Rails, so examples in those languages may simplify adoption of the described solutions in production.

The bibliography at the end of each chapter should be formatted and styled better.

I'm not sure it's necessary to introduce a Logger in examples such as listing 2.3 for the MySearcher class. For the sake of simplicity, you could simplify all exception handling and use only System.out.println to report errors and results. Because of checked exceptions, Java code becomes less readable, and I guess simplicity is very important in this field.
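To show what I mean, here is a minimal sketch. It is not taken from the book: the class `PrintlnReportingSketch` and the methods `search` and `runAndReport` are hypothetical names made up for illustration, and `search` is only a stand-in for whatever call in a listing throws a checked exception.

```java
import java.util.Arrays;
import java.util.List;

public class PrintlnReportingSketch {

    // Hypothetical stand-in for a search call that can fail with a checked exception.
    static List<String> search(String query) throws Exception {
        if (query == null || query.isEmpty()) {
            throw new Exception("empty query");
        }
        return Arrays.asList("doc-1 matches '" + query + "'",
                             "doc-2 matches '" + query + "'");
    }

    // Runs one query and reports results or errors with System.out.println only.
    // Returns true when the search succeeded, false when it failed.
    static boolean runAndReport(String query) {
        try {
            for (String hit : search(query)) {
                System.out.println(hit);
            }
            return true;
        } catch (Exception e) {
            // One catch block, no Logger setup: enough for a teaching example.
            System.out.println("Search failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        runAndReport("lucene");
        runAndReport("");
    }
}
```

The trade-off is that a single broad catch hides which failure occurred, which is acceptable in a book listing but probably not in production code.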

The index of open source libraries at the end is a great idea; maybe it's worth dividing this appendix into parts, each containing information about the tools used in one particular chapter.

I'm missing some (even short) references to tools like:

Polar Rose: retrieval of pictures of particular people, as an example of processing image data. Maybe Google Similar Images search is also worth referencing. Maybe also some examples of binary data retrieval.

Some things connected with semantic web/OWL/RDF technologies, like Hakia or OpenCalais. Things like this are very interesting, but there is a lot of marketing blurb around so-called Web 3.0, and I would love to read an objective description of those solutions.

Just a remark about PDFs: from what I've observed, PDFBox has been rather inactive since November 2008 and is stuck at the 0.8 release. We had some issues with PDFs created using the newest Adobe tools.

As for clustering algorithms and their classification: have you heard about "description comes first"? It is a concept proposed in Dawid Weiss's PhD dissertation and used by the Carrot2 search results clustering framework. This concept can give results that are really attractive for end users. The basic assumption behind it is that clusters which do not have meaningful names are not very useful for users. So the first step in this paradigm is to fetch potentially interesting words/phrases for the names of groups, and then build clusters around those names. More information about this approach was given by Dawid Weiss and Stanisław Osiński in papers about Carrot2.
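To make the two steps concrete, here is a toy sketch of the "description comes first" idea. It is only an illustration under strong assumptions, not Carrot2's API or its actual algorithm: candidate labels are simply word bigrams that occur in at least `minDocs` documents, and a document joins every label it contains as a substring. The class `DescriptionFirstSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DescriptionFirstSketch {

    // Step 1: pick candidate labels first. Here a label is any word bigram
    // that appears in at least minDocs documents (a deliberately naive choice).
    static List<String> candidateLabels(List<String> docs, int minDocs) {
        Map<String, Integer> docFreq = new LinkedHashMap<>();
        for (String doc : docs) {
            String[] words = doc.toLowerCase().split("\\s+");
            Set<String> seen = new LinkedHashSet<>(); // count each bigram once per document
            for (int i = 0; i + 1 < words.length; i++) {
                seen.add(words[i] + " " + words[i + 1]);
            }
            for (String phrase : seen) {
                docFreq.merge(phrase, 1, Integer::sum);
            }
        }
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() >= minDocs) {
                labels.add(e.getKey());
            }
        }
        return labels;
    }

    // Step 2: build clusters around the labels. A document is assigned to
    // every label whose text it contains, so clusters may overlap.
    static Map<String, List<String>> cluster(List<String> docs, List<String> labels) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String label : labels) {
            List<String> members = new ArrayList<>();
            for (String doc : docs) {
                if (doc.toLowerCase().contains(label)) {
                    members.add(doc);
                }
            }
            clusters.put(label, members);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Apache Lucene indexing tutorial",
                "Searching with Apache Lucene",
                "Data mining basics",
                "Advanced data mining");
        List<String> labels = candidateLabels(docs, 2);
        for (Map.Entry<String, List<String>> e : cluster(docs, labels).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The point of the ordering is that every cluster has a readable name by construction, because the name exists before any document is assigned to it.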

That's all, and thanks for a great book :)
babis.marmanis (52)
Re: A few thoughts after reading MEAP version
Hello maneo,

That's great feedback. Thank you for posting it.

Indeed, a number of your suggestions had been noted earlier and have been addressed (e.g. foreword; running the examples; improved format for the bibliography). However, the book has been sent to the printer today, so the references mentioned at the end of your posting will not make it into the book.

As you can understand, this book is only an introduction to a field that’s already large and keeps growing rapidly. Naturally, the number of algorithms covered had to be limited and the explanations had to be concise. My objective was to select a number of topics and explain them well, rather than attempt to cover as much as possible with the risk of confusing the reader or simply creating a cookbook.

Notwithstanding the above observation, I will create a series of follow-up articles that will further explore the subject matter. In particular, I think that review articles on related technologies (such as the one that you mentioned about the semantic web) and open source projects are of great importance to many fellow professionals. I intend to provide a number of such reports and reviews of specialized algorithms in the future. I am not sure what publication form that will take or when they will be available, but you can check my website for updates on that front; look at the section called "Publications".

This kind of publication cannot take the form of a book because the information within it must be recent in order to be relevant for decision-making support. The time scale of writing a book is much longer than the optimal time period for that purpose.

Once again, thank you for your feedback and the kind words.

Best regards,
maneo (2)
Re: A few thoughts after reading MEAP version
Thanks for the answer and the link to your website :).

I've prepared a short review; it is available at:

Adam Dudczak