kiwi (31)

I was wondering whether Hibernate Search will integrate with Solr in the future.

I think that if Hibernate Search could integrate with Solr, it could take advantage of what Solr provides (highlighting, spell checking, etc.) while retaining what Hibernate Search is very good at, namely indexing complex POJOs, which the current version of Solr (1.4) cannot do well.

The disadvantages would probably be the additional HTTP request needed to get search results back from the Solr server (although SolrJ, the Java client, has a javabin binary format, which is more compact than the XML format), and the additional connection needed to get the data from the database.

These may be the disadvantages of integration, but I think the benefits outweigh them.

The HTTP request adds some small overhead (Solr has a query cache, which can reduce the overhead even further), but the application can scale easily when more performance is needed, because all you need to do is keep adding servers without modifying application code. Isn't that cool? :)

Now the clients (web apps/servers in this case) can send index updates to the same place, and the index is stored in a single place, which can be a single server or multiple servers (distributed search, which Solr supports).

The search function is now separate from the web server (the content generation function). Moving it to one or more separate boxes means the "search function" now has its own CPU, which can provide better performance.

Besides, I think it could integrate nicely with object databases, e.g. db4o, which lack full-text search features (index the complex data that needs to be stored, and return the id so db4o can fetch the object when searching).

*To handle massive indexing and searching, I was thinking Solr could be modified to use a REST style, which is stateless, with no servlet involved (with the current servlet version, 2.5, it can't take advantage of NIO; see the Jetty blog by Greg). It could be something like Restlet + JBoss Netty (which provides a small memory footprint and great performance), which can take advantage of Java NIO. That would make massive indexing and searching against a single server (or multiple servers, as scaling is easy) possible.

Any opinions or comments are welcome :)

Happy hacking!
emmanuel.bernard (101)
Re: hibernate + solr integration

First off, I wish we could easily switch from Lucene to Solr as a backend, but I don't think it's as easy as it looks. The two backends are different, and you would miss some of the declarative features of Hibernate Search.

On to your specific requests:

Highlighting and spell checking are on our roadmap. They are not hard to do per se. Designing a nice API is harder, and we have decided to work on a smoother (fluent) Lucene query API first.

Hibernate Search has been designed so that you can add additional servers without modifying your code, provided you use the master/slaves approach. And that works :) Granted, we don't do true distributed search and don't have any short-term plan for that yet.

"you can provide better performance" That's simply not true. I argue that it's better to have 4 boxes doing both search and application processing than 2 boxes doing search and 2 doing application processing and delegating search. Having homogeneous boxes reduces latency (distributed queries can change this parameter). Remember, CPUs are not like humans: they don't need specialization to go faster.

Massive indexing:
I don't know where the bottleneck would be in massive indexing involving a DB, HSearch and Solr. That's a good question. Today the bottleneck is the DB in the HSearch / DB pair. That's why there is a new scheme in Hibernate Search 3.2 to do parallel queries.
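
To illustrate the general idea of parallel queries when the DB is the bottleneck, here is a minimal plain-Java sketch. It is not the Hibernate Search 3.2 API: the class ParallelLoadSketch and the methods loadBatch and loadAll are hypothetical names, and loadBatch only pretends to query a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {

    /** Hypothetical loader: stands in for one database query fetching a batch of rows. */
    static List<String> loadBatch(List<Long> ids) {
        List<String> docs = new ArrayList<>();
        for (Long id : ids) {
            docs.add("doc-" + id);
        }
        return docs;
    }

    /** Splits the ids into batches and loads the batches on a fixed pool of threads. */
    static List<String> loadAll(List<Long> ids, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (int start = 0; start < ids.size(); start += batchSize) {
                List<Long> batch = ids.subList(start, Math.min(start + batchSize, ids.size()));
                pending.add(pool.submit(() -> loadBatch(batch)));
            }
            // Collect in submission order so the result order matches the input order.
            List<String> docs = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                docs.addAll(future.get());
            }
            return docs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Batch load failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 10; id++) {
            ids.add(id);
        }
        System.out.println(loadAll(ids, 3, 4));
    }
}
```

In a real setup each batch would be one query against the DB, and the loaded entities would then be handed to the indexer.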
kiwi (31)
Re: hibernate + solr integration
Hi, thanks for the comment.

From my point of view, both use Lucene index files, and the good thing about HS is that it can index complex POJOs (which Solr 1.4 still cannot do).

Instead of using JMS (I had a hard time making it work :) ), Solr is easy to deploy (just drop in a WAR file and modify some configuration).

More importantly, Solr can solve the HS clustering problem, where the slaves receive the index only every n seconds. (Solr lets all clients search in almost real time, there is only one copy to manage, and it is JMX enabled.)

I agree that designing a good API is not easy. I personally like to use the HS API since it integrates smoothly with the Hibernate API (rewriting the whole search is not easy if I switch directly to Solr, even if it is just one function).

So I was thinking: imagine using HS to index POJOs (this still happens automatically in the application), with the index data (maybe in the javabin format introduced by SolrJ) being sent to the Solr server (like sending it to JMS in clustering mode).

When searching, the HS API can query Solr (transparently, through some configuration), get the entity ids and continue processing (get the data from the DB and so on).

Basically it works like this: send the index data to Solr, request a search from Solr, which returns entity ids as the result, then get the entities from Hibernate by id.
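
The flow above could be sketched roughly as follows. This is only an illustration in plain Java under my own assumptions: RemoteIndex is an in-memory stand-in for the Solr server, EntityStore is a stand-in for the database behind Hibernate, and none of these names exist in Hibernate Search or SolrJ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SolrBackedSearchSketch {

    /** Hypothetical stand-in for the remote Solr server: maps terms to entity ids. */
    static class RemoteIndex {
        private final Map<String, Set<Long>> postings = new HashMap<>();

        /** Step 1: the application sends index data for one entity. */
        void add(long entityId, String text) {
            for (String term : text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(entityId);
                }
            }
        }

        /** Step 2: a search returns only entity ids, not the entities themselves. */
        List<Long> search(String term) {
            Set<Long> ids = postings.getOrDefault(term.toLowerCase(), Collections.<Long>emptySet());
            return new ArrayList<>(ids);
        }
    }

    /** Hypothetical stand-in for the database accessed through Hibernate. */
    static class EntityStore {
        private final Map<Long, String> rows = new HashMap<>();

        void save(long id, String title) {
            rows.put(id, title);
        }

        String load(long id) {
            return rows.get(id);
        }
    }

    /** Step 3: resolve the ids returned by the index into entities loaded by id. */
    static List<String> search(RemoteIndex index, EntityStore store, String term) {
        List<String> results = new ArrayList<>();
        for (Long id : index.search(term)) {
            String entity = store.load(id);
            if (entity != null) {
                results.add(entity);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        RemoteIndex index = new RemoteIndex();
        EntityStore store = new EntityStore();
        store.save(1L, "Hibernate Search in Action");
        index.add(1L, "Hibernate Search in Action");
        store.save(2L, "Solr Enterprise Search Server");
        index.add(2L, "Solr Enterprise Search Server");
        System.out.println(search(index, store, "search"));
    }
}
```

The point of the sketch is that the calling code only sees the search method, so in principle the index behind it could be local Lucene or a remote server without the caller changing.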

And I think the best thing is that the API could stay almost unchanged (so the API is still integrated with Hibernate, and switching the mode just needs some configuration).

The architecture would look a bit different compared to HS clustering: the search function is delegated to a Solr server, and you can add additional servers for performance (like partitioning by function).

Compared to the current HS clustering approach, it provides near-real-time index data while still retaining the goal of HS clustering: scalability. It also allows sharing the index over HTTP (with the current HS approach, the indexes need to be on the same LAN).

As for performance, maybe I am wrong, as my thinking may contain bugs too :)

This is just my personal thought; please correct me if I am wrong :)

Happy hacking!
emmanuel.bernard (101)
Re: hibernate + solr integration
As I said, it's not as easy as it looks at first glance. But if someone wants to work on that, we can try and remove all the obstacles.

PS: Not sure what is hard with JMS; it's 5 minutes of work on most app servers.
PPS: We have a pure JGroups solution in trunk that will work as an alternative to JMS.