xinwuqingdao (11)
#1
Hi, I am using the following existing code:
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
// Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); // index each element
    if (index % BATCH_SIZE == 0) {
        fullTextSession.flushToIndexes(); // apply changes to indexes
        fullTextSession.clear(); // clear since the queue is processed
    }
}
transaction.commit();

but performance seems to be very slow. Can you give me some idea of what I can tune in order to get better performance?

Thank you.

--Xin Wu
s.grinovero (26)
#2
Re: the performance of indexing the existing data in database
Hello,
I think the code is fine. What do you mean by "very slow"?

Did you try to tune some batching options, like
"hibernate.search.default.batch.ram_buffer_size" ?

You should find out what is actually slowing you down. You might have a BATCH_SIZE that is too low, resulting in flushing to the index too often and in many roundtrips to the database.
You might also have too little memory (are you using the Xms and Xmx parameters of the JVM?).
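
One rough way to see where the time goes is to accumulate wall-clock time per phase of the loop and to print the heap limit the JVM is actually running with. The sketch below is only an illustration: the PhaseTimer class and the phase names are hypothetical and not part of Hibernate Search, and the Thread.sleep call merely stands in for a real call such as fullTextSession.flushToIndexes().

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates wall-clock time per named phase. */
public class PhaseTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    /** Adds the time elapsed since startNanos (a System.nanoTime() value) to the phase. */
    public void record(String phase, long startNanos) {
        long elapsed = System.nanoTime() - startNanos;
        totalNanos.merge(phase, elapsed, Long::sum);
    }

    /** Total time recorded for a phase, in milliseconds (0 if never recorded). */
    public long totalMillis(String phase) {
        return totalNanos.getOrDefault(phase, 0L) / 1_000_000L;
    }

    /** Maximum heap the JVM will try to use, in megabytes (governed by -Xmx). */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();

        long start = System.nanoTime();
        Thread.sleep(20); // stand-in for fullTextSession.flushToIndexes()
        timer.record("flush", start);

        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("flush (ms): " + timer.totalMillis("flush"));
    }
}
```

In the indexing loop, the same pattern would wrap results.next(), fullTextSession.index(...) and fullTextSession.flushToIndexes() separately, so the totals show whether loading from the database or writing to the index dominates.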

The type of objects (the schema/model) will also affect indexing behaviour: if, for each object loaded through the efficient scrollable result set, you have to load some other indexed collection, additional SELECT statements will be issued for each of your objects. You should monitor the database, and maybe enable SQL logging.
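
As a minimal sketch of enabling SQL logging, the standard Hibernate settings hibernate.show_sql and hibernate.format_sql can be switched on. The SqlLoggingSettings class is a hypothetical wrapper used only to keep the example self-contained; how the properties reach Hibernate depends on how your application is configured.

```java
import java.util.Properties;

/** Hypothetical holder for the Hibernate settings that turn on SQL logging. */
public class SqlLoggingSettings {

    /** Builds settings that make Hibernate echo each SQL statement to standard output. */
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("hibernate.show_sql", "true");   // print every SQL statement
        props.setProperty("hibernate.format_sql", "true"); // pretty-print for readability
        return props;
    }

    public static void main(String[] args) {
        // Prints the key=value pairs, e.g. for copying into hibernate.properties.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With logging on, a run over a few hundred Email rows should show whether each row triggers extra SELECT statements beyond the single scrolling query.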

In some cases, if the indexed entity relates to @IndexedEmbedded objects that are often repeated, it might help to enable the cache instead of disabling it.