kiwi (31)
#1
hi,

I'm wondering whether there is any performance difference between using a filter and using a query.

Say I want to find x and y, where x should be between a and b, and y between c and d.

So I have this for the query version:

// Simplified: this code only demonstrates the intention.

@Factory
public Filter createFilter()
{
    query1 = [a to b]
    query2 = [c to d]

    booleanQuery = ...
    booleanQuery.add(query1, MUST)
    booleanQuery.add(query2, MUST)

    return new QueryWrapperFilter(booleanQuery)
}

and the other uses only filters:

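@Factory
public Filter createFilter()
{
    Filter rangeFilter1 = new RangeFilter("field.name", a, b, true, true);
    Filter rangeFilter2 = new RangeFilter("field.name", c, d, true, true);

    // ChainedFilter combines its filters with OR by default; pass AND so both ranges must match
    return new ChainedFilter(new Filter[]{rangeFilter1, rangeFilter2}, ChainedFilter.AND);
}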

1) Is there any performance difference between the two?

2) Which way is recommended?

3) Is there any other difference between the two methods? Why doesn't the second one suffer from the TooManyClauses exception? (I just can't work it out; can anyone explain further?)

4) If I use ConstantScoreRangeQuery for the first one, the outcome of both methods is the same, right? (I mean no exception for either.)

My intent is to create a filter for filtering the results; I'm just not sure which way is better.

Any ideas?

kiwi
-----
happy hacking !

Message was edited by:
kiwi

emmanuel.bernard (101)
#2
Re: all filter or all query + QueryWrapperFilter for filter impl ?
For a small range (i.e. one with a low number of discrete matching values in the index), there shouldn't be much difference; I would say that the query is faster and easier.
Unless you plan to filter by these exact range values all the time: in that case, filter + cache is better.
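
To illustrate the filter + cache point, here is a minimal sketch in plain Java. It is not the Hibernate Search or Lucene caching code: the class name, the toy index (a TreeMap from term to document ids) and the computations counter are all made up for illustration. The idea is only that the bit set for a given pair of range bounds is built once and reused on later calls.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: caches the matching-document bit set per (lower, upper) range.
class CachedRangeFilterSketch {
    private final TreeMap<String, int[]> index; // toy index: term -> document ids
    private final int maxDoc;
    private final Map<String, BitSet> cache = new HashMap<String, BitSet>();
    int computations = 0; // how many times a bit set was actually built

    CachedRangeFilterSketch(TreeMap<String, int[]> index, int maxDoc) {
        this.index = index;
        this.maxDoc = maxDoc;
    }

    // Returns the documents whose term lies in [lower, upper], both bounds inclusive.
    // The returned BitSet is shared with the cache, so callers should not modify it.
    BitSet bits(String lower, String upper) {
        String key = lower + '\u0000' + upper;
        BitSet cached = cache.get(key);
        if (cached == null) {
            computations++;
            cached = new BitSet(maxDoc);
            for (int[] docs : index.subMap(lower, true, upper, true).values()) {
                for (int doc : docs) {
                    cached.set(doc);
                }
            }
            cache.put(key, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = new TreeMap<String, int[]>();
        index.put("a", new int[] {0, 3});
        index.put("b", new int[] {1});
        index.put("f", new int[] {2});

        CachedRangeFilterSketch filter = new CachedRangeFilterSketch(index, 4);
        System.out.println(filter.bits("a", "c")); // {0, 1, 3}
        System.out.println(filter.bits("a", "c")); // same bit set, served from the cache
        System.out.println(filter.computations);   // 1
    }
}
```

The second call with the same bounds does no index work at all, which is why caching only pays off when the same range values come back repeatedly.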

The latter is not affected by TooManyClauses because the filter applies the range scoping like, well... a filter. In the former case, Lucene looks up all the values matching the range and creates a giant query matching each exact value.

query1 = [a to e] => a OR b OR c OR d OR e
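
A rough sketch of that difference in plain Java follows. It only simulates the idea and is not Lucene code: the toy index, the method names and the use of IllegalStateException in place of TooManyClauses are assumptions made for illustration. The limit of 1024 mirrors the default maximum clause count of Lucene's BooleanQuery.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical simulation of "expand the range into OR clauses" versus "mark matching docs in a bit set".
class RangeExpansionSketch {
    static final int MAX_CLAUSES = 1024; // mirrors BooleanQuery's default limit

    // Query style: one OR clause per distinct term found in the range.
    static List<String> expandToClauses(TreeMap<String, int[]> index, String lower, String upper) {
        List<String> clauses = new ArrayList<String>();
        for (String term : index.subMap(lower, true, upper, true).keySet()) {
            if (clauses.size() >= MAX_CLAUSES) {
                // Stand-in for Lucene's TooManyClauses exception
                throw new IllegalStateException("more than " + MAX_CLAUSES + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Filter style: walk the same terms but only set bits, so no clause limit applies.
    static BitSet filterBits(TreeMap<String, int[]> index, String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : index.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Toy index: document i holds the single zero-padded term for i.
    static TreeMap<String, int[]> buildIndex(int size) {
        TreeMap<String, int[]> index = new TreeMap<String, int[]>();
        for (int i = 0; i < size; i++) {
            index.put(String.format(Locale.ROOT, "%05d", i), new int[] {i});
        }
        return index;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = buildIndex(2000);
        System.out.println(expandToClauses(index, "00000", "00004")); // [00000, 00001, 00002, 00003, 00004]
        System.out.println(filterBits(index, "00000", "01999", 2000).cardinality()); // 2000
        try {
            expandToClauses(index, "00000", "01999");
        } catch (IllegalStateException e) {
            System.out.println("query expansion failed: " + e.getMessage());
        }
    }
}
```

Both methods enumerate the same terms; only the query-style expansion has to hold one clause per term, which is where the limit bites.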

As for your question about score, I think so, but John is better qualified.
john.griffin (36)
#3
Kiwi,

I have not used filters much in my work so I can't vouch for the absence of the exception.

Since your question deals solely with performance and not with the score returned by individual matches, I recommend that you always use a ConstantScoreRangeQuery when a range of values is part of your query. I have found that they are ~10 times faster than a standard RangeQuery. If you are using a QueryParser, then the ConstantScoreRangeQuery is automatically generated for you. If you are constructing your range queries manually, then use the CSRQ instead of the RQ.

As for scoring effects, the CSRQ score for all documents matching the range is equal to the boost factor applied to this clause. So, documents matching the range could have their score increased or decreased by the boost factor you set.
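
As a toy illustration of constant scoring (plain Java, not Lucene's implementation; the class and method names are hypothetical, and any normalization Lucene may apply on top of the boost is ignored), every document in the matching set simply receives the boost as its score:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: each matching document gets the same score, namely the boost.
class ConstantScoreSketch {
    static Map<Integer, Float> score(BitSet matches, float boost) {
        Map<Integer, Float> scores = new LinkedHashMap<Integer, Float>();
        // Iterate over the set bits, i.e. the matching document ids, in ascending order
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            scores.put(doc, boost);
        }
        return scores;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(4);
        System.out.println(score(matches, 2.0f)); // {1=2.0, 4=2.0}
    }
}
```

No per-document weight computation is involved, only a membership test and a fixed value.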

Hope this helps.
kiwi (31)
#4
hi, thanks for the reply,

I just checked the ConstantScoreRangeQuery source; indeed, it wraps a RangeFilter to do the work and sets the boost for the query.

Hence I think there should not be much difference between the two methods, except that ConstantScoreRangeQuery lets you set a boost factor and RangeFilter does not.

regards,
kiwi
kiwi (31)
#5
Just another question: when a query is passed to QueryWrapperFilter, what actually happens?

It seems that QueryWrapperFilter rewrites the query: if I use CSRQ, it uses the RangeFilter, but if I use RQ, it adds a lot of term clauses ORed together in a BooleanQuery.

So in my case, it makes a difference whether I use RQ or CSRQ.

Am I right?

kiwi
john.griffin (36)
#6
Kiwi,

Sorry it took so long to reply. I'm not quite sure what you are asking with this last question. When you say it makes a difference for you whether you use RQ or CSRQ, what are you asking? In which way does it make a difference for you?

John Griffin
kiwi (31)
#7
hi, sorry for the confusion.

I think that when we use ConstantScoreRangeQuery, the query is rewritten to a RangeFilter (with the boost factor applied), whereas a RangeQuery is rewritten to a large BooleanQuery that ORs together all the matching values.

Hence the difference between RangeQuery and ConstantScoreRangeQuery is that with the latter you don't get any exception and performance is better.

I'm just not sure if my assumption is correct.

I stepped through the code in the debugger to try to understand how it works, since I'm not so familiar with Lucene yet :)

regards,
kiwi
john.griffin (36)
#8
Kiwi,

I can't vouch for the absence of the exception being due to the different query types, but most of the performance difference comes from the fact that with CSRQ you do not have to process weights the way a standard RQ does (it's a fairly long computation for each document). For documents matching a CSRQ, all you basically do is assign the boost as the score.

We discuss this at length in chapter 12 of Hibernate Search in Action.

Hope this helps.

John Griffin
kiwi (31)
#9
hi,

thanks, I think I'm going to look at that.

regards,
kiwi