405169 (2)
#1
I appreciated the appendix for Solr users, but I was surprised that there is no mention at all of the major issue with synonyms in Solr: it cannot properly handle multi-term synonyms. I would have really liked to see this big issue covered, along with suggested approaches. For some people it's important enough to be a reason to choose ES over Solr.
Doug Turnbull (15)
#2
Thanks for the feedback.

The problem is tied to the ideas in chapter 6 about term-centric query parsers. We discuss a bit how term-centric search needs to analyze the query the same way across fields, either by choosing the same analyzer or by pre-splitting the query on whitespace before analysis. The "pre-splitting" approach is what Solr edismax does, and it fouls up the analysis process: each whitespace-separated piece is analyzed on its own, so the synonym filter never sees adjacent terms together and cannot recognize a multi-term synonym.
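
To make the pre-splitting problem concrete, here is a minimal toy sketch in Python. It is not Solr or Lucene code: the `SYNONYMS` map, `synonym_filter`, `analyze`, and `analyze_presplit` are all hypothetical names made up for illustration, and the "analyzer" is just lowercasing plus a whitespace split. It only shows why a synonym filter that is handed one piece at a time can never match a two-word entry.

```python
# Toy synonym map (hypothetical): a run of adjacent tokens maps to extra tokens.
SYNONYMS = {("sea", "biscuit"): ["seabiscuit"]}


def synonym_filter(tokens):
    """Return the tokens plus synonyms for any adjacent run found in SYNONYMS."""
    out = list(tokens)
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            out.extend(SYNONYMS.get(tuple(tokens[start:end]), []))
    return out


def analyze(text):
    """Whole-string analysis: tokenize first, then run the synonym filter."""
    return synonym_filter(text.lower().split())


def analyze_presplit(query):
    """Pre-split on whitespace, then analyze each piece separately."""
    out = []
    for piece in query.split():
        # The filter only ever sees one token here, so a two-token
        # synonym entry can never match.
        out.extend(analyze(piece))
    return out


whole = analyze("sea biscuit")
presplit = analyze_presplit("sea biscuit")
print(whole)     # ['sea', 'biscuit', 'seabiscuit']
print(presplit)  # ['sea', 'biscuit']
```

Analyzing the whole string picks up the multi-term synonym; analyzing the pre-split pieces loses it, which is the behavior described above.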

But this is good feedback for the next edition. We can attempt to tie this more explicitly to Solr's problems. You may or may not know about these existing articles and projects around the problem:

http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
http://opensourceconnections.com/blog/2016/06/23/solr-multi-word-synonym-solutions-2016/
https://github.com/healthonnet/hon-lucene-synonyms
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

I'm personally hopeful that in the future Solr can take Elasticsearch's approach and give you more query-time control of query analysis!

405169 (2)
#3
Thanks for your reply. I've attempted to use some of these plugins, but ultimately I find that they fall apart, particularly if you are trying to improve relevancy. As an example, the hon_lucene plugin works globally to expand synonyms across your entire query. If you are trying to do phrase boosting, a single-word query term that has multi-term synonyms ends up with matches for the synonyms artificially boosted above matches for the search term that was actually entered. Not good. Right now this has ended up being the number one issue for us in achieving the relevancy results we need on Solr, so it was really frustrating not to see it even mentioned, let alone a viable solution offered. It's particularly frustrating to find out how long this has been an issue in Solr without being addressed.
Doug Turnbull (15)
#4
Just to let you know, I did end up releasing the plugin I use for Solr multi-term synonyms. You can read more here:

http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/

Cheers!
-Doug