erinkay (2)
#1
Hello!



So I’ve been working through the chapter 6 examples with mixed results. In example 6.4 (Sample Vector creation from a Lucene index), I get the error message:
ERROR lucene.LuceneIterator: There are too many documents that do not have a term vector for desc-clustering
Exception in thread "main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for desc-clustering

This error gets resolved-ish when I add
--maxPercentErrorDocs 1
to the end of the command line. When I do this, I get the warning message:
WARN lucene.LuceneIterator: 80 documents do not have a term vector for description

When I get these warning messages, it does write some vectors, and I am able to use them in the k-means example and get normal-looking output in ClusterDump.
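
To make the behavior above concrete, here is a rough sketch of how an error budget like maxPercentErrorDocs might work, assuming the value is the fraction of documents in the index that are allowed to lack a term vector. This is only a simplified model of what the log messages suggest, not Mahout's LuceneIterator code; the class name, method name, and exact threshold comparison are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an error budget for documents without term vectors.
// Not Mahout's code: the names and the exact threshold rule are assumptions.
class TermVectorBudget {

    /**
     * Counts documents whose term vector is missing (null here) and throws
     * once that count exceeds maxPercentErrorDocs * totalDocs (truncated).
     */
    public static int countSkipped(List<String> termVectors, double maxPercentErrorDocs) {
        int maxErrorDocs = (int) (maxPercentErrorDocs * termVectors.size());
        int skipped = 0;
        for (String termVector : termVectors) {
            if (termVector != null) {
                continue; // this document would be written out as a vector
            }
            skipped++;
            if (skipped > maxErrorDocs) {
                throw new IllegalStateException(
                    "There are too many documents that do not have a term vector");
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Four documents, two of which have no term vector.
        List<String> docs = Arrays.asList("a b", null, "c", null);

        // A budget of 1 (100%) tolerates every missing vector.
        System.out.println("skipped: " + countSkipped(docs, 1.0));

        // A budget of 0 fails on the first missing vector.
        try {
            countSkipped(docs, 0.0);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Under this model, a budget of 1 only hides the problem: documents without a term vector are still skipped, so fewer vectors reach the later clustering steps.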



But when I try to label those clusters, the log looks normal and the output is structured correctly, but it doesn’t contain any data. In addition, the clusters contain a significantly smaller number of vectors (the largest has 256 vectors).



In addition, when I try the topic modeling example, I get many warning messages, which culminate in the following:

WARN lucene.LuceneIterator: 1000 documents do not have a term vector for desc-clustering
17/04/29 13:13:26 ERROR lucene.LuceneIterator: There are too many documents that do not have a term vector for desc-clustering
Exception in thread "main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for desc-clustering
    at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
    at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:127)




I’ve tried to resolve these issues on my own, but can’t quite figure them out. Does anyone have any resources or ideas about how to approach these problems?

Thanks and all the best!
erinkay (2)
#2
Would it be helpful if I included any more information? I'm still lost and haven't made any progress toward resolving this issue.