Joey (17)
#1
In the LDA discussion, document-topic vectors and term-topic pairs were discussed,
but is it possible to determine the topic for a word in a given document?

For example, document A talks about "river banks" while document B talks about "The role of banks in finance". So will LDA allow the word "banks" to be assigned to two different topics?

When you have a new document C with the word "banks" in it, how would you know the topic that should be associated with it?
hobs (58)
#2
The word "bank" in your example would be assigned to all topics by PCA/LSA. The LSA term-topic matrix gives you a weight for each word in each topic. So I think what you're looking for is a sorted list of the highest-weight topics for a word like "bank". And yes, you're right, there will likely be more than one topic with a high weight for the word "bank". There are probably three or more topics for the three uses of the word "bank" that I can think of off the top of my head (a banking airplane is the third, less common, lower-weight one). I may try to add an example to this chapter showing how to do this. In the meantime, you can use the sort_values() method on a word-topic vector for "bank" if you first convert your term-topic matrix into a pandas DataFrame, or convert the row for "bank" into a pandas Series.
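
Here's a minimal sketch of that idea, not the book's code. The toy corpus, the hand-built count matrix, the choice of 3 topics, and the names `docs`, `term_topic`, and `ranked` are all assumptions made up for illustration. It uses a plain NumPy SVD as a stand-in for whatever LSA pipeline you already have, then ranks the topics for "bank" with sort_values():

```python
import numpy as np
import pandas as pd

# Hypothetical toy corpus: two "river" documents and two "finance" documents.
docs = [
    "the river bank was muddy after the flood",
    "fish swim near the river bank",
    "the bank approved the loan",
    "the bank raised interest on the loan",
]

# Term-document count matrix: one row per word, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
counts = pd.DataFrame(
    [[doc.split().count(word) for doc in docs] for word in vocab],
    index=vocab,
    columns=[f"doc{i}" for i in range(len(docs))],
)

# LSA via SVD: the left singular vectors hold one weight per word per topic.
U, s, Vt = np.linalg.svd(counts.to_numpy(dtype=float), full_matrices=False)

n_topics = 3  # assumption: keep the 3 strongest topics
term_topic = pd.DataFrame(
    U[:, :n_topics],
    index=vocab,
    columns=[f"topic{i}" for i in range(n_topics)],
)

# Row for "bank" as a pandas Series, ranked by the magnitude of its weights.
ranked = term_topic.loc["bank"].abs().sort_values(ascending=False)
print(ranked)
```

The ranking uses absolute values because the sign of each SVD component is arbitrary, so a large negative weight is as informative as a large positive one. If you care about the direction, drop the .abs() call and look at both ends of the sorted Series.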