alex27 (9)
#1
Given the sentence "Find textbooks with titles containing 'NLP', or 'natural' and 'language',or 'computational' and 'linguistics'.", aren't there 12 unique elements in the set, not 11?

In [1]: s="Find textbooks with titles containing 'NLP', or 'natural' and 'language',or 'computational' and 'linguistics'."

In [2]: s
Out[2]: "Find textbooks with titles containing 'NLP', or 'natural' and 'language',or 'computational' and 'linguistics'."

In [3]: len(set(s.split()))
Out[3]: 12

In [4]: import numpy as np

In [5]: np.arange(1, 12 + 1).prod()
Out[5]: 479001600
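The session above computes 12! with NumPy. As a cross-check, here is a minimal sketch that uses only the standard library. It assumes the quantity of interest is the number of distinct orderings of the 12 unique tokens, and `count_orderings` is a hypothetical helper, not something from the book.

```python
import math
from itertools import permutations

# The example sentence from the post, copied verbatim (note the missing space in "',or").
s = ("Find textbooks with titles containing 'NLP', or 'natural' and "
     "'language',or 'computational' and 'linguistics'.")

def count_orderings(tokens):
    """Hypothetical helper: number of orderings of the unique tokens, i.e. n!."""
    return math.factorial(len(set(tokens)))

unique_tokens = set(s.split())
n = len(unique_tokens)
print(n, count_orderings(s.split()))  # 12 479001600

# n! is the same product of 1..n that np.arange(1, n + 1).prod() computes.
assert math.prod(range(1, n + 1)) == math.factorial(n)

# Brute-force sanity check on a small set: 4 distinct items have 4! = 24 orderings.
assert len(list(permutations(["a", "b", "c", "d"]))) == math.factorial(4)
```

One reason to prefer `math.factorial`: it works on Python's arbitrary-precision integers, while `np.arange(...).prod()` uses a fixed-width integer dtype and can silently overflow once n! exceeds that dtype's range (with 64-bit integers, anything past 20!).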


hobs (58)
#2
You are right. If we'd used a tokenizer that split on punctuation or consistently used spaces between words in our example, there would be 12 unique words or tokens in the set.
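A minimal sketch comparing three ways to tokenize the example sentence: a plain whitespace split as in the original post, the same split after inserting the missing space after the comma, and a simple regular-expression tokenizer (`r"\w+"`). The regex tokenizer is an assumption for illustration, not the tokenizer the book uses. On this sentence all three give 12 unique tokens, but the tokens themselves differ.

```python
import re

# The example sentence from the post, copied verbatim.
s = ("Find textbooks with titles containing 'NLP', or 'natural' and "
     "'language',or 'computational' and 'linguistics'.")

# 1. Whitespace split, as in the original post.
ws_tokens = s.split()

# 2. Whitespace split after adding the missing space after the comma.
spaced_tokens = s.replace("',or", "', or").split()

# 3. Assumed regex tokenizer: keep runs of word characters, drop punctuation.
word_tokens = re.findall(r"\w+", s)

for name, toks in [("split()", ws_tokens),
                   ("spaced", spaced_tokens),
                   ("regex", word_tokens)]:
    print(f"{name}: {len(set(toks))} unique / {len(toks)} total")
# split(): 12 unique / 13 total
# spaced: 12 unique / 14 total
# regex: 12 unique / 14 total

# The only token the plain split has that the spaced split lacks is the
# merged "'language',or"; the spaced split has "'language'," and "or" instead.
print(sorted(set(ws_tokens) - set(spaced_tokens)))
```

Fuller tokenizers usually keep punctuation as separate tokens, which would add entries such as "," and "." to the set and change the unique count.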