381030 (1)
#1
book example: from nlpia.data import get_data

research-hack: from nlpia.data.loaders import get_data

Is "loaders" for the beta version?
hobs (57)
#2
Good catch. We need to be consistent. For now, using `nlpia.data.loaders.get_data` is the right way to import data loaders.
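Until the two import paths are reconciled, code that has to run against more than one nlpia release can try each path in turn. A minimal sketch, assuming only that `get_data` lives in one of the two modules mentioned in this thread; the helper name `import_first` is hypothetical and not part of nlpia:

```python
import importlib


def import_first(module_names, attr):
    """Return `attr` from the first module in `module_names` that provides it."""
    errors = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError(f"could not import {attr!r}; tried: " + "; ".join(errors))


# Demonstrated with standard-library modules so the sketch runs anywhere.
join = import_first(["no_such_module_xyz", "os.path"], "join")
print(join("a", "b"))
```

With nlpia installed, the equivalent call would be `get_data = import_first(["nlpia.data.loaders", "nlpia.data"], "get_data")`.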
475801 (2)
#3
Yet another small issue caused by renamed constants
>>> from nlpia.data.loaders import get_data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/devteam/Code/nlpia/nlpia/data/loaders.py", line 14, in <module>
    from pugnlp.constants import MAX_INT64 as MAX_INT
ImportError: cannot import name 'MAX_INT64'
>>>



Fix is easy:
$ git diff
diff --git a/nlpia/data/loaders.py b/nlpia/data/loaders.py
index a275a98..1879b35 100644
--- a/nlpia/data/loaders.py
+++ b/nlpia/data/loaders.py
@@ -11,8 +11,8 @@ from nlpia.constants import logging, DATA_PATH, BIGDATA_PATH

 from tqdm import tqdm
 from pugnlp.futil import path_status, find_files
-from pugnlp.constants import MAX_INT64 as MAX_INT
-from pugnlp.constants import MIN_INT64 as MIN_INT
+from pugnlp.constants import INT64_MAX as MAX_INT
+from pugnlp.constants import INT64_MIN as MIN_INT
 import pandas as pd
 import tarfile
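
If the code has to work with pugnlp releases from both before and after the rename, it can look each constant up under either name. A minimal sketch; `first_attr` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for old and new versions of `pugnlp.constants` (their values are assumed to be the signed 64-bit bounds, which this thread does not confirm):

```python
from types import SimpleNamespace


def first_attr(namespace, *names):
    """Return the first attribute of `namespace` found among `names`."""
    for name in names:
        if hasattr(namespace, name):
            return getattr(namespace, name)
    raise AttributeError(f"none of {names} found")


# Stand-ins for pugnlp.constants before and after the rename (assumed values).
old_constants = SimpleNamespace(MAX_INT64=2**63 - 1, MIN_INT64=-2**63)
new_constants = SimpleNamespace(INT64_MAX=2**63 - 1, INT64_MIN=-2**63)

for constants in (old_constants, new_constants):
    # Prefer the new name, fall back to the old one.
    MAX_INT = first_attr(constants, "INT64_MAX", "MAX_INT64")
    MIN_INT = first_attr(constants, "INT64_MIN", "MIN_INT64")
    print(MAX_INT, MIN_INT)
```

With pugnlp installed, the real module would be passed in place of the stand-ins.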

316854 (1)
#4
ModuleNotFoundError: No module named 'nlpia.data'
On page 57, after running
>>> from nlpia.data.loaders import get_data

I get the following error:

ModuleNotFoundError Traceback (most recent call last)
<ipython-input-139-151f96be4fbb> in <module>()
----> 1 from nlpia.data.loaders import get_data

ModuleNotFoundError: No module named 'nlpia.data'
hobs (57)
#5
Make sure you import the full module path before you call the `get_data()` function through that path. This worked for me on version 1.19 (April 16th):

>>> import nlpia.data.loaders
>>> nlpia.data.loaders.get_data()
      spam                                               text
0        0  Go until jurong point, crazy.. Available only ...
...
4836     0                         Rofl. Its true to its name

[4837 rows x 2 columns]
>>> nlpia.__version__
'0.1.27'
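
The error above names `nlpia.data` rather than `nlpia`, which means the top-level package was found but the subpackage was not, so it is worth checking which layout is actually installed. A minimal sketch using only the standard library; the helper name `locate` is hypothetical, and standard-library names stand in for the nlpia paths so the example runs without nlpia:

```python
import importlib.util


def locate(module_name):
    """Return the file a dotted module name resolves to, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return None
    if spec is None:
        return None
    return spec.origin


# Standard-library names are used here so the sketch runs without nlpia.
for name in ["json", "json.decoder", "json.no_such_submodule", "no_such_pkg.data"]:
    print(name, "->", locate(name))
```

With nlpia installed, `locate("nlpia")` returning a path while `locate("nlpia.data.loaders")` returns `None` would point to a release whose layout differs from the one the book targets.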
hobs (57)
#6
We really appreciate you finding these bugs and helping us build a high-quality book. We now have automated unit testing to try to prevent these kinds of bugs.
578126 (5)
#7
Thank you for writing such an energetic book and providing a virtual environment in which to run the examples. I'm having trouble, though, with Chapter 2's NB sentiment analyzer example.

After cloning the NLPIA repo and setting up my Python 3.6 conda env using the yml configuration on my Windows 7 machine, the nlpia package was not installed. I could not conda install it, even with the most up-to-date command on the Anaconda site. After pip-installing nlpia 0.1.31, I tried the following command:

import nlpia.data.loaders as nlpia_load


and obtained several error messages, ending with

AttributeError: module 'numpy' has no attribute 'float128'


NumPy 1.14.3 was installed per the yml, but could the version be incorrect? Any other suggestions?
hobs (57)
#8
Very weird. Thank you for sharing. My fresh install of the conda environment on Mac and Linux for the latest version of `nlpia` doesn't seem to have this problem. And the automated tests on Travis all pass, finally :) Could you provide more details about your OS and environment (`pip freeze`, please)? If you file this as an issue on GitHub, we'll be able to help you more quickly (I just now saw this).
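A `pip freeze` listing is the fullest report, but a short script can gather the details most relevant to this thread. A minimal sketch using only the standard library (`importlib.metadata` requires Python 3.8 or later, which is newer than the 3.6 environment discussed here); the function name `environment_report` and the default package list are illustrative choices:

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("nlpia", "pugnlp", "numpy", "pandas")):
    """Collect OS, Python, and installed package versions as a dict of strings."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The printed lines can be pasted straight into a forum post or issue report.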
112968 (6)
#9
I have the same problem on Windows 10 Pro 64-bit.

Using numpy 1.15.0
When importing get_data as in the subject field,
the AttributeError shows up: numpy has no attribute 'float128'

Anaconda is Python 3.6

I tried the same on macOS, everything went well. Only Windows gives me trouble... (how often have I heard that phrase uttered...?)

112968 (6)
#10
I think the problem is that numpy 1.15 does not have a float128 attribute, but numpy 1.14 does.
There is a lot of juggling to be done here, as my first attempt to just uninstall numpy also removed scipy and pandas; now I am putting things back...

112968 (6)
#11
Well, after reinstalling Anaconda and everything, I cannot get a combination of numpy versions that has the float128 attribute. Perhaps it's a combination of conda, numpy, the flavor of Windows, or the phase of the moon. My solution is going to be moving back to macOS, or installing a flavor of GNU/Linux virtually.
Cheers, G
hobs (57)
#12
Thank you so much for all the details on what you did to get things working. I'll try to duplicate your problem on Windows once we're done with the final copy edits. And I'll fix the installation instructions/scripts on GitHub and on PyPI once I figure it out.
I just added some instructions on how best to get Linux into your Windows world in Appendix A. Here's an excerpt, but it sounds like you know what you're doing already:


VirtualBox:

* Jason Brownlee (https://machinelearningmastery.com/linux-virtual-machine-machine-learning-development-python-3/)

* Jeroen Janssens (http://datasciencetoolbox.org/)

Docker Container:

* Vik Paruchuri (https://www.dataquest.io/blog/docker-data-science/)

* Jamie Hall (http://blog.kaggle.com/2016/02/05/how-to-get-started-with-data-science-in-containers/)

112968 (6)
#13
In case this helps: on Windows 10 Enterprise edition, I installed Anaconda Python 3.6 with the normal stuff: numpy, scipy, nltk.
However, that install, even without nlpia, has an issue with numpy:
Jupyter QtConsole 4.3.1
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

import numpy as np

np.float64
Out[2]: numpy.float64

np.float128
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-1496141e3f9f> in <module>()
----> 1 np.float128

AttributeError: module 'numpy' has no attribute 'float128'


This is important because:
from nlpia.data.loaders import get_data
C:\Users\johnsmith\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\utils.py:1209: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-151f96be4fbb> in <module>()
----> 1 from nlpia.data.loaders import get_data

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nlpia\data\loaders.py in <module>()
----> 1 from nlpia.loaders import *

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nlpia\loaders.py in <module>()
     37 
     38 from pugnlp.futil import path_status, find_files
---> 39 from pugnlp.util import clean_columns
     40 # from nlpia.constants import DEFAULT_LOG_LEVEL  # , LOGGING_CONFIG
     41 from nlpia.constants import DATA_PATH, BIGDATA_PATH

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pugnlp\util.py in <module>()
     62 
     63 import pandas as pd
---> 64 from .tutil import clip_datetime
     65 # import progressbar
     66 from fuzzywuzzy import process as fuzzy

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pugnlp\tutil.py in <module>()
     30 import pytz
     31 
---> 32 from .constants import DEFAULT_TZ
     33 from .constants import MAX_DATETIME, MIN_DATETIME, MAX_TIMESTAMP, MIN_TIMESTAMP, NAT
     34 import pugnlp.regexes as rex

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pugnlp\constants.py in <module>()
    130 
    131 # these may not all be the sames isinstance types, depending on the env
--> 132 FLOAT_TYPES = (float, np.float16, np.float32, np.float64, np.float128)
    133 FLOAT_DTYPES = tuple(set(np.dtype(typ) for typ in FLOAT_TYPES))
    134 INT_TYPES = (int, np.int0, np.int8, np.int16, np.int32, np.int64)

AttributeError: module 'numpy' has no attribute 'float128'

578126 (5)
#14
Sorry for not responding with the info you requested, but I channeled this Windows-specific problem into a positive inspiration to begin working in Linux. None of us should be expected to problem-solve that nonsense when there is so much work to do building language models. Despite some friction making Ubuntu work in my Windows-centric corporation, I'm happy with Ubuntu.
112968 (6)
#15
Fair enough. Alas, I don't get to tell my boss that he should not use Windows, as much as I'd like to do that.
NLPIA seemed like a good idea at the time... I'll figure out a workaround.
112968 (6)
#16
The problem seems to be with the constants in pugnlp, which explicitly includes np.float128 as one of its float types.
The problem is that numpy's types are not universally identical: they depend on the compiler and architecture combination. I would create FLOAT_TYPES dynamically by taking the output of this list comprehension (numpy imported as np) and filtering on the trailing digits:
In [7]: [x for x in dir(np) if x.startswith('float')]
Out[7]: ['float', 'float16', 'float32', 'float64', 'float_', 'float_power', 'floating']

Or simply check whether float128 is in the type dictionary:
In [8]: 'float32' in np.typeDict
Out[8]: True

In [9]: 'float128' in np.typeDict
Out[9]: False
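
The same idea can be written as a filter over the candidate names, so that `FLOAT_TYPES` only contains types the local build provides. A minimal sketch; `existing_attrs` is a hypothetical helper, and a `SimpleNamespace` stands in for a NumPy build without `float128` so the example runs without NumPy installed:

```python
from types import SimpleNamespace


def existing_attrs(namespace, names):
    """Return the attributes of `namespace` that exist, in the order of `names`."""
    return tuple(getattr(namespace, n) for n in names if hasattr(namespace, n))


# Stand-in for a NumPy build that lacks float128 (placeholder values, not real types).
fake_np = SimpleNamespace(float16="f2", float32="f4", float64="f8")

candidates = ["float16", "float32", "float64", "float128"]
# float128 is silently skipped because the stand-in does not define it.
FLOAT_TYPES = (float,) + existing_attrs(fake_np, candidates)
print(FLOAT_TYPES)
```

With NumPy imported as `np`, passing `np` instead of `fake_np` yields the real type objects. NumPy also exposes `np.longdouble`, which exists on every platform and maps to whatever extended-precision type the compiler offers.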

hobs (57)
#17
Regarding your boss and his requirement that you use Windows, you can always install VirtualBox and run everything on a "real" OS within your Windows machine :)
hobs (57)
#18
That's an excellent approach, as long as you exclude non-dtype objects. Here's a better way that returns a correct list of types and their names:

>>> FLOAT_TYPES = [t for t in set(np.typeDict.values()) if t.__name__.startswith('float')]
>>> FLOAT_TYPES
[numpy.float64, numpy.float128, numpy.float16, numpy.float32]
>>> FLOAT_TYPE_NAMES = [t.__name__ for t in FLOAT_TYPES]
>>> FLOAT_TYPE_NAMES
['float64', 'float128', 'float16', 'float32']


I added your issue to the GitHub issue tracker at https://github.com/totalgood/nlpia/issues/18. If you want to help us out and get full credit for your code, you can submit a PR at https://github.com/totalgood/nlpia/pulls.