· python

Python: scikit-learn: ImportError: cannot import name __check_build

In part 3 of Kaggle’s series on text analytics I needed to install scikit-learn and having done so ran into the following error when trying to use one of its classes:

>>> from sklearn.feature_extraction.text import CountVectorizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/__init__.py", line 37, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

This error doesn’t reveal very much but I found that when I exited the REPL and tried the same command again I got a different error which was a bit more useful:

>>> from sklearn.feature_extraction.text import CountVectorizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/__init__.py", line 38, in <module>
    from .base import clone
  File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/base.py", line 10, in <module>
    from scipy import sparse
ImportError: No module named scipy

The fix for this is now obvious:

$ pip install scipy

And I can now load CountVectorizer without any problem:

$ python
Python 2.7.5 (default, Aug 25 2013, 00:04:04)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn.feature_extraction.text import CountVectorizer
  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket