Python basics: package management
Python is a very famous programming language for machine learning. In this article, I will introduce basic Python environment.
Glossary
I will introduce basic terms about Python package management.
- pip: A tool for package installation. It retrieves Python packages from PyPI. pip is gem command of Ruby.
- virtualenv: Package isolation tool for Python. It has similar function with bundler of Ruby, but it also has the function to change Python versions over 2.x and 3.x.
- venv: It is an official tool for package isolation introduced from Python 3.3. But, if you want to use Python 2.x or you are Debian/Ubuntu user, I recommend you to use virtualenv.
venv switches with a command like python3.5 -m venv some-awesome-env
, so it can’t handle over Python 2 and 3. venv installed by Debian/Ubuntu installs useless dependencies for other OSs, so I’m an Ubuntu user so I don’t use venv.
These are common tool sets for many Pythonistas. They are recommended tools of PyPA, a working group that maintains many of the relevant projects in Python packaging.
There is one more tool that is for the specific purpose.
- conda: conda is a tool for package management for scientific computation developed by Anaconda, Inc. It can manage not only Python but also R. PyData community loves conda.
I use conda for my work, but I recommend you to know the pros/cons of conda and virtualenv/venv and chose write tool for your purpose.
Installation of Python
Since it is 2017, Python beginners should use the latest version of Python 3. However, there are some cases to use Python 2.x for some painful reasons.
If you need to install Python 2 and 3, you can install multiple Python with package management tools like apt
or yum
. In Ubuntu, you can install Python 2.7 with apt install python-dev
, and you can install Python 3.6 via apt install python3-dev
.
After installation, you can see the Pythons under /usr/bin
:
/usr/bin/python #<- 2.7
/python2 #<- 2.7
/python2.7 #<- 2.7
/python3 #<- 3.6
/python3.6 #<- 3.6
If you’re macOS user, you can install both Python 2 and 3 via brew install
or port install
.
For Windows users, you can install Python 2 and 3 using official installer or Chocolatey. From Python 3.6 for Windows, there is py
command that switches Python version.
Caution: Never try to keep using System Python. System Python is often old, and it depends on system critical system such as yum. If you run sudo pip install
carelessly, there is a risk of destroying the environment of the OS itself.
Package management
As I mentioned, you should not do sudo pip install awesome-package
. Hence, Many important systems depend on system Python, don’t use sudo pip
.
If you’re a venv user, this tutorial will help you.
https://docs.python.jp/3/tutorial/venv.html
For virtualenv users, I will write a tutorial of virtualenv. It is a translation of the document written by aodag.
https://gist.github.com/aodag/bea141d255e22d204a2140fba658ebf2
Why should we use virtualenv/venv?
virtualenv avoids:
- Conflicting Python packages with system Python
- Conflicting packages between projects
- Losing sight of which project depends on those packages
Install virtualenv
First, you can install virtualenv
under user home directory.
$ wget https://bootstrap.pypa.io/get-pip.py
$ export PATH=”~/.local/bin/:$PATH”
$ python get-pip.py --user
$ pip install virtualenv --user
\# Windows user can isntall just via \`pip install\`
\> pip install virtualenv
With --user
option, you can install packages under user directory.
virtualenv
can create a Python virtual environment. Creating the environment under the project root is common.
Run virtualenv
as follows:
$ virtualenv venv -p python3.6
then, you can get virtual environment.
Since Python packages will be installed under the venv
directory, don’t forget to add venv directory into .gitignore
.
$ source venv/bin/activate
(venv) $
\# For Windows
\> . venv/Script/activate
Install Python packages via pip
You can install packages via pip
. After activating virtualenv/venv, pip will install packages under venv
directory.
(venv) $ pip install pyramid
If you want to install the specific version of the package, you can set version number:
(venv) $ pip install pyramid==1.8.1
Without version number, pip
will install latest stable version.
https://www.python.org/dev/peps/pep-0440/
You can list installed packages with pip list
command.
(venv) $ pip list
numpy (1.13.1)
pandas (0.20.3)
pip (9.0.1)
pkginfo (1.4.1)
pytest (3.2.0)
python-dateutil (2.6.1)
pytz (2017.2)
wheel (0.29.0)
Managing package version
From pip 7.1, we can fix version of packages with constraints.txt
. Using pip freeze
command, you can list packages with a version number.
(venv)$ pip freeze -l
numpy==1.13.1
pandas==0.20.3
pkginfo==1.4.1
pytest==3.2.0
python-dateutil==2.6.1
pytz==2017.2
(venv)$ pip freeze -l > constraints.txt
You should list your required packages into requirements.txt
,
(venv)$ cat requirements.txt
pandas
numpy
Then you can install required packages as follows:
(venv)$ pip install -r requirements.txt -c constraints.txt
Levelaging wheelhouse
Modern Python package is distributed by wheel format, which is the binary type format. There is another format, sdist, which is the source type format and it requires compile from source if it depends on native codes. I highly recommend using wheel format, because it is faster installation than sdist without compilation and even if you have an offline environment which unable to connect PyPI you can deploy the project easily.
Put all dependent .whl
format package files under wheelhouse
directory, you can install as follows:
$ pip install -r requirements.txt -c constraints.txt -f wheelhouse — no-index
-w
or --wheel-dir
option allows you to set wheel directory. -f
or--find-links
option uses wheelhouse directory primary.--no-index
option prevent to connect PyPI.
If you want to export all the dependencies into wheelhouse
directory, you can use pip wheel
command.
$ pip wheel -r requirements.txt -c constraints.txt -w wheelhouse
Should I use conda?
Anaconda is a Python distribution for scientific computing such as machine learning. Anaconda suit consists of Anaconda, which includes the recommended package and Miniconda, which is the minimum environment for conda and you can install only necessary packages yourself. Anaconda sometimes includes heavy packages. It used to include Django, so check the default package and use it properly.
Unlike virtualenv, Anaconda can create its original virtual environment. Characteristically, using the --copy
option makes it possible to copy system level libraries, .so, etc. without creating symbolic links. If you archive a set of virtual environments with zip or tar, you can use it on other machines.
$ conda create -n myenv --copy python=3.6
$ conda activate myenv
In other words, libraries, which are managed by OS level package management tools such as apt
, are also managed by conda. Conda has its own package repository different from PyPI and upload binaries for each OS on it. Since the same package, such as OpenCV, is registered in the repository by multiple users, you should care which package is the best one.
In many machine learning books, it is often written that conda can be used, but I think that it is better not to use it much outside Windows.
The reasons are as follows:
- In 2017, wheel is de facto for the binary package format, so conda’s original purpose, handling scientific packages like numpy, or Scipy, can be done without conda.
- conda will replace commands such as openssl/curl/python in macOS / Linux System (strictly speaking, conda will pass PATH first) [issue]
- Package developers are often not conda users, and they seem to be asked for support in an environment that they do not normally use, such as JRuby or Rubyinius (or Windows specific trouble).
- In the conda world, it is difficult to pass information that should be included in a build of a native extension (such as Cython dependence)
So I recommend using conda for Windows users or people do not develop heavily but want to experience machine learning. Or, put Miniconda under pyenv control. I use conda under Docker environment.
However, we can not install the package like Scipy on Windows via pip install
, you need to download wheel on your own. I think that this point is better for honest conda.
Historical details are detailed in this article. In short, because old binary format egg was not good, conda was created.
Conclusion
I introduced installation of Python and how to manage Python packages. I think we can manage Python packages via virtualenv/venv well without conda, but there is good case for conda to pack some environment with system libraries.
References
Original Japanese document: