Machine learning is about learning some properties of a data set and then testing those properties against another data set. A common practice in machine learning is to evaluate an algorithm by splitting a data set into two. We call one of those sets the training set, on which we learn some properties; we call the other set the testing set, on which we test the learned properties.
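The split described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not scikit-learn's own implementation; the function name split_train_test and its parameters test_fraction and seed are hypothetical names chosen for this example. scikit-learn ships its own helper for this purpose, train_test_split in sklearn.model_selection.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of samples and split it into (training, testing) lists.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test items become the testing set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))
train, test = split_train_test(data, test_fraction=0.25, seed=42)
print(len(train), len(test))  # prints "15 5"
```

Shuffling before splitting matters when the data is ordered, for example sorted by label; otherwise the testing set may not resemble the training set.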
Scikit-learn is a machine learning library built on top of SciPy that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities. It’s accessible to everyone, and reusable in various contexts.
scikit-learn is free and open source software.
To avoid polluting your system, we recommend installing scikit-learn with Anaconda, a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.
Download the Anaconda installer using wget:
$ wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
Run the shell script:
$ bash Anaconda3-2022.10-Linux-x86_64.sh
You’ll be asked to accept Anaconda’s license and to choose whether to initialize Anaconda3 by running conda init. For the changes to take effect, close and re-open your current shell.
Create a conda environment and activate it:
$ conda create --name scikit-learn
$ conda activate scikit-learn
Now we install scikit-learn into our conda environment with the command:
$ pip install -U scikit-learn
This installed joblib-1.2.0, scikit-learn-1.2.1, and threadpoolctl-3.1.0 in our conda environment.
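One way to confirm what ended up in the active environment is to query the package metadata from Python. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; the helper name installed_version is hypothetical. The versions it prints depend on when you run the install and may differ from those listed above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the three packages that the pip command above installed.
for name in ("scikit-learn", "joblib", "threadpoolctl"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Run it with the Python interpreter of the activated environment, so that the lookup inspects the environment you just installed into.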
Many popular Linux distributions also provide scikit-learn packages. For example, on Debian/Ubuntu scikit-learn can be installed with the command:
$ sudo apt-get install python3-sklearn python3-sklearn-lib python3-sklearn-doc
scikit-learn depends on NumPy, SciPy, joblib, and threadpoolctl; the minimum supported versions are detailed on the project’s website.
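The dependencies that an installed release declares can also be read from its package metadata. This is a minimal sketch, assuming Python 3.8 or later and that scikit-learn was installed with pip or conda so that its metadata is present; the helper name declared_requirements is hypothetical.

```python
from importlib import metadata


def declared_requirements(distribution):
    """Return the requirement strings a distribution declares, or [] if unknown.

    Hypothetical helper for illustration only.
    """
    try:
        # requires() returns None when a distribution declares no requirements.
        return metadata.requires(distribution) or []
    except metadata.PackageNotFoundError:
        return []


for requirement in declared_requirements("scikit-learn"):
    print(requirement)
```

The list may also include optional requirements, which carry an extra marker in the requirement string; only the entries without such a marker are needed for a basic installation.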