Exploring Python packages
Before wrapping up this chapter, let's explore the Python packages required for data analysis and verify that they are available in the Jupyter Notebook app. These packages have evolved over time and are open source, so programmers can contribute to and improve the source code.
We will go into more depth on each individual package as we use its awesome features in future chapters. The focus in this chapter is to verify that the specific libraries are available. There are a few ways to do this, such as inspecting the installation folder on your workstation for specific files or running commands from a Python command line. I find the easiest method is to run a few simple commands in a new notebook.
Navigate back to the notebooks folder and create a new notebook file by clicking on the New menu and selecting Python 3 in the submenu to create a default Untitled notebook. To stay consistent with best practices, be sure to rename the notebook to verify_python_packages before moving forward.
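As a rough sketch of the command-based approach, the standard library's importlib.util.find_spec can report whether a package can be located without actually importing it. The is_available helper and the PACKAGES list are illustrative names, not part of the steps in this chapter.

```python
from importlib.util import find_spec

# Import names of the packages checked in this chapter.
PACKAGES = ["pandas", "numpy", "sklearn", "matplotlib", "scipy"]


def is_available(name):
    """Return True if Python can locate a top-level package called `name`."""
    return find_spec(name) is not None


for name in PACKAGES:
    status = "available" if is_available(name) else "NOT FOUND"
    print(f"{name:<12}{status}")
```

The same code works from a Python command line or from a notebook cell, since both use the same import machinery.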
Checking for pandas
The steps to verify whether each Python package is available are similar, with slight variations in the code. The first one will be pandas, which makes it easier to complete common data analysis techniques such as pivoting, cleaning, merging, and grouping datasets all in one place without going back to the source of record.
To verify whether the pandas library is available in Jupyter, follow these steps:
- Type in import pandas as pd in the In []: cell.
- Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  - Click the Run button from the toolbar.
  - Select Run Cells from the Cell menu.
  - Press the Shift + Enter or Ctrl + Enter keys.
- Type in the pd.__version__ command in the next In []: cell.
- Run the cell using the preferred method from step 2.
- Verify that the version number is displayed in the output cell, marked Out [].
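The two cells above could also be combined into one, as in this minimal sketch. The try/except wrapper and the pandas_version variable are additions for illustration; they simply turn a missing package into a readable message instead of a traceback.

```python
# Attempt the import; ImportError means pandas is not installed
# in the environment that this notebook kernel is using.
try:
    import pandas as pd
    pandas_version = pd.__version__
except ImportError:
    pandas_version = None

if pandas_version is None:
    print("pandas is not available")
else:
    print("pandas version:", pandas_version)
```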
Now repeat these steps for each of the remaining packages required in this book: numpy, sklearn, matplotlib, and scipy. Note that I import each library under a short alias; pd for pandas and np for numpy are the conventions you will see most often in the industry.
For example, pandas has been shortened to pd, so when you call features from the library, you can just use the shortcut name.
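To illustrate how the alias is used, the short sketch below calls a pandas feature through pd. The sample values and variable names are made up for illustration, and the guard around the import is only there so the cell still runs if pandas is missing.

```python
try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is not installed in this environment

if pd is not None:
    # Every pandas feature is reached through the alias, e.g. pd.Series.
    prices = pd.Series([1.5, 2.5, 3.0])
    total = prices.sum()
    print(total)
```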
Checking for NumPy
NumPy is a powerful and common mathematical extension of Python created to perform fast numeric calculations on a list of values known as an array. We will learn more about the power of NumPy features in Chapter 3, Getting Started with NumPy.
To verify whether the numpy library is available in Jupyter, follow these steps:
- Type in import numpy as np in the In []: cell.
- Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  - Click the Run button from the toolbar.
  - Select Run Cells from the Cell menu.
  - Press the Shift + Enter or Ctrl + Enter keys.
- Type in the np.__version__ command in the next In []: cell.
- Run the cell using the preferred method from step 2.
- Verify that the version number is displayed in the output cell, marked Out [].
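Beyond the version check, a one-line calculation confirms that the library actually works. The following is a minimal sketch with made-up sample values; the import guard is an addition so the cell does not fail if NumPy is missing.

```python
try:
    import numpy as np
except ImportError:
    np = None  # NumPy is not installed in this environment

if np is not None:
    print("NumPy version:", np.__version__)
    readings = np.array([1, 2, 3, 4])
    # Arithmetic applies to every element of the array at once.
    doubled = readings * 2
    print(doubled.tolist())
```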
Checking for sklearn
sklearn is an advanced open source data science library used for clustering and regression analysis. While we will not leverage all of the advanced capabilities of this library, having it installed will make future lessons easier.
To verify whether the sklearn library is available in Jupyter, follow these steps:
- Type in import sklearn as sk in the In []: cell.
- Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  - Click the Run button from the toolbar.
  - Select Run Cells from the Cell menu.
  - Press the Shift + Enter or Ctrl + Enter keys.
- Type in the sk.__version__ command in the next In []: cell.
- Run the cell using the preferred method from step 2.
- Verify that the version number is displayed in the output cell, marked Out [].
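One detail worth knowing is that the library is imported as sklearn but is distributed under the name scikit-learn. As a hedged sketch, the standard library's importlib.metadata module (Python 3.8 and later) can look up the installed distribution by that name; the sklearn_version variable is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# "scikit-learn" is the distribution name used by installers;
# "sklearn" is the name used in import statements.
try:
    sklearn_version = version("scikit-learn")
except PackageNotFoundError:
    sklearn_version = None

print("scikit-learn version:", sklearn_version or "not installed")
```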
Checking for Matplotlib
Matplotlib is a Python library used for data visualization and plotting charts.
To verify whether the matplotlib library is available in Jupyter, follow these steps:
- Type in import matplotlib as mp in the In []: cell.
- Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  - Click the Run button from the toolbar.
  - Select Run Cells from the Cell menu.
  - Press the Shift + Enter or Ctrl + Enter keys.
- Type in the mp.__version__ command in the next In []: cell.
- Run the cell using the preferred method from step 2.
- Verify that the version number is displayed in the output cell, marked Out [].
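Besides the version, a module's __file__ attribute shows where the package lives on disk, which is one way to inspect the installation folder without leaving the notebook. This is a minimal sketch; the install_folder variable and the import guard are additions for illustration.

```python
import os

try:
    import matplotlib as mp
except ImportError:
    mp = None  # Matplotlib is not installed in this environment

if mp is not None:
    print("Matplotlib version:", mp.__version__)
    # __file__ points at the package's __init__.py; its folder is the install location.
    install_folder = os.path.dirname(mp.__file__)
    print("Installed in:", install_folder)
```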
Checking for SciPy
SciPy is a library that's dependent on NumPy and includes additional mathematical functions used for the analysis of data.
To verify whether the scipy library is available in Jupyter, follow these steps:
- Type in import scipy as sc in the In []: cell.
- Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  - Click the Run button from the toolbar.
  - Select Run Cells from the Cell menu.
  - Press the Shift + Enter or Ctrl + Enter keys.
- Type in the sc.__version__ command in the next In []: cell.
- Run the cell using the preferred method from step 2.
- Verify that the version number is displayed in the output cell, marked Out [].
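As an optional final check, all five packages can be reported from a single cell. This is a sketch rather than part of the steps above; the collect_versions helper and the output layout are illustrative.

```python
import importlib

# Import names of the five packages verified in this section.
PACKAGES = ["pandas", "numpy", "sklearn", "matplotlib", "scipy"]


def collect_versions(names):
    """Map each name to its __version__ string, or None if the
    package cannot be imported or does not report a version."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


for name, found in collect_versions(PACKAGES).items():
    print(f"{name:<12}{found or 'not available'}")
```

Any package reported as not available needs to be installed before moving on to the later chapters.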
Once you have completed all of the steps, your notebook should look similar to the following screenshot: