Practical Data Analysis Using Jupyter Notebook

Exploring Python packages

Before wrapping up this chapter, let's explore the different Python packages required for data analysis and validate that they are available to use in the Jupyter Notebook app. These packages have evolved over time and are open source, so programmers can contribute to and improve the source code.

The versions of the Python packages you get depend on when you installed them with conda or pip (the package managers) on your machine, and they will increase over time. If you receive errors when running commands, check that your installed versions match the ones used in this book.

We will go into more depth about each individual package as we use their awesome features in future chapters. The focus in this chapter is to verify that the specific libraries are available. There are a few different approaches, such as inspecting the installation folder for specific files on your workstation or running commands from a Python command line. I find the easiest method is to run a few simple commands in a new notebook.
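As a quick alternative to inspecting the installation folder by hand, the following is a minimal sketch that asks Python whether it can locate each library without importing it. The helper name is_available is hypothetical and is not part of this book's notebook; importlib.util.find_spec comes from the Python standard library.

```python
import importlib.util


def is_available(module_name):
    """Return True if Python can locate the module without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report availability for the libraries checked in this section.
for name in ["pandas", "numpy", "sklearn", "matplotlib", "scipy"]:
    status = "available" if is_available(name) else "not found"
    print(f"{name}: {status}")
```

This only confirms that a library can be found. It does not report the version, which is why the steps that follow import each library and read its __version__ attribute.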

Navigate back to the notebooks folder and create a new notebook file by clicking on the New menu and selecting Python 3 in the submenu to create a default Untitled notebook. To stay consistent with best practices, be sure to rename the notebook verify_python_packages before moving forward.

Checking for pandas

The steps to verify whether each Python package is available are similar with slight variations to the code. The first one will be pandas, which will make it easier to complete common data analysis techniques such as pivoting, cleaning, merging and grouping datasets all in one place without going back to the source of record.

To verify whether the pandas library is available in Jupyter, follow these steps:

  1. Type in import pandas as pd in the In []: cell.
  2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  • Click the Run button from the toolbar.
  • Select Run Cells from the Cell menu.
  • Press the Shift + Enter or Ctrl + Enter keys.
  3. Type in the pd.__version__ command in the next In []: cell.
  4. Run the cell using the preferred method from step 2.
  5. Verify the output cell displayed as Out [].
The version of pandas should be 0.18.0 or greater.
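For reference, the two cells from the preceding steps can be sketched as a single cell. The try/except wrapper and the pandas_version name are additions for illustration and are not part of the original steps. They are assumed here so that a missing library prints None instead of stopping with a traceback.

```python
# Cell 1: import pandas under its conventional alias.
try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is not installed in this environment

# Cell 2: read the installed version (None when pandas is missing).
pandas_version = pd.__version__ if pd is not None else None
print(pandas_version)
```

In a notebook, typing pd.__version__ on its own as the last line of a cell displays the value in the Out [] area, so the print call is only needed when running the code as a script.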

Now you will repeat these steps for each of the following required packages used in this book: numpy, sklearn, matplotlib, and scipy. Note that I have used the commonly known shortcut names for each library to make it consistent with best practices found in the industry.

For example, pandas has been shortened to pd, so as you call features from each library, you can just use the shortcut name.
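Repeating the check for every package can also be sketched as a loop. The ALIASES mapping below records the shortcut names used in this chapter, and the installed_versions helper is a hypothetical name introduced for this sketch. Because importlib.import_module takes the import name as a string, the aliases are shown only for reference.

```python
import importlib

# Import name -> shortcut name used in this chapter.
ALIASES = {
    "pandas": "pd",
    "numpy": "np",
    "sklearn": "sk",
    "matplotlib": "mp",
    "scipy": "sc",
}


def installed_versions(aliases):
    """Map each import name to its __version__, or None if the import fails."""
    versions = {}
    for module_name in aliases:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            versions[module_name] = None
        else:
            versions[module_name] = getattr(module, "__version__", None)
    return versions


for name, version in installed_versions(ALIASES).items():
    print(f"{name} (as {ALIASES[name]}): {version or 'not installed'}")
```

The loop is a convenience for a quick overview. Working through the individual steps for each library is still useful practice with notebook cells.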

Additional packages can and should be used depending on the type of analysis required, variations of the data input, and advancement of the Python ecosystem.

Checking for NumPy

NumPy is a powerful and common mathematical extension of Python created to perform fast numeric calculations against a list of values that is known as an array. We will learn more about the power of NumPy features in Chapter 3, Getting Started with NumPy.

To verify whether the numpy library is available in Jupyter, follow these steps:

  1. Type in import numpy as np in the In []: cell.
  2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  • Click the Run button from the toolbar.
  • Select Run Cells from the Cell menu.
  • Press the Shift + Enter or Ctrl + Enter keys.
  3. Type in the np.__version__ command in the next In []: cell.
  4. Run the cell using the preferred method from step 2.
  5. Verify the output cell displayed as Out [].
The version of NumPy should be 1.10.4 or greater.

Checking for sklearn

sklearn is an advanced open source data science library used for clustering and regression analysis. While we will not leverage all of the advanced capabilities of this library, having it installed will make it easier for future lessons.

To verify if the sklearn library is available in Jupyter, follow these steps:

  1. Type in import sklearn as sk in the In []: cell.
  2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  • Click the Run button from the toolbar.
  • Select Run Cells from the Cell menu.
  • Press the Shift + Enter or Ctrl + Enter keys.
  3. Type in the sk.__version__ command in the next In []: cell.
  4. Run the cell using the preferred method from step 2.
  5. Verify the output cell displayed as Out [].
The version of sklearn should be 0.17.1 or greater.
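One detail worth knowing if the import fails: the library is imported as sklearn, but the package that conda and pip install is named scikit-learn. The following is a minimal sketch that looks up the installed package by that name using importlib.metadata from the standard library (Python 3.8 or later is assumed). The distribution_version helper is a hypothetical name used only for this example.

```python
from importlib import metadata


def distribution_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The library is imported as "sklearn" but installed as "scikit-learn".
print(distribution_version("scikit-learn"))
```

If this prints None, the package is not installed in the environment that the notebook is running in.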

Checking for Matplotlib

Matplotlib is a Python library used for data visualization and plotting charts.

To verify whether the matplotlib library is available in Jupyter, follow these steps:

  1. Type in import matplotlib as mp in the In []: cell.
  2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  • Click the Run button from the toolbar.
  • Select Run Cells from the Cell menu.
  • Press the Shift + Enter or Ctrl + Enter keys.
  3. Type in the mp.__version__ command in the next In []: cell.
  4. Run the cell using the preferred method from step 2.
  5. Verify the output cell displayed as Out [].
The version of matplotlib should be 1.5.1 or greater.

Checking for SciPy

SciPy is a library that's dependent on NumPy and includes additional mathematical functions used for the analysis of data.

To verify whether the scipy library is available in Jupyter, follow these steps:

  1. Type in import scipy as sc in the In []: cell.
  2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
  • Click the Run button from the toolbar.
  • Select Run Cells from the Cell menu.
  • Press the Shift + Enter or Ctrl + Enter keys.
  3. Type in the sc.__version__ command in the next In []: cell.
  4. Run the cell using the preferred method from step 2.
  5. Verify the output cell displayed as Out [].
The version of scipy should be 0.17.0 or greater.
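With all five minimum versions now stated, the "or greater" comparison can be sketched in code. Comparing version strings as plain text is unreliable because "1.9.0" sorts after "1.10.4" alphabetically, so this sketch converts them to tuples of integers first. The helper names version_tuple and meets_minimum are hypothetical, and the parsing is deliberately simple: it ignores suffixes such as rc1 and assumes both versions have the same number of parts.

```python
import importlib
import re

# Minimum versions stated in this section.
MINIMUMS = {
    "pandas": "0.18.0",
    "numpy": "1.10.4",
    "sklearn": "0.17.1",
    "matplotlib": "1.5.1",
    "scipy": "0.17.0",
}


def version_tuple(text):
    """Turn '1.10.4' into (1, 10, 4); stop at a part with no leading digit."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numerically so that '1.10.4' counts as newer than '1.9.0'."""
    return version_tuple(installed) >= version_tuple(minimum)


for name, minimum in MINIMUMS.items():
    try:
        installed = importlib.import_module(name).__version__
    except ImportError:
        print(f"{name}: not installed (need {minimum} or greater)")
        continue
    verdict = "OK" if meets_minimum(installed, minimum) else "too old"
    print(f"{name}: {installed} ({verdict}, need {minimum} or greater)")
```

Any library reported as too old or not installed can be updated with conda or pip before moving on to the next chapter.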

Once you have completed all of the steps, your notebook should look similar to the following screenshot: