Making your first NumPy array
The simplest way to create a one-dimensional array is a single command. After renaming your Jupyter notebook from Untitled to array_basics, import the numpy library into your active session by typing import numpy as np into the In [] cell and running it.
Next, assign the array object to a variable name so you can reference it in later commands. It is common to use single-character names such as a or x as shorthand for an array, but while you are getting started, let's use something more descriptive, such as my_first_array. To the right of the equals sign, we call the NumPy function np.array, followed by parentheses and square brackets, which enclose the values assigned to each element. After running the command, the last step is to print the array to confirm that the syntax is correct and the output matches the input. Once completed, the results should look similar to the following screenshot:
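As a minimal sketch of what those cells might contain, the following imports the library, creates the array, and prints it. The element values are an assumption chosen for illustration; the text only confirms that the first element is 1.

```python
import numpy as np

# Illustrative values (assumed); only the first element, 1, is confirmed by the text.
my_first_array = np.array([1, 2, 3, 4, 5])

# Printing the array lets you check that the output matches the input.
print(my_first_array)  # [1 2 3 4 5]
```

Note that NumPy prints the elements separated by spaces rather than commas, so the output will not look identical to the list you typed in.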
Now that we have an array available, let's walk through how you can verify the contents.
Useful array functions
Some useful commands to run against any array in NumPy to give you metadata (data about the data) are included here. The commands are being run specifically against the variable named my_first_array:
- my_first_array.shape: It provides the array dimensions.
- my_first_array.size: This shows the number of array elements (similar to the number of cells in a table).
- len(my_first_array): This shows the length of the array along its first dimension, which for a one-dimensional array equals the number of elements.
- my_first_array.dtype.name: This provides the data type of the array elements.
- my_first_array.astype(int): This returns a copy of the array converted to a different data type. In this example, the target is an integer type, which will display as int64 on most platforms.
If you run the preceding commands in Jupyter, your notebook should look similar to the following screenshot:
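The metadata commands listed above can be sketched as follows. The array values are the same assumed ones as before, and the array is redefined so the cell runs on its own. The as_float variable is a hypothetical name added only to show that astype returns a new array and leaves the original unchanged.

```python
import numpy as np

# Illustrative values (assumed), repeated so this cell is self-contained.
my_first_array = np.array([1, 2, 3, 4, 5])

print(my_first_array.shape)       # (5,) -> one dimension with five elements
print(my_first_array.size)        # 5
print(len(my_first_array))        # 5
print(my_first_array.dtype.name)  # an integer type; the exact width depends on the platform

# astype returns a converted copy; as_float is a hypothetical name for this sketch.
as_float = my_first_array.astype(float)
print(as_float.dtype.name)        # float64
print(my_first_array.dtype.name)  # still the original integer type
```

If you want to keep the converted values, assign the result of astype to a variable, as done with as_float here.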
To reference individual elements in the array, you use square brackets along with a whole number, which is called the array index. If you are familiar with the Microsoft Excel function VLOOKUP, referencing the index of the data you want to retrieve is a similar concept. The index of the first element in any NumPy array is 0, so if you wanted to display just the first value from the prior example, you would type the print(my_first_array[0]) command, which will output 1 on the screen, as shown in the following screenshot:
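A short sketch of zero-based indexing, again using assumed values whose first element is 1. The last line shows that the final element sits at index size minus 1, not at index size.

```python
import numpy as np

# Illustrative values (assumed), repeated so this cell is self-contained.
my_first_array = np.array([1, 2, 3, 4, 5])

# Index 0 is the first element.
print(my_first_array[0])  # 1

# The last valid index is one less than the number of elements.
last_index = my_first_array.size - 1
print(my_first_array[last_index])  # 5
```

Asking for an index equal to or greater than the array size raises an IndexError.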
Since the array we are working with in this example has numeric values, we can also run some mathematical functions against the values.
Some useful statistical functions you can run against numeric arrays that have a dtype of int or float include the following:
- my_first_array.sum(): Sums all of the element values
- my_first_array.min(): Provides the minimum element value in the entire array
- my_first_array.max(): Provides the maximum element value in the entire array
- my_first_array.mean(): Provides the mean or average, which is the sum of the elements divided by the count of the elements
If you run these statistical commands against my_first_array in your notebook, the output will look similar to the following screenshot:
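The statistical functions above can be sketched like this. The array values are assumed for illustration, and the variable names total, lowest, highest, and average are hypothetical names introduced for this example.

```python
import numpy as np

# Illustrative values (assumed), repeated so this cell is self-contained.
my_first_array = np.array([1, 2, 3, 4, 5])

total = my_first_array.sum()     # 1 + 2 + 3 + 4 + 5
lowest = my_first_array.min()
highest = my_first_array.max()
average = my_first_array.mean()  # sum divided by the number of elements

print(total)    # 15
print(lowest)   # 1
print(highest)  # 5
print(average)  # 3.0
```

Note that mean returns a floating-point value even when every element is an integer.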
As you can see from these few examples, there are plenty of useful functions built into the NumPy library that will help you with data validation and quality checks during the analysis process. In the Further reading section, I have placed a link to a printable one-page cheat sheet for easy reference.