NumPy Array in Python: First Step into Numerical Python

NumPy arrays are the basic building blocks of numerical computation in Python. They come from the NumPy library, an essential Python package for scientific computing. A NumPy array, also called an ndarray, is an efficient multi-dimensional (N-dimensional) array object that provides the foundation for numerical operations. This makes NumPy arrays essential to learn for data analysis, machine learning, and scientific computing.

In this detailed article, we will explore what NumPy arrays are, why they are useful, and the core features of NumPy arrays.

Before we start working with NumPy, it must be installed on the machine. Read the article Install NumPy on Windows, Linux and Mac for instructions.


NumPy Array in Python

NumPy stands for Numerical Python and is a fundamental library for scientific computing in Python. At its core, NumPy introduces arrays, which are powerful data structures for efficient numerical operations.

A NumPy array is an N-dimensional array (ndarray) container that lets us store and manipulate large datasets efficiently. An array is a grid of values, all of the same data type, and can have any number of dimensions (1-D, 2-D, 3-D, etc.). Compared with standard Python lists, NumPy arrays have the following key characteristics:

Homogeneous – All elements in a NumPy array (ndarray) are of the same data type, such as all integers or all floats. This property allows for more memory-efficient storage and faster computation.
Multidimensional – NumPy arrays can have one or more axes (dimensions), which is why they are referred to as ndarray (N-dimensional array). For example:
- A 1-D array is like a simple list => [23, 25, 38, 49].
- A 2-D array could be thought of as a matrix or a table or nested list => [[21, 23], [36, 48]].
- A 3-D array could be a stack of 2-D arrays, like a cube or a set of matrices.
Fixed Size – The size of a NumPy array cannot be changed once it is created. To get a different size, we create a new array.
Performance Optimization – The core of NumPy is written in C. NumPy arrays are generally much faster than standard Python lists, and the difference grows with the size of the numerical task. This comes from the compact memory layout and from operations that run over that memory in compiled code, which lets them benefit from low-level system optimizations.
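The fixed-size property can be illustrated with a minimal sketch. It assumes np.append as the way to "add" an element; that function does not grow the array in place but returns a newly allocated one. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([23, 25, 38, 49])

# np.append does not resize arr; it allocates and returns a new array
bigger = np.append(arr, 57)

print(arr.size)       # 4 (the original array is unchanged)
print(bigger.size)    # 5
print(bigger is arr)  # False (a different array object)
```

Because every such call copies all elements, repeatedly appending inside a loop is usually slower than collecting values in a Python list and converting once at the end.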

Why Use NumPy Arrays

Speed and Performance

Vectorized Operations – A vectorized operation applies a calculation to the entire array at once, so we don’t need slow Python loops. This significantly reduces execution time.
Memory Management – NumPy array elements are stored in a contiguous block of memory. All elements can be read from a single block, and iteration simply advances a pointer through that memory. This makes NumPy arrays more efficient than standard lists, especially for large datasets.
C language based Implementation – Most of the fundamental operations in NumPy are written in C. They run as compiled code directly on the array's memory instead of being interpreted step by step as Python bytecode, which gives a substantial performance boost compared with the same operation in pure Python.
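The speed difference can be checked with a rough timing sketch like the one below. The array size, the repeat count, and the helper names square_with_loop and square_vectorized are arbitrary choices for illustration. The measured times depend on the machine, so no specific numbers are claimed here.

```python
import timeit
import numpy as np

n = 100_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

def square_with_loop():
    # Pure Python: one interpreted multiplication per element
    return [v * v for v in values]

def square_vectorized():
    # NumPy: a single call that loops in compiled code
    return arr * arr

loop_time = timeit.timeit(square_with_loop, number=20)
vec_time = timeit.timeit(square_vectorized, number=20)

print(f"Python loop: {loop_time:.4f} s")
print(f"NumPy:       {vec_time:.4f} s")
```

Both functions compute the same values; only the place where the loop runs differs.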

Flexibility and Convenience

Various Operations – NumPy supports a wide variety of mathematical, statistical, and logical operations on arrays, including basic arithmetic (addition, subtraction, multiplication, division), indexing, slicing, and reshaping. This gives developers greater flexibility and convenience.
Built-in Functions – NumPy has built-in functions like np.arange(), np.linspace(), and np.zeros() for easy creation and manipulation of arrays.
Interoperability – NumPy arrays are compatible with a wide range of scientific computing and machine learning libraries such as Pandas, SciPy, TensorFlow, and Scikit-learn. Many data science workflows are built on NumPy arrays, which makes them a versatile tool for data analysis and modeling.
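As a short illustration of the built-in creation functions named above, the following sketch calls np.arange(), np.linspace(), and np.zeros(). The argument values are arbitrary.

```python
import numpy as np

steps = np.arange(0, 10, 2)        # start, stop (exclusive), step
points = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced values, stop included
blank = np.zeros((2, 3))           # 2x3 array filled with 0.0

print(steps)   # [0 2 4 6 8]
print(points)  # [0.   0.25 0.5  0.75 1.  ]
print(blank.shape, blank.dtype)  # (2, 3) float64
```

Note the difference in how the end point is treated: np.arange() excludes the stop value, while np.linspace() includes it by default.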

Key Features of NumPy Arrays

NumPy arrays (ndarray) are fundamental to numerical computing in Python. They provide an efficient and flexible way to work with large datasets and perform advanced mathematical operations. The core features of NumPy arrays are:


Homogeneous Data Type

All elements in a NumPy array must be of the same data type (e.g., integers, floats, etc.). This homogeneity allows NumPy to optimize memory usage and computation. Unlike Python lists, which can hold different data types in a single list, NumPy arrays store elements efficiently, resulting in faster operations and lower memory overhead.

# Homogeneous Data Type
import numpy as np

integer_array = np.array([14, 22, 35, 45])  # All integers
string_array = np.array(["shbytes.com", "NumPy", "Python", "Power BI"])  # All strings
boolean_array = np.array([False, True, True, True])  # All booleans

print(integer_array)  # [14 22 35 45]
print(string_array)   # ['shbytes.com' 'NumPy' 'Python' 'Power BI']
print(boolean_array)  # [False  True  True  True]
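Homogeneity also means that NumPy converts values to one common type when the input is mixed. The sketch below shows two consequences under that rule: integers and floats together produce a float array, and a float assigned into an integer array is truncated. The variable names are illustrative.

```python
import numpy as np

# Mixed integers and floats are upcast to a single float dtype
mixed_numbers = np.array([14, 22.5, 35])
print(mixed_numbers.dtype)  # float64
print(mixed_numbers)        # [14.  22.5 35. ]

# An integer array keeps its dtype; assigned floats lose their fraction
integer_array = np.array([14, 22, 35, 45])
integer_array[0] = 9.9
print(integer_array)        # [ 9 22 35 45]
```

The silent truncation in the second case is a common source of surprises, so it can help to check arr.dtype before assigning into an existing array.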

Multidimensional (N-dimensional) Structure

NumPy arrays can have any number of dimensions. A 1-D array is like a simple list, a 2-D array is a matrix, a 3-D array is a tensor, and so on. This flexibility allows NumPy to represent complex data structures such as matrices, images, or even multi-dimensional grids efficiently.

# Multidimensional (N-dimensional) Structure
import numpy as np

arr1d = np.array([14, 22, 35, 45])  # 1-D array
arr2d = np.array([[14, 22], [35, 45]])  # 2-D array (matrix)
arr3d = np.array([[[14, 22], [35, 45]], [[54, 64], [74, 68]]])  # 3-D array (tensor)
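To confirm how many dimensions an array has, the ndim attribute can be inspected. This small sketch recreates the three arrays from this section and prints ndim next to the shape of each.

```python
import numpy as np

arr1d = np.array([14, 22, 35, 45])
arr2d = np.array([[14, 22], [35, 45]])
arr3d = np.array([[[14, 22], [35, 45]], [[54, 64], [74, 68]]])

for name, arr in (("arr1d", arr1d), ("arr2d", arr2d), ("arr3d", arr3d)):
    # ndim is the number of axes; shape gives the length of each axis
    print(name, arr.ndim, arr.shape)
# arr1d 1 (4,)
# arr2d 2 (2, 2)
# arr3d 3 (2, 2, 2)
```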

Shape and Size of Arrays

The shape of an array represents its dimensions, while its size indicates the total number of elements.

Shape – A tuple giving the length of the array along each dimension.
Size – The total number of elements in the array.

# Shape and Size of Arrays
import numpy as np

arr = np.array([[12, 22, 34], [46, 65, 86]])
print(arr.shape)  # Output: (2, 3) (2 rows and 3 columns)
print(arr.size)   # Output: 6 (2 * 3)

Array Indexing and Slicing

NumPy arrays support indexing and slicing to access elements or subarrays. For 1-D arrays, indexing and slicing work much as they do with Python lists and tuples.

# Array Indexing and Slicing
import numpy as np

arr = np.array([14, 22, 35, 45])
print(arr[2])  # Access the third element (indexing starts at 0) => 35
print(arr[1:3])  # Slice elements from index 1 to 2 (not inclusive of 3) => [22 35]
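Two points go beyond what lists offer and are worth a short sketch: multi-dimensional arrays accept one index per axis, and a basic slice of a NumPy array is a view on the same memory rather than a copy. The array contents below are illustrative.

```python
import numpy as np

arr2d = np.array([[12, 22, 34], [46, 65, 86]])
print(arr2d[1, 2])   # 86 (row 1, column 2)
print(arr2d[:, 1])   # [22 65] (column 1 of every row)
print(arr2d[0, 1:])  # [22 34] (row 0, columns 1 onward)

# A list slice is a copy: changing it leaves the list untouched
py_list = [14, 22, 35, 45]
list_slice = py_list[0:2]
list_slice[0] = 99
print(py_list[0])    # 14

# A NumPy slice is a view: changing it changes the original array
arr = np.array([14, 22, 35, 45])
view = arr[0:2]
view[0] = 99
print(arr[0])        # 99
```

When an independent copy is needed, arr[0:2].copy() avoids modifying the original.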

Element-wise Operations

NumPy supports element-wise operations on arrays, which means that operations are applied to each element of the array individually without explicit loops.

# Element-wise Operations
import numpy as np

arr_1 = np.array([12, 22, 34])
arr_2 = np.array([46, 65, 86])
result = arr_1 + arr_2  # Element-wise addition
print(result)  # Output: [ 58  87 120]

Broadcasting

Broadcasting lets NumPy perform arithmetic on arrays of different shapes by virtually stretching (“broadcasting”) the smaller array across the larger one so that their shapes match. This eliminates the need for explicit loops and improves performance.

# Broadcasting
import numpy as np

arr_1 = np.array([12, 22, 34]) # 1-D array with shape (3,) (3 elements)
arr_2 = np.array([[46], [65], [86]])  # 2-D array with shape (3, 1) (3 rows, 1 column)
result = arr_1 + arr_2  # Broadcasting: both arrays are stretched to shape (3, 3); row i is arr_2[i] + arr_1
print(result)
# Output:
# [[ 58  68  80]
#  [ 77  87  99]
#  [ 98 108 120]]
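Broadcasting only works when the shapes are compatible: comparing the shapes from the last axis backwards, each pair of lengths must be equal or one of them must be 1. The sketch below shows a scalar, a compatible row, and an incompatible array. It uses np.broadcast_shapes, which assumes NumPy 1.20 or later.

```python
import numpy as np

matrix = np.array([[12, 22, 34], [46, 65, 86]])  # shape (2, 3)
row = np.array([1, 2, 3])                        # shape (3,)

print(matrix + 10)   # the scalar is applied to every element
# [[22 32 44]
#  [56 75 96]]
print(matrix + row)  # the row is applied to each row of matrix
# [[13 24 37]
#  [47 67 89]]

# Resulting shape of broadcasting (3,) against (3, 1)
print(np.broadcast_shapes((3,), (3, 1)))  # (3, 3)

# Lengths 3 and 2 on the last axis do not match, and neither is 1
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    print("Incompatible shapes:", exc)
```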

Reshaping Arrays

You can reshape arrays into different shapes, as long as the total number of elements remains the same. This allows you to change the layout of the array to suit different use cases.

# Reshaping Arrays
import numpy as np

arr = np.array([12, 22, 34, 46, 65, 86]) # 1-D array with shape (6,) (6 elements)
reshaped_arr = arr.reshape((2, 3))  # Reshape into 2 rows and 3 columns
print(reshaped_arr)
# Output:
# [[12 22 34]
#  [46 65 86]]
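Two details of reshape() are worth a brief sketch: one dimension may be given as -1 so that NumPy infers it from the total number of elements, and a target shape with a different element count raises an error.

```python
import numpy as np

arr = np.array([12, 22, 34, 46, 65, 86])

# -1 asks NumPy to infer that dimension: 6 elements / 2 columns = 3 rows
two_columns = arr.reshape(-1, 2)
print(two_columns.shape)  # (3, 2)

# reshape(-1) flattens back to 1-D
flat = two_columns.reshape(-1)
print(flat)               # [12 22 34 46 65 86]

# 6 elements cannot fill a 4x2 layout
try:
    arr.reshape(4, 2)
except ValueError as exc:
    print("Cannot reshape:", exc)
```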

Universal Functions (ufuncs)

NumPy provides a large set of “universal functions” (ufuncs) that operate element-wise on arrays. These functions perform mathematical operations like square root, sine, logarithm, etc.

# Universal Functions (ufuncs)
import numpy as np

arr = np.array([41, 44, 29, 16])
sqrt_arr = np.sqrt(arr)  # Apply square root to each element
print(sqrt_arr)  # Output: [6.40312424 6.63324958 5.38516481 4.        ]

Array Concatenation and Splitting

You can concatenate or split NumPy arrays along any axis.

Concatenation: Joining two or more arrays along an existing axis.

# Concatenation
import numpy as np

arr_1 = np.array([15, 52])
arr_2 = np.array([23, 44])
result = np.concatenate((arr_1, arr_2))  # Concatenate along the 0-axis (1-D arrays)
print(result)  # Output: [15 52 23 44]

Splitting: Dividing an array into multiple sub-arrays.

# Splitting
import numpy as np

arr = np.array([41, 44, 29, 16, 15, 52])
result = np.split(arr, 3)  # Split into 3 sub-arrays
print(result)  # Output: [array([41, 44]), array([29, 16]), array([15, 52])]
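For 2-D arrays the axis argument decides the direction of the join, and np.split() requires the array to divide evenly. The sketch below shows both axes and uses np.array_split() as one option for uneven splits. The array values are illustrative.

```python
import numpy as np

a = np.array([[15, 52], [23, 44]])
b = np.array([[41, 29], [16, 11]])

stacked_rows = np.concatenate((a, b), axis=0)  # b goes below a
stacked_cols = np.concatenate((a, b), axis=1)  # b goes to the right of a
print(stacked_rows.shape)  # (4, 2)
print(stacked_cols.shape)  # (2, 4)

# 7 elements cannot be split into 3 equal parts, so np.split would raise
# ValueError; np.array_split allows parts of unequal length
parts = np.array_split(np.arange(7), 3)
print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4], [5, 6]]
```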

Memory Efficiency

NumPy arrays are more memory efficient than Python lists. The elements in a NumPy array are stored in contiguous blocks of memory, which allows for faster operations and more efficient memory usage.

# Memory Efficiency
import numpy as np

arr = np.array([41, 44, 29, 16, 15, 52], dtype=np.int32)
print(arr.nbytes)  # Output: 24 (6 integers * 4 bytes per int32)
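The memory used by the element data depends directly on the chosen dtype. As a small sketch, the same six values are stored with three integer types and the itemsize (bytes per element) and nbytes (bytes for all elements) attributes are compared.

```python
import numpy as np

values = [41, 44, 29, 16, 15, 52]

usage = {}
for dtype in (np.int8, np.int32, np.int64):
    arr = np.array(values, dtype=dtype)
    # nbytes is always size * itemsize
    usage[arr.dtype.name] = (arr.itemsize, arr.nbytes)
    print(arr.dtype.name, arr.itemsize, arr.nbytes)
# int8 1 6
# int32 4 24
# int64 8 48
```

A smaller dtype saves memory but narrows the range of values that can be stored, so the choice depends on the data.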

Vectorization

NumPy supports vectorized operations, which means you can perform operations on entire arrays at once without writing explicit loops. This results in cleaner code and often better performance compared to iterating over arrays element-by-element.

# Vectorization
import numpy as np

arr = np.array([41, 44, 29, 16, 15, 52])
result = arr * 2  # Multiply each element by 2
print(result)  # Output: [ 82  88  58  32  30 104]
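To make the contrast with explicit loops concrete, the sketch below computes the same result both ways and also shows a vectorized condition with np.where(). The threshold of 30 is an arbitrary value chosen for illustration.

```python
import numpy as np

arr = np.array([41, 44, 29, 16, 15, 52])

# Explicit loop: one Python-level step per element
looped = []
for value in arr:
    looped.append(int(value) * 2 + 1)

# Vectorized: the whole expression is applied to the array at once
vectorized = arr * 2 + 1
print(vectorized)  # [ 83  89  59  33  31 105]

# Conditions can be vectorized too: keep values above 30, replace others with 0
filtered = np.where(arr > 30, arr, 0)
print(filtered)    # [41 44  0  0  0 52]
```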

Random Number Generation

NumPy provides a module (numpy.random) for generating random numbers and random sampling from various distributions (e.g., uniform, normal).

# Random Number Generation
import numpy as np

random_arr = np.random.rand(2, 3)  # Generate a 2x3 array of random floats in [0, 1)
print(random_arr)
# Output => Can be different on next run
# [[0.328437   0.31566905 0.36429999]
#  [0.03383582 0.72987953 0.33411073]]
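When repeatable results are needed, for example in tests, the generator can be seeded. The sketch below uses np.random.default_rng(), the Generator interface available since NumPy 1.17, instead of np.random.rand(). The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng = np.random.default_rng(seed=42)
sample = rng.random((2, 3))           # uniform floats in [0, 1)

rng_again = np.random.default_rng(seed=42)
sample_again = rng_again.random((2, 3))
print(np.array_equal(sample, sample_again))  # True

# Draw 4 values from a normal distribution with mean 0 and std dev 1
normal_sample = rng.normal(loc=0.0, scale=1.0, size=4)
print(normal_sample.shape)  # (4,)
```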

Linear Algebra Functions

NumPy includes a comprehensive set of linear algebra operations such as matrix multiplication, dot product, matrix inverse, eigenvalues, etc.

# Linear Algebra Functions
import numpy as np

A = np.array([[6, 2], [3, 4]])
B = np.array([[5, 4], [7, 8]])
result = np.dot(A, B)  # Matrix multiplication
print(result)
# Output:
# [[44 40]
#  [43 44]]
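A few of the other operations mentioned above can be sketched with the same matrices. The @ operator is an alternative spelling of matrix multiplication for 2-D arrays, and the numpy.linalg module provides the determinant and the inverse.

```python
import numpy as np

A = np.array([[6, 2], [3, 4]])
B = np.array([[5, 4], [7, 8]])

# For 2-D arrays, A @ B gives the same result as np.dot(A, B)
print(np.array_equal(A @ B, np.dot(A, B)))  # True

# Determinant: 6*4 - 2*3 = 18 (returned as a float, subject to rounding)
det = np.linalg.det(A)
print(np.isclose(det, 18.0))                # True

# Inverse: A multiplied by its inverse gives the identity matrix
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))    # True
```

Results of linear algebra routines are floating-point values, so comparisons are done with np.isclose() and np.allclose() rather than with ==.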

Universal Mathematical Functions

NumPy includes a wide variety of mathematical functions for performing operations on arrays such as sin(), cos(), log(), exp(), etc.

# Universal Mathematical Functions
import numpy as np

arr = np.array([12, 22, 32])
log_arr = np.log(arr)  # Natural logarithm of each element
print(log_arr)  # Output: [2.48490665 3.09104245 3.4657359 ]

Conclusion

NumPy arrays are one of the most versatile and fundamental building blocks for numerical computing in Python. By supporting multi-dimensional data, using memory efficiently, allowing advanced operations such as broadcasting and vectorization, and providing a wide range of mathematical functions, NumPy arrays are the basis for more complex operations in data science, machine learning, and scientific computing. Mastering these core features will allow you to efficiently work with large datasets and perform high-performance numerical calculations.

Code snippets and programs related to NumPy Array in Python can be accessed from the GitHub Repository. This GitHub repository also contains programs related to other topics in the NumPy tutorial.