Table of Contents
- Introduction to NumPy
- Installation
- NumPy Arrays
- Array Creation
- Array Attributes
- Array Indexing and Slicing
- Array Operations
- Mathematical Functions
- Array Reshaping
- Broadcasting
- Linear Algebra
- Statistical Functions
- Random Number Generation
- File I/O
- Advanced Topics
- Best Practices
Introduction to NumPy
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides:
- A powerful N-dimensional array object
- Sophisticated broadcasting functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities
Why Use NumPy?
- Performance: NumPy arrays store their elements in one contiguous block of memory (unlike lists, which hold references to separately allocated objects), making them faster to access and manipulate
- Convenience: Built-in functions for mathematical operations
- Less Code: Operations that would require loops in Python can be done in one line with NumPy
- Foundation: Many scientific libraries (pandas, scikit-learn, TensorFlow) build on or interoperate with NumPy arrays
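The performance and "less code" points above can be checked with a small timing sketch. The array size and repeat count are arbitrary choices for illustration, and the measured times depend on the machine, so no specific timings are claimed here.

```python
import timeit
import numpy as np

n = 100_000  # illustrative size, chosen arbitrarily
values = list(range(n))
array = np.arange(n, dtype=np.int64)  # explicit int64 so the squares cannot overflow

def square_with_loop():
    # Pure Python: one multiplication per loop iteration
    return [v * v for v in values]

def square_with_numpy():
    # NumPy: one vectorized expression over the whole array
    return array * array

loop_result = square_with_loop()
numpy_result = square_with_numpy()

# Both approaches compute the same numbers
same = np.array_equal(np.array(loop_result), numpy_result)
print(same)

loop_time = timeit.timeit(square_with_loop, number=10)
numpy_time = timeit.timeit(square_with_numpy, number=10)
print(f"list comprehension: {loop_time:.4f} s, NumPy: {numpy_time:.4f} s")
```

On typical hardware the NumPy version is expected to be noticeably faster, but the gap varies with array size and data type.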
Installation
# Using pip
pip install numpy
# Using conda
conda install numpy
# Verify installation
python -c "import numpy; print(numpy.__version__)"
Import NumPy in your Python script:
import numpy as np
NumPy Arrays
The core of NumPy is the ndarray (n-dimensional array) object. Unlike Python lists, NumPy arrays:
- Are homogeneous (all elements must be of the same type)
- Have a fixed size at creation
- Support vectorized operations
- Are more memory-efficient
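A minimal sketch of what the first two properties mean in practice. The variable names are illustrative only; the behavior shown is standard NumPy type promotion and casting on assignment.

```python
import numpy as np

# Homogeneous: mixing ints and a float yields one dtype that can hold every value
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Assigning a float into an integer array truncates it to fit the dtype
ints = np.array([1, 2, 3], dtype=np.int64)
ints[0] = 9.7
print(ints[0])  # 9

# Fixed size: np.append does not grow the array, it returns a new one
bigger = np.append(ints, 4)
print(bigger is ints)          # False
print(ints.size, bigger.size)  # 3 4

# Memory: 1000 int64 values occupy exactly 1000 * 8 bytes of element data
print(np.arange(1000, dtype=np.int64).nbytes)  # 8000
```

The silent truncation on assignment is worth remembering: choose the dtype up front if fractional values are expected.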
Array vs List Comparison
import numpy as np
# Python list
python_list = [1, 2, 3, 4, 5]
# NumPy array
numpy_array = np.array([1, 2, 3, 4, 5])
# Multiplying by 2
# List: requires loop or list comprehension
result_list = [x * 2 for x in python_list]
# NumPy: vectorized operation
result_numpy = numpy_array * 2
print(result_numpy) # [2 4 6 8 10]
Array Creation
From Python Lists
import numpy as np
# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d) # [1 2 3 4 5]
# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d)
# [[1 2 3]
# [4 5 6]]
# 3D array
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(arr3d.shape) # (2, 2, 2)
Using Built-in Functions
# Array of zeros
zeros = np.zeros((3, 4)) # 3x4 array of zeros
print(zeros)
# Array of ones
ones = np.ones((2, 3, 4)) # 2x3x4 array of ones
# Empty array (uninitialized values)
empty = np.empty((2, 2))
# Array with a range of values
arange = np.arange(0, 10, 2) # [0 2 4 6 8]
# Array with evenly spaced values
linspace = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
# Identity matrix
identity = np.eye(3) # 3x3 identity matrix
# Array filled with a constant value
full = np.full((2, 3), 7) # 2x3 array filled with 7
# Array like another array
arr = np.array([1, 2, 3])
zeros_like = np.zeros_like(arr)
ones_like = np.ones_like(arr)
Specifying Data Types
# Integer array
int_arr = np.array([1, 2, 3], dtype=np.int32)
# Float array
float_arr = np.array([1, 2, 3], dtype=np.float64)
# Complex array
complex_arr = np.array([1+2j, 3+4j], dtype=np.complex128)
# Boolean array
bool_arr = np.array([True, False, True], dtype=np.bool_)
# String array
str_arr = np.array(['a', 'b', 'c'], dtype='U1')
# Check data type
print(int_arr.dtype) # int32
Array Attributes
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Shape: dimensions of the array
print(arr.shape) # (3, 4)
# ndim: number of dimensions
print(arr.ndim) # 2
# size: total number of elements
print(arr.size) # 12
# dtype: data type of elements
print(arr.dtype) # int64 (or int32 depending on system)
# itemsize: size in bytes of each element
print(arr.itemsize) # 8 (for int64)
# nbytes: total bytes consumed by the array
print(arr.nbytes) # 96 (12 elements * 8 bytes)
Array Indexing and Slicing
Basic Indexing
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Access single element
print(arr[0]) # 10
print(arr[-1]) # 50
# Modify element
arr[2] = 99
print(arr) # [10 20 99 40 50]
Slicing 1D Arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Basic slicing: arr[start:stop:step]
print(arr[2:7]) # [2 3 4 5 6]
print(arr[:5]) # [0 1 2 3 4]
print(arr[5:]) # [5 6 7 8 9]
print(arr[::2]) # [0 2 4 6 8]
print(arr[::-1]) # [9 8 7 6 5 4 3 2 1 0] (reverse)
Indexing 2D Arrays
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access single element
print(arr2d[0, 0]) # 1
print(arr2d[1, 2]) # 6
print(arr2d[-1, -1]) # 9
# Access row
print(arr2d[1]) # [4 5 6]
# Access column
print(arr2d[:, 1]) # [2 5 8]
# Slicing
print(arr2d[0:2, 1:3])
# [[2 3]
# [5 6]]
Boolean Indexing
arr = np.array([1, 2, 3, 4, 5, 6])
# Create boolean mask
mask = arr > 3
print(mask) # [False False False True True True]
# Use mask to filter
print(arr[mask]) # [4 5 6]
# Direct boolean indexing
print(arr[arr > 3]) # [4 5 6]
# Multiple conditions
print(arr[(arr > 2) & (arr < 5)]) # [3 4]
Fancy Indexing
arr = np.array([10, 20, 30, 40, 50, 60])
# Index with array of integers
indices = np.array([0, 2, 4])
print(arr[indices]) # [10 30 50]
# 2D fancy indexing
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = np.array([0, 2])
cols = np.array([1, 2])
print(arr2d[rows, cols]) # [2 9]
Array Operations
Arithmetic Operations
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
# Element-wise operations
print(a + b) # [11 22 33 44]
print(a - b) # [-9 -18 -27 -36]
print(a * b) # [10 40 90 160]
print(a / b) # [0.1 0.1 0.1 0.1]
print(a ** 2) # [1 4 9 16]
print(a % 2) # [1 0 1 0]
# Operations with scalars
print(a + 5) # [6 7 8 9]
print(a * 2) # [2 4 6 8]
Universal Functions (ufuncs)
arr = np.array([1, 4, 9, 16, 25])
# Mathematical functions
print(np.sqrt(arr)) # [1. 2. 3. 4. 5.]
print(np.exp(arr)) # [2.72e+00 5.46e+01 8.10e+03 8.89e+06 7.20e+10]
print(np.log(arr)) # [0. 1.39 2.20 2.77 3.22]
print(np.sin(arr)) # [0.84 -0.76 0.41 -0.29 -0.13]
# Absolute value
arr_neg = np.array([-1, -2, 3, -4])
print(np.abs(arr_neg)) # [1 2 3 4]
# Sign function
print(np.sign(arr_neg)) # [-1 -1 1 -1]
Comparison Operations
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
print(a == b) # [False False True False False]
print(a < b) # [ True True False False False]
print(a >= 3) # [False False True True True]
# Check if any or all elements satisfy condition
print(np.any(a > 3)) # True
print(np.all(a > 0)) # True
Aggregate Functions
arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr)) # 15
print(np.min(arr)) # 1
print(np.max(arr)) # 5
print(np.mean(arr)) # 3.0
print(np.median(arr)) # 3.0
print(np.std(arr)) # 1.414... (standard deviation)
print(np.var(arr)) # 2.0 (variance)
# 2D array operations
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr2d)) # 21 (all elements)
print(np.sum(arr2d, axis=0)) # [5 7 9] (column sums: axis 0 collapses the rows)
print(np.sum(arr2d, axis=1)) # [6 15] (row sums: axis 1 collapses the columns)
Mathematical Functions
Trigonometric Functions
import numpy as np
angles = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])
print(np.sin(angles))
print(np.cos(angles))
print(np.tan(angles))
# Inverse functions
print(np.arcsin(np.sin(angles)))
print(np.arccos(np.cos(angles)))
print(np.arctan(np.tan(angles[:-1]))) # Exclude π/2 to avoid infinity
# Hyperbolic functions
print(np.sinh(angles))
print(np.cosh(angles))
print(np.tanh(angles))
Rounding Functions
arr = np.array([1.23, 2.67, 3.45, 4.89])
print(np.round(arr)) # [1. 3. 3. 5.]
print(np.floor(arr)) # [1. 2. 3. 4.]
print(np.ceil(arr)) # [2. 3. 4. 5.]
print(np.trunc(arr)) # [1. 2. 3. 4.]
# Round to specific decimals
print(np.round(arr, 1)) # [1.2 2.7 3.4 4.9]
Exponential and Logarithmic
arr = np.array([1, 2, 3, 4, 5])
# Exponential
print(np.exp(arr)) # e^x
print(np.exp2(arr)) # 2^x
print(np.power(3, arr)) # 3^x
# Logarithm
print(np.log(arr)) # Natural log (ln)
print(np.log10(arr)) # Base 10
print(np.log2(arr)) # Base 2
Array Reshaping
Reshape
import numpy as np
arr = np.arange(12) # [0 1 2 3 4 5 6 7 8 9 10 11]
# Reshape to 2D
arr2d = arr.reshape(3, 4)
print(arr2d)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
# Reshape to 3D
arr3d = arr.reshape(2, 3, 2)
print(arr3d.shape) # (2, 3, 2)
# Use -1 to infer dimension
arr_auto = arr.reshape(2, -1) # NumPy calculates: 12/2 = 6
print(arr_auto.shape) # (2, 6)
Flatten and Ravel
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
# Flatten: returns a copy
flat = arr2d.flatten()
print(flat) # [1 2 3 4 5 6]
# Ravel: returns a view if possible (more efficient)
ravel = arr2d.ravel()
print(ravel) # [1 2 3 4 5 6]
Transpose
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # (2, 3)
# Transpose
arr_t = arr.T
print(arr_t.shape) # (3, 2)
print(arr_t)
# [[1 4]
# [2 5]
# [3 6]]
# For multi-dimensional arrays
arr3d = np.arange(24).reshape(2, 3, 4)
arr3d_t = np.transpose(arr3d, (2, 0, 1)) # Permute axes: the new axes are the old axes 2, 0, 1
print(arr3d_t.shape) # (4, 2, 3)
Stack and Split
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Vertical stack (row-wise)
v_stack = np.vstack([a, b])
print(v_stack)
# [[1 2 3]
# [4 5 6]]
# Horizontal stack (column-wise)
h_stack = np.hstack([a, b])
print(h_stack) # [1 2 3 4 5 6]
# Concatenate along axis
concat = np.concatenate([a, b])
print(concat) # [1 2 3 4 5 6]
# Split
arr = np.arange(9)
split = np.split(arr, 3) # Split into 3 equal parts
print(split) # [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
Broadcasting
Broadcasting allows NumPy to perform operations on arrays of different shapes.
Broadcasting Rules
- If arrays have different numbers of dimensions, pad the smaller shape with ones on the left
- Two dimensions are compatible when they are equal or when one of them is 1; shapes are compared from the trailing (rightmost) dimension backward
- After broadcasting, each array behaves as if it had the larger shape
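The rules above can be checked without building any arrays. This sketch uses `np.broadcast_shapes`, which assumes NumPy 1.20 or later; the shapes are arbitrary illustrations.

```python
import numpy as np

# (3, 1) with (3,): the 1D shape is padded to (1, 3), then both stretch to (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3)

# (2, 3) with (3,): the padded shape (1, 3) stretches along the first axis
print(np.broadcast_shapes((2, 3), (3,)))  # (2, 3)

# (2, 3) with (2,): trailing dimensions 3 and 2 differ and neither is 1
try:
    np.broadcast_shapes((2, 3), (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

When two shapes fail like the last pair, reshaping one operand (for example to `(2, 1)`) is the usual way to make the intent explicit.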
Examples
import numpy as np
# Scalar with array
arr = np.array([1, 2, 3])
result = arr + 10 # 10 is broadcast to [10, 10, 10]
print(result) # [11 12 13]
# 1D with 2D
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
arr1d = np.array([10, 20, 30])
result = arr2d + arr1d # arr1d is broadcast to each row
print(result)
# [[11 22 33]
# [14 25 36]]
# Column vector with row vector
col = np.array([[1], [2], [3]]) # Shape: (3, 1)
row = np.array([10, 20, 30]) # Shape: (3,)
result = col + row # Broadcasting creates (3, 3) result
print(result)
# [[11 21 31]
# [12 22 32]
# [13 23 33]]
# Visual representation
# col:    row:            result:
# [[1]  + [10 20 30]  =  [[11 21 31]
#  [2]                    [12 22 32]
#  [3]]                   [13 23 33]]
Common Broadcasting Patterns
# Normalize each row
arr = np.array([[1, 2, 3], [4, 5, 6]])
row_means = arr.mean(axis=1, keepdims=True) # Shape: (2, 1)
normalized = arr - row_means
print(normalized)
# Outer product
a = np.array([1, 2, 3])
b = np.array([10, 20])
outer = a.reshape(-1, 1) * b.reshape(1, -1)
print(outer)
# [[10 20]
# [20 40]
# [30 60]]
Linear Algebra
Matrix Operations
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
C = np.dot(A, B) # or A @ B
print(C)
# [[19 22]
# [43 50]]
# Element-wise multiplication
element_wise = A * B
print(element_wise)
# [[ 5 12]
# [21 32]]
# Matrix power
print(np.linalg.matrix_power(A, 2)) # A^2
# Inner product (for 1D arrays)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.inner(a, b)) # 32 (1*4 + 2*5 + 3*6)
# Outer product
print(np.outer(a, b))
# [[ 4 5 6]
# [ 8 10 12]
# [12 15 18]]
Matrix Decomposition
# Determinant
A = np.array([[1, 2], [3, 4]])
det = np.linalg.det(A)
print(det) # approximately -2.0 (floating-point rounding can print -2.0000000000000004)
# Inverse
A_inv = np.linalg.inv(A)
print(A_inv)
# [[-2. 1. ]
# [ 1.5 -0.5]]
# Verify: A @ A_inv should be identity
print(np.round(A @ A_inv))
# [[1. 0.]
# [0. 1.]]
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
# Singular Value Decomposition (SVD)
U, s, Vt = np.linalg.svd(A)
print("U:\n", U)
print("Singular values:", s)
print("Vt:\n", Vt)
# QR decomposition
Q, R = np.linalg.qr(A)
print("Q:\n", Q)
print("R:\n", R)
Solving Linear Systems
# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = np.linalg.solve(A, b)
print(x) # [2. 3.]
# Verify solution
print(np.allclose(A @ x, b)) # True
# Least squares solution (overdetermined system)
A = np.array([[1, 1], [1, 2], [1, 3]])
b = np.array([2, 3, 4])
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(x)
Matrix Properties
A = np.array([[1, 2, 3], [4, 5, 6]])
# Trace (sum of diagonal elements)
B = np.array([[1, 2], [3, 4]])
print(np.trace(B)) # 5
# Rank
print(np.linalg.matrix_rank(A)) # 2
# Norm
print(np.linalg.norm(A)) # Frobenius norm
print(np.linalg.norm(A, ord=2)) # 2-norm (spectral norm)
Statistical Functions
Basic Statistics
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Central tendency
print(np.mean(data)) # 5.5
print(np.median(data)) # 5.5
print(np.average(data)) # 5.5
# Weighted average
weights = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
print(np.average(data, weights=weights)) # 6.333...
# Spread
print(np.std(data)) # 2.872... (standard deviation)
print(np.var(data)) # 8.25 (variance)
print(np.ptp(data)) # 9 (peak to peak, max - min)
# Percentiles and quantiles
print(np.percentile(data, 25)) # 3.25
print(np.percentile(data, 50)) # 5.5 (median)
print(np.percentile(data, 75)) # 7.75
print(np.quantile(data, [0.25, 0.5, 0.75]))
Correlation and Covariance
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
# Correlation coefficient
correlation_matrix = np.corrcoef(x, y)
print(correlation_matrix)
# [[1. 0.775]
# [0.775 1. ]]
# Covariance
covariance_matrix = np.cov(x, y)
print(covariance_matrix)
# For multiple variables (by default each row is one variable and each column one observation)
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cov_matrix = np.cov(data)
print(cov_matrix)
Binning and Histograms
data = np.random.randn(1000) # Random normal distribution
# Histogram
hist, bin_edges = np.histogram(data, bins=10)
print("Histogram counts:", hist)
print("Bin edges:", bin_edges)
# Digitize (assign to bins)
bins = np.array([-2, -1, 0, 1, 2])
indices = np.digitize(data, bins)
print(indices[:10]) # First 10 bin assignments
Random Number Generation
Random Module
import numpy as np
# Set seed for reproducibility
np.random.seed(42)
# Random floats between 0 and 1
print(np.random.rand(3, 2)) # 3x2 array
# Random floats from uniform distribution [low, high)
print(np.random.uniform(0, 10, size=5))
# Random integers
print(np.random.randint(0, 10, size=5)) # [low, high)
# Random integers from range
print(np.random.randint(low=1, high=7, size=(2, 3))) # Like dice rolls
# Random choice from array
arr = np.array([10, 20, 30, 40, 50])
print(np.random.choice(arr, size=3))
# Random choice without replacement (replace=False gives unique values)
print(np.random.choice(arr, size=3, replace=False))
# Random choice with probabilities
print(np.random.choice(arr, size=5, p=[0.1, 0.1, 0.2, 0.3, 0.3]))
Statistical Distributions
# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000) # mean=0, std=1
print(normal.mean(), normal.std())
# Standard normal
standard_normal = np.random.randn(1000)
# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000) # 10 trials, p=0.5
# Poisson distribution
poisson = np.random.poisson(lam=5, size=1000) # lambda=5
# Exponential distribution
exponential = np.random.exponential(scale=2, size=1000)
# Beta distribution
beta = np.random.beta(a=2, b=5, size=1000)
# Gamma distribution
gamma = np.random.gamma(shape=2, scale=2, size=1000)
Array Manipulation with Random
arr = np.arange(10)
# Shuffle in place
np.random.shuffle(arr)
print(arr)
# Permutation (returns shuffled copy)
original = np.arange(10)
shuffled = np.random.permutation(original)
print(original) # Unchanged
print(shuffled) # Shuffled
New Random Generator (Recommended)
# Modern approach using Generator
from numpy.random import default_rng
rng = default_rng(42) # Seed
# Generate random numbers
print(rng.random(5))
print(rng.integers(0, 10, size=5))
print(rng.normal(0, 1, size=5))
print(rng.choice([1, 2, 3, 4, 5], size=3))
File I/O
Text Files
import numpy as np
# Save array to text file
arr = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('data.txt', arr)
# Load from text file
loaded = np.loadtxt('data.txt')
print(loaded)
# Save with formatting
np.savetxt('data.csv', arr, delimiter=',', fmt='%d')
# Load CSV
loaded_csv = np.loadtxt('data.csv', delimiter=',')
Binary Files (.npy)
# Save single array
arr = np.array([1, 2, 3, 4, 5])
np.save('array.npy', arr)
# Load
loaded = np.load('array.npy')
print(loaded)
# Save multiple arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.savez('multiple.npz', a=arr1, b=arr2)
# Load multiple
data = np.load('multiple.npz')
print(data['a'])
print(data['b'])
# Save compressed
np.savez_compressed('compressed.npz', a=arr1, b=arr2)
Memory-Mapped Files
# For very large files that don't fit in memory
# Create memory-mapped array
mm_arr = np.memmap('memmap.dat', dtype='float32', mode='w+', shape=(1000, 1000))
# Write data
mm_arr[:] = np.random.rand(1000, 1000)
mm_arr.flush()
# Read memory-mapped array
mm_loaded = np.memmap('memmap.dat', dtype='float32', mode='r', shape=(1000, 1000))
print(mm_loaded[0, :10]) # Access without loading entire file
Advanced Topics
Structured Arrays
import numpy as np
# Define structured data type
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
# Create structured array
data = np.array([('Alice', 25, 55.5),
('Bob', 30, 75.0),
('Charlie', 35, 80.2)], dtype=dt)
print(data)
print(data['name']) # ['Alice' 'Bob' 'Charlie']
print(data['age']) # [25 30 35]
print(data[0]) # ('Alice', 25, 55.5)
# Sorting by field
sorted_data = np.sort(data, order='age')
print(sorted_data)
Masked Arrays
# Handle missing or invalid data
data = np.array([1, 2, -999, 4, -999, 6])
# Create masked array
masked = np.ma.masked_equal(data, -999)
print(masked) # [1 2 -- 4 -- 6]
# Operations ignore masked values
print(masked.mean()) # 3.25 (ignores -999)
print(masked.sum()) # 13
# Manual mask
mask = np.array([False, False, True, False, True, False])
masked2 = np.ma.array(data, mask=mask)
print(masked2)
Vectorization
# Vectorize a Python function to work on arrays
def my_function(x, y):
    if x > y:
        return x - y
    else:
        return x + y
# Vectorize
vec_function = np.vectorize(my_function)
a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])
result = vec_function(a, b)
print(result) # [5 5 1 3]
# Note: np.vectorize is a convenience wrapper around a Python loop, so prefer built-in NumPy functions for performance
result_fast = np.where(a > b, a - b, a + b)
print(result_fast) # [5 5 1 3]
Advanced Indexing
# Integer array indexing
arr = np.arange(12).reshape(3, 4)
rows = np.array([0, 0, 2, 2])
cols = np.array([0, 2, 0, 2])
print(arr[rows, cols]) # [0 2 8 10]
# Boolean mask with multiple conditions
arr = np.arange(20)
mask = (arr % 2 == 0) & (arr > 10)
print(arr[mask]) # [12 14 16 18]
# np.where for conditional replacement
arr = np.array([1, 2, 3, 4, 5])
result = np.where(arr > 3, 100, arr)
print(result) # [1 2 3 100 100]
# np.select for multiple conditions
conditions = [arr < 2, arr < 4, arr >= 4]
choices = ['small', 'medium', 'large']
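result = np.select(conditions, choices, default='unknown') # A string default matches the string choices; the integer default 0 may fail to promote with strings on recent NumPy versions
print(result) # ['small' 'medium' 'medium' 'large' 'large']
Memory Views and Copies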
arr = np.array([1, 2, 3, 4, 5])
# View (shares memory)
view = arr[1:4]
view[0] = 999
print(arr) # [1 999 3 4 5] - original is modified!
# Copy (independent)
copy = arr[1:4].copy()
copy[0] = 111
print(arr) # [1 999 3 4 5] - arr is not affected by changes to the copy
# Check if array owns its data
print(arr.flags['OWNDATA']) # True
print(view.flags['OWNDATA']) # False
print(copy.flags['OWNDATA']) # True
# Base attribute
print(view.base is arr) # True (view references arr)
print(copy.base is None) # True (copy is independent)
Einstein Summation (einsum)
# Powerful tool for multi-dimensional operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
C = np.einsum('ij,jk->ik', A, B)
print(C) # Same as np.dot(A, B)
# Trace
trace = np.einsum('ii->', A)
print(trace) # 5 (1 + 4)
# Transpose
At = np.einsum('ij->ji', A)
print(At)
# Element-wise multiplication and sum
result = np.einsum('ij,ij->', A, B)
print(result) # 70 (sum of A * B element-wise: 5 + 12 + 21 + 32)
Best Practices
1. Use Vectorization Instead of Loops
# Bad: Using loops
arr = np.arange(1000)
result = np.zeros(1000)
for i in range(len(arr)):
    result[i] = arr[i] ** 2
# Good: Vectorized operation
result = arr ** 2
2. Specify Data Types
# Bad: Default data type
arr = np.zeros(1000000) # float64, uses more memory
# Good: Specify appropriate type
arr = np.zeros(1000000, dtype=np.float32) # Uses half the memory
3. Use In-Place Operations
arr = np.arange(1000)
# Creates new array
arr = arr + 1
# In-place operation (more memory efficient)
arr += 1
4. Avoid Copying When Possible
# Use views when you don't need independence
large_arr = np.arange(1000000)
subset = large_arr[::2] # View, not copy
# Only copy when necessary
subset_copy = large_arr[::2].copy()
5. Use Appropriate Functions
# Bad: Manual implementation
arr = np.array([1, 2, 3, 4, 5])
mean = np.sum(arr) / len(arr)
# Good: Built-in function
mean = np.mean(arr)
6. Pre-allocate Arrays
# Bad: Growing array
result = np.array([])
for i in range(1000):
    result = np.append(result, i)
# Good: Pre-allocate
result = np.zeros(1000)
for i in range(1000):
    result[i] = i
# Even better: Use arange
result = np.arange(1000)
7. Use Broadcasting
# Bad: Manual broadcasting
arr = np.array([[1, 2, 3], [4, 5, 6]])
result = np.zeros_like(arr)
for i in range(arr.shape[0]):
    result[i] = arr[i] + np.array([10, 20, 30])
# Good: Automatic broadcasting
result = arr + np.array([10, 20, 30])
8. Profile Your Code
import numpy as np
import time
# Create the input once so that only the operation itself is timed
arr = np.arange(1000000)
# Method 1
start = time.perf_counter() # perf_counter is a high-resolution timer intended for measuring intervals
result = arr ** 2
end = time.perf_counter()
print(f"Method 1: {end - start:.4f} seconds")
# Method 2
start = time.perf_counter()
result = np.power(arr, 2)
end = time.perf_counter()
print(f"Method 2: {end - start:.4f} seconds")
9. Handle Memory Efficiently
# For large datasets, use memory-mapped files
# For operations on portions of data, use slicing/views
# Delete intermediate arrays when done
intermediate_result = np.ones(1000000) # Placeholder for a large temporary array
del intermediate_result
# Use generators for large data processing
def data_generator(n):
    for i in range(n):
        yield np.random.rand(1000)
10. Document Array Shapes
def process_data(X):
    """
    Process input data.

    Parameters
    ----------
    X : ndarray, shape (n_samples, n_features)
        Input data matrix

    Returns
    -------
    result : ndarray, shape (n_samples,)
        Processed output
    """
    # Shape: (n_samples, n_features) -> (n_samples,)
    return np.mean(X, axis=1)
Common Pitfalls and Solutions
Pitfall 1: Modifying Array Through View
# Problem
arr = np.arange(10)
subset = arr[5:] # View
subset[:] = 0 # Modifies original!
print(arr) # [0 1 2 3 4 0 0 0 0 0]
# Solution: Use copy
arr = np.arange(10)
subset = arr[5:].copy()
subset[:] = 0
print(arr) # [0 1 2 3 4 5 6 7 8 9] - unchanged
Pitfall 2: Integer Division
# Problem (Python 2 style)
arr = np.array([1, 2, 3, 4, 5])
result = arr / 2 # Under Python 2, / on integer arrays performed floor division
# Solution: Ensure float division
result = arr / 2.0 # or arr / float(2)
# In Python 3, / always performs true division; use // when floor division is intended
Pitfall 3: Dimension Confusion
# Problem
arr = np.array([1, 2, 3])
print(arr.shape) # (3,) - 1D array
arr2d = np.array([[1, 2, 3]])
print(arr2d.shape) # (1, 3) - 2D array with 1 row
# They behave differently in some operations!
# Solution: Be explicit about dimensions
arr_column = arr.reshape(-1, 1) # (3, 1)
arr_row = arr.reshape(1, -1) # (1, 3)
Useful Resources
- Official Documentation: https://numpy.org/doc/
- NumPy User Guide: https://numpy.org/doc/stable/user/index.html
- NumPy Reference: https://numpy.org/doc/stable/reference/
- NumPy Tutorial: https://numpy.org/numpy-tutorials/
- Performance Tips: https://numpy.org/doc/stable/user/performance.html
Summary
NumPy is the foundation of scientific computing in Python, providing:
- Efficient multi-dimensional arrays
- Broadcasting for implicit operations on arrays of different shapes
- Comprehensive mathematical functions
- Linear algebra operations
- Random number generation
- File I/O capabilities
- Integration with other scientific libraries
Master NumPy to unlock the full potential of Python for data science, machine learning, and scientific computing!
Quick Reference Card
# Array Creation
np.array([1,2,3]) # From list
np.zeros((3,4)) # Array of zeros
np.ones((2,3)) # Array of ones
np.arange(10) # Range of values
np.linspace(0,1,5) # Evenly spaced values
# Array Info
arr.shape # Dimensions
arr.dtype # Data type
arr.size # Number of elements
arr.ndim # Number of dimensions
# Indexing
arr[i] # 1D indexing
arr[i,j] # 2D indexing
arr[i:j] # Slicing
arr[arr > 5] # Boolean indexing
# Operations
arr + 5 # Element-wise addition
arr * arr2 # Element-wise multiplication
arr @ arr2 # Matrix multiplication
np.dot(arr, arr2) # Dot product
# Aggregations
np.sum(arr) # Sum
np.mean(arr) # Mean
np.std(arr) # Standard deviation
np.max(arr) # Maximum
np.argmax(arr) # Index of maximum
# Reshaping
arr.reshape(3,4) # Reshape
arr.flatten() # Flatten to 1D
arr.T # Transpose
# Random
np.random.rand(3,4) # Random values [0,1)
np.random.randn(100) # Standard normal
np.random.randint(0,10,5) # Random integers
# Linear Algebra
np.linalg.inv(A) # Matrix inverse
np.linalg.det(A) # Determinant
np.linalg.eig(A) # Eigenvalues/vectors
np.linalg.solve(A,b) # Solve Ax=b
Comprehensive NumPy Cheatsheet
📦 Import Convention
import numpy as np
🎯 Array Creation
From Existing Data
np.array([1, 2, 3]) # 1D array from list
np.array([[1,2], [3,4]]) # 2D array from nested lists
np.asarray([1, 2, 3]) # Convert to array (no copy if already array)
np.copy(arr) # Create a copy
np.frombuffer(b'\x01\x02', dtype=np.uint8) # From buffer (byte length must be a multiple of the dtype size)
np.fromiter(range(5), dtype=int) # From iterable
Zeros, Ones, and Empty
np.zeros(5) # [0. 0. 0. 0. 0.]
np.zeros((3, 4)) # 3x4 array of zeros
np.ones((2, 3, 4)) # 2x3x4 array of ones
np.empty((2, 2)) # Uninitialized 2x2 array
np.zeros_like(arr) # Zeros with same shape as arr
np.ones_like(arr) # Ones with same shape as arr
np.empty_like(arr) # Empty with same shape as arr
np.full((3, 3), 7) # 3x3 array filled with 7
np.full_like(arr, 5) # Like arr, filled with 5
Ranges and Sequences
np.arange(10) # [0 1 2 ... 9]
np.arange(2, 10) # [2 3 4 ... 9]
np.arange(0, 1, 0.1) # [0. 0.1 0.2 ... 0.9]
np.linspace(0, 10, 5) # 5 evenly spaced values
np.logspace(0, 2, 5) # [1. 3.16... 10. 31.6... 100.]
np.geomspace(1, 1000, 4) # Geometric sequence
Identity and Diagonal
np.eye(3) # 3x3 identity matrix
np.eye(3, 4) # 3x4 array with ones on the main diagonal
np.identity(3) # 3x3 identity matrix
np.diag([1, 2, 3]) # Diagonal matrix
np.diag(arr) # Extract diagonal
np.diagflat([1, 2]) # Create diagonal array
Random Arrays
np.random.rand(3, 4) # Uniform [0, 1), shape (3,4)
np.random.randn(3, 4) # Standard normal, shape (3,4)
np.random.randint(0, 10, (3, 4)) # Random ints [0, 10)
np.random.random((3, 4)) # Random floats [0, 1)
np.random.uniform(0, 10, (3, 4)) # Uniform [0, 10)
np.random.normal(0, 1, (3, 4)) # Normal (μ=0, σ=1)
np.random.choice([1,2,3,4], 10) # Random choices
np.random.permutation(10) # Random permutation
np.random.shuffle(arr) # Shuffle in place
📊 Array Attributes
arr.shape # Dimensions (rows, cols, ...)
arr.ndim # Number of dimensions
arr.size # Total number of elements
arr.dtype # Data type
arr.itemsize # Size of each element (bytes)
arr.nbytes # Total bytes (size * itemsize)
arr.T # Transpose
arr.real # Real part (complex arrays)
arr.imag # Imaginary part
arr.flat # Flat iterator
arr.flags # Memory layout info
🎯 Data Types
np.int8, np.int16, np.int32, np.int64 # Signed integers
np.uint8, np.uint16, np.uint32, np.uint64 # Unsigned integers
np.float16, np.float32, np.float64 # Floating point
np.complex64, np.complex128 # Complex numbers
np.bool_ # Boolean
np.object_ # Python objects
np.bytes_, np.str_ # Byte strings and Unicode strings (the np.string_ and np.unicode_ aliases were removed in NumPy 2.0)
# Convert types
arr.astype(np.float32) # Convert to float32
arr.astype('int') # Convert to int
🔍 Indexing & Slicing
Basic Indexing
arr[0] # First element
arr[-1] # Last element
arr[2:5] # Elements 2, 3, 4
arr[::2] # Every other element
arr[::-1] # Reverse
arr[1:8:2] # Start:stop:step
Multi-dimensional Indexing
arr[i, j] # Element at row i, col j
arr[i] # Row i
arr[:, j] # Column j
arr[0:2, 1:3] # Subarray
arr[..., 0] # Index 0 along the last axis, for any number of dimensions
arr[:, :, 0] # Same as above for a 3D array
Boolean Indexing
arr[arr > 5] # Elements > 5
arr[(arr > 5) & (arr < 10)] # Elements 5 < x < 10
arr[(arr < 5) | (arr > 10)] # Elements x < 5 or x > 10
arr[~(arr > 5)] # Elements <= 5 (NOT operator)
Fancy Indexing
arr[[0, 2, 4]] # Elements at indices 0, 2, 4
arr[[0, 1], [2, 3]] # Elements (0,2) and (1,3)
arr[np.ix_([0,2], [1,3])] # Outer indexing
➕ Mathematical Operations
Arithmetic
arr + 5 # Add scalar
arr - 5 # Subtract scalar
arr * 5 # Multiply by scalar
arr / 5 # Divide by scalar
arr // 5 # Floor division
arr % 5 # Modulo
arr ** 2 # Power
np.add(arr1, arr2) # Element-wise addition
np.subtract(arr1, arr2) # Element-wise subtraction
np.multiply(arr1, arr2) # Element-wise multiplication
np.divide(arr1, arr2) # Element-wise division
np.power(arr, 2) # Element-wise power
np.sqrt(arr) # Square root
np.square(arr) # Square
np.exp(arr) # e^x
np.log(arr) # Natural log
np.log10(arr) # Log base 10
np.log2(arr) # Log base 2
Trigonometric
np.sin(arr) # Sine
np.cos(arr) # Cosine
np.tan(arr) # Tangent
np.arcsin(arr) # Inverse sine
np.arccos(arr) # Inverse cosine
np.arctan(arr) # Inverse tangent
np.arctan2(y, x) # Atan2(y, x)
np.sinh(arr) # Hyperbolic sine
np.cosh(arr) # Hyperbolic cosine
np.tanh(arr) # Hyperbolic tangent
np.deg2rad(arr) # Degrees to radians
np.rad2deg(arr) # Radians to degrees
Rounding
np.round(arr) # Round to nearest
np.round(arr, 2) # Round to 2 decimals
np.floor(arr) # Round down
np.ceil(arr) # Round up
np.trunc(arr) # Truncate
np.rint(arr) # Round to nearest int
np.fix(arr) # Round towards zero
Comparison
arr == 5 # Equal to
arr != 5 # Not equal to
arr > 5 # Greater than
arr < 5 # Less than
arr >= 5 # Greater or equal
arr <= 5 # Less or equal
np.equal(arr1, arr2) # Element-wise ==
np.not_equal(arr1, arr2) # Element-wise !=
np.greater(arr1, arr2) # Element-wise >
np.less(arr1, arr2) # Element-wise <
np.allclose(arr1, arr2) # All close (tolerance)
np.isclose(arr1, arr2) # Element-wise close
📈 Aggregate Functions
Basic Aggregations
np.sum(arr) # Sum all elements
np.sum(arr, axis=0) # Sum along axis 0
np.sum(arr, axis=1) # Sum along axis 1
np.prod(arr) # Product of all elements
np.cumsum(arr) # Cumulative sum
np.cumprod(arr) # Cumulative product
np.diff(arr) # Differences between consecutive elements
Statistics
np.mean(arr) # Mean
np.median(arr) # Median
np.average(arr) # Average
np.average(arr, weights=w) # Weighted average
np.std(arr) # Standard deviation
np.var(arr) # Variance
np.min(arr) # Minimum
np.max(arr) # Maximum
np.ptp(arr) # Peak to peak (max - min)
np.percentile(arr, 50) # 50th percentile
np.quantile(arr, 0.5) # 0.5 quantile
Indices of Extrema
np.argmin(arr) # Index of minimum
np.argmax(arr) # Index of maximum
np.argmin(arr, axis=0) # Indices along axis
np.argmax(arr, axis=1) # Indices along axis
np.nanargmin(arr) # Ignore NaN
np.nanargmax(arr) # Ignore NaN
Logical Operations
np.all(arr) # True if all True
np.any(arr) # True if any True
np.all(arr > 0) # Check condition
np.any(arr > 0) # Check condition
🔄 Array Manipulation
Reshaping
arr.reshape(3, 4) # Reshape to 3x4
arr.reshape(-1, 1) # Column vector
arr.reshape(1, -1) # Row vector
arr.reshape(2, -1) # Auto-calculate columns
arr.flatten() # Flatten to 1D (copy)
arr.ravel() # Flatten to 1D (view when possible, otherwise a copy)
arr.squeeze() # Remove single dimensions
np.expand_dims(arr, axis=0) # Add dimension at axis
Transposing
arr.T # Transpose
np.transpose(arr) # Transpose
np.transpose(arr, (2, 0, 1)) # Permute axes
np.swapaxes(arr, 0, 1) # Swap two axes
np.moveaxis(arr, 0, -1) # Move axis
Joining Arrays
np.concatenate([arr1, arr2]) # Concatenate along axis 0
np.concatenate([arr1, arr2], axis=1) # Along axis 1
np.vstack([arr1, arr2]) # Vertical stack (rows)
np.hstack([arr1, arr2]) # Horizontal stack (cols)
np.dstack([arr1, arr2]) # Depth stack
np.stack([arr1, arr2]) # Stack along new axis
np.stack([arr1, arr2], axis=1) # Stack along axis 1
np.column_stack([arr1, arr2]) # Stack as columns
np.row_stack([arr1, arr2]) # Stack as rows (alias of vstack; deprecated in NumPy 2.0)
Splitting Arrays
np.split(arr, 3) # Split into 3 equal parts
np.split(arr, [3, 5]) # Split at indices 3, 5
np.vsplit(arr, 2) # Vertical split (rows)
np.hsplit(arr, 2) # Horizontal split (cols)
np.dsplit(arr, 2) # Depth split
np.array_split(arr, 3) # Split (unequal allowed)
Adding/Removing Elements
np.append(arr, [7, 8, 9]) # Append elements (copy)
np.insert(arr, 3, [99]) # Insert at index
np.delete(arr, [1, 3]) # Delete at indices
np.resize(arr, (4, 4)) # Resize (repeats if needed)
np.pad(arr, 2, mode='constant') # Pad with zeros
np.pad(arr, 2, mode='edge') # Pad with edge values
Repeating Elements
np.repeat(arr, 3) # Repeat each element 3 times
np.repeat(arr, 3, axis=0) # Repeat along axis
np.tile(arr, 3) # Tile array 3 times
np.tile(arr, (2, 3)) # Tile in 2D
🧮 Linear Algebra
Matrix Products
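For 2D inputs, `np.dot`, `@`, `np.matmul`, and the `'ij,jk->ik'` einsum all compute the same matrix product. A minimal check, with small hypothetical matrices chosen so the result can be verified by hand:

```python
import numpy as np

# Hypothetical (2, 3) and (3, 2) matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])

C = A @ B                                # [[4, 5], [10, 11]]
same_dot = np.array_equal(C, np.dot(A, B))
same_matmul = np.array_equal(C, np.matmul(A, B))
same_einsum = np.array_equal(C, np.einsum('ij,jk->ik', A, B))

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
inner = np.inner(a, b)                   # 1*4 + 2*5 + 3*6 = 32
outer = np.outer(a, b)                   # shape (3, 3); outer[i, j] = a[i] * b[j]

print(C)
print(same_dot, same_matmul, same_einsum, inner, outer.shape)
```

The functions diverge for inputs with more than two dimensions: `np.matmul` and `@` broadcast over leading axes, while `np.dot` follows a different rule.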
np.dot(A, B) # Matrix multiplication
A @ B # Matrix multiplication (Python 3.5+)
np.matmul(A, B) # Matrix multiplication
np.inner(a, b) # Inner product
np.outer(a, b) # Outer product
np.tensordot(A, B, axes=1) # Tensor dot product
np.einsum('ij,jk->ik', A, B) # Einstein summation
np.kron(A, B) # Kronecker product
Matrix Properties
np.trace(A) # Trace (sum of diagonal)
np.linalg.det(A) # Determinant
np.linalg.matrix_rank(A) # Rank
np.linalg.norm(A) # Frobenius norm
np.linalg.norm(A, ord=2) # 2-norm (spectral)
np.linalg.norm(A, ord='fro') # Frobenius norm
np.linalg.cond(A) # Condition number
Matrix Decomposition
np.linalg.inv(A) # Matrix inverse
np.linalg.pinv(A) # Pseudo-inverse (Moore-Penrose)
np.linalg.eig(A) # Eigenvalues & eigenvectors
np.linalg.eigvals(A) # Eigenvalues only
np.linalg.eigh(A) # Hermitian/symmetric eigendecomp
np.linalg.svd(A) # Singular value decomposition
np.linalg.qr(A) # QR decomposition
np.linalg.cholesky(A) # Cholesky decomposition
Solving Systems
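`np.linalg.solve` expects a square, non-singular system, while `np.linalg.lstsq` also handles overdetermined ones. A small sketch under assumed inputs: a 2×2 system with a known solution, and a hypothetical line fit where the points lie exactly on y = 2t + 1.

```python
import numpy as np

# Square system: 3x + y = 9, x + 2y = 8  ->  x = 2, y = 3
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Overdetermined system: fit y = m*t + c to four points (hypothetical data)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * t + 1.0
M = np.column_stack([t, np.ones_like(t)])   # design matrix with columns [t, 1]
coef, residuals, rank, sing_vals = np.linalg.lstsq(M, y, rcond=None)

print(x)      # approximately [2. 3.]
print(coef)   # approximately [2. 1.]
print(rank)   # 2
```

Using `solve` is generally preferable to forming `np.linalg.inv(A) @ b`, since it avoids computing the inverse explicitly.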
np.linalg.solve(A, b) # Solve Ax = b
np.linalg.lstsq(A, b, rcond=None) # Least squares solution
📉 Statistical Functions
Descriptive Statistics
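Two details in this subsection are easy to miss: `ddof` switches between population and sample estimates, and the `nan*` variants skip missing values instead of propagating them. A brief sketch with hypothetical data whose mean is 5 and population variance is 4:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample

pop_std = np.std(data)             # divides by N      -> 2.0
sample_var = np.var(data, ddof=1)  # divides by N - 1  -> 32 / 7

with_gap = np.append(data, np.nan)
plain_mean = np.mean(with_gap)     # nan: NaN propagates
safe_mean = np.nanmean(with_gap)   # 5.0: NaN is ignored

print(pop_std, sample_var, plain_mean, safe_mean)
```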
np.mean(arr) # Arithmetic mean
np.median(arr) # Median
np.std(arr) # Standard deviation
np.std(arr, ddof=1) # Sample std (N-1)
np.var(arr) # Variance
np.var(arr, ddof=1) # Sample variance
np.nanmean(arr) # Mean (ignore NaN)
np.nanmedian(arr) # Median (ignore NaN)
np.nanstd(arr) # Std (ignore NaN)
np.nanvar(arr) # Var (ignore NaN)
Correlation
np.corrcoef(x, y) # Correlation coefficient matrix
np.cov(x, y) # Covariance matrix
np.correlate(x, y) # Cross-correlation
Histograms
np.histogram(arr, bins=10) # Histogram
np.histogram2d(x, y, bins=10) # 2D histogram
np.bincount(arr) # Count occurrences
np.digitize(arr, bins) # Bin indices
🎲 Random Sampling
Distributions
np.random.random(10) # Uniform [0, 1)
np.random.rand(3, 4) # Uniform [0, 1), shape (3,4)
np.random.randn(3, 4) # Standard normal
np.random.randint(0, 10, 5) # Random integers [0, 10)
np.random.uniform(0, 10, 5) # Uniform [0, 10)
np.random.normal(5, 2, 100) # Normal(μ=5, σ=2)
np.random.binomial(10, 0.5, 100) # Binomial(n=10, p=0.5)
np.random.poisson(5, 100) # Poisson(λ=5)
np.random.exponential(2, 100) # Exponential(scale=2)
np.random.gamma(2, 2, 100) # Gamma(shape=2, scale=2)
np.random.beta(2, 5, 100) # Beta(α=2, β=5)
np.random.chisquare(2, 100) # Chi-square(df=2)
Sampling
np.random.choice([1,2,3,4,5], 10) # Random choices
np.random.choice(arr, 5, replace=False) # Sample without replacement
np.random.choice(arr, 5, p=probs) # Weighted sampling
np.random.shuffle(arr) # Shuffle in place
np.random.permutation(arr) # Random permutation (copy)
Random Generator (Recommended)
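One reason to prefer the `Generator` API is that each generator carries its own state, so seeding is local rather than global. A minimal sketch of that reproducibility, using the seed 42 as an arbitrary example value:

```python
import numpy as np
from numpy.random import default_rng

# Two generators built from the same seed produce the same stream
rng_a = default_rng(42)
rng_b = default_rng(42)
draw_a = rng_a.normal(0, 1, 5)
draw_b = rng_b.normal(0, 1, 5)
print(np.array_equal(draw_a, draw_b))  # True

# integers() excludes the upper bound by default, like np.random.randint
ints = rng_a.integers(0, 10, 1000)
print(ints.min() >= 0, ints.max() <= 9)

# Sampling without replacement returns distinct items
sample = rng_a.choice(np.arange(10), 5, replace=False)
print(len(set(sample.tolist())))       # 5
```

Passing a generator object into functions, rather than calling `np.random.seed`, keeps the randomness of separate components independent.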
from numpy.random import default_rng
rng = default_rng(42) # Create generator with seed
rng.random(10) # Random floats
rng.integers(0, 10, 5) # Random integers
rng.normal(0, 1, 100) # Normal distribution
rng.choice([1,2,3,4,5], 10) # Random choices
💾 File I/O
Text Files
np.savetxt('data.txt', arr) # Save to text
np.savetxt('data.csv', arr, delimiter=',') # Save as CSV
np.savetxt('data.txt', arr, fmt='%.2f') # Format specifier
np.loadtxt('data.txt') # Load from text
np.loadtxt('data.csv', delimiter=',') # Load CSV
np.loadtxt('data.txt', skiprows=1) # Skip header
np.genfromtxt('data.csv', delimiter=',') # More flexible
np.genfromtxt('data.csv', names=True) # With column names
Binary Files
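A round trip through `.npy` and `.npz` files can be sketched as follows. The temporary directory, file names, and array contents are assumptions made so the example cleans up after itself; they are not part of the original post.

```python
import os
import tempfile

import numpy as np

arr1 = np.arange(6).reshape(2, 3)   # hypothetical arrays to store
arr2 = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    # Single array: dtype and shape are preserved in the .npy file
    npy_path = os.path.join(tmp, 'arr.npy')
    np.save(npy_path, arr1)
    loaded = np.load(npy_path)

    # Several arrays: keyword names become the keys of the archive
    npz_path = os.path.join(tmp, 'arrays.npz')
    np.savez(npz_path, a=arr1, b=arr2)
    with np.load(npz_path) as data:   # closes the archive on exit
        names = sorted(data.files)
        a_back = data['a']
        b_back = data['b']

print(names)                          # ['a', 'b']
print(np.array_equal(loaded, arr1), np.array_equal(a_back, arr1))
```

Arrays in an `.npz` archive are read lazily on key access, so they should be pulled out before the archive is closed.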
np.save('arr.npy', arr) # Save single array
np.load('arr.npy') # Load single array
np.savez('arrays.npz', a=arr1, b=arr2) # Save multiple arrays
np.savez_compressed('arr.npz', a=arr1) # Compressed
data = np.load('arrays.npz') # Load multiple
data['a'] # Access by name
Memory-Mapped Files
# Create memory-mapped file
mm = np.memmap('data.dat', dtype='float32',
               mode='w+', shape=(1000, 1000))
mm[:] = np.random.rand(1000, 1000)
mm.flush()
# Load memory-mapped file
mm = np.memmap('data.dat', dtype='float32',
               mode='r', shape=(1000, 1000))
🔧 Utility Functions
Array Testing
np.isnan(arr) # Check for NaN
np.isinf(arr) # Check for infinity
np.isfinite(arr) # Check for finite
np.isreal(arr) # Check for real
np.iscomplex(arr) # Check for complex
Array Comparison
np.array_equal(arr1, arr2) # True if identical
np.array_equiv(arr1, arr2) # True if broadcastable & equal
np.allclose(arr1, arr2) # True if close (tolerance)
np.allclose(arr1, arr2, rtol=1e-5) # Relative tolerance
np.allclose(arr1, arr2, atol=1e-8) # Absolute tolerance
Sorting
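`argsort` and the partition functions are often used together for "top-k" style selection. A short sketch on a hypothetical six-element array:

```python
import numpy as np

arr = np.array([7, 2, 9, 4, 1, 8])  # hypothetical data

order = np.argsort(arr)      # indices that would sort arr: 4, 1, 3, 0, 5, 2
sorted_arr = arr[order]      # same values as np.sort(arr)

# After partitioning with kth=3, position 3 holds the value a full sort
# would place there; smaller values come before it, larger ones after.
part = np.partition(arr, 3)

# Indices of the two largest values (their relative order is not guaranteed)
top2_idx = np.argpartition(arr, -2)[-2:]

print(order, sorted_arr)
print(part[3])               # 7
print(sorted(arr[top2_idx].tolist()))  # [8, 9]
```

Partitioning avoids a full sort, which can help when only a few extreme values are needed from a large array.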
np.sort(arr) # Sort (returns copy)
arr.sort() # Sort in place
np.argsort(arr) # Indices that would sort
np.sort(arr, axis=0) # Sort along axis
np.lexsort((arr1, arr2)) # Sort by multiple keys
np.partition(arr, 3) # Partial sort (element at index 3 lands in its sorted position)
np.argpartition(arr, 3) # Indices of partial sort
Searching
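The one-argument and three-argument forms of `np.where` behave quite differently, and `np.searchsorted` assumes sorted input. A minimal sketch with hypothetical values:

```python
import numpy as np

arr = np.array([3, 8, 1, 9, 6])           # hypothetical data

idx = np.where(arr > 5)[0]                # one-argument form: tuple of index arrays -> 1, 3, 4
capped = np.where(arr > 5, arr, 0)        # three-argument form: element-wise choice -> 0, 8, 0, 9, 6

grid = np.array([[1, 7],
                 [8, 2]])
coords = np.argwhere(grid > 5)            # one (row, col) pair per match -> [0, 1], [1, 0]

sorted_arr = np.sort(arr)                 # 1, 3, 6, 8, 9
pos = np.searchsorted(sorted_arr, 5)      # 5 would be inserted at index 2

print(idx, capped)
print(coords.tolist(), pos)
```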
np.where(arr > 5) # Indices where condition
np.where(arr > 5, x, y) # x if condition else y
np.argwhere(arr > 5) # Indices (2D format)
np.nonzero(arr) # Indices of non-zero
np.flatnonzero(arr) # Flat indices of non-zero
np.searchsorted(arr, 5) # Index to insert 5
np.extract(arr > 5, arr) # Extract elements
Set Operations
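The set routines all return sorted, de-duplicated 1D results. A brief sketch with hypothetical inputs; `np.isin` is used here for membership testing, as it is the shape-preserving counterpart of `np.in1d`.

```python
import numpy as np

arr = np.array([3, 1, 2, 3, 1, 3])           # hypothetical data with repeats
values, counts = np.unique(arr, return_counts=True)
print(values, counts)                         # values 1, 2, 3 with counts 2, 1, 3

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])

member = np.isin(a, b)                        # False, False, True, True
common = np.intersect1d(a, b)                 # 3, 4
either = np.union1d(a, b)                     # 1, 2, 3, 4, 5
only_a = np.setdiff1d(a, b)                   # 1, 2
exclusive = np.setxor1d(a, b)                 # 1, 2, 5

print(member, common, either, only_a, exclusive)
```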
np.unique(arr) # Unique elements (sorted)
np.unique(arr, return_counts=True) # With counts
np.unique(arr, return_index=True) # With first indices
np.isin(arr1, arr2) # Test membership (preferred; np.in1d is deprecated in NumPy 2.0)
np.intersect1d(arr1, arr2) # Intersection
np.union1d(arr1, arr2) # Union
np.setdiff1d(arr1, arr2) # Set difference
np.setxor1d(arr1, arr2) # Symmetric difference
Miscellaneous
np.clip(arr, 0, 10) # Clip values to [0, 10]
np.piecewise(x, [x<0, x>=0], [lambda x: 0, lambda x: x]) # Piecewise
np.select([cond1, cond2], [val1, val2]) # Select based on conditions
np.where(condition, x, y) # Ternary operator
np.choose(indices, [arr1, arr2, arr3]) # Choose from list
np.vectorize(func) # Vectorize function
np.apply_along_axis(func, 0, arr) # Apply function along axis
np.apply_over_axes(func, arr, [0,1]) # Apply over multiple axes
🎭 Advanced Indexing
Mesh Grids
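The grid helpers listed in this subsection do not all return the same shapes, which is a common source of transposed plots. A small sketch comparing them on the same 5-by-3 grid:

```python
import numpy as np

x = np.linspace(0, 5, 5)
y = np.linspace(0, 3, 3)

# Default 'xy' indexing: rows follow y, columns follow x
X_xy, Y_xy = np.meshgrid(x, y)
# 'ij' indexing: rows follow x, columns follow y
X_ij, Y_ij = np.meshgrid(x, y, indexing='ij')
# mgrid uses 'ij' ordering; a step of 5j means 5 points with the endpoint included
X_m, Y_m = np.mgrid[0:5:5j, 0:3:3j]
# ogrid returns "open" grids that broadcast against each other
X_o, Y_o = np.ogrid[0:5:5j, 0:3:3j]

print(X_xy.shape, X_ij.shape, X_m.shape)  # (3, 5) (5, 3) (5, 3)
print(X_o.shape, Y_o.shape)               # (5, 1) (1, 3)
print(np.allclose(X_m, X_xy.T))           # True
```

The open grids from `ogrid` use far less memory and give the same result in broadcast arithmetic such as `X_o ** 2 + Y_o ** 2`.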
x = np.linspace(0, 5, 5)
y = np.linspace(0, 3, 3)
X, Y = np.meshgrid(x, y) # 2D coordinate matrices
X, Y = np.mgrid[0:5:5j, 0:3:3j] # Using mgrid ('ij' indexing: shape (5, 3), transposed relative to meshgrid's default)
X, Y = np.ogrid[0:5:5j, 0:3:3j] # Open grid (broadcastable shapes (5, 1) and (1, 3))
Index Tricks
np.ix_([0, 1], [2, 3]) # Index mesh for fancy indexing
np.r_[1:4, 0, 4] # Concatenate slices
np.c_[arr1, arr2] # Column stack shortcut
np.s_[::2] # Slice object
np.indices((3, 3)) # Index arrays
np.unravel_index(7, (3, 3)) # Convert flat index to coords
np.ravel_multi_index([[0,1], [1,2]], (3,3)) # Coords to flat
🧪 Special Arrays
Structured Arrays
# Define dtype
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
# Create structured array
arr = np.array([('Alice', 25, 55.5),
                ('Bob', 30, 70.0)], dtype=dt)
arr['name'] # Access by field name
arr[0] # Access by row
arr[['name', 'age']] # Multiple fields
Masked Arrays
import numpy.ma as ma
# Create masked array
data = np.array([1, 2, -999, 4, -999, 6])
masked = ma.masked_equal(data, -999)
# Operations ignore masked values
masked.mean() # 3.25
masked.sum() # 13
# Manual masking
mask = [False, False, True, False, True, False]
masked = ma.array(data, mask=mask)
Character Arrays
np.char.add(['Hello'], [' World']) # String concatenation
np.char.multiply('Ha', 3) # 'HaHaHa'
np.char.upper(['hello', 'world']) # Uppercase
np.char.lower(['HELLO', 'WORLD']) # Lowercase
np.char.strip([' hello ']) # Strip whitespace
np.char.replace('hello', 'l', 'L') # Replace
np.char.split('hello world') # Split
np.char.join('-', ['hello', 'world']) # Join the characters of each string: 'h-e-l-l-o', 'w-o-r-l-d'
⚡ Performance Tips
Vectorization
# Bad: Loop
result = np.zeros(len(arr))
for i in range(len(arr)):
    result[i] = arr[i] ** 2
# Good: Vectorized
result = arr ** 2
Broadcasting
# Bad: Explicit loop
for i in range(arr.shape[0]):
    arr[i] += vector
# Good: Broadcasting
arr += vector
In-Place Operations
arr += 1 # In-place (no copy)
arr = arr + 1 # Creates new array
np.add(arr, 1, out=arr) # Explicit in-place
Memory Views vs Copies
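The practical consequence of the view/copy distinction is that writing to a view changes the original array. A minimal sketch, with a hypothetical ten-element array:

```python
import numpy as np

arr = np.arange(10)          # hypothetical data: 0..9

view = arr[::2]              # basic slicing returns a view
copy = arr[::2].copy()       # explicit, independent copy
fancy = arr[[0, 2, 4]]       # integer-array (fancy) indexing returns a copy

view[0] = 99                 # also changes arr[0]
copy[1] = -1                 # arr is unaffected
fancy[2] = -5                # arr is unaffected

print(arr[0], arr[2], arr[4])             # 99 2 4
print(view.base is arr)                   # True
print(np.shares_memory(fancy, arr))       # False
```

`np.shares_memory` is a more direct check than inspecting `.base` when an array has passed through several operations.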
view = arr[::2] # View (no copy)
copy = arr[::2].copy() # Explicit copy
arr.base is None # True if owns data
view.base is arr # True if view of arr
🎓 Common Patterns
Normalize Array
# Z-score normalization
normalized = (arr - arr.mean()) / arr.std()
# Min-max normalization
normalized = (arr - arr.min()) / (arr.max() - arr.min())
Distance Matrix
from scipy.spatial.distance import cdist
X = np.random.rand(100, 2)
dist = cdist(X, X) # Pairwise Euclidean distances with SciPy
# Or manually with broadcasting:
dist = np.sqrt(((X[:, None] - X) ** 2).sum(axis=2))
One-Hot Encoding
labels = np.array([0, 1, 2, 1, 0])
n_classes = 3
one_hot = np.eye(n_classes)[labels]
Moving Average
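A worked instance of the convolution-based moving average, using a hypothetical six-element input so the result can be checked by hand. With `mode='valid'`, the output has `len(arr) - window + 1` entries.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical series
window = 3
weights = np.ones(window) / window               # equal weights that sum to 1

moving_avg = np.convolve(arr, weights, mode='valid')

# Each entry is the mean of three consecutive values: 2, 3, 4, 5
print(moving_avg)
print(len(moving_avg) == len(arr) - window + 1)  # True
```

`mode='same'` would keep the original length instead, at the cost of edge values that are averaged against implicit zeros.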
window = 3
weights = np.ones(window) / window
moving_avg = np.convolve(arr, weights, mode='valid')
Polynomial Fitting
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 4, 9, 16])
coeffs = np.polyfit(x, y, 2) # Fit 2nd degree polynomial
poly = np.poly1d(coeffs) # Create polynomial
y_pred = poly(x) # Predict
🔗 Integration with Other Libraries
With Pandas
import pandas as pd
df = pd.DataFrame(arr) # Array to DataFrame
arr = df.values # DataFrame to array
arr = df.to_numpy() # Recommended method
With Matplotlib
import matplotlib.pyplot as plt
plt.plot(arr) # Plot array
plt.imshow(arr) # Display 2D array as image
plt.hist(arr.flatten(), bins=50) # Histogram
With PIL/Pillow
from PIL import Image
img_array = np.array(Image.open('image.jpg'))
img = Image.fromarray(arr.astype('uint8'))
📚 Quick Reference Table
| Operation | Syntax | Description |
|---|---|---|
| Creation | ||
| From list | np.array([1,2,3]) | Create from list |
| Zeros | np.zeros((3,4)) | 3×4 array of zeros |
| Ones | np.ones((2,3)) | 2×3 array of ones |
| Range | np.arange(10) | 0 to 9 |
| Linspace | np.linspace(0,1,5) | 5 evenly spaced values |
| Identity | np.eye(3) | 3×3 identity matrix |
| Indexing | ||
| Single element | arr[i,j] | Element at row i, col j |
| Slice | arr[1:3,:] | Rows 1-2, all columns |
| Boolean | arr[arr>5] | Elements > 5 |
| Fancy | arr[[0,2,4]] | Elements at indices 0,2,4 |
| Math | ||
| Add | arr + 5 | Add 5 to each element |
| Multiply | arr * 2 | Multiply by 2 |
| Power | arr ** 2 | Square each element |
| Sqrt | np.sqrt(arr) | Square root |
| Exp | np.exp(arr) | e^x |
| Log | np.log(arr) | Natural log |
| Aggregate | ||
| Sum | np.sum(arr) | Sum all elements |
| Mean | np.mean(arr) | Average |
| Min/Max | np.min(arr), np.max(arr) | Minimum, maximum |
| Std | np.std(arr) | Standard deviation |
| Shape | ||
| Reshape | arr.reshape(3,4) | Change shape to 3×4 |
| Flatten | arr.flatten() | Convert to 1D |
| Transpose | arr.T | Swap rows and columns |
| Join/Split | ||
| Concatenate | np.concatenate([a,b]) | Join arrays |
| Stack | np.vstack([a,b]) | Stack vertically |
| Split | np.split(arr, 3) | Split into 3 parts |
| Linear Algebra | ||
| Dot product | np.dot(A,B) or A @ B | Matrix multiplication |
| Inverse | np.linalg.inv(A) | Matrix inverse |
| Determinant | np.linalg.det(A) | Determinant |
| Eigenvalues | np.linalg.eig(A) | Eigenvalues & vectors |
| Random | ||
| Random floats | np.random.rand(3,4) | Uniform [0,1) |
| Random ints | np.random.randint(0,10,5) | Integers [0,10) |
| Normal dist | np.random.randn(100) | Standard normal |
| Choice | np.random.choice([1,2,3]) | Random selection |
Happy NumPy coding! 🚀
