Table of Contents

    1. Introduction to NumPy
    2. Installation
    3. NumPy Arrays
    4. Array Creation
    5. Array Attributes
    6. Array Indexing and Slicing
    7. Array Operations
    8. Mathematical Functions
    9. Array Reshaping
    10. Broadcasting
    11. Linear Algebra
    12. Statistical Functions
    13. Random Number Generation
    14. File I/O
    15. Advanced Topics
    16. Best Practices

    Introduction to NumPy

    NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides:

    • A powerful N-dimensional array object
    • Sophisticated broadcasting functions
    • Tools for integrating C/C++ and Fortran code
    • Useful linear algebra, Fourier transform, and random number capabilities

    Why Use NumPy?

    • Performance: NumPy arrays are stored in one contiguous block of memory (unlike lists, which hold references to separately allocated objects), making them faster to access and manipulate
    • Convenience: Built-in functions for mathematical operations
    • Less Code: Operations that would require loops in Python can be done in one line with NumPy
    • Foundation: Many scientific libraries (pandas, scikit-learn, TensorFlow) build on or interoperate with NumPy arrays

    Installation

    # Using pip
    pip install numpy
    
    # Using conda
    conda install numpy
    
    # Verify installation
    python -c "import numpy; print(numpy.__version__)"
    Bash

    Import NumPy in your Python script:

    import numpy as np
    Python

    NumPy Arrays

    The core of NumPy is the ndarray (n-dimensional array) object. Unlike Python lists, NumPy arrays:

    • Are homogeneous (all elements must be of the same type)
    • Have a fixed size at creation
    • Support vectorized operations
    • Are more memory-efficient
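
    The homogeneity and memory points above can be checked directly. The following is a minimal sketch: the element count is arbitrary, and the list figure is only a rough estimate obtained by adding up `sys.getsizeof` for the list and for each integer object.

```python
import sys

import numpy as np

# Mixed input is coerced to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Compare storage for one million integers (arbitrary size)
n = 1_000_000
as_list = list(range(n))
as_array = np.arange(n, dtype=np.int64)

# Rough estimate: the list holds references, and every int is its own object
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = as_array.nbytes  # 8 bytes per element, stored contiguously

print(list_bytes, array_bytes)
```

    The exact list figure varies between Python versions, so only the array size (8 bytes times the element count) should be treated as fixed.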

    Array vs List Comparison

    import numpy as np
    
    # Python list
    python_list = [1, 2, 3, 4, 5]
    
    # NumPy array
    numpy_array = np.array([1, 2, 3, 4, 5])
    
    # Multiplying by 2
    # List: requires loop or list comprehension
    result_list = [x * 2 for x in python_list]
    
    # NumPy: vectorized operation
    result_numpy = numpy_array * 2
    print(result_numpy)  # [ 2  4  6  8 10]
    Python

    Array Creation

    From Python Lists

    import numpy as np
    
    # 1D array
    arr1d = np.array([1, 2, 3, 4, 5])
    print(arr1d)  # [1 2 3 4 5]
    
    # 2D array (matrix)
    arr2d = np.array([[1, 2, 3], [4, 5, 6]])
    print(arr2d)
    # [[1 2 3]
    #  [4 5 6]]
    
    # 3D array
    arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
    print(arr3d.shape)  # (2, 2, 2)
    Python

    Using Built-in Functions

    # Array of zeros
    zeros = np.zeros((3, 4))  # 3x4 array of zeros
    print(zeros)
    
    # Array of ones
    ones = np.ones((2, 3, 4))  # 2x3x4 array of ones
    
    # Empty array (uninitialized values)
    empty = np.empty((2, 2))
    
    # Array with a range of values
    arange = np.arange(0, 10, 2)  # [0 2 4 6 8]
    
    # Array with evenly spaced values
    linspace = np.linspace(0, 1, 5)  # [0.   0.25 0.5  0.75 1.  ]
    
    # Identity matrix
    identity = np.eye(3)  # 3x3 identity matrix
    
    # Array filled with a constant value
    full = np.full((2, 3), 7)  # 2x3 array filled with 7
    
    # Array like another array
    arr = np.array([1, 2, 3])
    zeros_like = np.zeros_like(arr)
    ones_like = np.ones_like(arr)
    Python

    Specifying Data Types

    # Integer array
    int_arr = np.array([1, 2, 3], dtype=np.int32)
    
    # Float array
    float_arr = np.array([1, 2, 3], dtype=np.float64)
    
    # Complex array
    complex_arr = np.array([1+2j, 3+4j], dtype=np.complex128)
    
    # Boolean array
    bool_arr = np.array([True, False, True], dtype=np.bool_)
    
    # String array
    str_arr = np.array(['a', 'b', 'c'], dtype='U1')
    
    # Check data type
    print(int_arr.dtype)  # int32
    Python

    Array Attributes

    import numpy as np
    
    arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
    
    # Shape: dimensions of the array
    print(arr.shape)  # (3, 4)
    
    # ndim: number of dimensions
    print(arr.ndim)  # 2
    
    # size: total number of elements
    print(arr.size)  # 12
    
    # dtype: data type of elements
    print(arr.dtype)  # int64 (or int32 depending on system)
    
    # itemsize: size in bytes of each element
    print(arr.itemsize)  # 8 (for int64)
    
    # nbytes: total bytes consumed by the array
    print(arr.nbytes)  # 96 (12 elements * 8 bytes)
    Python
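
    Because `itemsize` depends on the dtype, the same values can occupy very different amounts of memory. A short sketch, using dtypes chosen only for illustration:

```python
import numpy as np

values = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

sizes = {}
for dtype in (np.int8, np.int32, np.int64, np.float64):
    arr = np.array(values, dtype=dtype)
    # nbytes is always size * itemsize
    sizes[arr.dtype.name] = arr.nbytes
    print(arr.dtype, arr.itemsize, arr.nbytes)

print(sizes)  # {'int8': 12, 'int32': 48, 'int64': 96, 'float64': 96}
```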

    Array Indexing and Slicing

    Basic Indexing

    import numpy as np
    
    arr = np.array([10, 20, 30, 40, 50])
    
    # Access single element
    print(arr[0])   # 10
    print(arr[-1])  # 50
    
    # Modify element
    arr[2] = 99
    print(arr)  # [10 20 99 40 50]
    Python

    Slicing 1D Arrays

    arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    # Basic slicing: arr[start:stop:step]
    print(arr[2:7])      # [2 3 4 5 6]
    print(arr[:5])       # [0 1 2 3 4]
    print(arr[5:])       # [5 6 7 8 9]
    print(arr[::2])      # [0 2 4 6 8]
    print(arr[::-1])     # [9 8 7 6 5 4 3 2 1 0] (reverse)
    Python

    Indexing 2D Arrays

    arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    
    # Access single element
    print(arr2d[0, 0])   # 1
    print(arr2d[1, 2])   # 6
    print(arr2d[-1, -1]) # 9
    
    # Access row
    print(arr2d[1])      # [4 5 6]
    
    # Access column
    print(arr2d[:, 1])   # [2 5 8]
    
    # Slicing
    print(arr2d[0:2, 1:3])
    # [[2 3]
    #  [5 6]]
    Python

    Boolean Indexing

    arr = np.array([1, 2, 3, 4, 5, 6])
    
    # Create boolean mask
    mask = arr > 3
    print(mask)  # [False False False  True  True  True]
    
    # Use mask to filter
    print(arr[mask])  # [4 5 6]
    
    # Direct boolean indexing
    print(arr[arr > 3])  # [4 5 6]
    
    # Multiple conditions
    print(arr[(arr > 2) & (arr < 5)])  # [3 4]
    Python

    Fancy Indexing

    arr = np.array([10, 20, 30, 40, 50, 60])
    
    # Index with array of integers
    indices = np.array([0, 2, 4])
    print(arr[indices])  # [10 30 50]
    
    # 2D fancy indexing
    arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    rows = np.array([0, 2])
    cols = np.array([1, 2])
    print(arr2d[rows, cols])  # [2 9]
    Python
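
    One difference from slicing is worth checking: a basic slice is a view of the original data, while indexing with an integer array returns a copy. A minimal sketch using `np.shares_memory`:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

sliced = arr[0:3]          # basic slice: a view
picked = arr[[0, 2, 4]]    # fancy indexing: a copy

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, picked))  # False

picked[0] = -1   # changes only the copy
print(arr[0])    # 10

# Assigning through a fancy index does modify the original
arr[[0, 2, 4]] = 0
print(arr)  # [ 0 20  0 40  0 60]
```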

    Array Operations

    Arithmetic Operations

    import numpy as np
    
    a = np.array([1, 2, 3, 4])
    b = np.array([10, 20, 30, 40])
    
    # Element-wise operations
    print(a + b)   # [11 22 33 44]
    print(a - b)   # [ -9 -18 -27 -36]
    print(a * b)   # [ 10  40  90 160]
    print(a / b)   # [0.1 0.1 0.1 0.1]
    print(a ** 2)  # [ 1  4  9 16]
    print(a % 2)   # [1 0 1 0]
    
    # Operations with scalars
    print(a + 5)   # [6 7 8 9]
    print(a * 2)   # [2 4 6 8]
    Python

    Universal Functions (ufuncs)

    arr = np.array([1, 4, 9, 16, 25])
    
    # Mathematical functions
    print(np.sqrt(arr))      # [1. 2. 3. 4. 5.]
    print(np.exp(arr))       # [2.72e+00 5.46e+01 8.10e+03 8.89e+06 7.20e+10]
    print(np.log(arr))       # [0.    1.39  2.20  2.77  3.22]
    print(np.sin(arr))       # [0.84  -0.76 0.41  -0.29 -0.13]
    
    # Absolute value
    arr_neg = np.array([-1, -2, 3, -4])
    print(np.abs(arr_neg))   # [1 2 3 4]
    
    # Sign function
    print(np.sign(arr_neg))  # [-1 -1  1 -1]
    Python

    Comparison Operations

    a = np.array([1, 2, 3, 4, 5])
    b = np.array([5, 4, 3, 2, 1])
    
    print(a == b)   # [False False  True False False]
    print(a < b)    # [ True  True False False False]
    print(a >= 3)   # [False False  True  True  True]
    
    # Check if any or all elements satisfy condition
    print(np.any(a > 3))    # True
    print(np.all(a > 0))    # True
    Python

    Aggregate Functions

    arr = np.array([1, 2, 3, 4, 5])
    
    print(np.sum(arr))      # 15
    print(np.min(arr))      # 1
    print(np.max(arr))      # 5
    print(np.mean(arr))     # 3.0
    print(np.median(arr))   # 3.0
    print(np.std(arr))      # 1.414... (standard deviation)
    print(np.var(arr))      # 2.0 (variance)
    
    # 2D array operations
    arr2d = np.array([[1, 2, 3], [4, 5, 6]])
    
    print(np.sum(arr2d))           # 21 (all elements)
    print(np.sum(arr2d, axis=0))   # [5 7 9] (collapses the rows: one sum per column)
    print(np.sum(arr2d, axis=1))   # [ 6 15] (collapses the columns: one sum per row)
    Python

    Mathematical Functions

    Trigonometric Functions

    import numpy as np
    
    angles = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])
    
    print(np.sin(angles))
    print(np.cos(angles))
    print(np.tan(angles))
    
    # Inverse functions
    print(np.arcsin(np.sin(angles)))
    print(np.arccos(np.cos(angles)))
    print(np.arctan(np.tan(angles[:-1])))  # Exclude π/2, where tan is undefined (floating point yields a huge value)
    
    # Hyperbolic functions
    print(np.sinh(angles))
    print(np.cosh(angles))
    print(np.tanh(angles))
    Python

    Rounding Functions

    arr = np.array([1.23, 2.67, 3.45, 4.89])
    
    print(np.round(arr))       # [1. 3. 3. 5.]
    print(np.floor(arr))       # [1. 2. 3. 4.]
    print(np.ceil(arr))        # [2. 3. 4. 5.]
    print(np.trunc(arr))       # [1. 2. 3. 4.]
    
    # Round to specific decimals
    print(np.round(arr, 1))    # [1.2 2.7 3.4 4.9]
    Python

    Exponential and Logarithmic

    arr = np.array([1, 2, 3, 4, 5])
    
    # Exponential
    print(np.exp(arr))         # e^x
    print(np.exp2(arr))        # 2^x
    print(np.power(3, arr))    # 3^x
    
    # Logarithm
    print(np.log(arr))         # Natural log (ln)
    print(np.log10(arr))       # Base 10
    print(np.log2(arr))        # Base 2
    Python

    Array Reshaping

    Reshape

    import numpy as np
    
    arr = np.arange(12)  # [0 1 2 3 4 5 6 7 8 9 10 11]
    
    # Reshape to 2D
    arr2d = arr.reshape(3, 4)
    print(arr2d)
    # [[ 0  1  2  3]
    #  [ 4  5  6  7]
    #  [ 8  9 10 11]]
    
    # Reshape to 3D
    arr3d = arr.reshape(2, 3, 2)
    print(arr3d.shape)  # (2, 3, 2)
    
    # Use -1 to infer dimension
    arr_auto = arr.reshape(2, -1)  # NumPy calculates: 12/2 = 6
    print(arr_auto.shape)  # (2, 6)
    Python

    Flatten and Ravel

    arr2d = np.array([[1, 2, 3], [4, 5, 6]])
    
    # Flatten: returns a copy
    flat = arr2d.flatten()
    print(flat)  # [1 2 3 4 5 6]
    
    # Ravel: returns a view if possible (more efficient)
    ravel = arr2d.ravel()
    print(ravel)  # [1 2 3 4 5 6]
    Python
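
    Whether `ravel` returns a view depends on the memory layout of its input. The sketch below uses `np.shares_memory` to show both cases; the transposed array is just one convenient example of a non-contiguous input.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

flat = arr2d.flatten()
rav = arr2d.ravel()

print(np.shares_memory(arr2d, flat))  # False: flatten always copies
print(np.shares_memory(arr2d, rav))   # True: contiguous input, so ravel returns a view

# Raveling a transposed (non-contiguous) array forces a copy
rav_t = arr2d.T.ravel()
print(np.shares_memory(arr2d, rav_t))  # False
print(rav_t)  # [1 4 2 5 3 6]
```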

    Transpose

    arr = np.array([[1, 2, 3], [4, 5, 6]])
    print(arr.shape)  # (2, 3)
    
    # Transpose
    arr_t = arr.T
    print(arr_t.shape)  # (3, 2)
    print(arr_t)
    # [[1 4]
    #  [2 5]
    #  [3 6]]
    
    # For multi-dimensional arrays
    arr3d = np.arange(24).reshape(2, 3, 4)
    arr3d_t = np.transpose(arr3d, (2, 0, 1))  # Reorder axes: old axis 2 comes first, then 0, then 1
    print(arr3d_t.shape)  # (4, 2, 3)
    Python

    Stack and Split

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    
    # Vertical stack (row-wise)
    v_stack = np.vstack([a, b])
    print(v_stack)
    # [[1 2 3]
    #  [4 5 6]]
    
    # Horizontal stack (column-wise)
    h_stack = np.hstack([a, b])
    print(h_stack)  # [1 2 3 4 5 6]
    
    # Concatenate along axis
    concat = np.concatenate([a, b])
    print(concat)  # [1 2 3 4 5 6]
    
    # Split
    arr = np.arange(9)
    split = np.split(arr, 3)  # Split into 3 equal parts
    print(split)  # [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
    Python

    Broadcasting

    Broadcasting allows NumPy to perform operations on arrays of different shapes.

    Broadcasting Rules

    1. If the arrays have different numbers of dimensions, pad the shape of the smaller one with ones on the left
    2. Two dimensions are compatible if they are equal or if one of them is 1
    3. After broadcasting, each array behaves as if it had the larger shape; a dimension of size 1 is stretched to match
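
    These rules can be tried without building any arrays: `np.broadcast_shapes` (available in NumPy 1.20 and later) applies them to shape tuples directly. A minimal sketch with shapes picked for illustration:

```python
import numpy as np

# Rule 1: (3,) is padded to (1, 3) before being compared with (2, 3)
shape_a = np.broadcast_shapes((2, 3), (3,))
print(shape_a)  # (2, 3)

# Rule 2: a dimension of size 1 stretches to match the other array
shape_b = np.broadcast_shapes((3, 1), (1, 4))
print(shape_b)  # (3, 4)

shape_c = np.broadcast_shapes((5, 1, 4), (3, 1))
print(shape_c)  # (5, 3, 4)

# Incompatible: the trailing dimensions are 3 and 2, and neither is 1
incompatible = False
try:
    np.broadcast_shapes((2, 3), (2,))
except ValueError as exc:
    incompatible = True
    print("cannot broadcast:", exc)
```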

    Examples

    import numpy as np
    
    # Scalar with array
    arr = np.array([1, 2, 3])
    result = arr + 10  # 10 is broadcast to [10, 10, 10]
    print(result)  # [11 12 13]
    
    # 1D with 2D
    arr2d = np.array([[1, 2, 3], [4, 5, 6]])
    arr1d = np.array([10, 20, 30])
    
    result = arr2d + arr1d  # arr1d is broadcast to each row
    print(result)
    # [[11 22 33]
    #  [14 25 36]]
    
    # Column vector with row vector
    col = np.array([[1], [2], [3]])  # Shape: (3, 1)
    row = np.array([10, 20, 30])      # Shape: (3,)
    
    result = col + row  # Broadcasting creates (3, 3) result
    print(result)
    # [[11 21 31]
    #  [12 22 32]
    #  [13 23 33]]
    
    # Visual representation
    # col:          row:          result:
    # [[1]    +    [10 20 30] =  [[11 21 31]
    #  [2]                        [12 22 32]
    #  [3]]                       [13 23 33]]
    Python

    Common Broadcasting Patterns

    # Normalize each row
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    row_means = arr.mean(axis=1, keepdims=True)  # Shape: (2, 1)
    normalized = arr - row_means
    print(normalized)
    
    # Outer product
    a = np.array([1, 2, 3])
    b = np.array([10, 20])
    outer = a.reshape(-1, 1) * b.reshape(1, -1)
    print(outer)
    # [[10 20]
    #  [20 40]
    #  [30 60]]
    Python

    Linear Algebra

    Matrix Operations

    import numpy as np
    
    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])
    
    # Matrix multiplication
    C = np.dot(A, B)  # or A @ B
    print(C)
    # [[19 22]
    #  [43 50]]
    
    # Element-wise multiplication
    element_wise = A * B
    print(element_wise)
    # [[ 5 12]
    #  [21 32]]
    
    # Matrix power
    print(np.linalg.matrix_power(A, 2))  # A^2
    
    # Inner product (for 1D arrays)
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    print(np.inner(a, b))  # 32 (1*4 + 2*5 + 3*6)
    
    # Outer product
    print(np.outer(a, b))
    # [[ 4  5  6]
    #  [ 8 10 12]
    #  [12 15 18]]
    Python

    Matrix Decomposition

    # Determinant
    A = np.array([[1, 2], [3, 4]])
    det = np.linalg.det(A)
    print(det)  # approximately -2.0 (floating-point result, e.g. -2.0000000000000004)
    
    # Inverse
    A_inv = np.linalg.inv(A)
    print(A_inv)
    # [[-2.   1. ]
    #  [ 1.5 -0.5]]
    
    # Verify: A @ A_inv should be identity
    print(np.round(A @ A_inv))
    # [[1. 0.]
    #  [0. 1.]]
    
    # Eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print("Eigenvalues:", eigenvalues)
    print("Eigenvectors:\n", eigenvectors)
    
    # Singular Value Decomposition (SVD)
    U, s, Vt = np.linalg.svd(A)
    print("U:\n", U)
    print("Singular values:", s)
    print("Vt:\n", Vt)
    
    # QR decomposition
    Q, R = np.linalg.qr(A)
    print("Q:\n", Q)
    print("R:\n", R)
    Python
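
    Each decomposition above can be sanity-checked by multiplying the factors back together. The following sketch does that with `np.allclose`, which compares within a floating-point tolerance:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

# Eigen-decomposition: each column v of eigenvectors satisfies A @ v = lambda * v
eigenvalues, eigenvectors = np.linalg.eig(A)
eig_ok = np.allclose(A @ eigenvectors, eigenvectors * eigenvalues)
print(eig_ok)  # True

# SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)
svd_ok = np.allclose(U @ np.diag(s) @ Vt, A)
print(svd_ok)  # True

# QR: A = Q @ R, with Q orthogonal and R upper triangular
Q, R = np.linalg.qr(A)
qr_ok = np.allclose(Q @ R, A)
print(qr_ok)                             # True
print(np.allclose(Q.T @ Q, np.eye(2)))   # True
print(np.allclose(R, np.triu(R)))        # True
```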

    Solving Linear Systems

    # Solve Ax = b
    A = np.array([[3, 1], [1, 2]])
    b = np.array([9, 8])
    
    x = np.linalg.solve(A, b)
    print(x)  # [2. 3.]
    
    # Verify solution
    print(np.allclose(A @ x, b))  # True
    
    # Least squares solution (overdetermined system)
    A = np.array([[1, 1], [1, 2], [1, 3]])
    b = np.array([2, 3, 4])
    
    x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
    print(x)
    Python
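
    For the overdetermined system above, the three points happen to lie exactly on a line, so the least-squares fit is exact. The sketch below checks the result and compares it with the normal equations, shown here only as a cross-check rather than a recommended way to solve such systems:

```python
import numpy as np

A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([2, 3, 4], dtype=float)

x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

# The points lie on y = 1 + 1*t, so the solution is intercept 1 and slope 1
print(np.allclose(x, [1.0, 1.0]))  # True
print(rank)                        # 2

# Cross-check: the least-squares solution satisfies (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x, x_normal))    # True
```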

    Matrix Properties

    A = np.array([[1, 2, 3], [4, 5, 6]])
    
    # Trace (sum of diagonal elements)
    B = np.array([[1, 2], [3, 4]])
    print(np.trace(B))  # 5
    
    # Rank
    print(np.linalg.matrix_rank(A))  # 2
    
    # Norm
    print(np.linalg.norm(A))  # Frobenius norm
    print(np.linalg.norm(A, ord=2))  # 2-norm (spectral norm)
    Python

    Statistical Functions

    Basic Statistics

    import numpy as np
    
    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    
    # Central tendency
    print(np.mean(data))      # 5.5
    print(np.median(data))    # 5.5
    print(np.average(data))   # 5.5
    
    # Weighted average
    weights = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
    print(np.average(data, weights=weights))  # 6.333...
    
    # Spread
    print(np.std(data))       # 2.872... (standard deviation)
    print(np.var(data))       # 8.25 (variance)
    print(np.ptp(data))       # 9 (peak to peak, max - min)
    
    # Percentiles and quantiles
    print(np.percentile(data, 25))   # 3.25
    print(np.percentile(data, 50))   # 5.5 (median)
    print(np.percentile(data, 75))   # 7.75
    print(np.quantile(data, [0.25, 0.5, 0.75]))
    Python

    Correlation and Covariance

    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 4, 5, 4, 5])
    
    # Correlation coefficient
    correlation_matrix = np.corrcoef(x, y)
    print(correlation_matrix)
    # [[1.    0.775]
    #  [0.775 1.   ]]
    
    # Covariance
    covariance_matrix = np.cov(x, y)
    print(covariance_matrix)
    
    # For multiple variables: by default each row is a variable and each column an observation
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    cov_matrix = np.cov(data)
    print(cov_matrix)
    Python
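
    Data is often laid out the other way round, with one observation per row. In that case `rowvar=False` tells `np.cov` to treat the columns as variables. A small sketch with made-up sample values:

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 2 variables (columns)
samples = np.array([[1.0, 2.0],
                    [2.0, 4.0],
                    [3.0, 6.0],
                    [4.0, 8.0]])

# The default (rowvar=True) treats each of the 4 rows as a variable
cov_rows = np.cov(samples)
print(cov_rows.shape)  # (4, 4)

# rowvar=False treats each column as a variable
cov_cols = np.cov(samples, rowvar=False)
print(cov_cols.shape)  # (2, 2)
print(np.allclose(cov_cols, np.cov(samples.T)))  # True
```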

    Binning and Histograms

    data = np.random.randn(1000)  # Random normal distribution
    
    # Histogram
    hist, bin_edges = np.histogram(data, bins=10)
    print("Histogram counts:", hist)
    print("Bin edges:", bin_edges)
    
    # Digitize (assign to bins)
    bins = np.array([-2, -1, 0, 1, 2])
    indices = np.digitize(data, bins)
    print(indices[:10])  # First 10 bin assignments
    Python
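
    Since the data above is random, the bin assignments change from run to run. A deterministic sketch, with values chosen to hit every case, makes the indexing convention easier to see:

```python
import numpy as np

bins = np.array([-2, -1, 0, 1, 2])
values = np.array([-3.0, -1.5, -1.0, 0.2, 1.9, 2.0, 5.0])

# With the default right=False, index i means bins[i-1] <= x < bins[i];
# 0 means below the first edge, len(bins) means at or above the last edge
indices = np.digitize(values, bins)
print(indices)  # [0 1 2 3 4 5 5]

# np.histogram with the same edges ignores out-of-range values
# and closes the last bin on the right, so 2.0 is counted
counts, edges = np.histogram(values, bins=bins)
print(counts)  # [1 1 1 2]
```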

    Random Number Generation

    Random Module

    import numpy as np
    
    # Set seed for reproducibility
    np.random.seed(42)
    
    # Random floats between 0 and 1
    print(np.random.rand(3, 2))  # 3x2 array
    
    # Random floats from uniform distribution [low, high)
    print(np.random.uniform(0, 10, size=5))
    
    # Random integers
    print(np.random.randint(0, 10, size=5))  # [low, high)
    
    # Random integers from range
    print(np.random.randint(low=1, high=7, size=(2, 3)))  # Like dice rolls
    
    # Random choice from array
    arr = np.array([10, 20, 30, 40, 50])
    print(np.random.choice(arr, size=3))
    
    # Random choice without replacement (replace=False, so no element is picked twice)
    print(np.random.choice(arr, size=3, replace=False))
    
    # Random choice with probabilities
    print(np.random.choice(arr, size=5, p=[0.1, 0.1, 0.2, 0.3, 0.3]))
    Python

    Statistical Distributions

    # Normal (Gaussian) distribution
    normal = np.random.normal(loc=0, scale=1, size=1000)  # mean=0, std=1
    print(normal.mean(), normal.std())
    
    # Standard normal
    standard_normal = np.random.randn(1000)
    
    # Binomial distribution
    binomial = np.random.binomial(n=10, p=0.5, size=1000)  # 10 trials, p=0.5
    
    # Poisson distribution
    poisson = np.random.poisson(lam=5, size=1000)  # lambda=5
    
    # Exponential distribution
    exponential = np.random.exponential(scale=2, size=1000)
    
    # Beta distribution
    beta = np.random.beta(a=2, b=5, size=1000)
    
    # Gamma distribution
    gamma = np.random.gamma(shape=2, scale=2, size=1000)
    Python

    Array Manipulation with Random

    arr = np.arange(10)
    
    # Shuffle in place
    np.random.shuffle(arr)
    print(arr)
    
    # Permutation (returns shuffled copy)
    original = np.arange(10)
    shuffled = np.random.permutation(original)
    print(original)  # Unchanged
    print(shuffled)  # Shuffled
    Python

    # Modern approach using Generator
    from numpy.random import default_rng
    
    rng = default_rng(42)  # Seed
    
    # Generate random numbers
    print(rng.random(5))
    print(rng.integers(0, 10, size=5))
    print(rng.normal(0, 1, size=5))
    print(rng.choice([1, 2, 3, 4, 5], size=3))
    Python
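
    A `Generator` carries its own state, so reproducibility does not depend on a global seed. The sketch below illustrates this; the seed values are arbitrary.

```python
import numpy as np
from numpy.random import default_rng

# Two generators created with the same seed produce the same stream
rng_a = default_rng(42)
rng_b = default_rng(42)
same_stream = np.array_equal(rng_a.random(5), rng_b.random(5))
print(same_stream)  # True

# integers() excludes the upper bound by default, so this yields values 1..6
rng_c = default_rng(7)
sample = rng_c.integers(1, 7, size=10)
print(sample.min() >= 1 and sample.max() <= 6)  # True

# permutation() returns a shuffled copy and leaves the input untouched
deck = np.arange(10)
shuffled = rng_c.permutation(deck)
print(np.array_equal(np.sort(shuffled), deck))  # True
```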

    File I/O

    Text Files

    import numpy as np
    
    # Save array to text file
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    np.savetxt('data.txt', arr)
    
    # Load from text file
    loaded = np.loadtxt('data.txt')
    print(loaded)
    
    # Save with formatting
    np.savetxt('data.csv', arr, delimiter=',', fmt='%d')
    
    # Load CSV
    loaded_csv = np.loadtxt('data.csv', delimiter=',')
    Python

    Binary Files (.npy)

    # Save single array
    arr = np.array([1, 2, 3, 4, 5])
    np.save('array.npy', arr)
    
    # Load
    loaded = np.load('array.npy')
    print(loaded)
    
    # Save multiple arrays
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    np.savez('multiple.npz', a=arr1, b=arr2)
    
    # Load multiple
    data = np.load('multiple.npz')
    print(data['a'])
    print(data['b'])
    
    # Save compressed
    np.savez_compressed('compressed.npz', a=arr1, b=arr2)
    Python

    Memory-Mapped Files

    # For very large files that don't fit in memory
    # Create memory-mapped array
    mm_arr = np.memmap('memmap.dat', dtype='float32', mode='w+', shape=(1000, 1000))
    
    # Write data
    mm_arr[:] = np.random.rand(1000, 1000)
    mm_arr.flush()
    
    # Read memory-mapped array
    mm_loaded = np.memmap('memmap.dat', dtype='float32', mode='r', shape=(1000, 1000))
    print(mm_loaded[0, :10])  # Access without loading entire file
    Python
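
    A memory-mapped array is most useful when it is processed in pieces. The following is a minimal sketch that reduces a file block by block; the shape, chunk size, and temporary file location are assumptions made for the demonstration.

```python
import os
import tempfile

import numpy as np

shape = (1000, 100)  # small, arbitrary shape for the demonstration
path = os.path.join(tempfile.mkdtemp(), "memmap_demo.dat")

# Write the file once, then release the mapping
writer = np.memmap(path, dtype="float32", mode="w+", shape=shape)
writer[:] = 1.0
writer.flush()
del writer

# Reopen read-only and sum it one block of rows at a time
reader = np.memmap(path, dtype="float32", mode="r", shape=shape)
total = 0.0
chunk_rows = 250
for start in range(0, shape[0], chunk_rows):
    block = reader[start:start + chunk_rows]  # only this slice needs to be read
    total += float(block.sum(dtype=np.float64))

print(total)  # 100000.0
```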

    Advanced Topics

    Structured Arrays

    import numpy as np
    
    # Define structured data type
    dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
    
    # Create structured array
    data = np.array([('Alice', 25, 55.5),
                     ('Bob', 30, 75.0),
                     ('Charlie', 35, 80.2)], dtype=dt)
    
    print(data)
    print(data['name'])    # ['Alice' 'Bob' 'Charlie']
    print(data['age'])     # [25 30 35]
    print(data[0])         # ('Alice', 25, 55.5)
    
    # Sorting by field
    sorted_data = np.sort(data, order='age')
    print(sorted_data)
    Python

    Masked Arrays

    # Handle missing or invalid data
    data = np.array([1, 2, -999, 4, -999, 6])
    
    # Create masked array
    masked = np.ma.masked_equal(data, -999)
    print(masked)  # [1 2 -- 4 -- 6]
    
    # Operations ignore masked values
    print(masked.mean())  # 3.25 (ignores -999)
    print(masked.sum())   # 13
    
    # Manual mask
    mask = np.array([False, False, True, False, True, False])
    masked2 = np.ma.array(data, mask=mask)
    print(masked2)
    Python

    Vectorization

    # Vectorize a Python function to work on arrays
    def my_function(x, y):
        if x > y:
            return x - y
        else:
            return x + y
    
    # Vectorize
    vec_function = np.vectorize(my_function)
    
    a = np.array([1, 2, 3, 4])
    b = np.array([4, 3, 2, 1])
    
    result = vec_function(a, b)
    print(result)  # [5 5 1 3]
    
    # Note: np.vectorize is a convenience wrapper around a Python loop; prefer built-in NumPy functions for speed
    result_fast = np.where(a > b, a - b, a + b)
    print(result_fast)  # [5 5 1 3]
    Python

    Advanced Indexing

    # Integer array indexing
    arr = np.arange(12).reshape(3, 4)
    rows = np.array([0, 0, 2, 2])
    cols = np.array([0, 2, 0, 2])
    print(arr[rows, cols])  # [ 0  2  8 10]
    
    # Boolean mask with multiple conditions
    arr = np.arange(20)
    mask = (arr % 2 == 0) & (arr > 10)
    print(arr[mask])  # [12 14 16 18]
    
    # np.where for conditional replacement
    arr = np.array([1, 2, 3, 4, 5])
    result = np.where(arr > 3, 100, arr)
    print(result)  # [  1   2   3 100 100]
    
    # np.select for multiple conditions (the first matching condition wins)
    conditions = [arr < 2, arr < 4, arr >= 4]
    choices = ['small', 'medium', 'large']
    result = np.select(conditions, choices, default='unknown')  # string default to match the string choices
    print(result)  # ['small' 'medium' 'medium' 'large' 'large']
    Python

    Memory Views and Copies

    arr = np.array([1, 2, 3, 4, 5])
    
    # View (shares memory)
    view = arr[1:4]
    view[0] = 999
    print(arr)  # [  1 999   3   4   5] - original is modified!
    
    # Copy (independent)
    copy = arr[1:4].copy()
    copy[0] = 111
    print(arr)  # [  1 999   3   4   5] - unchanged by the write to the copy
    
    # Check if array owns its data
    print(arr.flags['OWNDATA'])    # True
    print(view.flags['OWNDATA'])   # False
    print(copy.flags['OWNDATA'])   # True
    
    # Base attribute
    print(view.base is arr)   # True (view references arr)
    print(copy.base is None)  # True (copy is independent)
    Python

    Einstein Summation (einsum)

    # Powerful tool for multi-dimensional operations
    A = np.array([[1, 2], [3, 4]])
    B = np.array([[5, 6], [7, 8]])
    
    # Matrix multiplication
    C = np.einsum('ij,jk->ik', A, B)
    print(C)  # Same as np.dot(A, B)
    
    # Trace
    trace = np.einsum('ii->', A)
    print(trace)  # 5 (1 + 4)
    
    # Transpose
    At = np.einsum('ij->ji', A)
    print(At)
    
    # Element-wise multiplication and sum
    result = np.einsum('ij,ij->', A, B)
    print(result)  # 70 (sum of A * B element-wise)
    Python
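
    Each `einsum` expression above has a conventional equivalent, which gives a quick way to check that a subscript string does what is intended. A short sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matmul_ok = np.array_equal(np.einsum('ij,jk->ik', A, B), A @ B)
trace_ok = np.einsum('ii->', A) == np.trace(A)
transpose_ok = np.array_equal(np.einsum('ij->ji', A), A.T)
sum_ok = np.einsum('ij,ij->', A, B) == np.sum(A * B)  # both are 70

print(matmul_ok, trace_ok, transpose_ok, sum_ok)  # True True True True

# Dropping an index from the output sums over it: row sums
row_sums = np.einsum('ij->i', A)
print(row_sums)  # [3 7]
```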

    Best Practices

    1. Use Vectorization Instead of Loops

    # Bad: Using loops
    arr = np.arange(1000)
    result = np.zeros(1000)
    for i in range(len(arr)):
        result[i] = arr[i] ** 2
    
    # Good: Vectorized operation
    result = arr ** 2
    Python

    2. Specify Data Types

    # Default: float64, 8 bytes per element
    arr = np.zeros(1000000)
    
    # If reduced precision is acceptable, a smaller type halves the memory
    arr = np.zeros(1000000, dtype=np.float32)  # 4 bytes per element
    Python
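
    The memory saving comes at the cost of precision, which is worth checking before switching types. A minimal sketch of both sides of the trade-off:

```python
import numpy as np

n = 1_000_000
a64 = np.zeros(n)                     # float64 by default
a32 = np.zeros(n, dtype=np.float32)

print(a64.nbytes, a32.nbytes)  # 8000000 4000000

# float32 has a 24-bit significand, so 2**24 + 1 cannot be represented
collapsed = np.float32(16_777_217) == np.float32(16_777_216)
print(collapsed)  # True

# Machine epsilon shows the gap in relative precision
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print(eps32, eps64)
```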

    3. Use In-Place Operations

    arr = np.arange(1000)
    
    # Creates new array
    arr = arr + 1
    
    # In-place operation (more memory efficient)
    arr += 1
    Python
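
    The difference between the two forms becomes visible when a second name refers to the same array. The sketch below also shows one limitation: an in-place operation cannot change the array's dtype. The name `alias` is used only for illustration.

```python
import numpy as np

arr = np.arange(5)
alias = arr          # second name for the same array

arr = arr + 1        # rebinds arr to a new array; alias still sees the old data
print(alias)         # [0 1 2 3 4]

arr = alias
arr += 1             # modifies the shared buffer in place
print(alias)         # [1 2 3 4 5]

# Adding a float in place to an integer array is rejected
cast_failed = False
try:
    arr += 0.5
except TypeError as exc:
    cast_failed = True
    print("in-place add failed:", type(exc).__name__)
```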

    4. Avoid Copying When Possible

    # Use views when you don't need independence
    large_arr = np.arange(1000000)
    subset = large_arr[::2]  # View, not copy
    
    # Only copy when necessary
    subset_copy = large_arr[::2].copy()
    Python

    5. Use Appropriate Functions

    # Bad: Manual implementation
    arr = np.array([1, 2, 3, 4, 5])
    mean = np.sum(arr) / len(arr)
    
    # Good: Built-in function
    mean = np.mean(arr)
    Python

    6. Pre-allocate Arrays

    # Bad: Growing array
    result = np.array([])
    for i in range(1000):
        result = np.append(result, i)
    
    # Good: Pre-allocate
    result = np.zeros(1000)
    for i in range(1000):
        result[i] = i
    
    # Even better: Use arange
    result = np.arange(1000)
    Python

    7. Use Broadcasting

    # Bad: Manual broadcasting
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    result = np.zeros_like(arr)
    for i in range(arr.shape[0]):
        result[i] = arr[i] + np.array([10, 20, 30])
    
    # Good: Automatic broadcasting
    result = arr + np.array([10, 20, 30])
    Python

    8. Profile Your Code

    import numpy as np
    import time
    
    arr = np.arange(1000000)  # Created once, outside the timed region
    
    # Method 1: ** operator (perf_counter is a monotonic, high-resolution clock)
    start = time.perf_counter()
    result = arr ** 2
    end = time.perf_counter()
    print(f"Method 1: {end - start:.4f} seconds")
    
    # Method 2: np.power
    start = time.perf_counter()
    result = np.power(arr, 2)
    end = time.perf_counter()
    print(f"Method 2: {end - start:.4f} seconds")
    Python
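
    A single timed run is noisy. The standard library's `timeit` module repeats the measurement, which is usually a fairer comparison. This is a sketch only: the repeat counts are arbitrary and the timings depend on the machine.

```python
import timeit

import numpy as np

arr = np.arange(1_000_000)
calls = 10

# repeat() returns one total per repetition; the minimum is the least disturbed
t_operator = min(timeit.repeat(lambda: arr ** 2, number=calls, repeat=5))
t_power = min(timeit.repeat(lambda: np.power(arr, 2), number=calls, repeat=5))

print(f"arr ** 2:         {t_operator / calls:.6f} s per call")
print(f"np.power(arr, 2): {t_power / calls:.6f} s per call")
```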

    9. Handle Memory Efficiently

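    # For large datasets, use memory-mapped files
    # For operations on portions of data, use slicing/views
    # Delete intermediate arrays when done
    intermediate_result = np.random.rand(1000, 1000)  # placeholder for a large temporary
    del intermediate_result  # drops this reference so the memory can be reclaimed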
    
    # Use generators for large data processing
    def data_generator(n):
        for i in range(n):
            yield np.random.rand(1000)
    Python
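
    A generator like the one above pays off when the consumer keeps only a small running summary. The following sketch computes a mean over all chunks without holding them in memory at once; the chunk count, chunk size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def data_generator(n_chunks, chunk_size=1000):
    """Yield n_chunks arrays of chunk_size uniform random values each."""
    for _ in range(n_chunks):
        yield rng.random(chunk_size)

# Accumulate a sum and a count so only one chunk is alive at a time
total = 0.0
count = 0
for chunk in data_generator(50):
    total += chunk.sum()
    count += chunk.size

running_mean = total / count
print(count)         # 50000
print(running_mean)  # close to 0.5 for uniform [0, 1) samples
```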

    10. Document Array Shapes

    def process_data(X):
        """
        Process input data.
    
        Parameters
        ----------
        X : ndarray, shape (n_samples, n_features)
            Input data matrix
    
        Returns
        -------
        result : ndarray, shape (n_samples,)
            Processed output
        """
        # Shape: (n_samples, n_features) -> (n_samples,)
        return np.mean(X, axis=1)
    Python

    Common Pitfalls and Solutions

    Pitfall 1: Modifying Array Through View

    # Problem
    arr = np.arange(10)
    subset = arr[5:]  # View
    subset[:] = 0     # Modifies original!
    print(arr)        # [0 1 2 3 4 0 0 0 0 0]
    
    # Solution: Use copy
    arr = np.arange(10)
    subset = arr[5:].copy()
    subset[:] = 0
    print(arr)  # [0 1 2 3 4 5 6 7 8 9] - unchanged
    Python

    Pitfall 2: Integer Division

    # Historical problem: under Python 2, / on integer arrays performed integer division
    arr = np.array([1, 2, 3, 4, 5])
    result = arr / 2   # Python 3: true division, [0.5 1.  1.5 2.  2.5]
    
    # Be explicit: use // when floor division is intended
    floored = arr // 2  # [0 1 1 2 2]
    # In Python 3, / on NumPy arrays always performs true (float) division
    Python

    Pitfall 3: Dimension Confusion

    # Problem
    arr = np.array([1, 2, 3])
    print(arr.shape)  # (3,) - 1D array
    
    arr2d = np.array([[1, 2, 3]])
    print(arr2d.shape)  # (1, 3) - 2D array with 1 row
    
    # They behave differently in some operations!
    
    # Solution: Be explicit about dimensions
    arr_column = arr.reshape(-1, 1)  # (3, 1)
    arr_row = arr.reshape(1, -1)     # (1, 3)
    Python
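
    A few concrete cases where the 1D, row, and column forms behave differently are sketched below:

```python
import numpy as np

arr = np.array([1, 2, 3])      # shape (3,)
row = arr.reshape(1, -1)       # shape (1, 3)
col = arr.reshape(-1, 1)       # shape (3, 1)

# Transposing a 1D array does nothing
print(arr.T.shape)   # (3,)
print(row.T.shape)   # (3, 1)

# Adding a 1D array to a column broadcasts to a full grid
grid = arr + col
print(grid.shape)    # (3, 3)

# The @ operator also treats the three forms differently
print(arr @ arr)           # 14 (scalar dot product)
print((col @ row).shape)   # (3, 3), an outer product
print((row @ col).shape)   # (1, 1)
```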

    Summary

    NumPy is the foundation of scientific computing in Python, providing:

    • Efficient multi-dimensional arrays
    • Broadcasting for implicit operations on arrays of different shapes
    • Comprehensive mathematical functions
    • Linear algebra operations
    • Random number generation
    • File I/O capabilities
    • Integration with other scientific libraries

    Master NumPy to unlock the full potential of Python for data science, machine learning, and scientific computing!


    Quick Reference Card

    # Array Creation
    np.array([1,2,3])              # From list
    np.zeros((3,4))                # Array of zeros
    np.ones((2,3))                 # Array of ones
    np.arange(10)                  # Range of values
    np.linspace(0,1,5)             # Evenly spaced values
    
    # Array Info
    arr.shape                      # Dimensions
    arr.dtype                      # Data type
    arr.size                       # Number of elements
    arr.ndim                       # Number of dimensions
    
    # Indexing
    arr[i]                         # 1D indexing
    arr[i,j]                       # 2D indexing
    arr[i:j]                       # Slicing
    arr[arr > 5]                   # Boolean indexing
    
    # Operations
    arr + 5                        # Element-wise addition
    arr * arr2                     # Element-wise multiplication
    arr @ arr2                     # Matrix multiplication
    np.dot(arr, arr2)              # Dot product
    
    # Aggregations
    np.sum(arr)                    # Sum
    np.mean(arr)                   # Mean
    np.std(arr)                    # Standard deviation
    np.max(arr)                    # Maximum
    np.argmax(arr)                 # Index of maximum
    
    # Reshaping
    arr.reshape(3,4)               # Reshape
    arr.flatten()                  # Flatten to 1D
    arr.T                          # Transpose
    
    # Random
    np.random.rand(3,4)            # Random values [0,1)
    np.random.randn(100)           # Standard normal
    np.random.randint(0,10,5)      # Random integers
    
    # Linear Algebra
    np.linalg.inv(A)               # Matrix inverse
    np.linalg.det(A)               # Determinant
    np.linalg.eig(A)               # Eigenvalues/vectors
    np.linalg.solve(A,b)           # Solve Ax=b
    Python

    Comprehensive NumPy Cheatsheet

    📦 Import Convention

    import numpy as np
    Python

    🎯 Array Creation

    From Existing Data

    np.array([1, 2, 3])                    # 1D array from list
    np.array([[1,2], [3,4]])               # 2D array from nested lists
    np.asarray([1, 2, 3])                  # Convert to array (no copy if already array)
    np.copy(arr)                           # Create a copy
    np.frombuffer(b'\x01\x02', dtype=np.uint8)  # From buffer (byte length must be a multiple of the itemsize)
    np.fromiter(range(5), dtype=int)       # From iterable
    Python

    Zeros, Ones, and Empty

    np.zeros(5)                            # [0. 0. 0. 0. 0.]
    np.zeros((3, 4))                       # 3x4 array of zeros
    np.ones((2, 3, 4))                     # 2x3x4 array of ones
    np.empty((2, 2))                       # Uninitialized 2x2 array
    np.zeros_like(arr)                     # Zeros with same shape as arr
    np.ones_like(arr)                      # Ones with same shape as arr
    np.empty_like(arr)                     # Empty with same shape as arr
    np.full((3, 3), 7)                     # 3x3 array filled with 7
    np.full_like(arr, 5)                   # Like arr, filled with 5
    Python

    Ranges and Sequences

    np.arange(10)                          # [0 1 2 ... 9]
    np.arange(2, 10)                       # [2 3 4 ... 9]
    np.arange(0, 1, 0.1)                   # [0. 0.1 0.2 ... 0.9]
    np.linspace(0, 10, 5)                  # 5 evenly spaced values
    np.logspace(0, 2, 5)                   # [1. 3.16... 10. 31.6... 100.]
    np.geomspace(1, 1000, 4)               # Geometric sequence
    Python
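
One difference worth seeing side by side: np.arange excludes the stop value, while np.linspace includes it by default. A small sketch with illustrative names:

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)   # fixed step, stop value excluded
spaced = np.linspace(0, 1, 5)     # fixed count, stop value included by default

print(stepped)                    # four values, ending before 1
print(spaced)                     # five values, ending at 1
```

When the number of points matters more than the step size, np.linspace tends to be the safer choice, since floating-point steps in np.arange can make the length hard to predict.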

    Identity and Diagonal

    np.eye(3)                              # 3x3 identity matrix
    np.eye(3, 4)                           # 3x4 matrix with ones on the main diagonal
    np.identity(3)                         # 3x3 identity matrix
    np.diag([1, 2, 3])                     # Diagonal matrix
    np.diag(arr)                           # Extract diagonal
    np.diagflat([1, 2])                    # Create diagonal array
    Python

    Random Arrays

    np.random.rand(3, 4)                   # Uniform [0, 1), shape (3,4)
    np.random.randn(3, 4)                  # Standard normal, shape (3,4)
    np.random.randint(0, 10, (3, 4))       # Random ints [0, 10)
    np.random.random((3, 4))               # Random floats [0, 1)
    np.random.uniform(0, 10, (3, 4))       # Uniform [0, 10)
    np.random.normal(0, 1, (3, 4))         # Normal (μ=0, σ=1)
    np.random.choice([1,2,3,4], 10)        # Random choices
    np.random.permutation(10)              # Random permutation
    np.random.shuffle(arr)                 # Shuffle in place
    Python

    📊 Array Attributes

    arr.shape                              # Dimensions (rows, cols, ...)
    arr.ndim                               # Number of dimensions
    arr.size                               # Total number of elements
    arr.dtype                              # Data type
    arr.itemsize                           # Size of each element (bytes)
    arr.nbytes                             # Total bytes (size * itemsize)
    arr.T                                  # Transpose
    arr.real                               # Real part (complex arrays)
    arr.imag                               # Imaginary part
    arr.flat                               # Flat iterator
    arr.flags                              # Memory layout info
    Python

    🎯 Data Types

    np.int8, np.int16, np.int32, np.int64  # Signed integers
    np.uint8, np.uint16, np.uint32, np.uint64  # Unsigned integers
    np.float16, np.float32, np.float64     # Floating point
    np.complex64, np.complex128            # Complex numbers
    np.bool_                               # Boolean
    np.object_                             # Python objects
    np.bytes_, np.str_                     # Byte and Unicode strings (np.string_/np.unicode_ were removed in NumPy 2.0)
    
    # Convert types
    arr.astype(np.float32)                 # Convert to float32
    arr.astype('int')                      # Convert to int
    Python

    🔍 Indexing & Slicing

    Basic Indexing

    arr[0]                                 # First element
    arr[-1]                                # Last element
    arr[2:5]                               # Elements 2, 3, 4
    arr[::2]                               # Every other element
    arr[::-1]                              # Reverse
    arr[1:8:2]                             # Start:stop:step
    Python

    Multi-dimensional Indexing

    arr[i, j]                              # Element at row i, col j
    arr[i]                                 # Row i
    arr[:, j]                              # Column j
    arr[0:2, 1:3]                          # Subarray
    arr[..., 0]                            # Last dimension, first element
    arr[:, :, 0]                           # Same as above for 3D
    Python

    Boolean Indexing

    arr[arr > 5]                           # Elements > 5
    arr[(arr > 5) & (arr < 10)]            # Elements 5 < x < 10
    arr[(arr < 5) | (arr > 10)]            # Elements x < 5 or x > 10
    arr[~(arr > 5)]                        # Elements <= 5 (NOT operator)
    Python

    Fancy Indexing

    arr[[0, 2, 4]]                         # Elements at indices 0, 2, 4
    arr[[0, 1], [2, 3]]                    # Elements (0,2) and (1,3)
    arr[np.ix_([0,2], [1,3])]              # Outer indexing
    Python
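
The last two forms are easy to confuse: two index lists are paired element by element, whereas np.ix_ takes every combination of rows and columns. A short sketch on an illustrative 4x4 array:

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# Paired indexing: picks the elements at (0, 2) and (1, 3)
pairs = grid[[0, 1], [2, 3]]

# Outer indexing: rows 0 and 2 crossed with columns 1 and 3
block = grid[np.ix_([0, 2], [1, 3])]

print(pairs.shape)   # one element per (row, col) pair
print(block.shape)   # one element per row/column combination
```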

    ➕ Mathematical Operations

    Arithmetic

    arr + 5                                # Add scalar
    arr - 5                                # Subtract scalar
    arr * 5                                # Multiply by scalar
    arr / 5                                # Divide by scalar
    arr // 5                               # Floor division
    arr % 5                                # Modulo
    arr ** 2                               # Power
    np.add(arr1, arr2)                     # Element-wise addition
    np.subtract(arr1, arr2)                # Element-wise subtraction
    np.multiply(arr1, arr2)                # Element-wise multiplication
    np.divide(arr1, arr2)                  # Element-wise division
    np.power(arr, 2)                       # Element-wise power
    np.sqrt(arr)                           # Square root
    np.square(arr)                         # Square
    np.exp(arr)                            # e^x
    np.log(arr)                            # Natural log
    np.log10(arr)                          # Log base 10
    np.log2(arr)                           # Log base 2
    Python

    Trigonometric

    np.sin(arr)                            # Sine
    np.cos(arr)                            # Cosine
    np.tan(arr)                            # Tangent
    np.arcsin(arr)                         # Inverse sine
    np.arccos(arr)                         # Inverse cosine
    np.arctan(arr)                         # Inverse tangent
    np.arctan2(y, x)                       # Atan2(y, x)
    np.sinh(arr)                           # Hyperbolic sine
    np.cosh(arr)                           # Hyperbolic cosine
    np.tanh(arr)                           # Hyperbolic tangent
    np.deg2rad(arr)                        # Degrees to radians
    np.rad2deg(arr)                        # Radians to degrees
    Python

    Rounding

    np.round(arr)                          # Round to nearest
    np.round(arr, 2)                       # Round to 2 decimals
    np.floor(arr)                          # Round down
    np.ceil(arr)                           # Round up
    np.trunc(arr)                          # Truncate
    np.rint(arr)                           # Round to nearest int
    np.fix(arr)                            # Round towards zero
    Python

    Comparison

    arr == 5                               # Equal to
    arr != 5                               # Not equal to
    arr > 5                                # Greater than
    arr < 5                                # Less than
    arr >= 5                               # Greater or equal
    arr <= 5                               # Less or equal
    np.equal(arr1, arr2)                   # Element-wise ==
    np.not_equal(arr1, arr2)               # Element-wise !=
    np.greater(arr1, arr2)                 # Element-wise >
    np.less(arr1, arr2)                    # Element-wise <
    np.allclose(arr1, arr2)                # All close (tolerance)
    np.isclose(arr1, arr2)                 # Element-wise close
    Python

    📈 Aggregate Functions

    Basic Aggregations

    np.sum(arr)                            # Sum all elements
    np.sum(arr, axis=0)                    # Sum along axis 0
    np.sum(arr, axis=1)                    # Sum along axis 1
    np.prod(arr)                           # Product of all elements
    np.cumsum(arr)                         # Cumulative sum
    np.cumprod(arr)                        # Cumulative product
    np.diff(arr)                           # Differences between consecutive
    Python

    Statistics

    np.mean(arr)                           # Mean
    np.median(arr)                         # Median
    np.average(arr)                        # Average
    np.average(arr, weights=w)             # Weighted average
    np.std(arr)                            # Standard deviation
    np.var(arr)                            # Variance
    np.min(arr)                            # Minimum
    np.max(arr)                            # Maximum
    np.ptp(arr)                            # Peak to peak (max - min)
    np.percentile(arr, 50)                 # 50th percentile
    np.quantile(arr, 0.5)                  # 0.5 quantile
    Python

    Indices of Extrema

    np.argmin(arr)                         # Index of minimum
    np.argmax(arr)                         # Index of maximum
    np.argmin(arr, axis=0)                 # Indices along axis
    np.argmax(arr, axis=1)                 # Indices along axis
    np.nanargmin(arr)                      # Ignore NaN
    np.nanargmax(arr)                      # Ignore NaN
    Python

    Logical Operations

    np.all(arr)                            # True if all True
    np.any(arr)                            # True if any True
    np.all(arr > 0)                        # Check condition
    np.any(arr > 0)                        # Check condition
    Python

    🔄 Array Manipulation

    Reshaping

    arr.reshape(3, 4)                      # Reshape to 3x4
    arr.reshape(-1, 1)                     # Column vector
    arr.reshape(1, -1)                     # Row vector
    arr.reshape(2, -1)                     # Auto-calculate columns
    arr.flatten()                          # Flatten to 1D (copy)
    arr.ravel()                            # Flatten to 1D (view when possible, else copy)
    arr.squeeze()                          # Remove single dimensions
    np.expand_dims(arr, axis=0)            # Add dimension at axis
    Python
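
The copy-versus-view distinction between flatten() and ravel() shows up as soon as the result is modified. A minimal sketch, assuming a small contiguous array (for which ravel() can return a view):

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

flat_copy = matrix.flatten()   # always a new array
flat_view = matrix.ravel()     # a view here, because `matrix` is contiguous

flat_copy[0] = 100             # does not affect `matrix`
flat_view[1] = 200             # writes through to `matrix`

auto = np.arange(12).reshape(3, -1)   # -1 lets NumPy infer the missing dimension

print(matrix)
print(auto.shape)
```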

    Transposing

    arr.T                                  # Transpose
    np.transpose(arr)                      # Transpose
    np.transpose(arr, (2, 0, 1))           # Permute axes
    np.swapaxes(arr, 0, 1)                 # Swap two axes
    np.moveaxis(arr, 0, -1)                # Move axis
    Python

    Joining Arrays

    np.concatenate([arr1, arr2])           # Concatenate along axis 0
    np.concatenate([arr1, arr2], axis=1)   # Along axis 1
    np.vstack([arr1, arr2])                # Vertical stack (rows)
    np.hstack([arr1, arr2])                # Horizontal stack (cols)
    np.dstack([arr1, arr2])                # Depth stack
    np.stack([arr1, arr2])                 # Stack along new axis
    np.stack([arr1, arr2], axis=1)         # Stack along axis 1
    np.column_stack([arr1, arr2])          # Stack as columns
    np.row_stack([arr1, arr2])             # Alias of vstack (deprecated in NumPy 2.0; prefer np.vstack)
    Python
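
The joining functions differ mainly in the shape they produce. The sketch below compares them on two illustrative 1D arrays of length 3:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate([a, b])      # joins along the existing axis
rows = np.vstack([a, b])             # each input becomes a row
flat = np.hstack([a, b])             # for 1D inputs, same as concatenate
new_axis = np.stack([a, b], axis=1)  # inserts a new axis at position 1
cols = np.column_stack([a, b])       # each input becomes a column

for name, result in [("concatenate", joined), ("vstack", rows),
                     ("hstack", flat), ("stack axis=1", new_axis),
                     ("column_stack", cols)]:
    print(name, result.shape)
```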

    Splitting Arrays

    np.split(arr, 3)                       # Split into 3 equal parts
    np.split(arr, [3, 5])                  # Split at indices 3, 5
    np.vsplit(arr, 2)                      # Vertical split (rows)
    np.hsplit(arr, 2)                      # Horizontal split (cols)
    np.dsplit(arr, 2)                      # Depth split
    np.array_split(arr, 3)                 # Split (unequal allowed)
    Python

    Adding/Removing Elements

    np.append(arr, [7, 8, 9])              # Append elements (copy)
    np.insert(arr, 3, [99])                # Insert at index
    np.delete(arr, [1, 3])                 # Delete at indices
    np.resize(arr, (4, 4))                 # Resize (repeats if needed)
    np.pad(arr, 2, mode='constant')        # Pad with zeros
    np.pad(arr, 2, mode='edge')            # Pad with edge values
    Python

    Repeating Elements

    np.repeat(arr, 3)                      # Repeat each element 3 times
    np.repeat(arr, 3, axis=0)              # Repeat along axis
    np.tile(arr, 3)                        # Tile array 3 times
    np.tile(arr, (2, 3))                   # Tile in 2D
    Python

    🧮 Linear Algebra

    Matrix Products

    np.dot(A, B)                           # Matrix multiplication
    A @ B                                  # Matrix multiplication (Python 3.5+)
    np.matmul(A, B)                        # Matrix multiplication
    np.inner(a, b)                         # Inner product
    np.outer(a, b)                         # Outer product
    np.tensordot(A, B, axes=1)             # Tensor dot product
    np.einsum('ij,jk->ik', A, B)           # Einstein summation
    np.kron(A, B)                          # Kronecker product
    Python

    Matrix Properties

    np.trace(A)                            # Trace (sum of diagonal)
    np.linalg.det(A)                       # Determinant
    np.linalg.matrix_rank(A)               # Rank
    np.linalg.norm(A)                      # Frobenius norm
    np.linalg.norm(A, ord=2)               # 2-norm (spectral)
    np.linalg.norm(A, ord='fro')           # Frobenius norm
    np.linalg.cond(A)                      # Condition number
    Python

    Matrix Inverses and Decompositions

    np.linalg.inv(A)                       # Matrix inverse
    np.linalg.pinv(A)                      # Pseudo-inverse (Moore-Penrose)
    np.linalg.eig(A)                       # Eigenvalues & eigenvectors
    np.linalg.eigvals(A)                   # Eigenvalues only
    np.linalg.eigh(A)                      # Hermitian/symmetric eigendecomp
    np.linalg.svd(A)                       # Singular value decomposition
    np.linalg.qr(A)                        # QR decomposition
    np.linalg.cholesky(A)                  # Cholesky decomposition
    Python

    Solving Systems

    np.linalg.solve(A, b)                  # Solve Ax = b
    np.linalg.lstsq(A, b, rcond=None)      # Least squares solution
    Python
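
As a worked illustration of both calls, the sketch below solves a small square system and then fits a straight line by least squares. The matrices and data points are made up for the example.

```python
import numpy as np

# Square system: 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Overdetermined system: fit y = m*t + c to four points
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
design = np.column_stack([t, np.ones_like(t)])   # columns: t, constant term
coef, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)

print(x)      # solution of the square system
print(coef)   # slope and intercept of the fitted line
```

Calling np.linalg.solve is generally preferable to computing np.linalg.inv(A) @ b, since it avoids forming the inverse explicitly.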

    📉 Statistical Functions

    Descriptive Statistics

    np.mean(arr)                           # Arithmetic mean
    np.median(arr)                         # Median
    np.std(arr)                            # Standard deviation
    np.std(arr, ddof=1)                    # Sample std (N-1)
    np.var(arr)                            # Variance
    np.var(arr, ddof=1)                    # Sample variance
    np.nanmean(arr)                        # Mean (ignore NaN)
    np.nanmedian(arr)                      # Median (ignore NaN)
    np.nanstd(arr)                         # Std (ignore NaN)
    np.nanvar(arr)                         # Var (ignore NaN)
    Python
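
The ddof argument and the nan-aware variants are the two details here that most often cause surprises. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_std = np.std(data)             # divides by N (population)
sample_std = np.std(data, ddof=1)  # divides by N - 1 (sample)

with_gap = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(with_gap)     # NaN propagates into the result
clean_mean = np.nanmean(with_gap)  # NaN entries are skipped

print(pop_std, sample_std)
print(plain_mean, clean_mean)
```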

    Correlation

    np.corrcoef(x, y)                      # Correlation coefficient matrix
    np.cov(x, y)                           # Covariance matrix
    np.correlate(x, y)                     # Cross-correlation
    Python

    Histograms

    np.histogram(arr, bins=10)             # Histogram
    np.histogram2d(x, y, bins=10)          # 2D histogram
    np.bincount(arr)                       # Count occurrences
    np.digitize(arr, bins)                 # Bin indices
    Python

    🎲 Random Sampling

    Distributions

    np.random.random(10)                   # Uniform [0, 1)
    np.random.rand(3, 4)                   # Uniform [0, 1), shape (3,4)
    np.random.randn(3, 4)                  # Standard normal
    np.random.randint(0, 10, 5)            # Random integers [0, 10)
    np.random.uniform(0, 10, 5)            # Uniform [0, 10)
    np.random.normal(5, 2, 100)            # Normal(μ=5, σ=2)
    np.random.binomial(10, 0.5, 100)       # Binomial(n=10, p=0.5)
    np.random.poisson(5, 100)              # Poisson(λ=5)
    np.random.exponential(2, 100)          # Exponential(scale=2)
    np.random.gamma(2, 2, 100)             # Gamma(shape=2, scale=2)
    np.random.beta(2, 5, 100)              # Beta(α=2, β=5)
    np.random.chisquare(2, 100)            # Chi-square(df=2)
    Python

    Sampling

    np.random.choice([1,2,3,4,5], 10)      # Random choices
    np.random.choice(arr, 5, replace=False) # Sample without replacement
    np.random.choice(arr, 5, p=probs)      # Weighted sampling
    np.random.shuffle(arr)                 # Shuffle in place
    np.random.permutation(arr)             # Random permutation (copy)
    Python

    Generator API (Recommended for New Code)

    from numpy.random import default_rng
    rng = default_rng(42)                  # Create generator with seed
    rng.random(10)                         # Random floats
    rng.integers(0, 10, 5)                 # Random integers
    rng.normal(0, 1, 100)                  # Normal distribution
    rng.choice([1,2,3,4,5], 10)            # Random choices
    Python
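
Seeding matters mostly for reproducibility: two generators created with the same seed produce the same stream. A small sketch (the seed value and names are arbitrary):

```python
import numpy as np
from numpy.random import default_rng

rng_a = default_rng(42)
rng_b = default_rng(42)

draws_a = rng_a.integers(0, 10, 5)   # upper bound is exclusive by default
draws_b = rng_b.integers(0, 10, 5)

print(np.array_equal(draws_a, draws_b))   # same seed, same sequence
```

Passing a Generator object into functions, rather than relying on the global np.random state, keeps random behaviour local and easier to test.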

    💾 File I/O

    Text Files

    np.savetxt('data.txt', arr)            # Save to text
    np.savetxt('data.csv', arr, delimiter=',')  # Save as CSV
    np.savetxt('data.txt', arr, fmt='%.2f')     # Format specifier
    np.loadtxt('data.txt')                 # Load from text
    np.loadtxt('data.csv', delimiter=',')  # Load CSV
    np.loadtxt('data.txt', skiprows=1)     # Skip header
    np.genfromtxt('data.csv', delimiter=',')    # More flexible
    np.genfromtxt('data.csv', names=True)  # With column names
    Python

    Binary Files

    np.save('arr.npy', arr)                # Save single array
    np.load('arr.npy')                     # Load single array
    np.savez('arrays.npz', a=arr1, b=arr2) # Save multiple arrays
    np.savez_compressed('arr.npz', a=arr1) # Compressed
    data = np.load('arrays.npz')           # Load multiple
    data['a']                              # Access by name
    Python
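
A round trip through np.savez can be sketched as follows. The example writes into a temporary directory so it leaves no files behind; the file name and array names are illustrative.

```python
import os
import tempfile

import numpy as np

arr1 = np.arange(5)
arr2 = np.linspace(0.0, 1.0, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez(path, a=arr1, b=arr2)       # keyword names become the keys

    with np.load(path) as data:          # closes the file when done
        names = sorted(data.files)       # keys stored in the archive
        loaded_a = data["a"]
        loaded_b = data["b"]

print(names)
print(loaded_a, loaded_b)
```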

    Memory-Mapped Files

    # Create memory-mapped file
    mm = np.memmap('data.dat', dtype='float32',
                   mode='w+', shape=(1000, 1000))
    mm[:] = np.random.rand(1000, 1000)
    mm.flush()
    
    # Load memory-mapped file
    mm = np.memmap('data.dat', dtype='float32',
                   mode='r', shape=(1000, 1000))
    Python

    🔧 Utility Functions

    Array Testing

    np.isnan(arr)                          # Check for NaN
    np.isinf(arr)                          # Check for infinity
    np.isfinite(arr)                       # Check for finite
    np.isreal(arr)                         # Check for real
    np.iscomplex(arr)                      # Check for complex
    Python

    Array Comparison

    np.array_equal(arr1, arr2)             # True if identical
    np.array_equiv(arr1, arr2)             # True if broadcastable & equal
    np.allclose(arr1, arr2)                # True if close (tolerance)
    np.allclose(arr1, arr2, rtol=1e-5)     # Relative tolerance
    np.allclose(arr1, arr2, atol=1e-8)     # Absolute tolerance
    Python

    Sorting

    np.sort(arr)                           # Sort (returns copy)
    arr.sort()                             # Sort in place
    np.argsort(arr)                        # Indices that would sort
    np.sort(arr, axis=0)                   # Sort along axis
    np.lexsort((arr1, arr2))               # Sort by multiple keys
    np.partition(arr, 3)                   # Partial sort (element at index 3 lands in its sorted position)
    np.argpartition(arr, 3)                # Indices of partial sort
    Python
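
A typical use of np.argpartition is picking the k smallest items without sorting the whole array. A minimal sketch with made-up scores:

```python
import numpy as np

scores = np.array([7, 2, 9, 4, 1, 8])
k = 3

# After partitioning around index k, the first k positions hold the
# k smallest values, in no guaranteed order.
smallest_idx = np.argpartition(scores, k)[:k]
smallest = np.sort(scores[smallest_idx])     # sort only the k selected values

pivot = np.partition(scores, k)[k]           # value that a full sort would place at index k

print(smallest)
print(pivot)
```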

    Searching

    np.where(arr > 5)                      # Indices where condition
    np.where(arr > 5, x, y)                # x if condition else y
    np.argwhere(arr > 5)                   # Indices as rows, shape (N, ndim)
    np.nonzero(arr)                        # Indices of non-zero
    np.flatnonzero(arr)                    # Flat indices of non-zero
    np.searchsorted(arr, 5)                # Index to insert 5
    np.extract(arr > 5, arr)               # Extract elements
    Python
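
The one- and three-argument forms of np.where behave quite differently, so a short side-by-side sketch may help (values are made up):

```python
import numpy as np

values = np.array([3, 8, 1, 9])

idx = np.where(values > 5)[0]             # one argument: tuple of index arrays
replaced = np.where(values > 5, values, 0)  # three arguments: element-wise choice

sorted_vals = np.array([1, 3, 5, 7])
left = np.searchsorted(sorted_vals, 5)                 # insert before equal values
right = np.searchsorted(sorted_vals, 5, side="right")  # insert after equal values

print(idx, replaced)
print(left, right)
```

Note that np.searchsorted assumes its first argument is already sorted.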

    Set Operations

    np.unique(arr)                         # Unique elements (sorted)
    np.unique(arr, return_counts=True)     # With counts
    np.unique(arr, return_index=True)      # With first indices
    np.isin(arr1, arr2)                    # Test membership (replaces np.in1d, deprecated in NumPy 2.0)
    np.intersect1d(arr1, arr2)             # Intersection
    np.union1d(arr1, arr2)                 # Union
    np.setdiff1d(arr1, arr2)               # Set difference
    np.setxor1d(arr1, arr2)                # Symmetric difference
    Python

    Miscellaneous

    np.clip(arr, 0, 10)                    # Clip values to [0, 10]
    np.piecewise(x, [x<0, x>=0], [lambda x: 0, lambda x: x])  # Piecewise
    np.select([cond1, cond2], [val1, val2]) # Select based on conditions
    np.where(condition, x, y)              # Ternary operator
    np.choose(indices, [arr1, arr2, arr3]) # Choose from list
    np.vectorize(func)                     # Vectorize function
    np.apply_along_axis(func, 0, arr)      # Apply function along axis
    np.apply_over_axes(func, arr, [0,1])   # Apply over multiple axes
    Python

    🎭 Advanced Indexing

    Mesh Grids

    x = np.linspace(0, 5, 5)
    y = np.linspace(0, 3, 3)
    X, Y = np.meshgrid(x, y)               # 2D coordinate matrices, shape (3, 5) ('xy' order)
    X, Y = np.mgrid[0:5:5j, 0:3:3j]        # Dense grid, shape (5, 3) ('ij' order)
    X, Y = np.ogrid[0:5:5j, 0:3:3j]        # Open grid, shapes (5, 1) and (1, 3)
    Python
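
The three grid helpers return differently shaped results for the same inputs, which is a frequent source of transposed plots. The sketch below prints the shapes; the suffixed names are illustrative.

```python
import numpy as np

x = np.linspace(0, 5, 5)
y = np.linspace(0, 3, 3)

X_xy, Y_xy = np.meshgrid(x, y)                  # default indexing='xy'
X_ij, Y_ij = np.meshgrid(x, y, indexing="ij")   # matrix-style indexing
X_m, Y_m = np.mgrid[0:5:5j, 0:3:3j]             # dense grid, 'ij' layout
X_o, Y_o = np.ogrid[0:5:5j, 0:3:3j]             # open grid, relies on broadcasting

print(X_xy.shape, X_ij.shape, X_m.shape)
print(X_o.shape, Y_o.shape, (X_o + Y_o).shape)
```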

    Index Tricks

    np.ix_([0, 1], [2, 3])                 # Index mesh for fancy indexing
    np.r_[1:4, 0, 4]                       # Concatenate slices
    np.c_[arr1, arr2]                      # Column stack shortcut
    np.s_[::2]                             # Slice object
    np.indices((3, 3))                     # Index arrays
    np.unravel_index(7, (3, 3))            # Convert flat index to coords
    np.ravel_multi_index([[0,1], [1,2]], (3,3))  # Coords to flat
    Python

    🧪 Special Arrays

    Structured Arrays

    # Define dtype
    dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
    
    # Create structured array
    arr = np.array([('Alice', 25, 55.5), 
                    ('Bob', 30, 70.0)], dtype=dt)
    
    arr['name']                            # Access by field name
    arr[0]                                 # Access by row
    arr[['name', 'age']]                   # Multiple fields
    Python

    Masked Arrays

    import numpy.ma as ma
    
    # Create masked array
    data = np.array([1, 2, -999, 4, -999, 6])
    masked = ma.masked_equal(data, -999)
    
    # Operations ignore masked values
    masked.mean()                          # 3.25
    masked.sum()                           # 13
    
    # Manual masking
    mask = [False, False, True, False, True, False]
    masked = ma.array(data, mask=mask)
    Python

    Character Arrays

    np.char.add(['Hello'], [' World'])     # String concatenation
    np.char.multiply('Ha', 3)              # 'HaHaHa'
    np.char.upper(['hello', 'world'])      # Uppercase
    np.char.lower(['HELLO', 'WORLD'])      # Lowercase
    np.char.strip(['  hello  '])           # Strip whitespace
    np.char.replace('hello', 'l', 'L')     # Replace
    np.char.split('hello world')           # Split
    np.char.join('-', ['hello', 'world'])  # Join the characters of each string: 'h-e-l-l-o', 'w-o-r-l-d'
    Python

    ⚡ Performance Tips

    Vectorization

    # Bad: Loop
    result = np.zeros(len(arr))
    for i in range(len(arr)):
        result[i] = arr[i] ** 2
    
    # Good: Vectorized
    result = arr ** 2
    Python

    Broadcasting

    # Bad: Explicit loop
    for i in range(arr.shape[0]):
        arr[i] += vector
    
    # Good: Broadcasting
    arr += vector
    Python
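
Broadcasting works whenever trailing dimensions match or one of them is 1. A minimal sketch of both a row-wise and a column-wise case, with illustrative names and shapes:

```python
import numpy as np

table = np.zeros((3, 4))
row_vector = np.arange(4)                 # shape (4,): matches the last axis

table += row_vector                       # added to every row, in place

col_vector = np.array([[10.0], [20.0], [30.0]])   # shape (3, 1)
shifted = table + col_vector              # added to every column

print(table)
print(shifted)
```

A shape of (3,) would not broadcast against (3, 4); reshaping it to (3, 1), as above, is the usual fix.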

    In-Place Operations

    arr += 1                               # In-place (no copy)
    arr = arr + 1                          # Creates new array
    np.add(arr, 1, out=arr)                # Explicit in-place
    Python

    Memory Views vs Copies

    view = arr[::2]                        # View (no copy)
    copy = arr[::2].copy()                 # Explicit copy
    arr.base is None                       # True if owns data
    view.base is arr                       # True if view of arr
    Python
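
The practical consequence of a view is that writes show up in the original array. A short sketch (array contents are arbitrary):

```python
import numpy as np

arr = np.arange(10)

view = arr[::2]            # basic slicing returns a view
copy = arr[::2].copy()     # independent data

view[0] = -1               # also changes arr[0]

print(arr[0], copy[0])
print(view.base is arr, copy.base is None)
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))
```

Boolean and fancy indexing, unlike basic slicing, always return copies.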

    🎓 Common Patterns

    Normalize Array

    # Z-score normalization
    normalized = (arr - arr.mean()) / arr.std()
    
    # Min-max normalization
    normalized = (arr - arr.min()) / (arr.max() - arr.min())
    Python

    Distance Matrix

    X = np.random.rand(100, 2)

    # With SciPy:
    from scipy.spatial.distance import cdist
    dist = cdist(X, X)                     # Pairwise Euclidean distances

    # Or manually, using broadcasting:
    dist = np.sqrt(((X[:, None] - X) ** 2).sum(axis=2))
    Python

    One-Hot Encoding

    labels = np.array([0, 1, 2, 1, 0])
    n_classes = 3
    one_hot = np.eye(n_classes)[labels]
    Python
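
The trick works because indexing the identity matrix with an integer array selects one row per label. A quick sanity check, as a sketch:

```python
import numpy as np

labels = np.array([0, 1, 2, 1, 0])
n_classes = 3

one_hot = np.eye(n_classes, dtype=int)[labels]   # row i of eye() encodes class i

recovered = one_hot.argmax(axis=1)               # invert the encoding

print(one_hot)
print(recovered)
```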

    Moving Average

    window = 3
    weights = np.ones(window) / window
    moving_avg = np.convolve(arr, weights, mode='valid')
    Python
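
With mode='valid', the output only covers positions where the window fits entirely, so it is shorter than the input by window - 1. A small sketch on made-up data:

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
window = 3

weights = np.ones(window) / window               # equal weights that sum to 1
moving_avg = np.convolve(signal, weights, mode="valid")

print(moving_avg)                                # averages of (1,2,3), (2,3,4), (3,4,5)
print(len(signal), len(moving_avg))
```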

    Polynomial Fitting

    x = np.array([0, 1, 2, 3, 4])
    y = np.array([0, 1, 4, 9, 16])
    coeffs = np.polyfit(x, y, 2)           # Fit 2nd degree polynomial
    poly = np.poly1d(coeffs)               # Create polynomial
    y_pred = poly(x)                       # Predict
    Python

    🔗 Integration with Other Libraries

    With Pandas

    import pandas as pd
    df = pd.DataFrame(arr)                 # Array to DataFrame
    arr = df.values                        # DataFrame to array
    arr = df.to_numpy()                    # Recommended method
    Python

    With Matplotlib

    import matplotlib.pyplot as plt
    plt.plot(arr)                          # Plot array
    plt.imshow(arr)                        # Display 2D array as image
    plt.hist(arr.flatten(), bins=50)       # Histogram
    Python

    With PIL/Pillow

    from PIL import Image
    img_array = np.array(Image.open('image.jpg'))
    img = Image.fromarray(arr.astype('uint8'))
    Python

    📚 Quick Reference Table

    Operation        Syntax                      Description

    Creation
    From list        np.array([1,2,3])           Create from list
    Zeros            np.zeros((3,4))             3×4 array of zeros
    Ones             np.ones((2,3))              2×3 array of ones
    Range            np.arange(10)               0 to 9
    Linspace         np.linspace(0,1,5)          5 evenly spaced values
    Identity         np.eye(3)                   3×3 identity matrix

    Indexing
    Single element   arr[i,j]                    Element at row i, col j
    Slice            arr[1:3,:]                  Rows 1-2, all columns
    Boolean          arr[arr>5]                  Elements > 5
    Fancy            arr[[0,2,4]]                Elements at indices 0,2,4

    Math
    Add              arr + 5                     Add 5 to each element
    Multiply         arr * 2                     Multiply by 2
    Power            arr ** 2                    Square each element
    Sqrt             np.sqrt(arr)                Square root
    Exp              np.exp(arr)                 e^x
    Log              np.log(arr)                 Natural log

    Aggregate
    Sum              np.sum(arr)                 Sum all elements
    Mean             np.mean(arr)                Average
    Min/Max          np.min(arr), np.max(arr)    Minimum, maximum
    Std              np.std(arr)                 Standard deviation

    Shape
    Reshape          arr.reshape(3,4)            Change shape to 3×4
    Flatten          arr.flatten()               Convert to 1D
    Transpose        arr.T                       Swap rows and columns

    Join/Split
    Concatenate      np.concatenate([a,b])       Join arrays
    Stack            np.vstack([a,b])            Stack vertically
    Split            np.split(arr, 3)            Split into 3 parts

    Linear Algebra
    Dot product      np.dot(A,B) or A @ B        Matrix multiplication
    Inverse          np.linalg.inv(A)            Matrix inverse
    Determinant      np.linalg.det(A)            Determinant
    Eigenvalues      np.linalg.eig(A)            Eigenvalues & vectors

    Random
    Random floats    np.random.rand(3,4)         Uniform [0,1)
    Random ints      np.random.randint(0,10,5)   Integers [0,10)
    Normal dist      np.random.randn(100)        Standard normal
    Choice           np.random.choice([1,2,3])   Random selection

    Happy NumPy coding! 🚀

