This repository contains a Python implementation of Principal Component Analysis (PCA) applied to the Indian Pines hyperspectral dataset. The primary goal is to perform dimensionality reduction on the high-dimensional spectral data and analyze the resulting principal components. The project includes code for data loading, PCA implementation, Mean Squared Error (MSE) evaluation for reconstruction accuracy, and visualization of the principal eigenvectors.
The dataset used is the well-known Indian Pines hyperspectral dataset, acquired by the AVIRIS sensor over the Indian Pines test site in Northwestern Indiana. It consists of 145x145 pixels and 220 spectral bands. Ground truth data specifying 16 land cover classes is also available.
More information can be found here.
Note: The .mat data files (Indian_pines.mat and Indian_pines_gt.mat) are not included in this repository due to their size. You will need to download them separately from an appropriate source.
The analysis follows these key steps, as implemented in the accompanying Jupyter/Google Colab notebook:
- Data Loading: Loading the hyperspectral data (`Indian_pines.mat`) and ground truth labels (`Indian_pines_gt.mat`).
- Data Extraction: Extracting spectral vectors for selected land cover classes (e.g., classes 1 and 2 are used in the notebook for demonstration).
- Preprocessing: Calculating the mean vector across the selected data and centering the data by subtracting the mean.
- Covariance Matrix: Computing the sample covariance matrix of the centered data.
- Eigenvalue Decomposition: Performing Eigenvalue Decomposition on the covariance matrix to obtain eigenvalues and eigenvectors.
- Sorting Components: Sorting eigenvectors based on their corresponding eigenvalues in descending order.
- Dimensionality Reduction: Selecting the top K eigenvectors (principal components) to form the transformation matrix W.
- PCA Projection: Projecting the centered data onto the selected principal components to obtain the lower-dimensional PCA coefficients.
- Data Reconstruction: Reconstructing the data from the reduced K dimensions back to the original dimension space.
- Error Evaluation: Calculating the Mean Squared Error (MSE) between the original and reconstructed data to assess information loss.
- MSE vs. K Analysis: Plotting the average MSE as a function of the number of components (K) retained (from 1 to 220).
- Eigenvector Visualization: Plotting the first 3 principal eigenvectors to understand the patterns they capture.
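The core pipeline above can be sketched in NumPy. This is a minimal illustration, not the notebook's exact code: synthetic random data stands in for the spectral vectors extracted from `Indian_pines.mat`, and all variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for the (n_pixels, 220) matrix of spectral vectors;
# in the notebook this comes from the selected classes of Indian_pines.mat.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 220))

# Preprocessing: compute the mean vector and center the data
mean_vec = X.mean(axis=0)
Xc = X - mean_vec

# Sample covariance matrix (220 x 220)
C = np.cov(Xc, rowvar=False)

# Eigenvalue decomposition (eigh, since C is symmetric) and descending sort
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top-K eigenvectors as the transformation matrix W and project
K = 10
W = eigvecs[:, :K]           # (220, K)
coeffs = Xc @ W              # (n_pixels, K) PCA coefficients

# Reconstruct back to 220 dimensions and evaluate the MSE
X_rec = coeffs @ W.T + mean_vec
mse = np.mean((X - X_rec) ** 2)
```

With `K = 220` the reconstruction is exact (up to floating-point error), which is why the MSE-vs-K curve in the notebook falls to zero at the full dimensionality.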
- Clone Repository: `git clone <repository_url>`
- Obtain Data: Download the `Indian_pines.mat` and `Indian_pines_gt.mat` files from a suitable source (e.g., the link provided above).
- Place Data: Put the `.mat` files in the same directory as the notebook, or modify the file paths within the notebook accordingly.
- Run Notebook: Open the `PCA-Hyperspectral-Indian-Pines.ipynb` notebook (or the name you give it) using Google Colab, Jupyter Notebook, or a compatible environment.
- Execute Cells: Run the notebook cells sequentially to perform the analysis.
The primary Python libraries used are:
- `numpy`
- `scipy` (for loading `.mat` files via `scipy.io.loadmat`)
- `matplotlib` (for plotting)
- `h5py` (only needed if the `.mat` files are saved in MATLAB v7.3 format, which `scipy.io.loadmat` cannot read)
These are standard libraries often included in distributions like Anaconda or easily installable via pip. They are readily available in Google Colab.
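Loading the data with `scipy.io.loadmat` looks roughly like the following. Since the real files are not in this repository, the snippet first writes tiny stand-in files so it is self-contained; with the actual data, skip the `savemat` calls. The keys `'indian_pines'` and `'indian_pines_gt'` are assumptions — inspect `mat.keys()` if your copy uses different variable names.

```python
import numpy as np
from scipy.io import loadmat, savemat

# Stand-in files so this snippet runs without the real dataset.
# With the downloaded .mat files in place, remove these two lines.
gt_fake = np.zeros((145, 145), dtype=np.uint8)
gt_fake[:10] = 1  # pretend the first rows belong to class 1
savemat("Indian_pines.mat", {"indian_pines": np.zeros((145, 145, 220), dtype=np.int16)})
savemat("Indian_pines_gt.mat", {"indian_pines_gt": gt_fake})

# Load the hyperspectral cube and the ground truth map
data = loadmat("Indian_pines.mat")["indian_pines"]          # (145, 145, 220)
labels = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]  # (145, 145)

# Flatten to (pixels, bands) and keep the demonstration classes 1 and 2
X = data.reshape(-1, data.shape[-1]).astype(float)
y = labels.ravel()
X_sel = X[np.isin(y, [1, 2])]
```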
The analysis demonstrates the effectiveness of PCA for dimensionality reduction on this hyperspectral dataset:
- MSE vs. K Plot: This plot (generated by the notebook) illustrates how reconstruction error decreases as more principal components are retained. It typically shows a sharp initial drop, indicating that a large portion of the variance is captured by the first few components.
- Principal Eigenvectors Plot: The visualization of the top 3 eigenvectors reveals the dominant spectral patterns identified by PCA across the 220 bands.
(Consider adding the generated plots directly into the README for visual reference)
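The MSE-vs-K curve can be reproduced in miniature as follows — a sketch on synthetic centered data rather than the notebook's actual spectral vectors, with the same sweep over K from 1 to 220:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for the centered (n_pixels, 220) spectral matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 220))
Xc = X - X.mean(axis=0)

# One eigendecomposition, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

# Average reconstruction MSE for each K
mses = []
for K in range(1, 221):
    W = eigvecs[:, :K]
    X_rec = (Xc @ W) @ W.T
    mses.append(np.mean((Xc - X_rec) ** 2))

plt.plot(range(1, 221), mses)
plt.xlabel("Number of components K")
plt.ylabel("Average reconstruction MSE")
plt.savefig("mse_vs_k.png")
```

Because each added component can only capture more variance, the curve is non-increasing and reaches (numerical) zero at K = 220.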
- `PCA-Hyperspectral-Indian-Pines.ipynb`: The main Jupyter/Colab notebook containing the code and analysis. (Please rename if you used a different name.)
- `README.md`: This file.
- `.gitignore`: Specifies intentionally untracked files (e.g., the `Indian_pines.mat` and `Indian_pines_gt.mat` data files).
- Apply classifiers like Linear Discriminant Analysis (LDA) or Support Vector Machines (SVM) to the reduced-dimension data obtained from PCA.
- Compare classification accuracy before and after PCA.
- Explore whitening the PCA coefficients before classification.
- Analyze the PCA results across different sets of classes from the dataset.
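A first step toward the classification ideas above might look like this sketch, assuming scikit-learn is available; synthetic two-class data stands in for the (N, K) PCA coefficients produced by the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for PCA coefficients of two well-separated classes;
# in practice 'coeffs' and 'y' would come from the PCA projection step.
rng = np.random.default_rng(0)
coeffs = np.vstack([rng.normal(0, 1, (100, 10)),
                    rng.normal(3, 1, (100, 10))])
y = np.array([0] * 100 + [1] * 100)

# Train an SVM on the reduced-dimension data and evaluate held-out accuracy
X_tr, X_te, y_tr, y_te = train_test_split(coeffs, y, test_size=0.3,
                                          random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Running the same classifier on the original 220-band vectors would then give the "before PCA" baseline for the accuracy comparison.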