This repository contains a Python implementation of Principal Component Analysis (PCA) applied to the Indian Pines hyperspectral dataset. The primary goal is to perform dimensionality reduction on the high-dimensional spectral data and analyze the resulting principal components. The project includes code for data loading, PCA implementation, Mean Squared Error (MSE) evaluation for reconstruction accuracy, and visualization of the principal eigenvectors.
The dataset used is the well-known Indian Pines hyperspectral dataset, acquired by the AVIRIS sensor over the Indian Pines test site in Northwestern Indiana. It consists of 145x145 pixels and 220 spectral bands. Ground truth data specifying 16 land cover classes is also available.
More information can be found here.
Note: The .mat data files (Indian_pines.mat and Indian_pines_gt.mat) are not included in this repository due to their size. You will need to download them separately from an appropriate source.
The analysis follows these key steps, as implemented in the accompanying Jupyter/Google Colab notebook:
- Data Loading: Loading the hyperspectral data (`Indian_pines.mat`) and ground truth labels (`Indian_pines_gt.mat`).
- Data Extraction: Extracting spectral vectors for selected land cover classes (e.g., classes 1 and 2 are used in the notebook for demonstration).
- Preprocessing: Calculating the mean vector across the selected data and centering the data by subtracting the mean.
- Covariance Matrix: Computing the sample covariance matrix of the centered data.
- Eigenvalue Decomposition: Performing Eigenvalue Decomposition on the covariance matrix to obtain eigenvalues and eigenvectors.
- Sorting Components: Sorting eigenvectors based on their corresponding eigenvalues in descending order.
- Dimensionality Reduction: Selecting the top K eigenvectors (principal components) to form the transformation matrix W.
- PCA Projection: Projecting the centered data onto the selected principal components to obtain the lower-dimensional PCA coefficients.
- Data Reconstruction: Reconstructing the data from the reduced K dimensions back to the original dimension space.
- Error Evaluation: Calculating the Mean Squared Error (MSE) between the original and reconstructed data to assess information loss.
- MSE vs. K Analysis: Plotting the average MSE as a function of the number of components (K) retained (from 1 to 220).
- Eigenvector Visualization: Plotting the first 3 principal eigenvectors to understand the patterns they capture.
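The core pipeline above can be sketched in NumPy. This is a minimal illustration, not the notebook's exact code: synthetic random data stands in for the spectral vectors extracted from `Indian_pines.mat`, and all variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for the (n_pixels, 220) matrix of spectral vectors;
# in the notebook this comes from the selected classes of Indian_pines.mat.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 220))

# Preprocessing: compute the mean vector and center the data
mean_vec = X.mean(axis=0)
Xc = X - mean_vec

# Sample covariance matrix (220 x 220)
C = np.cov(Xc, rowvar=False)

# Eigenvalue decomposition (eigh, since C is symmetric) and descending sort
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top-K eigenvectors as the transformation matrix W and project
K = 10
W = eigvecs[:, :K]           # (220, K)
coeffs = Xc @ W              # (n_pixels, K) PCA coefficients

# Reconstruct back to 220 dimensions and evaluate the MSE
X_rec = coeffs @ W.T + mean_vec
mse = np.mean((X - X_rec) ** 2)
```

With `K = 220` the reconstruction is exact (up to floating-point error), which is why the MSE-vs-K curve in the notebook falls to zero at the full dimensionality.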
- Clone Repository: `git clone <repository_url>`
- Obtain Data: Download the `Indian_pines.mat` and `Indian_pines_gt.mat` files from a suitable source (e.g., the link provided above).
- Place Data: Put the `.mat` files in the same directory as the notebook, or modify the file paths within the notebook accordingly.
- Run Notebook: Open the `PCA-Hyperspectral-Indian-Pines.ipynb` notebook (or the name you give it) using Google Colab, Jupyter Notebook, or a compatible environment.
- Execute Cells: Run the notebook cells sequentially to perform the analysis.
The primary Python libraries used are:
- `numpy`
- `scipy` (for loading `.mat` files via `scipy.io.loadmat`)
- `matplotlib` (for plotting)
- `h5py` (only needed if the `.mat` files are saved in MATLAB v7.3 format, which `scipy.io.loadmat` cannot read)
These are standard libraries often included in distributions like Anaconda or easily installable via pip. They are readily available in Google Colab.
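Loading the data with `scipy.io.loadmat` looks roughly like the following. Since the real files are not in this repository, the snippet first writes tiny stand-in files so it is self-contained; with the actual data, skip the `savemat` calls. The keys `'indian_pines'` and `'indian_pines_gt'` are assumptions — inspect `mat.keys()` if your copy uses different variable names.

```python
import numpy as np
from scipy.io import loadmat, savemat

# Stand-in files so this snippet runs without the real dataset.
# With the downloaded .mat files in place, remove these two lines.
gt_fake = np.zeros((145, 145), dtype=np.uint8)
gt_fake[:10] = 1  # pretend the first rows belong to class 1
savemat("Indian_pines.mat", {"indian_pines": np.zeros((145, 145, 220), dtype=np.int16)})
savemat("Indian_pines_gt.mat", {"indian_pines_gt": gt_fake})

# Load the hyperspectral cube and the ground truth map
data = loadmat("Indian_pines.mat")["indian_pines"]          # (145, 145, 220)
labels = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]  # (145, 145)

# Flatten to (pixels, bands) and keep the demonstration classes 1 and 2
X = data.reshape(-1, data.shape[-1]).astype(float)
y = labels.ravel()
X_sel = X[np.isin(y, [1, 2])]
```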
The analysis demonstrates the effectiveness of PCA for dimensionality reduction on this hyperspectral dataset:
- MSE vs. K Plot: This plot (generated by the notebook) illustrates how reconstruction error decreases as more principal components are retained. It typically shows a sharp initial drop, indicating that a large portion of the variance is captured by the first few components.
- Principal Eigenvectors Plot: The visualization of the top 3 eigenvectors reveals the dominant spectral patterns identified by PCA across the 220 bands.
(Consider adding the generated plots directly into the README for visual reference)
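The MSE-vs-K curve can be reproduced in miniature as follows — a sketch on synthetic centered data rather than the notebook's actual spectral vectors, with the same sweep over K from 1 to 220:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for the centered (n_pixels, 220) spectral matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 220))
Xc = X - X.mean(axis=0)

# One eigendecomposition, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

# Average reconstruction MSE for each K
mses = []
for K in range(1, 221):
    W = eigvecs[:, :K]
    X_rec = (Xc @ W) @ W.T
    mses.append(np.mean((Xc - X_rec) ** 2))

plt.plot(range(1, 221), mses)
plt.xlabel("Number of components K")
plt.ylabel("Average reconstruction MSE")
plt.savefig("mse_vs_k.png")
```

Because each added component can only capture more variance, the curve is non-increasing and reaches (numerical) zero at K = 220.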
- `PCA-Hyperspectral-Indian-Pines.ipynb`: The main Jupyter/Colab notebook containing the code and analysis. (Please rename if you used a different name.)
- `README.md`: This file.
- `.gitignore`: Specifies intentionally untracked files (e.g., the `Indian_pines.mat` and `Indian_pines_gt.mat` data files).
- Apply classifiers like Linear Discriminant Analysis (LDA) or Support Vector Machines (SVM) to the reduced-dimension data obtained from PCA.
- Compare classification accuracy before and after PCA.
- Explore whitening the PCA coefficients before classification.
- Analyze the PCA results across different sets of classes from the dataset.
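A first step toward the classification ideas above might look like this sketch, assuming scikit-learn is available; synthetic two-class data stands in for the (N, K) PCA coefficients produced by the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for PCA coefficients of two well-separated classes;
# in practice 'coeffs' and 'y' would come from the PCA projection step.
rng = np.random.default_rng(0)
coeffs = np.vstack([rng.normal(0, 1, (100, 10)),
                    rng.normal(3, 1, (100, 10))])
y = np.array([0] * 100 + [1] * 100)

# Train an SVM on the reduced-dimension data and evaluate held-out accuracy
X_tr, X_te, y_tr, y_te = train_test_split(coeffs, y, test_size=0.3,
                                          random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Running the same classifier on the original 220-band vectors would then give the "before PCA" baseline for the accuracy comparison.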