Understanding Principal Component Analysis


Component - A part of a system
Principal Component - A vital part of the system

Since we are talking about statistics, we are speaking the language of data.

Data may have many components (sets of related features / variables), but when we look for the most important of those components, we call it Principal Component Analysis (PCA).

So, PCA:
- Is a dimensionality reduction technique.
- Can be used to discover the important features of a large data-set.

How does PCA reduce the dimension?
- It converts the original set of variables into a new set of variables (components). Each component is a linear combination of the original variables, and the components are uncorrelated with each other.
- Then it ranks the components by the variance they capture and keeps only the top few. The component that captures the maximum variance is the most important one (the first principal component), the component that captures the second highest variance is the second most important, and so on.
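
The two steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the common approach of taking the eigenvectors of the covariance matrix as the new components; the function name `pca` and the toy data are made up for this example.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: project X (samples x variables) onto its top components."""
    # Center each variable so variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the original variables (variables are columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep the leading components and express the data in terms of them.
    components = eigenvectors[:, :n_components]
    scores = X_centered @ components
    return scores, eigenvalues

# Toy data (an assumption for illustration): the second variable is almost
# a multiple of the first, the third is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

scores, eigenvalues = pca(X, n_components=2)
print(scores.shape)   # 200 rows, now described by 2 variables instead of 3
print(eigenvalues)    # variance captured by each component, largest first
```

Each eigenvalue is the variance of the data along the matching component, which is why sorting by eigenvalue is the same as ranking the components by importance.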

Why are we looking for maximum variance among all the components?
- Variance measures the variability of your data-set. PCA treats variance as a stand-in for information: the more variance a component captures, the more of the information in the data it retains.
- In short, you look for the components that carry most of the information in your data-set.
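
One way to put a number on "most of the information" is the share of the total variance that each component captures. The sketch below assumes the same covariance-based approach; `explained_variance_ratio` is a name chosen for this example and the data is synthetic.

```python
import numpy as np

def explained_variance_ratio(X):
    """Hypothetical helper: share of total variance captured by each component."""
    # np.cov centers the data itself; variables are columns.
    cov = np.cov(X, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return eigenvalues / eigenvalues.sum()

# Synthetic data: two variables that move together plus one small noise variable.
rng = np.random.default_rng(1)
t = rng.normal(size=500)
X = np.column_stack([
    t,
    t + rng.normal(scale=0.2, size=500),
    rng.normal(scale=0.2, size=500),
])

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
print(ratio)       # fraction of the variance per component, largest first
print(cumulative)  # running total, useful for deciding how many components to keep
```

A common rule of thumb is to keep just enough components for the running total to pass a threshold you choose, such as 90% or 95%.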

When does PCA work well?
PCA gives its best results when the variables are strongly correlated. In other words, if the relationships among the variables are weak, it is not very effective at reducing the dimension of the data-set.
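
A quick experiment makes the difference visible. The sketch below compares how much variance the first component captures on strongly correlated data versus independent data; both data-sets are synthetic and `first_component_share` is a hypothetical helper written for this example.

```python
import numpy as np

def first_component_share(X):
    """Hypothetical helper: fraction of total variance on the first component."""
    # eigvalsh returns eigenvalues in ascending order, so the last is the largest.
    eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return eigenvalues[-1] / eigenvalues.sum()

rng = np.random.default_rng(2)
n = 1000

# Four variables that are all noisy copies of the same underlying signal.
base = rng.normal(size=n)
correlated = np.column_stack([base + rng.normal(scale=0.1, size=n) for _ in range(4)])

# Four variables with no relationship to each other.
independent = rng.normal(size=(n, 4))

share_correlated = first_component_share(correlated)
share_independent = first_component_share(independent)
print(share_correlated)    # close to 1: one component is nearly enough
print(share_independent)   # close to 1/4: every component matters about equally
```

With correlated variables, a single component can replace all four with little loss. With independent variables, dropping any component throws away roughly a quarter of the variance.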
