Factor Analysis And Its Applications | Understanding Factor Analysis
Let's say your data-set contains 200 variables.
Can you imagine how cumbersome it's going to be if you analyse your data-set using all 200 variables?
Using Factor Analysis, you can reduce a large number of variables to a smaller set of variables (factors) that can explain most of the observed variance in the original variables.
In short, Factor Analysis summarizes your large data-set so that relationships and patterns can be easily interpreted and understood.
Five steps to Factor Analysis:
- Create a correlation matrix for all the variables
- Factor Extraction
- Calculate Initial Factor Loadings
- Factor Rotation
- Calculation of Factor Scores
Correlation Matrix
- It shows which variables are strongly correlated with each other.
- If the correlations between variables are small, it is very unlikely that they share a common factor.
- Factor Analysis then aims to extract factors that account for as much of the variation in the observed variables as possible.
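A correlation matrix, plus a quick screen for variables that correlate only weakly with every other variable, can be computed in plain Python. This is a minimal sketch: the function names, the 0.3 cut-off and the student marks are hypothetical choices for illustration, not a standard.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation matrix for a list of variables (one list of values per variable)."""
    k = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def weakly_correlated(corr, names, threshold=0.3):
    """Variables whose largest |r| with any other variable is below `threshold`.

    The 0.3 cut-off is an assumed rule of thumb: such variables are unlikely
    to share a common factor with the rest.
    """
    weak = []
    for i, name in enumerate(names):
        strongest = max(abs(corr[i][j]) for j in range(len(names)) if j != i)
        if strongest < threshold:
            weak.append(name)
    return weak


if __name__ == "__main__":
    # Hypothetical marks of four students in three subjects.
    names = ["algebra", "geometry", "biology"]
    columns = [[50, 60, 70, 80], [52, 62, 72, 82], [70, 60, 60, 70]]
    corr = correlation_matrix(columns)
    print(weakly_correlated(corr, names))  # ['biology']
```

Here "biology" is uncorrelated with both other subjects, so it is flagged as unlikely to load on the same factor.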
Factor Extraction
- The main purpose of Factor Analysis is to identify combinations of variables, and those combinations are called factors.
- Different Factor Extraction methods:
-- Maximum Likelihood
-- Principal Axis Factoring
-- Unweighted Least Squares
-- Generalized Least Squares
-- Image Factoring
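Principal axis factoring, one of the methods listed above, can be sketched without a library. Each iteration puts the current communality estimates on the diagonal of the correlation matrix, takes the largest eigenvalues and eigenvectors of this "reduced" matrix, and turns them into loadings; the new communalities are the row sums of squared loadings. This is a minimal sketch, not a production implementation: the helper names, the starting communalities (each variable's largest absolute correlation) and the stopping tolerance are assumptions of this example.

```python
import math


def jacobi_eigen(a, max_sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi method).

    Returns (values, vectors) where vectors[i][k] is component i of eigenvector k.
    """
    n = len(a)
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def principal_axis_factoring(corr, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix."""
    n = len(corr)
    # Assumed starting communalities: largest absolute correlation in each row.
    h = [max(abs(corr[i][j]) for j in range(n) if j != i) for i in range(n)]
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the 1s on the diagonal.
        reduced = [[h[i] if i == j else corr[i][j] for j in range(n)] for i in range(n)]
        values, vectors = jacobi_eigen(reduced)
        top = sorted(range(n), key=lambda k: values[k], reverse=True)[:n_factors]
        loadings = [[vectors[i][k] * math.sqrt(max(values[k], 0.0)) for k in top]
                    for i in range(n)]
        new_h = [sum(x * x for x in row) for row in loadings]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # Eigenvector signs are arbitrary; make each factor's loadings sum positive.
    for f in range(n_factors):
        if sum(row[f] for row in loadings) < 0:
            for row in loadings:
                row[f] = -row[f]
    return loadings


if __name__ == "__main__":
    # Correlations implied by a one-factor model with loadings 0.8, 0.7, 0.6.
    lam = [0.8, 0.7, 0.6]
    corr = [[1.0 if i == j else lam[i] * lam[j] for j in range(3)] for i in range(3)]
    print([round(row[0], 3) for row in principal_axis_factoring(corr, 1)])  # [0.8, 0.7, 0.6]
```

Because the example correlations were generated from known loadings, the sketch should recover them, which is a convenient sanity check before running it on real data.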
How to decide the number of factors?
- Look at the Factor Correlation: if the correlation between two factors is too high (> 0.7), the factors are probably very similar, and in that case you can merge the two related factors.
- Easily Explainable? Are you able to easily interpret and explain the items associated with each factor?
- The more items a factor has, the stronger the case for keeping it for further analysis.
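The first and last rules above can be checked mechanically. This minimal sketch flags factor pairs whose absolute correlation exceeds the 0.7 threshold from the text, and counts the items whose strongest loading falls on each factor. The function names, the example matrices and the 0.4 minimum loading are hypothetical choices; interpretability still has to be judged by hand.

```python
def merge_candidates(factor_corr, threshold=0.7):
    """Pairs of factors whose absolute correlation exceeds the threshold."""
    k = len(factor_corr)
    return [(i, j) for i in range(k) for j in range(i + 1, k)
            if abs(factor_corr[i][j]) > threshold]


def items_per_factor(loadings, threshold=0.4):
    """Count items whose strongest loading is on each factor and reaches the threshold.

    The 0.4 cut-off is an assumption; choose one that suits your field.
    """
    counts = [0] * len(loadings[0])
    for row in loadings:
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) >= threshold:
            counts[best] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical correlation matrix between three factors.
    phi = [[1.0, 0.75, 0.20],
           [0.75, 1.0, 0.10],
           [0.20, 0.10, 1.0]]
    print(merge_candidates(phi))  # [(0, 1)]
    # Hypothetical loadings of four items on two factors.
    print(items_per_factor([[0.8, 0.1], [0.75, 0.2], [0.1, 0.9], [0.2, 0.3]]))  # [2, 1]
```

The fourth item in the usage example loads weakly on both factors, so it is not counted for either.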
Factor Loadings
- A factor loading represents the correlation between a factor and a variable.
- It tells you how much a factor explains a variable.
- Factor Loadings close to:
=> -1 or 1 indicate that the factor strongly influences the variable
=> 0 indicate that the factor has a weak influence on the variable
- For example, let's say we have nine variables: Algebra, Chemistry, Geometry, Physics, Game theory, Number theory, Set Theory, Probability and Biology. Their loadings on two factors are:
Subjects | Factor-1 | Factor-2
Algebra | 0.788 | 0.542
Chemistry | 0.368 | 0.912
Geometry | 0.729 | 0.367
Physics | 0.541 | 0.875
Game theory | 0.891 | 0.333
Number theory | 0.795 | 0.412
Set Theory | 0.832 | 0.390
Probability | 0.955 | 0.324
Biology | 0.289 | 0.816
- Algebra, Geometry, Game Theory, Number Theory, Set Theory and Probability have high Factor Loadings on Factor-1.
- Chemistry, Physics and Biology have high Factor Loadings on Factor-2.
- The items of Factor-1 share a common latent relationship, so Factor-1 can be labelled 'Mathematics'; similarly, Factor-2 can be labelled 'Science'.
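The grouping above can be reproduced by assigning each subject to the factor on which it has the largest absolute loading. This sketch uses the loadings from the example table and the labels suggested in the text; the function name `group_by_factor` is hypothetical.

```python
# Loadings (Factor-1, Factor-2) copied from the example table.
LOADINGS = {
    "Algebra": (0.788, 0.542),
    "Chemistry": (0.368, 0.912),
    "Geometry": (0.729, 0.367),
    "Physics": (0.541, 0.875),
    "Game theory": (0.891, 0.333),
    "Number theory": (0.795, 0.412),
    "Set Theory": (0.832, 0.390),
    "Probability": (0.955, 0.324),
    "Biology": (0.289, 0.816),
}
LABELS = ("Mathematics", "Science")  # interpretive labels for Factor-1 and Factor-2


def group_by_factor(loadings, labels):
    """Assign every item to the factor with its largest absolute loading."""
    groups = {label: [] for label in labels}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        groups[labels[best]].append(item)
    return groups


if __name__ == "__main__":
    for label, items in group_by_factor(LOADINGS, LABELS).items():
        print(label, items)
```

Taking the largest absolute loading ignores cross-loadings: Algebra (0.542) and Physics (0.541) also load noticeably on their second factor, which is one reason rotation is applied next.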
Factor Rotation
- Once the Initial Factor Loadings have been calculated, the factors are rotated.
- Rotation adjusts the factor axes to achieve a simpler and more meaningful factor solution.
- Rotation creates a simpler factor structure and makes the factors more clearly distinguishable.
- Orthogonal Rotation (e.g. Varimax) - It assumes that the factors are not correlated.
- Oblique Rotation (e.g. Promax, Oblimin) - Unlike Orthogonal Rotation, it allows the factors to be correlated.
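Varimax, a widely used orthogonal rotation, chooses the angle that maximises the variance of the squared loadings within each factor, pushing every loading towards 0 or ±1. For exactly two factors this angle has a closed form, so a minimal sketch needs no library. The sketch assumes two factors and Kaiser normalization (rows scaled to unit length before rotating, then scaled back). `varimax_2` is a hypothetical name. Oblique rotations such as Promax are not shown.

```python
import math


def varimax_2(loadings, normalize=True):
    """Varimax rotation of a p x 2 loading matrix; returns (rotated, angle)."""
    p = len(loadings)
    h = [math.hypot(x, y) for x, y in loadings]  # row lengths (sqrt of communality)
    if normalize:  # Kaiser normalization
        rows = [(x / hi, y / hi) if hi > 0 else (0.0, 0.0)
                for (x, y), hi in zip(loadings, h)]
    else:
        rows = [(x, y) for x, y in loadings]
    u = [x * x - y * y for x, y in rows]
    v = [2 * x * y for x, y in rows]
    A, B = sum(u), sum(v)
    C = sum(a * a - b * b for a, b in zip(u, v))
    D = 2 * sum(a * b for a, b in zip(u, v))
    # Angle that maximises the varimax criterion for two factors.
    phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = math.cos(phi), math.sin(phi)
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in rows]
    if normalize:  # undo the Kaiser normalization
        rotated = [(x * hi, y * hi) for (x, y), hi in zip(rotated, h)]
    return rotated, phi


if __name__ == "__main__":
    # Hypothetical "simple structure" loadings, blurred by a 20-degree rotation.
    alpha = math.radians(20)
    simple = [(0.8, 0.0), (0.7, 0.0), (0.0, 0.8), (0.0, 0.6)]
    blurred = [(x * math.cos(alpha) - y * math.sin(alpha),
                x * math.sin(alpha) + y * math.cos(alpha)) for x, y in simple]
    rotated, phi = varimax_2(blurred)
    print(round(math.degrees(phi), 6))  # 20.0
    print([(round(x, 6), round(y, 6)) for x, y in rotated])
```

In the usage example, varimax recovers the clean structure in which each item loads on a single factor. Because the rotation is orthogonal, each item's communality is unchanged.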
Factor Scores
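- Factor Scores are the estimated values of the factors for each observation (for example, each student who took the subjects above).
- They are used to rank and compare observations on each factor, or as inputs to further analysis in place of the original variables.
- With the help of Factor Scores, you can see which factors matter most for an observation and which factors you need to focus on.
- In most cases, values of 0.7 or more in absolute value (positive or negative) are treated as strong; this cut-off is normally applied to the factor loadings rather than to the scores themselves.
- Loadings that look low in the initial solution can become higher after rotation or further iterations.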
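One common way to estimate factor scores is the regression method. The weights for factor f are R⁻¹λ_f, where R is the correlation matrix and λ_f is the column of loadings for that factor. Each observation's score is the weighted sum of its standardized values. Bartlett's method is a common alternative. This sketch assumes the data are already standardized (z-scores). The function names, loadings and student marks are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def factor_score_weights(corr, loadings):
    """Regression-method weights: one vector R^-1 lambda_f per factor f."""
    n_factors = len(loadings[0])
    return [solve(corr, [row[f] for row in loadings]) for f in range(n_factors)]


def factor_scores(z_rows, weights):
    """Score of every observation (a row of z-scores) on every factor."""
    return [[sum(w * z for w, z in zip(wf, row)) for wf in weights] for row in z_rows]


if __name__ == "__main__":
    lam = [0.8, 0.7, 0.6]  # hypothetical one-factor loadings
    corr = [[1.0 if i == j else lam[i] * lam[j] for j in range(3)] for i in range(3)]
    weights = factor_score_weights(corr, [[x] for x in lam])
    # Hypothetical standardized marks of three students in three subjects.
    students = [[1.2, 0.9, 1.0], [-0.5, -0.2, 0.1], [0.0, -1.1, -0.8]]
    scores = [s[0] for s in factor_scores(students, weights)]
    ranking = sorted(range(len(students)), key=lambda i: scores[i], reverse=True)
    print(ranking)  # [0, 1, 2]
```

The ranking step shows the typical use of scores: ordering observations on a factor, not ordering the factors themselves.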
Deciding questions before using Factor Analysis
- Are there any outliers in the data? Factor Analysis assumes that there are no outliers.
- Is there any multicollinearity between the variables?
Factor Analysis requires that there is no perfect multicollinearity between the variables.
- What is the minimum number of factors that can explain most of the variation in the data-set?
- How well do these factors describe the data?
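The first two questions can be answered with simple checks before fitting anything. This minimal sketch flags outliers whose z-score exceeds 3 in absolute value, which is an assumed rule of thumb. It also tests for perfect multicollinearity through the determinant of the correlation matrix, which is zero when one variable is an exact linear combination of others. The helper names, tolerance and marks are hypothetical.

```python
import math


def z_score_outliers(values, cutoff=3.0):
    """Indices of values more than `cutoff` sample standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return []  # constant column: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > cutoff]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return det


def perfect_multicollinearity(corr, tol=1e-8):
    """True when the correlation matrix is (numerically) singular."""
    return abs(determinant(corr)) < tol


if __name__ == "__main__":
    marks = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 12, 13, 11, 12, 10, 11, 60]
    print(z_score_outliers(marks))  # [19]
    print(perfect_multicollinearity([[1.0, 1.0], [1.0, 1.0]]))  # True
    print(perfect_multicollinearity([[1.0, 0.5], [0.5, 1.0]]))  # False
```

With very small samples, a single extreme value cannot reach |z| > 3 at all, so the cut-off is only meaningful once there are a reasonable number of observations.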
Can you explain more on orthogonal rotation and oblique rotation?
could you compare pca with factor analysis in some of your posts?
How factor analysis is different from Clustering since in factor analysis also we group similar variables into dimension?