Objectives and purpose of factor analysis
Factor analysis is used extensively in business research. Its purpose is to reduce a large set of variables to a smaller number of surrogate variables (factors) while retaining as much of the original variability as possible.
The primary objective is to capture psychological states of customers/respondents that cannot be measured directly. This is usually done by constructing a scale that measures the attitudes, perceptions, motivations, etc. of the targeted customers of a specific line of business.
Initially, the targeted customers/respondents answer a detailed questionnaire containing various lifestyle statements rated on numeric or verbal scales. What is really being measured through these statements is the customers' views on a few underlying dimensions (factors) relevant to the success of the business. Because these dimensions are psychological states that cannot be observed directly, factor analysis is used to assess them indirectly.
Factor analysis is therefore primarily used to simplify a data set before subjecting it to further multivariate analysis, such as multiple regression.
Developing a research plan for Factor analysis.
The first step in conducting factor analysis is to formulate the research problem, which involves several tasks. First, the objectives of the factor analysis should be identified. Then, the variables to be included in the analysis need to be specified; variables are chosen mainly on the basis of past research or the researcher's judgment. It is also very important that the variables be appropriately measured on an interval or ratio scale.
The next task is to determine the sample size. As a rough guideline, the sample size should be at least 4 to 5 times the number of variables included.
For example, Wills Lifestyle wanted to study the general preferences of people in the 22-30 age group in terms of the type of garments they wear and their monetary spending habits.
So, they first identified their objective as:
“To study the general preference of people within age group of 22 – 30 in terms of the type of garments they wear (formal versus casual) and monetary spending habit (spendthrift and stingy)”
Then, it specified the following lifestyle statements as variables to be included in the factor analysis:
V1: I like to go for shopping frequently
V2: I like to go to parties frequently
V3: I like wearing formals quite often
V4: I have official meetings frequently
V5: I would rather save money than spend
V6: I prefer old things (clothes, shoes, etc) versus new ones
Assumptions of Factor analysis.
Following are the assumptions that factor analysis entails:
The variables included can be condensed into one or more underlying factors
As a data reduction technique, factor analysis examines the entire set of interrelated variables together, without designating any of them as dependent or independent
The sample is homogeneous in the sense that the sub groups within the sample will not have different patterns of scores on the variables included in the analysis
The factors are related to the score of each variable in a linear manner
The specific factors (errors) are assumed to be random, and all have mean zero
The observed correlation between variables can be attributed to the common factors
The common factors (the F's), i.e. the unobserved explanatory variables, have mean zero
The common factors are assumed to have variance one
The variance of the ith specific factor is ψi, called the specific variance
The common factors are uncorrelated with one another: Cov(Fj, Fk) = 0 for j ≠ k
The specific factors are uncorrelated with one another: Cov(Ui, Uk) = 0 for i ≠ k
The specific factors are uncorrelated with the common factors: Cov(Ui, Fj) = 0 for all i and j
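These assumptions can be written compactly. The following is a standard statement of the orthogonal factor model, using the notation introduced in the model later in this section (A for loadings, F for common factors, U for unique/specific factors, ψ for specific variances); it summarizes the list above rather than adding new conditions:

```latex
X_i = A_{i1}F_1 + A_{i2}F_2 + \dots + A_{im}F_m + V_i U_i
E(F_j) = 0, \quad \operatorname{Var}(F_j) = 1, \quad \operatorname{Cov}(F_j, F_l) = 0 \ (j \neq l)
E(U_i) = 0, \quad \operatorname{Var}(U_i) = \psi_i, \quad \operatorname{Cov}(U_i, U_k) = 0 \ (i \neq k), \quad \operatorname{Cov}(U_i, F_j) = 0
```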
Developing a Factor analysis model.
In developing the model of factor analysis, we address issues of sample size and variables as the first step.
For example, the following variables are lifestyle statements that can be put on a questionnaire along with a 7-point scale (1 = strongly disagree, 7 = strongly agree)
The variables to be factor analyzed:
V1: I like to go for shopping frequently
V2: I like to go to parties frequently
V3: I like wearing formals quite often
V4: I have official meetings frequently
V5: I would rather save money than spend
V6: I prefer old things (clothes, shoes, etc) versus new ones
The sample size:
The questionnaire is to be answered by 25 respondents.
Ratio of sample size to number of variables must be at least 4:
The ratio of sample size to number of variables is 25/6 ≈ 4.2, which is greater than 4. Therefore, this requirement is also met.
Mathematically, we develop the factor analysis model as given below:
Xi = Ai1F1 + Ai2F2 + … + AimFm + ViUi
Where
Xi = ith standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
F = common factor
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
The unique factors are uncorrelated with each other and with common factors. The common factors can be expressed as linear combinations of the observed variables:
Fi = Wi1X1 + Wi2X2 + … + WikXk
Where
Fi = estimate of ith factor
Wi = weight or factor score coefficient
k = number of variables
Now, the weights or factor score coefficients are selected such that the first factor explains the largest portion of the total variance. Then a second set of weights is selected so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor.
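The mechanics behind these weights can be sketched in a few lines of NumPy. The example below illustrates the principal component method of extraction on hypothetical data standing in for the 25 responses on V1 to V6; it is a minimal sketch under those assumptions, not the exact SPSS procedure.

```python
# Minimal sketch of the principal component method of factor extraction.
# `data` is a hypothetical stand-in for the questionnaire scores
# (rows = respondents, columns = V1..V6 on a 7-point scale).
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(1, 8, size=(25, 6)).astype(float)

# Standardize the variables (the model works on standardized X_i).
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Correlation matrix of the observed variables.
corr = np.corrcoef(z, rowvar=False)

# Eigendecomposition: each eigenvalue is the variance explained by a factor.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]                 # largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Unrotated loadings A_ij = eigenvector_ij * sqrt(eigenvalue_j).
m = 2                                                 # factors retained
loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])

# Factor score coefficients W (regression method) and factor scores F = Z W.
weights = np.linalg.solve(corr, loadings)
scores = z @ weights

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Loadings:\n", np.round(loadings, 3))
```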
Deriving factors
The primary objective behind conducting factor analysis is data reduction and summarization. Therefore, it is imperative that we condense the original set of variables into a smaller number of factors.
Several criteria are examined to determine the number of factors that would ideally represent the data:
A Priori Determination:
Here, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand.
Determination based on Eigenvalues:
In this approach, only factors with eigenvalues greater than 1.0 are retained. An eigenvalue represents the amount of variance in the original variables associated with a factor; since each standardized variable contributes a variance of 1.0, a factor with an eigenvalue below 1.0 explains less variance than a single variable and is dropped.
Determination based on Scree Plot:
A scree plot is a plot of the eigenvalues against the number of factors in the order of extraction. Typically, the plot shows a distinct break between the steep slope of the factors with large eigenvalues and the gradual trailing off of the remaining factors; this trailing off is called the scree. Empirically, the number of factors is determined by the point at which the scree begins.
If, for example, the scree begins at factor 4, then four factors are retained.
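A scree plot itself takes only a few lines to draw. The sketch below uses matplotlib with hypothetical eigenvalues standing in for real output:

```python
# Sketch: drawing a scree plot from a set of (hypothetical) eigenvalues.
import matplotlib.pyplot as plt
import numpy as np

eigenvalues = np.array([2.3, 1.4, 1.1, 0.5, 0.4, 0.3])   # hypothetical, sorted

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--", label="Kaiser criterion (eigenvalue = 1)")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.legend()
plt.show()
```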
Determination based on percentage of variance:
In this approach, the number of factors is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. What counts as satisfactory depends upon the research problem at hand, but it is generally recommended to extract enough factors to account for at least 60 percent of the variance.
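This check follows directly from the eigenvalues, because with standardized variables the total variance equals the number of variables. A minimal sketch with hypothetical eigenvalues:

```python
# Sketch: percentage of variance and cumulative percentage explained per factor.
import numpy as np

eigenvalues = np.array([2.3, 1.4, 1.1, 0.5, 0.4, 0.3])   # hypothetical, sorted
percent = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(percent)

# Smallest number of factors reaching 60% (assumes the threshold is reachable).
n_factors = int(np.argmax(cumulative >= 60) + 1)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"Factor {i}: {p:5.1f}% of variance, cumulative {c:5.1f}%")
print("Factors needed to reach 60% of variance:", n_factors)
```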
Determination based on split-half reliability:
Here, the data set is split into two halves and factor analysis is performed on each half. Only the factors with a high correspondence of factor loadings across the two subsamples are retained.
Determination based on significance tests:
Here, the statistical significance of the separate eigenvalues is determined, and only those factors that are statistically significant are retained. A drawback of this method is observed in the case of large samples, where many factors turn out to be statistically significant even though they account for only a small portion of the total variance.
Assessing model fit including statistical significance of the model.
Confirmatory factor analysis allows a statistical test of how well an a priori specified factor model explains the observed pattern of sample correlations or covariances, commonly referred to as 'model fit'. Model fit can be assessed using indices of overall fit (e.g. chi-square, GFI, RMSR), incremental fit (e.g. NFI, CFI), the root mean square error of approximation (RMSEA) and Hoelter's critical N (CN).
It has been suggested that model fit should be evaluated using information from these different families of indices. The chi-square test reported by Greenspoon and Saklofske is highly statistically significant, implying that the model does not satisfactorily account for the data. This is dismissed by the authors on the grounds of excessive power due to the large sample size.
More acceptable ways of assessing model fit include using the Root Mean Square Error of Approximation and a class of indices known collectively as the incremental fit indices. The calculation of the RMSEA uses the chi-square value of the model, in conjunction with the sample size and a correction for the complexity of the model (degrees of freedom), to ensure that these factors do not affect the decision to reject or accept the model.
An additional advantage of the RMSEA is that it has a known sampling distribution, and therefore confidence limits can be calculated. The significance value of the chi-square tests the null hypothesis of exact fit, which is essentially always false in practice, so if the sample size is sufficiently large the power of the test will ensure that the model is rejected. To overcome this problem, Browne and Cudeck propose a test of close fit, which tests the null hypothesis that the RMSEA is no greater than 0.05.
Another way to assess model fit is to examine the differences between the observed correlations (as given in the correlation matrix) and the reproduced correlations (as estimated from the factor matrix). These differences are called residuals. The presence of many large residuals indicates that the factor model does not fit well and should be reconsidered.
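As a rough illustration of the residual check, the sketch below simulates hypothetical data from an assumed two-factor structure, extracts two factors by the principal component method, and compares the observed and reproduced correlations. The 0.05 cut-off for a "large" residual is a common rule of thumb, not a fixed standard:

```python
# Sketch: residuals = observed correlations - reproduced correlations.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-factor structure used only to simulate survey-like data.
true_loadings = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.8],
                          [0.2, 0.7], [-0.7, 0.1], [-0.6, 0.2]])
factors = rng.standard_normal((200, 2))
errors = rng.standard_normal((200, 6)) * np.sqrt(1 - (true_loadings ** 2).sum(axis=1))
z = factors @ true_loadings.T + errors

observed = np.corrcoef(z, rowvar=False)

# Extract two factors by the principal component method (as earlier).
vals, vecs = np.linalg.eigh(observed)
order = np.argsort(vals)[::-1]
loadings = vecs[:, order[:2]] * np.sqrt(vals[order[:2]])

# Reproduced correlations from the factor matrix, and the residuals.
reproduced = loadings @ loadings.T
residuals = observed - reproduced
np.fill_diagonal(residuals, 0.0)        # diagonal reflects uniqueness, not misfit

large = np.abs(residuals) > 0.05        # a common rule of thumb for "large"
print("Largest absolute residual:", np.abs(residuals).max().round(3))
print("Share of large residuals:", large[np.triu_indices(6, k=1)].mean().round(2))
```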
Interpreting results.
After the factors have been extracted successfully and the model is deemed to fit, the resulting factor matrix, which shows the relationship of the original variables to the extracted factors, is rotated to make it easier to interpret. This is done because in the unrotated matrix the factors tend to be correlated with many variables. The coefficients in the factor matrix are used to interpret the factors, and the axes are rotated about the origin so that they lie as close to the clusters of related variables as possible.
VARIMAX rotation is the most common (orthogonal) rotation type. In it, the axes are kept at right angles to each other so that the rotated factors remain uncorrelated with each other.
Oblique rotation permits the factors to be correlated and is considered a more realistic method of analysis.
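For readers who want to see what the rotation does mechanically, the following is a compact NumPy sketch of the standard SVD-based varimax algorithm; the unrotated loading matrix is hypothetical:

```python
# Sketch: SVD-based varimax rotation of a (variables x factors) loading matrix.
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Return an orthogonally (varimax) rotated copy of `loadings`."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion.
        b = loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(b)
        rotation = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ rotation

# Hypothetical unrotated loadings for the six lifestyle statements.
unrotated = np.array([[0.70, 0.40], [0.65, 0.45], [0.45, -0.60],
                      [0.50, -0.55], [-0.60, -0.35], [-0.55, -0.30]])
print(np.round(varimax(unrotated), 2))
```

Statistical packages perform this rotation internally; the sketch is only meant to show that varimax searches for an orthogonal rotation that pushes each variable's loadings towards either zero or a large value, which is what makes the rotated matrix easier to interpret.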
The first step in this stage is to determine whether any variables need to be eliminated from the factor solution. A variable can be eliminated for two reasons. First, a variable is eliminated if its communality is low, i.e. less than 0.5, which means that the factors explain less than half of the variance in the original variable. Second, a variable may have loadings below the criterion level on all factors, i.e. it does not have a strong relationship with any factor.
The next step is to identify the variables that have large loadings on the same factor. That factor can then be interpreted in terms of the variables that load high on it. If a factor cannot be clearly defined in terms of the original variables, it should be labeled as an undefined or a general factor.
The last step, after eliminating the variables that do not belong in the factor solution and identifying the variables with large factor loadings, is to name the factors obtained. This helps ensure that the factor solution is conceptually valid.
Analysis of the communalities:
Once the factors are extracted successfully, we examine the table of 'Communalities', which tells us how much of the variance of the original variables is explained by the factors.
If the communality of any variable is less than 0.5, then it should be excluded from the analysis.
For example, if every variable in the communalities table has a communality of 0.5 or more, all the variables will be included in further analysis.
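A minimal sketch of this check, assuming a hypothetical loading matrix for V1 to V6 (the communality of a variable is the sum of its squared loadings across the retained factors):

```python
# Sketch: communalities from a (variables x factors) loading matrix.
import numpy as np

loadings = np.array([[0.78, 0.12], [0.71, 0.20], [0.15, 0.81],
                     [0.22, 0.74], [-0.69, 0.10], [-0.35, 0.18]])   # hypothetical
communalities = (loadings ** 2).sum(axis=1)

for name, h2 in zip(["V1", "V2", "V3", "V4", "V5", "V6"], communalities):
    flag = "keep" if h2 >= 0.5 else "consider dropping (communality < 0.5)"
    print(f"{name}: {h2:.2f} -> {flag}")
```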
Analysis of the factor loadings:
After we are satisfied that the factor solution explains sufficient variance for all the variables, we examine the Rotated Factor Matrix to check whether every variable has a substantial loading on only one factor or not.
For example, we may consider a loading of 0.4 or higher to be substantial and scan the rotated factor matrix for the loadings that meet this threshold.
Ideally, every variable shows a substantial loading on one and only one factor. If this is not the case, we re-run the factor analysis, excluding the offending variables (those that cross-load or fail to load substantially on any factor) one at a time, as in the sketch below.
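The same check can be expressed programmatically. The sketch below flags variables that either fail to load substantially on any factor or cross-load on more than one, using the 0.4 threshold mentioned above; the rotated loading matrix is hypothetical:

```python
# Sketch: check that each variable loads substantially (|loading| >= 0.4)
# on exactly one factor of the rotated solution.
import numpy as np

rotated = np.array([[0.76, 0.08], [0.72, 0.15], [0.10, 0.79],
                    [0.18, 0.75], [-0.71, 0.05], [-0.45, 0.42]])   # hypothetical
substantial = np.abs(rotated) >= 0.4
counts = substantial.sum(axis=1)

for name, c in zip(["V1", "V2", "V3", "V4", "V5", "V6"], counts):
    if c == 1:
        print(f"{name}: loads cleanly on one factor")
    elif c == 0:
        print(f"{name}: no substantial loading -> candidate for removal")
    else:
        print(f"{name}: cross-loads on {c} factors -> candidate for removal")
```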
Naming the factors:
Once we get a pattern of loadings by following the above steps, we name the factors according to their substance and subjective content.
Validating Factor analysis result.
In this stage, we are concerned with the issue of generalizability of the factor model we have derived. Herein, two issues are examined:
Whether the factor model is stable and generalizable
Whether the factor solution is impacted by outliers, if any
The first issue is examined by using split-half validation in the SPSS tool.
Split Half Validation:
The strategy for examining the stability of the model is to do a split-half validation to see if the factor structure and the communalities remain the same.
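A sketch of this strategy in NumPy is shown below. It splits a hypothetical data set into two random halves, extracts loadings from each half by the principal component method, and compares them using Tucker's coefficient of congruence, one common way of quantifying the correspondence of loadings (in practice the sign and order of the factors may first need to be aligned):

```python
# Sketch: split-half validation of a factor solution.
import numpy as np

def principal_loadings(z, n_factors=2):
    """Principal component loadings from a data matrix (rows = respondents)."""
    corr = np.corrcoef(z, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1][:n_factors]
    return vecs[:, order] * np.sqrt(vals[order])

rng = np.random.default_rng(2)
data = rng.standard_normal((200, 6))          # hypothetical stand-in for responses
half = rng.permutation(len(data))             # random split into two halves
a, b = data[half[:100]], data[half[100:]]

load_a, load_b = principal_loadings(a), principal_loadings(b)

# Tucker's coefficient of congruence for each factor across the two halves.
congruence = np.abs(np.diag(load_a.T @ load_b) /
                    (np.linalg.norm(load_a, axis=0) * np.linalg.norm(load_b, axis=0)))
print("Factor congruence across halves:", np.round(congruence, 2))
```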
Identification of outliers:
SPSS contains a strategy to identify outliers. SPSS computes the factor scores as standard scores with a mean of 0 and a standard deviation of 1. We can examine the factor scores to see if any are above or below the standard score size associated with extreme cases, i.e. +/-2.5.
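A minimal sketch of this outlier check, assuming the factor scores have already been computed as standardized scores (here random numbers stand in for them):

```python
# Sketch: flag respondents whose factor scores are extreme (|score| > 2.5).
import numpy as np

rng = np.random.default_rng(3)
scores = rng.standard_normal((25, 2))   # standardized factor scores (mean 0, sd 1)

outliers = np.where(np.any(np.abs(scores) > 2.5, axis=1))[0]
print("Respondents flagged as potential outliers:", outliers.tolist())
```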
An Indian case on Factor analysis: Saffola
Over the years, the brand Saffola has become renowned for its expertise in Heart Care, thanks to the consistent introduction of innovative products like heart-healthy cooking oils and foods.
The case we discuss below is a market study on 'refined oil' in New Delhi carried out by Saffola in its early years, with the following objectives:
Research Objectives:
To gauge the consumer awareness level about the existing brands of refined oil in market
To identify various sources of awareness
To analyze the pattern of consumption of cooking medium with special emphasis on refined oil
To analyze the pattern of purchase of refined oil (in terms of 1 Kg. tin, 2 Kg. tin, etc. and also loose purchasers)
To analyze and identify relevant factors that are considered while purchasing any refined oil
To analyze the perception of consumers about the important attributes of the various brands of refined oil
Methodology:
A probability sampling method was adopted, covering about 160 households (161 to be exact) in all important areas of Delhi. Data were collected through a questionnaire designed to meet the research objectives, administered by personal interview in the households, allowing full participation from the respondents.
The data analysis, interpretation of results and conclusions drawn from the study are discussed here.
Data Analysis:
Awareness of brands as revealed from memory recall.
No. of respondents = 161
Note: Because one respondent can recall more than one brand, sometimes even 3 or 4, the percentages are not additive.
Distribution of awareness to Total Frequency:
Source of Awareness
No. of respondents = 161
Consumption pattern
Purchase pattern ( No of households buying)
Rank distribution (based on 130 households consuming refined oil out of 161)
Rating of attributes of different brands on a 0 to 10 scale:
Interpretation of results (based on data analysis)
1. Awareness
The research study revealed the following levels of brand awareness: Postman had the highest awareness at 70.8%, followed by Dalda at 56.5%, Ruby at 47.8%, Diamond at 43.5% and Saffola at 37.3%. CORNOLA came next with a low awareness level of 8.1%. The same trend was seen in the next table, where the level of awareness was worked out as a percentage of total frequency (number of recalls): Postman 25.97%, Dalda 20.73%, Ruby 17.54%, Diamond 15.95%, Saffola 13.67% and CORNOLA 2.96%.
It was therefore conjectured that CORNOLA's low level of awareness could be the probable cause of its poor sales. If better sales performance and additional sales were to be generated, the present level of awareness would have to be raised to a reasonably higher level.
2. How to increase awareness?
This was answered from table 2, which gives the source of awareness, the frequency of awareness, and the percentage of each awareness source measured against total frequency and total respondents. Advertisement accounted for the maximum at 40.61%, followed by friends at 27.92%, personal use at 16.78% and the market/shop at 9.14%. It was therefore inferred that an effective advertisement campaign would be the best method for increasing awareness. It was also felt that it would be a good idea for the advertisement to appear not only on TV but also in a magazine like 'Femina', which had a large circulation and was widely read by housewives.
3. Consumption Pattern
It was significant to note from the consumption table that the monthly per capita consumption of refined oil increased from the lower-middle income group to the middle income and upper income groups. The table gives the per capita consumption for each group and for each type of cooking medium.
As the per capita consumption of refined oil was highest in the upper income group, it was decided to concentrate on this income group on a priority basis for promoting a new refined oil.
4. Purchase pattern:
The 2 kg and 4 kg tins appeared to be the most popular purchase sizes, accounting for 30% and 29% of purchases respectively. About 23% of households purchased loose oil. It was also revealing that in posh localities the 4 kg tin was more popular (37% of purchases). Concentration should therefore be on the 4 kg and 2 kg tins, as they were the most widely purchased.
5. Important factors (Ranked)
Odour, taste and cholesterol were the three most important factors in the purchase of any refined oil. If Saffola could match the other brands in the odour and taste of the food cooked, it would have a clear differential advantage over the existing brands with respect to cholesterol. This factor should therefore be highlighted in advertisements as a special feature.
Conclusion:
The above study helped Saffola plan its marketing strategy with regard to the type of advertisement medium to be used, the most preferred tin size, and the factor it should concentrate upon. Though taste and odour were revealed as the most important factors, Saffola felt these were targeted by every other company, and it therefore went on to target health as its differentiating factor going forward.
The case of another Indian company:
JK Company is another Indian company that made use of factor analysis in marketing research. The management felt that the Indian market was ready for inverters, as the market for Direct Current (DC) generators was already saturated. The company also found that only a few local competitors existed and that they lacked the scale and technology that JK had. This looked like a great opportunity to JK, and it decided to carry out a survey to understand owners' perceptions of inverters. For this assignment, the company hired Perfect Marketing Agency (PMA), which decided to conduct a survey among the owners of different generators/inverters in India. The factor analysis technique was selected to handle the problem because the aim was to uncover underlying perceptual factors rather than to predict a dependent variable, as in regression analysis.
Managerial implications of the results.
Following are the managerial implications of the results of factor analysis:
The results of factor analysis signify the core reasons (factors) responsible for a specific consumer preference
It helps in relating the product to the consumer perceptions
It helps managers focus on and respond to a few critical factors rather than a multitude of redundant variables
Data analysis in SPSS, step by step.
Select ANALYZE from the SPSS menu bar
Click DIMENSION REDUCTION and then FACTOR
In the pop-up window that emerges, move all the variables into the VARIABLES box
Click on DESCRIPTIVES. In the STATISTICS box in the popup window that emerges, check INITIAL SOLUTION.
In the Correlation Matrix box, check KMO AND BARTLETT’S TEST OF SPHERICITY. Also, check REPRODUCED. Click CONTINUE.
Click on EXTRACTION. In the popup window that emerges, select PRINCIPAL COMPONENTS (default) for METHOD. In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, select EIGENVALUES OVER and write ‘1’. In the display box, check UNROTATED FACTOR SOLUTION. Click CONTINUE.
Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY box, check ROTATED SOLUTION. Click CONTINUE.
Click on SCORES. In the popup window, check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Click CONTINUE.
Click OK.
When we click OK, SPSS produces the output: the KMO and Bartlett's test, the communalities, the total variance explained (eigenvalues), the unrotated and rotated factor matrices, the reproduced correlations and the factor score coefficient matrix. These tables are read as described in the interpretation steps above.
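For readers working outside SPSS, roughly the same workflow can be run in Python. The sketch below assumes the third-party factor_analyzer package and hypothetical data standing in for the questionnaire responses; the function names are given as commonly documented for that package and should be verified against its current documentation.

```python
# Approximate Python counterpart of the SPSS steps above, assuming the
# third-party `factor_analyzer` package (pip install factor_analyzer).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical stand-in for the 25 questionnaire responses on V1..V6.
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.integers(1, 8, size=(25, 6)),
                  columns=["V1", "V2", "V3", "V4", "V5", "V6"]).astype(float)

# KMO and Bartlett's test of sphericity (adequacy of the data for factoring).
chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_variable, kmo_overall = calculate_kmo(df)
print(f"Bartlett chi-square = {chi_square:.2f}, p = {p_value:.4f}, KMO = {kmo_overall:.2f}")

# Principal component extraction (two factors assumed) with varimax rotation.
fa = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
fa.fit(df)

print("Communalities:", np.round(fa.get_communalities(), 2))
print("Eigenvalues:", np.round(fa.get_eigenvalues()[0], 2))
print("Rotated loadings:\n", np.round(fa.loadings_, 2))
print("Factor scores (first 5 respondents):\n", np.round(fa.transform(df)[:5], 2))
```

The printed Bartlett and KMO statistics, communalities, eigenvalues, rotated loadings and factor scores correspond to the SPSS output tables discussed in the interpretation steps above.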