How to Design and Report Likert Scales
Likert scale: A Likert scale (pronounced ‘lick-ert’) is a type of psychometric response scale often used in questionnaires, and is the most widely used scale in survey research. When responding to a Likert questionnaire item, respondents specify their level of agreement with a statement. The scale is named after Rensis Likert, who published a report describing its use (Likert, 1932).
Sample Question Presented Using A Five-Point Likert Scale
A typical test item in a Likert scale is a statement; the respondent is asked to indicate their degree of agreement with it. Traditionally a five-point scale is used; however, many psychometricians advocate using a seven- or nine-point scale.
Ice cream is good for breakfast
- Strongly disagree
- Disagree
- Neither agree nor disagree
- Agree
- Strongly agree
Likert scaling is a bipolar scaling method, measuring either positive or negative response to a statement. Sometimes Likert scales are used in a forced-choice method where the middle option of “Neither agree nor disagree” is not available. Likert scales may be subject to distortion from several causes. Respondents may avoid using extreme response categories (central tendency bias); agree with statements as presented (acquiescence response bias); or try to portray themselves or their group in a more favorable light (social desirability bias).
Scoring and analysis: http://www.answers.com/topic/likert-scale
After the questionnaire is completed, each item may be analyzed separately or item responses may be summed to create a score for a group of items. Hence, Likert scales are often called summative scales.
Responses to a single Likert item are normally treated as ordinal data, because, especially when using only five levels, one cannot assume that respondents perceive the difference between adjacent levels as equidistant. When treated as ordinal data, Likert responses can be analyzed using non-parametric tests, such as the Mann-Whitney test, the Wilcoxon signed-rank test, and the Kruskal-Wallis test.
When responses to several Likert items are summed, they may be treated as interval data measuring a latent variable. If the summed responses are normally distributed, parametric statistical tests such as the analysis of variance can be applied.
Examples:
- Attitudes toward computers (20 questions, but each participant gets one summed score)
- Some personality tests (scored the same way)
Data from Likert scales are sometimes reduced to the nominal level by combining all agree and disagree responses into two categories of “accept” and “reject”. The Cochran Q test and the McNemar test are common statistical procedures used after this transformation.
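The reduction to two categories can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 coding (1 = strongly disagree, 5 = strongly agree), the made-up responses, and the decision to leave the neutral midpoint out of both categories are assumptions, since the text does not say how neutral answers are handled.

```python
from collections import Counter


def collapse(response):
    """Map a 5-point response to 'accept', 'reject', or None (neutral).

    Assumed coding: 1 = strongly disagree ... 5 = strongly agree.
    """
    if response in (4, 5):
        return "accept"
    if response in (1, 2):
        return "reject"
    if response == 3:
        return None  # assumption: the neutral midpoint joins neither category
    raise ValueError(f"not a 5-point response: {response!r}")


responses = [5, 4, 3, 2, 1, 4, 4, 2]  # made-up answers for illustration
collapsed = [collapse(r) for r in responses]
counts = Counter(c for c in collapsed if c is not None)
print(counts)
```

The resulting counts are nominal data, which is the form the Cochran Q and McNemar procedures expect.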
Example of a Likert Scale (Ordinal) Survey and Data Analysis
Data set: the one posted on Course Website: Cultural differences in online Learning
See the Survey at: http://surveymonkey.com/s.asp?u=61201883523 (Question 4—Students’ Perceptions on Teachers and Teaching in General)
Mini-Data Analysis for two Class Periods
- Set the data correctly (Data, Variable)
- Analyze Data_Practice_continuous (Descriptive Analysis, first time, leave “factor” unchecked; second time, check it, compare the results)
- Look at the results and see what conclusions you can draw
- Analyze Data_Practice_Ordinal (Inferential statistics)
Summary of Data: (Descriptive analysis)—generated by SurveyMonkey
Note: This summary does not give us the mean and SD, so I still needed to analyze the raw data.
Data: (Part of the raw data; SurveyMonkey will give you a numerical version)

| Culture (Open-Ended Response) | Q1Wisdom | Q2Respect | Q3Equal | Q4rulesconduct | Q5experts | Q6formalmanner |
| --- | --- | --- | --- | --- | --- | --- |
| Chinese | 2 | 1 | 2 | 2 | 1 | 5 |
| Chinese | 2 | 2 | 2 | 1 | 1 | 4 |
| Chinese | 2 | 2 | 2 | 2 | 2 | 2 |
| Chinese | 1 | 1 | 2 | 1 | 1 | 2 |
| Chinese | 1 | 1 | 2 | 2 | 1 | 2 |
| Chinese | 3 | 1 | 2 | 2 | 4 | 4 |
| Chinese | 1 | 1 | 1 | 1 | 2 | 4 |
| American | 2 | 2 | 4 | 2 | 1 | 3 |
| American | 1 | 1 | 3 | 1 | 2 | 2 |
| American | 2 | 4 | 4 | 2 | 3 | 4 |
| American | 1 | 1 | 4 | 1 | 1 | 1 |
| american | 2 | 2 | 3 | 1 | 1 | 3 |
| American | 1 | 1 | 2 | 2 | 1 | 4 |
| American | 2 | 2 | 3 | 2 | 2 | 2 |
| american | 2 | 2 | 2 | 2 | 3 | 5 |
| American | 2 | 2 | 4 | 1 | 2 | 1 |
| American | 2 | 2 | 1 | 4 | 1 | 4 |
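As a rough illustration of the descriptive step, the sketch below computes n, mean, median, and sample SD for each item using only the 17 rows shown above. The function and variable names are hypothetical, and because only part of the raw data is shown, the figures it prints will not match a summary of the full data set. The median is included because it is a safer summary when the ratings are treated as ordinal.

```python
from statistics import mean, median, stdev

ITEMS = ["Q1Wisdom", "Q2Respect", "Q3Equal",
         "Q4rulesconduct", "Q5experts", "Q6formalmanner"]

# The 17 rows shown above: culture label, then the six ratings
# (1 = strongly agree ... 5 = strongly disagree).
ROWS = [
    ("Chinese", 2, 1, 2, 2, 1, 5),
    ("Chinese", 2, 2, 2, 1, 1, 4),
    ("Chinese", 2, 2, 2, 2, 2, 2),
    ("Chinese", 1, 1, 2, 1, 1, 2),
    ("Chinese", 1, 1, 2, 2, 1, 2),
    ("Chinese", 3, 1, 2, 2, 4, 4),
    ("Chinese", 1, 1, 1, 1, 2, 4),
    ("American", 2, 2, 4, 2, 1, 3),
    ("American", 1, 1, 3, 1, 2, 2),
    ("American", 2, 4, 4, 2, 3, 4),
    ("American", 1, 1, 4, 1, 1, 1),
    ("american", 2, 2, 3, 1, 1, 3),
    ("American", 1, 1, 2, 2, 1, 4),
    ("American", 2, 2, 3, 2, 2, 2),
    ("american", 2, 2, 2, 2, 3, 5),
    ("American", 2, 2, 4, 1, 2, 1),
    ("American", 2, 2, 1, 4, 1, 4),
]


def describe(rows, items):
    """Return {item: {'n', 'mean', 'median', 'sd'}} for the rating columns."""
    summary = {}
    for col, name in enumerate(items, start=1):  # column 0 is the culture label
        values = [row[col] for row in rows]
        summary[name] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "sd": stdev(values),  # sample standard deviation (n - 1)
        }
    return summary


summary = describe(ROWS, ITEMS)
for name, stats in summary.items():
    print(f"{name:15s} n={stats['n']:2d} mean={stats['mean']:.3f} "
          f"sd={stats['sd']:.3f} median={stats['median']}")
```

Note that two rows spell the culture label in lower case ("american"); setting the data correctly before a group comparison would include normalizing such labels.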
Descriptive Analysis: (from 1 strongly agree to 5 strongly disagree)
| Item | n | Mean | SD |
| --- | --- | --- | --- |
| Q1Wisdom | 128 | 1.750 | 0.6148 |
| Q2Respect | 128 | 1.875 | 0.6987 |
| Q3Equal | 127 | 2.669 | 0.9682 |
| Q4rulesconduct | 127 | 2.031 | 0.8351 |
| Q5experts | 127 | 1.701 | 0.7592 |
| Q6formalmanner | 128 | 3.047 | 1.0338 |
Note: The Likert scale has to be set as continuous data in order for Analyse-it to run descriptive statistics; this is not accurate but acceptable. The rigorous analysis is to get a weighted mean, which Analyse-it does not do. Oftentimes, researchers go right into inferential statistics and skip the descriptive statistics, since they are less informative.
[End of Descriptive Analysis]
Inferential Analysis on the differences among the three groups (American, Chinese, Korean) For later discussion
1) Participants’ perceptions on teacher and teaching in general (pre-survey): Item 4 on the pre-survey assessed participants’ perceptions and expectations on teacher and teaching in general. The three questions that are closely related to sense of Power Distance were analyzed inferentially with the Kruskal-Wallis Analysis of Variance test, with cultural identity being the independent variable. The results indicate that:
a) There were significant differences in participants’ perceptions about being equal with their instructor. The Korean group had the highest mean rank (45.53) on a scale of 1 (strongly agree) to 5 (strongly disagree). By contrast, the Anglo-American group had the lowest mean rank (29.77) and therefore perceived their instructors more as equals.
b) There was no significant difference in participants’ perceptions about rules of conduct in online classes. The Chinese group had the lowest mean rank (29.36), an indication of stronger agreement about implementing specific rules of conduct. This result aligned with some of their narrative comments about “feeling lost” and hoping for more guidance.
c) There were highly significant differences in their perceptions of course conduct. Again the Chinese group had the lowest mean rank, an indication of stronger agreement about conducting courses in a formal manner.
2) Post-survey: approaching superiors and peers when completing individual assignments and teamwork: Other responses to the post-survey that reflect the impact of Power Distance include: a) learners’ comfort level in approaching the instructor/facilitator/TA for help with individual assignments and/or teamwork; and b) their comfort level in approaching peers for help with individual assignments and/or teamwork. Participants rated their comfort level as very comfortable (1), somewhat comfortable (2), uncomfortable (3), or very uncomfortable (4). The lower their rating, the higher their comfort level. The Kruskal-Wallis Analysis of Variance was used again to compare the three groups’ mean ranks of comfort level in approaching “superiors” or their peers when completing individual assignments and, if applicable, teamwork.
[Note: Because the regular Mean of Likert Scale does not make much sense, I skipped the descriptive Analysis and Went right into Inferential Analysis—Analysis of Variance using the Non-Parametric Kruskal-Wallis statistic]
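For readers who want to see what the Kruskal-Wallis statistic does with raw ratings, here is a minimal sketch in plain Python. It is not the Analyse-it implementation: the function names and the three groups of made-up ratings are hypothetical, and the p value uses closed forms of the chi-square approximation that exist only for 1 or 2 degrees of freedom (that is, for two or three groups).

```python
import math
from collections import Counter


def kruskal_wallis(groups):
    """Return (H corrected for ties, degrees of freedom) for {label: ratings}."""
    pooled = [v for ratings in groups.values() for v in ratings]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Midrank of each distinct rating: the average of the positions it
    # occupies when all ratings are sorted together.
    midrank, position = {}, 0
    for value in sorted(counts):
        ties = counts[value]
        midrank[value] = position + (ties + 1) / 2
        position += ties

    # Uncorrected H from each group's rank sum.
    weighted = sum(
        sum(midrank[v] for v in ratings) ** 2 / len(ratings)
        for ratings in groups.values()
    )
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all ratings are identical; H is undefined")
    return h / correction, len(groups) - 1


def chi_square_p(h, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(h / 2))
    if df == 2:
        return math.exp(-h / 2)
    raise NotImplementedError("use a statistics package for df > 2")


# Made-up comfort ratings (1 = very comfortable ... 4 = very uncomfortable).
ratings = {
    "group_a": [1, 1, 2, 2],
    "group_b": [2, 3, 3, 4],
    "group_c": [3, 4, 4, 4],
}
h, df = kruskal_wallis(ratings)
p = chi_square_p(h, df)
print(f"H = {h:.2f}, df = {df}, p = {p:.4f}")
```

With groups this small the chi-square approximation is rough; the example is meant to show the ranking and tie-correction steps, not to stand in for a statistics package.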
Item 1. Individual Assignment: Approaching Superior for Help (two-tailed test)
| O. IA: Approach “Superior” | n | Rank sum | Mean rank |
| --- | --- | --- | --- |
| American | 31 | 950.0 | 30.65 |
| Chinese | 15 | 682.5 | 45.50 |
| Korean | 29 | 1217.5 | 41.98 |

Kruskal-Wallis statistic: 7.15
p: 0.0280 (chi-square approximation, corrected for ties)
When the level of significance is set at α = 0.05, the small p value (0.028) indicates a significant difference in participants’ ratings for approaching “superiors” on individual assignments. The American group, not surprisingly, had the lowest mean rank (30.65), an indication of a greater comfort level in approaching instructors for help; the Chinese group had the highest mean rank (45.50) and thus the lowest comfort level in approaching their instructors.
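The figures in the Item 1 table can be checked by hand. The sketch below, with hypothetical function names, recomputes each mean rank as rank sum divided by n, computes the Kruskal-Wallis statistic before any tie correction from the rank sums, and converts the reported statistic (7.15) to a p value with the chi-square approximation for 2 degrees of freedom. The tie correction itself needs the raw ratings, so the uncorrected value comes out somewhat lower than the reported one.

```python
import math

# (n, rank sum) per group, copied from the Item 1 table.
GROUPS = {
    "American": (31, 950.0),
    "Chinese": (15, 682.5),
    "Korean": (29, 1217.5),
}


def h_from_rank_sums(groups):
    """Kruskal-Wallis H without tie correction, from (n, rank sum) pairs."""
    n_total = sum(n for n, _ in groups.values())
    weighted = sum(rank_sum**2 / n for n, rank_sum in groups.values())
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def p_two_df(h):
    """Upper-tail chi-square probability for 2 degrees of freedom (three groups)."""
    return math.exp(-h / 2)


mean_ranks = {name: rank_sum / n for name, (n, rank_sum) in GROUPS.items()}
h_uncorrected = h_from_rank_sums(GROUPS)
p_reported = p_two_df(7.15)  # 7.15 is the tie-corrected statistic from the table

for name, value in mean_ranks.items():
    print(f"{name:9s} mean rank = {value:.2f}")
print(f"H before tie correction = {h_uncorrected:.2f}")
print(f"p for H = 7.15 with df = 2: {p_reported:.4f}")
```

A useful sanity check on any such table is that the rank sums add up to N(N + 1)/2; here 950.0 + 682.5 + 1217.5 = 2850 = 75 × 76 / 2.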
Item 2. Individual Assignment: Approaching Peer for Help (two-tailed test)
n: 73 (cases excluded: 2 due to missing values)

| P. IA: Approach Peer by Group | n | Rank sum | Mean rank |
| --- | --- | --- | --- |
| American | 31 | 911.5 | 29.40 |
| Chinese | 15 | 372.5 | 24.83 |
| Korean | 27 | 1417.0 | 52.48 |

Kruskal-Wallis statistic: 26.46
p: <0.0001 (chi-square approximation, corrected for ties)
When α = 0.05, the small p value (<0.0001) indicates highly significant differences in participants’ comfort level in approaching peers for help with individual assignments. The Chinese group had the lowest mean rank (24.83, higher comfort level), while the Korean group had the highest mean rank (52.48, lower comfort level).
Item 3. Teamwork: Approaching Superior for Help
n: 58 (cases excluded: 17 due to missing values)

| U. Team: Approach “Superior” by Group | n | Rank sum | Mean rank |
| --- | --- | --- | --- |
| American | 31 | 814.5 | 26.27 |
| Chinese | 15 | 509.0 | 33.93 |
| Korean | 12 | 387.5 | 32.29 |

Kruskal-Wallis statistic: 2.88
p: 0.2364 (chi-square approximation, corrected for ties)
p = 0.236 (> α = 0.05) indicates no significant difference in participants’ comfort level in approaching superiors for help when completing teamwork.
Item 4. Teamwork: Approaching Peer for Help (two-tailed test)
n: 58 (cases excluded: 17 due to missing values)

| V. Team: Approach Peer by Group | n | Rank sum | Mean rank |
| --- | --- | --- | --- |
| American | 31 | 806.5 | 26.02 |
| Chinese | 15 | 319.5 | 21.30 |
| Korean | 12 | 585.0 | 48.75 |

Kruskal-Wallis statistic: 24.80
p: <0.0001 (chi-square approximation, corrected for ties)
The high Kruskal-Wallis statistic (24.8) and the small p value (<0.0001) again indicate a highly significant difference in participants’ comfort level in approaching peers for help with teamwork. The Korean group (mean rank = 48.75) contributed greatly to this difference. However, the statistical power might have been reduced in this test because of the 17 missing rating values from the Korean group. As mentioned in the curriculum analysis, many of the Korean courses did not involve teamwork and many participants chose “not applicable” for this survey question.
Summary: Influence of power distance evidenced by the four tests: Conforming to the existing findings about Power Distance, the American group (mainly Anglo-American) had the lowest PDI score, while the Chinese group had the highest PDI score. Possibly because of their sense of PDI, the American group felt the most comfortable in approaching their instructors for help, while the Korean group felt the least comfortable in doing so. Chinese students, because of their large class size, did not have much opportunity to interact with the instructors. Still, their reported comfort level in approaching the instructors was low. As to approaching their peers for help, the Chinese group felt the most comfortable in completing both individual assignments and teamwork, the American group felt comfortable, while the Korean group felt the least comfortable in completing both individual assignments and teamwork. Again, the Koreans’ cultural perceptions of CMC might have influenced their ratings here. As some of the Korean participants commented, peers or classmates online can be “strangers.” As to the high comfort level of the Chinese, it is worth noting that most of these Chinese students worked in self-formed teams and they therefore were comfortable about approaching their peers for help.
The four Kruskal-Wallis analyses on the post-survey items had revealing results. Although there was no significant difference in the three groups’ comfort level in approaching superiors for help with teamwork, there were significant differences in their ratings for approaching superiors on individual assignments, and there were highly significant differences in their levels of comfort in approaching peers for help with individual assignments and with teamwork. Power Distance indeed affected students’ ways of approaching instructors and their peers. By contrast, individuals were able to overcome their sense of Power Distance when working as a group. In other words, individuals became “braver” when working as a team to approach their instructors for help.
(From Wang’s Cultural Studies of Online Learning, British Journal of Educational Technology)
More Likert Scale Examples: http://www.socialresearchmethods.net/kb/scallik.htm
Defining the Focus. As in all scaling methods, the first step is to define what it is you are trying to measure. Because this is a unidimensional scaling method, it is assumed that the concept you want to measure is one-dimensional in nature. You might operationalize the definition as an instruction to the people who are going to create or generate the initial set of candidate items for your scale.
Generating the Items. Next, you have to create the set of potential scale items. These should be items that can be rated on a 1-to-5 or 1-to-7 Disagree-Agree response scale. Sometimes you can create the items by yourself based on your intimate understanding of the subject matter. But, more often than not, it’s helpful to engage a number of people in the item creation step. For instance, you might use some form of brainstorming to create the items. It’s desirable to have as large a set of potential items as possible at this stage; about 80-100 would be best.
Rating the Items. The next step is to have a group of judges rate the items. Usually you would use a 1-to-5 rating scale where:
1 = strongly unfavorable to the concept
2 = somewhat unfavorable to the concept
3 = undecided
4 = somewhat favorable to the concept
5 = strongly favorable to the concept
Administering the Scale. You’re now ready to use your Likert scale. Each respondent is asked to rate each item on some response scale. For instance, they could rate each item on a 1-to-5 response scale where:
1 = strongly disagree
2 = disagree
3 = undecided
4 = agree
5 = strongly agree
There are a variety of possible response scales (1-to-7, 1-to-9, 0-to-4). All of these odd-numbered scales have a middle value, which is often labeled Neutral or Undecided. It is also possible to use a forced-choice response scale with an even number of responses and no middle neutral or undecided choice. In this situation, the respondent is forced to decide whether they lean more towards the agree or disagree end of the scale for each item.
The final score for the respondent on the scale is the sum of their ratings for all of the items (this is why this is sometimes called a “summated” scale). On some scales, you will have items that are reversed in meaning from the overall direction of the scale. These are called reversal items. You will need to reverse the response value for each of these items before summing for the total. That is, if the respondent gave a 1, you make it a 5; if they gave a 2 you make it a 4; 3 = 3; 4 = 2; and, 5 = 1.
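The reversal and summing rule described above can be written compactly: on a scale running from low to high, the reversed value is low + high - value, so on a 1-to-5 scale 1 becomes 5, 2 becomes 4, and 3 stays 3. The following is a minimal sketch; the item names, the choice of which items are reversed, and the answers are hypothetical.

```python
def reverse(value, low=1, high=5):
    """Reverse-score one response: on a 1-to-5 scale 1->5, 2->4, 3->3, 4->2, 5->1."""
    return low + high - value


def scale_score(responses, reversed_items=(), low=1, high=5):
    """Sum one respondent's ratings, reversing the items named in reversed_items."""
    total = 0
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"{item}: {value!r} is outside {low}-to-{high}")
        total += reverse(value, low, high) if item in reversed_items else value
    return total


# Hypothetical answers from one respondent on a four-item scale in which
# item2 and item4 are worded in the opposite direction.
answers = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}
score = scale_score(answers, reversed_items={"item2", "item4"})
print(score)  # 4 + (6 - 2) + 5 + (6 - 1) = 18
```

The same function works for other response scales by changing low and high, for example low=0, high=4 for a 0-to-4 scale.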
Example: The Employment Self Esteem Scale
Here’s an example of a ten-item Likert Scale that attempts to estimate the level of self esteem a person has on the job. Notice that this instrument has no center or neutral point — the respondent has to declare whether he/she is in agreement or disagreement with the item.
INSTRUCTIONS: Please rate how strongly you agree or disagree with each of the following statements by placing a check mark in the appropriate box.
Response options for each item: Strongly Disagree | Somewhat Disagree | Somewhat Agree | Strongly Agree
1. I feel good about my work on the job.
2. On the whole, I get along well with others at work.
3. I am proud of my ability to cope with difficulties at work.
4. When I feel uncomfortable at work, I know how to handle it.
5. I can tell that other people at work are glad to have me there.
6. I know I’ll be able to cope with work for as long as I want.
7. I am proud of my relationship with my supervisor at work.
8. I am confident that I can handle my job without constant assistance.
9. I feel like I make a useful contribution at work.
10. I can tell that my coworkers respect me.
Usability Glossary: Likert scale
A type of survey question where respondents are asked to rate the level at which they agree or disagree with a given statement. For example:
I find this software easy to use. strongly disagree 1 2 3 4 5 6 7 strongly agree
A Likert scale is used to measure attitudes, preferences, and subjective reactions. In software evaluation, we can often objectively measure efficiency and effectiveness with performance metrics such as time taken or errors made. Likert scales and other attitudinal scales help get at the emotional and preferential responses people have to the design. Is it attractive, fun, professional, easy?
Producing Means and Standard Deviations: http://www.uni.edu/its/us/document/stats/spss2.html
The DESCRIPTIVES procedure in SPSS produces means and standard deviations for variables. It also prints the minimum and maximum values. It is appropriate to print means for Likert scale questions, since the coded number gives us a feel for the direction of the average answer. The standard deviation is also important, as it gives us an indication of the typical distance of responses from the mean. A low standard deviation means that most observations cluster around the mean. A high standard deviation means that there was a lot of variation in the answers. A standard deviation of 0 is obtained when all responses to a question are the same. The following code produces descriptive statistics for variables q1 to q20. The minimum and maximum values tell us the range of answers given by our survey population.
descriptives
  variables = q1 to q20.
Variable   Mean   Std Dev   Minimum   Maximum   Valid N   Label
Q1         4.65     .66        2         5         80     question 1
Q2         4.59     .66        2         5         85     question 2
Q3         4.36     .75        2         5         90     question 3
Q4         4.72     .51        3         5         74     question 4
Q5         3.89    1.11        1         5         92     question 5
Q6         3.26    1.45        1         5        101     question 6
Q7         3.92    1.14        1         5         88     question 7
Q8         4.26     .90        1         5         94     question 8
Q9         4.32     .88        2         5         90     question 9
Q10        4.45     .86        2         5         75     question 10
Q11        3.86    1.45        1         5         95     question 11
Q12        3.71    1.26        1         5        110     question 12
Q13        4.62     .71        2         5         90     question 13
Q14        4.37     .85        2         5         97     question 14
Q15        3.08    1.39        1         5        109     question 15
Q16        4.45     .89        1         5         91     question 16
Q17        4.56     .81        1         5         79     question 17
Q18        2.68    1.34        1         5        116     question 18
Q19        4.54     .74        2         5         90     question 19
Q20        4.39     .76        2         5         96     question 20