Reliability Analysis

Reliability refers to the extent to which a scale produces consistent results when measurements are repeated.  Assessing it is called reliability analysis.  Reliability is estimated as the proportion of systematic variation in a scale, which can be found by determining the association between the scores obtained from different administrations of the scale.  If that association is high, the scale yields consistent results and is therefore reliable.
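The "proportion of systematic variation" can be written compactly using classical test theory. That framing is a standard one but is an assumption here, since the text above does not name it. An observed score X is treated as a true score T plus random error E, with T and E uncorrelated, and reliability is the share of observed-score variance that comes from the true score:

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

A value near 1 means almost all of the variation in scores is systematic; a value near 0 means it is mostly error.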

There are four common approaches to assessing reliability:

Test-Retest: Respondents are administered an identical set of scale items at two different times under equivalent conditions.  The degree of similarity between the two measurements is determined by computing a correlation coefficient; the higher the coefficient, the greater the reliability.  The approach has limitations: the result is sensitive to the time interval between testing, and the initial measurement may alter the characteristic being measured.
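As a minimal sketch of the test-retest calculation, the snippet below computes a Pearson correlation between two administrations of the same scale. The choice of Pearson's r, the function name pearson_r, and the scores are all illustrative assumptions; the text above only says "a correlation coefficient".

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scores for five respondents at two administrations.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.3f}")
```

A coefficient close to 1 would indicate that respondents kept roughly the same rank order and spacing across the two administrations.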

Internal Consistency Reliability: Internal consistency is used to assess the reliability of a summated scale, in which several items are summed to form a total score.  It focuses on how consistent the items forming the scale are with one another.

Split-Half Reliability: A form of internal consistency reliability.  The items on the scale are divided into two halves, for example odd- and even-numbered items, and the resulting half scores are correlated.  High correlations between the halves indicate high internal consistency.  The limitation is that the outcome depends on how the items are split.  To overcome this limitation, coefficient alpha (Cronbach's alpha), which does not depend on any particular split, is used.
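The two calculations can be sketched as follows, assuming item scores are stored as one list per respondent. The data and the function names (split_half_r, cronbach_alpha) are hypothetical, and alpha is computed with the usual variance-based formula k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(rows):
    """Correlate totals of odd-numbered items with totals of even-numbered items."""
    odd_totals = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


def cronbach_alpha(rows):
    """Coefficient alpha from item variances and the variance of total scores."""
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: five respondents (rows) answering four items (columns).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

print(f"odd-even split-half r = {split_half_r(scores):.3f}")
print(f"Cronbach's alpha      = {cronbach_alpha(scores):.3f}")
```

Reordering the item columns can change the split-half correlation, because different items land in each half, but it leaves alpha unchanged, which is the property the paragraph above relies on.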

Inter-Rater Reliability: Also called inter-rater agreement.  Inter-rater reliability indicates whether two or more raters or interviewers administer the same form to the same people in a consistent way.  It is assessed to establish the extent of consensus on the use of the instrument among those who administer it.
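One way to quantify agreement between two raters is sketched below, using simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. Kappa is an assumption here, since the paragraph above does not name a specific statistic, and the ratings and function names are made up for illustration.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Share of cases on which the two raters assigned the same category."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_1)
    p_observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = counts_1.keys() | counts_2.keys()
    # Chance agreement: product of the raters' marginal proportions per category.
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of the same ten cases by two raters.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

print(f"percent agreement = {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa     = {cohens_kappa(rater_a, rater_b):.3f}")
```

Kappa comes out lower than raw agreement because some matches would occur even if the raters assigned categories at random with the same marginal frequencies.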


Reliability analysis rests on the following assumptions:

  • Errors should be uncorrelated.
  • The coding should have the same meaning across items.
  • In the split-half test, the assignment of subjects is assumed to be random.
  • The observations should be independent of each other.
  • In the split-half test, the variances of the two halves are assumed to be equal.



