Comparing and Calculating Reliability Measures in Multilevel Models: Understanding rWG(j), ICC(1), and ICC(2)
Calculations, cutoff values, and descriptions.
Reliability is a fundamental aspect of any research endeavor, ensuring that measurements and assessments yield consistent and trustworthy results. In multilevel analysis, where data are nested within groups or clusters, it is essential to employ reliable measures that capture both within-group and between-group variability. This article delves into the nuances of three commonly used measures: within-group interrater agreement, rWG(j), and the intraclass correlation coefficients, ICC(1) and ICC(2). By unraveling the intricacies of these measures, researchers and practitioners can gain a deeper understanding of their differences and select the most appropriate measure for evaluating data reliability in multilevel models.
It is important to note that many widely used statistical measures vary by field and sub-discipline. My field is business management, and my specialty is organizational behavior and leadership. That said, this article can give you a general overview of multilevel modeling measures, but it should not be used as a comprehensive or definitive source on the topic (i.e., use at your own risk). The notes for this article were compiled while I was preparing for my comprehensive exams during my Ph.D.
Michael Cole has done some great work in this area and, with colleagues, developed a powerful and easy-to-use tool for calculating rWG(j), ICC(1), and ICC(2). The Excel file used to be hosted on his website, but he informed me via personal email that the university will no longer host it. Because of this, he has given me permission to host the document and post a link to it here. Additionally, here is a walkthrough video I made discussing the use of the rWG(j) and ICC Excel calculator and some of its quirks.
Biemann, T., Cole, M. S., & Voelpel, S. (2012). Within-group agreement: On the use (and misuse) of rWG and rWG(j) in leadership research and some best practice guidelines. The Leadership Quarterly, 23, 66-80.
Tool for Computing IRA and IRR Estimates (Click Link): https://docs.google.com/spreadsheets/d/1DZw1tqBboXIwt1Z1akGsFVVQm0TPdh5q/edit?usp=drivesdk&ouid=112689421649963944674&rtpof=true&sd=true
Comprehensive Exam Notes
ICC(1), aka ICC(1,1) - A one-way random-effects, single-measure test. Represents the proportion of variance in any one individual's response that can be explained by team membership. It is the ratio of between-group variance to total variance.
ICC(2), aka ICC(1,k) - A one-way random-effects ANOVA, average-measure test. Represents the reliability of the team mean, which increases as group size increases.
rWG(j) (the j indicates multiple items) - A ratio index of within-group agreement. Assesses whether one team member's response is similar or identical to the other team members' responses; in other words, it reflects the interchangeability of the respondents.
Common cutoff values:
.70 or higher for ICC(2) and rWG(j)
.05 to .50 for ICC(1)
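The definitions above can be sketched in code. This is a minimal illustration of the standard one-way ANOVA formulas for ICC(1) and ICC(2) and the James, Demaree, and Wolf formula for rWG(j) with a uniform (rectangular) null distribution; it is not Cole's calculator, and it uses a simple average group size, whereas published formulas apply a correction for unequal group sizes.

```python
import statistics

def icc_1_and_2(groups):
    """ICC(1) and ICC(2) from a one-way random-effects ANOVA.
    `groups` is a list of groups, each a list of individual ratings.
    Sketch only: uses the plain average group size, which is exact for
    balanced designs and an approximation otherwise."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total respondents
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                    for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    n_bar = n_total / k  # average group size

    # ICC(1): ratio of between-group variance to total variance
    icc1 = (ms_between - ms_within) / (ms_between + (n_bar - 1) * ms_within)
    # ICC(2): reliability of the group means
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2

def rwg_j(item_ratings, scale_points):
    """rWG(j) for a single group. `item_ratings` is a list of J lists,
    one per item, each holding the group's ratings on that item.
    Uses the uniform null distribution for an A-point scale; note that
    implementations differ on biased vs. unbiased variance estimates."""
    j = len(item_ratings)
    mean_var = sum(statistics.variance(r) for r in item_ratings) / j
    sigma_eu = (scale_points ** 2 - 1) / 12  # expected error variance
    ratio = mean_var / sigma_eu
    return (j * (1 - ratio)) / (j * (1 - ratio) + ratio)

# Perfect between-group separation -> both ICCs equal 1
print(icc_1_and_2([[1, 1], [2, 2], [3, 3]]))  # -> (1.0, 1.0)
# Perfect within-group agreement on two items -> rWG(j) = 1
print(rwg_j([[3, 3, 3], [4, 4, 4]], 5))       # -> 1.0
```

For real analyses, Cole's Excel calculator (linked above) or the R `multilevel` package handle the unequal-group-size corrections and alternative null distributions that this sketch omits.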
However, you can also justify aggregation based on the p-value of the F-test given in Cole's Excel calculator. The F-ratio's degrees of freedom are (number of groups - 1, number of respondents - number of groups).
For example, 119 students rating 31 teams would result in F(30, 88).
“If the ICC(1) is statistically different from zero, there is evidence to justify making the group the focal unit of analysis (Chen et al., 2004)” (Biemann et al., 2012, p. 75).
Biemann, T., Cole, M. S., & Voelpel, S. (2012). Within-group agreement: On the use (and misuse) of rWG and rWG(j) in leadership research and some best practice guidelines. The Leadership Quarterly, 23, 66-80.
Bliese, P. D., Halverson, R. R., & Schriesheim, C. A. (2002). Benchmarking multilevel methods in leadership: The articles, the model, and the data set. The Leadership Quarterly, 13(1), 3-14.
Carter, M. (2009). Transformational-transactional leadership and work outcomes: An organizational justice and cultural perspective (Doctoral dissertation).
James, L. R. (1982). Aggregation bias in estimates of perceptual agreement. Journal of Applied Psychology, 67(2), 219.
Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9(2), 202-220.
LeBreton, J. M., & Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11(4), 815-852.