interrater reliability



in·ter·judge re·li·a·bil·i·ty

in psychology, the consistency of measurement obtained when different judges or examiners independently administer the same test to the same subject.

interrater reliability

The extent to which two independent parties, each using the same tool or examining the same data, arrive at matching conclusions. Many health care investigators analyze graduated data rather than binary data. In an analysis of anxiety, for example, a graduated scale may rate research subjects as “very anxious,” “somewhat anxious,” “mildly anxious,” or “not at all anxious,” whereas a binary method of rating anxiety might include just the two categories “anxious” and “not anxious.” If the study is carried out and coded by more than one psychologist, the coders may not agree in their application of the graduated scale: one may interview a patient and find him or her “somewhat anxious,” while another might assess the same patient as “very anxious.” The congruence in the application of the rating scale by more than one psychologist constitutes its interrater reliability.
Synonym: interobserver reliability
See also: reliability
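
As an illustration of how this congruence might be quantified, the sketch below (Python, with invented ratings; not part of the dictionary entry itself) computes two commonly reported interrater reliability statistics for the anxiety scale described above: simple percentage agreement and Cohen's kappa, which corrects observed agreement for the agreement expected by chance.

```python
# Minimal sketch: quantifying interrater reliability for the graduated
# anxiety scale described above. The two raters' scores are hypothetical.
from collections import Counter

SCALE = ["not at all anxious", "mildly anxious", "somewhat anxious", "very anxious"]

# Independent ratings of the same ten subjects by two psychologists.
rater_a = ["very anxious", "somewhat anxious", "mildly anxious", "not at all anxious",
           "somewhat anxious", "very anxious", "mildly anxious", "somewhat anxious",
           "not at all anxious", "very anxious"]
rater_b = ["somewhat anxious", "somewhat anxious", "mildly anxious", "not at all anxious",
           "mildly anxious", "very anxious", "mildly anxious", "somewhat anxious",
           "not at all anxious", "very anxious"]

def percent_agreement(a, b):
    """Proportion of subjects given the identical rating by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b, categories):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)          # observed agreement
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same category,
    # given each rater's marginal rating frequencies.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b, SCALE):.2f}")
```

A kappa near 1 indicates near-perfect congruence between the raters, while a value near 0 indicates agreement no better than chance.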