Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135-60.

It is important to note that in each of the three situations in Table 1, the pass percentages are the same for both examiners; if the two examiners were compared with a typical 2 × 2 test for paired data (the McNemar test), there would be no difference between their performances. On the other hand, the agreement between the observers is very different in these three situations. The basic idea to be understood here is that "agreement" quantifies the concordance between the two examiners for each pair of scores, not the similarity of the overall pass percentages of the examiners.

Methods for assessing agreement between observers depend on the nature of the variables measured and the number of observers.

Several formulas can be used to calculate the limits of agreement; the simple formula given in the previous paragraph is well suited to sample sizes greater than 60. The point is that, for the three situations presented in Table 1, the McNemar test (designed for the comparison of paired categorical data) would not detect a difference. However, this cannot be construed as evidence of agreement. The McNemar test compares the overall proportions; therefore, any situation in which the overall pass/fail proportions of the two examiners are similar (for example,
situations 1, 2 and 3 in Table 1) would show no difference. Similarly, the paired t-test compares the mean difference between two observations in a single group; it may therefore fail to reach significance when the mean difference between paired values is small, even though the differences between the two observers are large for some individuals.

Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

Let us consider two examiners, A and B, who evaluate the answer sheets of 20 students in a class and mark each student as "pass" or "fail", with each examiner passing half of the students. Table 1 presents three different situations that can occur.
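Why the McNemar test cannot serve as evidence of agreement can be seen directly from its formula: the statistic depends only on the two discordant cells of the paired 2 × 2 table. A minimal sketch, using the situation-1 counts described in the text (two A-pass/B-fail students and two A-fail/B-pass students):

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """McNemar chi-square statistic from the two discordant cell
    counts of a paired 2 x 2 table: (b - c)^2 / (b + c)."""
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)

# Situation 1 of Table 1: two students pass with examiner A but fail
# with B, and two the reverse. The discordant cells balance exactly,
# so the test reports no difference in the examiners' pass rates,
# regardless of how much (or little) they agree student by student.
print(mcnemar_statistic(2, 2))  # → 0.0
```

Because both examiners pass 10/20 students in all three situations, the discordant cells always balance and the statistic is zero throughout, which is precisely why identical marginal proportions say nothing about pairwise agreement.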
In situation 1 of this table, eight students receive a pass grade from both examiners, eight receive a fail grade from both, and four receive a pass from one examiner but a fail from the other (two from A and two from B). Thus, the results of the two examiners are the same for 16/20 students (agreement = 16/20 = 0.80; disagreement = 4/20 = 0.20). This looks good. However, it does not take into account that some marks could have been guesses and that the agreement could have arisen by chance. The intraclass correlation coefficient (ICC) ranges from 0 (no agreement) to 1 (perfect agreement).

Suppose $n_{11}$ and $n_{22}$ are the numbers of times both clinicians conclude "yes" or "no", respectively; $n_{12}$ and $n_{21}$ denote the numbers of times the two clinicians disagree, $n_{12}$ for "yes" from clinician A with "no" from clinician B, and $n_{21}$ for the opposite. Let $n_{1+} = n_{11} + n_{12}$ and $n_{+1} = n_{11} + n_{21}$ be the marginal sums for clinicians A and B with "yes" as the diagnostic result, and $n = n_{11} + n_{12} + n_{21} + n_{22}$ the total sample size. These data can be organized in a 2 × 2 table, e.g. Table 1. Let the cell probabilities be $p_{ij} = n_{ij}/n$, where $i = 1, 2$ and $j = 1, 2$.
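The notation above can be made concrete with a short sketch. The counts are those of situation 1 in Table 1 (both-pass = 8, both-fail = 8, and two disagreements in each direction), taken from the example in the text:

```python
# Cell counts: n11 = both "yes"/pass, n22 = both "no"/fail,
# n12, n21 = the two kinds of disagreement (situation 1 of Table 1).
n11, n12, n21, n22 = 8, 2, 2, 8
n = n11 + n12 + n21 + n22          # total sample size, 20

# Cell probabilities p_ij = n_ij / n.
p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n

# Observed proportion of agreement: the diagonal of the table.
p_o = p11 + p22
print(p_o)  # → 0.8
```

This reproduces the 16/20 = 0.80 raw agreement of situation 1, the quantity that the chance correction in the next paragraph starts from.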
Let $p_{1+} = p_{11} + p_{12}$ and $p_{+1} = p_{11} + p_{21}$ be the marginal "yes" probabilities for the first and second rater, where $p_{2+} = 1 - p_{1+}$ and $p_{+2} = 1 - p_{+1}$. Cohen's kappa coefficient is defined as

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o = p_{11} + p_{22}$ is the observed proportion of agreement and $p_e = p_{1+}p_{+1} + p_{2+}p_{+2}$ is the proportion of agreement expected by chance alone. It should be noted that weighted kappa is identical to Cohen's kappa for the data in a 2 × 2 table. Landis and Koch proposed a standard for the strength of agreement based on the kappa coefficient (see Table 2). Another standard for measuring the strength of agreement is given by Martín Andrés and Femia Marzo [4]. Statistical methods for evaluating agreement vary depending on the nature of the variables examined and the number of observers between whom agreement is sought.
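A minimal implementation of the kappa formula above, applied to the situation-1 counts assumed from the text, shows how the 0.80 raw agreement shrinks once chance agreement is removed:

```python
def cohens_kappa(n11: int, n12: int, n21: int, n22: int) -> float:
    """Cohen's kappa from the four cells of a paired 2 x 2 table."""
    n = n11 + n12 + n21 + n22
    p_o = (n11 + n22) / n                                # observed agreement
    p1_plus, p_plus1 = (n11 + n12) / n, (n11 + n21) / n  # "yes" marginals
    p2_plus, p_plus2 = 1 - p1_plus, 1 - p_plus1          # "no" marginals
    p_e = p1_plus * p_plus1 + p2_plus * p_plus2          # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Situation 1: each examiner passes 10/20, so p_e = 0.5 * 0.5 + 0.5 * 0.5
# = 0.5, and kappa = (0.8 - 0.5) / (1 - 0.5) = 0.6 -- noticeably below
# the raw agreement of 0.80 once chance agreement is discounted.
print(round(cohens_kappa(8, 2, 2, 8), 3))  # → 0.6
```

The same function returns 1 for perfect agreement (empty discordant cells) and 0 when the observed agreement equals what chance alone would produce, matching the range described in the text.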