Measuring Item Reliability Part 1 – Item Discrimination Index

Item Discrimination Index Summary

The discrimination index (DI) measures how discriminating the items in an exam are – i.e. how well an item differentiates between stronger candidates and less able ones. For each item, it compares the performance of the stronger and weaker candidates, as ranked by their results in the exam as a whole. The discrimination index for an item ranges from -1 to +1, with values above 0.2 suggesting that an item is positively discriminating.

The discrimination index is calculated by looking at the difference in marks on an item between your higher and lower performing groups.

Calculating the Item Discrimination Index

In order to formally categorise your stronger and weaker candidates, we create an upper and a lower group in an exam, made up of the top 27% and bottom 27% of candidates by exam performance. This gives you the same number of candidates in both groups.

You then subtract the number of candidates in the lower group who got the item correct from the number of candidates in the upper group who got the item correct, and divide by the number of candidates in one group. The result, a number between -1 and 1, is the discrimination index for the item.
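The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular exam software does it: the function name, the argument names and the sample data are hypothetical, and the handling of rounding (27% of the cohort rounded to the nearest whole candidate) and of tied total scores (kept in input order) are assumptions.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper/lower-group discrimination index for one right/wrong item.

    total_scores: each candidate's total mark on the exam.
    item_correct: 1 if that candidate got the item right, otherwise 0.
    """
    n = len(total_scores)
    # Assumption: group size is 27% of the cohort, rounded to the nearest integer.
    group_size = max(1, round(n * fraction))
    # Rank candidates by total score, weakest first (ties keep input order).
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data: ten candidates with total scores 1..10.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
correct = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
print(discrimination_index(totals, correct))
```

With ten candidates each group holds three people, so here all three of the upper group and one of the lower group got the item right, and the index is (3 - 1) / 3.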

Of course, if you have sophisticated exam software such as Maxexam then it will calculate the discrimination index for each item for you.

How discriminating is an item?

If an item was fully discriminating (which never happens in reality!), then everyone in your upper group would get it right and everyone in your lower group would get it wrong – leading to a discrimination index of 1.

However, there is no such thing as the ‘perfect’ item, and the general guidelines are as follows:

< 0 – negative discrimination; usually a bad sign. It could indicate a broken item, e.g. one that was very misleading or even mis-keyed (so the wrong option has been set as the correct one in the system)

0 - 0.2 – not discriminating

0.2 - 0.4 – starting to become discriminating

> 0.4 – strongly discriminating; in practice, it can be difficult to obtain a DI greater than 0.4.
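As a rough sketch, the guideline bands above could be encoded like this. The function name is hypothetical, and the treatment of the boundary values is an assumption: the bands as listed overlap at 0.2, so this sketch puts exactly 0.2 in the lower band (in line with "over 0.2" in the summary) and exactly 0.4 in the middle band.

```python
def classify_di(di):
    """Map a discrimination index onto the guideline bands."""
    if not -1 <= di <= 1:
        raise ValueError("a discrimination index must lie between -1 and 1")
    if di < 0:
        return "negative discrimination"
    if di <= 0.2:  # assumption: exactly 0.2 counts as not discriminating
        return "not discriminating"
    if di <= 0.4:  # assumption: exactly 0.4 stays in the middle band
        return "starting to become discriminating"
    return "strongly discriminating"


for value in (-0.1, 0.1, 0.3, 0.5):
    print(value, "->", classify_di(value))
```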

Should all items be discriminating?

An item that is to be used for ‘ranking’ within an exam (i.e. to help differentiate your good candidates from your bad candidates – see our blog) should ideally have a high discrimination index.

However, an item which is considered ‘essential knowledge’ – i.e. something that every candidate should know – should have a discrimination index close to 0. As an example, it’s important that every student knows to wash their hands with hot and soapy water. It’s even possible that an essential knowledge item may have a very slightly negative discrimination index (e.g. -0.1) without that signalling that there is an issue.

Does the Discrimination Index on its own tell you if an item is working as expected?

The discrimination index shouldn’t be interpreted as a standalone value. As mentioned above, the desirable discrimination index depends on whether an item is easy/essential knowledge or hard/ranking. This means that you need to understand the purpose of the item and look at other statistics, such as the mean of the item.

The mean is the average score on the item, i.e. the percentage of candidates who got it correct, so a mean of 100% means that everyone got it correct. The mean ranges from 0% to 100%.

If an item has a discrimination index of 0, it is not discriminating at all – i.e. the same number of people in the upper group got it correct as in the lower group. Whether that is an issue will depend both on the type of item and on what proportion of candidates overall got it correct (the mean).

The item could be testing essential knowledge – if the mean is high (approaching 100%) then a very low discrimination index is not a problem.

However, if it is a tricky question where only a small percentage of candidates got it correct, then a low discrimination index is a problem: it suggests the weaker candidates answered the item correctly about as often as the stronger ones, so it is not discriminating between them.

A worked example

Discrimination index and interaction with the mean

In this example, item 1 has a mean of 60%, i.e. 60% of all students got it right. It also has a good discrimination index (DI) of 0.4, meaning that this item could be used as a ranking question to help separate the stronger from the weaker candidates.

Item 2 has a low DI of 0.1, and with 90% of all students getting it right, this suggests it could be an essential knowledge question – i.e. something everyone should know whether they are the best candidates or not.

Item 3 has both a low DI of 0.1 and a low mean with only 20% of candidates getting it right – with little difference between the upper and lower groups. This combination suggests either a poor question or poor teaching.
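The reasoning applied to these three items can be written down as a small decision rule. This is only an illustrative sketch: the function name is hypothetical, the 0.2 cut-off comes from the guidelines earlier in this post, and the 80% threshold for a "high" mean is an assumption chosen for the example, not a published standard.

```python
def interpret_item(mean_pct, di, high_mean=80.0, low_di=0.2):
    """Rough reading of an item from its mean (0-100%) and its DI (-1 to 1)."""
    if di > low_di:
        return "discriminating: could serve as a ranking item"
    if mean_pct >= high_mean:
        return "low DI, high mean: consistent with essential knowledge"
    return "low DI without a high mean: review the item or the teaching"


# The three items from the worked example: (mean %, DI).
items = {"Item 1": (60, 0.4), "Item 2": (90, 0.1), "Item 3": (20, 0.1)}
for name, (mean_pct, di) in items.items():
    print(name, "->", interpret_item(mean_pct, di))
```

A rule this simple would also wave through a strongly negative DI on an easy item, so in practice negative values still deserve a manual look.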

Until you have looked at the mean, it is hard to draw any conclusions from the discrimination index on its own.

Does the Discrimination Index only work for the correct answer?

Whilst the discrimination index is usually calculated for the correct answer, it can also be calculated for the ‘distractors’ – i.e. those options that aren’t the correct response to the item. Ideally, these should all have a negative discrimination index – if one is positive instead, that means more of your strong candidates are picking it than your weak candidates. This is a great way to identify ‘broken’ distractors in an item – e.g. where another answer could be considered to be correct too.
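To make the distractor check concrete, here is a minimal sketch that computes the index for every option of a multiple-choice item by counting who selected each option in the upper and lower groups. The function name, the option labels and the sample data are hypothetical, and the grouping uses the same assumptions as before (27% rounded to the nearest whole candidate, ties kept in input order).

```python
def option_discrimination(total_scores, chosen_options, fraction=0.27):
    """Discrimination index for each option chosen on one item."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))  # assumption: round to nearest
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    result = {}
    for option in sorted(set(chosen_options)):
        upper_count = sum(chosen_options[i] == option for i in upper)
        lower_count = sum(chosen_options[i] == option for i in lower)
        result[option] = (upper_count - lower_count) / group_size
    return result


# Hypothetical item keyed "A"; ten candidates with total scores 1..10.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chosen = ["B", "B", "A", "C", "A", "D", "A", "A", "C", "A"]
key = "A"
per_option = option_discrimination(totals, chosen)
# Flag distractors that attract more strong candidates than weak ones.
suspect = [opt for opt, di in per_option.items() if opt != key and di > 0]
print(per_option)
print("distractors to review:", suspect)
```

In this made-up data, option C is picked by one upper-group candidate and no lower-group candidates, so it is the one flagged for review.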

Does the discrimination index only work for dichotomous items?

If the item offers half marks on some options (i.e. a candidate can get it neither fully right nor fully wrong), then an alternative formula is used. The DI is then the mean mark of the lower group subtracted from the mean mark of the upper group, divided by the maximum mark.
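A minimal sketch of this alternative formula follows. The names and data are hypothetical, and the grouping assumptions are the same as for the right/wrong case (27% rounded to the nearest whole candidate, ties kept in input order).

```python
def discrimination_index_marks(total_scores, item_marks, max_mark, fraction=0.27):
    """DI from marks: (upper-group mean - lower-group mean) / maximum mark."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))  # assumption: round to nearest
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    upper_mean = sum(item_marks[i] for i in upper) / group_size
    lower_mean = sum(item_marks[i] for i in lower) / group_size
    return (upper_mean - lower_mean) / max_mark


# Hypothetical item worth 1 mark, with half marks on some options.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
marks = [0, 0.5, 0, 0.5, 1, 0.5, 1, 1, 0.5, 1]
print(discrimination_index_marks(totals, marks, max_mark=1))
```

When every mark is 0 or 1 and the maximum mark is 1, this gives the same value as the count-based calculation, so the two formulas are consistent.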

Using this formula, it is also possible to calculate the DI at Question, Scenario and Exam levels, meaning the DI is very flexible. Using Maxexam, this can help you to determine, for example, if an entire exam is meeting your objectives for distinguishing between higher and lower performing students.

Is the discrimination index always the best way of measuring the performance of your items?

It can be argued that, because the discrimination index only looks at the top and bottom 27% of candidates, it uses only 54% of the data, so alternative methods should also be considered. The discrimination index is a very powerful and meaningful way of measuring the performance of your items, but if your exams only have small numbers of candidates, discarding the other 46% of the data can have quite an impact. Other measures, like Pearson’s product-moment correlation coefficient, use 100% of the data, and a broader picture may be built up by looking at these calculations in parallel.
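For comparison, here is a hedged sketch of a Pearson correlation that uses every candidate. It assumes the correlation is taken between each candidate's mark on the item and their total exam score, which is one common way to apply it but is an assumption here; the function name and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: all ten candidates contribute, not just the top and bottom 27%.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
correct = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
print(pearson(correct, totals))
```

Like the discrimination index, the result lies between -1 and 1, but the four middle candidates now influence the value as well.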

In summary

The discrimination index is one widely used way of helping you to understand how the items in your exam are performing. Other methods (which can be used alongside the discrimination index) include Pearson’s product-moment correlation coefficient (PPC), Horst, and Cronbach’s Alpha, and we will take a deeper look at these over the coming few months. Some also suggest the use of Point Biserial, and we will address the pros and cons of this methodology in our next blog.