Are you passing and failing the right candidates? Test reliability explained.

Awarding organisations have a huge responsibility to ensure that they deliver reliable examinations. Nowhere is this more important than in the healthcare sector, where the result, a pass or a fail, determines whether a student is safe to practise on a real patient.

When you, or a family member, visit a hospital you should be able to trust healthcare professionals and be confident that a reliable awarding body has certified them.

Measuring reliability is a key part of knowing whether you can trust the results of a test. Here’s how you can measure and improve reliability to help you make the right decisions.

What is reliability?

Reliability is a measure of consistency. For example, for weighing scales to be reliable, you would expect that if you weighed 80kg ten minutes ago you would still weigh the same now. If the scales said you weighed 40kg now, you wouldn’t rely on those scales.

In the same way, a highly reliable exam should produce the same result if the candidate takes the same test on two different occasions. If it produces very different results on separate occasions, the exam has low reliability and therefore cannot be trusted as a means of grading or judging whether a candidate is competent.

 

How can you tell if a test is reliable?

In most cases we can’t get students to take the same test twice, so internal consistency is measured instead. Internal consistency reliability is the degree to which all the items in a test measure the same construct.

Cronbach’s alpha is a statistical measure of internal consistency. Luckily, examination software can do all the number crunching for you and give you a value that normally falls between 0 and 1. The closer the value is to 1, the higher the reliability. As a rough guide, exams should have a Cronbach’s alpha of 0.8 or greater. With high stakes exams, such as licensing exams, the reliability needs to be high because a single exam is usually used to decide whether a student passes or fails.
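If you are curious about what the software is doing behind the scenes, the standard textbook formula for Cronbach’s alpha can be sketched in a few lines of Python. This is only an illustration, not how any particular product calculates it. The function name and the small score table are made up for the example, and it assumes every candidate has a score for every item.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: one row per candidate, one column per exam item
    (for example 1 = correct, 0 = incorrect).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two candidates")

    # Variance of each item (column) across candidates
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of each candidate's total score
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("all candidates have the same total score")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical results: 5 candidates, 4 items
    example_scores = [
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 0],
    ]
    print(round(cronbach_alpha(example_scores), 2))
```

The idea is that when items measure the same construct, candidates who do well on one item tend to do well on the others, so the variance of the total scores is large compared with the sum of the individual item variances and alpha moves towards 1.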

 

What can cause unreliability?

There are many factors that can affect reliability, such as:

Student performance

  • How much sleep the candidate got the night before the examination
  • The environment in which the examination was taken

Markers

  • One marker might be more or less lenient from one question to the next. This inconsistency can be reduced by using automated computer marking.

Exam paper version

  • Questions can vary between different exam papers. It is not usually possible to test every part of a curriculum, so questions are picked from a selection of topics.

All these factors have to be taken into account, which is why every stage of the assessment lifecycle, including choosing the type of questions, the exam date and the markers, has to be carefully executed.

 

Can you improve reliability?

There are two ways of potentially increasing the reliability of an exam.

Increasing the number of exam items

If you increase the length of the assessment by adding more exam items, you are testing a wider range of knowledge or skills and are therefore in a better position to make a judgement because you have more information. However, a balance needs to be struck between increasing reliability and ending up with an overly lengthy test. The costs and time associated with creating additional items also have to be taken into consideration.
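To get a feel for how much reliability extra items might buy you, the Spearman-Brown prediction formula is commonly used. It is not something this article relies on elsewhere, and it assumes the added items are of similar quality to the ones you already have, so treat the output as a rough estimate. The function names and figures below are hypothetical.

```python
def predicted_reliability(current, length_factor):
    """Spearman-Brown estimate of reliability after changing test length.

    current: reliability of the existing test (between 0 and 1)
    length_factor: new length divided by old length (2 = twice as long)
    """
    return (length_factor * current) / (1 + (length_factor - 1) * current)


def length_factor_needed(current, target):
    """How many times longer the test must be to reach the target reliability."""
    if not (0 < current < 1 and 0 < target < 1):
        raise ValueError("reliabilities must be strictly between 0 and 1")
    return (target * (1 - current)) / (current * (1 - target))


if __name__ == "__main__":
    # Hypothetical 60-item exam with a reliability of 0.70
    factor = length_factor_needed(0.70, 0.80)
    print(f"length factor: {factor:.2f}")
    print(f"items needed: {round(60 * factor)}")
    print(f"check: {predicted_reliability(0.70, factor):.2f}")
```

In this made-up case the exam would need to be about 1.7 times as long, roughly 103 items instead of 60, to move from 0.70 to 0.80. That is the kind of trade-off between test length and reliability described above.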

Increasing the item quality

Higher quality items are likely to improve the overall reliability of a test, while lower quality items will bring it down. If you’re now wondering how you measure the quality of an item, read our blog on ‘The essential guide to item analysis’. Examination software like Maxexam helps you to identify poorly performing questions so that you can either improve them or retire them.
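One common way of spotting an item that drags reliability down is to recalculate Cronbach’s alpha with each item left out in turn. The sketch below shows the general idea only. It is not a description of how Maxexam or any other product works, and the function names and scores are invented for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per candidate, one column per item."""
    n_items = len(scores[0])
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(scores):
    """Alpha recalculated with each item removed in turn (needs 3+ items)."""
    n_items = len(scores[0])
    results = []
    for drop in range(n_items):
        # Rebuild the score table without the dropped item's column
        reduced = [
            [value for i, value in enumerate(row) if i != drop]
            for row in scores
        ]
        results.append(cronbach_alpha(reduced))
    return results


if __name__ == "__main__":
    # Hypothetical results: the last item does not follow the pattern of the others
    example_scores = [
        [1, 1, 1, 0],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
    ]
    overall = cronbach_alpha(example_scores)
    for item, alpha in enumerate(alpha_if_item_deleted(example_scores), start=1):
        flag = "  <- review this item" if alpha > overall else ""
        print(f"without item {item}: alpha = {alpha:.2f}{flag}")
```

If alpha goes up when an item is removed, that item is a candidate for rewriting or retiring. With only a handful of candidates the figures are very unstable, so real decisions should be based on a full cohort.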

 

Exams need to be reliable and valid

A test can be reliable without being valid, i.e. it produces consistent results but doesn’t measure what it is supposed to measure. Validity is therefore also an important consideration when administering examinations. Look out for our future blog on measuring validity to make sure you are carrying out the right checks to maximise both the reliability and validity of your assessments.