Some variables are straightforward to measure without error – blood pressure, number of arrests, whether someone knew a word in a second language.
But many – perhaps most – are not. Whenever a measurement has a potential for error, a key criterion for the soundness of that measurement is reliability.
Think of reliability as consistency or repeatability in measurements.
Not only do you want your measurements to be accurate (i.e., valid), you want to get the same answer every time you use an instrument to measure a variable.
That instrument could be a scale, test, diagnostic tool as reliability applies to a wide range of devices and situations.
So, why do we care? Why make such a big deal about reliability?
Well, researchers would have a very hard time testing hypotheses and comparing data across groups or studies if each time we measured the same variable on the same individual we got different answers. This makes reliability very important for both social sciences and physical sciences.
Think about it: A basic tenet of science is replication, so without reliability, how can we be sure a study wasn’t replicated solely due to measurement error?
For example, say we were testing a new antidepressant drug on symptoms of depression, with the outcome assessed via a series of questions that measure depression.
We would want the scale to be a reliable measure of depressive symptoms.
This reliability takes several forms. Here are a few examples.
We want to make sure that two different researchers who measure the same person for depression get the same depression score. If there is some judgment being made by the researchers, then we need to assess the reliability of scores across researchers.
Likewise, even if there is no judgment involved, we want to make sure that the questions on the instrument are precise enough that they’re not open to misinterpretation or the current mood of the participant. If a participant took the same depression test a week apart, would they get the same score?
Another potential form of reliability is the consistency across items on the scale. If every item on the scale really measures the same construct, then the responses should be similar to all items. If they’re not, then these items are not a reliable measure of the construct.