



Test-Retest Reliability

What type of reliability is measured by administering the same test to the same group of respondents twice?
Test reliability is measured with a test-retest correlation. Test-retest reliability (sometimes called retest reliability) measures test consistency: the reliability of a test measured over time. In other words, give the same test twice to the same people at different times to see whether the scores are the same. For example, test on a Monday, then again the following Monday. The two sets of scores are then correlated.

Bias is a known problem with this type of reliability test, due to:

  • Feedback between tests,
  • Participants gaining knowledge about the purpose of the test, so they are more prepared the second time around.

This reliability test can also take a long time to complete: depending upon the interval between the two administrations, it could be months or even years before correlations can be calculated.



Calculating Test-Retest Reliability Coefficients

Computing a correlation coefficient between the two sets of scores is the most common way to quantify the agreement between the two tests. Test-retest reliability coefficients (also called coefficients of stability) vary between 0 and 1, where:

  • 1: perfect reliability,
  • 0.9 ≤ r < 1: excellent reliability,
  • 0.8 ≤ r < 0.9: good reliability,
  • 0.7 ≤ r < 0.8: acceptable reliability,
  • 0.6 ≤ r < 0.7: questionable reliability,
  • 0.5 ≤ r < 0.6: poor reliability,
  • r < 0.5: unacceptable reliability,
  • 0: no reliability.

On this scale, a correlation of 0.9 (90%) would indicate a very high correlation (good reliability), while a value of 0.1 (10%) would indicate a very low one (poor reliability).

  • To measure reliability between two tests, use the Pearson correlation coefficient. One disadvantage: it tends to overestimate the true relationship for small samples (under 15).
  • If you have more than two tests, use the intraclass correlation coefficient. It can also be used for two tests, and has the advantage that it doesn’t overestimate relationships for small samples. However, it is more challenging to calculate than Pearson’s.
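As an illustrative sketch (the respondent names and scores below are hypothetical), both coefficients can be computed with NumPy: the Pearson correlation via `np.corrcoef`, and an intraclass correlation, here ICC(3,1), from the two-way ANOVA mean squares:

```python
import numpy as np

def pearson_retest(time1, time2):
    """Pearson correlation between two administrations of the same test."""
    return np.corrcoef(time1, time2)[0, 1]

def icc_consistency(scores):
    """ICC(3,1): two-way mixed-effects, consistency, single-measure
    intraclass correlation. `scores` is an (n_subjects, k_tests) array."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()  # between tests
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical scores for 8 respondents tested one week apart
monday = np.array([85, 90, 78, 92, 88, 76, 95, 81])
next_monday = np.array([83, 91, 80, 94, 86, 78, 93, 80])

r = pearson_retest(monday, next_monday)
icc = icc_consistency(np.column_stack([monday, next_monday]))
print(f"Pearson r = {r:.2f}, ICC(3,1) = {icc:.2f}")
```

With these made-up scores both values land near the top of the scale above, consistent with two administrations that agree closely.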



---------------------------------------------------------------------------


By Indeed Editorial Team

Published June 15, 2021

Researchers are vital to many industries, as they help companies and organizations make advancements and appeal to customers. To conduct accurate research, these employees often use assessments to determine if their research methods are getting reliable results. You may be interested in learning how to test for reliability to help you succeed in your role as a researcher. In this article, we define the four types of research reliability assessments, discuss how to test for reliability in research and examine tips to help you get the best results.


What is research reliability?

Research reliability refers to whether research methods can reproduce the same results multiple times. If your research methods can produce consistent results, then the methods are likely reliable and not influenced by external factors. This valuable information can help you determine if your research methods are accurately gathering data you can use to support studies, reviews and experiments in your field.

How do you determine reliability in research?

To determine if your research methods are producing reliable results, you must perform the same task multiple times or in multiple ways. Typically, this involves changing some aspect of the research assessment while maintaining control of the research. For example, this could mean using the same test on different groups of people or using different tests on the same group of people. Both methods maintain control by keeping one element exactly the same and changing other elements to ensure other factors don't influence the research results.


Jobs that use reliability assessments for research

Jobs in many fields use researchers to find information and analyze data that can improve outcomes for a company, make better products for customers or increase public wellness. Most research jobs use some form of reliability testing to ensure their data is reliable and useful for their employers' purposes. Here are some careers that often test for reliability in data:

  • Media sociologists

  • Food scientists

  • Forensic science technicians

  • Marketing analysts

  • Medical scientists

  • Economists

  • Policy analysts

  • Behavioral scientists

  • Business analysts


4 Types of reliability in research

Depending on the type of research you're doing, you can choose between a few reliability assessments. The most common ways to check for reliability in research are:

1. Test-retest reliability

The test-retest reliability method in research involves giving a group of people the same test more than once over a set period of time. In this assessment, the research method and sample group stay the same, but the timing of administration changes. If the results of the test are similar each time you give it to the sample group, that shows your research method is likely reliable and not influenced by external factors, like the sample group's mood or the day of the week.

Example: Give a group of college students a survey about their satisfaction with their school's parking lots on Monday and again on Friday, then compare the results to check the test-retest reliability.

2. Parallel forms reliability

When using parallel forms reliability to assess your research, you give the same group of people several different types of tests to determine whether the results stay the same across research methods. The theory behind this assessment is that consistent results across methods indicate each method is measuring the same information from the group and the group is behaving similarly for each test. If the methods weren't reliable, the participants in the sample group might behave differently and change the results.

Example: In marketing, you may interview customers about a new product, observe them using the product and give them a survey about how easy the product is to use and compare these results as a parallel forms reliability test.
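A minimal sketch of how parallel forms might be compared numerically, assuming three hypothetical satisfaction measures (interview, observation and survey ratings) collected from the same six customers; high off-diagonal correlations suggest the forms agree:

```python
import numpy as np

# Hypothetical 0-10 satisfaction ratings from three parallel methods,
# all gathered from the same six customers
interview = np.array([8, 6, 9, 4, 7, 5])
observation = np.array([7, 6, 9, 5, 8, 5])
survey = np.array([8, 5, 9, 4, 7, 6])

# Pairwise Pearson correlations between every pair of methods:
# the off-diagonal entries measure how well each pair of forms agrees
corr = np.corrcoef([interview, observation, survey])
print(np.round(corr, 2))
```

If any one method correlated poorly with the other two, that would flag it as measuring something different from the rest.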

3. Inter-rater reliability

With inter-rater reliability testing, you have multiple people perform assessments on a sample group and then compare their results, which controls for influencing factors like an assessor's personal bias, mood or human error. If most of the results from different assessors are similar, it's likely the research method is reliable and can produce usable research because the assessors gathered the same data from the group. This is useful for research methods like observations, interviews and surveys, where each assessor may have different criteria but can still end up with similar research results.

Example: Multiple behavioral specialists may observe a group of children playing to determine their social and emotional development and then compare notes to check for inter-rater reliability.
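One common way to quantify inter-rater agreement for categorical judgments is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. A sketch with hypothetical labels from two specialists observing ten children:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels:
    observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical play-style labels from two behavioral specialists
rater_a = ["social", "solitary", "social", "social", "solitary",
           "social", "solitary", "social", "social", "solitary"]
rater_b = ["social", "solitary", "social", "solitary", "solitary",
           "social", "solitary", "social", "social", "social"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # → 0.58
```

Here the raters agree on 8 of 10 children, but because chance alone would produce substantial agreement, kappa is noticeably lower than the raw 80% agreement rate.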

4. Internal consistency reliability

Internal consistency reliability generally involves making sure your research methods, or parts of them, deliver the same results, and there are two typical ways to check for it. The first is split-half reliability: split a research method, like a survey or test, in half, deliver both halves separately to a sample group and compare the results. If the two halves produce consistent results, the method is likely reliable.

The other internal consistency test checks for average inter-item reliability. With this assessment, you administer multiple test items to a sample group, as with parallel forms reliability testing, and calculate the correlation between the results for each pair of items. You then average these correlations and use that number to determine whether the results are reliable.

Example: You may give a company's cleaning department a questionnaire about which cleaning products work the best, but you split it in half and give each half to the department separately and calculate the correlation to test for split-half reliability.

Later, you interview the members of the cleaning department, then bring them into small focus groups and observe them at work to determine which cleaning products get the most use and which ones people like best. You calculate the correlations between these answers and average the results to find the average inter-item reliability.
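The two internal consistency checks above can be sketched in code. This is an illustrative example only: the questionnaire data are hypothetical, the split-half calculation uses an odd/even item split with the Spearman-Brown correction (a standard way to step a half-length correlation up to full-test reliability), and the average inter-item idea is summarized with Cronbach's alpha, a widely used coefficient for this purpose:

```python
import numpy as np

def split_half_reliability(items):
    """Split-half reliability with the Spearman-Brown correction.
    `items` is an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    half1 = items[:, 0::2].sum(axis=1)  # odd-numbered items
    half2 = items[:, 1::2].sum(axis=1)  # even-numbered items
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)  # Spearman-Brown: full-length reliability

def cronbach_alpha(items):
    """Cronbach's alpha, a common summary of inter-item consistency."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 1-5 ratings from five cleaners on a four-item questionnaire
ratings = np.array([[4, 5, 4, 4],
                    [2, 2, 3, 2],
                    [5, 4, 5, 5],
                    [3, 3, 2, 3],
                    [4, 4, 4, 5]])

sb = split_half_reliability(ratings)
alpha = cronbach_alpha(ratings)
print(round(sb, 2), round(alpha, 2))
```

Both coefficients land above 0.9 for these made-up ratings, suggesting the four items are measuring the same underlying attitude consistently.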


Tips for testing the reliability of research methods

As you do research and review the results, consider the following tips for testing the reliability of your research methods and ensuring you have consistency in your work:

  • Plan ahead. Planning is a key step for most scientific experiments, so try to plan your research methods and studies in advance to ensure you and your team are prepared. You may plan for a space to conduct research with your sample group, how you can distribute testing materials or criteria and how to describe the purpose of the research to participants.

  • Make note of the environment. If you're conducting research with the same sample group multiple times, it's often a good idea to make a note of what the environment is like when the group undergoes testing. This is because certain factors, like whether it's raining, the room is cold or someone is coughing, can influence the group's willingness to participate fully.

  • Consider the participants. As you create your research materials, remember to consider how your sample group might respond to and understand the materials you present to them. For example, a group of children may need a simple set of survey questions read to them, while a group of adults could likely read more complicated questions themselves.

  • Review results thoroughly. While comparing the results of your research, review the results thoroughly to make sure you catch any errors and accurately determine the reliability of the results. You may even ask a colleague to examine the results with you and offer their opinion on the reliability of the data gathered by your research methods.

  • Think about the type of research. Different types of research may benefit from certain reliability tests more than others because each field of research measures different things. In marketing, you may regularly use various focus groups to determine a product's appeal, while a sociologist might observe behavior and compare notes regularly to get various professional opinions on a matter.