Sunday, August 9, 2020

Here's What You Need to Know About Reliability and Validity

Outside of the world of research, reliability and validity are often used interchangeably. Because of this colloquial use, the true meaning of these words has become clouded. This article explains the difference between them from a statistical perspective and discusses the types of reliability and validity, as well as how the two constructs interact. We will start with a list of definitions, first defining reliability and validity as umbrella terms and then breaking down the subtypes under each.

The major consideration with regard to reliability versus validity is that reliability relates only to how consistent a particular metric is; it does not consider the accuracy of the measure. Accuracy is the domain of validity. For example, an uncalibrated piece of equipment may consistently give the same results while testing a sample, and can therefore be considered reliable. It will not give accurate results, however, so the results are not valid. It would be as if you set your bathroom scale to show that you are twenty pounds lighter than you actually are. It would reliably give you roughly this weight every day, but it would not be accurate, and is therefore not valid.

(Photo by i yunmai on Unsplash.)

Definitions for various types of reliability

In order to gain a deeper understanding of these fundamental concepts, it is important to discuss a few of the different types of reliability commonly considered across numerous fields of research.
These constructs include the following subtypes:

Reliability: The consistency of a metric.
Consistency: As discussed above, this is the core of reliability. A consistent measure will provide the same results no matter how many times you run a sample.
Internal Consistency (Homogeneity): This is often tested with the split-half method, in which the results of a test are split in half and the two halves are compared to confirm that they agree. Other common tests are the Kuder-Richardson test, a more complex version of the split-half test, and Cronbach's alpha.
Stability: Stability commonly refers to test-retest reliability, that is, the repeatability of the test. It is generally assessed with a correlation coefficient between the two sets of results: a coefficient below 0.3 is weak, 0.3 to 0.5 is moderate, and above 0.5 is strong, indicating a more stable measure. Pearson's r is a common statistic for this purpose.
Equivalence: This is assessed using inter-rater reliability, which is another common term for this metric. Inter-rater reliability is achieved when the results are consistent even if a different person is doing the assessment or running the sample.

Further information on these topics can be found in the Research Made Simple article in Evidence Based Nursing by Heale and Twycross (2015). Additionally, a common example of test-retest reliability provided in statistics classes, and discussed by Pagano (2010), is the IQ test. If one assumes that a person's IQ is stable over time, this test is a relatable example of test-retest reliability: no matter how many times you take the test, the score will be approximately the same. The example also works for inter-rater reliability: whether the test is administered by two different people or in a computerized version, it will still provide reliable results.
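
To make two of the checks above concrete, here is a minimal Python sketch of a test-retest correlation (Pearson's r, labeled with the cut-offs quoted in this article) and of Cronbach's alpha. The scores are invented purely for illustration, and the function names (pearson_r, strength_label, cronbach_alpha) are my own rather than part of any library.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def strength_label(r):
    """Label |r| using the cut-offs quoted in this article."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per test item."""
    k = len(items)
    item_variances = sum(statistics.variance(scores) for scores in items)
    # Total score per person: sum across items for each respondent.
    totals = [sum(person) for person in zip(*items)]
    return (k / (k - 1)) * (1 - item_variances / statistics.variance(totals))


# Hypothetical data: five people take the same test twice.
first_sitting = [98, 105, 112, 120, 131]
second_sitting = [101, 103, 115, 118, 129]
r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {r:.3f} ({strength_label(r)})")

# Hypothetical data: three questionnaire items answered by five people.
item_scores = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
print(f"Cronbach's alpha = {cronbach_alpha(item_scores):.3f}")
```

In practice you would more likely reach for a statistics package than hand-rolled functions; the point here is only to show what the numbers are built from.
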
The IQ test may generate the same score for a participant consistently; however, this does not address the validity of the test.

Reliability is also sometimes used as a synonym for statistical significance, which occurs when one is able to reject the null hypothesis. The null hypothesis is essentially the assertion that there is no difference between the two (or more) populations being examined. In responsible research, scientists do not try to prove their idea; they try to see if they can disprove it, so they check whether or not they can reject the null hypothesis. Rejecting the null hypothesis means that results like these would be unlikely to arise by chance alone if there were no real difference, with a probability generally below 0.05 (5%). As Pagano (2010) says, "It might have been better to use the term reliable to convey this meaning rather than significant. However, the usage of significant is well established, so we will have to live with it."

Definitions for various types of validity

The definitions you will need surrounding the concept of validity are listed below.

Validity: Accurate measurement.
Content Validity: Whether the metric in question covers all of the aspects that need to be considered for a given variable in order to assess it accurately.
Face Validity: A subset of content validity in which experts in the field assess whether or not a particular instrument is capable of accurately measuring a particular variable.
Construct Validity: Whether the test scores allow you to make inferences and predictions about the construct being measured.
Homogeneity: The metric reflects only one theory; more specifically, the experimental sample's scores have the same finite variance (the statistical properties are the same across the data set).
Convergence: The instrument produces results similar to those of established metrics that assess the concept in question.
Theory Evidence: The test results are representative of observable evidence, for example if the IQ test provides a high score for an individual and they actually have a high degree of general intelligence.
Criterion Validity: The instrument used to assess the construct in question correlates highly, greater than 0.5, with other modes of measurement for similar variables.
Convergent Validity: The demonstration that a particular instrument correlates greater than 0.5 with other instruments that measure a similar variable.
Divergent Validity: The demonstration that there is a correlation of less than or equal to 0.3 between instruments intended to measure different variables.
Predictive Validity: The ability of an instrument to forecast future outcomes related to the variable in question.

Additional consideration should be given to the following types of validity, as described in Research Design and Statistical Analysis, a rather daunting and heavy text by Myers, Well, and Lorch (2010):

Internal Validity: The observations made using a particular measure can be attributed to the variable being manipulated, also known as the independent variable.
External Validity: The degree to which the observations made can be generalized to other populations of interest or related conditions.

Interactions between reliability and validity

As illustrated in the dartboard diagram used by many sources, there are interactions between reliability and validity. On the first dartboard, you can see a pictographic demonstration of data that is reliable but not valid. The player consistently hits roughly the same spot, but is never on target, and is therefore not accurate. In the second example, the player always hits the board, so the shots are arguably accurate, although the margin of error is rather high and you cannot rely on consistency. The third graphic demonstrates data that is neither reliable nor accurate: the shots hit only part of the target and are not evenly distributed around the bull's-eye, which symbolizes the variable that is supposed to be under scrutiny.
The fourth board is the ideal that one strives for in science: not only is the data consistently showing similar values, but it is also accurately assessing the experimental variable of interest, the bull's-eye.

(Diagram: interactions between reliability and validity. Provided by ResearchGate.)

Summary of key points

Reliability = Consistency ≈ Statistical Significance
Validity = Accuracy
Reliability + Validity = Credible Experimental Results

Final thoughts

Although statistical analysis can be daunting when you are first introduced to it, a solid foundational understanding of the jargon specific to the field will reduce the likelihood of confusion as you move into more advanced topics, apply statistics to your own data, or try to discuss statistical results with others. I encourage you to look deeper into the specific statistical analyses that are commonly used in your field to facilitate your understanding of these concepts as they relate to your life. Initially, these topics may be confusing or dry, but once you become familiar with them they will prove to be excellent tools to have in your proverbial belt. Additionally, a basic understanding of research and statistics will protect you from the charlatans of the world who try to misguide others with fancy words and flawed data. As American astrophysicist, author, and science communicator Neil deGrasse Tyson once said, "Science literacy is a vaccine against the charlatans of the world that would exploit your ignorance."
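
As a closing aside, the four dartboards (and the miscalibrated bathroom scale from the introduction) can be put into numbers. The sketch below uses invented readings from four imaginary scales and a made-up true weight of 150 pounds. It treats the average error (bias) as a rough stand-in for validity and the standard deviation (spread) as a rough stand-in for reliability. That is a simplification for illustration, not a formal test of either property.

```python
import statistics

TRUE_WEIGHT = 150.0  # hypothetical true value, in pounds

# Hypothetical repeated readings from four imaginary scales,
# one per dartboard in the diagram.
readings = {
    "reliable, not valid": [130.1, 129.8, 130.2, 129.9, 130.0],
    "on target on average, not reliable": [138.0, 161.0, 149.0, 157.0, 145.0],
    "neither reliable nor valid": [120.0, 141.0, 128.0, 150.0, 133.0],
    "reliable and valid": [150.1, 149.9, 150.0, 150.2, 149.8],
}


def summarize(values, truth):
    """Return (bias, spread): mean error and sample standard deviation."""
    bias = statistics.fmean(values) - truth
    spread = statistics.stdev(values)
    return bias, spread


for name, values in readings.items():
    bias, spread = summarize(values, TRUE_WEIGHT)
    print(f"{name:36s} bias = {bias:+6.1f} lb, spread = {spread:5.1f} lb")
```

A small spread with a large bias is the first dartboard; a small bias with a small spread is the fourth.
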
