By: Victor V. Wiesner, Ph.D., LPC-S, NCC, CCMHC
In classical test theory, item analysis traditionally depends on two concepts: item difficulty and item discrimination. Item difficulty is the proportion (expressed as a decimal) of test takers who respond correctly to a test item, either by getting the answer right or by endorsing the trait or characteristic under examination. It is reported as a p-value ranging from 0 to 1.00 and is calculated by dividing the number of persons who answered the item correctly by the total number of test takers. Higher values mean the item is easier. Although item difficulty levels are called p-values, they should not be confused with the p-values used to report levels of statistical significance.
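The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `item_difficulty` and the response data are made up for the example, and responses are assumed to be coded 1 for a correct (or endorsing) answer and 0 otherwise.

```python
def item_difficulty(responses):
    """Return the proportion of test takers who answered the item correctly.

    `responses` is a list with one entry per test taker:
    1 = correct (or endorsed the trait), 0 = incorrect (or did not endorse).
    """
    if not responses:
        raise ValueError("at least one response is required")
    number_correct = sum(1 for r in responses if r == 1)
    return number_correct / len(responses)


# Hypothetical responses from ten test takers to two items.
item_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 8 of 10 correct
item_b = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 of 10 correct

p_a = item_difficulty(item_a)
p_b = item_difficulty(item_b)

print(f"Item A p-value: {p_a:.2f}")
print(f"Item B p-value: {p_b:.2f}")
```

In this made-up data set, Item A has the higher p-value, so it is the easier of the two items.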
For psychological assessment instruments that measure constructs, the “correct” answer is simply an answer that endorses the construct. For example, on an instrument measuring depression, a reply that positively signifies a depressive symptom would be a correct answer. An item that asked, “Do you find yourself often discouraged?” would have a higher p-value (more respondents would “pass” it, making it the easier item) than the question, “Do you frequently have thoughts about suicide?”