class: center, middle, inverse, title-slide # Better Measurement with Item Response Theory ### Ben Stenhaug, Stanford University ### December 17, 2019 --- # Organization These slides, the code that produced them, and the option of opening that code in an RStudio cloud environment are available at **tinyurl.com/irt-basics**, which redirects to **stenhaug.github.io/irt-basics** --- # The power of Item Response Theory (IRT) In a world with more and more big, naturally occurring data, IRT offers a few promises: -- 1. Understand and leverage item variability -- 2. More precise measures of latent constructs -- 3. More information with fewer data points --- # Wordbank example Wordbank (wordbank.stanford.edu) provides open-source data from over 80k MacArthur-Bates Communicative Development Inventory (MB-CDI) administrations. <img src="images/table.png" width="600" /> --- # Warm up: Answer with a partner 1. Who is the highest ability person? Who is the lowest ability person? 2. Which item is the hardest? Which is the easiest? 3. Which item is the best? Which is the worst? 4. Who has the higher ability: person D or person I? 5. Estimate the probability of person G getting item 2 correct. 
<small> <table> <thead> <tr> <th style="text-align:left;"> person </th> <th style="text-align:right;"> item 1 </th> <th style="text-align:right;"> item 2 </th> <th style="text-align:right;"> item 3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> C </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> D </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> E </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> F </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> G </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> H </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> I </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> J </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table> </small> --- # What is measurement? -- 1. 
You're interested in a latent construct (math ability, extroversion, anxiety, etc.) -- 2. You measure that latent construct by giving people a set of items (which we'll call a test) -- 3. You do some science with that measurement --- # Relevant questions -- 1. Is this a good test? Are some items better than others? -- 2. Does this test measure the latent construct I care about? -- 3. Is this test fair? -- 4. How do we get from responses to the items to a measure of the latent trait? --- # How do I get from responses to the latent trait? <small> <table> <thead> <tr> <th style="text-align:left;"> child </th> <th style="text-align:right;"> mommy </th> <th style="text-align:right;"> yesterday </th> <th style="text-align:right;"> trash </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> C </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> D </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> E </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> F </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> G </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> H </td> <td style="text-align:right;"> 1 
</td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> I </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> J </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table> </small> --- # The sum score -- 1. What assumptions does it make? 2. What are its limitations? --- # The sum score ## Assumptions 1. Items are equally difficult -- 2. Items are equally related to the latent construct -- 3. A response of 1 on every item is positively related to the construct -- ## Limitations -- 1. How do I handle missing data? -- 2. How do I make predictions? -- 3. How do I make an adaptive test? --- # Item Response Theory (IRT) to the rescue! A parametric framework for item response data -- Each person `\(p\)` has an ability `\(\theta_p\)` -- Each item `\(i\)` has an easiness `\(b_i\)` -- These combine to give the probability of a correct response --- # The logistic function We use the logistic function `\(\sigma(x) = \dfrac{\exp(x)}{1 + \exp(x)}\)` to map the sum of ability and easiness to the probability of a correct response -- <img src="irt-basics_files/figure-html/unnamed-chunk-4-1.png" width="400" /> --- # Looking at easiness <img src="irt-basics_files/figure-html/unnamed-chunk-5-1.png" width="500" /> --- # Question: Probability of responses 1. Calculate P(correct, correct, incorrect | ability = 0) 2. Calculate P(correct, correct, incorrect | ability = 1) <img src="irt-basics_files/figure-html/unnamed-chunk-6-1.png" width="400" /> --- # Answer: Probability of responses ```r logistic <- function(x) {exp(x) / (1 + exp(x))} ``` -- 1. Calculate P(correct, correct, incorrect | ability = 0) ```r logistic(2 + 0) * logistic(0 + 0) * (1 - logistic(-2 + 0)) ``` ``` ## [1] 0.3879017 ``` -- 2. 
Calculate P(correct, correct, incorrect | ability = 1) ```r logistic(2 + 1) * logistic(0 + 1) * (1 - logistic(-2 + 1)) ``` ``` ## [1] 0.5091 ``` --- # Who uses IRT? -- Basically any measurement that happens in education: - The Programme for International Student Assessment (PISA) -- - State tests -- - GRE -- - Department of Motor Vehicles -- Very common in other fields as well: - Psychology - Health - Economics --- # IRT in practice We'll show the power of IRT with the Wordbank data (wordbank.stanford.edu) <table> <thead> <tr> <th style="text-align:left;"> sex </th> <th style="text-align:right;"> age </th> <th style="text-align:right;"> yum yum </th> <th style="text-align:right;"> bee </th> <th style="text-align:right;"> cockadoodledoo </th> <th style="text-align:right;"> buy </th> <th style="text-align:right;"> camping </th> <th style="text-align:right;"> moo </th> <th style="text-align:right;"> ouch </th> <th style="text-align:right;"> aunt </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 26 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 
</td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Male </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table> --- # Items <img src="irt-basics_files/figure-html/unnamed-chunk-11-1.png" width="500" /> --- # Children <img src="irt-basics_files/figure-html/unnamed-chunk-12-1.png" width="500" /> --- # Fit item parameters ## code ```r irt_model_rasch <- mirt( data = english_words %>% select(-sex, -age), model = 1, itemtype = "Rasch", verbose = FALSE ) ``` --- ## item curves <img src="irt-basics_files/figure-html/unnamed-chunk-14-1.png" width="500" /> --- # Ability Estimates <table> <thead> <tr> <th style="text-align:left;"> sex </th> <th style="text-align:right;"> 
age </th> <th style="text-align:right;"> sum_score </th> <th style="text-align:right;"> theta_rasch </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.8383632 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 1.3984515 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 26 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> -0.1224427 </td> </tr> <tr> <td style="text-align:left;"> Male </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.8383632 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.8383632 </td> </tr> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 2.3204348 </td> </tr> </tbody> </table> --- # Ability estimates by sex <img src="irt-basics_files/figure-html/unnamed-chunk-16-1.png" width="500" /> --- # Wait a second <img src="irt-basics_files/figure-html/unnamed-chunk-17-1.png" width="500" /> --- # Moving from Rasch to 2PL ## Rasch Each person has ability `\(\theta_p\)`. Each item has easiness `\(b_i\)`. `\(P(y_{pi} = 1 | \theta_p, b_i) = \sigma(\theta_p + b_i)\)` where `\(\sigma(x) = \dfrac{\exp(x)}{1 + \exp(x)}\)` -- ## 2PL Each person has ability `\(\theta_p\)`. Each item has easiness `\(b_i\)` and discrimination `\(a_i\)`. 
`\(P(y_{pi} = 1 | \theta_p, b_i, a_i) = \sigma(a_i \cdot \theta_p + b_i)\)` --- # Discrimination The discrimination `\(a_i\)` describes the strength of the relationship between the item and ability -- <img src="irt-basics_files/figure-html/unnamed-chunk-18-1.png" width="425" /> --- # Question: Weighting Which of the outcomes is more likely for a person with ability `\(\theta_p = 2\)`? (The easiness of each item is 0). <table> <thead> <tr> <th style="text-align:right;"> item discrimination </th> <th style="text-align:left;"> outcome 1 </th> <th style="text-align:left;"> outcome 2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.5 </td> <td style="text-align:left;"> correct </td> <td style="text-align:left;"> correct </td> </tr> <tr> <td style="text-align:right;"> 1.0 </td> <td style="text-align:left;"> incorrect </td> <td style="text-align:left;"> correct </td> </tr> <tr> <td style="text-align:right;"> 2.0 </td> <td style="text-align:left;"> incorrect </td> <td style="text-align:left;"> correct </td> </tr> <tr> <td style="text-align:right;"> 3.0 </td> <td style="text-align:left;"> correct </td> <td style="text-align:left;"> incorrect </td> </tr> </tbody> </table> --- # Answer: Weighting Which of the outcomes is more likely for a person with ability `\(\theta_p = 2\)`? (The easiness of each item is 0). 
-- Outcome 1 ```r logistic(0.5 * 2 + 0) * (1 - logistic(1 * 2 + 0)) * (1 - logistic(2 * 2 + 0)) * logistic(3 * 2 + 0) ``` ``` ## [1] 0.00156352 ``` -- Outcome 2 ```r logistic(0.5 * 2 + 0) * logistic(1 * 2 + 0) * logistic(2 * 2 + 0) * (1 - logistic(3 * 2 + 0)) ``` ``` ## [1] 0.00156352 ``` -- The two outcomes are equally likely: with every easiness at 0, the likelihood depends on the responses only through the sum of the discriminations of the correctly answered items, which is 3.5 in both outcomes (0.5 + 3.0 and 0.5 + 1.0 + 2.0). --- # Fit 2PL model ## code ```r irt_model_2pl <- mirt( data = english_words %>% select(-sex, -age), model = 1, itemtype = "2PL", verbose = FALSE ) ``` --- ## item curves <img src="irt-basics_files/figure-html/unnamed-chunk-23-1.png" width="500" /> --- # 2PL item parameters <img src="irt-basics_files/figure-html/unnamed-chunk-24-1.png" width="500" /> --- # 2PL abilities <img src="irt-basics_files/figure-html/unnamed-chunk-25-1.png" width="500" /> --- # Why stop at 2 item parameters? -- ## 2PL Each person has ability `\(\theta_p\)`. Each item has easiness `\(b_i\)` and discrimination `\(a_i\)`. `\(P(y_{pi} = 1 | \theta_p, b_i, a_i) = \sigma(a_i \cdot \theta_p + b_i)\)` -- ## What might a 3rd item parameter do? --- # 3PL Each person has ability `\(\theta_p\)`. Each item has easiness `\(b_i\)`, discrimination `\(a_i\)`, and guessability `\(g_i\)`. 
`\(P(y_{pi} = 1 | \theta_p, a_i, b_i, g_i) = g_i + (1 - g_i) \cdot \sigma(a_i \cdot \theta_p + b_i)\)` --- # Intuition behind each of the 3 parameters - Easiness is horizontal translation - Discrimination is slope - Guessability is starting point at ability negative infinity <img src="irt-basics_files/figure-html/unnamed-chunk-26-1.png" width="450" /> --- # Fit 3PL model ## code ```r irt_model_3pl <- mirt( data = english_words %>% select(-sex, -age), model = 1, itemtype = "3PL", verbose = FALSE ) ``` --- ## item curves <img src="irt-basics_files/figure-html/unnamed-chunk-28-1.png" width="500" /> --- # 3PL item parameters <table> <thead> <tr> <th style="text-align:left;"> item </th> <th style="text-align:right;"> a1 </th> <th style="text-align:right;"> b </th> <th style="text-align:right;"> g </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> yum yum </td> <td style="text-align:right;"> 1.33 </td> <td style="text-align:right;"> 1.21 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> bee </td> <td style="text-align:right;"> 3.34 </td> <td style="text-align:right;"> 0.85 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> cockadoodledoo </td> <td style="text-align:right;"> 2.18 </td> <td style="text-align:right;"> -0.56 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> buy </td> <td style="text-align:right;"> 3.04 </td> <td style="text-align:right;"> -1.97 </td> <td style="text-align:right;"> 0.01 </td> </tr> <tr> <td style="text-align:left;"> camping </td> <td style="text-align:right;"> 2.35 </td> <td style="text-align:right;"> -3.28 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> moo </td> <td style="text-align:right;"> 3.05 </td> <td style="text-align:right;"> 2.19 </td> <td style="text-align:right;"> 0.24 </td> </tr> <tr> <td style="text-align:left;"> ouch </td> <td style="text-align:right;"> 1.90 
</td> <td style="text-align:right;"> 1.75 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> aunt </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> -1.11 </td> <td style="text-align:right;"> 0.04 </td> </tr> </tbody> </table> --- # 3PL abilities - compare to 2PL <img src="irt-basics_files/figure-html/unnamed-chunk-30-1.png" width="500" /> --- # 3PL abilities - compare to sum score <img src="irt-basics_files/figure-html/unnamed-chunk-31-1.png" width="500" /> --- # Comparing sexes <img src="irt-basics_files/figure-html/unnamed-chunk-32-1.png" width="500" /> --- # Comparing ages <img src="irt-basics_files/figure-html/unnamed-chunk-33-1.png" width="500" /> --- # Differential item functioning (DIF) <img src="images/DIF.png" width="500" /> --- # Polytomous item response theory <img src="images/poly.png" width="500" /> --- # Multidimensional models <img src="images/multi.png" width="600" /> --- # A few examples of IRT - The Programme for International Student Assessment (PISA) -- - State tests -- - GRE --- # Summary Item response theory (IRT) provides a parametric framework for people responding to items (which can be broadly defined!). -- It has a few specific advantages: - Putting students and items on the same scale -- - Understanding items through item parameters -- - Better measurement of the latent construct -- - Better understanding of the relationship between the latent construct and the items -- - Handling of missing data -- - Ability to make predictions -- - More complicated things like equating, testing for bias, comparisons with other models, etc. 
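As a rough sketch of the missing-data and prediction points above: assuming the Rasch model from earlier and made-up easiness values of 2, 0, and -2 (hypothetical, not estimated from Wordbank), the ability of a person with responses 1, NA, 0 (like person G in the warm-up) can be estimated from the observed responses alone and then used to predict the missing one. The helper `log_lik` and the search interval are illustrative choices, not part of the original slides.

```r
# Logistic function, as defined earlier in these slides
logistic <- function(x) exp(x) / (1 + exp(x))

# Hypothetical easiness values for three items (assumed, not fitted)
easiness <- c(2, 0, -2)

# Responses with a missing value on item 2
responses <- c(1, NA, 0)

# Rasch log-likelihood that simply skips missing responses
log_lik <- function(theta, y, b) {
  observed <- !is.na(y)
  p <- logistic(theta + b[observed])
  sum(y[observed] * log(p) + (1 - y[observed]) * log(1 - p))
}

# Maximum-likelihood ability estimate, searched over a bounded range
fit <- optimize(log_lik, interval = c(-6, 6),
                y = responses, b = easiness, maximum = TRUE)
theta_hat <- fit$maximum

# Predicted probability of a correct response on the missing item
p_missing <- logistic(theta_hat + easiness[2])

print(c(theta_hat = theta_hat, p_missing = p_missing))
```

With these assumed values the likelihood is symmetric around 0, so the estimate lands near 0 and the predicted probability near 0.5. A real analysis would use item parameters fitted to data rather than assumed ones.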
--- # Learning more - The most popular way to estimate IRT models is the mirt R package, written by Phil Chalmers - Phil Chalmers has some good workshop materials on [his GitHub](https://github.com/philchalmers/mirt/wiki) - Mike Frank recommends the Embretson & Reise book [Item Response Theory for Psychologists](https://www.amazon.com/Response-Theory-Psychologists-Multivariate-Applications/dp/0805828192) - Great resources on Bayesian Item Response Theory at education-stan.github.io - [Exercise](https://github.com/stenhaug/irt-basics/blob/master/exercise.Rmd) associated with this presentation - Denny Borsboom's article [The attack of the psychometricians](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2779444/) is fantastic (and Mike Frank wrote it [a love letter](http://babieslearninglanguage.blogspot.com/2019/11/letter-of-recommendation-attack-of.html)) --- # Moving forward - Where might IRT be useful in your work? - What would be helpful in getting started? --- # Getting in touch - Ben Stenhaug - benstenhaug.org - stenhaug@stanford.edu - These slides, the code that produced them, and the option of opening that code in an RStudio cloud environment are available at **tinyurl.com/irt-basics**, which redirects to **stenhaug.github.io/irt-basics**