In a IRT model, the probability of generating a particular response \(\; y_{jk}\) can be viewed as driven by the interaction of the characteristics of the person and the characteristics of the item. For example, for person \(j\) and test item \(k\), a latent trait (e.g. ability) represented by \(\; \theta_j\), interact with item discrimination parameters, represented by \(\; \alpha_k\), and an item intercept parameter \(\; \beta_k\) to determine the probability of correct response. If we divide the intercept \(\; \beta_k\) by \(\; \alpha_k\) and flip its sign, \(\beta_k\) can be interperted as item difficulty.
Let \(\; y_{jk} \vert \pi_{jk}\sim Bernoulli \left(\pi_{jk} \right)\) such that \(E(y_{jk} )=\pi_{jk}\). The 2PL model specifies that the mean expected response \[ \pi_{jk} = logit^{-1}\Bigl(\alpha^{\intercal}_{k}\theta_{p}+\beta_{k} \Bigr) = {\Bigl(1 + \operatorname{exp}\bigl(-\left( \alpha^{\intercal}_{k}\theta_{p}+\beta_{k}\right)\bigr)\Bigr)}^{-1} \] Accordingly, the sampling distribution for each response \(jk\) is \[ \begin{align} p(y_{jk}\vert\theta_p,\alpha_p,\beta_i)= {\biggl[logit^{-1}\Bigl(\alpha^{\intercal}_{k}\theta_{p}+\beta_{k} \Bigr)\biggr]}^{y_{jk}} {\biggl[1 - logit^{-1}\Bigl(\alpha^{\intercal}_{k}\theta_{p}+\beta_{k} \Bigr)\biggr]}^{1-y_{jk}} \end{align} \]
It is good practice to generate the data in order to examine how well your method of estimation can retireve the ground truth. We now specify the number of items, K, and the number of persons J we would like to have in our simulated test data and generate the true parameter values for the item charactersitics vectors alpha and beta and the person characteristic vector theta, all of which go as inputs in the data model (i.e. the 2PL Item Respone Model).
First we load the packages
library(knitr)
library(rstan)
library(ggplot2)
library(plyr)
library(reshape2)
library(parallel)
Then we set the random-number generator seed in order to make the results reproducible
set.seed(42)
K <- 25 # Items
J <- 300 #Person
N <- K*J #observations
alpha <- runif(K,.5,2) #slopes
beta <- runif(K,-2,3) # intercepts
data.frame(alpha,beta)
## alpha beta
## 1 1.8722091 0.57105892
## 2 1.9056131 -0.04898266
## 3 0.9292093 2.52869065
## 4 1.7456714 0.23484814
## 5 1.4626183 2.18002130
## 6 1.2786439 1.68797809
## 7 1.6048825 2.05527571
## 8 0.7019999 -0.05945859
## 9 1.4854884 1.42584865
## 10 1.5575972 -1.98025831
## 11 1.1866127 2.16458040
## 12 1.5786684 -1.96332927
## 13 1.9020084 -0.96170514
## 14 0.8831432 2.53300704
## 15 1.1934392 1.05889322
## 16 1.9100218 -0.10220380
## 17 1.9673396 0.17885792
## 18 0.6762310 -1.81284484
## 19 1.2124956 2.86769957
## 20 1.3404991 0.15875624
## 21 1.8560471 2.78788298
## 22 0.7080653 2.43877453
## 23 1.9833376 1.19989385
## 24 1.9200023 2.85483305
## 25 0.6236563 1.09419104
theta.mu <- 0 # population mean of person ability
theta.sig <- 1 # population sd of person ability
theta <- rnorm(J,theta.mu,theta.sig) # generate 500 ability parameters
head(theta)
## [1] -0.4304691 -0.2572694 -1.7631631 0.4600974 -0.6399949 0.4554501
Now we are ready to generate the K x J matrix of response probabilities
slope.ability <- outer(theta, alpha) # multiply the slope vector by the ability vector
intercept <- matrix(rep(beta, J), nrow=J, byrow=TRUE)
Prob <- plogis(intercept+slope.ability ) # 1/(1 + exp(.))
data <- ifelse(runif(N) < Prob,1,0) # generate matrix of Bernoulli 0-1 responses
The Stan model I wrote expects data in long format, so first we need to reshape the data. While we are at it, lets add an additional column of correct response probabilites so that we can plot the probability frequencies for some items.
df<-melt(data)
colnames(df)<-c("Person","Item","Response")
df$Probability<-c(Prob) # vectorize the matrix and add as a new column
colnames(df)<-c("Person","Item","Response","Probability")
df<-arrange(df, Person) # sort by person
head(df)
## Person Item Response Probability
## 1 1 1 1 0.4415511
## 2 1 2 0 0.2954020
## 3 1 3 1 0.8936610
## 4 1 4 0 0.3736454
## 5 1 5 0 0.8249728
## 6 1 6 0 0.7572316
Here is a plot of the distributions of the probabilities for 5 items:
ggplot(subset(df,Item %in% c(1,2,15,24,25))) + geom_density(aes(x = Probability , colour = as.factor(Item)))+ scale_colour_discrete(name="Item")
The follwing lines of code fit a hierarchical Bayesian 2PL model with a unidimensional latent trait (i.e. ability) using Stan
source("stan.R") # load code to have Stan run multiple cores in parallel
y <- df$Response
jj<-df$Person
kk<-df$Item
irt.data<-list(J=J,K=K,N=N,jj=jj,kk=kk,y=y)
Nchains<-4
Niter<-500
model.fit<- stan("2PL.stan", data = irt.data, chains =Nchains, iter=Niter)
save.image()
This webpage was created using the R Markdown authoring format and was generated using knitr dynamic report engine. The linked webpage hosts an interactive web application automatically generated by shinyStan. The shinyStan object was deployed using the shinyapps.io framework.