Fitting a Multidimensional Item-Response Theory Model in Stan

Model Specification

In a Multidimensional Item-Response Theory (MIRT) model, the probability of generating a particular response can be viewed as driven by the interaction between the characteristics of the person and the characteristics of the item. For example, for person \(p\) and test item \(i\), the \(D\) latent factors represented by the vector \({\boldsymbol{\theta}}_p\) interact with the \(D\) item discrimination parameters, represented by \({\boldsymbol{\alpha}}_i\), and with one parameter \(\beta_i\) to determine the probability of a correct response. The discrimination vector \({\boldsymbol{\alpha}}_i\) for item \(i\) reflects how strongly the response to that item discriminates along respondent \(p\)'s corresponding latent characteristics (note that the dimensions of \({\boldsymbol{\theta}}_p\) and \({\boldsymbol{\alpha}}_i\) must match). Now, assuming a generalized linear model formulation (McCullagh & Nelder, 1989), we can set up the pointwise data model for each response \(pi\), using the logistic link, in three steps.

  1. Let \(y_{pi}\vert\pi_{pi}\sim \operatorname{Bernoulli} \left(\pi_{pi}\right)\) with \(\operatorname*{\mathbb{E}}(y_{pi} )=\pi_{pi}\)
  2. Let the linear predictor be \({\boldsymbol{\alpha}}^{\intercal}_{i}{\boldsymbol{\theta}}_{p}+\beta_{i}\)
  3. Let the link function be defined as \(\operatorname{logit}(\pi_{pi})=\log\left( \frac{\pi_{pi}}{1-\pi_{pi}} \right)\)

The link function, the logit, relates the linear predictor to the mean response so that the mean function is then

\[ \pi_{pi} = \operatorname{logit}^{-1}\Bigl({\boldsymbol{\alpha}}^{\intercal}_{i}{\boldsymbol{\theta}}_{p}+\beta_{i} \Bigr) = {\Bigl(1 + \operatorname{exp}\bigl(-\left( {\boldsymbol{\alpha}}^{\intercal}_{i}{\boldsymbol{\theta}}_{p}+\beta_{i}\right)\bigr)\Bigr)}^{-1} \]

And the resulting sampling distribution for each response \(pi\) is given by

\[\begin{align} p(y_{pi}\vert{\boldsymbol{\theta}}_p,{\boldsymbol{\alpha}}_i,\beta_i)= {\biggl[\operatorname{logit}^{-1}\Bigl({\boldsymbol{\alpha}}^{\intercal}_{i}{\boldsymbol{\theta}}_{p}+\beta_{i} \Bigr)\biggr]}^{y_{pi}} {\biggl[1 - \operatorname{logit}^{-1}\Bigl({\boldsymbol{\alpha}}^{\intercal}_{i}{\boldsymbol{\theta}}_{p}+\beta_{i} \Bigr)\biggr]}^{1-y_{pi}} \end{align}\]
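As a quick numerical illustration (not part of the original derivation; the parameter values below are made up), the mean function and the pointwise Bernoulli likelihood can be computed in R with the built-in logistic distribution function:

theta_p <- c(0.5, -1.0)   # hypothetical latent traits of person p (D = 2)
alpha_i <- c(1.2, 0.8)    # hypothetical discriminations of item i
beta_i  <- -0.3           # hypothetical item parameter
pi_pi <- plogis(sum(alpha_i * theta_p) + beta_i)   # inverse logit of the linear predictor
y_pi  <- 1                                         # an observed response
dbinom(y_pi, size = 1, prob = pi_pi)               # Bernoulli likelihood of that response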

The Joint Probability Distribution

For a general model that assumes local and experimental independence, the posterior distribution of the parameters \(\boldsymbol{\Theta},{\boldsymbol{\alpha}},{\boldsymbol{\beta}}\) and the hyperparameters \(\tilde{{\boldsymbol{\theta}}},\tilde{{\boldsymbol{\alpha}}},\tilde{{\boldsymbol{\beta}}}\) given the data is proportional to the product of the likelihood \(p({\mathbf{y}}\mid\boldsymbol{\Theta},{\boldsymbol{\alpha}},{\boldsymbol{\beta}})\), the prior distribution \(p(\boldsymbol{\Theta},{\boldsymbol{\alpha}},{\boldsymbol{\beta}}\mid\tilde{{\boldsymbol{\theta}}},\tilde{{\boldsymbol{\alpha}}},\tilde{{\boldsymbol{\beta}}})\), and the hyperprior distribution \(p(\tilde{{\boldsymbol{\theta}}},\tilde{{\boldsymbol{\alpha}}},\tilde{{\boldsymbol{\beta}}})\). By factoring, we obtain the posterior distribution of all the parameters given the data

\[\begin{align} p(\boldsymbol{\Theta},{\boldsymbol{\alpha}},{\boldsymbol{\beta}},\tilde{{\boldsymbol{\theta}}},\tilde{{\boldsymbol{\alpha}}},\tilde{{\boldsymbol{\beta}}}\mid {\mathbf{y}}) \propto &\biggl[ \prod_{p=1}^P\ \prod_{i=1}^I\ p(y_{pi}\vert{\boldsymbol{\theta}}_p,{\boldsymbol{\alpha}}_i,\beta_i)\biggr] \biggl[\prod_{p=1}^P p({\boldsymbol{\theta}}_p \mid \tilde{{\boldsymbol{\theta}}})\biggr] \biggl[\prod_{i=1}^I p({\boldsymbol{\alpha}}_i \mid \tilde{{\boldsymbol{\alpha}}})\, p(\beta_i \mid \tilde{{\boldsymbol{\beta}}})\biggr] p(\tilde{{\boldsymbol{\theta}}})\,p(\tilde{{\boldsymbol{\alpha}}})\,p(\tilde{{\boldsymbol{\beta}}}) \end{align}\]

Identification Constraints

The model as it stands is not identified. For multidimensional item response theory models, Rivers (2003) provided a rank condition for identification under certain assumptions. The condition (Theorem 5 in the paper) states that a \(D\)-dimensional model is identified if and only if \[\begin{align} \mathscr{H}^{\intercal}_\theta \bigl(\tilde{\Theta} \otimes \textbf{I}_D \bigr) \quad\text{has rank } D(D+1) \end{align}\] where \(\mathscr{H}^{\intercal}_\theta\) is the matrix of partial derivatives of the restrictions, written in the form \(\textbf{h}({\boldsymbol{\theta}})=\boldsymbol{0}\), and \(\tilde{\Theta}\) is the matrix of latent trait factors augmented by the vector \(\mathbf{1}\). To identify a \(D\)-dimensional IRT model with an identity covariance matrix, \({\boldsymbol{\Phi}}=\mathbf{I}\), he showed that we need an additional \(\frac{D(D-1)}{2}\) constraints on \(\mathbf{A}\) that satisfy the rank condition. That is, for a 2-dimensional orthogonal model we need 1 additional constraint on \(\mathbf{A}\), and for a 3-dimensional model we need 3 additional constraints. For our model, we could choose the following prior specification as a starting point for the identification constraints:

\[\begin{align*} &{\boldsymbol{\theta}}_p \sim \mathit{Normal}_d(\boldsymbol{0},\, \textbf{I}) \quad \text{for } p=1,2,...,P \\ &{\boldsymbol{\alpha}}_i \sim \mathit{LogNormal}_d(\boldsymbol\mu_{{\boldsymbol{\alpha}}},\,\boldsymbol\Sigma_{\alpha}) \quad \text{for } i=1,2,...,I \\ &\beta_i \sim \mathit{Normal}(\mu_{\beta},\,\sigma^2_{\beta}) \quad \text{for } i=1,2,...,I \end{align*}\]

Since we know that the signs of the discrimination parameters are restricted to be positive, we can satisfy the polarity constraint by using a prior distribution with a positive domain such as the lognormal, where \(\boldsymbol\Sigma_{\alpha}\) is a diagonal covariance matrix with diagonal elements \(\sigma^2_{\alpha^{(d)}}\) for \(d=1,2,...,D\) and all off-diagonal elements equal to 0.
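Because \(\boldsymbol\Sigma_{\alpha}\) is diagonal, a draw from this \(d\)-dimensional lognormal prior is simply an elementwise exponentiated normal draw. A minimal sketch in R (the hyperparameter values here are placeholders, not the ones used later):

d <- 2
mu_a    <- c(0, 0)      # placeholder location hyperparameters
sigma_a <- c(0.3, 0.3)  # placeholder scales (square roots of the diagonal of Sigma_alpha)
alpha_i <- exp(rnorm(d, mu_a, sigma_a))   # one draw of the discrimination vector; every element is positive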

This prior on \({\boldsymbol{\theta}}_p \) fixes the scale and location of the parameter space but still leaves the model unidentified. In particular, we still need \(\frac{D(D-1)}{2}\) zero constraints on \(\mathbf{A}\) that satisfy the rank condition in order to locally identify the model. Note that in a Bayesian model, zero constraints can also be specified by assigning the variable(s) of interest a very strong prior with mean zero and variance very close to zero. In addition, we still need to take care of the polarity restrictions on some elements of \(\mathbf{A}\). For example, in Bayesian factor analysis, the positive lower triangular identification scheme is an identification strategy that depends on the uniqueness property of the QR decomposition (Geweke & Zhou, 1996). The scheme is commonly used in Bayesian factor analysis and requires that \({\boldsymbol{\Phi}}=\mathbf{I}\) and that \(\mathbf{A}\) be a lower triangular matrix with positive diagonal elements (i.e., \(\alpha_{d,d} > 0\)), as such

\[\begin{align*} \mathbf{A} =\begin{bmatrix} \alpha_{1,1} & 0 & \cdots & 0 & 0 \\ \alpha_{2,1} & \alpha_{2,2} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \alpha_{D,1} & \alpha_{D,2} & \cdots & \alpha_{D,D-1} & \alpha_{D,D}\\ \vdots & \vdots & & \vdots & \vdots \\ \alpha_{I,1} & \alpha_{I,2} & \cdots & \alpha_{I,D-1} & \alpha_{I,D} \end{bmatrix} \end{align*}\]

Alternatively, we can put the assignment of our theoretically informed constraints on firmer ground by incorporating our prior knowledge into the specification of the priors. This prior knowledge can be included in the form of item-level covariates that replace some, or all, of the \(D\) location hyperparameters \(\mu_{\alpha^{(d)}}\), as such

\[\begin{align*} {\boldsymbol{\alpha}}_i \sim &\mathit{Normal}_d( \mathbf{x}_{{\boldsymbol{\alpha}}_{i}} \boldsymbol{\gamma_\alpha},\,\boldsymbol\Sigma_{\alpha})\quad \text{for } i=1,2,...,I \end{align*}\]

Here \(\mathbf{x}_{{\boldsymbol{\alpha}}_{i}}\) refers to the row covariate vector associated with \({\boldsymbol{\alpha}}_{i}\) and \(\boldsymbol{\gamma_\alpha}\) is the corresponding regression coefficient vector. For an example of such an approach, Adams, Wilson, and Wu (1997) used person demographic variables as covariates for the population parameters. Also see Gelman and Hill (2007) for an example of a general multilevel Bayesian IRT model with population-level explanatory variables.

Assignment of Hyperpriors

Given that we don’t have any information about whether the population parameters are positive or negative, we have good reason to assign weakly informative hyperpriors that are centered at zero. For the \(\alpha\) location hyperparameter, we assign a half-Cauchy prior with the scale parameter set to 0.5; for identification reasons the hyperprior is constrained so that the density of the resulting lognormal does not concentrate its probability mass heavily around a particular value. Compare the following plots of the lognormal density for different values of the location hyperparameter:

library("rbokeh")
figure(ylab="LogNormal(x)") %>% 
  ly_curve(dlnorm(x, 0, .3), 0, 7, n = 2001,color = "blue")  %>% 
  ly_curve(dlnorm(x, 0.1, .3), 0, 7, n = 2001, color = "red") %>% 
  ly_curve(dlnorm(x, 0.5, .3), 0, 7, n = 2001,color = "yellow") %>% 
  ly_curve(dlnorm(x, 0.8, .3), 0, 7, n = 2001,color= "green") %>%  y_axis( label="LogNormal(x)")

For the \(\beta\) location parameter, we assign a Cauchy prior with the scale parameter set to 1, and for the \(\beta\) standard deviation parameter, we assign a half-Cauchy prior with the scale parameter set to 2. For the scale parameter of the \(\alpha\)’s, we set \(\sigma_{\alpha}\) to a small value, 0.3. The Cauchy distribution was chosen because we would like to keep the parameters away from large values while still allowing the possibility of large values when the data dominate in the tail region of the prior (Gelman, 2006).

\[\begin{align*} \mu_{\alpha} \sim& \textrm{Half-Cauchy}(0,0.5) \\ \mu_{\beta} \sim &\textrm{Cauchy}(0,1) \\ \sigma_{\beta} \sim &\textrm{Half-Cauchy}(0,2) \end{align*}\]
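To see the heavy-tail behavior that motivates this choice, we can compare the Cauchy and normal densities directly (a quick sketch, not in the original post):

figure(ylab = "density") %>%
  ly_curve(dcauchy(x, 0, 1), -10, 10, n = 2001, color = "blue") %>%   # Cauchy(0, 1): heavy tails
  ly_curve(dnorm(x, 0, 1), -10, 10, n = 2001, color = "red")          # Normal(0, 1): thin tails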

Stan Model Code

data {
  int<lower=1> J;                // number of students
  int<lower=1> K;                // number of questions
  int<lower=1> N;                // number of observations
  int<lower=1,upper=J> jj[N];    // student for observation n
  int<lower=1,upper=K> kk[N];    // question for observation n
  int<lower=0,upper=1> y[N];     // correctness of observation n
  int<lower=1> D;                // number of latent dimensions
}
transformed data {
  int<lower=1> L;
  L<- D*(K-D)+D*(D+1)/2;  // number of non-zero loadings
}
parameters {
  matrix[J,D] theta;            // person parameter matrix
  vector<lower=0>[L] alpha_l;   // free (non-zero) discrimination parameters
  vector[K] beta;               // vector of thresholds
  real<lower=0> mu_alpha;       // location hyperparameter of the lognormal prior on alpha
  real mu_beta;                 // location hyperparameter of the prior on beta
  real<lower=0> sigma_beta;     // scale hyperparameter of the prior on beta
}
transformed parameters{
  matrix[D,K] alpha;   // constrain the upper triangular elements to zero
  for (i in 1:K) {
    for (j in (i+1):D) {
      alpha[j,i] <- 0;
    }
  }
  {
    int index;
    index <- 0;        // initialize the running index into alpha_l
    for (j in 1:D) {
      for (i in j:K) {
        index <- index + 1;
        alpha[j,i] <- alpha_l[index];
      }
    }
  }
}
model {
// the hyperpriors 
  mu_alpha ~ cauchy(0, 0.5);
  mu_beta ~ cauchy(0, 1);
  sigma_beta ~ cauchy(0, 2);
// the priors 
  to_vector(theta) ~ normal(0,1); 
  alpha_l ~ lognormal(mu_alpha,0.3);
  beta ~ normal(mu_beta,sigma_beta);
// the likelihood 
{
  vector[N] nu;  //vector of predictor terms
  for (i in 1:N) 
    nu[i] <- theta[jj[i]]*col(alpha,kk[i])+ beta[kk[i]]; 
  y ~ bernoulli_logit(nu); 
}
}
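To make the index bookkeeping in the transformed parameters block concrete, here is a small R illustration (not part of the original post) of the loading pattern it produces for D = 2 and K = 5; the values 1 to 9 are placeholders standing in for the free parameters in alpha_l:

D <- 2; K <- 5
L <- D * (K - D) + D * (D + 1) / 2   # 9 non-zero loadings
alpha_l <- seq_len(L)                # placeholder values for the free parameters
alpha <- matrix(0, D, K)             # zeros correspond to the constrained elements
index <- 0
for (j in 1:D) {
  for (i in j:K) {
    index <- index + 1
    alpha[j, i] <- alpha_l[index]    # fill row j from column j onward, mirroring the Stan code
  }
}
alpha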

Simulate Data

To simulate and fit the data for a 2-dimensional model using R, we first set the random-number generator seed in order to make the results reproducible.

set.seed(42)
# specify the true parameter values
D <- 2
J <- 800
K <- 25
N <- J * K
mu_theta    <- c(0, 0)
sigma_theta <- c(1, 1)
mu_beta     <- -.5
sigma_beta  <- 1
theta_1 <- rnorm(J, mu_theta[1], sigma_theta[1])
theta_2 <- rnorm(J, mu_theta[2], sigma_theta[2])
alpha1  <- sort(runif(K, .2, 1.5), decreasing = TRUE)
alpha2  <- sort(runif(K, 0.2, 1.5))
alpha2[1] <- 0   # one zero constraint on the loading matrix, as required for D = 2
beta    <- rnorm(K, mu_beta, sigma_beta)

Next we plot the sample data distribution of the two latent dimensions for the 800 persons.
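The original plotting code is not shown here; something along the following lines (the use of ly_hexbin and the axis labels are assumptions) would produce a hexbin plot of the two simulated traits:

figure(xlab = "theta_1", ylab = "theta_2") %>%
  ly_hexbin(theta_1, theta_2)   # hexagonal binning of the J = 800 simulated persons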

You can hover over the cells to find the count frequency of persons.

Now we simulate the probabilities given the item and person parameters:

prob <- matrix(0,nrow=J,ncol=K)
for (j in 1:J)
  for (k in 1:K)
    prob[j,k] <- plogis(alpha1[k]*theta_1[j] +alpha2[k]*theta_2[j]+ beta[k])

Let’s plot the marginal distribution of the probabilities to make sure that there aren’t too many probabilities at either extreme (i.e. 0 or 1).
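The original plot code is omitted; a histogram along these lines (the use of ly_hist and the number of breaks are assumptions) would show that marginal distribution:

figure(xlab = "Probability of a correct response") %>%
  ly_hist(as.vector(prob), breaks = 50)   # marginal distribution over all person-item pairs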

Now we convert the probabilities to dichotomous responses and put the data into a long-form dataset:

y<-rbinom(N,1,as.vector(prob))
jj <- rep(1,K)%x%c(1:J)
kk <- c(1:K)%x%rep(1,J)
data <-data.frame(Response=y, Item=kk, Person=jj)
head(data)
##   Response Item Person
## 1        1    1      1
## 2        1    1      2
## 3        0    1      3
## 4        0    1      4
## 5        0    1      5
## 6        0    1      6

Next, we examine the observed simulated data by plotting the observed number of correct responses for each item, ordered by the true difficulty of the items:

df.agg.itm<-aggregate(Response~Item,data=data, sum)
df.agg.itm$Threshold<-round(beta,2)
df.agg.itm<-df.agg.itm[order(beta),]
df.agg.itm$Item_Order<-1:K
figure(ylab="Number of Correct Responses") %>%
ly_points(Item_Order,Response, data = df.agg.itm,hover = c(Threshold,Item))

We do the same thing for persons, ordered by their rank on the theta_1 latent factor:

df.agg.per<-aggregate(Response~Person,data=data, sum)
df.agg.per$Propensity1<-round(theta_1,2)
df.agg.per$Propensity2<-round(theta_2,2)
df.agg.per<-df.agg.per[order(theta_1),]
df.agg.per$Person_Order<-1:J
figure(width = 900, height = 450,ylab="Number of Correct Responses") %>% 
ly_points(Person_Order,Response, data = df.agg.per,size = 4,hover = c(Person,Propensity1,Propensity2)) 

Now we are ready to fit the model. We load the rstan and parallel packages:

library("rstan")
library("parallel")
source("stan.R")

mirt.data <-list(J=J,K=K,N=N,jj=jj,kk=kk,y=y,D=D)
mirt.model<- stan("mirt.stan", data = mirt.data ,chains =0)

Nchains <- 4
Niter <- 300
t_start <- proc.time()[3]
fit<-stan(fit = mirt.model, data =mirt.data, pars=c("alpha","beta","mu_alpha","mu_beta","sigma_beta"),
          iter=Niter,
          chains = Nchains)
t_end <- proc.time()[3]
t_elapsed <- t_end - t_start

Let’s see how long each iteration took on average and then compare the true parameters with the model fit:

(time <- t_elapsed / Nchains / (Niter/2))
 elapsed 
0.244135 
(true.param<-round(data.frame(alpha1,alpha2,beta),2))
   alpha1 alpha2  beta
1    1.47   0.00 -0.99
2    1.46   0.26 -0.42
3    1.45   0.30 -0.12
4    1.40   0.31 -0.64
5    1.38   0.35 -0.07
6    1.27   0.39 -0.74
7    1.24   0.55 -0.82
8    1.24   0.57 -0.75
9    1.24   0.61  0.14
10   1.19   0.63  0.32
11   1.11   0.65  1.84
12   0.95   0.71 -1.48
13   0.92   0.78 -1.21
14   0.87   0.84  0.14
15   0.75   0.89  1.14
16   0.75   0.91  0.73
17   0.65   0.95 -0.97
18   0.57   1.02 -1.18
19   0.56   1.07  1.20
20   0.45   1.09  1.04
21   0.41   1.11 -0.29
22   0.41   1.19  1.20
23   0.40   1.19 -0.51
24   0.27   1.28 -1.32
25   0.26   1.36 -0.55

Here is the model fit:

print(fit,probs = c(0.5))
Inference for Stan model: mirt.
4 chains, each with iter=300; warmup=150; thin=1; 
post-warmup draws per chain=150, total post-warmup draws=600.

                 mean se_mean    sd       50% n_eff Rhat
alpha[1,1]       1.30    0.01  0.15      1.29   315 1.00
alpha[1,2]       1.32    0.01  0.14      1.32   304 1.01
alpha[1,3]       1.47    0.01  0.15      1.45   364 1.00
alpha[1,4]       1.60    0.01  0.17      1.59   251 1.01
alpha[1,5]       1.34    0.01  0.14      1.34   344 1.00
alpha[1,6]       1.53    0.01  0.16      1.53   393 1.00
alpha[1,7]       1.48    0.01  0.16      1.48   312 1.01
alpha[1,8]       1.26    0.01  0.13      1.25   600 1.01
alpha[1,9]       1.16    0.01  0.12      1.17   554 1.00
alpha[1,10]      1.10    0.01  0.11      1.10   360 1.00
alpha[1,11]      1.02    0.01  0.14      1.01   295 1.00
alpha[1,12]      0.98    0.00  0.12      0.98   600 1.00
alpha[1,13]      1.20    0.01  0.14      1.19   600 1.00
alpha[1,14]      0.87    0.01  0.11      0.87   444 1.00
alpha[1,15]      1.00    0.01  0.13      1.00   457 1.00
alpha[1,16]      0.78    0.00  0.10      0.77   600 1.00
alpha[1,17]      0.71    0.01  0.10      0.71   372 1.01
alpha[1,18]      0.66    0.00  0.10      0.66   600 1.00
alpha[1,19]      0.74    0.01  0.10      0.74   403 1.01
alpha[1,20]      0.80    0.00  0.11      0.79   600 1.00
alpha[1,21]      0.65    0.00  0.10      0.65   600 1.00
alpha[1,22]      0.61    0.00  0.09      0.60   381 1.01
alpha[1,23]      0.49    0.00  0.08      0.49   600 1.00
alpha[1,24]      0.50    0.00  0.09      0.50   600 1.00
alpha[1,25]      0.53    0.00  0.09      0.53   600 1.00
alpha[2,1]       0.00    0.00  0.00      0.00   600  NaN
alpha[2,2]       0.52    0.00  0.08      0.51   600 1.01
alpha[2,3]       0.56    0.00  0.10      0.56   600 1.00
alpha[2,4]       0.52    0.00  0.09      0.52   600 1.00
alpha[2,5]       0.61    0.00  0.10      0.62   600 1.00
alpha[2,6]       0.58    0.00  0.09      0.57   600 1.00
alpha[2,7]       0.85    0.01  0.12      0.86   452 1.00
alpha[2,8]       0.74    0.00  0.11      0.73   600 1.01
alpha[2,9]       0.72    0.00  0.11      0.72   600 1.00
alpha[2,10]      0.83    0.00  0.10      0.83   436 1.00
alpha[2,11]      0.78    0.00  0.11      0.77   600 1.00
alpha[2,12]      0.89    0.01  0.12      0.88   480 1.00
alpha[2,13]      0.98    0.01  0.13      0.98   453 1.00
alpha[2,14]      0.92    0.01  0.11      0.91   352 1.01
alpha[2,15]      0.92    0.01  0.12      0.92   323 1.01
alpha[2,16]      1.08    0.01  0.12      1.08   600 1.00
alpha[2,17]      1.04    0.01  0.12      1.04   600 1.00
alpha[2,18]      1.08    0.01  0.13      1.08   319 1.00
alpha[2,19]      1.23    0.01  0.15      1.23   293 1.00
alpha[2,20]      1.33    0.01  0.16      1.33   189 1.02
alpha[2,21]      1.41    0.01  0.16      1.41   600 1.00
alpha[2,22]      1.15    0.01  0.13      1.15   251 1.02
alpha[2,23]      1.29    0.01  0.16      1.29   197 1.04
alpha[2,24]      1.40    0.01  0.17      1.41   211 1.01
alpha[2,25]      1.53    0.01  0.18      1.52   310 1.00
beta[1]         -0.95    0.01  0.10     -0.95   282 1.00
beta[2]         -0.42    0.00  0.10     -0.42   600 1.01
beta[3]         -0.27    0.01  0.10     -0.27   269 1.00
beta[4]         -0.89    0.01  0.12     -0.89   258 1.01
beta[5]         -0.10    0.01  0.09     -0.10   290 1.00
beta[6]         -1.01    0.01  0.11     -1.01   215 1.00
beta[7]         -1.09    0.01  0.12     -1.09   313 1.01
beta[8]         -0.75    0.00  0.11     -0.75   600 1.00
beta[9]          0.05    0.01  0.09      0.05   309 1.00
beta[10]         0.10    0.00  0.10      0.10   600 1.00
beta[11]         1.71    0.01  0.13      1.71   600 1.00
beta[12]        -1.49    0.00  0.11     -1.49   600 1.00
beta[13]        -1.42    0.01  0.12     -1.41   323 1.00
beta[14]         0.14    0.01  0.09      0.14   252 1.01
beta[15]         1.28    0.00  0.11      1.27   600 1.00
beta[16]         0.67    0.01  0.10      0.67   250 1.01
beta[17]        -0.95    0.00  0.10     -0.95   600 1.01
beta[18]        -1.13    0.00  0.10     -1.13   600 1.00
beta[19]         1.17    0.01  0.11      1.16   413 1.01
beta[20]         1.02    0.01  0.12      1.01   190 1.02
beta[21]        -0.28    0.01  0.11     -0.28   357 1.01
beta[22]         1.05    0.01  0.11      1.04   254 1.00
beta[23]        -0.58    0.01  0.10     -0.57   222 1.00
beta[24]        -1.29    0.01  0.12     -1.29   256 1.01
beta[25]        -0.41    0.01  0.11     -0.40   199 1.01
mu_alpha         0.02    0.00  0.02      0.01   600 1.00
mu_beta         -0.22    0.01  0.20     -0.23   600 1.01
sigma_beta       0.96    0.01  0.15      0.95   600 0.99
lp__        -11021.89    4.27 40.53 -11020.06    90 1.03

Samples were drawn using NUTS(diag_e) at Wed Apr 22 01:54:54 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
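One convenient way to put the posterior means next to the true discriminations (a sketch, not part of the original post; it relies only on rstan's as.matrix method and the parameter names) is:

draws <- as.matrix(fit, pars = "alpha")   # posterior draws, columns named alpha[d,k]
alpha_hat <- colMeans(draws)              # posterior means, extracted by name so ordering doesn't matter
round(data.frame(true_alpha1 = alpha1,
                 est_alpha1  = alpha_hat[paste0("alpha[1,", 1:K, "]")],
                 true_alpha2 = alpha2,
                 est_alpha2  = alpha_hat[paste0("alpha[2,", 1:K, "]")]), 2)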

From the output, we can see that we have some evidence that the assigned lognormal prior and hyperpriors identify the model. Nonetheless, it is very important to note that the choice of hyperpriors and the values of the hyperparameters need to be tweaked carefully so that the HMC sampler doesn't wander off and sample from other posterior modes. I suspect that with different initial values the sampler would end up exploring a different posterior mode.


This webpage was created using the R Markdown authoring format and was generated using the knitr dynamic report engine.

(C) 2015 Rick Farouni


References

Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of educational and behavioral Statistics, 22.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Geweke, J., & Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. Review of Financial Studies, 9, 557–587.

McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. Chapman; Hall.

Rivers, D. (2003). Identification of multidimensional spatial voting models. Typescript, Stanford University.