Randomly generates a response matrix according to specified conditions, including the attribute distribution, item quality, sample size, Q-matrix, and cognitive diagnosis model (CDM).
sim.data(
Q = NULL,
N = NULL,
IQ = list(P0 = NULL, P1 = NULL),
att.str = NULL,
model = "GDINA",
distribute = "uniform",
control = NULL,
verbose = TRUE
)
Q The Q-matrix. A random 30 × 5 Q-matrix (sim.Q) will be used if Q = NULL.
N Sample size. Default = 500.
IQ A list containing two \(I\)-length vectors: P0 and P1.
P0 is the probability that examinees who have mastered none of the attributes
(\([00...0]\)) answer the item correctly, while P1 is the probability
that examinees who have mastered all attributes (\([11...1]\)) answer the item correctly.
att.str The attribute structure. NULL (the default) means there is no structure. An attribute structure
must be specified as a list, which is handled internally by the att.structure function.
See examples. It can also be a matrix giving all permissible attribute profiles.
model Type of model used to generate the data; can be "GDINA", "LCDM", "DINA", "DINO",
"ACDM", "LLM", or "rRUM".
distribute Attribute distribution; can be "uniform" for the uniform distribution,
"mvnorm" for the multivariate normal distribution (Chiu, Douglas, & Li,
2009), or "horder" for the higher-order distribution (Tu et al., 2022).
control A list of control parameters with elements:
sigma A positive-definite symmetric matrix specifying the variance-covariance
matrix when distribute = "mvnorm". Default = 0.5 (Chiu, Douglas, & Li, 2009).
cutoffs A vector giving the cutoff for each attribute when distribute = "mvnorm".
Default = \(k/(1+K)\) (Chiu, Douglas, & Li, 2009).
theta A vector of length N giving the higher-order ability of each examinee.
By default, generated randomly from the standard normal distribution (Tu et al., 2022).
a The slopes for the higher-order model when distribute = "horder".
Default = 1.5 (Tu et al., 2022).
b The intercepts when distribute = "horder". By default, equally spaced
values between -1.5 and 1.5, one per attribute (Tu et al., 2022).
alpha Used to generate the distribution of attribute profiles under a hierarchical structure
when att.str is not NULL. This distribution is drawn at random from a Dirichlet
distribution, where alpha gives the parameters of the Dirichlet distribution;
its length equals the number L.str of valid attribute profiles
\(\boldsymbol{\alpha}\) under the hierarchical structure.
Default: alpha = rep(1, L.str).
verbose Logical; whether to print information. Default is TRUE.
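The Dirichlet draw described for the alpha control element can be sketched in base R. This is only an illustration of the idea, not necessarily how sim.data implements it internally: L.str.toy, N.toy, and the other names are hypothetical, and the Dirichlet vector is obtained by the standard construction of normalizing independent gamma variates.

```r
# Illustrative sketch only; all names below are hypothetical.
set.seed(1)
L.str.toy <- 4                       # assumed number of permissible attribute profiles
alpha.toy <- rep(1, L.str.toy)       # mirrors the documented default rep(1, L.str)

# Dirichlet(alpha) draw: normalize independent Gamma(alpha_l, 1) variates
g <- rgamma(L.str.toy, shape = alpha.toy, rate = 1)
profile.prob <- g / sum(g)           # proportions over the permissible profiles

# Assign N.toy examinees to profiles according to these proportions
N.toy <- 20
profile.id <- sample(seq_len(L.str.toy), size = N.toy,
                     replace = TRUE, prob = profile.prob)
```

With all entries of alpha equal to 1, every vector of proportions is equally likely; larger entries would pull the proportions toward equal shares.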
An object of class sim.data, initially obtained from the simGDINA function in the GDINA package.
Elements that can be extracted using the extract method include:
dat An N × I simulated item response matrix.
Q The Q-matrix.
attribute An N × K matrix of individuals' attribute patterns.
catprob.parm A list of non-zero success probabilities for each attribute mastery pattern.
delta.parm A list of delta parameters.
higher.order.parm Higher-order parameters.
mvnorm.parm Multivariate normal distribution parameters.
LCprob.parm A matrix of success probabilities for each attribute mastery pattern.
Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster Analysis for Cognitive Diagnosis: Theory and Applications. Psychometrika, 74(4), 633-665. DOI: 10.1007/s11336-009-9125-0.
Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.
################################################################
# Example 1 #
# generate data following the uniform distribution #
################################################################
library(Qval)
set.seed(123)
K <- 5
I <- 10
Q <- sim.Q(K, I)
IQ <- list(
P0 = runif(I, 0.0, 0.2),
P1 = runif(I, 0.8, 1.0)
)
data.obj <- sim.data(Q = Q, N = 100, IQ=IQ, model = "GDINA", distribute = "uniform")
#> distribute = uniform
#> model = GDINA
#> number of attributes: 5
#> number of items: 10
#> num of examinees: 100
#> average of P0 = 0.116
#> average of P1 = 0.926
print(data.obj$dat)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1 0 0 1 1 1 1 0 0 1
#> [2,] 1 0 1 1 1 0 1 1 0 0
#> [3,] 1 0 1 1 0 0 0 1 0 1
#> [4,] 0 0 1 0 1 0 1 1 0 0
#> [5,] 0 1 0 1 1 0 1 0 1 0
#> [6,] 0 0 0 0 0 1 1 1 1 1
#> [7,] 1 1 0 1 0 0 0 0 0 0
#> [8,] 1 1 1 1 1 1 1 1 1 1
#> [9,] 1 1 0 0 1 0 1 0 1 0
#> [10,] 0 1 1 0 0 0 0 1 1 0
#> [11,] 0 0 1 0 1 1 1 1 1 0
#> [12,] 0 0 0 0 0 1 1 0 0 1
#> [13,] 0 0 0 0 1 1 1 0 0 1
#> [14,] 0 0 0 0 1 1 1 1 1 1
#> [15,] 1 0 1 1 0 1 0 1 0 1
#> [16,] 1 0 1 1 1 1 1 1 0 0
#> [17,] 1 1 1 1 1 0 1 1 1 0
#> [18,] 1 1 0 1 1 0 1 0 1 0
#> [19,] 0 0 1 0 1 0 1 0 0 0
#> [20,] 1 1 0 1 0 0 1 0 1 0
#> [21,] 1 0 1 1 1 1 1 1 0 1
#> [22,] 1 1 0 1 1 0 1 1 0 1
#> [23,] 0 1 0 0 0 0 0 0 1 0
#> [24,] 0 1 1 0 0 1 0 1 1 1
#> [25,] 0 0 0 0 1 0 1 1 0 0
#> [26,] 0 1 1 0 1 1 1 1 1 1
#> [27,] 0 0 0 0 1 0 0 0 0 1
#> [28,] 1 0 0 1 0 0 1 1 0 1
#> [29,] 0 0 1 0 0 1 0 1 0 1
#> [30,] 1 0 1 1 0 1 0 1 1 1
#> [31,] 0 1 1 0 0 0 0 1 1 1
#> [32,] 0 0 1 0 1 1 1 1 0 1
#> [33,] 0 0 0 0 0 0 1 0 0 0
#> [34,] 0 1 1 0 0 1 0 1 1 1
#> [35,] 1 1 1 1 1 1 1 1 0 0
#> [36,] 0 1 1 0 0 1 0 1 1 1
#> [37,] 0 1 0 0 0 1 0 0 1 0
#> [38,] 1 1 0 1 0 0 1 0 1 0
#> [39,] 1 1 0 1 0 1 0 0 1 0
#> [40,] 0 1 0 0 1 1 1 0 0 1
#> [41,] 1 1 0 1 0 0 1 0 1 0
#> [42,] 1 1 0 1 0 1 0 0 1 1
#> [43,] 1 0 1 1 1 0 1 1 0 0
#> [44,] 1 1 1 1 1 1 0 1 1 1
#> [45,] 0 1 0 0 0 1 0 0 1 1
#> [46,] 1 0 0 1 1 1 1 0 0 1
#> [47,] 1 1 0 1 0 1 1 1 1 1
#> [48,] 0 1 0 0 0 0 0 0 1 1
#> [49,] 1 0 1 1 0 0 1 1 0 0
#> [50,] 1 0 0 1 1 0 1 0 0 0
#> [51,] 0 1 0 0 0 0 1 0 1 1
#> [52,] 0 1 0 0 1 1 1 0 1 1
#> [53,] 0 1 1 0 1 0 1 1 0 1
#> [54,] 1 1 1 1 0 1 1 1 1 1
#> [55,] 0 1 0 0 0 1 0 0 1 1
#> [56,] 1 0 1 1 1 0 1 1 0 0
#> [57,] 0 1 0 0 1 1 1 0 1 1
#> [58,] 0 1 1 0 1 1 1 1 0 1
#> [59,] 0 1 1 1 0 0 1 1 0 1
#> [60,] 0 0 0 0 1 1 1 0 0 1
#> [61,] 0 0 0 1 0 0 0 0 0 0
#> [62,] 0 1 1 0 0 0 0 1 1 0
#> [63,] 0 0 0 0 0 1 1 1 0 0
#> [64,] 0 1 0 0 1 1 1 0 1 1
#> [65,] 1 1 1 1 0 1 0 1 1 1
#> [66,] 0 1 0 0 1 0 1 1 1 0
#> [67,] 0 1 0 0 0 0 0 1 1 1
#> [68,] 0 1 0 1 1 1 1 0 0 1
#> [69,] 0 1 0 0 1 1 1 0 1 1
#> [70,] 0 1 1 0 0 0 1 1 1 1
#> [71,] 1 1 1 1 1 1 1 1 1 1
#> [72,] 1 0 0 1 0 1 0 0 0 1
#> [73,] 0 1 1 0 0 1 0 1 1 1
#> [74,] 1 1 1 1 0 1 0 1 0 1
#> [75,] 0 0 1 0 0 1 0 1 1 0
#> [76,] 1 0 0 1 1 0 1 1 0 0
#> [77,] 0 1 0 0 0 1 1 1 1 1
#> [78,] 1 1 0 1 1 1 1 1 1 1
#> [79,] 0 0 1 0 1 1 0 1 0 1
#> [80,] 1 1 1 1 0 1 0 1 1 1
#> [81,] 0 1 1 0 0 1 0 1 1 1
#> [82,] 1 0 0 1 0 1 1 0 0 1
#> [83,] 1 1 0 1 0 1 1 0 1 1
#> [84,] 1 1 1 1 1 1 1 1 1 1
#> [85,] 0 0 1 0 0 0 0 1 0 0
#> [86,] 1 0 1 1 1 1 0 0 0 1
#> [87,] 0 1 1 0 0 1 1 1 1 1
#> [88,] 0 1 1 1 0 1 1 1 1 1
#> [89,] 0 1 1 0 1 0 1 1 0 0
#> [90,] 0 0 1 0 0 0 0 1 0 0
#> [91,] 1 1 1 1 1 1 0 1 0 1
#> [92,] 0 1 0 0 1 1 1 0 1 1
#> [93,] 1 0 0 1 0 0 0 0 1 1
#> [94,] 1 1 0 1 0 0 1 0 1 0
#> [95,] 1 1 0 0 1 0 1 1 1 1
#> [96,] 0 0 0 0 0 1 0 0 0 1
#> [97,] 1 0 0 0 1 0 1 0 0 0
#> [98,] 0 1 1 0 1 0 1 1 1 0
#> [99,] 0 1 0 0 1 0 1 0 1 0
#> [100,] 0 0 0 0 1 0 1 1 0 1
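To make the roles of P0 and P1 concrete, the following base-R sketch generates responses under the DINA model, which is simpler than the GDINA model used in Example 1: an examinee who has mastered every attribute an item requires answers correctly with probability P1, and everyone else with probability P0. The toy Q-matrix, attribute profiles, and all `.toy` names are hypothetical, and the code does not reproduce the internals of sim.data.

```r
# Illustrative DINA sketch; all objects here are hypothetical toy values.
set.seed(1)
Q.toy <- matrix(c(1, 0,
                  0, 1,
                  1, 1), ncol = 2, byrow = TRUE)      # 3 items x 2 attributes
alpha.toy <- matrix(c(0, 0,
                      1, 0,
                      0, 1,
                      1, 1), ncol = 2, byrow = TRUE)  # 4 examinees x 2 attributes
P0.toy <- c(0.10, 0.15, 0.20)   # success probability without full mastery
P1.toy <- c(0.90, 0.85, 0.80)   # success probability with full mastery

n.person <- nrow(alpha.toy)
n.item <- nrow(Q.toy)

# eta[n, i] is TRUE when examinee n masters every attribute required by item i
required <- matrix(rowSums(Q.toy), n.person, n.item, byrow = TRUE)
eta <- (alpha.toy %*% t(Q.toy)) == required

# Success probability: P1 where eta is TRUE, P0 otherwise
P0.mat <- matrix(P0.toy, n.person, n.item, byrow = TRUE)
P1.mat <- matrix(P1.toy, n.person, n.item, byrow = TRUE)
p.correct <- P0.mat * (1 - eta) + P1.mat * eta

# Bernoulli draws give the simulated response matrix
resp.toy <- matrix(rbinom(length(p.correct), 1, p.correct), nrow = n.person)
```

Here the first examinee (profile 00) has the P0 values as success probabilities on every item, and the last examinee (profile 11) has the P1 values.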
################################################################
# Example 2 #
# generate data following the mvnorm distribution #
################################################################
set.seed(123)
K <- 5
I <- 10
Q <- sim.Q(K, I)
IQ <- list(
P0 = runif(I, 0.0, 0.2),
P1 = runif(I, 0.8, 1.0)
)
cutoffs <- sample(qnorm(c(1:K)/(K+1)), ncol(Q))
data.obj <- sim.data(Q = Q, N = 10, IQ=IQ, model = "GDINA", distribute = "mvnorm",
control = list(sigma = 0.5, cutoffs = cutoffs))
#> distribute = mvnorm
#> model = GDINA
#> number of attributes: 5
#> number of items: 10
#> num of examinees: 10
#> average of P0 = 0.116
#> average of P1 = 0.926
#> sigma = 0.5
#> cutoffs = -0.967 0 0.431 0.967 -0.431
print(data.obj$dat)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1 1 1 1 1 0 1 1 1 0
#> [2,] 0 1 0 0 1 0 1 0 0 0
#> [3,] 1 1 1 1 1 0 1 1 1 1
#> [4,] 1 1 1 1 1 0 1 1 1 1
#> [5,] 1 1 1 1 1 0 1 1 1 0
#> [6,] 1 1 1 1 1 0 1 1 1 0
#> [7,] 0 0 0 0 0 0 0 0 0 1
#> [8,] 0 1 0 0 1 0 1 0 1 0
#> [9,] 1 1 1 1 0 1 1 1 1 1
#> [10,] 0 0 1 0 0 0 1 1 0 0
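The idea behind distribute = "mvnorm" can be sketched in base R: draw a latent vector per examinee from a multivariate normal distribution, then dichotomize each dimension at its cutoff. The sketch assumes unit variances with all covariances equal to 0.5, and uses the unpermuted cutoffs qnorm(k/(K+1)) from Example 2. These choices and all `.toy` names are assumptions for illustration, not a description of the package's internal code.

```r
# Illustrative sketch; Sigma.toy and the other .toy objects are assumptions.
set.seed(1)
K.toy <- 5
N.toy <- 1000

# Assumed covariance matrix: variances 1, covariances 0.5
Sigma.toy <- matrix(0.5, K.toy, K.toy)
diag(Sigma.toy) <- 1

# Draw N.toy rows from N(0, Sigma.toy) via the Cholesky factor,
# where chol() returns R with t(R) %*% R equal to Sigma.toy
R <- chol(Sigma.toy)
Z <- matrix(rnorm(N.toy * K.toy), N.toy, K.toy) %*% R

# One cutoff per attribute: standard normal quantiles at k / (K + 1)
cut.toy <- qnorm(seq_len(K.toy) / (K.toy + 1))

# Attribute k counts as mastered when the latent value reaches its cutoff
att.toy <- 1 * sweep(Z, 2, cut.toy, ">=")

# Mastery rates should be close to 1 - k / (K + 1)
colMeans(att.toy)
```

Because the cutoffs increase with k, later attributes are mastered by fewer simulated examinees, while the positive covariances make mastery of different attributes positively related.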
#################################################################
# Example 3 #
# generate data following the horder distribution #
#################################################################
set.seed(123)
K <- 5
I <- 10
Q <- sim.Q(K, I)
IQ <- list(
P0 = runif(I, 0.0, 0.2),
P1 = runif(I, 0.8, 1.0)
)
theta <- rnorm(10, 0, 1)
b <- seq(-1.5,1.5,length.out=K)
data.obj <- sim.data(Q = Q, N = 10, IQ=IQ, model = "GDINA", distribute = "horder",
control = list(theta = theta, a = 1.5, b = b))
#> distribute = horder
#> model = GDINA
#> number of attributes: 5
#> number of items: 10
#> num of examinees: 10
#> average of P0 = 0.116
#> average of P1 = 0.926
#> theta_mean = -0.625 , theta_sd = 0.906
#> a = 1.5
#> b = -1.5 -0.75 0 0.75 1.5
print(data.obj$dat)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0 0 0 0 0 1 0 0 0 0
#> [2,] 1 1 0 1 1 1 1 1 1 1
#> [3,] 0 0 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 0 0 1 0
#> [5,] 1 1 0 0 0 0 0 0 1 1
#> [6,] 1 0 0 1 0 1 0 0 0 1
#> [7,] 0 1 0 1 0 1 0 0 1 1
#> [8,] 0 1 0 0 0 0 0 0 1 0
#> [9,] 0 0 0 0 0 0 0 0 0 0
#> [10,] 0 0 1 0 1 1 1 1 0 1
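A rough base-R sketch of the higher-order idea follows: each attribute is mastered with a probability that depends on a single latent ability theta through a logistic function. The parameterization logit P(alpha_k = 1 | theta) = a * theta - b_k is an assumption chosen for illustration; the sign and scaling that sim.data applies to b may differ, and all `.toy` names are hypothetical.

```r
# Illustrative sketch; the logistic form below is an assumed parameterization.
set.seed(1)
K.toy <- 5
N.toy <- 1000

theta.toy <- rnorm(N.toy)                        # higher-order abilities
a.toy <- 1.5                                     # common slope
b.toy <- seq(-1.5, 1.5, length.out = K.toy)      # one value per attribute

# ASSUMPTION: logit of mastery probability is a * theta - b_k
# outer() builds the N.toy x K.toy matrix of a * theta_n - b_k
p.master <- plogis(outer(a.toy * theta.toy, b.toy, "-"))

# Bernoulli draws give the simulated attribute profiles
att.toy <- matrix(rbinom(N.toy * K.toy, 1, p.master), N.toy, K.toy)

colMeans(att.toy)
```

Under this assumed form, attributes with larger b are harder to master, and examinees with higher theta are more likely to master every attribute.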