Teoría de la Respuesta al Ítem: fundamentos y modelos

José Antonio López Pina

DOI: https://doi.org/10.6018/editum.3178

Abstract

This handbook draws on the author's more than 40 years of working with and teaching Item Response Theory (IRT) models and Rasch modeling. Although it builds on an earlier edition on IRT published in 1995, it has undergone substantial changes in both organization and content, because the first edition was too narrowly restricted to dichotomous models and offered hardly any examples of how these models could be implemented in real-world settings.

Although the Rasch modeling and IRT perspectives diverge in how they approach measurement with psychometric instruments, this edition makes an effort to present all the models together, while devoting one chapter exclusively to the Rasch model, separate from the remaining chapters. Both the polytomous models and the goodness-of-fit procedures, however, are presented jointly in order to offer a unified view of these models, and the examples that accompany the book make it possible to discern which goodness-of-fit methods are most appropriate for the model implemented.

Chapter 1 presents the basic aspects of measurement in the Social and Health Sciences in order to establish the need for measurement standards. Chapter 2 presents the Rasch model separately from the other IRT models. Chapter 3 presents the dichotomous IRT models, stressing that the one-parameter (1-p) logistic model and the Rasch model use the same logistic function but are essentially different models. Chapter 4 covers the methods currently used for parameter estimation and includes a practical example of how parameters are estimated. Chapter 5 presents the procedures most commonly used in current practice to study the goodness of fit of the data to the model, or of the model to the data. Chapter 6 presents the most common polytomous models together, and finally Chapter 7 presents some of the methods used to evaluate item bias (DIF).
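As a rough illustration of the logistic function that the chapter 3 summary says is shared by the Rasch model and the 1-p logistic model, the sketch below computes the probability of a correct response from a person's ability and an item's difficulty. The function name `logistic_irf`, the symbols `theta`, `b` and `a`, and the choice of fixing `a = 1` by default are assumptions made here for illustration; the book's own notation and scaling may differ, and the conceptual differences between the two models that the book discusses are not captured by this code.

```python
import math


def logistic_irf(theta, b, a=1.0):
    """Probability of a correct response under a logistic item response function.

    theta: person ability (assumed symbol)
    b:     item difficulty (assumed symbol)
    a:     discrimination; the default of 1.0 is an assumption for this sketch
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# When ability equals difficulty, the probability of success is 0.5.
p_matched = logistic_irf(theta=0.0, b=0.0)

# A person whose ability exceeds the item difficulty has a higher probability.
p_above = logistic_irf(theta=1.0, b=0.0)

print(p_matched, p_above)
```

Leaving `a` as an argument means the same function could also sketch a two-parameter logistic curve, although that use is not taken from the source.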

This handbook is the result of many years of teaching these models in the undergraduate Psychology degree, so it is aimed at readers who have some knowledge of Statistics and want to start learning one of the models presented. The handbook develops the theoretical aspects of the models, and readers also have access to several examples, with real datasets, implemented in R packages. These documents and the corresponding datasets can be downloaded from https://webs.um.es/jlpina.

Table of contents

1. Measurement and Scaling

1.1. Introduction

1.2. Fundamental measurement

1.3. Measurement in Psychology

1.4. Test theory and scaling

1.5. Measurement models

1.6. Basic concepts in IRT

2. Rasch Model

2.1. Introduction

2.2. Functional relationship between ability and probability of success

2.3. Formulation of the Rasch model

2.4. Definition of the parameters

2.5. Indeterminacy of the parameters

2.6. Item Response Function

2.7. Principles of the Rasch model

2.8. Specific objectivity

2.9. Parameter invariance

3. Dichotomous models in IRT

3.1. Introduction

3.2. One-parameter (1-p) logistic model

3.3. Two-parameter (2-p) logistic model

3.4. Three-parameter (3-p) model

3.5. Test Response Function

3.6. Information Function

3.7. Test Information Function

3.8. Parameter invariance in IRT

4. Parameter estimation

4.1. Introduction

4.2. Parameter estimation methods

4.3. Likelihood function

4.4. Conditional maximum likelihood method

4.5. Standard errors of the parameters

4.6. Problems in parameter estimation

4.7. Other parameter estimation methods

5. Goodness of fit

5.1. Introduction

5.2. Statistics for testing parameter invariance

5.3. Fit statistics based on χ²

5.4. Appropriateness measurement

5.5. Assessment of dimensionality

5.6. Assessment of local independence

5.7. Separation indices

5.8. Model comparison

5.9. Graphical procedures

5.10. Summary of statistical procedures

6. Polytomous models

6.1. Introduction

6.2. Partial Credit Model

6.3. Rating Scale Model

6.4. Category analysis

6.5. Item Information Function

6.6. Generalized Partial Credit Model

6.7. Graded Response Model

7. Item bias

7.1. Introduction

7.2. Observed conditional invariance methods

7.3. Unobserved conditional invariance methods

Appendices

References

Subject index

Author index

How to cite (APA 7th)

López Pina, J. A. (2026). Teoría de la Respuesta al Ítem: fundamentos y modelos. Editum. Ediciones de la Universidad de Murcia. https://doi.org/10.6018/editum.3178
