Validation of Final Residency Tests Based on the Classical Model in Medical University of Mashhad (A Case Study Improving Dermatology, Ophthalmology, Obstetrics and Gynecology)

Document Type: Original Article


Department of Educational Science, Payame Noor University, Tehran, IRAN


Background: Evaluating students' educational progress is one of the main goals of universities, and the final test is one of the most important means to this end. This study examines the results of the final residency tests in dermatology, ophthalmology, and obstetrics and gynecology held at the Medical University of Mashhad in Tir 1391 (June-July 2012).
Methods: In this applied study, the population consisted of the answer sheets of residents in dermatology, ophthalmology, and obstetrics and gynecology at the Medical University of Mashhad. A total of 113 answer sheets were analyzed with the classical model, which covers test reliability, item difficulty and the item discrimination coefficient. Item difficulty is the proportion of test takers who answer the question correctly, and the discrimination coefficient is the point-biserial correlation between the question and the total score. The questions retained in a final test are usually those with a satisfactory level of difficulty, suited to the target group, and with a high discrimination coefficient. Exams that lead to the issuing of the residency certificate are called summative tests. In the classical model, the analysis of such tests depends on the questions at either extreme of difficulty, on the one hand, and on the discrimination index of the questions, on the other.
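The three classical indices named in the Methods can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' procedure: the response matrix is hypothetical, the function names are invented for this example, the point-biserial is computed against the uncorrected total score (the item itself is included in the total), and KR-20 is assumed as the reliability coefficient because the abstract does not say which one was used.

```python
from math import sqrt

# Hypothetical 0/1 response matrix: one row per test taker, one column per question.
# These values are invented for illustration and are not data from the study.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]


def item_difficulty(data, item):
    """Proportion of test takers who answered the item correctly."""
    return sum(row[item] for row in data) / len(data)


def point_biserial(data, item):
    """Pearson correlation between a 0/1 item score and the total score.

    Returns None when the correlation is undefined, for example when
    every test taker answers the item the same way.
    """
    n = len(data)
    totals = [sum(row) for row in data]
    scores = [row[item] for row in data]
    mean_x = sum(scores) / n
    mean_y = sum(totals) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, totals))
    var_x = sum((x - mean_x) ** 2 for x in scores)
    var_y = sum((y - mean_y) ** 2 for y in totals)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def kr20(data):
    """Kuder-Richardson 20 reliability (an assumed choice of coefficient)."""
    n = len(data)
    k = len(data[0])
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    if k < 2 or var_t == 0:
        return None
    pq = sum(p * (1 - p) for p in (item_difficulty(data, j) for j in range(k)))
    return k / (k - 1) * (1 - pq / var_t)


if __name__ == "__main__":
    for j in range(len(responses[0])):
        print(j, item_difficulty(responses, j), point_biserial(responses, j))
    print("KR-20:", kr20(responses))
```

In this toy matrix the first question is answered correctly by everyone, so its difficulty is 1.0 and its discrimination is undefined; it is an example of a question at one extreme that cannot separate stronger from weaker test takers.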
Results: The questions at either extreme of difficulty were identified. Vague questions, questions with no discriminating power and other incompatible questions were excluded; the remaining questions served as the basis for decision-making and for ranking the test takers.
Conclusions: 1) The ranking of test takers, as influenced by changes in the group, was studied. 2) All the questions in each test are compatible in both form and content, so that they share a common feature. 3) The questions were screened with specific formulae and the final test questions were determined. 4) The questions are chosen so that the test taker has to produce an answer rather than pick one from among alternatives. 5) The questions retained in the final test reflect general scientific progress rather than a particular course within that discipline.

