Multivariate Generalization Analysis for Students’ Evaluation on Teaching Level of Teachers in Medical Colleges
Abstract:
Objective To evaluate the reliability of the Student Evaluation Scale for the Teaching Level of Teachers in Medical Universities (SESTLTMU) using multivariate generalizability theory (MGT), to suggest how the number of items in each domain could be optimized, and to determine an appropriate number of students per class for student evaluation of teaching.
Methods Data from 422 students at a medical university who completed the scale were analyzed with mGENOVA, a dedicated program for multivariate generalizability analysis. The variance components of the various error sources were estimated in a generalizability study (G-study). A series of decision studies (D-studies) that varied the number of items and the number of students then gave the generalizability coefficient (G) and the index of dependability (Φ) under each condition, which were used to evaluate the reliability of the scale.
Results In the G-study, the variance component for students nested within teachers was the largest in every domain. In the D-study at the domain level, both G and Φ exceeded 0.80 in three of the five domains; in the teaching organization and teaching method domains, G exceeded 0.80 but Φ lay between 0.70 and 0.80. For the total scale, the composite G and composite Φ coefficients both exceeded 0.85. Keeping G at or above 0.80 requires at least 25 students per class, and keeping Φ at or above 0.80 requires at least 28 students per class.
Conclusions Based on the MGT results, the scale has good overall reliability. If the scale is revised, the content of the teaching organization and teaching method domains could be adjusted. In the practice of student evaluation of teaching in universities, sampling 28 students per class is the most appropriate choice, since it keeps both G and Φ at or above 0.80.
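The D-study step described in the Methods can be sketched for a single domain. This minimal Python sketch assumes that each domain follows a design with students nested within teachers and crossed with items, (s:t) × i, with the teacher as the object of measurement; that is our reading of the effects in Table 1, not a design notation stated in the source. The per-teacher sample size `ASSUMED_N_S = 73` is hypothetical: the source does not report it, and we back-solved it from the error variances in Table 2. The class and function names are illustrative and are not part of mGENOVA.

```python
from dataclasses import dataclass


@dataclass
class DomainComponents:
    """G-study variance components for one domain (assumed (s:t) x i design)."""
    t: float     # teachers: universe-score variance
    s_t: float   # students nested within teachers
    i: float     # items
    ti: float    # teacher x item interaction
    si_t: float  # student x item interaction within teachers (residual)


def d_study(vc: DomainComponents, n_s: float, n_i: int) -> dict:
    """Error variances, G and Phi for n_s students per teacher and n_i items."""
    rel_err = vc.s_t / n_s + vc.ti / n_i + vc.si_t / (n_s * n_i)
    abs_err = rel_err + vc.i / n_i  # absolute error also counts the item main effect
    return {
        "delta": rel_err,
        "Delta": abs_err,
        "G": vc.t / (vc.t + rel_err),
        "Phi": vc.t / (vc.t + abs_err),
    }


# Teaching organization domain, components taken from Table 1.
organization = DomainComponents(t=0.0200, s_t=0.2235, i=0.0142, ti=0.0063, si_t=0.2083)
ASSUMED_N_S = 73  # hypothetical students per teacher, back-solved from Table 2

for n_items in (5, 6, 7):  # item counts of models 1-3 in Table 4
    r = d_study(organization, ASSUMED_N_S, n_items)
    print(f"n_i={n_items}: G={r['G']:.4f}, Phi={r['Phi']:.4f}")
```

With these inputs the printed coefficients for 5, 6 and 7 items land within about 0.002 of the teaching organization row of Table 4; the small gaps come from the rounded components and the assumed class size. The sketch also makes visible why Φ never exceeds G: the absolute error adds the non-negative item term to the relative error.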
Table 1. Estimated variance and covariance components for each domain
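| Effect | Domain | Teaching organization | Teaching content | Teaching method | Teaching attitude | Teaching effect |
|---|---|---|---|---|---|---|
| t | Teaching organization | **0.0200** | 1.0282 | 1.0582 | 1.0586 | 1.0764 |
| | Teaching content | 0.0262 | **0.0324** | 1.0068 | 1.0223 | 1.0118 |
| | Teaching method | 0.0333 | 0.0404 | **0.0497** | 0.9881 | 1.0431 |
| | Teaching attitude | 0.0243 | 0.0299 | 0.0358 | **0.0264** | 1.0172 |
| | Teaching effect | 0.0325 | 0.0389 | 0.0496 | 0.0353 | **0.0455** |
| s:t | Teaching organization | **0.2235** | | | | |
| | Teaching content | 0.1911 | **0.1908** | | | |
| | Teaching method | 0.2055 | 0.1976 | **0.2648** | | |
| | Teaching attitude | 0.1726 | 0.1643 | 0.1820 | **0.1901** | |
| | Teaching effect | 0.2112 | 0.2020 | 0.2686 | 0.1991 | **0.3510** |
| i | | 0.0142 | 0.0138 | 0.0774 | 0.0012 | 0.0022 |
| ti | | 0.0063 | 0.0008 | 0.0344 | 0.0073 | 0.0031 |
| si:t | | 0.2083 | 0.1472 | 0.3348 | 0.1083 | 0.1697 |

Note: Bold values on the diagonal are the variance components of each effect; values above the diagonal are correlation coefficients, and values below the diagonal are the covariance components between domains. No correlations are listed for s:t.

Table 2. D-study results based on the original test length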
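| Index | Teaching organization ($n'_i=5$) | Teaching content ($n'_i=7$) | Teaching method ($n'_i=8$) | Teaching attitude ($n'_i=7$) | Teaching effect ($n'_i=6$) | Total scale ($n'_i=33$) |
|---|---|---|---|---|---|---|
| $\sigma^2_P$ | 0.0200 | 0.0324 | 0.0497 | 0.0264 | 0.0450 | 0.0356 |
| $\sigma^2_\delta$ | 0.0049 | 0.0030 | 0.0085 | 0.0039 | 0.0058 | 0.0033 |
| $\sigma^2_\Delta$ | 0.0078 | 0.0050 | 0.0182 | 0.0041 | 0.0061 | 0.0040 |
| $\sigma^2_{X_{Pl}}$ | 0.0078 | 0.0091 | 0.0213 | 0.0062 | 0.0106 | 0.0085 |
| G | 0.8023 | 0.9145 | 0.8535 | 0.8720 | 0.8878 | 0.9152 |
| Φ | 0.7203 | 0.8664 | 0.7318 | 0.8671 | 0.8816 | 0.8981 |

Note: $\sigma^2_P$: universe score variance; $\sigma^2_\delta$: relative error variance; $\sigma^2_\Delta$: absolute error variance; $\sigma^2_{X_{Pl}}$: error variance when the sample mean is used to estimate the universe score; G: generalizability coefficient; Φ: index of dependability.

Table 3. Comparison between the proportion of items (weight) and the variance contribution of each domain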
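| Index | Teaching organization | Teaching content | Teaching method | Teaching attitude | Teaching effect |
|---|---|---|---|---|---|
| Number of items | 5 | 7 | 8 | 7 | 6 |
| Proportion of items in the domain / weight (%) | 15.15 | 21.21 | 24.24 | 21.21 | 18.18 |
| Contribution of the domain universe score to the composite universe score variance (%) | 11.79 | 20.26 | 28.76 | 18.29 | 20.89 |
| Absolute difference between variance contribution and item proportion (%) | −3.36 | −0.95 | 4.52 | −2.92 | 2.71 |
| Relative difference between variance contribution and item proportion (%) | −22.19 | −4.49 | 18.64 | −13.78 | 14.89 |

Note: Absolute difference = variance contribution − item proportion; relative difference = (variance contribution − item proportion) / item proportion × 100%.

Table 4. Comparison of the two reliability coefficients of each domain and the total scale under different test lengths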
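| Domain | Items (Model 1) | Items (Model 2) | Items (Model 3) | G (Model 1) | G (Model 2) | G (Model 3) | Φ (Model 1) | Φ (Model 2) | Φ (Model 3) |
|---|---|---|---|---|---|---|---|---|---|
| Teaching organization | 5 | 6 | 7 | 0.8023 | 0.8123 | 0.8196 | 0.7203 | 0.7411 | 0.7567 |
| Teaching content | 7 | 6 | 4 | 0.9145 | 0.9128 | 0.9069 | 0.8664 | 0.8574 | 0.8272 |
| Teaching method | 8 | 9 | 10 | 0.8535 | 0.8615 | 0.8681 | 0.7318 | 0.7497 | 0.7646 |
| Teaching attitude | 7 | 6 | 4 | 0.8720 | 0.8660 | 0.8457 | 0.8671 | 0.8603 | 0.8376 |
| Teaching effect | 6 | 5 | 3 | 0.8878 | 0.8847 | 0.8724 | 0.8816 | 0.8773 | 0.8605 |
| Total scale | 33 | 32 | 28 | 0.9152 | 0.9135 | 0.9088 | 0.8981 | 0.8937 | 0.8818 |

Table 5. Comparison of the two reliability coefficients of the total scale under different sample sizes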
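| Sample model | Total sample size | Generalizability coefficient (G) | Index of dependability (Φ) |
|---|---|---|---|
| Model A | 420 | 0.9152 | 0.8981 |
| Model B | 281 | 0.8815 | 0.8600 |
| Model C | 212 | 0.8524 | 0.8376 |
| Model D | 140 | 0.7944 | 0.7815 |
| Model E | 450 | 0.9289 | 0.9112 |
| Model F | 300 | 0.9010 | 0.8844 |
| Model G | 150 | 0.8264 | 0.8124 |
| Model H | 140 | 0.8168 | 0.8031 |
| Model I | 135 | 0.8115 | 0.7980 |
| Model J | 125 | 0.8000 | 0.7868 |
| Model K | 100 | 0.7633 | 0.7513 |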
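The composite results in Tables 3 and 5 can be approximately reproduced from the Table 1 components. This minimal Python sketch assumes a multivariate design in which every student rates all five domains, so the students-within-teachers effects covary across domains, while each domain has its own items, so the item-related effects (i, ti, si:t) do not. It also assumes five teachers (the Table 5 totals are five times the per-class counts) and domain weights equal to the item proportions in Table 3. These design details are our inference from the tables rather than statements in the source, and all names in the code are hypothetical.

```python
# Sketch of the composite (multivariate) D-study behind Tables 3 and 5.
# Domain order: organization, content, method, attitude, effect.
N_ITEMS = [5, 7, 8, 7, 6]
W = [n / sum(N_ITEMS) for n in N_ITEMS]  # weights = item proportions (Table 3)
N_TEACHERS = 5  # assumed: Table 5 totals are 5 x the per-class counts

# Teacher (universe-score) variance-covariance matrix, Table 1.
T_COV = [
    [0.0200, 0.0262, 0.0333, 0.0243, 0.0325],
    [0.0262, 0.0324, 0.0404, 0.0299, 0.0389],
    [0.0333, 0.0404, 0.0497, 0.0358, 0.0496],
    [0.0243, 0.0299, 0.0358, 0.0264, 0.0353],
    [0.0325, 0.0389, 0.0496, 0.0353, 0.0455],
]
# Students-within-teachers variance-covariance matrix, Table 1.
ST_COV = [
    [0.2235, 0.1911, 0.2055, 0.1726, 0.2112],
    [0.1911, 0.1908, 0.1976, 0.1643, 0.2020],
    [0.2055, 0.1976, 0.2648, 0.1820, 0.2686],
    [0.1726, 0.1643, 0.1820, 0.1901, 0.1991],
    [0.2112, 0.2020, 0.2686, 0.1991, 0.3510],
]
# Item-related components (Table 1), assumed uncorrelated across domains
# because each domain has its own items.
I_VAR = [0.0142, 0.0138, 0.0774, 0.0012, 0.0022]
TI_VAR = [0.0063, 0.0008, 0.0344, 0.0073, 0.0031]
SIT_VAR = [0.2083, 0.1472, 0.3348, 0.1083, 0.1697]


def weighted_quad(w, cov):
    """Return w' C w for weight vector w and symmetric matrix cov."""
    k = len(w)
    return sum(w[a] * w[b] * cov[a][b] for a in range(k) for b in range(k))


def contributions(w, cov):
    """Percentage contribution of each domain to the composite variance."""
    k = len(w)
    total = weighted_quad(w, cov)
    return [100 * w[a] * sum(w[b] * cov[a][b] for b in range(k)) / total
            for a in range(k)]


def composite_d_study(n_s):
    """Composite G and Phi for n_s students per teacher, original item counts."""
    univ = weighted_quad(W, T_COV)
    rel = weighted_quad(W, ST_COV) / n_s + sum(
        W[v] ** 2 * (TI_VAR[v] / N_ITEMS[v] + SIT_VAR[v] / (n_s * N_ITEMS[v]))
        for v in range(len(W)))
    absolute = rel + sum(W[v] ** 2 * I_VAR[v] / N_ITEMS[v] for v in range(len(W)))
    return univ / (univ + rel), univ / (univ + absolute)


print("Contribution (%):", [round(c, 2) for c in contributions(W, T_COV)])
for n_s in (20, 25, 27, 28, 30, 60, 90):
    g, phi = composite_d_study(n_s)
    print(f"{n_s} per class ({N_TEACHERS * n_s} total): G={g:.4f}, Phi={phi:.4f}")

# Smallest class size that keeps the composite Phi at or above 0.80.
min_for_phi = next(n for n in range(1, 201) if composite_d_study(n)[1] >= 0.80)
print("Minimum students per class for Phi >= 0.80:", min_for_phi)
```

Under these assumptions the contributions agree with Table 3 to within 0.01 percentage points, the coefficients for 20 to 90 students per class match models K, J, I, H, G, F and E of Table 5 to within about 0.001, and the smallest class size giving Φ ≥ 0.80 is 28. With the rounded Table 1 inputs, G at 25 students per class reaches 0.8000 only after rounding to four decimals, so that threshold is sensitive to the precision of the components.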