Teaching Evaluation Resources

Reference | Title | Folder | Sub Folder | Format | Type of document (news article, presentation, report, research article, table…) | Type of Evaluation (Self / Peer / Student / Other) | Type of resource | Summary (much of the text is adapted from the respective source documents) | Abstract
Laverty JT, Underwood SM, Matz RL, Posey LA, Carmel JH, Caballero MD, et al. (2016) Characterizing College Science Assessments: The Three-Dimensional Learning Assessment Protocol. PLoS ONE 11(9): 1-21. | PLOS-One Characterizing Science Assessments | Emily Miller (AAU) | None | PDF | article - research | x | protocol
Three-dimensional learning emphasizes scientific and engineering practices, crosscutting concepts, and disciplinary core ideas. This article introduces an assessment protocol for three-dimensional learning in college science courses, specifically supporting assessment tasks in physics, chemistry, and biology. The paper covers the protocol's development process, validity, and reliability.
Many calls to improve science education in college and university settings have focused on improving instructor pedagogy. Meanwhile, science education at the K-12 level is undergoing significant changes as a result of the emphasis on scientific and engineering practices, crosscutting concepts, and disciplinary core ideas. This framework of “three-dimensional learning” is based on the literature about how people learn science and how we can help students put their knowledge to use. Recently, similar changes are underway in higher education by incorporating three-dimensional learning into college science courses. As these transformations move forward, it will become important to assess three-dimensional learning both to align assessments with the learning environment, and to assess the extent of the transformations. In this paper we introduce the Three-Dimensional Learning Assessment Protocol (3D-LAP), which is designed to characterize and support the development of assessment tasks in biology, chemistry, and physics that align with transformation efforts. We describe the development process used by our interdisciplinary team, discuss the validity and reliability of the protocol, and provide evidence that the protocol can distinguish between assessments that have the potential to elicit evidence of three-dimensional learning and those that do not.
Miller, E. (2018, June). Changing the Culture to Recognize and Reward Teaching at Research Universities. In Exploring Practical Ways to Inspire and Research Teaching Effectiveness and Instructional Innovation, the symposium of the Center for Education Innovation and Learning in the Sciences at UCLA. | UCLA Teaching Eval Keynote - Emily Miller | Emily Miller (AAU) | None | PDF | presentation | xxx | policy, overview
Keynote talk centered on the AAU Undergraduate STEM Education Initiative, which seeks to influence the culture of STEM departments at AAU universities so that faculty members are encouraged to use teaching practices proven by research to effectively engage students in STEM education and help them learn. Covers the following: institutional policy goals; university promotion and tenure policies; evaluations used to measure effective teaching; data from faculty and chairs; examples of effective evaluation strategies; a career framework for university teaching; evidence of approach and impact; and essential questions (for evidence-based teaching and for the use of effectiveness measures in the review, promotion, and tenure process).
NA
American Association for the Advancement of Science. (2012). Describing and Measuring Undergraduate STEM Teaching Practices, report from the national meeting on the measurement of undergraduate science, technology, engineering, and mathematics (STEM) teaching, Dec 17-19, 2012. | Measuring STEM Teaching Practices | Emily Miller (AAU) | None | PDF | report | xxx | overview, instruments, protocols
Report identifies four basic measurement techniques (surveys, interviews, observations, and portfolios), provides an overview of the strengths and weaknesses of each, identifies and summarizes specific protocols and measurement tools within each technique, and gives references for further details. An important conclusion is that the best descriptions of STEM teaching involve the use of multiple techniques.
NA
Flaherty, C. (2018, May 22). Teaching Eval Shake-Up. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2018/05/22/most-institutions-say-they-value-teaching-how-they-assess-it-tells-different-story | Most institutions say they value teaching but how they assess it tells a different story | Emily Miller (AAU) | None | PDF | article - scholarly | xx | commentary
Brief article on USC doing away with student evaluations of teaching (SETs) in tenure and promotion decisions. Discusses the evidence of bias in SETs and the proposed shift at USC toward peer review as evidence of teaching effectiveness. Also notes that UC Berkeley, the University of Oregon, and others are headed in similar directions.
NA
Dennin, M., Schultz, Z. D., Feig, A., Finkelstein, N., Greenhoot, A. F., Hildreth, M., . . . Miller, E. R. (2017, Winter). Aligning practice to policies: Changing the culture to recognize and reward teaching at research universities. CBE—Life Sciences Education, 16(5), 1-8. | CBE--Life Sciences Article | Emily Miller (AAU) | None | PDF | article - research | xxx | policy, framework, rubric
This essay discusses efforts to improve the quality of undergraduate STEM education by improving instruction. It reports on the gap between teaching policies and practices within an institution and offers strategies intended to provide guidance on how institutions can more effectively align their practices for valuing teaching with the stated priorities in their formal policies. The essay concludes with profiles of three institutional examples drawing upon strategies to assess and reward contributions to teaching.
Recent calls for improvement in undergraduate education within STEM (science, technology, engineering, and mathematics) disciplines are hampered by the methods used to evaluate teaching effectiveness. Faculty members at research universities are commonly assessed and promoted mainly on the basis of research success. To improve the quality of undergraduate teaching across all disciplines, not only STEM fields, requires creating an environment wherein continuous improvement of teaching is valued, assessed, and rewarded at various stages of a faculty member’s career. This requires consistent application of policies that reflect well-established best practices for evaluating teaching at the department, college, and university levels. Evidence shows most teaching evaluation practices do not reflect stated policies, even when the policies specifically espouse teaching as a value. Thus, alignment of practice to policy is a major barrier to establishing a culture in which teaching is valued. Situated in the context of current national efforts to improve undergraduate STEM education, including the Association of American Universities Undergraduate STEM Education Initiative, this essay discusses four guiding principles for aligning practice with stated priorities in formal policies: 1) enhancing the role of deans and chairs; 2) effectively using the hiring process; 3) improving communication; and 4) improving the understanding of teaching as a scholarly activity. In addition, three specific examples of efforts to improve the practice of evaluating teaching are presented as examples: 1) Three Bucket Model of merit review at the University of California, Irvine; (2) Evaluation of Teaching Rubric, University of Kansas; and (3) Teaching Quality Framework, University of Colorado, Boulder. These examples provide flexible criteria to holistically evaluate and improve the quality of teaching across the diverse institutions comprising modern higher education.
Miller, E. (2018, June 6). Matrix of Teaching Evaluation Efforts and Resources at Various Institutions. Association of American Universities. | Matrix PT 06.06.18 | Emily Miller (AAU) | None | PDF | reference (teaching evaluation efforts nationwide) | xxxx | shared practices
Table provides information on the teaching evaluation efforts of 23 different projects at various institutions nationwide. Lists the institution and project, the institutional level of the project, the unit name, and the evaluation strategy. Also describes each teaching evaluation effort, includes links to related resources and websites, and cites the leadership and contact information for each project.
NA
Commission on the Future of Undergraduate Education. (2017). The Future of Undergraduate Education, The Future of America, report from American Academy of Arts & Sciences, Cambridge, MA. | The Future of Undergraduate Education | Emily Miller (AAU) | None | PDF | report | xxxx | policy, best practices
This report offers recommendations to improve the undergraduate experience so that students in every program and institution receive the education they need to succeed in the twenty-first century. Each of the first three sections offers a comprehensive national strategy for one of three respective recommendations: 1. Ensure that all students have high-quality educational experiences. (Teaching effectiveness and assessment is included in this first section.) 2. Increase overall completion rates and reduce inequities among different student populations at every level of undergraduate education. 3. Manage college costs and improve the affordability of undergraduate education.
The fourth and final section of the report takes a more speculative approach, looking to a future through the lenses of several factors—each plausible and pertinent to the Commission’s principal goals of quality, completion, and affordability—which could move in very different directions: our country’s level of social cohesion; the characteristics of the workforce; the level of access to information and educational technologies; and unforeseen natural or human-generated global challenges. The report ends by offering priority research areas to advance the work toward a strengthened and more affordable undergraduate education for a greater share of Americans. Throughout the report, promising practices are highlighted either in green or included under a green “Promising Practice” banner.
NA
Career Framework for University Teaching: Examples of evidence that could be included in a promotion case for each level of teaching achievement, structured within four evidence domains | Evidence Summary Table | Emily Miller (AAU) | None | PDF | guide (teaching evaluation) | xxxx | guidelines, framework
This table has four columns that correspond to different ways to measure teaching effectiveness: self-assessment, professional activities, measures of student learning, and peer review and recognition. Its five rows represent progressively higher levels of teaching achievement: effective teacher, skilled and collegial teacher, scholarly teacher, institutional leader, and national and global leader. In essence, it recommends four types of evidence of teaching effectiveness that could be used to assess teaching achievement at progressively higher levels of competence.
NA
Association of American Universities. (2017). Progress toward achieving systemic change: A Five-Year Status Report on the AAU Undergraduate STEM Education Initiative. Washington, D.C. | AAU Report Common Data Sections | Emily Miller (AAU) | None | PDF | report | xx | best practices, framework
Report highlights institutional progress in improving the quality of undergraduate STEM teaching and learning resulting from the Undergraduate STEM Education Initiative. The report provides detailed analysis of STEM educational reforms at eight seed-funded AAU STEM project sites. It also identifies key factors necessary to achieve systemic improvements in STEM instruction and highlights numerous evidence-based educational reforms implemented at AAU universities. This PDF document includes only excerpts from the report—i.e., the common data and other sections.
To document cross-institutional effects, AAU collected data from all project sites, including a survey of instructors in participating departments; department chair narratives on policy and practice to assess teaching in the promotion and tenure process; and campus and department level assessments of learning spaces. AAU also collected data on campus infrastructure, including learning spaces, from project sites. The PDF further includes data on institutional commitment to align faculty rewards to evidence-based teaching practices, and it ends with a case study on the Teaching Quality Framework (UC Boulder).
NA
Miller, E. R., Fairweather, J. S., Slakey, L., Smith, T., & King, T. (2017). Catalyzing Institutional Transformation: Insights from the AAU STEM Initiative. Change: The Magazine of Higher Learning, 49(5), 36-45. | Catalyzing institutional transformation: Insights from the AAU STEM Initiative | Emily Miller (AAU) | None | PDF | article - scholarly | x | best practices, overview
This article reports on the AAU’s STEM Education Initiative, an effort to improve the instructional quality and effectiveness of undergraduate introductory STEM courses, primarily through sustainable implementation of evidence-based methods of instruction. The focus of the initiative is to influence the culture of STEM departments at AAU universities so that faculty members are encouraged to use teaching practices proven by research to be effective in engaging students in STEM education and helping them learn.
The article gives recommendations for successful institutionalization of UG STEM education reforms. It identifies the implementation, dissemination, and institutionalization strategies from eight pilot projects and from a growing network of AAU research universities committed to improving undergraduate STEM teaching and learning.
NA
Wieman, C. (2015). A Better Way to Evaluate Undergraduate Teaching. Change: The Magazine of Higher Learning, 47(1), 6-15. | A Better Way to Evaluate Undergraduate Teaching | Emily Miller (AAU) | None | PDF | article - scholarly | xx | critique, instrument, rubric
This paper examines the context in which teaching evaluation is done at research universities. It then considers the requirements for methods of evaluating teaching quality in higher education (i.e., validity, meaningful comparisons, fairness, practicality, and improvement) and how well current evaluation methods (i.e., student course evaluations, teaching portfolios, and direct measures of student learning) meet those requirements. Finally, it proposes a new method based on the notion that the teaching methods used by an instructor are a more accurate proxy for teaching effectiveness than anything else that is practical to measure—a concept that has emerged from the results of STEM education research. This new method assumes that one can evaluate the quality of teaching by looking only at the practices used by the teacher. It employs a Teaching Practices Inventory. The remainder of the article explores this inventory, including its main categories, a rubric used to score it, and the inventory's benefits, limitations, and concerns.
NA
Dennin, M., Schultz, Z. D., Feig, A., Finkelstein, N., Greenhoot, A. F., Hildreth, M., . . . Miller, E. R. (2017, Winter). Aligning practice to policies: Changing the culture to recognize and reward teaching at research universities. CBE—Life Sciences Education, 16(5), 1-8. | Aligning practice to policies: Changing the culture to recognize and reward teaching at research universities | Emily Miller (AAU) | None | PDF | article - research | xxx | policy, framework, rubric
This essay discusses efforts to improve the quality of undergraduate STEM education by improving instruction. It reports on the gap between teaching policies and practices within an institution and offers strategies intended to provide guidance on how institutions can more effectively align their practices for valuing teaching with the stated priorities in their formal policies. The essay concludes with profiles of three institutional examples drawing upon strategies to assess and reward contributions to teaching.
Also included in this document are materials supplemental to the article. First is a policy document, Appointment and Promotion – Review Appraisal and Committees, which includes policies and instructions concerning appointment to various professorships and lecturer positions; it ends with a section on authority. Second is the AAU Undergraduate STEM Education Initiative Project Site Baseline Data Summary Report (Dec 11, 2014). It includes the following: a summary report on the AAU STEM Initiative Baseline Instructor Survey, a summary report on campus infrastructure, a summary report on evaluation of teaching, a final AAU baseline data collection, baseline data requirements by campus role/checklist, and a survey for instructors in departments participating in the AAU initiative.
Recent calls for improvement in undergraduate education within STEM (science, technology, engineering, and mathematics) disciplines are hampered by the methods used to evaluate teaching effectiveness. Faculty members at research universities are commonly assessed and promoted mainly on the basis of research success. To improve the quality of undergraduate teaching across all disciplines, not only STEM fields, requires creating an environment wherein continuous improvement of teaching is valued, assessed, and rewarded at various stages of a faculty member’s career. This requires consistent application of policies that reflect well-established best practices for evaluating teaching at the department, college, and university levels. Evidence shows most teaching evaluation practices do not reflect stated policies, even when the policies specifically espouse teaching as a value. Thus, alignment of practice to policy is a major barrier to establishing a culture in which teaching is valued. Situated in the context of current national efforts to improve undergraduate STEM education, including the Association of American Universities Undergraduate STEM Education Initiative, this essay discusses four guiding principles for aligning practice with stated priorities in formal policies: 1) enhancing the role of deans and chairs; 2) effectively using the hiring process; 3) improving communication; and 4) improving the understanding of teaching as a scholarly activity. In addition, three specific examples of efforts to improve the practice of evaluating teaching are presented as examples: 1) Three Bucket Model of merit review at the University of California, Irvine; (2) Evaluation of Teaching Rubric, University of Kansas; and (3) Teaching Quality Framework, University of Colorado, Boulder. These examples provide flexible criteria to holistically evaluate and improve the quality of teaching across the diverse institutions comprising modern higher education.
NA | 3. COPUS with visualization 2016 | Sierra Dawson (Univ. Oregon) | Peer Review of Teaching Documents | XLSX | guide (classroom observation) | x | protocol/rubric/template
Excel document with seven worksheets: Introduction, Key to observation codes, COPUS data entry, Qualitative questions, Activities across time, Percent of activities graphs, and Percent of time intervals graphs.
Introduction: references and information for using the COPUS.
Key to observation codes: a list and descriptions of all the possible activity codes for the instructor and students.
COPUS data entry: the worksheet where you record the activities occurring in each two-minute interval of the class session; enter a "1" in a cell to indicate the activity is occurring.
Qualitative questions: some questions you may find helpful in considering the effectiveness of the class session. These questions are not part of the official COPUS protocol; they were added at the University of Oregon and are optional.
Activities across time: visualizations of which activities occurred when in the class, useful for depicting the flow of the class session.
Percent of activities graphs: the bars in these graphs are calculated by finding the total number of times a given activity occurred during the class period and dividing by the total number of times all activities (student or instructor, as appropriate) occurred during the class period.
Percent of time intervals graphs: the bars in these graphs are calculated by finding the total number of times a given activity occurred during the class period and dividing by the total number of 2-minute time intervals in the class period.
NA
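The two kinds of percentage graphs described above are simple tallies over the 2-minute observation intervals. A minimal Python sketch of both calculations, assuming hypothetical data (Lec, FUp, and MG are COPUS instructor codes, but the numbers below are invented and this sketch is not part of the COPUS workbook):

```python
# Sketch only: each COPUS code maps to a list of 0/1 flags, one per
# 2-minute interval, mirroring the "1"-per-cell data entry described above.
tallies = {
    "Lec": [1] * 15 + [0] * 10,           # lecturing
    "FUp": [0] * 15 + [1] * 5 + [0] * 5,  # following up on an activity
    "MG":  [0] * 20 + [1] * 5,            # moving through class, guiding work
}

n_intervals = len(next(iter(tallies.values())))  # 25 intervals = 50-min class
total_marks = sum(sum(flags) for flags in tallies.values())

for code, flags in tallies.items():
    occurrences = sum(flags)
    # Percent of activities: this code's occurrences / all coded occurrences.
    pct_activities = 100 * occurrences / total_marks
    # Percent of time intervals: this code's occurrences / number of intervals.
    pct_intervals = 100 * occurrences / n_intervals
    print(f"{code}: {pct_activities:.0f}% of activities, "
          f"{pct_intervals:.0f}% of intervals")
```

Note the design difference between the two graphs: percent-of-activities bars sum to 100% across codes, while percent-of-time-interval bars need not, since several activities can be marked within the same interval.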
NA | 2. HPHY Peer Review of Teaching Policy | Sierra Dawson (Univ. Oregon) | Peer Review of Teaching Documents | DOCX | policy guide | x | policy
For the Department of Human Physiology at the University of Oregon, this brief guide gives the policy and procedures for peer review of teaching. It details the purposes, the varying frequencies of review for different levels of instructor, and the step-by-step procedures (including due dates for reporting and estimates of the hours required of reviewers). The document ends with an overview of a template for the written report that culminates each peer review and with a short list of references.
NA
NA | 1. HPHY Peer Review of Teaching Report Template (1) | Sierra Dawson (Univ. Oregon) | Peer Review of Teaching Documents | DOCX | report template | x | template
For the Department of Human Physiology at the University of Oregon, this report template offers a model for faculty conducting peer review to write up the results of those reviews in an organized, standard format. It is split into three sections and concludes with references and two appendices. The first section is an overview; the second details the data collected; and the third allows the reviewers to provide recommendations to the individual being evaluated. Appendix A lists all evidence-based teaching practices from the Teaching Practices Inventory (TPI), and Appendix B shows results broken down in three charts.
NA
NA | Evaluation Schedule 2017.18 | Sierra Dawson (Univ. Oregon) | Peer Review of Teaching Documents | XLSX | spreadsheet (schedule) | x | schedule
Excel document with two worksheets: Frequency by Rank and Review Team. Frequency by Rank gives the frequency of peer reviews to be conducted, broken down by instructor rank or classification. It also contains the evaluation schedule for specific instructors, including the course name, the name of the reviewer, and the quarter and year reviewed (for 2017-18). The Review Team tab lists the individuals on the peer review team for 2015-16 and 2016-17.
NA
Wieman, C., and Gilbert, S. (2014, Fall). The teaching practices inventory: A new tool for characterizing college and university teaching in mathematics and science. CBE—Life Sciences Education, 13(3), 552-569. | Wieman et al - Teaching Practices Inventory | Sierra Dawson (Univ. Oregon) | None | PDF | article - research | x | inventory, rubric
Article presents an inventory of teaching practices. The inventory provides a detailed picture of the broad range of practices involved in teaching a STEM course. Information from the inventory also may help to determine the extent of use of research-based practices; a scoring rubric facilitates that determination by extracting a numerical score reflecting the extent of such use. Use of the inventory helps instructors evaluate their teaching, see how it might be improved, and track improvement.
Explored in the article are inventory development and validation, accuracy of responses, the scoring rubric, and results from typical courses and departments. The authors also discuss implications of their work—e.g., to guide improvements in teaching, to quantify research-based practices for merit review or promotion, and to see when instructors are at odds with departmental or institutional norms. Further areas of exploration include validating the inventory for use in other disciplines and using measures that capture the quality, in addition to the quantity, of various teaching practices.
We have created an inventory to characterize the teaching practices used in science and mathematics courses. This inventory can aid instructors and departments in reflecting on their teaching. It has been tested with several hundred university instructors and courses from mathematics and four science disciplines. Most instructors complete the inventory in 10 min or less, and the results allow meaningful comparisons of the teaching used for the different courses and instructors within a department and across different departments. We also show how the inventory results can be used to gauge the extent of use of research-based teaching practices, and we illustrate this with the inventory results for five departments. These results show the high degree of discrimination provided by the inventory, as well as its effectiveness in tracking the increase in the use of research-based teaching practices.
Smith, M. K., Jones, F. H. M., Gilbert, S. L., & Wieman, C. E. (2013, Winter). The classroom observation protocol for undergraduate STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE—Life Sciences Education, 12(4), 618-627. | Wieman et al - COPUS Observation Protocol | Sierra Dawson (Univ. Oregon) | None | PDF | article - research | x | protocol
Responding to the need that two universities had to collect information about their STEM teaching practices as a way of supporting institutional change, the authors created the COPUS protocol. The protocol enables reviewers to document classroom behaviors in 2-minute intervals throughout the class session, does not require observers to make judgments of teaching quality, and produces clear graphical results. It is limited to 25 codes in two categories (“What the students are doing” and “What the instructor is doing”) and can reliably be used by university faculty with only 1.5 hours of training. Observers ranging from STEM faculty members without a background in science education research to K–12 STEM teachers have reliably used this protocol to document instruction in undergraduate science, math, and engineering classrooms. The article covers the development of the protocol, the training of observers, the validity and reliability of results, and the analysis of COPUS data.
Instructors and the teaching practices they employ play a critical role in improving student learning in college science, technology, engineering, and mathematics (STEM) courses. Consequently, there is increasing interest in collecting information on the range and frequency of teaching practices at department-wide and institution-wide scales. To help facilitate this process, we present a new classroom observation protocol known as the Classroom Observation Protocol for Undergraduate STEM or COPUS. This protocol allows STEM faculty, after a short 1.5-hour training period, to reliably characterize how faculty and students are spending their time in the classroom. We present the protocol, discuss how it differs from existing classroom observation protocols, and describe the process by which it was developed and validated. We also discuss how the observation data can be used to guide individual and institutional change.
Dawson, S. (2018, June). Reforming the evaluation of teaching: From the department to the institutional level. In Exploring Practical Ways to Inspire and Research Teaching Effectiveness and Instructional Innovation, the symposium of the Center for Education Innovation and Learning in the Sciences at UCLA. | University of Oregon Presentation Slides | Sierra Dawson (Univ. Oregon) | None | PDF | presentation | xxxx | shared practices
Presentation describes three initiatives (including the process used to develop each) at the University of Oregon. The first is an effort in the Department of Human Physiology to enhance peer review of teaching using COPUS and the TPI; in addition to describing the initiative, the presenter discusses typical problems with peer review. The second is university-level, brought about through the Senate Task Force on Teaching: the task force efforts led to a new system called Continuous Improvement and Evaluation of Teaching, which ensures that student, peer, and self input (student surveys, peer review, and instructor reflection) are all included in the evaluation of teaching. The third is a “master plan” to define, acknowledge, develop, and evaluate teaching excellence, an effort undertaken through the Teaching Engagement Program.
NA
NA | Midterm Student Survey Instructor Output Test Report | Sierra Dawson (Univ. Oregon) | None | PDF | output sample (survey results) | x | instrument
This is a sample of the output an instructor would receive from the recently piloted University of Oregon Midterm Student Survey. The sample includes results for both quantitative and qualitative survey items. For quantitative items, it gives means, SDs, and counts in addition to graphical representations of results. For qualitative items, instructors would receive a list of students' full-text responses.
NA
NA | Link to Midterm Student Experience Survey | Sierra Dawson (Univ. Oregon) | None | DOCX | link | x | instrument
This document simply includes a link to the University of Oregon Midterm Student Experience Survey, mocked up in Qualtrics, that was recently piloted.
NA
Senate Task Force for Teaching Evaluation. (2018, April 20). Proposed UO Peer Review of Teaching Framework - draft. | proposed peer review framework 4-20-18 | Sierra Dawson (Univ. Oregon) | None | PDF | report | x | framework
Report draft begins with an introduction and a list of current problems associated with peer review in the evaluation of teaching. It defines teaching excellence as inclusive, engaged, and research-led. It suggests that evidence of teaching excellence be collected from various sources and states that the goal of one such source, peer review, is to provide evidence and recommendations for use in both continuous course improvement and evaluation of teaching excellence. The report goes on to propose six requirements for a peer review system, suggesting that all units (at UO) create policies that outline these specific requirements: (1) departmental peer reviews managed through coordinator oversight; (2) formal, evidence-based observation tools; (3) a faculty self-assessment tool; (4) a structured reviewer-instructor follow-up meeting; (5) a template for the peer review report; and (6) trained peer reviewers. The body of the brief report gives detail on each of the six requirements. At the end is a weblink to examples for all of the requirements.
NA
Senate Task Force for Teaching Evaluation. (2018, April 20). Proposed Criteria for Evaluation of Teaching - draft. | proposed teaching evaluation framework 042018 | Sierra Dawson (Univ. Oregon) | None | PDF | report | xxxx | framework
Document provides a framework to evaluate teaching excellence according to specific criteria and using available data sources. It lays out the different data sources that are suited for various teaching evaluation purposes and suggests that units (across UO) adopt and use the given framework for all occasions of teaching evaluation. Four areas of teaching excellence are the focus of the report—inclusive teaching, engaged teaching, research-led teaching, and professionalism. Within each area, the report defines associated teaching behaviors and spells out the data sources that will offer the needed evaluation information. In addition, it gives the criteria according to which each area of excellence is to be measured.
NA
NA | Sp 18 instructor reflection 5-21-18 | Sierra Dawson (Univ. Oregon) | None | PDF | survey instrument | x | survey instrument
Open-ended survey is a 10-minute reflection for instructors. Its eight items are split into two parts: the first deals with the instructor's impressions of what went well and what she plans to change; the second considers how the instructor's teaching intersects with principles of teaching excellence (i.e., inclusive, engaged, and research-led teaching). As a pilot survey, it also gives respondents space to offer feedback on the survey itself.
NA
NA | Midterm student pilot Spring 2018 | Sierra Dawson (Univ. Oregon) | None | PDF | survey instrument | x | survey instrument
The Midterm Student Experience survey document includes the items that were given to UO students in a pilot run Spring 2018. The brief survey includes closed- and open-ended items, with most of the open-ended items being follow-ups where students could explain their answers to the closed items. It sought information regarding how well students understood learning outcomes, what has been most beneficial for their learning, and what they felt could use improvement. In terms of what was most beneficial and what could use improvement, response options were inclusiveness, transparency of instructions and grading, timing of feedback, challenge of the course, quality of course materials, support, engagement in out-of-class assignments or projects, student engagement during class sessions, quality of interactions between students, instructor communication, and other. Respondents also had the space to offer feedback on the survey itself.
NA
NA | S18 End-of-term Student Experience Survey 5.21.18 | Sierra Dawson (Univ. Oregon) | None | PDF | survey instrument | x | survey instrument
The End-of-Term Student Experience survey document includes the items to be given to UO students in a pilot run at the end of Spring 2018. The brief survey includes closed- and open-ended items, with most of the open-ended items being follow-ups where students could explain their answers to the closed items. It sought information regarding what students learned and how they changed and grew during the course, along with what has been most beneficial for their learning, and what they felt could use improvement. In terms of what was most beneficial and what could use improvement, response options were inclusiveness, transparency of instructions and grading, timing of feedback, challenge of the course, quality of course materials, support, engagement in out-of-class assignments or projects, student engagement during class sessions, quality of interactions between students, instructor communication, and other. Respondents also had the space to offer feedback on the survey itself.
NA
NA | 2018.5.22 Senate Motion Teach Eval passed | Sierra Dawson (Univ. Oregon) | None | DOCX | proposal/motion | x | policy
Senate motion dated May 2, 2018, cites findings from the Senate-appointed task force on student/teaching evaluations. The motion states that student evaluations can be an important tool but are biased. It also states that since evaluations must be anonymous, department heads and review committees cannot use them in instructor review (as only signed reviews can be used for instructor review). Further, the task force found that peer (faculty) reviews of teaching at UO may lack effectiveness and usefulness, since they are not often conducted by trained evaluators using consistent methodology.
To address these issues, the motion proposes and lays out guidelines for the charge of a Continuous Improvement and Evaluation of Teaching Committee, to be established Fall 2018. Along with this committee, it proposes the adoption of a Continuous Improvement and Evaluation of Teaching System (CIETS) that starts with implementation of the Midterm Student Experience Survey and End-of-Term Instructor Reflection Survey. The motion lays the groundwork for 2019 proposals for a Peer Review Framework and Teaching Evaluation Framework as well.
NA
NA | APM 210 - p6-7 | UCLA Policies – APM, CAP, & Psychology | None | PDF | policy manual (excerpt) | x | policy
UC Policy on Appointment and Promotion describes criteria that serve as guides for minimum standards in judging a teaching candidate at UC. It tells what to consider in judging the effectiveness of a candidate's teaching and states that the committee should clearly indicate the sources of evidence on which its appraisal of teaching competence is based. The two-page excerpt goes on to describe significant types of evidence of teaching effectiveness and details what all cases for advancement and promotion normally will include.
NA
NA | UCLA Academic Personnel Office - Teaching Guidelines | UCLA Policies – APM, CAP, & Psychology | None | PDF | guide (teaching) | x | guidelines
Two-page excerpt on teaching from the UCLA Academic Personnel Office gives guidelines to faculty under review for appointments or promotions. It tells what the Council on Academic Personnel (CAP) relies on in its reviews of faculty and, correspondingly, describes what faculty should include in their dossiers (e.g., data on overall amount and type of teaching, student evaluations, peer reviews, mentoring of graduate and undergraduate students, personal statement). The excerpt also presents options for candidates who are struggling with their teaching, as well as other, more general suggestions to improve teaching. The piece continues by pointing out the importance of mentorship for all teachers, strong and struggling alike. The final paragraph discusses the importance of teaching to the UCLA mission.
NA
NA | UCLA Psychology Dept Proposal Teacher Evaluation and Improvement 41516 | Case Study: Psychology Department | None | DOCX | proposal | xxxx | policy
Committee proposal from Spring 2016 seeks to change methods of teacher evaluation in the psychology department at UCLA. It starts with background and overview, describing several ways in which current teaching evaluation methods are problematic. The proposal then seeks to make two changes: to expand the sources of data considered in teaching evaluation and to shift the focus of evaluation onto assessing the faculty member's active efforts to improve his or her teaching. Proposal suggests that two required data sources for teaching evaluation should be student course evaluations and teacher participation in efforts to improve teaching (it lists six examples of such efforts). Claims in the proposal are supported by several references, provided at the end.
NA
NA | Univ Nebraska – Draft Teaching Rubric | Literature on Student Evaluations + Teaching Review Process | Teaching Review Process | PDF | guide (peer review) | x | rubric
The University of Nebraska, School of Biological Sciences drafted a document to help guide and standardize the peer review of teaching process. It is a one-page table that identifies five areas for peer review of teaching: student perception, cognitive processes, best practices, improvement, and mentoring. For each of these five areas or rows, there are three columns: less than expected, expected, and more than expected. The benchmarks listed under these three headings are intended to help the reviewer assess the data and observed practices of the teacher under review, giving an idea of how the teacher stands in terms of each of the five areas listed above. (The document offers no supporting references.)
NA
Pells, R. (2018, May 25). MIT dean: Nobel prize would give teaching prestige it needs. Times Higher Education. | MIT dean Nobel prize would give teaching the prestige it needs Times Higher Education (THE) | Literature on Student Evaluations + Teaching Review Process | Teaching Review Process | PDF | article - scholarly | x | commentary
Shigeru Miyagawa, professor of linguistics and senior associate dean for open learning at the Massachusetts Institute of Technology, told Times Higher Education that a high-profile global accolade would demonstrate that teaching, and not just research, should be key to academics’ endeavors. He talked about industry's endless need for new researchers and the vital role that universities play in training those researchers. He also expressed, “We are abandoning our students,” insofar as teachers and universities are not spending adequate time to train teachers or to analyze the success of graduates as a result of teaching efforts.
NA
NA | Links to other institution recommendations | Literature on Student Evaluations + Teaching Review Process | Teaching Review Process | DOCX | link | x | guidelines
This document includes three web links showing what some other institutions recommend for reviewing teaching in ways that reduce emphasis on student ratings. The link to Purdue does not appear to be active. The remaining two links are to Stanford and the University of Michigan.
The University of Michigan link leads to the Office of the Provost’s Faculty Promotion Guidelines. These guidelines include a 2019 Outline of Procedures for Faculty Promotions (effective 2018-2019) in addition to a detailed Checklist for Faculty Promotion Casebooks.
The Stanford link goes to an Office of the Vice Provost for Teaching and Learning, Teaching Evaluation and Student Feedback page. On this page are presented eight key principles and practices of teaching evaluation. (E.g., “The validity of any measure of teaching effectiveness depends on how well it correlates with intended student outcomes.”)
NA
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42. | Uttl et al. – 2017 – Meta-analysis of faculty’s teaching effectiveness | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - research | x | critique
The authors point out serious flaws in three previous meta-analyses that showed positive relationships between SET ratings and student learning; findings from those meta-analyses were artifacts of small-sample study effects. Re-analysis showed no significant relationship between student evaluations of teaching and student learning. The authors conducted a new meta-analysis that reached the same negative result. They concluded that previous findings were an artifact of poor methods and that students do not learn more from professors with higher ratings on SETs.
Student evaluation of teaching (SET) ratings are used to evaluate faculty's teaching effectiveness based on a widespread belief that students learn more from highly rated professors. The key evidence cited in support of this belief are meta-analyses of multisection studies showing small-to-moderate correlations between SET ratings and student achievement (e.g., Cohen, 1980, 1981; Feldman, 1989). We re-analyzed previously published meta-analyses of the multisection studies and found that their findings were an artifact of small sample sized studies and publication bias. Whereas the small sample sized studies showed large and moderate correlation, the large sample sized studies showed no or only minimal correlation between SET ratings and learning. Our up-to-date meta-analysis of all multisection studies revealed no significant correlations between the SET ratings and learning. These findings suggest that institutions focused on student learning and career success may want to abandon SET ratings as a measure of faculty's teaching effectiveness.
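To make the small-study artifact concrete, here is a toy simulation. It is entirely illustrative and not the authors' analysis; the r > 0.3 publication cutoff and the sample sizes are invented assumptions. It shows how small samples plus publication bias can manufacture a positive mean correlation when the true correlation is zero:

```python
# Toy illustration of the artifact described above: with a true correlation
# of zero, small studies scatter widely, and if only notable positive
# results are published, the published average looks positive.
import random
import statistics

def simulated_study(n):
    """One multisection 'study': correlate SET ratings with learning
    when the two are, by construction, unrelated."""
    set_ratings = [random.gauss(0, 1) for _ in range(n)]
    learning = [random.gauss(0, 1) for _ in range(n)]
    return statistics.correlation(set_ratings, learning)

random.seed(1)
results = [simulated_study(10) for _ in range(2000)]  # small studies, n = 10
published = [r for r in results if r > 0.3]           # assumed publication filter
print(f"true r: 0.00; mean published r: {statistics.mean(published):.2f}")
# Large-sample studies cluster near r = 0, the pattern the authors report.
```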
Toftness, A. R., Carpenter, S. K., Geller, J., Lauber, S., Johnson, M., & Armstrong, P. I. (2018). Instructor fluency leads to higher confidence in learning, but not better learning. Metacognition and Learning, 13(1), 1-14. | Toftness et al. – 2018 – Instructor fluency leads to higher confidence in learning | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - research | x | critique
The researchers manipulated fluency, the perceived ease with which learning is acquired, by varying the lecturing style that students were exposed to in a brief video lecture. That is, students were assigned to either a fluent or a disfluent condition depending on the ease of the professor's lecturing style. The researchers then gathered students' instructor ratings and measured predicted and actual performance on a test of students' knowledge of the lecture material. Their findings suggest an “illusion of learning,” which they attributed to the manipulated variable, fluency: the fluent lecture was rated higher by students and produced higher confidence in learning, but did not produce higher actual learning. These results are consistent with findings from previous studies.
Students’ judgements of their own learning often exceed their knowledge on a given topic. One source of this pervasive overconfidence is fluency, the perceived ease with which information is acquired. Though effects of fluency on metacognitive judgments have been explored by manipulating relatively simple stimuli such as font style, few studies have explored the effects of fluency on more complex forms of learning encountered in educational settings, such as learning from lectures. The present study manipulated the fluency of a 31-min videorecorded lecture, and measured its effects on both perceived and actual learning. In the fluent condition, the instructor used non-verbal gestures, voice dynamics, mobility about the space, and appropriate pauses. In the disfluent condition, the same instructor read directly from notes, hunched over a podium, rarely made eye contact, used few non-verbal gestures, spoke in monotone pitch, and took irregular and awkward pauses. Though participants rated the fluent instructor significantly higher than the disfluent instructor on measures of teaching effectiveness and estimated that they had learned more of the material, actual learning between the two groups did not differ as assessed by a memory test over the lecture contents given immediately (Experiment 1) or after a 1-day delay (Experiment 2). This counterintuitive result reveals an “illusion of learning” due to fluency in lecture-based learning, a very common form of instruction.
Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94-106. | Linse – 2017 – Interpreting and using student rating | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - scholarly | x | guidelines
The purpose of this article is to make recommendations about some of the most common misuses of student ratings data in the faculty evaluation process, in a format that can be easily shared. It starts with a review of the common misconceptions of student ratings and faculty concerns about student ratings as represented in the academic press. Next, it suggests that the vast research literature on student ratings generally refutes the misconceptions, but that this literature is not widely known or accessed by faculty and administrators. The paper ends with two sections of guidance for two groups based on the challenges they face in using student ratings for evaluation: 1) administrators who must be able to accurately answer faculty questions about how their student ratings will be used and interpreted; and 2) faculty responsible for evaluating other faculty members’ dossiers. It also gives a sample format for a thematic analysis of students’ written comments.
This article is about the accurate interpretation of student ratings data and the appropriate use of that data to evaluate faculty. Its aim is to make recommendations for use and interpretation based on more than 80 years of student ratings research. As more colleges and universities use student ratings data to guide personnel decisions, it is critical that administrators and faculty evaluators have access to research-based information about their use and interpretation. The article begins with an overview of common views and misconceptions about student ratings, followed by clarification of what student ratings are and are not. Next are two sections that provide advice for two audiences—administrators and faculty evaluators—to help them accurately, responsibly, and appropriately use and interpret student ratings data. A list of administrator questions is followed by a list of advice for faculty responsible for evaluating other faculty members’ records.
Kornell, N., and Hausman, H. (2016, April 25). Do the best teachers get the best ratings? Frontiers in Psychology, 7, 1-8. | Kornell and Hausman – 2016 – Do the Best Teachers Get the Best Ratings | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - scholarly | x | review
The main question explored in this review is whether teachers with higher student ratings engender more learning. The authors set out their ideal features of a study that measures the relationship between ratings and learning: (1) Evaluations are actual ratings obtained by a college or university (i.e., not data from a lab study). (2) Related subsequent courses are required. (3) Students are assigned to instructors randomly for the first course and subsequent courses. (4) The same (or comparable) objective measures of student knowledge are used for all instructors teaching a given course. The authors then identified and reviewed two recent studies that had these features. What they found was a negative relationship between student ratings and “deep learning,” or performance in subsequent related courses. The authors concluded that student ratings of teachers have serious limitations. They stated that student reports reflect their experiences, including whether they enjoyed the class, whether the instructor helped them appreciate the material, and whether the instructor made them more likely to take a related follow-up course. Further, they conclude that students are simply not in a position to make accurate judgments about their learning.
The recommendation from the review is that student ratings should be combined with two additional sources of data in evaluating teachers. First, teachers should be mentored, assessed, rated, and taught by senior teachers. Second, where possible, steps should be taken to measure deep knowledge, i.e., students’ long-term learning after they have completed a professor’s course.
We review recent studies that asked: do college students learn relatively more from teachers whom they rate highly on student evaluation forms? Recent studies measured learning at two time points. When learning was measured with a test at the end of the course, the teachers who got the highest ratings were the ones who contributed the most to learning. But when learning was measured as performance in subsequent related courses, the teachers who had received relatively low ratings appeared to have been most effective. We speculate about why these effects occurred: making a course difficult in productive ways may decrease ratings but enhance learning. Despite their limitations, we do not suggest abandoning student ratings, but do recommend that student evaluation scores should not be the sole basis for evaluating college teaching and they should be recognized for what they are.
Boring, A., Ottoboni, K., and Stark, P. B. (2016, January 7). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research. | Stark et al. – 2016 – Student Evaluations of | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - research | x | critique
Using large datasets from a natural experiment in France and a controlled experiment in the US, the researchers found SETs to be a better measure of students' gender biases than of instructors' teaching effectiveness. These biases were found to vary by course and instructor. Further, the observed association between SET and student performance was sometimes positive, sometimes negative, and generally not statistically significant. Yet the researchers found strong, significant associations between SET and grade expectation and between SET and instructor gender. Given the findings and the context-dependent variability of those findings, the researchers concluded that, absent specific affirmative evidence for a given course in a given department at a given university, SETs should not be used for personnel decisions.
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
SET are biased against female instructors by an amount that is large and statistically significant.
The bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded.
The bias varies by discipline and by student gender, among other things.
It is not possible to adjust for the bias, because it depends on so many factors.
SET are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness.
Gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.
These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
Spooren, P., Brockx, B., and Mortelmans, D. (2013, December). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598-642. | Spooren et al. – 2013 – On the Validity of Student Evaluation of Teaching | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - research | x | review (lit)
Using a meta-validity framework, the authors conduct a systematic review of the research on student evaluations of teaching. Their main findings deal with content-related validity, criterion-related validity, and construct-related validity (under which fall structural validity, convergent validity, discriminant and divergent validity, outcome validity, and generalizability).
From their findings on validity in these areas, the authors conclude that the utility and validity ascribed to SET should continue to be called into question. Further, since conclusive evidence has not been found yet, they suggest that such evaluations should be considered fragile, as important stakeholders (i.e., the subjects of evaluations and their educational performance) are often judged according to indicators of effective teaching (in some cases, a single indicator), the value of which continues to be contested in the research literature.
This article provides an extensive overview of the recent literature on student evaluation of teaching (SET) in higher education. The review is based on the SET meta-validation model, drawing upon research reports published in peer-reviewed journals since 2000. Through the lens of validity, we consider both the more traditional research themes in the field of SET (i.e., the dimensionality debate, the ‘bias’ question, and questionnaire design) and some recent trends in SET research, such as online SET and bias investigations into additional teacher personal characteristics. The review provides a clear idea of the state of the art with regard to research on SET, thus allowing researchers to formulate suggestions for future research. It is argued that SET remains a current yet delicate topic in higher education, as well as in education research. Many stakeholders are not convinced of the usefulness and validity of SET for both formative and summative purposes. Research on SET has thus far failed to provide clear answers to several critical questions concerning the validity of SET.
Clayson, D. E. (2009, April). Student evaluations of teaching: Are they related to what students learn? Journal of Marketing Education, 31(1), 16-30. | Clayson – 2009 – Student Evaluations of Teaching Are They Related | Literature on Student Evaluations + Teaching Review Process | Peer-Reviewed Publications on Student Evaluations | PDF | article - research | x | review (lit), meta-analysis
Lit review and meta-analysis reached five main findings. First, students' perceptions of their grades appear to be related to the evaluations they give both for the class and the instructor; both a leniency and a reciprocity effect have been found, and the research suggests these effects are modified by a variety of influences irrespective of “learning.” Second, the research almost universally finds a negative association between rigor and the SET; students seem to associate rigor with negative instructor characteristics that override positive learning relationships. Third, the more objective the measurement process becomes, both for learning and the SET, the more the learning/SET association is reduced: although a positive relationship exists between student perceptions of learning and the evaluation, that relationship cannot be found when more objective measures of learning are utilized. Fourth, the meta-analysis indicates that academic discipline is an important variable. Fifth, some differences in findings appear real, not entirely artifacts of differing methodology; there is little evidence to suggest that all differences are solely situational or due to methodology. The researchers sum up their findings by stating that the learning/SET association is valid to the extent that the student's perception of learning is valid; the literature, however, indicates that students do not always hold a realistic evaluation of their own learning.
Although the student evaluation of teaching has been extensively researched, no general consensus has been reached about the validity of the process. One contentious issue has been the relationship between the evaluations and learning. If good instruction increases the amount of learning that takes place, then learning and the evaluations should be validly related to each other. A review of the literature shows that attempts to find such a nomological relationship have been complicated by practice, methodology, and interpretation. A meta-analysis of the literature shows that a small average relationship exists between learning and the evaluations but that the association is situational and not applicable to all teachers, academic disciplines, or levels of instruction. It is concluded that the more objectively learning is measured, the less likely it is to be related to the evaluations.
43
Lawrence, J. L. (2018, May-June). Student evaluations of teaching are not valid. Academe, 104(3). | Student Evaluations of Teaching are Not Valid – AAUP | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - scholarly | x | commentary
The author cites studies to support his claim that a preponderance of evidence suggests average SET scores are not valid measures of teaching effectiveness. He reviews several statistical problems with SET scores: response rates are often low, and there is no reason to assume that students who do not complete the surveys would respond the same way as those who do; moreover, average SET scores in small classes are more heavily influenced by outliers, luck, and error. More problematic, he argues, are the substantive concerns. SET scores are a poor measure of teaching effectiveness because they correlate with many variables unrelated to it, including the student’s grade expectation and enjoyment of the class; the instructor’s gender, race, age, and physical attractiveness; and even the weather on the day the survey is completed.
The conclusion he reaches, in line with other researchers, is that measuring teaching effectiveness and learning is complex and cannot be done reliably and routinely without applying rigorous experimental methods to all courses. Evaluating teaching instead requires reviewing the materials professors create for their classes and conducting regular teaching observations. Further, when it comes to improving student performance, institutional changes are likely more effective than a focus on individual professors’ performance: particularly for minority and first-generation students, small class sizes and increased opportunities to interact with professors and peers can improve a variety of outcomes.
NA
44
Berrett, D. (2017, May 9). Students don’t always recognize good teaching, study finds. [Blog post]. Retrieved from https://www.chronicle.com/blogs/ticker/students-dont-always-recognize-good-teaching-study-finds/118274 | Students Don’t Always Recognize Good Teaching, Study Finds – The Chronicle of Higher Education | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | blog post | x | review (single study)
This blog post recaps a study of nearly 340,000 students at the University of Phoenix ("Measuring Up: Assessing Instructor Effectiveness in Higher Education," De Vlieger, Jacob, & Stange, 2017). The study looked at student scores on final exams for a basic algebra course, performance in a subsequent algebra course, and evaluations of teaching, measuring high-quality instruction by its effect on student performance in the initial and subsequent courses. The researchers found that high-quality instruction did not necessarily predict positive feedback on student evaluations; instead, high marks on evaluations were most strongly correlated with students’ course grades.
NA
45
Falkoff, M. (2018, April 25). Why we must stop relying on student ratings of teaching. The Chronicle of Higher Education. Retrieved from https://www.chronicle.com/article/Why-We-Must-Stop-Relying-on/243213 | Why We Must Stop Relying on Student Ratings of Teaching – The Chronicle of Higher Education | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - scholarly | x | commentary
This brief article discusses new findings that provide further evidence of gender bias in student evaluations of teaching, and it cites evidence of gender bias in SET from research dating back to the 1980s, along with more recent findings of racial and ethnic bias in SET. The article also touches on students’ bias against classes they perceive as challenging. Finally, SET has been found to be less reliable since most such evaluations switched to online systems. The author urges using multiple factors in the assessment of teaching, referring to studies that advise a big-picture approach and arguing that academic institutions can counter bias and help instructors succeed by taking steps to assess teaching more holistically.
NA
46
Mulhere, K. (2014, Dec 10). Study finds gender perception affects evaluations. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2014/12/10/study-finds-gender-perception-affects-evaluations | Study finds gender perception affects evaluations – Inside Higher Ed | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - scholarly | x | review (single study)
This article reviews a small pilot study of an online course at North Carolina State University ("What's in a Name: Exposing Gender Bias in Student Ratings of Teaching," MacNell, Driscoll, & Hunt, 2014). By keeping all teaching components equal across two sections and manipulating the stated identities of the male and female instructors, the researchers were able to test for gender bias in SET. They found that the same instructors received different ratings when they "switched" genders: the male instructor received lower ratings when students were told their instructor was female, and the female instructor received higher ratings when students were told their instructor was male. The male identity received significantly higher scores on six of the twelve variables students evaluated.
NA
47
Sprague, J. (2016, June 17). The bias in student course evaluations. Inside Higher Ed. Retrieved from https://www.insidehighered.com/advice/2016/06/17/removing-bias-student-evaluations-faculty-members-essay | Removing bias from student evaluations of faculty members (essay) – Inside Higher Ed | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - scholarly | x | guidelines
This article addresses how to mitigate the damage that may result from biases in SET, biases that the research suggests are pervasive and likely to work against marginalized faculty members such as people of color and women. It offers two main pieces of advice. The first is to focus on measures within the SET that have higher levels of reliability: for example, the author recommends that instructors use only items students can accurately answer, making sure students have the information needed to respond accurately and favoring items that name a concrete behavior and time frame. If the instructor is receiving less favorable ratings on more global items that cannot be removed from the evaluation, and if those ratings seem linked to bias, the instructor can both call colleagues’ attention to how students’ expectations vary and point out the possibility of bias.
The second is to apply sound statistical analysis: report median scores alongside means and point out any inconsistencies. More generally, the advice is to look at the degree and pattern of variation among students’ ratings of an instructor, since patterns may prove meaningful across different items in the same course or across ratings on the same item from class to class. The author further notes that whether the class is required and what grade the student expects both influence ratings and should therefore be taken into account as possible biasing factors in SET.
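To make the median-versus-mean advice concrete, here is a minimal sketch in Python on hypothetical ratings (the numbers are invented for illustration and are not drawn from the article). It shows how reporting the median and a measure of spread alongside the mean can surface a polarized rating pattern that a single average would hide.

    import statistics

    # Hypothetical responses to one 5-point SET item for a single course.
    ratings = [5, 5, 4, 5, 4, 5, 1, 5, 4, 1]

    mean = statistics.mean(ratings)      # pulled down by the two 1s
    median = statistics.median(ratings)  # robust to those outliers
    spread = statistics.stdev(ratings)   # degree of variation among raters

    print(f"mean={mean:.2f}, median={median}, stdev={spread:.2f}")
    # Output: mean=3.90, median=4.5, stdev=1.60
    # The gap between the mean (3.90) and the median (4.5), together with the
    # large spread, flags a skewed, polarized pattern that the mean alone hides.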
NA
48
Dennin, M., Schultz, Z. D., Feig, A., Finkelstein, N., Greenhoot, A. F., Hildreth, M., . . . Miller, E. R. (2017, Winter). Aligning practice to policies: Changing the culture to recognize and reward teaching at research universities. CBE—Life Sciences Education, 16(5), 1-8. | Miller et al 2017 Aligning Practice to Policy CBE – Life Sciences Ed | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - research | xxx | policy, framework, rubric
This essay discusses efforts to improve the quality of undergraduate STEM education through improved instruction. It examines the gap between an institution’s stated teaching policies and its actual practices and offers strategies to help institutions align how they value teaching in practice with the priorities stated in their formal policies. The essay concludes with profiles of three institutions’ strategies for assessing and rewarding contributions to teaching.
Recent calls for improvement in undergraduate education within STEM (science, technology, engineering, and mathematics) disciplines are hampered by the methods used to evaluate teaching effectiveness. Faculty members at research universities are commonly assessed and promoted mainly on the basis of research success. To improve the quality of undergraduate teaching across all disciplines, not only STEM fields, requires creating an environment wherein continuous improvement of teaching is valued, assessed, and rewarded at various stages of a faculty member’s career. This requires consistent application of policies that reflect well-established best practices for evaluating teaching at the department, college, and university levels. Evidence shows most teaching evaluation practices do not reflect stated policies, even when the policies specifically espouse teaching as a value. Thus, alignment of practice to policy is a major barrier to establishing a culture in which teaching is valued. Situated in the context of current national efforts to improve undergraduate STEM education, including the Association of American Universities Undergraduate STEM Education Initiative, this essay discusses four guiding principles for aligning practice with stated priorities in formal policies: 1) enhancing the role of deans and chairs; 2) effectively using the hiring process; 3) improving communication; and 4) improving the understanding of teaching as a scholarly activity. In addition, three specific efforts to improve the practice of evaluating teaching are presented as examples: 1) the Three Bucket Model of merit review at the University of California, Irvine; 2) the Evaluation of Teaching Rubric at the University of Kansas; and 3) the Teaching Quality Framework at the University of Colorado, Boulder. These examples provide flexible criteria to holistically evaluate and improve the quality of teaching across the diverse institutions comprising modern higher education.
49
Flaherty, C. (2016, January 11). Bias against female instructors: New analysis offers more evidence against the reliability of student evaluations of teaching, at least for their use in personnel decisions. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2016/01/11/new-analysis-offers-more-evidence-against-student-evaluations-teaching | New analysis offers more evidence against student evaluations of teaching – Inside Higher Ed | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - scholarly | x | review (single study)
Article presents a review of the study “Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness” (Boring, Ottoboni, & Stark, 2016), which is also included as a resource in the current document.
NA
50
Flaherty, C. (2018, May 22). Teaching Eval Shake-Up. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2018/05/22/most-institutions-say-they-value-teaching-how-they-assess-it-tells-different-story | Most institutions say they value teaching but how they assess it tells a different story – Inside Higher Ed | Literature on Student Evaluations + Teaching Review Process | Articles that summarize studies | PDF | article - scholarly | xx | commentary
Brief article on USC doing away with student evaluations of teaching (SETs) in tenure and promotion decisions. It discusses the evidence of bias in SETs and the proposed shift at USC toward peer review as evidence of teaching effectiveness, and it notes that UC Berkeley, the University of Oregon, and others are headed in similar directions.
NA
51
NA | List of References on Student Evaluations | Literature on Student Evaluations + Teaching Review Process | None | PDF | reference list | x | reference
This four-page list provides dozens of references related to student (and other) evaluations of college teaching.
NA
52
UCI Division of Teaching Excellence and Innovation. (n.d.). Second piece of evidence of teaching effectiveness. Retrieved from http://dtei.uci.edu/2nd-piece-of-evidence/ | Second Piece of Evidence of Teaching Effectiveness – Division of Teaching Excellence and Innovation | Diane O'Dowd (UC Irvine) | None | PDF | guide (for UCI faculty under review) | xx | guidelines
The webpage starts with the requirements for the teaching review process at UCI, which emphasize that more than one kind of evidence shall accompany each review file. It then gives three options for the second piece of evidence of teaching effectiveness to be included in the instructor’s review: a reflective teaching statement, a peer evaluation, and other evidence. Each option is briefly described and linked to more detailed information.
NA
53
NA | UCI – Second Piece of Evidence of Teaching | Diane O'Dowd (UC Irvine) | None | DOCX | personal communication | xx | guidelines
This document seems to be a memo or other sort of personal communication from Diane K. O’Dowd, Vice Provost, Academic Personnel, HHMI Professor of Developmental and Cell Biology, UC Irvine. It includes a screenshot of the UCI webpage regarding the second piece of evidence of teaching effectiveness, which is a requirement for the teaching review process at UCI.
The captured webpage is the one described in the previous entry: it outlines the requirements for the teaching review process at UCI, emphasizing that more than one kind of evidence shall accompany each review file, and briefly describes the three options for the second piece of evidence of teaching effectiveness (a reflective teaching statement, a peer evaluation, and other evidence), each linked to more detailed information.
NA
54
NA | 2018 Teaching Symposium Participant Ideas | None | None | PDF | proposals (informal notes) | xxxx | guidelines (proposed or suggested)
From the CEILS UCLA Symposium: Exploring Practical Ways to Inspire and Reward Teaching Effectiveness and Instructional Innovation, this two-page list gathers participant suggestions for how UCLA can improve its process of evaluating teaching so as to incentivize more effective teaching. The suggestions are tailored to different audiences: individual faculty, groups of faculty, the merit and promotion committee, chairs, deans, the Academic Senate, CAP, and OID or other institutional centers.
NA
55
NA | 2018 Teaching Symposium Dean’s Panel & Closing Remarks | None | None | PDF | notes (dean's panel, UCLA teaching symposium) | x | best practices, action items
This page of notes comes from the Dean’s Panel and Closing Remarks at the CEILS UCLA Symposium: Exploring Practical Ways to Inspire and Reward Teaching Effectiveness and Instructional Innovation. The notes begin with a list of best practices that names the specific deans (of physical sciences, graduate division, social sciences, humanities, life sciences, and undergraduate education) and what each is doing to inspire and reward effective teaching. The document then presents action ideas for the deans of social sciences, graduate division, and humanities, followed by four questions the audience posed to the deans. The last section offers four closing remarks and large-scale action items: make teaching inclusive, engaged, and research-led; develop standards for a teaching dossier; announce the teaching dossier requirement to faculty; and provide training for faculty and incentives to participate.
NA
56