Suggested Citation: "4 Designing Assessments and Measuring Outcomes." National Academies of Sciences, Engineering, and Medicine. 2023. Promising Practices and Innovative Programs in the Responsible Conduct of Research: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/27085.

4
Designing Assessments and Measuring Outcomes

To provide context for the presentation and discussion on designing assessments and measuring outcomes of responsible conduct of research (RCR) programs, Kalichman posed a number of questions that he wanted the workshop attendees to ponder:

  • How can we take the opportunity presented by this workshop to promote positive change?
  • What are the goals for training around RCR? Are they to convey knowledge, focus on skills, change attitudes and perspectives about RCR, improve communication among collaborators regarding RCR, or change behavior?
  • What are the interventions that can achieve those goals, and how will they be delivered?
  • Have you achieved those goals after delivering those interventions?

Regarding the last question, Kalichman commented that it is absurd that institutions of higher education are not testing what are sometimes costly interventions to determine whether they are accomplishing the goals set for them.

Tristan McIntosh (Washington University in St. Louis School of Medicine) and Dena Plemmons (University of California, Riverside) made a joint presentation. McIntosh noted four goals of evaluation for RCR training:

  • Determining whether students are learning the intended lessons. “You do not want to inadvertently lead your trainees to walk away from the training with the wrong message,” said McIntosh, or be over- or underconfident about their knowledge and skills related to navigating ethical issues.
  • Identifying areas for improvement in the program’s design or content. “I encourage you to not assume that your program is perfect, but rather adopt a mindset of continuous improvement,” said McIntosh.
  • Assessing how the changes made to a program affected student learning or other outcomes. “There are several changes that you might make to your training program over time,” said McIntosh. “You might decide to cover new or additional RCR topics, you might incorporate new active learning exercises, or you might make changes to your instructional approach. And assessing your training program after these changes are implemented can help ensure ongoing effectiveness.”
  • Aiding in determining the minimum threshold for proficiency. Setting a proficiency minimum can identify those trainees who might benefit from a bit more training to achieve a base level of knowledge about specific topics, such as federal regulations.

DESIGNING AN EVALUATION

The key for any evaluation is to first identify the program’s learning goals and objectives. This allows program directors to link objectives to the type of assessment that will best inform their evaluation. Learning objectives, said McIntosh, should be stated using action verbs such as describe, identify, or apply, and they can pertain to knowledge, skills, attitudes, or behavior.

Plemmons then discussed what to assess in an RCR program in those four areas:

  • Knowledge – did trainees learn the necessary facts and concepts that provide the basis for thinking through ethical problems they might face in their research environments?
  • Skills – did trainees master the decision-making or problem-solving skills for navigating ethical problems?
  • Attitude – do trainees display the desired attitudes toward ethical challenges, such as how they feel about whistleblowing or what constitutes plagiarism?
  • Behavior – do trainees identify research misconduct, take corrective actions, and have conversations about research ethics outside of class?

Plemmons noted that it “may take a long time to see the intended effects for some of these measures, especially skills, attitudes, and behaviors.”

Formative assessments can help manage an ongoing training program, particularly when introducing new topics, new instructional approaches, or new instructors, while summative assessments can help with planning for the next iteration of a program by providing a more complete picture of a course upon its completion.

McIntosh and Plemmons stressed the importance of a mixed methods approach that combines quantitative and qualitative assessment, because the two provide different types of data that together yield a more robust picture of a program. Quantitative assessment produces data that are quick to analyze using statistical software or even a spreadsheet program. Means, medians, and standard deviations can provide a quick snapshot of trainee performance on a given assessment, and they enable straightforward cohort and pre- and post-test comparisons that can signal potential training issues. The main limitation of quantitative data is that they lack the nuance, detail, and contextual information that qualitative data provide.
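For readers who want a concrete illustration, the minimal sketch below (not part of the workshop presentations; the survey item and scores are hypothetical) shows how the quick snapshot described above, including means, medians, standard deviations, and a pre- and post-test comparison of Likert scale responses, might be computed with a short Python script.

```python
# A minimal sketch, not from the workshop: the survey item and the
# pre/post scores below are hypothetical, invented for illustration.
import statistics

# Likert scale responses (1 = Strongly disagree ... 5 = Strongly agree)
# for an item such as "I feel confident navigating authorship issues."
pre_scores = [2, 3, 3, 2, 4, 3, 2, 3]    # before training (hypothetical)
post_scores = [4, 4, 3, 5, 4, 4, 3, 5]   # after training (hypothetical)

def summarize(label, scores):
    """Print the quick-snapshot statistics described in the text."""
    print(f"{label}: mean={statistics.mean(scores):.2f}, "
          f"median={statistics.median(scores)}, "
          f"sd={statistics.stdev(scores):.2f}")

summarize("Pre-training", pre_scores)
summarize("Post-training", post_scores)

# A simple pre/post signal; choosing a formal test (e.g., a paired
# comparison) is best done with a statistician or psychometrician,
# as the presenters recommend.
print(f"Mean change: {statistics.mean(post_scores) - statistics.mean(pre_scores):+.2f}")
```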

Qualitative data, said McIntosh, can also help interpret quantitative data and provide context and more granular information. For example, a question with Likert scale responses20 such as “This course has increased my discussions about research ethics outside the classroom” might elicit a follow-up qualitative question such as “With whom did you have those discussions, and about what specific issues?” Qualitative data can help explain irregularities and differences across quantitative scores, and they can provide rich contextual information that helps identify what resonates with trainees. They can also offer insight into diverse trainee perspectives. Finally, qualitative data can help create more valid and relevant quantitative approaches. For example, open-ended survey responses or focus group discussion results can inform valid and relevant survey questions for a broader population.

McIntosh then listed some tools for acquiring qualitative data.

___________________

20 Responses that indicate the level of agreement with a statement, typically expressed on a five-point scale of (1) Strongly disagree; (2) Disagree; (3) Neither agree nor disagree; (4) Agree; (5) Strongly agree (Preedy and Watson, 2010).

  • Reflection exercises that ask trainees to draw from their personal experiences and integrate those experiences with the lessons they have been learning can be a powerful way to reinforce new concepts.
  • Asking students to write down the three most important take-home points from a session can identify if what they learned is congruent with a session’s desired learning goals.
  • Open-ended questions can identify areas of confusion on topics or concepts, can contextualize quantitative survey data, and can help formulate a subsequent quantitative assessment measure.
  • Focus groups or peer group discussions can generate rich data around a specific set of questions or topics.

Tools for quantitative assessments can include questions with Likert scale responses or multiple-choice questions that the trainees answer before, during, or after a session. Using polling software during class sessions can cultivate active learning while also providing a snapshot evaluation of where trainees are at that specific time in training. McIntosh stressed the importance of not simply making up questions, especially for measures of knowledge, skills, and attitudes, and said, “I encourage you to look around online and see what assessments are out there that you could use and modify to apply to your training program and take a look at what other programs have used to assess their program.”

WHAT MAKES FOR A SUCCESSFUL ASSESSMENT

When it comes to developing a successful assessment, Plemmons reiterated how important it is to first identify and clearly state a program’s learning goals and then pick assessments that align with those outcomes. The timing of an assessment is important, too, and Plemmons said a general rule of thumb is to conduct an assessment before and after delivering a course. However, if the purpose of an assessment is to determine a minimum proficiency threshold, a post-course test would be sufficient. Short assessments during the course, such as at the end of each session to reinforce a particular lesson or even during a session as an active learning exercise, can also be useful.

Plemmons reiterated the importance of using validated measures and ensuring that the assessments measure what they are supposed to measure. In fact, she suggested partnering with someone who has psychometric or measurement development skills to create an assessment that is robust and accurate and that captures the desired constructs. The Bioethics Research Center21 at her institution provides free assessments that people can use to evaluate their program, and the Ethics Education Library22 has compiled a list of evaluation and assessment methods. “It pays to think strategically about your assessment approach—make sure that you are keeping in mind the time it would take for trainees to complete an assessment, for instance—and be systematic about the evaluation approach you will be taking,” said Plemmons.

Evaluation, she added, should be a regular occurrence that generates patterns of data across multiple measures. “This is where qualitative and quantitative data are nice complements to one another,” she said. Plemmons also noted that it is impossible to assess everything and that no single assessment is perfect. It is important, too, to consider whether a student’s cultural background or English proficiency will affect their performance on assessments.

___________________

21 https://graduate.ucr.edu/research-ethics

22 http://ethics.iit.edu/eelibrary/content/welcome-ethics-education-library


DISCUSSION

Following the presentations, Allison moderated a discussion session with the panelists. Allison first asked them whether it was worth doing a smaller-scale evaluation when resources are not available to conduct a full-blown evaluation. Both McIntosh and Plemmons said that some evaluation is better than no evaluation because an evaluation scaled to fit the task can still generate evidence of training effectiveness. At a minimum, an evaluation measures whether trainees are learning the desired content. Plemmons said a smaller assessment could use tools such as reflection exercises, for example asking students to write a one-minute summary of a session. McIntosh reiterated the advice not to make up measures but instead to partner with someone in statistics or psychology. Allison also noted the challenge of assessing behavior given the difficulty of conducting a longitudinal randomized controlled trial to see who does or does not commit misconduct.

An attendee commented that many of their colleagues are uncomfortable performing rigorous assessment and asked the panelists for advice for convincing colleagues of the importance of assessment. One argument Plemmons would make is that educators have an investment in their students’ learning and in ensuring that they are in fact learning. Even a small-scale assessment of students’ perceptions of how much they have learned and how they value that learning can be useful. McIntosh commented that she would not use training assessments to grade students but rather would frame them as a tool for course improvement and for gaining knowledge that will be useful to the instructor.

Allison, combining several questions from the audience, asked how to adapt RCR programs, practices, and evaluations to be sensitive to diversity and cultural differences. McIntosh replied that she and a colleague have an NSF grant to look at that very issue, and that she would have an answer to that question in a couple of years. She suggested that incorporating different stories, different approaches to thinking about a problem, or bringing in speakers from different cultural backgrounds might be effective. Regarding assessment, she said to be cognizant of potential problems, such as a student taking longer to complete an assessment because English is not their first language.

When asked who the end users of RCR are and if that includes journalists, journal editors, and the general public, McIntosh replied that one RCR topic is “scientists as responsible members of society,” which addresses the public-facing element of RCR and may help to bring in a wider array of stakeholders. Plemmons added that students in her classes often choose the topic of “communication with the public about science,” and then examine how to describe science to the general public as well as the perceptions of science and scientists. In her view, the public-facing aspect of RCR is “becoming more a part of RCR courses and is something to pay attention to as a legitimate part of all of the conversations about the ethical dimensions and implications” of scientific work.

When asked if perfection is the goal of RCR training, Kalichman said perfection is not an option and suggested the goal of creating an environment where the risks of unethical behavior are reduced. In fact, he does not see RCR courses as the vehicle for reducing the risk that any particular student will do something unethical, but as a means of creating a culture and environment that strongly discourages wrongdoing. “If you have a very open and transparent research environment, it is harder to make up data,” said Kalichman. McIntosh agreed with this vision for RCR training.
