Assessing Learning with Digital Badges
Digital badges are web-enabled tokens of accomplishment. They contain specific claims and detailed evidence about learning. Because they contain this information and can be readily accumulated and shared, they can work quite differently from traditional grades, certificates, and transcripts. Digital badges are becoming widely used to recognize learning in a variety of formal and informal educational settings. When the Badges for Lifelong Learning competition was launched, we were charged with following the thirty projects awarded funds to build digital badge systems that encourage learning in a variety of formal and informal settings. To carefully analyze the different aspects of the badging systems and their development over time, we developed the Design Principles Documentation (DPD) Project and focused our research on emerging practices in four categories: recognizing, assessing, motivating, and researching learning with digital badges. We were concerned with the emerging informal knowledge that developed as projects moved from the intended practices outlined in their initial proposals to the enacted practices that developed as the projects matured and systems were implemented with learners. This paper focuses on the assessment strand of this research.
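To make the claims-and-evidence structure concrete, consider a minimal sketch of the information a badge might carry. The field names and URLs below are illustrative placeholders loosely modeled on open badge metadata conventions, not any project's actual schema:

```python
# Illustrative sketch of a digital badge's claim-and-evidence structure.
# All field names, the issuer, and the URLs are hypothetical placeholders.
badge = {
    "name": "Sports Journalism: Interviewing",
    "claim": "Earner can plan and conduct an interview for a news story",
    "issuer": "example-badge-program.org",
    "evidence": [
        "https://example.org/portfolio/interview-video",
        "https://example.org/portfolio/written-story",
    ],
    "issued_on": "2013-05-01",
}

# Unlike a letter grade, the badge carries its claim and its evidence with
# it, so a viewer can inspect what was actually assessed.
print(badge["claim"])
```

Because the claim and the evidence travel with the token, a badge can communicate far more about the underlying assessment than a grade or a transcript entry.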
When designing a learning system, the choices of assessment types, functions, and practices directly impact the ways in which learners engage with content. Learning system developers must be acutely aware of how their choices impact learning if they are going to make claims about what learners can and cannot do as a result of engaging with their content and activities. This is especially relevant as digital badges enter the educational sphere and badge system developers design badges that act as credentials for learning.
At their core, badges recognize some kind of learning. However, if one wants to recognize learning and make claims that something has been learned, some form of assessment is needed. This research (a) traced intended and enacted assessment practices across the thirty projects, (b) derived ten more general principles for assessing learning with digital badges, (c) connected these principles to relevant aspects of project contexts, and (d) connected these principles to relevant external research and resources to help projects be more systematic. This paper summarizes these badge assessment design practices, principles, and resources across projects, and examines one of these principles and its enactment in two of the projects.
Our derivation of badge design principles and reflexive identification of outside literature focus directly on the ideas and insights most relevant to the badges initiative. Rather than summarizing the vast literature on assessment, the literature was reviewed and the design principles were derived in the same way evidence in a trial must be directly relevant to the case and the question at hand (Maxwell, 2006). Both the literature review and the design principles relate to the enacted practices that emerged as badge systems were implemented.
This “conceptual” (rather than “foundational”) approach is called for by the groundbreaking nature of this initiative. Teasing apart the areas of recognizing, assessing, motivating, and studying learning, and addressing the tensions within and between these four areas, serves to highlight the incredibly complex problems in educational reform and to begin rectifying some of them. The review and design principles pragmatically review prior research and explore the value of newer paradigms based on newer “social” theories of learning. These theories are infused throughout the DML initiative and are embodied in the writings of DML scholars and leaders such as John Seely Brown (e.g., Brown, Collins, & Duguid, 1989; Brown & Duguid, 2002; Brown & Adler, 2008), Mimi Ito (Ito, 2005; Ito et al., 2009), and Connie Yowell (Yowell & Smylie, 1999).
These social theories are being widely adopted, but many educational innovators embrace social theories of learning while using traditional practices to assess learning. This research aims to highlight the impact assessment choices have on learning so that educational innovators have a better understanding of the kinds of claims that can be made with those choices.
While carried out as design ethnography, this research was inspired by current design-based research (DBR) methods. DBR builds “local theories” through systematic iterative design in the context of implementations (Cobb, Confrey, diSessa, Lehrer, & Schauble, 2003) with particular attention to relevant aspects of particular educational contexts. To help capture the emergence of local theories and relevant aspects of context, this research reflexively documented more general and more specific ways of assessing learning in the context of digital badges. Meta-principles are very general theoretical statements that reflect how different “grand theories” of knowing and learning (i.e., behavioral, cognitive, or sociocultural) lead to different conceptualizations of the role of digital badges. General principles are the guiding principles behind badge system developers’ designs. These general principles are broad, and address the projects’ general practices for recognizing, assessing, motivating, and studying learning. Specific practices – labeled appropriate practices, explained below – reflect how the general design principles were enacted in light of specific constraints of the individual content development efforts. These practices, along with the specific features that emerged in that context are invaluable for helping similar projects appreciate how the general principles might be enacted and further refined in their own contexts.
Analyses of proposals and subsequent interviews with project stakeholders identified over 100 enacted assessment practices across the thirty projects. The thirty interviews revealed that projects needed to make substantial changes to their initial designs when they began working within the badging platforms in their particular settings. Doing so uncovered aspects of platforms and settings that impact how (and if) initial designs were enacted. These practices were organized into ten assessment design principles, which remained usefully linked back to the specific practices and features. This process was followed by a half-day workshop where stakeholders could review our characterizations of the projects’ assessment practices; the workshop led to a refinement of the general assessment principles as well as a recategorization of projects’ practices. These practices are documented in the DPD project database, to which all of the projects have access. They have been labeled appropriate practices rather than best practices, as the appropriateness of a specific practice depends on the context in which it is used. The appropriate practices contained in the ten principles for badging practice are open to the public so that new badging systems can use them to inform their decisions around assessment. This information is now being disseminated back to projects and to the broader badging community via blogs and at www.workingexamples.org, where it will continue to evolve.
Because there is almost no literature on assessment for badges and a vast literature on assessment, a reflexive and recursive review of the relevant literature and other resources was needed to inform these efforts. For example, projects considering “leveled” versus “flat” badge assessment systems (described below) might consider how benchmark assessments in Philadelphia schools impacted the reteaching of content (Bulkley, Christman, Goertz, & Lawrence, 2010; Oláh, Lawrence, & Riggan, 2010). These studies found that while the benchmark assessments gave teachers a general sense of what needed to be retaught, the incorrect answers did not provide guidance as to the error in students’ thinking. Badge systems may want to consider how their assessments will impact the formative feedback and reteaching methods that can be used when students take an assessment. The online database that houses the principles and the related literature outlines many such considerations of which badge system developers ought to be aware.
Some of the most exciting evidence that emerged from this research was the impact that different decisions about assessment practices had on learning, and how this in turn connected with the research literature. For example, S2R Medals trains youth to become sports journalists. Earners’ badges were associated with portfolios of their work, and the project encouraged not only the immediate learning community but the community at large to comment on these portfolios and provide feedback. This meant that coaches, parents, and general community members were interacting with the learners and their projects, so learners received feedback from many different perspectives on their work. In contrast, the Sustainable Agriculture & Food Systems (SA&FS) program at UC Davis planned to keep portfolios initially closed to the immediate learning community in order to let students get feedback from their peers and mentors before opening them to be viewed by the public. SA&FS intended for portfolios to be assembled by students and presented as part of various formative and summative assessment processes within the school. This closed system was in place partly for the privacy and protection of students, but it also facilitated more formal interaction around the portfolios. It turns out that the advantages and disadvantages of public versus private portfolios are an enduring strand of research in portfolio assessment (Gillespie et al., 1996; Stiver et al., 2011). By connecting this literature with badging practices, this research will help inform other projects while helping those projects inform broader audiences.
One approach to portfolio assessment is not inherently better than the other, but the impact system developers’ choices had on the kind of learning and revision in which the learners engaged is significant. These two projects did not know about each other’s practices. Part of the goal of this research is to connect projects and start a dialogue. When these two projects began conversing with one another, a very productive dialogue occurred, resulting in a better understanding of the kind of learning their assessments were fostering.
The general design principles for assessing learning in digital badge systems are a result of this process. They are ordered by prevalence among the thirty projects. The first principles are employed by almost all of the projects, while the last principle is used by only three. Badge system developers must consider what kind of learning they want to recognize, and how the assessment principles they use will impact the learning ecosystem.
The development of these principles has led to the need to attach them to the scholarly assessment literature. By bringing in the relevant research literature, both existing and new projects can make informed decisions about their assessment practices, and the DPD project can make recommendations of aspects to consider when employing particular principles. What follows are (a) the ten general design principles and the number of projects that use each principle, (b) the more specific principles under each design principle, (c) one example of relevant research associated with the principle, and (d) an example of a specific practice (also see Table 1).
Nearly all (twenty-nine) of the thirty projects included some kind of “leveling” system that students would move through as they practiced new skills, as opposed to a “flat” system where all badges have equal value. Sixteen projects created what we deemed “competency levels,” ten used “meta-badges,” and three formed hierarchical categories of badges. Projects using benchmark assessments to promote mastery of a specific skill would do well to learn from Bulkley et al.’s (2010) research on Philadelphia schools, which found that while the benchmark assessments revealed general categories that needed to be retaught, they were not designed in such a way that a teacher could learn the mistakes in student thinking from the incorrect answers. For example, in the mastery-learning orientation of the BuzzMath project, this leveling meant that small badges for activities marked achievements that added up to larger mastery badges.
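The contrast between “leveled” and “flat” systems can be sketched in a few lines. The badge names and the aggregation rule below are hypothetical, intended only to illustrate how small activity badges might roll up into a larger mastery badge:

```python
# Hypothetical sketch of a "leveled" badge system: a mastery meta-badge is
# awarded once all of its required component badges have been earned.
# Badge names and the mapping are illustrative, not any project's design.
REQUIRED_FOR_MASTERY = {
    "fractions_mastery": {"fractions_intro", "fractions_practice", "fractions_quiz"},
}

def earned_meta_badges(earned_badges):
    """Return the meta-badges whose component badges have all been earned."""
    earned = set(earned_badges)
    return {
        meta for meta, required in REQUIRED_FOR_MASTERY.items()
        if required <= earned  # subset test: every component is present
    }

# A "flat" system, by contrast, would treat every badge independently,
# with no aggregation into higher-level credentials.
print(earned_meta_badges(["fractions_intro", "fractions_practice", "fractions_quiz"]))
```

Under this sketch, earning only some of the component badges awards no meta-badge, which is what distinguishes a leveled progression from a flat collection.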
Twenty-six of the projects aligned their badge systems to existing standards. These standards varied from national and state standards to internal standards set by the parent organizations of the projects. Ten projects used internal standards, seven used national or state standards, and nine used the Common Core State Standards. Darling-Hammond (1997) discussed the need to raise both standards and the capacity of the system in which they are employed to support teaching and learning. This is relevant, for example, to DigitalMe and Makewaves’ Supporter 2 Reporter project because there is already a large community of teachers within the Makewaves community who are mapping the S2R curriculum to their own objectives and standards.
Sixteen projects used rubrics as an aid to score learner artifacts. Twelve projects developed rubrics for the assessment of specific artifacts, while four used general rubrics. Popham (1997) provided a succinct list of guidelines one should consider when creating and using rubrics. This relates to LevelUp’s practices with rubrics, which are competency-based and generated ad hoc by individual teachers. However, the project is looking to standardize the process and pull the rubrics into a system, and this reform of their practice could be well informed by the literature.
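How a rubric constrains the scoring of an artifact can be illustrated with a small sketch. The criteria, the point scale, and the function below are hypothetical, not drawn from any project’s actual rubric:

```python
# Hypothetical sketch of rubric-based scoring: each criterion receives a
# rating on a fixed scale, and the artifact's score aggregates the ratings.
# Criteria names and the 0-4 scale are illustrative only.
RUBRIC = {
    "accuracy": 4,
    "clarity": 4,
    "use_of_evidence": 4,
}

def score_artifact(ratings):
    """Validate ratings against the rubric and return (score, max_score)."""
    for criterion, rating in ratings.items():
        if criterion not in RUBRIC:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 0 <= rating <= RUBRIC[criterion]:
            raise ValueError(f"rating out of range for {criterion}")
    return sum(ratings.values()), sum(RUBRIC.values())

print(score_artifact({"accuracy": 3, "clarity": 4, "use_of_evidence": 2}))  # (9, 12)
```

The design point the sketch surfaces is that a rubric fixes in advance both what counts (the criteria) and how much it counts (the scale), which is precisely what ad hoc, teacher-generated rubrics leave implicit.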
Fifteen projects provided varying types and amounts of formative feedback to learners. Five projects used peer feedback, three used expert feedback, and seven used a combination of the two. Schwartz and Arena (2009) make the case for choice-based assessments. Many researchers have argued that giving formative feedback enhances the learning experience (e.g., Black & Wiliam, 2009; Shepard, 2007), but Schwartz and Arena argue that knowing how to ask for formative feedback is a skill not being taught. Some projects encourage students to ask for, give, and use feedback with one another, which may help in building this skill. For example, in the Pathways for Lifelong Learning project, high school peers are expected to provide formative assessment on blog entries online, as well as to participate as panel judges for the final demonstrations and review the student demonstrations with a rubric.
Twelve projects used expert judges to evaluate learner artifacts. Nine used experts who were teachers or practitioners, two used computer scoring systems, and one project used an AI tutor. Popham’s (2007) chapter on validity highlights the information and practices teachers should consider to enhance the validity of the claims they make about learning. In the Design for America project, badges are validated by community mentors, so understanding the validity of learning claims is particularly important. Peer feedback is given and used for refinement purposes. Badges are not awarded because of feedback given by peers, but artifacts that earn badges may be influenced by that feedback.
Eleven projects promote “soft skills,” such as leadership and collaboration, in addition to “hard skills.” Schulz (2008) discussed the need for students to develop soft skills beyond academic knowledge. This is relevant to MOUSE Wins!, a project that wants "the assessment process to be as social as the learning is." Because there is an organic feedback loop in the workplace, they want learning to mirror that process.
Eight projects required learners to collect artifacts in a digital portfolio. One of these e-portfolio systems was open to the public, while seven were “closed,” meaning only the immediate learning community could see and comment on them. Gillespie et al. (1996) provide a review of the recent literature on portfolio assessment and address the topic of private and public portfolios. This is important to S2R Medals because “every S2R participant has their personal Reporter Page on www.makewav.es/s2r. This serves as an e-portfolio and permits their educators, supporters, friends, family and peers to see and evaluate their work” (S2R Medals Proposal).
Seven projects used performance assessments to evaluate learners. Mehrens, Popham, and Ryan (1998) provided six guidelines for using performance assessment, and suggested that instructors should be careful in how they prepare students for such assessments lest they compromise the assessment. Sweetwater AQUAPONS faced some of these considerations, as “the badges for each curricular area will be earned through written assessments, photo and video projects, and in-person demonstrations of proficiency” (Sweetwater AQUAPONS Proposal).
In this context we use the term “mastery learning” to mean that learners are given practice until they have mastered a single skill set, and then move to the next skill set. Six projects did this: two used humans to judge “mastery,” one used only a computer, and three used a combination of human and computer experts. Duncan and Hmelo-Silver (2009) define and discuss learning progressions, advocating a focus on a smaller set of skills in depth rather than a large set of skills covered in a perfunctory manner. In the CS2N project, badges in activities supported by AI tutors are validated through the AI tutor and through automated online testing (through Moodle) or automated detection of in-game events (through Unity) in simulator environments. Instructor approval is used where appropriate in addition to automated tools.
Three projects involved students in the design of the physical badges, as well as in the design of the pathways taken to earn a badge. Stefani (1994) studied student marks and grades, and their effectiveness in comparison to teacher marks. This is relevant to the Badge Constellation Design Process in Cooper-Hewitt’s Design Exchange, as they are realizing that the badges should have "personality" and personal touches added by students. The process of designing a badge reflects the process that goes into receiving a badge.
By showing projects the practices they have in common with each other, productive dialogues emerge and refinements of practice are suggested. This dialogue also makes projects explicitly aware of the impact their assessment choices have on learning, which they can then evaluate and refine as necessary. The design principles are instrumental in this dialogue, as existing projects can use them to find other projects with whom to engage, and new projects can review the principles and ask questions of those already enacting them.
As education evolves toward open and networked learning, innovations such as digital badges are becoming increasingly significant. If digital badges are to be used responsibly in education, designers must consider the implications of their assessment practices for the learning process. By making the assessment design principles that emerged from the DML projects open to the public for use and discussion, this work is fostering important conversations about assessment design in digital credentialing and beyond.
By connecting these design principles to the scholarly literature and recommending assessment functions to consider when designing assessments, this work becomes relevant beyond creating a digital badge system; it is relevant to anyone designing assessments for educational programs. These assessment design principles offer a unique perspective on the implications of assessment design for learning, and can serve the larger audience as they design assessments within badging systems or in other contexts.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5–31.
Brown, J. S., & Duguid, P. (2002). The social life of information. Harvard Business Press.
Brown, J. S., & Adler, R. P. (2008). Minds on fire: Open education, the long tail, and learning 2.0. Educause Review, 43(1), 16–20.
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–42.
Bulkley, K. E., Christman, J. B., Goertz, M. E., & Lawrence, N. R. (2010). Building with benchmarks: The role of the district in Philadelphia’s benchmark assessment system. Peabody Journal of Education, 85(2), 186–204. doi:10.1080/01619561003685346
Cobb, P., Confrey, J., diSessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9–13.
Darling-Hammond, L. (1997). Using standards and assessments to support student learning. Phi Delta Kappan, 79(3), 190.
Duncan, R. G., & Hmelo-Silver, C. E. (2009). Learning progressions: Aligning curriculum, instruction, and assessment. Journal of Research in Science Teaching, 46(6), 606–609. doi:10.1002/tea.20316
Gillespie, C. S., Ford, K. L., Gillespie, R. D., & Leavell, A. G. (1996). Portfolio assessment: Some questions, some answers, some recommendations. Journal of Adolescent & Adult Literacy, 39(6), 480–491.
Ito, M. (2005). Technologies of the childhood imagination: Yugioh, media mixes, and everyday cultural production.
Ito, M., et al. (2009). Living and learning with new media: Summary of findings from the Digital Youth Project. Chicago, IL: MacArthur Foundation.
Maxwell, J. A. (2006). Literature reviews of, and for, educational research: A commentary on Boote and Beile’s “Scholars before researchers.” Educational Researcher, 35(9), 28–31.
Mehrens, W. A., Popham, W. J., & Ryan, J. M. (1998). How to prepare students for performance assessments. Educational Measurement: Issues and Practice, 17(1), 18–22.
Oláh, L. N., Lawrence, N. R., & Riggan, M. (2010). Learning to learn from benchmark assessment data: How teachers analyze results. Peabody Journal of Education, 85(2), 226–245.
Popham, W. J. (1997). What’s wrong-and what’s right-with rubrics. Educational Leadership, 55, 72–75.
Popham, W. J. (2007). Classroom assessment: What teachers need to know (5th ed.). Allyn & Bacon.
Schulz, B. (2008). The importance of soft skills: Education beyond academic knowledge. Retrieved from http://ir.polytechnic.edu.na/jspui/handle/10628/39
Schwartz, D. L., & Arena, D. (2009). Choice-based assessments for the digital age. MacArthur 21st Century Learning and Assessment Project. Retrieved from http://dml2011.dmlhub.net/sites/dmlcentral/files/resource_files/ChoiceSchwartzArenaAUGUST232009.pdf
Shepard, L. A. (2007). Formative assessment: Caveat emptor. In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 279–303). Mahwah, NJ: Erlbaum.
Stefani, L. A. J. (1994). Peer, self and tutor assessment: Relative reliabilities. Studies in Higher Education, 19(1), 69.
Stiver, W., An, J., Millen, B., Connors, R., & Sorensen, K. (2011). Development of an Engineering Portfolio System. Proceedings of the Canadian Engineering Education Association. Retrieved from http://library.queensu.ca/ojs/index.php/PCEEA/article/download/3729/3762
Yowell, C. M., & Smylie, M. A. (1999). Self-regulation in democratic communities. The Elementary School Journal, 469–490.