Part IV of The Current State of Assessing Historical Thinking: Exemplar Article of Measuring Historical Thinking

Exemplar Article #1

McGrew, Sarah, Joel Breakstone, Teresa Ortega, Mark Smith, and Sam Wineburg. (2018). Can Students Evaluate Online Sources? Learning from Assessments of Civic Online Reasoning. Theory & Research in Social Education, 46(2), 165-193.

Sarah McGrew and her team of Stanford researchers created short assessment tasks that measured students’ ability to search for, evaluate, and verify online information. McGrew and her research team are affiliated with the Stanford History Education Group (SHEG), which has developed online assessment tools that measure students’ historical thinking ability. For her study “Can Students Evaluate Online Sources? Learning from Assessments of Civic Online Reasoning” (2018), McGrew created assessment tasks similar to SHEG’s lesson plans. Although the study focuses on civic reasoning, many of its inquiries overlap with historical thinking.

McGrew created fifteen assessment tasks focused on three constructs: “Who is behind the information? What is the evidence? What do other sources say?” Respectively, these questions can be categorized under the inquiries of sourcing, critical thinking, and corroboration.

The participant pool comprised 405 middle school students, 348 high school students, and 141 college students from twelve different states, and researchers collected 2,616 responses from this group. Once given the analysis sheets and online sources, students struggled to successfully evaluate online claims, sources, and evidence. McGrew argued that curriculum materials must be better in order to support students’ “civic online reasoning competencies” (McGrew, 2018, pp. 165-166). In the case of the McGrew study, “better” often seemed to mean “simply needed to exist.”

For more information on the breakdown of sourcing, critical thinking, and corroboration visit the History Forge page Skills Based Grading and Grading for Mastery.

There are videos on History Forge’s YouTube channel that explain how to examine sources using inquiry methods.

The McGrew study is helpful in understanding how to measure historical thinking skills because it provides a framework for assessing online civic skills. McGrew measured civic skills in much the same way SHEG researchers measured historical thinking skills. For example, McGrew used the very categories, such as “sourcing,” that SHEG used in its Beyond the Bubble history assessments. Measuring the same types of skills in both civics and history is beneficial because common measurements can then be used across social studies disciplines. Having common assessment tools benefits frameworks like the C3 Framework because it demonstrates how assessment can be vertically aligned across social studies disciplines.

Much news media, especially online, is meant only to fan the flames of discontent. McGrew wants to give students the skills to identify these inflammatory news sources.

Along with the benefit of vertical alignment, assessments that measure skills are necessary because McGrew identified misconceptions commonly held by students, such as always trusting “news” sources, even when those sources are clearly biased (2018, p. 193). These types of misconceptions exist across social studies disciplines. If teachers want their assessments to improve cognitive processes, they must collaborate on how they teach and assess skills. This collaboration is imperative because students receive an average of only one year of civics and take history only sporadically throughout their secondary education. SHEG and McGrew are developing reasoning and thinking skills that could be the uniting force to vertically align social studies departments.

Despite the potential of these skills, pedagogical methods like scaffolding must be improved and modeled before teachers can truly make critical thinking a standard practice.

Exemplar Article #2

Reich, G. A. (2009) Testing historical knowledge: Standards, multiple-choice questions and student reasoning. Theory & Research in Social Education, 37(3), 325-360.

Gabriel A. Reich comes from a high school history teaching background, where he grew increasingly frustrated with standardized testing of historical knowledge. Reich is also a historian who focuses on how Americans, especially the young, learn about the Civil War and how myths form a significant part of our historical consciousness. Given his background and frustrations, it is no surprise that Reich examined how high school students choose answers on a set of multiple-choice questions. Reich wanted to know whether students used historical reasoning when they selected A, B, C, or D.

In his research, he focused on a class of urban 10th-grade students who had to take a high-stakes exam at the end of the year to earn a social studies credit required for high school graduation.

How much learning do students actually show when they are constantly tested in a summative way?

Reich used questions from New York State’s Global History and Geography Regents Exam, a test required for high school graduation.

Based on students’ answers and follow-up interviews, Reich determined that students were using test-wise thinking skills to select correct answers. For example, in many of the interviews, students reported eliminating answers because they had used a similar answer on a different question, or because they knew a certain name did not fit the era they were studying. In these cases, historical thinking and factual recall played little part in how students chose correct answers. Reich did not see students using skills like sourcing, corroboration, continuity, or contextualization when they answered multiple-choice questions.

Based on Reich’s research, multiple-choice tests fail on two fronts: they do not assess historical thinking, and they do not even accurately measure students’ knowledge of history. Reich did, however, develop useful categories for the skills students actually used to select correct answers.

Improving literacy is important, but it’s not the same as improving historical reasoning. The two should not be conflated, and efforts must be made to measure both.

The skill categories were “test-wiseness,” “literacy,” and “domains-history content.” If the question is “How do we measure historical thinking skills?”, then it is beneficial to know what the skills are, what does not qualify as a skill, and whether anything disrupts the measurement. If students are using skills outside of historical thinking to answer questions, those skills must be identified and managed so that they do not interfere with assessment.

Reich’s study should be considered a warning sign for anyone using identical multiple-choice exams for all students. Not only does his research show that multiple-choice tests do not assess historical thinking; they also do not accurately measure a student’s ability to recall content. That standardized multiple-choice assessments are ineffective is not a new revelation, but it bears repeating because millions of students are subjected to these exams each year.

This is especially disturbing in social studies courses because most states do not require standardized assessments of historical thinking or historical knowledge, yet many social studies teachers still rely on summative assessments. It appears that testing culture is pervasive enough that even teachers who do not need to test will still do so.

For more information on collecting data for historical thinking, see the article Data Collection: Teaching Social Studies Skills and Historical Thinking.

If history assessments are going to improve, they need to become something more than another standardized test. Greater emphasis must be placed on formative assessment and on students gaining mastery over skills. There is hope, as McGrew demonstrated how cognitive processes can be measured. Along with demonstrating how to measure historical thinking, both the McGrew and Reich studies give examples of what data can be pulled from social studies assessments.

Coming Up Next

The next part will focus on the second question of this article series: “What data can be pulled from research in inquiry and historical thinking?” To read the next part, click on Part V of The Current State of Assessing Historical Thinking: What Data Can Be Pulled From Research in Inquiry and Historical Thinking.


Attewell, Paul, and David Lavin (2011). The Other 75%: College Education Beyond the Elite. In E. Lagemann & H. Lewis (Eds.), What Is College For? The Public Purpose of Higher Education. New York: Teachers College Press.

Corliss, Stephanie B., and Marcia C. Linn (2011). Assessing Learning From Inquiry Science Instruction. In D. Robinson & G. Schraw (Eds.), Assessment of Higher Order Thinking Skills (pp. 219-243).

Hicks, David, and Peter E. Doolittle (2008). Fostering Analysis in Historical Inquiry Through Multimedia Embedded Scaffolding. Theory and Research in Social Education, 36(3), 206-232.

Lee, John (unpublished chapter). Assessing Inquiry.

Lee, P., & Ashby, R. (2000). Progression in Historical Understanding Among Students Ages 7-14. In P. N. Stearns, P. Seixas, & S. Wineburg (Eds.), Knowing, Teaching, and Learning History: National and International Perspectives (pp. 45-94). New York: New York University Press.

Lévesque, Stéphane, and Penney Clark (2018). Historical Thinking: Definitions and Educational Applications. In S. Metzger & L. Harris (Eds.), The Wiley Handbook of History Teaching and Learning (pp. 119-148). Hoboken, NJ: John Wiley & Sons.

McGrew, Sarah, Joel Breakstone, Teresa Ortega, Mark Smith, and Sam Wineburg. (2018). Can Students Evaluate Online Sources? Learning from Assessments of Civic Online Reasoning. Theory & Research in Social Education, 46(2), 165-193.

National Council for the Social Studies (NCSS) (2013). The College, Career, and Civic Life (C3) Framework for Social Studies State Standards: Guidance for Enhancing the Rigor of K-12 Civics, Economics, Geography, and History. Silver Spring, MD: NCSS.

Reich, G. A. (2009) Testing historical knowledge: Standards, multiple-choice questions and student reasoning. Theory & Research in Social Education, 37(3), 325-360.

Renn, Kristen, and Robert Reason. Characteristics of College Students in the United States. In K. Renn & R. Reason, College Students in the United States: Characteristics, Experiences, and Outcomes (pp. 3-27). San Francisco, CA: Jossey-Bass.

Shemilt, Denis (2018). Assessment of Learning in History Education: Past, Present, and Possible Futures. In S. Metzger & L. Harris (Eds.), The Wiley Handbook of History Teaching and Learning (pp. 449-472). Hoboken, NJ: John Wiley & Sons.

Selwyn, Doug (2014). Why Inquiry? In E. Ross (Ed.), The Social Studies Curriculum (pp. 267-288). Albany: State University of New York Press.

VanSledright, Bruce (2014). Assessing Historical Thinking & Understanding: Innovative Designs for New Standards. New York: Routledge.

Virginia Tech. SCIM-C: Historical Inquiry. Retrieved from

Waldis, Monika, et al. (2015). Material-Based and Open-Ended Writing Tasks for Assessing Narrative Among Students. In K. Ercikan and P. Seixas (Eds.), New Directions in Assessing Historical Thinking (pp. 117-131). New York: Routledge.

Adams, David Wallace (1987). The Past as Experience: A Qualitative Assessment of National History Day. The History Teacher, 20(2), 179-242.

Wineburg, Sam. (2001). Picturing the Past. In Historical Thinking and Other Unnatural Acts: Charting the Future of Teaching the Past (pp. 113-136). Philadelphia: Temple University Press.
