Pop Quiz: AI Matches Human Performance At Developing Good Test Questions
Researchers have developed an artificial intelligence (AI) model that can generate online course assessment questions that instructors found indistinguishable from questions written by humans.
The new AI is called QUADL, and it does two things: it identifies key terms and ideas in instructional texts, and then crafts questions that focus on those terms and ideas.
“We provide QUADL with the courseware contents and the learning objectives for the curriculum, and QUADL can then develop questions that help students achieve those learning objectives,” says Noboru Matsuda, associate professor of computer science at North Carolina State University and co-author of a paper on the work.
“Humans are good at developing courses, but in interviews with instructors and courseware developers, we found that they often struggle to develop questions that are effective at assessing student progress on the learning objectives for those courses,” says Machi Shimmei, a Ph.D. student at NC State and first author of the paper. “Our study suggests QUADL can be a useful tool for instructors and course developers.”
To test QUADL’s performance, the researchers used existing online courseware from the Open Learning Initiative (OLI). They recruited five instructors who use the OLI for their classes and asked them to evaluate a lengthy list of questions. Some of the questions were generated by QUADL; some were generated by the current state-of-the-art question-generating AI model (called Info-HCVAE); and some were already in use in the OLI courses. Study participants were not told where the questions came from, and were asked to assess the pedagogical value of each question.
“The pedagogical value scores given to questions generated by QUADL were essentially identical to the value scores that instructors gave to questions written by people for use in the OLI,” Shimmei says. “The questions generated by Info-HCVAE received lower scores from the instructors.”
The researchers are now planning undergraduate classroom studies in which instructors will use questions generated by QUADL, in order to see how, if at all, those questions affect student learning.
“This forthcoming work should close the loop for this technology,” Matsuda says. “Hypothetically, QUADL will work. Now we have to see if it actually will work in practice.”
QUADL is part of PASTEL, a larger suite of AI technologies that Matsuda and his collaborators are developing. All of the PASTEL technologies are designed to facilitate the development of educational courseware.
“These technologies deal with everything from generating questions – which is QUADL’s role – to quality assurance functions used to assess how effective each element of the courseware is at helping students learn,” Matsuda says. “We are looking for both research partners to help us develop these generative AI technologies, and for partners who are educators interested in using these AI tools in their courses.”