Do great minds think alike? How a new framework is uncovering the strengths and weaknesses of human and AI question answering.
AI seems to be acing tests that many humans struggle with. However, new research suggests that claims of across-the-board AI superiority may be premature: AI and humans excel at different types of questions.
A new framework, CAIMIRA, is enabling researchers to quantitatively assess the strengths and weaknesses of both AI and humans in question-answering tasks. By analyzing over 300,000 responses from 70 AI systems and 155 humans across thousands of quiz questions, CAIMIRA identified distinct proficiency patterns for humans and AI systems.
The research, published in a new paper by researchers at the University of Maryland and Microsoft Research, shows that humans excel at knowledge-grounded abductive and conceptual reasoning. State-of-the-art LLMs like GPT-4-TURBO and LLAMA-3-70B, by contrast, show superior performance on targeted information retrieval and fact-based reasoning, particularly when information gaps are well-defined and addressable through pattern matching or data retrieval.
For example, humans tended to outperform AI systems on questions that required complex reasoning, such as interpreting ambiguous narratives or making connections between seemingly unrelated pieces of information. AI systems, on the other hand, performed better on questions where the answer could be found by retrieving specific information from a database or text.
“These findings highlight the need for future question-answering tasks to focus on questions that challenge not only higher-order reasoning and scientific thinking, but also demand nuanced linguistic interpretation and cross-contextual knowledge application,” say the researchers. “This will help advance AI developments that better emulate or complement human cognitive abilities in real-world problem-solving.”
The study suggests that AI and humans can work together to solve complex problems. By understanding the strengths and weaknesses of each, researchers can develop AI systems that better complement human abilities.
CAIMIRA represents a significant advancement in the field of question-answering, and its ability to quantify and compare human and AI performance will be a valuable tool for future research.