TY - JOUR
T1 - What Did My AI Learn? How Data Scientists Make Sense of Model Behavior
AU - Cabrera, Ángel Alexander
AU - Ribeiro, Marco Tulio
AU - Lee, Bongshin
AU - Deline, Robert
AU - Perer, Adam
AU - Drucker, Steven M.
N1 - Publisher Copyright: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023
Y1 - 2023/3/7
N2 - Data scientists require rich mental models of how AI systems behave to effectively train, debug, and work with them. Despite the prevalence of AI analysis tools, there is no general theory describing how people make sense of what their models have learned. We frame this process as a form of sensemaking and derive a framework describing how data scientists develop mental models of AI behavior. To evaluate the framework, we show how existing AI analysis tools fit into this sensemaking process and use it to design AIFinnity, a system for analyzing image-and-text models. Lastly, we explored how data scientists use a tool developed with the framework through a think-aloud study with 10 data scientists tasked with using AIFinnity to pick an image captioning model. We found that AIFinnity's sensemaking workflow reflected participants' mental processes and enabled them to discover and validate diverse AI behaviors.
KW - AI
KW - Machine learning
KW - machine behavior
KW - machine learning testing
KW - sensemaking
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=85152636624&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152636624&partnerID=8YFLogxK
U2 - 10.1145/3542921
DO - 10.1145/3542921
M3 - Article
AN - SCOPUS:85152636624
SN - 1073-0516
VL - 30
JO - ACM Transactions on Computer-Human Interaction
JF - ACM Transactions on Computer-Human Interaction
IS - 1
M1 - 1
ER -