
Communicative Information Visualizations: How to make data more understandable by the general public

Monday, 08/15/2022 10:00am to 12:00pm
Zoom
PhD Thesis Defense
Speaker: Alyx Burns

Abstract: Although data visualizations have been around for centuries and are encountered frequently by the general public, existing evidence suggests that a significant portion of people have difficulty understanding and interpreting them. It might not seem like a big problem when a reader misreads a weather map and finds themselves without an umbrella in a rainstorm, but for those who lack the means, experience, or ability to make sense of data, misreading a data visualization concerning public health and safety could be a matter of life and death. However, figuring out how to make visualizations truly usable for a diverse audience remains difficult. 

Implicit information -- that is, information which is not directly stated but must instead be inferred -- may contribute to data visualizations that are difficult to understand. In the context of reading data visualizations, critical information about what is being encoded and how to decode it might be communicated implicitly. This means that people who do not already know how to read the visualization must make assumptions about how to do so and may guess incorrectly. Alternatively, in the context of producing research about visualizations, if (for example) the audience of a visualization is defined implicitly, then readers must infer to whom the results apply. In the absence of sufficient context, they too may infer incorrectly. In both cases, communicating key information implicitly may lead to visualizations that are difficult for some people to use -- either because there is not enough information for a reader to make sense of what they see or because the visualization was never intended, or evaluated, to work for a particular reader. Therefore, in my dissertation, I examined three ways in which making implicit information explicit might help make data visualizations more understandable and impactful in the future.

First, I conducted a critical analysis of the ways that audiences of data visualizations are defined in visualization research papers. Poorly defining the audience of a visualization can have negative effects on both people and science. For example, it may lead to over-generalizing research results or guidelines to audiences that were never intended (or evaluated) in the original research, or to ignoring the (potentially unique) needs of different populations by aggregating them into majority groups. Therefore, as a case study to investigate current practices, I conducted a survey of every paper published in a major visualization venue that referred to "novices," "non-experts," the "general public," or "laypeople" in its title or abstract. I selected this set of audiences because they represent some of the words used to describe visualization's broadening audience. My case study of 79 papers demonstrated that audiences were rarely explicitly defined. In place of explicit definitions, authors relied on implicit definitions (e.g., examples, counter-examples) to clarify who was in the audience. To improve how audiences are defined in future work, I draw on fields outside of visualization, including philosophy and feminist theory, to argue that authors should: (1) explicitly define their audiences by specifying the dimensions that qualify an individual for inclusion, (2) think of their audiences intersectionally by considering how multiple dimensions of identity and experience can produce unique perspectives on visualization, and (3) be aware of systems of power so that the way an audience is defined does not cause harm by reinforcing existing oppressive power dynamics.

Next, I examined current visualization design techniques to test how more explicitly encoding data as an array of countable pictographs (instead of as solid abstract shapes) may impact understanding and experience. Existing research reports positive empirical findings for infographics in terms of memory, engagement, and assessment of risk -- particularly when they contain pictographs (simple, iconic pictures that represent a word or topic). However, there was little exploration of how pictographs affect and afford the general public's understanding of the underlying data, or of how the choice to use pictographs affects readers' personal experiences. Therefore, I conducted an experiment that used a novel method of producing questions that probe different aspects of a reader's understanding of six pairs of real-world visualizations that are identical except for their use of pictograph arrays. My results indicated that the use of pictograph arrays does not directly impact understanding but can allow readers to more easily envision real-world connections.
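To make the design contrast concrete, here is a minimal, hypothetical sketch (not the stimuli from the study) of the two encodings compared above: the same quantity drawn once as a countable pictograph array and once as a single solid bar. The values, markers, and styling are illustrative assumptions.

```python
import matplotlib.pyplot as plt

# Illustrative values only: "30 out of 100 people", drawn two ways.
value, total, cols = 30, 100, 10

fig, (ax_pict, ax_bar) = plt.subplots(1, 2, figsize=(8, 4))

# Pictograph array: one countable marker per unit of data
# (simple circles stand in for person-shaped icons here).
for i in range(total):
    row, col = divmod(i, cols)
    color = "tab:blue" if i < value else "lightgray"
    ax_pict.plot(col, -row, marker="o", markersize=12, color=color)
ax_pict.set_title("Pictograph array (countable units)")
ax_pict.axis("off")

# Solid abstract shape: the same quantity as one filled bar.
ax_bar.bar(["affected"], [value], color="tab:blue")
ax_bar.set_ylim(0, total)
ax_bar.set_title("Solid shape (abstract area)")

fig.tight_layout()
plt.show()
```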

Finally, I explored how accompanying a visualization with contextual information could impact understanding and experience. While visualizations are designed to present a wealth of data, they are often not accompanied by key metadata providing background on the source of the data, the transformations applied to it, the visualization's encodings, its purpose, the people involved in its creation, or its intended audience. Recent history has shown that ambiguity surrounding data collection and the visualization design process can erode readers' trust and be exploited to cast doubt on scientific processes (e.g., by COVID-19 skeptics). Common wisdom suggests that contextualizing a visualization with metadata (e.g., disclosing the creator or the data source) may counter these effects and potentially increase understanding and aspects of trust. However, the impact of adding metadata to visualizations remained largely unknown.

To fill this gap, I conducted two experiments. In Experiment 1, I explored what kinds of metadata participants valued most and how chart type, topic, and user goal impacted their choices of metadata. My results indicated that participants were most interested in metadata that explained the visualization's encoding when their goal was understanding, and in metadata about the source of the data when assessing trust. Based on the results of Experiment 1, in the second experiment I explored how these two types of metadata impact trust, information relevance, and understanding. I asked 144 participants to explain the main message of two pairs of visualizations (one with metadata and one without); rate them on scales of trust and relevance; and then predict the likelihood that they were selected by an organization for a presentation to policy makers. My results suggested that, among four dimensions of trust, visualizations with metadata were perceived as more thorough, but as similarly accurate, clear, and complete, compared to those without. Additionally, visualizations with metadata were assigned higher probabilities of being selected by a hypothetical organization for a presentation to policy makers. However, participants did not perceive the information in visualizations with metadata as more relevant than in those without. Finally, the presence of metadata did not impact the accuracy of extracting information from the visualizations, but it may have influenced which information participants remembered as important or interesting.
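As a concrete illustration of the kinds of metadata discussed above, the following is a minimal, hypothetical sketch of a record that could accompany a published chart, rendered as a plain-text caption. The field names and values are illustrative assumptions, not the schema or stimuli used in the experiments.

```python
# Hypothetical metadata record for a chart; all fields are illustrative.
chart_metadata = {
    "data_source": "public health agency case counts, 2020-2021",
    "transformations": ["aggregated to weekly totals", "7-day smoothing"],
    "encoding": "x: week, y: cases per 100,000 residents",
    "purpose": "communicate infection trends to a general audience",
    "creators": ["data analyst", "visualization designer"],
    "intended_audience": "adult readers with no statistics background",
}

def metadata_caption(md: dict) -> str:
    """Render a metadata record as a plain-text caption for a chart."""
    return "\n".join([
        f"Source: {md['data_source']}",
        f"Processing: {'; '.join(md['transformations'])}",
        f"How to read: {md['encoding']}",
        f"Purpose: {md['purpose']}",
        f"Made by: {', '.join(md['creators'])}",
        f"Intended audience: {md['intended_audience']}",
    ])

print(metadata_caption(chart_metadata))
```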

Advisor: Narges Mahyar

Join via Zoom