Date Published: March 28, 2019
Publisher: Public Library of Science
Author(s): Gem Stapleton, Peter Chapman, Peter Rodgers, Anestis Touloumis, Andrew Blake, Aidan Delaney, Nicholas J. Provart.
This paper presents the first empirical investigation that compares Euler and linear diagrams when they are used to represent set cardinality. A common approach is to use area-proportional Euler diagrams, but linear diagrams can exploit length-proportional straight lines for the same purpose. Another common approach is to use numerical annotations. We first conducted two empirical studies, one on Euler diagrams and the other on linear diagrams. These suggest that area-proportional Euler diagrams with numerical annotations and length-proportional linear diagrams without numerical annotations support significantly better task performance. We then conducted a third study to investigate which of these two notations should be used in practice. This suggests that area-proportional Euler diagrams with numerical annotations most effectively support task performance and so should be used to visualize set cardinalities. However, these studies focused on data that can be visualized reasonably accurately using circles, and the results should be taken as valid within that context. Future work needs to determine whether the results generalize both to cases where circles cannot be used and to other ways of encoding cardinality information.
This paper sets out to shed light on how to represent sets and their cardinalities in the way that is most effective for users. This is of particular significance because enormous amounts of set-based data are available in a wide variety of application areas. Set visualization techniques often exploit closed curves (or variations thereof) [9, 11, 25, 32, 36] or lines [1, 8, 15, 34]. This paper therefore focuses on such methods by evaluating extensions of Euler diagrams (closed curves) [39, 41] and linear diagrams (lines) that represent cardinality information. In addition, we consider two common ways of representing cardinality: proportions (of areas or lengths) and numerical annotations.
To address RQs 1-3, three between-group empirical studies were conducted that measured task performance in terms of accuracy and time. This section describes the studies’ design. Ethical approval was obtained from the University of Kent (approval number 0811516).
Initially, a pilot study collected data from 90 people, 30 per group. This revealed a missing number in one of the numerical-group diagrams and an erroneous answer to another question. Five other questions were affected by minor bugs in the data collection software. All problems were rectified. For the main study, 300 people were recruited, 100 for each group; allocation to groups was random in this and both subsequent studies. Of these 300 participants, 23 were identified as inattentive and therefore had their data removed before analysis. These inattentive participants were distributed across the groups as follows: 7 in the Euler Diagrams—Proportional (ED-P) group, 6 in the Euler Diagrams—Numerical (ED-N) group, and 10 in the Euler Diagrams—Proportional & Numerical (ED-P&N) group. This left data from 277 participants (142 M, 134 F, 1 undisclosed; ages 20–74, mean 36.5), with 93 participants (0 colourblind, 1 did not supply colourblind information) in the ED-P group, 94 participants (1 colourblind) in the ED-N group, and 90 participants (1 colourblind, 2 did not supply colourblind information) in the ED-P&N group.
A pilot study collected data from 90 people, 30 per group. No problems were identified, so we proceeded with the main study. We recruited 300 people, 100 for each group. Of these, 28 were identified as inattentive and therefore had their data removed before analysis. These inattentive participants were distributed across the groups as follows: 12 in the Linear Diagrams—Proportional (LD-P) group, 7 in the Linear Diagrams—Numerical (LD-N) group, and 9 in the Linear Diagrams—Proportional & Numerical (LD-P&N) group. This left data from 272 participants (131 M, 141 F; ages 18–71, mean 36.1), with 88 participants (0 colourblind, 2 did not supply colourblind information) in the LD-P group, 93 participants (1 colourblind) in the LD-N group, and 91 participants (3 colourblind) in the LD-P&N group.
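The participant accounting in the first two studies can be checked with a few lines of Python. This is a minimal sketch that only re-derives the reported group sizes from the stated figures (100 recruited per group, minus the inattentive participants in each group); the names `excluded`, `retained` and `results` are illustrative and not part of the original study materials.

```python
# Illustrative check of the participant accounting for the Euler and linear
# diagram studies, using only the counts reported in the text.
RECRUITED_PER_GROUP = 100  # 300 recruited per study, 100 for each of 3 groups

# Inattentive participants removed before analysis, per group.
excluded = {
    "Euler study": {"ED-P": 7, "ED-N": 6, "ED-P&N": 10},
    "Linear study": {"LD-P": 12, "LD-N": 7, "LD-P&N": 9},
}


def retained(excluded_by_group: dict) -> dict:
    """Return the number of participants analysed in each group."""
    return {group: RECRUITED_PER_GROUP - n for group, n in excluded_by_group.items()}


results = {study: retained(groups) for study, groups in excluded.items()}

for study, kept in results.items():
    print(
        study,
        kept,
        "excluded:", sum(excluded[study].values()),
        "analysed:", sum(kept.values()),
    )
```

Running it reproduces the reported totals (23 and 28 excluded; 277 and 272 analysed), which confirms that the per-group figures in the text are internally consistent.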
The first two studies revealed two visualization methods as the most effective: area-proportional Euler diagrams with numerical annotations (ED-P&N) and length-proportional linear diagrams without numerical annotations (LD-P). The purpose of this section is to suggest which of these two competing choices most effectively supports performance on cardinality-oriented tasks.
Threats to validity are categorized as internal, construct and external. With regard to internal validity, which concerns whether confounding factors affect the results, a major consideration in the design of all three studies was carry-over effects. This threat occurs when the measures arising from one treatment are affected by another treatment. To eliminate it, we used between-group designs in which participants could take part only once.
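The text states that allocation was random and that each participant could take part only once, but it does not describe the allocation mechanism. The following is a minimal sketch of one way to obtain such a between-group assignment, assuming equal group sizes (100 per group, as in the first two studies). The function `allocate` and its parameters are hypothetical.

```python
import random


def allocate(participant_ids, groups, per_group, seed=None):
    """Randomly assign each participant to exactly one group.

    Hypothetical sketch: assumes equal group sizes, i.e. that the number of
    participants equals len(groups) * per_group.
    """
    ids = list(participant_ids)
    # A participant appearing twice would violate the "take part once" rule.
    if len(set(ids)) != len(ids):
        raise ValueError("each participant may take part only once")
    if len(ids) != len(groups) * per_group:
        raise ValueError("participant count must equal len(groups) * per_group")
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Consecutive slices of the shuffled list form disjoint groups, so no
    # participant is exposed to more than one treatment (no carry-over).
    return {
        group: ids[i * per_group:(i + 1) * per_group]
        for i, group in enumerate(groups)
    }


assignment = allocate(range(300), ["ED-P", "ED-N", "ED-P&N"], per_group=100, seed=1)
```

Shuffling once and slicing yields balanced, disjoint groups. Assigning each participant independently at random would avoid carry-over equally well but would not guarantee equal group sizes.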
The extensive occurrence of set-based data, and the many attempts to automatically visualize it, are primary drivers for understanding the cognitive impact of different visualization methods on user performance. Mindful of the importance of understanding such choices in real-world settings—that is, by using real-world data and actual implemented visualization systems—this paper addressed three research questions: