Data visualization is the graphical representation of data.
Generally, there are two categories of data visualization: exploration and explanation.
Exploratory data visualizations are a great starting point when you are not sure what's in your data. These visualizations give you a sense of what is in your data set and allow you to identify patterns and trends.
Explanatory data visualizations are most appropriate when you have a sense of what your data has to say and you are ready to tell the story of your data.
Let's look at Anscombe's quartet, which makes the case for examining data visually and shows the importance of exploration in data analysis. First published by Francis Anscombe in the 1973 paper "Graphs in Statistical Analysis", Anscombe's quartet presents four made-up datasets, each containing eleven observations of two variables, x and y.
Obs. | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 |
---|---|---|---|---|---|---|---|---|
1 | 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58 |
2 | 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76 |
3 | 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71 |
4 | 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84 |
5 | 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47 |
6 | 14 | 9.96 | 14 | 8.10 | 14 | 8.84 | 8 | 7.04 |
7 | 6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25 |
8 | 4 | 4.26 | 4 | 3.10 | 4 | 5.39 | 19 | 12.50 |
9 | 12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56 |
10 | 7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91 |
11 | 5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89 |
By design, each of these datasets has nearly identical means, variances, and correlation coefficients. However, when the datasets are plotted, they look very different from one another. This illustrates how exploratory data visualizations reveal patterns that may not be readily apparent from summary statistics alone.
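To make the "nearly identical summary statistics" claim concrete, here is a minimal sketch that computes the mean, sample variance, and Pearson correlation for each of the four datasets in the table above. It uses only the Python standard library. The helper names `pearson` and `summarize` are illustrative choices, not part of any referenced source.

```python
from statistics import mean, variance

# Anscombe's quartet, copied from the table above.
# Datasets 1-3 share the same x values; dataset 4 has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {
    "dataset 1": (x123, y1),
    "dataset 2": (x123, y2),
    "dataset 3": (x123, y3),
    "dataset 4": (x4, y4),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def summarize(xs, ys):
    """Collect the summary statistics discussed in the text."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}

for name, stats in summaries.items():
    line = ", ".join(f"{key}={value:.3f}" for key, value in stats.items())
    print(f"{name}: {line}")
```

The four printed rows should agree to roughly two decimal places, even though a scatter plot of each dataset looks quite different. This is why plotting is a useful step before relying on summary statistics.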
Below are some historical examples of how explanatory data visualizations have shaped our world.
Source: gallica.bnf.fr / Bibliothèque nationale de France
More information: Charles Joseph Minard's Carte figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813
Source: The Public Domain Review
More information: W.E.B. Du Bois Data Visualizations
Source: National Geographic
More information: John Snow's Cholera Map
Iliinsky, N., & Steele, J. (2011). Designing Data Visualizations: Representing Informational Relationships. O'Reilly Media, Inc.
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17-21.
Dutta, D. (2017). Anscombe’s quartet. RPubs. https://rpubs.com/debosruti007/anscombeQuartet.
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.