This is Not a Pipe
All data is semiotic. That is, it is representative and lacks meaning without context. A basic example of a datum is a count. This can be expected to be a positive integer, but in any case, it stands in for something else (e.g., my cat weighs 10 pounds).
Choosing to count something is an act that introduces bias, as is choosing not to count something. Creating data without some form of bias is impossible. An example of this problem can be seen in crime forecasting. We can use the number of arrests as a metric on a map to predict where more arrests will happen in the future, but we would not be seeing crime, but rather where arrests tend to happen. In this case, the number of arrests is a poor representation for crime.
Magritte’s The Treachery of Images has the label “this is not a pipe,” and he was correct, it is a painting of a pipe. (Magritte, 1929)
All Models Are Wrong
Due to the semiotic nature, it can be dangerous to deploy systems that make decisions based on data. In the above example, allocating resources based the on number of arrests will only reinforce existing structures. You may see more arrests, especially if there is a mechanistic relationship between resources and the number of arrests, but you are not actually addressing crime. This is not to say that we cannot use data towards beneficial ends.
For example, if I regularly weigh my cat and the scale always says she is 10 pounds, I can use that as a non-invasive health diagnostic going forward. If I put her on the scale and it says she is 12 pounds, I know that I might be overfeeding her; if the scale says she is 8 pounds, it is probably time to bring her in to see a veterinarian. My scale does not need to be accurate for this to work, and just because her weight fluctuates does not mean that there is a health problem. The aphorism, “all models are wrong, but some are useful,” (Box, 1987) reflects this. Data is always standing in for something, but we can still use it to make better decisions.
A Ship in Harbor is Safe
Your organization might have outstanding data management practices, but concerns about data security or the idea of offloading executive decision making to a machine. The only perfectly secure data is that which cannot be accessed by anyone, and if you don’t build models, you never have to use them. “A ship in harbor is safe, but that is not what ships are built for,” (Shedd, 1928) applies here. If an organization is so concerned about security that it does not tolerate its use, and there are no regulatory requirements, it should not spend resources keeping the data. On the other hand, if the hesitation is about relying on a novel approach, or making data-driven decisions, we can address that!
Written By: Travis Dawry, MS, Strategic Analytics Faculty
Box, G. E. (1987). Empirical Model-Building and Response Surfaces. Wiley.
Magritte, R. (1929). La trahison des images [Ceci n’est pas une pipe]. Belgium.
Shedd, J. A. (1928). Salt from My Attic. Mosher Press.