Thursday, 1 April 2021

Shifting Modern Data Science Forward: Dijkstra principle for data science


Prelude
Dijkstra in Zurich, 1984 (Wikipedia)

Edsger Dijkstra was a Dutch theoretical physicist turned computer scientist, and probably one of the most influential early pioneers of the field. He had deep insight into what computer science is and a well-founded notion of how it should be taught in academia. In this post we extrapolate his ideas to data science. We develop what we call the Dijkstra principle for data science, which is driven by his ideas on what computer science entails.

Computer Science and Astronomy 

Astronomy is not about telescopes. Indeed, it is about how the universe works and how its constituent parts interact. Telescopes, whether optical or radio, and similar detection techniques are merely tools for practicing astronomy and carrying out its investigations. An analogous statement holds for computer science, as in this quote from Dijkstra:
Computer science is no more about computers than astronomy is about telescopes. - Edsger Dijkstra
The idea that computer science is not about computers seems rather strange at first. However, what Dijkstra had in mind are the abstract mechanisms and mathematical constructs onto which one can map real problems and solve them as computer science problems, such as graph algorithms. Although computer science has many subfields, its inception can be considered rooted in applied mathematics.

Dijkstra principle for data science

Using Dijkstra's approach, we are now in a position to formulate a principle for data science.
Data science is no more about data than computer science is about computers. - Dijkstra principle for data science
This sounds absurd. If data science is not about data, then what is it about? Setting aside the definition of data science as an emergent field, an amalgamation of multiple fields from statistics to high performance computing, the idea that data is not the core tenet of data science implies that the practice aims not at data itself but at a higher purpose. Data is used much like a telescope in astronomy: the purpose is to reveal empirical truths about the representations the data conveys. There is no unique way to achieve this purpose.

Concluding Remarks

The Dijkstra principle for data science would be very helpful in understanding data science practice not as data-centric, contrary to mainstream dogma, but as a science-centric practice in which data is the primary tool to leverage, using a multitude of techniques. The implication is that machine learning is a secondary tool, built on top of data, in practicing data science. This attitude would help causality play a major role in shifting modern data science forward.



(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.