Adding to the Conversation on Data Science Training: Looking into the Future

2020 May 9th.

Please note: this is a time-sensitive article. It is likely to be out of date soon after it was written, but I publish it because it marks a step in the process. The text below is my response to a discussion about how to teach and learn Data Science most effectively (as an employee, as a service to businesses and as a service to society). A key question raised in that discussion was the consequence of focusing on domain expertise, of focusing on technical expertise, or of spreading focus across both. I believe my reply adds something valuable and helps guide the definition of Data Science teaching, learning and the strategy for continuous “lifelong” learning, or at least for as long as Data Science remains as it is. I concede that the view presented below could have drawn on many other influences, with more citations, viewpoints, arguments and evidence. But that is the nature of a conversation held under a time limit. So, here goes:


I think a missing piece to defining a course in Data Science is a view of its future.

I have been hesitant to define the structure of data science training, and I am biased by my Comp Sci background. Conversely, I follow the doctrine that “comp sci is without purpose with no application”, owing to my position that “domain knowledge is where the value is generated” (societal, financial, etc). If boundaries and principles in Data Science must be set, I think it is the view of the future that must be settled first.

The vision of future education put forward by people such as Joseph Aoun rests on a basis of Comp Sci and a capability for divergent problem solving (you’ll find longer summaries of his writing elsewhere). The view I have come to hold since reading his work (among others) is that successful developments in AI lead to practice in what we call data science. In my mind, terms such as data science, big data analytics, data analytics, quantitative decision making (feel free to fill in your own missing terms) and, further back, statistical mathematician have all aimed to solve essentially the same problem, yet with different concepts, frames, techniques and technologies. If I “had” to give a name to that, it would be “applications of quantitative statistics”.

The non-comp sci fields (agriculture, business, linguistics, social sciences, etc) will all change in future, and with them the landscape upon which the field “now called data science” is formed. In my mind, that means the basis of data science is set to change. As those other fields change, their domains change, and thus the applications of data science will change.

The complexity of the problems solved by AI will increase. The trajectory is clear. For example, while some may notice stagnation in the deep learning field, or limits to the untapped innovation remaining in it, it is certainly successful at solving certain categories of problems. In evolutionary robotics and other areas of AI, generalised and transferable problem solving is available. Time will let these and other techniques extend to more complex problems, and the degree to which data science is applied within each domain will grow. Aoun (2017) mentions supply chain logistics; I like to entertain the idea of algorithms successfully negotiating the complex interactions of political policy making. The techniques of data science, and the problems it solves, will change.

So I come back to the original thought. Data science training needs (1) a strong foundational basis of comp sci (which is information problem solving and includes statistics and scientific rigour for decision making) in order to learn the new techniques and technologies; (2) divergent problem solving and divergent thinking in the face of changing problems within the domains, which is an exercise of creativity within a mastered subject (in this case information problem solving); and (3) a continual refresh of the knowledge of the current data science toolbox, made available by developments in AI.

