REVIEW NOTES: DATA SCIENCE FOR BUSINESS BY PROVOST & FAWCETT: CHAPTER 7
I enjoyed reading this chapter. It’s insightful and well explained, with detailed examples, diagrams and graphics, and it covers a few data science topics that correspond directly to conventional scientific research in computer science. That makes me happy, because these are crucial points, yet they are rarely the focus of Kaggle competitions, books on machine learning or statistics, or the latest and greatest in TensorFlow, PyTorch and AutoML libraries, and they are too infrequently discussed in DL/AI/ML social posts and blogs. Below I have written about the points that are well worth taking home. These topics are broadly on:
Careful consideration of what is desired from data science results.
Expected value as a key evaluation framework.
Consideration of appropriate comparative baselines for machine learning models.
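To make the expected value and baseline points a little more concrete, here is a minimal sketch of how I read the idea: weight each cell of a confusion matrix by a cost or benefit, then compare the model against a naive baseline. The function name, the matrix layout and every number below are my own hypothetical choices for illustration and are not taken from the book.

```python
def expected_value(confusion_counts, cost_benefit):
    """Expected value per instance.

    Both arguments are 2x2 nested lists laid out as
    [predicted][actual], with index 0 = positive, 1 = negative.
    This layout is an assumption made for this sketch.
    """
    total = sum(sum(row) for row in confusion_counts)
    if total == 0:
        raise ValueError("confusion matrix has no instances")
    value = 0.0
    for count_row, value_row in zip(confusion_counts, cost_benefit):
        for count, cell_value in zip(count_row, value_row):
            # rate of this outcome multiplied by what it is worth
            value += (count / total) * cell_value
    return value


# Hypothetical numbers: a response earns 99, a wasted offer costs 1,
# and not targeting someone costs or earns nothing.
cost_benefit = [[99, -1],
                [0, 0]]

# Hypothetical model: targets 63 people, 56 of whom respond.
model_counts = [[56, 7],
                [5, 42]]

# Naive baseline: target all 110 people.
baseline_counts = [[61, 49],
                   [0, 0]]

model_ev = expected_value(model_counts, cost_benefit)
baseline_ev = expected_value(baseline_counts, cost_benefit)
print(f"model: {model_ev:.2f} per person")
print(f"baseline: {baseline_ev:.2f} per person")
```

With these made-up numbers the "target everyone" baseline actually comes out ahead of the model, which is exactly why the choice of baseline and the business values attached to each outcome deserve careful thought before trusting an accuracy figure.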
Labelling software engineers as “Senior”, “Mid” or “Junior” comes up from time to time in developer and programmer forums. I’m not a fan of labels for people or groups of people, and Seniority and Skill/Knowledge/Ability Levels get to me in particular because they are so ambiguous. So it is down to us to contribute and discuss until we reach a clear definition.
A truth of seniority, across all genres, is its group-wide effect. It’s leadership, it’s empathy, it’s improving the individuals and the group as a whole for the group’s common interest. It’s a positive improvement in team-wide developer productivity and in overall business-wide productivity. But what does that mean for Developers and Software Engineers?
Please note: this is a temporally relevant article. It’s likely to be wrong immediately after it was written, but I publish it because it marks a step in the process. The below is my response to a discussion on how to teach and how to learn Data Science in order to be most effective (as an employee, as a service to businesses and as a service to society). A key aspect raised during the discussion was the consequences of focusing on domain expertise, of focusing on technical expertise, and of spreading focus between both areas. My reply below adds, I believe, something valuable, and helps guide the definition of Data Science teaching, learning and the ongoing strategy involved in continuous “lifelong” learning, or at least for as long as Data Science remains as it is. I concede that the view presented below could easily have drawn on many other influences: more citations, viewpoints, argument points and evidence examples. But this is the nature of conversation imposed by a time limit. So, here goes:
I finished the book by Joseph Aoun a little while ago, and I’ve been sitting on my notes, letting them stir. I think I have a fairly safe conclusion for its second half. That said, I would expect that those with an understanding and empathetic relationship with their CS students and their families will already have been on the cusp of conclusions similar to those drawn by Aoun in Robot Proof in 2017.
Today I’m posting updates (1/n) to the Light Stage open source project codebase.
The updates mark improvements for integrating experimental result data and 3d geometry data with light and camera-trigger hardware controllers (3). Included are two new lighting sequence improvements (1) and (2) and a way to get started, no matter your stage design and target capture application (4). These changes contribute towards standardised capture sequences and integrated 3d reconstruction pipeline processing, while supporting stage design tools and retaining visualisations, measurable evaluations and optimisations at each step.
Altogether, this work takes a step towards the vision of a comprehensive open source framework for open hardware light stages. Find more details at the Build a Light Stage website.
In this post, I have taken the liberty of writing down some of my thoughts and reflections on why Lightstages are (“pretty cool in my book” and also) relevant amongst today’s cutting edge developments in machine learning and data-driven decision making.
Over the last few years, I’ve had the opportunity to work as a researcher on the Aber Lightstage project, under Dr. Hannah Dee. Back then, I wrote a Python-OpenGL-based application to help us visualise and numerically evaluate lighting positions on our stage. The project is open source and on GitHub. Dr. Dee had successfully raised a bit of funding to bring together a team of engineers, researchers and advisors, each offering their specialist skills and knowledge to the project, and I got the chance to get involved.
News from April 10th 2019 is the release of Google’s collaborative AI platform for Data Science teams, for execution in the cloud or on premises. Google’s platform joins Alibaba’s similar platform, PAI 2.0, announced on March 29th 2017. While comprehensive information on Alibaba’s platform is sparse outside Chinese-language sources, the Google AI Platform does give samples and tutorials. Two others, ClusterOne for the DevOps of data science and DeterminedAI for collaboration, each had funding announcements earlier this year. Google’s and Alibaba’s platforms give a clear separation of team roles for collaborating at each stage of the process (as is indicated for the two yet-to-be-released others). The concept is well worth a mention because these are collaborative frameworks pushing forward the methodologies of data science, engineering and, in essence, social intelligence.