ON MEASURING MACHINE LEARNING MODELS AGAINST CONCRETE BUSINESS OBJECTIVES

REVIEW NOTES: DATA SCIENCE FOR BUSINESS BY PROVOST & FAWCETT: CHAPTER 7

I enjoyed reading this chapter. It’s insightful and well explained, with detailed examples, diagrams and graphics, on a few data science topics that correspond directly to conventional scientific research in computer science. That makes me happy, because these are crucial points, yet they are rarely the focus of Kaggle competitions, books on machine learning or statistics, or the latest and greatest in TensorFlow, PyTorch and AutoML libraries, and they are too infrequently discussed in DL/AI/ML social posts and blogs. Below I have written about the points that are well worth taking home. Broadly, these topics are:

  • Careful consideration of what is desired from data science results.
  • Expected value as a key evaluation framework.
  • Consideration of appropriate comparative baselines for machine learning models.

TIL: Fixing ‘file not found’ dependency libraries in Linux

In Ubuntu (and Debian and its other derivatives), apt is our go-to CLI package installer. It handles everything in a single iconic command that every Linux user knows:

sudo apt install <packageName>

Sometimes, and I still don’t get why or when, a package’s shared library (dependency) is not installed.

For example, this happened to me today with MySQL-Workbench. I ran it from the CLI and it threw an error saying that a shared library it depends on is missing or can’t be found:

$ mysql-workbench
/usr/lib/mysql-workbench/mysql-workbench-bin: error while loading shared libraries: libgdkmm-2.4.so.1: cannot open shared object file: No such file or directory
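
If you want the full list of unresolved libraries rather than only the first one the loader trips over, ldd can print it. This is a minimal sketch: missing_libs is a hypothetical helper name of my own, and the binary path is taken from the error message above, so adjust it to your install.

```shell
# Print the names of shared libraries the dynamic linker cannot resolve
# for the given binary. missing_libs is a hypothetical helper name.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Path taken from the error message above; adjust to your install.
bin=/usr/lib/mysql-workbench/mysql-workbench-bin
if [ -x "$bin" ]; then
    missing_libs "$bin"
fi
```

No output means every library the binary links against was found.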

The key points of the fix are:

1. Ensure the locate database (used by GNU locate, mlocate, slocate and the like) is up to date with current file locations.

sudo updatedb

2. Check that the file exists. (No output means the file was not found.)

locate libgdkmm-2.4.so.1

3. Reinstall the package that provides the file if it is missing.
Here -f (--fix-broken) asks apt to repair broken dependencies, and --reinstall reinstalls the package even if it is already installed.

sudo apt-get install -f --reinstall libgtkmm-2.4-1v5

4. Ensure the application configuration is looking in the correct location for the shared library files.

Today I didn’t need this. But essentially, MySQL-Workbench on Ubuntu is launched through a #!/bin/bash wrapper script, which runs a series of commands before starting the application binary (the ELF file). In that script, the following environment variables can be used to define the configuration locations: export MWB_BINARIES_DIR=xyz and export LD_LIBRARY_PATH=xyz.

In my case, the application script uses those environment variable values on the line that executes the runtime binary, as the search path(s) the dynamic linker uses to find the corresponding shared library files written in C/C++. In interpreted-language applications, those environment variable values might instead be passed as environment arguments into the executed code (for example in Python) or as library classpath entries on the runtime execution line (for example in Java). Alas, I didn’t need to change those locations from their defaults, but that’s how it works.
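
To make that concrete, here is a rough sketch of what such a wrapper might look like. It is not the script shipped with MySQL-Workbench. The directory value is an assumption taken from the path in the error message above, standing in for the xyz placeholder.

```shell
#!/bin/bash
# Hypothetical wrapper, NOT the script shipped with MySQL-Workbench.
# The directory is an assumed value; replace it with your own location.
export MWB_BINARIES_DIR="/usr/lib/mysql-workbench"

# Prepend the directory to the dynamic linker's search path,
# keeping whatever was already set (and avoiding a trailing colon).
export LD_LIBRARY_PATH="$MWB_BINARIES_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Start the real binary only if it is present at the assumed location.
if [ -x "$MWB_BINARIES_DIR/mysql-workbench-bin" ]; then
    exec "$MWB_BINARIES_DIR/mysql-workbench-bin" "$@"
fi
```

Directories listed in LD_LIBRARY_PATH are searched before the system default library locations, which is why pointing it at the right directory can resolve a "cannot open shared object file" error without reinstalling anything.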

That’s how to resolve missing shared library dependencies in Ubuntu (and other Debian-based distributions). On CentOS the same idea applies, with yum or dnf in place of apt.
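
One thing the steps above take for granted is knowing which package ships the missing file. A minimal sketch of one way to look that up, assuming a Debian-based system. apt-file is an optional tool that may not be installed, and the variable name lib is only for illustration.

```shell
# Library name taken from the error message above.
lib=libgdkmm-2.4.so.1

# dpkg -S searches the file lists of packages that are already installed.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -S "$lib" 2>/dev/null || echo "no installed package owns $lib"
fi

# apt-file also searches packages that are not installed yet.
# It is a separate package: sudo apt install apt-file && sudo apt-file update
if command -v apt-file >/dev/null 2>&1; then
    apt-file search "$lib" || echo "apt-file found no match for $lib"
fi
```

The package name printed at the start of each matching line is the one to pass to apt-get install.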

On Measuring the Senior, In Senior Software Engineering Roles

Labelling “Senior”, “Mid” and “Junior” roles of software engineers comes up from time to time in the developer and programmer forums. I’m not a fan of labels for people or groups of people, and seniority and skill/knowledge/ability levels get to me in particular because they are so ambiguous. So it is down to us to contribute and discuss until we reach a clear definition.

A truth of seniority, across all genres, is group-wide effect. It’s leadership, it’s empathy, it’s improving the individuals and the group as a whole for the group’s common interest. It’s a positive improvement, it’s team-wide developer productivity and overall business-wide productivity improvement. But what does that mean for Developers and Software Engineers?


Adding to the Conversation on Data Science Training: Looking into the Future

2020 May 9th.

Please note: this is a time-sensitive article. It is likely to be wrong soon after it was written, but I publish it because it marks a step in the process. Below is my response to a discussion on how to teach and how to learn Data Science in order to be most effective (as an employee, as a service to businesses and as a service to society). A key aspect raised during the discussion was the consequences of focusing on domain expertise, of focusing on technical expertise, and of spreading focus between both. I believe my reply adds something valuable and helps guide the definition of Data Science teaching, learning and the ongoing strategy involved in continuous “lifelong” learning, or at least for as long as Data Science remains as it is. I concede that the view presented below could easily have drawn on many other influences: more citations, viewpoints, argument points and evidence examples. But this is the nature of a conversation held under a time limit. So, here goes:


Book review 2/2 on Robot Proof: Higher Education in the Age of AI

I finished the book by Joseph Aoun a little while ago, and I’ve been sitting on my notes, letting them stir. I think I have a fairly safe conclusion for its second half. That said, I would expect that those with an understanding and empathetic relationship with their CS students and their families will have been on the cusp of conclusions similar to those Aoun draws in Robot Proof in 2017.


Open Source Code for Light Stage Capture Sequences

Today I’m posting updates (1/n) to the Light Stage open source project codebase.

The updates mark improvements for integrating experimental result data and 3d geometry data with light and camera-trigger hardware controllers (3). Included are two new lighting sequence improvements (1) and (2) and a way to get started, no matter your stage design and target capture application (4). These changes contribute towards standardised capture sequences and integrated 3d reconstruction pipeline processing, while supporting stage design tools and retaining visualisations, measurable evaluations and optimisations at each step.

Altogether, this work takes a step towards the vision of a comprehensive open source framework for open hardware light stages. Find more details at the Build a Light Stage website.

These recent updates to the LightStage-Repo on GitHub include:

  1. “Spherical gradient” lighting sequence.
  2. “Balanced lighting baseline”.
  3. Local web service (on port 8080) to return data requested by an HTTP client, such as a hardware controller with an Ethernet/Wi-Fi module.
  4. Configuration file designed for each Light Stage, to easily get the web service responding with correct sequence data.

Update on Lightstage Project

In this post, I have taken the liberty of writing down some of my thoughts and reflections on why Lightstages are (“pretty cool in my book” and also) relevant amongst today’s cutting edge developments in machine learning and data-driven decision making.

Over the last few years, I’ve had the opportunity to work as a researcher on the Aber Lightstage project, under Dr. Hannah Dee. Back then, I wrote a Python-OpenGL-based application to help us visualise and numerically evaluate lighting positions on our stage; the project is open source and on GitHub. Dr. Dee had successfully raised a bit of funding to bring together a team of engineers, researchers and advisors, each offering their specialist skills and knowledge to the project, and I got the chance to get involved.


Collaboration Platforms for Data Scientists

News from April 10th 2019 is the release of Google’s collaborative AI platform for Data Science teams, for execution in the cloud or on premises. Google’s platform joins Alibaba’s similar platform, PAI 2.0, announced on March 29th 2017. While comprehensive information on Alibaba’s platform is sparse outside Chinese-language sources, the Google AI Platform does give samples and tutorials. Two others, ClusterOne for the DevOps of data science and DeterminedAI for collaboration, each had funding announcements earlier this year. Google’s and Alibaba’s platforms give a clear separation of team roles for collaborating at each stage of the process (as is indicated for the two yet-to-be-released others). These platforms are well worth a mention because they are collaborative frameworks pushing forward the methodologies of data science, engineering and, in essence, social intelligence.

 

Figure: AI Platform announced by Google on April 10th 2019: process pipeline of data-driven application stages.
