Blog

Extension of Python Dict .get() – Lookup with Similarity for Built-in Libraries

This is a prospective extension to Python dict .get() that solves a common problem in data applications. The bold proposal asks whether such an implementation belongs in the core language or in a library, across the languages used for data processing. See what you think.

Background & Why?

These days more data-oriented code is being written (ML/AI, etc.). Data is often “dirty” (missing values, spelling errors, typos, etc.). “Fuzzy” (less certain) matching can be useful in many of these cases (traditionally, in SQL we might use LIKE '%...%'). Dictionary implementations (e.g. hashmap, hashtable, associative array) are an efficient lookup mechanism. They respond as a boolean lookup: the key is there, or it is not. It is unconventional to think of their lookup in terms of a confidence measure. In data-oriented code, dictionaries are often used for matching data or conditionally joining datasets. When we cross (natural) languages we get more typo variations (roughly, twice the languages means twice the varieties of typos), and therefore a greater likelihood of mismatches when performing lookups (or translating) across those languages.

Code Description:

Below is Python code using the difflib string similarity library. The code performs a lookup in a dictionary (dict) using a double get_or_else mechanism. get_or_else (e.g. Scala's Option.getOrElse, or Python's dict.get(key, default)) has become a broadly adopted functional idiom for replacing if-else blocks with a single expression that supplies a fallback value. Chaining multiple get_or_else calls is normal, yet it tends to create more edge cases and complicate testing. It remains unconventional to add confidence-based matching to this mix, which is precisely what we do here:

The dictionary lookup will either:

  1. match the key, or
  2. match the key with a similarity score >= threshold=0.5, or
  3. fail to match, and return default_value.

Code Sample:
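A minimal sketch of the eager lookup, following steps (0) to (6) in the breakdown below. It assumes string keys and uses difflib.SequenceMatcher(...).ratio() as the similarity score; the function name get_or_threshold_match is hypothetical (only its lazy counterpart is named in this post).

```python
from difflib import SequenceMatcher


def get_or_threshold_match(d, key, default_value=None, threshold=0.5):
    """Eager lookup: exact key, else best fuzzy match >= threshold, else default.

    Hypothetical reconstruction; assumes all keys (and `key`) are strings.
    """
    # (1) Similarity score for every dict key: O(n) comparisons.
    scores = [(k, SequenceMatcher(None, key, k).ratio()) for k in d]
    # (2) Keep only keys whose score reaches the threshold: O(n).
    candidates = [(k, s) for k, s in scores if s >= threshold]
    # (3) Sort best-first so the top match sits at index 0 (Timsort).
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    # (4) Top (key, score) pair, or a NaN placeholder when nothing qualified.
    top_match = candidates[0] if candidates else (float("nan"), 0.0)
    # (5) Extract the key; a fresh NaN will not equal any genuine key.
    top_match_key = top_match[0]
    # (6) Inner get: the fuzzy match's value, or default_value.
    # (0) Outer get: the exact key. Python evaluates the inner argument first,
    #     so steps (1)-(6) always run, even when `key` exists (eager).
    return d.get(key, d.get(top_match_key, default_value))


provinces = {"Bangkok": "กรุงเทพมหานคร", "Nonthaburi": "นนทบุรี"}
print(get_or_threshold_match(provinces, "Bangkok"))       # exact match
print(get_or_threshold_match(provinces, "Bangkock"))      # fuzzy match on "Bangkok"
print(get_or_threshold_match(provinces, "xyz", "n/a"))    # no match: n/a
```

Using max(candidates, key=...) instead of sorting would find the top match in a single O(n) pass; the sort is kept here only to mirror step (3).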

Code Breakdown

Obviously, the above looks like code golf, so let’s step through the call and show each operation:

(0) Standard Get or Else dict lookup:

The base operation is a standard dict key check using the get(..) method. get(..) ensures an exception is not raised if the key is not found. If the key is found, its value is returned. (See the Runtime Analysis below for more on this execution: it is evaluated after snippets 1-6, because Python evaluates the default argument before calling get(..).)

(1) Similarity Scores:

Compute a similarity score between the lookup key and every dict key. This is an O(n) operation over the dict keys (each comparison's own cost depends on the string lengths).

(2) Filter by Threshold Score:

Keep the keys with a similarity score that reaches or exceeds the threshold value. This is an O(n) operation, checking every dict key.

(3) Sort to find the best match:

Sort the filtered results so the best match is at index position 0. This is an O(n) to O(n log n) operation (Python's Timsort is a hybrid of insertion sort and merge sort: O(n) on already-ordered input, O(n log n) in general).

(4) Get the top match or Handle if there are no matches. Ensure a value is returned:

We’re using float('nan') here because it should never unintentionally match a genuine key. Python has no SQL-style null: None == None is True, whereas NULL = NULL is never true in SQL. float('nan') provides that null behaviour, because NaN does not compare equal to anything, including itself. This is an O(1) operation.
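A quick check of why NaN works better than None as the “no match” marker, using a hypothetical dict that happens to contain None as a key. It also shows one caveat: dict lookups check identity before equality, so the very same NaN object stored as a key would still match.

```python
d = {None: "stored under None", "Bangkok": 1}

# None as a "no match" marker would wrongly hit the None key.
print(d.get(None, "default"))          # stored under None

# A fresh NaN never compares equal (nan != nan), so the lookup falls through.
print(d.get(float("nan"), "default"))  # default

# Caveat: identity is checked before equality, so the same NaN object matches.
sentinel = float("nan")
d[sentinel] = "stored under the same NaN object"
print(d.get(sentinel, "default"))      # stored under the same NaN object
```

A common Python alternative is a private sentinel such as _MISSING = object(), which can only match if that exact object is used as a key.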

(5) Extract the key value (i.e. from (key,score) tuple) or handle no matches:

The same reasoning applies here: float('nan') ensures the null result will not match an existing dictionary key. This is an O(1) operation.

(6) Second-Level Get_or_Else Lookup:

A simple get_or_else lookup. Note that if no top_match_key was found, its value will be float('nan'), which will not match any key. Therefore, the lookup fails and returns the specified default_value. This is an O(1) operation.

Runtime Analysis

Total time complexity is O(3n+3c), which simplifies to O(n), for the average, worst and best cases (excluding variations in dependent functions, e.g. the cost of each difflib comparison and Timsort's O(n log n) worst case). By comparison, dict's native lookup is O(1) on average.

The lazy implementation in the Appendix below keeps O(3n+3c) for the average and worst cases but improves the best case to O(1), by separating the boolean and confidence-based lookups into separate, chained function calls.

Appendix:

Appendix: Lazy implementation

This implementation improves the best-case execution time. Here the similarity lookup is optional and lazily called. If the key is not found in the dictionary, get_or_threshold_match_lazy() returns a function object (a callable), which can then be called to run the similarity lookup. Note: the major difference in this function is on line 11.

Why? Well, the eager implementation (above) first evaluates the O(3n+3c) lookup, then tries the O(1) lookup. The lazy implementation first evaluates the O(1) lookup, and defers the O(3n+3c) lookup until it is explicitly called.
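A minimal sketch of the lazy variant, assuming string keys and difflib.SequenceMatcher as the similarity measure. The function name comes from this post, but the body is a hypothetical reconstruction: the exact lookup runs first, and only on a miss is a closure returned that performs the similarity lookup when called.

```python
from difflib import SequenceMatcher


def get_or_threshold_match_lazy(d, key, default_value=None, threshold=0.5):
    """Return d[key] if present (O(1)); otherwise return a zero-argument
    function that performs the O(n) similarity lookup when called."""

    def similarity_lookup():
        # Same steps (1)-(6) as the eager version, deferred until called.
        scores = [(k, SequenceMatcher(None, key, k).ratio()) for k in d]
        candidates = sorted(
            ((k, s) for k, s in scores if s >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )
        top_match_key = candidates[0][0] if candidates else float("nan")
        return d.get(top_match_key, default_value)

    # (0) Exact lookup first; on a miss the closure is returned, not run.
    return d.get(key, similarity_lookup)


provinces = {"Bangkok": 1, "Nonthaburi": 2}
result = get_or_threshold_match_lazy(provinces, "Bangkock")
if callable(result):  # key missing: run the similarity lookup on demand
    result = result()
print(result)  # 1
```

One design caveat: if the dictionary's values are themselves callables, the callable() check cannot tell a stored value from the deferred lookup, so a wrapper type or a (found, value) tuple would be safer there.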

Pros:

  • Best-case time complexity is O(1); average and worst cases are still O(3n+3c).
  • An option to separate function calls, and conditionally request the secondary function.
  • Good for large lookup dictionaries.

Cons:

  • More complex code to write / read.

Appendix: Eager implementation (above) Pros & Cons

Pros:

  • Relatively simple code to write.

Cons:

  • Bad for dictionaries with large key sets.
  • Guaranteed O(3n+3c) for every dictionary lookup, even when the key exists.

Thailand Province Border Adjacency Dataset/Code

A quick update post to help get my latest project’s new dataset more readily indexed on Google search, etc. (Feb 8th 2021)

I’ve recently been working on risk assessment for COVID-19 during our second wave. To create an email alert per province (taking account of local regional data), I needed to join provincial data together. It turns out that much of Thailand’s publicly available government data (particularly from the Office of Agricultural Economics, the Land Department, etc.) is summarised at province level (i.e. it is not GIS coordinate-based). Yet there was no dataset mapping each province -> [neighbouring provinces] out there (that I could find), so I created one the other night and wrote the code to verify and integrate it.

That dataset/code is now on github: https://github.com/pmdscully/thailand_province_border_adjacency

An obligatory requirement of working with data relations (X->Y) is making a pretty visualisation in GraphViz, so, dutifully, here it is (along with Wikipedia’s public provincial map for comparison).
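For anyone wanting to draw a similar graph, here is a minimal sketch that turns a {province: [neighbours]} dictionary into GraphViz DOT source. The sample adjacency is a small illustrative subset written by hand, not data loaded from the repository, and adjacency_to_dot is a hypothetical helper name.

```python
def adjacency_to_dot(adjacency):
    """Render {province: [neighbours]} as an undirected GraphViz DOT string."""
    edges = set()
    for province, neighbours in adjacency.items():
        for neighbour in neighbours:
            # Sort each pair so A -- B and B -- A collapse into one edge.
            edges.add(tuple(sorted((province, neighbour))))
    lines = ["graph provinces {"]
    for a, b in sorted(edges):
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative subset only, not the full dataset from the repository.
sample = {
    "Bangkok": ["Nonthaburi", "Samut Prakan"],
    "Nonthaburi": ["Bangkok", "Pathum Thani"],
}
dot_source = adjacency_to_dot(sample)
print(dot_source)
```

Saving the output to provinces.dot and running dot -Tpng provinces.dot -o provinces.png renders the image; undirected "--" edges suit adjacency, since borders are symmetric.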

Q & A

Is it correct & up to date? Yes. The newest Thai province change was the addition of Bueng Kan, split off from Nong Khai and effective on 23 March 2011; that’s included, so it’s up to date as of Feb 2021. Bangkok is referred to as a Special Administrative Area, but it’s included as a province in the mappings, giving a total of 77 entries.

Is it easy to use the mapping dataset by importing a Python module into my own software application? Yes, you can join province datasets together based on their semantic geo-neighbourhoods – 🙂

  1. Just git clone the repository,
  2. download a province naming dataset,
  3. import the Python module,
  4. write about 4 lines of code to get a dictionary lookup (see the readme.md for full details).

I want to SQL join my provincial datasets together, but only for neighbouring provinces. How can I do that? That’s precisely what this dataset and code are for. Before you create your SQL query:

  1. import the Python module (province_neighbours.py),
  2. instantiate the ProvinceRelationsParser object,
  3. get the dictionary,
  4. perform the dictionary lookup on your key province, this will give you the list of neighbouring provinces.
  5. Simply plug those names into your SQL query and you are ready! (Find a code example in the readme.md).
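A minimal sketch of step 5, using an in-memory SQLite table. The neighbours dictionary is a hypothetical stand-in for the one produced by the repository's parser (see its readme.md for the real API), with a partial neighbour list; the table name daily_cases and its columns are also assumptions. The toy rows reuse the 2021-01-24 figures from the COVID-19 post on this page.

```python
import sqlite3

# Hypothetical stand-in for the dictionary the repository's parser returns;
# partial neighbour list, for illustration only.
neighbours = {"Bangkok": ["Nonthaburi", "Pathum Thani", "Samut Prakan"]}


def cases_in_neighbouring_provinces(conn, province):
    names = neighbours.get(province, [])
    if not names:
        return []
    # One "?" placeholder per name keeps the query parameterised.
    placeholders = ", ".join("?" for _ in names)
    sql = (
        "SELECT province, new_cases FROM daily_cases "
        f"WHERE province IN ({placeholders}) ORDER BY province"
    )
    return conn.execute(sql, names).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_cases (province TEXT, new_cases INTEGER)")
conn.executemany(
    "INSERT INTO daily_cases VALUES (?, ?)",
    [("Bangkok", 25), ("Samut Prakan", 12), ("Nonthaburi", 1), ("Pathum Thani", 0)],
)
print(cases_in_neighbouring_provinces(conn, "Bangkok"))
# [('Nonthaburi', 1), ('Pathum Thani', 0), ('Samut Prakan', 12)]
```

Building the IN (...) clause from placeholders rather than pasting names into the SQL string avoids quoting problems with names such as Thai UTF-8 strings.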

Can I use Thai language (UTF-8) as my lookup and get neighbour results in Thai (UTF-8)? Short answer is yes. See the readme.md on the Github repo for full details with code samples.

Over to you

There’s plenty more to say about this project, but if you’re interested in the details, go visit the Github repository. (Or send me a message, if you want extra detailed info).

Feel free to check it out.

Towards COVID-19 Wave Risk Assessment Tool for BKK Residents: Results so far…

Dated 25th Jan 2021

(A) New Cases for Bangkok and Nearby Provinces:

All data collected from Daily COVID-19 report, Thailand information [Daily COVID-19 cases reported]
Data Service: https://opendata.data.go.th/dataset/covid-19-daily
Last Updated: 24 January 2564 BE (24 January 2021)

0 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-25:
25 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-24:
5 Key Clusters with 21 Cases in กรุงเทพมหานคร / Bangkok on 2021-01-24 (excluding state quarantine and arrivals ASQ/ALQ)
1 Days Since Last New Case

0 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-25:
12 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-24:
4 Key Clusters with 12 Cases in สมุทรปราการ / Samut Prakan on 2021-01-24 (excluding state quarantine and arrivals ASQ/ALQ)
1 Days Since Last New Case

0 New Cases in นนทบุรี / Nonthaburi on 2021-01-25:
1 New Cases in นนทบุรี / Nonthaburi on 2021-01-24:
1 Key Clusters with 1 Cases in นนทบุรี / Nonthaburi on 2021-01-24 (excluding state quarantine and arrivals ASQ/ALQ)
1 Days Since Last New Case

0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-25:
0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-24:
0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-23:
6 New Cases in ปทุมธานี / Pathum Thani on 2021-01-22:
2 Key Clusters with 6 Cases in ปทุมธานี / Pathum Thani on 2021-01-22 (excluding state quarantine and arrivals ASQ/ALQ)
3 Days Since Last New Case

0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-25:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-24:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-23:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-22:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-21:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-20:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-19:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-18:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-17:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-16:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-15:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-14:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-13:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-12:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-11:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-10:
1 New Cases in นครปฐม / Nakhon Pathom on 2021-01-09:
1 Key Clusters with 1 Cases in นครปฐม / Nakhon Pathom on 2021-01-09 (excluding state quarantine and arrivals ASQ/ALQ)
16 Days Since Last New Case

0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-25:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-24:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-23:
2 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-22:
1 Key Clusters with 2 Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-22 (excluding state quarantine and arrivals ASQ/ALQ)
3 Days Since Last New Case

0 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-25:
147 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-24:
1 Key Clusters with 147 Cases in สมุทรสาคร / Samut Sakhon on 2021-01-24 (excluding state quarantine and arrivals ASQ/ALQ)
1 Days Since Last New Case

0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-25:
7 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-24:
1 Key Clusters with 7 Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-24 (excluding state quarantine and arrivals ASQ/ALQ)
1 Days Since Last New Case

(B) Last 14 days BANGKOK AND NEARBY PROVINCES:

All data collected from Daily COVID-19 report, Thailand information [Daily COVID-19 cases reported]
Data Service: https://opendata.data.go.th/dataset/covid-19-daily
Last Updated: 24 January 2564 BE (24 January 2021)

0 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-25:
25 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-24:
14 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-23:
22 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-22:
23 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-21:
18 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-20:
22 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-19:
24 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-18:
16 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-17:
22 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-16:
36 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-15:
21 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-14:
28 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-13:
38 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-12:
46 New Cases in กรุงเทพมหานคร / Bangkok on 2021-01-11:

0 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-25:
12 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-24:
2 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-23:
3 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-22:
5 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-21:
3 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-20:
1 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-19:
3 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-18:
1 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-17:
3 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-16:
14 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-15:
6 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-14:
17 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-13:
13 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-12:
9 New Cases in สมุทรปราการ / Samut Prakan on 2021-01-11:

0 New Cases in นนทบุรี / Nonthaburi on 2021-01-25:
1 New Cases in นนทบุรี / Nonthaburi on 2021-01-24:
1 New Cases in นนทบุรี / Nonthaburi on 2021-01-23:
3 New Cases in นนทบุรี / Nonthaburi on 2021-01-22:
2 New Cases in นนทบุรี / Nonthaburi on 2021-01-21:
0 New Cases in นนทบุรี / Nonthaburi on 2021-01-20:
2 New Cases in นนทบุรี / Nonthaburi on 2021-01-19:
2 New Cases in นนทบุรี / Nonthaburi on 2021-01-18:
2 New Cases in นนทบุรี / Nonthaburi on 2021-01-17:
0 New Cases in นนทบุรี / Nonthaburi on 2021-01-16:
1 New Cases in นนทบุรี / Nonthaburi on 2021-01-15:
3 New Cases in นนทบุรี / Nonthaburi on 2021-01-14:
2 New Cases in นนทบุรี / Nonthaburi on 2021-01-13:
0 New Cases in นนทบุรี / Nonthaburi on 2021-01-12:
36 New Cases in นนทบุรี / Nonthaburi on 2021-01-11:

0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-25:
0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-24:
0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-23:
6 New Cases in ปทุมธานี / Pathum Thani on 2021-01-22:
4 New Cases in ปทุมธานี / Pathum Thani on 2021-01-21:
1 New Cases in ปทุมธานี / Pathum Thani on 2021-01-20:
0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-19:
1 New Cases in ปทุมธานี / Pathum Thani on 2021-01-18:
0 New Cases in ปทุมธานี / Pathum Thani on 2021-01-17:
4 New Cases in ปทุมธานี / Pathum Thani on 2021-01-16:
2 New Cases in ปทุมธานี / Pathum Thani on 2021-01-15:
1 New Cases in ปทุมธานี / Pathum Thani on 2021-01-14:
15 New Cases in ปทุมธานี / Pathum Thani on 2021-01-13:
5 New Cases in ปทุมธานี / Pathum Thani on 2021-01-12:
1 New Cases in ปทุมธานี / Pathum Thani on 2021-01-11:

0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-25:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-24:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-23:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-22:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-21:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-20:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-19:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-18:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-17:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-16:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-15:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-14:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-13:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-12:
0 New Cases in นครปฐม / Nakhon Pathom on 2021-01-11:

0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-25:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-24:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-23:
2 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-22:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-21:
1 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-20:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-19:
1 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-18:
1 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-17:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-16:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-15:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-14:
0 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-13:
3 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-12:
3 New Cases in พระนครศรีอยุธยา / Ayutthaya on 2021-01-11:

0 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-25:
147 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-24:
163 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-23:
217 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-22:
29 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-21:
27 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-20:
138 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-19:
320 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-18:
335 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-17:
165 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-16:
99 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-15:
208 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-14:
35 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-13:
176 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-12:
80 New Cases in สมุทรสาคร / Samut Sakhon on 2021-01-11:

0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-25:
7 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-24:
5 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-23:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-22:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-21:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-20:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-19:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-18:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-17:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-16:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-15:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-14:
1 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-13:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-12:
0 New Cases in สมุทรสงคราม / Samut Songkhram on 2021-01-11:

What is it?

Above are (a) results that answer three research questions about assessing the risk of organising events and the expected risk of infection when visiting public spaces, and (b) new cases over the past 14 days. The data source is the (mostly up-to-date) COVID-19-Daily open dataset on opendata.go.th, published by the Digital Government Agency (DGA) and maintained by the Department of Disease Control (DDC), each an arm of Thailand’s government. The currently available official data services [1], [2], [3] and the map at [4] (currently 11 days out of date) do not give a simple way to assess risk for specific events at venues or places within a province or district. This project aims to give the public a measure of risk for planning fine-grained activities, with a special focus on adjacent provinces.

Research Questions

  1. By province, how many new cases are announced?
  2. By province, how many days since last new case was announced?
  3. By province, how many “key” clusters exist (excluding state quarantine and arrivals ASQ/ALQ) and how many cases are there?

Specific column data used: “announce_date”, “province_of_isolation”, “risk”.
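A minimal sketch of how the three research questions could be computed from those three columns, assuming each record is one announced case. The exact risk labels for state quarantine and ASQ/ALQ arrivals are hypothetical placeholders, as are the toy records, and a “key cluster” is assumed here to mean one distinct risk value.

```python
from collections import Counter
from datetime import date

# Hypothetical labels: the dataset's real "risk" values may differ.
EXCLUDED_RISKS = {"State Quarantine", "ASQ/ALQ"}


def new_cases(records, province, day):
    """RQ1: number of cases announced for `province` on `day`."""
    return sum(
        1 for r in records
        if r["province_of_isolation"] == province and r["announce_date"] == day
    )


def days_since_last_case(records, province, today):
    """RQ2: days since the most recent announcement (None if there is none)."""
    dates = [r["announce_date"] for r in records if r["province_of_isolation"] == province]
    return (today - max(dates)).days if dates else None


def key_clusters(records, province, day):
    """RQ3: (number of distinct risk groups, total cases), quarantine excluded."""
    counts = Counter(
        r["risk"] for r in records
        if r["province_of_isolation"] == province
        and r["announce_date"] == day
        and r["risk"] not in EXCLUDED_RISKS
    )
    return len(counts), sum(counts.values())


# Toy records for illustration, not real data.
def case(risk):
    return {"announce_date": date(2021, 1, 22), "province_of_isolation": "Pathum Thani", "risk": risk}


records = [case("Market cluster"), case("Market cluster"), case("Close contact"), case("State Quarantine")]
print(new_cases(records, "Pathum Thani", date(2021, 1, 22)))             # 4
print(key_clusters(records, "Pathum Thani", date(2021, 1, 22)))          # (2, 3)
print(days_since_last_case(records, "Pathum Thani", date(2021, 1, 25)))  # 3
```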

Future Work for Risk Assessment:

  • By province, itemise the key clusters and their number of new cases over the past X days.
    • X = 3, 7 or 14 days (the risk of new cases falls after 14 days).

What are other countries doing to deliver risk assessment to public?

Other risk assessment metrics exist (e.g. covidactnow.org or panditpranav), which take account of testing and vaccination data to give a probability of infection risk. I don’t yet know whether Thailand’s Digital Government Agency (DGA) releases that data, but that’s possible in the future.

Next, Get Involved? / What is Next To DO?

In the near future, I will try to make use of district as well as province information to get a better sense of risk levels and risk-place associations; that said, any busy public spaces nearby can easily be considered “at risk”. Future Work (see the subheading above) certainly includes adding the accumulated key clusters over the past 3, 7 and 14 days. This will give a good sense of a province’s gradually lowering risk, and it can be accumulated with the adjacent provinces’ data too. I’ll aim to add those and maybe some more. If you have ideas that can improve on this, feel welcome to say.
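A minimal sketch of the 3/7/14-day accumulation, combined with an adjacent province. It sums daily new cases copied from section (B) above; the same pattern would apply to per-cluster counts. The neighbours entry is a partial, illustrative adjacency (Bangkok has other neighbours too), and the helper names are hypothetical.

```python
from datetime import date, timedelta

FIRST_DAY = date(2021, 1, 12)
# Daily new cases for 2021-01-12 .. 2021-01-25, copied from section (B) above.
BANGKOK = [38, 28, 21, 36, 22, 16, 24, 22, 18, 23, 22, 14, 25, 0]
SAMUT_PRAKAN = [13, 17, 6, 14, 3, 1, 3, 1, 3, 5, 3, 2, 12, 0]

counts_by_province = {
    name: {FIRST_DAY + timedelta(days=i): n for i, n in enumerate(series)}
    for name, series in [("Bangkok", BANGKOK), ("Samut Prakan", SAMUT_PRAKAN)]
}
# Partial, illustrative adjacency only.
neighbours = {"Bangkok": ["Samut Prakan"]}


def cases_in_window(daily_counts, end, days):
    """Sum new cases over the `days` days ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    return sum(n for day, n in daily_counts.items() if start <= day <= end)


def window_with_neighbours(province, end, days):
    """Window total for `province` plus its adjacent provinces."""
    provinces = [province] + neighbours.get(province, [])
    return sum(cases_in_window(counts_by_province.get(p, {}), end, days) for p in provinces)


end = date(2021, 1, 25)
for days in (3, 7, 14):
    own = cases_in_window(counts_by_province["Bangkok"], end, days)
    print(days, own, window_with_neighbours("Bangkok", end, days))
# 3 39 53
# 7 124 150
# 14 309 392
```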

For me personally, I would like to see a daily morning email (or real-time alert) in my inbox, so I’ll look into making this an email subscription service. If that’s interesting too, just let me know if you’d like to be added to the list.

ON MEASURING MACHINE LEARNING MODELS AGAINST CONCRETE BUSINESS OBJECTIVES

REVIEW NOTES: DATA SCIENCE FOR BUSINESS BY PROVOST & FAWCETT: CHAPTER 7

I enjoyed reading this chapter. It’s insightful and well explained, with detailed examples, diagrams and graphics, on a few data science topics that correspond directly to conventional scientific research in computer science. That makes me happy, because these are crucial points, yet they are rarely the focus of Kaggle competitions, books on machine learning or statistics, or the latest and greatest TensorFlow, PyTorch and AutoML libraries (etc., etc.), and they are too infrequently discussed in DL/AI/ML social posts and blogs. Below I have written about the points that are well worth taking home. These topics are broadly:

  • Careful consideration of what is desired from data science results.
  • Expected value as a key evaluation framework.
  • Consideration of appropriate comparative baselines, in machine learning models.
Continue reading “ON MEASURING MACHINE LEARNING MODELS AGAINST CONCRETE BUSINESS OBJECTIVES”

TIL: Fixing ‘file not found’ dependency libraries in Linux

In Ubuntu (and other Debian-based distributions), apt is our go-to CLI package installer. It handles everything in a single iconic command that every Linux user knows:

sudo apt install <packageName>

Sometimes, and I still don’t get why or when, a package’s shared library (dependency) is not installed.

For example, this happened to me today with MySQL-Workbench. When I ran it from the CLI, it threw an error saying a dependent library was missing or could not be found:

$ mysql-workbench
/usr/lib/mysql-workbench/mysql-workbench-bin: error while loading shared libraries: libgdkmm-2.4.so.1: cannot open shared object file: No such file or directory

The key steps of the fix are:

1. Ensure the locate database (used by locate implementations such as mlocate or slocate) is up to date with current file locations.

sudo updatedb

2. Check that the file exists (no output means the file was not found).

locate libgdkmm-2.4.so.1

3. Reinstall the package that provides the file if it is missing.
Here -f (--fix-broken) asks apt to fix broken dependencies, and --reinstall reinstalls the package even if it is already installed.

sudo apt-get install -f --reinstall libgtkmm-2.4-1v5

4. Ensure the application configuration is looking in the correct location for the shared library files.

Today I didn’t need this step. But essentially, on Ubuntu the mysql-workbench command is a #!/bin/bash shell script (not an ELF binary) that runs a series of commands before starting the application binary. In that script, the following environment variables define the configuration locations: export MWB_BINARIES_DIR=xyz and export LD_LIBRARY_PATH=xyz.

In my case, the application script uses those environment variable values when executing the runtime binary: the dynamic linker searches the directories listed in LD_LIBRARY_PATH for the corresponding shared library files (written in C/C++). In interpreted-language applications, such environment variables might instead be read by the executed code (for example in Python) or passed as library classpath entries on the runtime command line (for example, -classpath for Java). Alas, I didn’t need to change those locations from their defaults, but that’s how it works.
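To make the LD_LIBRARY_PATH mechanism concrete, here is a simplified Python sketch of the search: split the colon-separated variable into directories and look for the library file in each, then in a few default directories. The default directory list is an assumption (typical 64-bit Ubuntu paths), and the real dynamic linker (ld.so) also consults /etc/ld.so.cache and the binary's RPATH/RUNPATH, which this sketch ignores; find_shared_library is a hypothetical helper.

```python
import os

# Assumption: typical 64-bit Ubuntu library directories. The real ld.so also
# uses /etc/ld.so.cache and RPATH/RUNPATH, which this sketch ignores.
DEFAULT_LIB_DIRS = ["/lib/x86_64-linux-gnu", "/usr/lib/x86_64-linux-gnu", "/lib", "/usr/lib"]


def find_shared_library(filename, env=None):
    """Return the first path where `filename` exists, searching the
    LD_LIBRARY_PATH directories first, then the default directories."""
    env = os.environ if env is None else env
    # LD_LIBRARY_PATH is a colon-separated list of directories.
    extra_dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    for directory in extra_dirs + DEFAULT_LIB_DIRS:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_shared_library("libgdkmm-2.4.so.1"))  # None if the library is missing
```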

That’s how to resolve missing shared library dependencies in Ubuntu (and other Debian-based distributions).

On Measuring the Senior, In Senior Software Engineering Roles

Labelling “Senior”, “Mid” and “Junior” software engineering roles comes up from time to time in developer and programmer forums. I’m not a fan of labels for people or groups of people, and labels for seniority and skill, knowledge or ability levels get to me because they are so ambiguous. So it is down to us to contribute to the discussion and reach a clear definition.

A truth of seniority, across all fields, is its group-wide effect. It’s leadership, it’s empathy, it’s improving the individuals and the group as a whole for the group’s common interest. It’s a positive improvement in team-wide developer productivity and in business-wide productivity. But what does that mean for developers and software engineers?

Continue reading “On Measuring the Senior, In Senior Software Engineering Roles”

Adding to the Conversation on Data Science Training: Looking into the Future

2020 May 9th.

Please note: this is a temporally relevant article. It’s likely to be wrong soon after it was written; however, I publish it because it marks a step in the process. Below is my response to a discussion on how to teach and how to learn Data Science in order to be most effective (as an employee, as a service to businesses and as a service to society). A key aspect raised during the discussion was the consequence of focusing on domain expertise, of focusing on technical expertise, and of spreading focus between both. I believe my reply below adds something valuable and helps guide the definition of Data Science teaching, learning and the ongoing strategy involved in continuous “lifelong” learning, or at least for as long as Data Science remains as it is. I concede that the view presented below could easily have drawn on many other influencers, more citations, viewpoints, arguments and examples of evidence. But this is the nature of a conversation with a time limit. So, here goes:

Continue reading “Adding to the Conversation on Data Science Training: Looking into the Future”

Book review 2/2 on Robot Proof: Higher Education in the Age of AI

I finished the book by Joseph Aoun a little while ago, and I’ve been sitting on my notes, letting them stir. I think I have a fairly safe conclusion for its second half. That said, I would expect that those with an understanding and empathetic relationship with their CS students and their families will have been on the cusp of some of the conclusions Aoun draws in Robot Proof (2017).

Continue reading “Book review 2/2 on Robot Proof: Higher Education in the Age of AI”

Open Source Code for Light Stage Capture Sequences

Today I’m posting updates (1/n) to the Light Stage open source project codebase.

The updates mark improvements for integrating experimental result data and 3d geometry data with light and camera-trigger hardware controllers (3). Included are two new lighting sequence improvements (1) and (2) and a way to get started, no matter your stage design and target capture application (4). These changes contribute towards standardised capture sequences and integrated 3d reconstruction pipeline processing, while supporting stage design tools and retaining visualisations, measurable evaluations and optimisations at each step.

Altogether, this work takes a step towards the vision of a comprehensive open source framework for open hardware light stages; find more details at the Build a Light Stage website.

These recent updates to the LightStage-Repo on github include:

  1. “Spherical gradient” lighting sequence.
  2. “Balanced lighting baseline”.
  3. Local web service (on port 8080) to return data requested by an HTTP client, such as a hardware controller with Ethernet/Wifi module.
  4. Configuration file designed for each Light Stage, to easily get the web service responding with correct sequence data.
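A minimal sketch of what such a local sequence service could look like, using only Python's standard http.server. The endpoint layout (GET /<sequence_name>), the JSON frame format and the SEQUENCES contents are all hypothetical, not the LightStage-Repo's actual API or configuration format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-stage configuration: sequence name -> list of frames, where
# each frame lists the LED indices to switch on. Not the repository's format.
SEQUENCES = {
    "balanced_baseline": [[0, 1, 2, 3, 4, 5]],
    "gradient_x": [[0, 1], [2, 3], [4, 5]],
}


class SequenceHandler(BaseHTTPRequestHandler):
    """Serve GET /<sequence_name> as JSON, or 404 for unknown names."""

    def do_GET(self):
        name = self.path.strip("/")
        if name not in SEQUENCES:
            self.send_error(404, "Unknown sequence")
            return
        body = json.dumps(SEQUENCES[name]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def make_server(host="0.0.0.0", port=8080):
    # 0.0.0.0 listens on all interfaces so a controller on the LAN can connect.
    return HTTPServer((host, port), SequenceHandler)
```

Calling make_server().serve_forever() starts the service; a hardware controller on the same network could then request, for example, http://<host-ip>:8080/gradient_x and step through the returned frames.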
Continue reading “Open Source Code for Light Stage Capture Sequences”

Update on Lightstage Project

In this post, I took the liberty to write some of my thoughts and reflections on why Lightstages are (“pretty cool in my book” and also) relevant amongst today’s cutting edge developments in machine learning and data-driven decision making.

Over the last few years, I’ve had the opportunity to work as a researcher on the Aber Lightstage project, under Dr. Hannah Dee. Back then, I wrote a Python-OpenGL-based application to help us visualise and numerically evaluate lighting positions on our stage — the project is open source and on Github. Dr. Dee had successfully raised a bit of funding to bring together a team of engineers, researchers and advisors, each offering their specialist skills and knowledge to the project, and I got the chance to get involved.

Continue reading “Update on Lightstage Project”