On Estimating Thailand’s COVID-19 Infections and Infection Fatality Ratio (IFR)

**UPDATE 2 – 17:00, JULY 19TH 2021**

The Infection Rate data on this page dates from May ’21, so you may prefer to find newer data. Dylan Jay releases a daily plot of Infections vs Recorded Cases across Thailand (using a similar approach) and offers commentary on other infection model estimates for Thailand; I can recommend both.

For Infections by Province in Thailand, the method on this page can still be used to replicate the procedure. It remains a “reasoned strategy” for modelling infections within each of Thailand’s provinces, provided that the missing Thai data required by Imperial College London’s IFR-based projection model is estimated statistically.

On Estimating Thailand’s COVID-19 Infections and Infection Fatality Ratio (IFR)

MAY 11, 2021

This post records the method used to create an estimate of Thailand’s Infected Cases, nationally and by province, using the IFR metric and the number of deaths in Thailand, combined with known provincial demographics to estimate Thailand’s (unreleased) COVID-19 related death ages. Infected Cases differ from actual Recorded Cases in that they offer a view on whether more cases (i.e. asymptomatic) are likely to be found. This work was sparked by Matt Greenfield and Dylan Jay’s discussion. Obviously, there’s too much to fit in a tweet.

In the article, we go through (1) how readers can directly calculate their own estimates of the Number of Infected Cases from our data, followed by (2) the method taken to reach the results, then (3) discussion, limitations and conclusions. There are plenty of caveats described in this article, so take a look at these before drawing conclusions.

How to Calculate Number of Infected Cases (est.) in Thailand from IFR

After a little further effort sparked by Dylan Jay, we have reached an agreeable answer (final breakthrough by Matt Greenfield) for translating the IFR scores in the righthand side plot below into a Number of Infected Cases (as lower and upper bounds). The plot’s presentation of the data is not easy to follow, so this first section walks through the calculations with explanation to make that more straightforward.


(Figure: lefthand side plot and righthand side plot. Not suitable for reproducing or sharing.)

Nationally – Calculate (Est.) Number of Infected Cases

Thailand’s total deaths are 336 for our dataset in the date range 2021-03-02 to 2021-05-10. The reported IFR is [0.015 to 0.025]. In this case, the plot shows the mean IFR over all 77 provinces, so we must first reverse that mean (multiply by 77) to get the sum, then proceed to calculate the estimated infected cases, as follows:

Recorded Cases are 6,304 above the mean Estimated Infected Cases for all of Thailand.

This is the Min, Max and Mean expected Number of Infected Cases, for the date period.

We calculate the difference from Recorded Cases (58,048) to decide whether more (or fewer) cases are expected to be found. Recorded Cases are in the upper half of the range, so case finding looks good across Thailand.
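As a check on the figures in this section, the arithmetic below reproduces the 6,304 difference from the numbers quoted above. The scaling step (deaths x summed IFR x 100) is an assumption inferred from those quoted results, not a confirmed copy of the original equation, and the variable names are illustrative.

```python
# Figures quoted in this section
deaths = 336
recorded_cases = 58_048
n_provinces = 77
ifr_mean_low, ifr_mean_high = 0.015, 0.025  # mean IFR over all provinces

# Reverse the mean over provinces to recover the summed IFR bounds
ifr_sum_low = ifr_mean_low * n_provinces    # about 1.155
ifr_sum_high = ifr_mean_high * n_provinces  # about 1.925

# ASSUMPTION: this scaling is inferred because it reproduces the quoted 6,304
est_min = deaths * ifr_sum_low * 100
est_max = deaths * ifr_sum_high * 100
est_mean = (est_min + est_max) / 2

difference = recorded_cases - est_mean
print(round(est_min), round(est_max), round(est_mean), round(difference))
# prints: 38808 64680 51744 6304
```

Recorded Cases (58,048) sit between the mean (51,744) and the maximum (64,680), which matches the "upper half of the range" reading given above.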

Provincial – Calculate (Est.) Number of Infected Cases

This method applies to all provinces. Bangkok is our example. Bangkok’s total deaths are 148 for our date range. The reported IFR is [0.443 to 0.753]. In this case, the plotted IFR has been multiplied by 100 (I’ll fix it in a few days), so we must first reverse that (divide by 100) to get a calculable IFR, before calculating Infected Cases as follows:

Recorded Cases fall short of the mean Estimated Infected Cases for Bangkok by 5,750.

Again, Min, Max and Mean expected values of Infected Cases are shown for the date period.

We can calculate the difference from Recorded Cases (20,781). For Bangkok, the Recorded Cases are in the lower half of the range, so it looks like many new cases are still to be found in Bangkok.

See the righthand side plot and follow the same calculation to find the Estimated Number of Infected Cases for any one of the provinces.
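A minimal sketch of the provincial calculation, assuming Infected Cases are estimated as deaths divided by the IFR (after undoing the x100) and that the Mean is the midpoint of the Min and Max. The function name `infected_range` is hypothetical; the Bangkok inputs are the figures quoted above.

```python
def infected_range(deaths, ifr_x100_low, ifr_x100_high):
    """Return (min, max, mean) estimated infections.

    The IFR bounds are given as plotted (already multiplied by 100),
    so they are divided by 100 before use. ASSUMPTION: infections are
    estimated as deaths / IFR, and the mean is the midpoint of the bounds.
    """
    if deaths < 0 or not 0 < ifr_x100_low <= ifr_x100_high:
        raise ValueError("need deaths >= 0 and 0 < low IFR <= high IFR")
    est_max = deaths / (ifr_x100_low / 100)   # lower IFR implies more infections
    est_min = deaths / (ifr_x100_high / 100)  # higher IFR implies fewer infections
    return est_min, est_max, (est_min + est_max) / 2


# Bangkok figures quoted in the text
bkk_min, bkk_max, bkk_mean = infected_range(148, 0.443, 0.753)
recorded_cases = 20_781
shortfall = bkk_mean - recorded_cases
print(f"{bkk_min:.0f} to {bkk_max:.0f}, mean {bkk_mean:.0f}, shortfall {shortfall:.0f}")
```

For Bangkok this gives a range of roughly 19,655 to 33,409 with a midpoint near 26,532, so Recorded Cases (20,781) fall about 5,750 short of the midpoint and sit in the lower half of the range. Swapping in another province's deaths and IFR bounds from the righthand side plot gives that province's estimate.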


The procedure to reproduce the models of IFR and Infection Rates for Thailand and its Provinces has the following caveats and assumptions:

  • The Cases and Deaths data is from the cases by province dataset, which holds its first recorded death associated with a province on 2021-03-02 and its latest recorded death on 2021-05-10. Only cases and deaths in this range are used in this analysis. In this range, the dataset has 58,048 recorded cases and 336 recorded deaths. For reference, in the covid19.th-stat dataset between 2021-03-02 and 2021-05-10, there were 58,932 (85,005-26,073) reported cases, a discrepancy of 884; and 337 (421-84) reported deaths, a discrepancy of 1.
  • This National Statistics Office Age distribution dataset was used, extracting 2010 data only. Bueng Kan didn’t exist when that data was created, so I copied Nong Khai’s data (Bueng Kan was split from Nong Khai in 2011, so it likely had the same age distribution) and used a normalised age distribution, per age bucket.

  • In the Age Distribution dataset there are 17 buckets: 0-5, …, 75-79 and 80+. I treated the 80+ bucket as 80 to 85, to match the table of age distributions and risk levels in epimonitor’s IFR analysis.

  • I have therefore split the deaths according to the age distribution per province, which effectively gives the green line. In fact the green line is further multiplied by the IFR Risk% to bring it closer to COVID-19 death likelihood, yet this has a smaller visual impact.

  • From the table of age distributions in epimonitor’s IFR analysis, I interpolate linearly between the known age and Risk% values, and extrapolate (backfill) to 0 with a value matching the earliest entry (i.e. 0yo to 5yo each have an equal likelihood of passing).

  • Because the Risk% in the table from epimonitor’s IFR analysis is expressed per 100,000 cases, I calculate the scaling factor that brings recorded Deaths and Cases in line with the Risk% (per 100k). (NB: Applies to Lefthand Side Plot only)
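The interpolation and backfill step from the list above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function name `interpolate_risk` is hypothetical, and the ages and Risk% values are placeholders, not the values from epimonitor's table.

```python
def interpolate_risk(known_ages, known_risk, max_age=85):
    """Risk% for every integer age 0..max_age.

    Linear interpolation between known (age, Risk%) points, which must be
    sorted by age. Ages below the first known age are backfilled with the
    first value; ages above the last known age keep the last value.
    """
    points = list(zip(known_ages, known_risk))
    risk = []
    for age in range(max_age + 1):
        if age <= points[0][0]:
            risk.append(points[0][1])    # backfill towards age 0
        elif age >= points[-1][0]:
            risk.append(points[-1][1])   # hold the last known value
        else:
            for (a0, r0), (a1, r1) in zip(points, points[1:]):
                if a0 <= age <= a1:
                    risk.append(r0 + (r1 - r0) * (age - a0) / (a1 - a0))
                    break
    return risk


# PLACEHOLDER values for illustration only, not the epimonitor table
ages = [5, 45, 85]
risk_pct = [0.001, 0.1, 10.0]
curve = interpolate_risk(ages, risk_pct)
```

With these placeholder inputs, ages 0 to 5 all share the first value, and an age halfway between two known points gets the halfway Risk%.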

At this point, I separate the Min and Max ages of each age bucket and allocate the appropriate (interpolated) Risk% to each, thereby giving each row (e.g. the 10-15yo bucket) a specific minimum Risk% (for 10yo) and a specific maximum Risk% (for 15yo). Effectively, this gives us the range between the orange and blue lines: upper and lower bound estimates relative to 100,000 cases.

Next I reduce the Min/Max Risk% estimates from 100k to the province’s relative numbers. As each of the Min/Max Risk% values is relative to 100,000, I normalise these, then apply the province’s normalised age bucket population factor, which is proportional to the province’s true population. Finally, having calculated the scaling factor between the case/death numbers and 100k, I use it to reduce the Expected IFR (death) rates (in orange and blue) down to our province’s relative death rates. This is just the factor of 100k/case numbers.

This gives us a reasonable ranging estimate for expected deaths and expected age ranges for those deaths, relative to the Province’s age distribution.
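One way to read the two steps above, as a minimal sketch: each age bucket gets a lower and upper Risk% taken at its youngest and oldest age, and the per-100,000 expectation is scaled to the province by its normalised age share and by recorded cases / 100,000. The function name, the `risk_at` helper and all numbers are hypothetical placeholders, and the sketch assumes Risk% rises with age.

```python
PER_100K = 100_000


def expected_death_bounds(buckets, age_share, risk_at, recorded_cases):
    """Per-bucket (low, high) expected deaths for one province.

    buckets: list of (min_age, max_age) per age bucket
    age_share: province's normalised population share per bucket
    risk_at: function mapping an age to its Risk% (in percent)
    recorded_cases: the province's recorded case count
    """
    scale = recorded_cases / PER_100K  # from per-100k down to the province
    bounds = []
    for (lo_age, hi_age), share in zip(buckets, age_share):
        low = risk_at(lo_age) / 100 * PER_100K * share * scale
        high = risk_at(hi_age) / 100 * PER_100K * share * scale
        bounds.append((low, high))
    return bounds


def toy_risk(age):
    """PLACEHOLDER Risk% curve for illustration only."""
    return 0.01 * age


bounds = expected_death_bounds(
    buckets=[(10, 15), (60, 65)],
    age_share=[0.5, 0.5],
    risk_at=toy_risk,
    recorded_cases=1_000,
)
```

Summing the low and high values over all buckets gives the province's lower and upper expected deaths, which correspond to the range between the two bounding lines in the lefthand side plot.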

Righthand Side Plot – IFR

In the righthand side plot (above), an Estimate for IFR is reported, as defined in the equation below. This is calculated from the number of deaths split over the stratified age distribution per province (as described above) and multiplied by the corresponding Risk% column value from epimonitor’s IFR analysis page. These score values are summed, giving IFR as a percentage. This calculation of IFR has not been x100.
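The calculation described above can be read, as a hedged sketch, as a sum over age buckets of the province's deaths, split by its normalised age share, times the Risk% for that bucket. The function name `ifr_score` and the toy inputs are hypothetical; running it once with the minimum and once with the maximum Risk% per bucket gives the lower and upper bounds.

```python
def ifr_score(deaths, age_share, risk_pct):
    """Sum over age buckets of (deaths x age share) x Risk%.

    deaths: total recorded deaths for the province
    age_share: normalised population share per age bucket (sums to 1)
    risk_pct: Risk% per age bucket, in the same order as age_share
    """
    if len(age_share) != len(risk_pct):
        raise ValueError("age_share and risk_pct must have the same length")
    return sum(deaths * share * risk for share, risk in zip(age_share, risk_pct))


# PLACEHOLDER inputs for illustration only
toy_deaths = 10
toy_share = [0.6, 0.4]
toy_risk_min = [0.01, 0.1]
toy_risk_max = [0.02, 0.2]

ifr_low = ifr_score(toy_deaths, toy_share, toy_risk_min)
ifr_high = ifr_score(toy_deaths, toy_share, toy_risk_max)
```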

The estimated national IFR is shown as a min/max range: [0.015-0.025]. The individual province IFRs are shown similarly, for example: Bangkok: [0.443-0.753] and Chiang Mai: [0.049-0.083].

Within Imperial College London’s (ICL) Oct 2020 article on IFR, the authors indicate that values of 1.15% (0.78-1.79) and 0.23% (0.14-0.42) were typical at the time of writing for “high income” (more elderly people) and “low-income” (more young people) countries respectively.

ICL state “the infection fatality ratio (IFR) is a key statistic for estimating the burden of COVID-19”, so perhaps a higher IFR means more burden: a higher risk of death due to a larger elderly population.

Using ICL’s article values to calibrate our understanding, we can position Bangkok’s IFR and, nationally, Thailand’s IFR within their range of “high to low income”.

Crucially, the IFR lets us estimate the Number of Infected Cases.

The WHO’s article shows an IFR calculation similar to the one I have described in the equation above, though with interpretation. As stated, we lack both (i) age distribution of deaths data and (ii) random samples of test data, which are needed to calculate this precisely. The WHO article also indicates the difference between measured Case Fatality Rate and Infection Fatality Ratio, as differentiated by antibody testing. As far as I am aware, there is no serological testing data from random samples in Thailand’s publicly available data (please leave a comment if you have access to such data).

Lefthand Side Plot – A Weaker Estimate

On the lefthand side plot, we gain a view on the reporting of deaths. By taking the mean and min/max of the Expected IFR Death Distribution, we gain a ballpark range within which we expect our green line (estimated age distribution of deaths) to fall. It doesn’t, because we do not have the true patient death ages and the (case/death) numbers are too small to be accurately representative. But there is some redemption.

The true distance of (i) the recorded province Deaths sum from (ii) the Expected IFR Mean sum is the large +/- number within each of the province plots (both sums are taken over the age distribution). This difference is most interesting, because the Expected IFR Mean comes directly from the larger (trusted) population. The difference from Recorded to Trusted is a good measure of whether the values are above or below the globally expected deaths, relative to each province.

However, this distance measure is calculated from (and so depends on) actual Recorded Case numbers, not Infected Case numbers, which affects how we might use this number.
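A small sketch of the distance measure described above, assuming the Expected IFR Mean for each age bucket is the midpoint of its lower and upper expected deaths. The function name `death_gaps` and the toy numbers are hypothetical.

```python
def death_gaps(recorded_deaths, expected_low, expected_high):
    """Recorded deaths minus the expected mean, min and max sums.

    expected_low / expected_high: per-age-bucket expected deaths for the
    province (lower and upper bounds). ASSUMPTION: the expected mean per
    bucket is the midpoint of the two bounds.
    """
    sum_low = sum(expected_low)
    sum_high = sum(expected_high)
    sum_mean = (sum_low + sum_high) / 2
    return (
        recorded_deaths - sum_mean,  # the +/- number shown in each plot
        recorded_deaths - sum_low,
        recorded_deaths - sum_high,
    )


# PLACEHOLDER inputs for illustration only
gap_mean, gap_min, gap_max = death_gaps(10, [1, 2], [3, 6])
```

A positive gap against the mean means more deaths were recorded than the expected mean for that number of Recorded Cases; a gap near zero against the max means recorded deaths sit at the upper bound.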


The weakness here is in the green line. It shows an estimate of deaths over the continuous age range. We don’t have the actual COVID-19 death ages per province at this time. In many provinces, no deaths have been recorded, so the death by age distribution data has a second degree of imprecision (the green line is flat). In the righthand side plot only, provinces without any deaths data give us no information on their IFR, and many provinces are in this condition.


In the righthand side plot, the IFR is shown. These IFR figures fall approximately within the expected range published by ICL. According to their IFR ratings, Bangkok falls into the upper half of “low income” (more young people) countries and, nationally, Thailand falls into the lower third of “low income” countries.

Most importantly, the IFR lets us estimate the Number of Infected Cases per province and across Thailand, to decide whether the Recorded Cases represent all the cases out there to find. On this front, Thailand is nationally doing okay (above the estimated mean).

In the lefthand side plot, the Expected IFR and differences from it are a useful guide for whether nationally, and in each province, the reported deaths match global expectations. One should consider the three presented measures of difference: mean average, min and max. The “true” value is somewhere near those ranges.

Considering all the caveats, these results indicate that Thailand is nationally at +69.7 deaths (from the Expected IFR Mean), sitting at the maximum bound. We might interpret this as the reported deaths matching those expected at the upper limit. Too many deaths? Possibly. But perhaps more likely, this *might* indirectly indicate that case finding has sufficiently led us towards the global expected death rates, which is good for the staff working on case finding and reporting. However, I am not tying myself to these conclusions; this is “back of envelope” stuff.

Without the ages of deaths, the sums are slightly miscalculated, even though we have corrected for (i) province age distribution in the IFR and (ii) Expected IFR Risk% in the deaths projection estimates. The lack of serological testing data from random sampling also puts in question whether these figures would change further given complete information.

(Disclaimer: I do not claim to be an expert on the intricacies of COVID-19 infection metrics or public health data. I claim the code is accurate and the modelling process well reasoned; the presentation of IFR ought to be improved in future and presented as Infected vs Recorded Cases. Thanks for reading. Thanks to Dylan Jay and Matt Greenfield for the collaboration.)
