CGM accuracy – Calibration is King!

Over the years I’ve spent some time living with multiple CGM systems:

  • Dexcom G4 and G5
  • Medtronic Guardian2/Enlite (640G pump)
  • FreeStyle Libre
  • FreeStyle Libre with Spike/xDrip+

Incidentally, for the purposes of this technical discussion the Libre is a Continuous Glucose Monitor. Its lack of alarms and similar features does not change this, even if “Flash” monitoring doesn’t offer the same end-user features as other CGMs.

I think it’s worth discussing the accuracy of these. Manufacturers like to quote “MARD” (Mean Absolute Relative Difference, an indication of how far the readings are from a lab reference on average), and we see various people in peer-support forums saying things like “I love brand X: it’s always accurate” or “brand Y was terrible: often way off.”
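
As a rough sketch, MARD can be computed by averaging the absolute error of each CGM reading relative to its paired reference value. The function name and the paired readings below are invented purely for illustration; a real study would use many more pairs, measured against a lab reference.

```python
def mard(cgm_readings, reference_readings):
    """Mean absolute relative difference between paired readings, as a percentage."""
    if len(cgm_readings) != len(reference_readings) or not cgm_readings:
        raise ValueError("need two equal-length, non-empty sequences")
    relative_errors = [
        abs(cgm - ref) / ref
        for cgm, ref in zip(cgm_readings, reference_readings)
    ]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Made-up paired readings in mmol/L (CGM value, reference value at the same time).
cgm = [4.5, 6.6, 9.0, 12.0]
ref = [5.0, 6.0, 10.0, 12.0]

print(f"MARD: {mard(cgm, ref):.1f}%")  # three readings are 10% off, one is exact
```

A single average like this says nothing about where the errors occur, which is one reason a quoted MARD and a user's day-to-day experience can differ.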

Unfortunately it’s hard to make objective judgements when you’re only dealing with one CGM.

As well as using some of these systems individually, I’m currently part-way through an exercise where I’m wearing three systems concurrently. This has meant I can compare them throughout the day, not just when I decide to do a fingerprick.

I plan to write up the results of this in a later article once it’s complete, but I can at least make some comments already.

Any CGM can only be as good as its calibration

I’ve discussed in the past (after an exercise where I wore Libre and Dexcom G5 systems concurrently) how the Libre Reader’s algorithms don’t always match fingerpricks, but when you feed the raw data into a CGM app such as Spike or xDrip+ and apply calibrations, it can be similar to the Dexcom in accuracy. See “How accurate is the Libre Reader?”.

The Dexcom G6 promises no need for calibrations (but does allow you to apply calibrations if you need) and feedback from users in other countries is very positive. But in this discussion I’m comparing the following systems:

  • Dexcom G5 (using the transmitter’s internal calibration)
  • Libre through xDrip+ (using a Nightrider device as the interface, but the results are the same whether you use LibreAlarm, MiaoMiao, BlueReader, etc). The Dexcom G5 through xDrip+ behaves the same way, and Spike’s calibration routines are similar to xDrip+’s.
  • Medtronic 640G (Enlite) CGM

I have been calibrating these systems at the same time, with the same fingerprick value (obtained through Contour Next One or Accu-Chek Guide meters, as discussed in “Do you trust your meter?”). The BG meter is not guaranteed to be 100% accurate, but it’s important to use a device that’s as accurate as possible. Any error in the BG reading can be amplified in the CGM calibration.

How does calibration work?

There’s lots of research and science that goes into these algorithms, and what I’m going to describe is very much a simplification of the calibration routines in these systems. But the basic concept applies to all, so it’s worth getting your head around it.

The sensor returns a “raw” value, and the calibration routine has to translate this into an equivalent glucose value. Generally as the raw number increases so does the glucose number, but the trick is to work out at what rate. Each sensor can return slightly different ranges of values, and in fact during the life of the sensor the results can change (e.g. as the chemicals in the sensor are used up).

Consider this graph, with several calibration points and a line that the calibration algorithm has chosen to fit them:

Using this line, if the sensor returns a raw value of 50 (on the left side of the graph) it’s obvious that the CGM is going to translate that to just under 3 mmol/L.
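
The idea can be sketched in a few lines of Python. This is a plain least-squares straight-line fit, which is a simplification and not the actual routine used by Dexcom, Medtronic or xDrip+. The calibration points and the names fit_line and to_glucose are invented for illustration, chosen so that a raw value of 50 lands just under 3 mmol/L.

```python
def fit_line(points):
    """Least-squares straight line through (raw, glucose) calibration points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two calibration points")
    mean_raw = sum(raw for raw, _ in points) / n
    mean_bg = sum(bg for _, bg in points) / n
    sxx = sum((raw - mean_raw) ** 2 for raw, _ in points)
    if sxx == 0:
        raise ValueError("calibration points must have different raw values")
    sxy = sum((raw - mean_raw) * (bg - mean_bg) for raw, bg in points)
    slope = sxy / sxx
    intercept = mean_bg - slope * mean_raw
    return slope, intercept


def to_glucose(raw, slope, intercept):
    """Translate a raw sensor value into mmol/L using the fitted line."""
    return slope * raw + intercept


# Hypothetical calibration points: (raw sensor value, fingerprick in mmol/L).
calibrations = [(80, 4.5), (120, 6.3), (180, 9.5), (240, 12.3)]

slope, intercept = fit_line(calibrations)
print(f"raw 50 -> {to_glucose(50, slope, intercept):.2f} mmol/L")  # roughly 2.97
```

Note that the points do not sit exactly on the fitted line; the fit simply picks the line that is closest to all of them overall.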

The line isn’t always going to be a perfect fit for the points recorded, but hopefully they’ll be fairly close to the line:

But what if we only have one calibration point? How does the algorithm decide where the line goes? How steep is it? It has to guess.

In this example the chosen slope has ended up a bit less steep than before, and now a raw value of 50 would be translated to a glucose value of over 3 mmol/L.

Calibration is rarely accurate until you have multiple samples, at different points on the graph!

After I supply the initial calibration for a sensor, I need to remind myself that until I supply more calibration points, the further away from that calibration level I am, the less accurate the CGM is likely to be. Once I enter a second point it improves a lot, and a third point usually improves it further.
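
A small sketch of why a single point is not enough: with one calibration the algorithm has to assume a slope, and any mismatch between that guess and the sensor's real response grows with distance from the calibration level. Everything here is hypothetical, including the "true" line, the guessed slope of 0.045 and the function names.

```python
def single_point_calibration(raw, glucose, assumed_slope):
    """Line through one (raw, glucose) point, using a guessed slope."""
    intercept = glucose - assumed_slope * raw
    return assumed_slope, intercept


def to_glucose(raw, slope, intercept):
    return slope * raw + intercept


# Hypothetical real response of this particular sensor.
TRUE_SLOPE, TRUE_INTERCEPT = 0.05, 0.4

# One fingerprick taken when the sensor's raw value was 120.
cal_raw = 120
cal_glucose = to_glucose(cal_raw, TRUE_SLOPE, TRUE_INTERCEPT)

# The algorithm's guess at the slope is slightly too shallow.
guess_slope, guess_intercept = single_point_calibration(cal_raw, cal_glucose, 0.045)


def error_at(raw):
    """Estimated minus actual glucose (mmol/L) at a given raw value."""
    estimated = to_glucose(raw, guess_slope, guess_intercept)
    actual = to_glucose(raw, TRUE_SLOPE, TRUE_INTERCEPT)
    return estimated - actual


for raw in (50, 120, 240):
    print(f"raw {raw}: error {error_at(raw):+.2f} mmol/L")
```

The error is zero at the calibration point itself and increases the further the raw value moves away from it, in either direction.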

“Good” calibrations are essential

But just adding a second point doesn’t necessarily fix the problem. Any error in the value can in fact make things worse. Consider this graph, where two points of 4.9 and 5.1 have resulted in a line almost identical to the one I initially showed.

But there’s a time lag between a blood glucose value and when that flows through to the interstitial fluid the CGM is measuring. If my BG was rising/falling when I sampled it, the value measured by the CGM won’t be a good match. And our BG meters can introduce their own errors through lack of precision. Hopefully my fingers were clean as well.

Also keep in mind that some of these systems can take 5-15 minutes to process the calibration. If your glucose levels change dramatically in that time this can introduce further errors.

Consider this graph, where each of those points differed by only 0.1 mmol/L: such a small difference that we would generally regard it as insignificant.

It should be immediately obvious that a raw value of 50 would now be translated into a glucose value of 1 mmol/L (instead of just under 3). It’s very annoying being woken up by hypo alarms when you’re not actually low! At the same time, high values will be exaggerated with this graph.
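
The effect is easy to reproduce with invented numbers. In this sketch two calibration points sit close together (raw values of 90 and 94 are assumptions made for the example), and each fingerprick is then shifted by just 0.1 mmol/L in opposite directions.

```python
def line_through(point_a, point_b):
    """Slope and intercept of the line through two (raw, glucose) points."""
    (raw_a, bg_a), (raw_b, bg_b) = point_a, point_b
    if raw_a == raw_b:
        raise ValueError("the two points need different raw values")
    slope = (bg_b - bg_a) / (raw_b - raw_a)
    intercept = bg_a - slope * raw_a
    return slope, intercept


def to_glucose(raw, line):
    slope, intercept = line
    return slope * raw + intercept


# Two accurate calibrations, close together in glucose terms.
good_line = line_through((90, 4.9), (94, 5.1))

# The same two calibrations, each off by only 0.1 mmol/L.
bad_line = line_through((90, 4.8), (94, 5.2))

for raw in (50, 240):
    print(
        f"raw {raw}: good {to_glucose(raw, good_line):.1f} mmol/L, "
        f"bad {to_glucose(raw, bad_line):.1f} mmol/L"
    )
```

Because the points are so close together, the small errors double the slope: the low reading drops well below the real value while the high reading is pushed far above it. Calibration points spread across different glucose levels are much less sensitive to this.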

Incidentally, my observation of the “calibration-free” Abbott algorithm in LibreLink and the Libre Reader is that it seems to result in a steeper-than-actual calibration slope. Both hypos and hypers are usually exaggerated (which is probably safer than hiding them).

Older calibrations are ignored

As the sensor ages the calibration will usually vary (some systems more than others) so fresh calibrations are important.

The details of this vary between systems, but this seems a reasonable summary:

  • The Medtronic CGM only uses the last four calibration points.
  • The Dexcom G5 only uses the last three points.
  • xDrip+ uses as many as you give it. By default it ignores calibrations older than a day, but you can allow it to use older points if there aren’t enough new ones. You can see the calibration points on a graph (similar to shown here) and easily judge if a recent point was a long way away from the line. And you can delete bad points.

The Medtronic and Dexcom systems do not allow you to easily identify or delete a bad calibration point. The most straightforward solution is to work the bad point out of the system by supplying 3 (Dexcom) or 4 (Medtronic) new calibrations. Ideally these should again be at varying BG levels, each taken when your levels are stable. This can be a pain!
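
A minimal sketch of the "last N points" idea, assuming a simple rolling window and a least-squares fit. The real Dexcom and Medtronic algorithms are proprietary and certainly more involved; the class name, the points and the window size of 3 are only for illustration.

```python
from collections import deque


class RollingCalibration:
    """Keeps only the most recent calibration points and fits a line to them."""

    def __init__(self, max_points):
        # A deque with maxlen drops the oldest point automatically.
        self.points = deque(maxlen=max_points)

    def add(self, raw, glucose):
        self.points.append((raw, glucose))

    def fit(self):
        """Least-squares slope and intercept over the points still in the window."""
        n = len(self.points)
        if n < 2:
            raise ValueError("need at least two calibration points")
        mean_raw = sum(raw for raw, _ in self.points) / n
        mean_bg = sum(bg for _, bg in self.points) / n
        sxx = sum((raw - mean_raw) ** 2 for raw, _ in self.points)
        if sxx == 0:
            raise ValueError("calibration points must have different raw values")
        sxy = sum((raw - mean_raw) * (bg - mean_bg) for raw, bg in self.points)
        slope = sxy / sxx
        return slope, mean_bg - slope * mean_raw

    def estimate(self, raw):
        slope, intercept = self.fit()
        return slope * raw + intercept


BAD_POINT = (120, 8.4)  # hypothetical bad fingerprick; about 6.4 was correct

window = RollingCalibration(max_points=3)
for point in [(90, 4.9), (150, 7.9), BAD_POINT]:
    window.add(*point)
print(f"with the bad point: raw 50 -> {window.estimate(50):.1f} mmol/L")

# Three fresh, accurate calibrations at varying levels.
for count, point in enumerate([(100, 5.4), (160, 8.4), (200, 10.4)], start=1):
    window.add(*point)
    still_used = BAD_POINT in window.points
    print(
        f"after {count} new calibration(s): bad point still used = {still_used}, "
        f"raw 50 -> {window.estimate(50):.1f} mmol/L"
    )
```

With a window of three, the bad point keeps influencing the line until the third new calibration has been entered, which is why a single good calibration does not immediately repair things.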


All these CGM systems can be accurate. But only if they’re calibrated well. Poor calibrations will stuff up any system! Unfortunately this is a major “pain point” of current CGM technology.

But if you keep these concepts in mind you can turn most CGMs into very useful tools.

  • Only calibrate when your levels are flat, and likely to remain flat for a while.
    Don’t get stuck into a meal immediately after applying a calibration test.
  • Keep in mind that the systems need more than one calibration point to be able to extrapolate well.
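
The "flat" rule could be checked mechanically along these lines. This is only a sketch: the function name, the 5-minute spacing between readings and the threshold of 0.06 mmol/L per minute (roughly 1 mg/dL per minute) are assumptions picked for the example, not values taken from any of these systems.

```python
def is_stable(recent_readings, interval_minutes=5, max_rate=0.06):
    """True if no step between consecutive readings exceeds max_rate.

    recent_readings: CGM glucose values in mmol/L, oldest first.
    max_rate: largest acceptable change in mmol/L per minute (assumed value).
    """
    if len(recent_readings) < 2:
        return False  # not enough history to judge the trend
    for earlier, later in zip(recent_readings, recent_readings[1:]):
        rate = abs(later - earlier) / interval_minutes
        if rate > max_rate:
            return False
    return True


flat = [5.6, 5.5, 5.6, 5.7]    # wobbling by 0.1 mmol/L every 5 minutes
rising = [5.6, 6.1, 6.8, 7.6]  # climbing after a meal

print("flat trace OK to calibrate:", is_stable(flat))
print("rising trace OK to calibrate:", is_stable(rising))
```

A check like this only looks backwards. It cannot tell whether levels will stay flat afterwards, which is why it still matters not to eat straight after calibrating.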

My own closed-loop pump system makes insulin dosing decisions off the values coming out of my CGM. I do have a lot of trust in the system, but I know that I need to keep an eye on the calibration.

5 thoughts on “CGM accuracy – Calibration is King!”

  1. Thanks for this post. Very illustrative. Keep posting.

    I’m starting to compare LibreLink & LibreReader with the glucose levels reported through xDrip+ using MiaoMiao. I’m calibrating xDrip+ with BG finger pricks. My experience so far is that the xDrip+ readings are much more accurate than those from the LibreReader, which are ‘higher’ in general, both for hypos and hypers.

    This behavior doesn’t match your observation above: “Incidentally, my observations of the “calibration-free” Abbott algorithm in LibreLink and the Libre Reader is that it seems to result in a steeper-than-actual calibration slope. Both hypos and hypers are usually exaggerated (which is probably safer than hiding them).”

    If the calibration slope in Libre is steeper than the actual (BG) calibration, this would show higher glucose values for both hypos and hypers given the same raw input data. I don’t understand why you say this. Steeper calibration slopes would exaggerate hyper values (which is not good) but also hypo values, which is even worse.

    1. A steeper slope can result in higher highs AND lower lows.
      Consider a see-saw that pivots around a point near the centre of the graph. When on one side it goes higher, on the other side it goes lower.
      The simplistic graphs I showed here can be defined by two values. The “slope” (angle) of the line, and the vertical position (offset).

      1. Got it! “Consider a see-saw that pivots around a point near the centre of the graph” –> Yes, that is what I thought once I wrote you.
        On the other hand, I now have two weeks of observations of the Libre Reader vs Libre/xDrip+ via MiaoMiao, and I am indeed starting to see quite considerable differences in the readings when low or high: the hypers are higher in the Libre Reader than in Libre/xDrip+, and the hypos are lower.
        As you rightly mentioned in another post, you should have either one reader or three, but not two 😉

  2. xDrip+ delays applying a calibration by 15 minutes. This means the calibration point generally matches the interstitial reading by the time it’s finally applied. Because of this it’s safer to calibrate when your trend is not exactly flat (although a flat trend is still recommended). This should also result in a better calibration profile in most circumstances. The only exception is that the initial calibration is applied immediately, which is frustrating, so after a few more calibration points I look to see if it makes sense to delete these initial points.

    Being able to delete *bad* calibration points is a useful feature.

    Also, xDrip+ uses calibration points from the previous 3 days, not one day as you said in the article. But as you indicated, it can be set to use more.
