On the N Days Of Christmas, part 5

Sometimes, you need to visualise data to understand it.

Here is a plot of the number of days vs the total number of gifts, ending with our favourite, traditional data point: 12 days vs 364 total gifts.

Drawing a straight line through these data points isn’t possible, so the relationship between days and total gifts is not linear. There’s clear upward curvature.

What we need to do is create a model of the data.

A first reasonable thought is that the data may represent exponential growth, something we’ve all become accustomed to in the age of COVID. A little reflection shows that this isn’t the case here. Exponential growth would mean each day’s total is a roughly constant multiple of the previous day’s. Apart from the first few days, though, there’s no pattern of doubling or tripling, and the ratio between successive totals keeps shrinking.
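A quick way to see this is to compute the ratio of each day’s total to the previous day’s. Here is a small Julia sketch (the names totals and ratios are just for illustration). It builds the totals as running sums of the gifts given on each day (1, 3, 6, …):

```julia
# cumsum(1:12) gives the gifts given on each day (1, 3, 6, ...);
# a second cumsum accumulates them into the running total
totals = cumsum(cumsum(1:12))

# Ratio of each day's total to the previous day's total
ratios = totals[2:end] ./ totals[1:end-1]

for (day, r) in zip(2:12, ratios)
    println("day $day: ", round(r, digits = 3))
end
```

The ratio starts at 4 (1 gift to 4) and falls steadily towards 1 rather than settling at a constant value, as it would for exponential growth.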

There are numerous models we could use, some more complex than others, but to cut a long story short, a polynomial model turns out to fit the data best. Perfectly, in fact. In particular, the model is a so-called polynomial of degree 3.

But I’m getting slightly ahead of myself.

Spreadsheets like Excel, Numbers, and LibreOffice allow a plot and model to be created, much like the ones above, for which I used the R programming language. Here’s what this looks like in Excel:

There are a few things to notice.

The first is the equation at top right of the plot, y = 0.1667x³ + 0.5x² + 0.3333x + 2E-11. This is a polynomial of degree 3, so called because the largest exponent to which the variable x (the day number here) is raised is 3.
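To reproduce coefficients like these without a spreadsheet, here is a minimal Julia sketch of an ordinary least-squares cubic fit. It is not the R code behind the plots in this post. It builds a matrix whose columns are x³, x², x and 1, then uses Julia’s backslash operator, which solves a tall (non-square) system in the least-squares sense:

```julia
using LinearAlgebra

# Day numbers and total gifts for days 1 to 12
x = collect(1.0:12.0)
y = Float64.(cumsum(cumsum(1:12)))

# Design matrix with columns x^3, x^2, x and a constant term
A = [x.^3 x.^2 x ones(length(x))]

# For a tall matrix, A \ y returns the least-squares solution
coeffs = A \ y

println(coeffs)   # approximately [1/6, 1/2, 1/3, 0], up to floating-point rounding
```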

The second thing to notice is R² = 1. This R-squared value is a measure of the “goodness of fit” of the model to the data.
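R² is 1 minus the ratio of the residual sum of squares to the total sum of squares about the mean. It therefore equals 1 exactly only when every residual is zero. Here is a minimal Julia sketch of that calculation for a least-squares cubic fit (the helper name rsquared is hypothetical):

```julia
using LinearAlgebra, Statistics

# Coefficient of determination: R² = 1 - SS_res / SS_tot
function rsquared(y, yhat)
    ss_res = sum((y .- yhat) .^ 2)       # residual sum of squares
    ss_tot = sum((y .- mean(y)) .^ 2)    # total sum of squares about the mean
    1 - ss_res / ss_tot
end

x = collect(1.0:12.0)
y = Float64.(cumsum(cumsum(1:12)))

A = [x.^3 x.^2 x ones(length(x))]   # cubic design matrix
yhat = A * (A \ y)                  # fitted values from the least-squares fit

r2 = rsquared(y, yhat)
println(r2)
```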

A polynomial of degree 2 is not a good fit, and adding more terms, e.g. to give a 4th degree polynomial, doesn’t improve the fit.
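One way to check this is to fit polynomials of several degrees and compare their residual sums of squares. The following sketch again uses least squares via Julia’s backslash operator (the helper fit_residual is hypothetical):

```julia
using LinearAlgebra

x = collect(1.0:12.0)
y = Float64.(cumsum(cumsum(1:12)))

# Residual sum of squares for a least-squares polynomial fit of degree d
function fit_residual(x, y, d)
    A = hcat((x .^ k for k in d:-1:0)...)   # columns x^d, ..., x, 1
    c = A \ y
    sum((y .- A * c) .^ 2)
end

for d in 2:4
    println("degree $d: residual sum of squares = ", fit_residual(x, y, d))
end
```

The quadratic leaves a clearly nonzero residual. The cubic and quartic residuals are zero up to floating-point rounding, so the extra quartic term buys nothing.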

The third thing to notice is the column titled Total (model) in the table at left. This is the output of applying the polynomial model to the numbers in the Day column. It’s almost, but not quite, the same as the Total column, which holds the actual total number of gifts we’ve been computing through different means all along.

The R-squared goodness of fit measure is very close to 1 (a perfect fit) but not exactly 1; it appears to be rounded up for display. This happens whichever spreadsheet software you use. Not only that, but the model equation itself differs between programs. Of the spreadsheets I tried, LibreOffice gave the smallest error, because the constant multiplier in each term of its model had more decimal places than those from Excel or Numbers.

So, what’s going on? The answer is that in a computer, real numbers are most often represented using some variant of so-called floating-point format. The dominant standard for floating-point numbers is IEEE 754, first published in 1985. It’s very successful for general-purpose computation. However, most real numbers, including simple fractions such as one sixth, have no exact binary floating-point representation, so rounding errors creep in and results become approximations. The topic of representing real numbers for computation would take us down another rabbit hole.
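Here is a small Julia illustration of the kind of approximation involved. One sixth has no exact 64-bit floating-point representation. Evaluating the model at n = 12 with the four-decimal coefficients as displayed (0.1667 and 0.3333) drifts visibly away from 364:

```julia
# 1/6 as a 64-bit IEEE 754 float only approximates the fraction 1//6
println(1/6)
println(1/6 == 1//6)        # false: Julia compares the two values exactly

# The cubic model with its coefficients rounded to four decimal places
approx_model(x) = 0.1667x^3 + 0.5x^2 + 0.3333x + 2e-11

println(approx_model(12))   # close to, but not exactly, 364
```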

The question is: can we take this approximate model and make it exact? The answer, as Bob the Builder might say, is: yes we can!

The coefficient (constant multiplier) 0.1667 in the first term is just an approximation for the rational number one sixth. A rational number is one that can be written exactly as one integer (whole number) divided by another, i.e. a fraction.

The coefficient 0.5 in the second term is of course just one half, another rational number.

The coefficient 0.3333 in the third term is just an approximation for the rational number one third.

The fourth term, 2E-11, is scientific (E) notation for 2 × 10⁻¹¹, which is 0.00000000002, a very small constant added to the result. Again, it is a by-product of the model coefficients being approximations of exact rational numbers; in the exact model, this constant is zero.
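E notation is easy to confuse with a power of two. Julia reads 2E-11 just as a spreadsheet does, so the difference between 2 × 10⁻¹¹ and 2⁻¹¹ is easy to check:

```julia
# E notation: 2e-11 means 2 × 10^-11 = 0.00000000002
println(2e-11)

# 2^-11 is a power of two, 1/2048, roughly 0.000488
println(2.0^-11)
println(2.0^-11 == 1/2048)   # true
```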

Given this, the model simply becomes:

f(n) = (1/6)n³ + (1/2)n² + (1/3)n

I’ve used n instead of x here, since this is the variable name we have used all along to denote number of days.
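The rational coefficients can also be recovered exactly, rather than by recognising decimals. Four points determine a cubic uniquely. Solving the 4×4 linear system for days 1 to 4 with Julia’s Rational type involves no floating-point rounding at all, so it gives the coefficients directly. This sketch is an alternative route, not how the spreadsheets or R fitted the model:

```julia
using LinearAlgebra

# Rows are [n^3, n^2, n, 1] for days n = 1 to 4, stored as exact rationals
A = Rational{Int}[n^k for n in 1:4, k in 3:-1:0]
b = Rational{Int}[1, 4, 10, 20]    # total gifts after days 1 to 4

# Solve A * c = b exactly via a rational LU factorisation
c = A \ b

println(c)   # expected: 1//6, 1//2, 1//3 and 0
```

Because every step is exact, the constant term comes out as exactly zero, which fits the 2E-11 term being an artefact of floating-point arithmetic.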

This turns out to give answers that are in perfect agreement with the other methods we have been using to compute the total number of gifts for a given number of days.

As the statistician George Box is reported to have said:

All models are wrong, but some are useful.

Sometimes they’re actually not wrong at all, as in this case. The initial polynomial model was “wrong but useful”, but the simplified form is exactly right.

Even though the model was created using data for only the first 12 days, it works for larger numbers just as our other methods do, making it truly predictive, something that not all models can claim to be. Admittedly, as models go, this is a simple one.

Here is the working for f(12):

f(12) = (1/6)×12³ + (1/2)×12² + (1/3)×12 = (1/6)×1728 + (1/2)×144 + (1/3)×12 = 364

which in words is: one sixth of 12 cubed, plus one half of 12 squared, plus one third of 12, which evaluates to 288 + 72 + 4 = 364.

For n = 12, the three terms of this degree 3 polynomial correspond to one sixth of a 12×12×12 cube (volume 1728), plus one half of a 12×12 square (area 144), plus one third of a line 12 units long.

This function, f(n), is the most compact, general expression I’ve found so far for computing the total number of gifts for n days of Christmas.

It also turns out to be the fastest to run on a computer. It performs the same fixed number of arithmetic operations no matter how large n becomes, so it executes in constant time, O(1). Computation time doesn’t grow with n, at least while n fits in a fixed-size integer type.

The best we have been able to do before now is linear time, O(n). The last two posts have discussed computational complexity in the context of this problem, so refer to those for more detail.
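As a sanity check that the closed form agrees with a step-by-step count well beyond the 12 days used to fit it, the following sketch compares it with a straightforward linear-time loop for every n from 1 to 1,000. The names gifts_loop and gifts_formula are hypothetical, and the loop is not necessarily identical to the functions from the earlier posts:

```julia
# O(n): add up the gifts given on each day; on day k that is 1 + 2 + ... + k
function gifts_loop(n)
    total = 0
    for k in 1:n
        total += k * (k + 1) ÷ 2
    end
    total
end

# O(1): the cubic model with exact rational coefficients
gifts_formula(n) = 1//6*n^3 + 1//2*n^2 + 1//3*n

# The loop and the formula agree for every n tested
for n in 1:1000
    @assert gifts_loop(n) == gifts_formula(n)
end
println("agree for n = 1 to 1000")
```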

The actual run-time (at least on my M2 Mac) is around one millionth of a second, for any value of n.

As in the last two posts, I wrote code in the Julia programming language to implement this function. Unlike many programming languages, Julia supports rational numbers as a first-class data type. So, one way to write f(n) in Julia as daysofxmas(n) is:

function daysofxmas(n)
    1//6*n^3 + 1//2*n^2 + 1//3*n
end

where 1//6 is the rational number (fraction) one sixth, for example, and n^3 is n³.
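Here are a few quick checks of how Julia’s // rationals behave. This exactness is what keeps the arithmetic in daysofxmas free of rounding error:

```julia
# // builds a Rational, which is always stored in lowest terms
r = 2//12
println(r)                                  # 1//6
println(numerator(r), " ", denominator(r))  # 1 6

# Rational arithmetic is exact...
println(1//6 + 1//2 + 1//3 == 1)            # true
println(1//10 + 2//10 == 3//10)             # true

# ...whereas the Float64 equivalent accumulates rounding error
println(0.1 + 0.2 == 0.3)                   # false
```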

Here’s an example in which the function is applied to 12:

julia> daysofxmas(12)
364//1

The result is literally the fraction 364 over 1, simply because we were very explicit about using rational numbers as coefficients. While that’s an exact answer, we know that the answer really is just a positive whole number. We can convert the result to a suitable integer data type:

function daysofxmas(n)
    Int128(1//6*n^3 + 1//2*n^2 + 1//3*n)
end

A sample run of the modified function gives the simple integer result we’re after:

julia> daysofxmas(12)
364

A reminder from previous posts that the intention is to generalise the gift total calculation to any positive number of days n, i.e. not just n = 12 days, just because we can, not because we have to! 🙂

The final twist is that fixed-size integer types in programming languages have limits. Julia’s default Int64, for example, can hold values only up to 9,223,372,036,854,775,807 (about 9.2 × 10¹⁸). Performing operations like exponentiation and multiplication on very large integers, as we do here, can therefore lead to integer “overflow”, producing answers that make no sense, or errors:

julia> daysofxmas(1000000000)
ERROR: InexactError: Int128(-1965449412722243072//3)
...
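The numbers involved make the problem clear. This sketch shows the Int64 limit, what happens to n³ for n = 1,000,000,000, and the same calculation done in Int128:

```julia
# Largest value a 64-bit signed integer can hold (about 9.2 × 10^18)
println(typemax(Int64))    # 9223372036854775807

n = 1_000_000_000
# n^3 = 10^27 does not fit in Int64, so it silently wraps around
println(n^3)               # a meaningless value, not 10^27

# Widening n to Int128 first gives the true cube, 10^27
println(Int128(n)^3)

# In Int64, the cube first overflows at n = 2^21 = 2,097,152
println(2_097_151^3 <= typemax(Int64), " ", Int128(2_097_152)^3 > typemax(Int64))
```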

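This can be solved by converting n itself (the default integer type in Julia is Int64, i.e. a 64-bit integer) to a larger integer type, so that intermediate operations within the expression don’t overflow. Since n³ alone exceeds the largest Int64 value once n reaches 2,097,152 (2²¹), the safest approach is to convert n every time rather than only for very large n. Widening n doesn’t by itself turn the rational result into an integer, so the Int128 conversion of the result from the previous version is still needed. The modified function is as follows:

# 9 
function daysofxmas(n)
    # Widen n first so that n^3 and the rational arithmetic below
    # are done in 128 bits instead of overflowing Int64
    n = Int128(n)

    # The sum is a Rational{Int128} with denominator 1; convert it to Int128
    Int128(1//6*n^3 + 1//2*n^2 + 1//3*n)
end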

giving the correct, yet ridiculous, total number of gifts for n = 1,000,000,000 (one billion) of 166,666,667,166,666,667,000,000,000 or approximately 167 million billion billion:

julia> daysofxmas(1000000000)
166666667166666667000000000
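Int128 only postpones the problem, though: n³ itself exceeds typemax(Int128) once n passes roughly 5.5 × 10¹². If arbitrarily large n matter, one option is to multiply f(n) through by 6, giving n³ + 3n² + 2n. That must be a multiple of 6, because f(n) is always a whole number of gifts, so it can be evaluated with Julia’s arbitrary-precision BigInt and then divided by 6 exactly. This is only a sketch, not the version used for the timings in this series, and the name daysofxmas_big is hypothetical:

```julia
# Hypothetical variant: integer-only arithmetic on BigInt, which cannot overflow.
# 6·f(n) = n^3 + 3n^2 + 2n is always a multiple of 6, so ÷ is exact here.
function daysofxmas_big(n::Integer)
    m = big(n)                  # convert to arbitrary-precision BigInt
    (m^3 + 3m^2 + 2m) ÷ 6
end

println(daysofxmas_big(12))              # 364
println(daysofxmas_big(1_000_000_000))   # 166666667166666667000000000
```

The trade-off is speed: BigInt values are heap-allocated and their arithmetic is slower than that of fixed-size integers, so this gives up some performance in exchange for unlimited range.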

While writing this post, I realised that the implementations of the daysofxmas function in the previous two posts fail to give the correct answer beyond values of n of around 100,000,000. For n = 1,000,000,000, those functions don’t give an error like the one above. Instead, they report a total of 2.4 billion billion gifts rather than 167 million billion billion. The cause is the same integer overflow. With rational numbers, it surfaced as an error; in the earlier functions, it silently produced an incorrect result. The difference comes down to the vagaries of how arithmetic operations work in programming languages and the hardware they target.

The next part will summarise the different methods, results, and Julia function run-times, taking into account the correction necessary to give the correct result, even for large values of n.

Beyond this, there appears to be some kind of link between triangular numbers and the polynomial model presented here, which will be explored in the final post, after part 6 has summarised the trip we’ve been on.

This entry was posted on March 10, 2024 at 9:32 pm and is filed under Mathematics, Programming.

Strange Quarks