As part of the SwissEph.Net project, I’ve got some user feedbacks about diffrences in the results of calculation between the .NET library and the original C DLL.
As I have not had feedback from users about used code and diferences found, I spend some time to create a project for testing and compare result values between the differents versions: https://github.com/ygrenier/SwissEphNet.TestsAndCompare.
Once the tests are done, I found there is a difference in floating point calculations in .NET, but not between C code and .NET, the difference is between 32bits and 64bits of the .NET code.
The application define some tests and save the result in more values.
For each platform, the tests are executed and exported to a CSV file.
Then an Excel file was created to merge results and compare the values.
So the tests was performed on my Windows 10 64-bit machine.
You can find the files in the « tests » folder of the repository.
The result of comparaisons between the differents platforms are following:
We can see there is no differences between 32/64-bit C DLL, but there is a difference in results with .NET 32-bit version (the 64-bit version have same results than the C DLL).
First of all, I recall the floats are an approximation, there are always some small errors in calculations.
There format are specified by the standard IEEE 754 and the .NET use for the
float type (
System.Single) a float based on a 32-bit, and for the
double type (
System.Double) a float based on a 64-bit (the
decimal is stored in an another format).
The floating calculations are quite costly in CPU, that is why we created specific coprocessor named FPU (Floating Point Unit). For a long time the FPU are included in the CPU.
Since few years too, CPU include new instructions such as SSE instructions that optimize the calculation, especially the about the floats.
Here our problems start 🙂
The FPU and SSE instructions do not calculate the floats in the same way. The FPU make calculations on 80 bits, while SSE compute on 32 bits (or 64 bits from SSE2).
So if we use the FPU we get a better precision, but as we use
double (stored in 64 bits) the calculation are truncated to go 80 bits to 64 bits.
This is not the same with the SSE instructions because we are in the same size.
This makes that depending the type of instructions used (FPU or SSE) you will not always get the same values. The difference is usually quite low, but in a complex calculation these differences can muultiply and give considerable discrepancies.
This is one of the reasons why when we do scientific calculations are made, we used specific mathematical libraries that reduce this risk by using fixed point floats.
Why this problem is not present with C library ?
Because the C code is compiled to native code, so it’s in the compilation we decide how the floats are calulated. In the Swiss Ephemeris case, I think the DLL is compiled in the same way in 32-bit and 64-bit. So we get the same results.
And why there is a problem in the .NET ?
I think you are beginning to understand why there is a difference of calculation in the .NET library: the 32-bit version don’t used the same instructions than the 64-bit version.
To be clear, is the runtime that will compile Just-In-time le CIL code when executing the application, and who will decide to use the FPU if we are in 32-bit or the SSE instructions is we are in 64-bit (and if the SSE are available).
So because this is the runtime, we can’t change this behavior (in any cas not to my knowledge), et some other runtime implementation (like Mono) may have an another behavior.
To resolve this problem, simply don’t use
double and instead of use the
decimal or a third part library using is proper calculations without FPU/SSE influence.
decimal are floats encoded in 128 bits, and optimised for financial calculations (reducing the rounding problems meets with floating-point). There are a better precision but are a smaller range.
Now don’t be suprised if you find difference calculations in .NET with differents platforms.
If this behavior is a problem so you can:
– fix a platform 32/64 bit without permetting choice (this could be a problem for a library)
decimal if you can
– using third part library resolving this problem
See you soon,
PS: this article is the English translation of the french article