Ever since a 2-day trip to London, where I read an article on my phone about unums, I have liked posits. Posits allow for generally more precise operations, while also having properly defined comparison operators: only one NaR, and no ±0 or ±Inf. Posits have only 2 exception values, which makes handling them much easier. This blog post will go over Gosit and some of the decisions made during its development.
For the rest of this blog post I will refer to the IEEE 754 representation as float, or traditional float.
A posit is an approximation of a real number, much like a traditional float. There is only a finite amount of precision in computers: you only have a certain amount of RAM available, and ideally you don't want to use all of it for a single number. Programs usually use the 32- or 64-bit versions of IEEE 754, both because hardware supports them and because that is approximately the precision needed. But these floats have some flaws, which I will talk about later. In essence, posits aim to fix or resolve these problems as much as possible. Of course this means they won't be 100% compatible, but as long as you keep yourself to real numbers, most operations should produce similar results.
Posits have a number of parts which define them. From left to right (MSB to LSB) these are:

- the sign bit,
- the regime bits: a run of identical bits terminated by the opposite bit, encoding a power of useed = 2^(2^ES),
- the exponent bits: up to ES of them,
- the fraction bits: whatever is left, with an implicit leading 1.
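To make the layout concrete, here is a minimal sketch (not Gosit's actual code) that decodes a 32-bit, ES=2 posit bit pattern into a float64 by walking those fields from MSB to LSB:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

const es = 2 // exponent size used by 32-bit posits in the standard

// decode turns a posit32 bit pattern into a float64.
// Illustrative only; it ignores rounding because float64 can hold
// any posit32 value exactly.
func decode(p uint32) float64 {
	if p == 0 {
		return 0
	}
	if p == 0x80000000 { // Not a Real (NaR)
		return math.NaN()
	}
	neg := p&0x80000000 != 0
	if neg {
		p = -p // two's complement gives the encoding of the magnitude
	}
	rest := p << 1 // drop the sign bit
	var k int      // regime value
	if rest&0x80000000 != 0 {
		run := bits.LeadingZeros32(^rest) // length of the run of 1s
		k = run - 1
		rest <<= uint(run + 1) // skip the run and its terminating 0
	} else {
		run := bits.LeadingZeros32(rest) // length of the run of 0s
		k = -run
		rest <<= uint(run + 1) // skip the run and its terminating 1
	}
	exp := int(rest >> (32 - es))               // next es bits
	frac := float64(rest<<es) / (1 << 32)       // remaining fraction bits
	v := math.Ldexp(1+frac, k*(1<<es)+exp)      // (1+f) * 2^(k*2^es + e)
	if neg {
		return -v
	}
	return v
}

func main() {
	fmt.Println(decode(0x40000000)) // 1
	fmt.Println(decode(0x48000000)) // 2
	fmt.Println(decode(0xC0000000)) // -1
}
```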
One of the problems with floats is that there are many representations of NaN. This creates problems for languages and libraries. How do you sort a list of NaNs? Is one NaN bigger than another? Are they equal? You would think that since they are all NaN, of course they should be equal! But this makes checking whether 2 floats are equal harder. For example, 0b1111 1111 1100 0000 0000 0000 0000 0000 is NaN, but 0b1111 1111 1000 0000 0000 0000 0001 is also NaN; checking whether they are equal is more complex than just comparing the bits.
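A small Go example makes the pain point visible: two different bit patterns are both NaN, neither compares equal to anything (including itself), and their bits differ:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	a := math.Float32frombits(0xFFC00000) // a quiet NaN
	b := math.Float32frombits(0xFF800001) // a signalling NaN, different bits
	fmt.Println(a != a, b != b)           // true true: NaN is never equal to itself
	fmt.Println(a == b)                   // false
	fmt.Println(math.Float32bits(a) == math.Float32bits(b)) // false: the bits differ too
}
```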
A similar problem exists with maps. If I store value a in a map with a NaN as the key, and then retrieve a value from the map with a different NaN key, should I receive a? If so, this would complicate maps just like it did with equality.
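In Go, for instance, a map will happily store an entry under a NaN key, but no lookup can ever find it again, because no NaN key compares equal to the stored one:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	m := map[float64]string{}
	m[math.NaN()] = "a"

	fmt.Println(m[math.NaN()]) // "": a NaN key never matches
	fmt.Println(len(m))        // 1: yet the entry is stuck in the map
}
```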
There are also some more minor things in floats that most people won't have to deal with when using posits, for example ±0 and the weird rounding modes.
Mostly, yes. There are some issues, like the lack of ±Inf, which can be useful. But when working close to 1, posits have better precision, both in mathematical operations and in representation. When working with the very big and the very small, the opposite is true. There are contrived examples of posits being worse, but the same can be (and has been) done vice versa. Both formats have ranges in which they are superior, and ranges in which they are not.
Just because. I wanted a better understanding of how floating-point numbers work, and this seemed like a good starting point. Writing this library gave me a good understanding of how they work and how you can operate on them.
Posits come in many different forms: for a given size in bits, there is also a number (ES) which determines how many bits are reserved for the exponent. As of the latest official specification this is always 2, but this library can generate code for other ES values as well. I wanted to support all possible ES values, so there is one generic Posit type, Posit. This stores the ES in the type and should work with all values under 32. It is, however, a bit slow, because ES never changes during an operation or a series of operations (there is already a restriction against mixing posits with different ES sizes), yet it is still not a compile-time constant. If ES were constant, the compiler could do a lot of optimisations: there are a lot of shifts by ES, adds, and so on that could all be evaluated ahead of time.

To make this happen there is a go generate script which generates posit types for two predetermined sizes, 32ES2 and 16ES1. If you want to generate more yourself, you can use that program to do so. There is one big template that handles all the sizes, so adding a new size should be as easy as adding it to the list of generated types. There are some limitations: Go doesn't support the 128-bit ints needed for 64-bit operations, and I don't think 8 bits would work. But at least you can specify any ES value for both 16 and 32 bits and it should work. Keep in mind that you may need to update the script with some constants for it to generate all functions correctly.
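As a rough illustration of why a constant ES helps (hypothetical code, not Gosit's actual internals): when the shift amount is a compile-time constant the compiler can fold and combine it, while an ES carried as a run-time value forces a real variable shift on every call.

```go
package main

import "fmt"

const es32 = 2 // ES fixed at compile time, as in a generated 32ES2 type

// scaleConst computes k * 2^ES with a constant ES; the shift amount is
// known at compile time, so the compiler can fold it away.
func scaleConst(k int) int {
	return k << es32
}

// scaleGeneric does the same with ES carried as a value; the shift amount
// is only known at run time, so no such folding is possible.
func scaleGeneric(k, es int) int {
	return k << es
}

func main() {
	fmt.Println(scaleConst(3), scaleGeneric(3, 2)) // 12 12
}
```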
Gosit used to be fuzzed against softposit-rs, and the goal is to always fuzz for at least 24 hours before a stable release, though this may vary depending on the changes made. Many bugs have been found through fuzzing, and these are added to the regular tests. In fact, Gosit has found some bugs in softposit-rs: one is truncate turning 0.00004 into 1, and the other is truncate turning 3.0 (which is exactly representable) into 2.
I have since moved off fuzzing against the mostly broken Rust port and started using the original C version, which is of much higher quality (#13).
In that MR I also improved the performance a bunch, so it should be possible to do exhaustive testing for all 16ES1 functions, and for the 32ES2 functions that take only one argument.
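For flavour, a differential fuzz harness in Go's native fuzzing style looks roughly like this; positAdd and reference are placeholder stand-ins for the operation under test and a trusted implementation (e.g. the C softposit reached through cgo), not Gosit's real API:

```go
package posit_test

import "testing"

// Placeholder stand-ins so the sketch compiles; a real harness would call
// the Gosit operation under test and a trusted reference implementation.
func positAdd(a, b uint32) uint32  { return a + b }
func reference(a, b uint32) uint32 { return a + b }

// FuzzAdd feeds random posit bit patterns to both implementations and
// fails on any mismatch. Run with: go test -fuzz=FuzzAdd
func FuzzAdd(f *testing.F) {
	f.Add(uint32(0x40000000), uint32(0x40000000)) // seed: 1.0 + 1.0
	f.Fuzz(func(t *testing.T, a, b uint32) {
		if got, want := positAdd(a, b), reference(a, b); got != want {
			t.Errorf("add(%#x, %#x) = %#x, want %#x", a, b, got, want)
		}
	})
}
```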
This is mostly a copy-paste of the Gosit README.md.
Semi constant-time benchmarks are run with the exact same bench cases, to eliminate favouring one library over another by coincidence. These are rotated out every iteration. For sqrt, the absolute value of the input is taken to avoid the fast path.
However, exp() and log2() are very dependent on the magnitude of the input. For this reason there are 3 versions of each benchmark, Tiny, Medium, and Big, which lie on semi-natural boundaries where the performance characteristics probably change.
All benches use the recommended ES for the bit size.
# Turbo boost has been disabled
cset shield --exec -- go test --run=X --bench=. -benchtime 30s # cset with 2 threads
BenchmarkAddSlow-10 1000000000 14.33 ns/op
BenchmarkAdd32ESConst-10 1000000000 12.33 ns/op
BenchmarkAdd16ESConst-10 1000000000 11.74 ns/op
BenchmarkAddSlowGoposit-10 11298813 3463 ns/op
BenchmarkMulSlow-10 1000000000 13.28 ns/op
BenchmarkMul32ESConst-10 1000000000 11.40 ns/op
BenchmarkMul16ESConst-10 1000000000 10.04 ns/op
BenchmarkMulSlowGoposit-10 10038568 3466 ns/op
BenchmarkDivSlow-10 1000000000 15.19 ns/op
BenchmarkDiv32ESConst-10 1000000000 13.58 ns/op
BenchmarkDiv16ESConst-10 1000000000 11.89 ns/op
BenchmarkDivSlowGoposit-10 9835690 3661 ns/op
BenchmarkSqrtSlow-10 1000000000 32.94 ns/op
BenchmarkSqrt32ESConst-10 928119480 38.82 ns/op
BenchmarkSqrt16ESConst-10 1000000000 16.54 ns/op
BenchmarkTruncSlow-10 1000000000 3.747 ns/op
BenchmarkTrunc32ESConst-10 1000000000 3.388 ns/op
BenchmarkTrunc16ESConst-10 1000000000 3.739 ns/op
BenchmarkRoundSlow-10 1000000000 6.574 ns/op
BenchmarkRound32ESConst-10 1000000000 5.746 ns/op
BenchmarkRound16ESConst-10 1000000000 5.079 ns/op
BenchmarkStringSlow-10 147758671 251.7 ns/op
BenchmarkExp32Tiny-10 1000000000 8.718 ns/op
BenchmarkExp32Medium-10 257134201 140.9 ns/op
BenchmarkExp32Big-10 1000000000 11.89 ns/op
BenchmarkExp32ConstESTiny-10 1000000000 8.321 ns/op
BenchmarkExp32ConstESMedium-10 262799006 137.3 ns/op
BenchmarkExp32ConstESBig-10 1000000000 11.85 ns/op
BenchmarkExp16ConstESTiny-10 1000000000 7.412 ns/op
BenchmarkExp16ConstESMedium-10 503256609 72.13 ns/op
BenchmarkExp16ConstESBig-10 1000000000 7.098 ns/op
BenchmarkLog232Tiny-10 1000000000 12.77 ns/op
BenchmarkLog232Medium-10 570807181 63.49 ns/op
BenchmarkLog232Big-10 589752162 60.92 ns/op
BenchmarkLog232ConstESTiny-10 1000000000 12.04 ns/op
BenchmarkLog232ConstESMedium-10 606771810 59.79 ns/op
BenchmarkLog232ConstESBig-10 630153366 57.70 ns/op
BenchmarkLog216ConstESTiny-10 1000000000 7.370 ns/op
BenchmarkLog216ConstESMedium-10 1000000000 34.29 ns/op
BenchmarkLog216ConstESBig-10 1000000000 20.42 ns/op
BenchmarkFromInt3232Slow-10 1000000000 10.03 ns/op
BenchmarkFromInt3232ConstES-10 1000000000 7.838 ns/op
BenchmarkFromInt3216ConstES-10 1000000000 5.431 ns/op
BenchmarkFromUint3232Slow-10 1000000000 8.697 ns/op
BenchmarkFromUint3232ConstES-10 1000000000 6.613 ns/op
BenchmarkFromUint3216ConstES-10 1000000000 5.427 ns/op
BenchmarkFromInt6432Slow-10 1000000000 10.10 ns/op
BenchmarkFromInt6432ConstES-10 1000000000 8.242 ns/op
BenchmarkFromInt6416ConstES-10 1000000000 5.619 ns/op
BenchmarkFromUint6432Slow-10 1000000000 8.985 ns/op
BenchmarkFromUint6432ConstES-10 1000000000 7.253 ns/op
BenchmarkFromUint6416ConstES-10 1000000000 5.582 ns/op
BenchmarkFromInt1616ConstES-10 1000000000 11.18 ns/op
BenchmarkFromUint1616ConstES-10 1000000000 7.202 ns/op
These are directly comparable to the Go benchmarks, as they use the same data.
As you can see, they are quite a bit slower than the Rust version.
In the softposit readme it is recommended to enable optimisations in their makefile;
however, this did not bring any significant improvement.
BenchmarkCAdd32ES2-10 1000000000 11.15 ns/op
BenchmarkCMul32ES2-10 1000000000 10.89 ns/op
BenchmarkCDiv32ES2-10 1000000000 17.83 ns/op
BenchmarkCSqrt32ES2-10 1000000000 10.57 ns/op
BenchmarkCTrunc32ES2-10 1000000000 7.333 ns/op
No direct comparisons exist, but here are some averages from cargo bench on my machine (also using cset and with turbo boost disabled):
32P2:

| Operation | ns/op |
| --- | --- |
| Add | 8.4 |
| - | 7.9 |
| * | 7.1 |
| / | 11.1 |
| sqrt | 8.2 |
| trunc | - |
| ⌊x⌉ | 3.8 |
16P1:

| Operation | ns/op |
| --- | --- |
| Add | 10.1 |
| - | 8.5 |
| * | 8.2 |
| / | 11.3 |
| sqrt | 9.0 |
| trunc | - |
| ⌊x⌉ | 5.2 |
I have been working hard on improvements to the exponential function. Hyperbolic functions such as sinh, cosh, and tanh are also coming. Trigonometric functions like sin, cos, and tan are bound to arrive at some point as well.
Another area of improvement is the code generation: generating more complicated code and handling more ES and size values will require it.
Quire support might also come at some point in the future.