=============================================================================
                      FPU Tutorial v 1.02 (06/05/2000)
                         (c) 2000 Eli-Jean Leyssens

       This tutorial can be downloaded from http://www.dse.nl/~topix
=============================================================================

This is an extremely short (or is it ;) tutorial on how to use the FPU,
Floating Point Unit. It shows you how to move values to and from the FPU
and how to use the data operations, like divide, square root etc.

First off, I assume you already know what floating point numbers are and
what single, double and extended precision means. If you don't know what
they are then you can probably still learn something from this tutorial and
you could probably incorporate some of the example code into your own
programs. However, I would certainly advise you to look around on the
Internet for some documents describing the general idea and workings of
floating point numbers. I've included some links at the end of this
tutorial which could help you on your way.

Secondly, note that to run the examples you'll either need RISC OS 4 or the
ExtBas module which extends the BASIC module to recognize and assemble FP
instructions. The ExtBas module is part of the archive:

    ftp://mic2.hensa.ac.uk/local/riscos/programming/extbasdis.zip


---------------------------------------------
   Floating Point Unit on RISC OS machines
---------------------------------------------

On some machines the FPU is present in hardware as a coprocessor, called a
FPA, Floating Point Accelerator, but most RISC OS machines only have the
FPE, Floating Point Emulator. Note that even with a FPA fitted some
instructions may still be emulated in software. There can be slight
variations in accuracy between FPA and FPE implementations, but generally
speaking programs do not need to know whether a FPA is fitted or not. The
main difference between FPA and FPE is speed of execution.
This means that you can write code that uses FP instructions without having
to worry whether they'll be executed by dedicated hardware or emulated in
software.

When no FPA is present an FP instruction will yield an "Undefined
instruction" exception. The Undefined instruction vector is called, which
is claimed by the FP Emulator, which then emulates the "undefined"
instruction. Execution is then continued after the emulated FP instruction,
without any registers being corrupted (except for the ones requested by the
FP instruction of course).

The FPU has 8 (eight) floating point registers, known as F0 to F7 and also
a status and a control register. In this tutorial we'll only look at the
"normal" floating point registers, from here on called FP registers, not at
the status or control registers.

The format in which numbers are stored in FP registers is not specified.
The different FP formats only become visible when transferring a number
from or to memory:

    Name                            Size       Exponent   Fraction
    Single Precision (S)             4 bytes    8 bits    23 bits
    Double Precision (D)             8 bytes   11 bits    52 bits
    Double Extended Precision (E)   12 bytes   15 bits    64 bits
    Packed Decimal (P)              12 bytes    4 digits  19 digits
    Expanded Packed Decimal (EP)    16 bytes    6 digits  24 digits

If you look closely at the table above you'll notice that Packed formats
store the numbers as digits rather than bits. This is done by storing 1
digit per nibble (4 bits).

In almost all our examples we'll store numbers in memory at Single
Precision (S) and we won't even look into the Packed format as it's rather
silly ;) although it can of course be useful, especially when communicating
with humans as they better understand digits than bits :)

All basic floating point instructions operate as though the result were
computed to infinite precision and then rounded to the length, and in the
way, specified by the instruction.
The rounding is selectable from:

    - Round to nearest
    - Round to +infinity (P)
    - Round to -infinity (M)
    - Round to zero (Z)

The default is "round to nearest"; in the event of a tie, this rounds to
"nearest even" as required by the IEEE.

>> NOTE! you should only use FP instructions in User Mode programs! <<


--------------------------
   Moving data TO the FPU
--------------------------

Before we can tell the FPU to, for instance, divide two numbers we'll of
course need a way to tell it what these two numbers are. There are many
ways to load a number into a FP register; I'll only show the three most
popular ones here.

The first method is to load a value into a normal ARM register first and
then load the value of that register into any of the 8 FP registers. The
instruction used for the latter operation is FLT. Here's an example of how
you can load the value 123 into FP register f0:

    mov     r0, #123        ; First setup an ARM register with the value
    flts    f0, r0          ; Now transfer the value from the ARM
                            ; register into the FP register.

The s in flts means that we want to use single precision. Also note that
instead of r0 we could have used any other general purpose ARM register and
instead of f0 we could have used any of the 8 FP registers. So,

    mov     r9, #123
    flts    f3, r9

would also have worked, although then of course f3 would have 123 loaded
into it and not f0.

It should be fairly obvious that by using FLT we can only load integers
into the FP registers. I mean, you can't load 123 and a half into r0 and
therefore you can't load 123.5 into f0 either. At least, not by using FLT.

Which brings us to the second most popular instruction for loading values
into FP registers, namely LDF. To use LDF though you'll first need to set
up a floating point value in memory. And I do mean floating point value.
So, just setting up an integer value using equd won't work. Luckily there's
an instruction for defining a floating point value in memory as well,
namely EQUF.
So, to load 123 and a half into f0 you could use:

    .floataddress   equfs   123.5

    .code           ldfs    f0, floataddress

Easy huh? Note that once again the s in both equfs and ldfs stands for
single precision. For ldfs this is particularly important as the precision
must match the precision you specified at the equf command.

The third method shown here for loading values into FP registers also uses
the FLT instruction, but instead of loading the value from an ARM register
the value is encoded in the FP instruction. There are only a small number
of values that can be loaded in this way though. They are:
0, 1, 2, 3, 4, 5, 10 and 0.5

    flts    f0, #3          ; Load 3 into f0
    flts    f1, #0.5        ; Load 0.5 into f1

These special values can be used in FP data operations as well, as you'll
find out later on.

Now, before we look at how we can perform operations on the FP registers,
let's first look at ways to move data from the FP registers back to the ARM
registers or memory.


----------------------------
   Moving data FROM the FPU
----------------------------

For copying data from the FPU we'll look only at the two most popular ways:
transferring a single FP register to a single ARM register or to memory.

To transfer the value from a FP register into a normal ARM register you can
use the FIX instruction. So to transfer the value of f3 into r9:

    fix     r9, f3

Note the absence of the precision identifier, fix doesn't take one. Also
note that ARM registers can only contain integers, so the number stored in
r9 is the rounded value of f3. You can find out how to specify the rounding
mode further down in this document.

To save the value from f3 into memory:

    stfs    f3, floataddress

Yes, once again you need to specify the precision. Note that Double and
Extended precision floating point numbers take up more bytes than single
precision ones. So, if you defined floataddress with equfs then you should
not use stfD as that will overwrite more bytes than you reserved with
equfs.
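
The size mismatch described above can be tried out away from the assembler.
What follows is only a Python sketch of the idea, not FPA code: the standard
struct module stands in for stfs and stfd, and the buffer layout and the &AA
guard bytes are made up for the illustration. It assumes the S and D memory
formats are the usual IEEE single and double, and it does not try to model
the byte or word order the FPU really uses in memory, only the sizes.

```python
import struct

# 4 bytes reserved for one single precision value (like equfs),
# followed by 4 guard bytes that belong to something else.
memory = bytearray(4) + bytearray(b"\xAA" * 4)

# Model of "stfs": a single precision store writes exactly 4 bytes,
# so the guard bytes are left alone.
struct.pack_into("<f", memory, 0, 123.5)
guard_after_single = bytes(memory[4:])

# Model of "stfd" to the same address: 8 bytes are written,
# so the 4 guard bytes get overwritten as well.
struct.pack_into("<d", memory, 0, 123.5)
guard_after_double = bytes(memory[4:])

print("single takes", struct.calcsize("<f"), "bytes")
print("double takes", struct.calcsize("<d"), "bytes")
print("guard intact after single store:", guard_after_single == b"\xAA" * 4)
print("guard intact after double store:", guard_after_double == b"\xAA" * 4)
```

In the real thing the cure is simply to reserve the space with the same
precision you store with, so equfd for stfd and so on.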

Right, now that we know how to move data to and from the FPU let's look at
some data operations.


---------------
   Square root
---------------

One of the simplest operations is the Square root operation as it only
operates on one value. The instruction for it is SQT and it takes two
parameters. The first parameter indicates the FP register to store the
result in, the other indicates the FP register to take the Square root of.

    sqts    f0, f1          ; f0 = sqt( f1)

It's as simple as that. So, the "entire" code to calculate the square root
of an ARM register, by using the FPU for the calculation would be:

                            ; r0 = number
    flts    f0, r0          ; f0 = r0
    sqts    f0, f0          ; f0 = square root of f0, single precision
    fix     r0, f0          ; r0 = f0 = sqt( r0)

The "sqroot" program included in this archive contains a working example.


----------------------
   Divide and conquer
----------------------

Another handy operation is the divide operation. The instruction for it is
DVF and it takes three parameters, all indicating FP registers. The
parameters are for Quotient, Number and Divisor.

    dvfs    f0, f1, f2      ; f0 = f1 / f2

So, the code to divide two ARM registers, by using the FPU for the
calculation would be:

                            ; r0 = number
                            ; r1 = divisor
    flts    f0, r0          ; f0 = r0
    flts    f1, r1          ; f1 = r1
    dvfs    f0, f0, f1      ; f0 = f0 / f1
    fix     r0, f0          ; r0 = f0 = f0 / f1 = r0 / r1

The "divide" program included in this archive contains a working example.


------------------
   Wave "Bye-Bye"
------------------

For our last example of data operation instructions we'll look at the sine
wave. As the FPU's sine (and cosine) calculations are extremely slow you
will almost certainly only want to use them to build a look up table. So,
that's just what I'm going to show you.

The first thing you need to know about FPU's sine, cosine, tangent etc
functions is that they work with radians, not degrees. So, a full sine
period is 2*PI (radians) and not 360 (degrees).

Right then, let's say we want to build a sine lookup table with 256 values
describing a whole period.
We'll set the amplitude at 127. In BASIC you would probably do it somewhat
like this:

    Steps%     = 256        : REM Number of steps to divide one period in
    Amplitude% = 127        : REM Amplitude of the sine wave
    DIM SineTable% Steps%*4 : REM 4 bytes per value as we're storing
                              REM words, not bytes
    FOR x% = 0 TO Steps%-1
      SineTable%!( x% * 4) = Amplitude% * SIN( x% * ( 2*PI / Steps%))
    NEXT

If you have a hard time understanding this BASIC version then I can only
advise you to dust off some old calculus books before you proceed to the
FPU version ;)

The assembly version using FPU isn't much different from the above. I'm not
going to type it in here though, just look at the "sine" example program.


-----------------------------
   Could you be more precise?
-----------------------------

Note that throughout these examples I've used single precision. This means
that only 23 bits will be used for the Fractional part of the floating
point number. However, due to the way floating point numbers work we
effectively get 24 significant bits. So, if you want to load/store numbers
bigger than &ffffff without losing information from the least significant
bits then you should use Double or Extended precision instead. Simply
append a d or e instead of an s after the floating point instruction. So,
instead of flts, you should use fltd or flte. Take a look at the "fltSfltD"
example for further clarification.


------------------------------------
   "No, it's rounder" (c) 2000 Nike
------------------------------------

As you have probably read in the part "Floating Point Unit on RISC OS
machines" there are several rounding modes. By default numbers are rounded
to nearest. Note that this rounding not only occurs when transferring
values from FP registers to ARM registers, but also when storing FP
registers in memory, and more importantly also internally in the FPU.

Assume we're loading the value of f3 into r9. Let's see what the results of
the different rounding modes are for 8 different values of f3.

    Rounding           -4.5  -3.6  -3.5  -3.4   3.4   3.6   3.5   4.5
    (Nearest)           -4    -4    -4    -3     3     4     4     4
    P(lus infinity)     -4    -3    -3    -3     4     4     4     5
    M(inus infinity)    -5    -4    -4    -4     3     3     3     4
    Z(ero)              -4    -3    -3    -3     3     3     3     4

So, Nearest is also nearest to what you're used to in every day life,
except that on a tie, that is x.5, it is rounded to the "nearest even". So,
that's why 4.5 is not rounded to 5 (odd), but to 4 (even).

Plus infinity means it's always rounded up to the "higher" value. So, -3.6
is rounded up to -3 as -3 is higher than -4.

Minus infinity means it's always rounded down to the "lower" value. So, 3.6
is rounded down to 3 as that's lower than 4.

Zero is simply discarding the part after the point :)


-----------------------
   FP Instruction List
-----------------------

This list is in no way complete! It doesn't include instructions for
handling the status or control registers, nor does it include instructions
for loading/storing multiple FP registers.

-- Register transfer --

Instruction syntax:

    FLT{cond}prec{round}    Fn, Rd
    FLT{cond}prec{round}    Fn, #Value
    FIX{cond}{round}        Rd, Fm

Don't get fooled by the d in FLT... Fn, Rd
The destination register is always the first one, just like with any other
ARM instruction. So, FLT Fn, Rd stores the ARM register Rd in FP register
Fn.

    {cond}  is the standard ARM instruction condition (eq, ne, gt etc)
    prec    is the precision ( S, D, E etc)
    {round} is the rounding mode ( P, M, Z)

{cond} and {round} are of course optional and default to respectively
Always and Nearest

Value can be any of 0, 1, 2, 3, 4, 5, 10, 0.5

Instructions:

    FLT   Integer to Floating Point     Fn := Rd
    FIX   Floating Point to Integer     Rd := Fm

-- Data operations --

Instruction syntax:

    unop{cond}prec{round}   Fd, Fm
    unop{cond}prec{round}   Fd, #Value
    binop{cond}prec{round}  Fd, Fn, Fm
    binop{cond}prec{round}  Fd, Fn, #Value

unop, or unary operations, calculate with just one parameter
binop, or binary operations, calculate with two parameters

Value can be any of 0, 1, 2, 3, 4, 5, 10, 0.5

Instructions:

       ADF   Add                       Fd := Fn + Fm
       MUF   Multiply                  Fd := Fn * Fm
       SUF   Subtract                  Fd := Fn - Fm
       RSF   Reverse Subtract          Fd := Fm - Fn
       DVF   Divide                    Fd := Fn / Fm
       RDF   Reverse Divide            Fd := Fm / Fn
       POW   Power                     Fd := Fn to the power of Fm
       RPW   Reverse Power             Fd := Fm to the power of Fn
       RMF   Remainder                 Fd := remainder of Fn / Fm
                                             Fn - Fm * integer value of
                                             ( Fn/Fm)
    *  FML   Fast Multiply             Fd := Fn * Fm
    *  FDV   Fast Divide               Fd := Fn / Fm
    *  FRD   Fast Reverse Divide       Fd := Fm / Fn

       MVF   Move                      Fd := Fm
       MNF   Move Negated              Fd := -Fm
       ABS   Absolute value            Fd := ABS( Fm)
       RND   Round to integral value   Fd := integer value of Fm
       SQT   Square root               Fd := square root of Fm
       LOG   Logarithm to base 10      Fd := log Fm
       LGN   Logarithm to base e       Fd := ln Fm
       EXP   Exponent                  Fd := e to the power of Fm
       SIN   Sine                      Fd := sine of Fm
       COS   Cosine                    Fd := cosine of Fm
       TAN   Tangent                   Fd := tangent of Fm
    ** ASN   Arc Sine                  Fd := arcsine of Fm
       ACS   Arc Cosine                Fd := arccosine of Fm
       ATN   Arc Tangent               Fd := arctangent of Fm

 * FML, FDV and FRD are only defined to work with single precision operands
   and are not necessarily faster than MUF, DVF and RDF.
** Use ASN Fd, #1 to easily load Pi/2 into Fd.
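
To get a feel for how the "Fd := ..." column reads, here is a small Python
sketch of a handful of the operations. It is only a model and rests on a few
assumptions: the helper names dvf, rdf, rsf and asn are made up for this
illustration, Python floats are double precision so this says nothing about
S precision results, and rounding modes are ignored altogether. The comments
show the FP instruction each call stands for. Note how the constant only
ever appears as the last operand, as in the syntax above.

```python
import math

# Hypothetical helpers mirroring the "Fd := ..." column of the list.
def dvf(fn, fm):
    return fn / fm          # DVF  Fd := Fn / Fm

def rdf(fn, fm):
    return fm / fn          # RDF  Fd := Fm / Fn

def rsf(fn, fm):
    return fm - fn          # RSF  Fd := Fm - Fn

def asn(fm):
    return math.asin(fm)    # ASN  Fd := arcsine of Fm

f1 = 8.0

half       = dvf(f1, 2)     # dvfd f0, f1, #2    f0 = f1 / 2
reciprocal = rdf(f1, 1)     # rdfd f0, f1, #1    f0 = 1 / f1
ten_minus  = rsf(f1, 10)    # rsfd f0, f1, #10   f0 = 10 - f1
half_pi    = asn(1)         # asnd f0, #1        f0 = Pi/2

print(half, reciprocal, ten_minus, half_pi)
```

So with f1 holding 8, the plain divide can halve it, but it takes the
reverse forms to get at 1 / f1 and 10 - f1.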

Note that for all these unops and binops you can replace Fm by one of the
constants 0, 1, 2, 3, 4, 5, 10 and 0.5
This is also why there are Reverse versions of some of the instructions.

The rounding according to the rounding mode specified in the instruction is
only applied in the final stage. The rounding done during the actual
calculations to compute the value is always done with the Nearest rounding
mode. This is especially noticeable for RMF:

    Fn := 18
    Fm := 5
    Fd := Fn - Fm * integer value, rounded to Nearest, of ( Fn / Fm)
       := 18 - 5  * integer value, rounded to Nearest, of ( 18 / 5)
       := 18 - 5  * integer value, rounded to Nearest, of 3.6
       := 18 - 5  * 4     <- !!!
       := 18 - 20
       := -2              !!!

You could correct for this by adding Fm to the remainder when the remainder
is less than zero.


--------------
   Link me up
--------------

Here are some links to documents you might find useful in respect to using
and coding for the FPU.

As mentioned at the start of this tutorial, you'll need something like the
ExtBas module to assemble FP instructions if you don't have RISC OS 4. This
module is part of the archive:

    ftp://mic2.hensa.ac.uk/local/riscos/programming/extbasdis.zip

There is a whole chapter on the Floating Point Emulator in the RISC OS 3
PRMs (Programmer's Reference Manuals). It should probably have been called
Floating Point Unit instead and it's quite a good read:

    Programmer's Reference Manual, Volume 4, Pages 4-163 to 4-184

Even more technical documentation can be found on the ARM Ltd site. The
documentation for the ARM7500FE contains three chapters on the FPA. The
documentation for the ARM7500FE has been split up into several files.
Either view the table of contents, or download only the file containing the
FPA documentation. Note that the documentation is in PDF format. There are
PDF readers out in the Public Domain though.

    http://www.arm.com/Documentation/UserMans/PDF/ARM7500FEvB.html
    http://www.arm.com/Documentation/UserMans/PDF/ARM7500FEvB_5.pdf

Last but not least, you can learn quite a bit from looking at other
people's code. Many entries in the CodeCraft competition(s) use FP
instructions and as one of the rules of the competition(s) is that full
sources must be included they might prove to be valuable examples. If
you're lost in the high number of entries then I can only say that at least
my entry called HappyRGB, which can be found in the 1K Entries section of
the CodeCraft#2 competition, has a lot of FP code.

    http://surf.to/codecraft
    http://www.cybercable.tm.fr/~brooby/code.htm
    http://www.dse.nl/~topix -> Click the CodeCraft menu entry


-----------
   Credits
-----------

Many thanks to Tony Haines for proof reading this tutorial and making some
excellent suggestions on how to improve it.

Much information was gathered from the Floating Point Emulator chapter of
Acorn's Programmer's Reference manual and ARM Ltd's ARM7500FE
documentation. You can find links to both in the "Link me up" chapter
above.


-------------
   Copyright
-------------

This tutorial and the accompanying example programs have all been written
by Eli-Jean Leyssens, aka Pervect of Topix.

Eli-Jean Leyssens holds the copyright to this tutorial. The accompanying
example programs are to be considered an integral part, and as such this
text may only be copied /together/ with the example programs. Equally, if
you wish to copy the example programs then you must also include this text.

You are freely permitted to use the example routines in your own programs.
An acknowledgement of any help obtained would be appreciated.

This tutorial, in whole or in part, may not be published in any magazine,
digital or hardcopy, or on any website without the written permission of
the copyright holder.

Download text version + example sources: FPE102.ZIP (10k)
Distributed via www.icebird.org with permission by Topix.

©2000 Icebird Acorn Produxions