Hacker's Delight, by Henry S. Warren, Jr., describes a number of algorithms useful for resource-sensitive computing. Among them is a method for using a multiply instruction to implement division by a constant. There are a number of reasons why this can be useful in embedded devices, some of which do not have a divide instruction (e.g. Arm Cortex-M0). Even when a divide instruction is available, it might be undesirable, either because it is relatively slow or because the execution time is variable in a context where stable execution time is required. Even with high-performance non-embedded processors, the divide instruction might not be wanted, as explained by Matt Godbolt's lecture, "What has my compiler done for me lately? Unbolting the compiler's lid".
In fact, this algorithm is so effective that many compilers use it, silently transforming "x / 9u" into "( x * 0x38E38E39uLL ) >> 33", which can surprise the unsuspecting user while debugging. They see a division at the C source level but a multiplication in the disassembly view.
However, some of our favourite embedded compilers do not do this and so it can be useful to explicitly encode, at the C source level, the implementation using multiplication. This web page allows you to specify either signed or unsigned division by a given constant and applies the algorithm to derive the appropriate C code. The C code is intended for use with a multiply machine instruction that takes two 32-bit operands and provides convenient access to the upper 32 bits of the 64-bit result.
$q = {n \over d}$
Optimized embedded code:
Even the Arm Cortex-M0 can use its multiply instruction (with only a 32-bit result) four times to generate the 64-bit result, which will have stable execution time and the final code be much faster than a library call to perform the division.
Instruction set | Signed multiply instruction | Unsigned multiply instruction |
---|---|---|
TriCore | MUL | MUL.U |
Armv7-M/R | SMULL | UMULL |
MPC5xxx | MULHW | MULHWU |
RH850 | MUL | MULU |