Analog and mixed-signal circuit simulation often uses the LU decomposition method to solve a system of linear algebraic equations Ax = b, where A is a square matrix. The LU method requires factoring A into two triangular matrices, one lower (L) and one upper (U). Factorization takes O(n^3) time and dominates the execution time of the LU decomposition method. A number of approaches have been developed to reduce the execution time of LU factorization. One approach is to unroll the factorization algorithm and, treating each resulting assignment statement as a machine operation, interpret the instruction stream. If the interpreter is implemented in a special-purpose hardware engine, efficiencies may be gained from a floating point unit designed specifically for that interpreter. This paper documents research exploring design alternatives for a special-purpose double-precision floating point unit for a hardware interpreter that performs LU factorization using unrolled code. The alternatives explored were primarily the integer adder and multiplier units of the floating point unit, with the speed and area determined for each integer unit considered. A floating point divider algorithm was also studied. The results show that the pipelined floating point unit gives the best performance with the block carry look-ahead integer adder, the Booth-2 integer multiplier, and the SRT integer divider. One interesting aspect of this research was the heavy use of rapid prototyping: all models were implemented in Verilog to evaluate system-level performance.
Keywords: IEEE-754, Floating point Unit, 64-bit FPU.
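To make the unrolling idea in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation (the paper's models are in Verilog). It assumes a dense matrix, Doolittle-style factorization without pivoting, and a made-up two-opcode instruction format; the names `lu_in_place`, `unroll_lu`, `interpret`, `div`, and `mulsub` are hypothetical and chosen only for illustration.

```python
def lu_in_place(a):
    """Reference loop form: Doolittle LU without pivoting.

    L (unit diagonal, stored below the diagonal) and U overwrite `a`.
    Assumes every pivot a[k][k] is nonzero.
    """
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] = a[i][j] - a[i][k] * a[k][j]
    return a


def unroll_lu(n):
    """Unroll the loops above into a flat list of operations.

    Each entry is (opcode, dst, src1, src2), where dst/src are (row, col)
    positions in the matrix. This format is an assumption of the sketch.
    """
    ops = []
    for k in range(n):
        for i in range(k + 1, n):
            ops.append(("div", (i, k), (i, k), (k, k)))
            for j in range(k + 1, n):
                ops.append(("mulsub", (i, j), (i, k), (k, j)))
    return ops


def interpret(ops, a):
    """Execute the unrolled operation stream on matrix `a` in place."""
    for opcode, dst, s1, s2 in ops:
        x = a[s1[0]][s1[1]]
        y = a[s2[0]][s2[1]]
        if opcode == "div":
            a[dst[0]][dst[1]] = x / y
        else:  # "mulsub": dst = dst - src1 * src2
            a[dst[0]][dst[1]] = a[dst[0]][dst[1]] - x * y
    return a


if __name__ == "__main__":
    m = [[4.0, 3.0], [6.0, 3.0]]
    print(interpret(unroll_lu(2), m))
```

In this sketch the stream contains only divide and multiply-subtract operations, which suggests why the adder, multiplier, and divider units of the floating point unit are the components whose speed and area matter most for such an interpreter.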