Publications

2017
A. Wahba, A., and H. A. H. Fahmy, "Area Efficient and Fast Combined Binary/Decimal Floating Point Fused Multiply Add Unit", IEEE Transactions on Computers, vol. 66, no. 2, pp. 226–239, 2017.

In this work we present a new 64-bit floating point Fused Multiply Add (FMA) unit that can perform both binary and decimal addition, multiplication, and fused-multiply-add operations. The presented FMA has 6% less delay than the fastest stand-alone decimal unit and 23% less area than both binary and decimal units together. These results were achieved by: 1) column-by-column reduction of the partial products in the multiplier tree, 2) a new leading zeros detector that produces its output in base-3 to simplify the normalization shifting in the binary datapath, 3) a redundant adder to perform the final addition, 4) a new rounding-while-redundant technique that hides the rounding delay and removes it from the critical path, and 5) a new simple conversion technique from redundant to binary/decimal.

Sayed, W. S., A. G. Radwan, A. A. Rezk, and H. A. H. Fahmy, "Finite Precision Logistic Map Between Computational Efficiency and Accuracy with Encryption Applications", Complexity, 2017.

Chaotic systems appear in many applications such as pseudo-random number generation, text encryption and secure image transfer. Numerical solutions of these systems using digital software or hardware inevitably deviate from the expected analytical solutions. Chaotic orbits produced using finite precision systems do not exhibit the infinite period expected under the assumptions of infinite simulation time and precision. In this paper, digital implementation of the generalized logistic map with signed parameter is considered. We present a fixed-point hardware realization of a Pseudo-Random Number Generator using the logistic map that experiences a tradeoff between computational efficiency and accuracy. Several introduced factors such as the used precision, the order of execution of the operations, parameter and initial point values affect the properties of the finite precision map. For positive and negative parameter cases, the studied properties include bifurcation points, output range, maximum Lyapunov Exponent, and period length. The performance of the finite precision logistic map is compared in the two cases. A basic stream cipher system is realized to evaluate the system performance for encryption applications for different bus sizes regarding the encryption key size, hardware requirements, maximum clock frequency, NIST and correlation, histogram, entropy and Mean Absolute Error analyses of encrypted images.
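
The finite-period effect described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design: it assumes the conventional logistic map x -> 4x(1 - x) rather than the generalized signed-parameter map, an unsigned fixed-point format with `frac_bits` fractional bits, and truncation after the multiplication. The function name `orbit_period` and all parameter choices are hypothetical.

```python
def orbit_period(x0, frac_bits, lam=4):
    """Iterate x -> lam*x*(1-x) in fixed point and report (tail, period).

    x0 is an integer in [0, 2**frac_bits] that represents x0 / 2**frac_bits.
    Assumption: lam is the integer 4 and products are truncated, so the
    state always stays inside [0, 2**frac_bits].
    """
    one = 1 << frac_bits          # fixed-point representation of 1.0
    seen = {}                     # state -> iteration index of first visit
    x, n = x0, 0
    while x not in seen:
        seen[x] = n
        # Multiply first, then truncate back to frac_bits fractional bits.
        x = (lam * x * (one - x)) >> frac_bits
        n += 1
    tail = seen[x]                # iterations before the cycle is entered
    return tail, n - tail         # cycle length is necessarily finite


if __name__ == "__main__":
    for bits in (8, 12, 16):
        start = (1 << bits) // 3  # arbitrary starting point near 1/3
        tail, period = orbit_period(start, bits)
        print(f"{bits} fractional bits: tail={tail}, period={period}")
```

Because only 2**frac_bits + 1 distinct states exist, every orbit must repeat, which is the gap between finite precision orbits and the ideal infinite-period behaviour that the abstract refers to. Changing the order of the multiplications and the truncation point would give a different map, in line with the abstract's remark about the order of execution.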

Sayed, W. S., H. A. H. Fahmy, A. A. Rezk, and A. G. Radwan, "Generalized Smooth Transition Map Between Tent and Logistic Maps", International Journal of Bifurcation and Chaos, vol. 27, no. 01, pp. 1730004, 2017.

There is a continuous demand for novel chaotic generators to be employed in various modeling and pseudo-random number generation applications. This paper proposes a new chaotic map which is a general form for one-dimensional discrete-time maps employing the power function, with the tent and logistic maps as special cases. The proposed map uses extra parameters to provide responses that fit multiple applications for which conventional maps were not enough. The proposed generalization also covers maps whose iterative relations are not based on polynomials, i.e., with fractional powers. We introduce a framework for analyzing the proposed map mathematically and predicting its behavior for various combinations of its parameters. In addition, we present and explain the transition map, which results in intermediate responses as the parameters vary from their values corresponding to the tent map to those corresponding to the logistic map. We study the properties of the proposed map including the graph of the map equation, the general bifurcation diagram and its key-points, output sequences, and the maximum Lyapunov exponent. We present further explorations such as effects of scaling, system response with respect to the new parameters, and operating ranges other than the transition region. Finally, a stream cipher system based on the generalized transition map validates its utility for image encryption applications. The system allows the construction of more efficient encryption keys, which enhances its sensitivity and other cryptographic properties.

2016
Sayed, W. S., and H. A. H. Fahmy, "What are the Correct Results for the Special Values of the Operands of the Power Operation?", ACM Transactions on Mathematical Software, vol. 42, no. 2, New York, NY, USA, ACM, pp. 14:1–14:17, May 2016.

Language standards such as C99, C11, as well as the IEEE Standard for Floating-Point Arithmetic 754 (IEEE Std 754-2008) specify the expected behavior of binary and decimal floating-point arithmetic in computer programming environments and the handling of special values and exception conditions. Many researchers focus on verifying the compliance of implementations for binary and decimal floating-point operations with these standards. In this article, we are concerned with the special values of the operands of the power function $Z = X^Y$. We study how the standards define the correct results for this operation, propose a mathematically justified definition for the correct results of the power function on the occurrence of these special values as its operands, test how different software implementations for the power function deal with these special values, and classify the behavior of different programming languages from the viewpoint of how much they conform to the standards and our proposed mathematical definition. We present inconsistencies between the implementations and the standards and we discuss incompatibilities between different versions of the same software.
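
As a small illustration of the kind of test this abstract describes, the following Python sketch probes one implementation, the standard library's `math.pow`, with a few special operand pairs. It is not the authors' test suite; the helper name `safe_pow` and the chosen operand pairs are assumptions made for this example.

```python
import math

nan, inf = float("nan"), float("inf")

# (x, y) operand pairs that involve zeros, infinities, or NaN.
cases = [(0.0, 0.0), (nan, 0.0), (1.0, nan), (inf, 0.0), (-1.0, inf), (0.0, -1.0)]


def safe_pow(x, y):
    """Return math.pow(x, y), or the exception name if Python raises."""
    try:
        return math.pow(x, y)
    except (ValueError, OverflowError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    for x, y in cases:
        print(f"pow({x}, {y}) -> {safe_pow(x, y)}")
```

Python documents that `math.pow(1.0, x)` and `math.pow(x, 0.0)` return 1.0 even when `x` is a zero or a NaN, while a zero base with a negative finite exponent raises `ValueError` instead of returning an infinity. Running the same operand pairs through other languages or library versions is one way to observe the inconsistencies the abstract mentions.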

Gamal, N., H. A. H. Fahmy, Y. Ismail, and H. Mostafa, "Design Guidelines for Embedded NoCs on FPGAs", The 17th IEEE International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 2016.

Including Networks-on-Chip (NoCs) within FPGAs has become necessary to overcome the problems of the point-to-point interconnect scheme. This will enable interfacing with high speed IOs and partial dynamic reconfiguration (PDR), reduce compile time, and improve system performance. We compared FPGA-specific NoC components in soft and hard implementations and analyzed the efficiency gap between the two technologies to derive design constraints in this space. The input module, which includes memory buffers implemented using block RAMs (BRAMs), shows a gap of 1.8x in area, 2.9x in delay and 5.3x in power. The switch has the largest gap: 90x area, 7x delay and 53x power. If the router is totally hard implemented, this will save 9x area, 3.7x delay and 12x power. By comparing our results with the same flow on an ASIC-specific router, we show that using an FPGA-specific NoC design improves utility by 3x in area with a slight increase in delay.

Radwan, A. G., W. S. Sayed, and H. A. H. Fahmy, "Double-Sided Bifurcations in Tent maps: Analysis and Applications", The 3rd International Conference on Advances in Computational Tools for Engineering Applications (ACTEA), Lebanon, 2016.

M. Hassan, A., H. A. H. Fahmy, and N. H. Rafat, "Enhanced Model of Conductive Filament-Based Memristor via including Trapezoidal Electron Tunneling Barrier Effect", IEEE Transactions on Nanotechnology (TNANO), vol. 15, no. 3, pp. 484–491, 2016.

Memristors exhibit very promising features such as nonvolatility and small area. Several types of memristors have been developed in the last decade using different materials, along with physical models explaining their behaviors. In this paper, we modify a previously published model to account for a trapezoidal electron tunneling barrier rather than a zero field or constant potential barrier. The model is verified against experimental data, showing better agreement. We then study the effect of different memristor parameters on the I-V characteristics and how to shape the characteristics to fit the applications. Finally, we provide a SPICE model which takes into account the tunneling capacitance and clarify that any fabricated memristor has, inherently, a memcapacitor in parallel. The dominant element may be the memristor or the memcapacitor depending on the frequency of operation.

El-Din, M. M., H. A. H. Fahmy, Y. Ismail, N. Gamal, and H. Mostafa, "Leakage Power Evaluation of FinFET-Based FPGA Cluster Under Threshold Voltage Variation", The 11th International Design and Test Symposium, Hammamet, Tunisia, 2016.

Elashkar, N., M. Aboudina, H. A. H. Fahmy, G. H. Ibrahim, and A. H. Khalil, "Memristor based BPSK and QPSK Demodulators with Nonlinear Dopant Drift Model", Microelectronics Journal, vol. 56, pp. 17–24, 2016.

In this paper, the dependence of the instantaneous memristance value and its I–V characteristics on the phase of a periodic signal is studied. An expression for the instantaneous memristance as a function of the periodic input phase is derived. This derivation is based on the memristor linear dopant drift model and is provided for sinusoidal input waveforms. To prove the tendency, simulations using linear and nonlinear dopant drift memristor models are performed in the Cadence simulation environment. Based on those, a set of digital communication demodulators is proposed and investigated, exploiting the change of the average memristance with the initial phase of the applied signal. The experimental-based 'nonlinear' dopant drift model is used in designing the proposed demodulators for Binary Phase Shift Keying (BPSK) and Quadrature Phase Shift Keying (QPSK) modulation schemes. Since all proposed demodulators are asynchronous, the proposed circuits do not need any carrier recovery circuits. Moreover, transient simulations have been executed, showing proper matching to the expected performance.

Zidan, M. A., H. Omran, R. Naous, A. Sultan, H. A. H. Fahmy, W. D. Lu, and K. N. Salama, "Single-Readout High-Density Memristor Crossbar", Scientific Reports, vol. 6, 2016.

High-density memristor-crossbar architecture is a very promising technology for future computing systems. The simplicity of the gateless-crossbar structure is both its principal advantage and the source of undesired sneak-paths of current. This parasitic current could consume an enormous amount of energy and ruin the readout process. We introduce new adaptive-threshold readout techniques that utilize the locality and hierarchy properties of the computer-memory system to address the sneak-paths problem. The proposed methods require a single memory access per pixel for an array readout. Besides, the memristive crossbar consumes an order of magnitude less power than state-of-the-art readout techniques.

2015
Nouh, K., and H. A. H. Fahmy, "Binary Floating Point Verification Using Random Test Vector Generation Based on SV Constraints", The IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Cairo, Egypt, pp. 433–436, Dec. 2015.

Verification of Binary Floating Point (FP) Arithmetic requires robust techniques to prove compliance with the IEEE Standard for Floating-Point Arithmetic (IEEE Std 754-2008). This paper provides a new verification methodology that uses a constraint-based random technique to generate test vectors for validating FP arithmetic instructions. The new proposal is generic and can be used to verify any software or hardware binary FP design. The constraints used in verification are written in the SystemVerilog (SV) language and can be solved with any SV constraint solver tool. The paper provides a case study to prove the feasibility and usefulness of the proposed approach in finding bugs for Addition-Subtraction and Multiplication operations.

Sayed, W. S., A. G. Radwan, and H. A. H. Fahmy, "Design of a Generalized Bidirectional Tent Map Suitable for Encryption Applications", The 11th International Computer Engineering Conference (ICENCO2015), Cairo, Egypt, Dec. 2015.

Sayed, W. S., A. - L. E. Hussien, H. A. H. Fahmy, and A. G. Radwan, "Generalized chaotic maps and elementary functions between analysis and implementation", The IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Cairo, Egypt, pp. 433–436, Dec. 2015.

Nonlinear analysis and chaos have many applications in communications, cryptography, and many other fields. In this work, we aim to bridge the gap between mathematical analysis of generalized 1D discrete chaotic maps and their implementation on digital platforms. We propose several variations and generalizations on the logistic and tent maps and employ the power function $z = x^y$ in a general map that could yield each of them and other new maps. The finite precision logistic map is studied, explaining the impact of finitude on its properties. In addition, floating-point implementations of the power function are tested on the occurrence of special values of the operands.

Hassan, A., R. Ahmed, H. Mostafa, H. A. H. Fahmy, and A. Hussien, "Performance evaluation of dynamic partial reconfiguration techniques for software defined radio implementation on FPGA", The IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Cairo, Egypt, pp. 183–186, Dec. 2015.

Reconfigurability is the most powerful feature of SRAM-based Field Programmable Gate Arrays (FPGAs) over ASIC designs. Dynamic Partial Reconfiguration (DPR) emphasizes this feature by adding more flexibility during the runtime phase. The Xilinx Virtex family of FPGAs provides four techniques to perform DPR: SelectMAP, Serial mode, JTAG, and ICAP. In this paper, each of these techniques is reviewed, evaluated, and tested using a convolutional encoder, an essential block of a Software Defined Radio (SDR) system, which has become the most promising application for DPR. Experiments are carried out using the Xilinx Virtex 5 kit "XUPV5-LX110T" to measure the trade-offs between performance and area overhead when adding the reconfiguration controller on or off the FPGA fabric. It is shown that the performance of each interface is independent of the design resources and depends only on the partial reconfiguration region chosen at the design place and route phase.

El-Motaz, M. A., A. M. El-Shafiey, M. E. Farag, O. A. Nasr, and H. A. H. Fahmy, "Speeding-up fast fourier transform", The IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Cairo, Egypt, pp. 510–511, Dec. 2015.

This work proposes a restructure of FFT algorithm to be more hardware friendly. The proposed algorithm is modeled as a combinatorial optimization problem. This paper presents two sub-optimal schemes of the proposed FFT restructure: one-stage and two-stage optimization. The proposed FFT algorithm is applied on 1024-point Radix-2 Single-Path Delay Feedback (R2SDF) architecture. The one-stage and two-stage optimization schemes achieve reduction in the multipliers area by 40.8% and 62.5%, respectively, compared with the conventional algorithm.

El-Shafiey, A. M., M. E. Farag, M. A. El-Motaz, O. A. Nasr, and H. A. H. Fahmy, "Two-Stage Optimization of CORDIC-Friendly FFT", The IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Cairo, Egypt, pp. 408–411, Dec. 2015.

In this paper, the authors extend the work on the CORDIC-Friendly Fast Fourier Transform (FFT) architecture in [1]. Instead of optimizing each stage independently, a joint optimization of two stages of the CORDIC-Friendly FFT rotations is considered. At no additional hardware cost, the proposed scheme achieves up to 38 dB SQNR gain using two-iteration MSR-CORDIC when compared to the previous algorithm for different FFT lengths.

Elhelw, A., A. A. El-Moursy, and H. A. H. Fahmy, "Adaptive Time-Based Least Memory Intensive scheduling", The 9th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-15), Turin, Italy, 2015.

DRAM memory is a major resource shared in multi-core system, hence memory requests from different applications interfere with each other. Therefore, different applications running together on the same chip can experience extremely different memory system performance: one application can experience a severe slowdown or starvation while another is unfairly prioritized by the memory scheduler. Existing memory access scheduling techniques try to optimize the overall multi-core system performance and fairness. This paper proposes an effective memory access scheduler, called Adaptive Time-Based Least Memory Intensive scheduling (Adaptive TB-LMI). The goal of the proposed scheduler is to increase the overall system performance and fairness. Adaptive TB-LMI showed an average increase in performance and fairness by 2.5% and 10.2% respectively compared to Time-Based Least Memory Intensive scheduling (TB-LMI) (previous work providing the best system throughput and fairness). Adaptive TB-LMI showed a maximum increase in performance and fairness by 9.65% and 22.16% respectively compared to TB-LMI. Adaptive TB-LMI decreases the hardware area required by 30.8% compared to TB-LMI.

Zidan, M. A., A. S. Salem, H. Omran, H. A. H. Fahmy, and K. N. Salama, "Compensated Readout for High Density MOS-Gated Memristor Crossbar Array", IEEE Transactions on Nanotechnology (TNANO), vol. 14, no. 1, pp. 3–6, 2015.

Leakage current is one of the main challenges facing high-density MOS-gated memristor arrays. In this study, we show that leakage current ruins the memory readout process for high-density arrays, and analyze the tradeoff between the array density and its power consumption. We propose a novel readout technique and its underlying circuitry, which is able to compensate for the transistor leakage-current effect in the high-density gated memristor array.

Sayed, W. S., A. G. Radwan, and H. A. H. Fahmy, "Design of Positive, Negative, and Alternating Sign Generalized Logistic Maps", Discrete Dynamics in Nature and Society, vol. 2015, 2015.

The discrete logistic map is one of the most famous discrete chaotic maps that has widely-spread applications. This paper investigates a set of four generalized logistic maps where the conventional map is a special case. The proposed maps have extra degrees of freedom which provide different chaotic characteristics and increase the design flexibility required for many applications such as quantitative financial modeling. Based on the maximum chaotic range of the output, the proposed maps can be classified as: positive logistic map, mostly positive logistic map, negative logistic map, and mostly negative logistic map. Mathematical analysis for each generalized map includes: bifurcation diagrams relative to all parameters, effective range of parameters, first bifurcation point, as well as the maximum Lyapunov exponent (MLE). Independent, vertical, and horizontal scales of the bifurcation diagram are discussed for each generalized map as well as a new bifurcation diagram related to one of the added parameters. A systematic procedure to design two-constraints logistic map is discussed and validated through four different examples.

El-Din, M. M., H. Mostafa, H. A. H. Fahmy, Y. Ismail, and H. Abdelhamid, "Performance evaluation of FinFET-based FPGA cluster under threshold voltage variation", The 13th IEEE International New Circuits and Systems Conference (NEWCAS), Grenoble, France, pp. 1–4, 2015.

The performance of FinFET-based FPGA cluster is evaluated with technology scaling for channel length from 20nm down to 7nm showing the scaling trends of basic performance metrics. The impact of threshold voltage variation, considering die-to-die variations, on the delay, power, and power-delay product is reported after the simulation of a 2-bit adder benchmark. Simulation results show an increasing trend of the average power and power-delay product variations with threshold voltage as we go down with technology node. On the contrary, the delay is showing the least percentage of variations with threshold voltage at the most advanced node of 7nm.

Mohamed, A. S., A. A. El-Moursy, and H. A. H. Fahmy, "Real-Time Memory Controller for Embedded Multi-core System", The 17th IEEE International Conference on High Performance Computing and Communications, New York, USA, 2015.

Modern chip multi-cores (CMPs) are in growing demand because of their high performance, especially in real-time embedded systems. On the other side, bounded latencies have become vital to guarantee high performance and fairness for applications running on CMP cores. We propose a new memory controller that prioritizes and assigns defined quotas for cores within a unified epoch (MCES). Our approach works on a variety of generations of double data rate DRAM (DDR DRAM). MCES achieves an overall performance gain reaching 35% for a 4-core system.

Sayed, W. S., A. G. Radwan, H. A. H. Fahmy, and A. - L. E. Hussein, "Scaling Parameters and Chaos in Generalized 1D Discrete Time Maps", The 2015 International Symposium on Nonlinear Theory and its Applications (NOLTA2015), Hong Kong, China, 2015.

Among all chaotic generators, 1D discrete maps are characterized by their simplicity and suitability for digital implementation, in addition to their widespread applications. Generalizations of 1D discrete maps enhance their unpredictability and increase their reliability in secure communication and encryption. In this paper, three parameterized maps are discussed: the scaled positive logistic map (SPLM), the scaled mostly positive logistic map (SMPLM), and the scaled tent map (STM). The impacts of the introduced scaling parameters on the properties of each map are discussed, including the bifurcation diagram versus the main system parameter, the main keypoints, the maximum chaotic range, and the calculation of the maximum Lyapunov exponent (MLE) versus all system parameters.

2014
Elhelw, A., A. A. El-Moursy, and H. A. H. Fahmy, "Time-Based Least Memory Intensive scheduling", The 8th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-14), Aizu-Wakamatsu, Japan, Sep. 2014.

Mahmoud, M. M., N. Soin, and H. A. H. Fahmy, "Design Framework to Overcome Aging Degradation of the 16 nm VLSI Technology Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 5, pp. 691–703, May 2014.

Intensive scaling for VLSI circuits is a key factor for gaining outstanding performance. However, this scaling has huge negative impact on circuit reliability, as it increases the undesired effect of aging degradation on ultra-deep submicron technologies. Nowadays, Bias Temperature Instability (BTI) aging process has a major negative impact on VLSI circuits reliability. This paper presents a comprehensive framework that assists in designing fortified VLSI circuits against BTI aging degradation. The framework contains: (1) Novel circuit level techniques that eliminate the effect of BTI, these techniques successfully decrease the power dissipation by 36% and enhance the reliability of VLSI circuits, (2) Evaluation of the reliability of all circuit level techniques used to eliminate BTI aging degradation for 16 nm CMOS technology, (3) Comparison between the efficiency of all circuit level techniques in terms of power consumption and area.

A. Osman, M. T., H. A. H. Fahmy, Y. A. H. Fahmy, M. M. Elsabrouty, and A. Shalash, "Two Extended Programmable BCH Soft Decoders Using Least Reliable Bits Reprocessing", Circuits, Systems and Signal Processing by Springer, vol. 33, no. 5, pp. 1369–1391, May 2014.

This paper proposes two BCH soft decoders suitable for high rate codes with medium to large word length. The proposed decoders extend the correcting capability by providing a programmable performance gain according to the choice of the extra compensated bits p, with a theoretical maximum likelihood decoding when 2t + p approaches the codeword size n, where t is the correcting capability of the code under algebraic decoding. Our proposed architectures for the proposed algorithms use pipelined arithmetic units, leading to a reduction in the critical paths. This allows for an increase in the operating frequency up to m/2 times compared to algebraic decoders, where m is the Galois field size. Our proposed decoders operate only on the least reliable bits, which leads to a reduction in the decoder complexity by removing the Chien search procedure.