Nano Scientific Research Centre Pvt Ltd


VLSI DESIGN IEEE PROJECTS

M.Tech VLSI IEEE Projects Specialized in M.Tech VLSI Design (Front End & Back End)

Domains:

  • Processor Architecture
  • BIST Algorithms
  • Signal Processing
  • Image & Video Processing
  • Communication & Bus Protocols
  • Low-Power VLSI
  • Physical Design (250nm-180nm-90nm-45nm-32nm)
  • FPGA Prototyping, etc.

Languages:

  • VHDL
  • Verilog HDL
  • SystemVerilog
  • HSPICE

Software:

  • Xilinx ISE
  • Xilinx Platform Studio
  • Tanner EDA
  • DSCH
  • ModelSim
  • Microwind
  • QuestaSim
  • PSpice

Hardware:

  • Spartan Series
  • Virtex Series
  • Altera Cyclone Series

Our Training Features:
  • 100% Outputs With Extension
  • Paper Publishing at International Level
  • Project Training Sessions Are Conducted by Real-Time Instructors With Real-Time Examples
  • Best Project Training Material
  • State-of-the-Art Lab With Required Software for Practicing

Design, Synthesis and FPGA-based Implementation of a 32-bit Digital Signal Processor

Approx. Rs 10,000 / student

AIM:

The main aim of the project is the “Design, Synthesis and FPGA-based Implementation of a 32-bit Digital Signal Processor”.

(ABSTRACT)

With the advent of personal computers, smartphones, gaming and other multimedia devices, the demand for DSP processors in the semiconductor industry and in modern life is ever increasing. Traditional DSP processors, which add special-purpose (custom) logic to essentially general-purpose processors, no longer meet the ever-increasing demand for processing power. Today FPGAs have become an important platform for implementing high-end DSP applications and DSP processors because of their inherent parallelism and fast processing speed. This design work models and synthesizes a 32-bit two-stage pipelined DSP processor for implementation on a Xilinx Spartan-3E (XC3S500E) FPGA. The design is optimized for speed. A hazard-free pipelined architecture and a dedicated single-cycle integer multiply-accumulator (MAC) contribute to the processing speed of this design. The design maintains a restricted instruction set and consists of four major components: 1) a hazard-free, speed-optimized control unit, 2) a two-stage pipelined datapath, 3) a single-cycle multiply-accumulator (MAC), and 4) a system memory. A Harvard architecture is used to improve the processor’s performance, since both memories (program and data memory) can be accessed simultaneously. The complete processor design has been described in VHDL. The functionality of the processor is verified through functional simulation using the ModelSim SE 6.5 simulator. The design is placed and routed for a Xilinx Spartan-3E FPGA.
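To make the role of the single-cycle MAC concrete, here is a minimal Python behavioral sketch of what it computes on each clock cycle. The widths (32-bit signed operands, a 64-bit accumulator that wraps on overflow) and the class and method names are assumptions for illustration only; the abstract does not specify them, and this is not the project's VHDL.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MacUnit:
    """Behavioral model of a multiply-accumulate unit (hypothetical widths)."""

    def __init__(self, operand_bits: int = 32, acc_bits: int = 64):
        self.operand_bits = operand_bits
        self.acc_bits = acc_bits
        self.acc = 0

    def clear(self) -> None:
        self.acc = 0

    def step(self, a: int, b: int) -> int:
        """One clock cycle: acc <- acc + a * b, wrapped to acc_bits (two's complement)."""
        a = to_signed(a, self.operand_bits)
        b = to_signed(b, self.operand_bits)
        self.acc = to_signed(self.acc + a * b, self.acc_bits)
        return self.acc


# Usage: a 4-tap dot product, one tap per cycle.
mac = MacUnit()
for x, h in zip([1, -2, 3, 4], [5, 6, -7, 8]):
    mac.step(x, h)
print(mac.acc)  # 4  (5 - 12 - 21 + 32)
```

In a real datapath the multiply and add sit in one combinational path that must settle within one clock period, which is what "single cycle" implies for the timing budget.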

 

Proposed Method:

In the proposed method, the 32-bit architecture can be extended to a 64-bit architecture. The base design uses a Harvard architecture, in which both memories (program and data memory) are accessed simultaneously; this requires more hardware, so the cost is higher. As a modification, the architecture can be implemented with a modified Harvard architecture in which data memory and program memory share a single memory that is accessed in different time slots, so that both the hardware and the cost are reduced. The design can also be implemented in Verilog HDL.
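The time-slot idea in the proposed modification can be sketched as a small Python model. This is a hypothetical illustration, not the project's design: it assumes one memory array holding both program and data, even cycles reserved for instruction fetch and odd cycles for data access, and a simple address split at `data_base`.

```python
class TimeSlottedMemory:
    """Single memory shared by instruction fetch and data access (hypothetical model).

    Even cycles are fetch slots and odd cycles are data slots; every access or
    idle call advances the cycle counter by one.
    """

    def __init__(self, size: int, data_base: int):
        self.cells = [0] * size
        self.data_base = data_base  # data address 0 maps to cells[data_base]
        self.cycle = 0

    def _require(self, slot: str) -> None:
        current = "fetch" if self.cycle % 2 == 0 else "data"
        if current != slot:
            raise RuntimeError(f"{slot} access attempted in a {current} slot")

    def fetch(self, pc: int) -> int:
        self._require("fetch")
        self.cycle += 1
        return self.cells[pc]

    def load(self, addr: int) -> int:
        self._require("data")
        self.cycle += 1
        return self.cells[self.data_base + addr]

    def store(self, addr: int, value: int) -> None:
        self._require("data")
        self.cycle += 1
        self.cells[self.data_base + addr] = value

    def idle(self) -> None:
        """Slot with no access (e.g. an instruction that does not touch data memory)."""
        self.cycle += 1
```

The trade-off shows up directly in the cycle counter: one instruction fetch plus one data access now takes two cycles instead of one, so the hardware saving is paid for in memory bandwidth.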

Advantage:

· The two-stage pipelined DSP processor can successfully handle two instructions at a time, even when they have hazards, and produces correct cycle-by-cycle timing.

· The speed and throughput of the processor are found to be around 13 MHz and 12.06 MB/s, respectively.

BLOCK DIAGRAM:

 

Fig. 1. System diagram of the proposed 32-bit pipelined DSP processor

TOOLS: Xilinx ISE 9.2i, ModelSim 6.4c

 

  • Minimum Order Quantity: 1 student

    Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support”.

    (ABSTRACT)

    Binary64 arithmetic is rapidly becoming inadequate to cope with today’s large-scale computations due to an accumulation of errors. Therefore, binary128 arithmetic is now required to increase the accuracy and reliability of these computations. At the same time, an obvious trend emerging in modern processors is to extend their instruction sets by allowing single instruction multiple data (SIMD) execution, which can significantly accelerate the data-parallel applications. To address the combined demands mentioned above, this paper presents the architecture of a low-cost binary128 floating-point fused multiply add (FMA) unit with SIMD support. The proposed FMA design can execute a binary128 FMA every other cycle with a latency of four cycles, or two binary64 FMAs fully pipelined with a latency of three cycles, or four binary32 FMAs fully pipelined with a latency of three cycles. We use two binary64 FMA units to support binary128 FMA which requires much less hardware than a fully pipelined binary128 FMA. The presented binary128 FMA design uses both segmentation and iteration hardware vectorization methods to trade off performance, such as throughput and latency, against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.
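    The defining property of a fused multiply-add is that a*b + c is rounded once, not twice. The sketch below demonstrates this numerically with binary64 values (Python floats), since Python has no native binary128 type; it uses exact rational arithmetic from the standard `fractions` module as a reference FMA. It models only the rounding behavior, not the segmented SIMD datapath, and the function names are illustrative.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Exact a*b + c, rounded once to binary64 (int/int division in float() rounds correctly)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# The exact product is 1 - 2**-60, which rounds to 1.0 in binary64,
# so the unfused result cancels to zero while the fused result keeps -2**-60.
print(unfused(a, b, c))        # 0.0
print(fma_reference(a, b, c) == -2.0 ** -60)  # True
```

The same cancellation effect, with a much smaller error floor, is what motivates binary128 in the abstract: errors like the lost 2**-60 term accumulate over long computations.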

    Proposed Method:

    This architecture uses an array multiplier; it can be replaced with a Vedic multiplier, which is faster.
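    Vedic multipliers are usually based on the Urdhva Tiryagbhyam ("vertically and crosswise") sutra: each column of the product collects all crosswise bit products a_i * b_j with i + j equal to the column index, and the column sums are then combined with carry propagation. The Python sketch below is a behavioral model of that column scheme for unsigned operands under these assumptions; it says nothing about gate count or speed, and the function names are illustrative.

```python
def bits_lsb_first(value: int, width: int) -> list[int]:
    """Bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def urdhva_multiply(a: int, b: int, width: int) -> int:
    """Unsigned width x width multiply using vertical-and-crosswise column sums."""
    a_bits = bits_lsb_first(a, width)
    b_bits = bits_lsb_first(b, width)
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        # Crosswise products that land in column k (i + j == k).
        column = sum(a_bits[i] & b_bits[k - i]
                     for i in range(max(0, k - width + 1), min(k, width - 1) + 1))
        total = column + carry
        result |= (total & 1) << k   # keep the column's result bit
        carry = total >> 1           # pass the rest on to the next column
    result |= carry << (2 * width - 1)
    return result


print(urdhva_multiply(13, 11, 4))  # 143
```

In hardware the column sums are typically formed by small adder trees, often organized recursively from 2 x 2 Vedic blocks; the model above only checks that the column decomposition yields the correct product.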

    Advantage:

    Following the approach, the presented binary128 FMA design uses both segmentation and iteration hardware vectorization methods to trade off performance, such as throughput and latency, against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.

    BLOCK DIAGRAM: 

     

    Fig. 1 block diagram of the proposed SIMD FMA unit

    TOOLS: Xilinx ISE 9.2i, ModelSim 6.4c

    REFERENCE:

    [1] R.K. Montoye, E. Hokenek, and S.L. Runyon, “Design of the IBM RISC System/6000 Floating-Point Execution Unit,” IBM J. Research & Development, vol. 34, pp. 59-70, 1990.

    [2] S.K. Raman, V. Pentkovski, and J. Keshava, “Implementing Streaming SIMD Extensions on the Pentium III Processor,” IEEE Micro, vol. 20, no. 4, pp. 47-57, July/Aug. 2000.

     

    [3] C. Keltcher, K. McGrath, A. Ahmed, and P. Conway, “The AMD Opteron Processor for Multiprocessor Servers,” IEEE Micro, vol. 23, no. 2, pp. 66-76, Mar./Apr. 2003.

  • Item Code: 555
  • Minimum Order Quantity: 1 student

    Scalable Digital CMOS Comparator Using a Parallel Prefix Tree

    Approx. Rs 10,000 / Piece

    AIM:

    The main aim of the project is to design “Scalable Digital CMOS Comparator Using a Parallel Prefix Tree”.

    ABSTRACT:

    We present a new comparator design featuring wide-range and high-speed operation using only conventional digital CMOS cells. Our comparator exploits a novel scalable parallel prefix structure that leverages the comparison outcome of the most significant bit, proceeding bitwise toward the least significant bit only when the compared bits are equal. This method reduces dynamic power dissipation by eliminating unnecessary transitions in a parallel prefix structure that generates the N-bit comparison result after CMOS gate delays. Our comparator is composed of locally interconnected CMOS gates with a maximum fan-in and fan-out of five and four, respectively, independent of the comparator bit width. The main advantages of our design are high speed and power efficiency, maintained over a wide range. Additionally, our design uses a regular, reconfigurable VLSI topology, which allows analytical derivation of the input-output delay as a function of bit width. Simulation of a 64-b comparator shows a worst-case input-output delay of 0.86 ns and a maximum power dissipation of 7.7 mW using 0.15-µm TSMC technology at 1 GHz.
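    The MSB-first rule can be expressed as an associative combine on per-bit (greater, less) flags, which is what makes a parallel prefix tree possible. The Python sketch below is a functional model under that reading: it uses a binary (radix-2) tree, whereas the paper's cells have fan-in up to five, and it does not model the transition-gating that saves power. The function name and return format are illustrative.

```python
def compare_prefix(a: int, b: int, width: int) -> tuple[str, int]:
    """Compare unsigned a and b; return ('>', '<' or '=', number of tree levels).

    Each bit gives gt_i = a_i AND NOT b_i and lt_i = NOT a_i AND b_i. Two adjacent
    groups combine as G = g_hi OR (NOT l_hi AND g_lo), L = l_hi OR (NOT g_hi AND l_lo):
    the more significant group decides unless all of its bits are equal.
    """
    # Per-bit flags, ordered from most significant to least significant bit.
    flags = []
    for i in reversed(range(width)):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        flags.append((ai & (1 - bi), (1 - ai) & bi))

    levels = 0
    while len(flags) > 1:
        merged = []
        for k in range(0, len(flags) - 1, 2):
            (g_hi, l_hi), (g_lo, l_lo) = flags[k], flags[k + 1]
            merged.append((g_hi | ((1 - l_hi) & g_lo), l_hi | ((1 - g_hi) & l_lo)))
        if len(flags) % 2:  # an unpaired least significant group passes through
            merged.append(flags[-1])
        flags = merged
        levels += 1

    gt, lt = flags[0]
    return (">" if gt else "<" if lt else "="), levels


print(compare_prefix(5, 3, 4))  # ('>', 2)
```

The level count grows as log2(width), so a 64-bit comparison resolves in 6 combine levels in this radix-2 model; higher fan-in cells reduce the depth further.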

     

    Proposed method

    The gates used in this design can be replaced with multiplexer-based gates, which will make the operation faster.

     

    BLOCK DIAGRAM:

     

    Fig: Implementation details for the comparison resolution module and the decision module.

    TOOLS:

    Xilinx ISE 9.2, ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · The main advantages of our design are high speed and power efficiency, maintained over a wide range.

    · Additionally, our design uses a regular reconfigurable VLSI topology, which allows analytical derivation of the input-output delay as a function of bit width.

    REFERENCES:

    · H. J. R. Liu and H. Yao, High-Performance VLSI Signal Processing: Innovative Architectures and Algorithms, vol. 2. Piscataway, NJ: IEEE Press.

     

    · Y. Sheng and W. Wang, “Design and implementation of compression algorithm comparator for digital image processing on component,” in Proc. 9th Int. Conf. Young Comput. Sci., pp. 1337–1341.

  • Minimum Order Quantity: 1 Piece

    Product Code Schemes for Error Correction in MLC NAND Flash

    Approx. Rs 10,000 / student

    ABSTRACT:

    Error control coding (ECC) is essential for correcting soft errors in Flash memories. In this paper we propose the use of product-code-based schemes to support higher error correction capability. Specifically, we propose product codes which use Reed-Solomon (RS) codes along rows and Hamming codes along columns and have reduced hardware overhead. Simulation results show that product codes can achieve better performance than both Bose-Chaudhuri-Hocquenghem (BCH) codes and plain RS codes, with less area and low latency. We also propose a flexible product-code-based ECC scheme that migrates to a stronger ECC scheme when the number of errors increases due to increased program/erase cycles. While these schemes have slightly larger latency and require additional parity bit storage, they provide an easy mechanism to increase the lifetime of Flash memory devices.
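    The column code in this product-code scheme is a Hamming code. As an illustration of what one column encoder/decoder does, here is a Python sketch of a Hamming(7,4) single-error-correcting code. The (7,4) length is an assumption (the abstract does not give the column code parameters), the Reed-Solomon row code is not modeled, and the function names are illustrative.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Correct up to one bit error; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the flipped bit
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # inject a single soft error
print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```

In a product code, errors a column code cannot fix (for example two errors in the same column) can still be corrected by the row code, which is where the combined correction capability comes from.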

  • Minimum Order Quantity: 1 student

    Low-Power and Area-Efficient Carry Select Adder

    Approx. Rs 10,000

    AIM:

    The main aim of the project is to design “Low-Power and Area-Efficient Carry Select Adder”.

    (ABSTRACT)

    Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing the area and power consumption in the CSLA. This work uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-b square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power as compared with the regular SQRT CSLA, with only a slight increase in delay. This work evaluates the performance of the proposed designs in terms of delay, area, power, and their products, both by hand with logical effort and through custom design and layout in a 0.18-µm CMOS process technology.
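    The gate-level modification is not spelled out above. A well-known form of it replaces the second ripple-carry adder in each group (the one computing the result for carry-in = 1) with a binary-to-excess-1 converter (BEC) that simply adds 1 to the carry-in = 0 result. The Python sketch below assumes that BEC variant and a 2-2-3-4-5 grouping for the 16-bit SQRT CSLA; it is a bit-level functional model with illustrative names, not the project's netlist.

```python
GROUP_WIDTHS = (2, 2, 3, 4, 5)  # assumed 16-bit SQRT CSLA grouping


def ripple_add(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Bit-level ripple-carry adder; returns (sum, carry_out)."""
    s, carry = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return s, carry


def bec_plus_one(value: int, width: int) -> int:
    """Binary-to-excess-1 converter: value + 1 mod 2**width (bit i toggles if all lower bits are 1)."""
    out, all_ones_below = 0, 1
    for i in range(width):
        vi = (value >> i) & 1
        out |= (vi ^ all_ones_below) << i
        all_ones_below &= vi
    return out


def sqrt_csla_bec(a: int, b: int, widths=GROUP_WIDTHS) -> tuple[int, int]:
    """Carry select adder: each group precomputes both results; the incoming carry picks one."""
    result, carry, shift = 0, 0, 0
    for w in widths:
        mask = (1 << w) - 1
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        s0, c0 = ripple_add(ga, gb, 0, w)           # result if carry-in = 0
        inc = bec_plus_one(s0 | (c0 << w), w + 1)   # result if carry-in = 1, via BEC
        s1, c1 = inc & mask, inc >> w
        s, carry = (s1, c1) if carry else (s0, c0)  # 2:1 multiplexer
        result |= s << shift
        shift += w
    return result, carry


print(sqrt_csla_bec(12345, 54321))  # (1130, 1)
```

The area saving comes from the BEC needing fewer gates than a second ripple-carry adder of the same width, at the cost of a slightly longer path through the converter before the multiplexer.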

     

    Proposed Architecture:

    Advantage:

    The reduced number of gates in this work offers a significant advantage in reducing area and total power. The comparison results show that the modified SQRT CSLA has reduced area and power. The power-delay product and the area-delay product of the proposed design both decrease for 16-, 32-, and 64-b sizes, which indicates the success of the method and not a mere tradeoff of delay for power and area. The modified CSLA architecture is therefore low-area, low-power, simple, and efficient for VLSI hardware implementation.

    BLOCK DIAGRAM: 

     

    Proposed 16-b SQRT CSLA

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] O. J. Bedrij, “Carry-select adder,” IRE Trans. Electron. Comput., pp. 340–344, 1962.

    [2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, “ASIC implementation of modified faster carry save adder,” Eur. J. Sci. Res., vol. 42, no. 1, pp. 53–58, 2010.

    [3] T. Y. Ceiang and M. J. Hsiao, “Carry-select adder using single ripple carry adder,” Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.

    [4] Y. Kim and L.-S. Kim, “64-bit carry-select adder with reduced area,” Electron. Lett., vol. 37, no. 10, pp. 614–615, May 2001.


    Implementation of I2C Master Bus Controller on FPGA

    Approx. Rs 10,000 / No

    AIM:

    The main aim of the project is to design an “I2C Master Bus Controller on FPGA”.

    (ABSTRACT)

    This paper implements serial data communication using an I2C (Inter-Integrated Circuit) master bus controller on a field-programmable gate array (FPGA). The I2C master bus controller was interfaced with a Maxim DS1307, which acts as a slave. The module was designed in Verilog HDL and simulated in ModelSim 10.1c. The design was synthesized using Xilinx ISE Design Suite 14.2. The I2C master initiates data transmission and the slave responds to it. The controller can be used to interface low-speed peripherals in motherboards, embedded systems, mobile phones, set-top boxes, DVD players, PDAs, and other electronic devices.
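    As a sketch of what the master actually puts on the bus for a write to the DS1307, the Python function below builds the symbolic frame: START, the 7-bit slave address with the R/W bit, an acknowledge slot after every byte, and STOP. It models the bit sequence only (not SCL timing, clock stretching, or NACK handling), and the function and symbol names are illustrative. The DS1307's 7-bit address is 0x68, and its first data byte after the address sets the internal register pointer.

```python
DS1307_ADDRESS = 0x68  # 7-bit I2C address of the DS1307 RTC


def byte_bits(value: int) -> list[int]:
    """MSB-first bits of one byte, in the order they are shifted onto SDA."""
    return [(value >> i) & 1 for i in range(7, -1, -1)]


def i2c_write_frame(address7: int, payload: list[int]) -> list[str]:
    """Symbolic sequence for a master write transaction.

    'S' = START (SDA falls while SCL is high), 'P' = STOP (SDA rises while SCL is high),
    '0'/'1' = data bits (SDA stable while SCL is high),
    'A' = acknowledge slot in which the master releases SDA for the slave.
    """
    frame = ["S"]
    frame += [str(b) for b in byte_bits((address7 << 1) | 0)]  # R/W = 0 means write
    frame.append("A")
    for byte in payload:
        frame += [str(b) for b in byte_bits(byte)]
        frame.append("A")
    frame.append("P")
    return frame


# Usage: set the register pointer to 0x00 (seconds) and write 0x00, which also clears the CH bit.
print("".join(i2c_write_frame(DS1307_ADDRESS, [0x00, 0x00])))
# S11010000A00000000A00000000AP
```

A read transaction follows the same pattern with R/W = 1 after a repeated START, with the master driving the acknowledge slots instead of the slave.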

    Proposed Architecture:

    Advantage:

    Any low-speed peripheral device can be interfaced using the I2C bus protocol, with the FPGA acting as master. In future, this can be implemented as a real-time clock in networks that contain multiple masters and multiple slaves, coordinating the entire system through clock-synchronization techniques.

     

    BLOCK DIAGRAM:

     

    Diagram of I2C Master Controller interfaced with DS1307 RTC slave device

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] Philips Semiconductor, “I2C Bus Specification,” version 2.1, January 2000.

    [2] Maxim Integrated, “DS1307 64 x 8, Serial, I2C Real-Time Clock,” 2008.

    [3] Prof. Jai Karan Singh, “Design and Implementation of I2C Master Controller on FPGA Using VHDL,” IJET, vol. 4, no. 4, Aug.–Sep. 2012.

    [4] Raj Kamal, “Embedded Systems: Architecture, Programming and Design,” Tata McGraw-Hill, 2008.

    [5] Stuart Sutherland, “Verilog® HDL Quick Reference Guide,” IEEE Std 1364-2001.

  • Minimum Order Quantity: 1 No

    Efficient Majority Logic Fault Detection With Difference-Set Codes for Memory Applications

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Efficient Majority Logic Fault Detection with Difference-Set Codes for Memory Applications”.

    ABSTRACT:

    Nowadays, single event upsets (SEUs) altering digital circuits are becoming a bigger concern for memory applications. This paper presents an error-detection method for difference-set cyclic codes with majority logic decoding. Majority logic decodable codes are suitable for memory applications due to their capability to correct a large number of errors. However, they require a large decoding time that impacts memory performance. The proposed fault-detection method significantly reduces memory access time when there is no error in the data read. The technique uses the majority logic decoder itself to detect failures, which makes the area overhead minimal and keeps the extra power consumption low.

    BLOCK DIAGRAM:

     

    Fig: Schematic of an ML decoder. I) Cyclic shift register. II) XOR matrix. III) Majority gate. IV) XOR for correction.
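    The four blocks in the schematic (cyclic shift register, XOR matrix, majority gate, correction XOR) can be illustrated with a toy Python model. It assumes a small (7,3) difference-set cyclic code whose parity checks are the cyclic shifts of {0, 1, 3}, a perfect difference set modulo 7, so three check sums are orthogonal on every bit and one error is correctable. Direct indexing stands in for the shift register, and the early-detection function only illustrates the MLDD idea for single errors; the paper's larger codes and its five-bit-flip guarantee are not modeled. All names are hypothetical.

```python
from itertools import product

N = 7
DIFF_SET = (0, 1, 3)  # perfect difference set modulo 7


def lines_through(i: int) -> list[tuple[int, ...]]:
    """Parity checks (cyclic shifts of DIFF_SET) containing bit i; they share only bit i."""
    shifts = [tuple((k + d) % N for d in DIFF_SET) for k in range(N)]
    return [line for line in shifts if i in line]


def check_sums(word, i: int) -> list[int]:
    """XOR-matrix outputs for bit i."""
    return [sum(word[j] for j in line) % 2 for line in lines_through(i)]


def ml_decode(word) -> list[int]:
    """One-step majority logic decoding, one bit per cycle."""
    word = list(word)
    for i in range(N):
        if sum(check_sums(word, i)) >= 2:  # majority gate: at least 2 of 3 checks fail
            word[i] ^= 1                    # XOR for correction
    return word


def mldd_error_detected(word, cycles: int = 3) -> bool:
    """Early detection: report an error if any check sum fires in the first cycles."""
    return any(any(check_sums(word, i)) for i in range(cycles))


# All codewords satisfy every cyclic shift of the check; there are 2**3 = 8 of them.
CODEWORDS = [w for w in product((0, 1), repeat=N)
             if all(sum(w[(k + d) % N] for d in DIFF_SET) % 2 == 0 for k in range(N))]
```

The speed-up in the MLDD idea comes from the common case: when the first few cycles show no failing check, the read can finish early instead of running all N decoding cycles.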

    PROPOSED WORK:

    The performance of an iterative decoding algorithm for one-step majority logic decodable (OSMLD) codes is investigated. A new soft-in soft-out APP threshold algorithm is proposed, which is able to decode these codes nearly as well as the belief propagation (BP) algorithm. The computation time of the proposed algorithm is very low. The developed algorithm can also be applied to product codes and parallel concatenated codes based on block codes.

     

     

    TOOLS:
    Xilinx ISE 9.2i and ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · The proposed technique is able to detect any pattern of up to five bit-flips in the first three cycles of the decoding process. This improves the performance of the design with respect to the traditional MLD approach.

    · The MLDD error detector module has been designed in a way that is independent of the code size. This keeps its area overhead low compared with other traditional approaches such as the syndrome calculation (SFD).

    · A theoretical proof of the proposed MLDD scheme for the case of double errors has also been presented.

    REFERENCES:

    · C. W. Slayman, “Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations,” IEEE Trans. Device Mater. Reliabil., vol. 5, no. 3, pp. 397–404.


  • Minimum Order Quantity: 1 student

    Period Extension and Randomness Enhancement Using High-Throughput Reseeding-Mixing PRNG

    Approx. Rs 10,000 / student

     

    AIM:

    The main aim of the project is to achieve “Period Extension and Randomness Enhancement Using High-Throughput Reseeding-Mixing PRNG”.

    (ABSTRACT)

    In this paper, we present a new reseeding-mixing method to extend the system period length and to enhance the statistical properties of a chaos-based logistic map pseudo random number generator (PRNG). The reseeding method removes the short periods of the digitized logistic map, and the mixing method extends the system period length by “XORing” with a DX generator. When implemented in the TSMC 0.18-μm 1P6M CMOS process, the new reseeding-mixing PRNG (RM-PRNG) attains the best throughput rate of 6.4 Gb/s compared with other nonlinear PRNGs. In addition, the generated random sequences pass the NIST SP 800-22 statistical tests, including the ratio test and the U-value test.
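    The reseed-then-mix structure can be sketched in Python as follows. This is a heavily simplified illustration under stated assumptions, not the RM-PRNG: the logistic map is digitized to 32 bits with r = 4, the reseeding rule (reload a hypothetical constant `RESEED_VALUE` when the map hits 0 or a fixed point) only catches the worst short periods, and a Marsaglia xorshift32 generator is used as a stand-in for the DX generator, whose parameters are not given above.

```python
BITS = 32
MASK = (1 << BITS) - 1
RESEED_VALUE = 0x5A5A5A5A  # hypothetical reseed constant


def logistic_step(x: int) -> int:
    """Digitized logistic map with r = 4: x/2**BITS -> 4 * (x/2**BITS) * (1 - x/2**BITS)."""
    return ((4 * x * ((1 << BITS) - x)) >> BITS) & MASK


def xorshift32(y: int) -> int:
    """Stand-in auxiliary generator (Marsaglia xorshift32), NOT the DX generator."""
    y ^= (y << 13) & MASK
    y ^= y >> 17
    y ^= (y << 5) & MASK
    return y


def rm_prng(seed_x: int, seed_y: int, count: int) -> list[int]:
    """Logistic map with simplified reseeding, mixed by XOR with the auxiliary generator."""
    x = seed_x & MASK
    y = (seed_y & MASK) or 1  # xorshift state must be nonzero
    out = []
    for _ in range(count):
        nx = logistic_step(x)
        if nx == 0 or nx == x:  # stuck at a fixed point: reseed (simplified rule)
            nx = RESEED_VALUE
        x = nx
        y = xorshift32(y)
        out.append(x ^ y)  # mixing step
    return out
```

The digitized map really does collapse: `logistic_step(1 << 31)` is 0 and `0xC0000000` maps to itself, which is why some form of reseeding is needed before mixing can extend the period.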

    Proposed Architecture:

    Advantage:

    The proposed hardware implementation of the RM-PRNG offers long periods and a high throughput rate while adhering to established statistical standards for PRNGs. The hardware cost is reduced and the hardware efficiency increases. In addition, a high throughput rate (>6.4 Gb/s) is attained because the RM-PRNG can generate multiple random bits in each iteration. With all these advantages, the proposed nonlinear RM-PRNG can be a good candidate for potential applications in test pattern generation, telecommunication systems, and even cryptography, if the security issue can be addressed properly.

    BLOCK DIAGRAM: 

     

    Structure of the proposed RM-PRNG.

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    1. J. E. Gentle, Random Number Generation and Monte Carlo Methods, 2nd ed. New York: Springer-Verlag, 2003.

    2. M. P. Kennedy, R. Rovatti, and G. Setti, Chaotic Electronics in Telecommunications. Boca Raton, FL: CRC, 2000.

    3. D. Knuth, The Art of Computer Programming, 2nd ed. Reading, MA: Addison-Wesley, 1981.

    4. A. Klapper and M. Goresky, “Feedback shift registers, 2-adic span, and combiners with memory,” J. Cryptology, vol. 10, pp. 111–147, 1997.

    5. D. H. Lehmer, “Mathematical methods in large-scale computing units,” in Proc. 2nd Symp. Large Scale Digital Comput. Machinery, Cambridge, MA, 1951, pp. 141–146, Harvard Univ. Press.

     

  • Minimum Order Quantity: 1 student


    BIST Based Test Applications Enhanced With Adaptive Low Power RTPG and LFSR Reseeding Techniques

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “BIST Based Test Applications Enhanced with Adaptive Low Power RTPG and LFSR Reseeding Techniques”.

    (ABSTRACT)

    Power, area and time are the major milestones for VLSI circuits. Power consumed during the scan-based test mode of a circuit is much higher than in the normal mode because of increased switching transitions. This work aims to reduce the power consumed while testing a circuit without affecting test coverage, speed, or memory requirements. The work can be applied to Built-In Self-Test (BIST) based test applications. To achieve these objectives, a Low Power Random Test Pattern Generator (LPRTPG) along with partial LFSR reseeding is added to a conventional BIST unit. The experimental results show the efficiency of the work in terms of reduced test power and memory requirements.
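    The pattern source in a conventional BIST unit is an LFSR, and reseeding just loads a new stored seed into it. The Python sketch below shows a 16-bit maximal-length Fibonacci LFSR (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, period 65535), a simple partial-reseeding loop, and a transition count between consecutive patterns as a rough proxy for shift power. The low-power RTPG logic from the paper is not modeled, and the per-seed pattern count and function names are assumptions.

```python
def lfsr16_step(state: int) -> int:
    """Fibonacci LFSR step, feedback polynomial x^16 + x^14 + x^13 + x^11 + 1."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def patterns(seed: int, count: int) -> list[int]:
    """Generate count test patterns starting from seed."""
    state, out = seed, []
    for _ in range(count):
        out.append(state)
        state = lfsr16_step(state)
    return out


def reseeded_patterns(seeds: list[int], per_seed: int) -> list[int]:
    """Partial reseeding sketch: run the LFSR per_seed cycles from each stored seed."""
    out = []
    for seed in seeds:
        out += patterns(seed, per_seed)
    return out


def transitions(seq: list[int]) -> int:
    """Total bit toggles between consecutive patterns (a rough proxy for shift power)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))


print(transitions(patterns(0xACE1, 100)))
```

Low-power pattern generators aim to lower `transitions()` for the same fault coverage, while reseeding lets a few stored seeds steer the LFSR toward patterns that hit hard-to-detect faults.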

    Proposed Architecture:

    The LFSR block can be replaced with another kind of LFSR (a low-power LFSR, LP-LFSR).

    Advantage:

    The architecture improves the trade-off between test coverage and shift power. Shift power is reduced considerably with a negligible test coverage loss so that the high power dissipation effects in CUT can be reduced. Since the transition controller and tester cover a negligible portion of chip area, the burden of area overhead can be avoided.

    BLOCK DIAGRAM:

     

    Proposed BIST architecture

     

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] S Sivanantham, Renju Thomas John and Sreekanth K. D, “Adaptive low power RTPG for BIST based test applications,” in 2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan. 09 – 11, 2013.

    [2] M. Kalaiselvi and K. S. Neelukumari, “LFSR reseeding scheme for achieving test coverage,” in IEEE Transactions, 2013.

  • Minimum Order Quantity: 1 student

    32 Bit × 32 Bit Multiprecision Razor-Based Dynamic Voltage Scaling Multiplier With Operands Scheduler

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design a “32 Bit × 32 Bit Multiprecision Razor-Based Dynamic Voltage Scaling Multiplier With Operands Scheduler”.

    (ABSTRACT)

    This work presents a multiprecision (MP) reconfigurable multiplier that incorporates variable precision, parallel processing (PP), razor-based dynamic voltage scaling (DVS), and dedicated MP operands scheduling to provide optimum performance for a variety of operating conditions. All of the building blocks of the proposed reconfigurable multiplier can either work as independent smaller-precision multipliers or work in parallel to perform higher-precision multiplications. Given the user’s requirements (e.g., throughput), a dynamic voltage/frequency scaling management unit configures the multiplier to operate at the proper precision and frequency. Adapting to the run-time workload of the targeted application, razor flip-flops together with a dithering voltage unit then configure the multiplier to achieve the lowest power consumption. The single-switch dithering voltage unit and razor flip-flops help to reduce the voltage safety margins and overhead typically associated with DVS to the lowest level. The large silicon area and power overhead typically associated with reconfigurability features are removed. Finally, the proposed MP multiplier can further benefit from an operands scheduler that rearranges the input data to determine the optimum voltage and frequency operating conditions for minimum power consumption.
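    The "independent smaller multipliers or one larger multiplier" idea can be sketched in Python. The sketch assumes a hypothetical partitioning of the 32 × 32 array into four unsigned 16 × 16 blocks; the paper's actual block sizes, signed-number handling, and razor/DVS control are not modeled, and the function names are illustrative.

```python
MASK16 = (1 << 16) - 1


def mul16(a: int, b: int) -> int:
    """One 16 x 16 -> 32-bit building-block multiplier (unsigned)."""
    return (a & MASK16) * (b & MASK16)


def mul32_from_16(a: int, b: int) -> int:
    """High-precision mode: four 16 x 16 blocks combine into one 32 x 32 product."""
    a_hi, a_lo = (a >> 16) & MASK16, a & MASK16
    b_hi, b_lo = (b >> 16) & MASK16, b & MASK16
    return ((mul16(a_hi, b_hi) << 32)
            + ((mul16(a_hi, b_lo) + mul16(a_lo, b_hi)) << 16)
            + mul16(a_lo, b_lo))


def mul16_x4(pairs: list[tuple[int, int]]) -> list[int]:
    """Low-precision mode: the same four blocks as independent 16 x 16 multipliers."""
    if len(pairs) != 4:
        raise ValueError("exactly four operand pairs, one per block")
    return [mul16(a, b) for a, b in pairs]


print(mul16_x4([(1, 2), (3, 4), (65535, 65535), (0, 9)]))  # [2, 12, 4294836225, 0]
```

In the low-precision mode the blocks deliver four products per cycle instead of one, which is the throughput headroom a DVS controller can trade for a lower supply voltage.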

    Proposed Architecture:

    The architecture can be extended to a 64 × 64-bit multiplier.

    Advantage:

    The novel MP multiplier architecture reduces silicon area and power consumption compared with its 32 × 32-bit conventional fixed-width multiplier counterpart. When this MP multiplier architecture is integrated with an error-tolerant razor-based DVS approach and the proposed operands scheduler, a total power reduction is achieved with a low total silicon-area overhead.

    BLOCK DIAGRAM:

     

    Fig. Multiplier system architecture.

     

    TOOLS: Xilinx

    REFERENCE:

    [1] R. Min, M. Bhardwaj, S.-H. Cho, N. Ickes, E. Shih, A. Sinha, A. Wang, and A. Chandrakasan, “Energy-centric enabling technologies for wireless sensor networks,” IEEE Wirel. Commun., vol. 9, no. 4, pp. 28–39, Aug. 2002.

  • Minimum Order Quantity: 1 student

    A 16-Core Processor With Shared-Memory and Message-Passing Communications

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “A 16-Core Processor With Shared-Memory and Message-Passing Communications”.

    (ABSTRACT)

    A 16-core processor with both message-passing and shared-memory inter-core communication mechanisms is implemented in 65 nm CMOS. Message-passing communication is enabled by a 3 × 6 mesh packet-switched network-on-chip, and shared-memory communication is supported using the shared memory within each cluster.

    Proposed Architecture:

    Advantage:

    The processor has 16 processor cores and 2 memory cores. Message-passing communications are supported by the 3 × 6 2D mesh NoC, and shared-memory communications are supported by shared memory units in the memory cores. The proposed cluster-based memory hierarchy makes the processor well suited for most embedded applications. The processor chip has a total of 256 KB of on-chip memory: each processor core has an 8 KB instruction memory and a 4 KB private data memory, and each memory core has a 32 KB shared memory.
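    The 256 KB on-chip total is consistent with the per-core figures quoted above:

```latex
\underbrace{16 \times (8\,\text{KB} + 4\,\text{KB})}_{\text{processor cores}}
+ \underbrace{2 \times 32\,\text{KB}}_{\text{memory cores}}
= 192\,\text{KB} + 64\,\text{KB} = 256\,\text{KB}
```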

    BLOCK DIAGRAM: 

     

    Fig. Architecture overview of the proposed 16-core processor.

    TOOLS: Xilinx

    REFERENCE:

    [1] G. Blake, R. G. Dreslinski, and T. Mudge, “A survey of multicore processors: A review of their common attributes,” IEEE Signal Process. Mag., pp. 26–37, Nov. 2009.

    [2] R. Kumar, V. Zyuban, and D. Tullsen, “Interconnections in multi-core architecture: Understanding mechanisms, overheads and scaling,” in Proc. 32nd Int. Symp. Computer Architecture (ISCA’05), 2005, pp. 408–419.

    [3] H.-Y. Kim, Y.-J. Kim, J.-H. Oh, and L.-S. Kim, “A reconfigurable SIMT processor for mobile ray tracing with contention reduction in shared memory,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 4, pp. 938–950, Apr. 2013.

    [4] L. Hammond, B.-A. Hubbert, M. Siu, M.-K. Prabhu, M. Chen, and K. Olukotun, “The Stanford Hydra CMP,” IEEE Micro, vol. 20, no. 2, pp. 71–84, 2000.

    [5] A. S. Leon, B. Langley, and L. S. Jinuk, “The UltraSPARC T1 processor: CMT reliability,” in Proc. Custom Integrated Circuits Conf. (CICC’06) Dig. Tech. Papers, 2006, pp. 555–562.

  • Minimum Order Quantity: 1 student

    A Practical NoC Design for Parallel DES Computation

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “A Practical NoC Design for Parallel DES Computation”.

    ABSTRACT:

    The Network-on-Chip (NoC) is considered a new SoC paradigm for the next generation, supporting a large number of processing cores. Combining an NoC with homogeneous processors to construct a Multi-Core NoC (MCNoC) is one way to achieve high computational throughput for specific purposes such as cryptography. Many studies use cryptography standards for performance demonstration but rarely discuss an NoC suited to such a standard. The goal of this paper is to present a practical methodology, without complicated virtual-channel or pipeline technologies, that provides high-throughput Data Encryption Standard (DES) computation on an FPGA. The results show that a mesh-based NoC with packet and Processing Element (PE) design tailored to the DES specification can achieve much better performance than previous works. Moreover, the deterministic XY routing algorithm proves competitive in a high-throughput NoC, and West-First routing offers the best performance among the Turn-Model routings, which represent adaptive routing.
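    The two routing policies compared in the abstract can be sketched in Python: deterministic XY routing (all X hops first, then Y) and the West-First turn-model rule (any westward hops must come first; otherwise any minimal direction may be chosen adaptively). The coordinate convention (x grows eastward, y grows northward) and the function names are assumptions; the sketch computes routes only and does not model packets, buffers, or the DES processing elements.

```python
Coord = tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> list[Coord]:
    """Deterministic XY routing on a 2D mesh: travel along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def west_first_directions(cur: Coord, dst: Coord) -> set[str]:
    """Minimal output directions allowed by West-First at the current router."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx < 0:
        return {"W"}  # westward hops are taken first, with no choice
    dirs = set()
    if dx > 0:
        dirs.add("E")
    if dy > 0:
        dirs.add("N")
    if dy < 0:
        dirs.add("S")
    return dirs  # an adaptive router picks among these, e.g. by congestion


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Both rules are deadlock-free on a mesh because they forbid enough turns to break every cycle of channel dependencies; West-First simply forbids fewer turns than XY, which leaves room for adaptivity.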

    Proposed architecture

    The paper implements a 2 × 2 mesh topology; a 2 × 2 torus topology can be used instead.

    BLOCK DIAGRAM:

     

    Fig: A 5×5 mesh-based MCNoC

     

    TOOLS:

    Xilinx ISE 9.2, ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · A high-throughput DES computation design can be achieved with low-cost switching, packet format, and routing algorithms in a 5×5 mesh-based MCNoC.

    · Using a large PE is area-efficient on an FPGA, and having a PE processing time longer than the routing time is a key factor in PE architecture selection.

     

  • Minimum Order Quantity: 1 student

    A VLIW Architecture for Executing Multi-Scalar/Vector Instructions on Unified Datapath

    REQUEST CALLBACK

    A VLIW Architecture for Executing Multi-Scalar/Vector Instru
    Approx. Rs 10,000 / student
    Get Best Quote

    AIM:

    The main aim of the project is to design “A VLIW Architecture for Executing Multi-Scalar/Vector Instructions on Unified Datapath”.

    (ABSTRACT)

    This paper proposes a new processor architecture for accelerating data-parallel applications by combining the VLIW and vector processing paradigms. It uses the VLIW architecture to process multiple independent scalar instructions concurrently on parallel execution units. Data parallelism is expressed by a vector ISA and processed on the same parallel execution units of the VLIW architecture. The proposed processor, called VecLIW, has a unified register file of 64 × 32-bit registers in the decode stage for storing scalar/vector data. VecLIW can issue up to four scalar/vector operations in each cycle, processing a set of operands in parallel and producing up to four results. However, it cannot issue more than one memory operation at a time; each memory operation loads/stores 128-bit scalar/vector data from/to the data cache. Four 32-bit results can be written back into the VecLIW register file. The complete VecLIW processor is implemented in Verilog targeting the Xilinx Virtex-5 XC5VLX110T-3FF1136 FPGA.

    Proposed Architecture:

    In the proposed extension, the VLIW word is widened to 256 bits, with more than 20 instruction operations and four parallel stages.

    Advantage:

    VecLIW executes multi-scalar and vector instructions on the same parallel execution datapath. It has a modified five-stage pipeline for (1) fetching a 128-bit VLIW instruction (four individual instructions), (2) decoding and reading the operands of the four instructions packed in the VLIW word, (3) executing four operations on the parallel execution units, (4) loading/storing 128-bit (4 × 32-bit scalar/vector) data from/to data memory, and (5) writing back 4 × 32-bit scalar/vector results.
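    The fetch and decode steps can be pictured with a small Python model. It assumes that the 128-bit VLIW word holds four 32-bit instruction slots with slot 0 in the most significant bits; that slot order, the mnemonic strings and the `is_memory` predicate are illustrative assumptions, not VecLIW's actual encoding. The model splits a bundle into its four instructions and checks the one-memory-operation-per-cycle rule stated in the abstract.

```python
SLOT_BITS = 32
SLOTS = 4  # four 32-bit instructions per 128-bit VLIW word


def split_bundle(bundle: int) -> list[int]:
    """Split a 128-bit VLIW word into four 32-bit instructions (slot 0 = MSBs, assumed order)."""
    if not 0 <= bundle < 1 << (SLOT_BITS * SLOTS):
        raise ValueError("bundle must be a 128-bit unsigned value")
    mask = (1 << SLOT_BITS) - 1
    return [(bundle >> (SLOT_BITS * (SLOTS - 1 - i))) & mask for i in range(SLOTS)]


def bundle_is_legal(instructions, is_memory) -> bool:
    """At most one load/store may be issued per cycle, so at most one slot may be a memory op."""
    return sum(1 for ins in instructions if is_memory(ins)) <= 1


words = split_bundle(0x11111111_22222222_33333333_44444444)
print([hex(w) for w in words])  # ['0x11111111', '0x22222222', '0x33333333', '0x44444444']

# Hypothetical mnemonics, only to show the issue rule.
is_mem = lambda op: op in ("ld", "st")
print(bundle_is_legal(["ld", "add", "mul", "st"], is_mem))  # False: two memory operations
```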

    BLOCK DIAGRAM: 

     

    VecLIW datapath for executing multi-scalar/vector instructions

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. Morgan Kaufmann, Sept. 2011.

  • Minimum Order Quantity: 1 student

    An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator

    Approx. Rs 10,000

    AIM:

    The main aim of the project is to design “An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator”.

    (ABSTRACT)

    Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. In this paper, we focus on optimizing the design of the fused Add-Multiply (FAM) operator to increase performance. We investigate techniques for directly recoding the sum of two numbers into its Modified Booth (MB) form. We introduce a structured and efficient recoding technique and explore three different schemes by incorporating them into FAM designs. Compared with FAM designs that use existing recoding schemes, the proposed technique yields considerable reductions in critical delay, hardware complexity and power consumption of the FAM unit.

    Proposed Architecture:

     

    Advantage:

     

    When incorporated into FAM designs, the proposed recoding schemes yield considerable performance improvements over the most efficient recoding schemes found in the literature.

    BLOCK DIAGRAM: 

     

    Fused design with direct recoding of the sum of the two input operands in its MB representation. The multiplier is a basic parallel multiplier based on the MB algorithm. The terms CT, CSA Tree and CLA Adder refer to the Correction Term, the Carry-Save Adder Tree and the final Carry-Look-Ahead Adder of the multiplier.

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] A. Amaricai, M. Vladutiu, and O. Boncalo, “Design issues and implementations for floating-point divide-add fused,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 4, pp. 295–299, Apr. 2010.

    [2] E. E. Swartzlander and H. H. M. Saleh, “FFT implementation with fused floating-point operations,” IEEE Trans. Comput., vol. 61, no. 2, pp. 284–288, Feb. 2012.

    [3] J. J. F. Cavanagh, Digital Computer Arithmetic. New York: McGraw-Hill, 1984.

    [4] S. Nikolaidis, E. Karaolis, and E. D. Kyriakis-Bitzaros, “Estimation of signal transition activity in FIR filters implemented by a MAC architecture,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 19, no. 1, pp. 164–169, Jan. 2000.

    [5] O. Kwon, K. Nowka, and E. E. Swartzlander, “A 16-bit by 16-bit MAC design using fast 5:3 compressor cells,” J. VLSI Signal Process. Syst., vol. 31, no. 2, pp. 77–89, Jun. 2002.

    Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low Adaptation-Delay

    Approx. Rs 10,000 / student

     

    AIM:

    The main aim of the project is to design “Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low Adaptation-Delay”.

    (ABSTRACT)

    This paper presents an efficient architecture for implementing a delayed least mean square (DLMS) adaptive filter. To achieve a lower adaptation delay and an area-delay-power efficient implementation, we use a novel partial product generator and propose a strategy for optimized balanced pipelining across the time-consuming combinational blocks of the structure. Synthesis results show that, on average over filter lengths N = 8, 16 and 32, the proposed design offers a lower area-delay product (ADP) and a lower energy-delay product (EDP) than the best existing systolic structures. We also propose an efficient fixed-point implementation scheme for the architecture and derive an expression for its steady-state error; the steady-state mean squared error obtained analytically matches the simulation result. Moreover, we propose a bit-level pruning of the architecture, which saves ADP and EDP relative to the unpruned structure without noticeable degradation of the steady-state error.

     

    Proposed Architecture:

     

    Advantage:

    · An efficient addition scheme for inner-product computation is proposed to reduce the adaptation delay significantly, which gives faster convergence, and to shorten the critical path so that high input-sampling rates are supported.

    · In addition, a strategy for optimized balanced pipelining across the time-consuming blocks of the structure is proposed to reduce the adaptation delay and power consumption.

    BLOCK DIAGRAM:

     

    Structure of the proposed delayed LMS adaptive filter.

     

    Proposed structure of the error-computation block

    Proposed structure of the weight-update block.

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1985.

  • Minimum Order Quantity: 1 student

    BIST Based Test Applications Enhanced with Adaptive Low Power RTPG and LFSR Reseeding Techniques

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “BIST Based Test Applications Enhanced with Adaptive Low Power RTPG and LFSR Reseeding Techniques”.

    (ABSTRACT)

    Power, area and time are the major concerns for VLSI circuits. The power consumed during the scan-based test mode of a circuit is much higher than in normal mode because of increased switching transitions. This work aims to reduce the power consumed while testing a circuit without affecting test coverage, speed or memory requirements, and it can be applied to Built-In Self-Test (BIST) based test applications. To achieve these objectives, a Low Power Random Test Pattern Generator (LPRTPG) with partial LFSR reseeding is added to a conventional BIST unit. The experimental results show the efficiency of the work in terms of reduced test power and memory requirements.

    Proposed Architecture:

    Advantage:

    The architecture improves the trade-off between test coverage and shift power. Shift power is reduced considerably with negligible loss of test coverage, so the effects of high power dissipation in the circuit under test (CUT) are reduced. Since the transition controller and tester occupy a negligible portion of the chip area, the area overhead is small.

    BLOCK DIAGRAM:

    Proposed BIST architecture

     

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] S. Sivanantham, Renju Thomas John, and Sreekanth K. D., “Adaptive low power RTPG for BIST based test applications,” in 2013 International Conference on Computer Communication and Informatics (ICCCI-2013), Jan. 9–11, 2013.

    [2] M. Kalaiselvi and K. S. Neelukumari, “LFSR reseeding scheme for achieving test coverage,” in IEEE Transactions, 2013.

  • Minimum Order Quantity: 1 student

    Comparative Performance Analysis of XORXNOR Function Based H

    Approx. Rs 10,000 / student

    MTech VLSI IEEE Projects 2015, Specialized in M.Tech VLSI Design (Front End & Back End)

    32 Bit×32 Bit Multi precision Razor-Based Dynamic Voltage Scaling Multiplier With Operands Scheduler

    A 16-Core Processor With Shared-Memory and Message-Passing Communications

    An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator

    Low-Power, High-Throughput, and Low-Area Adaptive FIR Filter Based on Distributed Arithmetic

    Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions

    Implementation of I2C Master Bus Controller on FPGA

    Multifunction Residue Architectures for Cryptography

    Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low Adaptation-Delay

    Aging-Aware Reliable Multiplier Design With Adaptive Hold Logic

    Fast Sign Detection Algorithm for the RNS Moduli Set {2^(n+1) − 1, 2^n − 1, 2^n}

    Efficient Integer DCT Architectures for HEVC

    Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation

    Design of Efficient Binary Comparators in Quantum-Dot Cellular Automata

    Reverse Converter Design via Parallel-Prefix Adders: Novel Components, Methodology, and Implementations

    Low-Complexity Low-Latency Architecture for Matching of Data Encoded With Hard Systematic Error-Correcting Codes

    Energy-Efficient High-Throughput Montgomery Modular Multipliers for RSA Cryptosystems

    Error Detection in Majority Logic Decoding of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

    Pipelined Radix- Feedforward FFT Architectures

    Global built-in self-repair for 3D memories with redundancy sharing and parallel testing

    Radix-4 and radix-8 booth encoded multi-modulus multipliers

    High performance hardware implementation for RC4 stream cipher

    A Practical NoC Design for Parallel DES Computation

    Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip

    Parallel AES Encryption Engines for Many-Core Processor Arrays

  • Minimum Order Quantity: 1 student

    Design and Implementation of 64-Bit Execute Stage for VLIW Processor Architecture on FPGA

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Design and Implementation of 64-Bit Execute Stage for VLIW Processor Architecture on FPGA”.

    (ABSTRACT)

    This paper presents an FPGA implementation of a 64-bit execute unit for a VLIW processor with improved power performance. The architecture is modelled in VHDL. VLIW stands for Very Long Instruction Word: a processor architecture based on parallel processing, in which more than one instruction is executed at the same time to increase instruction throughput, which makes it a basis of modern superscalar processors. A VLIW processor is essentially a RISC processor; the difference is that it uses a much longer instruction word. The execute stage of the pipeline, where the ALU (arithmetic logic unit) is located, executes the instructions. The execute stage is synthesized for a Xilinx Virtex-4 FPGA, and the results for the 64-bit execute stage show improved power compared with previous work.

    Proposed Method:

    Compared with the existing VLIW processor architecture, both the number of instructions per VLIW word and the number of bits can be increased, so that more operations can be performed at a time.
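    As a rough behavioural picture of a wider execute stage, the Python sketch below applies one ALU operation per slot of a VLIW instruction, with every slot working on 64-bit operands that wrap modulo 2^64. The operation set, the (op, a, b) tuple format and the function name are hypothetical; the number of slots is simply the length of the bundle, so adding slots models the "more instructions per word" extension described above.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

# Hypothetical ALU operation set for one execute slot (64-bit, wrap-around arithmetic).
ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: (a << (b & (WIDTH - 1))) & MASK,
}


def execute_stage(bundle):
    """Execute every (op, a, b) slot of one VLIW instruction in the same cycle."""
    return [ALU_OPS[op](a & MASK, b & MASK) for op, a, b in bundle]


results = execute_stage([("add", MASK, 1), ("sub", 0, 1), ("xor", 0xF0, 0xFF), ("shl", 1, 63)])
print([hex(r) for r in results])
# ['0x0', '0xffffffffffffffff', '0xf', '0x8000000000000000']
```

    In hardware each slot is a separate ALU, so the cost of extra slots is area and, as noted in the advantages below, a rapidly growing amount of register bypass logic.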

    Advantage:

    · For an n-parallel pipe (n = 3 to 6), the number of register bypass conditions grows as n^4 + 2n^2, so the amount of logic required to implement the register bypass conditions also increases.

    · In this study, the register bypass logic is implemented for 3/4/5/6 parallel operations per VLIW instruction on a Virtex-4 FPGA (xc4vfx12-12sf363). The synthesis report indicates that the proposed architecture reaches a speed of approximately 216 MHz.

    BLOCK DIAGRAM: 

     

     

     

    TOOLS: Xilinx ISE 9.2i, ModelSim 6.4c

    REFERENCE:

    [1] Weng Fook Lee, Azrul Halim, Nor Hisham, Yap Vooi Voon, Lo Hai Hiung, and Patrick Sebastian, “Implementation Results on Register Bypass Conditions of an n-Parallel Pipes Superscalar Pipeline Microprocessor Core on FPGA,” 2007.

     

     

  • Minimum Order Quantity: 1 student

    Design, Synthesis and FPGA-based Implementation of a 32-bit Digital Signal Processor

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Design, Synthesis and FPGA-based Implementation of a 32-bit Digital Signal Processor”.

    (ABSTRACT)

    With the advent of personal computers, smartphones, gaming and other multimedia devices, the demand for DSP processors in the semiconductor industry and in modern life keeps increasing. Traditional DSP processors, which add special-purpose (custom) logic to essentially general-purpose processors, no longer meet the ever-increasing demand for processing power. Today, FPGAs have become an important platform for implementing high-end DSP applications and DSP processors because of their inherent parallelism and fast processing speed. This work models and synthesizes a 32-bit, two-stage pipelined DSP processor for implementation on a Xilinx Spartan-3E (XC3S500E) FPGA. The design is optimized for speed. A hazard-free pipelined architecture and a dedicated single-cycle integer Multiply-Accumulator (MAC) contribute to its processing speed. The design has a restricted instruction set and consists of four major components: 1) a hazard-free, speed-optimized control unit, 2) a two-stage pipelined data path, 3) a single-cycle multiply-accumulator (MAC) and 4) a system memory. A Harvard architecture is used to improve performance, since the program and data memories can be accessed simultaneously. The complete processor is described in VHDL, its functionality is verified through functional simulation with the ModelSim SE 6.5 simulator, and the design is placed and routed for a Xilinx Spartan-3E FPGA.
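    The dedicated single-cycle integer MAC can be modelled behaviourally as below: each call to `step` corresponds to one clock cycle in which a 32-bit signed product is added to the accumulator. The 32-bit operand width matches the processor, but the 64-bit accumulator width and the wrap-around (rather than saturating) overflow behaviour are assumptions made for illustration.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class Mac:
    """Integer multiply-accumulate unit: acc <- acc + a * b once per cycle."""

    def __init__(self, acc_bits: int = 64):
        self.acc_bits = acc_bits  # assumed accumulator width
        self.acc = 0

    def step(self, a: int, b: int) -> int:
        a, b = to_signed(a, 32), to_signed(b, 32)  # 32-bit signed operands
        self.acc = to_signed(self.acc + a * b, self.acc_bits)  # wrap on overflow
        return self.acc


# Dot product of two short vectors, one MAC step per element.
mac = Mac()
for a, b in zip([1, 2, 3], [4, -5, 6]):
    mac.step(a, b)
print(mac.acc)  # 12
```

    Doing the multiply and the add in one cycle means an N-tap filter or dot product needs only N MAC cycles, which is the main source of the speed-up over a separate multiply and add.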

     

    Proposed Method:

    In the proposed work, the architecture can be extended from 32 bits to 64 bits. The base design uses a Harvard architecture, in which the program and data memories are accessed simultaneously; this requires more hardware, so the cost is higher. As a modification, the architecture can be implemented with a modified Harvard architecture in which the data memory and program memory share a single memory, with different time slots allocated to each, which reduces the hardware and the cost. The design can also be implemented in Verilog HDL.

    Advantage:

    · The two-stage pipelined DSP processor can handle two instructions at a time, even when they have hazards, and produces correct cycle-by-cycle timing.

    · The speed and throughput of the processor are around 13 MHz and 12.06 MB/s, respectively.

     

    TOOLS: Xilinx ISE 9.2i, ModelSim 6.4c

     

  • Minimum Order Quantity: 1 student

    Energy-Efficient High-Throughput Montgomery Modular Multipliers for RSA Cryptosystems

    Approx. Rs 10,000

    AIM:

    The main aim of the project is to design “Energy-Efficient High-Throughput Montgomery Modular Multipliers for RSA Cryptosystems”.

    (ABSTRACT)

    Modular exponentiation in the Rivest, Shamir, and Adleman (RSA) cryptosystem is usually achieved by repeated modular multiplications on large integers. To speed up the encryption/decryption process, many high-speed Montgomery modular multiplication algorithms and hardware architectures employ carry-save addition to avoid carry propagation at each addition operation of the add-shift loop. In this paper, we propose an energy-efficient algorithm and its corresponding architecture that not only reduce the energy consumption but also further enhance the throughput of Montgomery modular multipliers. The proposed architecture can bypass superfluous carry-save addition and register write operations, leading to lower energy consumption and higher throughput. In addition, we modify the barrel register full adder (BRFA) so that the gated clock design technique can be applied to significantly reduce the energy consumption of the storage elements in the BRFA.
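    The add-shift loop mentioned above is the radix-2 Montgomery multiplication algorithm. The Python sketch below uses ordinary integers instead of the carry-save representation and the BRFA, so it models only the arithmetic: each iteration adds a_i·B, adds N when the running sum is odd, and halves. Iterations in which both additions are zero reduce to a pure shift, which is the kind of redundant work the proposed architecture is described as bypassing. The function name and the test moduli are illustrative.

```python
def montgomery_mul(a: int, b: int, n: int, k: int) -> int:
    """Radix-2 Montgomery product a * b * 2^(-k) mod n.

    Requires an odd modulus n < 2^k and 0 <= a, b < n.
    """
    if n % 2 == 0 or n >= 1 << k or not (0 <= a < n and 0 <= b < n):
        raise ValueError("need odd n < 2^k and 0 <= a, b < n")
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            s += b      # add a_i * B
        if s & 1:
            s += n      # make the sum even so it can be halved exactly
        s >>= 1         # shift (divide by 2); s stays below 2n
    return s - n if s >= n else s


# Check against the definition using Python's modular inverse (Python 3.8+).
n, k = 97, 7
print(montgomery_mul(45, 76, n, k) == 45 * 76 * pow(2, -k, n) % n)  # True
```

    No trial division by n is needed: the only operations are additions, a parity test and a shift, which is why the loop maps well onto carry-save hardware.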

    Proposed Architecture:

    Advantage:

    High-speed Montgomery modular multipliers speed up the encryption/decryption process by keeping all inputs and outputs of the modular multiplication in a redundant carry-save format. This paper presents an efficient algorithm and its corresponding architecture that simultaneously reduce the energy consumption and enhance the throughput of Montgomery modular multipliers. Moreover, the structure of the BRFA is modified and the gated clock design technique is adopted to further reduce the energy consumption of Montgomery modular multipliers.

    BLOCK DIAGRAM: 

     

    Fig. 1: Structure of RA1 in MBRFA_C.

    TOOLS: Xilinx

    REFERENCE:

    [1] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120–126, Feb. 1978.

    [2] P. L. Montgomery, “Modular multiplication without trial division,” Math. Comput., vol. 44, no. 170, pp. 519–521, Apr. 1985.

    [3] C. K. Koc, T. Acar, and B. S. Kaliski, “Analyzing and comparing Montgomery multiplication algorithms,” IEEE Micro, vol. 16, no. 3, pp. 26–33, Jun. 1996.

    [4] Y. S. Kim, W. S. Kang, and J. R. Choi, “Implementation of 1024-bit modular processor for RSA cryptosystem,” in Proc. IEEE Asia-Pacific Conf., Aug. 2000, pp. 187–190.

    Enhanced Area Efficient Architecture for 128 bit Modified CSLA

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Enhanced Area Efficient Architecture for 128 bit Modified CSLA”.

    (ABSTRACT)

    In integrated circuit design, area plays a vital role because of the increasing demand for portable systems. The Carry Select Adder (CSLA) is a fast adder used in data-processing processors to perform fast arithmetic functions. Based on the structure of the CSLA, the aim is to reduce its area through an efficient gate-level modification. In this paper, 128-bit Regular Linear CSLA, Modified Linear CSLA, Regular Square-root CSLA (SQRT CSLA) and Modified SQRT CSLA architectures are developed and compared. The Regular CSLA is still area-consuming because of its dual Ripple Carry Adder (RCA) structure. To reduce area, the CSLA can be implemented with a single RCA and an add-one circuit instead of dual RCAs. The Regular SQRT CSLA has a smaller area than the Regular Linear CSLA, and likewise the Modified SQRT CSLA has a smaller area than the Modified Linear CSLA. The results and analysis show that the Modified Linear CSLA and Modified SQRT CSLA give better results than the Regular Linear CSLA and Regular SQRT CSLA, respectively. The project aims to implement a high-performance, optimized FPGA architecture: the CSLA is simulated with the Xilinx ISE 12.2 simulator, synthesized with Xilinx PlanAhead 13.4, and implemented on a Virtex-5 FPGA kit.

    Proposed Architecture:

    Advantage:

    The reduced number of gates in this work offers a significant reduction in area. The area of the proposed design decreases for 16-bit, 32-bit, 64-bit and 128-bit sizes, which indicates the success of the method rather than a mere trade-off of delay for area. The Modified CSLA architecture is therefore low-area, simple and efficient for VLSI hardware implementation.

    BLOCK DIAGRAM: 

     

    Modified 16-bit SQRT CSLA

     

    Modified 16-bit Linear CSLA

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] O. J. Bedrij, “Carry-select adder,” IRE Trans. Electron. Comput., pp. 340–344, 1962.

  • Minimum Order Quantity: 1 student

    Error Detection in Majority Logic Decoding of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design and develop “Error Detection in Majority Logic Decoding of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes”.

    ABSTRACT:

    In a recent paper, a method was proposed to accelerate the majority logic decoding of difference set low density parity check codes. This is useful as majority logic decoding can be implemented serially with simple hardware but requires a large decoding time. For memory applications, this increases the memory access time. The method detects whether a word has errors in the first iterations of majority logic decoding, and when there are no errors the decoding ends without completing the rest of the iterations. Since most words in a memory will be error-free, the average decoding time is greatly reduced. In this brief, we study the application of a similar technique to a class of Euclidean geometry low density parity check (EG-LDPC) codes that are one step majority logic decodable. The results obtained show that the method is also effective for EG-LDPC codes. Extensive simulation results are given to accurately estimate the probability of error detection for different code sizes and numbers of errors.
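    The early-detection idea can be sketched in Python with a serial one-step majority-logic decoder. For brevity the sketch uses the small cyclic (7,3) difference-set code, whose three check sums orthogonal on bit 0 are {0,1,3}, {0,2,6} and {0,4,5}, as a stand-in: an EG-LDPC code is decoded the same way but with its own, larger check-sum sets. The decoder cyclically shifts the word, flips the current bit when a majority of its check sums fail, and stops after three iterations if no check sum has failed; the three-iteration threshold and all names are assumptions for illustration.

```python
# Check sums orthogonal on bit 0 of the cyclic (7,3) difference-set code
# (stand-in for an EG-LDPC code, which only changes these sets).
CHECKS = [(0, 1, 3), (0, 2, 6), (0, 4, 5)]


def mld_decode(word, checks=CHECKS, early_iters=3):
    """Serial one-step majority-logic decoding with early error detection.

    Returns (decoded_word, iterations_used).
    """
    r = list(word)
    n = len(r)
    error_seen = False
    for t in range(n):
        sums = [sum(r[j] for j in chk) % 2 for chk in checks]
        error_seen = error_seen or any(sums)
        if sum(sums) > len(checks) // 2:   # majority of check sums fail
            r[0] ^= 1                      # correct the bit under test
        r = r[1:] + r[:1]                  # cyclic shift to the next bit
        if t + 1 == early_iters and not error_seen:
            return list(word), early_iters  # no error detected: stop early
    return r, n                            # after n shifts r is realigned


codeword = [0, 0, 1, 0, 1, 1, 1]
received = [0, 0, 1, 0, 1, 0, 1]           # single error in bit 5
print(mld_decode(codeword))                # ([0, 0, 1, 0, 1, 1, 1], 3)
print(mld_decode(received))                # ([0, 0, 1, 0, 1, 1, 1], 7)
```

    Because most words read from a memory are error-free, most reads finish after the short early-detection phase instead of all n iterations, which is where the average decoding-time saving comes from.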

     

     

     

     

     

    BLOCK DIAGRAM:

     

    Fig: Serial one-step majority logic decoder for the EG-LDPC code.

    Tools:

    Xilinx ISE 9.2i, ModelSim 6.4c.

    Application Advantages:

    · The objective is to reduce the decoding time by stopping the decoding process when no errors are detected.

    · This project extends the technique recently presented for DS-LDPC codes, making modified one-step majority logic decoding more attractive for memory applications.

    · The designer now has a larger choice of word lengths and error correction capabilities.

    References:

    · R. C. Baumann, “Radiation-induced soft errors in advanced semiconductor technologies,” IEEE Trans. Device Mater. Reliab., vol. 5, no. 3, pp. 301–316.

    · M. A. Bajura, Y. Boulghassoul, R. Naseer, S. DasGupta, A. F. Witulski, J. Sondeen, S. D. Stansberry, J. Draper, L. W. Massengill, and J. N. Damoulakis, “Models and algorithmic limits for an ECC-based approach to hardening sub-100-nm SRAMs,” IEEE Trans. Nucl. Sci., vol. 54, no. 4, pp. 935–945.

  • Minimum Order Quantity: 1 student

    FPGA Design and Implementation of 32 Bit Unsigned Multiplier Using CLAA and CSLA

    Approx. Rs 10,000


    AIM:

    The main aim of the project is to design “32 Bit Unsigned Multiplier Using CLAA and CSLA”.

    (ABSTRACT)

    This project compares the VLSI design of a carry look-ahead adder (CLAA) based 32-bit unsigned integer multiplier with that of a carry select adder (CSLA) based 32-bit unsigned integer multiplier. Both multipliers take two 32-bit unsigned integers and produce a 64-bit product. The CLAA-based multiplier takes 99 ns to perform a multiplication, and the CSLA-based multiplier needs nearly the same delay. However, the CSLA-based multiplier needs less area than the CLAA-based multiplier to complete the multiplication. These multipliers are implemented with Altera Quartus II, and the timing diagrams are viewed with AvanWaves.

     

    Proposed Architecture:

    The bit size can be increased to 64 bits so that more operations can be performed.

    Advantage:

    Using the CSLA improves the overall performance of the multiplier: the CSLA-based 32-bit unsigned parallel multiplier achieves a lower area-delay product than the CLAA-based 32-bit unsigned parallel multiplier.

    BLOCK DIAGRAM: 

     

    Carry Look-Ahead Adder

     

    Carry Select Adder

     

    A partial schematic of the multiplier

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] P. Asadi and K. Navi, “A novel high-speed 54×54 bit multiplier,” Am. J. Applied Sci., vol. 4, no. 9, pp. 666–672, 2007.

    FPGA Architecture for OFDM Software Defined Radio with an optimized Direct Digital Frequency Synthesizer

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “FPGA Architecture for OFDM Software Defined Radio with an optimized Direct Digital Frequency Synthesizer”.

    (ABSTRACT)

    A Software Defined Radio (SDR) is a transmitter and receiver system that uses digital signal processing (DSP) for coding, decoding, modulating and demodulating data. This paper presents a framework for the hardware implementation of an SDR using Orthogonal Frequency Division Multiplexing (OFDM). The framework comprises the VLSI mapping of algorithms for OFDM, Quadrature Phase Shift Keying (QPSK), the Fast Fourier Transform (FFT) and, most importantly, Direct Digital Frequency Synthesis (DDFS). A digital frequency synthesizer with optimized time and area resources is proposed for the SDR. This VLSI implementation of the DDFS computes the sine and cosine functions on a single clock edge, which makes it efficient in both area and speed. The fixed-point implementation was done with the Xilinx simulator, Verilog HDL was used to describe the algorithms, and a Xilinx Spartan-3 XC3S200 Field Programmable Gate Array (FPGA) was chosen as the hardware platform for the system implementation.

    Proposed Architecture:

    In this architecture, the PSK block can be implemented as a QPSK block.

    Advantage:

    The proposed design has an area- and speed-optimized architecture for Direct Digital Frequency Synthesis, one of the backbones of SDR. The proposed framework consumes less silicon area and is realizable on an FPGA. The required memory resources are very small, as only two 32-bit registers are used in the architecture. Furthermore, the design may be optimized into a more intelligent framework that helps realize a more ideal Cognitive Radio.

    BLOCK DIAGRAM:

     

    Block Diagram of OFDM Software Defined Radio

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] W. Tuttlebee, Software Defined Radio: Enabling Technologies. John Wiley and Sons, 2002.

    [2] X. Qi, L. Xiao, and S. Zhou, “A novel GPP-based Software-Defined Radio architecture,” in 7th International ICST Conference on Communications and Networking in China (CHINACOM), 2012, pp. 838–842.

  • Minimum Order Quantity: 1 student

    Global Built-In Self-Repair for 3D Memories with Redundancy Sharing and Parallel Testing

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Global Built-In Self-Repair for 3D Memories with Redundancy Sharing and Parallel Testing”.

    (ABSTRACT)

    3D integration is a promising technology that provides high memory bandwidth, reduced power, shorter latency and a smaller form factor. Among the many issues in 3D IC design and production, testing remains one of the major challenges. This paper introduces a new design-for-test technique called 3D-GESP, an efficient Built-In Self-Repair (BISR) algorithm that fulfils the test and reliability needs of 3D-stacked memories. Instead of the local testing and redundancy allocation used by most current BISR techniques, we introduce a global 3D BISR scheme, which not only enables redundancy sharing but also parallelizes the BISR procedure across all the stacked layers of a 3D memory. Our simulation results show that the proposed technique significantly increases the memory repair rate and reduces the test time.

    Advantage:

    3D-GESP is a truly global BISR technique that enables global redundancy sharing and parallel testing. The experimental results show that 3D-GESP achieves a 27.01% higher repair rate than local BISR, and 8.26% higher than another global algorithm, MESP. In addition, the scheme requires only 1/n of the testing time of the traditional BISR procedure, where n is the number of stacked layers of the 3D memory. Therefore, the scheme can significantly improve the manufacturing yield, repair rate, and testing throughput of 3D die-stacked memories.

    BLOCK DIAGRAM:

     

    PROPOSED WORK:

    The built-in redundancy-analysis (BIRA) module is a key component of the BISR circuit. In this work, the BIRA scheme is modified for random access memories (RAMs) with 3D redundancy to improve the yield of RAMs with cluster faults. A RAM with 3D redundancy is equipped with spare rows, spare columns, and spare IOs. The proposed BIRA scheme can also be made programmable so that it serves multiple RAMs and supports multiple-time repair, further increasing the repair rate.
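    The benefit of sharing spares can be illustrated with a small behavioral model. The Python sketch below is not the 3D-GESP or BIRA algorithm: it assumes a hypothetical fault map of (row, column) pairs per layer, considers only spare rows and columns (spare IOs omitted), uses exhaustive search that is practical only for a handful of faults, and assumes for the global case that any spare in the stack can replace a faulty line in any layer. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical fault-map format: a set of (row, col) faulty cells for one layer.


def repair_costs(faults, max_spare_rows):
    """For k = 0..max_spare_rows, return (k, fewest spare columns needed)."""
    faulty_rows = sorted({r for r, _ in faults})
    costs = []
    for k in range(min(max_spare_rows, len(faulty_rows)) + 1):
        # Exhaustive search: replace k faulty rows, cover the rest with columns.
        best = min(
            len({c for r, c in faults if r not in chosen})
            for chosen in combinations(faulty_rows, k)
        )
        costs.append((k, best))
    return costs


def local_repairable(layers, spare_rows, spare_cols):
    """Each layer may use only its own spare rows and columns."""
    return all(
        any(k <= spare_rows and c <= spare_cols for k, c in repair_costs(f, spare_rows))
        for f in layers
    )


def global_repairable(layers, spare_rows, spare_cols):
    """All layers draw from one shared pool of spares (redundancy sharing)."""
    pool_rows = spare_rows * len(layers)
    pool_cols = spare_cols * len(layers)
    reachable = {(0, 0)}  # (spare rows used, spare columns used) so far
    for f in layers:
        options = repair_costs(f, pool_rows)
        reachable = {
            (ur + k, uc + c)
            for ur, uc in reachable
            for k, c in options
            if ur + k <= pool_rows and uc + c <= pool_cols
        }
    return bool(reachable)


# Two stacked layers with one spare row and one spare column each.
layer0 = {(0, 0), (1, 1), (2, 2)}  # three faults on distinct rows and columns
layer1 = {(5, 5)}                  # a single fault
print(local_repairable([layer0, layer1], 1, 1))   # False
print(global_repairable([layer0, layer1], 1, 1))  # True
```

    In the example, the first layer needs three spares but owns only two, so local repair fails, while the shared pool of two spare rows and two spare columns covers both layers.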

     

    TOOLS: Xilinx ISE 9.2i and ModelSim 6.4c

    REFERENCE:

    [1] S. Bahl, “A Sharable Built-in Self-repair for Semiconductor Memories with 2-D Redundancy Scheme,” in 22nd IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2007, pp. 331–339.

  • Minimum Order Quantity: 1 student

    High Performance Hardware Implementation of AES Using Minimal Resources
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “High Performance Hardware Implementation of AES Using Minimal Resources”.

    ABSTRACT:

    The increasing need for data protection in computer networks has led to the development of several cryptographic algorithms, since sending data securely over a transmission link is critically important in many applications. Hardware implementations of cryptographic algorithms are more physically secure than software implementations because outside attackers cannot modify them. To achieve higher performance in today’s heavily loaded communication networks, hardware implementation is a wise choice in terms of speed and reliability. This paper presents a hardware implementation of the Advanced Encryption Standard (AES) algorithm on a Xilinx Virtex-5 Field Programmable Gate Array (FPGA). To achieve higher speed and smaller area, the SubBytes, InvSubBytes, MixColumns, and InvMixColumns operations are designed as Look-Up Tables (LUTs) and Read-Only Memories (ROMs). This approach gives a throughput of 3.74 Gbps while using only 1% of the total slices in the xc5vlx110t-3ff1136 target device.
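    The LUT/ROM approach works because SubBytes and the constant multiplications inside MixColumns are fixed byte-to-byte functions that can be precomputed. The Python sketch below generates those table contents from their FIPS-197 definitions (GF(2^8) inverse plus affine transform for the S-box; multiplication by 2, 3, 9, 11, 13 and 14 for MixColumns and InvMixColumns) and applies them to one state column. It models the table contents only, not the paper's Virtex-5 datapath; the helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a):
    """Multiplicative inverse as a^254 (AES maps 0 to 0)."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:  # square-and-multiply
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(x):
    """SubBytes: GF(2^8) inverse followed by the affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Contents that a LUT/ROM implementation would store.
SBOX = [sbox_entry(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x
MUL = {k: [gf_mul(x, k) for x in range(256)] for k in (2, 3, 9, 11, 13, 14)}


def mix_column(col):
    """MixColumns on one 4-byte column, using only table lookups and XORs."""
    a0, a1, a2, a3 = col
    m2, m3 = MUL[2], MUL[3]
    return [
        m2[a0] ^ m3[a1] ^ a2 ^ a3,
        a0 ^ m2[a1] ^ m3[a2] ^ a3,
        a0 ^ a1 ^ m2[a2] ^ m3[a3],
        m3[a0] ^ a1 ^ a2 ^ m2[a3],
    ]


def inv_mix_column(col):
    """InvMixColumns on one column (coefficients 14, 11, 13, 9)."""
    a0, a1, a2, a3 = col
    m9, m11, m13, m14 = MUL[9], MUL[11], MUL[13], MUL[14]
    return [
        m14[a0] ^ m11[a1] ^ m13[a2] ^ m9[a3],
        m9[a0] ^ m14[a1] ^ m11[a2] ^ m13[a3],
        m13[a0] ^ m9[a1] ^ m14[a2] ^ m11[a3],
        m11[a0] ^ m13[a1] ^ m9[a2] ^ m14[a3],
    ]


print(hex(SBOX[0x53]))                                          # 0xed
print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```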

    Proposed method

    In the base paper, the architecture is implemented for 128-bit keys; it can be extended to 192-bit or 256-bit keys.

    BLOCK DIAGRAM:

     

    Fig: Proposed architecture of encryption module.

     

    Fig: Proposed architecture of Decryption module.

    TOOLS:

    Xilinx ISE 9.2i, ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · The proposed design provides high-speed encryption and is therefore suitable for various applications.

    · Because of its low area utilization, the proposed design can also be embedded in larger designs.

    REFERENCES:

    · M. Goswami and S. Kannojiya, “High Performance FPGA Implementation of AES Algorithm with 128-Bit Keys,” Proc. IEEE Int. Conf. Advances Computing Comm., vol. 1, Himarpur, India, 2011, pp.281-286.

  • Minimum Order Quantity: 1 student

    High Performance Pipelined Design for FFT Processor
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “High Performance Pipelined Structure for FFT Processor based on FPGA”.

    (ABSTRACT)

    A radix-2 pipelined FFT processor based on a Field Programmable Gate Array (FPGA) for Wireless Local Area Networks (WLAN) is proposed in this paper. The paper concentrates on the development of a Fast Fourier Transform (FFT) based on the radix-2 Decimation-In-Time (DIT) algorithm. The design is described in VHDL and synthesized with the Xilinx Synthesis Tool for a Spartan kit. The FFT input was applied from a PS/2 keyboard using a test bench, and the output was displayed as waveforms in Xilinx Design Suite 12.1. The synthesis results show that the 32-point FFT computation is efficient in terms of speed. The implementation targets an FPGA because an FPGA can achieve higher computing speed than digital signal processors and can cost-effectively reach ASIC-like performance with lower development time and risk.
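    The radix-2 DIT algorithm can be written as a short Python reference model, which is useful for checking a VHDL design's outputs. It reorders the input in bit-reversed order and then runs log2(N) butterfly stages (five for the 32-point case), using a precomputed table of twiddle factors W_N^k = exp(-2πjk/N). This is an illustrative behavioral model, not the paper's pipelined hardware; the function names are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2_dit(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # DIT consumes the input in bit-reversed order.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    # Twiddle factors W_N^k, precomputed once rather than computed per butterfly.
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    size = 2
    while size <= n:  # one pass per butterfly stage
        half, step = size // 2, n // size
        for start in range(0, n, size):
            for k in range(half):
                t = w[k * step] * a[start + k + half]
                u = a[start + k]
                a[start + k] = u + t         # butterfly upper output
                a[start + k + half] = u - t  # butterfly lower output
        size *= 2
    return a
```

    For example, `fft_radix2_dit([1] + [0] * 31)` returns 32 values equal to 1, the spectrum of a unit impulse.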

    Proposed Architecture:

    Advantage:

     

    Instead of being stored in a traditional ROM, the twiddle factors in the pipelined FFT processor can be accessed directly. With the FPGA implementation, the processor achieves higher throughput with lower area and latency.

    BLOCK DIAGRAM:

     

    Radix-2 Decimation-in-Time FFT Algorithm for a Length-32 Signal

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] S. N. Kherde and M. Hasamnis, “Efficient Design and Implementation of FFT,” International Journal of Engineering Science and Technology (IJEST), ISSN: 0975-5462, NCICT Special Issue, Feb. 2011.

    [2] A. Saeed, M. Elbably, G. Abdelfadeel, and M. I. Eladawy, “Efficient FPGA Implementation of FFT/IFFT Processor,” International Journal of Circuits, Systems and Signal Processing, vol. 3, issue 3, 2009.

  • Minimum Order Quantity: 1 student

    High Performance Pipelined Design for FFT Processor based on FPGA
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “High Performance Pipelined Structure for FFT Processor based on FPGA”.

    (ABSTRACT)

    A radix-2 pipelined FFT processor based on a Field Programmable Gate Array (FPGA) for Wireless Local Area Networks (WLAN) is proposed in this paper. The paper concentrates on the development of a Fast Fourier Transform (FFT) based on the radix-2 Decimation-In-Time (DIT) algorithm. The design is described in VHDL and synthesized with the Xilinx Synthesis Tool for a Spartan kit. The FFT input was applied from a PS/2 keyboard using a test bench, and the output was displayed as waveforms in Xilinx Design Suite 12.1. The synthesis results show that the 32-point FFT computation is efficient in terms of speed. The implementation targets an FPGA because an FPGA can achieve higher computing speed than digital signal processors and can cost-effectively reach ASIC-like performance with lower development time and risk.

    Proposed Architecture:

    The radix-2 architecture can be replaced with a split-radix architecture, and a floating-point version of the algorithm is also implemented.
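    The split-radix alternative can be modeled in Python as a recursive reference. A length-N DFT is split into one N/2-point DFT of the even samples and two N/4-point DFTs of the samples x[4m+1] and x[4m+3], which needs fewer nontrivial twiddle multiplications than plain radix-2. Python's complex type is double-precision floating point, so the sketch also stands in for the floating-point variant mentioned above; it is a behavioral model under those assumptions, not a hardware description, and the function name is hypothetical.

```python
import cmath


def fft_split_radix(x):
    """Recursive split-radix FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    u = fft_split_radix(x[0::2])   # N/2-point DFT of the even samples
    z = fft_split_radix(x[1::4])   # N/4-point DFT of x[4m+1]
    zp = fft_split_radix(x[3::4])  # N/4-point DFT of x[4m+3]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        w1 = cmath.exp(-2j * cmath.pi * k / n)      # W_N^k
        w3 = cmath.exp(-2j * cmath.pi * 3 * k / n)  # W_N^(3k)
        s = w1 * z[k] + w3 * zp[k]
        d = w1 * z[k] - w3 * zp[k]
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] - 1j * d      # multiplication by -j is trivial
        out[k + 3 * q] = u[k + q] + 1j * d  # multiplication by +j is trivial
    return out
```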

     

    Advantage:

    Instead of being stored in a traditional ROM, the twiddle factors in the pipelined FFT processor can be accessed directly. With the FPGA implementation, the processor achieves higher throughput with lower area and latency.

    BLOCK DIAGRAM:

     

    Radix-2 Decimation-in-Time FFT Algorithm for a Length-32 Signal

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] S. N. Kherde and M. Hasamnis, “Efficient Design and Implementation of FFT,” International Journal of Engineering Science and Technology (IJEST), ISSN: 0975-5462, NCICT Special Issue, Feb. 2011.

  • Minimum Order Quantity: 1 student

    High-Performance Hardware Implementation for RC4 Stream Cipher
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “High-Performance Hardware Implementation for RC4 Stream Cipher”.

    (ABSTRACT)

    RC4 is the most popular stream cipher in the domain of cryptology. In this paper, we present a systematic study of the hardware implementation of RC4 and propose the fastest known architecture for the cipher. We combine hardware pipelining and loop unrolling to design an architecture that produces 2 RC4 keystream bytes per clock cycle. We have optimized and implemented the proposed design using a Verilog description, synthesized with 130, 90, and 65 nm fabrication technologies at clock frequencies

    Proposed Architecture:

    Generalized PRNG block with two-way loop unrolling and two pipeline stages.
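    The two-bytes-per-cycle idea can be illustrated in software. The Python sketch below implements the standard RC4 key-scheduling algorithm (KSA) and keystream generator (PRGA), plus a version of the generator unrolled by two so that each loop pass emits two bytes. It does not model the paper's pipeline or how the hardware resolves the dependency between the two swaps; the function names are hypothetical.

```python
def rc4_ksa(key):
    """Key-scheduling algorithm: permute S = 0..255 under the key bytes."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(key, n):
    """Reference PRGA: one keystream byte per iteration."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def rc4_keystream_unrolled2(key, n):
    """PRGA unrolled by two: each loop pass emits two keystream bytes."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    while len(out) < n:
        # First half of the unrolled step.
        i1 = (i + 1) & 0xFF
        j1 = (j + s[i1]) & 0xFF
        s[i1], s[j1] = s[j1], s[i1]
        z1 = s[(s[i1] + s[j1]) & 0xFF]
        # Second half reads the state already modified by the first swap;
        # this is the data dependency a 2-byte/cycle datapath must handle.
        i2 = (i1 + 1) & 0xFF
        j2 = (j1 + s[i2]) & 0xFF
        s[i2], s[j2] = s[j2], s[i2]
        z2 = s[(s[i2] + s[j2]) & 0xFF]
        out += bytes((z1, z2))
        i, j = i2, j2
    return bytes(out[:n])


print(rc4_keystream(b"Key", 10).hex().upper())  # EB9F7781B734CA72A719
```

    The printed value is a commonly cited RC4 test vector for the key "Key". The unrolled generator must produce exactly the same bytes as the reference, which makes the pair a convenient golden model when verifying an unrolled hardware design.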

    Advantage:

    The alleged RC4 has been dominant in the arena of stream ciphers since its advent in 1987 and has earned its reputation as the most popular stream cipher to date. In both academia and industry, RC4 has been used in numerous forms in a majority of cryptographic solutions based on stream ciphers. The algorithm for the cipher is intriguingly simple, and it can easily be implemented within a few lines of code. It is rumored in the cryptographic community that an efficient RC4 software implementation can produce keystream bytes at a rate of three cycles per byte … into the hardware architecture for optimizing the area without compromising the runtime performance.

    BLOCK DIAGRAM:

     

    Fig: Circuit of the proposed RC4 architecture

     

    TOOLS: Xilinx

    REFERENCE:

    [1] Software Performance Results from the eSTREAM Project, eSTREAM, the ECRYPT Stream Cipher Project, http://www.ecrypt.eu.org/stream/perf/#results, 2012.

    [2] The Current eSTREAM Portfolio, eSTREAM, the ECRYPT Stream Cipher Project, http://www.ecrypt.eu.org/stream/index.html, 2012.

    [3] S.R. Fluhrer and D.A. McGrew, “Statistical Analysis of the Alleged RC4 Keystream Generator,” Proc. Seventh Int’l Workshop Fast Software Encryption (FSE ’00), vol. 1978, pp. 19-30, 2000.

    [4] S.R. Fluhrer, I. Mantin, and A. Shamir, “Weaknesses in the Key Scheduling Algorithm of RC4,” Proc. Eighth Ann. Int’l Workshop Selected Areas in Cryptography (SAC ’01), vol. 2259, pp. 1-24, 2001.

  • Minimum Order Quantity: 1 student

    Implementation of I2C Master Bus Controller on FPGA
    Approx. Rs 10,000 / No

    AIM:

    The main aim of the project is to design “I2C Master Bus Controller on FPGA”.

    (ABSTRACT)

    This paper implements serial data communication using an I2C (Inter-Integrated Circuit) master bus controller on a field programmable gate array (FPGA). The I2C master bus controller was interfaced with a Maxim DS1307, which acts as a slave. The module was designed in Verilog HDL and simulated in ModelSim 10.1c. The design was synthesized using Xilinx ISE Design Suite 14.2. The I2C master initiates data transmission, and the slave responds to it. The controller can be used to interface low-speed peripherals in motherboards, embedded systems, mobile phones, set-top boxes, DVD players, PDAs, or other electronic devices.
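    A single write transaction to the DS1307 can be sketched as the symbol sequence the master controller must produce. The Python model below assumes the DS1307's 7-bit slave address 0x68 and its BCD-coded seconds register at 00h, whose bit 7 is the CH (clock halt) flag. It shows only the bit order, the R/W bit, and the ACK slots, not SCL timing or the Verilog state machine; all names are hypothetical.

```python
# Symbols: "S" = START, "P" = STOP, "A" = ACK clock slot (receiver pulls SDA low),
# 0/1 = data bits on SDA, transmitted most-significant bit first.
DS1307_ADDR = 0x68  # 7-bit slave address of the DS1307


def address_byte(addr7, read):
    """7-bit address followed by the R/W bit (1 = read, 0 = write)."""
    return ((addr7 & 0x7F) << 1) | (1 if read else 0)


def byte_bits(value):
    """The 8 bits of a byte, MSB first."""
    return [(value >> (7 - k)) & 1 for k in range(8)]


def to_bcd(value):
    """DS1307 time registers hold two BCD digits."""
    return ((value // 10) << 4) | (value % 10)


def write_frame(addr7, register, data):
    """Symbol sequence for a master write: address, register pointer, data bytes."""
    frame = ["S"]
    for b in [address_byte(addr7, read=False), register, *data]:
        frame += byte_bits(b) + ["A"]
    frame.append("P")
    return frame


# Set the seconds register (00h) to 45 s, leaving CH (bit 7) at 0 so the clock runs.
frame = write_frame(DS1307_ADDR, 0x00, [to_bcd(45)])
print(len(frame))  # 29: START + 3 x (8 bits + ACK) + STOP
```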

    Proposed Architecture:

    The base paper uses a single master and a single slave. This can be extended to a multi-master, multi-slave configuration (2 masters, 2 slaves).
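    Moving to several masters requires arbitration, which the I2C specification resolves on SDA: the line is wired-AND, and a master that transmits a 1 but reads back a 0 has lost and must stop driving the bus. The Python sketch below models that rule for masters that start sending their address bytes at the same moment. It is a simplified model (clock synchronization and arbitration continuing past the address byte are not shown), and the function name is hypothetical.

```python
def arbitrate(address_bytes):
    """Bit-by-bit I2C arbitration; returns the index of the winning master.

    SDA is wired-AND: the bus carries 0 if any active master drives 0.
    A master whose transmitted bit differs from the bus level drops out.
    """
    active = set(range(len(address_bytes)))
    for k in range(8):  # MSB first
        bits = {m: (address_bytes[m] >> (7 - k)) & 1 for m in active}
        bus = min(bits.values())  # wired-AND of all active drivers
        active = {m for m in active if bits[m] == bus}
        if len(active) == 1:
            return active.pop()
    raise ValueError("identical address bytes: arbitration would continue into the data")


# Master 0 writes to 0x68 (byte 0xD0); master 1 writes to 0x50 (byte 0xA0).
print(arbitrate([0xD0, 0xA0]))  # 1
```

    The master addressing 0x50 wins because its second bit is 0 while the other master's is 1; the losing master would then retry once the bus is free.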

    Advantage:

    Any low-speed peripheral device can be interfaced using the I2C bus protocol, with this controller acting as the master. In the future, this can be implemented as a real-time clock in networks that contain multiple masters and multiple slaves, coordinating the entire system through clock synchronization techniques.

     

    BLOCK DIAGRAM:

     

    Diagram of I2C Master Controller interfaced with DS1307 RTC slave device

     

    TOOLS: Xilinx ISE 12.2

    REFERENCE:

    [1] Philips Semiconductors, “The I2C-Bus Specification,” version 2.1, January 2000.

    [2] Maxim Integrated, “DS1307 64 x 8, Serial, I2C Real-Time Clock,” 2008.

    [3] J. K. Singh, “Design and Implementation of I2C Master Controller on FPGA Using VHDL,” IJET, vol. 4, no. 4, Aug.–Sep. 2012.

    [4] Raj Kamal, Embedded Systems: Architecture, Programming and Design. Tata McGraw-Hill, 2008.

    [5] S. Sutherland, “Verilog® HDL Quick Reference Guide,” IEEE Std 1364-2001.

  • Minimum Order Quantity: 1 No

    Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated Multiple Constant Multiplication/Accumulation
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design and develop “Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated Multiple Constant Multiplication/Accumulation”.

    ABSTRACT:

    Low-cost finite impulse response (FIR) designs are presented using the concept of faithfully rounded truncated multipliers. We jointly consider the optimization of bit width and hardware resources without sacrificing the frequency response and output signal precision. Nonuniform coefficient quantization with a proper filter order is proposed to minimize the total area cost. Multiple constant multiplication/accumulation in a direct FIR structure is implemented using an improved version of truncated multipliers. Comparisons with previous FIR design approaches show that the proposed designs achieve the best area and power results.
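    The core idea of faithful rounding with truncation is that low-order partial-product bits can be discarded as long as the total error stays below one unit in the last place (ulp) of the output. The Python sketch below illustrates that idea on a direct-form FIR with shift-add constant multiplication: it drops the t least significant columns of every partial product, adds the worst-case loss back as a constant, and removes k output bits. It is a simplified stand-in, not the paper's improved truncated multiplier or coefficient optimization; it assumes unsigned samples and coefficients, and all names are hypothetical.

```python
def partial_product_rows(coeffs, samples):
    """One shifted copy of x_i for every set bit of the constant c_i (shift-add MCM)."""
    rows = []
    for c, x in zip(coeffs, samples):
        bit = 0
        while c >> bit:
            if (c >> bit) & 1:
                rows.append(x << bit)
            bit += 1
    return rows


def truncation_columns(num_rows, k):
    """Largest t such that dropping t LSB columns loses less than one output ulp (2**k)."""
    if num_rows == 0:
        return 0
    t = 0
    while num_rows * ((1 << (t + 1)) - 1) < (1 << k):
        t += 1
    return t


def mcma_faithful(coeffs, samples, k):
    """sum(c_i * x_i) with k LSBs removed, computed from truncated partial products."""
    rows = partial_product_rows(coeffs, samples)
    t = truncation_columns(len(rows), k)
    kept = sum((r >> t) << t for r in rows)  # drop bits below weight 2**t
    comp = len(rows) * ((1 << t) - 1)        # worst-case truncation loss, added back
    # kept + comp lies in [exact, exact + comp] with comp < 2**k, so the result
    # differs from exact / 2**k by less than one output ulp (faithful rounding).
    return (kept + comp) >> k


def fir_direct(coeffs, x, k):
    """Direct-form FIR: y[n] = sum_i c_i * x[n - i], faithfully rounded to 2**k."""
    taps = len(coeffs)
    return [
        mcma_faithful(coeffs, [x[n - i] if n >= i else 0 for i in range(taps)], k)
        for n in range(len(x))
    ]
```

    With the coefficients [3, 11, 25, 11, 3] there are 13 partial-product rows, so for k = 6 two columns can be truncated (13 × 3 = 39 < 64) while every output remains within one ulp of the exact value.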

    BLOCK DIAGRAM:

     

    Fig: Overall FIR filter architecture using multiple constant multipliers/ accumulators with faithfully rounded truncation (MCMAT).

    Tools:

    Xilinx ISE 9.2i, ModelSim 6.4c.

    Application Advantages:

    · This brief has presented low-cost FIR filter designs by jointly considering the optimization of coefficient bit width and hardware resources in implementations.

    · Although most prior designs are based on the transposed form, the direct FIR structure with faithfully rounded MCMAT leads to the smallest area cost and power consumption.

    References:

    · M. M. Peiro, E. I. Boemo, and L. Wanhammar, “Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 3, pp. 196–203.

  • Minimum Order Quantity: 1 student


    Low-Power, High-Throughput, and Low-Area Adaptive FIR Filter Based on Distributed Arithmetic
    Approx. Rs 10,000

    AIM:

    The main aim of the project is to design “Low-Power, High-Throughput, and Low-Area Adaptive FIR Filter Based on Distributed Arithmetic”.

    (ABSTRACT)

    This work presents a novel pipelined architecture for a low-power, high-throughput, and low-area implementation of an adaptive filter based on distributed arithmetic (DA). The throughput rate of the proposed design is significantly increased by parallel lookup table (LUT) update and concurrent implementation of the filtering and weight-update operations. The conventional adder-based shift accumulation for DA-based inner-product computation is replaced by conditional signed carry-save accumulation to reduce the sampling period and area complexity. Power consumption is reduced by using a fast bit clock for carry-save accumulation but a much slower clock for all other operations. The design involves the same number of multiplexers, a smaller LUT, and nearly half the number of adders compared to the existing DA-based design. Synthesis results show that, on average, the proposed design consumes less power and has a lower area-delay product (ADP) than the authors' previous DA-based adaptive filter for filter lengths N = 16 and 32.
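    Distributed arithmetic replaces the N multiplications of the inner product with table lookups. For each bit position of the inputs, the bits of P inputs form an address into a 2^P-entry LUT of partial weight sums, and the LUT outputs are shift-accumulated, with the sign-bit slice subtracted for two's-complement inputs. The Python sketch below shows only this filtering computation, with N = 16 and P = 4 as in the figure; the LMS weight update, parallel LUT update, and carry-save accumulation of the proposed design are not modeled, and the names are hypothetical.

```python
def build_da_lut(weights):
    """LUT for one group of P weights: entry m is the sum of w_j over the set bits j of m."""
    return [
        sum(w for j, w in enumerate(weights) if (m >> j) & 1)
        for m in range(1 << len(weights))
    ]


def da_inner_product(weights, xs, bits, p=4):
    """sum(w_i * x_i) by distributed arithmetic for bits-wide two's-complement x_i.

    The inputs are split into groups of p. For each bit position, the bits of a
    group's inputs form the address into that group's 2^p-entry LUT; the LUT
    outputs are added, weighted by 2^bit_pos, and the sign-bit slice is subtracted.
    """
    groups = [(build_da_lut(weights[g:g + p]), g) for g in range(0, len(weights), p)]
    acc = 0
    for bit_pos in range(bits):
        partial = 0
        for lut, g in groups:
            addr = 0
            for j, x in enumerate(xs[g:g + p]):
                addr |= ((x >> bit_pos) & 1) << j  # Python's >> on negatives is arithmetic
            partial += lut[addr]
        weight = -(1 << bit_pos) if bit_pos == bits - 1 else (1 << bit_pos)
        acc += weight * partial
    return acc


# N = 16 taps in groups of P = 4 (four 16-entry LUTs), 8-bit signed samples.
weights = [3, -5, 7, 2, -1, 4, 0, 6, -8, 1, 5, -2, 9, -3, 2, 1]
samples = [((13 * i + 7) % 256) - 128 for i in range(16)]
print(da_inner_product(weights, samples, 8) == sum(w * x for w, x in zip(weights, samples)))  # True
```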

    Proposed Architecture:

    Advantage:

    An efficient pipelined architecture is presented for a low-power, high-throughput, and low-area implementation of a DA-based adaptive filter. The throughput rate is significantly enhanced by parallel LUT update and concurrent processing of the filtering and weight-update operations. A carry-save accumulation scheme for signed partial inner products is also proposed for computing the filter output. The synthesis results show that, on average, the proposed design consumes less power and has a lower ADP than the authors' previous DA-based FIR adaptive filter for filter lengths N = 16 and 32.

    BLOCK DIAGRAM: 

     

    Fig: Proposed structure of the DA-based LMS adaptive filter of length N = 16 and P = 4.

    TOOLS: Xilinx

    REFERENCE:

    [1] S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ, USA: Wiley, 2003.

    [2] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE ASSP Mag., vol. 6, no. 3, pp. 4–19, Jul. 1989.

    [3] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, “LMS adaptive filters using distributed arithmetic for high throughput,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1327–1337, Jul. 2005.

    [4] R. Guo and L. S. DeBrunner, “Two high-performance adaptive filter implementation schemes using distributed arithmetic,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 9, pp. 600–604, Sep. 2011.


    Novel Method of Digital Clock Frequency Multiplication and Division Using Floating Point Arithmetic
    Approx. Rs 10,000

    AIM:

    The main aim of the project is to design “Novel method of digital clock frequency multiplication and division using floating point arithmetic”.

    ABSTRACT:

    A digital clock frequency multiplier and divider using floating-point arithmetic, which generates the output clock with almost zero frequency error, is presented. The circuit has an unbounded multiplication and division factor range and a low lock time. A low-power mechanism is incorporated to keep the overall power consumption of the circuit low. The circuit was designed in a TSMC 65 nm CMOS process for an input reference time of 0.01 ns and was verified with random multiplication factor values.
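    The abstract does not detail the circuit, but the benefit of computing the output period in floating point can be illustrated with a behavioral model. The Python sketch below assumes that the input period is measured as an integer number of reference ticks (treating the 0.01 ns reference time as the tick resolution is an assumption) and that output edge k is placed at the tick nearest to k times the floating-point output period. For comparison, it also rounds the period itself to whole ticks, which lets the error grow every cycle. The function names are hypothetical.

```python
def edge_ticks_cumulative(t_in_ticks, ratio, n_edges):
    """Place output edge k at the tick nearest to k * (input period / ratio).

    ratio > 1 multiplies the frequency, ratio < 1 divides it. Each edge time is
    rounded independently, so the error stays within about half a tick and
    never accumulates.
    """
    t_out = t_in_ticks / ratio  # floating-point output period, in ticks
    return [round(k * t_out) for k in range(n_edges + 1)]


def edge_ticks_integer_period(t_in_ticks, ratio, n_edges):
    """Baseline: round the period once; its error repeats every cycle."""
    step = round(t_in_ticks / ratio)
    return [k * step for k in range(n_edges + 1)]


# Input period of 1000 ticks, frequency multiplied by 3, 300 output periods.
print(edge_ticks_cumulative(1000, 3.0, 300)[-1])      # 100000 (exactly 100 input periods)
print(edge_ticks_integer_period(1000, 3.0, 300)[-1])  # 99900 (drifted by 100 ticks)
```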

    PROPOSED WORK:

    The base paper implements a single-precision floating-point multiplier; the same architecture can be extended to a double-precision floating-point multiplier, which will be useful for larger architectures.

    BLOCK DIAGRAM:

     

    Fig: Block diagram of the clock frequency multiplier and divider using floating-point arithmetic.

    TOOLS:

    Xilinx ISE 9.2i, ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · The proposed clock frequency multiplier and divider has a shorter lock time than the programmable digital frequency multiplier.

    · The new clock frequency multiplier and divider improves the accuracy of frequency multiplication and division by using floating-point division and multiplication algorithms.

    REFERENCES:

     

    · Sanjay K. Wadhwa, Deeya Muhury, & Krishna Thakur, “Programmable digital frequency multiplier”, IEEE Computer Society, VLSID’07.


    Pipelined Radix-2^k Feedforward FFT Architectures
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design and develop “Pipelined Radix-2^k Feedforward FFT Architectures”.

    ABSTRACT:

    The appearance of radix-2^2 was a milestone in the design of pipelined FFT hardware architectures. Later, radix-2^2 was extended to radix-2^k. However, radix-2^k was only proposed for single-path delay feedback (SDF) architectures, not for feedforward ones, also called multi-path delay commutator (MDC). This paper presents radix-2^k feedforward (MDC) FFT architectures. In feedforward architectures, radix-2^k can be used for any number of parallel samples that is a power of two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be used. In addition, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-2^k feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-2^k feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.
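    The radix-2^2 decomposition that these architectures build on can be checked numerically. Following He and Torkelson's formulation, the indexes are split as n = (N/2)n1 + (N/4)n2 + n3 and k = k1 + 2k2 + 4k3, so each step needs only the trivial factors (-1)^k1 and (-j)^(k1+2k2) before a single general twiddle W_N^(n3(k1+2k2)). The recursive Python model below is a numerical reference for that DIF decomposition only; it does not describe the feedforward (MDC) datapath, and the function name is hypothetical.

```python
import cmath

# (-j)^m for m = 0..3: in hardware these are only swaps and negations.
MINUS_J_POW = [1, -1j, -1, 1j]


def fft_radix22_dif(x):
    """Recursive radix-2^2 DIF FFT; len(x) must be a power of two.

    H(n3, k1, k2) = [x(n3) + (-1)^k1 x(n3 + N/2)]
                    + (-j)^(k1 + 2k2) [x(n3 + N/4) + (-1)^k1 x(n3 + 3N/4)]
    X(k1 + 2k2 + 4k3) = DFT_{N/4}{ H(n3, k1, k2) * W_N^(n3 (k1 + 2k2)) }(k3)
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        sign = -1 if k1 else 1  # (-1)^k1
        for k2 in (0, 1):
            m = k1 + 2 * k2
            g = []
            for n3 in range(q):
                h = (x[n3] + sign * x[n3 + 2 * q]) + MINUS_J_POW[m] * (
                    x[n3 + q] + sign * x[n3 + 3 * q]
                )
                # Only this twiddle needs a general complex multiplier.
                g.append(h * cmath.exp(-2j * cmath.pi * n3 * m / n))
            sub = fft_radix22_dif(g)  # N/4-point DFT over n3
            for k3 in range(q):
                out[m + 4 * k3] = sub[k3]
    return out
```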

     

    Existing method: The inputs and outputs are represented in integer format, which is not always appropriate.

     

    BLOCK DIAGRAM:

     

    Fig: Flow graph of the 16-point radix-2^2 DIF FFT.

    Tools:

    Xilinx ISE 9.2i, ModelSim 6.4c.

    Application Advantages:

    · In feedforward architectures, radix-2^k can be used for any number of parallel samples that is a power of two. Indeed, the number of parallel samples can be chosen arbitrarily depending on the required throughput.

    · Additionally, both DIF and DIT decompositions can be used.

    · The designs are efficient in both area and performance, making it possible to obtain throughputs on the order of GSamples/s as well as very low latencies.

    References:

    · L. Yang, K. Zhang, H. Liu, J. Huang, and S. Huang, “An efficient locally pipelined FFT processor,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 7, pp. 585–589.

    · H. L. Groginsky and G. A. Works, “A pipeline fast Fourier transform,” IEEE Trans. Comput., vol. C-19, no. 11, pp. 1015–1019.

    · A. M. Despain, “Fourier transform computers using CORDIC iterations,” IEEE Trans. Comput., vol. C-23, pp. 993–1001.

    · S. He and M. Torkelson, “Design and implementation of a 1024-point pipeline FFT processor,” in Proc. IEEE Custom Integr. Circuits Conf., pp. 131–134.

  • Minimum Order Quantity: 1 student

    Radix-4 and Radix-8 Booth Encoded Multi-Modulus Multipliers
    Approx. Rs 10,000

    AIM:

    The main aim of the project is to design “Radix-4 and Radix-8 Booth Encoded Multi-Modulus Multipliers”.

    (ABSTRACT)

    Novel multi-modulus designs capable of performing the desired modulo operation for more than one modulus in the Residue Number System (RNS) are explored in this paper to lower the hardware overhead of residue multiplication. Two multi-modulus multipliers that reuse the hardware resources among the modulo 2^n - 1, modulo 2^n, and modulo 2^n + 1 multipliers, by virtue of their analogous number-theoretic properties, are proposed. The former employs the radix-2^2 Booth encoding algorithm and the latter employs the radix-2^3 Booth encoding algorithm. In the proposed radix-2^2 and radix-2^3 Booth encoded multi-modulus multipliers, the modulo-reduced products for the moduli 2^n - 1, 2^n, and 2^n + 1 are computed successively. On the basis of the radix-2^2 and radix-2^3 Booth encoded modulo 2^n - 1 and modulo 2^n + 1 multiplier architectures, new Booth encoded modulo 2^n multipliers are proposed to maximally share the hardware resources in the multi-modulus architectures. … based RNS multiplication show that the proposed radix-2^2 and radix-2^3 Booth encoded multi-modulus multipliers save area over the corresponding single-modulus multipliers. In the worst case, the proposed radix-2^2 and radix-2^3 Booth encoded multi-modulus multipliers increase the delay of the corresponding single-modulus multipliers by … and …, respectively. Compared to the single-modulus multipliers, the proposed multi-modulus multipliers incur a minor power dissipation
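    The sharing idea can be illustrated with a small numerical model. The Python sketch below recodes the multiplier once with radix-2^2 (radix-4) or radix-2^3 (radix-8) Booth encoding and then reduces the same signed partial products modulo 2^n - 1, 2^n, and 2^n + 1. Radix-8 digits include ±3, which in hardware require the precomputed "hard multiple" 3A. The reduction here uses Python's % operator rather than the end-around-carry adders a real design would use, and the function names are hypothetical.

```python
def booth_digits(b, nbits, r):
    """Radix-2^r Booth recoding of an unsigned multiplier b (b < 2**nbits).

    Digit i inspects bits r*i-1 .. r*i+r-1 of b:
        d_i = -2^(r-1)*b[r*i+r-1] + sum_{t<r-1} 2^t*b[r*i+t] + b[r*i-1]
    so radix-4 (r=2) digits lie in -2..2 and radix-8 (r=3) digits in -4..4.
    """
    def bit(pos):
        return (b >> pos) & 1 if pos >= 0 else 0

    digits = []
    for i in range(nbits // r + 1):  # the extra digit absorbs the implicit 0 sign bit
        base = r * i
        d = -(1 << (r - 1)) * bit(base + r - 1) + bit(base - 1)
        d += sum((1 << t) * bit(base + t) for t in range(r - 1))
        digits.append(d)
    return digits


def multi_modulus_product(a, b, n, r):
    """Reuse one Booth recoding of b for the moduli 2^n - 1, 2^n and 2^n + 1.

    Partial product i is d_i * a * 2^(r*i). In hardware, 2^n leaves remainder 1
    modulo 2^n - 1 and remainder -1 modulo 2^n + 1, which turns these weights into
    (possibly inverted) rotations; here Python's % performs the reduction.
    """
    digits = booth_digits(b, n + 1, r)  # n+1 bits so that b = 2^n (valid mod 2^n + 1) fits
    products = {}
    for m in ((1 << n) - 1, 1 << n, (1 << n) + 1):
        acc = 0
        for i, d in enumerate(digits):
            acc = (acc + d * a * pow(2, r * i, m)) % m
        products[m] = acc
    return products


print(booth_digits(182, 8, 2))              # [-2, 2, -1, -1, 1]
print(multi_modulus_product(13, 11, 4, 3))  # {15: 8, 16: 15, 17: 7}
```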

    Proposed Architecture: The full-adder architecture can be replaced with a 3:2 compressor.

    Advantage: The equivalences in operations central to modulo multiplication, i.e., modulo negation, modulo reduction of binary weight, modulo multiplication by powers of two, and two-operand modulo addition, were demonstrated for the three special moduli 2^n - 1, 2^n, and 2^n + 1. New radix-2^2 and radix-2^3 Booth encoded modulo 2^n multipliers with architectures comparable to those of the corresponding modulo 2^n - 1 and modulo 2^n + 1 multipliers were introduced. With the correlation among modulo 2^n - 1, modulo 2^n, and modulo 2^n + 1 operations as the basis, radix-2^2 and radix-2^3 Booth encoded multi-modulus multipliers

    BLOCK DIAGRAM: 

    Fig: Proposed multi-modulus partial product addition for radix-2^3 Booth encoding.

    TOOLS: Xilinx

    REFERENCE:
    [1] R. Conway and J. Nelson, “Improved RNS FIR filter architectures,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 1, pp. 26–28, Jan. 2004.

    [2] A. S. Madhukumar and F. Chin, “Enhanced architecture for residue number system-based CDMA for high rate data transmission,” IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1363–1368, Sep. 2004.

     


    Soft-Error-Resilient FPGAs Using Built-In 2-D Hamming Product Code
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Soft-Error-Resilient FPGAs Using Built-In 2-D Hamming Product Code”.

    ABSTRACT:

    Radiation-induced soft error rate (SER) degrades the reliability of static random access memory (SRAM)-based field programmable gate arrays (FPGAs). This paper presents a new built-in 2-D Hamming product code (2-D HPC) scheme to provide reliable operation of SRAM-based FPGAs in hostile operating environments such as space. The multibit error correction capability of the built-in 2-D HPC can improve reliability, and hence system availability, by orders of magnitude. Simulation results show that the large error correction capability of 2-D HPC can recover configuration bits without depending on an external memory that preserves a golden copy of the configuration bits. To provide efficient 2-D HPC in built-in logic, a new 2-D SRAM buffer is also proposed. Using the proposed multibit error correction scheme, system availability of an SRAM-based FPGA can be more than 99.9999999% with SRAM cell failures in 1 billion h of operation of 7.
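    The benefit of coding in two dimensions can be shown with a toy product code. The Python sketch below protects a 4x4 block of bits with Hamming(7,4) along the rows and again along every column, then decodes by correcting each column and afterwards each row. Two errors in one row defeat row-only single-error correction, but each affected column sees only one error and repairs it. The code parameters are assumptions chosen for brevity; the paper's actual code sizes and its 2-D SRAM buffer are not modeled, and the names are hypothetical.

```python
def hamming74_encode(d):
    """Hamming(7,4): data at positions 3, 5, 6, 7; parity at 1, 2, 4 (1-based)."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def hamming74_correct(word):
    """Single-error correction: the syndrome is the XOR of the positions of the 1 bits."""
    c = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    if syndrome:
        c[syndrome] ^= 1
    return c[1:]


def hamming74_data(word):
    """Extract the data bits (positions 3, 5, 6, 7)."""
    return [word[2], word[4], word[5], word[6]]


def hpc_encode(data):
    """4x4 data bits -> 7x7 product codeword (rows coded first, then every column)."""
    rows = [hamming74_encode(r) for r in data]
    cols = [hamming74_encode([rows[i][j] for i in range(4)]) for j in range(7)]
    return [[cols[j][i] for j in range(7)] for i in range(7)]


def hpc_decode(grid):
    """Correct every column, then every row, and extract the 4x4 data block."""
    g = [list(r) for r in grid]
    for j in range(7):
        fixed = hamming74_correct([g[i][j] for i in range(7)])
        for i in range(7):
            g[i][j] = fixed[i]
    g = [hamming74_correct(r) for r in g]
    return [hamming74_data(g[i]) for i in (2, 4, 5, 6)]  # rows holding data bits


data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
cw = hpc_encode(data)
bad = [r[:] for r in cw]
bad[2][1] ^= 1  # two upsets in the same row
bad[2][5] ^= 1
print(hpc_decode(bad) == data)  # True
```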

    BLOCK DIAGRAM:

     

    Fig: Proposed FPGA architecture with built-in 2-D HPC and 2-D buffer design.

    PROPOSED WORK:

    2D HPC with Enhanced Detection can be used in FPGAs for mission-critical applications such as satellite systems. In a satellite system, the FPGA performs programmed functions while interacting with other system components. Because SRAM-based FPGAs are susceptible to soft errors, frequent repair processes have to be performed to ensure reliable operation. Repairing FPGAs in such an architecture therefore interrupts multiple system components. In that case, synchronization with other blocks, known as the coherence problem, can further degrade system performance. Therefore, the actual cost of correcting errors in the FPGA and restarting the entire system is much higher than that of repairing the FPGA itself. To prevent frequent system interruptions due to SEUs in the FPGA, 2D HPC with Enhanced Detection is used. An FT-UART designed this way can be used in safety-critical SoC-based applications.

    The 2D HPC has 2D Hamming circuitry that corrects single-bit errors and detects single- and double-bit errors. A disadvantage of this scheme is that it fails to detect adjacent bit errors. To detect double and triple adjacent bit errors, a Selective Bit Placement Strategy is used. Shortened Hamming codes such as (12,8) and (21,16) are used for double adjacent error detection, whereas parity-extended versions of Hamming codes such as (13,8) and (22,16) are used for triple adjacent errors.

    TOOLS:
    Xilinx ISE 9.2i and ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · The simulation results show that 2-D single-bit correction can provide very high multibit error-correction capability.

    · This project also includes an optimized hardware implementation of the proposed scheme using a new 2-D SRAM array.

    · Due to the extremely large multibit error-handling capability of the proposed design, self-healing of SRAM-based FPGAs can be achieved even in harsh radiation environments.

    REFERENCES:

    · A. Lesea et al., “The Rosetta experiment: Atmospheric soft error rate testing in differing technology FPGAs,” IEEE Trans. Device Mater. Rel., vol. 5, no. 3, pp. 317–328.

  • Minimum Order Quantity: 1 student

    Novel Method of Digital Clock Frequency Multiplication and Division Using Floating Point Arithmetic
    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Novel method of digital clock frequency multiplication and division using floating point arithmetic”.

    ABSTRACT:

    A digital clock frequency multiplier and divider using floating-point arithmetic, which generates the output clock with almost zero frequency error, is presented. The circuit has an unbounded multiplication and division factor range and a low lock time. A low-power mechanism is incorporated to keep the overall power consumption of the circuit low. The circuit was designed in a TSMC 65 nm CMOS process for an input reference time of 0.01 ns and was verified with random multiplication factor values.

    PROPOSED WORK:

    The base paper implements a single-precision floating-point multiplier; the same architecture can be extended to a double-precision floating-point multiplier, which will be useful for larger architectures.
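    The benefit of double precision can be illustrated with a behavioral model. The Python sketch below assumes that output edges are timed by repeatedly adding the floating-point output period to an accumulator, and it emulates IEEE-754 single precision with the standard struct module. After many cycles, the single-precision accumulator drifts far from the ideal edge time, while the double-precision one stays close. This is an illustration under that assumption, not the base paper's circuit, and the function names are hypothetical.

```python
import struct
from fractions import Fraction


def to_f32(value):
    """Round a Python float to IEEE-754 single precision (and back)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def last_edge_error(t_in_ticks, ratio, n_edges, single):
    """Time edges by repeated addition of the output period in the chosen precision.

    Returns |accumulated time - ideal time| after n_edges periods, in ticks.
    """
    rnd = to_f32 if single else float
    period = rnd(rnd(t_in_ticks) / rnd(ratio))
    t = 0.0
    for _ in range(n_edges):
        t = rnd(t + period)
    ideal = Fraction(t_in_ticks) * n_edges / Fraction(ratio)  # exact reference
    return abs(float(Fraction(t) - ideal))


print(last_edge_error(1000, 3.0, 100_000, single=True))   # large: float32 accumulator drifts
print(last_edge_error(1000, 3.0, 100_000, single=False))  # tiny: float64 stays close
```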

    BLOCK DIAGRAM:

     

    Fig: Block diagram of the clock frequency multiplier and divider using floating-point arithmetic.

    TOOLS:

    Xilinx ISE 9.2, ModelSim 6.4c.

    APPLICATION ADVANTAGES:

    · The proposed clock frequency multiplier and divider has a shorter lock time than the programmable digital frequency multiplier.

    · The proposed clock frequency multiplier and divider improves the accuracy of frequency multiplication and division by using floating-point multiplication and division algorithms.

    REFERENCES:

    · Sanjay K. Wadhwa, Deeya Muhury, and Krishna Thakur, “Programmable digital frequency multiplier,” IEEE Computer Society, VLSID’07.

    · B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, p. 532.

    · B. Razavi, K. F. Lee, and R. H. Yan, “Design of high-speed, low-power frequency dividers and PLL’s in deep submicron CMOS,” IEEE J. Solid-State Circuits, vol. 30, no. 2, pp. 101–109.

  • Minimum Order Quantity: 1 student

    Design and Evaluation of High-Performance Processing Element

    Approx. Rs 10,000 / student

    AIM:

    The main aim of the project is to design “Design and Evaluation of High-Performance Processing Elements for Reconfigurable Systems”.

    (ABSTRACT)

    This work presents the design and evaluation of two new processing elements for reconfigurable computing. We also present a circuit-level implementation of the data paths in static and dynamic design styles to explore the various performance-power tradeoffs involved. When implemented in an IBM 90-nm CMOS process, the 8-b data paths achieve operating frequencies exceeding 1 GHz for both static and dynamic implementations, with each data path supporting single-cycle computational capability. A novel single-precision floating-point processing element (FPPE) using a 24-b variant of the proposed data paths is also presented. Comparison with competing architectures shows that the FPPE provides two orders of magnitude higher throughput. Furthermore, to evaluate its feasibility as a soft-processing solution, we also map the floating-point unit onto the Virtex 4 and 5 devices, and observe that the unit requires less than of the total logic slices, while utilizing only around 4% of the DSP blocks available. When compared against popular field-programmable-gate-array-based floating-point units, our design on Virtex 5 showed significantly lower resource utilization while achieving comparable peak operating frequency.

    Proposed Architecture:

    A generalized processing element is developed.

    Advantage:

    High-throughput, low-area data path elements for reconfigurable media-processing architectures. When both data paths were implemented using the static, domino, and D3L methodologies over a wide range of operating voltages, the D3L version was found to be superior over most of the operating range.

    BLOCK DIAGRAM:

     

    Fig: Organization of the proposed FPPE.

    TOOLS: Xilinx

    REFERENCE:

    [1] M. Taylor, J. Psota, A. Saraf, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, A. Agarwal, W. Lee, D. Wentzlaff, I. Bratt, B. Greenwald, H. Hoffmann, P. Johnson, and J. Kim, “Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams,” in Proc. 31st Annu. Int. Symp. Comput. Arch., 2004, pp. 2–13.

  • Minimum Order Quantity: 1 student


    Call Us: +91-8071803116
    Nano Scientific Research Centre Pvt Ltd
    K. Aravind Reddy (Business Director)
    4 & 6th Floor, Siri Estates, Opposite To Karur Vysya Bank, Opposite Lane To R. S. Brothers Ameerpet,
    Hyderabad - 500073, Telangana, India




    © Nano Scientific Research Centre Pvt Ltd. All Rights Reserved (Terms of Use)
    Developed and Managed by IndiaMART InterMESH Limited
