programmablelogicZONE Products for the week of April 7, 2008
Xilinx Says…
Virtex-5 FXT FPGAs Deliver the Ultimate in System Integration for Designs that Demand High-Performance Processing and High-Speed Serial I/O
Fourth platform in 65-nm Virtex-5 Series of FPGAs now available with PowerPC 440 Processor Block, GTX high-speed transceivers, and over 190 GMACs DSP performance
Xilinx, Inc. has announced the availability of Virtex-5 FXT devices, the industry’s first FPGAs with embedded PowerPC 440 processor blocks, high-speed RocketIO GTX transceivers and dedicated XtremeDSP processing capabilities. Comprising the fourth platform in the 65nm Virtex-5 family, Virtex-5 FXT devices deliver high performance while enabling designers to reduce system costs, board space and component count. With support from Xilinx and industry-leading providers of logic, embedded and DSP development tools, and IP cores, Virtex-5 FXT FPGAs deliver the ultimate system integration platform for applications in: wired and wireless communications, audio/video broadcast equipment, military, aerospace, industrial systems, and many others.
“The integration of major processing and SERDES components on a single device will be of significant value to designers who need to conserve board space and costs, while meeting stringent requirements for high-performance,” said analyst Will Strauss, President and Founder of Forward Concepts. “In wireless, for example, the kind of base stations a technology like the Virtex-5 FXT platform can enable is enviable, especially in the area of LTE (Long-Term Evolution) basebands in support of 4G communications systems.”
With today’s announcement, Xilinx has completed the introduction of the four domain-optimized Virtex-5 FPGA platforms that comprise the Virtex-5 family. The Virtex-5 family – the first FPGA family to deliver the performance, density, and cost benefits of 65nm – offers unprecedented performance and density gains with speeds on average 30 percent faster and a logic capacity 65 percent greater than previous 90nm generation FPGAs. Xilinx achieved this breakthrough performance while at the same time reducing dynamic power consumption by 35 percent relative to the previous generation devices. The four domain-optimized platforms – LX, LXT, SXT, and FXT – offer a wide range of devices that enable engineers to cost-effectively implement electronics systems by selecting an FPGA that incorporates the optimal mix of resources for their particular design: logic, I/O, and hardened IP blocks for logic-intensive, embedded processing, digital signal processing (DSP), and serial connectivity applications.
Highest performance embedded processing block
The innovative Virtex-5 FXT platform offers the first FPGAs to provide up to two industry-standard PowerPC 440 processor blocks. Each processor, with integrated 32KB instruction and 32KB data caches, delivers up to 1,100 DMIPS at 550 MHz. Tightly coupled to the PowerPC 440 blocks is a new integrated 5x2 crossbar processor interconnect architecture that provides simultaneous access to I/O and memory. Highly integrated, this innovative interconnect architecture includes dedicated master and slave processor local bus interfaces, four DMA ports with separate transmit and receive channels, and a dedicated memory bus interface enabling high-performance, low-latency point-to-point connectivity.
Designers can rapidly and easily implement advanced scalable embedded processing applications using the PowerPC 440 embedded processor blocks. The advanced PLB architecture maximizes data transfers between the processor, crossbar and soft IP logic with high-throughput 128-bit interfaces to help minimize system bottlenecks. Also, the enhanced high-performance Auxiliary Processor Control Unit (APU) provides added connectivity for dedicated co-processing engines or custom user defined instructions in applications such as video processing, 3D data processing and floating-point math.
With the release of EDK 10.1, the PowerPC 440 block in Virtex-5 FXT is supported by industry-standard operating systems from Wind River Systems, Green Hills, and other key embedded OS providers. Linux support is provided through MontaVista and Wind River Systems, with others soon to be added. In addition, Xilinx is actively engaged in the open source Linux community.
Advanced Serial Connectivity
To address the growing demand for higher I/O bandwidth, the Virtex-5 FXT platform includes high-performance, low-power RocketIO GTX transceivers capable of supporting data rates from 500 Mbps to 6.5 Gbps. Customers can design applications supporting standards such as XAUI, Fibre Channel, SONET, Serial RapidIO, PCI Express 1.1 and 2.0, Interlaken, and others. Consuming less than 200 mW typical power per channel at 6.5 Gbps, the GTX transceivers come with many advanced features, such as 4-tap DFE receiver equalization in addition to linear equalization and transmit pre-emphasis, to improve signal integrity at higher line rates. The new transceiver blocks also include a unique multi-code physical coding sublayer that supports both 64B/66B and 64B/67B encoding/decoding schemes, saving thousands of logic cells for each channel. In addition, cross-platform pin compatibility enables customers who have designs targeting Virtex-5 LXT and SXT devices to migrate their designs to Virtex-5 FXT devices in order to take advantage of the higher-performance embedded processing and serial connectivity.
Innovative Signal Processing Capabilities
The Virtex-5 FXT platform includes up to 384 DSP slices and 16.5 Mbits of internal memory that can be configured to provide over 190 GMACs of DSP processing performance and 92 terabits/sec of memory bandwidth, respectively, at 500 MHz. This balance of hardware resources maximizes performance for the computation-intensive workloads typical of DSP and video applications. The DSP48E slice, available in all XtremeDSP Virtex-5 devices, enables higher levels of DSP integration and lower power consumption than previous-generation Virtex devices. Over 40 dynamically controlled operating modes are supported, including multiplier, multiplier-accumulator, multiplier-adder/subtractor, three-input adder, barrel shifter, wide counters and comparators.
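The "over 190 GMACs" figure can be sanity-checked with simple arithmetic, assuming (as an idealization not stated in the release) that every DSP slice completes one multiply-accumulate per clock cycle. The function name below is only illustrative.

```python
def peak_gmacs(dsp_slices: int, clock_mhz: float) -> float:
    """Idealized peak throughput in giga multiply-accumulates per second.

    Assumes one MAC per DSP slice per clock cycle, with every slice busy
    on every cycle. Real designs rarely sustain this.
    """
    macs_per_second = dsp_slices * clock_mhz * 1e6
    return macs_per_second / 1e9


# Figures quoted in the release: up to 384 DSP slices clocked at 500 MHz.
print(peak_gmacs(384, 500))  # 192.0
```

Under that assumption the largest device peaks at 192 GMACs, which is consistent with the quoted figure.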
“To meet both the market and bandwidth requirements of transporting voice, video and data, today’s system on chip solutions must combine flexibility with very high-performance embedded processing, digital signal processing and connectivity capabilities,” said Steve Douglass, vice president of product development for the Xilinx Advanced Product Group. “The Virtex-5 FXT platform ties the high-performance logic and DSP processing capabilities of the Virtex-5 family with industry standard PowerPC 440 processor blocks for high-performance processing and high-speed transceivers that can move data on and off chip fast.”
Design Support
The Virtex-5 FXT FPGA platform is supported by the new ISE Design Suite 10.1 development tools from Xilinx. This recently announced unified development offering includes access to all the domain specific tools to streamline complete system designs for logic, embedded and DSP applications. This includes ISE Foundation, Embedded Development Kit (EDK), System Generator for DSP, AccelDSP synthesis tool, ChipScope Pro and ChipScope Pro serial I/O Toolkit, PlanAhead design and analysis tool and ISE simulator.
In addition to simplified installation and registration processes, ISE Design Suite 10.1 introduces inter-tool integration improvements and makes all products, purchased or not, available for evaluation.
EN-Genius Says…
The introduction of Xilinx’s biggest, baddest, and heaviest-hitting branch of their V5 family is an especially pivotal moment for the company, since its ability (or inability) to live up to the hefty claims being made for it carries roughly the same risks and rewards that Boeing and Airbus face each time they roll out a new airframe. That’s why I was especially pleased to have a chance to see their new Virtex-5 FXT live and running in its native habitat during my last swing through Silicon Valley. Although I still have some minor concerns about whether Xilinx’s SerDes are quite as robust as they claim, and about the success of their aggressive efforts to improve yield on these complex chips, I am hoping that these issues will simply be speed bumps for a product line that deserves consideration as an alternative to ASICs in many compute-intensive applications.
Before we get into what I saw at Xilinx’s lab, I’ll go over some of the key features covered in their press release. The FXT series is particularly interesting for compute-intensive tasks because it combines the advanced DSP slices originally developed for their SXT series with the most powerful processor core I’ve seen on an FPGA to date. They’ve moved from the V4 series PowerPC 405 (a 5-stage pipeline 450 MHz core) to the PowerPC 440, a 7-stage superscalar unit running at up to 550 MHz. Xilinx has also equipped the processor with an interconnect scheme that effectively couples all that raw compute horsepower to both the FPGA logic elements and to the outside world. This is accomplished via a 128-bit 5 x 2 crossbar that enables the processor to quickly switch between multiple (250 MHz) DMA channels, multiple processor local busses (PLBs), and memory interfaces.
The PowerPC 440 is also equipped with an auxiliary processor unit (APU) that works sort of like Intel’s front-side bus to let you build custom instructions and hardware functions that run under direct processor control. We did not go into great depth about it, but Xilinx says that you can use both their standard logic elements and the DSP slices to build some instruction extensions that can range from a relatively simple double-precision math unit to wild custom cores that handle tasks such as complex array processing or convolutional computation. While visiting Xilinx, I saw a cute demonstration of a complete video transcoding and processing system implemented on a single FPGA that made use of this clever feature.
Like its companion DSP-only SXT family, the FXT DSP slices are stacked in columns that are sandwiched between other columns of dual-port block RAM, distributed RAM, I/O and interconnect logic. These so-called XtremeDSP blocks can be configured to use either the entire logic chain or any combination of its multiplier, accumulator, ALU and registers, allowing you to implement custom functions and save clock cycles. Depending on the FPGA size, you get either two or four columns of DSP/memory/logic elements that can be operated serially or in cascaded arrays of almost any width at speeds of up to 550 MHz.
Even the smallest FXT FPGAs bristle with a rich assortment of I/O blocks that include several GbE MACs and the system elements for constructing nearly any SerDes-based interface running at 6 Gbit/s or below. Depending on the particular device, you’ll get between 8 and 24 6.125 Gbit/s SerDes transceivers derived from the GTP technology originally developed for their LXT family. As the release above indicates, the new SerDes adds a digitally-controlled, four-tap analog decision feedback equalizer (DFE) to the linear equalizer and 8-level transmit pre/de-emphasis circuit used on the slower GTP transceivers. The performance of the circuit I saw running in Xilinx’s lab indicates that adding the relatively simple DFE to the receive circuit is a good trade-off between circuit complexity, power and performance that should allow it to support deployment of many emerging interface standards that are settling in around the 5 Gbit/s speed range -- as long as the channel characteristics are not overly challenging (more about this in a moment).
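For readers who have not worked with one, the idea behind a four-tap DFE can be shown in a few lines. This is a toy, noise-free model and not Xilinx's circuit: the channel tap weights are made up, the equalizer is simply handed the exact channel taps (real hardware has to be tuned or adapted), and all function names are illustrative.

```python
def slicer(value: float) -> int:
    """Hard decision: map a received sample to a +1 or -1 symbol."""
    return 1 if value >= 0 else -1


def channel(symbols, post_cursor):
    """Toy channel: each symbol is smeared onto the symbols that follow it."""
    received = []
    for n, symbol in enumerate(symbols):
        isi = sum(h * symbols[n - k]
                  for k, h in enumerate(post_cursor, start=1) if n - k >= 0)
        received.append(symbol + isi)
    return received


def dfe(received, taps):
    """Subtract the interference predicted from past decisions, then slice."""
    decisions = []
    for n, sample in enumerate(received):
        feedback = sum(c * decisions[n - k]
                       for k, c in enumerate(taps, start=1) if n - k >= 0)
        decisions.append(slicer(sample - feedback))
    return decisions


# Hypothetical post-cursor taps whose sum (1.2) exceeds the main cursor (1.0),
# so a plain slicer is guaranteed to fail on some bit patterns.
TAPS = [0.6, 0.3, 0.2, 0.1]
sent = [-1, -1, -1, -1, +1, -1, +1, +1, +1, +1, -1, +1]

rx = channel(sent, TAPS)
without_dfe = [slicer(v) for v in rx]
with_dfe = dfe(rx, TAPS)

print("errors without DFE:", sum(a != b for a, b in zip(without_dfe, sent)))
print("errors with DFE:   ", sum(a != b for a, b in zip(with_dfe, sent)))
```

The point of the sketch is only that feedback from earlier decisions cancels trailing interference without amplifying noise the way a purely linear equalizer does. It says nothing about how the silicon behaves on a real backplane.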
The transceiver’s integrated gearbox logic core supports most common line coding schemes (8b/10b, 64b/66b, and 64b/67b) without using any of the FPGA’s precious logic cells. If you do need some oddball line coding scheme for your particular application, it can be added using programmable logic. Xilinx supplies IP that allows you to build up transceivers for most popular I/O standards including PCIe (Gen2), SRIO, CPRI, and OBSAI. When I asked whether they intended to offer a MAC for the Interlaken chip-to-chip interconnect standard, they told me that they expect to roll one out in the near future, along with the IP for a HyperTransport interface.
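The choice of line code matters because it sets how much of the raw line rate is left for payload. A rough sketch of that arithmetic follows. It counts only the coding overhead and ignores any protocol framing, and the helper name is my own.

```python
# (payload bits, transmitted bits) for the three codes named above.
LINE_CODES = {
    "8b/10b": (8, 10),
    "64b/66b": (64, 66),
    "64b/67b": (64, 67),
}


def payload_rate_gbps(line_rate_gbps: float, code: str) -> float:
    """Payload bandwidth left after line-coding overhead, in Gbit/s."""
    payload_bits, coded_bits = LINE_CODES[code]
    return line_rate_gbps * payload_bits / coded_bits


# 6.5 Gbit/s is the top GTX line rate quoted in the press release.
for name in LINE_CODES:
    print(f"{name}: {payload_rate_gbps(6.5, name):.3f} Gbit/s of payload")
```

By the same arithmetic, a 5 Gbit/s lane using 8b/10b carries 4 Gbit/s of payload, which is why the 64b-based codes are attractive at higher line rates.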
Since Xilinx had experienced some early difficulties with the SerDes elements in its Virtex-4 series, I was especially interested to learn what the company had done differently this time. As a result of the lessons learned from its V4 days, Xilinx has done several new things in addition to the improved receive equalization. The faster transistor speeds of their 65-nm process made it possible to use digital CMOS in most of the higher speed elements in the PLLs and CDRs instead of the power-hungry, process-sensitive CML circuits they used in the 90 nm V4. Eliminating the current-hungry CML circuits is also a major contributor to the new lower power consumption of the transceiver. Xilinx says that the improved CDRs, plus a careful re-examination of on-chip signal integrity issues, have helped to fix the jitter generation issues that limited yields on their early V4 SerDes devices. They are also very proud of their new Sparse Chevron packaging technology that gives much better impedance control and noise isolation – something that may be almost as important as any improvements to the chip itself.
With my curiosity aroused, it was now time to put the PowerPoint slides aside and see how all these improvements worked in real life (or as real as a corporate lab can be). The lab I visited had an early sample FX30T device on an eval board which was connected to a pair of blades that were in turn plugged into a high-quality back-drilled backplane via an Amphenol EHSD connector. Counting the SMA cable runs, the PCB traces on the blades and the 40 inches of traces on the backplane, the 5 Gbit/s signal ran through a 72-inch channel that included the reflections generated by four SMA connectors and two Amphenol connectors. Using a PRBS32 signal, the link was error-free at 2.5 Gbit/s with only the linear EQ running but required the DFE equalizer to be turned on in order to get a clean, error-free eye at 5 Gbit/s.
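As background on how this kind of link test works: the transmitter sends a pseudo-random bit sequence from a shift register, and the receiver regenerates the same sequence locally and counts mismatches. The sketch below is illustrative only. The article does not say which generator polynomial the lab pattern used, so it substitutes the short, well-known PRBS7 polynomial (x^7 + x^6 + 1), and the function names are my own.

```python
from itertools import islice


def prbs7(seed: int = 0x02):
    """Endless PRBS7 bit stream from a 7-bit LFSR (x^7 + x^6 + 1).

    PRBS7 is a stand-in chosen for brevity, not the pattern used in the demo.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR up")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_bit_errors(received, seed: int = 0x02) -> int:
    """Compare received bits against a locally regenerated reference."""
    reference = prbs7(seed)
    return sum(bit != next(reference) for bit in received)


sent = list(islice(prbs7(), 1000))
corrupted = sent.copy()
corrupted[123] ^= 1  # inject a single bit error

print(count_bit_errors(sent))       # 0
print(count_bit_errors(corrupted))  # 1
```

A real checker also has to synchronize to the incoming stream and run long enough for an error-free result to be statistically meaningful. Both steps are omitted here.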
This seemed very impressive (as a factory demo ought to) until I asked a few questions. Since the reflections that crop up in shorter channel lengths can often be as challenging to deal with as the extreme attenuation in longer runs, I asked whether we could plug the blades into a backplane trace that put them within a few inches of each other. Unfortunately, they said it would take quite a bit of time to change the test set-up, so all I can do is take their word for it that they could adjust the transmit de-emphasis enough to make a 5 Gbit/s link work well over a short-haul link. I also asked about what the backplane and blades were made out of and found out it was Nelco material, which enjoys much better, more uniform dielectric characteristics than the standard FR-4 fiberglass lay-ups most commonly used in all but the highest-performance systems.
Since I did not have a chance to take any detailed measurements, it’s tough to say how much the high-end materials, connectors, and carefully laid-out traces you often find in demonstration backplanes contributed to the excellent performance I saw. Based on what I saw, I'll bet that the transceivers will have difficulty supporting a full 5 Gbit/s in the longer and more challenging trace runs found in some of today's system backplanes that don't employ advanced materials, premium connectors and back-drilling of PCB via stubs. But since many applications which will employ 5 Gbit/s links will be clean-sheet designs using a higher-grade backplane, this makes for an easy and practical solution that's very cost-effective for the increased performance it yields. I’d also guess that the handful of so-called problem channels in some legacy designs can have their excessive attenuation or cross-talk tamed with relatively inexpensive discrete EQ and cancellation devices. Despite some reservations about the V5 FXT’s ability to drive legacy designs at 5 - 6 Gbit/s, its DFE equalization and improved CDR circuitry should allow them to enjoy rock-solid operation at 2.5 - 3.125 Gbit/s when most other SerDes-equipped products would have trouble.
While I did not get complete answers to everything I wanted to know, the things Xilinx did share with me were enough to make me pretty confident that the RISC and DSP elements of the FXT family will work as advertised and that they have made big strides in addressing the performance, power consumption and yield issues in their SerDes transceivers. Together with the updated ISE development tool package (reviewed here last week), this adds up to a winning combination that will help these chips find a home in wireless, video, and imaging systems and wherever else ridiculously large amounts of processing power are required.
Virtex-5 FXT FPGA samples are now shipping for the FX30T and FX70T devices. The remaining FX100T, FX130T and FX200T devices will be available over the next six months with the first production devices scheduled to be made available in the third quarter of 2008. The FX30T device will list for $159 in 1000-piece lots by the second half of 2009.