March 2009 - PCI Express Components

by Lee H Goldberg

Love it or hate it, PCI Express (PCIe) is now the de facto interconnect standard for most computer-based systems. Originally developed by Intel as a high-speed upgrade to the aging PCI parallel bus in PCs and servers, PCIe has also begun to develop a strong following within the embedded systems and networking communities. Much of this SerDes-based technology’s appeal is due to its ability to deliver scalable capacity and the growing universe of commodity silicon that supports it. Designers now have their choice of several vendors offering the switches, bridges, and interconnect elements needed to bolt together a wide assortment of commercially-available PCIe-capable processors, peripherals and I/O devices. Should you need a custom processor or specialized I/O function, three of the four major FPGA makers now offer economy-priced SerDes-equipped devices that can be programmed to support both your application and a suitable PCIe interface.

That said, PCIe is far from a perfect technology. Developed originally for standard PC applications where a single CPU dominates the system, PCIe does not have native support for the multiple root complex nodes normally required for multi-processor designs. It also lacks the hardware-level synchronization logic for shared memory resources that is native to the Serial RapidIO standard. Designers unfamiliar with the signal integrity issues associated with PCIe’s multi-gigabit SerDes channels will have to get used to much more stringent layout and reach restrictions, especially when dealing with the new 5-Gbit/s PCIe Gen 2 standard.

There’s no reason to despair, however, since many chip vendors have developed their own solutions to these shortcomings that make the flawed technology more flexible and much easier to use. Read on for a deeper look at all these design issues, or jump straight to the conclusion for our PCIe product directory.

Pericom offers a short video demonstrating how their ReDriver for PCIe 2.0 helps to enable optimal system stability at high speeds.

A Surfeit of Switches

When PCIe first emerged, most of the switches and bridging elements being produced were fairly generic, but as the technology spread from high-end servers downward to workstations, PCs, and even laptops, silicon has emerged with port counts and special features tailored to a particular class of application.

Perhaps the most widely-used function is I/O expansion, where a Northbridge or main processing resource feeds multiple lower-bandwidth downstream peripherals. In these applications, a switch is used to fan out a 2-, 4-, or 8-lane PCIe root complex to multiple 1- or 2-lane peripherals, a technique that’s especially popular in servers, storage systems, host bus adapters (HBAs), NICs and embedded systems. These products enable cost-effective PCIe lane fan-out and allow the designer to easily add PCIe channels to these platforms. That’s one of the reasons these smaller PCIe switches are also finding lots of use in notebook docking stations, multifunction printers (MFPs), notebook computers, digital TVs (DTVs), wireless access points (WAPs), and high channel-count video surveillance platforms. Companies like IDT, Pericom, and PLX all make these smaller but important system elements.
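The fan-out arithmetic behind this kind of I/O expansion can be sketched in a few lines of Python. This is only an illustration: the function name and the example port mix are hypothetical, and the sketch assumes every lane runs at the same signaling rate, so it simply compares lane counts.

```python
def oversubscription_ratio(upstream_lanes, downstream_ports):
    """Ratio of total downstream lanes to upstream lanes.

    downstream_ports is a list of lane widths, one entry per downstream port.
    Assumes every lane runs at the same signaling rate (illustration only).
    """
    if upstream_lanes <= 0:
        raise ValueError("upstream_lanes must be positive")
    return sum(downstream_ports) / upstream_lanes


# Hypothetical example: a 4-lane upstream port fanned out to
# six 1-lane peripherals and one 2-lane peripheral.
ratio = oversubscription_ratio(4, [1, 1, 1, 1, 1, 1, 2])
print(f"downstream/upstream lane ratio: {ratio:.1f}")
```

A ratio above 1 means the downstream ports could, in principle, ask for more bandwidth than the upstream link can carry, which is usually acceptable when the peripherals are low-bandwidth or bursty.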

While the bulk of these applications still use the slower, 2.5 Gbit/s PCIe Gen 1 technology, all the major players have introduced comparable PCIe Gen 2 parts that support 5 Gbit/s data rates. At Gen 2 speeds, PCIe’s SerDes signals are especially susceptible to attenuation, reflection, and other channel impairments, and a switch’s signal integrity capabilities become an even more important feature. As we’ll see a bit later, this is where companies like Pericom and Gennum that focus on high-performance equalization technologies play an important role in making everything work.
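To put the two signaling rates in perspective, the short Python sketch below estimates usable bandwidth per direction for a few link widths. It relies on the 8b/10b line coding that Gen 1 and Gen 2 both use; the function and dictionary names are made up for this example, and packet and protocol overhead are deliberately ignored, so real throughput will be somewhat lower.

```python
# Raw signaling rates per lane, in Gbit/s, as quoted in the article.
LINE_RATE_GBPS = {"gen1": 2.5, "gen2": 5.0}

# Gen 1 and Gen 2 use 8b/10b line coding: 8 data bits per 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10


def link_bandwidth_mbytes(generation, lanes):
    """Usable bandwidth per direction in MB/s, before packet/protocol overhead."""
    raw_gbps = LINE_RATE_GBPS[generation] * lanes
    data_gbps = raw_gbps * ENCODING_EFFICIENCY
    return data_gbps * 1000 / 8  # Gbit/s -> MB/s


for gen in ("gen1", "gen2"):
    for lanes in (1, 4, 8):
        bw = link_bandwidth_mbytes(gen, lanes)
        print(f"{gen} x{lanes}: {bw:.0f} MB/s per direction")
```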

PCI Express is also becoming increasingly important as a system interconnect technology, passing multiple lanes of high-speed data between processing elements in wireless communications systems, video processing equipment and other ultra-high bandwidth applications. IDT, for example, offers a family of products that focus on the upper end of the market, competing with PLX, where high lane and port count, and the ability to minimize switching latency are all basic requirements. With the bar set this high, design wins are now often based on advanced features like smart power management, variable link width, and multi-root operation as much as they are on price.

A Cure For the Multi-Root Blues?

Although Intel hopes that its multi-root I/O virtualization (MRIOV) standard will eventually allow more than one root node per system, it’s still in development and it may be some time before it emerges from committee. Meanwhile, IDT offers a line of so-called inter-domain switches that have a partitionable architecture supporting multi-root operations today. These switches spoof the PCI Express protocol by translating end point addresses in a way that allows more than one processor or other root node device to share a set of peripherals. The devices also have a so-called dynamic resource allocation feature that lets you select how much of a root node’s capacity, and its corresponding QoS level, is allocated to a particular end-node. Moving I/Os and peripherals between multiple roots, advanced failover support, and support for virtualized systems are not yet a complete reality (and may never be) because the specification is not yet supported by silicon, but failover structures and other multi-root applications are already possible with IDT silicon, at a lower cost.
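The general idea of address translation between domains can be illustrated with a generic window-based scheme. The Python sketch below is not IDT’s actual mechanism, which the vendor has not described here; the `Window` class, the `translate` function, and all address values are hypothetical and exist only to show how an address in one root’s space might be remapped onto a shared peripheral.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical translation window: maps a range in one root's
    address space onto a range in another domain."""
    local_base: int
    size: int
    remote_base: int

    def contains(self, addr):
        return self.local_base <= addr < self.local_base + self.size


def translate(addr, windows):
    """Return the translated address, or None if no window claims it."""
    for w in windows:
        if w.contains(addr):
            return w.remote_base + (addr - w.local_base)
    return None


# Made-up example values: a 64 KiB window at 0x9000_0000 in root A's space
# that lands on a peripheral at 0x4000_0000 in the shared domain.
windows = [Window(local_base=0x9000_0000, size=0x1_0000, remote_base=0x4000_0000)]

inside = translate(0x9000_0100, windows)    # falls inside the window
outside = translate(0x8000_0000, windows)   # claimed by no window
print(hex(inside) if inside is not None else None, outside)
```

A second root would get its own set of windows pointing at the same peripheral range, which is what lets two hosts see one device without either being aware of the other.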

Although PLX Technologies did not respond to our invitation to participate in the briefings for this article, a survey of their product literature indicates that they also have some solutions to the multi-root dilemma. Their switches provide non-transparent port capability and allow host-centric and true peer-to-peer data transfers.

A Bevy of Bridges

Since not every device in the compute universe is based on the PCIe bus, there’s a fairly strong demand for bridging devices that allow the SerDes-based protocol to communicate with legacy systems that still run the PCI/PCI-X parallel bus. These devices also allow new PCIe-based systems to use PCI-based peripheral silicon for which there is not yet a modern equivalent. Large, long-lived capital equipment such as medical and military systems are especially big markets for these inter-bus bridges. Both Pericom and Tundra Semiconductor (whose acquisition by Gennum has just been announced) offer PCIe/PCI and PCIe/PCI-X bridge solutions which can extend the life of legacy systems. Marvell also offers the 88SB2211, a low-cost, flexible PCI/PCIe bridge solution.

Of course, PCI Express must extend its reach to many other legacy applications, so there is a lively business in silicon that supports a small bestiary of more esoteric interfaces. Gennum, for example, offers a line of four-lane and single-lane PCIe-to-LocalBus endpoint bridges. If your products still use low-speed serial interfaces, you’ll appreciate Pericom’s family of two-, four- and eight-port PCIe-to-UART bridges. If you want to bridge your PCIe bus to an Ethernet connection, products like Broadcom’s BCM57710 PCIe-to-10GbE controller and Marvell’s Yukon family of GbE solutions are there to help.

FPGA-Based PCIe Solutions

Despite the great wealth of merchant PCIe silicon available, there’s a good chance that you’ll need a custom interface between your PCIe bus and some part of your design. Or perhaps you need a specialized DSP or decryption function to offload your PCIe-based host processor? That’s where the new generation of SerDes-equipped FPGAs will come in handy.

While Altera and Xilinx have offered premium-priced SerDes-equipped FPGAs for quite a while, Lattice Semiconductor rocked the programmable logic universe a couple of years ago when they introduced the low-cost ECP2M family, whose serial transceivers could support Gen 1 PCIe. It’s no wonder then that both of Lattice’s competitors have recently updated both their high- and low-end FPGA lines to support PCIe and other SerDes-based protocols.

Altera offers PCI Express capabilities in both its high-end Stratix IV GX/GT series and its lower-priced Arria II GX family. The Arria family is intended as a cost-effective alternative to ASICs for applications requiring medium-level logic density and speed and 2.5 Gbit/s PCIe Gen 1 connections, while the Stratix offers larger, faster arrays of logic, memory, and DSP elements, and supports both Gen 1 and Gen 2.

The Arria I series relied on soft IP to implement the PCIe MAC functions (transaction layer, data link layer and physical layer). When Altera designed the new Arria II (reviewed here February 2009) they added a version of the updated hard PCIe MAC cores that had been developed for their Stratix IV line (also covered in the same February 2009 review). Both the Arria and Stratix MAC cores can be configured as either a root- or end-node. Arria supports X1 or X2 operation while Stratix’ SerDes channels can be configured in X1, X2, X4 or X8 lane configurations. At the time of this writing, Altera is the only vendor to be able to implement a full SIG-compliant eight-lane PCIe Gen2 connection, but this is expected to change shortly.

In addition to its PCIe capabilities, the Stratix family’s embedded SerDes transceivers include some clever gearbox and coding hardware that enable multi-standard support, making it especially good for bridging to other serial protocols like SATA and SAS. Altera says that they have SAS/SATA IP in development with third party vendors.

Lattice Semiconductor’s SC/SCM metal-programmable series was introduced in 2006 and offers general-purpose SerDes with hard physical sublayer coding blocks and data link layer functions. It uses a soft PCIe endpoint transaction layer (MAC function) which consumes 5 k – 10 k LUTs depending on lane count and features. Each SerDes PHY draws 125 mW and the power consumption for a 4-lane link is around 1 W. Their more recent ECP2M family is more of a general-purpose product line that implements PCIe using soft data link and transaction layers. Intended for smaller applications, its logic density ranges from 20 k LUTs – 100 k LUTs.

Their new Lattice ECP3 series of SerDes-equipped devices (reviewed here March 2009) offers higher performance and higher logic densities (17 k to 149 k LUTs). Portions of its physical layer subcoding are implemented in hardware, but the higher layers are still soft, consuming 5 k – 12 k LUTs. The new transceivers’ power consumption is lower (100 mW/channel), and they boast both a higher maximum operating speed and less jitter, all of which provide more design margin in PCIe applications. The ECP3’s ability to support quad-lane operation makes it suitable for bridging PCIe to 10G Ethernet, XAUI, 3G CPRI, and SMPTE digital video connections. In addition to its extremely competitive prices, Lattice prides itself on making designers’ lives easier when working with PCI Express-based designs. Their well-provisioned development platform and design tool package includes a fully-populated circuit board with working X1 and X4 designs that allows many engineers to go from tearing the shrinkwrap to a running system in 30 minutes.

Xilinx has also aggressively attacked the SerDes-capable FPGA market on both ends of the price/performance spectrum with their Spartan 6 and Virtex 6 device families. Xilinx’ economy-minded Spartan 6 devices (reviewed here February 2009) support a single lane of Gen 1 PCIe with a hard MAC. Its other general-purpose SerDes transceivers can be outfitted with soft PCIe MACs (supplied by Northwest Logic and PLDA) for multi-lane applications. Depending on the device size and configuration you’ll have between 2 and 8 transceivers available. Typical applications for Spartan-class PCIe devices are infotainment and automotive systems, usually driven by Intel Atom/Renaissance-class processors that have a PCIe interface. Xilinx also expects their Spartan products to be popular for custom PCIe and Express Card I/O elements in laptop computers.

There are several application-specific series of devices within Xilinx’ high-end Virtex 6 family but the LX series is the one that’s most-often used for PCIe-oriented designs. Lately however, Xilinx reports that their DSP-oriented SXT series is also finding lots of use as a signal-processing element within wireless systems that use the PCIe bus as a system interface instead of the traditional SRIO connection. In these systems, their DSP elements are combined with standard programmable logic to build voice compression, voice transcoding, and baseband processing elements that provide more throughput than a standard DSP can supply. Other applications for these high-end FPGAs are front ends in video servers to compress video to fit the particular application it is being streamed to.

Both Virtex 6 families have GTX transceivers and a logical gearbox that handles physical layer encoding (for operation up to 6.5 Gbit/s) and hard block MACs that support both Gen 1 and Gen 2 speeds. Depending on the actual device you’re using, you’ll have between 12 and 36 transceivers to play with. At the time of this writing, Xilinx’ Virtex 6 series only supports X1, X2, and X4 PCIe connections but, given the growing market demand for high-bandwidth X8 connections in wireless basestations and broadcast video processing systems, we can expect them to be offering X8 link capabilities in the near future.

If you want to use a lower-cost non-SerDes FPGA and couple it to your design via an external PCIe interface, Gennum’s PCIe-to-LocalBus products make the job relatively easy. Gennum has also published a great article on the topic, entitled PCI Express Bridging Options Enable FPGA-based Configurable Computing.

Signal Integrity & Retimer Products

Even 2.5 Gbit/s Gen 1 PCIe connections can run into performance-robbing attenuation, reflections, and crosstalk if a printed circuit board is not designed and manufactured to extremely tight specifications. At the 5 Gbit/s data rate used by Gen 2 transceivers, achieving reliable reach across a backplane becomes a serious challenge. While all switch and bridge manufacturers have made good progress in improving the equalization and pre-emphasis technology they pack into their transceivers, there are some significant performance differences between different product families. This extra ability to handle weak, noisy signals and punch through less-than-ideal channel conditions translates into extra reach across a long server blade or backplane, or extra margin that helps products perform properly with wider manufacturing variations in their PCBs. In challenging applications with relatively low port counts, the extra signal integrity capabilities packed into Pericom’s switches make them a good choice.

In many cases, however, even the best chip just can’t reach far enough and a designer must reach for either a redriver or a retimer to bridge the gap. Even in PCIe Gen 1 applications, around 50% of blade server manufacturers have to rely on a redriver device to push their signals across 18 to 30 inches of backplane and blade PCB. For equipment operating at 5 Gbit/s Gen 2 speeds, virtually every manufacturer uses a redriver. Fortunately, companies like Gennum, Pericom and Vitesse all offer solutions that allow regular electrical engineers to deal with signal integrity issues which, until now, were left to specialists.

Drawing on its extensive experience with high-speed video signal transport, Gennum developed the GN1407, a quad-channel redriver designed for 1.25 Gbit/s to 8 Gbit/s high-speed data. They also offer the GN1406 as a robust Gen 1/Gen 2 retiming redriver for PCIe-over-cable applications. While Gennum’s product line is somewhat limited today, their upcoming acquisition of Tundra Semiconductor will likely result in a wealth of new offerings that combine the best features of both companies’ product lines.

Whether it’s their switches, redrivers, or PCIe timing products, Pericom’s big value is in the extra PHY layer signal integrity they bring to their designs. Pericom’s redrivers and signal switches provide signal conditioning that is essential in helping Northbridge outputs reach through long PCB runs in blade servers. In fact, they are actually used in many of Intel’s reference designs. The redrivers can also support External PCIe over 5 – 7 m of copper cable, an application that’s becoming very popular in industrial and military systems. Pericom says that their redrivers also support eSATA applications for external hard drives; eSATA is bidirectional, has less overhead, and is much faster than USB 2.0.

Pericom also offers a full line of timing products for PCIe applications. The PCIe Gen2 jitter spec is <3 ps, requiring a very stable and low jitter timing source, such as a crystal oscillator. Pericom is unique in having both quartz and silicon timing products under one roof, and they have leveraged this combined technology to offer a family of very low jitter PCIe XOs and buffers. These products enable servers, storage, and networking customers using PCIe Gen2 platform designs to operate to the PCIe specs and provide high reliability signals within the platform.
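A quick way to sanity-check a clock tree against a jitter limit like this is to combine the individual contributions as a root-sum-square. The Python sketch below assumes the 3 ps figure quoted above is an RMS limit and that the jitter sources are uncorrelated random jitter; both are assumptions, the stage values are invented for illustration, and a real compliance check applies the filtering defined in the PCIe specification.

```python
import math

# Jitter limit quoted in the article for PCIe Gen 2 timing, in picoseconds.
# Treated here as an RMS limit (an assumption).
JITTER_BUDGET_PS = 3.0


def combined_rms_jitter(contributions_ps):
    """Root-sum-square of RMS jitter terms, assuming they are uncorrelated."""
    return math.sqrt(sum(j * j for j in contributions_ps))


def within_budget(contributions_ps, budget_ps=JITTER_BUDGET_PS):
    """True if the combined RMS jitter stays below the budget."""
    return combined_rms_jitter(contributions_ps) < budget_ps


# Made-up example values: oscillator, fan-out buffer, and board-level noise.
stages_ps = [1.0, 2.0, 1.5]
total = combined_rms_jitter(stages_ps)
print(f"combined RMS jitter: {total:.2f} ps, within budget: {within_budget(stages_ps)}")
```

Because the terms add in quadrature, the largest contributor dominates the total, which is why a low-jitter oscillator matters more than shaving a little off an already quiet buffer.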

Like Gennum, Vitesse developed its line of signal integrity products primarily to serve digital video transport applications, but they have recently realized that the same technology is well suited to solving the problems designers face in high-performance PCIe systems. Vitesse transceivers use a combination of equalization techniques that extends signal reach and delivers extra channel margin. They even support the unique voltage signaling levels and patterns used by PCIe’s Receive Detect mechanism to synchronize end points during startup and hot plug operations. Vitesse switching products combine the same transceivers with a high-bandwidth crosspoint to provide redundancy switching capabilities in large PCIe-based systems. Both products are especially useful in ATCA and blade servers, where reach is much longer than the original PCIe spec anticipated and high reliability is desired.

Most Vitesse signal integrity products also come with VScope, a very useful scope-on-chip function that provides designers, manufacturing engineers or field service engineers with a receiver-eye-view of the incoming SerDes signal. One great example of all these advanced features is Vitesse’s VSC3406, a multichannel backplane transceiver with integrated switch and VScope. It supports 6 GHz transmission over 50 inches of backplane and line card through multiple connectors. Our December 2008 review provides a full description of all its capabilities.

Vitesse expects that its high-speed capabilities will become even more useful in PCIe Gen 3 (anticipated for finalization in 2010 with first commercial silicon by the end of 2009) applications that will run at 8 Gbit/s. The Gen 2 systems built today should probably be future-proofed with backplanes and blades that support faster Gen 3 speeds. Of course there are no products on the market yet, but, as soon as the Gen 3 spec is finalized, I’d expect Vitesse to be there to meet the challenge.

The EN-Genius 2009 PCI Express Product Directory

Altera:

Gennum:

IDT:

Lattice Semiconductor:

PLX Technologies:

Pericom:

Tundra Semiconductor:

Vitesse:

Xilinx: