Recently, a question came to us from an inquisitive designer regarding what used to be called the “Fabric Wars”, which led to an intriguing discussion about the evolution of connectivity choices in embedded computing systems. The conversation that resulted in this blog post centers more specifically on those choices in OpenVPX and SOSA™ (Sensor Open Systems Architecture) architectures.
“I'd be curious to hear your opinion on why PCIe won over Ethernet on VPX (ergo why PCIe will win at rack level over Ethernet). Is it because there’s no protocol conversion with PCIe? Or because PCIe is faster and lower latency, but performance is “table stakes”? PCIe just appeared the winner to the public, but I suspect the serial bus of choice was pounded out in a working group somewhere.”
Actually, at the moment, there is no real “winner” between Ethernet and PCIe, at least not in OpenVPX. Here’s the scoop…there’s a general (but not universal) consensus about separating backplane connections into Data Planes (DP), Expansion Planes (EP), and Control Planes (CP). Each has an intended role in VPX systems, although developers are not necessarily limited to that functionality in their applications; a rough sketch of a typical assignment follows the list below.
• A Control Plane is for system control messages, which are often centrally issued. Ethernet makes sense for this.
• An Expansion Plane is generally for tightly connecting two or more boards together, usually with a root/leaf organization. PCIe makes sense for this. A good example is an SBC paired with a GPU or an I/O board.
o The EP is also used for things like FPGA-to-FPGA links, so Aurora or serial front panel data port (S-FPDP) is often used there.
• With the Data Plane, the lines blur a bit more:
o Often the DP is for peer-to-peer/all-to-all connectivity. This sort of connection is much easier to make with Ethernet than with PCIe (which needs non-transparent bridging, has poor software support, and is generally ugly).
o Ethernet has a lot more overhead than PCIe, both in the protocol and in the software stack. Thus, some people will accept the pain of working with PCIe to get the performance.
o Ethernet and PCIe keep leapfrogging each other in wire speed, and some folks just want to chase the latest/fastest.
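To make that division of labor concrete, here is a minimal, purely illustrative sketch (in Python) of how a notional payload card’s planes might be assigned. The protocol names, lane widths, and topology notes are assumptions chosen for illustration, not a statement of any particular OpenVPX slot/backplane profile or SOSA snapshot.

```python
# Purely illustrative: a notional mapping of OpenVPX planes to protocols and roles.
# The protocol choices and lane widths below are hypothetical examples, not drawn
# from any specific OpenVPX slot/backplane profile or SOSA snapshot.
NOTIONAL_PLANE_MAP = {
    "control_plane": {
        "protocol": "1000BASE-KX Ethernet",            # system control messages
        "topology": "switched; commands issued centrally",
    },
    "expansion_plane": {
        "protocol": "PCIe Gen3 x4 (or Aurora / S-FPDP for FPGA-to-FPGA links)",
        "topology": "root/leaf; e.g. SBC as root, GPU or I/O card as leaf",
    },
    "data_plane": {
        "protocol": "40GBASE-KR4 Ethernet (some designs use PCIe here instead)",
        "topology": "peer-to-peer / all-to-all through a switch card",
    },
}

if __name__ == "__main__":
    for plane, cfg in NOTIONAL_PLANE_MAP.items():
        print(f"{plane:>15}: {cfg['protocol']}  [{cfg['topology']}]")
```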
SOSA has chosen to focus on Ethernet DP, Ethernet CP, and EP with multiple protocols (PCIe, Aurora, simple LVDS to start, with others to come in the future). This is largely to enable interoperability and interchangeability between Plug-in Cards (PICs). In this case, Ethernet is the heavy lifter system-wide, with PCIe (and the others) more localized between specific PICs. However, SOSA isn’t the whole market, and folks are still introducing new boards that are not aligned to the standard.
So, the market is still very much in flux. And while SOSA is putting bounds on a certain (but sizable) section of the market, there are still options out there, albeit smaller in market size, like Serial RapidIO, InfiniBand, etc.
What about features and functions? RapidIO introduced a series of features with each release that increased its capabilities and appeal, but even this ever-growing set of capabilities could not ensure its survival.
Your comments about features are also sound. RapidIO had two nice features that are really useful for HPEC (high-performance embedded computing) systems: most implementations included DMA (Direct Memory Access) engines to offload data movement from the processor, and it included an atomic test-and-set function. Remote DMA is only now becoming available to Ethernet with RoCE (RDMA over Converged Ethernet), and I don’t think that Ethernet has atomic test-and-set functionality yet, at least natively; there are higher-level protocols that provide it. Still, RapidIO faded from our space. RoCE isn’t yet ubiquitous, so folks haven’t made heavy use of it in VPX systems, but it’s just a matter of time. Then it will probably be a “must have” feature.
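If you haven’t run into it, the appeal of an atomic test-and-set is easiest to see in code. The sketch below shows only the semantics, using an ordinary Python lock to stand in for the fabric’s atomicity guarantee; a real fabric-level implementation (RapidIO’s atomics, or RDMA atomics layered on Ethernet) performs the same read-modify-write remotely in hardware, which is what lets nodes coordinate without extra round trips.

```python
import threading

class TestAndSetFlag:
    """Illustrative only: the *semantics* of an atomic test-and-set.
    A fabric-level implementation performs this read-modify-write remotely,
    in hardware, without involving the target node's CPU."""

    def __init__(self) -> None:
        self._flag = False
        self._lock = threading.Lock()  # stands in for the fabric's atomicity guarantee

    def test_and_set(self) -> bool:
        """Atomically set the flag and return its *previous* value."""
        with self._lock:
            previous = self._flag
            self._flag = True
            return previous

flag = TestAndSetFlag()

def try_claim(worker: str) -> None:
    # Only the first caller sees False and "wins" the shared resource.
    if not flag.test_and_set():
        print(f"{worker} acquired the resource")
    else:
        print(f"{worker} found it already taken")

threads = [threading.Thread(target=try_claim, args=(f"node-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```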
That’s a good view of a PCIe-to-Ethernet comparison, but both protocols are viable today. System architects need to select protocols that are going to be around and be successful. So, to ask in broader terms: why does one protocol win out over others? What drives its success?
Maybe it’s the economics and the strength of the customer base. If a protocol has large sales, it survives, but FORE Systems alone was selling $700M of ATM (Asynchronous Transfer Mode) a year, and ATM was sold into the big telcos (a wealthy customer base), yet ATM went obsolete.
You mentioned RapidIO (sells to the embedded market, pretty viable I guess) and InfiniBand (sells to supercomputer industry, pretty large and viable), but both are all but dead.
I suppose the best answer to ensuring propagation is the “ubiquity” of the protocol: how broad is the base of developers familiar with the protocol, and how broad is the industry awareness? InfiniBand has stayed in the supercomputer industry, which has all but ensured its demise, even though the supercomputer market was fairly big in absolute dollars.
It is hard to believe the Ethernet LAN market was bigger in absolute dollars than the ATM telecom market, but there were more developers of Ethernet and more awareness of Ethernet than of ATM. It seems that it’s the number of developers that drives a protocol’s propagation, not so much its performance, features, or economics. ATM was at 155 Mbps when Ethernet was at 10 Mbps. So, if it were about performance, how did ATM lose that lead and then fall behind? Because Ethernet had more developers familiar with it. Fibre Channel never crept out of storage: a big success there, but not enough people were aware of it and pushing it forward, so it went away.
So, why did Ethernet win over ATM, for example? It seems ATM pushed from the WAN vertical into the LAN vertical (at least for FORE Systems). Ethernet not only pushed it back out of the LAN, but Ethernet came into the MAN and WAN and pushed ATM out of both. How does that happen?
Your analysis is all quite sound, and I think it’s really a combination of the factors that you describe that ends up pulling the market one way or another.
First off, embedded systems are driven largely by space, weight, and power (SWaP) concerns, as well as the operating environment in which these systems deploy. While ATM was the market performance leader, it did not lend itself to embedded systems because of the power it needed and the heat it generated. There was also the question of finding parts rated for the extended temperatures that our systems operate under. InfiniBand suffered from this too, although to a lesser extent, because it was lower power/heat and there were, in fact, some extended-temperature parts available. RapidIO succeeded for a time because it could meet these challenges.
I think there are two additional things that worked against ATM, RapidIO, and InfiniBand: a ubiquitous software stack and price, both of which favor Ethernet. RapidIO and InfiniBand required something like MPI, which people either loved or hated, and which was basically maintained by a relatively small, specialized community. Linux/Ethernet software is maintained by an absolute army of developers in the open-source community, and there are a ton of packages to support just about any data movement model one might want. In addition, it’s easy to find talent who know how to use it. It is a truly universal fabric; the only detraction is the performance hit of the software stack. Pricewise, Ethernet is just plain cheaper than all three, and the cost of a solution is always a big decision driver. With both price and the ubiquitous software stack in Ethernet’s favor, the others really didn’t stand a chance.
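To illustrate the “army of developers” point: moving a block of data between two nodes over the stock Linux/Ethernet stack takes only the standard library, as in the minimal sketch below. The loopback address, port number, and payload are placeholders for illustration, not anything from a real system.

```python
# Minimal sketch of a point-to-point transfer over the stock TCP/IP stack.
# The address, port, and payload are arbitrary placeholders for illustration.
import socket
import threading

HOST, PORT = "127.0.0.1", 50007   # hypothetical peer address (loopback for the demo)

def send_block(payload: bytes) -> None:
    """Push a block of data to a peer using nothing but the standard library."""
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(payload)

def receive_block(server: socket.socket) -> bytes:
    """Accept one connection on an already-listening socket and drain it."""
    conn, _addr = server.accept()
    with conn:
        chunks = []
        while data := conn.recv(65536):
            chunks.append(data)
        return b"".join(chunks)

if __name__ == "__main__":
    with socket.create_server((HOST, PORT)) as server:   # bind + listen up front
        receiver = threading.Thread(
            target=lambda: print(f"received {len(receive_block(server))} bytes")
        )
        receiver.start()
        send_block(b"sensor frame " * 1024)               # notional data-plane payload
        receiver.join()
```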
What if we come at this from the opposite direction and, rather than asking which protocols are going to survive, ask which protocol is going to die next, or in what order protocols are going to die?
I think if we know the number of developers behind each protocol, we can predict its future rise or decline.
I think we’ve just about funneled the primary protocol options down to the minimum set in the high-performance embedded space. Ethernet and PCIe are pretty clear winners. I think that there is room for contenders in the streaming-data space. Aurora is popular, but it’s Xilinx-only. There is talk about Serial FPDP transforming from a wire-based sensor-to-node protocol into a backplane-based node-to-node protocol. However, it will take time to determine whether the market really adopts that. Thus, I think there may be room for a new low-overhead, high-performance streaming protocol that supports CPUs and FPGAs from various suppliers. Only time will tell.
What I definitely see is new functionality being called for in Ethernet. The two big ones at the moment are remote DMA (in the form of RoCE) and Time Sensitive Networking (TSN). Switch and plug-in card suppliers who support RoCE and TSN will be in high demand over the next few years.