You need to test, we're here to help.

09 August 2021

Debugging Dynamic Link Behaviors with CrossSync PHY for PCIe

Figure 1: These upstream lanes show unexpected equalization
behavior. If you only had the electrical signals to work with,
how would you begin debugging them?
Recent generations of PCI Express® rely heavily on the link equalization process to support signal integrity at bit rates of 8, 16, or 32 Gbps. In Phase 1, both sides of the link exchange TS1 ordered sets to establish an operational link. In Phase 2, the upstream port asks the downstream port to adjust its transmitter equalization to compensate for the link channel. In Phase 3, the roles reverse: the downstream port asks the upstream port to adjust its transmitter equalization.
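
When you are reading a trace, it helps to keep straight which port is asking and which transmitter is being adjusted in each phase. The following is a minimal sketch of the phase roles as described above; the names `Port`, `EqPhase`, `EQ_PHASES` and `phase_adjusting` are hypothetical and not part of any tool or specification.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Port(Enum):
    UPSTREAM = "upstream port"
    DOWNSTREAM = "downstream port"


class EqPhase(NamedTuple):
    requester: Optional[Port]  # port asking for a transmitter change, if any
    adjusted: Optional[Port]   # port whose transmitter equalization is changed


# Phase roles as described in the paragraph above (illustrative only).
EQ_PHASES = {
    1: EqPhase(requester=None, adjusted=None),  # both sides exchange TS1s
    2: EqPhase(requester=Port.UPSTREAM, adjusted=Port.DOWNSTREAM),
    3: EqPhase(requester=Port.DOWNSTREAM, adjusted=Port.UPSTREAM),
}


def phase_adjusting(port: Port) -> int:
    """Return the phase in which `port` has its transmitter equalization tuned."""
    for number, phase in EQ_PHASES.items():
        if phase.adjusted is port:
            return number
    raise ValueError(f"no phase adjusts the {port.value}")
```

For example, `phase_adjusting(Port.UPSTREAM)` points you at Phase 3, which is where you would look first if the upstream lanes show unexpected equalization.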

At PCI Express compliance events, or when doing pre-compliance testing in the lab, a transmitter link equalization test is performed to determine whether a device is capable of correct link equalization in isolation. A piece of test equipment, usually a protocol-aware BERT, acts as the link partner for the device under test. The BERT requests specific preset changes from the device, and the device should respond by changing its preset to provide the correct channel compensation. An oscilloscope captures the changes, visualizing the transmitter equalization in the electrical layer and measuring two things: first, whether each change happens quickly enough, and second, whether the device actually changed to the requested preset levels, which occur in a known sequence.
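
The two checks in that test (was the change fast enough, and did the device land on the requested preset) can be sketched as a simple pass over the measured steps. This is only an illustration: `PresetStep` and `failing_steps` are hypothetical names, and the response-time limit is left as a parameter that you would take from the specification or your test procedure rather than from this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PresetStep:
    requested: str      # preset the BERT asked for, e.g. "P7"
    observed: str       # preset inferred from the oscilloscope capture
    response_ns: float  # delay between the request and the electrical change


def failing_steps(steps: List[PresetStep], max_response_ns: float) -> List[str]:
    """Describe every step that was too slow or landed on the wrong preset."""
    problems = []
    for index, step in enumerate(steps):
        if step.response_ns > max_response_ns:
            problems.append(
                f"step {index}: {step.requested} took {step.response_ns} ns"
            )
        if step.observed != step.requested:
            problems.append(
                f"step {index}: requested {step.requested}, observed {step.observed}"
            )
    return problems
```

An empty list means every step in the sequence passed both checks for the limit you supplied.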

But what happens when you suspect "weird" equalization behavior in a live link between two devices, for example, something off with the upstream equalization in Phase 3? How would you capture that to begin debugging the problem? Where would you look for a clue as to what is happening?

Figure 2: CrossSync PHY technology uses an interposer
for cross-probing of signals.
Your first problem would be getting the same signals from each side of the link into two different instruments: an oscilloscope for the electrical layer view and a protocol analyzer for the protocol layer view. In the CrossSync™ PHY for PCIe setup, a CrossSync PHY-capable interposer connects the DUT to the host(s), so that the signals can be split off to both the protocol analyzer and the oscilloscope simultaneously. Both the high-speed data content and the sideband signals are cross-probed from the interposer, resulting in two fully time-synchronized acquisitions.

Figure 2: Triggering on the first TS2 after the
speed change captures the exchange of presets.
If you're debugging a 16 Gbps PCIe® 4.0 device, you could configure your protocol analyzer to trigger on the first TS2 after the speed changes to 16 Gbps, capturing the exchange of presets in the protocol layer. But how do you also get an oscilloscope trace of the exact same events in the physical layer? That kind of triggering requires a level of protocol awareness you cannot get with an oscilloscope alone. A tool like CrossSync™ PHY for PCIe can cross-trigger both the protocol analyzer and the oscilloscope on protocol events like speed change requests. The result is a protocol trace that shows the reported equalization presets and coefficients at the end of Phase 3, plus time-correlated oscilloscope traces that show the electrical effects of those settings.

Figure 3: Expanding a TS1 packet shows the presets each
lane is reporting, but is the discrepancy logical or electrical?
Using the protocol display to navigate to the same time on all traces, you can expand one of the final TS1 packets in the upstream direction to check the reported presets. In Figure 3, at the end of Phase 3, Lanes 0 and 2 report having trained to transmitter equalization (TxEQ) preset P6, while Lanes 1 and 3 report having trained to TxEQ preset P10. So, you know the lanes have trained to different presets. Is this expected behavior? According to the specification, P10 isn't meant to be used in a live link at all; it is designed for testing. The P10 boost limits aren't fixed, so unlike with every other preset, a device that requests P10 can't know what to expect. Obviously, there’s a problem, but is the problem that the device is really training to P10, or that it's just reporting (incorrectly) that it has trained to P10? Is it a purely logical problem, or is it a logical-electrical problem?
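
If you export the reported presets from the protocol trace, the same comparison can be scripted. The sketch below assumes you already have a lane-to-preset mapping (how you obtain it depends on your analyzer and is not shown); the function names are hypothetical, and treating P10 as test-only follows the description above.

```python
from collections import defaultdict
from typing import Dict, List

# Presets treated as unexpected in a live link, per the discussion above.
TEST_ONLY_PRESETS = {"P10"}


def group_lanes_by_preset(reported: Dict[int, str]) -> Dict[str, List[int]]:
    """Group lane numbers by the preset each lane reports."""
    groups = defaultdict(list)
    for lane, preset in sorted(reported.items()):
        groups[preset].append(lane)
    return dict(groups)


def suspect_lanes(reported: Dict[int, str]) -> List[int]:
    """Return the lanes reporting a preset that should not appear in a live link."""
    return sorted(
        lane for lane, preset in reported.items() if preset in TEST_ONLY_PRESETS
    )


# Values taken from the example in the text: lanes 0 and 2 at P6, lanes 1 and 3 at P10.
reported_presets = {0: "P6", 1: "P10", 2: "P6", 3: "P10"}
```

With the example values, `group_lanes_by_preset(reported_presets)` separates lanes 0 and 2 from lanes 1 and 3, and `suspect_lanes(reported_presets)` singles out lanes 1 and 3. Note that this only tells you what the lanes report; it cannot tell you whether the electrical behavior matches.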

Figure 4: Zooming the oscilloscope traces around an
EIEOS symbol clearly shows the lanes are
actually training to different presets electrically.
By zooming the oscilloscope traces to the time around an EIEOS symbol (which can be done simply by clicking an EIEOS symbol on the protocol display in the combined CrossSync PHY view), you get a clear view of the difference in electrical emphasis between the two signals. The P10 lane clearly has far more emphasis placed on the signal than the lane that's trained to P6. So, it seems that some kind of logical problem is causing the lane to train to an incorrect preset, and that knowledge gives you a starting point for looking in your firmware to understand why your device is training to P10.
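
To put a number on "far more emphasis," one common approach is to express the ratio of the settled amplitude to the transition-bit amplitude in decibels and compare lanes. The sketch below assumes you have already read those two amplitudes off the zoomed waveform for each lane (for instance with cursors); the function names are hypothetical, and this is a rough comparison aid rather than a compliance measurement.

```python
import math
from typing import Tuple


def level_ratio_db(settled_mv: float, transition_mv: float) -> float:
    """Settled amplitude relative to transition-bit amplitude, in dB.

    A value of 0 dB means no emphasis; more negative means more emphasis.
    """
    if settled_mv <= 0 or transition_mv <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(settled_mv / transition_mv)


def emphasis_gap_db(lane_a: Tuple[float, float], lane_b: Tuple[float, float]) -> float:
    """Difference in emphasis between two lanes.

    Each lane is a (settled_mv, transition_mv) pair. A positive result means
    lane_b carries more emphasis than lane_a.
    """
    return level_ratio_db(*lane_a) - level_ratio_db(*lane_b)
```

A large gap between two lanes that should have trained to the same preset is consistent with the lanes really being at different presets electrically, which is the conclusion drawn from the traces above.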

Watch this type of PCIe analysis demonstrated by Gordon Getty and Patrick Connally in the on-demand webinar, “Debugging PCI Express Power Management and Dynamic Link Behaviors.”

Also see:

Debugging L1 Substates Timing Errors with CrossSync PHY for PCIe

Anatomy of a PCIe Link

The Important Difference Between ProtoSync and CrossSync PHY for PCIe

