You need to test, we're here to help.

You need to test, we're here to help.

28 June 2021

MAUI Studio Pro: "Impersonating" Remote Oscilloscopes

Figure 1: After recalling the LabNotebook, MAUI Studio Pro
 (now simulating an HDO4024A) will "impersonate" the
WaveRunner 9404 on which the file was saved.
Our recent release of MAUI Studio Pro adds several key new features to the MAUI Studio remote oscilloscope software for Windows 10 PCs. This short series will demonstrate how to make use of these features.

One of the most useful new features is the ability to inherit the entire "personality" of an oscilloscope. By that, we mean the MAUI Studio Pro software replicates the oscilloscope model type and user interface, along with its full complement of software options. It "impersonates" the oscilloscope. There are a couple ways this can be done.

21 June 2021

Automotive Ethernet MDI S-parameter Testing

Figure 1: MDI S-parameter tests treat the Base-T1
pair as a balance transmission line and check that
reflections don't cause either excessive power loss
or mode conversion that can disrupt the signal.
As said earlier, the automotive industry has very stringent EMC/EMI requirements, and all Automotive Ethernet standards are designed to ensure good operation even in the presence of high EMI. Not only is there the potential for interference from all the different electronic systems within the vehicle, nothing is stopping you from parking your vehicle below high-voltage transmission wires or in other high EMI fields. 

For this reason, all Automotive Ethernet standards have defined S-parameter tests to be performed at the Medium Dependent Interface (MDI). The assumption is that the single twisted pair that is the basis for all Base-T1 transmissions can be treated as a balanced, differential transmission line with some crosstalk. It is a very real-world application of S-parameters, which can seem so academic.

Two, mixed-mode S-parameters are measured at the MDI reference plane. The tests ensure that there is neither too much loss of power from reflections, nor too much mode conversion into differential signal, that it will disrupt the information of the PAM3 encoded signal.

14 June 2021

Real-World Workload Replays on SSDs

The Problem

SSDs are expected to hit the market after undergoing rigorous sessions of testing and quality assurance. From a functional testing perspective, synthetic IO workloads have been the de facto method of testing. While synthetic workload testing is essential, it doesn’t exactly reflect the real-world environment, whether that is inside a data center or a client’s laptop. Real life workloads tend to be a lot more “bursty”, and it’s during these short bursts that SSDs run into long queues and high latency, ultimately degrading the quality of service.

In this blog post, we explore techniques to replay these real-world workloads and examine their results.

 

Overview of Real-World Workloads

First and foremost, what is a real world workload and how do I get it? Fig. 1 shows the typical process of how a real life workload is captured.

 

Fig. 1. Capturing a real-world storage workload.

Fig. 1. Capturing a real-world storage workload.

Applications typically run on top of a file system that in turn push IOs to an abstracted OS storage layer before making it to the SSD. For Linux, this storage layer is the block layer, and for Windows, it can be either the DiskIO layer or the StorPort layer. IO capture tools are readily available to snoop all IOs within the storage layer. The most popular one for Linux is blktrace, and for Windows, Windows Performance Recorder is widely used capture tool. Alternatively, Workload Intelligence DataAgent (coming soon) is a powerful trace capture tool with built in intelligence to trigger and filter on user determined criteria. The outputs of these capture tools are trace files that contain all the IOs along with their timestamps and typically other information, like process IDs, CPU etc. It is these trace files that are used to generate IO replays from for offline testing.

Where can you get these IO trace files? If you work with a customer that uses your SSDs or storage subsystems, they typically collect such trace files for their own analysis purposes and may be open to sharing them. This would be the best scenario as you can get a live view of a real-world environment. If you are the SSD user, you can use the aforementioned capture tools to trace the IOs in your application environment.

Alternatively, SNIA provides a number of open-source real-life storage workloads: http://iotta.snia.org/tracetypes/3. Finally, you can always concoct your own setup- run a MySql or Cassandra database, simulate some application queries, and capture storage layer IOs. While strictly not “real-world”, this will at least give you a good sense on how application activities trickle down to storage IOs.

 

Replaying Workloads

With the workload traces on hand, it’s time to replay them in your test environment. The goal of doing so is to reproduce the issue that was caught during the production run, and also to tune or make fixes to SSD firmware to see if the issue gets resolved. If one is evaluating SSDs, this is a great way to see if the selected SSDs pass the workload criteria.

The recommended storage replay tool is the Workload Intelligence Replay. Replay files can be generated from Workload Intelligence Analytics, to run on an Oakgate SVF Pro appliance. The replay results can then be ported back to Workload Intelligence Analytics to compare with the original trace, or with other replay results. Fig. 2. shows the flow of how storage traces are captured, replayed, and compared.

Fig. 2. Process of replaying and analyzing a storage trace.

Fig. 2. Process of replaying and analyzing a storage trace.

When replaying a storage workload to target drive, a number of factors must be considered to make sure the replay emulates the production environment as much as possible:

  1. Target drive selection – Ideally, the DUT should have the same make and model as the drive that the production trace was captured on. If this is not possible and the target DUT has a different capacity, the replay software should be able to re-map LBAs so that they fit within the DUT. Oakgate SVF Pro Replay handles this situation by offering different LBA out of range policies, including skipping the IOs, wrapping the IOs, and linearly re-mapping IOs that exceed the LBA range. We’ve found re-mapping LBAs to be the most effective in duplicating production scenarios. 
  2. Target drive prefill – It is essential the DUT gets prefilled properly to emulate the state the production drive was in before the capture happened. While it’s impossible to condition the DUT to the exact state the production drive was in, we’ve found that a simple 1x write prefill was able to duplicate production drive scenarios. Alternatively, Fiosynth offers fairly sophisticated preconditioning techniques that are reflective of data center workloads.
  3. Accurate IO timing and parameters – The heart of an effective replay is software that can emulate when and what the production host sent to the device. To achieve that goal, Oakgate SVF Pro matches the replay trace file IO in terms of relative timestamp, IO size, and LBA. Fig. 3 and 4 show the LBA vs. Time and IO Size vs. Time, respectively, of a production blktrace versus that replayed by Oakgate SVF. As can be seen, the timings and values of the replayed IOs is exactly same as the production trace. The variable of interest is typically the latency of each IO and the IO queue depth and are expected to be different from the production trace.

 Fig. 3. LBA vs time of production blktrace (blue) and the replayed trace (orange)


Fig. 3. LBA vs time of production blktrace (blue) and the replayed trace (orange) #2

Fig. 3. LBA vs time of production blktrace (blue) and the replayed trace (orange)

Fig. 4. IO Size vs time of production blktrace (blue) and the replayed trace (orange)

Fig. 4. IO Size vs time of production blktrace (blue) and the replayed trace (orange) #2

 Fig. 4. IO Size vs time of production blktrace (blue) and the replayed trace (orange)


Replay Experiments

Now that we’ve been primed about the intricacies of workload replay, let’s take a look at some experiments.

In all experiments run, we’ve preconditioned the drive 1x. The chosen drives were standardized to 0.5 TB. All production runs were captured when the drive was formatted to 512 bytes, so for the replay, we’ve also formatted the drive to 512 bytes.

 

Replay Repeatability

In the first set of experiments, we tackle the question of how repeatable a workload replay is. This question is important because we need to know whether the replay software can duplicate the same profile on the same drive during a production run. Even if we can duplicate the profile, it is also crucial to know whether this same profile can be regenerated after 10s or 100s of replays. If the drive shows a statistically different profile on every replay run, it can indicate that the drive does not have very stable performance.

For this experiment, the production workload we are running consists of a combination of applications, primarily: Postgresql with light traffic, and Spark + HDFS data store with large datasets that get cached and persisted onto disk. We captured a blktrace on the production machine.

 

Fig. 5 shows the latency vs time of the production blktrace over a 10 minute snapshot.

 Fig. 5. Latency vs time for a production blktrace with a big data workload

Fig. 5. Latency vs time for a production blktrace with a big data workload

Since we control the “production” environment, we’ve taken this same drive over to an Oakgate replay appliance and ran a replay file generated from Fig. 4’s workload. We iterated multiple rounds of replay (preconditioning between each iteration) and took those results and fed it back into WI Analytics for comparison. Fig. 6 shows 3 iterations of the replay.

Fig. 6. Latency vs time for three rounds of replay

Fig. 6. Latency vs time for three rounds of replay #2

Fig. 6. Latency vs time for three rounds of replay #3

Fig. 6. Latency vs time for three rounds of replay

In a quick glance, we can see that the latency spikes and their timings for the replayed results are extremely similar to the original production trace. There are some statistical differences, if we closely, which is expected. Overall, the replays have faithfully regenerated the latency profile in an offline environment, and this particular drive has shown stable results after multiple rounds of replay.

 

Replay Comparison for Different SSDs

Next, let’s run experiments on how different SSDs react to the same workload. Such a use case can very well be used to qualify different drives for a data center’s set of workloads. From a different angle, the same experiment can be run on the same SSD with different firmware to see if firmware changes are effective against a particular set of workloads.

In these experiments, we had chosen an actual data center workload that showed a particularly stressful profile. Fig. 7 shows the read latency vs. time of the production workload. We can see very big latency spikes happen, and we would like to know whether other SSDs can handle this same workload without these latency spikes. Fig. 8 shows the main obstacle the workload entails – a burst of large discard (trim) IOs that is likely causing the disk to become blocked while servicing those discards, contributing the high read latency spikes.

 

Fig. 7. Data center production blktrace.

Fig. 7. Data center production blktrace.

 

Fig. 8. Discard (trim) IO sizes vs time for production blktrace.

Fig. 8. Discard (trim) IO sizes vs time for production blktrace.

 

Fig. 9 shows drives from various vendors that did not encounter the read latency spikes despite undergoing the same production workload. Latency scales are normalized to that of the production workload the contrast the latency differences.

Fig. 9. Workload replayed against drives that did not show latency spikes.
 
Fig. 9. Workload replayed against drives that did not show latency spikes. #2

Fig. 9. Workload replayed against drives that did not show latency spikes. #3

Fig. 9. Workload replayed against drives that did not show latency spikes. #4

Fig. 9. Workload replayed against drives that did not show latency spikes.

By contrast, Fig. 10 shows the set of drives that show an adverse reaction to the burst of discard IOs. Vendor F in particular never recovered to its baseline latencies after getting swamped with the discard bursts.

 

Fig. 10. Drives that reacted adversely to the replayed production workload.

Fig. 10. Drives that reacted adversely to the replayed production workload. #2

Fig. 10. Drives that reacted adversely to the replayed production workload.

 Conclusion

The ability to replay production storage workloads is an essential tool to improving or qualifying SSD real-world performance. Unlike synthetic workloads, production workloads are much more unpredictable, dynamic, and reflective of real application environments. In this blog, we discuss the various aspects of accurately replaying workloads in an offline environment and showed examples on how to properly reproduce production environments. We also showed how different SSDs can react very differently to the same production workload. Hopefully, we’ve conveyed the point that it is essential to use real-world replays to design better SSDs and better match sets of SSDs relevant to one’s real-world environment.


By Fred Tzeng, Ph.D. Software Architect
OakGate Product Family, Teledyne LeCroy


Learn more about Teledyne LeCroy OakGate Products










Automotive Ethernet in the Vehicle

Figure 1 Block diagram of a typical ADAS system
showing the in-vehicle networks used.
As all Automotive Ethernet technologies utilize a single twisted-pair cable (T1) and a point-to-point network topology, they differ primarily in their data rates and encoding methods. For the most part, the data rate determines the applications for which a particular Automotive Ethernet standard can be used. If we take the case of an automatic driver assistance system (ADAS), we can see where each Automotive Ethernet variant might best be used to replace existing automotive protocols. 

(Click on any figure to enlarge the image.)

08 June 2021

Fundamentals of Automotive Ethernet

Figure 1: Automotive Ethernet is designed to support
increasingly complex vehicle electronic systems.
When we speak of “Automotive Ethernet”, we’re referring to a group of Ethernet interfaces intended for in-vehicle use, customized to meet the needs of the automotive industry. The first Automotive Ethernet standard was defined by Broadcom in 2011 with BroadR-Reach. Since then, IEEE has released standards for 100Base-T1, 1000Base-T1, 10Base-T1S and most recently MultiGBase-T1. Together, these standards define the general technology known as Automotive Ethernet.

Probably the first question you ask is, “Why not just use standard Ethernet?” A summary of the fundamental features of Automotive Ethernet will show how much better Automotive Ethernet is than standard Ethernet at meeting the industry’s demand for a higher speed, robust, lightweight and lower cost data interface, one that can ultimately replace the many other protocols currently used throughout the vehicle.

01 June 2021

Five Tips to Improve Dynamic Range

Dynamic range is the ratio of the maximum signal level to the smallest signal level achievable in a measurement.  Tools with good dynamic range are especially helpful for analyzing wide dynamic range signals in which a full-scale signal must be acquired, while at the same time, very small amplitude signal details must be visible. Here are five tips for improving the dynamic range of your measurement instrument.