TX/RX round trip latency and coherence

Good afternoon,

First of all, I know that there are already a couple of topics on this discourse similar to this one, but, in my opinion, the information is still very diffuse and could be made more succinct and clear. Also, I believe it is a very important subject for most of the custom applications that one can do with a LimeSDR.

In particular, I am developing a C++ app based on the LMS API, and I intend to transmit custom pulses and, almost instantly, acquire some signal. So far, so good: I obtained a “proof of concept” using the LimeSDR. However, a lot more effort is required if one intends to repeat this process several times and average the results (which is my goal). The major problems in achieving this are the following:

  • The TX stream takes an apparently non-deterministic time to be sent after you run the “send stream” line. This delay changes with the sample rate, but the real problem is that it is not constant even at a fixed rate. One can use timestamps to start transmitting only after X samples have been received (eventually acquiring everything, i.e., pulse + response). However, the absolute time at which the pulse is sent is still not clear (only the timing relative to RX).

  • When repeating several “send stream - acquire stream” cycles, it is not straightforward to obtain good TX/RX coherence, in the sense that the signals are always acquired at the same sample positions (time), which is required for the averaging to work.

In fact, unless one has the courage to go inside the FPGA GW and perform some changes, it can be a very tedious process to understand everything that is going on. I also believe that one shouldn’t need to go inside the FPGA for a very common task such as this one (and I am also avoiding digging into its complexity to solve this, unless no other solution is possible/known).

This is what I have discovered so far, with contributions from some members of this community. The trick here is to obtain a perfect synchronization between the FPGA FIFOs and the transmitting/receiving of the samples. A data packet is 4096 bytes, of which 16 bytes are the header (configuration bits) and the remaining 4080 bytes are the payload. That means that, if one is working with 24-bit samples (i.e. 12-bit I sample + 12-bit Q sample), a data packet holds 1360 samples. Analogously, if one is working with 32-bit samples (16+16), a data packet holds 1020 samples. As @cmichal pointed out to me before, if one works with buffer sizes that are integer multiples of these values, the TX/RX coherence improves greatly - it becomes much more likely that the acquired signal always lands on the same sample positions. According to @joshblum in this thread, this happens because the FPGA FIFOs are filling and draining “synchronized” with the host CPU, in the sense that there is no waiting time. The latency of the streams can also be set to minimum in the LMS API, and that improves the results greatly. Bypassing every TSP module also reduces the latency to a minimum of 10 clock cycles for the TX and another 10 clock cycles for the RX.
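To make the packet arithmetic above concrete, here is a minimal sketch in C++. The helper names (`samplesPerPacket`, `roundUpToPackets`) are made up for illustration and are not part of the LMS API; the only inputs are the 4096-byte packet and 16-byte header sizes quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Packet layout as described in the post (not read from the hardware).
constexpr std::size_t kPacketBytes  = 4096;
constexpr std::size_t kHeaderBytes  = 16;
constexpr std::size_t kPayloadBytes = kPacketBytes - kHeaderBytes;  // 4080

// Complex (I+Q) samples that fit in one packet, given the number of bits
// used for each of I and Q (12 or 16 in the cases discussed above).
constexpr std::size_t samplesPerPacket(std::size_t bitsPerComponent) {
    return kPayloadBytes * 8 / (2 * bitsPerComponent);
}

// Round a requested buffer length (in samples) up to a whole number of packets.
inline std::size_t roundUpToPackets(std::size_t requested, std::size_t perPacket) {
    assert(perPacket > 0);
    return ((requested + perPacket - 1) / perPacket) * perPacket;
}

static_assert(samplesPerPacket(12) == 1360, "12+12-bit samples per packet");
static_assert(samplesPerPacket(16) == 1020, "16+16-bit samples per packet");
```

For example, a 5000-sample buffer in 12-bit mode would be padded to `roundUpToPackets(5000, samplesPerPacket(12))`, i.e. 4 packets or 5440 samples.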

But there is still another variable that needs to be correctly configured in order to obtain perfect coherence: the sampling rate. My intuition tells me that there should be some set of values for the sampling rate (corresponding to integer multiples of some golden value) that enables the optimal coherence that one desires for some applications. So far, however, I haven’t been able to achieve it, even with multiples of 30.72 MHz. I can perform up to 256 experiments that are completely coherent with each other, but only at a low sampling rate of 100 kSPS, which is not optimal in my case. For higher sampling rates of 1 MSPS or 10 MSPS (for both DAC and ADC) this coherence is destroyed, i.e., the signals are positioned at different samples from trial to trial. Could this be because I’m using an old laptop (AMD A6 processor and USB 2.0 port)? With an oscilloscope I’ve found out that sometimes the pulse is simply not transmitted at higher data rates. Since I set the streams to flush partial data packets, this means that the “blank space” where the pulse should be sent is actually smaller than the initial pulse length. This is the reason the next signals are acquired at slightly different times.

Does anyone have more information to add to this subject? Any hint or knowledge about the guts of the FPGA that can explain what is wrong and point me in the right direction? I hope I don’t have to deal with the FPGA to solve this, but can instead find some optimal values that satisfy the conditions I need. Maybe you can also extend this knowledge to applications that use both channels. I believe that this is a valuable piece of information for the public in order for the LimeSDR to become the RF development board it is intended to be.

If you have any hints/tips/good practices of how to utilize the LimeSDR for such purpose, please feel free to share your knowledge.

Thanks in advance,
OCB

I’ve been able to do what you’re trying to do, and it is possible. This is the problem that timestamps solve. But it helps a bit to know that the timestamp is basically just a count of how many samples the fpga has received from the lms7002m since RX stream start.

Start the RX and TX streams. Receive data from the RX stream, then look at the timestamp of the last data frame you received and pick a time some way (say 50 ms) in the future. Construct a TX packet that contains the pulse you want to transmit, being sure to turn on the waitForTimestamps bit. Send your packet and continue to receive - make sure the RX fifo never overflows. The signals you’re looking for should come in the right place relative to the pulses. So if you transmit a pulse from timestamp 21000 to 22500, the signals you’re looking for should come starting after 22500. You can throw away the samples you received before 22500, but you do need to receive them.
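The bookkeeping in that recipe can be sketched as below. This is only the timestamp arithmetic, not a driver call: `PulsePlan`, `planPulse` and `offsetInBuffer` are hypothetical names, and the sketch assumes the timestamp advances by one per sample, as described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct PulsePlan {
    std::uint64_t txStart;        // timestamp to put in the TX packet header
    std::uint64_t rxSignalStart;  // first RX timestamp after the pulse ends
};

// lastRxTimestamp: timestamp of the most recent RX frame received.
// leadSeconds:     how far in the future to schedule the pulse (e.g. 0.05).
// pulseSamples:    length of the transmitted pulse in samples.
inline PulsePlan planPulse(std::uint64_t lastRxTimestamp, double sampleRateHz,
                           double leadSeconds, std::uint64_t pulseSamples) {
    assert(sampleRateHz > 0.0 && leadSeconds >= 0.0);
    const auto lead =
        static_cast<std::uint64_t>(std::llround(sampleRateHz * leadSeconds));
    PulsePlan plan;
    plan.txStart = lastRxTimestamp + lead;
    plan.rxSignalStart = plan.txStart + pulseSamples;
    return plan;
}

// Index of timestamp `target` inside an RX buffer whose first sample carries
// `bufferTimestamp`. The result is negative if the target precedes the buffer.
inline std::int64_t offsetInBuffer(std::uint64_t bufferTimestamp,
                                   std::uint64_t target) {
    return static_cast<std::int64_t>(target - bufferTimestamp);
}
```

With a last RX timestamp of 1000, a rate of 400 kSPS, a 50 ms lead and a 1500-sample pulse, this gives the 21000 to 22500 example from the text: everything received before timestamp 22500 is read and discarded, and the response starts at `offsetInBuffer(bufferTimestamp, 22500)` in whichever buffer contains it.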

Now to do a second transient you have a couple of choices. You can turn off the streams and start everything over again, but if you want precise timing between transients, just keep receiving during the recycle delay, pick the time to send your next pulse and put it in a new packet to transmit.

You can just stop sending transmit packets whenever you like (but don’t turn off the TX stream every time), but you need to keep receiving all the time (well, you need to make sure the rx fifo doesn’t overflow).
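Since the RX stream keeps running between transients, the averaging can be done directly on the continuous stream. The sketch below assumes the pulses are scheduled at a fixed period in samples and that the RX timestamps are contiguous (no overflow); `TransientAverager` is a hypothetical helper, not something provided by the LMS API.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

class TransientAverager {
public:
    // firstWindowStart: RX timestamp where the first acquisition window begins.
    // period:           spacing between consecutive windows, in samples.
    // windowLen:        samples kept from each transient.
    TransientAverager(std::uint64_t firstWindowStart, std::uint64_t period,
                      std::size_t windowLen)
        : first_(firstWindowStart), period_(period), sum_(windowLen) {
        assert(windowLen > 0 && period >= windowLen);
    }

    // Feed one contiguous RX chunk whose first sample has timestamp `ts`.
    // Samples outside the acquisition windows are simply ignored.
    void feed(std::uint64_t ts, const std::complex<float>* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            const std::uint64_t t = ts + i;
            if (t < first_) continue;
            const std::uint64_t phase = (t - first_) % period_;
            if (phase < sum_.size()) {
                sum_[phase] += data[i];
                if (phase + 1 == sum_.size()) ++completed_;  // window finished
            }
        }
    }

    std::size_t completed() const { return completed_; }

    // Mean over the completed windows; call it between windows, otherwise a
    // partially accumulated window is included in the sum.
    std::vector<std::complex<float>> average() const {
        std::vector<std::complex<float>> out(sum_);
        if (completed_ > 0)
            for (auto& v : out) v /= static_cast<float>(completed_);
        return out;
    }

private:
    std::uint64_t first_;
    std::uint64_t period_;
    std::vector<std::complex<float>> sum_;
    std::size_t completed_ = 0;
};
```

Because the window positions are derived from timestamps rather than from buffer boundaries, the chunk size passed to `feed` does not affect which samples get averaged together.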

There are a few quirks I’ve seen.

  1. If you are using a single RX channel - never put an odd number as the timestamp in the tx packet header. Bad things happen. Internally the rx timestamps are only ever even numbers if there’s only one RX channel available, so presumably the odd number never matches.

  2. If you do interrupt the TX stream by not sending packets, make sure the TX fifo is empty before you start sending again.
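For the first quirk, a small guard applied to every TX timestamp before it goes into the packet header is enough. This is a sketch with a hypothetical helper name; it rounds up rather than down so the scheduled time never moves closer to "now".

```cpp
#include <cassert>
#include <cstdint>

// Round a timestamp up to the next even value (even values are unchanged).
inline std::uint64_t toEvenTimestamp(std::uint64_t ts) {
    const std::uint64_t even = (ts + 1) & ~std::uint64_t{1};
    assert(even % 2 == 0 && even >= ts);
    return even;
}
```

So a computed start of 21001 becomes 21002, while 21000 is left alone.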

If you are using a buffer size that is an integer multiple of the native packet size, I don’t think that the flushPartialPacket flag does anything - not totally sure about that though.

If your pulse is sometimes not transmitted at higher data rates, it seems likely that it is arriving from the host too late, and the timestamp you’ve requested has been missed? Try picking a time farther in the future for the TX packet. I’ve used 50 MSPS without trouble - of course that’s USB3. For USB 2 I would have thought 1 MSPS would be no problem, though 10 MSPS might be pushing it.
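As a rough sanity check on the USB 2 numbers, the stream bandwidth can be estimated from the packet layout quoted earlier in the thread (4096-byte packets carrying 1360 samples in 12-bit mode or 1020 in 16-bit mode). The function names below are hypothetical, and the usable link budget is an assumption the caller has to supply: USB 2.0 high speed signals at 480 Mbit/s (60 MB/s raw), and practical bulk throughput is noticeably lower.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per second on the link for ONE stream direction, counting whole
// 4096-byte packets (header included).
inline double linkBytesPerSecond(double sampleRateHz, std::size_t samplesPerPacket) {
    assert(sampleRateHz > 0.0 && samplesPerPacket > 0);
    return sampleRateHz / static_cast<double>(samplesPerPacket) * 4096.0;
}

// True if one continuous stream at this rate fits inside the given budget.
// `budgetBytesPerSecond` is whatever throughput you measured or assume.
inline bool fitsLink(double sampleRateHz, std::size_t samplesPerPacket,
                     double budgetBytesPerSecond) {
    return linkBytesPerSecond(sampleRateHz, samplesPerPacket) <= budgetBytesPerSecond;
}
```

At 1 MSPS in 12-bit mode the RX stream needs about 3 MB/s, which is comfortable. At 10 MSPS it needs about 30 MB/s in 12-bit mode and about 40 MB/s in 16-bit mode, for RX alone, which is consistent with 10 MSPS being marginal on USB 2.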

You shouldn’t have to do anything very tricky here to get the TX and RX samples synchronized to each other to within 1 sample. It turns out that the calibration routines do mess with the relative sync of TX vs RX, but it’s a pretty small effect, and it sounds like you have much bigger problems.

I’ll make the simple, but apparently-not-obvious-to-everyone observation that THE ONLY WAY to get predictable TX latency in SDR systems running on general-purpose operating systems is to use some kind of hardware timestamp scheme–that is true regardless of the SDR hardware in use. You cannot get down-to-the-sample latency predictability in a software stack that is several layers deep on a general-purpose OS.
