Time-synced TX stream start hazard with LimeSDR_GW

I’m trying to start a TX stream on an external trigger on an XTRX rev 1.2 with fresh LiteX-based gateware. I also tested earlier gateware versions, going back to v4 (Dec. 4) and v5; they show the same issue.

Versions: LimeSDR_GW 3.6 (714ded1e), LimeSuiteNG bd3c1322

Here is my test script:

limeConfig -i
limeConfig -l info --refclk=26e6 --samplerate=10e6 --rxen=0 --txen=1 --txlo=2000e6 --txpath=Band2 --txlpf=10e6 --txoversample=4
limeSPI write --chip=FPGA --stream=00CA0001

# trying to start with known last buffered sample
limeTRX -l info --input=8192.i16 --looptx

echo "prefill done"

# 16384.i16 is simple DC like [I=16384 Q=0]
limeTRX -l info --input=16384.i16 --looptx --syncPPS
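
For anyone who wants to reproduce this, the two input files can be generated with a few lines of Python. This is only a minimal sketch: it assumes the .i16 files hold interleaved little-endian 16-bit I/Q pairs, and the helper names and the sample count are my own choices, not anything taken from LimeSuiteNG.

```python
import struct

def dc_iq_bytes(i_value, q_value, num_samples):
    """Return num_samples constant IQ pairs as interleaved little-endian int16."""
    pair = struct.pack("<hh", i_value, q_value)
    return pair * num_samples

def write_dc_file(path, i_value, q_value=0, num_samples=65536):
    """Write a constant (DC) IQ file; the on-disk layout is an assumption."""
    with open(path, "wb") as f:
        f.write(dc_iq_bytes(i_value, q_value, num_samples))

if __name__ == "__main__":
    write_dc_file("8192.i16", 8192)    # prefill amplitude
    write_dc_file("16384.i16", 16384)  # amplitude expected after the trigger
```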

Expectation: when I kill the first limeTRX with Ctrl-C, the next one starts and waits for the trigger signal with an overflowing FIFO. After the trigger edge, the samples get clocked out with a different amplitude.

Reality: the first test signal comes out of the device with its signature amplitude (-9.66 dBm, for example), and when I kill the process this CW stays. The next limeTRX process starts and, as expected, complains about "No samples or timeout" while there is no trigger (the previous output sample is still active in RF). After the trigger, "Samples received:" starts to increment, but the TX signal stays at the exact same old sample rather than the new amplitude: the actual TX stream does not start.

There might be a serious bug or race condition in stream buffer handling somewhere. The problem is that the feature does work in some way, just not with arbitrary samples! If the sample file to be triggered contains certain specific IQ values (0, and some others that were caught working in float data format in our custom app), it works: on trigger, the CW changes amplitude. With other values it does nothing and never starts streaming TX, only RX.

I tested arbitrary amplitudes without the trigger function, and that works properly (powers of two and some less special numbers).

@VytautasB Do you have any thoughts on this?

We have our custom application, which displays internal logs too:

Here is an immediate streaming example, which works as designed:

[LOG] /dev/limepcie0/trx0 Rx0: 113.902 MB/s | TS:28030212 pkt:111232 o:0(+0) l:0(+0) dma:13904/13904(+0) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 273.696 MB/s | TS:0 pkt:279097 u:0(+0) l:0(+0) dma:49249/49252(+3) tsAdvance:+10000000000000000/+0/+0us, f:2
txNow=67289088 qEnd=67411968 emitted=67289088 late=0 gaps=0 underrunChunks=0
[LOG] /dev/limepcie0/trx0 Tx0: 275.847 MB/s | TS:0 pkt:560362 u:0(+0) l:0(+0) dma:98886/98887(+1) tsAdvance:+10000000000000000/+0/+0us, f:2
[LOG] /dev/limepcie0/trx0 Rx0: 113.934 MB/s | TS:56060676 pkt:222464 o:0(+0) l:0(+0) dma:27808/27812(+4) swFIFO:0
txNow=135110656 qEnd=135196672 emitted=135110656 late=0 gaps=0 underrunChunks=0
[LOG] /dev/limepcie0/trx0 Tx0: 275.438 MB/s | TS:0 pkt:841236 u:0(+0) l:0(+0) dma:148449/148453(+4) tsAdvance:+10000000000000000/+0/+0us, f:2
[LOG] /dev/limepcie0/trx0 Rx0: 113.934 MB/s | TS:84091140 pkt:333696 o:0(+0) l:0(+0) dma:41712/41720(+8) swFIFO:0
txNow=202801152 qEnd=202911744 emitted=202801152 late=0 gaps=0 underrunChunks=0
[LOG] /dev/limepcie0/trx0 Tx0: 275.822 MB/s | TS:0 pkt:1122484 u:0(+0) l:0(+0) dma:198081/198085(+4) tsAdvance:+10000000000000000/+0/+0us, f:2
[LOG] /dev/limepcie0/trx0 Rx0: 113.934 MB/s | TS:112137732 pkt:444992 o:0(+0) l:0(+0) dma:55624/55628(+4) swFIFO:0
txNow=270610432 qEnd=270757888 emitted=270610432 late=0 gaps=0 underrunChunks=0

This next one starts on trigger but is not stable: it has very bad spectral contamination within the LPF bandwidth, as if a sample is sometimes mistaken for 0 in the stream. It is transmitting I=0.70, Q=0.00:

[LOG] /dev/limepcie0/trx0 Rx0: 0.000 MB/s | TS:0 pkt:0 o:0(+0) l:0(+0) dma:0/0(+0) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876
txNow=7090176 qEnd=16789504 emitted=7090176 late=0 gaps=131072 underrunChunks=16
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876
[LOG] /dev/limepcie0/trx0 Rx0: 0.000 MB/s | TS:0 pkt:0 o:0(+0) l:0(+0) dma:0/0(+0) swFIFO:0
txNow=60887040 qEnd=61067264 emitted=60887040 late=0 gaps=16818176 underrunChunks=2053
[LOG] /dev/limepcie0/trx0 Tx0: 283.983 MB/s | TS:0 pkt:289586 u:1(+0) l:0(+0) dma:51102/51103(+1) tsAdvance:+10000000000000000/+0/+0us, f:2
[LOG] /dev/limepcie0/trx0 Rx0: 106.496 MB/s | TS:26191620 pkt:103936 o:0(+0) l:0(+0) dma:12992/13000(+8) swFIFO:0
txNow=128720896 qEnd=128876544 emitted=128720896 late=0 gaps=16818176 underrunChunks=2053
[LOG] /dev/limepcie0/trx0 Tx0: 275.955 MB/s | TS:0 pkt:570970 u:1(+0) l:0(+0) dma:100758/100759(+1) tsAdvance:+10000000000000000/+0/+0us, f:2
[LOG] /dev/limepcie0/trx0 Rx0: 113.902 MB/s | TS:54224100 pkt:215176 o:0(+0) l:0(+0) dma:26897/26904(+7) swFIFO:0
txNow=196632576 qEnd=196702208 emitted=196632576 late=0 gaps=16818176 underrunChunks=2053
[LOG] /dev/limepcie0/trx0 Tx0: 276.463 MB/s | TS:2016 pkt:852872 u:1(+0) l:0(+0) dma:150505/150506(+1) tsAdvance:+10000000000000000/+0/+0us, f:4
[LOG] /dev/limepcie0/trx0 Rx0: 113.902 MB/s | TS:82260612 pkt:326432 o:0(+0) l:0(+0) dma:40804/40808(+4) swFIFO:0
txNow=264527872 qEnd=264630272 emitted=264527872 late=0 gaps=16818176 underrunChunks=2053

And this is with I=0.35, Q=0.00:

[LOG] /dev/limepcie0/trx0 Rx0: 0.000 MB/s | TS:0 pkt:0 o:0(+0) l:0(+0) dma:0/0(+0) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876
txNow=7073792 qEnd=14819328 emitted=7073792 late=0 gaps=98304 underrunChunks=12
[LOG] /dev/limepcie0/trx0 Rx0: 0.000 MB/s | TS:0 pkt:0 o:0(+0) l:0(+0) dma:0/0(+0) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876
txNow=7090176 qEnd=16781312 emitted=7090176 late=0 gaps=131072 underrunChunks=16
[LOG] /dev/limepcie0/trx0 Rx0: 79.495 MB/s | TS:19554948 pkt:77600 o:0(+0) l:0(+0) dma:9700/9704(+4) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876
txNow=7106560 qEnd=18743296 emitted=7106560 late=0 gaps=163840 underrunChunks=20
[LOG] /dev/limepcie0/trx0 Rx0: 113.902 MB/s | TS:47585412 pkt:188832 o:0(+0) l:0(+0) dma:23604/23608(+4) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876
txNow=7122944 qEnd=20705280 emitted=7122944 late=0 gaps=196608 underrunChunks=24
[LOG] /dev/limepcie0/trx0 Rx0: 113.902 MB/s | TS:75615876 pkt:300064 o:0(+0) l:0(+0) dma:37508/37512(+4) swFIFO:0
[LOG] /dev/limepcie0/trx0 Tx0: 0.000 MB/s | TS:2016 pkt:1461 u:1(+0) l:0(+0) dma:1/256(+255) tsAdvance:+10000000000000000/+0/+0us, f:4876

RX starts on trigger, but TX does not, depending on sample amplitude…
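
To compare runs without eyeballing the numbers, the counter lines can be reduced to per-interval deltas. The sketch below only parses the key=value lines from the stalled I=0.35 log; it makes no assumption about what the counters mean inside the application, and the function names are hypothetical.

```python
import re

COUNTER_RE = re.compile(r"(\w+)=(\d+)")

def parse_counters(line):
    """Turn 'txNow=1 qEnd=2 ...' into {'txNow': 1, 'qEnd': 2, ...}."""
    return {key: int(value) for key, value in COUNTER_RE.findall(line)}

def deltas(lines):
    """Per-interval change of every counter between consecutive log lines."""
    rows = [parse_counters(line) for line in lines]
    return [{key: b[key] - a[key] for key in a} for a, b in zip(rows, rows[1:])]

# Counter lines copied from the stalled I=0.35 run above.
stalled = [
    "txNow=7073792 qEnd=14819328 emitted=7073792 late=0 gaps=98304 underrunChunks=12",
    "txNow=7090176 qEnd=16781312 emitted=7090176 late=0 gaps=131072 underrunChunks=16",
    "txNow=7106560 qEnd=18743296 emitted=7106560 late=0 gaps=163840 underrunChunks=20",
    "txNow=7122944 qEnd=20705280 emitted=7122944 late=0 gaps=196608 underrunChunks=24",
]

if __name__ == "__main__":
    for row in deltas(stalled):
        print(row)
```

In the stalled run txNow advances by only 16384 per log interval while gaps grows by 32768 and underrunChunks by 4. In the working immediate-start log, txNow advances by roughly 68 million per interval.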

That is the LMS7002M chip’s behavior: when input data stops, the DAC continues to transmit the last sample’s value.

This sounds like data misalignment in the Tx buffer. On the software side, the Tx packets contain headers (which include a timestamp and flags, such as whether the transmission should wait for the designated timestamp or be transmitted immediately).
It could be that somewhere in the FPGA the packet data gets misaligned and sample data gets interpreted as a packet header. Depending on the provided sample data, the stream might then get blocked waiting to transmit on an erroneous timestamp. At the same time, if it did not get blocked, the actual packet header would be interpreted as sample data and produce wrong output samples.
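
To make the misalignment idea concrete, here is a small sketch of how a header parser that starts reading inside the payload would see a timestamp that depends on the sample values. The packet layout used here (a 16-byte header with 8 bytes of flags followed by an 8-byte little-endian timestamp, then int16 I/Q pairs) is an assumption for illustration only, and the timestamp 2016 is arbitrary; the real gateware format may differ.

```python
import struct

HEADER_SIZE = 16  # assumed: 8 bytes flags/reserved + 8 bytes timestamp

def make_packet(timestamp, i_value, q_value, num_samples, flags=0):
    """Build one hypothetical TX packet: header followed by int16 IQ pairs."""
    header = struct.pack("<QQ", flags, timestamp)
    payload = struct.pack("<hh", i_value, q_value) * num_samples
    return header + payload

def read_header(stream, offset):
    """Interpret 16 bytes at offset as a header, whether or not one is there."""
    flags, timestamp = struct.unpack_from("<QQ", stream, offset)
    return flags, timestamp

def misaligned_timestamp(i_value, q_value):
    """Timestamp seen by a parser that starts 16 bytes late, inside the payload."""
    stream = make_packet(2016, i_value, q_value, num_samples=64)
    return read_header(stream, HEADER_SIZE)[1]

if __name__ == "__main__":
    print(read_header(make_packet(2016, 16384, 0, 64), 0))  # aligned read
    print(misaligned_timestamp(0, 0))       # all-zero samples
    print(misaligned_timestamp(16384, 0))   # DC sample from the test file
```

Under these assumptions, all-zero samples give a bogus timestamp of 0, while I=16384, Q=0 gives 70368744194048, a time that would effectively never arrive. That would be consistent with zero-valued samples starting on trigger and other amplitudes stalling, but only if the real header layout resembles the assumed one.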

That was my thought process too, I just don’t have the expertise to inspect the gateware for this.

Am I supposed to create an issue on GitHub, or is this a more direct path to the FPGA devs?

You should create a GitHub issue for tracking purposes.

Usually the software starts transmitting after it receives some samples from Rx. In your case you preemptively start transmitting data.
From the software side, there are two gateware signal changes: PCIe DMA enable and FPGA stream enable.
usual use case: TxDMA_enable → stream_enable → send_data
your use case: TxDMA_enable → stream_enable(with wait for PPS) → send_data → actual stream_enable on PPS

So I would assume the point to investigate is what happens in the gateware when stream_enable changes: does it clear the FIFO coming from DMA? If it did that on enablement, it would drop some of the data already present and cause data misalignment.
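
The suspected sequence can be illustrated with a toy model. This is not the gateware, just a sketch under the assumption that DMA words are already flowing into a FIFO when stream_enable fires and that the FIFO is cleared at that moment; the packet size and all names are made up for illustration.

```python
from collections import deque

HEADER = "HDR"
WORDS_PER_PACKET = 4  # 1 header word + 3 sample words, purely illustrative

def packet_words(index):
    """Words of one packet: a header marker followed by sample words."""
    return [(HEADER, index)] + [("SMP", index)] * (WORDS_PER_PACKET - 1)

def first_word_after_enable(words_before_clear, clear_on_enable):
    """Push three packets word by word into a FIFO, optionally clearing it
    once words_before_clear words have arrived (the modelled stream_enable),
    and return the first word the TX path would then read as a header."""
    words = [w for i in range(3) for w in packet_words(i)]
    fifo = deque()
    for n, word in enumerate(words):
        if n == words_before_clear and clear_on_enable:
            fifo.clear()  # models a FIFO reset at stream_enable
        fifo.append(word)
    return fifo[0]

if __name__ == "__main__":
    print(first_word_after_enable(6, clear_on_enable=False))  # real header
    print(first_word_after_enable(6, clear_on_enable=True))   # sample word
    print(first_word_after_enable(4, clear_on_enable=True))   # packet boundary
```

In this model a clear that lands mid-packet leaves a sample word at the head of the FIFO, so the reader treats sample data as a header. A clear that happens to land on a packet boundary is harmless, which would make the failure timing dependent.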