RX bursts from channel 0 are not aligned properly for MIMO configuration

Dear all,

Sorry for duplicating this here and on the LimeSuite GitHub tracker, but I want to make sure that everybody who can help has a chance to see it. Here is the link where I have described an interesting problem with RX burst alignment for channel 0 in MIMO configuration:

Looks like the issue is fixed in the newest version of LimeSuite :slight_smile:

Unfortunately, it looks like I called it off too soon. The problem no longer occurred for the particular configuration I had used, but after changing it, I can still see the same issue. With the newly tested configuration, basically all of the transmission received via channel 0 has incorrect timing (at the same time, channel 1 is okay)…

I have tested the new configuration (T_data = 50 ms, T_burst = 250 ms) on two different, quite modern laptops (i7- and i5-based) and got the same result in both cases: transmission received via RX channel 0 was unreliable, as the timing of the RX bursts was incorrect (only a ca. 40 ms long part of the TX burst was actually received). However, the problem does not occur for the same configuration on another, very powerful desktop computer (i7-7700K) with a recent motherboard. Also, if I increase T_burst to 300 ms, the problem no longer occurs even on the mentioned laptops.

So it looks like this happens when the LimeSDR receives late RX requests. But it is very wrong that the device does not report this problem at all (not a single error code is returned via readStream(); the device claims that it was able to successfully receive all the requested samples). Moreover, I don't understand why it fires the RX burst too soon in that case (I am able to see the beginning of the actual transmitted burst in the received data). If the RX call is indeed late, I would rather expect the RX burst to fire too late as well, and thus to see only the end of the transmitted burst in the received data. Why is only channel 0 affected? That is another mystery.
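For reference, this is roughly how I schedule the timed RX bursts and check the result (a simplified sketch using the SoapySDR C++ API, not my exact application code; the rates, times, and buffer sizes are placeholders):

```cpp
#include <SoapySDR/Device.hpp>
#include <SoapySDR/Formats.hpp>
#include <complex>
#include <cstdio>
#include <vector>

int main()
{
    // Open the LimeSDR and set up a two-channel (MIMO) RX stream.
    SoapySDR::Device *dev = SoapySDR::Device::make("driver=lime");
    SoapySDR::Stream *rx = dev->setupStream(SOAPY_SDR_RX, SOAPY_SDR_CF32, {0, 1});

    const size_t numElems = 1024; // placeholder burst length
    std::vector<std::complex<float>> buf0(numElems), buf1(numElems);
    void *buffs[] = {buf0.data(), buf1.data()};

    // Schedule a timed RX burst T_burst = 250 ms into the future.
    const long long burstTimeNs = dev->getHardwareTime() + 250000000LL;
    dev->activateStream(rx, SOAPY_SDR_HAS_TIME | SOAPY_SDR_END_BURST,
                        burstTimeNs, numElems);

    int flags = 0;
    long long timeNs = 0;
    const int ret = dev->readStream(rx, buffs, numElems, flags, timeNs, 500000);

    // This is the surprising part: even when the channel 0 burst comes back
    // with wrong timing, in my tests ret equals the full number of requested
    // samples and no error code is reported.
    std::printf("readStream: ret=%d flags=%d timeNs=%lld\n", ret, flags, timeNs);

    dev->deactivateStream(rx);
    dev->closeStream(rx);
    SoapySDR::Device::unmake(dev);
    return 0;
}
```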

@andrewback, is anybody going to look into this?

This sounds like one for @joshblum.

It looks like the issue is that ConnectionSTREAM::ReceivePacketsLoop sometimes writes different data to each stream fifo based on space conditions in the fifo.

@IgnasJ We can fix this in soapylms, but it may be more helpful to enforce alignment in the driver layer so that other APIs benefit as well. However, we may need to modify several driver recv loops other than LimeSDR's. What do you think:

  • Should we try to enforce alignment in the ReceivePacketsLoop for the LimeSDR and other devices, so that identical sample counts and timestamps are written to each channel fifo?
  • Or should I add some code to the soapylms layer to re-align the streams when we see different timestamps from the different channel fifos?

Reference: https://github.com/myriadrf/LimeSuite/issues/141

It should be fixed in soapylms. From my perspective, ConnectionSTREAM::ReceivePacketsLoop works as intended and should remain as it is (with the overwrite flag for both channels). The purpose of ReceivePacketsLoop is to keep reading samples from the hardware without stalling, to minimize the chance of samples being dropped in hardware. It is up to the higher level to choose an appropriate fifo size and to read samples from the fifo fast enough.

ReceivePacketsLoop already writes identical sample counts and timestamps to each channel fifo. It just overwrites the oldest samples if a channel fifo is full.
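In pseudocode, the intended overwrite-old behaviour is roughly this (a simplified sketch for illustration only, not the actual LimeSuite ring buffer code):

```cpp
#include <cstdint>
#include <deque>

// Simplified sketch of the overwrite-old policy; the real LimeSuite
// fifo is a fixed-size ring buffer, this only illustrates the idea.
struct ChannelFifo
{
    std::deque<int16_t> samples;
    size_t capacity = 16384; // placeholder size

    // The producer never stalls: when the fifo is full, the oldest
    // samples are discarded to make room for the new packet.
    void pushOverwriteOld(const int16_t *pkt, size_t len)
    {
        for (size_t i = 0; i < len; ++i)
        {
            if (samples.size() == capacity)
                samples.pop_front(); // drop the oldest sample
            samples.push_back(pkt[i]);
        }
    }
};
```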

Understood. But under a data race condition, OVERWRITE_OLD can occur on one channel and not the other. Suppose that fifo[0] and fifo[1] are already full; here is a race condition example:

  • ReceivePacketsLoop() writes to fifo[0] and drops old samples
  • User reads fifo[0] and fifo[1], now both fifos have space
  • ReceivePacketsLoop() writes to fifo[1] without any drops

Now each fifo has the same number of samples, but the samples and timestamps are out of alignment. Software has to realign the samples by dropping samples from one fifo based on the timestamp until alignment is achieved.
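For illustration, that realignment logic could look like this (a sketch with hypothetical names, not actual SoapyLMS code; it assumes each fifo tracks the timestamp of its first buffered sample and that one tick equals one sample):

```cpp
#include <cstdint>
#include <cstddef>
#include <deque>

// Hypothetical per-channel fifo head: buffered samples plus the
// timestamp (in sample ticks) of the first one. Illustrative only.
struct ChannelHead
{
    std::deque<int16_t> samples;
    uint64_t timestamp; // tick count of samples.front()

    // Pop n samples; assumes the fifo holds at least n of them.
    void dropSamples(uint64_t n)
    {
        samples.erase(samples.begin(),
                      samples.begin() + static_cast<std::ptrdiff_t>(n));
        timestamp += n;
    }
};

// Drop samples from whichever channel starts earlier until both heads
// carry the same timestamp; the MIMO pair is then sample-aligned again.
void realign(ChannelHead &ch0, ChannelHead &ch1)
{
    if (ch0.timestamp < ch1.timestamp)
        ch0.dropSamples(ch1.timestamp - ch0.timestamp);
    else if (ch1.timestamp < ch0.timestamp)
        ch1.dropSamples(ch0.timestamp - ch1.timestamp);
}
```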

I guess that, ideally, OVERWRITE_OLD would cause identical behaviour on each fifo. I just wanted to make you aware of the race, because even with a solution for SoapyLMS, a user of the C API may have to implement similar logic to recover after a fifo overflow (see the sketch below). Just my $0.02 :slight_smile:
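For example, detecting the misalignment with the C API might look roughly like this (a sketch using LMS_RecvStream metadata; the buffer sizes are placeholders and the two single-channel streams are assumed to be already configured):

```cpp
#include <lime/LimeSuite.h>
#include <cstdint>
#include <cstdio>

// Sketch: read one buffer from each RX channel and compare the metadata
// timestamps. After OVERWRITE_OLD hits only one channel, the timestamps
// disagree and the caller must drop samples from the earlier channel
// until they match again.
void checkAlignment(lms_stream_t *stream0, lms_stream_t *stream1)
{
    static int16_t buf0[2 * 1024], buf1[2 * 1024]; // interleaved I/Q
    lms_stream_meta_t meta0 = {}, meta1 = {};

    const int n0 = LMS_RecvStream(stream0, buf0, 1024, &meta0, 1000);
    const int n1 = LMS_RecvStream(stream1, buf1, 1024, &meta1, 1000);

    if (n0 == n1 && meta0.timestamp != meta1.timestamp)
    {
        // Same sample count, different start times: the race described
        // above has occurred and realignment is required.
        std::printf("misaligned: ch0 ts=%llu ch1 ts=%llu\n",
                    (unsigned long long)meta0.timestamp,
                    (unsigned long long)meta1.timestamp);
    }
}
```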