LimeSDR Mini USB Throughput

Hello everyone,

We’re developing a C++ receive-only application with the LimeSDR mini, and we’re having a bit of a throughput problem.

First, here’s how we’re setting up the SDR, simplified/pseudo-code but I tried to include all the pertinent details.

LMS_Open(…)
LMS_Init(…)
LMS_EnableChannel(device, LMS_CH_RX, 0, true)
LMS_SetLOFrequency(device, LMS_CH_RX, 0, 97e6); //center frequency 97 MHz
LMS_GetAntennaList(…)
LMS_SetAntenna(… LMS_PATH_LNAW) //select the wideband LNA input

LMS_SetSampleRate(… 19.2e6 …)   //19.2 MSPS
LMS_SetLPFBW(… 19.2e6)
LMS_Calibrate(… 19.2e6)

lms_stream_t streamId;
memset(&streamId, 0, sizeof(streamId));
streamId.channel = 0;                          //channel number
streamId.fifoSize = 1024 * 1024;               //FIFO size in samples
streamId.throughputVsLatency = 1.0;            //optimize for max throughput
streamId.isTx = false;                         //RX channel
streamId.dataFmt = lms_stream_t::LMS_FMT_F32;  //32-bit floats
LMS_SetupStream(device, &streamId)

We then start a highest priority thread (or time critical, we’ve tried both) in which to receive data from the LimeSDR mini.

We then call LMS_StartStream(&streamId).

In the thread, we do the following:

CMeasureSamplerate msr; //measures average sample rate from incoming samples

int blocksize_IQ = 32768;

DWORD dwTick = GetTickCount();

while(1)
{
    samplesRead = LMS_RecvStream(&streamId, pData, blocksize_IQ, NULL, 1000);

    if(samplesRead > 0) //LMS_RecvStream returns -1 on error, so check for > 0
    {
        fifo->PutBytes(pData, samplesRead * 2 * sizeof(float)); //lock-free circular buffer
        SetEvent(hFftThread); //signal the data processing thread

        msr.Samples(samplesRead);
    }

    if(GetTickCount() - dwTick > 1000)
    {
        dwTick = GetTickCount();
        lms_stream_status_t status;

        LMS_GetStreamStatus(&streamId, &status);

        snprintf(tempstring, sizeof(tempstring),
                 "RX rate: %.02f MB/s, FIFO: %.0f %%  %.02f MSPS D: %u",
                 status.linkRate / 1e6,
                 (100 * status.fifoFilledCount) / (double)status.fifoSize,
                 msr.GetCurrentSamplerate() / 1e6,
                 status.droppedPackets);
    }
}

We’re reading from the LimeSDR mini in one thread, and doing all data processing in other threads.
Now, here’s the problem:

When CPU load goes up (let’s say 50% on a Core i9-9900, evenly distributed across all 16 cores), it seems that samples (or whole frames) are being dropped before they get to us. status.linkRate starts dropping, as does my own sample rate measurement (roughly 17 MSPS instead of 19.2), and status.droppedPackets skyrockets.

Here’s the funny part, though. If we ask LimeSDR mini for 30.72 MSPS, then we get 28. Not all that we’re asking for, but more than enough to sustain 19.2 MSPS!

If we ask for 9.6 MSPS then we do get all that we’re asking for, but at higher rates, it’s like it’s haggling with us, giving us not quite all that we ask for just to keep us in line.

It seems to me that this might be a driver bug somewhere… because if I wasn’t handling the data expeditiously enough, I would expect that the fifoFilledCount, as reported by LMS_GetStreamStatus, would start increasing… but it isn’t. It’s always at 0 to 1% of the FIFO size, yet samples are being dropped before they even enter the FIFO.

Our application does require a continuous stream of samples without discontinuities, as would any RF application we can think of. So, we’re hoping that a problem this big couldn’t possibly have gone unnoticed for this long, and that we’re doing something silly while initializing the LimeSDR mini, or perhaps even reading the data in a suboptimal way… but, I cannot for the life of me figure out WHAT we’re doing wrong.

I would very much appreciate some help or pointers.

Thank you very much in advance, any ideas?

What OS? Do you have other USB devices attached to the PC (slow devices like HID can cause problems)? Does the Mini enumerate as USB 3.0 or 2.0? Did you test your app on another PC?
Did you try a throughput test with LimeSuite-GUI?

Windows 10 Pro OS.

No other devices on the bus. Directly attached to motherboard header. Same results on 3 different PCs.

How do you do the throughput test with LimeSuite?

Hi there,

I’m working with goobenet on this project.

We’ve been working non-stop trying to get to the bottom of it.
We’ve tried using LMS_FMT_I12 rather than LMS_FMT_F32 - it behaved only marginally better (if at all).

Utilizing a divide-and-conquer approach, we tried the following:

We modified singleRX.cpp very slightly, adding display of dropped samples, so the lines following LMS_GetStreamStatus are as follows:
cout << "RX data rate: " << status.linkRate / 1e6 << " MB/s "; //link data rate
cout << "Dropped: " << status.droppedPackets << " ";
cout << "RX fifo: " << 100 * status.fifoFilledCount / status.fifoSize << “%” << endl; //percentage of FIFO filled
Also, we changed the sample rate to 19.2 MSPS. We then compiled it to singleRX.exe.

If we run just singleRX.exe, we see some dropped samples in the beginning, then usually none, but sometimes a few, even with virtually zero CPU load.

RX LPF configured
Normalized RX Gain: 0.739726
RX Gain: 54 dB
Rx calibration finished
RX data rate: 0 MB/s Dropped: 27 RX fifo: 0%
RX data rate: 57.9994 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 1 RX fifo: 0%
30 lines snipped, dropped 0, fifo 0-1%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 1%
RX data rate: 57.8683 MB/s Dropped: 2 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%

CPU load did not peak above 5% during this idle load run, nor did RX fifo go above 1%, yet there were drops.

If we run several copies of Prime95 at the same time, pushing the CPU load close to 100%, we see the following:

RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 2 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 2 RX fifo: 0%
RX data rate: 57.4095 MB/s Dropped: 101 RX fifo: 0%
RX data rate: 57.8105 MB/s Dropped: 5 RX fifo: 0%
RX data rate: 57.8105 MB/s Dropped: 8 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 4 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 1 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 5 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 7 RX fifo: 0%
RX data rate: 57.8105 MB/s Dropped: 3 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 2 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 7 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 6 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 7 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 14 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 14 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 9 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 5 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 10 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 7 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 6 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 9 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 3 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 12 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 9 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 9 RX fifo: 0%

Lots of drops, yet the FIFO remains empty, i.e. we’re consuming the data as quickly as it comes in, so our side was not the bottleneck. I also made the singleRX.exe thread TIME_CRITICAL, and the entire process REALTIME_PRIORITY. It made no difference; we still saw drops.
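
For reference, this is roughly how we elevate the priorities (plain Win32 calls; the helper function name is just for illustration):

#include <windows.h>

//Illustrative helper: raise the process class and the calling (receive) thread priority.
void ElevateReceivePriority()
{
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);       //whole process to realtime class
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL); //this thread to time-critical
}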

I then added the following before the while() loop:
int sleep_counter=0;

…and the following inside the while() loop:
sleep_counter++;
if(sleep_counter % 100)
{
    Sleep(10);
}

…thus intentionally slowing down the singleRX.cpp main loop just enough that it wasn’t keeping up, so RX fifo started growing, up to 98 or 99 %, where it stayed… yet, it did NOT show any dropped samples. So, “dropped” is not showing anything related to the FIFO – it’s showing something else. WHAT is it showing? Where is the data being dropped? Why is data being dropped when there is a perfectly good fifo to put it into? We need that data, we don’t want it dropped.

Then, for the final test, we made our main program read and act on a file with pre-recorded data, rather than talking directly to the dongle.

That way, we were able to use the singleRX.cpp example to talk to the dongle, while loading the CPU exactly like we normally would.

Our application uses lots of Intel IPP FFTs to process the data in many threads at once, for a total CPU load of about 50%. Using QueryPerformanceFrequency as the time base for the entire application, and reading a file (rather than receiving samples directly from the LimeSDR Mini), our application runs fine, whether we’re running singleRX.exe or not.

singleRX.exe, on the other hand, is NOT happy when our program is running, even though singleRX.exe is at realtime priority and our program is at normal user-level priority, and neither the CPU nor the FIFO is anywhere near full… in fact, it’s leaking like a sieve:

RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8028 MB/s Dropped: 0 RX fifo: 1%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8105 MB/s Dropped: 0 RX fifo: 1%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 0 RX fifo: 0%
RX data rate: 57.8683 MB/s Dropped: 1819 RX fifo: 0%
RX data rate: 50.2814 MB/s Dropped: 2350 RX fifo: 0%
RX data rate: 48.3 MB/s Dropped: 1440 RX fifo: 0%
RX data rate: 51.97 MB/s Dropped: 158 RX fifo: 0%
RX data rate: 57.2129 MB/s Dropped: 246 RX fifo: 0%
RX data rate: 56.8197 MB/s Dropped: 313 RX fifo: 0%
RX data rate: 56.6231 MB/s Dropped: 265 RX fifo: 0%
RX data rate: 56.7542 MB/s Dropped: 310 RX fifo: 0%
RX data rate: 56.5576 MB/s Dropped: 232 RX fifo: 0%
RX data rate: 56.8284 MB/s Dropped: 170 RX fifo: 0%
RX data rate: 57.2129 MB/s Dropped: 233 RX fifo: 0%
RX data rate: 56.8852 MB/s Dropped: 282 RX fifo: 0%
RX data rate: 56.632 MB/s Dropped: 222 RX fifo: 0%
RX data rate: 56.9508 MB/s Dropped: 143 RX fifo: 0%

You can see quite clearly where our program started.

So… how can we find out where those dropped samples are going? Why are they being dropped?

Could anyone tell me what status.droppedPackets is actually measuring?

It seems to me there may be a flaw in the logic of the signal path. Since the path is provably capable of 28 MSPS, I don’t understand why samples need to be dropped when we’re only running at 19.2 MSPS.

Or, is there a flaw in my testing methodology? Please do let me know.
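
One cross-check we’re considering, to measure the drops independently of status.droppedPackets, is to pass an lms_stream_meta_t to LMS_RecvStream and watch for jumps in the hardware timestamp. A rough sketch, untested; the variable names reuse our receive loop, and keepRunning stands in for whatever loop condition the app uses:

lms_stream_meta_t meta = {};
uint64_t expected = 0;
bool haveExpected = false;

while (keepRunning)
{
    int n = LMS_RecvStream(&streamId, pData, blocksize_IQ, &meta, 1000);
    if (n <= 0)
        continue;

    //If nothing was lost upstream, each block's timestamp should follow on
    //directly from the previous block's timestamp plus its sample count.
    if (haveExpected && meta.timestamp != expected)
        printf("Gap: expected ts %llu, got %llu (lost %lld samples)\n",
               (unsigned long long)expected,
               (unsigned long long)meta.timestamp,
               (long long)(meta.timestamp - expected));

    expected = meta.timestamp + (uint64_t)n;
    haveExpected = true;
}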

I have lots of computers and a few LimeSDR Minis to test with, and even a standard LimeSDR, which is much better behaved but does not meet our needs in other ways: lack of RF sensitivity in the VHF range, connector type, etc.

My main machine is a Core i9-9900K with Windows 10 (1511)

Any ideas would be greatly appreciated.

Thanks in advance!

About dropped packets on Windows (the article is about Ethernet cards but also applies to other devices):
https://www.thesycon.de/eng/latency_check.shtml
DPC latency is as old as Windows (other OSes have the same issue under different names for IRQ/driver latency). Your app is single-threaded with single-core affinity and shares CPU resources with other processes. With a USB link speed of ~57 MB/s and a buffer size of 32768, the interrupt rate is a severe task for your CPU (FFT is a time-demanding job). Try increasing the buffer size several times and check the DPC latency of your system.
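
For example, something like this (values are only an example, not tested; it reuses the device/streamId variables from your setup code):

#include <vector>

//Example only: a bigger host FIFO and bigger read blocks mean fewer
//LMS_RecvStream calls per second, so less pressure from IRQ/DPC latency.
streamId.fifoSize = 8 * 1024 * 1024;       //samples, up from 1M
LMS_SetupStream(device, &streamId);

const int blocksize_IQ = 256 * 1024;       //up from 32768
std::vector<float> buf(2 * blocksize_IQ);  //I and Q per sample (LMS_FMT_F32)

int samplesRead = LMS_RecvStream(&streamId, buf.data(), blocksize_IQ, NULL, 1000);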
In LimeSuite-GUI, open the FFT viewer and start RX (my example is for 1.536 MS/s) [screenshot].

Thanks for your reply Goran, but it does not apply. As I stated earlier, my app is multi-threaded. I used singleRX.exe only as a canonical example, something that can be reproduced by others. I did run LatencyMon while my program was running, using the LimeSDR mini and experiencing significant dropouts (~100) even though CPU load was just 22%. However, LatencyMon did not detect any problems.

57 MB/s is roughly 10% of USB 3 capacity… and again, no samples should be dropped, not while there is still room in the FIFO.
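
(For what it’s worth, unless I’m miscalculating, the ~57.8 MB/s link rate itself lines up with 12-bit packed samples on the wire: 19.2 MS/s × 2 (I and Q) × 1.5 bytes ≈ 57.6 MB/s, plus a little packet-header overhead.)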

With SDR# at 19.2 MS/s I see only a few dropped packets in debug mode. At higher rates things get worse, but not as bad as in your example. SDR# is a C# app, and on my PC about 30 MS/s is the usable limit.
I did notice a significant difference in performance between the debug and release LimeSuite library versions.

I’m running in release mode (C++ app) when testing for performance.
You hit the nail on the head though – my basic question is: why are even a few packets dropped at 19.2 MSPS when the link is capable of 28, or even 30? The number of drops should be 0 all the way up to the maximum throughput of the interface, and the way to achieve that is for the hardware to hold on to a packet until it can be sent over USB, rather than dropping it at the first sign of congestion, which is what it seems to be doing.

Today I was playing with two different USB 3.0 extension cables: one 0.5 m long and one 1.8 m. Both are no-name (most likely from China). Connected directly to the motherboard they work as I previously described. But when I used the PC case’s front USB 3.0 connectors, disaster: not even 1.5 MS/s without dropped packets. I tried the Mini plugged in directly as well, with very poor performance. I guess the motherboard PCB layout is crucial for USB 3.0 data integrity at SuperSpeed rates.
Please, someone correct me if I am mistaken: the FT601 uses BULK endpoints for sending and receiving data, an error-free protocol where data packets are retried until the CRC is correct, which is fine for mass storage devices but not for real-time SDR. It would be better to use an isochronous (ISOC) packet stream, without error checking but with guaranteed bandwidth. I’m not sure the FT601 even supports ISOC endpoints. Just my two cents.

So, even with no cables, plugged into a header on the motherboard, the same problem exists… So, is the Mini just not capable of a reliable, constant 24 MSPS (or even 19.2 MSPS) data stream? I see other posters saying similar things. If not, that’s a damn shame. Glad I haven’t ordered 1000 units for a product we are developing. If it can be solved, great! Hell, I will pay someone to give us a reliable 24 MSPS solution.

But here is the rub. I tried the Mini (3 different ones) on 4 different systems, with 6 different cables, directly connected, and it didn’t matter. The full-size Lime USB worked with no problems in every instance. This does seem like there is some issue with the Mini itself.

FWIW, I’ve never got good, reliable throughput from the Mini at anything above 20 MSPS, and I know what I’m doing when reading USB. Even below 20 MSPS it can be hit and miss.

Meh, mine doesn’t even work on USB 3, so I’m stuck with 5 MSPS.

I’ll bet part of the issue is power; we don’t have a way to provide extra current. Anyway, the Lime USB works fine now that the idiot programmer has read the docs several times :)

So, something interesting: it seems as if the driver is somehow running in user space? If we remove all the time-critical priorities from the app, it behaves most of the time (it still drops packets here and there).

Power may be part of it, but it seems the FTDI chip is just… stupid. While it does work, it does not work well. I understand it was chosen to cut costs, but I’d happily pay for a “middle ground” LimeSDR: the form factor of the Mini, with external power options and the FX3 USB interface.

My gripe with the LimeUSB is the filters. Our application lives in the 70-300 MHz range, for which there aren’t many SDR options to begin with. The LimeUSB is great if you’re into cellular or SHF stuff, but anything in VHF? Forget it. I’m aware of the HF mods done by several people here, but even after doing those, which are great for HF, the 70-300 MHz range is still deaf. The Mini, not having those particular filters, provides quite good signal to noise, but at the cost of these damned dropped “packets”.

I would love to hear from Myriad themselves on this, but I’m not sure that’ll happen, especially if they know there’s a problem. We’re developing a hardware product for broadcast use. The Lime products fit the bill initially, but with this discovery, I’m not so sure anymore.

My starting point to narrow down the source of your problem would be to isolate each of the data transfer and processing steps. Ramp up the sample rate and see where the problem starts, then go a little bit below where it is dropping samples and move on to the next test case.

So my very first test would be the USB bus: read the samples (and throw them away), and keep increasing the sample rate until the problem appears. The reason for throwing them away is that I want to avoid chasing bad memory chips or low memory bandwidth issues. Basically: what is the maximum lossless data rate that can be achieved with that USB cable, USB chipset, USB driver, and that exact OS? I’d also disable any OS power saving options on the USB port for the duration of testing, to eliminate that variable.
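
Something along these lines, as a sketch only (it assumes the same LimeSuite C API you are already using; the function name is mine):

#include <chrono>
#include <cstdio>
#include <vector>
#include "lime/LimeSuite.h"

//Pull samples as fast as possible and throw them away: measures the best
//lossless rate this cable/chipset/driver/OS combination can sustain,
//with no RAM buffering or CPU processing in the path.
void UsbOnlyThroughputTest(lms_stream_t* stream, int blockSamples)
{
    std::vector<float> discard(2 * blockSamples); //I and Q per sample, LMS_FMT_F32
    uint64_t total = 0;
    auto t0 = std::chrono::steady_clock::now();

    for (;;)
    {
        int n = LMS_RecvStream(stream, discard.data(), blockSamples, NULL, 1000);
        if (n > 0)
            total += n;

        double dt = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
        if (dt >= 1.0)
        {
            printf("%.2f MSPS received\n", total / dt / 1e6);
            total = 0;
            t0 = std::chrono::steady_clock::now();
        }
    }
}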

The next step would be to store the samples into multiple RAM buffers with no processing, and again ramp up the sample rate to see where the problem starts. When the buffers are all full, reuse them without any processing (in effect throwing the samples away again). Then go a little bit below where it is dropping samples and move on to the next test case.
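
And a sketch of that second stage, with the same caveats (the buffer count is arbitrary):

#include <vector>
#include "lime/LimeSuite.h"

//Receive into a ring of preallocated RAM buffers with no processing at all.
//Once every buffer has been filled, it is simply reused (oldest data discarded),
//so only the USB path plus writes into RAM are being exercised.
void RamBufferTest(lms_stream_t* stream, int blockSamples, int numBuffers)
{
    std::vector<std::vector<float>> ring(numBuffers,
                                         std::vector<float>(2 * blockSamples));
    size_t idx = 0;

    for (;;)
    {
        int n = LMS_RecvStream(stream, ring[idx].data(), blockSamples, NULL, 1000);
        if (n > 0)
            idx = (idx + 1) % ring.size();
    }
}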

The next test case would be to add some processing of the samples in the buffers (maybe disable any GUI output if there is any) and ramp up the sample rate until there are drops.

Then you will have a better idea of whether you have a USB throughput issue, a RAM bandwidth issue, or a CPU issue. Some CPUs with boost clocks raise their frequency when they see a high processing load (for better benchmark scores), then hit their thermal limit and underclock until they cool back down. What I’m saying is that you cannot eliminate anything as a source of a performance bottleneck without actually testing it in isolation, or at least with some knowledge of the best achievable performance of the previous steps in the data flow.

Since you are using a black box (“Intel IPP FFTs”), you have no idea what it is doing in terms of memory bandwidth or interrupts; the CPU could show as nearly idle because it is waiting for access to resources. Are the 16 processing threads blocking access to a segment of RAM until they have all finished, before the data stream is allowed to write new data to that segment (not buffer, but segment: it could be a different buffer sharing the same segment)? Looking at CPU usage and equating that to a measure of a bottleneck is not always valid.

The solution could be something as simple as adding lots more RAM buffers and lowering the priority of the processing threads, or maybe doubling, or halving, the number of data processing threads. Lowering their priority could mean that they get lower priority interrupts (if they are using interrupts) than the USB driver. It could also be a cooling or power issue; you need to check those as well.

mzs,

In my case I have a high-priority thread which reads data and sends it to other threads for processing. I also use IPP for FFTs and, when available, CUDA. Neither of these affects the reading from USB 3.

Now, the weakness of the Mini doesn’t worry me too much, as I much prefer an SDR connected via a cable rather than plugged directly into a USB port; also, the Lime USB supports external power. (*)

I can run the LimeUSB at 61.44 MHz bandwidth without problems. On the same computer, with good powered USB 3 ports, the Mini isn’t 100% reliable at 10 MHz.

(*) USB powered SDRs which don’t support external power cause me no end of support issues. Some users think you can plug any number of devices into a USB port: mouse, oven, battery chargers etc.


From the FT601 documentation [screenshot of the table at the bottom of page 5]: it reports speeds of ~223-372 MB/s, depending on FIFO size (1024-8192) and idle interval (150-200).

This is the Mini with SDR# 1672 at 35 MHz BW [screenshot]:

I can’t hear any audio crackle and RDS works correctly. At 40 MHz BW there is still no audio crackle, but RDS shows errors.

Here is the LimeSuite-GUI FFT viewer at 40 MHz BW [screenshot]:

The RX rate is only half of what FTDI claims for the 1024 FIFO size. Unfortunately there is no dropped-packet information.
For a proper Mini performance test we would need a modified FPGA data loopback image in conjunction with FT600DataStreamerDemoApp.

Here’s a Win64 executable built from a modified singleRX.cpp, which adds display of dropped samples via the following lines after LMS_GetStreamStatus:

cout << "RX data rate: " << status.linkRate / 1e6 << " MB/s "; //link data rate
cout << "Dropped: " << status.droppedPackets << " ";
cout << "RX fifo: " << 100 * status.fifoFilledCount / status.fifoSize << "%" << endl; //percentage of FIFO filled

Also, it runs for 1000 seconds unless cancelled by pressing SCROLL LOCK, and sets the sample rate to 19.2 MSPS.

I’m still very interested to see a conceptual block diagram of the driver and hardware. Is anyone able to answer the question as to how it’s possible for samples to be dropped without the fifo being full? Where is the fifo in the chain?

How big is the buffer that sits between the LMS7002 and the FT601? Is that where packets are being dropped? Is it possible to turn a portion of the FPGA into a bigger buffer on the HW side so that the HW stops dropping our precious packets?

@IgnasJ, perhaps you could advise?
