LimeSDR Mini - Dropping samples at sample rates higher than ~10 MSPS - Your experiences?

Hi,
I’m using a LimeSDR Mini with an external 10 MHz clock. The LimeSuite version is v20.10.0, and the gateware is also up to date. The OSes used are Debian 10 and Windows 10. The connection is USB 3 using a short 30 cm cable. All tests have been performed on two computers with relatively modern Intel i5s.

When using sample rates higher than ~10 MSPS I already get dropped samples, which is a little disappointing. I tested it in GNU Radio 3.8, SDRAngel, GQRX and soapysdr/python on Linux, and in SDRAngel on Windows, on two different machines. My temperatures are between 45°C and 55°C when running for a longer time.
I tried a realtime kernel on Debian 10, which seems to improve the performance a little, but not much. To check whether my CPU is the bottleneck, I built a GNU Radio 3.8 flowgraph which records directly to a file without any signal processing or GUI. I then checked the file in a separate flowgraph for dropped samples. During the recording my CPUs weren’t utilized much, so I assume they aren’t the problem.
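
Roughly, the soapysdr/python version of that test looks like the sketch below (device string, rate, frequency and buffer size are placeholders rather than exactly what I ran); at least readStream() reports overflows directly:

# Rough capture sketch with the SoapySDR Python bindings; counts overflow
# events, since an overflow means the host fell behind and samples were lost.
# "driver=lime", 15 MSPS, 100 MHz and the buffer size are assumptions.
import numpy as np
import SoapySDR
from SoapySDR import SOAPY_SDR_RX, SOAPY_SDR_CF32, SOAPY_SDR_OVERFLOW

sdr = SoapySDR.Device("driver=lime")
sdr.setSampleRate(SOAPY_SDR_RX, 0, 15e6)
sdr.setFrequency(SOAPY_SDR_RX, 0, 100e6)

rx = sdr.setupStream(SOAPY_SDR_RX, SOAPY_SDR_CF32)
sdr.activateStream(rx)

buff = np.empty(1 << 16, np.complex64)
overflows = 0
with open("capture.cf32", "wb") as f:
    for _ in range(2000):                     # roughly 9 s of data at 15 MSPS
        sr = sdr.readStream(rx, [buff], len(buff))
        if sr.ret == SOAPY_SDR_OVERFLOW:
            overflows += 1                    # host fell behind, samples lost
        elif sr.ret > 0:
            buff[:sr.ret].tofile(f)           # write only the samples we got

sdr.deactivateStream(rx)
sdr.closeStream(rx)
print("overflow events:", overflows)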

So am I the only one with this underperformance? Has anyone else experienced these problems and found a fix?
Regards

That would be testing your USB performance and your disk access times/performance, and possibly even your memory access if there is enough buffering involved (I do not know, I have not read through the source code of the gr-limesdr source block or the file sink block in GNU Radio). But that test would definitely not be testing your CPU performance at all. Processing a prerecorded file is throttled only by what your CPU can handle; without an actual hardware sink block or a throttle block it can run faster than real time, because the samples are processed as fast as the CPU can manage.

Maybe just state the exact model of your CPU?
e.g. a modern Intel Core i5-4230U @ 1.90 GHz, from 2018, would struggle to do much processing at more than about 10 MSPS (maybe 20 MSPS) - see note 1.

Or show the flowgraph with which your computer cannot handle more than 10 MSPS. The usual first block in a typical flowgraph filters and decimates as much as possible to reduce the sample rate, to lower the CPU requirements as early as possible.

(Note 1: it really depends on how much actual processing you are doing to the samples, and on how many samples there are; decimating heavily in the first few blocks can reduce the CPU requirements by a lot.)

I used to get a full 40 MSPS with older versions of LimeSuite and SdrGlut.
Now, with v20.10.0, I can only get 10 MSPS, and when I try to go faster I get errors -

LimeSDR Mini [USB 2.0] 1D40E98019939A (0) count 2 sound restarted
LimeSDR Mini [USB 2.0] 1D40E98019939A (0) count 6 sound restarted

It is a USB 3.0 system - what is up with [USB 2.0]?

Hi, thanks for your answer, maybe I should have been more specific. As you wrote, normally one would filter/decimate the input signal and do further CPU-intensive DSP, so I skipped all of that and connected the gr-limesdr RX block directly to the file sink to eliminate the possibility of my CPU being overwhelmed. USB 3 shouldn’t be the bottleneck either, as per the technical specifications. Both systems use SATA and NVMe SSDs, so a disk bottleneck can be ruled out too. The recorded file is then processed in a different flowgraph which has no real-time dependence and can take as long as it wants.
So using this setup I record an FM broadcast to a file, which in a separate flowgraph is demodulated to an audio file that I later play back to listen for hiccups, which appear when samples are dropped.
This is purely a way to check whether samples are dropped, because gr-limesdr has no indicator for that. I want to test whether my LimeSDR Mini is capable of recording larger slices of spectrum, as stated in the specs, for a later project.

As I said, both processors are Intel i5s: a 9th-gen i5-9400H and an older 4th-gen part.

So that leaves something USB-, hardware- or LimeSuite-related as the culprit, in my opinion.

Oh, I saw that note after posting my previous comment. Yes, that is exactly why I split recording and processing into two different flowgraphs.

Do you recall which version of LimeSuite worked? Maybe I can try to downgrade and see if this changes something?

I’m not sure what the version was. Here is a video of one of the tests where it was working - maybe it has a clue to the version -

SdrGlut Simultaneously Running Five SDRs

Thanks, I’ll have a look at which version of LimeSuite was released when you uploaded the video. Maybe I’ll try it with SdrGlut too, as it seems to work on your machine. Thanks for the hint.

That CPU should have enough processing power. If it is in a Windows machine, I would temporarily change the power-saving settings to maximum performance, because sometimes Windows may decide to throttle the CPU. The 4th-generation i5 might struggle, depending on which model it actually is.

Most of my tests are done on Linux and, as I wrote above, the CPU doesn’t seem to be the bottleneck. It is barely utilized when recording directly to a file, but samples are still dropped. When I tested on Windows, the power plan was set to maximum performance.

Here is the home page for SdrGlut and the last two versions -

https://github.com/righthalfplane/SdrGlut/releases/tag/v1.20

SATA I is 1.5Gb/s, up to 150MB/s.
SATA II is 3Gb/s, up to 300MB/s.
SATA III is 6Gb/s, up to 600MB/s.

10 MSPS within GNU Radio is naively stored in memory as 32-bit floats (a complex sample has 32 bits for the real part and 32 bits for the imaginary part), so that could be writing 80 MB/second to your hard disk if you are writing the default complex samples. That would be a large percentage of the maximum throughput of a SATA I disk (I have no idea about your actual hardware; I’m assuming slower parts because that would have been a way to reduce the price point of the machine). Even a newer disk with a lot of fragmentation could delay writes. If there is no additional buffering in RAM (as I said, I have not read the source code of the GNU Radio source/sink blocks), any delay could cause buffers to be overwritten.
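
To put rough numbers on it (plain arithmetic; the factors of 10^6 in MSPS and MB/s cancel, so it is just sample rate times bytes per complex sample):

# Raw IQ throughput: MSPS * 2 values per complex sample * bytes per value = MB/s
def iq_mb_per_sec(msps, bytes_per_value):
    return msps * 2 * bytes_per_value

print(iq_mb_per_sec(10, 4))   # 10 MSPS as complex float32    ->  80 MB/s
print(iq_mb_per_sec(10, 2))   # 10 MSPS as interleaved int16  ->  40 MB/s
print(iq_mb_per_sec(40, 4))   # 40 MSPS as complex float32    -> 320 MB/s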

You could have two different problems on the two machines: one could be CPU and disk performance, and the other might be Windows throttling the CPU because of a power-save option.

Thanks for that detailed answer. One SSD is SATA 3 on the older i5, and on the i5-9400H I use an NVMe drive, so that should be capable of handling 80 MB/s.
Just to make sure, I’ll set up a ramdisk, record to that, and see if the problem still exists. I’ll report back when I have tested it.
Regards

You could change the file sink block to write two 16-bit integers instead of two 32-bit floats, and change the file source block to expect two 16-bit integers. That would halve the disk throughput requirement, but you would need to add a Complex to IShort block as well.
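
Something along these lines in GNU Radio 3.8 Python should do it (an untested sketch; the file name and scale factor are placeholders, and the important bit is scaling the floats up before the short conversion and back down after reading, otherwise most of the precision is thrown away):

# Sketch: record complex floats as interleaved 16-bit shorts and read them back.
from gnuradio import gr, blocks

SCALE = 2047.0   # the Mini's ADC is 12 bit, so +/-2047 keeps everything it gives you

class RecordShorts(gr.top_block):
    """source -> scale -> Complex to IShort -> file sink"""
    def __init__(self, source, path="capture.ishort"):
        gr.top_block.__init__(self, "record_shorts")
        scale = blocks.multiply_const_cc(SCALE)
        c2s = blocks.complex_to_interleaved_short()      # 1 complex in -> 2 shorts out
        sink = blocks.file_sink(gr.sizeof_short, path, False)
        self.connect(source, scale, c2s, sink)

class PlayShorts(gr.top_block):
    """file source -> IShort to Complex -> unscale -> sink"""
    def __init__(self, sink, path="capture.ishort"):
        gr.top_block.__init__(self, "play_shorts")
        src = blocks.file_source(gr.sizeof_short, path, False)
        s2c = blocks.interleaved_short_to_complex()      # 2 shorts in -> 1 complex out
        unscale = blocks.multiply_const_cc(1.0 / SCALE)
        self.connect(src, s2c, unscale, sink)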

I have an I/O speed test program that I have used for years -

svn export https://github.com/righthalfplane/SdrGlut/trunk/misc/ioSpeedTest/iotest2.c .
cc -O2 -o iotest2 iotest2.c
./iotest2
0.84 Seconds To Write 400000000 Bytes 40000000 Blocksize 477.33 MBytes/Sec
rm ioTest.jnk

I do something similar using the default UNIX command-line tools, and with tiny adjustments it works on pretty much all flavours of UNIX. The big advantage is that you do not need to download and compile anything (not every system has a compiler installed), which is useful for machines on an intranet with no direct Internet access.

$ # 10MiB
$ time dd if=/dev/zero of=10mb.delme bs=1M count=10
$ time dd if=10mb.delme of=/dev/null bs=1K count=10K
$ rm 10mb.delme
$ # 100MiB
$ time dd  if=/dev/zero of=100mb.delme  bs=1M count=100
$ time dd  if=100mb.delme of=/dev/null  bs=4K count=25K
$ rm 100mb.delme
$ # 1GiB
$ time dd if=/dev/zero of=1gb.delme bs=1G count=1
$ time dd if=1gb.delme of=/dev/null bs=64K count=16K
$ rm 1gb.delme

And you can adjust the block size to see the effect on sequential read/write performance. The optimal storage access block size is non-intuitive, because there are so many buffers involved. The low-level block size on the hardware could be 512 or 4096 bytes (spinning rust), or even 8 MiB (SSD), but due to the cache RAM in the storage devices the optimal block size might be 64 KiB or even 128 MiB for reads, and something totally different for writes. You can easily create a Bourne shell script to loop through all the possible combinations (see the sketch of the same idea below), but it would be mostly similar to the above commands.

You also need to be careful with SSDs, especially older SSDs that are near their end of life and close to their block write limit (typical program/erase cycles per block are ~100k for SLC, ~10k for MLC, ~5k for TLC): they have a habit of permanently failing into a read-only state if you hit them with lots (billions to trillions) of tiny writes (1-2 bytes). With SDR this is not normally a concern, because you never write tiny files.
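
If you would rather stay in Python than write a shell loop, a rough sweep of write block sizes looks something like this (the sizes, total amount and file name are arbitrary):

# Rough write-throughput sweep over block sizes (all values are placeholders).
import os, time

TOTAL = 1 << 30                      # write 1 GiB per pass
data = os.urandom(8 << 20)           # 8 MiB of incompressible data, reused

for bs in (4096, 65536, 1 << 20, 8 << 20):
    t0 = time.time()
    written = 0
    with open("io.delme", "wb") as f:
        while written < TOTAL:
            f.write(data[:bs])       # write one block of the chosen size
            written += bs
        f.flush()
        os.fsync(f.fileno())         # make sure the data actually hit the disk
    dt = time.time() - t0
    print("bs=%8d  %8.1f MB/s" % (bs, TOTAL / dt / 1e6))
    os.remove("io.delme")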

time dd if=/dev/zero of=1gb.delme bs=1000000000 count=1
1+0 records in
1+0 records out
1000000000 bytes transferred in 1.546176 secs (646756920 bytes/sec)
0.000u 0.643s 0:01.60 40.0% 0+0k 0+0io 0pf+0w
time dd if=1gb.delme of=/dev/null bs=64000 count=16000
15625+0 records in
15625+0 records out
1000000000 bytes transferred in 0.140137 secs (7135864221 bytes/sec)
0.014u 0.127s 0:00.14 92.8% 0+0k 0+0io 0pf+0w

A read at 7.1 GB/sec is at best reading a buffer, not the file - the hard drive is rated at something like 0.9 GB/sec. There is also no telling whether the write had finished flushing its buffers before it quit, so using these routines for speed tests does not work on my system.

If you see that you are hitting RAM buffers instead of storage, just keep making the file size 10x larger until the buffers are all full. Once the data can no longer be buffered, you will start to see the performance of the real storage. Or use a file size larger than the RAM in the machine by a factor of 10.

EDIT: The file written is all zeros, so if your storage is compressed (e.g. ZFS, btrfs, …) then you may get ridiculous read (and write) performance. If you do have some form of compression enabled, use /dev/urandom instead of /dev/zero; random data will not compress, which is closer to most IQ files generated by SDRs, which do not compress well at all.

LimeSuite 20.10 works fine for me (49 MS/s with the LimeSDR-USB and 30 MS/s with the Mini on Win10, Ryzen 7 2700x or Ryzen 5 1500x CPU). I did not see any reference to the USB 3.0 controller used in your experiment. Is it part of the chipset or an add-in card? Do you use an extension cable connected to the motherboard or to the front panel of the PC case? Different cables? Did you try without the 10 MHz external reference? And finally, what app are you using on Win10? I recommend the 64-bit version of SDR# (on a Ryzen 7 2700x at 30 MS/s only 5% of CPU time is used).

Thank you all for your help. I found the error, and it was as stated by @mzs: my NVMe disk has some weird problem when saving the streamed data. I tested the performance using dd and got values between 170 MB/s and 300 MB/s on a 10 GB file, depending on the block size, which had led me to believe it wasn’t the bottleneck. When using a ramdisk, the problem is gone.

I’ll look further into saving the samples as 16-bit integers using the Complex to IShort block. Writing seems to work, but when reading back and converting from IShort to complex, something gets messed up somewhere, which I’ll need to fix.
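
For the read-back I’m thinking of something along the lines of the numpy sketch below (the file name and scale factor are placeholders; the scale has to match whatever scaling was applied before the Complex to IShort block, otherwise the recovered floats come out with the wrong amplitude):

# Read interleaved 16-bit I/Q back into complex64 for offline checking.
import numpy as np

SCALE = 2047.0                                  # must match the recording side
raw = np.fromfile("capture.ishort", dtype=np.int16)
iq = raw.astype(np.float32).view(np.complex64) / SCALE   # I,Q pairs -> complex
print(len(iq), "samples,", iq.dtype)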

Thank you all for the help!
