Timestamps support via SoapySDR

ccsh · 18 September 2017 21:14

@joshblum, thank you again for detailed answer. As always it explained some things. However I will insist that UHD send() calls are indeed blocking in burst-like transmission, see details below

joshblum:

ccsh:

you can’t really get real current device timestamp in case of LimeSDR - what you get by calling getHardwareTime() is actually a cached value of last timestamp seen by RX thread. It may be a little problem if RX calls are happening rarely (say that you are requesting RX burst every 10 seconds, then the value returned by getHardwareTime() will be updated every 10 seconds as well).
hardware timer will start only when internal RX thread will be enabled - it means that you cannot create truly TX-only, timestamp-based application for LimeSDR (it has to have RX calls as well).

getHardwareTime() is kept up to date by the internal rx stream thread. You do have to activate the rx stream but you dont need to call readStream(). The timestamp will be kept up to date regardless.

I agree that keeping the rx stream activated for a transmit only application is annoying and unexpected. I think register access to the last timestamp counter would be preferable in this case. @Zack thoughts?

I also think that direct access to register would be much better here. So let’s hope @Zack will come up with some clever idea

Okay, I am quite sure that in case of UHD it does not work like that, but gonna try that again with SoapySDR.

Yeah, something like that. But if I can get reliable stream status as you have described above (which will report late command especially), then it will be good enough.

As I said, I got some experience with UHD and USRP devices, so you can really trust me on it - these send() calls do block and wait for user timestamp when configured to work with bursts I have crafted simple program to prove it (just make sure that you are testing it with actual USRP device and not some other hardware, “hacked” to work with UHD, and that you are setting metadata structure like in this example):

#include <uhd/usrp/multi_usrp.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

using namespace std;

namespace pt = boost::posix_time;

int main(int argc, char *argv[])
{
	//disable cout and cerr buffering
	setvbuf(stdout, NULL, _IONBF, 0);
	setvbuf(stderr, NULL, _IONBF, 0);

	//adjust following lines for different USRP device than USRP B200
	uhd::usrp::multi_usrp::sptr device = uhd::usrp::multi_usrp::make(string("type=b200")); //e.g. addr=192.168.10.1 for USRP N210
	double f_clk = 33.333333e6;	//e.g. 100e6 for USRP N210
	short D = 2; // f_clk/D=sampling_rate, thus set it to 6 for USRP N210 to keep the same sampling rate
	device->set_tx_antenna("TX/RX");
	device->set_tx_gain(15); //should be good to go for any USRP
	device->set_tx_freq(1457e6); //may be needed to change if e.g. daughterboard does not cover it

	//following lines probably does not have to be changed
	device->set_master_clock_rate(f_clk);
	double sampling_rate = f_clk / (double)D;
	device->set_tx_rate(sampling_rate);
	while (!device->get_tx_sensor("lo_locked").to_bool()) //wait for actual tuning
		usleep(100);
	uhd::stream_args_t stream_args_tx("fc32");
	uhd::tx_streamer::sptr tx_stream = device->get_tx_stream(stream_args_tx);

	double burst_period = 2.2; //TX bursts will be repeated that often (in seconds)
	double tx_burst_length = 5e-3; //TX bursts will be that long (in seconds)

	int tx_buffer_size = tx_burst_length * sampling_rate;
	vector<complex<float>> tx_buffer(tx_buffer_size, complex<float>(1.0f, 0.0f)); //means that unmodulated carrier will be transmitted

	int64_t no_of_ticks_per_bursts_period = D * burst_period * sampling_rate; //number of master clock ticks per burst repeating period
	int64_t now_tick = device->get_time_now().to_ticks(f_clk); //current device hardware time
	double time_in_future = burst_period; //some time in future needed for first send() call, in this case burst period is actually long enough
	int64_t tx_tick = now_tick + uhd::time_spec_t(time_in_future).to_ticks(f_clk);
	double timeout = 10; //intentionally huge value (in seconds) to show that in normal conditions send() is returning earlier

	cout << endl << "Let's check how long do we need to wait for send() to return if burst period is equal to " << 1e3 * burst_period << " [ms]...";

	try
	{
		for (int i=0; i<6; i++)
		{
			//convert ticks to time in ns (like in SoapySDR API)
			int64_t burst_time = 1e9 * uhd::time_spec_t::from_ticks(tx_tick, f_clk).get_real_secs();
			uhd::time_spec_t current_time_spec = device->get_time_now();
			int64_t current_time = 1e9*current_time_spec.get_real_secs();

			//very important - first three fields in the metadata structure need to be set to true!
			uhd::tx_metadata_t tx_md;
			tx_md.start_of_burst = true; 	//as it actually is start of new burst
			tx_md.end_of_burst = true;		//as it also is end of this burst
			tx_md.has_time_spec = true;		//as we are going to provide timestamp
			tx_md.time_spec = uhd::time_spec_t::from_ticks(tx_tick, f_clk); //as promised, timestamp is given here

			pt::ptime t1 = pt::ptime(pt::microsec_clock::universal_time()); //remember system time before calling send()

			//call send - as it will turn out IT IS BLOCKING until device hardware timestamp will match the one requested by user
			int no_of_transmitted_samples = tx_stream->send
			(
				&tx_buffer[0],
				tx_buffer_size,
				tx_md,
				timeout
			);

			pt::ptime t2 = pt::ptime(pt::microsec_clock::universal_time()); //remember system time after calling send()

			//return time for first iteration may be a little bit different than in normal conditions (due to used time in future etc.)
			//so let's skip that
			if (i>0)
			{
				cout << endl << "iteration no. " << i
					<< ": current_time=" << current_time << " [ns], "
					<< "burst_time=" << burst_time << " [ns], "
					<< "send() returned after " << (t2-t1).total_milliseconds() << " [ms] from calling it"
					<< ", no_of_transmitted_samples=" << no_of_transmitted_samples << "/" << tx_buffer_size;
			}
			else
				cout << endl << "iteration no. " << i << ": ignoring it due abnormal wait time introduced by time_in_future for first call...";

			tx_tick += no_of_ticks_per_bursts_period;
		}
	}
	catch (const std::exception& e)
	{
		cerr << endl << "main_loop: " << string(e.what()) << endl;
		return 1;
	}
	catch (...)
	{
		cerr << endl << "main_loop: Unexpected exception occurred. " << endl;
		return 1;
	}

	sleep(1);
	cout << endl << "All done!" << endl << endl;
	return 0;
}

Here is example output when burst period was set to 2.2 [s]. Note that send() calls were returning after 2.2 [s] as well - coincidence?

Let's check how long do we need to wait for send() to return if burst period is equal to 2200 [ms]...
iteration no. 0: ignoring it due abnormal wait time introduced by time_in_future for first call...
iteration no. 1: current_time=3137237816 [ns], burst_time=5335434293 [ns], send() returned after 2199 [ms] from calling it, no_of_transmitted_samples=83333/83333
iteration no. 2: current_time=5337391265 [ns], burst_time=7535434275 [ns], send() returned after 2199 [ms] from calling it, no_of_transmitted_samples=83333/83333
iteration no. 3: current_time=7537363845 [ns], burst_time=9735434257 [ns], send() returned after 2199 [ms] from calling it, no_of_transmitted_samples=83333/83333
iteration no. 4: current_time=9737214264 [ns], burst_time=11935434239 [ns], send() returned after 2199 [ms] from calling it, no_of_transmitted_samples=83333/83333
iteration no. 5: current_time=11937210694 [ns], burst_time=14135434221 [ns], send() returned after 2199 [ms] from calling it, no_of_transmitted_samples=83333/83333
All done!

joshblum · 20 September 2017 04:01

Lol, time to pull out the big guns

@ccsh I’m just trying to help make your code reliable. I’m the primary author on UHD, the driver architecture, the network and USB host implementations, and much of the FPGA stack. Send fundamentally isnt allowed to block on time because that would be a architectural problem. The software cant know the device time so it sends the samples off into the FPGA ASAP to let the hardware perform the waiting. Send enough samples, and then send starts to backpressure, send only blocks when the fifo is full.

So what you are seeing (again) is because your transmit burst is larger than the available buffering. The send call backs up after it sends most of the burst into the device. The call isnt blocking on the time, its blocking on the fifo back-pressure. Of course, ultimately the fifo is blocking on the FPGA core which is blocking on the time, but its a difference. Reduce the burst length or fragment send() into multiple calls and you will see what I mean. The first calls will return ASAP.

So you could be making an assumption that could break on a different device or different set of burst parameters.

If we were to reduce the software buffering in the LimeSuite library, I think you would get the desired behaviour. There is a stream argument called “bufferLength” for this purpose. I think it might be interesting to experiment with.

I think this sort of burst blocking behaviour is actually kind of interesting and could be useful in some apps. I hadnt thought of it, and I had always recommended using the rx time to pace the tx stream (even in the case of uhd). Usually the buffer is too small, and well if it blocks, thats too bad, users just have to wait in your transmit loop. – On the other hand, I think you discovered it as sort of an unintended feature of a small transmit buffer. You have to be careful, because its working by accident/experimental discovery, but its definitely useful.

Yeah, something like that. But if I can get reliable stream status as you have described above (which will report late command especially), then it will be good enough.

Totally agree. The stream status has really let you down. I have tried to make the software a little more correct here, but its fundamentally an FPGA feature/bug fix that needs to be in place for the support. That said, I think your code examples worked once the blocking timeout was removed, so I hope you can make it work by some of the other means discussed like waiting on a particular timestamp with getHardwareTime().

joshblum · 20 September 2017 04:11

Yea, it could be a lemon. ccsh ran the example FWIW and it gave the same delay value. Make sure you have latest master or stable branch in case you are just missing a recent fix. And that LimeUtil --make doesnt show any out of date fw/fpga images. I guess thats my best advice until you can compare on another device

There is an internally synthesized clock inside the LMS7, its basically tuned to some power of two multiple of the sample rate. The Rx and Tx clocks that go to the FPGA are just power of two divisors of this clock. All of the FPGA timing logic operates on the Rx sample clock in the FPGA. There is an expectation for the timing logic that the Rx and Tx sample rates match. But other than that its a pretty typical upconverter/downconverter architecture, you just see the halfband chains in the LMS7, rather than say the FPGA.

ccsh · 20 September 2017 09:49

WHITE FLAG! WHITE FLAG! I don’t want to fight with you, I know your gun is much bigger than mine! And I really admire your work and love the idea behind SoapySDR and Pothos especially

What you have described above makes perfect sense, so I gave it a try with smaller tx buffer size, and it turned out that you are right (what a surprise ) , so I roll back everything I said about blocking send() calls and humbly admit that I was using the “feature” discovered by accident . Funny that I didn’t note that earlier, but I was actually working with buffers long enough to mask that on both tested USRP devices

That is probably the way most of apps will work anyway, e.g. in the case where slave station must synchronize its timestamps (both TX and RX) to the timing of master station. Already adopted it in my tester as well.

Actually I got both TX and RX threads to work with SoapySDR, thanks to your suggestions Still got some problem with receiving stream status for TX, though (it always returns timeout, even when I provide very wrong timestamp in some particular iteration). But I am going to play with that little more after improving signal preparation module for MIMO

Thank you again for quick and detailed answer!

ccsh · 3 October 2017 19:49

I did some experiments and can confirm that even for SISO configuration timing precision may be affected for high sampling rates. However, my investigation showed that actually problem is on RX side, not TX. I have also figured out that high master clock rate is responsible for that.

Typically, sampling rate is suggested to be 8 times lower than master clock rate. For such relation the maximum master clock rate for which I observed stable timing was 378 MHz, which results in sampling rate of 47.25 MHz.

For mentioned configuration I saw no errors and was able to hit my bursts with +/- 1 sample on average. I was also using USRP B200 mini to monitor signal transmitted via TX port of LimeSDR. It looked fine both in time and frequency domain. @Zero, could you confirm that these settings works for you as well?

Zero · 3 October 2017 20:33

I’m trying to work with the Soapy API - if possible I’d like my code to be hardware agnostic.

And I don’t recall seeing anything in the API allowing me to make these clock tweaks. I’m off to take a look at the code to see if it’s possible to perform your experiment.

Thanks!

ccsh · 3 October 2017 20:37

Actually I also use SoapySDR API right now, and all you need to do is to call

device->setMasterClockRate(f_clk_in_Hz);

before setting sampling rate for TX and RX channels

Zero · 3 October 2017 22:30

Got it - hadn’t messed with the clocking API previously.

I’m making the mods to my code and will report back. Do you suggest I start somewhere around 300 MHz, and try to find the high end as you did?

ccsh · 4 October 2017 06:03

That sounds like a good idea, just make sure to keep f_clk/fs ratio to 8 all the time. I am also going to check this clock rate with less powerful host machine today (was testing on i7-7700k with quite recent motherboard)

//edit:
Just tested configuration mentioned above (f_clk=378 MHz, D=8) with i5-based laptop and it also worked correctly for fs=47.25 MHz. For MIMO I had to use twice lower master clock rate (189 MHz), which gave sampling rate of 23.625 MHz for all TX and RX channels.

Zero · 4 October 2017 11:24

ccsh -

Can you tell me what version of LimeSuite you’re using? In the course of trying to figure out what was going on, one of the things I tried was grabbing the latest from GitHub, moving from (I believe) 17.07 to 17.09.

When I started using the new software I started having brand new problems, which (since I’m running in loopback) I haven’t isolated to the TX or RX yet.

I try to send a 1024 sample burst, and I get it in the receiver - but I also get an ‘echo’, or repeated section of the original burst (ref: below:)

The bottom left window shows my 1024 sample burst (samples 35408 to 36430), but then there’s that second ‘burp’ of data. It’s clearly related to the burst, but I sure didn’t ask for it to be sent, nor is it 1024 samples. By eyeball it appears to be the tail end of the first burst.

Changelog for 17.09 indicates some mods to write burst termination, which really looks like the issue I’m seeing; I’m guessing there’s a problem with the fix.

Just so we can compare apples to apples I’d like to switch to whichever version you’re running.

Thanks!

ccsh · 4 October 2017 13:22

That’s actually a good question. I have downloaded LimeSuite @2017-09-06 and it was then marked as v17.09.1. Current version number seems to be the same, but some modifications were made in source code since then, so I think the easiest way to make sure that we are using the same version will be simply posting link to my *zip file
https://expirebox.com/download/4231c6efc71ba07a92c9e651c4613449.html

Link will be automatically deleted in 2 days.

I also got several questions meant to compare our use cases:

What master clock rate and sampling rate was used when generating above images?
Did you insert some “silence” periods at the beginning and at the end of TX buffer meant to consume transients and constant TX-RX offset (offset of several microseconds is actually to be expected)?
Are you using the same RX buffer length as on TX side?
How often are your bursts repeated?
Did you make sure that no errors are being reported through SoapySDR for both RX and TX directions (especially the latter with readStreamStatus)?
Are your TX and RX timestamps the same (or to be more precise: the same, but TX burst are scheduled for example two burst periods later than corresponding RX bursts)?
Are you using SISO or MIMO configuration?

Zero · 4 October 2017 14:02

As it happens, I have my code set up to grab the SDR configuration from a command file; here are my settings:
master_clock_rate 307.2e6
sample_rate 15.36e6
rx_bufsize 131072
rx_timeout 10000000000
rx_frequency 921.6e6
rx_port LNAL
rx_bandwidth 130e6
rx_gain 10
tx_bufsize 1024
tx_time 2000000
tx_timeout 10000000000
tx_frequency 921.6e6
tx_port BAND1
tx_bandwidth 130e6
tx_gain 10
num_bursts 1
output_file /home/kdoherty/Desktop/output.csv
input_file /home/kdoherty/Desktop/input.csv

I do not have the TX and RX buffer sizes the same, nor do I have any silence before or after the TX burst (I expect all 1024 samples to go, and nothing else). RX starts at time zero, and I send exactly one TX burst (presumably at tx_time), then close the stream.

I believe I’m in SISO mode - at least, I don’t set up anything but TX and RX channel 0.

I do not call readStreamStatus, but will add calls to my code.

Thanks for your advice -

IgnasJ · 5 October 2017 19:17

Hello,
I have looked into end of burst functionality and made some fixes. I haven’t tested it with SoapySDR but I think that ‘echo’ should be gone now.

ccsh · 5 October 2017 21:54

I have just confirmed that for SISO configuration maximum master clock rate which for which RX bursts timing precision is correct is 378 MHz, which gives sampling rate of 47.25 MHz for both channels (1xTX and 1xRX).

For MIMO basically one should use twice lower master clock rate, which gives sampling rate of 23.625 MHz for all four channels (2xTX and 2xRX). This configuration looks correct for RX channel 1, but ca. 20% of RX bursts received via RX channel 0 has still incorrect timing. This actually seems to be totally another bug (not depending on sample rate), which I have already reported on LimeSuite github tracker and on this forum. I find this issue very serious and hope that somebody from Lime Team will be assigned to this soon.

Zero · 5 October 2017 22:21

Yes! It’s gone! Awesomeness!

Thanks for the fix - I had been thinking I would need to find the problem myself.

ccsh · 6 October 2017 10:15

Wow, I have just tested it with the newest version of LimeSuite and it looks like this strange RX ch0 issue is also gone now. So I am able to get perfect burst timing in MIMO configuration with sampling rate of 23.625 MHz for all four channels. Thanks a lot @IgnasJ!