[RFC,3/3] net/pcap: support hardware Tx timestamps

Message ID 20200625190119.265739-4-vivien.didelot@gmail.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Headers
Series mlx5 to PCAP capture with hardware timestamps |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Vivien Didelot June 25, 2020, 7:01 p.m. UTC
  When hardware timestamping is enabled on Rx path, system time should
no longer be used to calculate the timestamps when dumping packets.

Instead, use the value stored by the driver in mbuf->timestamp
and assume it is already converted to nanoseconds (otherwise the
application may edit the packet headers itself afterwards).

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Patrick Keroulas <patrick.keroulas@radio-canada.ca>
---
 doc/guides/rel_notes/release_20_08.rst |  1 +
 drivers/net/pcap/rte_eth_pcap.c        | 30 +++++++++++++++-----------
 2 files changed, 18 insertions(+), 13 deletions(-)
  

Comments

Stephen Hemminger June 30, 2020, 2:59 p.m. UTC | #1
On Thu, 25 Jun 2020 15:01:19 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> +		struct timeval cur_time = {
> +			.tv_sec = cycles / hz,
> +			.tv_usec = (cycles % hz) * NSEC_PER_SEC / hz,

This is hot path, use rte_reciprocal_divide rather than slow divide instruction.
  
Patrick Keroulas July 6, 2020, 6:36 p.m. UTC | #2
On Fri, Jun 26, 2020 at 2:48 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> I don't get why you expect that timestamp to be in nanoseconds.
> The conversion is done in librte_pdump (in the previous patch),
> but it won't work if the library is not used, right?
>

You're right Olivier. I took advantage of the definition of mbuf->timestamp
as being "not normalized". The proposed conversion needs NIC clock info
which can't be accessed from the secondary process. Do you have a better
suggestion? Should I set a user flag in mbuf for nanoseconds?

> Out of curiosity, can you explain your motivation for using the hardware
> timestamp? Is it faster? More accurate? (knowing it timestamps the Rx
> operation, not the Tx)

Accuracy is our requirement in the broadcast industry (and probably
in finance as well) where core systems are very time sensitive. Our
application uses the NIC mostly as a receiver for UDP stream monitoring
in order to measure the network propagation delay and jitter. Using SW Rx
timestamps completely breaks this requirement.
  
Olivier Matz July 7, 2020, 2:47 p.m. UTC | #3
Hi Patrick,

On Mon, Jul 06, 2020 at 02:36:30PM -0400, PATRICK KEROULAS wrote:
> On Fri, Jun 26, 2020 at 2:48 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> > I don't get why you expect that timestamp to be in nanoseconds.
> > The conversion is done in librte_pdump (in the previous patch),
> > but it won't work if the library is not used, right?
> >
> 
> You're right Olivier. I took advantage of the definition of mbuf->timestamp
> as being "not normalized". The proposed conversion needs NIC clock info
> which can't be accessed from the secondary process. Do you have a better
> suggestion? Should I set a user flag in mbuf for nanoseconds?
>
> > Out of curiosity, can you explain your motivation for using the hardware
> > timestamp? Is it faster? More accurate? (knowing it timestamps the Rx
> > operation, not the Tx)
> 
> Accuracy is our requirement in the broadcast industry (and probably
> in finance as well) where core systems are very time sensitive. Our
> application uses the NIC mostly as a receiver for UDP stream monitoring
> in order to measure the network propagation delay and jitter. Using SW Rx
> timestamps completely breaks this requirement.

OK, your main motivation for hardware timestamping is the accuracy, and
your application logs the Rx timestamp into the pcap when using the PMD.

For the pmd pcap part, the dynamic Tx timestamp flag that is being
introduced by Slava [1] may fit your needs: ie. when the flag is set, it
uses the provided timestamp which should be in nanosecs.

[1] https://patchwork.dpdk.org/patch/73427/

For the conversion from hardware Rx timestamp into Tx timestamp (in
nanosec), could it be done by your application? Early in the Rx path, if
a packet has a Rx timestamp flag, do the conversion to nsecs, and set
the timestamp field and the Tx timestamp flag.

Regards,
Olivier
  
Patrick Keroulas July 10, 2020, 7:23 p.m. UTC | #4
Hello Olivier and Slava,

On Tue, Jul 7, 2020 at 10:47 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> For the pmd pcap part, the dynamic Tx timestamp flag that is being
> introduced by Slava [1] may fit your needs: ie. when the flag is set, it
> uses the provided timestamp which should be in nanosecs.
>
> [1] https://patchwork.dpdk.org/patch/73427/
>
> For the conversion from hardware Rx timestamp into Tx timestamp (in
> nanosec), could it be done by your application? Early in the Rx path, if
> a packet has a Rx timestamp flag, do the conversion to nsecs, and set
> the timestamp field and the Tx timestamp flag.

Just to be clear, we use testpmd and dpdk-pdump rather than developing
our own app. The application I mentioned earlier is a pcap analyzer and
has nothing to do with dpdk.

Could you help me identify a location where to set this dyn Tx field and flag?
My current Rx data path includes drivers/net/mlx5/mlx5_rxtx_vec_sse.h
Is it a good candidate?

For the unit conversion, I finally followed the recommendations of using
start_time & clock_freq rather than libibverbs. The time accuracy looks
similarly good, based on pcap analysis.
The conversion is performed in rte_pdump, is that acceptable?

Thanks,

PK
  

Patch

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index a67015519..cd1ca987f 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -61,6 +61,7 @@  New Features
   Updated PCAP driver with new features and improvements, including:
 
   * Support software Tx nanosecond timestamps precision.
+  * Support hardware Tx timestamps.
 
 * **Updated Mellanox mlx5 driver.**
 
diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index 13a3d0ac7..3d80b699b 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -290,19 +290,23 @@  eth_null_rx(void *queue __rte_unused,
 #define NSEC_PER_SEC	1000000000L
 
 static inline void
-calculate_timestamp(struct timeval *ts) {
-	uint64_t cycles;
-	struct timeval cur_time;
+calculate_timestamp(const struct rte_mbuf *mbuf, struct timeval *ts) {
+	if (mbuf->ol_flags & PKT_RX_TIMESTAMP) {
+		ts->tv_sec = mbuf->timestamp / NSEC_PER_SEC;
+		ts->tv_usec = mbuf->timestamp % NSEC_PER_SEC;
+	} else {
+		uint64_t cycles = rte_get_timer_cycles() - start_cycles;
+		struct timeval cur_time = {
+			.tv_sec = cycles / hz,
+			.tv_usec = (cycles % hz) * NSEC_PER_SEC / hz,
+		};
 
-	cycles = rte_get_timer_cycles() - start_cycles;
-	cur_time.tv_sec = cycles / hz;
-	cur_time.tv_usec = (cycles % hz) * NSEC_PER_SEC / hz;
-
-	ts->tv_sec = start_time.tv_sec + cur_time.tv_sec;
-	ts->tv_usec = start_time.tv_usec + cur_time.tv_usec;
-	if (ts->tv_usec >= NSEC_PER_SEC) {
-		ts->tv_usec -= NSEC_PER_SEC;
-		ts->tv_sec += 1;
+		ts->tv_sec = start_time.tv_sec + cur_time.tv_sec;
+		ts->tv_usec = start_time.tv_usec + cur_time.tv_usec;
+		if (ts->tv_usec >= NSEC_PER_SEC) {
+			ts->tv_usec -= NSEC_PER_SEC;
+			ts->tv_sec += 1;
+		}
 	}
 }
 
@@ -339,7 +343,7 @@  eth_pcap_tx_dumper(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 			caplen = sizeof(temp_data);
 		}
 
-		calculate_timestamp(&header.ts);
+		calculate_timestamp(mbuf, &header.ts);
 		header.len = len;
 		header.caplen = caplen;
 		/* rte_pktmbuf_read() returns a pointer to the data directly