[v2,1/2] ethdev: introduce conntrack flow action and item

Message ID 1618504877-95597-2-git-send-email-bingz@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series ethdev: introduce conntrack flow action and item

Checks

Context        Check     Description
ci/checkpatch  warning   coding style issues

Commit Message

Bing Zhao April 15, 2021, 4:41 p.m. UTC
  This commit introduces the conntrack action and item.

Usually the HW offloading is stateless. For some stateful offloading,
like a TCP connection, the HW module can provide full offloading
without SW participation after the connection is established.

The basic usage is that in the first flow the application should add
the conntrack action, and in the following flow(s) the application
should use the conntrack item to match on the result.
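
As a rough illustration only (not part of this patch; names such as
port_id, attr, tcp_pattern, the jump group and fate_actions are
placeholders and error handling is omitted), the two flow rules could
look like:

    /* Requires <rte_flow.h>. First rule: send TCP traffic through
     * the connection tracking module and jump to the next group.
     */
    struct rte_flow_error error;
    struct rte_flow_action_jump jump = { .group = 1 };
    struct rte_flow_action_conntrack ct = {
        .peer_port = peer_ethdev_port, /* placeholder ethdev port ID */
        .is_original_dir = 1,
        .enable = 1,
        /* state/seq/ack/window fields filled from the handshake */
    };
    struct rte_flow_action actions1[] = {
        { .type = RTE_FLOW_ACTION_TYPE_CONNTRACK, .conf = &ct },
        { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    rte_flow_create(port_id, &attr, tcp_pattern, actions1, &error);

    /* Second rule (in the jump target group): match only packets
     * that passed the conntrack check with a valid state.
     */
    struct rte_flow_item_conntrack ct_spec = {
        .flags = RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID,
    };
    struct rte_flow_item pattern2[] = {
        { .type = RTE_FLOW_ITEM_TYPE_CONNTRACK,
          .spec = &ct_spec, .mask = &rte_flow_item_conntrack_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    rte_flow_create(port_id, &attr2, pattern2, fate_actions, &error);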

A TCP connection has traffic in two directions. To set a conntrack
action context correctly, information from packets of both directions
is required.

The conntrack action should be created on one port, with the peer
port supplied as a parameter to the action. After the context is
created, it can only be used between those ports (dual-port mode) or
on a single port. The application should modify the action via the
API "action_handle_update" only before using it to create a flow
with the opposite direction. This helps the driver recognize the
direction of the flow to be created, especially in single-port mode.
The traffic from both directions will go through the same port if
the application works as a "forwarding engine" rather than an end
point. There is no need to call the update interface if the
subsequent flows have nothing to be changed.
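
A hedged sketch of that update step via the indirect action API
(illustration only; it reuses the "ct" configuration from above,
names are placeholders, error handling is omitted and the conf
structure names follow the current indirect action definitions):

    /* Create the conntrack context once as an indirect action handle. */
    struct rte_flow_indir_action_conf indir_conf = { .ingress = 1 };
    struct rte_flow_action ct_action = {
        .type = RTE_FLOW_ACTION_TYPE_CONNTRACK, .conf = &ct,
    };
    struct rte_flow_action_handle *handle =
        rte_flow_action_handle_create(port_id, &indir_conf,
                                      &ct_action, &error);

    /* ... create the original-direction rule referencing the handle
     * via RTE_FLOW_ACTION_TYPE_INDIRECT (.conf = handle) ...
     */

    /* Before creating the reply-direction rule, update only the
     * direction bit; the connection state itself is left untouched.
     */
    struct rte_flow_modify_conntrack change = {
        .new_ct = { .is_original_dir = 0 },
        .direction = 1, /* only the direction field will be updated */
        .state = 0,
    };
    rte_flow_action_handle_update(port_id, handle, &change, &error);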

Query will be supported via the action_ctx_query interface, reporting
the current packet information and connection status. The fields that
can be queried depend on the HW.
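
For instance (illustration only, reusing the handle from above and
assuming the query is exposed through rte_flow_action_handle_query()),
the same structure used for configuration is filled back on query:

    struct rte_flow_action_conntrack ct_status = { 0 };

    if (rte_flow_action_handle_query(port_id, handle,
                                     &ct_status, &error) == 0 &&
        ct_status.state == RTE_FLOW_CONNTRACK_STATE_ESTABLISHED &&
        ct_status.live_connection) {
        /* The connection is established and still exchanging ACKs. */
    }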

For the packets received during the conntrack setup, it is suggested
to re-inject them in order to take full advantage of the conntrack.
Only valid packets should pass the conntrack; packets with invalid
TCP information, like out of window, or with an invalid header, like
malformed, should not pass.

Naming and definition:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c

Other reference:
https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 lib/librte_ethdev/rte_flow.c |   2 +
 lib/librte_ethdev/rte_flow.h | 195 +++++++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)
  

Comments

Thomas Monjalon April 16, 2021, 10:49 a.m. UTC | #1
15/04/2021 18:41, Bing Zhao:
> This commit introduced the conntrack action and item.
> 
> Usually the HW offloading is stateless. For some stateful offloading
> like a TCP connection, HW module will help provide the ability of a
> full offloading w/o SW participation after the connection was
> established.
> 
> The basic usage is that in the first flow the application should add
> the conntrack action and in the following flow(s) the application
> should use the conntrack item to match on the result.

You probably mean "flow rule", not "traffic flow".
Please make it clear to avoid confusion.

> A TCP connection has two directions traffic. To set a conntrack
> action context correctly, information from packets of both directions
> are required.
> 
> The conntrack action should be created on one port and supply the
> peer port as a parameter to the action. After context creating, it
> could only be used between the ports (dual-port mode) or a single
> port. The application should modify the action via the API
> "action_handle_update" only when before using it to create a flow
> with opposite direction. This will help the driver to recognize the
> direction of the flow to be created, especially in single port mode.
> The traffic from both directions will go through the same port if
> the application works as an "forwarding engine" but not a end point.
> There is no need to call the update interface if the subsequent flows
> have nothing to be changed.

I am not sure whether this is a feature description for the commit log
or a usage explanation for the doc.
In any case, please distinguish "ethdev port" and "TCP port"
to avoid confusion.

> Query will be supported via action_ctx_query interface, about the
> current packets information and connection status. Tha fields
> query capabilities depends on the HW.
> 
> For the packets received during the conntrack setup, it is suggested
> to re-inject the packets in order to take full advantage of the

What do you mean by "full advantage"?
It is counter-intuitive to re-inject for offloading.
Does it improve the performance?

> conntrack. Only the valid packets should pass the conntrack, packets
> with invalid TCP information, like out of window, or with invalid
> header, like malformed, should not pass.
> 
> Naming and definition:

You mean the naming is inspired by Linux?

> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
> https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c
> 
> Other reference:
> https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
[...]
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches conntrack state.
> +	 *
> +	 * See struct rte_flow_item_conntrack.

Please use @see for hyperlink in doxygen.

> +	 */
> +	RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
[...]
> +/**
> + * The packet is with valid state after conntrack checking.

"is with valid state" looks strange.
I propose "The packet is valid after conntrack checking."

> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)

Please use RTE_BIT32().

> +/**
> + * The state of the connection was changed.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
> +/**
> + * Error is detected on this packet for this connection and
> + * an invalid state is set.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)

"INVAL" is strange. Can we add the missing 2 characters?
RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVALID

On a related note, do we really need the word FLAG?
And it is conflicting with the prefix in
enum rte_flow_conntrack_tcp_last_index.
I think RTE_FLOW_CONNTRACK_PKT_STATE_ is a good prefix, long enough.

> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)

This one does not have PKT in its name.
And it is limiting to HW, while the driver could implement conntrack in SW.
I propose RTE_FLOW_CONNTRACK_PKT_DISABLED

> +/**
> + * The packet contains some bad field(s) and cannot continue
> + * with the conntrack module checking.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bit mask of one RTE_FLOW_CONNTRACK_FLAG*

s/bit mask/bitmap/ ?

RTE_FLOW_CONNTRACK_PKT_STATE_*
otherwise it gets mixed up with rte_flow_conntrack_tcp_last_index

> + * or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +	uint32_t flags;
> +};
[...]
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Enable tracking a TCP connection state.
> +	 *
> +	 * Send packet to HW connection tracking module for examination.

Not necessarily HW.
No packet is sent.
I think you can remove this sentence completely.

> +	 *
> +	 * See struct rte_flow_action_conntrack.

@see

> +	 */
> +	RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
>  
>  /**
> @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_action_handle;
>  
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +	/**< SYN-ACK packet was seen. */
> +	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +	/**< 3-way handshark was done. */

s/handshark/handshake/

> +	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +	/**< First FIN packet was received to close the connection. */
> +	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +	/**< First FIN was ACKed. */
> +	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +	/**< Second FIN was received, waiting for the last ACK. */
> +	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +	/**< Second FIN was ACKed, connection was closed. */
> +	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +};
> +
> +/**
> + * The last passed TCP packet flags of a connection.
> + */
> +enum rte_flow_conntrack_tcp_last_index {
> +	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag. */
> +};

Please use RTE_BIT32().

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +	uint32_t scale:4; /**< TCP window scaling factor, 0xF to disable. */
> +	uint32_t close_initiated:1; /**< The FIN was sent by this direction. */
> +	/**< An ACK packet has been received by this side. */

Move all comments on their own line before the struct member.
Comment should then start with /**

> +	uint32_t last_ack_seen:1;
> +	/**< If set, indicates that there is unacked data of the connection. */

not sure what "unacked data of the connection" means

> +	uint32_t data_unacked:1;
> +	/**< Maximal value of sequence + payload length over sent
> +	 * packets (next ACK from the opposite direction).
> +	 */
> +	uint32_t sent_end;
> +	/**< Maximal value of (ACK + window size) over received packet + length
> +	 * over sent packet (maximal sequence could be sent).
> +	 */
> +	uint32_t reply_end;
> +	/**< Maximal value of actual window size over sent packets. */
> +	uint32_t max_win;
> +	/**< Maximal value of ACK over sent packets. */
> +	uint32_t max_ack;

Not sure about the word "over" in the above definitions.

> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both setting and query.
> + */
> +struct rte_flow_action_conntrack {
> +	uint16_t peer_port; /**< The peer port number, can be the same port. */
> +	/**< Direction of this connection when creating a flow, the value only
> +	 * affects the subsequent flows creation.
> +	 */

As for rte_flow_tcp_dir_param, better to move comments before,
on their own line.

> +	uint32_t is_original_dir:1;
> +	/**< Enable / disable the conntrack HW module. When disabled, the
> +	 * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> +	 * In this state the HW will act as passthrough.
> +	 * It only affects this conntrack object in the HW without any effect
> +	 * to the other objects.
> +	 */
> +	uint32_t enable:1;
> +	/**< At least one ack was seen, after the connection was established. */
> +	uint32_t live_connection:1;
> +	/**< Enable selective ACK on this connection. */
> +	uint32_t selective_ack:1;
> +	/**< A challenge ack has passed. */
> +	uint32_t challenge_ack_passed:1;
> +	/**< 1: The last packet is seen that comes from the original direction.
> +	 * 0: From the reply direction.
> +	 */
> +	uint32_t last_direction:1;
> +	/**< No TCP check will be done except the state change. */
> +	uint32_t liberal_mode:1;
> +	/**< The current state of the connection. */
> +	enum rte_flow_conntrack_state state;
> +	/**< Scaling factor for maximal allowed ACK window. */
> +	uint8_t max_ack_window;
> +	/**< Maximal allowed number of retransmission times. */
> +	uint8_t retransmission_limit;
> +	/**< TCP parameters of the original direction. */
> +	struct rte_flow_tcp_dir_param original_dir;
> +	/**< TCP parameters of the reply direction. */
> +	struct rte_flow_tcp_dir_param reply_dir;
> +	/**< The window value of the last packet passed this conntrack. */
> +	uint16_t last_window;
> +	enum rte_flow_conntrack_tcp_last_index last_index;
> +	/**< The sequence of the last packet passed this conntrack. */
> +	uint32_t last_seq;
> +	/**< The acknowledgement of the last packet passed this conntrack. */
> +	uint32_t last_ack;
> +	/**< The total value ACK + payload length of the last packet passed
> +	 * this conntrack.
> +	 */
> +	uint32_t last_end;
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * Ports cannot support updating, and the only valid solution is to
> + * destroy the old context and create a new one instead.
> + */
> +struct rte_flow_modify_conntrack {
> +	/**< New connection tracking parameters to be updated. */
> +	struct rte_flow_action_conntrack new_ct;
> +	uint32_t direction:1; /**< The direction field will be updated. */
> +	/**< All the other fields except direction will be updated. */
> +	uint32_t state:1;
> +	uint32_t reserved:30; /**< Reserved bits for the future usage. */
> +};
  
Ori Kam April 16, 2021, 12:41 p.m. UTC | #2
Hi Bing,

One more thought, PSB

Best,
Ori
> -----Original Message-----
> From: Bing Zhao <bingz@nvidia.com>
> Sent: Thursday, April 15, 2021 7:41 PM
> To: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com;
> andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; ajit.khaparde@broadcom.com
> Subject: [PATCH v2 1/2] ethdev: introduce conntrack flow action and item
> 
> This commit introduced the conntrack action and item.
> 
> Usually the HW offloading is stateless. For some stateful offloading
> like a TCP connection, HW module will help provide the ability of a
> full offloading w/o SW participation after the connection was
> established.
> 
> The basic usage is that in the first flow the application should add
> the conntrack action and in the following flow(s) the application
> should use the conntrack item to match on the result.
> 
> A TCP connection has two directions traffic. To set a conntrack
> action context correctly, information from packets of both directions
> are required.
> 
> The conntrack action should be created on one port and supply the
> peer port as a parameter to the action. After context creating, it
> could only be used between the ports (dual-port mode) or a single
> port. The application should modify the action via the API
> "action_handle_update" only when before using it to create a flow
> with opposite direction. This will help the driver to recognize the
> direction of the flow to be created, especially in single port mode.
> The traffic from both directions will go through the same port if
> the application works as an "forwarding engine" but not a end point.
> There is no need to call the update interface if the subsequent flows
> have nothing to be changed.
> 
> Query will be supported via action_ctx_query interface, about the
> current packets information and connection status. Tha fields
> query capabilities depends on the HW.
> 
> For the packets received during the conntrack setup, it is suggested
> to re-inject the packets in order to take full advantage of the
> conntrack. Only the valid packets should pass the conntrack, packets
> with invalid TCP information, like out of window, or with invalid
> header, like malformed, should not pass.
> 
> Naming and definition:
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_co
> nntrack_tcp.h
> https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_
> tcp.c
> 
> Other reference:
> https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> ---
>  lib/librte_ethdev/rte_flow.c |   2 +
>  lib/librte_ethdev/rte_flow.h | 195 +++++++++++++++++++++++++++++++++++
>  2 files changed, 197 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 27a161559d..0af601d508 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -98,6 +98,7 @@ static const struct rte_flow_desc_data
> rte_flow_desc_item[] = {
>  	MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
>  	MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
>  	MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct
> rte_flow_item_geneve_opt)),
> +	MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
>  };
> 
>  /** Generate flow_action[] entry. */
> @@ -186,6 +187,7 @@ static const struct rte_flow_desc_data
> rte_flow_desc_action[] = {
>  	 * indirect action handle.
>  	 */
>  	MK_FLOW_ACTION(INDIRECT, 0),
> +	MK_FLOW_ACTION(CONNTRACK, sizeof(struct
> rte_flow_action_conntrack)),
>  };
> 
>  int
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 91ae25b1da..024d1a2026 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -551,6 +551,15 @@ enum rte_flow_item_type {
>  	 * See struct rte_flow_item_geneve_opt
>  	 */
>  	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches conntrack state.
> +	 *
> +	 * See struct rte_flow_item_conntrack.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
> 
>  /**
> @@ -1685,6 +1694,51 @@ rte_flow_item_geneve_opt_mask = {
>  };
>  #endif
> 
> +/**
> + * The packet is with valid state after conntrack checking.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)
> +/**
> + * The state of the connection was changed.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
> +/**
> + * Error is detected on this packet for this connection and
> + * an invalid state is set.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)
> +/**
> + * The packet contains some bad field(s) and cannot continue
> + * with the conntrack module checking.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bit mask of one RTE_FLOW_CONNTRACK_FLAG*
> + * or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +	uint32_t flags;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask =
> {
> +	.flags = 0xffffffff,
> +};
> +#endif
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2277,6 +2331,17 @@ enum rte_flow_action_type {
>  	 * same port or across different ports.
>  	 */
>  	RTE_FLOW_ACTION_TYPE_INDIRECT,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Enable tracking a TCP connection state.
> +	 *
> +	 * Send packet to HW connection tracking module for examination.
> +	 *
> +	 * See struct rte_flow_action_conntrack.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
> 
>  /**
> @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_action_handle;
> 
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +	/**< SYN-ACK packet was seen. */
> +	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +	/**< 3-way handshark was done. */
> +	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +	/**< First FIN packet was received to close the connection. */
> +	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +	/**< First FIN was ACKed. */
> +	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +	/**< Second FIN was received, waiting for the last ACK. */
> +	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +	/**< Second FIN was ACKed, connection was closed. */
> +	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +};
> +
> +/**
> + * The last passed TCP packet flags of a connection.
> + */
> +enum rte_flow_conntrack_tcp_last_index {
> +	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK
> flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +	uint32_t scale:4; /**< TCP window scaling factor, 0xF to disable. */
> +	uint32_t close_initiated:1; /**< The FIN was sent by this direction. */
> +	/**< An ACK packet has been received by this side. */
> +	uint32_t last_ack_seen:1;
> +	/**< If set, indicates that there is unacked data of the connection. */
> +	uint32_t data_unacked:1;
> +	/**< Maximal value of sequence + payload length over sent
> +	 * packets (next ACK from the opposite direction).
> +	 */
> +	uint32_t sent_end;
> +	/**< Maximal value of (ACK + window size) over received packet +
> length
> +	 * over sent packet (maximal sequence could be sent).
> +	 */
> +	uint32_t reply_end;

This comment is for all members that are part of the packet.
Do you think they should be in network order?
I can see the advantage both ways since I assume the app needs this data
in host byte-order, but since in most other cases we use network byte-order to
set values that come from the packet itself, maybe it is better to use network
byte-order (it will also save the conversion).

> +	/**< Maximal value of actual window size over sent packets. */
> +	uint32_t max_win;
> +	/**< Maximal value of ACK over sent packets. */
> +	uint32_t max_ack;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both setting and query.
> + */
> +struct rte_flow_action_conntrack {
> +	uint16_t peer_port; /**< The peer port number, can be the same port.
> */
> +	/**< Direction of this connection when creating a flow, the value only
> +	 * affects the subsequent flows creation.
> +	 */
> +	uint32_t is_original_dir:1;
> +	/**< Enable / disable the conntrack HW module. When disabled, the
> +	 * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> +	 * In this state the HW will act as passthrough.
> +	 * It only affects this conntrack object in the HW without any effect
> +	 * to the other objects.
> +	 */
> +	uint32_t enable:1;
> +	/**< At least one ack was seen, after the connection was established.
> */
> +	uint32_t live_connection:1;
> +	/**< Enable selective ACK on this connection. */
> +	uint32_t selective_ack:1;
> +	/**< A challenge ack has passed. */
> +	uint32_t challenge_ack_passed:1;
> +	/**< 1: The last packet is seen that comes from the original direction.
> +	 * 0: From the reply direction.
> +	 */
> +	uint32_t last_direction:1;
> +	/**< No TCP check will be done except the state change. */
> +	uint32_t liberal_mode:1;
> +	/**< The current state of the connection. */
> +	enum rte_flow_conntrack_state state;
> +	/**< Scaling factor for maximal allowed ACK window. */
> +	uint8_t max_ack_window;
> +	/**< Maximal allowed number of retransmission times. */
> +	uint8_t retransmission_limit;
> +	/**< TCP parameters of the original direction. */
> +	struct rte_flow_tcp_dir_param original_dir;
> +	/**< TCP parameters of the reply direction. */
> +	struct rte_flow_tcp_dir_param reply_dir;
> +	/**< The window value of the last packet passed this conntrack. */
> +	uint16_t last_window;
> +	enum rte_flow_conntrack_tcp_last_index last_index;
> +	/**< The sequence of the last packet passed this conntrack. */
> +	uint32_t last_seq;
> +	/**< The acknowledgement of the last packet passed this conntrack. */
> +	uint32_t last_ack;
> +	/**< The total value ACK + payload length of the last packet passed
> +	 * this conntrack.
> +	 */
> +	uint32_t last_end;
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * Ports cannot support updating, and the only valid solution is to
> + * destroy the old context and create a new one instead.
> + */
> +struct rte_flow_modify_conntrack {
> +	/**< New connection tracking parameters to be updated. */
> +	struct rte_flow_action_conntrack new_ct;
> +	uint32_t direction:1; /**< The direction field will be updated. */
> +	/**< All the other fields except direction will be updated. */
> +	uint32_t state:1;
> +	uint32_t reserved:30; /**< Reserved bits for the future usage. */
> +};
> +
>  /**
>   * Field IDs for MODIFY_FIELD action.
>   */
> --
> 2.19.0.windows.1
  
Bing Zhao April 16, 2021, 6:05 p.m. UTC | #3
Hi Ori,
My comments are inline, PSB.

> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Friday, April 16, 2021 8:42 PM
> To: Bing Zhao <bingz@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com;
> andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; ajit.khaparde@broadcom.com
> Subject: RE: [PATCH v2 1/2] ethdev: introduce conntrack flow action
> and item
> 
> Hi Bing,
> 
> One more thought, PSB
> 
> Best,
> Ori
> > -----Original Message-----
> > From: Bing Zhao <bingz@nvidia.com>
> > Sent: Thursday, April 15, 2021 7:41 PM
> > To: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>; ferruh.yigit@intel.com;
> > andrew.rybchenko@oktetlabs.ru
> > Cc: dev@dpdk.org; ajit.khaparde@broadcom.com
> > Subject: [PATCH v2 1/2] ethdev: introduce conntrack flow action
> and
> > item
> >
> > This commit introduced the conntrack action and item.
> >
> > Usually the HW offloading is stateless. For some stateful
> offloading
> > like a TCP connection, HW module will help provide the ability of
> a
> > full offloading w/o SW participation after the connection was
> > established.
> >
> > The basic usage is that in the first flow the application should
> add
> > the conntrack action and in the following flow(s) the application
> > should use the conntrack item to match on the result.
> >
> > A TCP connection has two directions traffic. To set a conntrack
> action
> > context correctly, information from packets of both directions are
> > required.
> >
> > The conntrack action should be created on one port and supply the
> peer
> > port as a parameter to the action. After context creating, it
> could
> > only be used between the ports (dual-port mode) or a single port.
> The
> > application should modify the action via the API
> > "action_handle_update" only when before using it to create a flow
> with
> > opposite direction. This will help the driver to recognize the
> > direction of the flow to be created, especially in single port
> mode.
> > The traffic from both directions will go through the same port if
> the
> > application works as an "forwarding engine" but not a end point.
> > There is no need to call the update interface if the subsequent
> flows
> > have nothing to be changed.
> >
> > Query will be supported via action_ctx_query interface, about the
> > current packets information and connection status. Tha fields
> query
> > capabilities depends on the HW.
> >
> > For the packets received during the conntrack setup, it is
> suggested
> > to re-inject the packets in order to take full advantage of the
> > conntrack. Only the valid packets should pass the conntrack,
> packets
> > with invalid TCP information, like out of window, or with invalid
> > header, like malformed, should not pass.
> >
> > Naming and definition:
> >
> > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
> >
> > https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c
> >
> > Other reference:
> >
> > https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
> >
> > Signed-off-by: Bing Zhao <bingz@nvidia.com>
> > ---
> >  lib/librte_ethdev/rte_flow.c |   2 +
> >  lib/librte_ethdev/rte_flow.h | 195
> > +++++++++++++++++++++++++++++++++++
> >  2 files changed, 197 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_flow.c
> > b/lib/librte_ethdev/rte_flow.c index 27a161559d..0af601d508 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -98,6 +98,7 @@ static const struct rte_flow_desc_data
> > rte_flow_desc_item[] = {
> >  	MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
> >  	MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
> >  	MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct
> rte_flow_item_geneve_opt)),
> > +	MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
> >  };
> >
> >  /** Generate flow_action[] entry. */
> > @@ -186,6 +187,7 @@ static const struct rte_flow_desc_data
> > rte_flow_desc_action[] = {
> >  	 * indirect action handle.
> >  	 */
> >  	MK_FLOW_ACTION(INDIRECT, 0),
> > +	MK_FLOW_ACTION(CONNTRACK, sizeof(struct
> > rte_flow_action_conntrack)),
> >  };
> >
> >  int
> > diff --git a/lib/librte_ethdev/rte_flow.h
> > b/lib/librte_ethdev/rte_flow.h index 91ae25b1da..024d1a2026 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -551,6 +551,15 @@ enum rte_flow_item_type {
> >  	 * See struct rte_flow_item_geneve_opt
> >  	 */
> >  	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> > +
> > +	/**
> > +	 * [META]
> > +	 *
> > +	 * Matches conntrack state.
> > +	 *
> > +	 * See struct rte_flow_item_conntrack.
> > +	 */
> > +	RTE_FLOW_ITEM_TYPE_CONNTRACK,
> >  };
> >
> >  /**
> > @@ -1685,6 +1694,51 @@ rte_flow_item_geneve_opt_mask = {  };
> #endif
> >
> > +/**
> > + * The packet is with valid state after conntrack checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)
> > +/**
> > + * The state of the connection was changed.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
> > +/**
> > + * Error is detected on this packet for this connection and
> > + * an invalid state is set.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)
> > +/**
> > + * The HW connection tracking module is disabled.
> > + * It can be due to application command or an invalid state.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)
> > +/**
> > + * The packet contains some bad field(s) and cannot continue
> > + * with the conntrack module checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> > + *
> > + * Matches the state of a packet after it passed the connection
> > +tracking
> > + * examination. The state is a bit mask of one
> > +RTE_FLOW_CONNTRACK_FLAG*
> > + * or a reasonable combination of these bits.
> > + */
> > +struct rte_flow_item_conntrack {
> > +	uint32_t flags;
> > +};
> > +
> > +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */ #ifndef
> > +__cplusplus static const struct rte_flow_item_conntrack
> > +rte_flow_item_conntrack_mask =
> > {
> > +	.flags = 0xffffffff,
> > +};
> > +#endif
> > +
> >  /**
> >   * Matching pattern item definition.
> >   *
> > @@ -2277,6 +2331,17 @@ enum rte_flow_action_type {
> >  	 * same port or across different ports.
> >  	 */
> >  	RTE_FLOW_ACTION_TYPE_INDIRECT,
> > +
> > +	/**
> > +	 * [META]
> > +	 *
> > +	 * Enable tracking a TCP connection state.
> > +	 *
> > +	 * Send packet to HW connection tracking module for
> examination.
> > +	 *
> > +	 * See struct rte_flow_action_conntrack.
> > +	 */
> > +	RTE_FLOW_ACTION_TYPE_CONNTRACK,
> >  };
> >
> >  /**
> > @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp {
> >   */
> >  struct rte_flow_action_handle;
> >
> > +/**
> > + * The state of a TCP connection.
> > + */
> > +enum rte_flow_conntrack_state {
> > +	/**< SYN-ACK packet was seen. */
> > +	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> > +	/**< 3-way handshark was done. */
> > +	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> > +	/**< First FIN packet was received to close the connection. */
> > +	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> > +	/**< First FIN was ACKed. */
> > +	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> > +	/**< Second FIN was received, waiting for the last ACK. */
> > +	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> > +	/**< Second FIN was ACKed, connection was closed. */
> > +	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> > +};
> > +
> > +/**
> > + * The last passed TCP packet flags of a connection.
> > + */
> > +enum rte_flow_conntrack_tcp_last_index {
> > +	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> > +	RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. */
> > +	RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK
> > flag. */
> > +	RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. */
> > +	RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. */
> > +	RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag.
> */ };
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * Configuration parameters for each direction of a TCP
> connection.
> > + */
> > +struct rte_flow_tcp_dir_param {
> > +	uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> disable. */
> > +	uint32_t close_initiated:1; /**< The FIN was sent by this
> direction. */
> > +	/**< An ACK packet has been received by this side. */
> > +	uint32_t last_ack_seen:1;
> > +	/**< If set, indicates that there is unacked data of the
> connection. */
> > +	uint32_t data_unacked:1;
> > +	/**< Maximal value of sequence + payload length over sent
> > +	 * packets (next ACK from the opposite direction).
> > +	 */
> > +	uint32_t sent_end;
> > +	/**< Maximal value of (ACK + window size) over received packet
> +
> > length
> > +	 * over sent packet (maximal sequence could be sent).
> > +	 */
> > +	uint32_t reply_end;
> 
> This comment is for all members that are part of the packet, Do you
> think it should be in network order?

Almost none of the fields are part of the packet. Indeed, most of them are calculated from the packet information. So I prefer to keep the host order for ease of use and
keep all the fields of the whole structure in the same endianness format.
What do you think?

> I can see the advantage in both ways nice I assume the app needs
> this data in host byte-order  but since in most other cases we use
> network byte-order to set values that are coming from the packet
> itself maybe it is better to use network byte-order (will also save
> the conversion)

Only the seq/ack/window in the common part are part of the packets, others are not.
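
As an illustration only (assuming "tcp" points to a parsed struct
rte_tcp_hdr, "dir" to the struct rte_flow_tcp_dir_param being filled,
"payload_len" is computed by the application, and <rte_byteorder.h>
is included), keeping the fields in host byte order would look
roughly like:

    uint32_t seq = rte_be_to_cpu_32(tcp->sent_seq);
    uint32_t ack = rte_be_to_cpu_32(tcp->recv_ack);
    uint16_t win = rte_be_to_cpu_16(tcp->rx_win);

    /* Running maxima over sent packets, all in host byte order. */
    dir->sent_end = RTE_MAX(dir->sent_end, seq + payload_len);
    dir->max_ack  = RTE_MAX(dir->max_ack, ack);
    dir->max_win  = RTE_MAX(dir->max_win, (uint32_t)win);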

BTW, should we support liberal mode separately for each direction, as a kind of "half-duplex"? One direction could work normally while the opposite direction works in liberal mode.

> 
> > +	/**< Maximal value of actual window size over sent packets. */
> > +	uint32_t max_win;
> > +	/**< Maximal value of ACK over sent packets. */
> > +	uint32_t max_ack;
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Configuration and initial state for the connection tracking
> module.
> > + * This structure could be used for both setting and query.
> > + */
> > +struct rte_flow_action_conntrack {
> > +	uint16_t peer_port; /**< The peer port number, can be the same
> port.
> > */
> > +	/**< Direction of this connection when creating a flow, the
> value only
> > +	 * affects the subsequent flows creation.
> > +	 */
> > +	uint32_t is_original_dir:1;
> > +	/**< Enable / disable the conntrack HW module. When disabled,
> the
> > +	 * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> > +	 * In this state the HW will act as passthrough.
> > +	 * It only affects this conntrack object in the HW without any
> effect
> > +	 * to the other objects.
> > +	 */
> > +	uint32_t enable:1;
> > +	/**< At least one ack was seen, after the connection was
> established.
> > */
> > +	uint32_t live_connection:1;
> > +	/**< Enable selective ACK on this connection. */
> > +	uint32_t selective_ack:1;
> > +	/**< A challenge ack has passed. */
> > +	uint32_t challenge_ack_passed:1;
> > +	/**< 1: The last packet is seen that comes from the original
> direction.
> > +	 * 0: From the reply direction.
> > +	 */
> > +	uint32_t last_direction:1;
> > +	/**< No TCP check will be done except the state change. */
> > +	uint32_t liberal_mode:1;
> > +	/**< The current state of the connection. */
> > +	enum rte_flow_conntrack_state state;
> > +	/**< Scaling factor for maximal allowed ACK window. */
> > +	uint8_t max_ack_window;
> > +	/**< Maximal allowed number of retransmission times. */
> > +	uint8_t retransmission_limit;
> > +	/**< TCP parameters of the original direction. */
> > +	struct rte_flow_tcp_dir_param original_dir;
> > +	/**< TCP parameters of the reply direction. */
> > +	struct rte_flow_tcp_dir_param reply_dir;
> > +	/**< The window value of the last packet passed this conntrack.
> */
> > +	uint16_t last_window;
> > +	enum rte_flow_conntrack_tcp_last_index last_index;
> > +	/**< The sequence of the last packet passed this conntrack. */
> > +	uint32_t last_seq;
> > +	/**< The acknowledgement of the last packet passed this
> conntrack. */
> > +	uint32_t last_ack;
> > +	/**< The total value ACK + payload length of the last packet
> passed
> > +	 * this conntrack.
> > +	 */
> > +	uint32_t last_end;
> > +};
> > +
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Wrapper structure for the context update interface.
> > + * Ports cannot support updating, and the only valid solution is
> to
> > + * destroy the old context and create a new one instead.
> > + */
> > +struct rte_flow_modify_conntrack {
> > +	/**< New connection tracking parameters to be updated. */
> > +	struct rte_flow_action_conntrack new_ct;
> > +	uint32_t direction:1; /**< The direction field will be updated.
> */
> > +	/**< All the other fields except direction will be updated. */
> > +	uint32_t state:1;
> > +	uint32_t reserved:30; /**< Reserved bits for the future usage.
> */ };
> > +
> >  /**
> >   * Field IDs for MODIFY_FIELD action.
> >   */
> > --
> > 2.19.0.windows.1

BR. Bing
  
Bing Zhao April 16, 2021, 6:18 p.m. UTC | #4
Hi Thomas,
Thanks for your comments. Almost all the comments are addressed.
PSB.

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, April 16, 2021 6:50 PM
> To: Bing Zhao <bingz@nvidia.com>
> Cc: Ori Kam <orika@nvidia.com>; ferruh.yigit@intel.com;
> andrew.rybchenko@oktetlabs.ru; dev@dpdk.org;
> ajit.khaparde@broadcom.com; jerinj@marvell.com; humin29@huawei.com;
> rosen.xu@intel.com; hemant.agrawal@nxp.com
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: introduce conntrack
> flow action and item
> 
> External email: Use caution opening links or attachments
> 
> 
> 15/04/2021 18:41, Bing Zhao:
> > This commit introduced the conntrack action and item.
> >
> > Usually the HW offloading is stateless. For some stateful
> offloading
> > like a TCP connection, HW module will help provide the ability of
> a
> > full offloading w/o SW participation after the connection was
> > established.
> >
> > The basic usage is that in the first flow the application should
> add
> > the conntrack action and in the following flow(s) the application
> > should use the conntrack item to match on the result.
> 
> You probably mean "flow rule", not "traffic flow".
> Please make it clear to avoid confusion.

Done

> 
> > A TCP connection has two directions traffic. To set a conntrack
> action
> > context correctly, information from packets of both directions are
> > required.
> >
> > The conntrack action should be created on one port and supply the
> peer
> > port as a parameter to the action. After context creating, it
> could
> > only be used between the ports (dual-port mode) or a single port.
> The
> > application should modify the action via the API
> > "action_handle_update" only when before using it to create a flow
> with
> > opposite direction. This will help the driver to recognize the
> > direction of the flow to be created, especially in single port
> mode.
> > The traffic from both directions will go through the same port if
> the
> > application works as an "forwarding engine" but not a end point.
> > There is no need to call the update interface if the subsequent
> flows
> > have nothing to be changed.
> 
> I am not sure this is a feature description for the commit log or an
> usage explanation for the doc.
> In any case, please distinguish "ethdev port" and "TCP port"
> to avoid confusion.

Changed, thanks.

> 
> > Query will be supported via action_ctx_query interface, about the
> > current packets information and connection status. Tha fields
> query
> > capabilities depends on the HW.
> >
> > For the packets received during the conntrack setup, it is
> suggested
> > to re-inject the packets in order to take full advantage of the
> 
> What do you mean by "full advantage"?
> It is counter-intuitive to re-inject for offloading.
> Does it improve the performance?

No, it is not for performance but for functional correctness. Before the CT is established, some data+ACK packets may already have been received by the SW, and the application will use the initial information to set up a conntrack. That may result in erroneous checking of the following packets. By re-injecting the packets already received by the SW before the CT was established, the HW will have all the packet information and can check the following packets correctly.

> 
> > conntrack. Only the valid packets should pass the conntrack,
> packets
> > with invalid TCP information, like out of window, or with invalid
> > header, like malformed, should not pass.
> >
> > Naming and definition:
> 
> You mean naming is inspired from Linux?

The naming and the critical field definitions. The original idea is from the paper listed below (correct me if I am wrong), and there are some well-known definitions in this area; to my understanding, it would be better to follow them.

> 
> >
> > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
> >
> > https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c
> >
> > Other reference:
> >
> > https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
> >
> > Signed-off-by: Bing Zhao <bingz@nvidia.com>
> [...]
> > +     /**
> > +      * [META]
> > +      *
> > +      * Matches conntrack state.
> > +      *
> > +      * See struct rte_flow_item_conntrack.
> 
> Please use @see for hyperlink in doxygen.
> 
> > +      */
> > +     RTE_FLOW_ITEM_TYPE_CONNTRACK,
> >  };
> [...]
> > +/**
> > + * The packet is with valid state after conntrack checking.
> 
> "is with valid state" looks strange.
> I propose "The packet is valid after conntrack checking."

Done

> 
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)
> 
> Please use RTE_BIT32().

Done

> 
> > +/**
> > + * The state of the connection was changed.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
> > +/**
> > + * Error is detected on this packet for this connection and
> > + * an invalid state is set.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)
> 
> "INVAL" is strange. Can we add the missing 2 characters?
> RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVALID
> 
> On a related note, do we really need the word FLAG?
> And it is conflicting with the prefix in enum
> rte_flow_conntrack_tcp_last_index I think
> RTE_FLOW_CONNTRACK_PKT_STATE_ is a good prefix, long enough.
> 

Done

> > +/**
> > + * The HW connection tracking module is disabled.
> > + * It can be due to application command or an invalid state.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)
> 
> This one does not have PKT in its name.
> And it is limiting to HW, while the driver could implement conntrack
> in SW.
> I propose RTE_FLOW_CONNTRACK_PKT_DISABLED
> 

Done

> > +/**
> > + * The packet contains some bad field(s) and cannot continue
> > + * with the conntrack module checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> > + *
> > + * Matches the state of a packet after it passed the connection
> > +tracking
> > + * examination. The state is a bit mask of one
> > +RTE_FLOW_CONNTRACK_FLAG*
> 
> s/bit mask/bitmap/ ?

Done

> 
> RTE_FLOW_CONNTRACK_PKT_STATE_*
> otherwise it is messed with rte_flow_conntrack_tcp_last_index
> 
> > + * or a reasonable combination of these bits.
> > + */
> > +struct rte_flow_item_conntrack {
> > +     uint32_t flags;
> > +};
> [...]
> > +
> > +     /**
> > +      * [META]
> > +      *
> > +      * Enable tracking a TCP connection state.
> > +      *
> > +      * Send packet to HW connection tracking module for
> examination.
> 
> Not necessarily HW.
> No packet is sent.
> I think you can remove this sentence completely.
> 

Done

> > +      *
> > +      * See struct rte_flow_action_conntrack.
> 
> @see
> 
> > +      */
> > +     RTE_FLOW_ACTION_TYPE_CONNTRACK,
> >  };
> >
> >  /**
> > @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp {
> >   */
> >  struct rte_flow_action_handle;
> >
> > +/**
> > + * The state of a TCP connection.
> > + */
> > +enum rte_flow_conntrack_state {
> > +     /**< SYN-ACK packet was seen. */
> > +     RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> > +     /**< 3-way handshark was done. */
> 
> s/handshark/handshake/
> 

Done

> > +     RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> > +     /**< First FIN packet was received to close the connection.
> */
> > +     RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> > +     /**< First FIN was ACKed. */
> > +     RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> > +     /**< Second FIN was received, waiting for the last ACK. */
> > +     RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> > +     /**< Second FIN was ACKed, connection was closed. */
> > +     RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> > +};
> > +
> > +/**
> > + * The last passed TCP packet flags of a connection.
> > + */
> > +enum rte_flow_conntrack_tcp_last_index {
> > +     RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> > +     RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag.
> */
> > +     RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK
> flag. */
> > +     RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag.
> */
> > +     RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag.
> */
> > +     RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag.
> */
> > +};
> 
> Please use RTE_BIT32().
> 

Done

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * Configuration parameters for each direction of a TCP
> connection.
> > + */
> > +struct rte_flow_tcp_dir_param {
> > +     uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> disable. */
> > +     uint32_t close_initiated:1; /**< The FIN was sent by this
> direction. */
> > +     /**< An ACK packet has been received by this side. */
> 
> Move all comments on their own line before the struct member.
> Comment should then start with /**
> 

All done. BTW, I see that in the current code the format "/**<" is used in a lot of places.

> > +     uint32_t last_ack_seen:1;
> > +     /**< If set, indicates that there is unacked data of the
> > + connection. */
> 
> not sure what means "unacked data of the connection"

Updated the description; it means some packets were sent but not all of them have been ACKed.

> 
> > +     uint32_t data_unacked:1;
> > +     /**< Maximal value of sequence + payload length over sent
> > +      * packets (next ACK from the opposite direction).
> > +      */
> > +     uint32_t sent_end;
> > +     /**< Maximal value of (ACK + window size) over received
> packet + length
> > +      * over sent packet (maximal sequence could be sent).
> > +      */
> > +     uint32_t reply_end;
> > +     /**< Maximal value of actual window size over sent packets.
> */
> > +     uint32_t max_win;
> > +     /**< Maximal value of ACK over sent packets. */
> > +     uint32_t max_ack;
> 
> Not sure about the word "over" in above definitions.

Changed to "in"

> 
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Configuration and initial state for the connection tracking
> module.
> > + * This structure could be used for both setting and query.
> > + */
> > +struct rte_flow_action_conntrack {
> > +     uint16_t peer_port; /**< The peer port number, can be the
> same port. */
> > +     /**< Direction of this connection when creating a flow, the
> value only
> > +      * affects the subsequent flows creation.
> > +      */
> 
> As for rte_flow_tcp_dir_param, better to move comments before, on
> their own line.
> 
> > +     uint32_t is_original_dir:1;
> > +     /**< Enable / disable the conntrack HW module. When disabled,
> the
> > +      * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> > +      * In this state the HW will act as passthrough.
> > +      * It only affects this conntrack object in the HW without
> any effect
> > +      * to the other objects.
> > +      */
> > +     uint32_t enable:1;
> > +     /**< At least one ack was seen, after the connection was
> established. */
> > +     uint32_t live_connection:1;
> > +     /**< Enable selective ACK on this connection. */
> > +     uint32_t selective_ack:1;
> > +     /**< A challenge ack has passed. */
> > +     uint32_t challenge_ack_passed:1;
> > +     /**< 1: The last packet is seen that comes from the original
> direction.
> > +      * 0: From the reply direction.
> > +      */
> > +     uint32_t last_direction:1;
> > +     /**< No TCP check will be done except the state change. */
> > +     uint32_t liberal_mode:1;
> > +     /**< The current state of the connection. */
> > +     enum rte_flow_conntrack_state state;
> > +     /**< Scaling factor for maximal allowed ACK window. */
> > +     uint8_t max_ack_window;
> > +     /**< Maximal allowed number of retransmission times. */
> > +     uint8_t retransmission_limit;
> > +     /**< TCP parameters of the original direction. */
> > +     struct rte_flow_tcp_dir_param original_dir;
> > +     /**< TCP parameters of the reply direction. */
> > +     struct rte_flow_tcp_dir_param reply_dir;
> > +     /**< The window value of the last packet passed this
> conntrack. */
> > +     uint16_t last_window;
> > +     enum rte_flow_conntrack_tcp_last_index last_index;
> > +     /**< The sequence of the last packet passed this conntrack.
> */
> > +     uint32_t last_seq;
> > +     /**< The acknowledgement of the last packet passed this
> conntrack. */
> > +     uint32_t last_ack;
> > +     /**< The total value ACK + payload length of the last packet
> passed
> > +      * this conntrack.
> > +      */
> > +     uint32_t last_end;
> > +};
> > +
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Wrapper structure for the context update interface.
> > + * Ports cannot support updating, and the only valid solution is
> to
> > + * destroy the old context and create a new one instead.
> > + */
> > +struct rte_flow_modify_conntrack {
> > +     /**< New connection tracking parameters to be updated. */
> > +     struct rte_flow_action_conntrack new_ct;
> > +     uint32_t direction:1; /**< The direction field will be
> updated. */
> > +     /**< All the other fields except direction will be updated.
> */
> > +     uint32_t state:1;
> > +     uint32_t reserved:30; /**< Reserved bits for the future
> usage.
> > +*/ };
> 
> 

Thanks
  
Ajit Khaparde April 16, 2021, 9:47 p.m. UTC | #5
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this structure may change without prior
> > notice
> > > + *
> > > + * Configuration parameters for each direction of a TCP
> > connection.
> > > + */
> > > +struct rte_flow_tcp_dir_param {
> > > +   uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> > disable. */
> > > +   uint32_t close_initiated:1; /**< The FIN was sent by this
> > direction. */
> > > +   /**< An ACK packet has been received by this side. */
> > > +   uint32_t last_ack_seen:1;
> > > +   /**< If set, indicates that there is unacked data of the
> > connection. */
> > > +   uint32_t data_unacked:1;
> > > +   /**< Maximal value of sequence + payload length over sent
> > > +    * packets (next ACK from the opposite direction).
> > > +    */
> > > +   uint32_t sent_end;
> > > +   /**< Maximal value of (ACK + window size) over received packet
> > +
> > > length
> > > +    * over sent packet (maximal sequence could be sent).
> > > +    */
> > > +   uint32_t reply_end;
> >
> > This comment is for all members that are part of the packet, Do you
> > think it should be in network order?
>
> Almost none of the fields are part of the packet. Indeed, most of them are calculated from the packets information. So I prefer to keep the host order easy for using and
> keep all the fields of the whole structure the same endianness format.
> What do you think?

Can you mention in the documentation and comments that all the
values are in host byte order and need to be converted to network
byte order if the HW needs it that way?
  
Bing Zhao April 17, 2021, 6:10 a.m. UTC | #6
Hi Ajit,

> -----Original Message-----
> From: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Sent: Saturday, April 17, 2021 5:47 AM
> To: Bing Zhao <bingz@nvidia.com>
> Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com;
> andrew.rybchenko@oktetlabs.ru; dev@dpdk.org
> Subject: Re: [PATCH v2 1/2] ethdev: introduce conntrack flow action
> and item
> 
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this structure may change without prior
> > > notice
> > > > + *
> > > > + * Configuration parameters for each direction of a TCP
> > > connection.
> > > > + */
> > > > +struct rte_flow_tcp_dir_param {
> > > > +   uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> > > disable. */
> > > > +   uint32_t close_initiated:1; /**< The FIN was sent by this
> > > direction. */
> > > > +   /**< An ACK packet has been received by this side. */
> > > > +   uint32_t last_ack_seen:1;
> > > > +   /**< If set, indicates that there is unacked data of the
> > > connection. */
> > > > +   uint32_t data_unacked:1;
> > > > +   /**< Maximal value of sequence + payload length over sent
> > > > +    * packets (next ACK from the opposite direction).
> > > > +    */
> > > > +   uint32_t sent_end;
> > > > +   /**< Maximal value of (ACK + window size) over received
> packet
> > > +
> > > > length
> > > > +    * over sent packet (maximal sequence could be sent).
> > > > +    */
> > > > +   uint32_t reply_end;
> > >
> > > This comment is for all members that are part of the packet, Do
> you
> > > think it should be in network order?
> >
> > Almost none of the fields are part of the packet. Indeed, most of
> them are calculated from the packets information. So I prefer to
> keep the host order easy for using and
> > keep all the fields of the whole structure the same endianness
> format.
> > What do you think?
> 
> Can you mention it in the documentation and comments?
> That all the values are in host byte order and need to be converted
> to
> network byte order if the HW needs it that way

Sure, I think it would be better to add it in the documentation.
What do you think?

BR. Bing
  
Ajit Khaparde April 17, 2021, 2:54 p.m. UTC | #7
On Fri, Apr 16, 2021 at 11:10 PM Bing Zhao <bingz@nvidia.com> wrote:
>
> Hi Ajit,
>
> > -----Original Message-----
> > From: Ajit Khaparde <ajit.khaparde@broadcom.com>
> > Sent: Saturday, April 17, 2021 5:47 AM
> > To: Bing Zhao <bingz@nvidia.com>
> > Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>; ferruh.yigit@intel.com;
> > andrew.rybchenko@oktetlabs.ru; dev@dpdk.org
> > Subject: Re: [PATCH v2 1/2] ethdev: introduce conntrack flow action
> > and item
> >
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this structure may change without prior
> > > > notice
> > > > > + *
> > > > > + * Configuration parameters for each direction of a TCP
> > > > connection.
> > > > > + */
> > > > > +struct rte_flow_tcp_dir_param {
> > > > > +   uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> > > > disable. */
> > > > > +   uint32_t close_initiated:1; /**< The FIN was sent by this
> > > > direction. */
> > > > > +   /**< An ACK packet has been received by this side. */
> > > > > +   uint32_t last_ack_seen:1;
> > > > > +   /**< If set, indicates that there is unacked data of the
> > > > connection. */
> > > > > +   uint32_t data_unacked:1;
> > > > > +   /**< Maximal value of sequence + payload length over sent
> > > > > +    * packets (next ACK from the opposite direction).
> > > > > +    */
> > > > > +   uint32_t sent_end;
> > > > > +   /**< Maximal value of (ACK + window size) over received
> > packet
> > > > +
> > > > > length
> > > > > +    * over sent packet (maximal sequence could be sent).
> > > > > +    */
> > > > > +   uint32_t reply_end;
> > > >
> > > > This comment is for all members that are part of the packet, Do
> > you
> > > > think it should be in network order?
> > >
> > > Almost none of the fields are part of the packet. Indeed, most of
> > them are calculated from the packets information. So I prefer to
> > keep the host order easy for using and
> > > keep all the fields of the whole structure the same endianness
> > format.
> > > What do you think?
> >
> > Can you mention it in the documentation and comments?
> > That all the values are in host byte order and need to be converted
> > to
> > network byte order if the HW needs it that way
>
> Sure, I think it would be better to add it in the documentation.
> What do you think?
Documentation - yes.
In the comments of the structure in the header file - if possible.

>
> BR. Bing
  

Patch

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 27a161559d..0af601d508 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -98,6 +98,7 @@  static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
 	MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
 	MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
+	MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
 };
 
 /** Generate flow_action[] entry. */
@@ -186,6 +187,7 @@  static const struct rte_flow_desc_data rte_flow_desc_action[] = {
 	 * indirect action handle.
 	 */
 	MK_FLOW_ACTION(INDIRECT, 0),
+	MK_FLOW_ACTION(CONNTRACK, sizeof(struct rte_flow_action_conntrack)),
 };
 
 int
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 91ae25b1da..024d1a2026 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -551,6 +551,15 @@  enum rte_flow_item_type {
 	 * See struct rte_flow_item_geneve_opt
 	 */
 	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
+
+	/**
+	 * [META]
+	 *
+	 * Matches conntrack state.
+	 *
+	 * See struct rte_flow_item_conntrack.
+	 */
+	RTE_FLOW_ITEM_TYPE_CONNTRACK,
 };
 
 /**
@@ -1685,6 +1694,51 @@  rte_flow_item_geneve_opt_mask = {
 };
 #endif
 
+/**
+ * The packet has a valid state after conntrack checking.
+ */
+#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)
+/**
+ * The state of the connection was changed.
+ */
+#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
+/**
+ * An error is detected on this packet for this connection and
+ * an invalid state is set.
+ */
+#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)
+/**
+ * The HW connection tracking module is disabled.
+ * It can be due to application command or an invalid state.
+ */
+#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)
+/**
+ * The packet contains some bad field(s) and cannot continue
+ * with the conntrack module checking.
+ */
+#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_CONNTRACK
+ *
+ * Matches the state of a packet after it passed the connection tracking
+ * examination. The state is a bitmask of one or more of the
+ * RTE_FLOW_CONNTRACK_FLAG_* bits defined above.
+ */
+struct rte_flow_item_conntrack {
+	uint32_t flags;
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
+#ifndef __cplusplus
+static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
+	.flags = 0xffffffff,
+};
+#endif
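
For illustration only (not part of this patch), a rough sketch of a flow rule that uses this item: packets that passed the conntrack examination with a valid state are steered to an Rx queue. The function name, port id and queue index are assumptions.

#include <rte_flow.h>

/* Sketch: steer packets with a valid conntrack state to Rx queue 1. */
static struct rte_flow *
ct_match_valid(uint16_t port_id)
{
	static const struct rte_flow_item_conntrack ct_spec = {
		.flags = RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID,
	};
	static const struct rte_flow_item_conntrack ct_mask = {
		.flags = RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID,
	};
	const struct rte_flow_item pattern[] = {
		{
			.type = RTE_FLOW_ITEM_TYPE_CONNTRACK,
			.spec = &ct_spec,
			.mask = &ct_mask,
		},
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	const struct rte_flow_action_queue queue = { .index = 1 };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_error err;

	return rte_flow_create(port_id, &attr, pattern, actions, &err);
}
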
+
 /**
  * Matching pattern item definition.
  *
@@ -2277,6 +2331,17 @@  enum rte_flow_action_type {
 	 * same port or across different ports.
 	 */
 	RTE_FLOW_ACTION_TYPE_INDIRECT,
+
+	/**
+	 * [META]
+	 *
+	 * Enable tracking the state of a TCP connection.
+	 *
+	 * Send the packet to the HW connection tracking module for examination.
+	 *
+	 * See struct rte_flow_action_conntrack.
+	 */
+	RTE_FLOW_ACTION_TYPE_CONNTRACK,
 };
 
 /**
@@ -2875,6 +2940,136 @@  struct rte_flow_action_set_dscp {
  */
 struct rte_flow_action_handle;
 
+/**
+ * The state of a TCP connection.
+ */
+enum rte_flow_conntrack_state {
+	/**< SYN-ACK packet was seen. */
+	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
+	/**< 3-way handshake was done. */
+	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
+	/**< First FIN packet was received to close the connection. */
+	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
+	/**< First FIN was ACKed. */
+	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
+	/**< Second FIN was received, waiting for the last ACK. */
+	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
+	/**< Second FIN was ACKed, connection was closed. */
+	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
+};
+
+/**
+ * The last passed TCP packet flags of a connection.
+ */
+enum rte_flow_conntrack_tcp_last_index {
+	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
+	RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. */
+	RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK flag. */
+	RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. */
+	RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. */
+	RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag. */
+};
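
The state and last-packet fields above are what an application would typically look at when reading a context back. A hedged sketch follows, assuming the indirect action handle API is the query entry point; the exact query path and which fields a PMD fills depend on the HW.

#include <stdio.h>
#include <string.h>
#include <rte_flow.h>

/* Sketch: query a conntrack context through its action handle and check
 * whether the connection is established. "handle" is assumed to come from
 * a previous rte_flow_action_handle_create() with a CONNTRACK action.
 */
static void
ct_dump_state(uint16_t port_id, struct rte_flow_action_handle *handle)
{
	struct rte_flow_action_conntrack ct_info;
	struct rte_flow_error err;

	memset(&ct_info, 0, sizeof(ct_info));
	if (rte_flow_action_handle_query(port_id, handle, &ct_info, &err) != 0)
		return;
	if (ct_info.state == RTE_FLOW_CONNTRACK_STATE_ESTABLISHED &&
	    ct_info.last_index == RTE_FLOW_CONNTRACK_FLAG_ACK)
		printf("connection established, last packet was a plain ACK\n");
}
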
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * Configuration parameters for each direction of a TCP connection.
+ * All fields are in host byte order.
+ */
+struct rte_flow_tcp_dir_param {
+	uint32_t scale:4; /**< TCP window scaling factor, 0xF to disable. */
+	uint32_t close_initiated:1; /**< The FIN was sent by this direction. */
+	/**< An ACK packet has been received by this side. */
+	uint32_t last_ack_seen:1;
+	/**< If set, indicates that there is unacked data of the connection. */
+	uint32_t data_unacked:1;
+	/**< Maximal value of sequence + payload length over sent
+	 * packets (the next expected ACK from the opposite direction).
+	 */
+	uint32_t sent_end;
+	/**< Maximal value of (ACK + window size) over received packets + length
+	 * over sent packets (the maximal sequence number that could be sent).
+	 */
+	uint32_t reply_end;
+	/**< Maximal value of actual window size over sent packets. */
+	uint32_t max_win;
+	/**< Maximal value of ACK over sent packets. */
+	uint32_t max_ack;
+};
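
The values above are computed by software before being handed to the HW and, as discussed earlier in this thread, are kept in host byte order. A minimal sketch of seeding the original-direction parameters from a SYN segment that was already parsed; the helper name and the way the handshake is observed are assumptions.

#include <string.h>
#include <rte_flow.h>

/* Sketch: seed the original-direction parameters from an observed SYN.
 * syn_seq and syn_win are already converted to host byte order by the caller.
 */
static void
ct_fill_original_dir(struct rte_flow_tcp_dir_param *dir,
		     uint32_t syn_seq, uint16_t syn_win, uint8_t wscale)
{
	memset(dir, 0, sizeof(*dir));
	dir->scale = wscale;       /* 0xF when window scaling is not used */
	/* The SYN occupies one sequence number and carries no payload. */
	dir->sent_end = syn_seq + 1;
	dir->max_win = syn_win;
	/* reply_end and max_ack are learned from the peer's SYN-ACK. */
}
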
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_CONNTRACK
+ *
+ * Configuration and initial state for the connection tracking module.
+ * This structure could be used for both setting and query.
+ */
+struct rte_flow_action_conntrack {
+	uint16_t peer_port; /**< The peer port number, can be the same port. */
+	/**< Direction of this connection when creating a flow rule; the value
+	 * only affects the creation of subsequent flow rules.
+	 */
+	uint32_t is_original_dir:1;
+	/**< Enable / disable the conntrack HW module. When disabled, the
+	 * result will always be RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED.
+	 * In this state the HW will act as passthrough.
+	 * It only affects this conntrack object in the HW without any effect
+	 * on the other objects.
+	 */
+	uint32_t enable:1;
+	/**< At least one ACK was seen after the connection was established. */
+	uint32_t live_connection:1;
+	/**< Enable selective ACK on this connection. */
+	uint32_t selective_ack:1;
+	/**< A challenge ACK has passed. */
+	uint32_t challenge_ack_passed:1;
+	/**< 1: The last packet seen on this connection came from the original
+	 * direction. 0: it came from the reply direction.
+	 */
+	uint32_t last_direction:1;
+	/**< No TCP checks will be done except the state change. */
+	uint32_t liberal_mode:1;
+	/**< The current state of the connection. */
+	enum rte_flow_conntrack_state state;
+	/**< Scaling factor for maximal allowed ACK window. */
+	uint8_t max_ack_window;
+	/**< Maximal allowed number of retransmission times. */
+	uint8_t retransmission_limit;
+	/**< TCP parameters of the original direction. */
+	struct rte_flow_tcp_dir_param original_dir;
+	/**< TCP parameters of the reply direction. */
+	struct rte_flow_tcp_dir_param reply_dir;
+	/**< The window value of the last packet that passed this conntrack. */
+	uint16_t last_window;
+	enum rte_flow_conntrack_tcp_last_index last_index;
+	/**< The sequence number of the last packet that passed this conntrack. */
+	uint32_t last_seq;
+	/**< The ACK number of the last packet that passed this conntrack. */
+	uint32_t last_ack;
+	/**< The total value of ACK + payload length of the last packet that
+	 * passed this conntrack.
+	 */
+	uint32_t last_end;
+};
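
A possible way to instantiate such a context once the handshake has been observed, wrapping it in an indirect action handle so it can be shared between flow rules and updated later. The rte_flow_action_handle_create() call and struct rte_flow_indir_action_conf belong to the generic indirect action API, not to this patch, and all field values are illustrative.

#include <rte_flow.h>

/* Sketch: create a conntrack context for the original direction and wrap
 * it in an indirect action handle. Error handling is omitted for brevity.
 */
static struct rte_flow_action_handle *
ct_create_context(uint16_t port_id, uint16_t peer_port_id)
{
	const struct rte_flow_action_conntrack ct = {
		.peer_port = peer_port_id,
		.is_original_dir = 1,
		.enable = 1,
		.state = RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
		.max_ack_window = 7,
		.retransmission_limit = 3,
		/* original_dir / reply_dir should be filled from the
		 * handshake packets seen in software.
		 */
	};
	const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_CONNTRACK,
		.conf = &ct,
	};
	struct rte_flow_error err;

	return rte_flow_action_handle_create(port_id, &conf, &action, &err);
}

The returned handle can then be referenced from the flow rules of both directions instead of repeating the full action configuration.
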
+
+/**
+ * RTE_FLOW_ACTION_TYPE_CONNTRACK
+ *
+ * Wrapper structure for the context update interface.
+ * If a port cannot support updating an existing context, the only valid
+ * alternative is to destroy the old context and create a new one instead.
+ */
+struct rte_flow_modify_conntrack {
+	/**< New connection tracking parameters to be updated. */
+	struct rte_flow_action_conntrack new_ct;
+	uint32_t direction:1; /**< The direction field will be updated. */
+	/**< All the other fields except direction will be updated. */
+	uint32_t state:1;
+	uint32_t reserved:30; /**< Reserved bits for future usage. */
+};
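
For completeness, a sketch of how this wrapper might be used to flip the direction bit of an existing context before the flow rule for the opposite direction is created. rte_flow_action_handle_update() is assumed to be the update entry point and is not part of this patch.

#include <string.h>
#include <rte_flow.h>

/* Sketch: update only the direction of an existing conntrack context. */
static int
ct_switch_direction(uint16_t port_id, struct rte_flow_action_handle *handle,
		    const struct rte_flow_action_conntrack *cur)
{
	struct rte_flow_modify_conntrack upd;
	struct rte_flow_error err;

	memset(&upd, 0, sizeof(upd));
	upd.new_ct = *cur;
	upd.new_ct.is_original_dir = 0; /* next rule is for the reply side */
	upd.direction = 1;              /* only the direction field changes */
	upd.state = 0;                  /* keep all the other fields as-is */
	return rte_flow_action_handle_update(port_id, handle, &upd, &err);
}
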
+
 /**
  * Field IDs for MODIFY_FIELD action.
  */