[v3,1/3] ethdev: introduce conntrack flow action and item

Message ID 1618595649-157464-2-git-send-email-bingz@nvidia.com (mailing list archive)
State Superseded
Delegated to: Ferruh Yigit
Series ethdev: introduce conntrack flow action and item

Checks

Context        Check    Description
ci/checkpatch  warning  coding style issues

Commit Message

Bing Zhao April 16, 2021, 5:54 p.m. UTC
This commit introduces the conntrack action and item.

HW offloading is usually stateless. For stateful offloading such as
a TCP connection, the HW module can provide full offloading without
SW participation once the connection has been established.

The basic usage is that in the first flow rule the application should
add the conntrack action and jump to the next flow table. In the
following flow rule(s) of the next table, the application should use
the conntrack item to match on the result.
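
As a rough illustration of the second step, the sketch below shows how a
conntrack item could be compared against the state bitmap produced by the
action, using the usual rte_flow spec/mask convention. It is self-contained
and does not use the DPDK headers: the CT_* macros are local copies of the
RTE_FLOW_CONNTRACK_PKT_STATE_* bits from this patch, and struct ct_item and
ct_item_match() are hypothetical stand-ins, not part of the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the packet-state bits proposed in this patch. */
#define CT_BIT32(n)            (UINT32_C(1) << (n))
#define CT_PKT_STATE_VALID     CT_BIT32(0)
#define CT_PKT_STATE_CHANGED   CT_BIT32(1)
#define CT_PKT_STATE_INVALID   CT_BIT32(2)
#define CT_PKT_STATE_DISABLED  CT_BIT32(3)
#define CT_PKT_STATE_BAD       CT_BIT32(4)

/* Hypothetical stand-in for struct rte_flow_item_conntrack. */
struct ct_item {
	uint32_t flags;
};

/*
 * Hypothetical helper: a packet matches when the bits selected by the
 * mask are identical in the packet state and in the spec. Bits outside
 * the mask are "don't care".
 */
static bool
ct_item_match(uint32_t pkt_state, const struct ct_item *spec,
	      const struct ct_item *mask)
{
	assert(spec != NULL && mask != NULL);
	return (pkt_state & mask->flags) == (spec->flags & mask->flags);
}
```

With spec = VALID and mask = VALID | INVALID | BAD, a rule in the next table
would accept valid packets whether or not the CHANGED bit is set, and reject
anything flagged INVALID or BAD.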

A TCP connection carries traffic in two directions. To set a
conntrack action context correctly, packet information from both
directions is required.

The conntrack action should be created on one ethdev port, with the
peer ethdev port supplied as a parameter to the action. After the
context is created, it can only be used between these two ethdev
ports (dual-port mode) or on a single port. The application should
modify the action via the API "rte_flow_action_handle_update" only
before using it to create a flow rule with conntrack for the opposite
direction. This helps the driver recognize the direction of the flow
to be created, especially in single-port mode, in which traffic from
both directions goes through the same ethdev port if the application
works as a "forwarding engine" rather than an end point. There is no
need to call the update interface if nothing changes for the
subsequent flow rules.
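
A minimal sketch of how the update wrapper might be interpreted, based only
on the field comments of rte_flow_modify_conntrack in this patch: the
"direction" bit selects the direction field, and the "state" bit selects
everything else. The structs below are reduced, hypothetical stand-ins
(only a few fields are kept), and ct_apply_modify() is an assumption about
driver behavior, not code from any PMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for struct rte_flow_action_conntrack. */
struct ct_ctx {
	uint16_t peer_port;
	uint32_t is_original_dir:1;
	uint32_t enable:1;
	uint32_t last_seq;
	uint32_t last_ack;
};

/* Stand-in for struct rte_flow_modify_conntrack. */
struct ct_modify {
	struct ct_ctx new_ct;
	uint32_t direction:1; /* update the direction field */
	uint32_t state:1;     /* update all fields except direction */
	uint32_t reserved:30;
};

/* Hypothetical helper: apply only the parts selected by the wrapper. */
static void
ct_apply_modify(struct ct_ctx *ctx, const struct ct_modify *mod)
{
	assert(ctx != NULL && mod != NULL);
	if (mod->state) {
		/* Copy everything, then restore the current direction. */
		uint32_t dir = ctx->is_original_dir;

		*ctx = mod->new_ct;
		ctx->is_original_dir = dir;
	}
	if (mod->direction)
		ctx->is_original_dir = mod->new_ct.is_original_dir;
}
```

In this reading, an application would set only "direction" to flip the
context before creating the flow rule for the reply direction, leaving the
tracked TCP state untouched.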

Query is supported via the "rte_flow_action_handle_query" interface,
which returns the current packet information and connection status.
The fields that can be queried depend on the HW capabilities.

For packets received during the conntrack setup, it is suggested to
re-inject them in order to make sure the conntrack module works
correctly without missing any packet. Only valid packets should pass
the conntrack. Packets with invalid TCP information (for example,
out of window) or with an invalid header (for example, malformed)
should not pass.
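
To make "out of window" more concrete, here is a simplified sketch of a
wraparound-safe sequence check in the spirit of the netfilter reference
below. It is an assumption for illustration only: seg_in_window() and its
parameters ("acked", "max_end") are hypothetical names, and the real check
in HW or in netfilter considers more fields (window scaling, SACK, and so
on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit TCP sequence numbers. */
static bool
seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Wraparound-safe "a is after b". */
static bool
seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/*
 * Hypothetical, simplified window check for a segment [seq, seq + len):
 * it must not start beyond max_end (the highest sequence the receiver
 * has allowed) and must not end before acked (data already acknowledged).
 */
static bool
seg_in_window(uint32_t seq, uint32_t len, uint32_t acked, uint32_t max_end)
{
	uint32_t end = seq + len; /* wraps naturally modulo 2^32 */

	assert(!seq_after(acked, max_end));
	return !seq_after(seq, max_end) && !seq_before(end, acked);
}
```

The signed-difference comparison keeps the check correct when sequence
numbers wrap past 0xFFFFFFFF, which a plain "<" comparison would not.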

Naming and definition:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c

Other reference:
https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 lib/librte_ethdev/rte_flow.c |   2 +
 lib/librte_ethdev/rte_flow.h | 207 +++++++++++++++++++++++++++++++++++
 2 files changed, 209 insertions(+)

Comments

Ajit Khaparde April 16, 2021, 6:30 p.m. UTC | #1
On Fri, Apr 16, 2021 at 10:54 AM Bing Zhao <bingz@nvidia.com> wrote:
>
> This commit introduces the conntrack action and item.
>
> Usually the HW offloading is stateless. For some stateful offloading
> like a TCP connection, HW module will help provide the ability of a
> full offloading w/o SW participation after the connection was
> established.
>
> The basic usage is that in the first flow rule the application should
> add the conntrack action and jump to the next flow table. In the
> following flow rule(s) of the next table, the application should use
> the conntrack item to match on the result.
>
> A TCP connection has two directions traffic. To set a conntrack

s/has two directions traffic/can have traffic in two directions.

> action context correctly, the information of packets from both
> directions are required.
>
> The conntrack action should be created on one ethdev port and supply
> the peer ethdev port as a parameter to the action. After context
> created, it could only be used between these two ethdev ports
> (dual-port mode) or a single port. The application should modify the
> action via the API "rte_action_handle_update" only when before using
> it to create a flow rule with conntrack conntrack for the opposite
> direction. This will help the driver to recognize the direction of
> the flow to be created, especially in the single-port mode, in which
> case the traffic from both directions will go through the same
> ethdev port if the application works as an "forwarding engine" but
> not an end point. There is no need to call the update interface if
> the subsequent flow rules have nothing to be changed.
>
> Query will be supported via "rte_action_handle_query" interface,
> about the current packets information and connection status. The
> fields query capabilities depends on the HW.
How about this:
The fields which can be queried will depend on the HW capabilities.

>
> For the packets received during the conntrack setup, it is suggested
> to re-inject the packets in order to make sure the conntrack module
> works correctly without missing any packet. Only the valid packets
> should pass the conntrack, packets with invalid TCP information,
> like out of window, or with invalid header, like malformed, should
> not pass.
>
> Naming and definition:
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
> https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c
>
> Other reference:
> https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
>
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> ---
>  lib/librte_ethdev/rte_flow.c |   2 +
>  lib/librte_ethdev/rte_flow.h | 207 +++++++++++++++++++++++++++++++++++
>  2 files changed, 209 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 0d2610b7c4..c7c7108933 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -98,6 +98,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>         MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
>         MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
>         MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
> +       MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
>  };
>
>  /** Generate flow_action[] entry. */
> @@ -186,6 +187,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
>          * indirect action handle.
>          */
>         MK_FLOW_ACTION(INDIRECT, 0),
> +       MK_FLOW_ACTION(CONNTRACK, sizeof(struct rte_flow_action_conntrack)),
>  };
>
>  int
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 324d00abdc..c9d7bdfa57 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -551,6 +551,15 @@ enum rte_flow_item_type {
>          * See struct rte_flow_item_geneve_opt
>          */
>         RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> +
> +       /**
> +        * [META]
> +        *
> +        * Matches conntrack state.
> +        *
> +        * @see struct rte_flow_item_conntrack.
> +        */
> +       RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -1685,6 +1694,51 @@ rte_flow_item_geneve_opt_mask = {
>  };
>  #endif
>
> +/**
> + * The packet is valid after conntrack checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
> +/**
> + * The state of the connection is changed.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
> +/**
> + * Error is detected on this packet for this connection and
> + * an invalid state is set.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
> +/**
> + * The packet contains some bad field(s) and cannot continue
> + * with the conntrack module checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bitmap of one RTE_FLOW_CONNTRACK_PKT_STATE*
> + * or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +       uint32_t flags;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
> +       .flags = 0xffffffff,
> +};
> +#endif
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2277,6 +2331,15 @@ enum rte_flow_action_type {
>          * same port or across different ports.
>          */
>         RTE_FLOW_ACTION_TYPE_INDIRECT,
> +
> +       /**
> +        * [META]
> +        *
> +        * Enable tracking a TCP connection state.
> +        *
> +        * @see struct rte_flow_action_conntrack.
> +        */
> +       RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -2875,6 +2938,150 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_action_handle;
>
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +       /**< SYN-ACK packet was seen. */
> +       RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +       /**< 3-way handshake was done. */
> +       RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +       /**< First FIN packet was received to close the connection. */
> +       RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +       /**< First FIN was ACKed. */
> +       RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +       /**< Second FIN was received, waiting for the last ACK. */
> +       RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +       /**< Second FIN was ACKed, connection was closed. */
> +       RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +};
> +
> +/**
> + * The last passed TCP packet flags of a connection.
> + */
> +enum rte_flow_conntrack_tcp_last_index {
> +       RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_SYN = RTE_BIT32(0), /**< With SYN flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_SYNACK = RTE_BIT32(1), /**< With SYNACK flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_FIN = RTE_BIT32(2), /**< With FIN flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_ACK = RTE_BIT32(3), /**< With ACK flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_RST = RTE_BIT32(4), /**< With RST flag. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +       /** TCP window scaling factor, 0xF to disable. */
> +       uint32_t scale:4;
> +       /** The FIN was sent by this direction. */
> +       uint32_t close_initiated:1;
> +       /** An ACK packet has been received by this side. */
> +       uint32_t last_ack_seen:1;
> +       /**
> +        * If set, it indicates that there is unacknowledged data for the
> +        * packets sent from this direction.
> +        */
> +       uint32_t data_unacked:1;
> +       /**
> +        * Maximal value of sequence + payload length in sent
> +        * packets (next ACK from the opposite direction).
> +        */
> +       uint32_t sent_end;
> +       /**
> +        * Maximal value of (ACK + window size) in received packet + length
> +        * over sent packet (maximal sequence could be sent).
> +        */
> +       uint32_t reply_end;
> +       /** Maximal value of actual window size in sent packets. */
> +       uint32_t max_win;
> +       /** Maximal value of ACK in sent packets. */
> +       uint32_t max_ack;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both setting and query.
> + */
> +struct rte_flow_action_conntrack {
> +       /** The peer port number, can be the same port. */
> +       uint16_t peer_port;
> +       /**
> +        * Direction of this connection when creating a flow, the value
> +        * only affects the subsequent flows creation.

s/flows/flow
or
s/the subsequent flows creation/the creation of subsequent flows


> +        */
> +       uint32_t is_original_dir:1;
> +       /**
> +        * Enable / disable the conntrack HW module. When disabled, the
> +        * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> +        * In this state the HW will act as passthrough.
> +        * It only affects this conntrack object in the HW without any effect
> +        * to the other objects.
> +        */
> +       uint32_t enable:1;
> +       /** At least one ack was seen after the connection was established. */
> +       uint32_t live_connection:1;
> +       /** Enable selective ACK on this connection. */
> +       uint32_t selective_ack:1;
> +       /** A challenge ack has passed. */
> +       uint32_t challenge_ack_passed:1;
> +       /**
> +        * 1: The last packet is seen from the original direction.
> +        * 0: The last packet is seen from the reply direction.
> +        */
> +       uint32_t last_direction:1;
> +       /** No TCP check will be done except the state change. */
> +       uint32_t liberal_mode:1;
> +       /**<The current state of this connection. */
> +       enum rte_flow_conntrack_state state;
> +       /** Scaling factor for maximal allowed ACK window. */
> +       uint8_t max_ack_window;
> +       /** Maximal allowed number of retransmission times. */
s/times/limit

> +       uint8_t retransmission_limit;
> +       /** TCP parameters of the original direction. */
> +       struct rte_flow_tcp_dir_param original_dir;
> +       /** TCP parameters of the reply direction. */
> +       struct rte_flow_tcp_dir_param reply_dir;
> +       /** The window value of the last packet passed this conntrack. */
s/value/size

> +       uint16_t last_window;
> +       enum rte_flow_conntrack_tcp_last_index last_index;
> +       /** The sequence of the last packet passed this conntrack. */
sequence number of the ...

> +       uint32_t last_seq;
> +       /** The acknowledgement of the last packet passed this conntrack. */
ACK number of the..
s/passed this/passed by this
or
passing this

> +       uint32_t last_ack;
> +       /**
> +        * The total value ACK + payload length of the last packet
> +        * passed this conntrack.
s/passed this/passed by this
or passing this

> +        */
> +       uint32_t last_end;
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * Ports cannot support updating, and the only valid solution is to
> + * destroy the old context and create a new one instead.
> + */
> +struct rte_flow_modify_conntrack {
> +       /** New connection tracking parameters to be updated. */
> +       struct rte_flow_action_conntrack new_ct;
> +       /** The direction field will be updated. */
> +       uint32_t direction:1;
> +       /** All the other fields except direction will be updated. */
> +       uint32_t state:1;
> +       /** Reserved bits for the future usage. */
> +       uint32_t reserved:30;
> +};
> +
>  /**
>   * Field IDs for MODIFY_FIELD action.
>   */
> --
> 2.19.0.windows.1
>
Thomas Monjalon April 19, 2021, 2:06 p.m. UTC | #2
16/04/2021 19:54, Bing Zhao:
> +/**
> + * The packet is valid after conntrack checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
> +/**
> + * The state of the connection is changed.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
> +/**
> + * Error is detected on this packet for this connection and
> + * an invalid state is set.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
> +/**
> + * The packet contains some bad field(s) and cannot continue
> + * with the conntrack module checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)

I like it better now that all bits have the same prefix, thanks.

> +enum rte_flow_conntrack_state {
> +	/**< SYN-ACK packet was seen. */
> +	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +	/**< 3-way handshake was done. */
> +	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +	/**< First FIN packet was received to close the connection. */
> +	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +	/**< First FIN was ACKed. */
> +	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +	/**< Second FIN was received, waiting for the last ACK. */
> +	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +	/**< Second FIN was ACKed, connection was closed. */
> +	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +};

These doxygen comments should not have "<" in them,
because they are "before".
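
For readers less familiar with the two Doxygen comment forms, a small
self-contained sketch (the DOC_* names are made up for illustration and are
not part of the patch):

```c
#include <assert.h>

// Leading form (no '<'): the comment documents the declaration that follows.
enum doc_state {
	/** SYN-ACK packet was seen. */
	DOC_STATE_SYN_RECV,
	/** 3-way handshake was done. */
	DOC_STATE_ESTABLISHED,
};

// Trailing form (with '<'): the comment documents the declaration before it.
enum doc_flag {
	DOC_FLAG_NONE = 0, /**< No flag. */
	DOC_FLAG_SYN = 1,  /**< With SYN flag. */
};

/* Both forms only affect the generated documentation, not the values. */
static void
doc_selfcheck(void)
{
	assert(DOC_STATE_SYN_RECV == 0 && DOC_STATE_ESTABLISHED == 1);
	assert(DOC_FLAG_NONE == 0 && DOC_FLAG_SYN == 1);
}
```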

[...]
> +	/** No TCP check will be done except the state change. */
> +	uint32_t liberal_mode:1;
> +	/**<The current state of this connection. */

s,/**<,/** ,

> +	enum rte_flow_conntrack_state state;

Looks good overall, thanks.
Thomas Monjalon April 19, 2021, 2:08 p.m. UTC | #3
16/04/2021 20:30, Ajit Khaparde:
> On Fri, Apr 16, 2021 at 10:54 AM Bing Zhao <bingz@nvidia.com> wrote:
> > +struct rte_flow_action_conntrack {
> > +       /** The peer port number, can be the same port. */
> > +       uint16_t peer_port;
> > +       /**
> > +        * Direction of this connection when creating a flow, the value
> > +        * only affects the subsequent flows creation.
> 
> s/flows/flow
> or
> s/the subsequent flows creation/the creation of subsequent flows

s/flows/flow rules/
Bing Zhao April 19, 2021, 4:13 p.m. UTC | #4
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, April 19, 2021 10:06 PM
> To: Bing Zhao <bingz@nvidia.com>
> Cc: Ori Kam <orika@nvidia.com>; ferruh.yigit@intel.com;
> andrew.rybchenko@oktetlabs.ru; dev@dpdk.org;
> ajit.khaparde@broadcom.com; xiaoyun.li@intel.com
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ethdev: introduce conntrack
> flow action and item
> 
> 
> 16/04/2021 19:54, Bing Zhao:
> > +/**
> > + * The packet is valid after conntrack checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
> > +/**
> > + * The state of the connection is changed.
> > + */
> > +#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
> > +/**
> > + * Error is detected on this packet for this connection and
> > + * an invalid state is set.
> > + */
> > +#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
> > +/**
> > + * The HW connection tracking module is disabled.
> > + * It can be due to application command or an invalid state.
> > + */
> > +#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
> > +/**
> > + * The packet contains some bad field(s) and cannot continue
> > + * with the conntrack module checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)
> 
> I like it better now that all bits have the same prefix, thanks.
> 
> > +enum rte_flow_conntrack_state {
> > +     /**< SYN-ACK packet was seen. */
> > +     RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> > +     /**< 3-way handshake was done. */
> > +     RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> > +     /**< First FIN packet was received to close the connection. */
> > +     RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> > +     /**< First FIN was ACKed. */
> > +     RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> > +     /**< Second FIN was received, waiting for the last ACK. */
> > +     RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> > +     /**< Second FIN was ACKed, connection was closed. */
> > +     RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> > +};
> 
> These doxygen comments should not have "<" in them, because they are
> "before".

All "<" are removed, thanks.

> 
> [...]
> > +     /** No TCP check will be done except the state change. */
> > +     uint32_t liberal_mode:1;
> > +     /**<The current state of this connection. */
> 
> s,/**<,/** ,
> 
> > +     enum rte_flow_conntrack_state state;
> 
> Looks good overall, thanks.
> 

BR. Bing
Bing Zhao April 19, 2021, 4:21 p.m. UTC | #5
Hi,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, April 19, 2021 10:08 PM
> To: Bing Zhao <bingz@nvidia.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>
> Cc: dev@dpdk.org; Ori Kam <orika@nvidia.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dpdk-dev <dev@dpdk.org>; Xiaoyun Li
> <xiaoyun.li@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ethdev: introduce conntrack
> flow action and item
> 
> 
> 16/04/2021 20:30, Ajit Khaparde:
> > On Fri, Apr 16, 2021 at 10:54 AM Bing Zhao <bingz@nvidia.com> wrote:
> > > +struct rte_flow_action_conntrack {
> > > +       /** The peer port number, can be the same port. */
> > > +       uint16_t peer_port;
> > > +       /**
> > > +        * Direction of this connection when creating a flow, the value
> > > +        * only affects the subsequent flows creation.
> >
> > s/flows/flow
> > or
> > s/the subsequent flows creation/the creation of subsequent flows
> 
> s/flows/flow rules/

Done

> 
> 
> 

BR. Bing

Patch

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 0d2610b7c4..c7c7108933 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -98,6 +98,7 @@  static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
 	MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
 	MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
+	MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
 };
 
 /** Generate flow_action[] entry. */
@@ -186,6 +187,7 @@  static const struct rte_flow_desc_data rte_flow_desc_action[] = {
 	 * indirect action handle.
 	 */
 	MK_FLOW_ACTION(INDIRECT, 0),
+	MK_FLOW_ACTION(CONNTRACK, sizeof(struct rte_flow_action_conntrack)),
 };
 
 int
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 324d00abdc..c9d7bdfa57 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -551,6 +551,15 @@  enum rte_flow_item_type {
 	 * See struct rte_flow_item_geneve_opt
 	 */
 	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
+
+	/**
+	 * [META]
+	 *
+	 * Matches conntrack state.
+	 *
+	 * @see struct rte_flow_item_conntrack.
+	 */
+	RTE_FLOW_ITEM_TYPE_CONNTRACK,
 };
 
 /**
@@ -1685,6 +1694,51 @@  rte_flow_item_geneve_opt_mask = {
 };
 #endif
 
+/**
+ * The packet is valid after conntrack checking.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
+/**
+ * The state of the connection is changed.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
+/**
+ * Error is detected on this packet for this connection and
+ * an invalid state is set.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
+/**
+ * The HW connection tracking module is disabled.
+ * It can be due to application command or an invalid state.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
+/**
+ * The packet contains some bad field(s) and cannot continue
+ * with the conntrack module checking.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_CONNTRACK
+ *
+ * Matches the state of a packet after it passed the connection tracking
+ * examination. The state is a bitmap of one RTE_FLOW_CONNTRACK_PKT_STATE*
+ * or a reasonable combination of these bits.
+ */
+struct rte_flow_item_conntrack {
+	uint32_t flags;
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
+#ifndef __cplusplus
+static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
+	.flags = 0xffffffff,
+};
+#endif
+
 /**
  * Matching pattern item definition.
  *
@@ -2277,6 +2331,15 @@  enum rte_flow_action_type {
 	 * same port or across different ports.
 	 */
 	RTE_FLOW_ACTION_TYPE_INDIRECT,
+
+	/**
+	 * [META]
+	 *
+	 * Enable tracking a TCP connection state.
+	 *
+	 * @see struct rte_flow_action_conntrack.
+	 */
+	RTE_FLOW_ACTION_TYPE_CONNTRACK,
 };
 
 /**
@@ -2875,6 +2938,150 @@  struct rte_flow_action_set_dscp {
  */
 struct rte_flow_action_handle;
 
+/**
+ * The state of a TCP connection.
+ */
+enum rte_flow_conntrack_state {
+	/** SYN-ACK packet was seen. */
+	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
+	/** 3-way handshake was done. */
+	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
+	/** First FIN packet was received to close the connection. */
+	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
+	/** First FIN was ACKed. */
+	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
+	/** Second FIN was received, waiting for the last ACK. */
+	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
+	/** Second FIN was ACKed, connection was closed. */
+	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
+};
+
+/**
+ * The last passed TCP packet flags of a connection.
+ */
+enum rte_flow_conntrack_tcp_last_index {
+	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
+	RTE_FLOW_CONNTRACK_FLAG_SYN = RTE_BIT32(0), /**< With SYN flag. */
+	RTE_FLOW_CONNTRACK_FLAG_SYNACK = RTE_BIT32(1), /**< With SYNACK flag. */
+	RTE_FLOW_CONNTRACK_FLAG_FIN = RTE_BIT32(2), /**< With FIN flag. */
+	RTE_FLOW_CONNTRACK_FLAG_ACK = RTE_BIT32(3), /**< With ACK flag. */
+	RTE_FLOW_CONNTRACK_FLAG_RST = RTE_BIT32(4), /**< With RST flag. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * Configuration parameters for each direction of a TCP connection.
+ */
+struct rte_flow_tcp_dir_param {
+	/** TCP window scaling factor, 0xF to disable. */
+	uint32_t scale:4;
+	/** The FIN was sent by this direction. */
+	uint32_t close_initiated:1;
+	/** An ACK packet has been received by this side. */
+	uint32_t last_ack_seen:1;
+	/**
+	 * If set, it indicates that there is unacknowledged data for the
+	 * packets sent from this direction.
+	 */
+	uint32_t data_unacked:1;
+	/**
+	 * Maximal value of sequence + payload length in sent
+	 * packets (next ACK from the opposite direction).
+	 */
+	uint32_t sent_end;
+	/**
+	 * Maximal value of (ACK + window size) in received packet + length
+	 * over sent packet (maximal sequence could be sent).
+	 */
+	uint32_t reply_end;
+	/** Maximal value of actual window size in sent packets. */
+	uint32_t max_win;
+	/** Maximal value of ACK in sent packets. */
+	uint32_t max_ack;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_CONNTRACK
+ *
+ * Configuration and initial state for the connection tracking module.
+ * This structure could be used for both setting and query.
+ */
+struct rte_flow_action_conntrack {
+	/** The peer port number, can be the same port. */
+	uint16_t peer_port;
+	/**
+	 * Direction of this connection when creating a flow rule, the
+	 * value only affects the creation of subsequent flow rules.
+	 */
+	uint32_t is_original_dir:1;
+	/**
+	 * Enable / disable the conntrack HW module. When disabled, the
+	 * result will always be RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED.
+	 * In this state the HW will act as passthrough.
+	 * It only affects this conntrack object in the HW without any effect
+	 * to the other objects.
+	 */
+	uint32_t enable:1;
+	/** At least one ack was seen after the connection was established. */
+	uint32_t live_connection:1;
+	/** Enable selective ACK on this connection. */
+	uint32_t selective_ack:1;
+	/** A challenge ack has passed. */
+	uint32_t challenge_ack_passed:1;
+	/**
+	 * 1: The last packet is seen from the original direction.
+	 * 0: The last packet is seen from the reply direction.
+	 */
+	uint32_t last_direction:1;
+	/** No TCP check will be done except the state change. */
+	uint32_t liberal_mode:1;
+	/** The current state of this connection. */
+	enum rte_flow_conntrack_state state;
+	/** Scaling factor for maximal allowed ACK window. */
+	uint8_t max_ack_window;
+	/** Maximal allowed number of retransmission times. */
+	uint8_t retransmission_limit;
+	/** TCP parameters of the original direction. */
+	struct rte_flow_tcp_dir_param original_dir;
+	/** TCP parameters of the reply direction. */
+	struct rte_flow_tcp_dir_param reply_dir;
+	/** The window size of the last packet that passed this conntrack. */
+	uint16_t last_window;
+	enum rte_flow_conntrack_tcp_last_index last_index;
+	/** The sequence number of the last packet that passed this conntrack. */
+	uint32_t last_seq;
+	/** The ACK number of the last packet that passed this conntrack. */
+	uint32_t last_ack;
+	/**
+	 * The total value ACK + payload length of the last packet
+	 * that passed this conntrack.
+	 */
+	uint32_t last_end;
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_CONNTRACK
+ *
+ * Wrapper structure for the context update interface.
+ * Some ports may not support updating, in which case the only valid
+ * solution is to destroy the old context and create a new one instead.
+ */
+struct rte_flow_modify_conntrack {
+	/** New connection tracking parameters to be updated. */
+	struct rte_flow_action_conntrack new_ct;
+	/** The direction field will be updated. */
+	uint32_t direction:1;
+	/** All the other fields except direction will be updated. */
+	uint32_t state:1;
+	/** Reserved bits for the future usage. */
+	uint32_t reserved:30;
+};
+
 /**
  * Field IDs for MODIFY_FIELD action.
  */
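
As a reading aid for enum rte_flow_conntrack_state, the following sketch
walks the states in the order implied by their comments in this patch. It is
a simplification under stated assumptions: the ct_event values and
ct_next_state() are hypothetical, and real TCP tracking (see the netfilter
references in the commit message) also handles RST, retransmissions and
simultaneous close, which are ignored here.

```c
#include <assert.h>

/* Local mirror of enum rte_flow_conntrack_state from this patch. */
enum ct_state {
	CT_STATE_SYN_RECV,    /* SYN-ACK packet was seen */
	CT_STATE_ESTABLISHED, /* 3-way handshake was done */
	CT_STATE_FIN_WAIT,    /* first FIN was received */
	CT_STATE_CLOSE_WAIT,  /* first FIN was ACKed */
	CT_STATE_LAST_ACK,    /* second FIN was received */
	CT_STATE_TIME_WAIT,   /* second FIN was ACKed */
};

/* Hypothetical events, not part of the patch. */
enum ct_event {
	CT_EV_HANDSHAKE_DONE, /* final ACK of the 3-way handshake */
	CT_EV_FIN_SEEN,       /* a FIN packet passed */
	CT_EV_FIN_ACKED,      /* the pending FIN was acknowledged */
};

/* Return the next state; events that do not apply leave it unchanged. */
static enum ct_state
ct_next_state(enum ct_state s, enum ct_event ev)
{
	assert(s <= CT_STATE_TIME_WAIT);
	switch (s) {
	case CT_STATE_SYN_RECV:
		return ev == CT_EV_HANDSHAKE_DONE ? CT_STATE_ESTABLISHED : s;
	case CT_STATE_ESTABLISHED:
		return ev == CT_EV_FIN_SEEN ? CT_STATE_FIN_WAIT : s;
	case CT_STATE_FIN_WAIT:
		return ev == CT_EV_FIN_ACKED ? CT_STATE_CLOSE_WAIT : s;
	case CT_STATE_CLOSE_WAIT:
		return ev == CT_EV_FIN_SEEN ? CT_STATE_LAST_ACK : s;
	case CT_STATE_LAST_ACK:
		return ev == CT_EV_FIN_ACKED ? CT_STATE_TIME_WAIT : s;
	case CT_STATE_TIME_WAIT:
		return s;
	}
	return s;
}
```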