From patchwork Wed Jan 10 14:03:10 2018
X-Patchwork-Submitter: "Hu, Jiayu"
X-Patchwork-Id: 33474
X-Patchwork-Delegate: thomas@monjalon.net
From: Jiayu Hu
To: dev@dpdk.org
Cc: thomas@monjalon.net, junjie.j.chen@intel.com, jianfeng.tan@intel.com,
 lei.a.yao@intel.com, Jiayu Hu
Date: Wed, 10 Jan 2018 22:03:10 +0800
Message-Id: <1515592992-70278-2-git-send-email-jiayu.hu@intel.com>
In-Reply-To: <1515592992-70278-1-git-send-email-jiayu.hu@intel.com>
References: <1515132769-52572-1-git-send-email-jiayu.hu@intel.com>
 <1515592992-70278-1-git-send-email-jiayu.hu@intel.com>
Subject: [dpdk-dev] [PATCH v5 1/3] gro: codes cleanup
List-Id: DPDK patches and discussions

This patch cleans up the GRO code as follows:

- rename internal structures, variables and functions appropriately
- update comments and the content of the GRO programmer guide for better
  understanding
- remove needless check and redundant comments

Signed-off-by: Jiayu Hu
Reviewed-by: Junjie Chen
---
 .../prog_guide/generic_receive_offload_lib.rst     | 238 +++++++++-------
 doc/guides/prog_guide/img/gro-key-algorithm.svg    | 223 +++++++++++++++
 lib/librte_gro/gro_tcp4.c                          | 306 ++++++++++-----------
 lib/librte_gro/gro_tcp4.h                          | 123 ++++-----
 lib/librte_gro/rte_gro.c                           |  96 +++----
 lib/librte_gro/rte_gro.h                           |  92 +++----
 6 files changed, 649 insertions(+), 429 deletions(-)
 create mode 100644 doc/guides/prog_guide/img/gro-key-algorithm.svg

diff --git a/doc/guides/prog_guide/generic_receive_offload_lib.rst b/doc/guides/prog_guide/generic_receive_offload_lib.rst
index 22e50ec..1652e64 100644
--- a/doc/guides/prog_guide/generic_receive_offload_lib.rst
+++ b/doc/guides/prog_guide/generic_receive_offload_lib.rst
@@ -32,128 +32,154 @@
 Generic Receive Offload Library
 ===============================
 Generic Receive Offload (GRO) is a widely used SW-based offloading
-technique to reduce per-packet processing overhead. It gains performance
-by reassembling small packets into large ones. To enable more flexibility
-to applications, DPDK implements GRO as a standalone library. Applications
-explicitly use the GRO library to merge small packets into large ones.
-
-The GRO library assumes all input packets have correct checksums. In
-addition, the GRO library doesn't re-calculate checksums for merged
-packets. If input packets are IP fragmented, the GRO library assumes
-they are complete packets (i.e. with L4 headers).
-
-Currently, the GRO library implements TCP/IPv4 packet reassembly.
-
-Reassembly Modes
-----------------
-
-The GRO library provides two reassembly modes: lightweight and
-heavyweight mode. If applications want to merge packets in a simple way,
-they can use the lightweight mode API. If applications want more
-fine-grained controls, they can choose the heavyweight mode API.
-
-Lightweight Mode
-~~~~~~~~~~~~~~~~
-
-The ``rte_gro_reassemble_burst()`` function is used for reassembly in
-lightweight mode. It tries to merge N input packets at a time, where
-N should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
-
-In each invocation, ``rte_gro_reassemble_burst()`` allocates temporary
-reassembly tables for the desired GRO types. Note that the reassembly
-table is a table structure used to reassemble packets and different GRO
-types (e.g. TCP/IPv4 GRO and TCP/IPv6 GRO) have different reassembly table
-structures. The ``rte_gro_reassemble_burst()`` function uses the reassembly
-tables to merge the N input packets.
-
-For applications, performing GRO in lightweight mode is simple. They
-just need to invoke ``rte_gro_reassemble_burst()``. Applications can get
-GROed packets as soon as ``rte_gro_reassemble_burst()`` returns.
-
-Heavyweight Mode
-~~~~~~~~~~~~~~~~
-
-The ``rte_gro_reassemble()`` function is used for reassembly in heavyweight
-mode. Compared with the lightweight mode, performing GRO in heavyweight mode
-is relatively complicated.
-
-Before performing GRO, applications need to create a GRO context object
-by calling ``rte_gro_ctx_create()``. A GRO context object holds the
-reassembly tables of desired GRO types. Note that all update/lookup
-operations on the context object are not thread safe. So if different
-processes or threads want to access the same context object simultaneously,
-some external syncing mechanisms must be used.
-
-Once the GRO context is created, applications can then use the
-``rte_gro_reassemble()`` function to merge packets. In each invocation,
-``rte_gro_reassemble()`` tries to merge input packets with the packets
-in the reassembly tables. If an input packet is an unsupported GRO type,
-or other errors happen (e.g. SYN bit is set), ``rte_gro_reassemble()``
-returns the packet to applications. Otherwise, the input packet is either
-merged or inserted into a reassembly table.
-
-When applications want to get GRO processed packets, they need to use
-``rte_gro_timeout_flush()`` to flush them from the tables manually.
+technique to reduce per-packet processing overheads. By reassembling
+small packets into larger ones, GRO enables applications to process
+fewer large packets directly, thus reducing the number of packets to
+be processed. To benefit DPDK-based applications, like Open vSwitch,
+DPDK also provides own GRO implementation. In DPDK, GRO is implemented
+as a standalone library. Applications explicitly use the GRO library to
+reassemble packets.
+
+Overview
+--------
+
+In the GRO library, there are many GRO types which are defined by packet
+types. One GRO type is in charge of process one kind of packets. For
+example, TCP/IPv4 GRO processes TCP/IPv4 packets.
+
+Each GRO type has a reassembly function, which defines own algorithm and
+table structure to reassemble packets. We assign input packets to the
+corresponding GRO functions by MBUF->packet_type.
+
+The GRO library doesn't check if input packets have correct checksums and
+doesn't re-calculate checksums for merged packets. The GRO library
+assumes the packets are complete (i.e., MF==0 && frag_off==0), when IP
+fragmentation is possible (i.e., DF==0). Additionally, it requires IPv4
+ID to be increased by one.

-TCP/IPv4 GRO
-------------
+Currently, the GRO library provides GRO supports for TCP/IPv4 packets.
+
+Two Sets of API
+---------------
+
+For different usage scenarios, the GRO library provides two sets of API.
+The one is called the lightweight mode API, which enables applications to
+merge a small number of packets rapidly; the other is called the
+heavyweight mode API, which provides fine-grained controls to
+applications and supports to merge a large number of packets.
+
+Lightweight Mode API
+~~~~~~~~~~~~~~~~~~~~
+
+The lightweight mode only has one function ``rte_gro_reassemble_burst()``,
+which process N packets at a time. Using the lightweight mode API to
+merge packets is very simple. Calling ``rte_gro_reassemble_burst()`` is
+enough. The GROed packets are returned to applications as soon as it
+finishes.
+
+In ``rte_gro_reassemble_burst()``, table structures of different GRO
+types are allocated in the stack. This design simplifies applications'
+operations. However, limited by the stack size, the maximum number of
+packets that ``rte_gro_reassemble_burst()`` can process in an invocation
+should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
+
+Heavyweight Mode API
+~~~~~~~~~~~~~~~~~~~~
+
+Compared with the lightweight mode, using the heavyweight mode API is
+relatively complex. Firstly, applications need to create a GRO context
+by ``rte_gro_ctx_create()``. ``rte_gro_ctx_create()`` allocates tables
+structures in the heap and stores their pointers in the GRO context.
+Secondly, applications use ``rte_gro_reassemble()`` to merge packets.
+If input packets have invalid parameters, ``rte_gro_reassemble()``
+returns them to applications. For example, packets of unsupported GRO
+types or TCP SYN packets are returned. Otherwise, the input packets are
+either merged with the existed packets in the tables or inserted into the
+tables. Finally, applications use ``rte_gro_timeout_flush()`` to flush
+packets from the tables, when they want to get the GROed packets.
+
+Note that all update/lookup operations on the GRO context are not thread
+safe. So if different processes or threads want to access the same
+context object simultaneously, some external syncing mechanisms must be
+used.
+
+Reassembly Algorithm
+--------------------

-TCP/IPv4 GRO supports merging small TCP/IPv4 packets into large ones,
-using a table structure called the TCP/IPv4 reassembly table.
+The reassembly algorithm is used for reassembling packets. In the GRO
+library, different GRO types can use different algorithms. In this
+section, we will introduce an algorithm, which is used by TCP/IPv4 GRO.

-TCP/IPv4 Reassembly Table
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Challenges
+~~~~~~~~~~

-A TCP/IPv4 reassembly table includes a "key" array and an "item" array.
-The key array keeps the criteria to merge packets and the item array
-keeps the packet information.
+The reassembly algorithm determines the efficiency of GRO. There are two
+challenges in the algorithm design:

-Each key in the key array points to an item group, which consists of
-packets which have the same criteria values but can't be merged. A key
-in the key array includes two parts:
+- a high cost algorithm/implementation would cause packet dropping in a
+  high speed network.

-* ``criteria``: the criteria to merge packets. If two packets can be
-  merged, they must have the same criteria values.
+- packet reordering makes it hard to merge packets. For example, Linux
+  GRO fails to merge packets when encounters packet reordering.

-* ``start_index``: the item array index of the first packet in the item
-  group.
+The above two challenges require our algorithm is:

-Each element in the item array keeps the information of a packet. An item
-in the item array mainly includes three parts:
+- lightweight enough to scale fast networking speed

-* ``firstseg``: the mbuf address of the first segment of the packet.
+- capable of handling packet reordering

-* ``lastseg``: the mbuf address of the last segment of the packet.
+In DPDK GRO, we use a key-based algorithm to address the two challenges.

-* ``next_pkt_index``: the item array index of the next packet in the same
-  item group. TCP/IPv4 GRO uses ``next_pkt_index`` to chain the packets
-  that have the same criteria value but can't be merged together.
+Key-based Reassembly Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:numref:`figure_gro-key-algorithm` illustrates the procedure of the
+key-based algorithm. Packets are classified into "flows" by some header
+fields (we call them as "key"). To process an input packet, the algorithm
+searches for a matched "flow" (i.e., the same value of key) for the
+packet first, then checks all packets in the "flow" and tries to find a
+"neighbor" for it. If find a "neighbor", merge the two packets together.
+If can't find a "neighbor", store the packet into its "flow". If can't
+find a matched "flow", insert a new "flow" and store the packet into the
+"flow".
+
+.. note::
+        Packets in the same "flow" that can't merge are always caused
+        by packet reordering.
+
+The key-based algorithm has two characters:
+
+- classifying packets into "flows" to accelerate packet aggregation is
+  simple (address challenge 1).
+
+- storing out-of-order packets makes it possible to merge later (address
+  challenge 2).
+
+.. _figure_gro-key-algorithm:
+
+.. figure:: img/gro-key-algorithm.*
+   :align: center
+
+   Key-based Reassembly Algorithm
+
+TCP/IPv4 GRO
+------------

-Procedure to Reassemble a Packet
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The table structure used by TCP/IPv4 GRO contains two arrays: flow array
+and item array. The flow array keeps flow information, and the item array
+keeps packet information.

-To reassemble an incoming packet needs three steps:
+Header fields used to define a TCP/IPv4 flow include:

-#. Check if the packet should be processed. Packets with one of the
-   following properties aren't processed and are returned immediately:
+- source and destination: Ethernet and IP address, TCP port

-   * FIN, SYN, RST, URG, PSH, ECE or CWR bit is set.
+- TCP acknowledge number

-   * L4 payload length is 0.
+TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
+won't be processed.

-#. Traverse the key array to find a key which has the same criteria
-   value with the incoming packet. If found, go to the next step.
-   Otherwise, insert a new key and a new item for the packet.
+Header fields deciding if two packets are neighbors include:

-#. Locate the first packet in the item group via ``start_index``. Then
-   traverse all packets in the item group via ``next_pkt_index``. If a
-   packet is found which can be merged with the incoming one, merge them
-   together. If one isn't found, insert the packet into this item group.
-   Note that to merge two packets is to link them together via mbuf's
-   ``next`` field.
+- TCP sequence number

-When packets are flushed from the reassembly table, TCP/IPv4 GRO updates
-packet header fields for the merged packets. Note that before reassembling
-the packet, TCP/IPv4 GRO doesn't check if the checksums of packets are
-correct. Also, TCP/IPv4 GRO doesn't re-calculate checksums for merged
-packets.
+- IPv4 ID. The IPv4 ID fields of the packets should be increased by 1.
diff --git a/doc/guides/prog_guide/img/gro-key-algorithm.svg b/doc/guides/prog_guide/img/gro-key-algorithm.svg
new file mode 100644
index 0000000..94e42f5
--- /dev/null
+++ b/doc/guides/prog_guide/img/gro-key-algorithm.svg
@@ -0,0 +1,223 @@
[SVG markup not preserved (223 added lines). Figure: flowchart of the key-based reassembly algorithm. Boxes: "Categorize into an existed “flow”", "Search for a “neighbor”", "Insert a new “flow” and store the packet", "Store the packet", "Merge the packet". Connector labels: "packet", "find a “flow”", "find a “neighbor”", "not find" (twice).]
diff --git a/lib/librte_gro/gro_tcp4.c
b/lib/librte_gro/gro_tcp4.c
index 03e5ccf..a38a06e 100644
--- a/lib/librte_gro/gro_tcp4.c
+++ b/lib/librte_gro/gro_tcp4.c
@@ -44,20 +44,20 @@ gro_tcp4_tbl_create(uint16_t socket_id,
 	}
 	tbl->max_item_num = entries_num;

-	size = sizeof(struct gro_tcp4_key) * entries_num;
-	tbl->keys = rte_zmalloc_socket(__func__,
+	size = sizeof(struct gro_tcp4_flow) * entries_num;
+	tbl->flows = rte_zmalloc_socket(__func__,
 			size,
 			RTE_CACHE_LINE_SIZE,
 			socket_id);
-	if (tbl->keys == NULL) {
+	if (tbl->flows == NULL) {
 		rte_free(tbl->items);
 		rte_free(tbl);
 		return NULL;
 	}
-	/* INVALID_ARRAY_INDEX indicates empty key */
+	/* INVALID_ARRAY_INDEX indicates an empty flow */
 	for (i = 0; i < entries_num; i++)
-		tbl->keys[i].start_index = INVALID_ARRAY_INDEX;
-	tbl->max_key_num = entries_num;
+		tbl->flows[i].start_index = INVALID_ARRAY_INDEX;
+	tbl->max_flow_num = entries_num;

 	return tbl;
 }
@@ -69,7 +69,7 @@ gro_tcp4_tbl_destroy(void *tbl)

 	if (tcp_tbl) {
 		rte_free(tcp_tbl->items);
-		rte_free(tcp_tbl->keys);
+		rte_free(tcp_tbl->flows);
 	}
 	rte_free(tcp_tbl);
 }
@@ -81,50 +81,46 @@ gro_tcp4_tbl_destroy(void *tbl)
  * the original packet.
  */
 static inline int
-merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
+merge_two_tcp4_packets(struct gro_tcp4_item *item,
 		struct rte_mbuf *pkt,
-		uint16_t ip_id,
+		int cmp,
 		uint32_t sent_seq,
-		int cmp)
+		uint16_t ip_id)
 {
 	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
-	uint16_t tcp_datalen;
+	uint16_t hdr_len;

 	if (cmp > 0) {
-		pkt_head = item_src->firstseg;
+		pkt_head = item->firstseg;
 		pkt_tail = pkt;
 	} else {
 		pkt_head = pkt;
-		pkt_tail = item_src->firstseg;
+		pkt_tail = item->firstseg;
 	}

-	/* check if the packet length will be beyond the max value */
-	tcp_datalen = pkt_tail->pkt_len - pkt_tail->l2_len -
-		pkt_tail->l3_len - pkt_tail->l4_len;
-	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_datalen >
-			TCP4_MAX_L3_LENGTH)
+	/* check if the IPv4 packet length is greater than the max value */
+	hdr_len = pkt_head->l2_len + pkt_head->l3_len + pkt_head->l4_len;
+	if (unlikely(pkt_head->pkt_len - pkt_head->l2_len + pkt_tail->pkt_len -
+				hdr_len > MAX_IPV4_PKT_LENGTH))
 		return 0;

-	/* remove packet header for the tail packet */
-	rte_pktmbuf_adj(pkt_tail,
-			pkt_tail->l2_len +
-			pkt_tail->l3_len +
-			pkt_tail->l4_len);
+	/* remove the packet header for the tail packet */
+	rte_pktmbuf_adj(pkt_tail, hdr_len);

 	/* chain two packets together */
 	if (cmp > 0) {
-		item_src->lastseg->next = pkt;
-		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
+		item->lastseg->next = pkt;
+		item->lastseg = rte_pktmbuf_lastseg(pkt);
 		/* update IP ID to the larger value */
-		item_src->ip_id = ip_id;
+		item->ip_id = ip_id;
 	} else {
 		lastseg = rte_pktmbuf_lastseg(pkt);
-		lastseg->next = item_src->firstseg;
-		item_src->firstseg = pkt;
+		lastseg->next = item->firstseg;
+		item->firstseg = pkt;
 		/* update sent_seq to the smaller value */
-		item_src->sent_seq = sent_seq;
+		item->sent_seq = sent_seq;
 	}
-	item_src->nb_merged++;
+	item->nb_merged++;

 	/* update mbuf metadata for the merged packet */
 	pkt_head->nb_segs += pkt_tail->nb_segs;
@@ -133,45 +129,46 @@ merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
 	return 1;
 }

+/*
+ * Check if two TCP/IPv4 packets are neighbors.
+ */
 static inline int
 check_seq_option(struct gro_tcp4_item *item,
-		struct tcp_hdr *tcp_hdr,
-		uint16_t tcp_hl,
-		uint16_t tcp_dl,
+		struct tcp_hdr *tcph,
+		uint32_t sent_seq,
 		uint16_t ip_id,
-		uint32_t sent_seq)
+		uint16_t tcp_hl,
+		uint16_t tcp_dl)
 {
-	struct rte_mbuf *pkt0 = item->firstseg;
-	struct ipv4_hdr *ipv4_hdr0;
-	struct tcp_hdr *tcp_hdr0;
-	uint16_t tcp_hl0, tcp_dl0;
-	uint16_t len;
-
-	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
-			pkt0->l2_len);
-	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
-	tcp_hl0 = pkt0->l4_len;
-
-	/* check if TCP option fields equal. If not, return 0. */
-	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
-	if ((tcp_hl != tcp_hl0) ||
-			((len > 0) && (memcmp(tcp_hdr + 1,
-					tcp_hdr0 + 1,
+	struct rte_mbuf *pkt_orig = item->firstseg;
+	struct ipv4_hdr *iph_orig;
+	struct tcp_hdr *tcph_orig;
+	uint16_t len, tcp_hl_orig;
+
+	iph_orig = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_orig, char *) +
+			pkt_orig->l2_len);
+	tcph_orig = (struct tcp_hdr *)((char *)iph_orig + pkt_orig->l3_len);
+	tcp_hl_orig = pkt_orig->l4_len;
+
+	/* Check if TCP option fields equal */
+	len = RTE_MAX(tcp_hl, tcp_hl_orig) - sizeof(struct tcp_hdr);
+	if ((tcp_hl != tcp_hl_orig) ||
+			((len > 0) && (memcmp(tcph + 1, tcph_orig + 1,
 					len) != 0)))
 		return 0;

 	/* check if the two packets are neighbors */
-	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
-	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
-			(ip_id == (item->ip_id + 1)))
+	len = pkt_orig->pkt_len - pkt_orig->l2_len - pkt_orig->l3_len -
+		tcp_hl_orig;
+	if ((sent_seq == item->sent_seq + len) && (ip_id == item->ip_id + 1))
 		/* append the new packet */
 		return 1;
-	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
-			((ip_id + item->nb_merged) == item->ip_id))
+	else if ((sent_seq + tcp_dl == item->sent_seq) &&
+			(ip_id + item->nb_merged == item->ip_id))
 		/* pre-pend the new packet */
 		return -1;
-	else
-		return 0;
+
+	return 0;
 }

 static inline uint32_t
@@ -187,13 +184,13 @@ find_an_empty_item(struct gro_tcp4_tbl *tbl)
 }

 static inline uint32_t
-find_an_empty_key(struct gro_tcp4_tbl *tbl)
+find_an_empty_flow(struct gro_tcp4_tbl *tbl)
 {
 	uint32_t i;
-	uint32_t max_key_num = tbl->max_key_num;
+	uint32_t max_flow_num = tbl->max_flow_num;

-	for (i = 0; i < max_key_num; i++)
-		if (tbl->keys[i].start_index == INVALID_ARRAY_INDEX)
+	for (i = 0; i < max_flow_num; i++)
+		if (tbl->flows[i].start_index == INVALID_ARRAY_INDEX)
 			return i;
 	return INVALID_ARRAY_INDEX;
 }
@@ -201,10 +198,10 @@ find_an_empty_key(struct gro_tcp4_tbl *tbl)
 static inline uint32_t
 insert_new_item(struct gro_tcp4_tbl *tbl,
 		struct rte_mbuf *pkt,
-		uint16_t ip_id,
-		uint32_t sent_seq,
+		uint64_t start_time,
 		uint32_t prev_idx,
-		uint64_t start_time)
+		uint32_t sent_seq,
+		uint16_t ip_id)
 {
 	uint32_t item_idx;

@@ -221,7 +218,7 @@ insert_new_item(struct gro_tcp4_tbl *tbl,
 	tbl->items[item_idx].nb_merged = 1;
 	tbl->item_num++;

-	/* if the previous packet exists, chain the new one with it */
+	/* if the previous packet exists, chain them together. */
 	if (prev_idx != INVALID_ARRAY_INDEX) {
 		tbl->items[item_idx].next_pkt_idx =
 			tbl->items[prev_idx].next_pkt_idx;
@@ -237,7 +234,7 @@ delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
 {
 	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;

-	/* set NULL to firstseg to indicate it's an empty item */
+	/* NULL indicates an empty item */
 	tbl->items[item_idx].firstseg = NULL;
 	tbl->item_num--;
 	if (prev_item_idx != INVALID_ARRAY_INDEX)
@@ -247,44 +244,42 @@ delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
 }

 static inline uint32_t
-insert_new_key(struct gro_tcp4_tbl *tbl,
-		struct tcp4_key *key_src,
+insert_new_flow(struct gro_tcp4_tbl *tbl,
+		struct tcp4_flow_key *src,
 		uint32_t item_idx)
 {
-	struct tcp4_key *key_dst;
-	uint32_t key_idx;
+	struct tcp4_flow_key *dst;
+	uint32_t flow_idx;

-	key_idx = find_an_empty_key(tbl);
-	if (key_idx == INVALID_ARRAY_INDEX)
+	flow_idx = find_an_empty_flow(tbl);
+	if (unlikely(flow_idx == INVALID_ARRAY_INDEX))
 		return INVALID_ARRAY_INDEX;

-	key_dst = &(tbl->keys[key_idx].key);
+	dst = &(tbl->flows[flow_idx].key);

-	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
-	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
-	key_dst->ip_src_addr = key_src->ip_src_addr;
-	key_dst->ip_dst_addr = key_src->ip_dst_addr;
-	key_dst->recv_ack = key_src->recv_ack;
-	key_dst->src_port = key_src->src_port;
-	key_dst->dst_port = key_src->dst_port;
+	ether_addr_copy(&(src->eth_saddr), &(dst->eth_saddr));
+	ether_addr_copy(&(src->eth_daddr), &(dst->eth_daddr));
+	dst->ip_src_addr = src->ip_src_addr;
+	dst->ip_dst_addr = src->ip_dst_addr;
+	dst->recv_ack = src->recv_ack;
+	dst->src_port = src->src_port;
+	dst->dst_port = src->dst_port;

-	/* non-INVALID_ARRAY_INDEX value indicates this key is valid */
-	tbl->keys[key_idx].start_index = item_idx;
-	tbl->key_num++;
+	tbl->flows[flow_idx].start_index = item_idx;
+	tbl->flow_num++;

-	return key_idx;
+	return flow_idx;
 }

+/*
+ * Check if two TCP/IPv4 packets belong to the same flow.
+ */
 static inline int
-is_same_key(struct tcp4_key k1, struct tcp4_key k2)
+is_same_tcp4_flow(struct tcp4_flow_key k1, struct tcp4_flow_key k2)
 {
-	if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
-		return 0;
-
-	if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
-		return 0;
-
-	return ((k1.ip_src_addr == k2.ip_src_addr) &&
+	return (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
+			is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) &&
+			(k1.ip_src_addr == k2.ip_src_addr) &&
 			(k1.ip_dst_addr == k2.ip_dst_addr) &&
 			(k1.recv_ack == k2.recv_ack) &&
 			(k1.src_port == k2.src_port) &&
@@ -292,7 +287,7 @@ is_same_key(struct tcp4_key k1, struct tcp4_key k2)
 }

 /*
- * update packet length for the flushed packet.
+ * update the packet length for the flushed packet.
  */
 static inline void
 update_header(struct gro_tcp4_item *item)
@@ -315,27 +310,31 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
 	struct ipv4_hdr *ipv4_hdr;
 	struct tcp_hdr *tcp_hdr;
 	uint32_t sent_seq;
-	uint16_t tcp_dl, ip_id;
+	uint16_t tcp_dl, ip_id, hdr_len;

-	struct tcp4_key key;
+	struct tcp4_flow_key key;
 	uint32_t cur_idx, prev_idx, item_idx;
-	uint32_t i, max_key_num;
+	uint32_t i, max_flow_num, remaining_flow_num;
 	int cmp;
+	uint8_t find;

 	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
 	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+	hdr_len = pkt->l2_len + pkt->l3_len + pkt->l4_len;

 	/*
-	 * if FIN, SYN, RST, PSH, URG, ECE or
-	 * CWR is set, return immediately.
+	 * Don't process the packet which has FIN, SYN, RST, PSH, URG, ECE
+	 * or CWR set.
 	 */
 	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
 		return -1;
-	/* if payload length is 0, return immediately */
-	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
-		pkt->l4_len;
-	if (tcp_dl == 0)
+	/*
+	 * Don't process the packet whose payload length is less than or
+	 * equal to 0.
+	 */
+	tcp_dl = pkt->pkt_len - hdr_len;
+	if (tcp_dl <= 0)
 		return -1;

 	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
@@ -349,25 +348,34 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
 	key.dst_port = tcp_hdr->dst_port;
 	key.recv_ack = tcp_hdr->recv_ack;

-	/* search for a key */
-	max_key_num = tbl->max_key_num;
-	for (i = 0; i < max_key_num; i++) {
-		if ((tbl->keys[i].start_index != INVALID_ARRAY_INDEX) &&
-				is_same_key(tbl->keys[i].key, key))
-			break;
+	/* Search for a matched flow. */
+	max_flow_num = tbl->max_flow_num;
+	remaining_flow_num = tbl->flow_num;
+	find = 0;
+	for (i = 0; i < max_flow_num && remaining_flow_num; i++) {
+		if (tbl->flows[i].start_index != INVALID_ARRAY_INDEX) {
+			if (is_same_tcp4_flow(tbl->flows[i].key, key)) {
+				find = 1;
+				break;
+			}
+			remaining_flow_num--;
+		}
 	}

-	/* can't find a key, so insert a new key and a new item. */
-	if (i == tbl->max_key_num) {
-		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
-				INVALID_ARRAY_INDEX, start_time);
+	/*
+	 * Fail to find a matched flow. Insert a new flow and store the
+	 * packet into the flow.
+	 */
+	if (find == 0) {
+		item_idx = insert_new_item(tbl, pkt, start_time,
+				INVALID_ARRAY_INDEX, sent_seq, ip_id);
 		if (item_idx == INVALID_ARRAY_INDEX)
 			return -1;
-		if (insert_new_key(tbl, &key, item_idx) ==
+		if (insert_new_flow(tbl, &key, item_idx) ==
 				INVALID_ARRAY_INDEX) {
 			/*
-			 * fail to insert a new key, so
-			 * delete the inserted item
+			 * Fail to insert a new flow, so delete the
+			 * stored packet.
 			 */
 			delete_item(tbl, item_idx, INVALID_ARRAY_INDEX);
 			return -1;
@@ -375,24 +383,26 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
 		return 0;
 	}

-	/* traverse all packets in the item group to find one to merge */
-	cur_idx = tbl->keys[i].start_index;
+	/*
+	 * Check all packets in the flow and try to find a neighbor for
+	 * the input packet.
+	 */
+	cur_idx = tbl->flows[i].start_index;
 	prev_idx = cur_idx;
 	do {
 		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
-				pkt->l4_len, tcp_dl, ip_id, sent_seq);
+				sent_seq, ip_id, pkt->l4_len, tcp_dl);
 		if (cmp) {
 			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
-						pkt, ip_id,
-						sent_seq, cmp))
+						pkt, cmp, sent_seq, ip_id))
 				return 1;
 			/*
-			 * fail to merge two packets since the packet
-			 * length will be greater than the max value.
-			 * So insert the packet into the item group.
+			 * Fail to merge the two packets, as the packet
+			 * length is greater than the max value. Store
+			 * the packet into the flow.
 			 */
-			if (insert_new_item(tbl, pkt, ip_id, sent_seq,
-						prev_idx, start_time) ==
+			if (insert_new_item(tbl, pkt, start_time, prev_idx,
+						sent_seq, ip_id) ==
 					INVALID_ARRAY_INDEX)
 				return -1;
 			return 0;
@@ -401,12 +411,9 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
 		cur_idx = tbl->items[cur_idx].next_pkt_idx;
 	} while (cur_idx != INVALID_ARRAY_INDEX);

-	/*
-	 * can't find a packet in the item group to merge,
-	 * so insert the packet into the item group.
-	 */
-	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
-				start_time) == INVALID_ARRAY_INDEX)
+	/* Fail to find a neighbor, so store the packet into the flow. */
+	if (insert_new_item(tbl, pkt, start_time, prev_idx, sent_seq,
+				ip_id) == INVALID_ARRAY_INDEX)
 		return -1;

 	return 0;
@@ -420,44 +427,33 @@ gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
 {
 	uint16_t k = 0;
 	uint32_t i, j;
-	uint32_t max_key_num = tbl->max_key_num;
+	uint32_t max_flow_num = tbl->max_flow_num;

-	for (i = 0; i < max_key_num; i++) {
-		/* all keys have been checked, return immediately */
-		if (tbl->key_num == 0)
+	for (i = 0; i < max_flow_num; i++) {
+		if (unlikely(tbl->flow_num == 0))
 			return k;

-		j = tbl->keys[i].start_index;
+		j = tbl->flows[i].start_index;
 		while (j != INVALID_ARRAY_INDEX) {
 			if (tbl->items[j].start_time <= flush_timestamp) {
 				out[k++] = tbl->items[j].firstseg;
 				if (tbl->items[j].nb_merged > 1)
 					update_header(&(tbl->items[j]));
 				/*
-				 * delete the item and get
-				 * the next packet index
+				 * Delete the packet and get the next
+				 * packet in the flow.
 				 */
-				j = delete_item(tbl, j,
-						INVALID_ARRAY_INDEX);
+				j = delete_item(tbl, j, INVALID_ARRAY_INDEX);
+				tbl->flows[i].start_index = j;
+				if (j == INVALID_ARRAY_INDEX)
+					tbl->flow_num--;

-				/*
-				 * delete the key as all of
-				 * packets are flushed
-				 */
-				if (j == INVALID_ARRAY_INDEX) {
-					tbl->keys[i].start_index =
-						INVALID_ARRAY_INDEX;
-					tbl->key_num--;
-				} else
-					/* update start_index of the key */
-					tbl->keys[i].start_index = j;
-
-				if (k == nb_out)
+				if (unlikely(k == nb_out))
 					return k;
 			} else
 				/*
-				 * left packets of this key won't be
-				 * timeout, so go to check other keys.
+				 * The left packets in this flow won't be
+				 * timeout. Go to check other flows.
 				 */
 				break;
 		}
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
index d129523..49e03b4 100644
--- a/lib/librte_gro/gro_tcp4.h
+++ b/lib/librte_gro/gro_tcp4.h
@@ -9,13 +9,13 @@
 #define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)

 /*
- * the max L3 length of a TCP/IPv4 packet. The L3 length
- * is the sum of ipv4 header, tcp header and L4 payload.
+ * The max length of a IPv4 packet, which includes the length of the L3
+ * header, the L4 header and the data payload.
  */
-#define TCP4_MAX_L3_LENGTH UINT16_MAX
+#define MAX_IPV4_PKT_LENGTH UINT16_MAX

-/* criteria of mergeing packets */
-struct tcp4_key {
+/* Header fields representing a TCP/IPv4 flow */
+struct tcp4_flow_key {
 	struct ether_addr eth_saddr;
 	struct ether_addr eth_daddr;
 	uint32_t ip_src_addr;
@@ -26,41 +26,38 @@ struct tcp4_key {
 	uint16_t dst_port;
 };

-struct gro_tcp4_key {
-	struct tcp4_key key;
+struct gro_tcp4_flow {
+	struct tcp4_flow_key key;
 	/*
-	 * the index of the first packet in the item group.
-	 * If the value is INVALID_ARRAY_INDEX, it means
-	 * the key is empty.
+	 * The index of the first packet in the flow.
+	 * INVALID_ARRAY_INDEX indicates an empty flow.
 	 */
 	uint32_t start_index;
 };

 struct gro_tcp4_item {
 	/*
-	 * first segment of the packet. If the value
+	 * The first MBUF segment of the packet. If the value
 	 * is NULL, it means the item is empty.
 	 */
 	struct rte_mbuf *firstseg;
-	/* last segment of the packet */
+	/* The last MBUF segment of the packet */
 	struct rte_mbuf *lastseg;
 	/*
-	 * the time when the first packet is inserted
-	 * into the table. If a packet in the table is
-	 * merged with an incoming packet, this value
-	 * won't be updated. We set this value only
-	 * when the first packet is inserted into the
-	 * table.
+	 * The time when the first packet is inserted into the table.
+	 * This value won't be updated, even if the packet is merged
+	 * with other packets.
 	 */
 	uint64_t start_time;
 	/*
-	 * we use next_pkt_idx to chain the packets that
-	 * have same key value but can't be merged together.
+	 * next_pkt_idx is used to chain the packets that
+	 * are in the same flow but can't be merged together
+	 * (e.g. caused by packet reordering).
 	 */
 	uint32_t next_pkt_idx;
-	/* the sequence number of the packet */
+	/* TCP sequence number of the packet */
 	uint32_t sent_seq;
-	/* the IP ID of the packet */
+	/* IPv4 ID of the packet */
 	uint16_t ip_id;
 	/* the number of merged packets */
 	uint16_t nb_merged;
@@ -72,31 +69,31 @@ struct gro_tcp4_item {
 struct gro_tcp4_tbl {
 	/* item array */
 	struct gro_tcp4_item *items;
-	/* key array */
-	struct gro_tcp4_key *keys;
+	/* flow array */
+	struct gro_tcp4_flow *flows;
 	/* current item number */
 	uint32_t item_num;
-	/* current key num */
-	uint32_t key_num;
+	/* current flow num */
+	uint32_t flow_num;
 	/* item array size */
 	uint32_t max_item_num;
-	/* key array size */
-	uint32_t max_key_num;
+	/* flow array size */
+	uint32_t max_flow_num;
 };

 /**
  * This function creates a TCP/IPv4 reassembly table.
  *
  * @param socket_id
- *  socket index for allocating TCP/IPv4 reassemble table
+ *  Socket index for allocating the TCP/IPv4 reassemble table
  * @param max_flow_num
- *  the maximum number of flows in the TCP/IPv4 GRO table
+ *  The maximum number of flows in the TCP/IPv4 GRO table
  * @param max_item_per_flow
- *  the maximum packet number per flow.
+ *  The maximum number of packets per flow
  *
  * @return
- *  if create successfully, return a pointer which points to the
- *  created TCP/IPv4 GRO table. Otherwise, return NULL.
+ *  - Return the table pointer on success.
+ *  - Return NULL on failure.
  */
 void *gro_tcp4_tbl_create(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -106,62 +103,56 @@ void *gro_tcp4_tbl_create(uint16_t socket_id,
  * This function destroys a TCP/IPv4 reassembly table.
  *
  * @param tbl
- *  a pointer points to the TCP/IPv4 reassembly table.
+ *  Pointer pointing to the TCP/IPv4 reassembly table.
  */
 void gro_tcp4_tbl_destroy(void *tbl);

 /**
- * This function searches for a packet in the TCP/IPv4 reassembly table
- * to merge with the inputted one. To merge two packets is to chain them
- * together and update packet headers. Packets, whose SYN, FIN, RST, PSH
- * CWR, ECE or URG bit is set, are returned immediately. Packets which
- * only have packet headers (i.e. without data) are also returned
- * immediately. Otherwise, the packet is either merged, or inserted into
- * the table. Besides, if there is no available space to insert the
- * packet, this function returns immediately too.
+ * This function merges a TCP/IPv4 packet. It doesn't process the packet,
+ * which has SYN, FIN, RST, PSH, CWR, ECE or URG set, or doesn't have
+ * payload.
  *
- * This function assumes the inputted packet is with correct IPv4 and
- * TCP checksums. And if two packets are merged, it won't re-calculate
- * IPv4 and TCP checksums. Besides, if the inputted packet is IP
- * fragmented, it assumes the packet is complete (with TCP header).
+ * This function doesn't check if the packet has correct checksums and
+ * doesn't re-calculate checksums for the merged packet. Additionally,
+ * it assumes the packets are complete (i.e., MF==0 && frag_off==0),
+ * when IP fragmentation is possible (i.e., DF==0). It returns the
+ * packet, if the packet has invalid parameters (e.g. SYN bit is set)
+ * or there is no available space in the table.
  *
  * @param pkt
- *  packet to reassemble.
+ *  Packet to reassemble
  * @param tbl
- *  a pointer that points to a TCP/IPv4 reassembly table.
+ *  Pointer pointing to the TCP/IPv4 reassembly table
  * @start_time
- *  the start time that the packet is inserted into the table
+ *  The time when the packet is inserted into the table
  *
  * @return
- *  if the packet doesn't have data, or SYN, FIN, RST, PSH, CWR, ECE
- *  or URG bit is set, or there is no available space in the table to
- *  insert a new item or a new key, return a negative value. If the
- *  packet is merged successfully, return an positive value. If the
- *  packet is inserted into the table, return 0.
+ *  - Return a positive value if the packet is merged.
+ *  - Return zero if the packet isn't merged but stored in the table.
+ * - Return a negative value for invalid parameters or no available + * space in the table. */ int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt, struct gro_tcp4_tbl *tbl, uint64_t start_time); /** - * This function flushes timeout packets in a TCP/IPv4 reassembly table - * to applications, and without updating checksums for merged packets. - * The max number of flushed timeout packets is the element number of - * the array which is used to keep flushed packets. + * This function flushes timeout packets in a TCP/IPv4 reassembly table, + * and without updating checksums. * * @param tbl - * a pointer that points to a TCP GRO table. + * TCP/IPv4 reassembly table pointer * @param flush_timestamp - * this function flushes packets which are inserted into the table - * before or at the flush_timestamp. + * Flush packets which are inserted into the table before or at the + * flush_timestamp. * @param out - * pointer array which is used to keep flushed packets. + * Pointer array used to keep flushed packets * @param nb_out - * the element number of out. It's also the max number of timeout + * The element number in 'out'. It also determines the maximum number of * packets that can be flushed finally. * * @return - * the number of packets that are returned. + * The number of flushed packets */ uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl, uint64_t flush_timestamp, @@ -173,10 +164,10 @@ uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl, * reassembly table. * * @param tbl - * pointer points to a TCP/IPv4 reassembly table. 
+ * TCP/IPv4 reassembly table pointer * * @return - * the number of packets in the table + * The number of packets in the table */ uint32_t gro_tcp4_tbl_pkt_count(void *tbl); #endif diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c index d6b8cd1..0b64866 100644 --- a/lib/librte_gro/rte_gro.c +++ b/lib/librte_gro/rte_gro.c @@ -23,11 +23,14 @@ static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = { static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = { gro_tcp4_tbl_pkt_count, NULL}; +#define IS_IPV4_TCP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \ + ((ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP)) + /* - * GRO context structure, which is used to merge packets. It keeps - * many reassembly tables of desired GRO types. Applications need to - * create GRO context objects before using rte_gro_reassemble to - * perform GRO. + * GRO context structure. It keeps the table structures, which are + * used to merge packets, for different GRO types. Before using + * rte_gro_reassemble(), applications need to create the GRO context + * first. 
*/ struct gro_ctx { /* GRO types to perform */ @@ -85,8 +88,6 @@ rte_gro_ctx_destroy(void *ctx) uint64_t gro_type_flag; uint8_t i; - if (gro_ctx == NULL) - return; for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) { gro_type_flag = 1ULL << i; if ((gro_ctx->gro_types & gro_type_flag) == 0) @@ -103,62 +104,54 @@ rte_gro_reassemble_burst(struct rte_mbuf **pkts, uint16_t nb_pkts, const struct rte_gro_param *param) { - uint16_t i; - uint16_t nb_after_gro = nb_pkts; - uint32_t item_num; - /* allocate a reassembly table for TCP/IPv4 GRO */ struct gro_tcp4_tbl tcp_tbl; - struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM]; + struct gro_tcp4_flow tcp_flows[RTE_GRO_MAX_BURST_ITEM_NUM]; struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {{0} }; struct rte_mbuf *unprocess_pkts[nb_pkts]; - uint16_t unprocess_num = 0; + uint32_t item_num; int32_t ret; - uint64_t current_time; + uint16_t i, unprocess_num = 0, nb_after_gro = nb_pkts; - if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0) + if (unlikely((param->gro_types & RTE_GRO_TCP_IPV4) == 0)) return nb_pkts; - /* get the actual number of packets */ + /* Get the maximum number of packets */ item_num = RTE_MIN(nb_pkts, (param->max_flow_num * - param->max_item_per_flow)); + param->max_item_per_flow)); item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM); for (i = 0; i < item_num; i++) - tcp_keys[i].start_index = INVALID_ARRAY_INDEX; + tcp_flows[i].start_index = INVALID_ARRAY_INDEX; - tcp_tbl.keys = tcp_keys; + tcp_tbl.flows = tcp_flows; tcp_tbl.items = tcp_items; - tcp_tbl.key_num = 0; + tcp_tbl.flow_num = 0; tcp_tbl.item_num = 0; - tcp_tbl.max_key_num = item_num; + tcp_tbl.max_flow_num = item_num; tcp_tbl.max_item_num = item_num; - current_time = rte_rdtsc(); - for (i = 0; i < nb_pkts; i++) { - if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 | - RTE_PTYPE_L4_TCP)) == - (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) { - ret = gro_tcp4_reassemble(pkts[i], - &tcp_tbl, - current_time); + if (IS_IPV4_TCP_PKT(pkts[i]->packet_type)) { + 
/* + * The timestamp is ignored, since all packets + * will be flushed from the tables. + */ + ret = gro_tcp4_reassemble(pkts[i], &tcp_tbl, 0); if (ret > 0) /* merge successfully */ nb_after_gro--; - else if (ret < 0) { - unprocess_pkts[unprocess_num++] = - pkts[i]; - } + else if (ret < 0) + unprocess_pkts[unprocess_num++] = pkts[i]; } else unprocess_pkts[unprocess_num++] = pkts[i]; } - /* re-arrange GROed packets */ if (nb_after_gro < nb_pkts) { - i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, current_time, - pkts, nb_pkts); + /* Flush all packets from the tables */ + i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0, pkts, nb_pkts); + /* Copy unprocessed packets */ if (unprocess_num > 0) { memcpy(&pkts[i], unprocess_pkts, sizeof(struct rte_mbuf *) * @@ -174,31 +167,28 @@ rte_gro_reassemble(struct rte_mbuf **pkts, uint16_t nb_pkts, void *ctx) { - uint16_t i, unprocess_num = 0; struct rte_mbuf *unprocess_pkts[nb_pkts]; struct gro_ctx *gro_ctx = ctx; + void *tcp_tbl; uint64_t current_time; + uint16_t i, unprocess_num = 0; - if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0) + if (unlikely((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)) return nb_pkts; + tcp_tbl = gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX]; current_time = rte_rdtsc(); for (i = 0; i < nb_pkts; i++) { - if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 | - RTE_PTYPE_L4_TCP)) == - (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) { - if (gro_tcp4_reassemble(pkts[i], - gro_ctx->tbls - [RTE_GRO_TCP_IPV4_INDEX], + if (IS_IPV4_TCP_PKT(pkts[i]->packet_type)) { + if (gro_tcp4_reassemble(pkts[i], tcp_tbl, current_time) < 0) unprocess_pkts[unprocess_num++] = pkts[i]; } else unprocess_pkts[unprocess_num++] = pkts[i]; } if (unprocess_num > 0) { - memcpy(pkts, unprocess_pkts, - sizeof(struct rte_mbuf *) * + memcpy(pkts, unprocess_pkts, sizeof(struct rte_mbuf *) * unprocess_num); } @@ -224,6 +214,7 @@ rte_gro_timeout_flush(void *ctx, flush_timestamp, out, max_nb_out); } + return 0; } @@ -232,19 +223,20 @@ rte_gro_get_pkt_count(void *ctx) { struct 
gro_ctx *gro_ctx = ctx; gro_tbl_pkt_count_fn pkt_count_fn; + uint64_t gro_types = gro_ctx->gro_types, flag; uint64_t item_num = 0; - uint64_t gro_type_flag; uint8_t i; - for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) { - gro_type_flag = 1ULL << i; - if ((gro_ctx->gro_types & gro_type_flag) == 0) + for (i = 0; i < RTE_GRO_TYPE_MAX_NUM && gro_types; i++) { + flag = 1ULL << i; + if ((gro_types & flag) == 0) continue; + gro_types ^= flag; pkt_count_fn = tbl_pkt_count_fn[i]; - if (pkt_count_fn == NULL) - continue; - item_num += pkt_count_fn(gro_ctx->tbls[i]); + if (pkt_count_fn) + item_num += pkt_count_fn(gro_ctx->tbls[i]); } + return item_num; } diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h index 81a2eac..85d8143 100644 --- a/lib/librte_gro/rte_gro.h +++ b/lib/librte_gro/rte_gro.h @@ -31,8 +31,8 @@ extern "C" { /**< TCP/IPv4 GRO flag */ /** - * A structure which is used to create GRO context objects or tell - * rte_gro_reassemble_burst() what reassembly rules are demanded. + * Structure used to create GRO context objects or used to pass + * application-determined parameters to rte_gro_reassemble_burst(). */ struct rte_gro_param { uint64_t gro_types; @@ -78,26 +78,23 @@ void rte_gro_ctx_destroy(void *ctx); /** * This is one of the main reassembly APIs, which merges numbers of - * packets at a time. It assumes that all inputted packets are with - * correct checksums. That is, applications should guarantee all - * inputted packets are correct. Besides, it doesn't re-calculate - * checksums for merged packets. If inputted packets are IP fragmented, - * this function assumes them are complete (i.e. with L4 header). After - * finishing processing, it returns all GROed packets to applications - * immediately. + * packets at a time. It doesn't check if input packets have correct + * checksums and doesn't re-calculate checksums for merged packets. 
+ * It assumes the packets are complete (i.e., MF==0 && frag_off==0), + * when IP fragmentation is possible (i.e., DF==0). The GROed packets + * are returned as soon as the function finishes. * * @param pkts - * a pointer array which points to the packets to reassemble. Besides, - * it keeps mbuf addresses for the GROed packets. + * Pointer array pointing to the packets to reassemble. Besides, it + * keeps MBUF addresses for the GROed packets. * @param nb_pkts - * the number of packets to reassemble. + * The number of packets to reassemble * @param param - * applications use it to tell rte_gro_reassemble_burst() what rules - * are demanded. + * Application-determined parameters for reassembling packets. * * @return - * the number of packets after been GROed. If no packets are merged, - * the returned value is nb_pkts. + * The number of packets after being GROed. If no packets are merged, + * the return value is equal to nb_pkts. */ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts, uint16_t nb_pkts, @@ -107,32 +104,28 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts, * @warning * @b EXPERIMENTAL: this API may change without prior notice * - * Reassembly function, which tries to merge inputted packets with - * the packets in the reassembly tables of a given GRO context. This - * function assumes all inputted packets are with correct checksums. - * And it won't update checksums if two packets are merged. Besides, - * if inputted packets are IP fragmented, this function assumes they - * are complete packets (i.e. with L4 header). + * Reassembly function, which tries to merge input packets with the + * existing packets in the reassembly tables of a given GRO context. + * It doesn't check if input packets have correct checksums and doesn't + * re-calculate checksums for merged packets. Additionally, it assumes + * the packets are complete (i.e., MF==0 && frag_off==0), when IP + * fragmentation is possible (i.e., DF==0).
* - * If the inputted packets don't have data or are with unsupported GRO - * types etc., they won't be processed and are returned to applications. - * Otherwise, the inputted packets are either merged or inserted into - * the table. If applications want get packets in the table, they need - * to call flush API. + * If the input packets have invalid parameters (e.g. no data payload, + * unsupported GRO types), they are returned to applications. Otherwise, + * they are either merged or inserted into the table. Applications need + * to flush packets from the tables with the flush API if they want to get + * the GROed packets. * * @param pkts - * packet to reassemble. Besides, after this function finishes, it - * keeps the unprocessed packets (e.g. without data or unsupported - * GRO types). + * Packets to reassemble. It's also used to store the unprocessed packets. * @param nb_pkts - * the number of packets to reassemble. + * The number of packets to reassemble * @param ctx - * a pointer points to a GRO context object. + * GRO context object pointer * * @return - * return the number of unprocessed packets (e.g. without data or - * unsupported GRO types). If all packets are processed (merged or - * inserted into the table), return 0. + * The number of unprocessed packets. */ uint16_t rte_gro_reassemble(struct rte_mbuf **pkts, uint16_t nb_pkts, @@ -142,29 +135,28 @@ uint16_t rte_gro_reassemble(struct rte_mbuf **pkts, * @warning * @b EXPERIMENTAL: this API may change without prior notice * - * This function flushes the timeout packets from reassembly tables of - * desired GRO types. The max number of flushed timeout packets is the - * element number of the array which is used to keep the flushed packets. + * This function flushes the timeout packets from the reassembly tables + * of desired GRO types. The max number of flushed packets is the + * element number of 'out'. * - * Besides, this function won't re-calculate checksums for merged - * packets in the tables.
That is, the returned packets may be with - * wrong checksums. + * Additionally, the flushed packets may have incorrect checksums, since + * this function doesn't re-calculate checksums for merged packets. * * @param ctx - * a pointer points to a GRO context object. + * GRO context object pointer. * @param timeout_cycles - * max TTL for packets in reassembly tables, measured in nanosecond. + * The max TTL for packets in reassembly tables, measured in nanosecond. * @param gro_types - * this function only flushes packets which belong to the GRO types - * specified by gro_types. + * This function flushes packets whose GRO types are specified by + * gro_types. * @param out - * a pointer array that is used to keep flushed timeout packets. + * Pointer array used to keep flushed packets. * @param max_nb_out - * the element number of out. It's also the max number of timeout + * The element number of 'out'. It's also the max number of timeout * packets that can be flushed finally. * * @return - * the number of flushed packets. If no packets are flushed, return 0. + * The number of flushed packets. */ uint16_t rte_gro_timeout_flush(void *ctx, uint64_t timeout_cycles, @@ -180,10 +172,10 @@ uint16_t rte_gro_timeout_flush(void *ctx, * of a given GRO context. * * @param ctx - * pointer points to a GRO context object. + * GRO context object pointer. * * @return - * the number of packets in all reassembly tables. + * The number of packets in the tables. */ uint64_t rte_gro_get_pkt_count(void *ctx);
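For readers following the timeout-flush semantics documented in this patch, the walk over the flow array can be illustrated with a small standalone sketch. It is not DPDK code and not part of the patch: every sketch_* name and SKETCH_* constant is hypothetical, an int packet id stands in for struct rte_mbuf *, and header updates for merged packets are left out. It assumes each flow chain is ordered by insertion time, which is why the loop stops at the first item that has not timed out. The early return for nb_out == 0 is an addition of the sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this only models the table layout. */
#define SKETCH_INVALID_INDEX 0xffffffffU
#define SKETCH_MAX_ITEMS 8
#define SKETCH_MAX_FLOWS 4

struct sketch_item {
	int pkt_id;          /* stand-in for struct rte_mbuf *firstseg */
	uint64_t start_time; /* time the packet entered the table */
	uint32_t next_idx;   /* next packet in the same flow */
	int in_use;
};

struct sketch_tbl {
	struct sketch_item items[SKETCH_MAX_ITEMS];
	uint32_t flow_start[SKETCH_MAX_FLOWS]; /* first item of each flow */
	uint32_t flow_num;
	uint32_t item_num;
};

static void sketch_tbl_init(struct sketch_tbl *tbl)
{
	uint32_t i;

	for (i = 0; i < SKETCH_MAX_ITEMS; i++) {
		tbl->items[i].in_use = 0;
		tbl->items[i].next_idx = SKETCH_INVALID_INDEX;
	}
	for (i = 0; i < SKETCH_MAX_FLOWS; i++)
		tbl->flow_start[i] = SKETCH_INVALID_INDEX;
	tbl->flow_num = 0;
	tbl->item_num = 0;
}

/* Append a packet to the tail of a flow chain. Returns 0, or -1 if full. */
static int sketch_insert(struct sketch_tbl *tbl, uint32_t flow, int pkt_id,
		uint64_t start_time)
{
	uint32_t slot, j;

	if (flow >= SKETCH_MAX_FLOWS)
		return -1;
	for (slot = 0; slot < SKETCH_MAX_ITEMS; slot++)
		if (!tbl->items[slot].in_use)
			break;
	if (slot == SKETCH_MAX_ITEMS)
		return -1;

	tbl->items[slot].pkt_id = pkt_id;
	tbl->items[slot].start_time = start_time;
	tbl->items[slot].next_idx = SKETCH_INVALID_INDEX;
	tbl->items[slot].in_use = 1;
	tbl->item_num++;

	if (tbl->flow_start[flow] == SKETCH_INVALID_INDEX) {
		tbl->flow_start[flow] = slot;
		tbl->flow_num++;
		return 0;
	}
	for (j = tbl->flow_start[flow];
			tbl->items[j].next_idx != SKETCH_INVALID_INDEX;
			j = tbl->items[j].next_idx)
		;
	tbl->items[j].next_idx = slot;
	return 0;
}

/* Flush items inserted at or before flush_timestamp, at most nb_out. */
static uint16_t sketch_timeout_flush(struct sketch_tbl *tbl,
		uint64_t flush_timestamp, int *out, uint16_t nb_out)
{
	uint16_t k = 0;
	uint32_t i, j, next;

	assert(tbl != NULL && out != NULL);
	if (nb_out == 0) /* guard added by this sketch */
		return 0;

	for (i = 0; i < SKETCH_MAX_FLOWS; i++) {
		if (tbl->flow_num == 0)
			return k;
		j = tbl->flow_start[i];
		while (j != SKETCH_INVALID_INDEX) {
			/* Later items in the chain are newer, so stop here. */
			if (tbl->items[j].start_time > flush_timestamp)
				break;
			out[k++] = tbl->items[j].pkt_id;
			next = tbl->items[j].next_idx;
			tbl->items[j].in_use = 0;
			tbl->items[j].next_idx = SKETCH_INVALID_INDEX;
			tbl->item_num--;
			/* The flow now starts at the next item, if any. */
			j = next;
			tbl->flow_start[i] = j;
			if (j == SKETCH_INVALID_INDEX)
				tbl->flow_num--;
			if (k == nb_out)
				return k;
		}
	}
	return k;
}
```

In this model, a caller initialises the table with sketch_tbl_init(), adds packets with sketch_insert(), and then calls sketch_timeout_flush() with a timestamp and an output array. A timestamp no smaller than every start_time drains the whole table, provided the output array is large enough.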