From patchwork Tue Nov 14 16:04:40 2017
X-Patchwork-Submitter: Alan Dewar
X-Patchwork-Id: 31367
From: alangordondewar@gmail.com
X-Google-Original-From: alan.dewar@att.com
To: cristian.dumitrescu@intel.com
Cc: dev@dpdk.org, chas3@att.com, luca.boccassi@att.com, alan.robertson@att.com,
 Alan Dewar
Date: Tue, 14 Nov 2017 16:04:40 +0000
Message-Id: <1510675480-14456-1-git-send-email-alan.dewar@att.com>
X-Mailer: git-send-email 2.1.4
Subject: [dpdk-dev] [PATCH] sched: fix overflow errors in WRR weighting code

From: Alan Dewar

The WRR code calculates the lowest common denominator (LCD) of the four
WRR weights as a uint32_t value, divides the LCD by each of the WRR
weights and casts the results into uint8_t values.  This casting can
cause the ratios of the computed WRR costs to be wrong.

For example, with WRR weights of 3, 5, 7 and 11 the LCD is computed to
be 1155.  The WRR costs get computed as: 1155/3 = 385, 1155/5 = 231,
1155/7 = 165 and 1155/11 = 105.  When the value 385 is cast into a
uint8_t it ends up as 129 (385 mod 256), so the intended cost ratios
are lost.
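The truncation is easy to reproduce outside the scheduler.  The snippet
below is a standalone sketch using only the example numbers above; it is
not code from the patch:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Example weights 3, 5, 7 and 11 from the commit message. */
	uint32_t lcd = 1155;
	uint32_t cost32[4] = { lcd / 3, lcd / 5, lcd / 7, lcd / 11 };
	int i;

	for (i = 0; i < 4; i++) {
		uint8_t cost8 = (uint8_t)cost32[i];  /* what the old code did */
		printf("%u -> %u\n", (unsigned)cost32[i], (unsigned)cost8);
	}
	/* Prints "385 -> 129" for the first queue, so the
	 * 385:231:165:105 cost ratio is destroyed. */
	return 0;
}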
Rather than casting straight into a uint8_t, this patch shifts the
computed WRR costs right so that the largest value is only eight bits
wide.

In grinder_schedule, the packet length is multiplied by the WRR cost and
added to the grinder's wrr_tokens value.  The grinder's wrr_tokens field
is a uint16_t, so the combination of a packet length of 1500 bytes and a
WRR cost of 44 will overflow this field on the first packet
(1500 * 44 = 66000, which does not fit in 16 bits).  This patch increases
the width of the grinder's wrr_tokens and wrr_mask fields from uint16_t
to uint32_t.

In grinder_wrr_store, the remaining tokens in the grinder's wrr_tokens
array are copied into the appropriate pipe's wrr_tokens array.  However,
the pipe's wrr_tokens array is only a uint8_t array, so unused tokens
were quite frequently lost, which upset the balance of traffic across
the four WRR queues.  This patch increases the width of the pipe's
wrr_tokens array from uint8_t to uint32_t.

Signed-off-by: Alan Dewar
Reviewed-by: Luca Boccassi
---
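As a rough standalone illustration (not part of the patch) of how the
right shift keeps the example costs inside eight bits while roughly
preserving their ratios, mirroring the rte_sched_reduce_to_byte() helper
added below:

#include <stdint.h>
#include <stdio.h>

/* Local copy of the shift calculation introduced by the patch: count how
 * many right shifts are needed until the value fits in eight bits. */
static uint32_t reduce_to_byte(uint32_t value)
{
	uint32_t shift = 0;

	while (value & 0xFFFFFF00) {
		value >>= 1;
		shift++;
	}
	return shift;
}

int main(void)
{
	/* LCD(3, 5, 7, 11) = 1155 divided by each weight. */
	uint32_t wrr_cost[4] = { 385, 231, 165, 105 };
	uint32_t shift = reduce_to_byte(wrr_cost[0]);  /* 385 is the largest */
	int i;

	for (i = 0; i < 4; i++)
		printf("%u -> %u\n", (unsigned)wrr_cost[i],
		       (unsigned)(uint8_t)(wrr_cost[i] >> shift));
	/* Prints 385 -> 192, 231 -> 115, 165 -> 82, 105 -> 52: every cost
	 * now fits in a uint8_t and the ratios stay close to the originals. */
	return 0;
}

In the patch itself the shift is derived from wrr_cost[low_pos], i.e. the
cost belonging to the smallest weight, which is always the largest of the
four costs.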
 lib/librte_sched/rte_sched.c        | 57 ++++++++++++++++++++++++++-----------
 lib/librte_sched/rte_sched_common.h | 15 ++++++++++
 2 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 7252f85..a65c98f 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -130,7 +130,7 @@ struct rte_sched_pipe {
 	uint32_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
 
 	/* Weighted Round Robin (WRR) */
-	uint8_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
+	uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
 
 	/* TC oversubscription */
 	uint32_t tc_ov_credits;
@@ -205,8 +205,8 @@ struct rte_sched_grinder {
 	struct rte_mbuf *pkt;
 
 	/* WRR */
-	uint16_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
-	uint16_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
+	uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
+	uint32_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
 	uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
 };
 
@@ -542,6 +542,17 @@ rte_sched_time_ms_to_bytes(uint32_t time_ms, uint32_t rate)
 	return time;
 }
 
+static uint32_t rte_sched_reduce_to_byte(uint32_t value)
+{
+	uint32_t shift = 0;
+
+	while (value & 0xFFFFFF00) {
+		value >>= 1;
+		shift++;
+	}
+	return shift;
+}
+
 static void
 rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte_sched_port_params *params)
 {
@@ -583,6 +594,8 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte
 			uint32_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
 			uint32_t lcd, lcd1, lcd2;
 			uint32_t qindex;
+			uint32_t low_pos;
+			uint32_t shift;
 
 			qindex = j * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
 
@@ -594,16 +607,22 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte
 			lcd1 = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
 			lcd2 = rte_get_lcd(wrr_cost[2], wrr_cost[3]);
 			lcd = rte_get_lcd(lcd1, lcd2);
+			low_pos = rte_min_pos_4_u32(wrr_cost);
 
 			wrr_cost[0] = lcd / wrr_cost[0];
 			wrr_cost[1] = lcd / wrr_cost[1];
 			wrr_cost[2] = lcd / wrr_cost[2];
 			wrr_cost[3] = lcd / wrr_cost[3];
 
-			dst->wrr_cost[qindex] = (uint8_t) wrr_cost[0];
-			dst->wrr_cost[qindex + 1] = (uint8_t) wrr_cost[1];
-			dst->wrr_cost[qindex + 2] = (uint8_t) wrr_cost[2];
-			dst->wrr_cost[qindex + 3] = (uint8_t) wrr_cost[3];
+			shift = rte_sched_reduce_to_byte(wrr_cost[low_pos]);
+			dst->wrr_cost[qindex] = (uint8_t) (wrr_cost[0] >>
+				shift);
+			dst->wrr_cost[qindex + 1] = (uint8_t) (wrr_cost[1] >>
+				shift);
+			dst->wrr_cost[qindex + 2] = (uint8_t) (wrr_cost[2] >>
+				shift);
+			dst->wrr_cost[qindex + 3] = (uint8_t) (wrr_cost[3] >>
+				shift);
 		}
 
 		rte_sched_port_log_pipe_profile(port, i);
@@ -1941,15 +1960,19 @@ grinder_wrr_load(struct rte_sched_port *port, uint32_t pos)
 
 	qindex = tc_index * 4;
 
-	grinder->wrr_tokens[0] = ((uint16_t) pipe->wrr_tokens[qindex]) << RTE_SCHED_WRR_SHIFT;
-	grinder->wrr_tokens[1] = ((uint16_t) pipe->wrr_tokens[qindex + 1]) << RTE_SCHED_WRR_SHIFT;
-	grinder->wrr_tokens[2] = ((uint16_t) pipe->wrr_tokens[qindex + 2]) << RTE_SCHED_WRR_SHIFT;
-	grinder->wrr_tokens[3] = ((uint16_t) pipe->wrr_tokens[qindex + 3]) << RTE_SCHED_WRR_SHIFT;
+	grinder->wrr_tokens[0] = pipe->wrr_tokens[qindex] <<
+		RTE_SCHED_WRR_SHIFT;
+	grinder->wrr_tokens[1] = pipe->wrr_tokens[qindex + 1] <<
+		RTE_SCHED_WRR_SHIFT;
+	grinder->wrr_tokens[2] = pipe->wrr_tokens[qindex + 2] <<
+		RTE_SCHED_WRR_SHIFT;
+	grinder->wrr_tokens[3] = pipe->wrr_tokens[qindex + 3] <<
+		RTE_SCHED_WRR_SHIFT;
 
-	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFF;
-	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFF;
-	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFF;
-	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFF;
+	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFFFFFF;
+	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFFFFFF;
+	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFFFFFF;
+	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFFFFFF;
 
 	grinder->wrr_cost[0] = pipe_params->wrr_cost[qindex];
 	grinder->wrr_cost[1] = pipe_params->wrr_cost[qindex + 1];
@@ -1981,14 +2004,14 @@ static inline void
 grinder_wrr(struct rte_sched_port *port, uint32_t pos)
 {
 	struct rte_sched_grinder *grinder = port->grinder + pos;
-	uint16_t wrr_tokens_min;
+	uint32_t wrr_tokens_min;
 
 	grinder->wrr_tokens[0] |= ~grinder->wrr_mask[0];
 	grinder->wrr_tokens[1] |= ~grinder->wrr_mask[1];
 	grinder->wrr_tokens[2] |= ~grinder->wrr_mask[2];
 	grinder->wrr_tokens[3] |= ~grinder->wrr_mask[3];
 
-	grinder->qpos = rte_min_pos_4_u16(grinder->wrr_tokens);
+	grinder->qpos = rte_min_pos_4_u32(grinder->wrr_tokens);
 	wrr_tokens_min = grinder->wrr_tokens[grinder->qpos];
 
 	grinder->wrr_tokens[0] -= wrr_tokens_min;
diff --git a/lib/librte_sched/rte_sched_common.h b/lib/librte_sched/rte_sched_common.h
index aed144b..a3c6bc2 100644
--- a/lib/librte_sched/rte_sched_common.h
+++ b/lib/librte_sched/rte_sched_common.h
@@ -77,6 +77,21 @@ rte_min_pos_4_u16(uint16_t *x)
 	return pos0;
 }
 
+static inline uint32_t
+rte_min_pos_4_u32(uint32_t *x)
+{
+	uint32_t pos0 = 0;
+	uint32_t pos1 = 2;
+
+	if (x[1] <= x[0])
+		pos0 = 1;
+	if (x[3] <= x[2])
+		pos1 = 3;
+	if (x[pos1] <= x[pos0])
+		pos0 = pos1;
+
+	return pos0;
+}
 
 #endif /*
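For reference, a standalone check (not part of the patch) of the
behaviour expected from the new rte_min_pos_4_u32() helper added above:
it returns the index of the smallest of four uint32_t values, which
grinder_wrr() uses to select the next queue to serve.

#include <assert.h>
#include <stdint.h>

/* Local copy of the helper added to rte_sched_common.h above. */
static inline uint32_t
rte_min_pos_4_u32(uint32_t *x)
{
	uint32_t pos0 = 0;
	uint32_t pos1 = 2;

	if (x[1] <= x[0])
		pos0 = 1;
	if (x[3] <= x[2])
		pos1 = 3;
	if (x[pos1] <= x[pos0])
		pos0 = pos1;

	return pos0;
}

int main(void)
{
	/* Token counts larger than 16 bits, as now allowed by the patch. */
	uint32_t wrr_tokens[4] = { 66000, 120000, 31, 900000 };

	/* Queue 2 holds the fewest tokens, so it is served next. */
	assert(rte_min_pos_4_u32(wrr_tokens) == 2);
	return 0;
}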