Message ID | 1538917054-68283-1-git-send-email-orika@mellanox.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org>
From: Ori Kam <orika@mellanox.com>
To: arybchenko@solarflare.com, ferruh.yigit@intel.com, stephen@networkplumber.org, adrien.mazarguil@6wind.com
Cc: dev@dpdk.org, dekelp@mellanox.com, thomas@monjalon.net, nelio.laranjeiro@6wind.com, yskoh@mellanox.com, orika@mellanox.com, shahafs@mellanox.com
Date: Sun, 7 Oct 2018 12:57:31 +0000
Message-Id: <1538917054-68283-1-git-send-email-orika@mellanox.com>
In-Reply-To: <1537995646-95260-1-git-send-email-orika@mellanox.com>
References: <1537995646-95260-1-git-send-email-orika@mellanox.com>
Subject: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
List-Id: DPDK patches and discussions <dev.dpdk.org> |
Series | ethdev: add generic L2/L3 tunnel encapsulation actions |
Message
Ori Kam
Oct. 7, 2018, 12:57 p.m. UTC
This series implements the generic L2/L3 tunnel encapsulation actions and is based on RFC [1] "add generic L2/L3 tunnel encapsulation actions".

Currently the encap/decap actions only support encapsulation of VXLAN and NVGRE L2 packets (L2 encapsulation is where the inner packet has a valid Ethernet header, while L3 encapsulation is where the inner packet doesn't have an Ethernet header). In addition, the parameter to the encap action is a list of rte_flow items, which results in two extra translations: from the application to the action, and from the action to the NIC. This has a negative impact on insertion performance.

Looking forward, there is going to be a need to support many more tunnel encapsulations, for example MPLSoGRE and MPLSoUDP. Adding each new encapsulation will result in duplicated code: the code for handling NVGRE and VXLAN is exactly the same, and each new tunnel would repeat the same exact structure.

This series introduces a generic encapsulation for L2 tunnel types and a generic encapsulation for L3 tunnel types. In addition, the new encapsulation commands use a raw buffer in order to save the conversion time, both for the application and for the PMD.

[1] https://mails.dpdk.org/archives/dev/2018-August/109944.html

v3:
* rebase on tip.

v2:
* add missing decap_l3 structure.
* fix typo.

Ori Kam (3):
  ethdev: add generic L2/L3 tunnel encapsulation actions
  app/testpmd: convert testpmd encap commands to new API
  ethdev: remove vxlan and nvgre encapsulation commands

 app/test-pmd/cmdline_flow.c        | 292 +++++++++++++++++--------------------
 app/test-pmd/config.c              |   2 -
 doc/guides/prog_guide/rte_flow.rst | 115 ++++++---------
 lib/librte_ethdev/rte_flow.c       |  44 +-----
 lib/librte_ethdev/rte_flow.h       | 108 ++++++--------
 5 files changed, 231 insertions(+), 330 deletions(-)
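For readers skimming the archive, a minimal sketch of what the proposed raw-buffer actions look like; the field names below are illustrative assumptions, not the exact definitions from the patch:

    #include <stdint.h>

    /*
     * Assumed shape of the proposed generic actions: the application hands
     * the PMD the outer header stack as raw bytes instead of a list of
     * rte_flow items. Field names are illustrative, not the patch's own.
     */
    struct rte_flow_action_tunnel_encap {
        uint8_t *buf;   /* outer headers as raw bytes, outermost first */
        uint16_t size;  /* number of valid bytes in buf */
    };

    /*
     * L3 decap additionally carries the inner Ethernet (and optional VLAN)
     * header to prepend once the tunnel headers are stripped, since an L3
     * tunnel's inner packet has no Ethernet header of its own.
     */
    struct rte_flow_action_tunnel_decap_l3 {
        uint8_t *buf;   /* inner L2 header to restore */
        uint16_t size;
    };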
Comments
On 10/7/2018 1:57 PM, Ori Kam wrote:
> This series implements the generic L2/L3 tunnel encapsulation actions
> and is based on RFC [1] "add generic L2/L3 tunnel encapsulation actions"
<snip>

Reminder of this patchset, any reviews welcome.
On 10/9/18 7:48 PM, Ferruh Yigit wrote:
> On 10/7/2018 1:57 PM, Ori Kam wrote:
>> This series implements the generic L2/L3 tunnel encapsulation actions
>> and is based on RFC [1] "add generic L2/L3 tunnel encapsulation actions"
<snip>
> Reminder of this patchset, any reviews welcome.

I've added the author of the previous actions in recipients.

I like the idea to generalize the encap/decap actions. It makes it a bit harder for a reader to find which encap/decap actions are in fact supported, but it changes nothing for automated usage in the code - just try it (the generic way used in rte_flow).

Arguments about the way encap/decap headers are specified (flow items vs. raw) sound sensible, but I'm not sure about it. It would be simpler if the tunnel header were appended or removed as is, but as I understand it that is not true. For example, the IPv4 ID will be different in incoming packets to be decapsulated, and different values should be used on encapsulation. Checksums will be different (but offloaded in any case).

The current way allows specifying which fields do not matter and which must match. It allows saying that, for example, a VNI match is sufficient to decapsulate.

Also, the arguments assume that the action input is accepted as is by the HW. That could be true, but could be obviously false: the HW interface may require parsed input (i.e. the driver must parse the input buffer and extract the required fields of the packet headers).

So, I'd say no. It should be better motivated if we change the existing approach (even one advertised as experimental).
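For context, the existing item-based interface being defended here (DPDK 18.08) looks like the following; a minimal sketch in which the mask marks only the VNI as significant:

    #include <rte_flow.h>

    /* The existing VXLAN encap action points at an item array terminated
     * by RTE_FLOW_ITEM_TYPE_END; per-item masks mark which fields matter. */
    static struct rte_flow_item_vxlan vxlan_spec = {
        .vni = { 0x12, 0x34, 0x56 },
    };
    static struct rte_flow_item_vxlan vxlan_mask = {
        .vni = { 0xff, 0xff, 0xff }, /* only the VNI is significant */
    };

    static struct rte_flow_item vxlan_encap_def[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_UDP },
        { .type = RTE_FLOW_ITEM_TYPE_VXLAN,
          .spec = &vxlan_spec, .mask = &vxlan_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };

    static struct rte_flow_action_vxlan_encap vxlan_encap = {
        .definition = vxlan_encap_def,
    };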
> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Wednesday, October 10, 2018 9:45 AM
> To: Ferruh Yigit <ferruh.yigit@intel.com>; Ori Kam <orika@mellanox.com>; stephen@networkplumber.org; Adrien Mazarguil <adrien.mazarguil@6wind.com>; Declan Doherty <declan.doherty@intel.com>
> Cc: dev@dpdk.org; Dekel Peled <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
>
<snip>
> I like the idea to generalize the encap/decap actions. It makes it a bit
> harder for a reader to find which encap/decap actions are in fact
> supported, but it changes nothing for automated usage in the code - just
> try it (the generic way used in rte_flow).

Even now the user doesn't know which encapsulation is supported, since it is PMD and sometimes kernel related. On the other hand, it simplifies adding an encapsulation for specific customers, sometimes with just a FW update.

> Arguments about the way encap/decap headers are specified (flow items
> vs. raw) sound sensible, but I'm not sure about it. It would be simpler
> if the tunnel header were appended or removed as is, but as I understand
> it that is not true. For example, the IPv4 ID will be different in
> incoming packets to be decapsulated, and different values should be used
> on encapsulation. Checksums will be different (but offloaded in any
> case).

I'm not sure I understand your comment. Decapsulation is independent of encapsulation; for example, if we decap an L2 tunnel type then there is no parameter at all, the NIC just removes the outer layers.

> The current way allows specifying which fields do not matter and which
> must match. It allows saying that, for example, a VNI match is
> sufficient to decapsulate.

The encapsulation, according to its definition, is a list of headers that should encapsulate the packet, so I don't understand your comment about matching fields. The matching is based on the flow, and the encapsulation is just data that should be added on top of the packet.

> Also, the arguments assume that the action input is accepted as is by
> the HW. That could be true, but could be obviously false: the HW
> interface may require parsed input (i.e. the driver must parse the input
> buffer and extract the required fields of the packet headers).

You are correct, some PMDs, even Mellanox (for the E-Switch), require parsed input. No driver knows the rte_flow structures, so in any case there has to be some translation between the encapsulation data and the NIC data. I agree that writing the translation code can be harder in this approach, but the code is only written once and the insertion speed is much higher this way. Also, like I said, some virtual switches already store this data in a raw buffer (they update only specific fields), so this also saves time for the application when creating a rule.

> So, I'd say no. It should be better motivated if we change the existing
> approach (even one advertised as experimental).

I think the reasons I gave are very good motivation to change the approach. Please also consider that there is no implementation yet that supports the old approach, while we do have code that uses the new approach.

Best,
Ori
On 10/10/18 12:00 PM, Ori Kam wrote:
<snip>
> Even now the user doesn't know which encapsulation is supported, since
> it is PMD and sometimes kernel related. On the other hand, it simplifies
> adding an encapsulation for specific customers, sometimes with just a FW
> update.

I was just trying to say that with the previous way it is a bit easier to understand from the sources that a PMD pretends to support VXLAN or NVGRE encap/decap. In any case it is not that important, so OK to have the new way.

> I'm not sure I understand your comment. Decapsulation is independent of
> encapsulation; for example, if we decap an L2 tunnel type then there is
> no parameter at all, the NIC just removes the outer layers.

OK, I've just mixed filtering and action parameters for decaps. My bad. The argument for encapsulation still makes sense, since the header is not appended just as is: IP IDs change, lengths change, checksums change. However, I agree that it is natural and expected behaviour.

> The encapsulation, according to its definition, is a list of headers
> that should encapsulate the packet, so I don't understand your comment
> about matching fields. The matching is based on the flow, and the
> encapsulation is just data that should be added on top of the packet.

Yes, my bad, as I've described above.

> You are correct, some PMDs, even Mellanox (for the E-Switch), require
> parsed input. No driver knows the rte_flow structures, so in any case
> there has to be some translation between the encapsulation data and the
> NIC data. I agree that writing the translation code can be harder in
> this approach, but the code is only written once and the insertion speed
> is much higher this way. Also, like I said, some virtual switches
> already store this data in a raw buffer (they update only specific
> fields), so this also saves time for the application when creating a
> rule.

Yes, makes sense.

> I think the reasons I gave are very good motivation to change the
> approach. Please also consider that there is no implementation yet that
> supports the old approach, while we do have code that uses the new
> approach.

It is really bad practice that features are accepted without at least one implementation/usage.

Thanks for the reply. I'll provide my comments on patches.
10/10/2018 11:30, Andrew Rybchenko:
> It is really bad practice that features are accepted without at least
> one implementation/usage.

Yes. In future, we should take care of not accepting a new API without at least one implementation.
Sorry if I'm a bit late to the discussion, please see below.

On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
<snip>
> > Currently the encap/decap actions only support encapsulation of VXLAN
> > and NVGRE L2 packets. In addition, the parameter to the encap action
> > is a list of rte_flow items, which results in two extra translations:
> > from the application to the action, and from the action to the NIC.
> > This has a negative impact on insertion performance.

Not sure it's a valid concern since in this proposal, the PMD is still expected to interpret the opaque buffer contents regardless, for validation and to convert it to its internal format.

Worse, it will require a packet parser to iterate over enclosed headers instead of a list of convenient rte_flow_whatever objects. It won't be faster without the convenience of pointers to properly aligned structures that only contain relevant data fields.

> > This series introduces a generic encapsulation for L2 tunnel types and
> > a generic encapsulation for L3 tunnel types. In addition, the new
> > encapsulation commands use a raw buffer in order to save the
> > conversion time, both for the application and for the PMD.

From a usability standpoint I'm not a fan of the current interface to perform NVGRE/VXLAN encap, however this proposal adds another layer of opaqueness in the name of making things more generic than rte_flow already is.

Assuming they are not to be interpreted by PMDs, maybe there's a case for prepending arbitrary buffers to outgoing traffic and removing them from incoming traffic. However this feature should not be named "generic tunnel encap/decap" as it's misleading. Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be more appropriate. I think on the "pop" side, only the size would matter.

Another problem is that you must not require actions to rely on specific pattern content:

 [...]
  *
  * Decapsulate outer most tunnel from matched flow.
  *
  * The flow pattern must have a valid tunnel header
  */
 RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP,

For maximum flexibility, all actions should be usable on their own on an empty pattern. On the other hand, you can document undefined behavior when performing some action on traffic that doesn't contain something. The reason is that invalid traffic may have already been removed by other flow rules (or whatever happens) before such a rule is reached; it's a user's responsibility to provide such a guarantee.

When parsing an action, a PMD is not supposed to look at the pattern. The action list should contain all the needed info; otherwise it means the API is badly defined. I'm aware the above makes it tough to implement something like RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP as defined in this series, but that's unfortunately why I think it must not be defined like that.

My opinion is that the best generic approach to perform encap/decap with rte_flow would use one dedicated action per protocol header to add/remove/modify. This is the suggestion I originally made for VXLAN/NVGRE [2] and this is one of the reasons the order of actions now matters [3].

Remember that whatever is provided, be it an opaque buffer (like you did), a separate list of items (like VXLAN/NVGRE) or the rte_flow action list itself (what I'm suggesting to do), PMDs have to process it. There will be a CPU cost. Keep in mind odd use cases that involve QinQinQinQinQ.

> Even now the user doesn't know which encapsulation is supported, since
> it is PMD and sometimes kernel related. On the other hand, it simplifies
> adding an encapsulation for specific customers, sometimes with just a FW
> update.

Except for raw push/pop of uninterpreted headers, tunnel encapsulations not explicitly supported by rte_flow shouldn't be possible. Who will expect something that isn't defined by the API to work and rely on it in their application? I don't see it happening.

Come on, adding new encap/decap actions to DPDK shouldn't be such a pain that the only alternative is a generic API to work around me :)

> I'm not sure I understand your comment. Decapsulation is independent of
> encapsulation; for example, if we decap an L2 tunnel type then there is
> no parameter at all, the NIC just removes the outer layers.

According to the pattern? As described above, you can't rely on that. The pattern does not necessarily match the full stack of outer layers. A decap action must be able to determine what to do on its own, possibly in conjunction with other actions in the list, but that's all.

> You are correct, some PMDs, even Mellanox (for the E-Switch), require
> parsed input. No driver knows the rte_flow structures, so in any case
> there has to be some translation between the encapsulation data and the
> NIC data. I agree that writing the translation code can be harder in
> this approach, but the code is only written once and the insertion speed
> is much higher this way.

Avoiding code duplication is enough of a reason to do something. Yes, NVGRE and VXLAN encap/decap should be redefined because of that. But IMO, they should prepend a single VXLAN or NVGRE header and be followed by other actions that in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN headers and finally an Ethernet header.

> I think the reasons I gave are very good motivation to change the
> approach. Please also consider that there is no implementation yet that
> supports the old approach.

Well, although the existing API made this painful, I did submit one [4] and there's an updated version from Slava [5] for mlx5.

> while we do have code that uses the new approach.

If you need the ability to prepend a raw buffer, please consider a different name for the related actions, redefine them without reliance on specific pattern items and leave NVGRE/VXLAN encap/decap as is for the time being. They can be deprecated anytime without ABI impact.

On the other hand, if that raw buffer is to be interpreted by the PMD for more intelligent tunnel encap/decap handling, I do not agree with the proposed approach for usability reasons.

[2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
    https://mails.dpdk.org/archives/dev/2018-April/096418.html
[3] ethdev: alter behavior of flow API actions
    https://git.dpdk.org/dpdk/commit/?id=cc17feb90413
[4] net/mlx5: add VXLAN encap support to switch flow rules
    https://mails.dpdk.org/archives/dev/2018-August/110598.html
[5] net/mlx5: e-switch VXLAN flow validation routine
    https://mails.dpdk.org/archives/dev/2018-October/113782.html
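To make the per-header alternative concrete, the suggestion above would express an encap as an ordered list of header-push actions. The action names below are invented purely for illustration and do not exist in rte_flow:

    /* Hypothetical per-protocol actions: each one prepends its header in
     * front of the previous, so the innermost (VXLAN) header is pushed
     * first and the outermost (Ethernet) header last. */
    struct rte_flow_action encap_actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_VXLAN_PUSH, .conf = NULL /* VNI etc. */ },
        { .type = RTE_FLOW_ACTION_TYPE_UDP_PUSH,   .conf = NULL /* ports */ },
        { .type = RTE_FLOW_ACTION_TYPE_IPV4_PUSH,  .conf = NULL /* addresses */ },
        { .type = RTE_FLOW_ACTION_TYPE_ETH_PUSH,   .conf = NULL /* MACs */ },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };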
Hi, PSB.

> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Wednesday, October 10, 2018 3:02 PM
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
>
> Sorry if I'm a bit late to the discussion, please see below.
>
> On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
<snip>
> Not sure it's a valid concern since in this proposal, the PMD is still
> expected to interpret the opaque buffer contents regardless, for
> validation and to convert it to its internal format.

This is the action to take: we should assume that the pattern is valid and not parse it at all. Another issue: we get a lot of complaints about the time we take for validation. I know that currently we must validate the rule when creating it, but this can change. Why revalidate a rule that was already validated when the only change is the destination IP of the encap data? Virtual switches, after creating the first flow, just modify it, so why force them into revalidating it? (But this issue is a different topic.)

> Worse, it will require a packet parser to iterate over enclosed headers
> instead of a list of convenient rte_flow_whatever objects. It won't be
> faster without the convenience of pointers to properly aligned
> structures that only contain relevant data fields.

The rte_flow items are not aligned either, so there is no difference in performance between the two approaches; the rte_flow items actually carry unused pointers which are just a waste. Also consider how applications use this: they already have the headers in a raw buffer, so it saves the conversion time for the application.

> From a usability standpoint I'm not a fan of the current interface to
> perform NVGRE/VXLAN encap, however this proposal adds another layer of
> opaqueness in the name of making things more generic than rte_flow
> already is.

I'm sorry, but I don't understand why it is more opaque. As I see it, it is very simple: just give the encapsulation data and that's it. For example, on a system that supports a number of encapsulations, there is no need to call a different function just to change the buffer.

> However this feature should not be named "generic tunnel encap/decap" as
> it's misleading. Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP)
> would be more appropriate.

Maybe the name can be changed, but again, the application does encapsulation, so this will be more intuitive for it.

> Another problem is that you must not require actions to rely on specific
> pattern content:

I don't think this can hold anymore, since, for example, what do you expect to happen when you apply a modify-IP action to a packet with no IP? This may raise issues in the NIC. The same goes for decap: once the flow is in the NIC, it must assume that it can remove the headers, otherwise really unexpected behavior can occur.

> For maximum flexibility, all actions should be usable on their own on an
> empty pattern. On the other hand, you can document undefined behavior
> when performing some action on traffic that doesn't contain something.

Like I said, and like it is already defined for VXLAN encap, we must know the pattern; otherwise the rule can be declined in the kernel, or crash when trying to decap a packet without an outer tunnel.

> My opinion is that the best generic approach to perform encap/decap with
> rte_flow would use one dedicated action per protocol header to
> add/remove/modify. This is the suggestion I originally made for
> VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> matters [3].

I agree that your approach makes a lot of sense, but there are a number of issues with it:
* it is harder and takes more time from the application point of view.
* it is slower when compared to the raw buffer.

> Except for raw push/pop of uninterpreted headers, tunnel encapsulations
> not explicitly supported by rte_flow shouldn't be possible. Who will
> expect something that isn't defined by the API to work and rely on it in
> their application? I don't see it happening.

Some of our customers are working with private tunnel types, and they can configure them using the kernel or just a new FW; this is a real use case.

> Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> that the only alternative is a generic API to work around me :)

Yes, but like I said, when a customer asks for an encap and I can give it to him, why wait for the next DPDK release?

> According to the pattern? As described above, you can't rely on that.
> The pattern does not necessarily match the full stack of outer layers.
> A decap action must be able to determine what to do on its own, possibly
> in conjunction with other actions in the list, but that's all.

Decap removes the outer headers. Some tunnels don't have an inner L2, and it must be added after the decap; this is what L3 decap means, and the user must supply the valid L2 header.

<snip>
> --
> Adrien Mazarguil
> 6WIND
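For reference, the kind of raw buffer Ori describes applications as already holding can be assembled from packed header structs. A self-contained sketch, with local struct definitions to stay library-agnostic and a little-endian host assumed for the byte swaps:

    #include <stdint.h>
    #include <string.h>

    /* Minimal packed header layouts, defined locally for illustration. */
    struct eth_hdr   { uint8_t dst[6], src[6]; uint16_t ether_type; } __attribute__((packed));
    struct ipv4_hdr  { uint8_t ver_ihl, tos; uint16_t len, id, frag;
                       uint8_t ttl, proto; uint16_t csum;
                       uint32_t src, dst; } __attribute__((packed));
    struct udp_hdr   { uint16_t src, dst, len, csum; } __attribute__((packed));
    struct vxlan_hdr { uint32_t flags_rsvd, vni_rsvd; } __attribute__((packed));

    static uint8_t encap_buf[sizeof(struct eth_hdr) + sizeof(struct ipv4_hdr) +
                             sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr)];

    /* Lay the outer headers back to back, outermost first; only the fields
     * that matter to the application are filled in. */
    static void build_vxlan_encap_buf(uint32_t vni)
    {
        struct eth_hdr eth = { .ether_type = __builtin_bswap16(0x0800) };
        struct ipv4_hdr ip = { .ver_ihl = 0x45, .ttl = 64, .proto = 17 /* UDP */ };
        struct udp_hdr udp = { .dst = __builtin_bswap16(4789) /* VXLAN port */ };
        struct vxlan_hdr vx = { .flags_rsvd = __builtin_bswap32(0x08000000),
                                .vni_rsvd = __builtin_bswap32(vni << 8) };
        uint8_t *p = encap_buf;

        memcpy(p, &eth, sizeof(eth)); p += sizeof(eth);
        memcpy(p, &ip,  sizeof(ip));  p += sizeof(ip);
        memcpy(p, &udp, sizeof(udp)); p += sizeof(udp);
        memcpy(p, &vx,  sizeof(vx));
    }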
On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote: <snip> > > -----Original Message----- > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com> <snip> > > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote: > > <snip> > > > > On 10/7/2018 1:57 PM, Ori Kam wrote: <snip> > > > > In addtion the parameter to to the encap action is a list of rte items, > > > > this results in 2 extra translation, between the application to the action > > > > and from the action to the NIC. This results in negetive impact on the > > > > insertion performance. > > > > Not sure it's a valid concern since in this proposal, PMD is still expected > > to interpret the opaque buffer contents regardless for validation and to > > convert it to its internal format. > > > This is the action to take, we should assume > that the pattern is valid and not parse it at all. > Another issue, we have a lot of complains about the time we take > for validation, I know that currently we must validate the rule when creating it, > but this can change, why should a rule that was validate and the only change > is the IP dest of the encap data? > virtual switch after creating the first flow are just modifying it so why force > them into revalidating it? (but this issue is a different topic) Did you measure what proportion of time is spent on validation when creating a flow rule? Based on past experience with mlx4/mlx5, creation used to involve a number of expensive system calls while validation was basically a single logic loop checking individual items/actions while performing conversion to HW format (mandatory for creation). Context switches related to kernel involvement are the true performance killers. I'm not sure this is a valid argument in favor of this approach since flow rule validation still needs to happen regardless. By the way, applications are not supposed to call rte_flow_validate() before rte_flow_create(). The former can be helpful in some cases (e.g. to get a rough idea of PMD capabilities during initialization) but they should in practice only rely on rte_flow_create(), then fall back to software processing if that fails. > > Worse, it will require a packet parser to iterate over enclosed headers > > instead of a list of convenient rte_flow_whatever objects. It won't be > > faster without the convenience of pointers to properly aligned structures > > that only contain relevant data fields. > > > Also in the rte_item we are not aligned so there is no difference in performance, > between the two approaches, In the rte_item actually we have unused pointer which > are just a waste. Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have relied on _pattern item_ structures, the room for their "last" pointer is arguably wasted. On the other hand, the "mask" pointer allows masking relevant fields that matter to the application (e.g. source/destination addresses as opposed to IPv4 length, version and other irrelevant fields for encap). Not sure why you think it's not aligned. We're comparing an array of rte_flow_item objects with raw packet data. The latter requires interpretation of each protocol header to jump to the next offset. This is more complex on both sides: to build such a buffer for the application, then to have it processed by the PMD. > Also needs to consider how application are using it. They are already have it in raw buffer > so it saves the conversation time for the application. I don't think so. 
Applications typically know where some traffic is supposed to go and what VNI it should use. They don't have a prefabricated packet handy to prepend to outgoing traffic. If that was the case they'd most likely do so themselves through a extra packet segment and not bother with PMD offloads. <snip> > > From a usability standpoint I'm not a fan of the current interface to > > perform NVGRE/VXLAN encap, however this proposal adds another layer of > > opaqueness in the name of making things more generic than rte_flow already > > is. > > > I'm sorry but I don't understand why it is more opaqueness, as I see it is very simple > just give the encapsulation data and that's it. For example on system that support number of > encapsulations they don't need to call to a different function just to change the buffer. I'm saying it's opaque from an API standpoint if you expect the PMD to interpret that buffer's contents in order to prepend it in a smart way. Since this generic encap does not support masks, there is no way for an application to at least tell a PMD what data matters and what doesn't in the provided buffer. This means invalid checksums, lengths and so on must be sent as is to the wire. What's the use case for such a behavior? > > Assuming they are not to be interpreted by PMDs, maybe there's a case for > > prepending arbitrary buffers to outgoing traffic and removing them from > > incoming traffic. However this feature should not be named "generic tunnel > > encap/decap" as it's misleading. > > > > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be > > more > > appropriate. I think on the "pop" side, only the size would matter. > > > Maybe the name can be change but again the application does encapsulation so it will > be more intuitive for it. > > > Another problem is that you must not require actions to rely on specific > > pattern content: > > > I don't think this can be true anymore since for example what do you expect > to happen when you place an action for example modify ip to packet with no ip? > > This may raise issues in the NIC. > Same goes for decap after the flow is in the NIC he must assume that he can remove otherwise > really unexpected beaver can accord. Right, that's why it must be documented as undefined behavior. The API is not supposed to enforce the relationship. A PMD may require the presence of some pattern item in order to perform some action, but this is a PMD limitation, not a limitation of the API itself. <snip> > For maximum flexibility, all actions should be usable on their own on empty > > pattern. On the other hand, you can document undefined behavior when > > performing some action on traffic that doesn't contain something. > > > > Like I said and like it is already defined for VXLAN_enacp we must know > the pattern otherwise the rule can be declined in Kernel / crash when trying to decap > packet without outer tunnel. Right, PMD limitation then. You are free to document it in the PMD. <snip> > > My opinion is that the best generic approach to perform encap/decap with > > rte_flow would use one dedicated action per protocol header to > > add/remove/modify. This is the suggestion I originally made for > > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now > > matters [3]. > > I agree that your approach make a lot of sense, but there are number of issues with it > * it is harder and takes more time from the application point of view. > * it is slower when compared to the raw buffer. 
I'm convinced of the opposite :) We could try to implement your raw buffer
approach as well as mine in testpmd (one action per layer, not the current
VXLAN/NVGRE encap mess mind you) in order to determine which is the most
convenient on the application side.

<snip>
> > Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
> > explicitly supported by rte_flow shouldn't be possible. Who will expect
> > something that isn't defined by the API to work and rely on it in their
> > application? I don't see it happening.
> >
> Some of our customers are working with private tunnel types, and they can
> configure them using the kernel or just new FW. This is a real use case.

You can already use negative types to quickly address HW and
customer-specific needs by the way. Could this [6] perhaps address the
issue?

PMDs can expose public APIs. You could devise one that spits new negative
item/action types based on some data, to be subsequently used by flow
rules with that PMD only.

> > Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> > that the only alternative is a generic API to work around me :)
> >
> Yes, but like I said, when a customer asks for an encap and I can give it
> to them, why wait for the next DPDK release?

I don't know, is rte_flow held to a special standard compared to other DPDK
features in this regard? Engineering patches can always be provided,
backported and whatnot.

Customer applications will have to be modified and recompiled to benefit
from any new FW capabilities regardless, it's extremely unlikely to be just
a matter of installing a new FW image.

<snip>
> > Pattern does not necessarily match the full stack of outer layers.
> >
> > Decap action must be able to determine what to do on its own, possibly in
> > conjunction with other actions in the list but that's all.
> >
> Decap removes the outer headers.
> Some tunnels don't have an inner L2 and it must be added after the decap;
> this is what L3 decap means, and the user must supply a valid L2 header.

My point is that any data required to perform decap must be provided by the
decap action itself, not through a pattern item, whose only purpose is to
filter traffic and may not be present. Precisely what you did for L3 decap.

<snip>
> > > I think the reasons I gave are very good motivation to change the approach
> > > please also consider that there is no implementation yet that supports the
> > > old approach.
> >
> > Well, although the existing API made this painful, I did submit one [4] and
> > there's an updated version from Slava [5] for mlx5.
> >
> > > while we do have code that uses the new approach.
> >
> > If you need the ability to prepend a raw buffer, please consider a different
> > name for the related actions, redefine them without reliance on specific
> > pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> > being. They can be deprecated anytime without ABI impact.
> >
> > On the other hand if that raw buffer is to be interpreted by the PMD for
> > more intelligent tunnel encap/decap handling, I do not agree with the
> > proposed approach for usability reasons.

I'm still not convinced by your approach. If these new actions *must* be
included unmodified right now to prevent some customer cataclysm, then fine
as an experiment but please leave VXLAN/NVGRE encaps alone for the time
being.
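(To make the negative types suggestion concrete: a minimal sketch of what
such a PMD-specific extension could look like. The action name and its
configuration structure are hypothetical, invented purely for illustration;
only the negative-value convention itself comes from the rte_flow
documentation referenced in [6].)

    #include <stdint.h>
    #include <rte_flow.h>

    /* PMD-specific types use negative values so they can never collide
     * with future additions to enum rte_flow_action_type. The name and
     * conf structure below are hypothetical. */
    #define MY_PMD_ACTION_PRIVATE_TUNNEL_ENCAP \
            ((enum rte_flow_action_type)-1)

    struct my_pmd_private_tunnel_conf {
            uint32_t tunnel_id; /* Hypothetical private tunnel id. */
    };

    static const struct my_pmd_private_tunnel_conf tunnel_conf = {
            .tunnel_id = 42,
    };

    /* Such an action is only meaningful to the PMD that defines it;
     * other PMDs are expected to reject the rule. */
    static const struct rte_flow_action private_actions[] = {
            { .type = MY_PMD_ACTION_PRIVATE_TUNNEL_ENCAP,
              .conf = &tunnel_conf, },
            { .type = RTE_FLOW_ACTION_TYPE_END },
    };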
> > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
> > https://mails.dpdk.org/archives/dev/2018-April/096418.html
> >
> > [3] ethdev: alter behavior of flow API actions
> > https://git.dpdk.org/dpdk/commit/?id=cc17feb90413
> >
> > [4] net/mlx5: add VXLAN encap support to switch flow rules
> > https://mails.dpdk.org/archives/dev/2018-August/110598.html
> >
> > [5] net/mlx5: e-switch VXLAN flow validation routine
> > https://mails.dpdk.org/archives/dev/2018-October/113782.html

[6] "9.2.9. Negative types"
http://doc.dpdk.org/guides-18.08/prog_guide/rte_flow.html#negative-types

On an unrelated note, is there a way to prevent Outlook from mangling URLs
on your side? (those emea01.safelinks things)
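(The create-then-fallback pattern described above boils down to the
following minimal sketch; setup_sw_fallback() stands in for hypothetical
application-side software processing and is not part of any DPDK API.)

    #include <stdint.h>
    #include <rte_flow.h>

    /* Hypothetical application software path, stubbed out here. */
    static int
    setup_sw_fallback(const struct rte_flow_attr *attr,
                      const struct rte_flow_item pattern[],
                      const struct rte_flow_action actions[])
    {
            (void)attr;
            (void)pattern;
            (void)actions;
            return 0;
    }

    /* Attempt HW offload directly and fall back to software on failure,
     * instead of calling rte_flow_validate() first for every rule. */
    static int
    offload_or_fallback(uint16_t port_id,
                        const struct rte_flow_attr *attr,
                        const struct rte_flow_item pattern[],
                        const struct rte_flow_action actions[])
    {
            struct rte_flow_error error;

            if (rte_flow_create(port_id, attr, pattern, actions, &error))
                    return 0; /* Rule offloaded to HW. */
            /* Creation failed: handle this traffic in software. */
            return setup_sw_fallback(attr, pattern, actions);
    }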
Hi Adrien,

Thanks for your comments. Please see my answers below and inline.

Due to a very short time limit and the fact that we have more than
4 patches based on this, we need to close it fast.

As I see it, there are a number of options:
* The old approach, which neither of us likes, and which means that for
every tunnel we create a new command.
* My proposed suggestion as is, which is easier for at least a number of
applications to implement and faster in most cases.
* My suggestion with a different name, but then we also need to find a name
for the decap and another for decap_l3. This approach is also problematic
since we would have two APIs doing the same thing. For example, for VXLAN
encap in testpmd, which API shall we use?
* Combine my suggestion and the current one by replacing the raw buffer
with a list of items. Less code duplication and easier validation (though I
don't think we need to validate the encap data), but we lose insertion
rate.
* Your suggestion of a list of actions where each action is one item. The
main problems are speed, complexity on the application side and time to
implement.

> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Wednesday, October 10, 2018 7:10 PM
> To: Ori Kam <orika@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; stephen@networkplumber.org; Declan Doherty
> <declan.doherty@intel.com>; dev@dpdk.org; Dekel Peled
> <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh
> <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
>
> On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote:
> <snip>
> > > -----Original Message-----
> > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> <snip>
> > > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> > > <snip>
> > > > > On 10/7/2018 1:57 PM, Ori Kam wrote:
> <snip>
> > > > > In addition, the parameter to the encap action is a list of rte_flow items;
> > > > > this results in two extra translations, one from the application to the action
> > > > > and one from the action to the NIC. This has a negative impact on
> > > > > insertion performance.
> > >
> > > Not sure it's a valid concern since in this proposal, PMD is still expected
> > > to interpret the opaque buffer contents regardless for validation and to
> > > convert it to its internal format.
> > >
> > This is the action to take, we should assume
> > that the pattern is valid and not parse it at all.
> > Another issue: we have a lot of complaints about the time we take
> > for validation. I know that currently we must validate the rule when creating it,
> > but this can change. Why should a rule be revalidated when it was already
> > validated and the only change is the destination IP of the encap data?
> > Virtual switches, after creating the first flow, just modify it, so why force
> > them into revalidating it? (but this issue is a different topic)
>
> Did you measure what proportion of time is spent on validation when creating
> a flow rule?
>
> Based on past experience with mlx4/mlx5, creation used to involve a number
> of expensive system calls while validation was basically a single logic loop
> checking individual items/actions while performing conversion to HW
> format (mandatory for creation). Context switches related to kernel
> involvement are the true performance killers.
>
I'm sorry to say I don't have the numbers, but I can tell you that with the
new API, in most cases there will be just one system call. In addition, any
extra time is wasted time; again, this is a request we got from a number of
customers.

> I'm not sure this is a valid argument in favor of this approach since flow
> rule validation still needs to happen regardless.
>
> By the way, applications are not supposed to call rte_flow_validate() before
> rte_flow_create(). The former can be helpful in some cases (e.g. to get a
> rough idea of PMD capabilities during initialization) but they should in
> practice only rely on rte_flow_create(), then fall back to software
> processing if that fails.
>
First, I don't think we need to validate the encapsulation data: if the data
is wrong, the packet will simply be dropped. Just like you are saying with
the restriction on the flow items, it is the responsibility of the
application. Also, as I said, there is demand from customers and there is no
reason not to do it, but in any case this is not relevant to the current
patch.

> > > Worse, it will require a packet parser to iterate over enclosed headers
> > > instead of a list of convenient rte_flow_whatever objects. It won't be
> > > faster without the convenience of pointers to properly aligned structures
> > > that only contain relevant data fields.
> > >
> > Also with rte_flow_item we are not aligned, so there is no difference in
> > performance between the two approaches. The rte_flow_item structure
> > actually carries unused pointers which are just a waste.
>
> Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have
> relied on _pattern item_ structures, the room for their "last" pointer is
> arguably wasted. On the other hand, the "mask" pointer allows masking
> relevant fields that matter to the application (e.g. source/destination
> addresses as opposed to IPv4 length, version and other irrelevant fields for
> encap).
>
At least according to my testing, the NIC can't use masks; it works based on
the offloads configured, applied to any packet (like checksum).

> Not sure why you think it's not aligned. We're comparing an array of
> rte_flow_item objects with raw packet data. The latter requires
> interpretation of each protocol header to jump to the next offset. This is
> more complex on both sides: to build such a buffer for the application, then
> to have it processed by the PMD.
>
Maybe I'm missing something, but with the buffer approach all the data will
likely be in the cache and, if properly allocated, will also be aligned. On
the other hand, the rte_flow_item objects are not guaranteed to share a
cache line; each access to an item may result in a cache miss. Also,
accessing individual members costs the same as accessing them in a raw
buffer.

> > We also need to consider how applications use it. They already have it
> > in a raw buffer, so it saves the conversion time for the application.
>
> I don't think so. Applications typically know where some traffic is supposed
> to go and what VNI it should use. They don't have a prefabricated packet
> handy to prepend to outgoing traffic. If that were the case they'd most
> likely do so themselves through an extra packet segment and not bother with
> PMD offloads.
>
Contrail vRouter has such a buffer and just changes the specific fields.
This is one of the things we want to offload; from my last check, OVS also
uses such a buffer.
> <snip>
> > > From a usability standpoint I'm not a fan of the current interface to
> > > perform NVGRE/VXLAN encap, however this proposal adds another layer of
> > > opaqueness in the name of making things more generic than rte_flow already
> > > is.
> > >
> > I'm sorry but I don't understand why it is more opaque; as I see it, it is
> > very simple: just give the encapsulation data and that's it. For example,
> > on systems that support a number of encapsulations, applications don't
> > need to call a different function just to change the buffer.
>
> I'm saying it's opaque from an API standpoint if you expect the PMD to
> interpret that buffer's contents in order to prepend it in a smart way.
>
> Since this generic encap does not support masks, there is no way for an
> application to at least tell a PMD what data matters and what doesn't in the
> provided buffer. This means invalid checksums, lengths and so on must be
> sent as is to the wire. What's the use case for such a behavior?
>
The NIC treats the packet as a normal packet that goes through all the
normal offloads.

> > > Assuming they are not to be interpreted by PMDs, maybe there's a case for
> > > prepending arbitrary buffers to outgoing traffic and removing them from
> > > incoming traffic. However this feature should not be named "generic tunnel
> > > encap/decap" as it's misleading.
> > >
> > > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be more
> > > appropriate. I think on the "pop" side, only the size would matter.
> > >
> > Maybe the name can be changed, but again, the application does
> > encapsulation, so it will be more intuitive for it.
> >
> > > Another problem is that you must not require actions to rely on specific
> > > pattern content:
> > >
> > I don't think this can be true anymore, since for example what do you
> > expect to happen when you apply an action, for example modify IP, to a
> > packet with no IP?
> >
> > This may raise issues in the NIC.
> > Same goes for decap: after the flow is in the NIC, it must assume that it
> > can remove the headers, otherwise really unexpected behavior can occur.
>
> Right, that's why it must be documented as undefined behavior. The API is
> not supposed to enforce the relationship. A PMD may require the presence of
> some pattern item in order to perform some action, but this is a PMD
> limitation, not a limitation of the API itself.
>
Agree.

> <snip>
> > > For maximum flexibility, all actions should be usable on their own on empty
> > > pattern. On the other hand, you can document undefined behavior when
> > > performing some action on traffic that doesn't contain something.
> > >
> > Like I said, and like it is already defined for VXLAN encap, we must know
> > the pattern, otherwise the rule can be declined by the kernel or crash
> > when trying to decap a packet without an outer tunnel.
>
> Right, PMD limitation then. You are free to document it in the PMD.
>
Agree.

> <snip>
> > > My opinion is that the best generic approach to perform encap/decap with
> > > rte_flow would use one dedicated action per protocol header to
> > > add/remove/modify. This is the suggestion I originally made for
> > > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> > > matters [3].
> >
> > I agree that your approach makes a lot of sense, but there are a number of
> > issues with it:
> > * it is harder and takes more time from the application point of view.
> > * it is slower when compared to the raw buffer.
>
> I'm convinced of the opposite :) We could try to implement your raw buffer
> approach as well as mine in testpmd (one action per layer, not the current
> VXLAN/NVGRE encap mess mind you) in order to determine which is the most
> convenient on the application side.
>
There are two different implementations: one for testpmd and one for a
normal application. Writing the testpmd code with a raw buffer is simpler
but less flexible; writing the code in a real application is, I think,
simpler with the buffer approach, since they already have a buffer.

> <snip>
> > > Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
> > > explicitly supported by rte_flow shouldn't be possible. Who will expect
> > > something that isn't defined by the API to work and rely on it in their
> > > application? I don't see it happening.
> > >
> > Some of our customers are working with private tunnel types, and they can
> > configure them using the kernel or just new FW. This is a real use case.
>
> You can already use negative types to quickly address HW and
> customer-specific needs by the way. Could this [6] perhaps address the
> issue?
>
> PMDs can expose public APIs. You could devise one that spits new negative
> item/action types based on some data, to be subsequently used by flow
> rules with that PMD only.
>
> > > Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> > > that the only alternative is a generic API to work around me :)
> > >
> > Yes, but like I said, when a customer asks for an encap and I can give it
> > to them, why wait for the next DPDK release?
>
> I don't know, is rte_flow held to a special standard compared to other DPDK
> features in this regard? Engineering patches can always be provided,
> backported and whatnot.
>
> Customer applications will have to be modified and recompiled to benefit
> from any new FW capabilities regardless, it's extremely unlikely to be just
> a matter of installing a new FW image.
>
In some cases this is what happens 😊

> <snip>
> > > Pattern does not necessarily match the full stack of outer layers.
> > >
> > > Decap action must be able to determine what to do on its own, possibly in
> > > conjunction with other actions in the list but that's all.
> > >
> > Decap removes the outer headers.
> > Some tunnels don't have an inner L2 and it must be added after the decap;
> > this is what L3 decap means, and the user must supply a valid L2 header.
>
> My point is that any data required to perform decap must be provided by the
> decap action itself, not through a pattern item, whose only purpose is to
> filter traffic and may not be present. Precisely what you did for L3 decap.
>
Agree, we remove the limitation and just say unpredictable results may
occur.

> <snip>
> > > > I think the reasons I gave are very good motivation to change the approach
> > > > please also consider that there is no implementation yet that supports the
> > > > old approach.
> > >
> > > Well, although the existing API made this painful, I did submit one [4] and
> > > there's an updated version from Slava [5] for mlx5.
> > >
> > > > while we do have code that uses the new approach.
> > >
> > > If you need the ability to prepend a raw buffer, please consider a different
> > > name for the related actions, redefine them without reliance on specific
> > > pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> > > being. They can be deprecated anytime without ABI impact.
> > >
> > > On the other hand if that raw buffer is to be interpreted by the PMD for
> > > more intelligent tunnel encap/decap handling, I do not agree with the
> > > proposed approach for usability reasons.
>
> I'm still not convinced by your approach. If these new actions *must* be
> included unmodified right now to prevent some customer cataclysm, then fine
> as an experiment but please leave VXLAN/NVGRE encaps alone for the time
> being.
>
> > > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
> > > https://mails.dpdk.org/archives/dev/2018-April/096418.html
> > >
> > > [3] ethdev: alter behavior of flow API actions
> > > https://git.dpdk.org/dpdk/commit/?id=cc17feb90413
> > >
> > > [4] net/mlx5: add VXLAN encap support to switch flow rules
> > > https://mails.dpdk.org/archives/dev/2018-August/110598.html
> > >
> > > [5] net/mlx5: e-switch VXLAN flow validation routine
> > > https://mails.dpdk.org/archives/dev/2018-October/113782.html
>
> [6] "9.2.9. Negative types"
> http://doc.dpdk.org/guides-18.08/prog_guide/rte_flow.html#negative-types
>
> On an unrelated note, is there a way to prevent Outlook from mangling URLs
> on your side? (those emea01.safelinks things)
>
I will try to find a solution; I haven't found one so far.

> --
> Adrien Mazarguil
> 6WIND
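(For context, the "prefabricated buffer" usage described above amounts to
keeping one encapsulation header blob per tunnel and patching a few fields
per flow. A rough sketch, not actual vRouter or OVS code; the offsets assume
a fixed Ethernet/IPv4/UDP/VXLAN layout.)

    #include <stdint.h>
    #include <string.h>

    /* 14 (Ethernet) + 20 (IPv4) + 8 (UDP) + 8 (VXLAN) = 50 bytes. */
    #define ENCAP_LEN     50
    #define IPV4_DST_OFF  (14 + 16)         /* IPv4 destination address. */
    #define VXLAN_VNI_OFF (14 + 20 + 8 + 4) /* 24-bit VNI field. */

    /* Patch only the per-flow fields of a prefabricated encap header;
     * everything else stays as prepared once at setup time. */
    static void
    patch_encap(uint8_t buf[ENCAP_LEN], uint32_t dst_be, uint32_t vni)
    {
            /* Destination IP, already in network byte order. */
            memcpy(&buf[IPV4_DST_OFF], &dst_be, sizeof(dst_be));
            /* The VNI occupies the first 3 bytes of its 4-byte word. */
            buf[VXLAN_VNI_OFF] = (vni >> 16) & 0xff;
            buf[VXLAN_VNI_OFF + 1] = (vni >> 8) & 0xff;
            buf[VXLAN_VNI_OFF + 2] = vni & 0xff;
    }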
Hey Ori,

(removing most of the discussion, I'll only reply to the summary)

On Thu, Oct 11, 2018 at 08:48:05AM +0000, Ori Kam wrote:
> Hi Adrien,
>
> Thanks for your comments. Please see my answers below and inline.
>
> Due to a very short time limit and the fact that we have more than
> 4 patches based on this, we need to close it fast.
>
> As I see it, there are a number of options:
> * The old approach, which neither of us likes, and which means that for
> every tunnel we create a new command.

Just to be sure, you mean that for each new tunnel *type* a new rte_flow
action *type* must be added to DPDK right? Because the above reads like with
your proposal, a single flow rule can manage any number of TEPs and flow
rule creation for subsequent tunnels can be somehow bypassed.

One flow *rule* is still needed per TEP or did I miss something?

> * My proposed suggestion as is, which is easier for at least a number of
> applications to implement and faster in most cases.
> * My suggestion with a different name, but then we also need to find a
> name for the decap and another for decap_l3. This approach is also
> problematic since we would have two APIs doing the same thing. For
> example, for VXLAN encap in testpmd, which API shall we use?

Since you're doing this for MPLSoUDP and MPLSoGRE, you could leave
VXLAN/NVGRE encap as is, especially since (AFAIK) there are series still
relying on their API floating on the ML.

> * Combine my suggestion and the current one by replacing the raw buffer
> with a list of items. Less code duplication and easier validation (though
> I don't think we need to validate the encap data), but we lose insertion
> rate.

Already suggested in the past [1], this led to VXLAN and NVGRE encap as we
know them.

> * Your suggestion of a list of actions where each action is one item. The
> main problems are speed, complexity on the application side and time to
> implement.

Speed matters a lot to me also (go figure) but I still doubt this approach
is measurably faster. On the usability side, compared to one action per
protocol layer which better fits the rte_flow model, I'm also not
convinced.

If we put aside usability and performance on which we'll never agree, there
is still one outstanding issue: the lack of mask. Users cannot tell which
fields are relevant and to be kept as is, and which are not.

How do applications know what blanks are filled in by HW? How do PMDs know
what applications expect? There's a risk of sending incomplete or malformed
packets depending on the implementation.

One may expect PMDs and HW to just "do the sensible thing" but some
applications won't know that some fields are not offloaded and will be
emitted with an unexpected value, while others will attempt to force a
normally offloaded field to some specific value and expect it to leave
unmodified. This cannot be predicted by the PMD, something is needed.

Assuming you add a mask pointer to address this, generic encap should be
functionally complete but not all that different from what we currently have
for VXLAN/NVGRE and from Declan's earlier proposal for generic encap [1];
PMD must parse the buffer (using a proper packet parser with your approach),
collect relevant fields, see if anything's unsupported while doing so before
proceeding with the flow rule.

Anyway, if you add that mask and rename these actions (since they should work
with pretty much anything, not necessarily tunnels, i.e. lazy applications
could ask HW to prepend missing Ethernet headers to pure IP traffic), they
can make sense.
How about labeling this "raw" encap/decap?

 RTE_FLOW_ACTION_TYPE_RAW_(ENCAP|DECAP)

 struct rte_flow_action_raw_encap {
     uint8_t *data; /**< Encapsulation data. */
     uint8_t *preserve; /**< Bit-mask of @p data to preserve on output. */
     size_t size; /**< Size of @p data and @p preserve. */
 };

I guess decap could use the same object. Since there is no way to define a
sensible default behavior that works across multiple vendors when "preserve"
is not provided, I think this field cannot be NULL.

As for "L3 decap", well, can't one just provide a separate encap action?
I mean a raw decap action, followed by another action doing raw encap of the
intended L2? A separate set of actions seems unnecessary for that.

[1] "[PATCH v3 2/4] ethdev: Add tunnel encap/decap actions"
https://mails.dpdk.org/archives/dev/2018-April/095733.html
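(A flow rule using the action as proposed above might then be written as
follows. This is a sketch against the proposed struct only, not a finalized
API; the 50-byte buffer layout and the all-significant preserve mask are
illustrative choices.)

    /* Prefabricated headers to prepend, plus a bit-mask telling the PMD
     * which bits of them must be kept as is on output, per the struct
     * proposed above. */
    static uint8_t encap_data[50];
    static uint8_t encap_keep[50]; /* e.g. memset(encap_keep, 0xff, 50) */

    static const struct rte_flow_action_raw_encap raw_encap = {
            .data = encap_data,
            .preserve = encap_keep, /* Proposed as mandatory (non-NULL). */
            .size = sizeof(encap_data),
    };

    static const struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &raw_encap },
            { .type = RTE_FLOW_ACTION_TYPE_END },
    };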
Hi Adrien,

> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Thursday, October 11, 2018 4:12 PM
> To: Ori Kam <orika@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; stephen@networkplumber.org; Declan Doherty
> <declan.doherty@intel.com>; dev@dpdk.org; Dekel Peled
> <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh
> <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
>
> Hey Ori,
>
> (removing most of the discussion, I'll only reply to the summary)
>
> On Thu, Oct 11, 2018 at 08:48:05AM +0000, Ori Kam wrote:
> > Hi Adrien,
> >
> > Thanks for your comments. Please see my answers below and inline.
> >
> > Due to a very short time limit and the fact that we have more than
> > 4 patches based on this, we need to close it fast.
> >
> > As I see it, there are a number of options:
> > * The old approach, which neither of us likes, and which means that for
> > every tunnel we create a new command.
>
> Just to be sure, you mean that for each new tunnel *type* a new rte_flow
> action *type* must be added to DPDK right? Because the above reads like with
> your proposal, a single flow rule can manage any number of TEPs and flow
> rule creation for subsequent tunnels can be somehow bypassed.
>
> One flow *rule* is still needed per TEP or did I miss something?
>
Yes, you are correct: one rte_flow action per tunnel *type* must be added to
DPDK. Sorry, I'm not sure what TEP is.

> > * My proposed suggestion as is, which is easier for at least a number of
> > applications to implement and faster in most cases.
> > * My suggestion with a different name, but then we also need to find a
> > name for the decap and another for decap_l3. This approach is also
> > problematic since we would have two APIs doing the same thing. For
> > example, for VXLAN encap in testpmd, which API shall we use?
>
> Since you're doing this for MPLSoUDP and MPLSoGRE, you could leave
> VXLAN/NVGRE encap as is, especially since (AFAIK) there are series still
> relying on their API floating on the ML.
>
I don't mind also leaving the old approach; I even got an Ack for removing
it 😊 The only issue is the API duplication: for example, which API should
the testpmd VXLAN encap use? If you agree that testpmd will be converted to
the new one, I don't mind leaving the old one.

> > * Combine my suggestion and the current one by replacing the raw buffer
> > with a list of items. Less code duplication and easier validation (though
> > I don't think we need to validate the encap data), but we lose insertion
> > rate.
>
> Already suggested in the past [1], this led to VXLAN and NVGRE encap as we
> know them.
>
> > * Your suggestion of a list of actions where each action is one item. The
> > main problems are speed, complexity on the application side and time to
> > implement.
>
> Speed matters a lot to me also (go figure) but I still doubt this approach
> is measurably faster. On the usability side, compared to one action per
> protocol layer which better fits the rte_flow model, I'm also not
> convinced.
>
> If we put aside usability and performance on which we'll never agree, there
> is still one outstanding issue: the lack of mask. Users cannot tell which
> fields are relevant and to be kept as is, and which are not.
>
😊

> How do applications know what blanks are filled in by HW?
How do PMDs know
> what applications expect? There's a risk of sending incomplete or malformed
> packets depending on the implementation.
>
> One may expect PMDs and HW to just "do the sensible thing" but some
> applications won't know that some fields are not offloaded and will be
> emitted with an unexpected value, while others will attempt to force a
> normally offloaded field to some specific value and expect it to leave
> unmodified. This cannot be predicted by the PMD, something is needed.
>
> Assuming you add a mask pointer to address this, generic encap should be
> functionally complete but not all that different from what we currently have
> for VXLAN/NVGRE and from Declan's earlier proposal for generic encap [1];
> PMD must parse the buffer (using a proper packet parser with your approach),
> collect relevant fields, see if anything's unsupported while doing so before
> proceeding with the flow rule.
>
There is no value in a mask: since the NIC treats the packet as a normal
packet, its operation depends on the offloads configured. If, for example,
the user configured VXLAN outer checksum, then the NIC will modify this
field. In any case all values must be given, since the PMD cannot guess and
insert them. From the NIC's point of view, the packet after adding the
buffer must be a valid packet.

> Anyway, if you add that mask and rename these actions (since they should work
> with pretty much anything, not necessarily tunnels, i.e. lazy applications
> could ask HW to prepend missing Ethernet headers to pure IP traffic), they
> can make sense. How about labeling this "raw" encap/decap?
>
I like it.

> RTE_FLOW_ACTION_TYPE_RAW_(ENCAP|DECAP)
>
> struct rte_flow_action_raw_encap {

  enum tunnel_type type; /**< VXLAN / MPLSoGRE... / UNKNOWN. */

Then NICs can parse the packet even faster and decide if it is supported.

  uint8_t remove_l2; /**< Marks if the L2 should be removed before the encap. */

> 	uint8_t *data; /**< Encapsulation data. */
> 	uint8_t *preserve; /**< Bit-mask of @p data to preserve on output. */

Remove the preserve, since like I said there is no meaning to it.

> 	size_t size; /**< Size of @p data and @p preserve. */
> };
>
> I guess decap could use the same object. Since there is no way to define a
> sensible default behavior that works across multiple vendors when "preserve"
> is not provided, I think this field cannot be NULL.
>
Like I said, let's remove it.

> As for "L3 decap", well, can't one just provide a separate encap action?
> I mean a raw decap action, followed by another action doing raw encap of the
> intended L2? A separate set of actions seems unnecessary for that.

Agree. We will have RTE_FLOW_ACTION_TYPE_RAW_DECAP, which decapsulates the
outer headers, and then we will give the encap command with the L2 header.

> [1] "[PATCH v3 2/4] ethdev: Add tunnel encap/decap actions"
> https://mails.dpdk.org/archives/dev/2018-April/095733.html
>
> --
> Adrien Mazarguil
> 6WIND

Best,
Ori
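(Given the conclusion above, "L3 decap" would then be expressed as a raw
decap followed by a raw encap of the new L2 header. A sketch only; the decap
structure is assumed here to carry the same data/size pair as encap after
"preserve" is dropped, since its final shape is not spelled out in the
thread.)

    /* Strip the whole outer stack, then prepend a fresh Ethernet
     * header. Sizes are illustrative (50-byte VXLAN outer, 14-byte L2). */
    static uint8_t outer_hdrs[50]; /* Outer headers to remove. */
    static uint8_t new_l2[14];     /* Ethernet header to prepend. */

    static const struct rte_flow_action_raw_decap raw_decap = {
            .data = outer_hdrs,
            .size = sizeof(outer_hdrs),
    };

    static const struct rte_flow_action_raw_encap l2_encap = {
            .data = new_l2,
            .size = sizeof(new_l2),
    };

    static const struct rte_flow_action l3_decap_actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_RAW_DECAP, .conf = &raw_decap },
            { .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &l2_encap },
            { .type = RTE_FLOW_ACTION_TYPE_END },
    };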