From patchwork Mon Jan 8 18:36:36 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eads, Gage"
X-Patchwork-Id: 33128
X-Patchwork-Delegate: jerinj@marvell.com
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A7A9A1B1C3; Mon, 8 Jan 2018 19:36:42 +0100 (CET)
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 383961B1C0 for ; Mon, 8 Jan 2018 19:36:39 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Jan 2018 10:36:38 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,330,1511856000"; d="scan'208";a="8795419"
Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202]) by orsmga007.jf.intel.com with ESMTP; 08 Jan 2018 10:36:38 -0800
Received: from fmsmsx121.amr.corp.intel.com (10.18.125.36) by fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS) id 14.3.319.2; Mon, 8 Jan 2018 10:36:37 -0800
Received: from fmsmsx101.amr.corp.intel.com ([169.254.1.195]) by fmsmsx121.amr.corp.intel.com ([10.18.125.36]) with mapi id 14.03.0319.002; Mon, 8 Jan 2018 10:36:37 -0800
From: "Eads, Gage"
To: Pavan Nikhilesh , "Van Haaren, Harry" , "jerin.jacob@caviumnetworks.com" , "santosh.shukla@caviumnetworks.com"
CC: "dev@dpdk.org"
Thread-Topic: [PATCH 2/2] event/sw: use dynamically-sized IQs
Thread-Index: AQHTiJXubiKghoFbU06dxpB2N9MtfqNqpbsAgAAEOAD//57noA==
Date: Mon, 8 Jan 2018 18:36:36 +0000
Message-ID: <9184057F7FC11744A2107296B6B8EB1E369CDE01@fmsmsx101.amr.corp.intel.com>
References: <1512011314-19682-1-git-send-email-gage.eads@intel.com>
 <1512011314-19682-2-git-send-email-gage.eads@intel.com>
 <20180108153219.jszoepdgfiggn3bm@Pavan-LT>
 <20180108160529.gven7vlrbmrrlw2p@Pavan-LT>
In-Reply-To: <20180108160529.gven7vlrbmrrlw2p@Pavan-LT>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiODFjZDM1YjYtOWNjZC00YWEyLThkZTgtMzk4Zjk1Y2I2ODY0IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IkNxam5CSHFzSm5EZklRODVVTGZuSTRWSEJkSm1LVFBXMzhDYVNSM1ZmK289In0=
x-ctpclassification: CTP_NT
dlp-product: dlpe-windows
dlp-version: 11.0.0.116
dlp-reaction: no-action
x-originating-ip: [10.1.200.107]
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 2/2] event/sw: use dynamically-sized IQs
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Pavan,

Thanks for the report and the GDB output. We've reproduced this and traced it down to how the PMD (mis)handles the re-configuration case. When the SW PMD is reconfigured, it reallocates the IQ chunks and reinitializes the chunk freelist, but it doesn't delete the stale pointers in sw->qids[*].iq. This causes multiple references to the same IQ memory to exist in the system, eventually resulting in the segfault.

I expect a proper fix will take us a day or two, but in the meantime the following change should fix the segfault ***for your specific usage only***:

Mulling over the fix raises a question that the documentation is unclear on. If the user sends events into an eventdev, then calls rte_event_dev_stop() -> rte_event_dev_configure() -> rte_event_dev_start(), is the eventdev required to maintain any previously queued events? I would expect not.
However, if the user calls rte_event_dev_stop() -> rte_event_queue_setup() -> rte_event_dev_start() (i.e. it is an additive reconfiguration), it seems more reasonable that the other event queues would maintain their contents. I'd imagine this is also hardware/device-dependent.

Thanks,
Gage

> -----Original Message-----
> From: Pavan Nikhilesh [mailto:pbhagavatula@caviumnetworks.com]
> Sent: Monday, January 8, 2018 10:06 AM
> To: Van Haaren, Harry ; Eads, Gage ; jerin.jacob@caviumnetworks.com;
> santosh.shukla@caviumnetworks.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
>
> On Mon, Jan 08, 2018 at 03:50:24PM +0000, Van Haaren, Harry wrote:
> > > From: Pavan Nikhilesh [mailto:pbhagavatula@caviumnetworks.com]
> > > Sent: Monday, January 8, 2018 3:32 PM
> > > To: Eads, Gage ; Van Haaren, Harry ; jerin.jacob@caviumnetworks.com;
> > > santosh.shukla@caviumnetworks.com
> > > Cc: dev@dpdk.org
> > > Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
> > >
> > > On Wed, Nov 29, 2017 at 09:08:34PM -0600, Gage Eads wrote:
> > > > This commit introduces dynamically-sized IQs, by switching the
> > > > underlying data structure from a fixed-size ring to a linked list of queue 'chunks.'
> > > >
> > >
> > > Sw eventdev crashes when used alongside Rx adapter. The crash
> > > happens when pumping traffic at > 1.4mpps. This commit seems responsible for this.
> > >
> > >
> > > Apply the following Rx adapter patch
> > > http://dpdk.org/dev/patchwork/patch/31977/
> > > Command used:
> > > ./build/eventdev_pipeline_sw_pmd -c 0xfffff8 --vdev="event_sw" -- -r0x800 -t0x100 -w F000 -e 0x10
> >
> > Applied the patch to current master, recompiled; cannot reproduce here..
> >
> master in the sense dpdk-next-eventdev right?
> > Is it 100% reproducible and "instant" or can it take some time to occur there?
> >
> It is instant
> >
> > > Backtrace:
> > >
> > > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > > [Switching to Thread 0xffffb6c8f040 (LWP 25291)]
> > > 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > > iq=0xffff9f764720, sw=0xffff9f332600) at
> > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > > 142 ev[total++] = current->events[index++];
> >
> > Could we get the output of (gdb) info locals?
> >
>
> Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xffffb6c8f040 (LWP 19751)]
> 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> iq=0xffff9f764620, sw=0xffff9f332500) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> 142 ev[total++] = current->events[index++];
>
> (gdb) info locals
> next = 0x7000041400be73b
> current = 0x7000041400be73b
> total = 36
> index = 1
> (gdb)
>
>
> Noticed an other crash:
>
> Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xffffb6c8f040 (LWP 19690)]
> 0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
> 63 sw->chunk_list_head = chunk->next;
>
> (gdb) info locals
> chunk = 0x14340000119
>
> (gdb) bt
> #0 0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
> #1 iq_enqueue (ev=0xffff9f3967c0, iq=0xffff9f764620, sw=0xffff9f332500) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:95
> #2 __pull_port_lb (allow_reorder=0, port_id=5, sw=0xffff9f332500) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:463
> #3 sw_schedule_pull_port_no_reorder (sw=0xffff9f332500, port_id=5) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:486
> #4 0x0000aaaaaadd0608 in sw_event_schedule (dev=0xaaaaaafbd200 ) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:554
> #5 0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200 ) at
> /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> #6 0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80, cs=0xffff9ffef900, service_idx=0) at
> /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
> #7 0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900, service_mask=18446744073709551615) at
> /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
> #8 0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0, serialize_mt_unsafe=1) at
> /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
> #9 0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> #10 0x0000aaaaaaaef234 in worker (arg=0xffff9f331c80) at
> /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> #11 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> #12 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> #13 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
>
> >
> > >
> > > (gdb) bt
> > > #0 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38, iq=0xffff9f764720, sw=0xffff9f332600) at
> > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > > #1 sw_schedule_atomic_to_cq (sw=0xffff9f332600, qid=0xffff9f764700, iq_num=0, count=48) at
> > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:74
> > > #2 0x0000aaaaaadcdc44 in sw_schedule_qid_to_cq (sw=0xffff9f332600) at
> > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:262
> > > #3 0x0000aaaaaadd069c in sw_event_schedule (dev=0xaaaaaafbd200 ) at
> > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:564
> > > #4 0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200 ) at
> > > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> > > #5 0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80, cs=0xffff9ffef900, service_idx=0) at
> > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
> > > #6 0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900, service_mask=18446744073709551615) at
> > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
> > > #7 0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0, serialize_mt_unsafe=1) at
> > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
> > > #8 0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> > > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> > > #9 0x0000aaaaaaaef234 in worker (arg=0xffff9f331d80) at
> > > /root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> > > #10 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> > > /root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> > > #11 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> > > #12 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> > >
> > > Segfault seems to happen in sw_event_schedule and only happens under
> > > high traffic load.
> >
> > I've added -n 0 to the command line allowing it to run forever, and
> > after ~2 mins its still happily forwarding pkts at ~10G line rate here.
> >
>
> On arm64 the crash is instant even without -n0.
>
> > >
> > > Thanks,
> > > Pavan
> >
> > Thanks for reporting - I'm afraid I'll have to ask a few questions to identify why
> > I can't reproduce here before I can dig in and identify a fix.
> >
> > Anything special about the system that it is on?
>
> Running on arm64 octeontx with 8x10G connected.
>
> > What traffic pattern is being sent to the app?
>
> Using something similar to trafficgen, IPv4/UDP pkts.
>
> 0:00:51 958245 |0xB00 2816|0xB10 2832|0xB20 2848|0xB30 2864|0xC00 * 3072|0xC10 * 3088|0xC20 * 3104|0xC30 * 3120| Totals
> Port Status |XFI30 Up|XFI31 Up|XFI32 Up|XFI33 Up|XFI40 Up|XFI41 Up|XFI42 Up|XFI43 Up|
> 1:Total TX packets | 7197041566| 5194976604| 5120240981| 4424870160| 5860892739| 5191225514| 5126500427| 4429259828|42545007819
> 3:Total RX packets | 358886055| 323055411| 321000948| 277179800| 387486466| 350278086| 348080242| 295460613|2661427621
> 6:TX packet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
> 7:TX octet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
> 8:TX bit rate, Mbps | 0| 0| 0| 0| 0| 0| 0| 0| 0
> 10:RX packet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
> 11:RX octet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
> 12:RX bit rate, Mbps | 0| 0| 0| 0| 0| 0| 0| 0| 0
> 36:tx.size | 60| 60| 60| 60| 60| 60| 60| 60|
> 37:tx.type | IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP|
> 38:tx.payload | abc| abc| abc| abc| abc| abc| abc| abc|
> 47:dest.mac | fb71189c0| fb71189d0| fb71189e0| fb71189bf| fb7118ac0| fb7118ad0| fb7118ae0| fb7118abf|
> 51:src.mac | fb71189bf| fb71189cf| fb71189df| fb71189ef| fb7118abf| fb7118acf| fb7118adf| fb7118aef|
> 55:dest.ip | 11.1.0.99| 11.17.0.99| 11.33.0.99| 11.0.0.99| 14.1.0.99| 14.17.0.99| 14.33.0.99| 14.0.0.99|
> 59:src.ip | 11.0.0.99| 11.16.0.99| 11.32.0.99| 11.48.0.99| 14.0.0.99| 14.16.0.99| 14.32.0.99| 14.48.0.99|
> 73:bridge | off| off| off| off| off| off| off| off|
> 77:validate packets | off| off| off| off| off| off| off| off|
>
> Thanks,
> Pavan.
>
> >
> > Thanks
> >
>

diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 1ef6340..01da538 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -436,7 +436,7 @@ sw_dev_configure(const struct rte_eventdev *dev)
 	/* If this is a reconfiguration, free the previous IQ allocation */
 	if (sw->chunks)
-		rte_free(sw->chunks);
+		return 0;
 	sw->chunks = rte_malloc_socket(NULL, sizeof(struct sw_queue_chunk) *
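
For anyone following the root cause described at the top of this message, here is a minimal standalone C sketch of the failure mode: a chunk freelist is reinitialized while a queue still holds a pointer into it, so the next allocation hands the same chunk to a second queue. Everything in it is hypothetical (struct model_dev, model_reconfigure(), iqs_alias_after_reconfigure(), the fixed-size pool). It is a simplified model under those assumptions, not the SW PMD's actual data structures and not the eventual fix.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_NUM_CHUNKS 4
#define MODEL_NUM_QUEUES 2

/* Hypothetical stand-ins for illustration; not the SW PMD's real types. */
struct model_chunk {
	struct model_chunk *next;
	int events[4];
};

struct model_iq {
	struct model_chunk *head; /* chunk holding queued events, or NULL */
};

struct model_dev {
	struct model_chunk pool[MODEL_NUM_CHUNKS];
	struct model_chunk *free_head;
	struct model_iq qids[MODEL_NUM_QUEUES];
};

/* Rebuild the freelist so that every chunk in the pool is free again. */
static void model_freelist_init(struct model_dev *d)
{
	for (int i = 0; i < MODEL_NUM_CHUNKS; i++)
		d->pool[i].next =
			(i + 1 < MODEL_NUM_CHUNKS) ? &d->pool[i + 1] : NULL;
	d->free_head = &d->pool[0];
}

/* Pop one chunk off the freelist; returns NULL when the pool is empty. */
static struct model_chunk *model_chunk_alloc(struct model_dev *d)
{
	struct model_chunk *c = d->free_head;

	if (c != NULL) {
		d->free_head = c->next;
		c->next = NULL;
	}
	return c;
}

/*
 * Model of a reconfiguration. With clear_iqs == 0 the per-queue pointers
 * are left alone (the stale-pointer case); with clear_iqs != 0 they are
 * reset along with the freelist.
 */
static void model_reconfigure(struct model_dev *d, int clear_iqs)
{
	model_freelist_init(d);
	if (clear_iqs)
		for (int q = 0; q < MODEL_NUM_QUEUES; q++)
			d->qids[q].head = NULL;
}

/* Returns 1 if two queues end up sharing a chunk after a reconfiguration. */
static int iqs_alias_after_reconfigure(int clear_iqs)
{
	struct model_dev d;

	memset(&d, 0, sizeof(d));
	model_freelist_init(&d);

	d.qids[0].head = model_chunk_alloc(&d); /* queue 0 takes pool[0] */

	model_reconfigure(&d, clear_iqs);

	d.qids[1].head = model_chunk_alloc(&d); /* pool[0] is handed out again */

	return d.qids[0].head == d.qids[1].head;
}
```

Calling iqs_alias_after_reconfigure(0) models the case where the stale pointer survives and two queues reference the same memory; passing 1 models clearing the per-queue pointers during reconfiguration, which is one possible direction for a proper fix and not necessarily the one that will be merged.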