From patchwork Tue Jun 26 10:53:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41553 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 58ED31B4BD; Tue, 26 Jun 2018 12:53:24 +0200 (CEST) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id A38FA1B45C for ; Tue, 26 Jun 2018 12:53:22 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="67805474" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 26 Jun 2018 03:53:19 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArJbC026435; Tue, 26 Jun 2018 11:53:19 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArJ7O021181; Tue, 26 Jun 2018 11:53:19 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArJrw021177; Tue, 26 Jun 2018 11:53:19 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Jianfeng Tan , konstantin.ananyev@intel.com, thomas@monjalon.net, bruce.richardson@intel.com, qi.z.zhang@intel.com Date: Tue, 26 Jun 2018 11:53:12 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 1/7] eal/linux: use glibc malloc in alarm X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Jianfeng Tan Alarm API is going to be used by IPC internally. However, because memory subsystem depends on IPC, alarm API cannot use rte_malloc as it creates a circular dependency. To avoid such chicken and egg problem, we change to use glibc malloc in the alarm API. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_alarm.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c index c115e823a..391d2a65f 100644 --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c @@ -19,7 +19,6 @@ #include #include #include -#include #include #include @@ -91,7 +90,7 @@ eal_alarm_callback(void *arg __rte_unused) rte_spinlock_lock(&alarm_list_lk); LIST_REMOVE(ap, next); - rte_free(ap); + free(ap); } if (!LIST_EMPTY(&alarm_list)) { @@ -122,7 +121,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg) if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL) return -EINVAL; - new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0); + new_alarm = calloc(1, sizeof(*new_alarm)); if (new_alarm == NULL) return -ENOMEM; @@ -196,7 +195,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg) if (ap->executing == 0) { LIST_REMOVE(ap, next); - rte_free(ap); + free(ap); count++; } else { /* If calling from other context, mark that alarm is executing @@ -220,7 +219,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg) if (ap->executing == 0) { LIST_REMOVE(ap, next); - rte_free(ap); + free(ap); count++; ap = ap_prev; } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) From patchwork Tue Jun 26 10:53:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41554 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7E7441B505; Tue, 26 Jun 2018 12:53:26 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 498EA1B45C for ; Tue, 26 Jun 2018 12:53:23 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="52974027" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga006.jf.intel.com with ESMTP; 26 Jun 2018 03:53:19 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArJFt026438; Tue, 26 Jun 2018 11:53:19 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArJWf021188; Tue, 26 Jun 2018 11:53:19 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArJ4N021184; Tue, 26 Jun 2018 11:53:19 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Jianfeng Tan , konstantin.ananyev@intel.com, thomas@monjalon.net, bruce.richardson@intel.com, qi.z.zhang@intel.com Date: Tue, 26 Jun 2018 11:53:13 +0100 Message-Id: <39abdf03ff35813b987875249ebba6b968932752.1530009564.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 2/7] eal/linux: use glibc malloc in interrupt handling X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Jianfeng Tan IPC uses interrupts API internally, and memory subsystem uses IPC. Therefore, IPC should not use rte_malloc to avoid circular dependency. Switch to using regular glibc malloc in interrupts API. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 056d41c12..180c0378a 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -30,7 +30,6 @@ #include #include #include -#include #include #include #include @@ -405,8 +404,7 @@ rte_intr_callback_register(const struct rte_intr_handle *intr_handle, } /* allocate a new interrupt callback entity */ - callback = rte_zmalloc("interrupt callback list", - sizeof(*callback), 0); + callback = calloc(1, sizeof(*callback)); if (callback == NULL) { RTE_LOG(ERR, EAL, "Can not allocate memory\n"); return -ENOMEM; @@ -431,10 +429,10 @@ rte_intr_callback_register(const struct rte_intr_handle *intr_handle, /* no existing callbacks for this - add new source */ if (src == NULL) { - if ((src = rte_zmalloc("interrupt source list", - sizeof(*src), 0)) == NULL) { + src = calloc(1, sizeof(*src)); + if (src == NULL) { RTE_LOG(ERR, EAL, "Can not allocate memory\n"); - rte_free(callback); + free(callback); ret = -ENOMEM; } else { src->intr_handle = *intr_handle; @@ -501,7 +499,7 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle, if (cb->cb_fn == cb_fn && (cb_arg == (void *)-1 || cb->cb_arg == cb_arg)) { TAILQ_REMOVE(&src->callbacks, cb, next); - rte_free(cb); + free(cb); ret++; } } @@ -509,7 +507,7 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle, /* all callbacks for that source are removed. */ if (TAILQ_EMPTY(&src->callbacks)) { TAILQ_REMOVE(&intr_sources, src, next); - rte_free(src); + free(src); } } From patchwork Tue Jun 26 10:53:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41559 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4FB401B573; Tue, 26 Jun 2018 12:53:38 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 93A131B4BA for ; Tue, 26 Jun 2018 12:53:24 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="67358859" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga001.fm.intel.com with ESMTP; 26 Jun 2018 03:53:20 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArJnL026441; Tue, 26 Jun 2018 11:53:19 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArJY7021199; Tue, 26 Jun 2018 11:53:19 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArJYw021195; Tue, 26 Jun 2018 11:53:19 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , konstantin.ananyev@intel.com, thomas@monjalon.net, qi.z.zhang@intel.com Date: Tue, 26 Jun 2018 11:53:14 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 3/7] eal/bsdapp: add interrupt thread X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add interrupt thread to FreeBSD. It is largely a copy-paste from Linuxapp interrupt thread, except for a few key differences: * Use kevent instead of epoll * Do not recreate the event queue on adding/removing interrupt sources, add/remove them to/from the queue on the fly instead * No support for UIO/VFIO handles Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_interrupts.c | 436 ++++++++++++++++++++- 1 file changed, 418 insertions(+), 18 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_interrupts.c b/lib/librte_eal/bsdapp/eal/eal_interrupts.c index 290d53ab9..d0db6c6ff 100644 --- a/lib/librte_eal/bsdapp/eal/eal_interrupts.c +++ b/lib/librte_eal/bsdapp/eal/eal_interrupts.c @@ -1,51 +1,451 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ +#include +#include +#include +#include +#include + +#include +#include +#include +#include #include #include + #include "eal_private.h" +#define MAX_INTR_EVENTS 16 + +/** + * union buffer for reading on different devices + */ +union rte_intr_read_buffer { + char charbuf[16]; /* for others */ +}; + +TAILQ_HEAD(rte_intr_cb_list, rte_intr_callback); +TAILQ_HEAD(rte_intr_source_list, rte_intr_source); + +struct rte_intr_callback { + TAILQ_ENTRY(rte_intr_callback) next; + rte_intr_callback_fn cb_fn; /**< callback address */ + void *cb_arg; /**< parameter for callback */ +}; + +struct rte_intr_source { + TAILQ_ENTRY(rte_intr_source) next; + struct rte_intr_handle intr_handle; /**< interrupt handle */ + struct rte_intr_cb_list callbacks; /**< user callbacks */ + uint32_t active; +}; + +/* global spinlock for interrupt data operation */ +static rte_spinlock_t intr_lock = RTE_SPINLOCK_INITIALIZER; + +/* interrupt sources list */ +static struct rte_intr_source_list intr_sources; + +/* interrupt handling thread */ +static pthread_t intr_thread; + +static volatile int kq = -1; + +static int +intr_source_to_kevent(const struct rte_intr_handle *ih, struct kevent *ke) +{ + ke->filter = EVFILT_READ; + ke->ident = ih->fd; + + return 0; +} + int rte_intr_callback_register(const struct rte_intr_handle *intr_handle, - rte_intr_callback_fn cb, - void *cb_arg) + rte_intr_callback_fn cb, void *cb_arg) { - RTE_SET_USED(intr_handle); - RTE_SET_USED(cb); - RTE_SET_USED(cb_arg); + struct rte_intr_callback *callback = NULL; + struct rte_intr_source *src = NULL; + int ret, add_event; - return -ENOTSUP; + /* first do parameter checking */ + if (intr_handle == NULL || intr_handle->fd < 0 || cb == NULL) { + RTE_LOG(ERR, EAL, + "Registering with invalid input parameter\n"); + return -EINVAL; + } + if (kq < 0) { + RTE_LOG(ERR, EAL, "Kqueue is not active: %d\n", kq); + return -ENODEV; + } + + /* allocate a new interrupt callback entity */ + callback = calloc(1, sizeof(*callback)); + if (callback == NULL) { + RTE_LOG(ERR, EAL, "Can not allocate memory\n"); + return -ENOMEM; + } + callback->cb_fn = cb; + callback->cb_arg = cb_arg; + + rte_spinlock_lock(&intr_lock); + + /* check if there is at least one callback registered for the fd */ + TAILQ_FOREACH(src, &intr_sources, next) { + if (src->intr_handle.fd == intr_handle->fd) { + /* we had no interrupts for this */ + if (TAILQ_EMPTY(&src->callbacks)) + add_event = 1; + + TAILQ_INSERT_TAIL(&(src->callbacks), callback, next); + ret = 0; + break; + } + } + + /* no existing callbacks for this - add new source */ + if (src == NULL) { + src = calloc(1, sizeof(*src)); + if (src == NULL) { + RTE_LOG(ERR, EAL, "Can not allocate memory\n"); + ret = -ENOMEM; + goto fail; + } else { + src->intr_handle = *intr_handle; + TAILQ_INIT(&src->callbacks); + TAILQ_INSERT_TAIL(&(src->callbacks), callback, next); + TAILQ_INSERT_TAIL(&intr_sources, src, next); + add_event = 1; + ret = 0; + } + } + + /* add events to the queue */ + if (add_event) { + struct kevent ke; + + memset(&ke, 0, sizeof(ke)); + ke.flags = EV_ADD; /* mark for addition to the queue */ + + if (intr_source_to_kevent(intr_handle, &ke) < 0) { + RTE_LOG(ERR, EAL, "Cannot convert interrupt handle to kevent\n"); + ret = -ENODEV; + goto fail; + } + + /** + * add the intr file descriptor into wait list. + */ + if (kevent(kq, &ke, 1, NULL, 0, NULL) < 0) { + RTE_LOG(ERR, EAL, "Error adding fd %d kevent, %s\n", + src->intr_handle.fd, strerror(errno)); + ret = -errno; + goto fail; + } + } + rte_spinlock_unlock(&intr_lock); + + return ret; +fail: + /* clean up */ + if (src != NULL) { + TAILQ_REMOVE(&(src->callbacks), callback, next); + if (TAILQ_EMPTY(&(src->callbacks))) { + TAILQ_REMOVE(&intr_sources, src, next); + free(src); + } + } + free(callback); + rte_spinlock_unlock(&intr_lock); + return ret; } int rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle, - rte_intr_callback_fn cb, - void *cb_arg) + rte_intr_callback_fn cb_fn, void *cb_arg) { - RTE_SET_USED(intr_handle); - RTE_SET_USED(cb); - RTE_SET_USED(cb_arg); + int ret; + struct rte_intr_source *src; + struct rte_intr_callback *cb, *next; - return -ENOTSUP; + /* do parameter checking first */ + if (intr_handle == NULL || intr_handle->fd < 0) { + RTE_LOG(ERR, EAL, + "Unregistering with invalid input parameter\n"); + return -EINVAL; + } + if (kq < 0) { + RTE_LOG(ERR, EAL, "Kqueue is not active\n"); + return -ENODEV; + } + + rte_spinlock_lock(&intr_lock); + + /* check if the insterrupt source for the fd is existent */ + TAILQ_FOREACH(src, &intr_sources, next) + if (src->intr_handle.fd == intr_handle->fd) + break; + + /* No interrupt source registered for the fd */ + if (src == NULL) { + ret = -ENOENT; + + /* interrupt source has some active callbacks right now. */ + } else if (src->active != 0) { + ret = -EAGAIN; + + /* ok to remove. */ + } else { + struct kevent ke; + + ret = 0; + + /* remove it from the kqueue */ + memset(&ke, 0, sizeof(ke)); + ke.flags = EV_DELETE; /* mark for deletion from the queue */ + + if (intr_source_to_kevent(intr_handle, &ke) < 0) { + RTE_LOG(ERR, EAL, "Cannot convert to kevent\n"); + ret = -ENODEV; + goto out; + } + + /** + * remove intr file descriptor from wait list. + */ + if (kevent(kq, &ke, 1, NULL, 0, NULL) < 0) { + RTE_LOG(ERR, EAL, "Error removing fd %d kevent, %s\n", + src->intr_handle.fd, strerror(errno)); + ret = -errno; + goto out; + } + + /*walk through the callbacks and remove all that match. */ + for (cb = TAILQ_FIRST(&src->callbacks); cb != NULL; cb = next) { + next = TAILQ_NEXT(cb, next); + if (cb->cb_fn == cb_fn && (cb_arg == (void *)-1 || + cb->cb_arg == cb_arg)) { + TAILQ_REMOVE(&src->callbacks, cb, next); + free(cb); + ret++; + } + } + + /* all callbacks for that source are removed. */ + if (TAILQ_EMPTY(&src->callbacks)) { + TAILQ_REMOVE(&intr_sources, src, next); + free(src); + } + } +out: + rte_spinlock_unlock(&intr_lock); + + return ret; } int -rte_intr_enable(const struct rte_intr_handle *intr_handle __rte_unused) +rte_intr_enable(const struct rte_intr_handle *intr_handle) { - return -ENOTSUP; + if (intr_handle && intr_handle->type == RTE_INTR_HANDLE_VDEV) + return 0; + + if (!intr_handle || intr_handle->fd < 0 || intr_handle->uio_cfg_fd < 0) + return -1; + + switch (intr_handle->type) { + /* not used at this moment */ + case RTE_INTR_HANDLE_ALARM: + return -1; + /* not used at this moment */ + case RTE_INTR_HANDLE_DEV_EVENT: + return -1; + /* unknown handle type */ + default: + RTE_LOG(ERR, EAL, + "Unknown handle type of fd %d\n", + intr_handle->fd); + return -1; + } + + return 0; } int -rte_intr_disable(const struct rte_intr_handle *intr_handle __rte_unused) +rte_intr_disable(const struct rte_intr_handle *intr_handle) { - return -ENOTSUP; + if (intr_handle && intr_handle->type == RTE_INTR_HANDLE_VDEV) + return 0; + + if (!intr_handle || intr_handle->fd < 0 || intr_handle->uio_cfg_fd < 0) + return -1; + + switch (intr_handle->type) { + /* not used at this moment */ + case RTE_INTR_HANDLE_ALARM: + return -1; + /* not used at this moment */ + case RTE_INTR_HANDLE_DEV_EVENT: + return -1; + /* unknown handle type */ + default: + RTE_LOG(ERR, EAL, + "Unknown handle type of fd %d\n", + intr_handle->fd); + return -1; + } + + return 0; +} + +static void +eal_intr_process_interrupts(struct kevent *events, int nfds) +{ + struct rte_intr_callback active_cb; + union rte_intr_read_buffer buf; + struct rte_intr_callback *cb; + struct rte_intr_source *src; + bool call = false; + int n, bytes_read; + + for (n = 0; n < nfds; n++) { + int event_fd = events[n].ident; + + rte_spinlock_lock(&intr_lock); + TAILQ_FOREACH(src, &intr_sources, next) + if (src->intr_handle.fd == event_fd) + break; + if (src == NULL) { + rte_spinlock_unlock(&intr_lock); + continue; + } + + /* mark this interrupt source as active and release the lock. */ + src->active = 1; + rte_spinlock_unlock(&intr_lock); + + /* set the length to be read dor different handle type */ + switch (src->intr_handle.type) { + case RTE_INTR_HANDLE_ALARM: + bytes_read = 0; + call = true; + break; + case RTE_INTR_HANDLE_VDEV: + case RTE_INTR_HANDLE_EXT: + bytes_read = 0; + call = true; + break; + case RTE_INTR_HANDLE_DEV_EVENT: + bytes_read = 0; + call = true; + break; + default: + bytes_read = 1; + break; + } + + if (bytes_read > 0) { + /** + * read out to clear the ready-to-be-read flag + * for epoll_wait. + */ + bytes_read = read(event_fd, &buf, bytes_read); + if (bytes_read < 0) { + if (errno == EINTR || errno == EWOULDBLOCK) + continue; + + RTE_LOG(ERR, EAL, "Error reading from file " + "descriptor %d: %s\n", + event_fd, + strerror(errno)); + } else if (bytes_read == 0) + RTE_LOG(ERR, EAL, "Read nothing from file " + "descriptor %d\n", event_fd); + else + call = true; + } + + /* grab a lock, again to call callbacks and update status. */ + rte_spinlock_lock(&intr_lock); + + if (call) { + /* Finally, call all callbacks. */ + TAILQ_FOREACH(cb, &src->callbacks, next) { + + /* make a copy and unlock. */ + active_cb = *cb; + rte_spinlock_unlock(&intr_lock); + + /* call the actual callback */ + active_cb.cb_fn(active_cb.cb_arg); + + /*get the lock back. */ + rte_spinlock_lock(&intr_lock); + } + } + + /* we done with that interrupt source, release it. */ + src->active = 0; + rte_spinlock_unlock(&intr_lock); + } +} + +static void * +eal_intr_thread_main(void *arg __rte_unused) +{ + struct kevent events[MAX_INTR_EVENTS]; + int nfds; + + /* host thread, never break out */ + for (;;) { + /* do not change anything, just wait */ + nfds = kevent(kq, NULL, 0, events, MAX_INTR_EVENTS, NULL); + + /* kevent fail */ + if (nfds < 0) { + if (errno == EINTR) + continue; + RTE_LOG(ERR, EAL, + "kevent returns with fail\n"); + break; + } + /* kevent timeout, will never happen here */ + else if (nfds == 0) + continue; + + /* kevent has at least one fd ready to read */ + eal_intr_process_interrupts(events, nfds); + } + close(kq); + kq = -1; + return NULL; } int rte_eal_intr_init(void) { - return 0; + int ret = 0; + + /* init the global interrupt source head */ + TAILQ_INIT(&intr_sources); + + kq = kqueue(); + if (kq < 0) { + RTE_LOG(ERR, EAL, "Cannot create kqueue instance\n"); + return -1; + } + + /* create the host thread to wait/handle the interrupt */ + ret = rte_ctrl_thread_create(&intr_thread, "eal-intr-thread", NULL, + eal_intr_thread_main, NULL); + if (ret != 0) { + rte_errno = -ret; + RTE_LOG(ERR, EAL, + "Failed to create thread for interrupt handling\n"); + } + + return ret; } int From patchwork Tue Jun 26 10:53:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41555 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 488A61B522; Tue, 26 Jun 2018 12:53:29 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 816B61B4A4 for ; Tue, 26 Jun 2018 12:53:23 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="61375858" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga002.fm.intel.com with ESMTP; 26 Jun 2018 03:53:20 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArKWG026445; Tue, 26 Jun 2018 11:53:20 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArJ87021210; Tue, 26 Jun 2018 11:53:19 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArJuC021206; Tue, 26 Jun 2018 11:53:19 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , konstantin.ananyev@intel.com, thomas@monjalon.net, qi.z.zhang@intel.com Date: Tue, 26 Jun 2018 11:53:15 +0100 Message-Id: <3951b507713bf8848afdcd3c45f040bd2acc3663.1530009564.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 4/7] eal/bsdapp: add alarm support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement EAL alarm API support for FreeBSD. The implementation is largely identical to that of Linux version, with one key difference. The alarm API is a little Linux-centric in that it is expecting the alarm API to manage alarm timeouts without involvement of the interrupt thread. This works on Linux because in Linux, there's timerfd API which allows waiting for timer events on an fd. On FreeBSD, however, there are no timerfd's, and timer events are set up directly in kevent. There is no way to pass information from the alarm API to the interrupt thread, so we also add a little back-channel magic to get soonest alarm timeout from the alarm API. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_alarm.c | 299 +++++++++++++++++- lib/librte_eal/bsdapp/eal/eal_alarm_private.h | 19 ++ lib/librte_eal/bsdapp/eal/eal_interrupts.c | 29 +- 3 files changed, 334 insertions(+), 13 deletions(-) create mode 100644 lib/librte_eal/bsdapp/eal/eal_alarm_private.h diff --git a/lib/librte_eal/bsdapp/eal/eal_alarm.c b/lib/librte_eal/bsdapp/eal/eal_alarm.c index eb3913c97..51ea4b8c0 100644 --- a/lib/librte_eal/bsdapp/eal/eal_alarm.c +++ b/lib/librte_eal/bsdapp/eal/eal_alarm.c @@ -1,31 +1,314 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2010-2018 Intel Corporation */ + +#include +#include +#include +#include #include +#include +#include #include #include +#include #include +#include +#include +#include + #include "eal_private.h" +#include "eal_alarm_private.h" + +#define NS_PER_US 1000 + +#ifdef CLOCK_MONOTONIC_RAW /* Defined in glibc bits/time.h */ +#define CLOCK_TYPE_ID CLOCK_MONOTONIC_RAW +#else +#define CLOCK_TYPE_ID CLOCK_MONOTONIC +#endif + +struct alarm_entry { + LIST_ENTRY(alarm_entry) next; + struct rte_intr_handle handle; + struct timespec time; + rte_eal_alarm_callback cb_fn; + void *cb_arg; + volatile uint8_t executing; + volatile pthread_t executing_id; +}; + +static LIST_HEAD(alarm_list, alarm_entry) alarm_list = LIST_HEAD_INITIALIZER(); +static rte_spinlock_t alarm_list_lk = RTE_SPINLOCK_INITIALIZER; + +static struct rte_intr_handle intr_handle = {.fd = -1 }; +static void eal_alarm_callback(void *arg); int rte_eal_alarm_init(void) { + intr_handle.type = RTE_INTR_HANDLE_ALARM; + + /* on FreeBSD, timers don't use fd's, and their identifiers are stored + * in separate namespace from fd's, so using any value is OK. however, + * EAL interrupts handler expects fd's to be unique, so use an actual fd + * to guarantee unique timer identifier. + */ + intr_handle.fd = open("/dev/zero", O_RDONLY); + + return 0; +} + +static inline int +timespec_cmp(const struct timespec *now, const struct timespec *at) +{ + if (now->tv_sec < at->tv_sec) + return -1; + if (now->tv_sec > at->tv_sec) + return 1; + if (now->tv_nsec < at->tv_nsec) + return -1; + if (now->tv_nsec > at->tv_nsec) + return 1; + return 0; +} + +static inline uint64_t +diff_ns(struct timespec *now, struct timespec *at) +{ + uint64_t now_ns, at_ns; + + if (timespec_cmp(now, at) >= 0) + return 0; + + now_ns = now->tv_sec * NS_PER_S + now->tv_nsec; + at_ns = at->tv_sec * NS_PER_S + at->tv_nsec; + + return at_ns - now_ns; +} + +int +eal_alarm_get_timeout_ns(uint64_t *val) +{ + struct alarm_entry *ap; + struct timespec now; + + if (clock_gettime(CLOCK_TYPE_ID, &now) < 0) + return -1; + + if (LIST_EMPTY(&alarm_list)) + return -1; + + ap = LIST_FIRST(&alarm_list); + + *val = diff_ns(&now, &ap->time); + return 0; } +static int +unregister_current_callback(void) +{ + struct alarm_entry *ap; + int ret = 0; + + if (!LIST_EMPTY(&alarm_list)) { + ap = LIST_FIRST(&alarm_list); + + do { + ret = rte_intr_callback_unregister(&intr_handle, + eal_alarm_callback, &ap->time); + } while (ret == -EAGAIN); + } + + return ret; +} + +static int +register_first_callback(void) +{ + struct alarm_entry *ap; + int ret = 0; + + if (!LIST_EMPTY(&alarm_list)) { + ap = LIST_FIRST(&alarm_list); + + /* register a new callback */ + ret = rte_intr_callback_register(&intr_handle, + eal_alarm_callback, &ap->time); + } + return ret; +} + +static void +eal_alarm_callback(void *arg __rte_unused) +{ + struct timespec now; + struct alarm_entry *ap; + + rte_spinlock_lock(&alarm_list_lk); + ap = LIST_FIRST(&alarm_list); + + if (clock_gettime(CLOCK_TYPE_ID, &now) < 0) + return; + + while (ap != NULL && timespec_cmp(&now, &ap->time) >= 0) { + ap->executing = 1; + ap->executing_id = pthread_self(); + rte_spinlock_unlock(&alarm_list_lk); + + ap->cb_fn(ap->cb_arg); + + rte_spinlock_lock(&alarm_list_lk); + + LIST_REMOVE(ap, next); + free(ap); + + ap = LIST_FIRST(&alarm_list); + } + + /* timer has been deleted from the kqueue, so recreate it if needed */ + register_first_callback(); + + rte_spinlock_unlock(&alarm_list_lk); +} + int -rte_eal_alarm_set(uint64_t us __rte_unused, - rte_eal_alarm_callback cb_fn __rte_unused, - void *cb_arg __rte_unused) +rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg) { - return -ENOTSUP; + struct alarm_entry *ap, *new_alarm; + struct timespec now; + uint64_t ns; + int ret = 0; + + /* check parameters, also ensure us won't cause a uint64_t overflow */ + if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL) + return -EINVAL; + + new_alarm = calloc(1, sizeof(*new_alarm)); + if (new_alarm == NULL) + return -ENOMEM; + + /* use current time to calculate absolute time of alarm */ + clock_gettime(CLOCK_TYPE_ID, &now); + + ns = us * NS_PER_US; + + new_alarm->cb_fn = cb_fn; + new_alarm->cb_arg = cb_arg; + new_alarm->time.tv_nsec = (now.tv_nsec + ns) % NS_PER_S; + new_alarm->time.tv_sec = now.tv_sec + ((now.tv_nsec + ns) / NS_PER_S); + + rte_spinlock_lock(&alarm_list_lk); + + if (LIST_EMPTY(&alarm_list)) + LIST_INSERT_HEAD(&alarm_list, new_alarm, next); + else { + LIST_FOREACH(ap, &alarm_list, next) { + if (timespec_cmp(&new_alarm->time, &ap->time) < 0) { + LIST_INSERT_BEFORE(ap, new_alarm, next); + break; + } + if (LIST_NEXT(ap, next) == NULL) { + LIST_INSERT_AFTER(ap, new_alarm, next); + break; + } + } + } + + /* re-register first callback just in case */ + register_first_callback(); + + rte_spinlock_unlock(&alarm_list_lk); + + return ret; } int -rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn __rte_unused, - void *cb_arg __rte_unused) +rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg) { - return -ENOTSUP; + struct alarm_entry *ap, *ap_prev; + int count = 0; + int err = 0; + int executing; + + if (!cb_fn) { + rte_errno = EINVAL; + return -1; + } + + do { + executing = 0; + rte_spinlock_lock(&alarm_list_lk); + /* remove any matches at the start of the list */ + while (1) { + ap = LIST_FIRST(&alarm_list); + if (ap == NULL) + break; + if (cb_fn != ap->cb_fn) + break; + if (cb_arg != ap->cb_arg && cb_arg != (void *) -1) + break; + if (ap->executing == 0) { + LIST_REMOVE(ap, next); + free(ap); + count++; + } else { + /* If calling from other context, mark that + * alarm is executing so loop can spin till it + * finish. Otherwise we are trying to cancel + * ourselves - mark it by EINPROGRESS. + */ + if (pthread_equal(ap->executing_id, + pthread_self()) == 0) + executing++; + else + err = EINPROGRESS; + + break; + } + } + ap_prev = ap; + + /* now go through list, removing entries not at start */ + LIST_FOREACH(ap, &alarm_list, next) { + /* this won't be true first time through */ + if (cb_fn == ap->cb_fn && + (cb_arg == (void *)-1 || + cb_arg == ap->cb_arg)) { + if (ap->executing == 0) { + LIST_REMOVE(ap, next); + free(ap); + count++; + ap = ap_prev; + } else if (pthread_equal(ap->executing_id, + pthread_self()) == 0) { + executing++; + } else { + err = EINPROGRESS; + } + } + ap_prev = ap; + } + rte_spinlock_unlock(&alarm_list_lk); + } while (executing != 0); + + if (count == 0 && err == 0) + rte_errno = ENOENT; + else if (err) + rte_errno = err; + + rte_spinlock_lock(&alarm_list_lk); + + /* unregister if no alarms left, otherwise re-register first */ + if (LIST_EMPTY(&alarm_list)) + unregister_current_callback(); + else + register_first_callback(); + + rte_spinlock_unlock(&alarm_list_lk); + + return count; } diff --git a/lib/librte_eal/bsdapp/eal/eal_alarm_private.h b/lib/librte_eal/bsdapp/eal/eal_alarm_private.h new file mode 100644 index 000000000..65c711518 --- /dev/null +++ b/lib/librte_eal/bsdapp/eal/eal_alarm_private.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#ifndef EAL_ALARM_PRIVATE_H +#define EAL_ALARM_PRIVATE_H + +#include + +/* + * FreeBSD needs a back-channel communication mechanism between interrupt and + * alarm thread, because on FreeBSD, timer period is set up inside the interrupt + * API and not inside alarm API like on Linux. + */ + +int +eal_alarm_get_timeout_ns(uint64_t *val); + +#endif // EAL_ALARM_PRIVATE_H diff --git a/lib/librte_eal/bsdapp/eal/eal_interrupts.c b/lib/librte_eal/bsdapp/eal/eal_interrupts.c index d0db6c6ff..a4aac606e 100644 --- a/lib/librte_eal/bsdapp/eal/eal_interrupts.c +++ b/lib/librte_eal/bsdapp/eal/eal_interrupts.c @@ -16,6 +16,7 @@ #include #include "eal_private.h" +#include "eal_alarm_private.h" #define MAX_INTR_EVENTS 16 @@ -56,7 +57,22 @@ static volatile int kq = -1; static int intr_source_to_kevent(const struct rte_intr_handle *ih, struct kevent *ke) { - ke->filter = EVFILT_READ; + /* alarm callbacks are special case */ + if (ih->type == RTE_INTR_HANDLE_ALARM) { + uint64_t timeout_ns; + + /* get soonest alarm timeout */ + if (eal_alarm_get_timeout_ns(&timeout_ns) < 0) + return -1; + + ke->filter = EVFILT_TIMER; + /* timers are one shot */ + ke->flags |= EV_ONESHOT; + ke->fflags = NOTE_NSECONDS; + ke->data = timeout_ns; + } else { + ke->filter = EVFILT_READ; + } ke->ident = ih->fd; return 0; @@ -122,8 +138,10 @@ rte_intr_callback_register(const struct rte_intr_handle *intr_handle, } } - /* add events to the queue */ - if (add_event) { + /* add events to the queue. timer events are special as we need to + * re-set the timer. + */ + if (add_event || src->intr_handle.type == RTE_INTR_HANDLE_ALARM) { struct kevent ke; memset(&ke, 0, sizeof(ke)); @@ -218,8 +236,9 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle, if (kevent(kq, &ke, 1, NULL, 0, NULL) < 0) { RTE_LOG(ERR, EAL, "Error removing fd %d kevent, %s\n", src->intr_handle.fd, strerror(errno)); - ret = -errno; - goto out; + /* removing non-existent even is an expected condition + * in some circumstances (e.g. oneshot events). + */ } /*walk through the callbacks and remove all that match. */ From patchwork Tue Jun 26 10:53:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41556 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 65E171B538; Tue, 26 Jun 2018 12:53:31 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 079051B4BA for ; Tue, 26 Jun 2018 12:53:23 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="52974030" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga006.jf.intel.com with ESMTP; 26 Jun 2018 03:53:20 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArKGp026448; Tue, 26 Jun 2018 11:53:20 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArKIu021217; Tue, 26 Jun 2018 11:53:20 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArKa9021213; Tue, 26 Jun 2018 11:53:20 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: Jianfeng Tan , Bruce Richardson , konstantin.ananyev@intel.com, thomas@monjalon.net, qi.z.zhang@intel.com Date: Tue, 26 Jun 2018 11:53:16 +0100 Message-Id: <5f6bc60bf193daf4eb07f4484c52878bb1b751de.1530009564.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 5/7] eal: bring forward init of interrupt handling X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Jianfeng Tan Next commit will make asynchronous IPC requests rely on alarm API, which in turn relies on interrupts to work. Therefore, move the EAL interrupt initialization before IPC initialization to avoid breaking IPC in the next commit. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal.c | 10 +++++----- lib/librte_eal/linuxapp/eal/eal.c | 10 +++++----- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index dc279542d..f70f7aecd 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -625,6 +625,11 @@ rte_eal_init(int argc, char **argv) rte_config_init(); + if (rte_eal_intr_init() < 0) { + rte_eal_init_alert("Cannot init interrupt-handling thread\n"); + return -1; + } + /* Put mp channel init before bus scan so that we can init the vdev * bus through mp channel in the secondary process before the bus scan. */ @@ -713,11 +718,6 @@ rte_eal_init(int argc, char **argv) return -1; } - if (rte_eal_intr_init() < 0) { - rte_eal_init_alert("Cannot init interrupt-handling thread\n"); - return -1; - } - if (rte_eal_timer_init() < 0) { rte_eal_init_alert("Cannot init HPET or TSC timers\n"); rte_errno = ENOTSUP; diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 8655b8691..f8a0c06d7 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -839,6 +839,11 @@ rte_eal_init(int argc, char **argv) rte_config_init(); + if (rte_eal_intr_init() < 0) { + rte_eal_init_alert("Cannot init interrupt-handling thread\n"); + return -1; + } + /* Put mp channel init before bus scan so that we can init the vdev * bus through mp channel in the secondary process before the bus scan. */ @@ -968,11 +973,6 @@ rte_eal_init(int argc, char **argv) rte_config.master_lcore, (int)thread_id, cpuset, ret == 0 ? "" : "..."); - if (rte_eal_intr_init() < 0) { - rte_eal_init_alert("Cannot init interrupt-handling thread\n"); - return -1; - } - RTE_LCORE_FOREACH_SLAVE(i) { /* From patchwork Tue Jun 26 10:53:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41558 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D29771B568; Tue, 26 Jun 2018 12:53:35 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id C08C11B4B5 for ; Tue, 26 Jun 2018 12:53:23 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="62273536" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga003.jf.intel.com with ESMTP; 26 Jun 2018 03:53:21 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArKe6026451; Tue, 26 Jun 2018 11:53:20 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArKCW021224; Tue, 26 Jun 2018 11:53:20 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArKpu021220; Tue, 26 Jun 2018 11:53:20 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: konstantin.ananyev@intel.com, thomas@monjalon.net, bruce.richardson@intel.com, qi.z.zhang@intel.com, Jianfeng Tan Date: Tue, 26 Jun 2018 11:53:17 +0100 Message-Id: <1f85bec5ac54c3880046e0937ed3ba49ce6ea5fe.1530009564.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 6/7] ipc: remove IPC thread for async requests X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Previously, we were using two IPC threads - one to handle messages and synchronous requests, and another to handle asynchronous requests. To handle replies for an async request, rte_mp_handle woke up the rte_mp_handle_async thread to process through pthread_cond variable. Change it to handle asynchronous messages within the main IPC thread. To handle timeout events, for each async request which is sent, we set an alarm for it. If its reply is received before timeout, we will cancel the alarm when we handle the reply; otherwise, alarm will invoke the async_reply_handle() as the alarm callback. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov Suggested-by: Thomas Monjalon --- Notes: RFC->RFCv2: - Rebased on latest code - Implemented comments to the original RFC lib/librte_eal/common/eal_common_proc.c | 201 +++++++++--------------- 1 file changed, 70 insertions(+), 131 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index 707d8ab30..6f3366403 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -20,6 +20,7 @@ #include #include +#include #include #include #include @@ -94,11 +95,9 @@ TAILQ_HEAD(pending_request_list, pending_request); static struct { struct pending_request_list requests; pthread_mutex_t lock; - pthread_cond_t async_cond; } pending_requests = { .requests = TAILQ_HEAD_INITIALIZER(pending_requests.requests), .lock = PTHREAD_MUTEX_INITIALIZER, - .async_cond = PTHREAD_COND_INITIALIZER /**< used in async requests only */ }; @@ -106,6 +105,16 @@ static struct { static int mp_send(struct rte_mp_msg *msg, const char *peer, int type); +/* for use with alarm callback */ +static void +async_reply_handle(void *arg); + +/* for use with process_msg */ +static struct pending_request * +async_reply_handle_thread_unsafe(void *arg); + +static void +trigger_async_action(struct pending_request *req); static struct pending_request * find_pending_request(const char *dst, const char *act_name) @@ -290,6 +299,8 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s) RTE_LOG(DEBUG, EAL, "msg: %s\n", msg->name); if (m->type == MP_REP || m->type == MP_IGN) { + struct pending_request *req = NULL; + pthread_mutex_lock(&pending_requests.lock); pending_req = find_pending_request(s->sun_path, msg->name); if (pending_req) { @@ -301,11 +312,14 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s) if (pending_req->type == REQUEST_TYPE_SYNC) pthread_cond_signal(&pending_req->sync.cond); else if (pending_req->type == REQUEST_TYPE_ASYNC) - pthread_cond_signal( - &pending_requests.async_cond); + req = async_reply_handle_thread_unsafe( + pending_req); } else RTE_LOG(ERR, EAL, "Drop mp reply: %s\n", msg->name); pthread_mutex_unlock(&pending_requests.lock); + + if (req != NULL) + trigger_async_action(req); return; } @@ -365,7 +379,6 @@ timespec_cmp(const struct timespec *a, const struct timespec *b) } enum async_action { - ACTION_NONE, /**< don't do anything */ ACTION_FREE, /**< free the action entry, but don't trigger callback */ ACTION_TRIGGER /**< trigger callback, then free action entry */ }; @@ -375,7 +388,7 @@ process_async_request(struct pending_request *sr, const struct timespec *now) { struct async_request_param *param; struct rte_mp_reply *reply; - bool timeout, received, last_msg; + bool timeout, last_msg; param = sr->async.param; reply = ¶m->user_reply; @@ -383,13 +396,6 @@ process_async_request(struct pending_request *sr, const struct timespec *now) /* did we timeout? */ timeout = timespec_cmp(¶m->end, now) <= 0; - /* did we receive a response? */ - received = sr->reply_received != 0; - - /* if we didn't time out, and we didn't receive a response, ignore */ - if (!timeout && !received) - return ACTION_NONE; - /* if we received a response, adjust relevant data and copy mesasge. */ if (sr->reply_received == 1 && sr->reply) { struct rte_mp_msg *msg, *user_msgs, *tmp; @@ -448,120 +454,60 @@ trigger_async_action(struct pending_request *sr) free(sr->async.param->user_reply.msgs); free(sr->async.param); free(sr->request); + free(sr); } static struct pending_request * -check_trigger(struct timespec *ts) +async_reply_handle_thread_unsafe(void *arg) { - struct pending_request *next, *cur, *trigger = NULL; - - TAILQ_FOREACH_SAFE(cur, &pending_requests.requests, next, next) { - enum async_action action; - if (cur->type != REQUEST_TYPE_ASYNC) - continue; - - action = process_async_request(cur, ts); - if (action == ACTION_FREE) { - TAILQ_REMOVE(&pending_requests.requests, cur, next); - free(cur); - } else if (action == ACTION_TRIGGER) { - TAILQ_REMOVE(&pending_requests.requests, cur, next); - trigger = cur; - break; - } - } - return trigger; -} - -static void -wait_for_async_messages(void) -{ - struct pending_request *sr; - struct timespec timeout; - bool timedwait = false; - bool nowait = false; - int ret; - - /* scan through the list and see if there are any timeouts that - * are earlier than our current timeout. - */ - TAILQ_FOREACH(sr, &pending_requests.requests, next) { - if (sr->type != REQUEST_TYPE_ASYNC) - continue; - if (!timedwait || timespec_cmp(&sr->async.param->end, - &timeout) < 0) { - memcpy(&timeout, &sr->async.param->end, - sizeof(timeout)); - timedwait = true; - } - - /* sometimes, we don't even wait */ - if (sr->reply_received) { - nowait = true; - break; - } - } - - if (nowait) - return; - - do { - ret = timedwait ? - pthread_cond_timedwait( - &pending_requests.async_cond, - &pending_requests.lock, - &timeout) : - pthread_cond_wait( - &pending_requests.async_cond, - &pending_requests.lock); - } while (ret != 0 && ret != ETIMEDOUT); - - /* we've been woken up or timed out */ -} - -static void * -async_reply_handle(void *arg __rte_unused) -{ - struct timeval now; + struct pending_request *req = (struct pending_request *)arg; + enum async_action action; struct timespec ts_now; - while (1) { - struct pending_request *trigger = NULL; + struct timeval now; - pthread_mutex_lock(&pending_requests.lock); + if (gettimeofday(&now, NULL) < 0) { + RTE_LOG(ERR, EAL, "Cannot get current time\n"); + goto no_trigger; + } + ts_now.tv_nsec = now.tv_usec * 1000; + ts_now.tv_sec = now.tv_sec; - /* we exit this function holding the lock */ - wait_for_async_messages(); + action = process_async_request(req, &ts_now); - if (gettimeofday(&now, NULL) < 0) { - pthread_mutex_unlock(&pending_requests.lock); - RTE_LOG(ERR, EAL, "Cannot get current time\n"); - break; + TAILQ_REMOVE(&pending_requests.requests, req, next); + + if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) { + /* if we failed to cancel the alarm because it's already in + * progress, don't proceed because otherwise we will end up + * handling the same message twice. + */ + if (rte_errno == EINPROGRESS) { + RTE_LOG(DEBUG, EAL, "Request handling is already in progress\n"); + goto no_trigger; } - ts_now.tv_nsec = now.tv_usec * 1000; - ts_now.tv_sec = now.tv_sec; - - do { - trigger = check_trigger(&ts_now); - /* unlock request list */ - pthread_mutex_unlock(&pending_requests.lock); - - if (trigger) { - trigger_async_action(trigger); - free(trigger); - - /* we've triggered a callback, but there may be - * more, so lock the list and check again. - */ - pthread_mutex_lock(&pending_requests.lock); - } - } while (trigger); + RTE_LOG(ERR, EAL, "Failed to cancel alarm\n"); } - RTE_LOG(ERR, EAL, "ERROR: asynchronous requests disabled\n"); - + if (action == ACTION_TRIGGER) + return req; +no_trigger: + free(req); return NULL; } +static void +async_reply_handle(void *arg) +{ + struct pending_request *req; + + pthread_mutex_lock(&pending_requests.lock); + req = async_reply_handle_thread_unsafe(arg); + pthread_mutex_unlock(&pending_requests.lock); + + if (req != NULL) + trigger_async_action(req); +} + static int open_socket_fd(void) { @@ -624,7 +570,7 @@ rte_mp_channel_init(void) { char path[PATH_MAX]; int dir_fd; - pthread_t mp_handle_tid, async_reply_handle_tid; + pthread_t mp_handle_tid; /* create filter path */ create_socket_path("*", path, sizeof(path)); @@ -671,17 +617,6 @@ rte_mp_channel_init(void) return -1; } - if (rte_ctrl_thread_create(&async_reply_handle_tid, - "rte_mp_async", NULL, - async_reply_handle, NULL) < 0) { - RTE_LOG(ERR, EAL, "failed to create mp thead: %s\n", - strerror(errno)); - close(mp_fd); - close(dir_fd); - mp_fd = -1; - return -1; - } - /* unlock the directory */ flock(dir_fd, LOCK_UN); close(dir_fd); @@ -853,7 +788,7 @@ rte_mp_sendmsg(struct rte_mp_msg *msg) static int mp_request_async(const char *dst, struct rte_mp_msg *req, - struct async_request_param *param) + struct async_request_param *param, const struct timespec *ts) { struct rte_mp_msg *reply_msg; struct pending_request *pending_req, *exist; @@ -898,6 +833,13 @@ mp_request_async(const char *dst, struct rte_mp_msg *req, param->user_reply.nb_sent++; + if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000, + async_reply_handle, pending_req) < 0) { + RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n", + dst, req->name); + rte_panic("Fix the above shit to properly free all memory\n"); + } + return 0; fail: free(pending_req); @@ -1119,7 +1061,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, /* for secondary process, send request to the primary process only */ if (rte_eal_process_type() == RTE_PROC_SECONDARY) { - ret = mp_request_async(eal_mp_socket_path(), copy, param); + ret = mp_request_async(eal_mp_socket_path(), copy, param, ts); /* if we didn't send anything, put dummy request on the queue */ if (ret == 0 && reply->nb_sent == 0) { @@ -1162,7 +1104,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, snprintf(path, sizeof(path), "%s/%s", mp_dir_path, ent->d_name); - if (mp_request_async(path, copy, param)) + if (mp_request_async(path, copy, param, ts)) ret = -1; } /* if we didn't send anything, put dummy request on the queue */ @@ -1171,9 +1113,6 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, dummy_used = true; } - /* trigger async request thread wake up */ - pthread_cond_signal(&pending_requests.async_cond); - /* finally, unlock the queue */ pthread_mutex_unlock(&pending_requests.lock); From patchwork Tue Jun 26 10:53:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 41560 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id AE9CD1B57D; Tue, 26 Jun 2018 12:53:39 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 276EF1B4B5 for ; Tue, 26 Jun 2018 12:53:25 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 03:53:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="67358863" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga001.fm.intel.com with ESMTP; 26 Jun 2018 03:53:21 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5QArKKY026454; Tue, 26 Jun 2018 11:53:20 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5QArKm1021231; Tue, 26 Jun 2018 11:53:20 +0100 Received: (from aburakov@localhost) by sivswdev01.ir.intel.com with LOCAL id w5QArKfo021227; Tue, 26 Jun 2018 11:53:20 +0100 From: Anatoly Burakov To: dev@dpdk.org Cc: John McNamara , Marko Kovacevic , konstantin.ananyev@intel.com, thomas@monjalon.net, bruce.richardson@intel.com, qi.z.zhang@intel.com Date: Tue, 26 Jun 2018 11:53:18 +0100 Message-Id: X-Mailer: git-send-email 1.7.0.7 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v2 7/7] doc: document IPC callback limitations X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" For asynchronous requests, user callback may be triggered either from IPC thread or from interrupt thread. Because of this, delivery of other interrupt-based events such as alarms may not be possible inside the asynchronous IPC request callback handler. Document this limitation. Signed-off-by: Anatoly Burakov --- doc/guides/prog_guide/multi_proc_support.rst | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst index 46a00ec11..1384fe335 100644 --- a/doc/guides/prog_guide/multi_proc_support.rst +++ b/doc/guides/prog_guide/multi_proc_support.rst @@ -220,8 +220,8 @@ way communication mechanism, with the requester expecting a response from the other side. Both messages and requests will trigger a named callback on the receiver side. -These callbacks will be called from within a dedicated IPC thread that is not -part of EAL lcore threads. +These callbacks will be called from within a dedicated IPC or interrupt thread +that are not part of EAL lcore threads. Registering for incoming messages ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -280,6 +280,13 @@ For asynchronous requests, a function pointer to the callback function must be provided instead. This callback will be called when the request either has timed out, or will have received a response to all the messages that were sent. +.. warning:: + + When an asynchronous request times out, the callback will be called not by + a dedicated IPC thread, but rather from EAL interrupt thread. Because of + this, it may not be possible for DPDK to trigger another interrupt-based + event (such as an alarm) while handling asynchronous IPC callback. + When the callback is called, the original request descriptor will be provided (so that it would be possible to determine for which sent message this is a callback to), along with a response descriptor like the one described above. @@ -311,6 +318,12 @@ supported. However, since sending messages (not requests) does not involve an IPC thread, sending messages while processing another message or request is supported. +Asynchronous request callbacks may be triggered either from IPC thread or from +interrupt thread, depending on whether the request has timed out. It is +therefore suggested to avoid waiting for interrupt-based events (such as alarms) +inside asynchronous IPC request callbacks. This limitation does not apply to +messages or synchronous requests. + If callbacks spend a long time processing the incoming requests, the requestor might time out, so setting the right timeout value on the requestor side is imperative.