[dpdk-dev] Unlink existing unused sockets at start up

Message ID 1450326062-105574-1-git-send-email-zhihong.wang@intel.com
State Rejected, archived
Delegated to: Thomas Monjalon

Commit Message

Zhihong Wang Dec. 17, 2015, 4:21 a.m. UTC
This patch unlinks existing unused sockets (which cause new bindings to fail, e.g. for the vHost PMD) to ensure a smooth startup.
In many cases DPDK applications are terminated abnormally without proper resource release. Therefore, DPDK libs should be able to deal with an unclean boot environment.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
  

Comments

Ilya Maximets Dec. 17, 2015, 11:47 a.m. UTC | #1
On 17.12.2015 07:21, Zhihong Wang wrote:
> This patch unlinks existing unused sockets (which cause new bindings to fail, e.g. vHost PMD) to ensure smooth startup.
> In a lot of cases DPDK applications are terminated abnormally without proper resource release.

Original OVS related problem discussed previously here
( http://dpdk.org/ml/archives/dev/2015-December/030326.html )
fixed in OVS by

commit 9b5422a98f817b9f2a1f8224cab7e1a8d0bbba1f
Author: Ilya Maximets <i.maximets@samsung.com>
Date:   Wed Dec 16 15:32:21 2015 +0300

    ovs-lib: Try to call exit before killing.
    
    While killing OVS may not free all allocated resources.
    
    Example:
        Socket for vhost-user port will stay in a system
        after 'systemctl stop openvswitch' and opening
        that port after restart will fail.


So, an application crash is the only case left to discuss.

> Therefore, DPDK libs should be able to deal with unclean boot environment.

Why do you think that recovery after an application crash
is a problem for the underlying library?

Best regards, Ilya Maximets.

  
Yuanhan Liu Dec. 17, 2015, 11:59 a.m. UTC | #2
On Wed, Dec 16, 2015 at 11:21:02PM -0500, Zhihong Wang wrote:
> This patch unlinks existing unused sockets (which cause new bindings to fail, e.g. vHost PMD) to ensure smooth startup.
> In a lot of cases DPDK applications are terminated abnormally without proper resource release. Therefore, DPDK libs should be able to deal with unclean boot environment.

No, I thought we have made it clear, that a library should not remove a
file given by the application, the application should.


(BTW, please wrap your commit log in 80 chars).

	--yliu
  
Zhihong Wang Dec. 18, 2015, 2:39 a.m. UTC | #3
> On 17.12.2015 07:21, Zhihong Wang wrote:
> > Therefore, DPDK libs should be able to deal with unclean boot environment.
> 
> Why do you think that recovery after an application crash is a problem for
> the underlying library?

Thanks for the information!

Yes, ideally the underlying lib shouldn't meddle with the recovery logic.
But I do think we should at least put a warning in the lib function's comment saying that the app should make the path available. This is another topic, though :-)
Like we did in memcpy:
/**
 * Copy 16 bytes from one location to another,
 * locations should not overlap.
 */


> 
> Best regards, Ilya Maximets.
>
  
Ilya Maximets Dec. 18, 2015, 6:17 a.m. UTC | #4
On 18.12.2015 05:39, Wang, Zhihong wrote:

> Yes ideally the underneath lib shouldn't meddle with the recovery logic.
> But I do think we should at least put a warning in the lib function said the app should make the path available. This is another topic though :-)
> Like we did in memcpy:
> /**
>  * Copy 16 bytes from one location to another,
>  * locations should not overlap.
>  */
> 

Isn't it enough to have an error in the log?

lib/librte_vhost/vhost_user/vhost-net-user.c:130:
RTE_LOG(ERR, VHOST_CONFIG, "fail to bind fd:%d, remove file:%s and try again.\n",

Best regards, Ilya Maximets.
  
Zhihong Wang Dec. 21, 2015, 3:31 a.m. UTC | #5
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> Sent: Friday, December 18, 2015 2:18 PM
> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> Cc: p.fedin@samsung.com; yuanhan.liu@linux.intel.com; s.dyasly@samsung.com;
> Xie, Huawei <huawei.xie@intel.com>
> Subject: Re: [PATCH] Unlink existing unused sockets at start up
> 
> On 18.12.2015 05:39, Wang, Zhihong wrote:
> 
> > Yes ideally the underneath lib shouldn't meddle with the recovery logic.
> > But I do think we should at least put a warning in the lib function
> > said the app should make the path available. This is another topic though :-)
> > Like we did in memcpy:
> > /**
> >  * Copy 16 bytes from one location to another,
> >  * locations should not overlap.
> >  */
> >
> 
> Isn't it enough to have an error in the log?

Function comments and function code are different things, and both are necessary.
Also, why wait until an error occurs when a comment can warn the developer?

  

Patch

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 8b7a448..eac0721 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -120,18 +120,38 @@ uds_socket(const char *path)
 	sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (sockfd < 0)
 		return -1;
-	RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd:%d\n", sockfd);
+	RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd: %d\n", sockfd);
 
 	memset(&un, 0, sizeof(un));
 	un.sun_family = AF_UNIX;
 	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
 	ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
 	if (ret == -1) {
-		RTE_LOG(ERR, VHOST_CONFIG, "fail to bind fd:%d, remove file:%s and try again.\n",
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"bind fd: %d to file: %s failed, checking socket...\n",
 			sockfd, path);
-		goto err;
+		ret = connect(sockfd, (struct sockaddr *)&un, sizeof(un));
+		if (ret == -1) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"socket: %s is inactive, rebinding after unlink...\n", path);
+			unlink(path);
+			ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+			if (ret == -1) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+					"bind fd: %d to file: %s failed even after unlink\n",
+					sockfd, path);
+				goto err;
+			}
+		} else {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"socket: %s is alive, remove it and try again\n", path);
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"bind fd: %d to file: %s failed\n", sockfd, path);
+			goto err;
+		}
 	}
-	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"bind fd: %d to file: %s successful\n", sockfd, path);
 
 	ret = listen(sockfd, MAX_VIRTIO_BACKLOG);
 	if (ret == -1)