[3/4] eal: don't crash if alarm set fails

Message ID 20180725182019.31518-4-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series small cleanups

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Stephen Hemminger July 25, 2018, 6:20 p.m. UTC
There is no need to call rte_exit and crash the application here;
better to let the application handle the error itself.

Remove the gratuitous profanity which would be visible if
the rte_exit was still there.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 lib/librte_eal/common/eal_common_proc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
  

Comments

Anatoly Burakov July 26, 2018, 9:34 a.m. UTC | #1
On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
> There is no need to call rte_exit and crash the application here;
> better to let the application handle the error itself.
> 
> Remove the gratuitous profanity which would be visible if
> the rte_exit was still there.
> 
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---

Oops, this was a "debug" message I accidentally left in :( My apologies!
  
Anatoly Burakov July 26, 2018, 9:41 a.m. UTC | #2
On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
> There is no need to call rte_exit and crash the application here;
> better to let the application handle the error itself.
> 
> Remove the gratuitous profanity which would be visible if
> the rte_exit was still there.
> 
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>   lib/librte_eal/common/eal_common_proc.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
> index 9fcb9121908d..07b7579c565a 100644
> --- a/lib/librte_eal/common/eal_common_proc.c
> +++ b/lib/librte_eal/common/eal_common_proc.c
> @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
>   
>   	param->user_reply.nb_sent++;
>   
> -	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
> -			      async_reply_handle, pending_req) < 0) {
> +	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
> +				async_reply_handle, pending_req);
> +	if (ret < 0)
>   		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
>   			dst, req->name);
> -		rte_panic("Fix the above shit to properly free all memory\n");

Profanity aside, I think the message was trying to tell me something:
namely, that if alarm_set fails, we risk leaking this memory if the
reply from the peer never comes, and we risk leaving the
application hanging because the timeout never triggers. I'm not sure
leaving this "to the user" is the right choice, because there is no way
for the user to free IPC-internal memory if it leaks.

So I think the proper way to handle this would be to set the alarm
first and then, if that fails, not send the message in the first place.
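
The ordering suggested above can be sketched roughly as follows. This is only an illustration, not the DPDK code: fake_alarm_set(), fake_send_msg(), struct pending_request and request_async_sketch() are hypothetical stand-ins invented for this example, and the success path frees the request immediately only so that the sketch itself does not leak.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the IPC-internal bookkeeping of one request. */
struct pending_request {
	int id;
};

static int fake_alarm_should_fail; /* knob: make the alarm stub fail */
static int messages_sent;          /* counts calls to the send stub */

/* Stub for arming the timeout; returns 0 on success, -1 on failure. */
static int fake_alarm_set(struct pending_request *req)
{
	(void)req;
	return fake_alarm_should_fail ? -1 : 0;
}

/* Stub for sending the request to the peer; always succeeds here. */
static int fake_send_msg(struct pending_request *req)
{
	(void)req;
	messages_sent++;
	return 0;
}

/* Arm the timeout first; send only if that worked. */
static int request_async_sketch(int id)
{
	struct pending_request *req = malloc(sizeof(*req));

	if (req == NULL)
		return -1;
	req->id = id;

	if (fake_alarm_set(req) < 0) {
		fprintf(stderr, "Fail to set alarm for request %d\n", id);
		goto fail; /* nothing was sent, so nothing can be orphaned */
	}

	if (fake_send_msg(req) < 0)
		goto fail; /* real code would also have to cancel the alarm */

	/*
	 * In real code the reply or timeout handler would now own req.
	 * The sketch frees it here only to stay leak-free.
	 */
	free(req);
	return 0;
fail:
	free(req);
	return -1;
}
```

With the alarm stub forced to fail, request_async_sketch() returns -1 and messages_sent stays at 0, so no reply can ever arrive for a request that has no timeout backing it.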
  
Thomas Monjalon Sept. 18, 2018, 9:43 a.m. UTC | #3
26/07/2018 11:41, Burakov, Anatoly:
> On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
> > There is no need to call rte_exit and crash the application here;
> > better to let the application handle the error itself.
> > 
> > Remove the gratuitous profanity which would be visible if
> > the rte_exit was still there.
> > 
> > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > ---
> > --- a/lib/librte_eal/common/eal_common_proc.c
> > +++ b/lib/librte_eal/common/eal_common_proc.c
> > @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
> >   
> >   	param->user_reply.nb_sent++;
> >   
> > -	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
> > -			      async_reply_handle, pending_req) < 0) {
> > +	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
> > +				async_reply_handle, pending_req);
> > +	if (ret < 0)
> >   		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
> >   			dst, req->name);
> > -		rte_panic("Fix the above shit to properly free all memory\n");
> 
> Profanity aside, I think the message was trying to tell me something:
> namely, that if alarm_set fails, we risk leaking this memory if the
> reply from the peer never comes, and we risk leaving the
> application hanging because the timeout never triggers. I'm not sure
> leaving this "to the user" is the right choice, because there is no way
> for the user to free IPC-internal memory if it leaks.
> 
> So I think the proper way to handle this would be to set the alarm
> first and then, if that fails, not send the message in the first place.

What should be done here? OK to remove rte_panic for now?
  
Anatoly Burakov Sept. 18, 2018, 10:16 a.m. UTC | #4
On 18-Sep-18 10:43 AM, Thomas Monjalon wrote:
> 26/07/2018 11:41, Burakov, Anatoly:
>> On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
>>> There is no need to call rte_exit and crash the application here;
>>> better to let the application handle the error itself.
>>>
>>> Remove the gratuitous profanity which would be visible if
>>> the rte_exit was still there.
>>>
>>> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>>> ---
>>> --- a/lib/librte_eal/common/eal_common_proc.c
>>> +++ b/lib/librte_eal/common/eal_common_proc.c
>>> @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
>>>    
>>>    	param->user_reply.nb_sent++;
>>>    
>>> -	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>> -			      async_reply_handle, pending_req) < 0) {
>>> +	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>> +				async_reply_handle, pending_req);
>>> +	if (ret < 0)
>>>    		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
>>>    			dst, req->name);
>>> -		rte_panic("Fix the above shit to properly free all memory\n");
>>
>> Profanity aside, I think the message was trying to tell me something:
>> namely, that if alarm_set fails, we risk leaking this memory if the
>> reply from the peer never comes, and we risk leaving the
>> application hanging because the timeout never triggers. I'm not sure
>> leaving this "to the user" is the right choice, because there is no way
>> for the user to free IPC-internal memory if it leaks.
>>
>> So I think the proper way to handle this would be to set the alarm
>> first and then, if that fails, not send the message in the first place.
> 
> What should be done here? OK to remove rte_panic for now?
> 

As I said, the above fix is wrong because it leaks memory (however
unlikely that may be).

The alarm set call should be moved before the send_msg() call (with a
goto fail; on failure). That way, even if the alarm triggers too early (i.e.
immediately), the requests tailq will still be locked until we complete
our request sends, so we appropriately free memory on response, on
timeout, or in our failure handler if the alarm set has failed.
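
The locking argument above can be illustrated with a small pthread sketch. It is not the DPDK implementation: requests_lock, timeout_cb(), fake_alarm_set() and request_with_lock_sketch() are hypothetical names made up here, and the alarm stub deliberately models the worst case by firing at once on another thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the lock guarding the pending-request list. */
static pthread_mutex_t requests_lock = PTHREAD_MUTEX_INITIALIZER;

static int sends_done;                 /* guarded by requests_lock */
static int sends_seen_by_timeout = -1; /* what the timeout callback saw */

/* Timeout callback: takes the list lock before looking at any request. */
static void *timeout_cb(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&requests_lock);
	sends_seen_by_timeout = sends_done;
	pthread_mutex_unlock(&requests_lock);
	return NULL;
}

/* Worst-case alarm stub: the callback starts immediately on a new thread. */
static int fake_alarm_set(pthread_t *tid)
{
	return pthread_create(tid, NULL, timeout_cb, NULL) == 0 ? 0 : -1;
}

/* Arm the alarm and do all sends while holding the list lock. */
static int request_with_lock_sketch(int nb_peers)
{
	pthread_t alarm_thread;
	int i;

	pthread_mutex_lock(&requests_lock);

	if (fake_alarm_set(&alarm_thread) < 0) {
		pthread_mutex_unlock(&requests_lock);
		return -1; /* nothing sent yet, so cleanup is trivial */
	}

	for (i = 0; i < nb_peers; i++)
		sends_done++; /* stands in for one send per peer */

	pthread_mutex_unlock(&requests_lock);

	/* Wait for the early-firing callback so its result can be read. */
	pthread_join(alarm_thread, NULL);
	return sends_seen_by_timeout;
}
```

Because the callback blocks on requests_lock until the sender releases it, a single call to request_with_lock_sketch(3) returns 3: the timeout path never observes a half-sent request, however early the alarm fires.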
  
Thomas Monjalon Oct. 24, 2018, 11:51 p.m. UTC | #5
18/09/2018 12:16, Burakov, Anatoly:
> On 18-Sep-18 10:43 AM, Thomas Monjalon wrote:
> > 26/07/2018 11:41, Burakov, Anatoly:
> >> On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
> >>> There is no need to call rte_exit and crash the application here;
> >>> better to let the application handle the error itself.
> >>>
> >>> Remove the gratuitous profanity which would be visible if
> >>> the rte_exit was still there.
> >>>
> >>> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >>> ---
> >>> --- a/lib/librte_eal/common/eal_common_proc.c
> >>> +++ b/lib/librte_eal/common/eal_common_proc.c
> >>> @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
> >>>    
> >>>    	param->user_reply.nb_sent++;
> >>>    
> >>> -	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
> >>> -			      async_reply_handle, pending_req) < 0) {
> >>> +	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
> >>> +				async_reply_handle, pending_req);
> >>> +	if (ret < 0)
> >>>    		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
> >>>    			dst, req->name);
> >>> -		rte_panic("Fix the above shit to properly free all memory\n");
> >>
> >> Profanity aside, I think the message was trying to tell me something:
> >> namely, that if alarm_set fails, we risk leaking this memory if the
> >> reply from the peer never comes, and we risk leaving the
> >> application hanging because the timeout never triggers. I'm not sure
> >> leaving this "to the user" is the right choice, because there is no way
> >> for the user to free IPC-internal memory if it leaks.
> >>
> >> So I think the proper way to handle this would be to set the alarm
> >> first and then, if that fails, not send the message in the first place.
> > 
> > What should be done here? OK to remove rte_panic for now?
> > 
> 
> As I said, the above fix is wrong because it leaks memory (however
> unlikely that may be).
> 
> The alarm set call should be moved before the send_msg() call (with a
> goto fail; on failure). That way, even if the alarm triggers too early (i.e.
> immediately), the requests tailq will still be locked until we complete
> our request sends, so we appropriately free memory on response, on
> timeout, or in our failure handler if the alarm set has failed.

Someone to fix it, please?
  
Anatoly Burakov Oct. 25, 2018, 2:04 p.m. UTC | #6
On 25-Oct-18 12:51 AM, Thomas Monjalon wrote:
> 18/09/2018 12:16, Burakov, Anatoly:
>> On 18-Sep-18 10:43 AM, Thomas Monjalon wrote:
>>> 26/07/2018 11:41, Burakov, Anatoly:
>>>> On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
>>>>> There is no need to call rte_exit and crash the application here;
>>>>> better to let the application handle the error itself.
>>>>>
>>>>> Remove the gratuitous profanity which would be visible if
>>>>> the rte_exit was still there.
>>>>>
>>>>> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>>>>> ---
>>>>> --- a/lib/librte_eal/common/eal_common_proc.c
>>>>> +++ b/lib/librte_eal/common/eal_common_proc.c
>>>>> @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
>>>>>     
>>>>>     	param->user_reply.nb_sent++;
>>>>>     
>>>>> -	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>>>> -			      async_reply_handle, pending_req) < 0) {
>>>>> +	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>>>> +				async_reply_handle, pending_req);
>>>>> +	if (ret < 0)
>>>>>     		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
>>>>>     			dst, req->name);
>>>>> -		rte_panic("Fix the above shit to properly free all memory\n");
>>>>
>>>> Profanity aside, I think the message was trying to tell me something:
>>>> namely, that if alarm_set fails, we risk leaking this memory if the
>>>> reply from the peer never comes, and we risk leaving the
>>>> application hanging because the timeout never triggers. I'm not sure
>>>> leaving this "to the user" is the right choice, because there is no way
>>>> for the user to free IPC-internal memory if it leaks.
>>>>
>>>> So I think the proper way to handle this would be to set the alarm
>>>> first and then, if that fails, not send the message in the first place.
>>>
>>> What should be done here? OK to remove rte_panic for now?
>>>
>>
>> As I said, the above fix is wrong because it leaks memory (however
>> unlikely that may be).
>>
>> The alarm set call should be moved before the send_msg() call (with a
>> goto fail; on failure). That way, even if the alarm triggers too early (i.e.
>> immediately), the requests tailq will still be locked until we complete
>> our request sends, so we appropriately free memory on response, on
>> timeout, or in our failure handler if the alarm set has failed.
> 
> Someone to fix it, please?
> 

I'll do it.
  

Patch

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 9fcb9121908d..07b7579c565a 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -841,14 +841,12 @@  mp_request_async(const char *dst, struct rte_mp_msg *req,
 
 	param->user_reply.nb_sent++;
 
-	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-			      async_reply_handle, pending_req) < 0) {
+	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+				async_reply_handle, pending_req);
+	if (ret < 0)
 		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
 			dst, req->name);
-		rte_panic("Fix the above shit to properly free all memory\n");
-	}
-
-	return 0;
+	return ret;
 fail:
 	free(pending_req);
 	free(reply_msg);