[V4,3/9] bus: introduce sigbus handler

Message ID 1530268248-7328-4-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [V4,1/9] bus: introduce hotplug failure handler

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  fail     apply issues

Commit Message

Guo, Jia June 29, 2018, 10:30 a.m. UTC
When a device is hot-unplugged while the data path is still reading from
or writing to it, a sigbus error occurs. This error needs to be handled,
so a handler is needed to capture the signal and respond accordingly.

Handling a sigbus error is bus-specific behavior, so this patch introduces
a bus op that lets each kind of bus implement its own logic.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
  

Comments

Stephen Hemminger July 10, 2018, 9:55 p.m. UTC | #1
One issue with handling sigbus is that you are going to trap program errors
as well as hotplug. How can you distinguish between a removed device and a
buggy userspace program (or, worse, a compromised program)?
  
Guo, Jia July 11, 2018, 2:15 a.m. UTC | #2
On 7/11/2018 5:55 AM, Stephen Hemminger wrote:
> One issue with handling sigbus is that you are going to trap program errors
> as well as hotplug. How can you distinguish between a removed device and a
> buggy userspace program (or, worse, a compromised program)?
That is a problem I have considered in this mechanism, and it is addressed
in another patch of the series. The approach is to first check whether the
fault address falls within a device's MMIO resources. If it does, the new
sigbus handler for hotplug is invoked. If it does not, the fault is treated
as coming from a buggy user-space program and is passed to the generic
sigbus handler.
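
The address check described above could look roughly like the following
sketch. It is not taken from the series: struct mmio_region,
mmio_region_contains() and classify_sigbus() are hypothetical names used
only for illustration, and a real bus would walk its own device resource
list instead of a flat array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped MMIO region of a device. */
struct mmio_region {
	const void *base; /* start of the mapping */
	size_t len;       /* length of the mapping in bytes */
};

/* Possible origins of a sigbus fault, as discussed in the thread. */
enum sigbus_origin {
	SIGBUS_FROM_DEVICE,  /* fault address is inside device MMIO */
	SIGBUS_FROM_PROGRAM  /* fault address is unrelated to any device */
};

/* Return nonzero if addr lies inside [base, base + len). */
static int
mmio_region_contains(const struct mmio_region *r, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)r->base;

	return a >= start && a - start < r->len;
}

/* Decide whether a fault address belongs to any known MMIO region. */
static enum sigbus_origin
classify_sigbus(const struct mmio_region *regions, size_t n,
		const void *fault_addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (mmio_region_contains(&regions[i], fault_addr))
			return SIGBUS_FROM_DEVICE;

	return SIGBUS_FROM_PROGRAM;
}
```

A fault classified as SIGBUS_FROM_PROGRAM would then go to the generic
sigbus handler, so ordinary program errors are not hidden by the hotplug
path.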
  

Patch

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 3642aeb..231bd3d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -181,6 +181,20 @@  typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a bus-specific sigbus handler, which is responsible for
+ * handling a sigbus error that is either an ordinary memory error or a
+ * memory error caused by hot unplug.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus error was handled successfully.
+ *	1 if the sigbus error was not handled.
+ *	-1 if handling the sigbus error failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -226,6 +240,8 @@  struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_handler_t hotplug_handler;
 						/**< handle hot plug on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
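
To illustrate the return convention documented for rte_bus_sigbus_handler_t
(0 handled, 1 not handled, -1 failed), here is a minimal sketch of how a
caller might offer a fault address to each bus in turn. Only the typedef
matches the patch. struct demo_bus, demo_bus_sigbus_dispatch() and the three
demo handlers are hypothetical stand-ins, and the dispatch code elsewhere in
the series may well differ.

```c
#include <stddef.h>

/* Same signature as the typedef added by this patch. */
typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical, stripped-down stand-in for struct rte_bus. */
struct demo_bus {
	const char *name;
	rte_bus_sigbus_handler_t sigbus_handler; /* may be NULL */
};

/* Demo handler: the address does not belong to this bus. */
static int
demo_not_mine(const void *failure_addr)
{
	(void)failure_addr;
	return 1;
}

/* Demo handler: the address belongs to this bus and was handled. */
static int
demo_handles(const void *failure_addr)
{
	(void)failure_addr;
	return 0;
}

/* Demo handler: the address belongs to this bus but handling failed. */
static int
demo_fails(const void *failure_addr)
{
	(void)failure_addr;
	return -1;
}

/*
 * Offer the fault address to each bus in order.
 * Returns 0 as soon as a bus handles it, -1 as soon as a bus fails,
 * and 1 if no bus claims the address.
 */
static int
demo_bus_sigbus_dispatch(const struct demo_bus *buses, size_t n,
		const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue; /* this bus does not implement the op */

		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		/* ret == 1: not this bus, try the next one */
	}

	return 1;
}
```

In this sketch a result of 1 from the dispatcher means no bus recognized
the address, which is the case where a caller would fall back to the
generic sigbus handling.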