[dpdk-dev,v2] igb_uio: prevent reset for a list of devices
Commit Message
Some devices have problems with the device reset that happens during DPDK
application exit [1].
Create a static list of devices and exclude them from device reset.
[1]
http://dpdk.org/ml/archives/dev/2017-November/080927.html
Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Jianfeng Tan <jianfeng.tan@intel.com>
Cc: Jingjing Wu <jingjing.wu@intel.com>
Cc: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Cc: Gregory Etelson <gregory@weka.io>
Cc: Harish Patil <harish.patil@cavium.com>
Cc: George Prekas <george.prekas@epfl.ch>
Cc: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Cc: Rasesh Mody <rasesh.mody@cavium.com>
Cc: Lee Roberts <lee.roberts@hpe.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
This is an alternative approach to
http://dpdk.org/dev/patchwork/patch/31144/
v2:
* more concise function, no change in functionality
---
lib/librte_eal/linuxapp/igb_uio/compat.h | 19 ++++++++++++++++++-
lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 8 +++++++-
2 files changed, 25 insertions(+), 2 deletions(-)
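The exclusion-list idea in the commit message can be illustrated with a small userspace sketch. This is not the patch's code, which is not shown on this page. The names `struct dev_id`, `no_reset_ids`, `skip_reset()` and `should_reset()` are hypothetical. The single table entry is the 82599ES from the crash report later in this thread, used only as an example; it is not the patch's actual list. In kernel code, the same lookup would more naturally use a `struct pci_device_id` table with `pci_match_id()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor/device pair, modeled loosely on struct pci_device_id. */
struct dev_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Hypothetical exclusion list. The entry is an example only (the 82599ES
 * from the report in this thread), not the list used by the patch.
 * Terminated by an all-zero entry, as kernel ID tables are.
 */
static const struct dev_id no_reset_ids[] = {
	{ 0x8086, 0x10fb },
	{ 0, 0 },
};

/* Return true if vendor/device matches an entry in the table. */
static bool skip_reset(const struct dev_id *tbl, uint16_t vendor,
		       uint16_t device)
{
	assert(tbl != NULL);
	for (size_t i = 0; tbl[i].vendor != 0; i++) {
		if (tbl[i].vendor == vendor && tbl[i].device == device)
			return true;
	}
	return false;
}

/* Decision a release path could make: reset unless the device is listed. */
static bool should_reset(uint16_t vendor, uint16_t device)
{
	return !skip_reset(no_reset_ids, vendor, device);
}
```

With this table, `should_reset(0x8086, 0x10fb)` returns false and any other vendor/device pair returns true. Terminating the table with a zero entry lets the list grow without tracking a separate length.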
Comments
06/11/2017 19:48, Ferruh Yigit:
> Some devices are having problem on device reset that happens during DPDK
> application exit [1].
>
> Create a static list of devices and exclude them from device reset.
>
> [1]
> http://dpdk.org/ml/archives/dev/2017-November/080927.html
>
> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Applied, thanks
An option may be required to disable this exception,
since it may be a security hole.
We still have an issue with this and PCI pass-through. If a guest is
restarted while using PCI pass-through and igb_uio issues a
pci_reset_function(), this causes the host to crash.
On Mon, Nov 6, 2017 at 6:55 PM, Thomas Monjalon <thomas@monjalon.net> wrote:
> Applied, thanks
>
> An option may be required to disable this exception
> which may be a security hole.
07/11/2017 12:50, Chas Williams:
> We still have an issue with this and PCI pass-through. If a guest is
> restarted while using PCI pass-through and igb_uio issues a
> pci_reset_function(), this causes the host to crash.
Could you please explain the exact scenario and the cause of the crash in more detail?
Thanks
On Nov 7, 2017 20:50, "Chas Williams" <3chas3@gmail.com> wrote:
> We still have an issue with this and PCI pass-through. If a guest is
> restarted while using PCI pass-through and igb_uio issues a
> pci_reset_function(), this causes the host to crash.
Which host? Anything a guest can do to crash the host is a high-severity bug in
the host.
Environment: Dell PowerEdge R730, Intel Corporation 82599ES 10-Gigabit
SFI/SFP+ Network Connection shared via PCI pass-through
Host: Debian 8
Guest: Custom Debian 8 with DPDK application based on 17.11
When we shut down the guest, the host kernel panics with:
[ 279.021818] Do you have a strange power saving mode enabled?
[ 279.021819] Dazed and confused, but trying to continue
[ 279.021847] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[ 279.021849] {1}[Hardware Error]: event severity: fatal
[ 279.021850] {1}[Hardware Error]: Error 0, type: fatal
[ 279.021851] {1}[Hardware Error]: section_type: PCIe error
[ 279.021852] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 279.021853] {1}[Hardware Error]: version: 1.16
[ 279.021854] {1}[Hardware Error]: command: 0x0507, status: 0x4010
[ 279.021855] {1}[Hardware Error]: device_id: 0000:03:00.0
[ 279.021855] {1}[Hardware Error]: slot: 0
[ 279.021856] {1}[Hardware Error]: secondary_bus: 0x00
[ 279.021857] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x10fb
[ 279.021858] {1}[Hardware Error]: class_code: 000002
[ 279.021859] Kernel panic - not syncing: Fatal hardware error!
[ 279.021977] sched: Unexpected reschedule of offline CPU#1!
[ 279.021984] ------------[ cut here ]------------
[ 279.021992] WARNING: CPU: 43 PID: 2807 at /build/linux-fHlJSJ/linux-4.12.6/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x34/0x40
[ 279.021993] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c crc32c_generic nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache tun intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 ttm drm_kms_helper drm joydev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_algo_bit ipmi_si ipmi_devintf iTCO_wdt intel_cstate iTCO_vendor_support evdev intel_uncore mxm_wmi lpc_ich ipmi_msghandler mfd_core ioatdma intel_rapl_perf dcdbas pcspkr shpchp mei_me button wmi mei acpi_power_meter tpm_crb autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom sg hid_generic usbhid hid sd_mod
[ 279.022044] crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci ehci_pci libahci ehci_hcd ixgbe libata megaraid_sas usbcore dca i40e usb_common ptp pps_core scsi_mod mdio
[ 279.022060] CPU: 43 PID: 2807 Comm: revalidator85 Not tainted 4.12.0-1-amd64 #1 Debian 4.12.6-1
[ 279.022061] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.3.4 11/08/2016
[ 279.022062] task: ffff91d0473f7100 task.stack: ffffafef8f4a4000
[ 279.022066] RIP: 0010:native_smp_send_reschedule+0x34/0x40
[ 279.022067] RSP: 0018:ffffafef8f4a7c98 EFLAGS: 00010082
[ 279.022069] RAX: 000000000000002e RBX: ffff91d059d24080 RCX: 0000000000000001
[ 279.022070] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000046
[ 279.022071] RBP: ffff91d04691d100 R08: 0000000000000000 R09: 000000000000002e
[ 279.022072] R10: ffffafef8f4a7c90 R11: 00000000001cbb78 R12: ffff91d85d21ae80
[ 279.022073] R13: ffff91d059d24000 R14: 0000000000000002 R15: 0000000000000008
[ 279.022075] FS: 00007f726affd700(0000) GS:ffff91d85d740000(0000) knlGS:0000000000000000
[ 279.022076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 279.022077] CR2: 00007fd422a52c48 CR3: 000000042d90f000 CR4: 00000000003426e0
[ 279.022078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 279.022079] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 279.022080] Call Trace:
[ 279.022086] ? check_preempt_wakeup+0x181/0x220
[ 279.022091] ? check_preempt_curr+0x74/0x80
[ 279.022094] ? ttwu_do_wakeup+0x19/0x140
[ 279.022098] ? try_to_wake_up+0x1b8/0x470
[ 279.022101] ? wake_up_q+0x3f/0x70
[ 279.022106] ? futex_wake+0x15a/0x170
[ 279.022108] ? do_futex+0x2df/0xa90
[ 279.022111] ? SyS_futex+0x7a/0x170
[ 279.022113] ? SyS_read+0x76/0xc0
[ 279.022118] ? system_call_fast_compare_end+0xc/0x97
[ 279.022119] Code: a3 05 51 fb cc 00 73 15 48 8b 05 28 74 a3 00 be fd 00 00 00 48 8b 80 a0 00 00 00 ff e0 89 fe 48 c7 c7 88 5c de b6 e8 e2 c9 13 00 <0f> ff c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 5d 00
[ 279.022151] ---[ end trace eddc980dc8648163 ]---
[ 279.454274] Kernel Offset: 0x35400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
The test engineer says this doesn't happen if we use SRIOV (which makes
sense since the device isn't directly shared between the guest and the
host). If I remove the pci_reset_function() from igb_uio's .release, then
all is well.
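The `command: 0x0507, status: 0x4010` fields of the APEI record are the endpoint's PCI Command and Status registers, so they can be decoded with the standard configuration-space bit layout. The sketch below redefines the relevant constants locally, using the same names and values as `linux/pci_regs.h`, so it builds in userspace. `struct reg_bit`, `has_bits()` and `decode()` are hypothetical helpers, and only the bits needed for these two values are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same names and values as linux/pci_regs.h, redefined for userspace. */
#define PCI_COMMAND_IO              0x0001 /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY          0x0002 /* respond to memory accesses */
#define PCI_COMMAND_MASTER          0x0004 /* bus mastering: device may DMA */
#define PCI_COMMAND_SERR            0x0100 /* SERR# reporting enabled */
#define PCI_COMMAND_INTX_DISABLE    0x0400 /* legacy INTx disabled */
#define PCI_STATUS_CAP_LIST         0x0010 /* capability list present */
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000 /* device has signaled SERR# */

struct reg_bit {
	uint16_t mask;
	const char *name;
};

static const struct reg_bit command_bits[] = {
	{ PCI_COMMAND_IO, "IO" },
	{ PCI_COMMAND_MEMORY, "MEMORY" },
	{ PCI_COMMAND_MASTER, "MASTER" },
	{ PCI_COMMAND_SERR, "SERR" },
	{ PCI_COMMAND_INTX_DISABLE, "INTX_DISABLE" },
};

static const struct reg_bit status_bits[] = {
	{ PCI_STATUS_CAP_LIST, "CAP_LIST" },
	{ PCI_STATUS_SIG_SYSTEM_ERROR, "SIG_SYSTEM_ERROR" },
};

/* True if every bit in mask is set in reg. */
static bool has_bits(uint16_t reg, uint16_t mask)
{
	return (reg & mask) == mask;
}

/* Print the names of the listed bits that are set in reg. */
static void decode(const char *label, uint16_t reg,
		   const struct reg_bit *bits, size_t n)
{
	assert(bits != NULL || n == 0);
	printf("%s 0x%04x:", label, (unsigned int)reg);
	for (size_t i = 0; i < n; i++)
		if (has_bits(reg, bits[i].mask))
			printf(" %s", bits[i].name);
	putchar('\n');
}
```

Passing 0x0507 and 0x4010 through `decode()` shows the following: bus mastering, memory and I/O decoding, SERR# reporting and INTx-disable were all set, and the device had signaled a system error. That is consistent with the fatal PCIe error in the log. It does not, by itself, explain why a reset issued from inside the guest triggered the error.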
On Tue, Nov 7, 2017 at 8:02 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> Please, could you better explain the exact scenario and the cause of the
> crash?
> Thanks
Regardless of whether the issue is actually in the host kernel, I cannot fix
all the hypervisors, so I must attempt to be well-behaved as a guest.
On Tue, Nov 7, 2017 at 8:13 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> Which host. Anything guest can do to crash host is a high severity big in
> the host
>
On 11/7/2017 10:12 AM, Chas Williams wrote:
> Environment: Dell PowerEdge R730, Intel Corporation 82599ES 10-Gigabit SFI/SFP+
> Network Connection shared via PCI pass-through
> Host: Debian 8
> Guest: Custom Debian 8 with DPDK application based on 17.11
>
> The test engineer says this doesn't happen if we use SRIOV (which makes sense
> since the device isn't directly shared between the guest and the host). If I
> remove the pci_reset_function() from igb_uio's .release, then all is well.
This was tougher than expected, with so much unexpected behavior. Why would
resetting a pass-through device in the guest cause a crash in the host?
In the end, I will send a patch to remove the reset. Hopefully there will be no
more surprises before the release.
Two improvements will still remain in igb_uio for better security:
disabling device interrupts on exit and clearing bus mastering on exit.
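Both remaining steps map onto the PCI Command register. Clearing the Bus Master bit stops the device from initiating DMA, and setting Interrupt Disable stops it from asserting legacy INTx. In the kernel, the existing helpers for these are `pci_clear_master()` and `pci_intx(dev, 0)`. The userspace sketch below only shows the resulting register arithmetic, and `quiesce_command()` is a hypothetical name. MSI/MSI-X interrupts are not covered by the INTx-disable bit and would need to be disabled separately.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as linux/pci_regs.h, redefined for a userspace build. */
#define PCI_COMMAND_MASTER       0x0004 /* bus mastering (DMA) enable */
#define PCI_COMMAND_INTX_DISABLE 0x0400 /* legacy INTx disable */

static_assert((PCI_COMMAND_MASTER & PCI_COMMAND_INTX_DISABLE) == 0,
	      "the two bits must not overlap");

/*
 * Hypothetical helper: given the current Command register value, return the
 * value that stops DMA and legacy INTx. All other bits (I/O and memory
 * decoding, SERR#, ...) are left untouched, so the BARs stay accessible.
 */
static uint16_t quiesce_command(uint16_t cmd)
{
	cmd &= (uint16_t)~PCI_COMMAND_MASTER;
	cmd |= PCI_COMMAND_INTX_DISABLE;
	return cmd;
}
```

Applied to the 0x0507 value from the crash report, this yields 0x0503: the Bus Master bit is cleared, and INTx-disable was already set.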
I will confess I haven't looked into the issue too hard since I have a
workaround. My first guess is that something is going on with the IOMMU:
quiescing a PCI pass-through device/function from the guest seems iffy,
since I don't think the IOMMU is "visible" to the guest.
Most devices have some sort of reset to put the device into a known state
for setup/configuration (or to enable/disable the DMA engines). If this
is done at .dev_close(), shouldn't that be just as sufficient as resetting
the function?
On Tue, Nov 7, 2017 at 1:49 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> This was tougher than expected, so many unexpected behavior. Why resetting
> pass-through device in guest cause a crash in the host?
>
> Finally, I will send a patch to remove the reset. Hopefully no more
> surprises
> for release.
>
> Still there will remain two improvement in igb_uio for better security,
> disabling device interrupt on exit and clear master on exit.
On 11/7/2017 12:47 PM, Chas Williams wrote:
> I will confess I haven't looked into the issue too hard since I have a
> workaround. My first guess is that there is something going on with the IOMMU
> and quiescing a PCI pass-through device/function from the guest (since I don't
> think the IOMMU is "visible" to the guest) seems iffy.
>
> Most devices have some sort of reset to put the device into a known state for
> setup/configuration (or enable/disable for the DMA engines). If this is done at
> .dev_close(), shouldn't that be as sufficient as resetting the function?
This is for the cases where the DPDK application terminates unexpectedly; the
proper exit path already does the cleanup.
On Tue, Nov 7, 2017 at 5:26 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> This is for the cases DPDK app terminated unexpectedly, proper exit path
> already does cleanup.
>
Could igb_uio call a usermode helper that does an open/close on the device
about to be released?
>
> >
> > On Tue, Nov 7, 2017 at 1:49 PM, Ferruh Yigit <ferruh.yigit@intel.com
> > <mailto:ferruh.yigit@intel.com>> wrote:
> >
> > On 11/7/2017 10:12 AM, Chas Williams wrote:
> > > Environment: Dell PowerEdge R730, Intel Corporation 82599ES
> 10-Gigabit
> > SFI/SFP+
> > > Network Connection shared via PCI pass-through
> > > Host: Debian 8
> > > Guest: Custom Debian 8 with DPDK application based on 17.11
> > >
> > > When we shutdown the guest, the kernel panics with:
> > >
> > > [ 279.021818] Do you have a strange power saving mode enabled?
> > > [ 279.021819] Dazed and confused, but trying to continue
> > > [ 279.021847] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
> > > [ 279.021849] {1}[Hardware Error]: event severity: fatal
> > > [ 279.021850] {1}[Hardware Error]: Error 0, type: fatal
> > > [ 279.021851] {1}[Hardware Error]: section_type: PCIe error
> > > [ 279.021852] {1}[Hardware Error]: port_type: 0, PCIe end point
> > > [ 279.021853] {1}[Hardware Error]: version: 1.16
> > > [ 279.021854] {1}[Hardware Error]: command: 0x0507, status: 0x4010
> > > [ 279.021855] {1}[Hardware Error]: device_id: 0000:03:00.0
> > > [ 279.021855] {1}[Hardware Error]: slot: 0
> > > [ 279.021856] {1}[Hardware Error]: secondary_bus: 0x00
> > > [ 279.021857] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x10fb
> > > [ 279.021858] {1}[Hardware Error]: class_code: 000002
> > > [ 279.021859] Kernel panic - not syncing: Fatal hardware error!
> > > [ 279.021977] sched: Unexpected reschedule of offline CPU#1!
> > > [ 279.021984] ------------[ cut here ]------------
> > > [ 279.021992] WARNING: CPU: 43 PID: 2807 at
> > > /build/linux-fHlJSJ/linux-4.12.6/arch/x86/kernel/smp.c:128
> > > native_smp_send_reschedule+0x34/0x40
> > > [ 279.021993] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio
> > > openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
> > > nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c crc32c_generic nfsd
> > > nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc
> > > fscache tun intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
> > > kvm_intel kvm irqbypass mgag200 ttm drm_kms_helper drm joydev crct10dif_pclmul
> > > crc32_pclmul ghash_clmulni_intel i2c_algo_bit ipmi_si ipmi_devintf iTCO_wdt
> > > intel_cstate iTCO_vendor_support evdev intel_uncore mxm_wmi lpc_ich
> > > ipmi_msghandler mfd_core ioatdma intel_rapl_perf dcdbas pcspkr shpchp mei_me
> > > button wmi mei acpi_power_meter tpm_crb autofs4 ext4 crc16 jbd2 fscrypto
> > > mbcache sr_mod cdrom sg hid_generic usbhid hid sd_mod
> > > [ 279.022044] crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd
> > > glue_helper ahci ehci_pci libahci ehci_hcd ixgbe libata megaraid_sas usbcore
> > > dca i40e usb_common ptp pps_core scsi_mod mdio
> > > [ 279.022060] CPU: 43 PID: 2807 Comm: revalidator85 Not tainted 4.12.0-1-amd64 #1 Debian 4.12.6-1
> > > [ 279.022061] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.3.4 11/08/2016
> > > [ 279.022062] task: ffff91d0473f7100 task.stack: ffffafef8f4a4000
> > > [ 279.022066] RIP: 0010:native_smp_send_reschedule+0x34/0x40
> > > [ 279.022067] RSP: 0018:ffffafef8f4a7c98 EFLAGS: 00010082
> > > [ 279.022069] RAX: 000000000000002e RBX: ffff91d059d24080 RCX: 0000000000000001
> > > [ 279.022070] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000046
> > > [ 279.022071] RBP: ffff91d04691d100 R08: 0000000000000000 R09: 000000000000002e
> > > [ 279.022072] R10: ffffafef8f4a7c90 R11: 00000000001cbb78 R12: ffff91d85d21ae80
> > > [ 279.022073] R13: ffff91d059d24000 R14: 0000000000000002 R15: 0000000000000008
> > > [ 279.022075] FS: 00007f726affd700(0000) GS:ffff91d85d740000(0000) knlGS:0000000000000000
> > > [ 279.022076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 279.022077] CR2: 00007fd422a52c48 CR3: 000000042d90f000 CR4: 00000000003426e0
> > > [ 279.022078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 279.022079] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 279.022080] Call Trace:
> > > [ 279.022086] ? check_preempt_wakeup+0x181/0x220
> > > [ 279.022091] ? check_preempt_curr+0x74/0x80
> > > [ 279.022094] ? ttwu_do_wakeup+0x19/0x140
> > > [ 279.022098] ? try_to_wake_up+0x1b8/0x470
> > > [ 279.022101] ? wake_up_q+0x3f/0x70
> > > [ 279.022106] ? futex_wake+0x15a/0x170
> > > [ 279.022108] ? do_futex+0x2df/0xa90
> > > [ 279.022111] ? SyS_futex+0x7a/0x170
> > > [ 279.022113] ? SyS_read+0x76/0xc0
> > > [ 279.022118] ? system_call_fast_compare_end+0xc/0x97
> > > [ 279.022119] Code: a3 05 51 fb cc 00 73 15 48 8b 05 28 74 a3 00 be fd 00 00 00
> > > 48 8b 80 a0 00 00 00 ff e0 89 fe 48 c7 c7 88 5c de b6 e8 e2 c9 13 00 <0f> ff c3
> > > 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 5d 00
> > > [ 279.022151] ---[ end trace eddc980dc8648163 ]---
> > > [ 279.454274] Kernel Offset: 0x35400000 from 0xffffffff81000000 (relocation
> > > range: 0xffffffff80000000-0xffffffffbfffffff)
> > >
> > > The test engineer says this doesn't happen if we use SR-IOV (which makes
> > > sense since the device isn't directly shared between the guest and the
> > > host). If I remove the pci_reset_function() from igb_uio's .release, then
> > > all is well.
> >
> > This was tougher than expected, with so much unexpected behavior. Why does
> > resetting a pass-through device in the guest cause a crash in the host?
> >
> > Finally, I will send a patch to remove the reset. Hopefully there will be no
> > more surprises before the release.
> >
> > Two improvements will still remain in igb_uio for better security: disabling
> > the device interrupt on exit and clearing bus mastering on exit.
> >
> > > On Tue, Nov 7, 2017 at 8:02 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 07/11/2017 12:50, Chas Williams:
> > > > We still have an issue with this and PCI pass-through. If a guest is
> > > > restarted while using PCI pass-through and igb_uio issues a
> > > > pci_reset_function(), this causes the host to crash.
> > >
> > > Could you please explain the exact scenario and the cause of the crash
> > > in more detail?
> > > Thanks
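As a reading aid for the APEI record above, which gives the failing function's PCI command and status words only as hex (command: 0x0507, status: 0x4010), here is a minimal userspace sketch that expands such words into lspci-style +/- flags. It assumes the standard PCI Command (offset 0x04) and Status (offset 0x06) register bit layout; the struct, table and function names are hypothetical, and the labels only approximate the ones lspci prints.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One named bit of a 16-bit PCI configuration register. */
struct reg_bit {
	unsigned bit;
	const char *name;
};

/* PCI Command register (config offset 0x04), labels modelled on lspci. */
static const struct reg_bit command_bits[] = {
	{ 0, "I/O" },       { 1, "Mem" },      { 2, "BusMaster" },
	{ 3, "SpecCycle" }, { 4, "MemWINV" },  { 5, "VGASnoop" },
	{ 6, "ParErr" },    { 7, "Stepping" }, { 8, "SERR" },
	{ 9, "FastB2B" },   { 10, "DisINTx" },
};

/* PCI Status register (config offset 0x06); DEVSEL timing bits 9-10 skipped. */
static const struct reg_bit status_bits[] = {
	{ 3, "INTx" },            { 4, "Cap" },              { 5, "66MHz" },
	{ 7, "FastB2B" },         { 8, "MasterDataParErr" }, { 11, "SignaledTAbort" },
	{ 12, "ReceivedTAbort" }, { 13, "ReceivedMAbort" },  { 14, "SignaledSERR" },
	{ 15, "DetectedParErr" },
};

#define NBITS(tbl) (sizeof(tbl) / sizeof((tbl)[0]))

/* Write "Name+" (bit set) or "Name-" (bit clear) for each entry into buf,
 * separated by spaces; output is truncated if buf is too small. */
static void decode_reg(uint16_t reg, const struct reg_bit *bits, size_t n,
		       char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < n && used < len; i++) {
		int w = snprintf(buf + used, len - used, "%s%s%c",
				 i ? " " : "", bits[i].name,
				 ((reg >> bits[i].bit) & 1u) ? '+' : '-');
		if (w < 0)
			break;
		used += (size_t)w;
	}
}
```

With these tables, 0x0507 decodes to "I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+", and 0x4010 decodes to Cap+ and SignaledSERR+ with every other listed bit clear. The sketch does not explain why the error was raised; it only makes the raw words easier to compare with lspci output.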
Hello,
There are some AWS R3.8XLARGE instances
that fail to bind Intel 10G VFs with igb_uio [c05cb4f939082].
The system dmesg log shows this backtrace:
igb_uio: probe of 0000:00:05.0 failed with error -16
IRQ handler type mismatch for IRQ 0
current handler: timer
Pid: 3619, comm: bash Not tainted 2.6.32-642.15.1.el6.x86_64 #1
Call Trace:
[<ffffffff810f49e2>] ? __setup_irq+0x382/0x3c0
[<ffffffffa03202a0>] ? uio_interrupt+0x0/0x48 [uio]
[<ffffffff810f51e3>] ? request_threaded_irq+0x133/0x230
[<ffffffffa0320193>] ? __uio_register_device+0x553/0x610 [uio]
[<ffffffffa032698f>] ? igbuio_pci_probe+0x290/0x47a [igb_uio]
[<ffffffff8129d00a>] ? kobject_get+0x1a/0x30
[<ffffffff812c04f7>] ? local_pci_probe+0x17/0x20
[<ffffffff812c16e1>] ? pci_device_probe+0x101/0x120
[<ffffffff81382152>] ? driver_sysfs_add+0x62/0x90
[<ffffffff813823fa>] ? driver_probe_device+0xaa/0x3a0
[<ffffffff8138153a>] ? driver_bind+0xca/0x110
[<ffffffff813805dc>] ? drv_attr_store+0x2c/0x30
[<ffffffff812171c5>] ? sysfs_write_file+0xe5/0x170
[<ffffffff81199e48>] ? vfs_write+0xb8/0x1a0
[<ffffffff8119b336>] ? fget_light_pos+0x16/0x50
[<ffffffff8119a981>] ? sys_write+0x51/0xb0
The VFs can be returned to the kernel ixgbevf driver without faults.
These instances can bind VFs with igb_uio at [b58eedfc7dd57].
I have not yet found why some R3.8XLARGE instances can bind IXGBE VFs with
igb_uio while others fail.
lspci -vvv -s 0000:00:05.0
00:05.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
        Physical Slot: 5
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Region 0: Memory at f3010000 (64-bit, prefetchable) [size=16K]
        Region 3: Memory at f3014000 (64-bit, prefetchable) [size=16K]
        Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Kernel driver in use: ixgbevf
        Kernel modules: ixgbevf
Regards,
Gregory
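In the dmesg output above, uio's attempt to install its handler on IRQ 0 collides with the existing timer handler, and the probe fails with error -16, which is -EBUSY on Linux. The following is a hypothetical, heavily simplified model of the sharing rule behind that message; it assumes only that both registrations must request sharing (IRQF_SHARED) for a line to be shared, and it leaves out the kernel's trigger-type checks and handler chaining.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interrupt line: who owns it, and whether
 * the current owner registered it as shareable (IRQF_SHARED). */
struct irq_line {
	const char *owner; /* NULL when no handler is installed */
	bool shared;
};

/* Simplified version of the sharing rule in the kernel's __setup_irq():
 * a second handler may join a line only if both the existing and the new
 * registration ask for sharing. Trigger types and the handler chain are
 * not modelled. Returns 0 on success, -EBUSY on a mismatch. */
static int model_request_irq(struct irq_line *line, const char *name,
			     bool shared)
{
	if (line->owner == NULL) {
		line->owner = name;
		line->shared = shared;
		return 0;
	}
	if (!(line->shared && shared))
		return -EBUSY; /* 2.6.32 logs "IRQ handler type mismatch for IRQ n" */
	return 0;
}
```

Under this model, a request for a line already held by a non-shared "timer" handler returns -EBUSY regardless of the flags the new caller passes, which matches the "probe ... failed with error -16" line.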
On 11/8/2017 4:00 AM, Chas Williams wrote:
>
> On Tue, Nov 7, 2017 at 5:26 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> On 11/7/2017 12:47 PM, Chas Williams wrote:
> > I will confess I haven't looked into the issue too hard since I have a
> > workaround. My first guess is that there is something going on with the IOMMU
> > and quiescing a PCI pass-through device/function from the guest (since I don't
> > think the IOMMU is "visible" to the guest) seems iffy.
> >
> > Most devices have some sort of reset to put the device into a known state for
> > setup/configuration (or enable/disable for the DMA engines). If this is done at
> > .dev_close(), shouldn't that be as sufficient as resetting the function?
>
> This is for cases where the DPDK app terminated unexpectedly; the proper exit
> path already does the cleanup.
>
>
> Call a usermode helper from igb_uio that does an open/close on the device about
> to be released?
Can generic userspace code know how to clean up various devices?
I guess a driver is required for this work, and the DPDK application that has
the drivers has already exited at that stage.
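Instead of a function reset, the release path could quiesce the device by clearing bus mastering and disabling its interrupt, the two improvements Ferruh mentions earlier in the thread. A minimal sketch of the effect on the PCI Command register, assuming legacy INTx; the helper name and macros are hypothetical, and MSI/MSI-X handling is not shown:

```c
#include <stdint.h>

/* PCI Command register bits (config offset 0x04). */
#define PCI_CMD_BUS_MASTER   0x0004u /* bit 2: device may initiate DMA */
#define PCI_CMD_INTX_DISABLE 0x0400u /* bit 10: legacy INTx masked */

/* Hypothetical helper: the Command value a release path would end up with
 * after stopping DMA and masking legacy INTx, without a function reset.
 * In kernel terms this corresponds roughly to pci_clear_master() plus
 * pci_intx(dev, 0); MSI/MSI-X vectors would have to be disabled separately. */
static uint16_t quiesced_command(uint16_t cmd)
{
	cmd &= (uint16_t)~PCI_CMD_BUS_MASTER;
	cmd |= PCI_CMD_INTX_DISABLE;
	return cmd;
}
```

For example, the command word 0x0507 from the APEI record becomes 0x0503: BusMaster is cleared, and DisINTx was already set. This only stops new DMA and legacy interrupts; unlike pci_reset_function(), it leaves any device-internal state untouched.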
On 11/9/2017 9:20 AM, Gregory Etelson wrote:
> Hello,
>
> There are some AWS R3.8XLARGE instances
> that fail to bind Intel 10G VFs with igb_uio [c05cb4f939082].
Hi Gregory,
Will you dig into this issue further? Please keep us updated.
On 11/9/2017 5:42 PM, Ferruh Yigit wrote:
> On 11/9/2017 9:20 AM, Gregory Etelson wrote:
>> Hello,
>>
>> There are some AWS R3.8XLARGE instances
>> that fail to bind Intel 10G VFs with igb_uio [c05cb4f939082].
>
> Hi Gregory,
>
> Will you dig into this issue further? Please keep us updated.
>
>> The system dmesg log shows this backtrace:
>>
>> igb_uio: probe of 0000:00:05.0 failed with error -16
>> IRQ handler type mismatch for IRQ 0
>> current handler: timer
>> Pid: 3619, comm: bash Not tainted 2.6.32-642.15.1.el6.x86_64 #1
>> Call Trace:
>> [<ffffffff810f49e2>] ? __setup_irq+0x382/0x3c0
>> [<ffffffffa03202a0>] ? uio_interrupt+0x0/0x48 [uio]
>> [<ffffffff810f51e3>] ? request_threaded_irq+0x133/0x230
>> [<ffffffffa0320193>] ? __uio_register_device+0x553/0x610 [uio]
>> [<ffffffffa032698f>] ? igbuio_pci_probe+0x290/0x47a [igb_uio]
Here igb_uio probe() calls request_irq(); this behavior changed in the latest
code. Can you please double-check that you are using the latest code?
Thanks,
ferruh
It looks like the igb_uio bind failed on servers running CentOS-6.x.
Servers with CentOS-7.3, Ubuntu-14, Ubuntu-16 and AWS-1703 (Amazon Linux)
had no bind issues.
Regards,
Gregory
On Fri, Nov 10, 2017 at 3:42 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 11/9/2017 9:20 AM, Gregory Etelson wrote:
> > Hello,
> >
> > There are some AWS R3.8XLARGE instances
> > that fail to bind Intel 10G VFs with igb_uio [c05cb4f939082].
>
> Hi Gregory,
>
> Will you dig into this issue further? Please keep us updated.
On Wed, 8 Nov 2017 07:00:23 -0500
Chas Williams <3chas3@gmail.com> wrote:
> On Tue, Nov 7, 2017 at 5:26 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 11/7/2017 12:47 PM, Chas Williams wrote:
> > I will confess I haven't looked into the issue too hard since I have a
> > workaround. My first guess is that there is something going on with the IOMMU
> > and quiescing a PCI pass-through device/function from the guest (since I don't
> > think the IOMMU is "visible" to the guest) seems iffy.
> >
> > Most devices have some sort of reset to put the device into a known state for
> > setup/configuration (or enable/disable for the DMA engines). If this is done at
> > .dev_close(), shouldn't that be as sufficient as resetting the function?
>
> This is for cases where the DPDK app terminated unexpectedly; the proper exit path already
> does the cleanup.
>
> Call a usermode helper from igb_uio that does an open/close on the device about to be released?
Usermode helpers are hated by upstream kernel developers. They raise many
problems, such as which namespace they run in, and security.
On 11/9/2017 10:36 PM, Gregory Etelson wrote:
> It looks like the igb_uio bind failed on servers running CentOS-6.x.
Hi Gregory,
The backtrace you sent seems to come from old code; can you please confirm that
you are using the latest igb_uio?
And what is the kernel version on those boxes?
Thanks,
ferruh
> Servers with CentOS-7.3, Ubuntu-14, Ubuntu-16 and AWS-1703 (Amazon Linux)
> had no bind issues.
Hello Ferruh,
I re-checked igb_uio from the DPDK master branch,
commit 0384f21dffc9081d1ae30f0a6e49926bfc4be85d.
OS: CentOS release 6.7 (Final), kernel 2.6.32-573.26.1.el6.x86_64
igb_uio: Use MSIX interrupt by default
ixgbevf: eth1: ixgbevf_remove: Remove complete
IRQ handler type mismatch for IRQ 0
current handler: timer
Pid: 7995, comm: python Not tainted 2.6.32-573.26.1.el6.x86_64 #1
Call Trace:
[<ffffffff810eee02>] ? __setup_irq+0x382/0x3c0
[<ffffffffa00212a0>] ? uio_interrupt+0x0/0x48 [uio]
[<ffffffff810ef603>] ? request_threaded_irq+0x133/0x230
[<ffffffffa0021193>] ? __uio_register_device+0x553/0x610 [uio]
[<ffffffffa003797f>] ? igbuio_pci_probe+0x290/0x47a [igb_uio]
[<ffffffff81292c7a>] ? kobject_get+0x1a/0x30
[<ffffffff812b5e37>] ? local_pci_probe+0x17/0x20
[<ffffffff812b7021>] ? pci_device_probe+0x101/0x120
[<ffffffff813748e2>] ? driver_sysfs_add+0x62/0x90
[<ffffffff81374b8a>] ? driver_probe_device+0xaa/0x3a0
[<ffffffff81374f2b>] ? __driver_attach+0xab/0xb0
[<ffffffff81374e80>] ? __driver_attach+0x0/0xb0
[<ffffffff81373d74>] ? bus_for_each_dev+0x64/0x90
[<ffffffff8137481e>] ? driver_attach+0x1e/0x20
[<ffffffff812b73c7>] ? pci_add_dynid+0xc7/0xf0
[<ffffffff812b74c2>] ? store_new_id+0xd2/0x110
[<ffffffff81372d6c>] ? drv_attr_store+0x2c/0x30
[<ffffffff8120eb15>] ? sysfs_write_file+0xe5/0x170
[<ffffffff81192208>] ? vfs_write+0xb8/0x1a0
[<ffffffff811936f6>] ? fget_light_pos+0x16/0x50
[<ffffffff81192d41>] ? sys_write+0x51/0xb0
[<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
igb_uio: probe of 0000:00:04.0 failed with error -16
IRQ handler type mismatch for IRQ 0
Regards,
Gregory
On Wed, Nov 15, 2017 at 5:44 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 11/9/2017 10:36 PM, Gregory Etelson wrote:
> > It looks like the igb_uio bind failed on servers running CentOS-6.x.
>
> Hi Gregory,
>
> The backtrace you sent seems to come from old code; can you please confirm
> that you are using the latest igb_uio?
>
> And what is the kernel version on those boxes?
>
> Thanks,
> ferruh
@@ -133,4 +133,21 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
#define HAVE_PCI_MSI_MASK_IRQ 1
#endif
-
+#define BROADCOM_PCI_VENDOR_ID 0x14E4
+static const struct pci_device_id no_reset_pci_tbl[] = {
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x168a) }, /* 57800 */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x164f) }, /* 57711 */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x168e) }, /* 57810 */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x163d) }, /* 57811 */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x168d) }, /* 57840_OBS */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16a1) }, /* 57840_4_10 */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16a2) }, /* 57840_2_20 */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16ae) }, /* 57810_MF */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x163e) }, /* 57811_MF */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16a4) }, /* 57840_MF */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16a9) }, /* 57800_VF */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16af) }, /* 57810_VF */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x163f) }, /* 57811_VF */
+ { PCI_DEVICE(BROADCOM_PCI_VENDOR_ID, 0x16ad) }, /* 57840_VF */
+ { 0 },
+};
@@ -348,6 +348,11 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
return 0;
}
+static bool is_device_excluded_from_reset(struct pci_dev *pdev)
+{
+ return !!pci_match_id(no_reset_pci_tbl, pdev);
+}
+
static int
igbuio_pci_release(struct uio_info *info, struct inode *inode)
{
@@ -360,7 +365,8 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
/* stop the device from further DMA */
pci_clear_master(dev);
- pci_reset_function(dev);
+ if (!is_device_excluded_from_reset(dev))
+ pci_reset_function(dev);
return 0;
}
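The patch relies on the kernel's ID-table convention: no_reset_pci_tbl is an array of PCI_DEVICE() entries ending in an all-zero sentinel, and pci_match_id() walks it until that sentinel. The following userspace model mirrors the lookup behind is_device_excluded_from_reset(); the types, the ANY_ID constant and the function names are hypothetical stand-ins rather than kernel API, class matching is omitted, and the table is cut down to two of the patch's entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu /* stand-in for the kernel's PCI_ANY_ID */

/* Simplified stand-in for struct pci_device_id (class fields omitted). */
struct dev_id {
	uint32_t vendor, device, subvendor, subdevice;
};

/* Like PCI_DEVICE(): exact vendor/device, any subsystem IDs. */
#define DEV(v, d) { (v), (d), ANY_ID, ANY_ID }

/* IDs as read from a device's configuration space. */
struct dev_info {
	uint16_t vendor, device, subvendor, subdevice;
};

/* Two entries taken from no_reset_pci_tbl in the patch, plus the sentinel. */
static const struct dev_id no_reset_tbl[] = {
	DEV(0x14E4, 0x168a), /* 57800 */
	DEV(0x14E4, 0x16a9), /* 57800_VF */
	{ 0, 0, 0, 0 },      /* all-zero entry terminates the table */
};

static bool field_matches(uint32_t want, uint16_t have)
{
	return want == ANY_ID || want == have;
}

/* Walk the table up to the sentinel and return the first matching entry. */
static const struct dev_id *match_id(const struct dev_id *ids,
				     const struct dev_info *dev)
{
	for (; ids->vendor || ids->subvendor; ids++) {
		if (field_matches(ids->vendor, dev->vendor) &&
		    field_matches(ids->device, dev->device) &&
		    field_matches(ids->subvendor, dev->subvendor) &&
		    field_matches(ids->subdevice, dev->subdevice))
			return ids;
	}
	return NULL;
}

static bool excluded_from_reset(const struct dev_info *dev)
{
	return match_id(no_reset_tbl, dev) != NULL;
}
```

Every entry in the patch's table uses the Broadcom vendor ID 0x14E4, so an Intel 82599 (0x8086:0x10fb), the device in the pass-through report above, is not matched and would still be reset on release.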