[v2,10/10] kni: add API to set link status on kernel interface

Message ID 20180629015508.26599-11-dg@adax.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v2,01/10] kni: remove unused variables from struct kni_dev |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Dan Gora June 29, 2018, 1:55 a.m. UTC
  Add a new API function to KNI, rte_kni_update_link() to allow DPDK
applications to update the link state for the KNI network interfaces
in the linux kernel.

Note that the default carrier state is set to off when the interface
is opened.

Signed-off-by: Dan Gora <dg@adax.com>
---
 kernel/linux/kni/kni_misc.c                   | 63 +++++++++++++++++++
 kernel/linux/kni/kni_net.c                    |  2 +
 .../eal/include/exec-env/rte_kni_common.h     | 19 ++++++
 lib/librte_kni/rte_kni.c                      | 37 ++++++++++-
 lib/librte_kni/rte_kni.h                      | 16 +++++
 5 files changed, 136 insertions(+), 1 deletion(-)
  

Comments

Ferruh Yigit Aug. 29, 2018, 11:48 a.m. UTC | #1
On 6/29/2018 2:55 AM, Dan Gora wrote:
> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
> applications to update the link state for the KNI network interfaces
> in the linux kernel.
> 
> Note that the default carrier state is set to off when the interface
> is opened.

Why set carrier off when interface opened? Although I don't see any difference
in interface state with or without this call.

> 
> Signed-off-by: Dan Gora <dg@adax.com>

Overall looks good to me.
  
Stephen Hemminger Aug. 29, 2018, 3:54 p.m. UTC | #2
On Thu, 28 Jun 2018 18:55:08 -0700
Dan Gora <dg@adax.com> wrote:

> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
> applications to update the link state for the KNI network interfaces
> in the linux kernel.
> 
> Note that the default carrier state is set to off when the interface
> is opened.
> 
> Signed-off-by: Dan Gora <dg@adax.com>

Do you really need a special ioctl for this?
There is already ability to set link state via sysfs or netlink.
  
Dan Gora Aug. 29, 2018, 9:02 p.m. UTC | #3
On Wed, Aug 29, 2018 at 12:54 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 28 Jun 2018 18:55:08 -0700
> Dan Gora <dg@adax.com> wrote:
>
>> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
>> applications to update the link state for the KNI network interfaces
>> in the linux kernel.
>>
>> Note that the default carrier state is set to off when the interface
>> is opened.
>>
>> Signed-off-by: Dan Gora <dg@adax.com>
>
> Do you really need a special ioctl for this?
> There is already ability to set link state via sysfs or netlink.

I think yes.. AFAIK sysfs does not constitute a stable API; it's only
available for Linux (yes, I know KNI is linux-only currently, but
there's not really any technical reason why it can't work on BSD) and
there are already callbacks to change the MTU and MAC addresses which
could also be done via netlink.  IMHO having the kernel have an
accurate view of the link state is more important than the ability to
change the MAC address of the interface...

In our application we want the linux kernel/"normal" userspace to be
able to use the DPDK controlled interfaces like any other interface.
We need to be able to assign IP addresses to them, have them
participate in routing, etc.  Since they are controlled via our DPDK
application, there is no way for the kernel to know when the cable is
connected/removed since that information is only communicated to the
DPDK application.

The other option, which I toyed with but decided against, would be to
have a polling thread in the KNI module to call a callback into the
DPDK application to poll the link status.  However that would still
possibly leave a time period when the link is down, but the kernel
does not know about it.  I decided that it would probably be best to
just have a way for the DPDK application to inform the linux kernel
(via the KNI module) that the link was down.

It's important for the linux kernel to know about the link status if
the interface is going to be treated like any other.  Things like
assigning IP addresses and adding the interfaces to the routing table
happen automatically when the link is marked "up".  If the link is not
marked "up", or is "up" when it should be "down", then the kernel
cannot configure that interface correctly, or will use it when it
should not be.

thanks
dan
  
Dan Gora Aug. 29, 2018, 9:10 p.m. UTC | #4
On Wed, Aug 29, 2018 at 8:48 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 6/29/2018 2:55 AM, Dan Gora wrote:
>> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
>> applications to update the link state for the KNI network interfaces
>> in the linux kernel.
>>
>> Note that the default carrier state is set to off when the interface
>> is opened.
>
> Why set carrier off when interface opened?

A couple of reasons:

1) That's the way every other Ethernet driver in the linux kernel does
it that I've seen.

2) The DPDK application may not actually be ready for the interface to
be used when it is first created.  Things like NetworkManager, etc
will gladly go trying to assign IP addresses to those interfaces, add
them to the routing table, etc as soon as the interface is marked
"up".  By making the default be "down", this allows the application to
finish any initialization on the DPDK side of the interface before
allowing it to be used by the kernel.

> Although I don't see any difference
> in interface state with or without this call.

Previously in the 'ip addr' output, the 'state' would be 'UNKNOWN'
when the interface was created.  After this patch the 'state' in 'ip
addr' is 'DOWN'.

thanks
dan
  
Stephen Hemminger Aug. 29, 2018, 10 p.m. UTC | #5
On Wed, 29 Aug 2018 18:02:06 -0300
Dan Gora <dg@adax.com> wrote:

> On Wed, Aug 29, 2018 at 12:54 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Thu, 28 Jun 2018 18:55:08 -0700
> > Dan Gora <dg@adax.com> wrote:
> >  
> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
> >> applications to update the link state for the KNI network interfaces
> >> in the linux kernel.
> >>
> >> Note that the default carrier state is set to off when the interface
> >> is opened.
> >>
> >> Signed-off-by: Dan Gora <dg@adax.com>  
> >
> > Do you really need a special ioctl for this?
> > There is already ability to set link state via sysfs or netlink.  
> 
> I think yes.. AFAIK sysfs does not constitute a stable API; 

It is a stable API on Linux.

> it's only
> available for Linux (yes, I know KNI is linux-only currently, but
> there's not really any technical reason why it can't work on BSD) and
> there are already callbacks to change the MTU and MAC addresses which
> could also be done via netlink.  IMHO having the kernel have an
> accurate view of the link state is more important than the ability to
> change the MAC address of the interface...

The device model on BSD is significantly different than Linux.
Doing KNI on BSD is going to be a full rewrite of the driver anyway;
I won't worry about sysfs, dependency.

The important part is that if KNI is ever going to be supportable
it needs to be upstream in Linux, not a bolt on out of tree driver.
Most Enterprise distributions will not support out of tree drivers
for good reasons.
  
Stephen Hemminger Aug. 29, 2018, 10:01 p.m. UTC | #6
On Wed, 29 Aug 2018 18:10:35 -0300
Dan Gora <dg@adax.com> wrote:

> On Wed, Aug 29, 2018 at 8:48 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > On 6/29/2018 2:55 AM, Dan Gora wrote:  
> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
> >> applications to update the link state for the KNI network interfaces
> >> in the linux kernel.
> >>
> >> Note that the default carrier state is set to off when the interface
> >> is opened.  
> >
> > Why set carrier off when interface opened?  
> 
> A couple of reasons:
> 
> 1) That's the way every other Ethernet driver in the linux kernel does
> it that I've seen.
> 
> 2) The DPDK application may not actually be ready for the interface to
> be used when it is first created.  Things like NetworkManager, etc
> will gladly go trying to assign IP addresses to those interfaces, add
> them to the routing table, etc as soon as the interface is marked
> "up".  By making the default be "down", this allows the application to
> finish any initialization on the DPDK side of the interface before
> allowing it to be used by the kernel.
> 
> > Although I don't see any difference
> > in interface state with or without this call.  
> 
> Previously in the 'ip addr' output, the 'state' would be 'UNKNOWN'
> when the interface was created.  After this patch the 'state' in 'ip
> addr' is 'DOWN'.
> 
> thanks
> dan

There is also a better (richer) API for link status on linux via
the operstate functions. Those might better match the semantics of
a tunnellish interface like KNI.
  
Dan Gora Aug. 29, 2018, 10:12 p.m. UTC | #7
On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
>> >> applications to update the link state for the KNI network interfaces
>> >> in the linux kernel.
>> >>
>> >> Note that the default carrier state is set to off when the interface
>> >> is opened.
>> >>
>> >> Signed-off-by: Dan Gora <dg@adax.com>
>> >
>> > Do you really need a special ioctl for this?
>> > There is already ability to set link state via sysfs or netlink.
>>
>> I think yes.. AFAIK sysfs does not constitute a stable API;
>
> It is a stable API on Linux.

Ok, I didn't know this...

Still it seems better to me to be able to call
rte_kni_update_link(kni, link); than 'open ("/sys/whatever/where ever
it may be this kernel version/link/"); write(fd, "1"); close(fd); or
whatever...

But I guess if it is actually a stable API, we can hide all of that in
'rte_kni_update_link() and just do away with the ioctl!

I'm actually kind of shocked that I'm the only one who has run into
this.. I would have thought that having an accurate link status would
have been important for whoever used KNI.

>
>> it's only
>> available for Linux (yes, I know KNI is linux-only currently, but
>> there's not really any technical reason why it can't work on BSD) and
>> there are already callbacks to change the MTU and MAC addresses which
>> could also be done via netlink.  IMHO having the kernel have an
>> accurate view of the link state is more important than the ability to
>> change the MAC address of the interface...
>
> The device model on BSD is significantly different than Linux.
> Doing KNI on BSD is going to be a full rewrite of the driver anyway;
> I won't worry about sysfs, dependency.
>
> The important part is that if KNI is ever going to be supportable
> it needs to be upstream in Linux, not a bolt on out of tree driver.
> Most Enterprise distributions will not support out of tree drivers
> for good reasons.

Agreed there.. I was really torn between using KNI or the TAP
interface.  KNI seems cleaner, and at least at the time that I started
working on this, seemed like the way to interface to the kernel moving
forward.  The TAP interface stuff didn't seem like it was necessarily
going to be supported moving forward and the KNI was supposed to be
the "high performance" method to interface to the kernel.  But having
to build and install the rte_kni module on every system that we
install our software on is a major pain.

d
  
Dan Gora Aug. 29, 2018, 10:41 p.m. UTC | #8
On Wed, Aug 29, 2018 at 7:12 PM, Dan Gora <dg@adax.com> wrote:
> On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>>> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
>>> >> applications to update the link state for the KNI network interfaces
>>> >> in the linux kernel.
>>> >>
>>> >> Note that the default carrier state is set to off when the interface
>>> >> is opened.
>>> >>
>>> >> Signed-off-by: Dan Gora <dg@adax.com>
>>> >
>>> > Do you really need a special ioctl for this?
>>> > There is already ability to set link state via sysfs or netlink.
>>>
>>> I think yes.. AFAIK sysfs does not constitute a stable API;
>>
>> It is a stable API on Linux.
>

Actually this does not seem to be completely true...

From Documentation/admin-guide/sysfs-rules.rst:

Rules on how to access information in sysfs
===========================================

The kernel-exported sysfs exports internal kernel implementation details
and depends on internal kernel structures and layout. It is agreed upon
by the kernel developers that the Linux kernel does not provide a stable
internal API. Therefore, there are aspects of the sysfs interface that
may not be stable across kernel releases.

<snip>

- devices are only "devices"
    There is no such thing like class-, bus-, physical devices,
    interfaces, and such that you can rely on in userspace. Everything is
    just simply a "device". Class-, bus-, physical, ... types are just
    kernel implementation details which should not be expected by
    applications that look for devices in sysfs.

    The properties of a device are:

    - devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
<snip>

    - kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
<snip>

    - subsystem (``block``, ``tty``, ``pci``, ...)
<snip>

    - driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
<snip>

    - attributes
<snip>

    Everything else is just a kernel driver-core implementation detail
    that should not be assumed to be stable across kernel releases.
  
Stephen Hemminger Aug. 29, 2018, 11:10 p.m. UTC | #9
On Wed, 29 Aug 2018 19:41:23 -0300
Dan Gora <dg@adax.com> wrote:

> On Wed, Aug 29, 2018 at 7:12 PM, Dan Gora <dg@adax.com> wrote:
> > On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:  
> >>> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
> >>> >> applications to update the link state for the KNI network interfaces
> >>> >> in the linux kernel.
> >>> >>
> >>> >> Note that the default carrier state is set to off when the interface
> >>> >> is opened.
> >>> >>
> >>> >> Signed-off-by: Dan Gora <dg@adax.com>  
> >>> >
> >>> > Do you really need a special ioctl for this?
> >>> > There is already ability to set link state via sysfs or netlink.  
> >>>
> >>> I think yes.. AFAIK sysfs does not constitute a stable API;  
> >>
> >> It is a stable API on Linux.  
> >  
> 
> Actually this does not seem to be completely true...
> 
> From Documentation/admin-guide/sysfs-rules.rst:
> 
> Rules on how to access information in sysfs
> ===========================================
> 
> The kernel-exported sysfs exports internal kernel implementation details
> and depends on internal kernel structures and layout. It is agreed upon
> by the kernel developers that the Linux kernel does not provide a stable
> internal API. Therefore, there are aspects of the sysfs interface that
> may not be stable across kernel releases.
> 
> <snip>
> 
> - devices are only "devices"
>     There is no such thing like class-, bus-, physical devices,
>     interfaces, and such that you can rely on in userspace. Everything is
>     just simply a "device". Class-, bus-, physical, ... types are just
>     kernel implementation details which should not be expected by
>     applications that look for devices in sysfs.
> 
>     The properties of a device are:
> 
>     - devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
> <snip>
> 
>     - kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
> <snip>
> 
>     - subsystem (``block``, ``tty``, ``pci``, ...)
> <snip>
> 
>     - driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
> <snip>
> 
>     - attributes
> <snip>
> 
>     Everything else is just a kernel driver-core implementation detail
>     that should not be assumed to be stable across kernel releases.

Network device sysfs is stable. No one ever got around to putting it in documentation
I wouldn't worry, once anything in /sys/class/net is added it is not going to change without major breakage in many many tools.
  
Igor Ryzhov Aug. 30, 2018, 9:49 a.m. UTC | #10
Hi Dan,

We use KNI device exactly the same way you described – with IP addresses,
routing, etc.
And we also faced the same problem of having the actual link status in
Linux kernel.

There is a special callback for link state management in net_device_ops for
soft-devices like KNI called ndo_change_carrier.
Current KNI driver implements it already, you just need to write to
/sys/class/net/<iface>/carrier
to change link status.

Right now we implement it on application side, but I think it'll be good to
have this in rte_kni API.

Here is our implementation:

static int
linux_set_carrier(const char *name, int status)
{
char path[64];
const char *carrier = status ? "1" : "0";
int fd, ret;

sprintf(path, "/sys/devices/virtual/net/%s/carrier", name);
fd = open(path, O_WRONLY);
if (fd == -1) {
return -errno;
}

ret = write(fd, carrier, 2);
if (ret == -1) {
close(fd);
return -errno;
}

close(fd);

return 0;
}

Best regards,
Igor

On Thu, Aug 30, 2018 at 2:10 AM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Wed, 29 Aug 2018 19:41:23 -0300
> Dan Gora <dg@adax.com> wrote:
>
> > On Wed, Aug 29, 2018 at 7:12 PM, Dan Gora <dg@adax.com> wrote:
> > > On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:
> > >>> >> Add a new API function to KNI, rte_kni_update_link() to allow DPDK
> > >>> >> applications to update the link state for the KNI network
> interfaces
> > >>> >> in the linux kernel.
> > >>> >>
> > >>> >> Note that the default carrier state is set to off when the
> interface
> > >>> >> is opened.
> > >>> >>
> > >>> >> Signed-off-by: Dan Gora <dg@adax.com>
> > >>> >
> > >>> > Do you really need a special ioctl for this?
> > >>> > There is already ability to set link state via sysfs or netlink.
> > >>>
> > >>> I think yes.. AFAIK sysfs does not constitute a stable API;
> > >>
> > >> It is a stable API on Linux.
> > >
> >
> > Actually this does not seem to be completely true...
> >
> > From Documentation/admin-guide/sysfs-rules.rst:
> >
> > Rules on how to access information in sysfs
> > ===========================================
> >
> > The kernel-exported sysfs exports internal kernel implementation details
> > and depends on internal kernel structures and layout. It is agreed upon
> > by the kernel developers that the Linux kernel does not provide a stable
> > internal API. Therefore, there are aspects of the sysfs interface that
> > may not be stable across kernel releases.
> >
> > <snip>
> >
> > - devices are only "devices"
> >     There is no such thing like class-, bus-, physical devices,
> >     interfaces, and such that you can rely on in userspace. Everything is
> >     just simply a "device". Class-, bus-, physical, ... types are just
> >     kernel implementation details which should not be expected by
> >     applications that look for devices in sysfs.
> >
> >     The properties of a device are:
> >
> >     - devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
> > <snip>
> >
> >     - kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
> > <snip>
> >
> >     - subsystem (``block``, ``tty``, ``pci``, ...)
> > <snip>
> >
> >     - driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
> > <snip>
> >
> >     - attributes
> > <snip>
> >
> >     Everything else is just a kernel driver-core implementation detail
> >     that should not be assumed to be stable across kernel releases.
>
> Network device sysfs is stable. No one ever got around to putting it in
> documentation
> I wouldn't worry, once anything in /sys/class/net is added it is not going
> to change without major breakage in many many tools.
>
  
Igor Ryzhov Aug. 30, 2018, 10:32 a.m. UTC | #11
Hi again,

Forgot to mention netif_carrier_off changes. Your changes are necessary for
correct link status management,
but netif_carrier_off call should also be added to kni_ioctl_create before
the register_netdev call.
That's needed for correct link status even before first call to ndo_open.

Best regards,
Igor

On Thu, Aug 30, 2018 at 12:49 PM, Igor Ryzhov <iryzhov@nfware.com> wrote:

> Hi Dan,
>
> We use KNI device exactly the same way you described – with IP addresses,
> routing, etc.
> And we also faced the same problem of having the actual link status in
> Linux kernel.
>
> There is a special callback for link state management in net_device_ops
> for soft-devices like KNI called ndo_change_carrier.
> Current KNI driver implements it already, you just need to write to
> /sys/class/net/<iface>/carrier to change link status.
>
> Right now we implement it on application side, but I think it'll be good
> to have this in rte_kni API.
>
> Here is our implementation:
>
> static int
> linux_set_carrier(const char *name, int status)
> {
> char path[64];
> const char *carrier = status ? "1" : "0";
> int fd, ret;
>
> sprintf(path, "/sys/devices/virtual/net/%s/carrier", name);
> fd = open(path, O_WRONLY);
> if (fd == -1) {
> return -errno;
> }
>
> ret = write(fd, carrier, 2);
> if (ret == -1) {
> close(fd);
> return -errno;
> }
>
> close(fd);
>
> return 0;
> }
>
> Best regards,
> Igor
>
> On Thu, Aug 30, 2018 at 2:10 AM, Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
>> On Wed, 29 Aug 2018 19:41:23 -0300
>> Dan Gora <dg@adax.com> wrote:
>>
>> > On Wed, Aug 29, 2018 at 7:12 PM, Dan Gora <dg@adax.com> wrote:
>> > > On Wed, Aug 29, 2018 at 7:00 PM, Stephen Hemminger
>> > > <stephen@networkplumber.org> wrote:
>> > >>> >> Add a new API function to KNI, rte_kni_update_link() to allow
>> DPDK
>> > >>> >> applications to update the link state for the KNI network
>> interfaces
>> > >>> >> in the linux kernel.
>> > >>> >>
>> > >>> >> Note that the default carrier state is set to off when the
>> interface
>> > >>> >> is opened.
>> > >>> >>
>> > >>> >> Signed-off-by: Dan Gora <dg@adax.com>
>> > >>> >
>> > >>> > Do you really need a special ioctl for this?
>> > >>> > There is already ability to set link state via sysfs or netlink.
>> > >>>
>> > >>> I think yes.. AFAIK sysfs does not constitute a stable API;
>> > >>
>> > >> It is a stable API on Linux.
>> > >
>> >
>> > Actually this does not seem to be completely true...
>> >
>> > From Documentation/admin-guide/sysfs-rules.rst:
>> >
>> > Rules on how to access information in sysfs
>> > ===========================================
>> >
>> > The kernel-exported sysfs exports internal kernel implementation details
>> > and depends on internal kernel structures and layout. It is agreed upon
>> > by the kernel developers that the Linux kernel does not provide a stable
>> > internal API. Therefore, there are aspects of the sysfs interface that
>> > may not be stable across kernel releases.
>> >
>> > <snip>
>> >
>> > - devices are only "devices"
>> >     There is no such thing like class-, bus-, physical devices,
>> >     interfaces, and such that you can rely on in userspace. Everything
>> is
>> >     just simply a "device". Class-, bus-, physical, ... types are just
>> >     kernel implementation details which should not be expected by
>> >     applications that look for devices in sysfs.
>> >
>> >     The properties of a device are:
>> >
>> >     - devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
>> > <snip>
>> >
>> >     - kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
>> > <snip>
>> >
>> >     - subsystem (``block``, ``tty``, ``pci``, ...)
>> > <snip>
>> >
>> >     - driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
>> > <snip>
>> >
>> >     - attributes
>> > <snip>
>> >
>> >     Everything else is just a kernel driver-core implementation detail
>> >     that should not be assumed to be stable across kernel releases.
>>
>> Network device sysfs is stable. No one ever got around to putting it in
>> documentation
>> I wouldn't worry, once anything in /sys/class/net is added it is not
>> going to change without major breakage in many many tools.
>>
>
>
  
Dan Gora Aug. 30, 2018, 9:41 p.m. UTC | #12
Hi All,

So I'm a little unclear as to what to do here moving forward..

Do we all agree at least that there should be some function in DPDK in
order to set the link state for KNI interfaces?

I'm a strong yes on this point.  I don't think that every KNI user
should have to implement something like this themselves if we can
provide a simple interface to do it.  It seems like a natural
extension of the KNI functionality IMHO.

If so, should we use a new ioctl as in my patch, or just use the
"write to /sys.../carrier" method as shown below?

I'm kind of ambivalent about this point.  The ioctl method has two
minor advantages to my mind:

1) It's already done :)

2) You can set the link speed and duplex as well and get a pretty
message in the syslog when the link status changes:

[ 2100.016079] rte_kni: adax0 NIC Link is Up 10000 Mbps Full Duplex.
[ 2100.016094] IPv6: ADDRCONF(NETDEV_CHANGE): adax0: link becomes ready
[ 2262.532126] IPv6: ADDRCONF(NETDEV_UP): adax1: link is not ready
[ 2263.432148] rte_kni: adax1 NIC Link is Up 10000 Mbps Full Duplex.
[ 2263.432170] IPv6: ADDRCONF(NETDEV_CHANGE): adax1: link becomes ready

On the other hand, the "write to /sys" method is a bit more simple and
confines the changes to the user space library.  If we're confident
that the /sys ABI is stable and not going to be changed going forward
it seems like a valid alternative.

I'm willing to do it either way.  I'll defer to the judgement of the
community...

thanks
dan


On Thu, Aug 30, 2018 at 6:49 AM, Igor Ryzhov <iryzhov@nfware.com> wrote:
> Hi Dan,
>
> We use KNI device exactly the same way you described – with IP addresses,
> routing, etc.
> And we also faced the same problem of having the actual link status in Linux
> kernel.
>
> There is a special callback for link state management in net_device_ops for
> soft-devices like KNI called ndo_change_carrier.
> Current KNI driver implements it already, you just need to write to
> /sys/class/net/<iface>/carrier to change link status.
>
> Right now we implement it on application side, but I think it'll be good to
> have this in rte_kni API.
>
> Here is our implementation:
>
> static int
> linux_set_carrier(const char *name, int status)
> {
> char path[64];
> const char *carrier = status ? "1" : "0";
> int fd, ret;
>
> sprintf(path, "/sys/devices/virtual/net/%s/carrier", name);
> fd = open(path, O_WRONLY);
> if (fd == -1) {
> return -errno;
> }
>
> ret = write(fd, carrier, 2);
> if (ret == -1) {
> close(fd);
> return -errno;
> }
>
> close(fd);
>
> return 0;
> }
>
> Best regards,
> Igor
  
Stephen Hemminger Aug. 30, 2018, 10:09 p.m. UTC | #13
On Thu, 30 Aug 2018 18:41:14 -0300
Dan Gora <dg@adax.com> wrote:

> On the other hand, the "write to /sys" method is a bit more simple and
> confines the changes to the user space library.  If we're confident
> that the /sys ABI is stable and not going to be changed going forward
> it seems like a valid alternative.

See Documentation/ABI/testing/sysfs-class-net
  
Dan Gora Aug. 30, 2018, 10:11 p.m. UTC | #14
On Thu, Aug 30, 2018 at 7:09 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 30 Aug 2018 18:41:14 -0300
> Dan Gora <dg@adax.com> wrote:
>
>> On the other hand, the "write to /sys" method is a bit more simple and
>> confines the changes to the user space library.  If we're confident
>> that the /sys ABI is stable and not going to be changed going forward
>> it seems like a valid alternative.
>
> See Documentation/ABI/testing/sysfs-class-net

yeah, but it's in the 'testing' directory :)

From Documentation/ABI/README:

testing/

This directory documents interfaces that are felt to be stable,
as the main development of this interface has been completed.
The interface can be changed to add new features, but the
current interface will not break by doing this, unless grave
errors or security problems are found in them.  Userspace
programs can start to rely on these interfaces, but they must be
aware of changes that can occur before these interfaces move to
be marked stable.  Programs that use these interfaces are
strongly encouraged to add their name to the description of
these interfaces, so that the kernel developers can easily
notify them if any changes occur (see the description of the
layout of the files below for details on how to do this.)

Like I said, I'm ok with using this if that's what everyone wants to do.

d
  
Dan Gora Sept. 4, 2018, 12:47 a.m. UTC | #15
Hi All,

One other problem with using the sysfs method to change the link state
rather than this ioctl method.  The sysfs/netdev method to change the
carrier was only introduced in kernel 3.9.  For older kernels, we
would just be out of luck.  The ioctl method will work with any kernel
version (2.6+).  It's not clear if this is a problem for DPDK apps or
not.

thanks
dan


On Thu, Aug 30, 2018 at 7:11 PM, Dan Gora <dg@adax.com> wrote:
> On Thu, Aug 30, 2018 at 7:09 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Thu, 30 Aug 2018 18:41:14 -0300
>> Dan Gora <dg@adax.com> wrote:
>>
>>> On the other hand, the "write to /sys" method is a bit more simple and
>>> confines the changes to the user space library.  If we're confident
>>> that the /sys ABI is stable and not going to be changed going forward
>>> it seems like a valid alternative.
>>
>> See Documentation/ABI/testing/sysfs-class-net
>
> yeah, but it's in the 'testing' directory :)
>
> From Documentation/ABI/README:
>
> testing/
>
> This directory documents interfaces that are felt to be stable,
> as the main development of this interface has been completed.
> The interface can be changed to add new features, but the
> current interface will not break by doing this, unless grave
> errors or security problems are found in them.  Userspace
> programs can start to rely on these interfaces, but they must be
> aware of changes that can occur before these interfaces move to
> be marked stable.  Programs that use these interfaces are
> strongly encouraged to add their name to the description of
> these interfaces, so that the kernel developers can easily
> notify them if any changes occur (see the description of the
> layout of the files below for details on how to do this.)
>
> Like I said, I'm ok with using this if that's what everyone wants to do.
>
> d
  
Stephen Hemminger Sept. 5, 2018, 12:57 p.m. UTC | #16
On Mon, 3 Sep 2018 21:47:22 -0300
Dan Gora <dg@adax.com> wrote:

> Hi All,
> 
> One other problem with using the sysfs method to change the link state
> rather than this ioctl method.  The sysfs/netdev method to change the
> carrier was only introduced in kernel 3.9.  For older kernels, we
> would just be out of luck.  The ioctl method will work with any kernel
> version (2.6+).  It's not clear if this is a problem for DPDK apps or
> not.
> 
> thanks
> dan
> 
> 
> On Thu, Aug 30, 2018 at 7:11 PM, Dan Gora <dg@adax.com> wrote:
> > On Thu, Aug 30, 2018 at 7:09 PM, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:  
> >> On Thu, 30 Aug 2018 18:41:14 -0300
> >> Dan Gora <dg@adax.com> wrote:
> >>  
> >>> On the other hand, the "write to /sys" method is a bit more simple and
> >>> confines the changes to the user space library.  If we're confident
> >>> that the /sys ABI is stable and not going to be changed going forward
> >>> it seems like a valid alternative.  
> >>
> >> See Documentation/ABI/testing/sysfs-class-net  
> >
> > yeah, but it's in the 'testing' directory :)
> >
> > From Documentation/ABI/README:
> >
> > testing/
> >
> > This directory documents interfaces that are felt to be stable,
> > as the main development of this interface has been completed.
> > The interface can be changed to add new features, but the
> > current interface will not break by doing this, unless grave
> > errors or security problems are found in them.  Userspace
> > programs can start to rely on these interfaces, but they must be
> > aware of changes that can occur before these interfaces move to
> > be marked stable.  Programs that use these interfaces are
> > strongly encouraged to add their name to the description of
> > these interfaces, so that the kernel developers can easily
> > notify them if any changes occur (see the description of the
> > layout of the files below for details on how to do this.)
> >
> > Like I said, I'm ok with using this if that's what everyone wants to do.
> >
> > d  
> 
> 
> 

Linux 3.9 is no longer supported. Currently, upstream Linux kernel is supported
from 3.16 on. If someone is on a kernel that old, they aren't going to get
any security fixes.
  
Dan Gora Sept. 11, 2018, 9:45 p.m. UTC | #17
Hi All,

So I implemented the "write to /sys/devices/virtual/net/*/carrier"
method to change the link status, but there is one more minor thing
that I wanted to point out about this approach.  The problem is that
you cannot read or write the '/sys/devices/virtual/net/*/carrier' file
while the interface is marked 'down'.  This means that link status
changes can only be performed by the DPDK application while the
interface is in the "up" state.  With the ioctl method, you can change
the carrier state at pretty much any time.

Is this a problem?  It's not a huge one, I guess, but it is something
else to consider.

Please let me know what you all think.

thanks
dan
  
Stephen Hemminger Sept. 11, 2018, 9:52 p.m. UTC | #18
On Tue, 11 Sep 2018 18:45:49 -0300
Dan Gora <dg@adax.com> wrote:

> Hi All,
> 
> So I implemented the "write to /sys/devices/virtual/net/*/carrier"
> method to change the link status, but there is one more minor thing
> that I wanted to point out about this approach.  The problem is that
> you cannot read or write the '/sys/devices/virtual/net/*/carrier' file
> while the interface is marked 'down'.  This means that link status
> changes can only be performed by the DPDK application while the
> interface is in the "up" state.  With the ioctl method, you can change
> the carrier state at pretty much any time.
> 
> Is this a problem?  It's not a huge one, I guess, but it is something
> else to consider.
> 
> Please let me know what you all think.
> 
> thanks
> dan

The carrier state has no meaning when device is down, at least for physical
devices. Because often the PHY is powered off when the device is marked down.
  
Dan Gora Sept. 11, 2018, 10:07 p.m. UTC | #19
On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> The carrier state has no meaning when device is down, at least for physical
> devices. Because often the PHY is powered off when the device is marked down.

The thing that caught my attention is that when you mark a kernel
ethernet device 'down', you get a message that the link is down in the
syslog.

snappy:root:bash 2645 => ip link set down dev eth0
Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down

With this method, that's not possible because you cannot change the
link state from the callback from kni_net_release.

The carrier state doesn't have any meaning from a data transfer point
of view, but it's often useful for being able to diagnose connectivity
issues (is my cable plugged in or not).

I'm still not really clear what the objection really is to the ioctl
method.  Is it just the number of changes?  That the kernel driver has
to change as well?  Just that there is another way to do it?

thanks
dan
  
Stephen Hemminger Sept. 11, 2018, 11:14 p.m. UTC | #20
On Tue, 11 Sep 2018 19:07:47 -0300
Dan Gora <dg@adax.com> wrote:

> On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > The carrier state has no meaning when device is down, at least for physical
> > devices. Because often the PHY is powered off when the device is marked down.  
> 
> The thing that caught my attention is that when you mark a kernel
> ethernet device 'down', you get a message that the link is down in the
> syslog.
> 
> snappy:root:bash 2645 => ip link set down dev eth0
> Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down
> 
> With this method, that's not possible because you cannot change the
> link state from the callback from kni_net_release.
> 
> The carrier state doesn't have any meaning from a data transfer point
> of view, but it's often useful for being able to diagnose connectivity
> issues (is my cable plugged in or not).
> 
> I'm still not really clear what the objection really is to the ioctl
> method.  Is it just the number of changes?  That the kernel driver has
> to change as well?  Just that there is another way to do it?
> 
> thanks
> dan

I want to see KNI as part of the standard Linux kernel at some future date.
Having KNI as an out of tree driver means it is doomed to chasing tail lights
for the Linux kernel ABI instability and also problems with Linux distributions.

One of the barriers to entry for Linux drivers is introducing new ioctl's.
Ioctl's have issues with being device specific and also 32/64 compatiablity.
If KNI has ioctl's it makes it harder to get merged some day.

I freely admit that this is forcing KNI to respond to something that is not
there yet, so if it is too hard, then doing it with ioctl is going to be
necessary.
  
Jason Wang Sept. 12, 2018, 4:02 a.m. UTC | #21
On 2018年09月12日 07:14, Stephen Hemminger wrote:
> On Tue, 11 Sep 2018 19:07:47 -0300
> Dan Gora <dg@adax.com> wrote:
>
>> On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> The carrier state has no meaning when device is down, at least for physical
>>> devices. Because often the PHY is powered off when the device is marked down.
>> The thing that caught my attention is that when you mark a kernel
>> ethernet device 'down', you get a message that the link is down in the
>> syslog.
>>
>> snappy:root:bash 2645 => ip link set down dev eth0
>> Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down
>>
>> With this method, that's not possible because you cannot change the
>> link state from the callback from kni_net_release.
>>
>> The carrier state doesn't have any meaning from a data transfer point
>> of view, but it's often useful for being able to diagnose connectivity
>> issues (is my cable plugged in or not).
>>
>> I'm still not really clear what the objection really is to the ioctl
>> method.  Is it just the number of changes?  That the kernel driver has
>> to change as well?  Just that there is another way to do it?
>>
>> thanks
>> dan
> I want to see KNI as part of the standard Linux kernel at some future date.
> Having KNI as an out of tree driver means it is doomed to chasing tail lights
> for the Linux kernel ABI instability and also problems with Linux distributions.

Why not use vhost_net instead? KNI duplicates its function.

Thanks

>
> One of the barriers to entry for Linux drivers is introducing new ioctl's.
> Ioctl's have issues with being device specific and also 32/64 compatiablity.
> If KNI has ioctl's it makes it harder to get merged some day.
>
> I freely admit that this is forcing KNI to respond to something that is not
> there yet, so if it is too hard, then doing it with ioctl is going to be
> necessary.
  

Patch

diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 1c38cfa1a..b5784ad1b 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -585,6 +585,66 @@  kni_ioctl_free(struct net *net, uint32_t ioctl_num,
 	return ret;
 }
 
+static int
+kni_ioctl_linkstat(struct net *net, uint32_t ioctl_num,
+		unsigned long ioctl_param)
+{
+	struct kni_net *knet = net_generic(net, kni_net_id);
+	int ret = -EINVAL;
+	struct kni_dev *dev, *n;
+	struct rte_kni_link_info link_info;
+	struct net_device *netdev;
+	uint16_t link;
+
+	if (_IOC_SIZE(ioctl_num) > sizeof(link_info))
+		return -EINVAL;
+
+	ret = copy_from_user(&link_info, (void *)ioctl_param,
+			     sizeof(link_info));
+	if (ret) {
+		pr_err("copy_from_user in kni_ioctl_release");
+		return -EIO;
+	}
+
+	/* Release the network device according to its name */
+	if (strlen(link_info.name) == 0)
+		return ret;
+
+	down_read(&knet->kni_list_lock);
+	list_for_each_entry_safe(dev, n, &knet->kni_list_head, list) {
+		if (strncmp(dev->name, link_info.name, RTE_KNI_NAMESIZE) != 0)
+			continue;
+
+		netdev = dev->net_dev;
+		if (netdev == NULL) {
+			up_read(&knet->kni_list_lock);
+			return ret;
+		}
+
+		link = link_info.link_status;
+
+		if (!netif_carrier_ok(netdev) && link) {
+			pr_info("%s NIC Link is Up %d Mbps %s.\n",
+				netdev->name,
+				link_info.link_speed,
+				link_info.link_duplex ==
+					RTE_KNI_LINK_FULL_DUPLEX ?
+					"Full Duplex" : "Half Duplex");
+			netif_carrier_on(netdev);
+		} else if (netif_carrier_ok(netdev) && !link) {
+			pr_info("%s NIC Link is Down.\n",
+				netdev->name);
+			netif_carrier_off(netdev);
+		}
+
+		ret = 0;
+		break;
+	}
+	up_read(&knet->kni_list_lock);
+
+	return ret;
+}
+
 static int
 kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param)
 {
@@ -609,6 +669,9 @@  kni_ioctl(struct inode *inode, uint32_t ioctl_num, unsigned long ioctl_param)
 	case _IOC_NR(RTE_KNI_IOCTL_FREE):
 		ret = kni_ioctl_free(net, ioctl_num, ioctl_param);
 		break;
+	case _IOC_NR(RTE_KNI_IOCTL_LINKSTAT):
+		ret = kni_ioctl_linkstat(net, ioctl_num, ioctl_param);
+		break;
 	default:
 		pr_debug("IOCTL default\n");
 		break;
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 0850be434..fea3ec7e7 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -134,6 +134,7 @@  kni_net_open(struct net_device *dev)
 	struct kni_dev *kni = netdev_priv(dev);
 
 	netif_start_queue(dev);
+	netif_carrier_off(dev);
 
 	memset(&req, 0, sizeof(req));
 	req.req_id = RTE_KNI_REQ_CFG_NETWORK_IF;
@@ -153,6 +154,7 @@  kni_net_release(struct net_device *dev)
 	struct kni_dev *kni = netdev_priv(dev);
 
 	netif_stop_queue(dev); /* can't transmit any more */
+	netif_carrier_off(dev);
 
 	memset(&req, 0, sizeof(req));
 	req.req_id = RTE_KNI_REQ_CFG_NETWORK_IF;
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 318a3f939..f617d8026 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -124,11 +124,30 @@  struct rte_kni_device_info {
 	char mac_addr[6];
 };
 
+
+struct rte_kni_link_info {
+	char name[RTE_KNI_NAMESIZE];  /**< Network device name for KNI */
+	uint32_t link_speed;        /**< ETH_SPEED_NUM_ */
+
+#define RTE_KNI_LINK_HALF_DUPLEX 0 /**< Half-duplex connection. */
+#define RTE_KNI_LINK_FULL_DUPLEX 1 /**< Full-duplex connection. */
+	uint16_t link_duplex  : 1;  /**< RTE_KNI_LINK_[HALF/FULL]_DUPLEX */
+
+#define RTE_KNI_LINK_FIXED       0 /**< No autonegotiation. */
+#define RTE_KNI_LINK_AUTONEG     1 /**< Autonegotiated. */
+	uint16_t link_autoneg : 1;  /**< RTE_KNI_LINK_[AUTONEG/FIXED] */
+
+#define RTE_KNI_LINK_DOWN        0 /**< Link is down. */
+#define RTE_KNI_LINK_UP          1 /**< Link is up. */
+	uint16_t link_status  : 1;  /**< RTE_KNI_LINK_[DOWN/UP] */
+};
+
 #define KNI_DEVICE "kni"
 
 #define RTE_KNI_IOCTL_TEST    _IOWR(0, 1, int)
 #define RTE_KNI_IOCTL_CREATE  _IOWR(0, 2, struct rte_kni_device_info)
 #define RTE_KNI_IOCTL_RELEASE _IOWR(0, 3, struct rte_kni_device_info)
 #define RTE_KNI_IOCTL_FREE    _IOWR(0, 4, struct rte_kni_device_info)
+#define RTE_KNI_IOCTL_LINKSTAT _IOWR(0, 5, struct rte_kni_link_info)
 
 #endif /* _RTE_KNI_COMMON_H_ */
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 6ef0859bf..aa3559306 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -13,7 +13,6 @@ 
 
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-#include <rte_ethdev.h>
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_kni.h>
@@ -817,6 +816,42 @@  rte_kni_unregister_handlers(struct rte_kni *kni)
 
 	return 0;
 }
+
+int
+rte_kni_update_link(struct rte_kni *kni, struct rte_eth_link *link)
+{
+	struct rte_kni_link_info link_info;
+
+	if (kni == NULL || !kni->in_use || link == NULL)
+		return -1;
+
+	snprintf(link_info.name, sizeof(link_info.name), "%s", kni->name);
+
+	link_info.link_speed = link->link_speed;
+	if (link->link_duplex == ETH_LINK_FULL_DUPLEX)
+		link_info.link_duplex = RTE_KNI_LINK_FULL_DUPLEX;
+	else
+		link_info.link_duplex = RTE_KNI_LINK_FULL_DUPLEX;
+
+	if (link->link_autoneg == ETH_LINK_FIXED)
+		link_info.link_autoneg = RTE_KNI_LINK_FIXED;
+	else
+		link_info.link_autoneg = RTE_KNI_LINK_AUTONEG;
+
+	if (link->link_status == ETH_LINK_UP)
+		link_info.link_status = RTE_KNI_LINK_UP;
+	else
+		link_info.link_status = RTE_KNI_LINK_DOWN;
+
+	if (ioctl(kni_fd, RTE_KNI_IOCTL_LINKSTAT, &link_info) < 0) {
+		RTE_LOG(ERR, KNI,
+			"Failed to update kni link info for dev '%s'.\n",
+			kni->name);
+		return -1;
+	}
+	return 0;
+}
+
 void
 rte_kni_close(void)
 {
diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
index 94516c38f..02d781a32 100644
--- a/lib/librte_kni/rte_kni.h
+++ b/lib/librte_kni/rte_kni.h
@@ -21,6 +21,7 @@ 
 #include <rte_memory.h>
 #include <rte_mempool.h>
 #include <rte_ether.h>
+#include <rte_ethdev.h>
 
 #include <exec-env/rte_kni_common.h>
 
@@ -255,6 +256,21 @@  int rte_kni_register_handlers(struct rte_kni *kni, struct rte_kni_ops *ops);
  */
 int rte_kni_unregister_handlers(struct rte_kni *kni);
 
+/**
+ *  Update link status info for KNI port.
+ *
+ *  Update the linkup/linkdown status of a KNI interface in the kernel.
+ *
+ *  @param kni
+ *   pointer to struct rte_kni.
+ *
+ *  @return
+ *   On success: 0
+ *   On failure: -1
+ */
+int __rte_experimental
+rte_kni_update_link(struct rte_kni *kni, struct rte_eth_link *link);
+
 /**
  *  Close KNI device.
  */