[v3] lib/telemetry:fix telemetry conns leak in case of socket write fail

Message ID tencent_39DA29090964DA893A7ACB80E04F0D1D470A@qq.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series [v3] lib/telemetry:fix telemetry conns leak in case of socket write fail |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/github-robot: build success github build: passed
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS

Commit Message

ShaoWei Sun Jan. 30, 2024, 1:57 a.m. UTC
  Telemetry can only create 10 conns by default, each of which is processed
by a thread.

When a thread fails to write using socket, the thread will end directly
without reducing the total number of conns.

This will result in the machine running for a long time, and if there are
10 failures, the telemetry will be unavailable

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")

Signed-off-by: Shaowei Sun <1819846787@qq.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
 lib/telemetry/telemetry.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
  

Comments

David Marchand Feb. 1, 2024, 1:14 p.m. UTC | #1
On Tue, Jan 30, 2024 at 2:57 AM Shaowei Sun <1819846787@qq.com> wrote:
>
> Telemetry can only create 10 conns by default, each of which is processed
> by a thread.
>
> When a thread fails to write using socket, the thread will end directly
> without reducing the total number of conns.
>
> This will result in the machine running for a long time, and if there are
> 10 failures, the telemetry will be unavailable
>
> Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
>
> Signed-off-by: Shaowei Sun <1819846787@qq.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Ciara Power <ciara.power@intel.com>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>

Thanks for the fix.
As far as I can see, the limiting of the number of connections (which
here results in a DoS on the telemetry socket) was added in commit
2a7d0b872f79 ("telemetry: add upper limit on connections").

If you confirm this is indeed this commit that introduced the issue, I
will fix the Fixes: tag myself when applying.
  
ShaoWei Sun Feb. 1, 2024, 2:11 p.m. UTC | #2
Yes, you are correct, it should be the commit 2a7d0b872f79 that introduced the issue. Thank you for the correction.&nbsp;


&nbsp;
1819846787@qq.com



&nbsp;




------------------&nbsp;原始邮件&nbsp;------------------
发件人: "David Marchand"<david.marchand@redhat.com&gt;; 
发送时间: 2024年2月1日(星期四) 晚上9:14
收件人: " ShaoWei Sun"<1819846787@qq.com&gt;; "ciara.power"<ciara.power@intel.com&gt;; 
抄送: "dev"<dev@dpdk.org&gt;; "Bruce Richardson"<bruce.richardson@intel.com&gt;; "Chengwen Feng"<fengchengwen@huawei.com&gt;; 
主题: Re: [PATCH] [v3]lib/telemetry:fix telemetry conns leak in case of socket write fail



1819846787@qq.com&gt; wrote:
&gt;
&gt; Telemetry can only create 10 conns by default, each of which is processed
&gt; by a thread.
&gt;
&gt; When a thread fails to write using socket, the thread will end directly
&gt; without reducing the total number of conns.
&gt;
&gt; This will result in the machine running for a long time, and if there are
&gt; 10 failures, the telemetry will be unavailable
&gt;
&gt; Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
&gt;
&gt; Signed-off-by: Shaowei Sun <1819846787@qq.com&gt;
&gt; Acked-by: Bruce Richardson <bruce.richardson@intel.com&gt;
&gt; Acked-by: Ciara Power <ciara.power@intel.com&gt;
&gt; Acked-by: Chengwen Feng <fengchengwen@huawei.com&gt;

Thanks for the fix.
As far as I can see, the limiting of the number of connections (which
here results in a DoS on the telemetry socket) was added in commit
2a7d0b872f79 ("telemetry: add upper limit on connections").

If you confirm this is indeed this commit that introduced the issue, I
will fix the Fixes: tag myself when applying.


-- 
David Marchand
  
David Marchand Feb. 1, 2024, 3:19 p.m. UTC | #3
On Tue, Jan 30, 2024 at 2:57 AM Shaowei Sun <1819846787@qq.com> wrote:
>
> Telemetry can only create 10 conns by default, each of which is processed
> by a thread.
>
> When a thread fails to write using socket, the thread will end directly
> without reducing the total number of conns.
>
> This will result in the machine running for a long time, and if there are
> 10 failures, the telemetry will be unavailable
>
Fixes: 2a7d0b872f79 ("telemetry: add upper limit on connections")
Cc: stable@dpdk.org
>
> Signed-off-by: Shaowei Sun <1819846787@qq.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Ciara Power <ciara.power@intel.com>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>

Applied, thanks for the fix!
  

Patch

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 31e2391867..0b00c04090 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -378,8 +378,8 @@  client_handler(void *sock_id)
 			"{\"version\":\"%s\",\"pid\":%d,\"max_output_len\":%d}",
 			telemetry_version, getpid(), MAX_OUTPUT_LEN);
 	if (write(s, info_str, strlen(info_str)) < 0) {
-		close(s);
-		return NULL;
+		TMTY_LOG_LINE(ERR, "Socket write base info to client failed");
+		goto exit;
 	}
 
 	/* receive data is not null terminated */
@@ -404,6 +404,7 @@  client_handler(void *sock_id)
 
 		bytes = read(s, buffer, sizeof(buffer) - 1);
 	}
+exit:
 	close(s);
 	rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
 	return NULL;