[v4] sched: enable traffic class oversubscription conditionally

Message ID 20220524134332.644318-1-marcinx.danilewicz@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v4] sched: enable traffic class oversubscription conditionally |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS

Commit Message

Danilewicz, MarcinX May 24, 2022, 1:43 p.m. UTC
  Added new API to enable or disable TC over subscription for best
effort traffic class at subport level.
Added changes after review and increased throughput.

By default TC OV is disabled.
History:
- v1 - TC OV disabled by default
- v2 - throughput improvements
- v3, v4 - changes from comments

Signed-off-by: Marcin Danilewicz <marcinx.danilewicz@intel.com>
---
 lib/sched/rte_sched.c | 189 +++++++++++++++++++++++++++++++++++-------
 lib/sched/rte_sched.h |  18 ++++
 lib/sched/version.map |   3 +
 3 files changed, 178 insertions(+), 32 deletions(-)
  

Comments

Cristian Dumitrescu May 24, 2022, 2:30 p.m. UTC | #1
> -----Original Message-----
> From: Danilewicz, MarcinX <marcinx.danilewicz@intel.com>
> Sent: Tuesday, May 24, 2022 2:44 PM
> To: dev@dpdk.org; Singh, Jasvinder <jasvinder.singh@intel.com>;
> Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Ajmera, Megha <megha.ajmera@intel.com>
> Subject: [PATCH v4] sched: enable traffic class oversubscription conditionally
> 
> Added new API to enable or disable TC over subscription for best
> effort traffic class at subport level.
> Added changes after review and increased throughput.
> 
> By default TC OV is disabled.
> History:
> - v1 - TC OV disabled by default
> - v2 - throughput improvements
> - v3, v4 - changes from comments
> 
> Signed-off-by: Marcin Danilewicz <marcinx.danilewicz@intel.com>
> ---
>  lib/sched/rte_sched.c | 189 +++++++++++++++++++++++++++++++++++------

Marcin,

I don't see any of my comments on the previous V3 version addressed. You mention in the change log that you addressed comments, but I see that all my comments were silently disregarded. Jasvinder also noted the same for his comments in a previous version. Please address the comments and do not keep sending the same code over and over.

This change was supposed to be straightforward, but for some reason the progress is extremely slow on your side. I think at this point we are at risk of missing the RC1 deadline for this feature.

Regards,
Cristian
  
Danilewicz, MarcinX May 25, 2022, 2:18 p.m. UTC | #2
Hi Cristian,

Oh .. you absolutely right. I did not found them all ..  I've missed them in all unneeded lines when you reply to full source code in message. I'll add changes from rest of the comments asap.


BR,
/Marcin

-----Original Message-----
From: Dumitrescu, Cristian <cristian.dumitrescu@intel.com> 
Sent: Tuesday, May 24, 2022 4:30 PM
To: Danilewicz, MarcinX <marcinx.danilewicz@intel.com>; dev@dpdk.org; Singh, Jasvinder <jasvinder.singh@intel.com>
Cc: Ajmera, Megha <megha.ajmera@intel.com>; Thakur, Sham Singh <sham.singh.thakur@intel.com>; Mcnamara, John <john.mcnamara@intel.com>; Devlin, Michelle <michelle.devlin@intel.com>
Subject: RE: [PATCH v4] sched: enable traffic class oversubscription conditionally



> -----Original Message-----
> From: Danilewicz, MarcinX <marcinx.danilewicz@intel.com>
> Sent: Tuesday, May 24, 2022 2:44 PM
> To: dev@dpdk.org; Singh, Jasvinder <jasvinder.singh@intel.com>; 
> Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Ajmera, Megha <megha.ajmera@intel.com>
> Subject: [PATCH v4] sched: enable traffic class oversubscription 
> conditionally
> 
> Added new API to enable or disable TC over subscription for best 
> effort traffic class at subport level.
> Added changes after review and increased throughput.
> 
> By default TC OV is disabled.
> History:
> - v1 - TC OV disabled by default
> - v2 - throughput improvements
> - v3, v4 - changes from comments
> 
> Signed-off-by: Marcin Danilewicz <marcinx.danilewicz@intel.com>
> ---
>  lib/sched/rte_sched.c | 189 +++++++++++++++++++++++++++++++++++------

Marcin,

I don't see any of my comments on the previous V3 version addressed. You mention in the change log that you addressed comments, but I see that all my comments were silently disregarded. Jasvinder also noted the same for his comments in a previous version. Please address the comments and do not keep sending the same code over and over.

This change was supposed to be straightforward, but for some reason the progress is extremely slow on your side. I think at this point we are at risk of missing the RC1 deadline for this feature.

Regards,
Cristian
--------------------------------------------------------------
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263


This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.
  
Danilewicz, MarcinX May 27, 2022, 12:09 a.m. UTC | #3
Hi all,



Going trough all notes on http://patches.dpdk.org/project/dpdk/patch/20220427092357.491720-1-marcinx.danilewicz@intel.com/ please find my answers inline here.



First of all please take my apologies, for not checking url as above on regular basis. I was awaiting for mails with me on CC along to comments.



So lets go in to comments:





I don't see any note on the changes made in this version with respect to previous versions.  Can you include them in future version?  Also, I had some comments on the first version of this patch, I don't see any response.

@Marcin: hopefully all required changes are in new v5 patch



>  int

>  rte_sched_port_dequeue(struct rte_sched_port *port, struct rte_mbuf **pkts,

> uint32_t n_pkts);

>

> +/**

> + * Hierarchical scheduler subport TC OV enable/disable config.



The name of the feature should be fully stated here: traffic class oversubscription, not the abbreviation, please change.



@Marcin: please check v5 patch



> + * Note that this function is safe to use at runtime

> + * to enable/disable TC OV for subport.



We should actually forbit this rather than encourage it. Calling this function several times does not make sense, and it can create limitations that can come back and byte us in the future, whenever we might need to extend this code, for no reason.



Please actually replace with: "This function should be called at the time of subport initialization."



@Marcin: please check v5 patch





> + *

> + * @param port

> + *   Handle to port scheduler instance

> + * @param subport_id

> + *   Subport ID

> + * @param tc_ov_enable

> + *  Boolean flag to enable/disable TC OV

> + * @return

> + *   0 upon success, error code otherwise

> + */

> +__rte_experimental

> +int

> +rte_sched_subport_tc_ov_config(struct rte_sched_port *port, uint32_t

> subport_id, bool tc_ov_enable);

> +

>  #ifdef __cplusplus

>  }

>  #endif

> diff --git a/lib/sched/version.map b/lib/sched/version.map

> index d22c07fc9f..c6e994d8df 100644

> --- a/lib/sched/version.map

> +++ b/lib/sched/version.map

> @@ -34,4 +34,7 @@ EXPERIMENTAL {

>            # added in 21.11

>            rte_pie_rt_data_init;

>            rte_pie_config_init;

> +

> +         # added in 22.03



This is not in 22.03, it will hopefully be in 22.07.



> +         rte_sched_subport_tc_ov_config;

>  };

> --

> 2.25.1





> >                                                                                 subport->profile;

> >

> >                       grinder_prefetch_tc_queue_arrays(subport, pos);

> > -                     grinder_credits_update(port, subport, pos);

> > +

> > +                    if (unlikely(subport->is_tc_ov_enabled))

>

> Please remove the "unlikely" from here, don't put any likely/unlikely here at all.





@Marcin: as also noted below, I agree and all ours are removed.





> >

> > +/**

> > + * Hierarchical scheduler subport TC OV enable/disable config.

>

> The name of the feature should be fully stated here: traffic class

> oversubscription, not the abbreviation, please change.

>

>



@Marcin: please check changes





> > + * Note that this function is safe to use at runtime

> > + * to enable/disable TC OV for subport.

>

> We should actually forbit this rather than encourage it. Calling this function

> several times does not make sense, and it can create limitations that can come

> back and byte us in the future, whenever we might need to extend this code, for

> no reason.

>

> Please actually replace with: "This function should be called at the time of

> subport initialization."





@Marcin: please check changes



> > @@ -34,4 +34,7 @@ EXPERIMENTAL {

> >         # added in 21.11

> >         rte_pie_rt_data_init;

> >         rte_pie_config_init;

> > +

> > +      # added in 22.03

>

> This is not in 22.03, it will hopefully be in 22.07.

>



@Marcin: Well, nice spot!





> > Also the name of the feature should not be abbreviated in the patch title.

> >

> > I suggest you rework the title to:

> > [PATCH] sched: enable traffic class oversubscription conditionally

> >

@Marcin: already done





> > >

> > > Added new API to enable or disable TC over subscription for best

> > > effort traffic class at subport level.

> > > Added changes after review and increased throughput.

> > >

> > > By default TC OV is disabled.

> >

> > It should be the other way around, the TC_OV should be enabled by default.

> The

> > TC oversubscription is a more natural way to use this library, we usually want

> to

> > disable this feature just for better performance in case this functionality is

> not

> > needed. Please initialize the tc_ov flag accordingly.

> >

>

> In original code, this feature has always been disabled as it impacts

> performance.

> So, in my opinion we should keep it disabled by default and let user enable it

> when required.

>



In the original code, yes, it had to be explicitly enabled through a build-time flag. This was not the best option, and this is precisely what we are trying to fix with this patch.



But on the other hand all the users of these library that I know use it with the TC oversubscription turned on. Functionality is more important for them than performance. Hence my vote now is to enable it by default; those users that prefer performance over functionality can easily turn this feature off with no issues.



@Marcin: OK





> > >      uint8_t memory[0] __rte_cache_aligned;

> > > +

> > > +   /* TC oversubscription activation */

> > > +   int is_tc_ov_enabled;

> >

> > How about we simplify the name of this variable to: tc_ov_enabled ?

@Marcin: agree 😊



> > > +   s = port->subports[subport_id];

> > > +   s->is_tc_ov_enabled = tc_ov_enable ? 1 : 0;

> > > +

> > > +   if (s->is_tc_ov_enabled) {

> > > +                 /* TC oversubscription */

> > > +                 s->tc_ov_wm_min = port->mtu;

> > > +                 s->tc_ov_period_id = 0;

> > > +                 s->tc_ov = 0;

> > > +                 s->tc_ov_n = 0;

> > > +                 s->tc_ov_rate = 0;

> > > +

> > > +                 profile = port->subport_profiles + s->profile;

> > > +                 s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile-

> > > >tc_period,

> > > +                                              s->pipe_tc_be_rate_max);

> > > +                 s->tc_ov_wm = s->tc_ov_wm_max;

> > > +   }

> > > +   return 0;

> > > +}

> >

> > This function should not exist, please remove it and keep the initial code that

> > computes the tc_ov related variable regardless of whether tc_ov is enabled

> or

> > not.

> >

> > All the tc_ov related variables have the tc_ov particle in their name, so there

> is

> > no clash. This is initialization code, so no performance overhead. Let's keep

> the

> > code unmodified and compute both the tc_ov and the non-tc_ov varables at

> > initialization, regardless of whether the feature is enabled or not.

> >

> > This comment is applicable to all the initialization code, please adjust all the

> init

> > code accordingly. There should be no diff showing in the patch for any of the

> init

> > code!

> >

> > For this file "rte_sched.c", your patch should contain just two additional run-

> > time functions, i.e. the non-tc-ov version of functions

> grinder_credits_update()

> > and grindler_credits_check(), and the small code required to test when to use

> > the tc-ov vs. the non-tc_ov version, makes sense?



@Marcin: Yes, except setting tc ov enabled initially. Right?





> > >                    s->n_pipe_profiles = params->n_pipe_profiles;

> > >                    s->n_max_pipe_profiles = params->n_max_pipe_profiles;

> > >

> > > +                 /* TC over-subscription is disabled by default */

> > > +                 s->is_tc_ov_enabled = 0;

> > > +

> >

> > By default, this feature should be enabled:

> > s->is_tc_ov_enabled = 1;

> >

@Marcin: as below and here, ok feature is going to be enabled



> > >

> > > -   if (!grinder_credits_check(port, subport, pos))

> > > -                  return 0;

> > > +   switch (subport->is_tc_ov_enabled) {

> > > +   case 1:

> > > +                 if (!grinder_credits_check_with_tc_ov(port, subport, pos))

> > > +                               return 0;

> > > +                 break;

> > > +   case 0:

> > > +                 if (!grinder_credits_check(port, subport, pos))

> > > +                               return 0;

> > > +                 break;

> > > +   }

> >

> > There should be no switch statement here, please replace with an if

> statement. I

> > suggest the following:

> >

> > int status;

> >

> > status = subport->tc_ov_enabled ? grinder_credits_check_with_tc_ov(port,

> > subport, pos) : grinder_credits_check(port, subport, pos); if (!status)

> >         return 0;

> >

@Marcin: I disagree on that. Performance is decreased by that. Well any other thing than switch-case statement used there.

With code above I get:

-------+------------+------------+

       |  received  |   dropped  |

-------+------------+------------+

  RX   |    5390751 |          0 |

QOS+TX |    5390784 |          0 |   pps: 5390784

-------+------------+------------+



With switch case:

pps: 6733376



With TC oversubscription disabled or enabled, to my surprise. What to do with this now? I will be sending v5 patch with switch-case as of now ….





> > >      /* Advance port time */

> > >      port->time += pkt_len;

> > > @@ -2770,7 +2891,11 @@ grinder_handle(struct rte_sched_port *port,

> > >                                                                              subport->profile;

> > >

> > >                    grinder_prefetch_tc_queue_arrays(subport, pos);

> > > -                  grinder_credits_update(port, subport, pos);

> > > +

> > > +                 if (unlikely(subport->is_tc_ov_enabled))

> >

> > Please remove the "unlikely" from here, don't put any likely/unlikely here at

> all.



@Marcin: Done. I didn’t not like “unlikely” from a first saw there.



> > > @@ -579,6 +579,24 @@ rte_sched_port_enqueue(struct rte_sched_port

> > > *port, struct rte_mbuf **pkts, uint  int

> > > rte_sched_port_dequeue(struct rte_sched_port *port, struct rte_mbuf

> > > **pkts, uint32_t n_pkts);

> > >

> > > +/**

> > > + * Hierarchical scheduler subport TC OV enable/disable config.

> >

> > The name of the feature should be fully stated here: traffic class

> > oversubscription, not the abbreviation, please change.



@Marcin: done



> > > + * Note that this function is safe to use at runtime

> > > + * to enable/disable TC OV for subport.

> >

> > We should actually forbit this rather than encourage it. Calling this function

> > several times does not make sense, and it can create limitations that can

> come

> > back and byte us in the future, whenever we might need to extend this code,

> for no reason.

> >

> > Please actually replace with: "This function should be called at the time of

> > subport initialization."

> >



@Marcin: done





> > > +   # added in 22.03

> >

> > This is not in 22.03, it will hopefully be in 22.07.

> >



@Marcin: done





> Yes, I agree this would be the ideal way to drive this change, but the problem is that modifying the existing subport parameter structure would represent an API change. This would require a deprecation notice, and the patch would be blocked until 22.11 release. Are you willing to wait until 22.11? If not, then adding the configuration function for this flag is the next best thing.



Are we making any plans for that?



No, definitely not.



> > Also the name of the feature should not be abbreviated in the patch title.

> > In original code, this feature has always been disabled as it impacts

>> performance.

> > So, in my opinion we should keep it disabled by default and let user enable it

> > when required.

> >



> In the original code, yes, it had to be explicitly enabled through a build-time flag. This was not the best option, and this is precisely what we are trying to fix with this patch.



> But on the other hand all the users of these library that I know use it with the TC oversubscription turned on. Functionality is more important for them than performance. Hence my vote now is to enable it by default; those users that prefer performance over functionality can easily turn this feature off with no issues.



@Marcin: OK, so we enable it by default.







Intel Research and Development Ireland Limited

Registered in Ireland

Registered Office: Collinstown Industrial Park, Leixlip, County Kildare

Registered Number: 308263





This e-mail and any attachments may contain confidential material for the sole

use of the intended recipient(s). Any review or distribution by others is

strictly prohibited. If you are not the intended recipient, please contact the

sender and delete all copies.

Stephen HemmingerMay 24, 2022, 2:52 p.m. UTC | #6

On Tue, 24 May 2022 13:33:31 +0000

Marcin Danilewicz <marcinx.danilewicz@intel.com<mailto:marcinx.danilewicz@intel.com>> wrote:



> /Marcin

> --------------------------------------------------------------

> Intel Research and Development Ireland Limited

> Registered in Ireland

> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare

> Registered Number: 308263

>

>

> This e-mail and any attachments may contain confidential material for the sole

> use of the intended recipient(s). Any review or distribution by others is

> strictly prohibited. If you are not the intended recipient, please contact the

> sender and delete all copies.

>





Please talk to your management/lawyers. This kind of auto-footer violates the

required discussion properties of open source.



@Marcin:  I send answer in separate thread. But in a short, that message was added by mail server or something in between. Patch using git email engine was sent as per usual.





BR,

/Marcin
--------------------------------------------------------------
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263


This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.
  

Patch

diff --git a/lib/sched/rte_sched.c b/lib/sched/rte_sched.c
index ec74bee939..6e7d81df46 100644
--- a/lib/sched/rte_sched.c
+++ b/lib/sched/rte_sched.c
@@ -213,6 +213,9 @@  struct rte_sched_subport {
 	uint8_t *bmp_array;
 	struct rte_mbuf **queue_array;
 	uint8_t memory[0] __rte_cache_aligned;
+
+	/* TC oversubscription activation */
+	int is_tc_ov_enabled;
 } __rte_cache_aligned;
 
 struct rte_sched_port {
@@ -1165,6 +1168,45 @@  rte_sched_cman_config(struct rte_sched_port *port,
 }
 #endif
 
+int
+rte_sched_subport_tc_ov_config(struct rte_sched_port *port,
+	uint32_t subport_id,
+	bool tc_ov_enable)
+{
+	struct rte_sched_subport *s;
+	struct rte_sched_subport_profile *profile;
+
+	if (port == NULL) {
+		RTE_LOG(ERR, SCHED,
+			"%s: Incorrect value for parameter port\n", __func__);
+		return -EINVAL;
+	}
+
+	if (subport_id >= port->n_subports_per_port) {
+		RTE_LOG(ERR, SCHED,
+			"%s: Incorrect value for parameter subport id\n", __func__);
+		return  -EINVAL;
+	}
+
+	s = port->subports[subport_id];
+	s->is_tc_ov_enabled = tc_ov_enable ? 1 : 0;
+
+	if (s->is_tc_ov_enabled) {
+		/* TC oversubscription */
+		s->tc_ov_wm_min = port->mtu;
+		s->tc_ov_period_id = 0;
+		s->tc_ov = 0;
+		s->tc_ov_n = 0;
+		s->tc_ov_rate = 0;
+
+		profile = port->subport_profiles + s->profile;
+		s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile->tc_period,
+				s->pipe_tc_be_rate_max);
+		s->tc_ov_wm = s->tc_ov_wm_max;
+	}
+	return 0;
+}
+
 int
 rte_sched_subport_config(struct rte_sched_port *port,
 	uint32_t subport_id,
@@ -1254,6 +1296,9 @@  rte_sched_subport_config(struct rte_sched_port *port,
 		s->n_pipe_profiles = params->n_pipe_profiles;
 		s->n_max_pipe_profiles = params->n_max_pipe_profiles;
 
+		/* TC over-subscription is disabled by default */
+		s->is_tc_ov_enabled = 0;
+
 #ifdef RTE_SCHED_CMAN
 		if (params->cman_params != NULL) {
 			s->cman_enabled = true;
@@ -1316,13 +1361,6 @@  rte_sched_subport_config(struct rte_sched_port *port,
 
 		for (i = 0; i < RTE_SCHED_PORT_N_GRINDERS; i++)
 			s->grinder_base_bmp_pos[i] = RTE_SCHED_PIPE_INVALID;
-
-		/* TC oversubscription */
-		s->tc_ov_wm_min = port->mtu;
-		s->tc_ov_period_id = 0;
-		s->tc_ov = 0;
-		s->tc_ov_n = 0;
-		s->tc_ov_rate = 0;
 	}
 
 	{
@@ -1342,9 +1380,6 @@  rte_sched_subport_config(struct rte_sched_port *port,
 			else
 				profile->tc_credits_per_period[i] = 0;
 
-		s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile->tc_period,
-							s->pipe_tc_be_rate_max);
-		s->tc_ov_wm = s->tc_ov_wm_max;
 		s->profile = subport_profile_id;
 
 	}
@@ -1417,17 +1452,20 @@  rte_sched_pipe_config(struct rte_sched_port *port,
 		double pipe_tc_be_rate =
 			(double) params->tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASS_BE]
 			/ (double) params->tc_period;
-		uint32_t tc_be_ov = s->tc_ov;
 
-		/* Unplug pipe from its subport */
-		s->tc_ov_n -= params->tc_ov_weight;
-		s->tc_ov_rate -= pipe_tc_be_rate;
-		s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
+		if (s->is_tc_ov_enabled) {
+			uint32_t tc_be_ov = s->tc_ov;
 
-		if (s->tc_ov != tc_be_ov) {
-			RTE_LOG(DEBUG, SCHED,
-				"Subport %u Best-effort TC oversubscription is OFF (%.4lf >= %.4lf)\n",
-				subport_id, subport_tc_be_rate, s->tc_ov_rate);
+			/* Unplug pipe from its subport */
+			s->tc_ov_n -= params->tc_ov_weight;
+			s->tc_ov_rate -= pipe_tc_be_rate;
+			s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
+
+			if (s->tc_ov != tc_be_ov) {
+				RTE_LOG(DEBUG, SCHED,
+					"Subport %u Best-effort TC oversubscription is OFF (%.4lf >= %.4lf)\n",
+					subport_id, subport_tc_be_rate, s->tc_ov_rate);
+			}
 		}
 
 		/* Reset the pipe */
@@ -1460,19 +1498,22 @@  rte_sched_pipe_config(struct rte_sched_port *port,
 		double pipe_tc_be_rate =
 			(double) params->tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASS_BE]
 			/ (double) params->tc_period;
-		uint32_t tc_be_ov = s->tc_ov;
 
-		s->tc_ov_n += params->tc_ov_weight;
-		s->tc_ov_rate += pipe_tc_be_rate;
-		s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
+		if (s->is_tc_ov_enabled) {
+			uint32_t tc_be_ov = s->tc_ov;
+
+			s->tc_ov_n += params->tc_ov_weight;
+			s->tc_ov_rate += pipe_tc_be_rate;
+			s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
 
-		if (s->tc_ov != tc_be_ov) {
-			RTE_LOG(DEBUG, SCHED,
-				"Subport %u Best effort TC oversubscription is ON (%.4lf < %.4lf)\n",
-				subport_id, subport_tc_be_rate, s->tc_ov_rate);
+			if (s->tc_ov != tc_be_ov) {
+				RTE_LOG(DEBUG, SCHED,
+					"Subport %u Best effort TC oversubscription is ON (%.4lf < %.4lf)\n",
+					subport_id, subport_tc_be_rate, s->tc_ov_rate);
+			}
+			p->tc_ov_period_id = s->tc_ov_period_id;
+			p->tc_ov_credits = s->tc_ov_wm;
 		}
-		p->tc_ov_period_id = s->tc_ov_period_id;
-		p->tc_ov_credits = s->tc_ov_wm;
 	}
 
 	return 0;
@@ -2318,6 +2359,45 @@  grinder_credits_update(struct rte_sched_port *port,
 	pipe->tb_credits = RTE_MIN(pipe->tb_credits, params->tb_size);
 	pipe->tb_time += n_periods * params->tb_period;
 
+	/* Subport TCs */
+	if (unlikely(port->time >= subport->tc_time)) {
+		for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++)
+			subport->tc_credits[i] = sp->tc_credits_per_period[i];
+
+		subport->tc_time = port->time + sp->tc_period;
+	}
+
+	/* Pipe TCs */
+	if (unlikely(port->time >= pipe->tc_time)) {
+		for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++)
+			pipe->tc_credits[i] = params->tc_credits_per_period[i];
+		pipe->tc_time = port->time + params->tc_period;
+	}
+}
+
+static inline void
+grinder_credits_update_with_tc_ov(struct rte_sched_port *port,
+	struct rte_sched_subport *subport, uint32_t pos)
+{
+	struct rte_sched_grinder *grinder = subport->grinder + pos;
+	struct rte_sched_pipe *pipe = grinder->pipe;
+	struct rte_sched_pipe_profile *params = grinder->pipe_params;
+	struct rte_sched_subport_profile *sp = grinder->subport_params;
+	uint64_t n_periods;
+	uint32_t i;
+
+	/* Subport TB */
+	n_periods = (port->time - subport->tb_time) / sp->tb_period;
+	subport->tb_credits += n_periods * sp->tb_credits_per_period;
+	subport->tb_credits = RTE_MIN(subport->tb_credits, sp->tb_size);
+	subport->tb_time += n_periods * sp->tb_period;
+
+	/* Pipe TB */
+	n_periods = (port->time - pipe->tb_time) / params->tb_period;
+	pipe->tb_credits += n_periods * params->tb_credits_per_period;
+	pipe->tb_credits = RTE_MIN(pipe->tb_credits, params->tb_size);
+	pipe->tb_time += n_periods * params->tb_period;
+
 	/* Subport TCs */
 	if (unlikely(port->time >= subport->tc_time)) {
 		subport->tc_ov_wm =
@@ -2348,6 +2428,39 @@  grinder_credits_update(struct rte_sched_port *port,
 static inline int
 grinder_credits_check(struct rte_sched_port *port,
 	struct rte_sched_subport *subport, uint32_t pos)
+{
+	struct rte_sched_grinder *grinder = subport->grinder + pos;
+	struct rte_sched_pipe *pipe = grinder->pipe;
+	struct rte_mbuf *pkt = grinder->pkt;
+	uint32_t tc_index = grinder->tc_index;
+	uint64_t pkt_len = pkt->pkt_len + port->frame_overhead;
+	uint64_t subport_tb_credits = subport->tb_credits;
+	uint64_t subport_tc_credits = subport->tc_credits[tc_index];
+	uint64_t pipe_tb_credits = pipe->tb_credits;
+	uint64_t pipe_tc_credits = pipe->tc_credits[tc_index];
+	int enough_credits;
+
+	/* Check pipe and subport credits */
+	enough_credits = (pkt_len <= subport_tb_credits) &&
+		(pkt_len <= subport_tc_credits) &&
+		(pkt_len <= pipe_tb_credits) &&
+		(pkt_len <= pipe_tc_credits);
+
+	if (!enough_credits)
+		return 0;
+
+	/* Update pipe and subport credits */
+	subport->tb_credits -= pkt_len;
+	subport->tc_credits[tc_index] -= pkt_len;
+	pipe->tb_credits -= pkt_len;
+	pipe->tc_credits[tc_index] -= pkt_len;
+
+	return 1;
+}
+
+static inline int
+grinder_credits_check_with_tc_ov(struct rte_sched_port *port,
+	struct rte_sched_subport *subport, uint32_t pos)
 {
 	struct rte_sched_grinder *grinder = subport->grinder + pos;
 	struct rte_sched_pipe *pipe = grinder->pipe;
@@ -2403,8 +2516,16 @@  grinder_schedule(struct rte_sched_port *port,
 	uint32_t pkt_len = pkt->pkt_len + port->frame_overhead;
 	uint32_t be_tc_active;
 
-	if (!grinder_credits_check(port, subport, pos))
-		return 0;
+	switch (subport->is_tc_ov_enabled) {
+	case 1:
+		if (!grinder_credits_check_with_tc_ov(port, subport, pos))
+			return 0;
+		break;
+	case 0:
+		if (!grinder_credits_check(port, subport, pos))
+			return 0;
+		break;
+	}
 
 	/* Advance port time */
 	port->time += pkt_len;
@@ -2770,7 +2891,11 @@  grinder_handle(struct rte_sched_port *port,
 						subport->profile;
 
 		grinder_prefetch_tc_queue_arrays(subport, pos);
-		grinder_credits_update(port, subport, pos);
+
+		if (unlikely(subport->is_tc_ov_enabled))
+			grinder_credits_update_with_tc_ov(port, subport, pos);
+		else
+			grinder_credits_update(port, subport, pos);
 
 		grinder->state = e_GRINDER_PREFETCH_MBUF;
 		return 0;
diff --git a/lib/sched/rte_sched.h b/lib/sched/rte_sched.h
index 5ece64e527..94febe1d94 100644
--- a/lib/sched/rte_sched.h
+++ b/lib/sched/rte_sched.h
@@ -579,6 +579,24 @@  rte_sched_port_enqueue(struct rte_sched_port *port, struct rte_mbuf **pkts, uint
 int
 rte_sched_port_dequeue(struct rte_sched_port *port, struct rte_mbuf **pkts, uint32_t n_pkts);
 
+/**
+ * Hierarchical scheduler subport TC OV enable/disable config.
+ * Note that this function is safe to use at runtime
+ * to enable/disable TC OV for subport.
+ *
+ * @param port
+ *   Handle to port scheduler instance
+ * @param subport_id
+ *   Subport ID
+ * @param tc_ov_enable
+ *  Boolean flag to enable/disable TC OV
+ * @return
+ *   0 upon success, error code otherwise
+ */
+__rte_experimental
+int
+rte_sched_subport_tc_ov_config(struct rte_sched_port *port, uint32_t subport_id, bool tc_ov_enable);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/sched/version.map b/lib/sched/version.map
index d22c07fc9f..c6e994d8df 100644
--- a/lib/sched/version.map
+++ b/lib/sched/version.map
@@ -34,4 +34,7 @@  EXPERIMENTAL {
 	# added in 21.11
 	rte_pie_rt_data_init;
 	rte_pie_config_init;
+
+	# added in 22.03
+	rte_sched_subport_tc_ov_config;
 };