
The Cost of Absolute Priority: When "Work Stealing" Meets Hierarchical Scheduling

When building high-throughput task schedulers, we often hear a common refrain: "Separate critical tasks from background tasks; critical tasks must be executed first."

This advice sounds unimpeachable. After all, who wants a thread serving user requests to be blocked behind a log-compression task? Thus a classic two-tier scheduling design emerged: Absolute Priority.

Today, we will use a Go code example to reproduce this industrial-grade design pattern and discuss the hidden costs it introduces while solving a particular problem.

What is "Absolute Priority"?

In standard task scheduling (such as the Go Runtime or various thread pools), tasks are usually treated as equal, or with only minor weight differences. However, in certain extreme scenarios—such as real-time search or high-frequency trading—systems not only demand speed but also deterministic speed.

Designers therefore introduced two levels of queues:

  1. Major Queue: Stores core path tasks, which must have absolute execution priority.
  2. Minor Task Source Collection: Stores dynamically generated secondary tasks (e.g., background cleanup, log aggregation). Worker threads are only allowed to process these tasks when the Major Queue is completely empty.

This pattern is less "scheduling" and more "priority-based multiplexing."

Code Reproduction (Go)

To demonstrate this logic, we build a PriorityTaskScheduler. Please note the select structure in the Pop method, which accurately reflects the decision path in the original design.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Task represents an executable task
type Task func()

// PriorityTaskScheduler demonstrates the two-tier task scheduling logic in industrial systems
// The Major queue has absolute priority, and the Minor queue is processed only when Major is empty
type PriorityTaskScheduler struct {
	major chan Task

	// minors stores dynamically created minor task sources.
	// Each entry is written once and then read many times by Range,
	// which is the access pattern sync.Map is optimized for.
	minors      sync.Map
	minorsCount int64
}

func NewPriorityTaskScheduler(buffer int) *PriorityTaskScheduler {
	return &PriorityTaskScheduler{
		major: make(chan Task, buffer),
	}
}

// PushMajor submits tasks to the high-priority path
func (s *PriorityTaskScheduler) PushMajor(t Task) {
	s.major <- t
}

// NewMinorQueue creates a new low-priority task source
// In the original design, this often corresponds to a temporary subset of tasks or a background job
func (s *PriorityTaskScheduler) NewMinorQueue(buffer int) chan Task {
	minor := make(chan Task, buffer)
	id := atomic.AddInt64(&s.minorsCount, 1)
	s.minors.Store(id, minor)
	return minor
}

// Pop simulates the logic of a worker thread fetching tasks.
// Core logic: check Major first, then Minor.
// It is non-blocking and returns nil when no task is available.
func (s *PriorityTaskScheduler) Pop() Task {
	// 1. Absolute priority check: attempt to fetch from the major path
	select {
	case t := <-s.major:
		return t
	default:
		// 2. Only when the major path is empty, scan the minor paths
		// This logic is called "Stealing," but it's essentially a fallback process
		return s.stealFromMinors()
	}
}

func (s *PriorityTaskScheduler) stealFromMinors() Task {
	var found Task

	// Simulate the iteration logic for Minors in the original code
	// If there are many Minor queues, this will become a performance killer
	s.minors.Range(func(key, value interface{}) bool {
		ch, ok := value.(chan Task)
		if !ok {
			return true
		}

		select {
		case t, open := <-ch:
			if !open {
				// Automatic cleanup: if the queue is closed, remove it from the pool
				s.minors.Delete(key)
				return true
			}
			found = t
			return false // Found a task, stop iterating
		default:
			return true // Continue checking the next one
		}
	})

	return found
}

func main() {
	scheduler := NewPriorityTaskScheduler(10)

	// Create a minor task stream
	lowPriority := scheduler.NewMinorQueue(5)

	// Populate tasks
	// Note: Although low-priority tasks are added first, they will be executed later
	lowPriority <- func() { fmt.Println("Executing: Low-priority task (Minor)") }
	scheduler.PushMajor(func() { fmt.Println("Executing: High-priority task (Major)") })

	fmt.Println("Starting scheduler...")

	// Demonstrate consumption order
	for i := 0; i < 2; i++ {
		if t := scheduler.Pop(); t != nil {
			t()
		} else {
			fmt.Println("No tasks available")
		}
	}
}

Design Trade-offs Analysis

1. Isolation vs. Fairness

The biggest advantage of this design is the inviolability of the core path. As long as the Major queue has tasks, worker threads will never look at the Minor queues. Under stress tests, this ensures extremely low latency jitter for core business operations.

However, the cost is severe: Starvation. If the system is under continuous high load, and tasks constantly pour into the Major queue, tasks in the Minor queues might never get executed. In distributed systems, this often means background heartbeats, statistical reporting, or garbage collection tasks are indefinitely postponed, eventually leading to nodes being evicted for "appearing unhealthy"—even though they are actually extremely busy.
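The starvation effect is easy to reproduce. The sketch below (a simplified, single-threaded model of the Pop logic above; the function name starvationDemo and the counters are illustrative, not from the original code) refills the major queue before every pop, mimicking sustained high load, and counts how many minor tasks ever run:

```go
package main

import "fmt"

// starvationDemo performs `pops` fetches against a scheduler whose major
// queue is refilled before every pop, mimicking sustained high load.
// It returns how many minor tasks were executed.
func starvationDemo(pops int) int {
	major := make(chan func(), 1)
	minor := make(chan func(), 1)

	minorRuns := 0
	minor <- func() { minorRuns++ } // one minor task, waiting since the start

	for i := 0; i < pops; i++ {
		major <- func() {} // sustained load keeps the major queue non-empty
		select {
		case t := <-major:
			t()
		default:
			// fallback path: only reached when major is empty
			select {
			case t := <-minor:
				t()
			default:
			}
		}
	}
	return minorRuns
}

func main() {
	// The minor task was enqueued first, yet never runs.
	fmt.Printf("minor tasks executed after 1000 pops: %d\n", starvationDemo(1000))
}
```

Because the major queue is never empty at the moment of the check, the fallback branch is never taken: the minor task waits forever.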

2. Scanning Overhead

Please pay attention to the iteration logic in stealFromMinors.

When the system is idle (Major is empty), worker threads will continuously iterate through the minors list. If there are 1000 Minor queues in the system (e.g., one queue per connection), and most of them are also empty, then each Pop operation will turn into an $O(N)$ full scan.

This is a typical case of CPU idling: to find a low-priority task that may not even exist, the CPU cache is polluted by a stream of useless memory accesses. In the original C++ or Rust implementations, designers often had to introduce highly complex lock-free structures (such as lock-free linked lists guarded by hazard pointers) to mitigate contention during iteration, which in turn significantly increases maintenance costs.

3. The Hidden Burden of Dynamic Lifecycles

There's an interesting detail in the code: s.minors.Delete(key). In the original design, this was achieved through Reference Counting—when a sub-queue was no longer held by any producer, it would automatically disappear from the scheduler's list.

While this "automatic garbage collection" reduces the cognitive load on higher-level business logic, it increases the burden on the scheduler. The scheduler not only has to dispatch tasks but also constantly monitor the lifecycle of task sources. This violates the Single Responsibility Principle, making the scheduling core bloated.
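The reference-counting idea can be sketched in a few lines. Below, each producer holds a handle on the minor queue; dropping the last handle closes the channel, which is exactly the `!open` signal that triggers `s.minors.Delete(key)` in the scheduler above. The type minorHandle and its methods are hypothetical illustrations, not the original implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// minorHandle is a producer's reference to a minor queue. Closing the
// last handle closes the underlying channel, so the scheduler's scan
// observes a closed channel and removes the entry on its own.
type minorHandle struct {
	refs *int64
	ch   chan func()
}

func newMinor(buffer int) *minorHandle {
	refs := int64(1)
	return &minorHandle{refs: &refs, ch: make(chan func(), buffer)}
}

// clone hands a new reference to another producer.
func (h *minorHandle) clone() *minorHandle {
	atomic.AddInt64(h.refs, 1)
	return &minorHandle{refs: h.refs, ch: h.ch}
}

// release drops one reference and reports whether the channel was closed.
func (h *minorHandle) release() bool {
	if atomic.AddInt64(h.refs, -1) == 0 {
		close(h.ch) // the scheduler sees !open and deletes the queue
		return true
	}
	return false
}

func main() {
	a := newMinor(4)
	b := a.clone() // two producers share one queue
	fmt.Println("closed after first release:", a.release())
	fmt.Println("closed after last release:", b.release())
}
```

The convenience is real, but note where the complexity landed: lifecycle state now lives partly in the handles and partly in the scheduler's scan loop, which is precisely the responsibility bleed described above.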

Summary: When to Use?

"Absolute Priority" is a powerful tool, but it's a double-edged sword.

Applicable Scenarios:

  • Hard Real-time Systems: Latency must be deterministic, e.g., trading matching engines.
  • Extremely Clear Task Prioritization: Major tasks are user requests, minor tasks are merely auxiliary statistics, and it's acceptable if they are dropped.

Inapplicable Scenarios:

  • General Throughput Systems: E.g., web servers, where all requests are essentially equal.
  • Large-scale Microservices: Background heartbeats and probe tasks are equally important; prolonged starvation can lead to cascading failures.

Before using this pattern, ask yourself: Are my minor tasks truly "minor" enough to tolerate indefinite starvation? If the answer is no, then a traditional scheduler with weighted round-robin or time-slicing might be a more robust choice.
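For contrast, here is a minimal sketch of the weighted round-robin alternative: out of every (weight+1) pops, one slot is reserved for the minor queue even under sustained major load, which bounds starvation instead of allowing it to be indefinite. All names (weightedScheduler, simulate) are hypothetical, not from any particular scheduler.

```go
package main

import "fmt"

// weightedScheduler serves the major queue `weight` times for every one
// minor turn, so minor tasks are delayed but never starved outright.
type weightedScheduler struct {
	major, minor chan func()
	weight, tick int
}

// pop reserves every (weight+1)-th slot for the minor queue,
// then falls back to whichever queue has work.
func (s *weightedScheduler) pop() func() {
	s.tick++
	if s.tick%(s.weight+1) == 0 {
		select {
		case t := <-s.minor:
			return t
		default:
		}
	}
	select {
	case t := <-s.major:
		return t
	default:
	}
	select {
	case t := <-s.minor:
		return t
	default:
		return nil
	}
}

// simulate refills both queues before every pop (sustained load) and
// reports how many tasks from each queue actually ran.
func simulate(weight, pops int) (majorRuns, minorRuns int) {
	s := &weightedScheduler{
		major:  make(chan func(), 1),
		minor:  make(chan func(), 1),
		weight: weight,
	}
	for i := 0; i < pops; i++ {
		select {
		case s.major <- func() { majorRuns++ }:
		default:
		}
		select {
		case s.minor <- func() { minorRuns++ }:
		default:
		}
		if t := s.pop(); t != nil {
			t()
		}
	}
	return
}

func main() {
	majorRuns, minorRuns := simulate(4, 100)
	fmt.Printf("under sustained load: major=%d minor=%d\n", majorRuns, minorRuns)
}
```

With weight 4 the split under saturation is 80/20: minor tasks lose most of the bandwidth, but their worst-case wait is bounded, which is exactly the property absolute priority gives up.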