[Translation] scheduler/sched-nice-design.txt

iamyooon 2018. 8. 3. 17:58

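This document explains the thinking behind the revamped nice-level implementation in the new Linux scheduler. Nice levels were always a fairly weak spot in Linux, and people kept pestering the kernel developers to make nice +19 tasks use much less CPU time.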

This document explains the thinking about the revamped and streamlined nice-levels implementation in the new Linux scheduler. Nice levels were always pretty weak under Linux and people continuously pestered us to make nice +19 tasks use up much less CPU time.

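Unfortunately, this was not easy to implement in the old scheduler (otherwise it would have been done long ago), because nice levels were historically tied to timeslice length, and timeslice units were determined by the HZ tick. The smallest possible timeslice was therefore 1/HZ.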

Unfortunately that was not that easy to implement under the old scheduler, (otherwise we'd have done it long ago) because nice level support was historically coupled to timeslice length, and timeslice units were driven by the HZ tick, so the smallest timeslice was 1/HZ.
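To make the 1/HZ floor concrete, here is a minimal sketch that prints the smallest possible timeslice for a few tick rates. The function name is hypothetical, and the HZ values are simply common kernel configuration choices used for illustration, not figures taken from the text.

```python
def smallest_timeslice_ms(hz):
    """Length of one tick (1/HZ) in milliseconds.

    Under the old scheduler a timeslice could not be shorter than this,
    because timeslices were counted in whole ticks.
    """
    return 1000.0 / hz


if __name__ == "__main__":
    # Illustrative HZ values; the document itself only discusses HZ=1000.
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: smallest timeslice = {smallest_timeslice_ms(hz)} ms")
```

The point is that the same nice level maps to a different minimum slice on every HZ setting, which is the HZ sensitivity the document keeps returning to.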

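In the O(1) scheduler (in 2003), negative nice levels were made much stronger than they had been in 2.4 (and people were happy with that change). The linear timeslice rule was also calibrated on purpose so that nice +19 got exactly 1 jiffy. A timeslice graph in the original illustrates this.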

In the O(1) scheduler (in 2003) we changed negative nice levels to be much stronger than they were before in 2.4 (and people were happy about that change), and we also intentionally calibrated the linear timeslice rule so that nice +19 level would be _exactly_ 1 jiffy. To better understand it, the timeslice graph went like this (cheesy ASCII art alert!):

So if someone really wanted to renice a task, +19 would take a much bigger hit than the plain linear rule would give. (The alternative of changing the ABI to extend the priority range was discarded early on.)

So that if someone wanted to really renice tasks, +19 would give a much bigger hit than the normal linear rule would do. (The solution of changing the ABI to extend priorities was discarded early on.)

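This approach worked to some degree for a while, but later, with HZ=1000, 1 jiffy became 1 ms, which meant 0.1% CPU usage, and that felt a bit excessive. It was excessive not because the CPU share is too small, but because it causes rescheduling too often (once per millisecond), which in turn thrashes the cache and so on. Remember that this was long ago, when hardware was weaker and caches were smaller, and people ran number-crunching apps at nice +19.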

This approach worked to some degree for some time, but later on with HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which we felt to be a bit excessive. Excessive _not_ because it's too small of a CPU utilization, but because it causes too frequent (once per millisec) rescheduling. (and would thus trash the cache, etc. Remember, this was long ago when hardware was weaker and caches were smaller, and people were running number crunching apps at nice +19.)

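So for HZ=1000, nice +19 was changed to 5 ms, which felt like the right minimal granularity; this works out to 5% CPU utilization. But nice +19 remained fundamentally sensitive to HZ. There was never a single complaint about nice +19 being too weak in terms of CPU utilization; the only complaints were that it was (still) too strong.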

So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the right minimal granularity - and this translates to 5% CPU utilization. But the fundamental HZ-sensitive property for nice+19 still remained, and we never got a single complaint about nice +19 being too _weak_ in terms of CPU utilization, we only got complaints about it (still) being too _strong_ :-)

In short, we always wanted nice levels to be more consistent, but that was not really feasible within the constraints of HZ and jiffies and their nasty design-level coupling to timeslices and granularity.

To sum it up: we always wanted to make nice levels more consistent, but within the constraints of HZ and jiffies and their nasty design level coupling to timeslices and granularity it was not really viable.

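The second complaint about Linux's nice level support (less frequent, but still recurring) was its asymmetry around the origin, which the timeslice graph in the original shows. More precisely, nice level behavior depended on the absolute nice level as well, whereas the nice API itself is fundamentally relative.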

The second (less frequent but still periodically occurring) complaint about Linux's nice level support was its asymmetry around the origin (which you can see demonstrated in the picture above), or more accurately: the fact that nice level behavior depended on the _absolute_ nice level as well, while the nice API itself is fundamentally "relative":

int nice(int inc)
asmlinkage long sys_nice(int increment)

The first is the glibc API and the second is the syscall API. Note that inc is relative to the current nice level. Tools such as bash's nice command mirror this relative API.

(the first one is the glibc API, the second one is the syscall API.) Note that the 'inc' is relative to the current nice level. Tools like bash's "nice" command mirror this relative API.
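The relative nature of the API is easy to see from user space. The sketch below uses Python's os.nice(), which wraps the same call on Unix systems; the helper name renice_relative is hypothetical. It assumes a Unix-like system where raising the nice value needs no special privileges and the maximum is 19.

```python
import os


def renice_relative(inc):
    """Apply a relative nice increment to the current process.

    Returns (before, after). os.nice(0) adds zero, so it reports the
    current nice value without changing it.
    """
    before = os.nice(0)
    after = os.nice(inc)  # the increment is added to the current value
    return before, after


if __name__ == "__main__":
    before, after = renice_relative(1)
    print(f"nice before: {before}, after: {after}")
```

The caller never names an absolute level: the result depends on whatever nice level the process (and so its parent shell) already had.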

With the old scheduler, if you started, for example, one task at nice +1 and another at nice +2, the CPU split between the two depended on the nice level of the parent shell: the split with the parent at nice -10 differed from the split with the parent at +5 or +10.

With the old scheduler, if you for example started a niced task with +1 and another task with +2, the CPU split between the two tasks would depend on the nice level of the parent shell - if it was at nice -10 the CPU split was different than if it was at +5 or +10.

The third complaint about Linux's nice level support was that negative nice levels were not effective enough, so many people resorted to real-time priorities such as SCHED_FIFO to run audio (and other multimedia) apps. That caused other problems: SCHED_FIFO is not starvation-proof, and a buggy SCHED_FIFO application can lock up the system for good.

A third complaint against Linux's nice level support was that negative nice levels were not 'punchy enough', so lots of people had to resort to run audio (and other multimedia) apps under RT priorities such as SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation proof, and a buggy SCHED_FIFO app can also lock up the system for good.

The new scheduler in Linux kernel v2.6.23 addresses all three kinds of complaints:

The new scheduler in v2.6.23 addresses all three types of complaints:

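To address the first complaint (nice levels not being effective enough), the scheduler was decoupled from the timeslice and HZ concepts (and granularity became a concept separate from nice levels). This made a better and more consistent nice +19 implementation possible: with the new scheduler, nice +19 tasks get 1.5% regardless of HZ, instead of the variable 3%-5%-9% range they got in the old scheduler.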

To address the first complaint (of nice levels being not "punchy" enough), the scheduler was decoupled from 'time slice' and HZ concepts (and granularity was made a separate concept from nice levels) and thus it was possible to implement better and more consistent nice +19 support: with the new scheduler nice +19 tasks get a HZ-independent 1.5%, instead of the variable 3%-5%-9% range they got in the old scheduler.
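As a rough sanity check of the 1.5% figure, the sketch below assumes that CPU time is divided in proportion to per-task weights and that a single nice +19 task competes with a single nice 0 task. The weights 1024 (nice 0) and 15 (nice +19) are the values I understand the kernel's sched_prio_to_weight table to use; they are not stated in this document, and the function name is hypothetical.

```python
# Assumed weights, believed to match the kernel's sched_prio_to_weight
# table: nice 0 -> 1024, nice +19 -> 15.
WEIGHT_NICE_0 = 1024
WEIGHT_NICE_19 = 15


def cpu_share(weight, other_weights):
    """Fraction of CPU a task gets when time is split by weight."""
    return weight / (weight + sum(other_weights))


if __name__ == "__main__":
    share = cpu_share(WEIGHT_NICE_19, [WEIGHT_NICE_0])
    print(f"nice +19 vs one nice 0 task: {share:.2%}")
```

Nothing in this calculation involves HZ, which is the property the paragraph above describes: the share follows from the weights alone.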

To address the second complaint (nice levels not being consistent), the new scheduler makes nice(1) have the same effect on CPU usage regardless of the task's absolute nice level. So on the new scheduler, the CPU split between tasks running at nice +10 and nice +11 is the same as the split between tasks at nice -5 and nice -4 (one gets 55% of the CPU, the other 45%).

To address the second complaint (of nice levels not being consistent), the new scheduler makes nice(1) have the same CPU utilization effect on tasks, regardless of their absolute nice levels. So on the new scheduler, running a nice +10 and a nice +11 task has the same CPU utilization "split" between them as running a nice -5 and a nice -4 task. (one will get 55% of the CPU, the other 45%.)

This is why nice levels were made multiplicative (or exponential): whichever nice level you start from, the relative result is always the same.

That is why nice levels were changed to be "multiplicative" (or exponential) - that way it does not matter which nice level you start out from, the 'relative result' will always be the same.
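The multiplicative rule can be sketched as follows. This is an idealized model, not the kernel's code: it assumes each nice step changes a task's weight by a factor of 1.25 (my assumption, chosen because it reproduces the roughly 55%/45% split quoted above), and the function names are hypothetical.

```python
STEP = 1.25  # assumed weight ratio between adjacent nice levels


def weight(nice):
    """Idealized weight: nice 0 is 1024, each +1 divides by STEP."""
    return 1024 / STEP ** nice


def split(nice_a, nice_b):
    """CPU fractions for two competing tasks, split by weight."""
    wa, wb = weight(nice_a), weight(nice_b)
    return wa / (wa + wb), wb / (wa + wb)


if __name__ == "__main__":
    for pair in ((10, 11), (-5, -4), (0, 1)):
        a, b = split(*pair)
        print(f"nice {pair[0]:+d} vs nice {pair[1]:+d}: {a:.1%} / {b:.1%}")
```

Because only the ratio of the two weights enters the split, and that ratio depends only on the difference between the nice levels, every pair one step apart prints the same split.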

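The third complaint (negative nice levels not being effective enough, pushing audio apps toward the more dangerous SCHED_FIFO scheduling policy) was solved almost automatically by the new scheduler: stronger negative nice levels are a side effect of the recalibrated dynamic range of nice levels.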

The third complaint (of negative nice levels not being "punchy" enough and forcing audio apps to run under the more dangerous SCHED_FIFO scheduling policy) is addressed by the new scheduler almost automatically: stronger negative nice levels are an automatic side-effect of the recalibrated dynamic range of nice levels.
