I thought only a faster CPU and an SSD would help, but I already have a quad-core CPU and it wasn't being maxed out. The actual tests, though, were performed on an AMD-V enabled, dual-core VMware container with 128MB of RAM.
make has a flag, -j, that runs multiple jobs in parallel (I call them threads below, although make actually starts separate processes). By specifying 4 threads I was able to cut the whole from-scratch kernel compilation time by about 50% (65 minutes vs 31 minutes)! *Yes, I did run make clean before each compilation too!
*Part of the reason the kernel build is slow is that I used the slow method of letting the kernel build create my initramfs (not pre-compressed). That takes something like 10x longer than creating the same initramfs with a script, which is what I normally do.
Normal Make (single thread by default):
make
real 65m18.956s
Threaded Make (4 threads):
make -j 4
real 31m57.877s
second run:
real 27m28.745s
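To put a number on the improvement, the "real" times above can be converted to seconds and divided. This is only a minimal sketch; to_seconds and speedup are made-up helper names for illustration, not part of make or time.

```shell
#!/bin/sh
# Hypothetical helpers (names are my own, not from any tool).

# to_seconds VALUE
# Convert a `time`-style value such as "65m18.956s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup BASELINE FASTER
# Print how many times faster the second run was than the first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}
```

For the first 4-thread run, speedup 65m18.956s 31m57.877s prints 2.04, and for the second run (27m28.745s) it prints 2.38.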
Threaded Make (8 threads):
*I believe the worse result is likely due to swapping since I only had 128MB of RAM. Perhaps a lot more RAM could improve things too.
real 58m29.142s
user 33m3.616s
sys 19m13.064s
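Since the 8-thread result looks like it was limited by memory rather than CPU, one way to avoid this is to cap the job count by available RAM as well as by cores. The following is a rough sketch under my own assumptions: pick_jobs is a made-up helper, and the per-job memory figure is a guess you would have to tune for your compiler and source tree.

```shell
#!/bin/sh
# pick_jobs CORES AVAILABLE_MB PER_JOB_MB
# Hypothetical helper: print the smaller of CORES and the number of
# compile jobs that fit in AVAILABLE_MB, but never less than 1.
pick_jobs() {
    cores=$1
    avail_mb=$2
    per_job_mb=$3   # assumed memory per compiler process; a guess, tune it
    n=$(( avail_mb / per_job_mb ))
    if [ "$n" -gt "$cores" ]; then
        n=$cores
    fi
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example (not run here), assuming nproc and a MemAvailable line in
# /proc/meminfo are present on the system, and 150 MB per job as a guess:
#   avail=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
#   make -j "$(pick_jobs "$(nproc)" "$avail" 150)"
```

With 4 cores, 128MB available and an assumed 64MB per job, pick_jobs 4 128 64 prints 2.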
After increasing RAM to 512MB, here are the results (when compiling, RAM is more important than CPU and disk speed):
real 18m46.933s
user 20m11.776s
sys 6m51.334s
With 1GB of RAM
real 18m38.608s
user 20m31.857s
sys 7m46.141s
I believe the time was disappointing because of the initramfs creation.
With pre-created initramfs linked into kernel:
real 10m47.362s
user 18m24.837s
sys 1m48.095s
With 12 threads:
real 10m34.550s
Clearly, more threads no longer help once the CPU is maxed out. I didn't check CPU usage during this run, but since I was often at 80-90% CPU with 8 threads, the CPU is now the bottleneck. I'm going to increase my cores to 4 and try again.
Going to 12 threads only shaved off 13 seconds, but the crazy thing is that the initramfs alone takes about 8 minutes to create during the kernel build! That's how inefficient the routine from the kernel is. A script creates the same initramfs in about 1 minute or less!
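For reference, creating an initramfs by script usually comes down to packing a directory tree into a newc-format cpio archive. This is a generic sketch of that well-known pipeline, not the actual script used for these tests; build_initramfs is a made-up name, and it assumes GNU find, GNU cpio and gzip are installed. Whether your kernel version accepts a compressed archive in CONFIG_INITRAMFS_SOURCE is worth checking before relying on it.

```shell
#!/bin/sh
# build_initramfs ROOT_DIR OUTPUT_FILE
# Hypothetical helper: pack ROOT_DIR into a gzip-compressed cpio archive
# in the "newc" format that the kernel expects for an initramfs.
# OUTPUT_FILE should live outside ROOT_DIR so it is not packed into itself.
build_initramfs() {
    root_dir=$1
    out=$2
    # The subshell keeps the cd from affecting the caller; the redirect
    # is resolved in the caller's directory, not inside ROOT_DIR.
    ( cd "$root_dir" && find . -print0 \
        | cpio --null -o -H newc --quiet \
        | gzip -9 ) > "$out"
}

# Example (not run here, paths are placeholders):
#   build_initramfs /path/to/initramfs-root /path/to/initramfs.cpio.gz
```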
Snapshot of top with 8 threads showing high iowait:
You can really see iowait starting to become a factor (40-60% on both cores on average). I'm already running RAID 1 on 7200 RPM 1TB drives. I believe an SSD would make a huge improvement to the iowait. CPU usage often hits 70-80%, but I believe the main culprit is the high iowait. With 8 threads the system is quite unresponsive, even to shell commands and typing.
11:24:56 up 54 days, 1:29, 5 users, load average: 11.80, 10.36, 6.08
top - 11:22:12 up 54 days, 1:27, 5 users, load average: 11.79, 9.53, 5.04
Tasks: 121 total, 6 running, 97 sleeping, 18 stopped, 0 zombie
Cpu0 : 18.1% us, 13.3% sy, 0.0% ni, 0.0% id, 67.7% wa, 0.0% hi, 0.9% si
Cpu1 : 13.3% us, 19.5% sy, 0.0% ni, 0.0% id, 53.6% wa, 1.1% hi, 12.4% si
Mem: 126980k total, 116308k used, 10672k free, 788k buffers
Swap: 377488k total, 107036k used, 270452k free, 5180k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7970 root 18 0 17596 14m 2976 D 18.2 11.5 0:00.75 cc1
7932 root 18 0 19220 13m 1612 D 17.8 10.9 0:00.94 cc1
7923 root 18 0 24152 13m 3140 R 13.4 11.0 0:01.37 cc1
7938 root 18 0 19448 12m 1612 R 10.8 10.4 0:00.83 cc1
14 root 10 -5 0 0 0 S 8.0 0.0 5:17.26 kblockd/1
7687 root 18 0 36760 13m 3036 R 7.6 10.7 0:05.45 cc1
7905 root 18 0 21188 14m 3064 D 7.0 11.5 0:00.64 cc1
118 root 10 -5 0 0 0 D 3.5 0.0 15:23.71 kswapd0
7915 root 18 0 27164 17m 1616 D 2.9 14.1 0:00.69 cc1
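If you want the iowait figure as a single number instead of reading it off top, it can be computed from two samples of the aggregate "cpu" line in /proc/stat. This is a small sketch; iowait_pct is a made-up helper name, and it assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# iowait_pct BEFORE_LINE AFTER_LINE
# Hypothetical helper: given two samples of the "cpu" line from
# /proc/stat, print the percentage of time spent in iowait between them.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6      # field 6 is iowait
        }
        END {
            dt = t[2] - t[1]
            if (dt > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Example (not run here): sample 5 seconds apart during a build.
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```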
With 512MB of RAM instead of 128MB
real 18m46.933s
user 20m11.776s
sys 6m51.334s
Things don't feel laggy at all on the system, unlike last time when it had 128MB of RAM.
The load is lower and iowait is virtually non-existent.
03:52:52 up 4 min, 2 users, load average: 10.55, 5.14, 2.02
04:02:46 up 14 min, 2 users, load average: 1.85, 5.12, 4.13
top - 03:53:11 up 5 min, 2 users, load average: 10.30, 5.43, 2.18
Tasks: 93 total, 12 running, 81 sleeping, 0 stopped, 0 zombie
Cpu0 : 90.0% us, 9.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.3% hi, 0.0% si
Cpu1 : 89.3% us, 10.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.7% si
Mem: 516820k total, 431108k used, 85712k free, 176120k buffers
Swap: 377488k total, 0k used, 377488k free, 111164k cached
top - 04:02:42 up 14 min, 2 users, load average: 1.84, 5.18, 4.14
Tasks: 55 total, 2 running, 53 sleeping, 0 stopped, 0 zombie
Cpu0 : 10.0% us, 40.2% sy, 0.0% ni, 49.8% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 11.6% us, 37.9% sy, 0.0% ni, 49.5% id, 1.0% wa, 0.0% hi, 0.0% si
Free Memory gets low sometimes:
total used free shared buffers cached
Mem: 504 486 18 0 4 427
With 1GB
Clearly 1GB is the sweet spot. I'm tempted to turn the threads up from 8 to at least 12 or 16.
We can also see that CPU usage gets higher, so the CPU is a factor, and that iowait when compiling is usually caused by swapping because of too little RAM.
Cpu0 : 92.0% us, 8.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 93.4% us, 6.6% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
total used free shared buffers cached
Mem: 1012 408 603 0 37 291
04:24:24 up 13 min, 2 users, load average: 2.43, 6.33, 4.47
04:33:13 up 21 min, 2 users, load average: 10.58, 7.00, 5.06
Even with the high load the system is very responsive, unlike at 128MB of RAM
Free Mem does get low still:
total used free shared buffers cached
Mem: 1012 986 26 0 7 900
-/+ buffers/cache: 77 934
Swap: 368 0 368
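*Note that if you specify -j with no number, make puts no limit on the number of threads, and in my experience that basically causes gcc to crash/fail. The "Killed" in the messages below suggests the cc1 processes were killed from outside, most likely by the kernel when it ran out of memory, so perhaps with more memory this wouldn't have happened. I'm not sure what caused it other than my system being unable to handle unlimited threads.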
You'll get errors like this if specifying unlimited threads:
CC kernel/time/timekeeping.o
gcc: gcc: Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
make[2]: make[1]: *** [arch/x86/kernel/setup.o] Killed
*** [fs/file_table.o] Killed
gcc: gcc: make[2]: *** [arch/x86/kernel/x86_init.o] Killed
make[1]: *** [fs/super.o] Killed
gcc: gcc: Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
make[2]:
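Rather than using an unlimited -j, one option is to start with a fixed job count and halve it whenever the build fails. The following is only a sketch of that idea; build_with_backoff and used_jobs are names I made up, and it cannot tell an out-of-memory kill from a genuine compile error, so a real error will simply fail again at every job count.

```shell
#!/bin/sh
# build_with_backoff START_JOBS BUILD_CMD [ARGS...]
# Hypothetical helper: run BUILD_CMD with "-j N", halving N after each
# failure. On success, the job count that worked is left in $used_jobs.
build_with_backoff() {
    used_jobs=$1
    shift
    while [ "$used_jobs" -ge 1 ]; do
        if "$@" -j "$used_jobs"; then
            echo "build succeeded with -j $used_jobs" >&2
            return 0
        fi
        echo "build failed with -j $used_jobs, halving" >&2
        used_jobs=$(( used_jobs / 2 ))
    done
    return 1
}

# Example (not run here): build_with_backoff 8 make
```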
The single most important factor for faster compiling is RAM, and 1GB+ is preferable. The first wall I hit was high iowait due to insufficient RAM and swapping. With more RAM the iowait virtually disappears and you can see the CPUs getting loaded to 80-90%.
Basically, using make with more threads decreases the compile time substantially (4 threads roughly halved it in my tests), but only so long as you have enough RAM to support those threads. The next bottleneck then becomes CPU processing power, and the key at that point is to add more cores. Having a quad-core or even a 6-core CPU with lots of RAM would give you the best performance and faster compiling. I feel disk speed has little impact when compiling once there is enough RAM, so an SSD wouldn't make much of a difference.