Kernel/make compilation time: how to compile the Linux kernel faster without hardware upgrades

I thought only a faster CPU and an SSD would help, but I already have a quad-core CPU and it wasn't being maxed out.  The actual tests, though, were performed in a dual-core VMware virtual machine with AMD-V enabled and 128MB of RAM.

make accepts a -j flag that runs multiple jobs in parallel (I call them threads throughout, although make actually starts separate compiler processes).  By specifying 4 threads I was able to cut a from-scratch kernel compile by about 50% (65 minutes vs 31 minutes)!  *Yes, I did run make clean before each compilation too!

*Part of the reason the kernel build is so slow is that I use the slow method of letting the kernel build create my initramfs (not pre-compressed).  That takes something like 10x longer than producing the same thing with a script, which is what I normally do.
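For reference, each measurement in this post boils down to a clean build timed with an explicit job count. Here is a minimal sketch, assuming a POSIX shell and GNU make. The helper names `default_jobs` and `timed_build` are made up for this example, and defaulting to one job per online CPU is only a starting point, not something the tests below establish.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original tests.

# Print the number of online CPUs, falling back to 1 if it cannot be determined.
default_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Clean the tree, then time a full build with the given number of parallel jobs.
# Run this from the top of the kernel source tree.
timed_build() {
    jobs="${1:-$(default_jobs)}"
    make clean && time make -j "$jobs"
}

# Usage (nothing is built until you call it):
#   timed_build 4
```

Always passing a number keeps runs comparable from one test to the next.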

Normal Make (single thread by default):

make

real    65m18.956s

Threaded Make (4 threads):

make -j 4

real    31m57.877s

second run:

real    27m28.745s
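To put a number on the improvement, the `real` times can be converted to seconds and compared. This is a small sketch using awk; `to_seconds` and `percent_saved` are helper names invented for this example, and they assume the `XmY.YYYs` format that `time` printed above.

```shell
#!/bin/sh
# Convert a time-style value such as 65m18.956s into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, part, "m")          # part[1] = minutes, part[2] = seconds plus "s"
        sub(/s$/, "", part[2])       # drop the trailing "s"
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# Percentage by which the second time is shorter than the first.
percent_saved() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

percent_saved 65m18.956s 31m57.877s   # single job vs. 4 jobs, first run
percent_saved 65m18.956s 27m28.745s   # single job vs. 4 jobs, second run
```

The two calls print 51.1 and 57.9, which lines up with the "about 50%" figure quoted earlier.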
 

Threaded Make (8 threads):

*I believe the worse result is likely due to swapping since I only had 128MB of RAM.  Perhaps a lot more RAM could improve things too.

real    58m29.142s
user    33m3.616s
sys    19m13.064s

Results after increasing RAM to 512MB (when compiling, RAM is more important than CPU and disk speed):

real    18m46.933s
user    20m11.776s
sys    6m51.334s

With 1GB of RAM

real    18m38.608s
user    20m31.857s
sys    7m46.141s

I believe the time was disappointing because of the initramfs creation.

With pre-created initramfs linked into kernel:

real    10m47.362s
user    18m24.837s
sys    1m48.095s
 

With 12 threads:

real    10m34.550s

Clearly extra threads no longer help once the CPU is maxed out.  I didn't check during this run, but considering I was often at 80-90% CPU with 8 threads, the CPU is now the bottleneck.  I'm going to increase my cores to 4 and try again.

Going to 12 threads only shaved off 13 seconds.  The crazy thing is that the initramfs alone takes 8 minutes to create!  That's how inefficient the kernel's routine is.  A script creates the same initramfs in about 1 minute or less!


 

Snapshot of top with 8 threads showing high iowait:

You can really see iowait is starting to become a factor (40-60% on both cores on average).  I'm already running a RAID 1 with 7200 RPM 1TB drives.  I believe SSD would make a huge improvement with the iowait.  The CPU io often hits 70-80% but I believe the main culprit is the high iowait.  The system with 8 threads is quite unresponsive to even shell commands and typing.

11:24:56 up 54 days,  1:29,  5 users,  load average: 11.80, 10.36, 6.08
 

top - 11:22:12 up 54 days,  1:27,  5 users,  load average: 11.79, 9.53, 5.04
Tasks: 121 total,   6 running,  97 sleeping,  18 stopped,   0 zombie
 Cpu0 : 18.1% us, 13.3% sy,  0.0% ni,  0.0% id, 67.7% wa,  0.0% hi,  0.9% si
 Cpu1 : 13.3% us, 19.5% sy,  0.0% ni,  0.0% id, 53.6% wa,  1.1% hi, 12.4% si
Mem:    126980k total,   116308k used,    10672k free,      788k buffers
Swap:   377488k total,   107036k used,   270452k free,     5180k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                        
 7970 root      18   0 17596  14m 2976 D 18.2 11.5   0:00.75 cc1                                                                                            
 7932 root      18   0 19220  13m 1612 D 17.8 10.9   0:00.94 cc1                                                                                            
 7923 root      18   0 24152  13m 3140 R 13.4 11.0   0:01.37 cc1                                                                                            
 7938 root      18   0 19448  12m 1612 R 10.8 10.4   0:00.83 cc1                                                                                            
   14 root      10  -5     0    0    0 S  8.0  0.0   5:17.26 kblockd/1                                                                                      
 7687 root      18   0 36760  13m 3036 R  7.6 10.7   0:05.45 cc1                                                                                            
 7905 root      18   0 21188  14m 3064 D  7.0 11.5   0:00.64 cc1                                                                                            
  118 root      10  -5     0    0    0 D  3.5  0.0  15:23.71 kswapd0                                                                                        
 7915 root      18   0 27164  17m 1616 D  2.9 14.1   0:00.69 cc1                 
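The snapshot above shows about 107MB of swap in use with kswapd0 busy, which points at swapping as the source of the iowait. One way to check for this during a build is to read the swap counters from /proc/meminfo. This is a minimal sketch, assuming a Linux system; `swap_used_kb` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print swap in use (kB) from /proc/meminfo-style input on stdin.
swap_used_kb() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print total - free }
    '
}

# On a live Linux system:
#   swap_used_kb < /proc/meminfo

# Example fed with the figures from the top snapshot (377488k total, 270452k free):
printf 'SwapTotal: 377488 kB\nSwapFree: 270452 kB\n' | swap_used_kb
```

If the value keeps climbing while make is running, the job count is probably too high for the amount of RAM.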

With 512MB of RAM instead of 128MB

real    18m46.933s
user    20m11.776s
sys    6m51.334s
 

Things don't feel laggy at all on the system, unlike last time when it had 128MB of RAM.

The load is lower and iowait is virtually non-existent.

 03:52:52 up 4 min,  2 users,  load average: 10.55, 5.14, 2.02

 04:02:46 up 14 min,  2 users,  load average: 1.85, 5.12, 4.13

top - 03:53:11 up 5 min,  2 users,  load average: 10.30, 5.43, 2.18
Tasks:  93 total,  12 running,  81 sleeping,   0 stopped,   0 zombie
 Cpu0 : 90.0% us,  9.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.3% hi,  0.0% si
 Cpu1 : 89.3% us, 10.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.7% si
Mem:    516820k total,   431108k used,    85712k free,   176120k buffers
Swap:   377488k total,        0k used,   377488k free,   111164k cached

top - 04:02:42 up 14 min,  2 users,  load average: 1.84, 5.18, 4.14
Tasks:  55 total,   2 running,  53 sleeping,   0 stopped,   0 zombie
 Cpu0 : 10.0% us, 40.2% sy,  0.0% ni, 49.8% id,  0.0% wa,  0.0% hi,  0.0% si
 Cpu1 : 11.6% us, 37.9% sy,  0.0% ni, 49.5% id,  1.0% wa,  0.0% hi,  0.0% si
 

Free Memory gets low sometimes:

            total       used       free     shared    buffers     cached
Mem:           504        486         18          0          4        427
 

With 1GB

Clearly 1GB is the sweet spot.  I'm tempted to turn the threads up from 8 to at least 12 or 16.

We can also see that CPU usage gets higher, so CPU is a factor, and that iowait when compiling is usually caused by swapping because of too little RAM.

 Cpu0 : 92.0% us,  8.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
 Cpu1 : 93.4% us,  6.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si

             total       used       free     shared    buffers     cached
Mem:          1012        408        603          0         37        291

 

 04:24:24 up 13 min,  2 users,  load average: 2.43, 6.33, 4.47

 04:33:13 up 21 min,  2 users,  load average: 10.58, 7.00, 5.06

Even with the high load the system is very responsive, unlike at 128MB of RAM

Free Mem does get low still:

             total       used       free     shared    buffers     cached
Mem:          1012        986         26          0          7        900
-/+ buffers/cache:         77        934
Swap:          368          0        368 

*Note that if you specify -j with no number, make puts no limit on how many jobs it starts at once.  In my experience this basically causes gcc to crash/fail.  Perhaps with more memory this wouldn't have happened.  I'm not sure exactly what caused it, other than my system being unable to handle unlimited threads.

You'll get errors like this if specifying unlimited threads:


  CC      kernel/time/timekeeping.o
gcc: gcc: Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.

Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.

make[2]: make[1]: *** [arch/x86/kernel/setup.o] Killed
*** [fs/file_table.o] Killed
gcc: gcc: make[2]: *** [arch/x86/kernel/x86_init.o] Killed
make[1]: *** [fs/super.o] Killed
gcc: gcc: Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
Internal error: Killed (program cc1)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see
<URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
make[2]:
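"Killed (program cc1)" generally means the compiler process received SIGKILL. On a machine this short of RAM, the kernel's out-of-memory killer is the likely sender, though that is an assumption and the output above doesn't confirm it. A safer pattern is to always give -j a number and, optionally, add a load ceiling with -l (GNU make's --load-average option, which holds back new jobs while the load average is at or above the given value). This is a minimal sketch; `bounded_make` is a made-up helper, and the MAKE override is there only so the command can be swapped out or inspected.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to run make without an explicit job count,
# and cap new jobs by load average as well.
bounded_make() {
    jobs="$1"
    max_load="$2"
    case "$jobs" in
        ''|*[!0-9]*)
            echo "bounded_make: job count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$jobs" -lt 1 ] || [ -z "$max_load" ]; then
        echo "usage: bounded_make JOBS MAX_LOAD [make args...]" >&2
        return 1
    fi
    shift 2
    # MAKE can be overridden (for example MAKE=echo) to see the command without building.
    "${MAKE:-make}" -j "$jobs" -l "$max_load" "$@"
}

# Usage (nothing is built until you call it):
#   bounded_make 8 8
```

The load ceiling is a second safety net: even if the job count is too optimistic, make stops starting new compilers once the system is already overloaded.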

 

Conclusion/What I Learned

The single most important factor for faster compiling is RAM, and 1GB+ is preferable.  The first wall I hit was high iowait due to insufficient RAM and swapping.  With more RAM the iowait virtually disappears and you can see the CPUs getting loaded to 80-90%.

Basically, running make with more threads cuts the compile time substantially (roughly in half going from 1 to 4 threads here), but only as long as you have enough RAM to support those threads.  After that, the next bottleneck is CPU processing power, and the key at that point is to add more cores.  Having a quad-core or even 6-core CPU with lots of RAM would give you the best performance and faster compiling.  I feel disk speed has little impact when compiling, so an SSD wouldn't make much of a difference.
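The "enough RAM to support those threads" rule can be sketched as a simple cap on the job count. The helper name `pick_jobs` and the 32MB-per-job figure are assumptions made purely for illustration; the per-job footprint was not measured in these tests and depends on the compiler and the source being built.

```shell
#!/bin/sh
# Hypothetical rule of thumb: take the job count you would like to use and
# reduce it to what fits in RAM at an assumed per-job memory footprint.
pick_jobs() {
    wanted="$1"      # job count you would like to use
    mem_mb="$2"      # RAM available to the build, in MB
    per_job_mb="$3"  # assumed peak memory of one compiler job, in MB
    by_mem=$(( mem_mb / per_job_mb ))
    jobs=$wanted
    if [ "$by_mem" -lt "$jobs" ]; then
        jobs=$by_mem
    fi
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

pick_jobs 8 128 32    # 128MB of RAM: capped below the 8 jobs requested
pick_jobs 8 1024 32   # 1GB of RAM: all 8 jobs fit
```

With those assumed numbers the first call prints 4 and the second prints 8, which matches the pattern seen above: 8 threads swapped heavily at 128MB but ran fine at 1GB.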

