Optimize fstab to Improve R/W Performance of Hard-drives
Wow what a difference! If you have read a couple of my other articles you may know about my adventures with file sharing on a home network.
Taking a homebrew approach to setting all this up has proven… interesting.
Not the least of which was when I figured out why syncing to the RAID on my file server was so slow. In one sentence: I needed to change sync to async in fstab. There! Article finished. Just kidding; please keep reading to learn more.
For those who may not know, fstab is where Linux goes to get mounting options for disks upon start-up. Think of it like a cheat-sheet for those really tough problems on a final exam. With the proper settings your disks will perform well, but as I found out, it only takes one parameter to make everything go sideways!
The async option in fstab is the opposite of sync; sync is rarely used, and async is the default.
The sync option means all changes to the filesystem are immediately flushed to disk. For mechanical drives this leads to a huge slow-down, as the system has to move the disk heads to the right position and wait for the operation to complete.
With async, the system buffers the write operation and optimizes the actual writes; instead of being blocked, the process continues to run.
See this post for details.
Wow what a difference that made!
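To make that concrete, an fstab entry for a RAID mount might look something like the line below; the device, mount point, and filesystem are placeholders rather than my actual configuration, and because async is the default it is really the absence of sync in the options list that matters:
/dev/md0    /mnt/raid    ext4    defaults,async    0    2
After editing fstab you can apply the change without a reboot by remounting, e.g. sudo mount -o remount /mnt/raid.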
I found it very useful to open both VNC and SSH sessions to debug the remote machine during the steps below.
Grep to Find a Process
A useful command for troubleshooting here:
ps fauxww | grep -A 1 '[r]sync'
I used the above grep command to see if there are any rsync processes running. Yes, at this point it is worth mentioning that I use rsync to automate copying and related operations between my machines. Some basic copying from one machine to another will also work during troubleshooting, but I find it easier to debug when I have a separate process running to deal with the copying. Think of it like trying to run a race while figuring out what is going wrong with your gait; I do not know about the rest of you, but it is probably easier to watch someone else run and critique their form than to try to watch and critique yourself. In short, if everything is going well with your automated copying and syncing operations there should be multiple rsync processes, as per this post.
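For context, the automated jobs I am talking about look roughly like the command below; the paths and hostname are placeholders and not my actual setup, and the --delete flag is the delete feature I mention again later:
rsync -avh --delete /home/<user>/Documents/ <user>@fileserver:/mnt/raid/Documents/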
Disk Monitoring
Another useful tool when figuring out read/write transfer speeds:
iotop
The iostat and iotop commands (run as SuperUser) help monitor system input/output device loading by observing the time the devices are active in relation to their average transfer rates. They are sometimes used to evaluate the balance of activity between disks, as per this post.
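As a starting point, I find invocations along these lines useful; the five-second interval is just an arbitrary choice:
sudo iotop -o
iostat -dx 5
The -o flag tells iotop to show only processes that are actually doing I/O, and -dx gives iostat extended per-device statistics refreshed every five seconds.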
Finding a Folder
Okay, so this is not strictly related to the problem at hand; call it a close cousin to all of the other troubleshooting bits and pieces. Say what you will, but I find it helpful to be able to quickly locate and confirm various items in sub-folders, particularly since my copying operations via rsync have a delete feature each time they are run. To find a folder named Documents in your home directory, run any of the following:
find $HOME -type d -name 'Documents'
OR
find ~ -type d -name 'Documents'
OR
find /home/<user>/ -type d -name 'Documents'
See here for details.
Kill Process
There are a number of ways to kill a process if you know its name. Again, this is not strictly tied to the problem at hand, but I did find it helpful while I was going through my other troubleshooting steps. Here are a couple of different ways you can accomplish this. We are going to assume that the process we are trying to kill is named irssi:
kill $(pgrep irssi)
killall -v irssi
pkill irssi
kill `ps -ef | grep irssi | grep -v grep | awk '{print $2}'`
Refer here for details.
Kernel Tracing
Back to something a tad more relevant to our problem at hand: the utilities for kernel tracing are very useful when hunting down processes and services that are consuming CPU and IO!
In my quest to find what was causing such slow read/write speeds, I posed a number of questions as a process of elimination.
Was it the stripe size for the RAID array that was causing the slow-down? Nope; this did not have an effect.
What about the commit time? The default is five seconds before jbd2 commits the journal.
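For anyone wanting to try the same experiment, the commit interval is just another mount option, so the change amounted to an fstab line along these lines; as before, the device and mount point are placeholders:
/dev/md0    /mnt/raid    ext4    defaults,commit=10    0    2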
Still a lot of excessive flushing after specifying commit=10 (10 seconds) in fstab!
This was the post that ultimately got me pointed in the right direction!
In order to capture events you need to be logged in as SuperUser; see here for details:
sudo -s
Enable all trace events for the jbd2 subsystem:
echo 'jbd2:*' > /sys/kernel/debug/tracing/set_event
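You can confirm which events were registered by reading the same file back:
cat /sys/kernel/debug/tracing/set_event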
Cancel all event tracing:
echo > /sys/kernel/debug/tracing/set_event
Enable the ext4_sync_file_enter event:
echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable
Send trace output to a non-SuperUser log file:
cat /sys/kernel/debug/tracing/trace > /home/<user>/out.txt
or
cat /sys/kernel/debug/tracing/trace_pipe > /home/<user>/out.txt
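Once you have a capture, a quick sanity check is to count how many flush-related events landed in the output file; out.txt here is the file created in the previous step:
grep -c ext4_sync_file_enter /home/<user>/out.txt
grep -c jbd2 /home/<user>/out.txt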
Taking a closer look at what was captured with the ext4_sync_file_enter event, I determined that there was an absolute crap-ton of flushing occurring on a regular basis. Bingo! This was my culprit. The excessive flushing was due to the sync setting for my RAID in the fstab file. While possibly good in some situations, it was preferable (at least for me) to change it to async. Any downside of choosing this option was, from my perspective, far outweighed by the fact that my disk I/O improved by a significant margin. All that being said, you will want to do a bit more reading before deciding if this option is right for you, too.
Lots of troubleshooting with different Linux services and utilities, but it was worth it; rsync from one machine to another was crawling along at a couple of kB/s and is now clocking in at 60 MB/s!
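If you want to watch the transfer rate yourself, rsync can report it directly while it runs; the paths and host below are placeholders, and --info=progress2 requires rsync 3.1 or newer:
rsync -avh --info=progress2 /home/<user>/Documents/ <user>@fileserver:/mnt/raid/Documents/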
Postscript
In my search for possible causes I pondered disk fragmentation. It turns out the ext4 filesystem should not need to be defragmented; see this post for details. This whole article was admittedly a bit fragmented itself, as there were a number of unrelated services and utilities brought to bear to get to the root of my disk I/O problem. What experiences have you had in making your disks read and write faster?