Get process's progress when reading a file
Hi! Lately in my day job I've been using a Blender plugin to read PC2 animations, animations of about 8 GB in size. In its current state this plugin blocks the whole Blender GUI and can take up to ~10 minutes to load the whole file. Sometimes I could work on something else in parallel, but one time I was stuck waiting for this to load and thought "how difficult can it be to find out how far into reading the file the process is?".
Turns out, thanks to /proc, it is not very difficult to find out from outside a process how far that process has read into a given file.
A simple test
But first, let us prepare a minimalistic sample: a script that starts reading a large file and takes some time doing it. This script will create a sparse file TEST_FILE
and read it slowly (a sparse file is one that is smart enough to not store big gaps of empty content, and so can take little disk space even if it looks much bigger):
#!/usr/bin/env python3
import os
import time

os.system("truncate -s 200k TEST_FILE")  # Create a middle-sized, empty file

# Read the file, slowly
with open("TEST_FILE", "rb") as f:
    while True:
        content = f.read(1024)
        if len(content) == 0:
            break
        print(".", end="", flush=True)  # Give some feedback on the progress
        time.sleep(1)  # We are not in a hurry

os.unlink("TEST_FILE")  # Clean up after ourselves
Let's call it read-slowly.py. If we run it, it will take ~200 seconds to read the file.
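As an aside, the sparse-file trick above is easy to check from Python itself. This is a minimal sketch, assuming a filesystem that supports sparse files (most Linux filesystems do); it truncates a file up to 1 MiB, like `truncate -s` does, and compares the apparent size with the blocks actually allocated:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "SPARSE_FILE")
    with open(path, "wb") as f:
        f.truncate(1024 * 1024)  # like `truncate -s 1M`: extend without writing

    st = os.stat(path)
    apparent = st.st_size          # the size `ls -l` reports
    on_disk = st.st_blocks * 512   # the space actually allocated on disk
    print(apparent, on_disk)
```

The file claims to be 1048576 bytes long, but since nothing was ever written, almost no blocks are allocated for it.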
File reading progress
So, what's the trick to know how much the process has read of the file?
Turns out, as you can see with man 5 proc, the /proc
filesystem exposes the files opened by a process under /proc/[pid]/fd/
, and the position that the process is at in each of those files under /proc/[pid]/fdinfo/
.
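We can see both entries in action from inside a process. This is a small sketch, assuming a Linux /proc: it opens a temporary file through the raw OS interface (so no userspace buffering gets in the way), reads 300 bytes, and then checks what /proc says about that file descriptor:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 1000)
    tmp.flush()

    # Open through the raw OS interface: no userspace buffering involved
    fd = os.open(tmp.name, os.O_RDONLY)
    os.read(fd, 300)  # read the first 300 bytes

    # /proc/self/fd/<fd> is a symlink to the file the descriptor points at
    target = os.readlink(f"/proc/self/fd/{fd}")
    # /proc/self/fdinfo/<fd> reports, among other things, the current offset
    with open(f"/proc/self/fdinfo/{fd}") as info:
        pos = next(int(line.split(":")[1]) for line in info
                   if line.startswith("pos:"))
    os.close(fd)

print(target)  # the path of the open file
print(pos)     # 300
```

The "pos:" field tracks exactly how many bytes the descriptor has advanced, which is all we need for a progress indicator.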
The trick, then, is easy: find the file descriptor for that file in /proc/[pid]/fd
and look up its position in /proc/[pid]/fdinfo
. This script (let's call it read-fd-progress.sh) does both:
#!/usr/bin/env bash
# Check that we have at least a PID
if [ -z "$1" ]; then
    echo "$0 <PID> [<FD>]"
    exit 1
fi

set -eu  # Bail on any error
# See: http://redsymbol.net/articles/unofficial-bash-strict-mode/

PID=$1     # The PID is taken as the first parameter
FD=${2:-}  # Take the FD from the second parameter, else leave it empty

if [ -z "$FD" ]; then
    # Show the user the available file descriptors
    echo "Select a file descriptor:"
    for i in /proc/$PID/fd/*; do
        printf "  %s: " "$(basename "$i")"
        readlink "$i"
    done
    read -p "FD: " FD
fi

FSIZE=$(stat -L /proc/$PID/fd/$FD --printf=%s)  # Read the full file size

while true; do
    # Stop if the process has finished reading the file
    if [ ! -f /proc/$PID/fdinfo/$FD ]; then
        break
    fi

    # Read the position on the file
    x=$(grep pos: "/proc/$PID/fdinfo/$FD" | cut -d: -f2 | tr -d '\t ')

    # Convert that position into a % of the file size.
    #
    # This is not the interesting part, as it's just some hack to
    # have fixed-point-like numbers in the shell. But the trick
    # is to compute a per-10-thousand value, and when printing split
    # it into the integer per-100 part and the decimal per-100 part.
    # ... does that make sense?
    PER10000=$(( x * 10000 / FSIZE ))
    if [ $PER10000 -le 100 ]; then
        # Less than 1%
        printf "  0.%02i%%\n" $PER10000
    else
        # More than 1%
        printf "%3i.%02i%%\n" $(( PER10000 / 100 )) $(( PER10000 % 100 ))
    fi

    # Wait for the next loop
    sleep 1
done
Now, if we call python3 read-slowly.py
on one terminal, and bash read-fd-progress.sh
on another, we can find out the progress of the reading process:
~ - > bash read-fd-progress.sh `pgrep -f read-slowly`
Select a file descriptor:
0: /dev/pts/4
1: /dev/pts/4
2: /dev/pts/4
3: /home/kenkeiras/TEST_FILE
325: /dev/urandom
FD: 3
2.00%
4.00%
4.00%
4.00%
4.00%
6.00%
...
As a final note: you might notice that while the Python script reads 0.5% of the file each second, the bash script only shows "batched" movements of 2% every 4 seconds. This is due to Python's internal read buffering: even though the script asks for 1024 bytes at a time, Python reads a full buffer from the kernel, so the file descriptor's position jumps ahead in buffer-sized steps. It is only noticeable because of the small file size used in this test.
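The buffering effect can be demonstrated directly. This is a sketch assuming a Linux /proc: it reads 1024 bytes from the same file twice, once with Python's default buffered open() and once with buffering=0, and compares the kernel-side offsets. The exact buffered offset depends on the buffer size Python picks (typically 4 or 8 KiB), so only the contrast matters:

```python
import os
import tempfile

def kernel_pos(fd):
    # Ask /proc for the kernel-side offset of this file descriptor
    with open(f"/proc/self/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("pos:"):
                return int(line.split(":")[1])

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"\0" * (1024 * 1024))  # 1 MiB of data
    tmp.flush()

    # Default open(): buffered, so read(1024) fetches a whole buffer
    with open(tmp.name, "rb") as buffered:
        buffered.read(1024)
        pos_buffered = kernel_pos(buffered.fileno())

    # buffering=0: unbuffered, so the kernel offset matches the request
    with open(tmp.name, "rb", buffering=0) as raw:
        raw.read(1024)
        pos_raw = kernel_pos(raw.fileno())

print(pos_buffered, pos_raw)
```

The unbuffered descriptor sits at exactly 1024, while the buffered one has already jumped a full buffer ahead. So if you wanted smoother progress reporting from the outside, opening the file with buffering=0 in the reader would do it, at the cost of more syscalls.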
Hope you learned something!