In this lab, you will explore how different file access patterns and block sizes affect performance. The goal is to understand how the operating system handles I/O under the hood, and how sequential versus random access, as well as block size, impact the time it takes to read data from a file. You will implement a simple benchmark in C, measure performance, and analyze the results.
Once you have logged in to Edlab, you can clone this repo using
git clone https://github.com/umass-cs-377/file-io-performance-lab.git
Then use cd to change into the directory you just cloned:
cd file-io-performance-lab
You will begin by creating a large test file. On your terminal, use the following command to generate a 100 MB binary file filled with random data:
dd if=/dev/urandom of=testfile.bin bs=1M count=100
This will generate a file named testfile.bin in your working directory. Use ls -lh to verify that the file was created and is the expected size (100 MB). Don’t forget to delete it after you finish this lab.
In this step, you will run a quick experiment to see how the block size you use to read a file affects performance.
But what is block size? The block size here means how many bytes your program requests from the file in a single read. When you call read(fd, buf, block_size), the operating system copies up to that many bytes of file data into your buffer before returning.
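For example, a minimal sketch of such a read loop might look like this (it reads the testfile.bin created above with a fixed 4,096-byte block size; error handling is kept to a minimum):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("testfile.bin", O_RDONLY);  /* file created with dd above */
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];  /* block size: bytes requested per read() call */
    ssize_t n;
    long total = 0;

    /* Each read() returns up to sizeof(buf) bytes, or 0 at end of file. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    printf("read %ld bytes\n", total);
    close(fd);
    return 0;
}
```

Your benchmark will do essentially this, but with the block size taken from the command line and a timer around the loop.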
Before you start, let’s make a prediction: do you expect larger block sizes to make the read faster or slower? Write your predictions down in a table like this:
| Block Size (bytes) | Predicted Speed (Fast/Slow) | Actual Time (ms) | Observation |
|---|---|---|---|
| 1,024 | | | |
| 4,096 | | | |
| 8,192 | | | |
| 16,384 | | | |
Now, test your prediction. Run your program in sequential mode (that is, reading the file from beginning to end, one block at a time; a sketch of one possible implementation appears after the commands below) several times with different block sizes:
./benchmark testfile.bin 1024 sequential
./benchmark testfile.bin 4096 sequential
./benchmark testfile.bin 8192 sequential
./benchmark testfile.bin 16384 sequential
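If you are not sure where to begin, here is one possible sketch of the sequential path of such a benchmark. It assumes the argument order used in the commands above (file name, block size, mode) and times the read loop with clock_gettime; the program structure and output format are only suggestions, not the required implementation:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    /* Expected usage: ./benchmark <file> <block_size> sequential */
    if (argc != 4 || strcmp(argv[3], "sequential") != 0) {
        fprintf(stderr, "usage: %s <file> <block_size> sequential\n", argv[0]);
        return 1;
    }

    size_t block_size = (size_t)atol(argv[2]);
    char *buf = malloc(block_size);
    int fd = open(argv[1], O_RDONLY);
    if (block_size == 0 || buf == NULL || fd < 0) {
        perror("setup");
        return 1;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* Sequential mode: read the file front to back, one block at a time. */
    ssize_t n;
    long total = 0;
    while ((n = read(fd, buf, block_size)) > 0)
        total += n;

    clock_gettime(CLOCK_MONOTONIC, &end);
    double ms = (end.tv_sec - start.tv_sec) * 1000.0
              + (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("read %ld bytes with block size %zu in %.2f ms\n",
           total, block_size, ms);

    free(buf);
    close(fd);
    return 0;
}
```

Compile it with something like gcc -O2 -o benchmark benchmark.c before running the commands above. Keep in mind that the operating system caches file data in memory, so repeated runs over the same file may be faster than the first run.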