Purpose

In this lab, you will explore how different file access patterns and block sizes affect performance. The goal is to understand how the operating system handles I/O under the hood, and how sequential versus random access, as well as block size, impact the time it takes to read data from a file. You will implement a simple benchmark in C, measure performance, and analyze the results.

Setup

Once you have logged in to Edlab, you can clone this repo using

git clone https://github.com/umass-cs-377/file-io-performance-lab.git

Then use cd to enter the directory you just cloned:

cd file-io-performance-lab

Step 1: Create a (huge) file

You will begin by creating a large test file. In your terminal, use the following command to generate a 100 MB binary file filled with random data:

dd if=/dev/urandom of=testfile.bin bs=1M count=100

This will generate a file named testfile.bin in your working directory. Use ls -lh to confirm that the file exists and is the expected size. Don't forget to delete it after you finish this lab.
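If you would rather confirm the file size from C (your benchmark will need to open this file anyway), here is a minimal sketch using stat(). The file name testfile.bin matches the dd command above; everything else is illustrative and not part of the required benchmark:

```c
/* size_check.c -- a minimal sketch (not part of the required benchmark)
 * that verifies testfile.bin exists and reports its size in bytes. */
#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat st;
    if (stat("testfile.bin", &st) != 0) {
        perror("stat");   /* file missing or unreadable */
        return 1;
    }
    /* dd with bs=1M count=100 should produce 104857600 bytes */
    printf("testfile.bin is %lld bytes\n", (long long)st.st_size);
    return 0;
}
```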

Step 2: Reads with Different Block Sizes

In this step, we will run a quick experiment to see how the block size you use to read a file affects performance.

But what is block size? Here, block size means the number of bytes your program requests from the file in a single read. When you call read(fd, buf, block_size), the operating system copies up to that many bytes from the file into your buffer before returning.
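To make this concrete, here is a minimal sketch of a read loop parameterized by block size. It assumes fd is a descriptor from open() and that block_size is the value under test; the function name read_whole_file is illustrative:

```c
/* A minimal sketch of reading an entire file in block_size chunks.
 * Assumes fd came from open() and block_size is the chunk size
 * under test (e.g., 4096). */
#include <stdlib.h>
#include <unistd.h>

long read_whole_file(int fd, size_t block_size) {
    char *buf = malloc(block_size);
    if (buf == NULL)
        return -1;
    long total = 0;
    ssize_t n;
    /* Each read() asks the OS for up to block_size bytes; a smaller
     * block size means more system calls to cover the same file. */
    while ((n = read(fd, buf, block_size)) > 0)
        total += n;
    free(buf);
    return total; /* total bytes read; read() error handling elided */
}
```

Notice that with a 100 MB file, a 1024-byte block size means roughly 100,000 read() calls, while a 16384-byte block size needs about one sixteenth as many.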

Before you start, make a prediction: which block sizes do you expect to be faster, and why? Write your predictions down in a table like this:

| Block Size (bytes) | Predicted Speed (Fast/Slow) | Actual Time (ms) | Observation |
|--------------------|-----------------------------|------------------|-------------|
| 1024               |                             |                  |             |
| 4096               |                             |                  |             |
| 8192               |                             |                  |             |
| 16384              |                             |                  |             |

Now, test your prediction. Run your program in sequential mode (we will explain what this means later) several times with different block sizes:

./benchmark testfile.bin 1024 sequential
./benchmark testfile.bin 4096 sequential
./benchmark testfile.bin 8192 sequential
./benchmark testfile.bin 16384 sequential
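If you want to sanity-check your own timing code, here is one way the sequential path of such a benchmark could be timed with clock_gettime(). This is a hedged sketch under the assumption that the program takes a file name and a block size on the command line; the structure is illustrative, not a required design:

```c
/* A sketch of timing one sequential read pass over a file.
 * Illustrative usage: ./a.out testfile.bin 4096 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <block_size>\n", argv[0]);
        return 1;
    }
    size_t block_size = (size_t)atol(argv[2]);
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char *buf = malloc(block_size);
    if (buf == NULL)
        return 1;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* Sequential mode: read the file front to back in one pass. */
    ssize_t n;
    while ((n = read(fd, buf, block_size)) > 0)
        ;

    clock_gettime(CLOCK_MONOTONIC, &end);
    double ms = (end.tv_sec - start.tv_sec) * 1000.0 +
                (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("elapsed: %.2f ms\n", ms);

    free(buf);
    close(fd);
    return 0;
}
```

One thing to keep in mind when comparing runs: after the first pass, the file may be served from the OS page cache rather than the disk, so repeated runs can be much faster than the initial one.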