{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/operatingsystems/file-systems","result":{"data":{"markdownRemark":{"id":"9bd1dd73-3fbb-55d1-82b7-d3565c4260a6","html":"<h1 id=\"what-does-a-storage-system-need\" style=\"position:relative;\"><a href=\"#what-does-a-storage-system-need\" aria-label=\"what does a storage system need permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What does a storage system need?</h1>\n<ol>\n<li><strong>Reliability:</strong> User data should be safely stored even if a machine’s power is turned off or the OS crashes.</li>\n<li><strong>Large Capacity and Low Cost</strong> It takes 350 MB for an hour of music, 1 GB for 300 photos, and 4GB to store an hour long video. 
Many individuals own 1 TB or more of storage for personal files.</li>\n<li><strong>High performance:</strong> Programs must have quick access to data to satisfy users (e.g., a computer starting up, YouTube serving video, Amazon processing orders).</li>\n<li><strong>Named data:</strong> Data must be organized for future retrieval; the easiest way is to name the file.</li>\n<li><strong>Controlled Sharing:</strong> Users want to share stored data, but only with the right people.</li>\n</ol>\n<h2 id=\"how-does-a-file-system-help\" style=\"position:relative;\"><a href=\"#how-does-a-file-system-help\" aria-label=\"how does a file system help permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>How does a file system help?</h2>\n<p>Non-volatile storage is much slower than DRAM, and it must be accessed in coarse-grained units of 512 bytes or more at a time.</p>\n<ul>\n<li>\n<p>Goal: High performance</p>\n<ul>\n<li>Physical Characteristic: Large cost to initiate I/O access</li>\n<li>Organize data placement with files, directories, the free space bitmap (free map), and placement heuristics so that storage is accessed in large sequential units</li>\n<li>Use caching to avoid accessing persistent storage</li>\n</ul>\n</li>\n<li>\n<p>Goal: Named data</p>\n<ul>\n<li>Physical Characteristic: Storage has large capacity, survives crashes, and is shared across programs</li>\n<li>Support files and directories with meaningful names</li>\n</ul>\n</li>\n<li>\n<p>Goal: Controlled Sharing</p>\n<ul>\n<li>Physical Characteristic: 
Device stores many users’ data</li>\n<li>Include access control metadata with files</li>\n</ul>\n</li>\n<li>\n<p>Goal: Reliable Storage</p>\n<ul>\n<li>Physical Characteristic: Crashes can occur during updates. Storage devices can fail. Flash memory cells can wear out</li>\n<li>Use transactions to make a set of updates atomic</li>\n<li>Use redundancy to detect and correct failures</li>\n<li>Move data to different storage locations to evenly wear out the disk</li>\n</ul>\n</li>\n</ul>\n<h2 id=\"why-is-it-important-to-know-how-file-systems-work-even-if-im-not-building-a-file-system\" style=\"position:relative;\"><a href=\"#why-is-it-important-to-know-how-file-systems-work-even-if-im-not-building-a-file-system\" aria-label=\"why is it important to know how file systems work even if im not building a file system permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Why is it important to know how file systems work even if I’m not building a file system?</h2>\n<ul>\n<li><strong>Performance:</strong> Even though file systems allow existing bytes in a file to be overwritten, inserting new bytes may require rewriting the entire file. 
Thus, autosaving may take as much as a second.</li>\n<li><strong>Corrupt files:</strong> When overwriting the existing file with updated data, an untimely crash can leave the file in an inconsistent state, containing a combination of old and new versions.</li>\n<li><strong>Lost files:</strong> If, instead of overwriting the document file, the application writes to a new file, then deletes the original file, and then moves the new file to the original file location, an untimely crash can leave the system with no copies of the document at all.</li>\n</ul>\n<h1 id=\"the-file-system-abstraction\" style=\"position:relative;\"><a href=\"#the-file-system-abstraction\" aria-label=\"the file system abstraction permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>The File System Abstraction</h1>\n<p>An OS abstraction that provides persistent named data.</p>\n<ul>\n<li><strong>File:</strong> A named collection of data in a file system. It can be of arbitrary size and consists of metadata and data. From the point of view of the file system, a file’s data is just an array of untyped bytes.</li>\n<li><strong>Directory:</strong> Provides names for files. Contains a list of human-readable names and a mapping from each name to a specific file or directory.</li>\n<li><strong>Hard link:</strong> Mapping between a name and the underlying file</li>\n<li><strong>Soft link:</strong> Mapping from a file name to another file name. 
Unfortunately, this can cause dangling links: for example, if /a is a soft link to /b, which names hi.txt, and you then unlink /b, /a is left dangling.</li>\n<li><strong>Volume:</strong> A collection of physical storage resources that form a logical storage device. This is usually an abstraction that corresponds to a logical disk.</li>\n<li><strong>Mounting:</strong> Creates a mapping from some path in the existing file system to the root directory of the mounted volume’s file system. Used, for example, when you plug a USB drive into your computer.</li>\n</ul>\n<h1 id=\"the-file-system-api\" style=\"position:relative;\"><a href=\"#the-file-system-api\" aria-label=\"the file system api permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>The File System API</h1>\n<h2 id=\"creating-and-deleting-files\" style=\"position:relative;\"><a href=\"#creating-and-deleting-files\" aria-label=\"creating and deleting files permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Creating and Deleting Files</h2>\n<ul>\n<li><code class=\"language-text\">Create()</code> creates a 
new file that has initial metadata but no other data, and adds a name for the file to a directory.</li>\n<li><code class=\"language-text\">Link()</code> creates a hard link (a new path name for an existing file). After calling link(), multiple path names refer to the same underlying file. You cannot call link() on a directory.</li>\n<li><code class=\"language-text\">Unlink()</code> removes a name for a file from its directory. If there are multiple links to a file, unlink() removes only the specified name. If there is only one link to a file, unlink() also deletes the underlying file and frees its resources.</li>\n<li><code class=\"language-text\">mkdir()</code> and <code class=\"language-text\">rmdir()</code> create and delete directories.</li>\n</ul>\n<h2 id=\"open-and-close\" style=\"position:relative;\"><a href=\"#open-and-close\" aria-label=\"open and close permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Open and close</h2>\n<ul>\n<li><code class=\"language-text\">open()</code> returns a file descriptor that the calling process can use to refer to the open file.</li>\n<li><code class=\"language-text\">close()</code> releases the open file record in the OS.</li>\n</ul>\n<h2 id=\"file-access\" style=\"position:relative;\"><a href=\"#file-access\" aria-label=\"file access permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 
4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>File access</h2>\n<ul>\n<li><code class=\"language-text\">read()</code> reads bytes starting from the process’s current file position and advances the position by the number of bytes successfully read.</li>\n<li><code class=\"language-text\">write()</code> writes bytes starting from the process’s current file position and advances the position by the number of bytes successfully written.</li>\n<li><code class=\"language-text\">seek()</code> changes a process’s current position for a specified open file.</li>\n<li><code class=\"language-text\">mmap()</code> establishes a mapping between a region of the process’s virtual memory and some region of the file. Memory loads and stores to that virtual memory region then either use the kernel’s file cache or trigger a page fault exception, which causes the kernel to fetch the desired page from the file system into memory.</li>\n<li><code class=\"language-text\">munmap()</code> removes a mapping created by mmap().</li>\n<li><code class=\"language-text\">fsync()</code> ensures all pending updates for a file are written to persistent storage before the call returns. Used to ensure that updates are durable and will not be lost in case of crash. 
If fsync() is called twice, the first is written to persistent storage before the second.</li>\n</ul>\n<h1 id=\"software-layers\" style=\"position:relative;\"><a href=\"#software-layers\" aria-label=\"software layers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Software Layers</h1>\n<h2 id=\"api-and-performance\" style=\"position:relative;\"><a href=\"#api-and-performance\" aria-label=\"api and performance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>API and Performance</h2>\n<h3 id=\"system-calls-and-libraries\" style=\"position:relative;\"><a href=\"#system-calls-and-libraries\" aria-label=\"system calls and libraries permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 
0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>System calls and libraries</h3>\n<p>Application libraries wrap syscalls mentioned above in the file system API to add additional functionality such as buffering.</p>\n<h3 id=\"block-cache\" style=\"position:relative;\"><a href=\"#block-cache\" aria-label=\"block cache permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Block Cache</h3>\n<p>Since storage devices are much slower than DRAM, the OS has a block cache that caches recently read blocks and buffers recently written blocks.</p>\n<h3 id=\"prefetching\" style=\"position:relative;\"><a href=\"#prefetching\" aria-label=\"prefetching permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Prefetching</h3>\n<p>When a process reads the first two blocks of a file, the OS may prefetch the first 10 blocks. 
When the predictions are accurate, prefetching can reduce latency for future reads (by returning the contents from the cache), reduce device overhead (by replacing a large number of small requests with one large one), and improve parallelism (by allowing hardware to process multiple requests at once).</p>\n<p>Some disadvantages of prefetching include cache pressure (each prefetched block might displace a more useful block from the block cache), I/O contention (prefetching costs I/O, so other requests might need to wait behind prefetch requests), and wasted effort (the prefetched blocks are never actually used).</p>\n<h2 id=\"device-drivers\" style=\"position:relative;\"><a href=\"#device-drivers\" aria-label=\"device drivers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Device Drivers</h2>\n<ul>\n<li>Used to translate between the high-level abstractions implemented by the OS and the hardware-specific details of IO devices.</li>\n<li>Layering helps simplify the OS by providing a common way to access various classes of devices.</li>\n</ul>\n<h3 id=\"memory-mapped-io\" style=\"position:relative;\"><a href=\"#memory-mapped-io\" aria-label=\"memory mapped io permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 
12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Memory Mapped IO</h3>\n<ul>\n<li>IO devices typically have a controller with a set of registers that can be written and read to transmit commands and data to and from the device.</li>\n<li>Memory Mapped IO maps each device’s control registers to a range of physical addresses on the memory bus. Reads and writes by the CPU to this physical address range go to the IO device’s controllers instead of main memory</li>\n</ul>\n<h3 id=\"dma\" style=\"position:relative;\"><a href=\"#dma\" aria-label=\"dma permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>DMA</h3>\n<ul>\n<li>Most IO devices transfer data in bulk. 
Rather than requiring the CPU to read or write each word of a large transfer, IO devices can use direct memory access (DMA).</li>\n<li>DMA allows the IO device to copy a block of data between its own internal memory and the system’s main memory.</li>\n<li>The OS uses memory mapped IO to set up a DMA transfer; the device then copies data to or from the target address without additional processor involvement.</li>\n<li>After setting up a DMA transfer, the OS must not use the target physical pages for any other purpose until the DMA transfer is done.</li>\n</ul>\n<h1 id=\"lifecycle-of-a-disk-request\" style=\"position:relative;\"><a href=\"#lifecycle-of-a-disk-request\" aria-label=\"lifecycle of a disk request permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Lifecycle of a Disk Request</h1>\n<ul>\n<li>When a process issues a system call like read() to read data from disk into the process’s memory, the operating system moves the calling thread to a wait queue.</li>\n<li>Then, the operating system uses memory-mapped I/O both to tell the disk to read the requested data and to set up DMA so that the disk can place that data in the kernel’s memory. 
</li>\n<li>The disk then reads the data and DMAs it into main memory; once that is done, the disk triggers an interrupt.</li>\n<li>The operating system’s interrupt handler then copies the data from the kernel’s buffer into the process’s address space.</li>\n<li>Finally, the operating system moves the thread to the ready list.</li>\n<li>When the thread next runs, it returns from the system call with the data now present in the specified buffer.</li>\n</ul>\n<h1 id=\"files-and-directories\" style=\"position:relative;\"><a href=\"#files-and-directories\" aria-label=\"files and directories permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Files and Directories</h1>\n<p>Motivation: How do we go from a file name and offset to a block number?\nNaive Approach: The file system is a dictionary that maps keys (file name and offset) to values (block number).</p>\n<h2 id=\"challenges-with-building-a-file-system\" style=\"position:relative;\"><a href=\"#challenges-with-building-a-file-system\" aria-label=\"challenges with building a file system permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 
13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Challenges with building a file system</h2>\n<ol>\n<li><strong>Performance</strong> File systems need good spatial locality, where blocks that are accessed together are stored sequentially. The naive approach would just map block numbers in random places.</li>\n<li><strong>Flexibility</strong> File systems allow apps to share data, and thus must let people access small files, large files, short-lived files, and anything in between.</li>\n<li><strong>Persistence</strong> File systems must maintain and update both user data and internal data structures through OS crashes and power failures</li>\n<li><strong>Reliability</strong> File systems must store data for long periods of time</li>\n</ol>\n<h2 id=\"file-system-implementation-overview\" style=\"position:relative;\"><a href=\"#file-system-implementation-overview\" aria-label=\"file system implementation overview permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>File System Implementation Overview</h2>\n<p>Most file systems have directories, index structures, free space maps, and locality heuristics.</p>\n<ul>\n<li><strong>Directories</strong> A way to map human readable file names to file numbers</li>\n<li><strong>Index structures</strong> A way to translate the file number to locate the blocks of the file. This is usually some form of tree.</li>\n<li><strong>Free space maps</strong> Tracks which storage blocks are free and which are in use as files grow and shrink. 
The free blocks it hands out should also have good spatial locality. Free space maps are usually implemented as bitmaps.</li>\n<li><strong>Locality Heuristics</strong> Policies that group data to optimize performance. Some file systems group by directory; others periodically defragment their storage by rewriting existing files so that each file is stored in sequential blocks.</li>\n</ul>\n<h2 id=\"directories\" style=\"position:relative;\"><a href=\"#directories\" aria-label=\"directories permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Directories</h2>\n<ul>\n<li><strong>Purpose</strong> Map human-readable names to internal file numbers</li>\n<li><strong>Implementation</strong> Stored as a file</li>\n<li>The root directory’s file number is agreed upon ahead of time; for the Unix Fast File System it is 2.</li>\n<li>To read <code class=\"language-text\">/home/albert/hi.txt</code> we search the root directory by reading the file associated with file number 2. In file 2, we search for the name <code class=\"language-text\">home</code> and find that /home is stored in file number 880. In file 880, we search for the name <code class=\"language-text\">albert</code> and find that /home/albert is stored in file number 526. 
In file 526, we search for <code class=\"language-text\">hi.txt</code> and find that /home/albert/hi.txt is stored in file number 469.</li>\n<li>The directory API is different from the standard open/read/close/write, because the file system must keep applications from corrupting the file name to file number mapping.</li>\n<li>The syscalls that modify directories are <code class=\"language-text\">mkdir</code>, <code class=\"language-text\">link</code>, <code class=\"language-text\">unlink</code>, and <code class=\"language-text\">rmdir</code>. Mkdir and rmdir create and delete a directory, <code class=\"language-text\">link(\"hi.txt\", \"new.txt\")</code> creates a hard link to an existing file, and <code class=\"language-text\">unlink(\"hi.txt\")</code> removes the directory entry for hi.txt.</li>\n<li>Processes can read the contents of directory files using the standard file read syscall.</li>\n</ul>\n<h3 id=\"hard-vs-soft-links\" style=\"position:relative;\"><a href=\"#hard-vs-soft-links\" aria-label=\"hard vs soft links permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Hard vs Soft Links</h3>\n<ul>\n<li><strong>Hard Links</strong> Multiple directory entries that map different path names to the same file number. File systems must ensure that a file is only deleted when the last hard link is removed.</li>\n<li>File systems use reference counts that count the number of hard links to each file, starting with 1 at creation. 
Each call to <code class=\"language-text\">link()</code> increases the reference count by 1 and each call to <code class=\"language-text\">unlink()</code> decreases the reference count by 1.</li>\n<li><strong>Soft Links</strong> are directory entries that map one name to another name.</li>\n<li>If the file system supports hard links, storing file metadata in directory entries would be problematic, since whenever the file size changes, all of the file’s directory entries would need to be updated.</li>\n</ul>\n<h2 id=\"files\" style=\"position:relative;\"><a href=\"#files\" aria-label=\"files permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Files</h2>\n<p>Purpose: Translate a file number to the blocks that belong to the file\nGoals:</p>\n<ol>\n<li>Put data blocks sequentially to maximize spatial locality</li>\n<li>Provide efficient random access to any file block</li>\n<li>Limit overhead to be efficient for smaller files</li>\n<li>Be scalable to support large files</li>\n<li>Provide a place to store metadata, e.g., reference count, owner, access control list, size</li>\n</ol>\n<table>\n<thead>\n<tr>\n<th>File System</th>\n<th>Index Structure</th>\n<th>Free Space Management</th>\n<th>Locality Heuristics</th>\n<th>Index Structure Granularity</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>FAT (USB drives)</td>\n<td>Linked list</td>\n<td>FAT array</td>\n<td>Defragmentation</td>\n<td>Block</td>\n</tr>\n<tr>\n<td>FFS (Unix)</td>\n<td>Tree (fixed, asymmetric)</td>\n<td>Bitmap (fixed)</td>\n<td>Block groups, reserve 
space</td>\n<td>Block</td>\n</tr>\n<tr>\n<td>NTFS (Windows)</td>\n<td>Tree (dynamic)</td>\n<td>Bitmap in file</td>\n<td>Best fit, defragmentation</td>\n<td>Extent</td>\n</tr>\n<tr>\n<td>ZFS (copy on write)</td>\n<td>Tree (copy on write, dynamic)</td>\n<td>Space map (log-structured)</td>\n<td>Write anywhere, block groups</td>\n<td>Block</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"fat-filesystem\" style=\"position:relative;\"><a href=\"#fat-filesystem\" aria-label=\"fat filesystem permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>FAT Filesystem</h3>\n<!-- ![File allocation table](/media/File-Systems/FAT.JPG) -->\n<ul>\n<li><strong>Techniques</strong> Uses an extremely simple index structure (a linked list). FAT stands for file allocation table, an array of 32-bit entries in a reserved area of the volume. Each file corresponds to a linked list of FAT entries, one per block of the file; each entry either marks the last block of the file or contains a pointer to the next entry.</li>\n<li><strong>Directories</strong> map file names to file numbers. The file’s number is the index of the file’s first entry in the FAT. We can then walk the linked list to find the rest of the file’s blocks.</li>\n<li><strong>Free space tracking</strong> If data block i is free, then FAT[i] = 0</li>\n<li><strong>Locality Heuristics</strong> Some FAT implementations use a next fit algorithm that sequentially scans the file allocation table starting from the last allocated entry and returns the next free entry. 
This can fragment files, so there is often a defragmentation tool that reads files from their existing locations and rewrites them for better spatial locality.</li>\n<li><strong>Limitations</strong>: Poor locality (badly fragmented files), poor random access (the FAT entries must be traversed sequentially), limited metadata and no access control, no support for hard links, limits on volume and file size (the maximum file size is 4 GB), and a lack of support for reliability techniques (no transactional updates).</li>\n</ul>\n<h3 id=\"unix-fast-file-systemffs\" style=\"position:relative;\"><a href=\"#unix-fast-file-systemffs\" aria-label=\"unix fast file systemffs permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Unix Fast File System(FFS)</h3>\n<!-- ![Multilevel index](/media/File-Systems/multilevel-index.JPG) -->\n<ul>\n<li>\n<p><strong>Techniques</strong> FFS uses a carefully structured tree to locate any block of a file, and the structure is efficient for both large and small files. Each file is a tree with fixed size blocks as its leaves. 
Each file’s tree is rooted in an <em>inode</em> that contains the file metadata.</p>\n<ul>\n<li>Typically the file’s inode contains 15 pointers: the first 12 pointers are direct pointers that point directly to the first 12 data blocks of a file.</li>\n<li>The 13th pointer in the <em>inode</em> is an indirect pointer, which points to an array of direct pointers.</li>\n<li>The 14th pointer in the <em>inode</em> is a double indirect pointer, which points to an array of indirect pointers, each pointing to an array of direct pointers.</li>\n<li>The 15th pointer in the <em>inode</em> is a triple indirect pointer, which points to an array of double indirect pointers.</li>\n<li>All of the file system’s inodes are located in an inode array. The file’s file number, or <em>inumber</em>, is an index into the inode array.</li>\n<li>This asymmetric setup allows for efficient support for both large and small files.</li>\n</ul>\n</li>\n<li><strong>Free Space Management</strong> FFS allocates a bitmap with one bit per storage block; the ith bit in the bitmap indicates whether the ith block is free or in use.</li>\n<li>\n<p><strong>Locality Heuristics</strong> FFS uses block group placement and reserved space</p>\n<ul>\n<li>\n<p><em>block group placement</em> divides the disk into block groups so the seek time between any blocks in a block group will be small  </p>\n<!-- ![Block Group Placement](/media/File-Systems/block-group-placement.JPG) -->\n</li>\n<li>Each block group holds a portion of the metadata structures</li>\n<li>FFS puts a directory and its files in the same block group</li>\n<li>Within a block group, FFS writes to the first free block in the file’s block group. This helps locality in the long term since fragmentation is reduced.</li>\n</ul>\n</li>\n<li>When the disk is almost full, the block group heuristic will perform poorly since new writes will be scattered randomly around the disk. 
Thus, FFS reserves a portion of the disk’s space and presents a slightly reduced disk size to applications.</li>\n</ul>\n<h3 id=\"windows-new-technology-file-systemntfs\" style=\"position:relative;\"><a href=\"#windows-new-technology-file-systemntfs\" aria-label=\"windows new technology file systemntfs permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Windows New Technology File System(NTFS)</h3>\n<p>Instead of using fixed trees like FFS, NTFS uses extents and flexible trees.</p>\n<!-- ![NTFS index record](/media/File-Systems/nfts-index-record.JPG) -->\n<ul>\n<li><strong>Extent</strong> A variable-sized region of a file that is stored in a contiguous region on the storage device</li>\n<li><strong>Flexible tree and master file table</strong> Each file in NTFS is represented by a variable depth tree. 
The extent pointers for a file with a small number of extents can be stored in a shallow tree; deeper trees are only needed if the file becomes badly fragmented.</li>\n<li><strong>Master File Table(MFT)</strong> stores an array of 1KB MFT records, each of which stores a sequence of variable size attribute records.</li>\n<li><strong>Resident attribute</strong> Stores its contents directly in the MFT record</li>\n<li><strong>Non-resident attribute</strong> Stores extent pointers in its MFT record and stores its contents in those extents.</li>\n<li><strong>MFT record</strong> Includes standard information (creation time, last modified time, owner ID), file name, and attribute list</li>\n<li>\n<p><strong>Four stages of file growth</strong></p>\n<ol>\n<li>Small files can have their contents included in the MFT record as a resident data attribute.</li>\n<li>Normal file data has a small number of extents tracked by a single non-resident data attribute.</li>\n<li>A large file on a fragmented file system can have so many extents that the extent pointers will not fit in a single MFT record. Such a file can have multiple non-resident data attributes in multiple MFT records, with the attribute list in the first MFT record indicating which MFT record tracks which range of extents.</li>\n<li>If the file is huge, the file attribute list can be made non-resident, allowing arbitrarily large numbers of MFT records.</li>\n</ol>\n</li>\n<li><strong>Free Space Map</strong>\nNTFS stores all metadata in a dozen files with low file numbers. For example, file number 5 is the root directory, file number 6 is the free space bitmap, and file number 8 contains a list of the volume’s bad blocks.</li>\n<li><strong>Locality Heuristics</strong> NTFS uses best fit, where the system tries to place a newly allocated file in the smallest free region large enough to hold it. 
NTFS also has a defragmentation utility.</li>\n</ul>\n<h3 id=\"copy-on-write-cow\" style=\"position:relative;\"><a href=\"#copy-on-write-cow\" aria-label=\"copy on write cow permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Copy on Write (COW)</h3>\n<p>When updating an existing file, COW file systems do not overwrite existing data or metadata; instead, they write new versions to new locations.</p>\n<ul>\n<li>\n<p><strong>Motivations</strong></p>\n<ul>\n<li>Small writes are expensive. Large sequential writes are much faster than small random writes, and the gap will continue to grow since bandwidth improves faster than seek time and rotational latency.</li>\n<li>Small writes are expensive on RAID since we need to read old data, read old parity, write new data, and write new parity. 
The number of parity reads and writes grows linearly with the number of writes, not with the size of the writes.</li>\n<li>Large DRAM caches can handle essentially all file reads, so the cost of writes dominates performance and optimizing write performance is the top priority.</li>\n<li>Flash storage works better with copy on write: updates go to new locations instead of overwriting data in place, so an erasure block need not be cleared for each update and wear is spread evenly across the flash drive.</li>\n<li>\n<p>Since old data isn’t overwritten, we can support versioning</p>\n<!-- ![Copy on Write](/media/File-Systems/copy-on-write.JPG) -->\n</li>\n</ul>\n</li>\n<li><strong>Implementation:</strong> We store inodes in a file rather than in a fixed inode array. All the file system’s contents are stored in a tree rooted in the root inode. When we update a block, we write the block and all of the blocks on the path from it to the root to new locations.</li>\n</ul>\n<h3 id=\"zfs-file-system\" style=\"position:relative;\"><a href=\"#zfs-file-system\" aria-label=\"zfs file system permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>ZFS file system</h3>\n<ul>\n<li><strong>uberblock</strong> The root of the ZFS storage system. ZFS keeps an array of 256 uberblocks and rotates successive versions through it.</li>\n<li><strong>dnode</strong> Similar to an inode. A file is represented by a variable depth tree whose root is a dnode and whose leaves are data blocks.</li>\n<li><strong>Free space map</strong> per block 
group space maps. ZFS maintains a space map for each block group, represented as a tree of extents with log-structured updates.</li>\n<li><strong>Locality Heuristics</strong> On writes, ZFS batches updates and writes them sequentially.</li>\n</ul>\n<h2 id=\"file-and-directory-access-walkthrough\" style=\"position:relative;\"><a href=\"#file-and-directory-access-walkthrough\" aria-label=\"file and directory access walkthrough permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>File and Directory Access Walkthrough</h2>\n<!-- ![Walkthrough](/media/File-Systems/walkthrough.JPG) -->\n<p>Goal: Read the file /foo/bar/baz\nSteps:</p>\n<ol>\n<li>Read the root directory <code class=\"language-text\">/</code> to determine <code class=\"language-text\">/foo</code>’s inumber, as detailed in the next three steps.</li>\n<li>Open and read the root directory’s inode, which is the inode of file 2. 
Since FFS stores pieces of the inode array at fixed locations on disk, given a file’s inumber it is easy to read the file’s inode.</li>\n<li>From the root directory’s inode, extract the direct and indirect block pointers to determine which block stores the contents of the root directory (block 48912 in this example).</li>\n<li>Read that block of data to get the list of name to inumber mappings in the root directory and find that <code class=\"language-text\">/foo</code> has inumber 231.</li>\n<li>Since we know <code class=\"language-text\">/foo</code>’s inumber, we read inode 231 to find where <code class=\"language-text\">/foo</code>’s data blocks are stored (block 1094 in this example).</li>\n<li>Read block 1094 to get the list of name to inumber mappings in the <code class=\"language-text\">/foo</code> directory and find that the directory file <code class=\"language-text\">/foo/bar</code> has inumber 731.</li>\n<li>Follow similar steps to read <code class=\"language-text\">/foo/bar</code>’s inode and data block 30991 to find that <code class=\"language-text\">/foo/bar/baz</code>’s inumber is 402.</li>\n<li>Read <code class=\"language-text\">/foo/bar/baz</code>’s inode (inode 402) to find the file’s data blocks (89310, 14919, and 23301), then read those blocks.</li>\n<li>Usually, these blocks are cached, so we don’t need to repeat this process on every access.</li>\n</ol>\n<h1 id=\"questions\" style=\"position:relative;\"><a href=\"#questions\" aria-label=\"questions permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Questions</h1>\n<ol>\n<li>What does mmap() 
do?</li>\n</ol>","fields":{"slug":"/posts/operatingsystems/file-systems","tagSlugs":["/tag/notes/","/tag/textbook/","/tag/operating-system/"]},"frontmatter":{"date":"2021-11-20T23:46:37.121Z","description":"Notes on File Systems","tags":["Notes","Textbook","Operating System"],"title":"File Systems"}}},"pageContext":{"slug":"/posts/operatingsystems/file-systems"}},"staticQueryHashes":["251939775","401334301","825871152"]}