Interacting with the file system is a crucial aspect of command-line interface (CLI) applications, particularly when it comes to recursively traversing directories and analyzing files. Go simplifies this process through the Walk function in the path/filepath package, which allows developers to start at a root directory and execute a given function at each file or directory node within the tree. This lesson demonstrates how to create an application called walker, which collects statistics such as the total number of files, the total number of directories, and the cumulative size of the files within a specified directory. The application also includes functionality to filter files by their extensions and showcases the versatility of the io/fs package for further enhancements. Overall, understanding these tools enables developers to effectively manage and manipulate file systems in their Go applications.
As well as being able to write to and read from files, another common operation in CLI applications is interacting with the actual file system, specifically traversing it in order to look for files recursively. Go provides a mechanism for us to do that easily, which is the Walk function inside of the filepath package. By the way, the filepath package provides a number of utility functions for manipulating file name paths in a way that's compatible with the target operating system. We'll talk a little bit more about that when we look at cross-platform code in the next module, but in addition to supporting cross-platform file path functionality, it also provides the Walk function, which allows you to walk a file tree beginning at a root.
This function works in a rather interesting way. Basically, you call the function passing in the root directory, which is the root of the tree that you want to start at. Then you pass in another function that will be called at each node in the tree, whether it's a directory, i.e. a branch, or an individual file, i.e. a leaf. To take a look at how the Walk function works, in this lesson we're going to create an application that uses it to traverse all of the files inside of a root directory and generate statistics from them. You can think of this as being similar to, say, the du command, which stands for disk usage. If I run it against, let's say, my projects directory, with a depth of one for the moment, it goes through all of my projects and calculates the size taken up by each one. In our case, we'll do something similar. However, we'll just keep a tally of the total number of bytes inside of our directory.
Here, if I run this again, passing in a depth of one along with the human-readable flag, you can see it now prints out the size of each directory with all of the files inside, and 33 gigs for the current directory. In our case, we're going to create an application called walker, which will iterate through all of the files and directories inside of the directory that we call it in, and capture statistics about the number of directories, the number of files, and the total size in bytes.
Therefore, in order to get started, here I have a main function inside of a main package inside of the walker directory, pretty similar to how we've started other projects. To begin, let's import the filepath package, which is found at path/filepath. Then if we take a look at the filepath.Walk function, you can see it takes a root directory as the first parameter, and then a filepath.WalkFunc, the function that gets called at each node. For the root directory, let's set this to be the dot directory by default, and we'll also accept it from os.Args. So if the length of os.Args is greater than one, then we'll set rootDir to be the first argument, os.Args[1], and we'll use the := shorthand to initialize rootDir.
Okay, then we can pass this rootDir into our filepath.Walk function, followed by our actual walk function. This function takes a path as a string as the first parameter, then an info, which is an os.FileInfo (I started to write fs.FileInfo, which os.FileInfo is an alias for), and then an error, and the function itself returns an error as well. Let's just return nil for the moment, and we can get rid of the fs import.
One interesting thing to note about this function is that the third parameter, let me scroll up, is an error. If we take a look at the documentation, we can see what this error actually is. It says the error argument reports an error related to the path, signaling that Walk will not walk into that directory. The function can decide how to handle that error. As described earlier, returning the error will cause Walk to stop walking the entire tree. In our case, we're just going to ignore it, because it's not going to be too much of an issue when it comes to our statistics. We still want to walk through other files even if that node can't be walked into, whether it's a permission issue or something else.
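One way to ignore that error while staying safe can be sketched like this. It is only an illustration, not the code from the video: the helper name walkTolerant and the skipped counter are hypothetical. The detail worth noticing is that when the error argument is non-nil, the info argument may be nil, so the callback should return before touching it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// walkTolerant (hypothetical helper) visits every node under root. Nodes that
// report an error are counted and skipped instead of aborting the whole walk.
func walkTolerant(root string) (visited, skipped int, err error) {
	err = filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil {
			// info may be nil here, so do not call any methods on it.
			skipped++
			return nil // returning nil keeps the walk going
		}
		visited++
		return nil
	})
	return visited, skipped, err
}

func main() {
	visited, skipped, err := walkTolerant(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("visited %d nodes, skipped %d\n", visited, skipped)
}
```

Returning walkErr instead of nil would stop the walk at the first problem, which is the behaviour the documentation describes.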
In any case, now that we have our walk function set up, let's see what this looks like if we run it. We'll just print out the path as follows. Now, if I run this code and pass in a smaller directory, say the exec directory, you can see what it looks like. It prints out exec, exec/go.mod, exec/main.go and words.txt, so it prints out all of the files found in that directory. Let's see if we can run it with a larger one, say the Dreams of Code website, which is what you should be watching this course on. If I run this, you can see it prints out all of the files, including all of the node modules and everything else. And it's pretty fast at doing so.
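At this stage the program can be sketched roughly as follows. This is a minimal sketch rather than the exact code from the video: the helper name listPaths is hypothetical and exists only so the walk can be exercised separately from main.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listPaths (hypothetical helper) walks the tree rooted at root and returns
// every path that Walk visits, starting with root itself.
func listPaths(root string) ([]string, error) {
	var paths []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		paths = append(paths, path)
		return nil // the error argument is ignored for now, as in the lesson
	})
	return paths, err
}

func main() {
	rootDir := "." // default to the current directory
	if len(os.Args) > 1 {
		rootDir = os.Args[1]
	}

	paths, err := listPaths(rootDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Running it with a directory argument prints that directory first, followed by everything beneath it in lexical order.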
Okay, so now that we've seen how the filepath.Walk function works, let's go about actually collecting our statistics. As I mentioned before, we want to collect three different statistics in total: the total number of files, the total number of directories, and the total size in bytes. To begin, let's capture the total number of files, which we'll set to zero. Then let's do an fmt.Println. We could use a tab writer if we wanted to, but we've already seen the tab writer, so just for time we'll print this out as plain lines. Let's make this a little bit nicer: we'll print Summary, and then do an fmt.Println of Total files along with the total number of files. Now, if we run this, it should just produce the summary and the total files. Let's also get rid of the Println statement for the path.
Let's try that again. Summary, total files zero. Great. Now let's count the individual files. The easiest way to do so would be to just increment the total number of files with ++ inside of the walk function. And if we run this, you'll see it kind of works. However, the issue here is that we're also counting directories. For example, if I make a directory called foo, and touch a file in foo called bar.txt, then run this, you can see it produces a count of five, which isn't actually correct. Instead, I'm expecting the result to be three, as we have two files inside of this directory and one inside of the foo directory. The reason we're getting five is that we're also counting the directories found inside of this folder.
So how can we go about determining whether a node is a directory or a file? Well, that's where the second parameter of the filepath.WalkFunc comes in. This is the info, an os.FileInfo, which provides some information about the actual file that we can use: Mode, which includes the permission bits we looked at in the last lesson; Name, which is the base name of the file, so the last element of the path; Size, which is the total number of bytes, something that we'll want to use; and IsDir, which lets us know whether or not the file is a directory. Therefore, we can use if info.IsDir() to capture the total number of directories, which we'll keep in its own counter and increment with ++ as follows.
Otherwise, we'll add an else statement in here and just count the total number of files. Whilst we're here, we may as well print that out now that we're capturing it, as Total directories, and I'm going to line these up, just because I think it makes it a little easier to see what's going on. Now, if we run this code, we get the following results. Total files is three, which is correct: we have go.mod, main.go and foo/bar.txt. But the total number of directories is two. This number doesn't feel correct, as we only have one directory inside of our folder, and if we ls foo, there are no directories inside. If we make another directory, say foo/bar, then we should expect this to be two, but it's actually three. Let's remove bar for the moment. So what is going on?
Well, let's print out the path whenever we detect a directory. Actually, we'll add a prefix of dir just so it's easier to see what this is. Now, if I run this, you can see why we're detecting two directories. The first directory we're detecting is the dot directory, which, if I do an ls -la, you can see is the first entry in the ls output. This dot directory is the current directory we're in, which we may or may not want to count, depending on our own situation. In my case, I don't want to count the current directory, as it feels kind of misleading. So we're going to add the following check to make sure we don't count it: if path is equal to ".", then we'll just return nil early.
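The counting logic described so far can be sketched as below. One deliberate difference from the lesson, flagged here as an assumption: the sketch compares the path against the root that was passed in rather than the literal ".", because Walk reports the root exactly as it was given, so a root such as ../exec would otherwise still be counted. The helper name countNodes is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// countNodes (hypothetical helper) returns the number of files and
// directories under root, not counting root itself.
func countNodes(root string) (files, dirs int, err error) {
	err = filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil {
			return nil // ignore unreadable nodes, as the lesson does
		}
		if path == root {
			return nil // don't count the starting directory
		}
		if info.IsDir() {
			dirs++
		} else {
			files++
		}
		return nil
	})
	return files, dirs, err
}

func main() {
	rootDir := "."
	if len(os.Args) > 1 {
		rootDir = os.Args[1]
	}
	files, dirs, err := countNodes(rootDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Summary")
	fmt.Println("Total files:      ", files)
	fmt.Println("Total directories:", dirs)
}
```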
Okay, now if we run this, we should get three files and one directory, which is what we're expecting. So far, so good. We've managed to get the total number of files and the total number of directories. The last thing we need to add is the total size in bytes. To do this, we can declare a total size in bytes and assign it the value of zero. Then, to increment it, we want to add the size from the info whenever we see a file. So let's add it into the else block, where we've already checked whether the node is a directory or a file. Technically, everything in Unix is a file, even directories, so it can be a bit confusing when it comes to saying file versus directory.
In any case, the total size in bytes is going to be += info.Size(), which returns the length in bytes for regular files. Actually, let's call this totalBytes. We're going to declare it as an int64 so that we don't have to cast, as int64 is what info.Size() returns. On a 64-bit system an int is also 64 bits wide, but it's a distinct type from int64 and would still need a conversion, so it's safer to use the concrete size. And we can do totalBytes += info.Size() as follows. Now, all that remains is to print it out as Total bytes. Now, if I run this code, you can see we get the total number of bytes, which in this case is 754. Not too many. However, let's run this against the Dreams of Code website directory and see what it produces. Here we have a lot more. You can see we have 10,000 total files and 1,300 total directories. We also have 248 million bytes, so 248 megabytes, and if I do a du -h on the same directory it reports 260 megabytes. About the same.
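Putting the three statistics together, the walker might look something like the sketch below. The Stats struct and the collect helper are hypothetical names introduced for illustration, and the root is skipped by comparing against the argument rather than ".", which is an assumption on my part rather than what the video shows.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Stats (hypothetical type) holds the three totals the walker reports.
type Stats struct {
	Files int
	Dirs  int
	Bytes int64 // int64 matches the return type of FileInfo.Size
}

// collect walks root and tallies files, directories and file sizes.
func collect(root string) (Stats, error) {
	var s Stats
	err := filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || path == root {
			return nil // skip unreadable nodes and the root itself
		}
		if info.IsDir() {
			s.Dirs++
			return nil
		}
		s.Files++
		s.Bytes += info.Size() // no conversion needed: both sides are int64
		return nil
	})
	return s, err
}

func main() {
	rootDir := "."
	if len(os.Args) > 1 {
		rootDir = os.Args[1]
	}
	s, err := collect(rootDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Summary")
	fmt.Printf("Total files:       %d\n", s.Files)
	fmt.Printf("Total directories: %d\n", s.Dirs)
	fmt.Printf("Total bytes:       %d\n", s.Bytes)
}
```

Only file sizes are added up here, which is one reason the total can come out a little lower than what du reports.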
So far, so cool. As you can see, the filepath.Walk function is incredibly useful for walking over all of the files and directories inside of a project. In addition to that, the filepath package also provides a number of other functions that make working with file paths a lot easier. These include Join, for joining paths together in a cross-platform way, which we've seen a couple of times throughout this course, Localize, for localizing a path to the current operating system, and Ext, for pulling out the extension from a path. Ext takes a path string as a parameter and returns its extension if there is one.
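As a quick illustration of Join and Ext, here is a small sketch. The file names are just examples, and extOf is a hypothetical wrapper that exists only to make the behaviour easy to check.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// extOf (hypothetical wrapper) returns the extension of path, including the
// leading dot, or an empty string when the final element has no dot.
func extOf(path string) string {
	return filepath.Ext(path)
}

func main() {
	// Join uses the separator of the operating system the program runs on,
	// so this is foo/bar.txt on Unix-like systems and foo\bar.txt on Windows.
	p := filepath.Join("foo", "bar.txt")
	fmt.Println(p)

	fmt.Println(extOf(p))              // .txt
	fmt.Println(extOf("main.go"))      // .go
	fmt.Println(extOf("Makefile") == "") // true: no extension present
}
```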
To show how this works, let's add another feature to our application where we can define an extension flag and only count the files that end in that extension. To do so, let's create a new variable called desiredExt and set it to an empty string. Then we can use the StringVar function of the flag package, passing in a pointer to desiredExt as well as the name, which we'll call ext. We'll set the default value to be an empty string, and I'm going to leave the usage string blank for the moment. Next, we need to call the flag.Parse function, which takes no arguments and parses the flags from os.Args. We also need to change our use of os.Args to flag.Args(), so the check becomes whether the length of flag.Args() is greater than zero, and if so we take the root directory from flag.Arg(0).
Okay, next we can make use of this inside of the else block where we actually count the files. To do so, let's get the file extension using the Ext function of the filepath package: filepath.Ext, and we'll just pass in the path to it. Then we can do a simple check: if the desired extension has been set, so it's not equal to the empty string, and the file extension does not equal the desired extension, then we can just return nil early.
Now, if I run this code again, passing in the extension of go, with a dot as the directory, you can see we get the total number of directories and the total bytes, but we're not actually counting any of the go files inside. This is because we need to pass in .go rather than go: the Ext function of the filepath package returns the extension including its dot prefix, so we need to make sure to pass the dot in as well.
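The extension filter can be sketched as follows. Two pieces here are not from the lesson and are labeled as assumptions: matchesExt is a hypothetical helper wrapping the check described above, and normalizeExt is an optional extra that adds the leading dot so that both -ext go and -ext .go behave the same.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// normalizeExt (hypothetical, not in the lesson) adds the leading dot that
// filepath.Ext includes in its result, if the user left it off.
func normalizeExt(ext string) string {
	if ext == "" || strings.HasPrefix(ext, ".") {
		return ext
	}
	return "." + ext
}

// matchesExt reports whether path should be counted: every file matches when
// no extension was requested, otherwise the extensions must be equal.
func matchesExt(path, desiredExt string) bool {
	return desiredExt == "" || filepath.Ext(path) == desiredExt
}

func main() {
	var desiredExt string
	flag.StringVar(&desiredExt, "ext", "", "only count files with this extension, e.g. .go")
	flag.Parse()
	desiredExt = normalizeExt(desiredExt)

	rootDir := "."
	if len(flag.Args()) > 0 {
		rootDir = flag.Arg(0) // first argument left over after the flags
	}

	totalFiles := 0
	err := filepath.Walk(rootDir, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || info.IsDir() {
			return nil
		}
		if !matchesExt(path, desiredExt) {
			return nil // wrong extension: skip without counting
		}
		totalFiles++
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Total files:", totalFiles)
}
```

Note that flags have to come before the directory argument, for example walker -ext .go ., because the flag package stops parsing at the first non-flag argument.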
As you can see, however, we've now managed to expand the functionality of our code to count only files with a given extension. So if we run it with the .go extension against the Dreams of Code website, you can see it changes the results. Pretty cool.
In addition to the filepath package, there's another package that is often used when it comes to the file system. This is the io/fs package, which provides a number of basic interfaces to a file system. The io/fs package provides a type called FS, which is a simple interface that defines a single Open method. However, this interface can be used with quite a few different functions. For example, the WalkDir function takes this FS type and allows you to walk a file tree rooted at root. Additionally, there are a number of other functions, such as ReadFile, Glob, and so on, that all work with this FS interface.
The FS type can in some cases be preferable to use, as it allows for more versatility by decoupling your code from the underlying file system. For example, whilst most of the time you'll want to operate on the file system of your actual machine, in some cases you may want to operate on an in-memory file system, or on an embedded file system provided by the embed package in Go.
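To make the decoupling concrete, here is a small sketch that counts files in an in-memory file system. It uses fstest.MapFS from the standard library's testing/fstest package as the in-memory implementation, which is my choice for illustration and not something the lesson covers. The helper names and the file contents are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"testing/fstest"
)

// countFiles counts the non-directory entries in any fs.FS. It neither knows
// nor cares whether the files live on disk, in memory, or in the binary.
func countFiles(fsys fs.FS) (int, error) {
	n := 0
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			n++
		}
		return nil
	})
	return n, err
}

// demoCount (hypothetical helper) builds a tiny in-memory tree that mirrors
// the lesson's example directory and counts its files.
func demoCount() (int, error) {
	memFS := fstest.MapFS{
		"go.mod":      {Data: []byte("module walker\n")},
		"main.go":     {Data: []byte("package main\n")},
		"foo/bar.txt": {Data: []byte("hello")},
	}
	return countFiles(memFS)
}

func main() {
	n, err := demoCount()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("files:", n) // files: 3
}
```

The same countFiles function would work unchanged with a file system returned by os.DirFS or with an embed.FS.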
Therefore, in order to better understand how we can work with these different file systems, let's change our code from using the filepath.Walk function to instead using the fs.WalkDir function. In order to do so, however, we first need to turn our directory into an fs.FS file system. To do so, we can use the DirFS function of the os package, which returns a file system for the tree of files rooted at the directory dir.
Therefore, let's create our directory file system, which I'll just call dirfs in lowercase, using the os.DirFS function, and we'll pass in our rootDir as follows. With our dirfs in place, the next thing we can do is replace our call to the filepath.Walk function with a call to fs.WalkDir. This is very similar to filepath.Walk; however, it takes a file system, an fs.FS, as the first argument, then a root path, which in this case is going to be the dot directory rather than the actual root path we passed into os.DirFS, and then an fs.WalkDirFunc, which again is slightly different.
In any case, let's use this as follows, passing in our dirfs, and we want to pass the dot as the root, because paths are resolved relative to the directory the file system is rooted at. Then we can pass in our fs.WalkDirFunc, whose first parameter is again the path as a string. The second is an entry of type fs.DirEntry, which is slightly different, and again we have an error, and we also return an error as well.
Okay, with that, most of our function should still work, except for the fact that we no longer have an info of type os.FileInfo. For some things this isn't too much of an issue, as we can just use the entry value instead, which does provide IsDir, Name, and Type methods. Unfortunately, however, it doesn't provide the size, which only comes from a FileInfo. Fortunately, the entry provides an Info method, which allows us to obtain an fs.FileInfo, which has the Size method. So let's call it and store the result as info, and we'll capture the error as well, as this can return one, and check it with if err != nil. In the case that there is an error, we'll just return it.
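The fs.WalkDir version of the walker can be sketched like this. It leaves out the extension flag to stay short, and the Stats struct and the collect and collectDir helpers are hypothetical names, not the ones used in the video.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// Stats (hypothetical type) holds the totals the walker reports.
type Stats struct {
	Files int
	Dirs  int
	Bytes int64
}

// collect tallies files, directories and file sizes in any fs.FS.
func collect(fsys fs.FS) (Stats, error) {
	var s Stats
	err := fs.WalkDir(fsys, ".", func(path string, entry fs.DirEntry, walkErr error) error {
		if walkErr != nil || path == "." {
			return nil // skip unreadable nodes and the root of the file system
		}
		if entry.IsDir() {
			s.Dirs++
			return nil
		}
		// DirEntry has no Size method, so ask for the full FileInfo.
		info, err := entry.Info()
		if err != nil {
			return err
		}
		s.Files++
		s.Bytes += info.Size()
		return nil
	})
	return s, err
}

// collectDir (hypothetical helper) roots an fs.FS at a directory on disk.
func collectDir(root string) (Stats, error) {
	return collect(os.DirFS(root))
}

func main() {
	rootDir := "."
	if len(os.Args) > 1 {
		rootDir = os.Args[1]
	}
	s, err := collectDir(rootDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Summary")
	fmt.Printf("Total files:       %d\n", s.Files)
	fmt.Printf("Total directories: %d\n", s.Dirs)
	fmt.Printf("Total bytes:       %d\n", s.Bytes)
}
```

Because the walk always starts at "." inside the file system, the check for the root is the same no matter which directory was passed on the command line.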
With that, our code should now be making use of this new directory file system rather than the filepath.Walk function. So let's delete the old call, and we can open up a terminal window and give it a go. Let's run it in the current directory, which we can see works, and if I run it against the dreamsofcode directory, passing in the .go extension, it should work as well.
With that, we've taken a brief look at how we can interact with the file system inside of Go, be it the file system on our local disk, or more abstract file systems by making use of the fs.FS type. We'll take a look at more of what fs.FS does later on in this course, specifically when we look at the embed package for embedding a configuration into our binary application.
In any case, that covers the basics of how you can traverse a file system if you want to be able to create tools in order to do so, and in our case we've managed to create one to count the number of files, number of directories, and the total number of bytes inside. In the next lesson, we're going to actually take a deeper look at a topic we saw in the last module, specifically, file locks.