Counting the bytes in an io.Reader is a common task when processing files in Go, and it can be done concisely with the io.Copy function and the io.Discard writer. The CountBytes function is developed with a test-driven development (TDD) approach, starting with test cases for a range of inputs. Rather than reading the input byte by byte, CountBytes uses io.Copy, which copies data from a reader to a writer and returns the number of bytes copied. Because the data itself is not needed, the destination can be a no-operation writer. Testing the function against diverse scenarios, including edge cases and Unicode characters, confirms that it counts bytes correctly and prepares it for integration alongside the existing line and word counts.
In the last lesson, we added a new function to count the number of lines in an io.Reader, which needed a slightly different approach from the CountWords function, which made use of the bufio.Scanner.
In this lesson, we're going to take a look at how we can actually count the number of bytes, and we're going to do so using a different approach than we've seen before.
To begin, let's create a new function called CountBytes, which takes a reader of type io.Reader and returns an int. For the moment, let's have it simply return 0.
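As a minimal sketch, the placeholder might look like this. In the lesson, the function lives in a counter package (the test calls counter.CountBytes); here it sits in package main with a small main function so the snippet runs on its own.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// CountBytes will eventually report how many bytes r contains.
// For now it is a placeholder that ignores r and always returns 0,
// so the upcoming tests have something to fail against.
func CountBytes(r io.Reader) int {
	return 0
}

func main() {
	fmt.Println(CountBytes(strings.NewReader("one two three four five"))) // 0
}
```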
Then, as in the last lesson, let's take a TDD approach and create a test function with a set of test cases to validate the behaviour of the CountBytes function.
We begin with func TestCountBytes(t *testing.T) and, again, define some test cases very similar to what we had before: a slice of anonymous structs, each with a name of type string, an input of type string, and a wants of type int.
Then we can add the test code itself, looping over the cases with for _, tc := range testCases and calling t.Run(tc.name, func(t *testing.T) { ... }) for each one. Inside the subtest, we create a new reader from the case's input with r := strings.NewReader(tc.input).
Next, we call counter.CountBytes, passing it the reader, and store the returned byte count in a variable called res.
Then we can use the same test expectation as before: if res != tc.wants, we call t.Fail() and log the mismatch with t.Logf("expected: %d, got: %d", tc.wants, res), where the expected value is tc.wants and the actual value is res. Because wants is an int rather than a string, the %d verb is used for both values.
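For reference, the same table-driven check can be sketched as a single runnable file. Assumptions: to run with go run, the loop lives in a hypothetical runCases helper and reports mismatches with fmt.Printf instead of t.Run, t.Fail and t.Logf, and a named testCase type replaces the anonymous struct. In count_test.go, the loop body would sit inside TestCountBytes(t *testing.T). CountBytes is still the placeholder, so the case fails, which is the expected first step of TDD. The single case uses the five-word happy path added next.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// CountBytes is still the placeholder from the previous step.
func CountBytes(r io.Reader) int {
	return 0
}

// testCase mirrors the fields of the anonymous struct in TestCountBytes.
type testCase struct {
	name  string
	input string
	wants int
}

// runCases performs the same comparison as the test body and returns
// the number of failing cases instead of calling t.Fail().
func runCases(testCases []testCase) int {
	failures := 0
	for _, tc := range testCases {
		r := strings.NewReader(tc.input)
		res := CountBytes(r)
		if res != tc.wants {
			failures++
			fmt.Printf("%s: expected: %d, got: %d\n", tc.name, tc.wants, res)
		}
	}
	return failures
}

func main() {
	testCases := []testCase{
		{name: "five words", input: "one two three four five", wants: 23},
	}
	// Prints "five words: expected: 23, got: 0", then "failures: 1".
	fmt.Println("failures:", runCases(testCases))
}
```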
Then let's add a simple test case. Again, we'll start with a happy path of five words, similar to what we've done before. The input is "one two three four five", and the expected count is 23 bytes: the five words contain 19 letters, plus the four spaces between them. We'll add more test cases shortly.
Now let's head back over to the CountBytes function to make this test pass.
There are several approaches we could take for the implementation. One would be similar to what we did before: create a buffered reader and, instead of reading runes, read the individual bytes, incrementing a count for each one.
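A minimal sketch of that byte-by-byte alternative, assuming bufio.Reader.ReadByte as the reading primitive. The names countBytesLoop and countString are hypothetical; countString only wraps a string literal in a reader for convenience.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countBytesLoop wraps r in a bufio.Reader and counts each byte read.
func countBytesLoop(r io.Reader) int {
	br := bufio.NewReader(r)
	count := 0
	for {
		if _, err := br.ReadByte(); err != nil {
			// io.EOF marks the end of the input; any other error also
			// stops the loop, since this sketch does not report errors.
			return count
		}
		count++
	}
}

// countString wraps a string in a strings.Reader before counting.
func countString(s string) int {
	return countBytesLoop(strings.NewReader(s))
}

func main() {
	fmt.Println(countString("one two three four five")) // 23
}
```

This works, but it needs a loop and explicit end-of-input handling, which is the extra machinery io.Copy lets us skip.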
Whilst this implementation would work, I think there's a much simpler approach: use another function provided by the io package, io.Copy. This function copies from the source reader, its second parameter, into the destination, its first parameter, which is an io.Writer, until either EOF is reached on the source or an error occurs. It returns the number of bytes copied and the first error encountered while copying, if any.
Because of this, we can take the first return value, the number of bytes written, and return it as the number of bytes in our file, or io.Reader. The only catch is that we also need to pass in an io.Writer for the data to be written to.
Fortunately, there are a couple of ways to handle this effectively. To begin, let's call io.Copy, passing nil as the writer for now, along with our reader.
We'll capture the first return value, the number of bytes written, in a variable called byteCount, and ignore the error using the blank identifier.
Underneath this, we return byteCount, converting it to an int, since io.Copy returns an int64.
With our CountBytes function implemented, if we run the tests using the go test command, we get a panic caused by a runtime error: invalid memory address or nil pointer dereference.
This is because we're passing nil as the io.Writer that io.Copy writes to. To solve this, we need to pass in an actual io.Writer for io.Copy to write to.
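The failure can be reproduced in a small sketch. Here CountBytes passes nil as the destination, exactly as in the first attempt; the hypothetical tryCount helper uses recover only so the demo can print the outcome instead of crashing.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// CountBytes as first written: nil is passed as the destination writer,
// so io.Copy panics as soon as it tries to write data to it.
func CountBytes(r io.Reader) int {
	byteCount, _ := io.Copy(nil, r)
	return int(byteCount)
}

// tryCount calls CountBytes and reports whether it panicked.
func tryCount(input string) (n int, panicked bool) {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println("recovered:", err)
			panicked = true
		}
	}()
	return CountBytes(strings.NewReader(input)), false
}

func main() {
	_, panicked := tryCount("one two three four five")
	fmt.Println("panicked:", panicked) // panicked: true
}
```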
We could use something like a bytes.Buffer, but the problem with that is it stores all of the data in memory, which, as you'll remember, would be an issue if we happen to be reading from a large file. Instead, we want a type that is effectively a null writer.
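To make the trade-off concrete, here is a sketch of the rejected bytes.Buffer variant. The names countBytesBuffered and measure are hypothetical. The count is correct, but the buffer ends up holding a full copy of the input, which is what rules it out for large files.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// countBytesBuffered copies r into a bytes.Buffer. It returns the byte
// count plus the number of bytes the buffer still holds afterwards,
// showing that the whole input is retained in memory.
func countBytesBuffered(r io.Reader) (count int, retained int) {
	var buf bytes.Buffer
	byteCount, _ := io.Copy(&buf, r)
	return int(byteCount), buf.Len()
}

// measure wraps a string in a strings.Reader before counting.
func measure(s string) (count int, retained int) {
	return countBytesBuffered(strings.NewReader(s))
}

func main() {
	count, retained := measure("one two three four five")
	fmt.Println(count, retained) // 23 23
}
```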
On Unix-like systems, there's a special file called /dev/null. If I cat it, nothing is printed. I can also echo values to it, such as echo foo > /dev/null, and nothing happens: if I cat /dev/null again afterwards, it still has nothing inside.
On macOS and Linux, /dev/null is a special device file that lives on the system. It is essentially a no-operation file: anything you write to /dev/null is discarded.
Therefore, one approach would be to pass this file to io.Copy as the io.Writer. We can do this by first opening it with os.OpenFile, passing either the /dev/null path or, to be a little more cross-platform, the os.DevNull constant, which names the /dev/null device on Unix-like systems and the NUL device on Windows.
Then we could pass the file to io.Copy. This would work, but it's more complex than it needs to be, because, as I mentioned before, the standard library has a solution for most of the common operations we need.
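A sketch of the null-device approach shows where the extra complexity comes from: opening the file, closing it, and handling an additional error path. The names countBytesDevNull and countString are hypothetical, and returning the error (rather than ignoring it) is a choice made for this sketch, not the lesson's signature.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// countBytesDevNull copies r into the operating system's null device.
// os.DevNull is "/dev/null" on Unix-like systems and "NUL" on Windows.
func countBytesDevNull(r io.Reader) (int, error) {
	devNull, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	if err != nil {
		return 0, err
	}
	defer devNull.Close()

	byteCount, err := io.Copy(devNull, r)
	return int(byteCount), err
}

// countString wraps a string in a strings.Reader before counting.
func countString(s string) (int, error) {
	return countBytesDevNull(strings.NewReader(s))
}

func main() {
	n, err := countString("one two three four five")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // 23
}
```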
In this case, if we look at the documentation for the io package's Writer type, we can see that the io package has a variable called Discard, a Writer on which all Write calls succeed without doing anything. This is effectively Go's no-operation writer, and it works in a similar way to the /dev/null device.
Therefore, we can use the io.Discard writer, passing it as the first argument to io.Copy. Now if I run the tests again, this time there's no panic, and running them with the -v flag confirms that they pass.
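Putting it together, the io.Discard version reduces to a couple of lines. This is a sketch in package main; countString is a hypothetical helper for string input, and, as in the lesson, the error from io.Copy is ignored. Note that io.Discard was added in Go 1.16; older code uses ioutil.Discard.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// CountBytes reports how many bytes can be read from r. io.Copy does the
// reading and returns the number of bytes copied; io.Discard accepts every
// write and keeps nothing, so the data is never retained.
func CountBytes(r io.Reader) int {
	byteCount, _ := io.Copy(io.Discard, r)
	return int(byteCount)
}

// countString wraps a string in a strings.Reader before counting.
func countString(s string) int {
	return CountBytes(strings.NewReader(s))
}

func main() {
	fmt.Println(countString("one two three four five")) // 23
}
```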
With that, we've implemented our CountBytes function in a different way from both CountWords and CountLines, using the io.Copy function together with the io.Discard writer.
For the remainder of this lesson, I'm going to add some additional test cases. If that's something you're interested in, please follow along. Otherwise, if you want to move on to the next lesson, now is a good time to do so, but make sure to commit your code first.
Now back to adding more test cases. All that remains is to add some additional tests to our TestCountBytes test function.
First come the standard edge cases. For an empty file, we use an empty input string, and here we want 0. Next, an input made entirely of spaces: here we use seven spaces, so we want 7. We'll add some newlines as well in a bit.
Next comes a case with newlines and words, with a tab added just to be safe. I'm not entirely sure how many bytes this one is, but counting the characters one by one, I think it's 20.
Let's make sure these edge cases pass. So far, so good. Next, let's add some Unicode characters to check that the number of bytes is being calculated correctly.
For the Unicode characters case, I initially believed we wanted 2, but I wasn't sure how many bytes the character takes, so let's run the test and find out. It's actually three bytes. Let's also add some Cyrillic; I'm not very good at reading Cyrillic, but this one looks pretty cool. For this input, we want five bytes.
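The full set of cases can be sketched as one runnable table, reusing the io.Discard implementation. The inputs are illustrative assumptions, since the transcript doesn't show the exact strings used in the lesson; in particular, the euro sign and the word "Привет" stand in for the lesson's Unicode and Cyrillic inputs. testCase and runCases are the same hypothetical helpers used earlier in place of t.Run and t.Logf.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

func CountBytes(r io.Reader) int {
	byteCount, _ := io.Copy(io.Discard, r)
	return int(byteCount)
}

type testCase struct {
	name  string
	input string
	wants int
}

// Illustrative inputs only; the lesson's exact strings may differ.
var testCases = []testCase{
	{name: "five words", input: "one two three four five", wants: 23},
	{name: "empty input", input: "", wants: 0},
	{name: "only spaces", input: strings.Repeat(" ", 7), wants: 7},
	{name: "newlines and tab", input: "one\ntwo\tthree\n", wants: 14},
	{name: "euro sign", input: "€", wants: 3},      // U+20AC is 3 bytes in UTF-8
	{name: "cyrillic", input: "Привет", wants: 12}, // 6 letters, 2 bytes each
}

// runCases returns the number of cases whose byte count doesn't match.
func runCases(cases []testCase) int {
	failures := 0
	for _, tc := range cases {
		res := CountBytes(strings.NewReader(tc.input))
		if res != tc.wants {
			failures++
			fmt.Printf("%s: expected: %d, got: %d\n", tc.name, tc.wants, res)
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", runCases(testCases)) // failures: 0
	// Bytes and characters differ once the input leaves ASCII.
	fmt.Println(len("Привет"), utf8.RuneCountInString("Привет")) // 12 6
}
```

The last line is why Unicode cases are worth testing: CountBytes counts UTF-8 bytes, not characters.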
With that, our tests are now passing, and we've implemented a function to count the number of bytes in an io.Reader.
However, rather than using the bufio.Scanner or bufio.Reader types, we did this with the io.Copy function and the io.Discard writer: a null, or no-op, writer on which writes do nothing, which we can use wherever an io.Writer is required but we don't need the data.
That covers the two new functions for counting lines and counting bytes, along with test cases proving that they work as expected.
In the next lesson, we'll integrate them into our actual algorithm so that when we run the go run command with a text file, we get back the counts of lines, words, and bytes.
Before we move on, commit the code: add the count.go and count_test.go files and commit them with a message such as "added in new CountBytes function and tests", or anything similar you prefer for your own commit message.
Once that's done, let's move on to the next lesson.