Developing a reliable file processing application in Go requires careful handling of file offsets, especially when counting attributes such as bytes, words, and lines. Each read operation on a file updates the file descriptor's offset, which can lead to issues when the same file descriptor is used across multiple counting functions. This lesson explores how to address this offset problem by using the Seek method from the os package, together with the whence constants from the io package, to reset the file's read location before performing subsequent counts. By implementing a cleaner design using a GetCounts function that consolidates the counting logic and accepts an interface combining both reading and seeking capabilities, the application can handle data from both files and standard input streams seamlessly. Additionally, incorporating unit tests helps ensure that the functionality remains intact as the code evolves.
At the end of the last lesson, we managed to get our word counter returning the three attributes that we were counting inside of our file. However, when we ran this code using the go run command, passing in words.txt, you'll remember that we were only receiving one of the three attributes in our results. In this case it's the number of bytes, which is 24.
However, if I change the ordering of our three count functions so that the number of lines is counted first, and then run the code again, this time you'll see we only get the number of lines rather than the number of bytes. The other two values are zero. This is happening due to the way the kernel handles open files.
Whenever a file is opened, a file descriptor is created and maintained by the kernel. When we read from or write to that file, we're actually using that file descriptor, and the kernel fetches the data for us. The file descriptor has an associated offset, which is the point in the file where we would next read from or write to. Every time we read from the file, the offset moves forward by the number of bytes that we read, which is what allows us to read all of the contents of the file across successive read operations.
The problem, however, is that within the first function (let's say it's back to being CountBytes), by the end of the function the file's offset has reached the end of the file, which is what causes the function to stop reading. Unfortunately, we then pass the same file descriptor to our CountWords function, which attempts to count all of the words in the file, but the offset is already at the end, so there are no more words to read.
This is why our go test command passes, meaning we know that the individual functions are producing the correct values, while our actual CountFile function, which combines all three, doesn't work because of the offset bug. So in order for our new features to work, we need a solution to this offset problem.
There are actually a few different approaches we can take. We're going to look at one at the end of this module when we rework our algorithm to be more performant. For the time being, however, let's take a less performant approach, but one that's a little easier, especially for our current code setup. This approach is to just reset the offset back to the beginning.
To see how we can do this, let's head over to the documentation for the os package and jump down to the File type. If we look at the methods available, we can find one called Seek. This method sets the offset for the next read or write on the file to the given offset, interpreted according to whence.
We could use this method to:
- set the offset to the beginning
- set the offset to the end
- set the offset to any arbitrary byte
We’ll use it to set the offset back to the beginning.
Let's head back to our CountFile function and use this new method. Underneath the CountBytes call, add:
const offsetStart = 0
file.Seek(offsetStart, io.SeekStart)
Go used to have:
os.SEEK_SET
os.SEEK_CUR
os.SEEK_END
These are now deprecated. Instead, we use:
io.SeekStart
io.SeekCurrent
io.SeekEnd
Now if we test this, we should see both bytes and words appear correctly, confirming that the Seek method is solving the issue. Apply the same Seek call before calling CountLines, and all three values should show up and match the output of the wc command.
However, there's still a problem in our main.go file. It works when passing a file, but not with stdin.
Example of using stdin:
go run main.go < words.txt
You'll see only the word count appears, with a value of 5. That's because we haven't updated the stdin case to use the new CountFile logic.
But we can't just pass os.Stdin to CountFile, because it expects a filename, not a file stream.
So let’s replicate the current logic:
bytes := CountBytes(os.Stdin)
os.Stdin.Seek(0, io.SeekStart)
lines := CountLines(os.Stdin)
os.Stdin.Seek(0, io.SeekStart)
words := CountWords(os.Stdin)
This works, but the code is now:
- hard to read
- duplicated
All we need is a function that takes an *os.File (or any compatible stream) and returns a Counts struct.
Let’s make a function:
func GetCounts(f *os.File) Counts {
	// logic goes here
}
In GetCounts, copy the logic from CountFile, but change file to f.
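Return a struct, keeping the Seek calls between the counts so that each function starts reading from the beginning:
bytes := CountBytes(f)
f.Seek(0, io.SeekStart)
words := CountWords(f)
f.Seek(0, io.SeekStart)
lines := CountLines(f)
return Counts{
	Bytes: bytes,
	Words: words,
	Lines: lines,
}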
Now CountFile can become:
counts := GetCounts(file)
return counts, nil
And in main.go:
counts := GetCounts(os.Stdin)
fmt.Println(counts.Lines, counts.Words, counts.Bytes)
This works for both stdin and actual files.
But we're not testing GetCounts. Let's fix that.
Create a new test:
func TestGetCounts(t *testing.T) {
	testCases := []struct {
		name  string
		input string
		want  Counts
	}{
		{
			name:  "simple five words",
			input: "one two three four five\n",
			want: Counts{
				Lines: 1,
				Words: 5,
				Bytes: 24,
			},
		},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			r := strings.NewReader(tc.input)
			res := GetCounts(r)
			if res != tc.want {
				// Errorf logs the message and marks the test as failed.
				t.Errorf("expected %+v, got %+v", tc.want, res)
			}
		})
	}
}
This fails to compile, though: GetCounts expects an *os.File, and a *strings.Reader is a different type.
What a *strings.Reader does share with *os.File is that it:
- is an io.Reader
- also implements Seek
We need GetCounts to accept any type that provides both.
Let’s define an interface:
type seekerReader interface {
	io.Reader
	io.Seeker
}
Use this in your GetCounts function:
func GetCounts(r seekerReader) Counts
Even better, use io.ReadSeeker from the standard library, which already combines the two:
func GetCounts(r io.ReadSeeker) Counts
Now your test will work.
Finally, we can reuse existing test cases by swapping out:
CountWords(r)
with:
GetCounts(r).Words
This is a bit of a shortcut, but lets us reuse test logic while preparing for future refactors.
✅ Now:
- We fixed the file offset bug.
- We support both stdin and file input.
- We have test coverage for our combined logic.
- We reused and extended test cases.
In the next lesson, we'll fix the final issue, the total counts, which currently only cover the word count. When running:
go run main.go words.txt lines.txt
it only prints 10, but we want all three attributes.
So go ahead and commit:
- count.go
- count_test.go
- main.go
💾 Commit message suggestion: refactor: add GetCounts to fix offset issue and unify logic
See you in the next lesson!