This lesson addresses a file-offset bug that appears when the application handles multiple files or standard input. When the same file descriptor is passed to several count functions, the first one leaves the file offset at the end of the file, so the later ones return incorrect results. The fix is a single-pass algorithm that counts bytes, lines, and words while reading the input only once, rather than iterating over it multiple times. The lesson also stresses the importance of error handling and the limits of unit tests in catching every issue, particularly with standard input. It closes with plans to improve the printed output and to explore more advanced testing and concurrency in later modules.
A few lessons back, we solved a bug in our application code caused by file offsets. Specifically, by passing the same file descriptor to multiple functions, we ran into an issue where the file offset was left at the end of the file, which caused each subsequent count function to return a result of 0.
We solved this using the Seek method to reset the file offset back to the start, which worked for:
- Individual files
- Multiple files
- Standard input stream from a file
However, this doesn't work when:
- We pipe output from another command (e.g. echo)
- We manually type into standard input
echo "one two three" | go run main.go
Or, when running the program and typing the input by hand, ending with Ctrl+D:
one two three
^D
Both result in only printing the number of bytes, not the full count.
Investigating the Problem
When it comes to standard input, seeking isn’t always possible.
If we check the return value from Seek, we can see the error:
_, err := file.Seek(0, io.SeekStart)
if err != nil {
	log.Fatalf("failed to seek: %v", err)
}
When piping:
failed to seek: seek /dev/stdin: illegal seek
Problems with Current Approach
Even when seeking works:
- We're iterating the file three times: once each for lines, words, and bytes
- This is inefficient
- And doesn’t work on non-seekable streams like piped stdin
✅ Solution: Single-Pass Counting
We already count lines by checking runes in the file.
So let’s:
- Replace our use of countWords, countLines, and countBytes
- Iterate through runes once
- Count all three in a single loop
Step 1: Refactor getCounts
Delete all existing logic inside getCounts, and replace with:
func getCounts(r io.Reader) Counts {
	var res Counts
	// implementation here...
	return res
}
Note: we can now remove the ReadSeeker requirement and switch back to io.Reader.
Step 2: Read Each Rune Once
Use bufio.NewReader:
reader := bufio.NewReader(r)
for {
	r, size, err := reader.ReadRune()
	if err != nil {
		break
	}
	res.Bytes += size
	if r == '\n' {
		res.Lines++
	}
	// Count words (explained next)
}
Step 3: Count Words
We'll use a state machine-style flag to track whether we're inside a word:
inWord := false
for {
	r, size, err := reader.ReadRune()
	if err != nil {
		break
	}
	res.Bytes += size
	if r == '\n' {
		res.Lines++
	}
	if unicode.IsSpace(r) {
		inWord = false
	} else if !inWord {
		res.Words++
		inWord = true
	}
}
✅ Benefits
Now that we've refactored:
- All our unit tests pass
- The countWords, countLines, and countBytes tests are ✅
- You can test with:
go test
And with standard input:
echo "hello world" | go run main.go
Or, typing the input by hand and ending with Ctrl+D:
this is stdin input
^D
All work correctly.
🧪 Future Note
Our unit tests don't truly test piped standard input behavior. Later in the course, we'll look at:
- End-to-end testing
- Simulating stdin behavior more thoroughly
📦 Final Cleanup
Since we're no longer using the Seek method, change getCounts to use io.Reader:
func getCounts(r io.Reader) Counts
Update any function signatures or calls accordingly.
✅ Recap
- Replaced multi-pass algorithm with single-pass
- Solved the stdin bug
- Improved efficiency
- Simplified code
💾 Commit Your Changes
git add count.go
git commit -m "added in single pass algorithm"
With that, we’re ready to move on to the next lesson where we’ll improve the printing output using tabular alignment.