The lesson focuses on developing a word count function in Go and addresses challenges related to accurately counting words in various input scenarios. Initially, the function mistakenly counts spaces instead of words, leading to incorrect results, particularly when handling multiple spaces and newline characters. The lesson introduces a solution by leveraging the Go standard library's strings.Fields function to simplify word counting, which splits strings around whitespace characters. The instructor emphasizes the importance of utilizing existing library functions to improve code readability and maintainability. Additionally, the lesson transitions towards best practices in testing by suggesting the separation of test code from application code, moving towards black-box testing. Finally, it hints at future enhancements, including error handling and expanding functionalities for a comprehensive command-line application.
In the last lesson, we managed to change our test code in order to support table-driven tests, which are a form of parameterised testing, allowing us to easily add new test cases in order to test our count words function. However, doing so has shown us that our code doesn't work correctly, and given various different inputs, it will produce the wrong count of words.
For example, if we go ahead and test this code, you can see in two of our expectations, one where we have a new line character in the middle of our sentence, and another where we have multiple spaces after a full stop, then our code doesn't work. Not only this, but there are other parameters we can pass in that will just cause our code to break.
So let's say we add a case named prefixed multiple spaces, with an input of, say, two spaces followed by hello. In this case, we want 1, but we're probably going to get 3. That's not great. And it's the same thing if we have a suffix, or a postfix, of multiple spaces. So let's add a case named suffixed, and we'll go ahead and just do hello, this time followed by one, two, three, four, five spaces, say. Now, again, if we test this code, you'll see that our test isn't passing. In this case, we're getting 7 when we're expecting 1.
These issues are due to the fact that there's a fundamental problem when it comes to our actual algorithm's implementation, in that it's not actually counting the number of words. Instead, it's counting the number of spaces, and we're deriving the word count from that. As we've seen, though, spaces and words are only loosely coupled, so one can't always be easily derived from the other.
So how do we actually solve this? If we take a look at the words.txt, we can make a better deduction of how many words there are in this file based on some properties. Initially, we were making the assumption that every word except the last is followed by exactly one space, which isn't actually true. The number of spaces doesn't equal the number of words minus one. Instead, spaces are used to separate words. We could have multiple spaces in between, but we do know that at least one space will always separate two words. Therefore, we can actually build an algorithm based off of this.
To take a look at what that algorithm looks like, let's head back on over to our countWords function, and let's just go ahead and remove all of our existing code. Then let's go ahead and define a word count, and we'll just go ahead and return it at the end of our function as follows.
Then, taking what we know, that spaces separate words, let's think about how we can actually figure out how to solve this. Given that we know words are going to be separated by a whitespace character, let's just begin by determining whether or not the character we're currently iterating through is actually a space. So, to do so, let's begin by iterating over our slice of bytes as we were doing before, using the following for range expression.
Then, we can go ahead and do a simple isSpace := x == ' ' as follows. Okay, with that, our code is similar to what it was before, where we're checking each character to determine whether or not it's a space. So, how can we make use of this information to actually derive our word count?
Well, if we head back on over to our words.txt, let's take a look at what our code is actually doing. It's starting here and going isSpace? no. isSpace? no. isSpace? no. Then, it's isSpace? yes. Next, it goes back to isSpace? no. So, we can actually make a determination based on the fact that the state of isSpace is changing.
For example, if we had multiple spaces, we could go isSpace? no, isSpace? no, isSpace? no, isSpace? yes. Then, it's yes, yes, yes, and back to isSpace? no. Therefore, every time the isSpace value changes from its previous state, so before it was no, no, no, no, but now it's yes, we know we're crossing a boundary between a word and the whitespace around it.
So, let's go ahead and change our algorithm to make use of that fact. First, we need a variable for the previous value, which we could call wasSpace, and in this case, we'll just set it to be false. Actually, we'll go ahead and set this to be true, so that a word at the very start of the input still counts as a change from space to not space. Then, at the end of our for loop, let's just go ahead and set wasSpace to equal isSpace.
Okay, with that, we're now tracking the previous state of our isSpace expression. Then, all we need to do is check if wasSpace && !isSpace, and if so, increment our word count by one with wordCount++. This will represent the state change from the previous expression to the new expression. So, if we were in a space, but we're currently not in a space, then we know that we've entered a word.
Now, we should be able to test our code and it should work, but before we do so, let's go ahead and just use the go run command, which still produces the number 5. Now, if we go ahead and test our code, let's see how many tests now pass. Almost all of them. If we go ahead and run the go test command with the -v flag, we can actually see that most of our edge cases are now working. Except for one.
So, which test is actually failing? Well, if we take a look at our test cases in main_test.go, you can see it's this one, new lines. This is because we have a new line character in between our words, and we're only currently checking for spaces, not for new line characters. So, let's go ahead and actually fix this.
To do so, we could just check to see if x is a space, or if x is a new line character, given that new line characters are technically considered white space.
isSpace := x == ' ' || x == '\n'
Now, if we go ahead and test this code, our code should now work. Which it does. So far, so good.
However, you may be thinking, how many characters do I have to consider? And this is something to be aware of. There are more than just two white space characters when it comes to our code. For example, let's say we have a tab character, \t, which we can go ahead and add in a test for. Let's go ahead and define one with the name tab character in code. And we'll go ahead and set the input to be Hello\tWorld. And we'll also add a new line character in as well, just to be a little more expressive.
In this case, we're expecting there to be two words. But when we go ahead and test this code, using the go test command, you'll see it actually produces the value of one. So we could go about solving this by adding in yet another check to see if this is a \t. However, at this point, we're going to be adding in a lot of exceptions, given the fact that there are many white space characters when it comes to UTF-8 encoding.
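To make the tab problem concrete, here is a small sketch of the version that checks for spaces and new lines only. The input places the new line at the end of the string, which is an assumption, since the narration doesn't say exactly where it goes.

```go
package main

import "fmt"

// countWords treats only ' ' and '\n' as separators, mirroring the
// hand-written check at this point in the lesson.
func countWords(data []byte) int {
	wordCount := 0
	wasSpace := true

	for _, x := range data {
		isSpace := x == ' ' || x == '\n'
		if wasSpace && !isSpace {
			wordCount++
		}
		wasSpace = isSpace
	}

	return wordCount
}

func main() {
	// The tab is not recognised as whitespace, so "Hello\tWorld" is
	// seen as a single word and this prints 1 rather than 2.
	fmt.Println(countWords([]byte("Hello\tWorld\n")))

	// A new line between words is handled, so this prints 2.
	fmt.Println(countWords([]byte("Hello\nWorld")))
}
```

Every additional whitespace character would need its own entry in that condition, which is the maintenance burden being described.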
So whilst this algorithm does work, there's actually an easier way for us to be able to solve this, which is to make use of the Go standard library. The Go standard library provides a lot of functions for commonly used algorithms and tasks. And believe it or not, counting words or fields is a pretty common operation.
In fact, the strings package of the Go standard library actually provides a function for us to be able to do this, which is the Fields function. If we take a look at its documentation, Fields will split a string s around each instance of one or more consecutive white space characters, as defined by the unicode.IsSpace function, returning a slice of substrings of s, or an empty slice if s contains only white space. This is pretty much what we're trying to do already. We're trying to separate our string around each instance of a white space character, and we're tracking the state changes in order to understand whether or not it's a word.
If we take a look at the actual example, you can see this more in action, where it's calling the Fields function on foo bar baz, three words, and it ends up returning each individual word. If we take a look at the unicode.IsSpace function, you can see the actual space characters that are defined. In this case, it's actually only defining a few, which we could each check for individually, although there are some caveats when it comes to Unicode.
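As a quick illustration of the behaviour described in the documentation, the sketch below calls strings.Fields on a string with irregular spacing and checks a couple of characters with unicode.IsSpace. The helper name fieldCount is hypothetical and only used here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// fieldCount is a hypothetical helper that returns how many
// whitespace-separated fields s contains.
func fieldCount(s string) int {
	return len(strings.Fields(s))
}

func main() {
	// Leading, trailing and repeated spaces are all ignored.
	fmt.Printf("%q\n", strings.Fields("  foo bar  baz   ")) // ["foo" "bar" "baz"]

	// Tabs and new lines count as whitespace too.
	fmt.Println(fieldCount("Hello\tWorld\n")) // 2

	// unicode.IsSpace is what decides which characters separate fields.
	fmt.Println(unicode.IsSpace('\t'), unicode.IsSpace('x')) // true false
}
```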
Therefore, let's just go ahead and replace our code with a call to this function instead. First, we'll import the strings package into our code, as follows, and then let's go ahead and define a new variable called words, which we'll assign to the return value of the strings.Fields function, passing in a stringified version of our actual data slice, as follows.
Then we can just go ahead and return the length of these words, and that should give us the count of words inside of the slice of bytes. If we now go ahead and test this with go test, you can see our tests are working as expected, and if I add the -v flag, we can see that it's running each of the individual tests, and they're all passing.
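A minimal sketch of the function after this change, assuming the same countWords signature that takes a slice of bytes; the sample inputs in main are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords converts the bytes to a string and lets strings.Fields
// split it around runs of whitespace; the number of fields is the
// number of words.
func countWords(data []byte) int {
	words := strings.Fields(string(data))
	return len(words)
}

func main() {
	fmt.Println(countWords([]byte("one two three four five"))) // 5
	fmt.Println(countWords([]byte("Hello\tWorld\n")))          // 2
	fmt.Println(countWords([]byte("   ")))                     // 0
}
```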
As you can see, this is much more concise, and it's pretty much doing the exact same thing we were doing before, although it is taking a little bit more memory. However, the readability of this code is greatly improved, and we're not having to deal with any edge cases for all of the different white space characters that we may be encountering.
That being said, this code also has some issues as well, which we're going to take a look at more in the next module, and how we can go about resolving them, kind of actually going back to our original algorithm. But for the meantime, we're going to leave this code as it is, because, well, it's a lot easier for us to read, and it showcases some of the power when it comes to the standard library of Go, which provides many different functions that we can actually use in order to understand how to break up strings, and to work with other values.
If you remember from earlier, I mentioned that casting from a string to a slice of bytes, and casting from a slice of bytes to a string, is a non-zero cost operation, and therefore this is actually going to be incurring a very slight cost when it comes to our code. Additionally, the strings package isn't the only package that provides a Fields function.
If we take a look at the standard library, and head on over to the bytes package, which is a package for dealing with bytes, whereas the strings package is a package for dealing with strings, you can see that this package also provides a Fields function, which does pretty much the exact same thing. It splits our byte slice around each instance of one or more consecutive white space characters.
So let's actually go ahead and replace our call to the strings.Fields function with a call to the bytes.Fields function. First, we import the bytes package, and then we use the bytes.Fields function instead, which means we no longer need to cast our data into a string. This will make our code just that little bit more concise, and if we go ahead and test it, you can see it's working as it did before.
This time, however, we're making use of a package suited for bytes, and we no longer have the overhead of casting our slice of bytes to a string. With that, we've managed to implement a simple word counting algorithm within our application, which if we actually go ahead and run with our words.txt file, you can see is working correctly.
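For reference, the bytes-based version might look like the sketch below. It assumes the same countWords signature as before, and main simply exercises it on a couple of in-memory inputs rather than reading words.txt.

```go
package main

import (
	"bytes"
	"fmt"
)

// countWords splits the byte slice around runs of whitespace with
// bytes.Fields, so no conversion to a string is needed.
func countWords(data []byte) int {
	words := bytes.Fields(data)
	return len(words)
}

func main() {
	fmt.Println(countWords([]byte("one two three four five"))) // 5
	fmt.Println(countWords([]byte("  hello\n\tworld  ")))      // 2
}
```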
As you can see, whenever it comes to problems that you may encounter within your Go code, the standard library typically has a solution that we can use, and provides many different packages with many different functions related to the type that you're using. In most cases, whenever you're dealing with strings, you can use the strings package, or if you're dealing with bytes, as we are, we can use the bytes package.
In fact, the standard library is so powerful, especially when it comes to building command line applications, that we're going to be making use of it through the majority of this course. And from the 10th module of this course, we'll start looking at other third-party packages that will make building command line applications even more powerful.
In any case, that wraps up the first module of building command line applications with Go, where we now have a very simple countWords function, and also a very simple application that will count the number of words in a given file. Before we move on, there's just one thing I want to quickly change when it comes to our test code. Currently, our test code is living inside of the same package as our application code, the main package.
Whilst there's nothing inherently wrong with this, it's a personal preference of mine to try to keep tests as separate from your main package as possible. The reason for this is that it forces you to end up testing the public interfaces of your code rather than having access to any of the internals. Currently, in our case, it doesn't really matter too much because all of our code is inside of a public interface. But it's still, in my opinion, a good habit to force as I believe it makes code less brittle when it comes to tests.
Therefore, before we go ahead and make some changes to this test code, which we're going to do in the next module, specifically when it comes to supporting some of the types we're going to be adding in, let's go ahead and change this from being a white box test, which means it has access to the internals of the package to a black box test. We'll talk more about white box testing versus black box testing in the module on advanced testing.
However, for the meantime, let's just go ahead and change our code by first adding the suffix _test to the package name. This tells the Go toolchain that this is now a test package and shouldn't have access to the internals of the package it lives alongside. As you can see, our countWords function is now undefined. Now, in order to be able to call this function, we need to import our package as if we were a package consumer, which we can do by using the module's name, github.com/dreamsofcode-io/counter.
Then we now have access to the main package, which is what we've imported, and we can just call main.CountWords, keeping in mind that a function has to be exported, with a capitalised name, to be reachable from outside its package. Whilst this approach works, personally, I feel a little bit icky calling the main package whenever it comes to code. Fortunately, Go provides us the ability to rename a package when we import it, which we can do by just specifying the new package name before the import path. In this case, I want to call it counter, which is the last path component of the actual package name.
Now, we can reference this as counter. By the way, renaming is available for any package import. It's very useful when it comes to things such as naming conflicts, when you import two packages that have the same name, and we can apply it to pretty much any package, including those found in the standard library. For example, let's say we want to rename the testing package to be foo. We can do so as follows, and now we could reference the foo.T type instead of the testing.T type, although I only recommend doing this when you have a conflict, or when a package's name doesn't match the last component of its import path, as we saw with the main package.
When we restructure our code later on, we'll actually take a look at how we can solve that issue for other people that want to consume our package as well, but that's going to be at the end of module three. In any case, with that, we've now set up our code to be in a good state when we go to make changes to the test function later on, ensuring that we're only ever testing the public methods of our main package.
Before we move on, go ahead and commit your code. In the next lesson, we're going to begin improving on what we already have by first making sure that we're handling any errors when it comes to our code, and by continuing to improve on this code, adding in more command line interface functionality, such as being able to add command line arguments, being able to parse command line flags, and adding in new features to count more than just the number of words in a file, but also other attributes, such as the number of lines and the number of bytes.
In any case, now's a good time to go ahead and commit your code, which we can do by using the git add command, adding in the main.go file and the main_test.go file, and we can then go ahead and give this a commit message of, say, converted counting words properly using the fields function. And with that, we're ready to move on to the next module where we're going to start taking this code and turning it into a full featured CLI application.