Lockfiles and pid files are essential tools used in programming to manage process concurrency and ensure data integrity. A lockfile allows only one instance of an application to run at a time, preventing data corruption during operations like database access or file manipulation. However, traditional lockfile methods can lead to issues when an application terminates unexpectedly, leaving orphaned lockfiles that require manual cleanup. To address this, pid files can be implemented, which contain the process ID of the running application, allowing for automated recovery and cleanup of orphaned files. By checking if the process associated with a pid file is active, developers can decide whether to remove the pid file, thus simplifying the management of running processes. The lesson outlines the steps for creating pid files, incorporating process management techniques, and presents a functional approach to implementing file locking mechanisms.
If you remember, back in the last module, when we were first looking at how to use signals and perform graceful shutdown, we implemented it with an example looking at lockfiles, where we were able to constrain our application in order to only have one process running at a single time. Lockfiles can be an incredibly useful tool in order to prevent multiple instances of your process from running simultaneously, which can help to ensure data integrity. Some common use cases are for things such as database access, file manipulation, or whenever you want to make sure that you only have one application running.
In my case, I actually use lockfiles with a CLI tool that I've created for screen recording, so that I only have one instance of my screen recorder running at a single time. The idea behind a lockfile is actually incredibly simple. Generally, you try to create a file if one doesn't exist already. In our case, we're doing it here using the os.O_EXCL flag and the os.O_CREATE flag, which, if you remember from the earlier lessons in this module, are used together to exclusively create a file if it doesn't exist already. If the file does exist already when we pass in these flags, then the call returns an error. It actually returns a specific type of error, but we'll talk about that later on when it comes to advanced error handling and how we can check for that.
In this case, if we receive an error, then we assume we couldn't create the lockfile, or we didn't obtain the lock for whatever reason, and the application doesn't proceed. However, if the application did create the lockfile with the exclusive and create flags, then it ensures that the file is removed at the end of the application. To make sure that also happens on an interrupt signal, we intercept the interrupt signal using the Notify function of the signal package, passing in a channel. We then wait on either that channel or the done channel, which tells us the application has finished, and in both cases the lockfile is removed at the end of the process.
In general, the idea is pretty simple. However, lockfiles themselves do have a bit of an issue. To show this in action, if we go ahead and run this code, as you can see, the application is now running and the lockfile has been created. Then when I go ahead and run this code again, you'll see it fails to create the lockfile as another process might be running. If I go ahead and send a SIGINT, then the lockfile is removed because the interrupt signal has been intercepted. So far, this works pretty well.
However, there is an issue with our current lockfile setup that we had in the last lesson. To show what this issue is, if I go ahead and run the application again using the go run command, and this time, rather than sending an interrupt signal, if I use pkill to send a SIGTERM, which has the ID of 15, to the pid file application, you can see the application is terminated, but the lockfile wasn't cleaned up. This is an issue because now our system is in a bad state and I can't run the application again without first removing the lockfile manually. Whilst in some cases this is acceptable and you want to be notified when your application is in a bad state, in other situations it would be nice to be able to recover by default. Therefore, how can we go about doing this?
Well, this is where another type of lockfile comes in, specifically known as a pid file. Pid files are actually very similar to lockfiles, however they have one slight difference. Rather than being empty, they store the ID of the process that created them, which allows us to track and manage the process. Pid files themselves can be used for a number of different purposes, such as acting as lockfiles, or even just to let other applications know the ID of the process that is currently running, which means that they can monitor the process or kill it if they need to.
In our case, we're going to turn our lockfile into a pid file and use it to recover when our application is in a bad state. So to begin, let's go ahead and first define a new file name. Currently we have this hardcoded as lockfile, which could present a bit of an issue: here you can see it appears in two places, and those could diverge. So let's go ahead and change this to be filename and pass it in as follows. We could call this lockfileName, but in this case we're just going to go ahead and set it to be filename.
Next, in order for our pid file to work, we need to obtain our process's ID and write it to the file. So how do we do so? Well, in order to obtain the process ID in Go, we can make use of the os package, specifically the Getpid function, which returns the process ID of the caller. In addition to Getpid, there are also a number of other Get functions, such as Getuid, which returns the numeric user ID of the caller, and Getgid, which returns the numeric group ID of the caller, etc., which we should have an understanding of after the file mode lesson we just had.
In any case, in this situation, we want to obtain the process ID or the pid. So let's go ahead and capture it in a variable called pid. Then we can write this to our actual file, which we can do using the fmt.Fprintln function, or it may be preferable to use the fmt.Fprintf function so that you don't have a newline character. If you do have one, then you're going to need to strip it out later on. To show how that looks, however, let's go ahead and use fmt.Fprintln so that we know how to handle newline characters. We'll pass in our file as the writer, and then pass in the process ID.
Now if I go ahead and run the code and check the directory, you can see we have a pid file inside, which contains the process ID of the application. We can confirm that this is the case by using pgrep on pid file, which as you can see reports the exact same ID. And if we try to run our code again, you can see we can't create the lockfile because the pid file exists. So far, so good. We basically have the exact same setup as we had before. However, this time, because we now have the process ID that owns the lockfile, or the pid file in this case, we're able to do some more things.
For example, if I go ahead and run this code, and then use pkill to send signal 15, the termination signal, to this pid file application, it's terminated but our pid file still remains. So we have an orphaned pid file, but we actually have the process ID that's meant to own it. This means we can put our system into a state of recovery, where we check for the existence of the process that owns this pid, and if it doesn't exist, we can just remove our pid file.
Therefore, let's go ahead and actually implement this to show how it's done. In this case, we're going to create a new function called recoverPidFile, let's say. And we'll just accept the file name of the pid file we want to recover. Then, if an error does occur when creating the file, we can call the recoverPidFile function, passing in the pid file name. In most cases in your application, you'll want to actually continue the process if you are able to recover. In this case, however, we're just going to clean up after ourselves, rather than trying to continue the process. I'll leave that as an exercise for you.
In any case, to recover the pid file, the first thing we need to do is obtain the process ID from it. So let's capture this as data and err, and we'll call the os.ReadFile function, passing in the file name, and then do a quick if error check. If we do receive an error, we'll just call log.Fatalln with failed to read file, and print out the error.
Okay, with that, we now have the actual process ID available to us in the form of data. So the next thing we need to do is turn it into a process ID integer. To do so, we can use the strconv package, which gives us the Atoi function, which will convert a string into an int. In this case, the string we want to parse is the data, which is a byte slice, so we convert it to a string as such. Then let's capture the pid, and if an error occurs, we'll call log.Fatalln as well, with failed to parse pid, and pass in the error. And then we can just do a fmt.Println of pid is equal to pid.
Okay, let's just go ahead and print this out to see what happens. As you can see, we do have our pid file. So if I go ahead and run this code, you can see we get an error: failed to parse pid, strconv.Atoi. I'm going to go ahead and set the log flags here with log.SetFlags(0), just so that the output is a little easier to read. Now if we go ahead and run this code: failed to parse pid, strconv.Atoi, parsing 165772, invalid syntax. This is because we have a newline character at the end of our process ID, which is because we used the fmt.Fprintln function to write our pid to the file, so it wrote a newline character.
Therefore we need to trim this newline character, which we can do using the strings.TrimSpace function, which we've seen at the very beginning of this course, and which removes the whitespace at the start and end of the string. Now if I go ahead and run this, you can see we get pid 165772. So far so good.
Next up we need to check for the existence of this process, and if it doesn't exist then we want to remove the file. To do so let's create a new function called processExists, which will take a pid as an integer and return a boolean, plus an error in case one occurs. Then we can do a quick check, so if processExists, and for the meantime let's have the function just return false and nil.
As for actually checking whether or not a process exists by its ID, we can again use the os package, which provides a lot of tools for working with our system. Specifically we want to use the FindProcess function. If we take a look at the documentation for this function, you can see that FindProcess looks for a running process by its pid. The Process it returns, which is the os.Process type, can be used to obtain information about the underlying operating system process.
On Unix systems, FindProcess always succeeds, which is good to know, and returns a Process for the given pid regardless of whether the process exists. That's a bit of a caveat. Therefore, in order to test whether the process actually exists, we need to see whether the p.Signal method with syscall.Signal(0), so sending a signal of 0 through the syscall package, reports an error. Therefore let's go ahead and actually do this.
So first we're going to capture a process and an error as follows, using the os.FindProcess function to obtain the underlying process. We shouldn't ever get an error here because, as the documentation said, on Unix systems this always succeeds, but if we do, let's return false and wrap the error with the message find process.
Okay, so next we need to see whether or not the process is running, which as the documentation said we can do by using the Signal method, passing in an os.Signal. So far throughout signals we've been using the interrupt or the kill signal, which are two variables available in the os package. However, as you saw, the documentation has us send signals using the syscall package, which is a much lower level package that allows us to perform system calls in Go.
If we take a look at some of the constants of this package, specifically the SIG ones, you can see it provides a number of other signals that we can use in their more low level form. For example, we can use SIGINT, SIGTERM, SIGKILL and even SIGHUP, and all of the other signals here as well. In our case, we just want to pass a null signal, so we can do that by using the Signal type, which is just an integer, and passing in the value of zero.
Therefore we're now sending the signal of zero to our process to test whether it exists. However, we need to capture the return value of this, which is an error. So err, and we'll set this to be equal to the result of process.Signal. However, we don't actually need to keep the error itself, we just want to check whether it is nil, which we can do in a single line. So exists is assigned the result of comparing the error returned by process.Signal with nil. If the error is nil, then we know that the process exists because it was able to receive a null signal. Therefore we can then just return the exists value as follows.
Okay, with that, we should now have a function available for us to check whether a process exists by its given process ID. Therefore we can now go ahead and print this out, saying process exists as follows. Actually, let's capture this as exists and err from processExists, passing in the pid, and then if there's an error we just want to call log.Fatalln with failed to check if the process exists. So we'll log the error, and we can log the exists boolean as well.
Okay now if we go ahead and run this code, you can see that we're getting the PID from the file, and we're checking to see if the process exists, which in this case it doesn't. Therefore we can now go ahead and make use of this knowledge to actually remove our process id, or PID file, in the case that the process doesn't exist. So let's go ahead and do so. If it does exist, we're just going to go ahead and return. Basically we don't want to do anything in the case that this process is actually running. However if it doesn't exist, which is outside of this if block, let's just go ahead and remove the file name as follows.
And we can do a quick if error check, just to be safe: if err is not equal to nil, we'll call log.Fatalln with failed to remove file. And we can get rid of this as well. Then let's just do a nice print line saying PID file cleaned up.
Okay, with that, we can see that we have the PID file in place and it contains the following PID. Now if we go ahead and run our code, you can see that we get the log message PID file cleaned up. Our application is still complaining that the file exists, because we're only recovering after the failure. We can check afterwards that the PID file has actually been removed.
Now let's go ahead and run this code again, and this time send a SIGTERM using pkill on the pid file application. The process has been terminated and the PID file still exists. Now if I go ahead and run this code again, you can see that we failed to create the lockfile, another process might be running. We checked and we cleaned up the PID file yet again, so it no longer exists. Not only this, but our actual locking should still work in the case that two applications run. Yep, failed to create lockfile, another process might be running. Although this time we didn't clean it up, because we checked to see if the process existed, which it does.
Okay, with that, we've managed to change our lockfile implementation to instead use a PID file. And now that this is working, we can see that it's been cleaned up as well. This provides a lot more information than just using a standard lockfile by itself, as it allows us to easily determine the process ID that owns the lockfile, which means we can recover in the event that the file gets orphaned. However, currently our code still fails even when it could recover and proceed. I'm going to leave that as an exercise for you: change this code so that we can use file locking built on a PID file.
In order to get you started, let's go ahead and define what that file lock may look like. You want to start by creating a new struct called FileLock, and it's going to have two public methods. The first is TryLock, which will return an error in the case it can't obtain the lock. For now let's just have it return nil, with a comment saying it tries to obtain the PID file. Then we're going to have a method called Unlock, which will unlock, or remove, the PID file. So first of all, go ahead and move all of the code that we've currently written to open a file and to remove a file, or unlock it, into these two methods.
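As a starting point, the skeleton described here might look like the sketch below. The filename field and the error return on Unlock are assumptions for illustration; the method bodies are deliberately left empty since filling them in is the exercise.

```go
package main

import "fmt"

// FileLock is the skeleton for the exercise.
type FileLock struct {
	filename string // hypothetical field: path of the PID file
}

// TryLock tries to obtain the PID file.
func (l *FileLock) TryLock() error {
	// TODO: move the exclusive create and PID write here.
	return nil
}

// Unlock unlocks, or removes, the PID file.
func (l *FileLock) Unlock() error {
	// TODO: move the file removal here.
	return nil
}

func main() {
	lock := &FileLock{filename: "app.pid"} // hypothetical pid file name

	if err := lock.TryLock(); err != nil {
		fmt.Println("failed to obtain lock:", err)
		return
	}
	defer lock.Unlock()

	fmt.Println("lock obtained")
}
```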
Then for the actual TryLock method, I recommend extending it. So let's add a note: attempt to recover the file if it exists and its PID doesn't, fail otherwise. There we go. So in this case, you'll want to set up your TryLock so that it tries to obtain the PID file. If the PID file exists and the process with that ID doesn't, then it will remove the file and try to obtain the PID file again. Otherwise it will fail. So yeah, as an exercise, go about implementing this FileLock type, and I'll leave some checklists and some hints down below in order to help you understand how to do this. I'll also provide some code snippets and things that I've tried in order to create this actual lockfile. Once you're done, we'll move on to the next lesson, where we're going to take a look at another form of file locking. However, rather than creating a lockfile or a PID file, in this case we'll be locking an actual file that we may be writing to or reading from.