
Nagios Plugins Really Are That Easy. 

Plugins. They aren’t scary; they’re not that hard to create, but it is fun to say a few times fast.

February 12, 2024

For those who prefer video format, watch our webinar covering this topic.

In this post we’ll show you how to monitor all your tech and make it do your bidding. Before the robots come to get us, at least we can set some warning and critical thresholds to enjoy the current state of things a little while longer.  

Let’s ask ourselves some basic questions about Nagios Plugins and walk through how simple it is to create your own.  

So, first things first: what are Nagios Plugins?

To put it as simply as possible, Nagios Plugins are responsible for monitoring the metrics that you’re interested in. Plugins contain the instructions needed to pull the metrics from your systems or applications. 

Also, fun fact: Nagios Plugins are used to monitor hosts, not just services! A great example is the standard ping plugin that Nagios uses to monitor whether a system is up or down. Hosts don’t use the OK, warning, critical, and unknown statuses. Instead, they are up, down, or unreachable. 

Nagios Plugins will also contain logic to determine if that metric is in an OK, warning, critical, or unknown state. 

Plugins essentially are little monitoring tools, wrapped into a tiny package. Isn’t that cute?  
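
To make that concrete, here is a minimal sketch of how the four service states line up with plugin exit codes (0 for OK, 1 for warning, 2 for critical, 3 for unknown). The `$NagiosStates` table and the `Get-NagiosExitCode` helper are hypothetical names used only for illustration; they are not part of Nagios.

```powershell
# Hypothetical lookup table: service state name -> plugin exit code.
$NagiosStates = @{
    OK       = 0
    WARNING  = 1
    CRITICAL = 2
    UNKNOWN  = 3
}

# Hypothetical helper: return the exit code for a state name,
# falling back to UNKNOWN (3) for anything unrecognized.
function Get-NagiosExitCode {
    param([Parameter(Mandatory=$true)][string]$State)

    if ($NagiosStates.ContainsKey($State)) {
        return $NagiosStates[$State]
    }
    return $NagiosStates['UNKNOWN']
}

# A plugin prints one line of status text, then exits with the matching code:
#   Write-Host "WARNING: disk is 85% full"
#   exit (Get-NagiosExitCode -State 'WARNING')
```

Falling back to unknown for anything unexpected keeps a typo from being reported as a healthy system.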

Considerations for your Nagios Plugin.

There are a lot of tutorials out there on how to create Nagios Plugins, but many of them fail to show just how simple plugins are to make. They gloss over a lot of the basics and, even worse, can make writing a Nagios Plugin seem like a very complex endeavor.  

So, let’s dispel some assumptions that go into creating a Nagios Plugin.  

Here are some considerations you need to be aware of.  

1. Where does your plugin actually live?

Location is the most important consideration when creating your Nagios Plugin. 

Ask yourself: Is this a plugin that resides on the Nagios server, the server being monitored, or somewhere in between? 

The location is important to consider because that will dictate the language you will use to write the plugin and how the plugin is called by Nagios. 

For example, if the location of the plugin is a Windows system, that can drastically change your options compared to a Linux system. On a Windows system, you might be able to run PowerShell scripts, compiled C# applications, etc., but maybe not Perl, Bash, or Python scripts.  

Similarly, how you call a plugin on a Windows system would be different from a Linux system. On Windows, you might choose to use the custom plugin feature of the Nagios Cross-Platform Agent (NCPA), whereas on Linux, you might choose to use the check_by_ssh plugin that comes with the standard set of Nagios Plugins. 

2. Using Nagios Plugins for Proxy checks.

Take the standard example of a ping check. 

With a standard ping check, you’re checking the connection from the Nagios server to Host A and from the Nagios server to Host B. But what if you wanted to gather the metrics on the connection between Host A and Host B? 

A ping plugin that lives on Host A could be called with Host B as the target. 

Nagios then gets the result of the connection between Host A and Host B with the Nagios connection completely removed from it. That way you can get a better idea of what the network looks like between two points rather than what the network looks like from Nagios to those points. 
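
As a rough sketch of that proxy idea, the plugin below is meant to live on Host A and ping Host B. The `Get-PingState` helper and the `hostb.example.com` name are hypothetical placeholders, and the commented usage assumes the `Test-Connection` cmdlet with its `-Quiet` switch is available on Host A.

```powershell
# Hypothetical helper: turn a reachability result into an exit code and status line.
function Get-PingState {
    param(
        [Parameter(Mandatory=$true)][string]$Target,
        [Parameter(Mandatory=$true)][bool]$Reachable
    )

    if ($Reachable) {
        return @(0, "OK: $Target is reachable from this host")
    }
    return @(2, "CRITICAL: $Target is not reachable from this host")
}

# On Host A, the plugin body would look roughly like this (Host B is the target):
#   $reachable = Test-Connection -ComputerName 'hostb.example.com' -Count 2 -Quiet
#   $state = Get-PingState -Target 'hostb.example.com' -Reachable $reachable
#   Write-Host $state[1]
#   exit $state[0]
```

Because the ping originates on Host A, the result says nothing about the path from the Nagios server, which is exactly the point.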

3. Programming language

Which language can I use to write my Nagios Plugin? Nearly all of them.  

Currently, the Nagios Plugin development guidelines push you towards using a compiled language like C for performance. 

However, Nagios was created back when CPU cycles were far less abundant than they are today, so raw speed matters less now. You might still want to write your plugin with one eye on performance (don’t go crazy with the nested “if” statements). 

Remember what really matters. Place your priority on getting the right metrics and statuses that you want monitored rather than on the language that you choose to write your Nagios Plugin in. 

If you find that your Nagios Plugin needs to be a little more efficient, you’ll find that, more likely than not, there’s something that can be cut out of your plugin before you move to a different language. 

Almost all programming languages provide you with the ability to output an exit code, an important piece in a Nagios Plugin. So, choose whatever language you’re most comfortable with, or whatever makes the most sense for the device that you’re trying to monitor. 

Next, how is the plugin going to be called? Notice that we say “called” rather than “executed” because that can be confusing within the topic of programming languages. 

Essentially, how is Nagios going to call this plugin? 

As mentioned before, this has quite a bit to do with where the plugin is located. 

If it lives on a Windows machine, you might have the plugin residing in NCPA’s plugins directory, and you can use the check_ncpa.py plugin to call your custom plugin.  

If your plugin lives on a Linux system, you can use NCPA, or you can use the check_by_ssh plugin to call your custom plugin. 

Oftentimes, if the plugin is located on the Nagios server, it has all the logic needed to reach out to the application that’s being monitored. For example, if you’re monitoring SQL Server, a plugin will typically connect to SQL Server’s listener on port 1433.  

To summarize, Nagios reaches out to a system, queries it, and brings back the metric when the plugin lives on the Nagios server. Super easy. 

4. So… why would you want a Nagios Plugin to live on a remote Host? 

A few reasons. 

Maybe you have plugins living on the remote Host for the purpose of executing them as passive service checks, where the remote Host will schedule the execution of that plugin and just the data is sent back to the Nagios server. 

Another reason could be that you’re more comfortable with Windows or are a PowerShell developer, and you don’t want to go through learning Bash. Go ahead and create your Nagios Plugin in PowerShell. 

Finally, let's walk through the building of a Nagios Plugin.

As an example, let’s create a plugin that monitors the usage of a CPU. This plugin can be found on the Nagios Exchange, but you can get it here. 

In this example, we are on a Windows server that has a bunch of PowerShell plugins. 

I’m taking advantage of PowerShell’s comment-based help here to document the plugin’s description, notes, parameters, and examples. 

<# 
.DESCRIPTION 
A PowerShell based plugin for Nagios and Nagios-like systems. This plugin checks the CPU utilization on Windows machines. This plugin gives you the average CPU usage across all CPUs and all cores.
Remember, thresholds must be breached before they are thrown.
E.g. -warning 80 will need the CPU utilization to be 81 or higher to throw a WARNING. 
.SYNOPSIS 
A PowerShell based plugin to check CPU utilization on Windows machines 
.NOTES 
This plugin does not have the option to show individual utilization per CPU or per core. 
This plugin will return performance data. 
.PARAMETER warning 
The CPU utilization you will tolerate before throwing a WARNING 
.PARAMETER critical 
The CPU utilization you will tolerate before throwing a CRITICAL 
.EXAMPLE 
PS> .\check_cpu.ps1 
.EXAMPLE 
PS> .\check_cpu.ps1 -warning 80 -critical 90 
#>

Thanks to PowerShell, we’ll be able to see all the parameters, examples, the syntax, and all the good stuff. 

We just have two parameters for warning and critical. 

param( 
    [Parameter(Mandatory=$false)][int]$warning = $null, 
    [Parameter(Mandatory=$false)][int]$critical = $null 
) 

We recommend always initializing your exit message and your exit code up front, with the exit code set to 3, which means unknown for a service check. Then, let your plugin go through the logic of checking the metric and overwrite the message and exit code. 

$message = "Nothing changed the status output!" 
$exitcode = 3 

This can avoid a situation where you run your plugin, and nothing changes the exit code. Your plugin says everything is OK on the system while your CPU is hot enough to fry an egg. 
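
The same default-to-unknown idea can be sketched as a small wrapper. `Invoke-SafeCheck` is a hypothetical helper, not part of the plugin in this post; it assumes the check logic is passed in as a script block and only reports OK when that logic finishes without an error.

```powershell
# Hypothetical wrapper: start at UNKNOWN and only change state when the check succeeds.
function Invoke-SafeCheck {
    param([Parameter(Mandatory=$true)][scriptblock]$Check)

    # Pessimistic defaults, mirroring the plugin's initial message and exit code.
    $result = @{ ExitCode = 3; Message = "UNKNOWN: nothing changed the status output!" }

    try {
        $value = & $Check
        # Only a completed check is allowed to overwrite the defaults.
        $result.ExitCode = 0
        $result.Message  = "OK: value is $value"
    }
    catch {
        # A failed check stays at exit code 3 and reports why.
        $result.Message = "UNKNOWN: check failed: $($_.Exception.Message)"
    }

    return $result
}

# Usage: a check that works, and one that throws.
$good = Invoke-SafeCheck -Check { 42 }
$bad  = Invoke-SafeCheck -Check { throw 'sensor not found' }
Write-Host $good.Message
Write-Host $bad.Message
```

A real plugin would apply its warning and critical thresholds inside the `try` block instead of always reporting OK.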

Here is the function that takes the check result and decides whether the status is OK, warning, or critical. 

function processCheck { 

    param ( 
        [Parameter(Mandatory=$true)][int]$checkResult, 
        [Parameter(Mandatory=$true)][int]$warningThresh, 
        [Parameter(Mandatory=$true)][int]$criticalThresh, 
        [Parameter(Mandatory=$false)][string]$returnMessage 
    ) 
 
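    # Bare cast with no assignment: its null result is emitted as the first item of 
    # the function's output, which is why the caller reads indices 1 and 2. 
    [array]$returnArray 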
    if ((!$criticalThresh) -and (!$warningThresh) ) { 
 
        $returnArray = @(0, "OK: $returnMessage") 
    } 
    elseif ($checkResult -gt $criticalThresh) { 
 
        $returnArray = @(2, "CRITICAL: $returnMessage") 
    } 
    elseif ($checkResult -le $criticalThresh -and $checkResult -gt $warningThresh) { 
 
        $returnArray = @(1, "WARNING: $returnMessage") 
    } 
    else { 
 
        $returnArray = @(0, "OK: $returnMessage") 
    } 
 
    return $returnArray 
 
} 
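
To see the comparison logic in isolation, here is a simplified standalone sketch. `Get-ThresholdState` is a hypothetical helper that returns only the state name rather than the code-and-message array used above, and the values 80 and 90 are example thresholds.

```powershell
# Hypothetical standalone version of the threshold comparison.
function Get-ThresholdState {
    param(
        [Parameter(Mandatory=$true)][int]$Value,
        [Parameter(Mandatory=$true)][int]$Warning,
        [Parameter(Mandatory=$true)][int]$Critical
    )

    # No thresholds supplied (both are 0): nothing to breach, so report OK.
    if ((-not $Critical) -and (-not $Warning)) { return 'OK' }

    # A threshold must be exceeded, not merely reached, to change the state.
    if ($Value -gt $Critical) { return 'CRITICAL' }
    if ($Value -gt $Warning)  { return 'WARNING' }
    return 'OK'
}

Get-ThresholdState -Value 95 -Warning 80 -Critical 90   # CRITICAL
Get-ThresholdState -Value 85 -Warning 80 -Critical 90   # WARNING
Get-ThresholdState -Value 80 -Warning 80 -Critical 90   # OK (80 does not breach 80)
Get-ThresholdState -Value 50 -Warning 0  -Critical 0    # OK (no thresholds)
```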

Here’s the logic that reads the load percentage of each processor and works out the overall CPU usage.   

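# Wrap in @() so a single processor still gives us an array with a Count. 
$cpus = @((Get-CimInstance -ClassName Win32_Processor -ComputerName localhost).LoadPercentage) 
$cpuusage = 0 
 
# Add up the load percentage reported by each processor... 
foreach ($cpu in $cpus) { 
    $cpuusage += $cpu 
}
# ...then divide by the processor count so the result is an average, as the description says. 
if ($cpus.Count -gt 1) { 
    $cpuusage = [int][math]::Round($cpuusage / $cpus.Count) 
}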

Strictly speaking, Nagios only needs the exit code to determine the state. We could exit without implementing these return messages and just use an exit code of zero through three, depending on what the state is.  

The power of an exit message here, however, is that we can display and separate out the status of the service check from the actual performance data. 

What we return here is the status text, followed by a pipe and the performance data: the CPU usage in percent, the warning threshold that was specified, and the critical threshold that was specified. 

$message = "CPU utilization is $cpuusage" 
$processArray = processCheck -checkResult $cpuusage ` 
                             -warningThresh $warning ` 
                             -criticalThresh $critical ` 
                             -returnMessage "CPU usage is $cpuusage | 'CPU Usage'=$cpuusage%;$warning;$critical" 
$exitcode = $processArray[1] 
$exitMessage = $processArray[2] 
 
write-host $exitMessage 
exit $exitcode

This data will get passed to Nagios XI. XI will then process it and put it into a nice graph for you. So getting that information and recording it for historical use is super simple for those creating a custom Nagios Plugin. 
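
For reference, the performance data after the pipe follows the pattern `'label'=value[unit];warning;critical`. Here is a small sketch of building that string; `Format-PerfData` is a hypothetical helper, and 37, 80, and 90 are made-up sample values.

```powershell
# Hypothetical helper: build one performance data entry.
function Format-PerfData {
    param(
        [Parameter(Mandatory=$true)][string]$Label,
        [Parameter(Mandatory=$true)][int]$Value,
        [string]$Unit = '',
        [int]$Warning,
        [int]$Critical
    )

    # Quote the label so a space in it (as in 'CPU Usage') does not split the entry.
    return "'${Label}'=${Value}${Unit};${Warning};${Critical}"
}

$perf = Format-PerfData -Label 'CPU Usage' -Value 37 -Unit '%' -Warning 80 -Critical 90

# Status text goes before the pipe, performance data after it.
$line = "OK: CPU usage is 37 | $perf"
Write-Host $line   # OK: CPU usage is 37 | 'CPU Usage'=37%;80;90
```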

Try writing your own Nagios Plugin!

We host community plugins on the Nagios Exchange and even feature some of the best ones on our social media.