The SignifAI Community Hub

Nagios

Integrating with Nagios monitoring system

Integrating with Nagios

Nagios is a free and open source computer-software application that monitors systems, networks, and infrastructure. It alerts users upon issue creation and resolution.
With the SignifAI Deep Knowledge™ engine, you can integrate all Nagios alerts into higher-level issues that are automatically correlated, enriched, and prioritized with any other data stream coming from your infrastructure.
SignifAI also runs anomaly detection on your Nagios checks as part of the integration.

Details

We currently integrate with Nagios using an agent that pulls all data from the Nagios status.dat file. Monitoring the status.dat file on the main Nagios server provides several advantages:

  1. In a large distributed cluster installation where events are consolidated on any number of main nodes, the agent only needs to be installed on those main nodes.
  2. status.dat is the only place guaranteed to track all acknowledgements, state changes, service state changes, and host state changes.
  3. With simple agent configuration, it is possible to filter for specific events as needed.
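
To make the role of status.dat more concrete, here is a minimal Python sketch of reading host and service blocks from a status.dat-style file. It is only an illustration and not the SignifAI collector: the sample text is made up, and the parser assumes the usual layout of a block name followed by `{`, one `key=value` pair per line, and a closing `}`.

```python
def parse_status(text):
    """Return a list of (block_type, fields) tuples from status.dat-style text."""
    blocks = []
    current = None  # (block_type, fields) while inside a block, else None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.endswith("{"):
                current = (line[:-1].strip(), {})
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            current[1][key] = value
    return blocks


# Made-up sample in the assumed status.dat layout.
SAMPLE = """\
hoststatus {
\thost_name=web01
\tcurrent_state=0
\tproblem_has_been_acknowledged=0
\t}

servicestatus {
\thost_name=web01
\tservice_description=HTTP
\tcurrent_state=2
\tplugin_output=HTTP CRITICAL: key=value mismatch
\t}
"""

blocks = parse_status(SAMPLE)
for block_type, fields in blocks:
    print(block_type, fields.get("host_name"), fields.get("current_state"))
```

Because every state and acknowledgement lives in this one file, a filter for specific events can be a simple condition on the parsed fields.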

SignifAI Agent for Nagios

Instead of writing a dedicated Python data collector for Nagios, we decided to build on an existing, solid agent framework adopted by the open source community. To learn more about it, read our Snap Open Telemetry Framework page. We have extended the framework to create a Nagios Collector Plugin for Snap as well as a Snap Processor Plugin to reduce duplicate events.

RPM Installation

We currently support RPM packages for Red Hat Linux and CentOS version 6.8 and above.

  1. Install the repo: curl -s https://packagecloud.io/install/repositories/signifai/snap/script.rpm.sh | sudo bash
  2. Install our packages: yum install signifai-*

What Will Be Installed?

The download script will install the following packages:

  • signifai-snap-plugin-publisher-signifai
  • signifai-snap-plugin-processor-split-regexp
  • signifai-snap-plugin-processor-logs-regexp
  • signifai-snap-plugin-collector-nagios
  • snap-plugin-processor-regexp-engine
  • Go packages in case they are not installed already
  • Any other dependencies needed
  1. To monitor the agent and make sure it is working properly, you can add a dedicated task manifest that monitors the internal Snap agent processor and error logs. Contact support@signifai.io to receive your dedicated queue details and keys. Once you have received this information, write out the error log task manifest:
   ---
   version: 1
   schedule: 
     interval: 3s
     type: simple
   workflow:
     collect:
       config:
         /intel/logs:
           metric_name: signifai_snap_logs
           cache_dir: /opt/signifai/snap/cache
           log_dir: /var/log/snap
           log_file: snapteld.log
           splitter_type: new-line
           collection_time: 2s
           metrics_limit: 1000
       metrics:
         /intel/logs/*: {}
       publish:
       - plugin_name: awssqs
         config:
           queue: https://sqs.us-east-1.amazonaws.com/265975144233/$AWS_QUEUE_NAME
           akid:  $AWS_ACCESS_KEY
           secret: $AWS_SECRET_ACCESS_KEY
  2. Write out the collection task manifest:
   ---
   version: 1
   schedule: 
     interval: 300s
     type: simple
   deadline: 300s
   workflow:
     collect:
       config:
         /nagios: 
           status_file: $NAGIOS_STATUS_DAT
       metrics:
         /nagios/*/long_plugin_output: {}
         /nagios/*/services/*/long_plugin_output: {}
       tags:
         /nagios:
           is_long: "yes"
         /nagios/{your target host}:
           target_host: {add your host}
       process:
       - plugin_name: metric-repeat-filter
         publish:
         - plugin_name: signifai-publisher
           config:
             api: metrics
             host: $HOSTNAME
             token: $APIKEY

Make sure you grab your API key from the Nagios sensor section and use it in place of $APIKEY, and replace $HOSTNAME with the hostname of the monitoring server. You will also need to replace $NAGIOS_STATUS_DAT with the path to your Nagios status.dat file.

Make sure to edit your task file under /opt/signifai/snap/tasks/nagios-longoutput-to-signifai.yaml
Finally, restart the service: service snap-telemetry restart
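
If you prefer to fill in the placeholders with a script rather than by hand, the substitution can be sketched in Python as follows. This is a hedged illustration only: the fragment is a trimmed stand-in for the real task file, and the path, hostname, and key values are made-up examples.

```python
from string import Template

# Trimmed stand-in for the task file; only the placeholder lines are shown.
FRAGMENT = Template(
    "status_file: $NAGIOS_STATUS_DAT\n"
    "host: $HOSTNAME\n"
    "token: $APIKEY\n"
)


def render(status_file, hostname, api_key):
    """Fill in the three placeholders; raises KeyError if one is left unset."""
    return FRAGMENT.substitute(
        NAGIOS_STATUS_DAT=status_file,
        HOSTNAME=hostname,
        APIKEY=api_key,
    )


# Example values only; use your own path, hostname, and API key.
rendered = render(
    "/usr/local/nagios/var/status.dat",
    "nagios01.example.com",
    "example-api-key",
)
print(rendered)
```

Using `substitute` rather than `safe_substitute` means a forgotten placeholder fails loudly instead of leaving a literal `$NAME` in the task file.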

Building for any OS version

You will need to install 4 plugins:

  1. SignifAI Snap Nagios plug-in: you can find full installation details here.
  2. The duplication processor plug-in (to avoid duplicate alerts): you can find full installation details here.
  3. The Snap SignifAI Publisher: you can find full installation details here.
  4. SignifAI Snap advanced regex processor engine. Note: this plug-in is not mandatory if you only want to send simple Nagios events, but it is highly recommended. You can find full installation details here.

Note

We currently require installing both the collector and the processor to avoid duplicate events and to keep a very large amount of information from being transmitted over the wire.
That way, ONLY new events will be sent to the SignifAI backend.
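
The idea of sending only new events can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the processor plugin's actual implementation: the `RepeatFilter` class and the (namespace, value) tuples are hypothetical.

```python
_MISSING = object()  # sentinel for "never seen this namespace before"


class RepeatFilter:
    """Pass a metric through only when its value differs from the last one seen."""

    def __init__(self):
        self._last = {}

    def filter(self, metrics):
        changed = []
        for namespace, value in metrics:
            if self._last.get(namespace, _MISSING) != value:
                changed.append((namespace, value))
                self._last[namespace] = value
        return changed


repeat_filter = RepeatFilter()
first = repeat_filter.filter([("/nagios/web01/state", 0), ("/nagios/db01/state", 2)])
second = repeat_filter.filter([("/nagios/web01/state", 0), ("/nagios/db01/state", 2)])
third = repeat_filter.filter([("/nagios/web01/state", 1), ("/nagios/db01/state", 2)])
print(first, second, third)
```

The first collection passes everything through, an identical second collection passes nothing, and the third passes only the host whose state changed.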

Configuration

There are a few parameters you can change in the Snap task definition, based on your requirements:

  1. Collection interval: change the interval parameter, for example interval: "10s". A 10-second collection interval is usually enough, but you can increase or decrease it based on your needs.
  2. You can change the collected metrics:
       metrics:
         /nagios/*/acknowledged: {} 
         /nagios/*/state: {}
         /nagios/*/services/*/acknowledged: {}
         /nagios/*/services/*/state: {}
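
One way to picture how these metric namespaces select checks is the small Python sketch below. It assumes each `*` stands for exactly one path segment (a host name or a service name); that reading is our assumption for illustration and is not taken from Snap's own matching code. The host and service names are made up.

```python
def namespace_matches(pattern, namespace):
    """Return True when every segment matches, treating "*" as any one segment."""
    pattern_parts = pattern.strip("/").split("/")
    name_parts = namespace.strip("/").split("/")
    if len(pattern_parts) != len(name_parts):
        return False
    return all(p == "*" or p == n for p, n in zip(pattern_parts, name_parts))


host_state = namespace_matches("/nagios/*/state", "/nagios/web01/state")
service_state = namespace_matches(
    "/nagios/*/services/*/state", "/nagios/web01/services/HTTP/state"
)
# A host-level pattern does not select a service-level metric in this reading.
mismatch = namespace_matches("/nagios/*/state", "/nagios/web01/services/HTTP/state")
print(host_state, service_state, mismatch)
```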
    

Supporting the Nagios long_plugin_output Option

Nagios supports writing your own plugin extensions, which can do and output pretty much any type of information. This is very useful for including custom checks and custom output information in the incident.
The SignifAI Nagios collector supports the long plugin output option by allowing you to specify a regex split filter, which parses only that part of the output and sends it to the SignifAI backend as additional attributes.
This capability is extremely important for customers who have already invested in their Nagios deployment and wish to achieve the highest value from integrating their own custom logic and outputs with SignifAI.

Plugins should return at least one line of text output. Beginning with Nagios Core 3, plugins can optionally return multiple lines of output. Plugins may also return optional performance data that can be processed by external applications.
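
As an illustration of that output shape, here is a minimal Python sketch of a custom check. The `check_disk` function, its thresholds, and the attribute values are hypothetical; the layout it produces (a summary line, optional performance data after a `|`, then additional lines of long output) follows the standard Nagios plugin conventions.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_plugin_output(summary, long_lines=(), perfdata=""):
    """Build plugin output: summary line, optional perfdata, optional long output."""
    first_line = f"{summary} | {perfdata}" if perfdata else summary
    return "\n".join([first_line, *long_lines])


def check_disk(used_percent, warn=90, crit=95):
    """Hypothetical check: return (exit_code, output_text) for a disk usage value."""
    if used_percent >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_percent >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    output = format_plugin_output(
        f"DISK {label} - /var at {used_percent}%",
        long_lines=['{"appname": "billing", "host_name": "web01"}'],
        perfdata=f"var={used_percent}%;{warn};{crit}",
    )
    return code, output


code, output = check_disk(91)
print(output)
# A real plugin would finish with sys.exit(code) so Nagios sees the state.
```

Everything after the first line is what ends up in long_plugin_output, which is where custom attributes for SignifAI would go.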

Plugin Output Length Restrictions

Nagios Core will only read the first 4 KB of data that a plugin returns. This is done to prevent runaway plugins from dumping megabytes or gigabytes of data back to Nagios Core. This 4 KB output limit is fairly easy to change if you need to: edit the value of the MAX_PLUGIN_OUTPUT_LENGTH definition in the include/nagios.h.in file of the source code distribution and recompile Nagios Core.

The following task file is a full example of a configuration file. It specifies the Nagios configuration and the metrics to collect, enables the metric-repeat-filter to remove duplicates before sending the data to SignifAI, uses the split-regexp plugin to determine exactly where to start parsing the long output, and finally enables the logs-regexp plugin (through its regexp_log setting) with a simple JSON parsing example.

---
  version: 1
  schedule:
    type: "simple"
    interval: "60s"
  workflow:
    collect:
      config:
        /nagios:
          status_file: $PATH_TO_STATUS_DAT$
      metrics:
        /nagios/*/long_plugin_output: {}
        /nagios/*/services/*/long_plugin_output: {}
      process:
        - plugin_name: "metric-repeat-filter"
          process:
            - plugin_name: "split-regexp"
              config:
                regexp_split: \[[A-Za-z0-9:_-]*\]
              process:
                - plugin_name: "logs-regexp"
                  config:
                    regexp_log: "^.*{(?:.*\"appname\": \"(?P<application_name>[^\"]*)\")"
                  publish:
                    - plugin_name: signifai-publisher
                      config:
                        host: $HOSTNAME$
                        token: $TOKEN$
                        api: metrics

For the full regexp_log plugin capabilities and syntax, please review this link. Review it carefully if you want SignifAI to collect and automatically parse your valuable information.
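
The split-then-parse flow in the task file above can be tried out in Python, whose named-group syntax `(?P<name>...)` matches the one used in regexp_log. This is a rough sketch, not the plugins' code: the sample long output is made up, and dropping the bracketed delimiter after the split is our assumption about how the pieces are formed.

```python
import re

# Same split expression as regexp_split in the task file.
SPLIT_RE = re.compile(r"\[[A-Za-z0-9:_-]*\]")
# Simplified version of the regexp_log idea: capture the appname value.
APPNAME_RE = re.compile(r'"appname": "(?P<application_name>[^"]*)"')


def extract_attributes(long_output):
    """Split the long output on bracketed markers and parse each piece."""
    results = []
    for part in SPLIT_RE.split(long_output):
        part = part.strip()
        if not part:
            continue
        match = APPNAME_RE.search(part)
        results.append(match.groupdict() if match else {})
    return results


# Made-up long output with two bracketed sections.
sample = (
    '[check:db] {"appname": "billing", "status": "down"} '
    '[check:cache] {"appname": "sessions"}'
)
attributes = extract_attributes(sample)
print(attributes)
```

Each named group becomes an attribute name, which is why naming the group application_name lines up with the auto-correlated attributes.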

Advanced Regex Processing

SignifAI developed an advanced regex processor engine to support complex requirements such as deeply nested Nagios long output or any other non-standard output. In this processor, metrics are passed in from the collector (or from previous processors), modified, and passed on "down the chain" to either more processors or publishers. This plugin can split one metric into many by a regular expression, then enrich each of the split metrics with further regular expressions and golang templating.

Upon receiving a metric, this plugin will:

Attempt to match the metric against a "gate" provided in configuration. If it matches, the plugin will process against that gate's configuration by:

  • Splitting the metrics by the regexes provided in the 'split' section, in order:
    • Attempting to match each split against the original gate, filtering it out entirely on failure
    • Parsing the regexes, using capture groups to capture parts of the string to store in the metric as tags
    • Using golang templating against the metric as a whole to create or override further tags for the metric.
  • If no gates match, the metric is simply passed "down the chain" as-is.
  • If the metric matches more than one gate, it will be processed for each gate.

For full documentation please refer here. Please note - this is an advanced feature. Contact support@signifai.io for any help.

---
  version: 1
  schedule:
    type: "simple"
    interval: "3s"
  max-failures: 10
  workflow:
    collect:
      config:
        /intel/logs:
          metric_name: featurelistfile
          cache_dir: /var/lib/snap/logcache
          log_dir: /var/log
          log_file: featurelistfile
          splitter_type: new-line
          collection_time: 2s
          metrics_limit: 1000
      metrics:
        /intel/logs/*: {}
      process:
        - plugin_name: "regexp-engine"
          config:
            "^feature ([A-Za-z0-9]+)":
              split:
                - '\|'
              parse:
                - "^feature (?P<feature_name>[A-Za-z0-9]+)"
              tags:
                feature_index: "{{ .Tags.feature_name }}"
          publish:
            - plugin_name: "file"
              config:
                file: /tmp/logmetrics
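
The gate, split, parse, and tag steps described in this section can be approximated in Python as follows. This is a hedged sketch of the described behavior, not the regexp-engine plugin itself: `RULES` mirrors the example configuration, the `[A-Za-z0-9]+` inside the named group is an assumption, Python's `str.format_map` stands in for golang templating, and the input strings are made up.

```python
import re
from collections import defaultdict

# Gate regex -> rule, mirroring the example configuration.
RULES = {
    r"^feature ([A-Za-z0-9]+)": {
        "split": [r"\|"],
        "parse": [r"^feature (?P<feature_name>[A-Za-z0-9]+)"],
        "tags": {"feature_index": "{feature_name}"},
    }
}


def process(message, rules=RULES):
    """Return a list of (text, tags) pairs produced from one incoming metric."""
    matched_any = False
    output = []
    for gate, rule in rules.items():
        if not re.search(gate, message):
            continue
        matched_any = True
        parts = [message]
        for splitter in rule["split"]:  # apply the split regexes in order
            parts = [p.strip() for part in parts for p in re.split(splitter, part)]
        for part in parts:
            if not re.search(gate, part):
                continue  # a split that fails the gate is filtered out
            tags = {}
            for pattern in rule["parse"]:
                match = re.search(pattern, part)
                if match:
                    tags.update(match.groupdict())
            for name, template in rule["tags"].items():
                # Missing tags render as empty strings instead of raising.
                tags[name] = template.format_map(defaultdict(str, tags))
            output.append((part, tags))
    if not matched_any:
        return [(message, {})]  # no gate matched: pass through as-is
    return output


split_result = process("feature alpha|feature beta|noise")
passthrough = process("unrelated line")
print(split_result, passthrough)
```

With this input, the "noise" piece is dropped because it no longer matches the gate, while a line that matches no gate at all is passed along unchanged.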

SignifAI Hints

If you are using the long output option and writing your own health check script, we strongly advise making sure your attribute names match at least some of the SignifAI auto-correlation attributes. That way, no additional configuration is needed to correlate your custom check attributes with any other system you add to SignifAI.
The following attributes will be automatically correlated:

  • application_name
  • datacenter_name
  • datacenter_status
  • host_name

Note - this list is flexible. If you need anything else please contact us.

Nagios long_plugin_output Supported Version

Note - SignifAI currently supports Nagios version 3.5.1 and above. Also, as of Nagios 4.x, the default 4K limit has been doubled to 8K. Any additional information will be truncated. To make sure you receive your most valuable information, prioritize those fields and place them at the beginning of the message.
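
One way to put the most valuable fields first and stay under the limit is sketched below in Python. The `build_long_output` helper, its `key=value` line format, and the sample fields are hypothetical; the `limit` argument is a byte budget you would set below 4096 or 8192 depending on your Nagios version, leaving room for the first line of plugin output.

```python
def build_long_output(fields, priority, limit=4096):
    """Emit key=value lines, priority keys first, stopping before the byte limit."""
    ordered = [key for key in priority if key in fields]
    ordered += [key for key in fields if key not in ordered]
    lines, used = [], 0
    for key in ordered:
        line = f"{key}={fields[key]}"
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > limit:
            break  # everything from here on would be truncated anyway
        lines.append(line)
        used += size
    return "\n".join(lines)


# Made-up check data: one oversized field and two high-value attributes.
fields = {
    "debug_dump": "x" * 5000,
    "application_name": "billing",
    "host_name": "web01",
}
long_output = build_long_output(fields, priority=["application_name", "host_name"])
print(long_output)
```

Dropping whole lines rather than cutting mid-value keeps every attribute that does get through intact and parseable.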

Need help with the integration?

Contact us at: support@signifai.io and we will be happy to help.
