Monitor Log & Relaunch Any Program in Linux

April 9, 2022

Linux, MariaDB, Nginx, Original, PHP, WordPress

CodeMilitant Solutions for Linux Nginx Python PHP Bash MariaDB

As a server DevOps engineer, one of the most annoying issues you can face is a program crash. At best, a crash diminishes server performance; at worst, it cripples the entire server. When the server becomes a boat anchor, clients experience more than frustration.

Clearly, there are many reasons a server program can crash, and many of them are not the DevOps engineer's fault. Take a typical web server: the programs commonly running are Nginx, MariaDB and PHP on Linux. This setup, commonly called a LEMP stack (LAMP when Apache replaces Nginx), is configured by default to serve website content to only a handful of site visitors.

As a server DevOps engineer, it’s vital to understand the default LEMP stack configuration right out of the box. Because Apache, Nginx, PHP and MariaDB cannot know how much traffic a given website will receive, they ship with only the bare minimum enabled to ensure the website delivers content.

Understand that increasing the size of a server does very little to raise the number of concurrent connections it can handle while those default configurations are left untouched. Long before a client decides to upgrade their hosting (server size), the site will crash repeatedly until they find you, the DevOps engineer who can solve their problems.

Once ‘sudo’ access is established, you really have no idea what you’re getting into until you open a terminal on the server and see what’s going on. One of the first things you should do is set up the monitoring and relaunch Bash script below.


```bash
#!/bin/bash
# Bash script to monitor if vital programs are running
# If programs are not running, will attempt a restart
# set -x

# Names must match the systemd unit names on this server; some distributions
# name the PHP service after its version (for example php8.1-fpm)
declare -a PROGRAMS=('nginx' 'php-fpm' 'memcached')

# Wait a random 0-17 seconds before checking
timer=$((RANDOM % 18))
sleep "$timer"
DATE=$(date "+%m-%d-%Y | %T")
for each in "${PROGRAMS[@]}"; do
  # Occurrences of the name in the process list (the grep usually counts itself once)
  count=$( ps -ax | grep -o "$each" | wc -l )
  # "running" when systemctl status reports the unit as active (running)
  status=$( systemctl status "$each" | grep -o -m 1 running )
  # Restart only when BOTH checks say the program is down
  if [ "$count" -lt 2 ] && [[ "$status" != "running" ]]; then
    echo "${each^^} PROGRAM WAS DOWN - RESTARTED: $DATE"$'\n\n' >> /var/log/relaunch.log
    # Keep the tail end of the logs to help diagnose the crash
    example_access_log=$( tail -n 300 /var/log/nginx/example.access.log )
    example_error_log=$( tail -n 300 /var/log/nginx/example.error.log )
    php_error_log=$( tail -n 50 /var/log/php-fpm/error.log )
    echo "=========== EXAMPLE NGINX ACCESS LOG: ==============="$'\n\n'"$example_access_log"$'\n\n' >> /var/log/relaunch.log
    echo "=========== EXAMPLE NGINX ERROR LOG: ==============="$'\n\n'"$example_error_log"$'\n\n' >> /var/log/relaunch.log
    echo "=========== PHP-FPM ERROR LOG: ==============="$'\n\n'"$php_error_log"$'\n\n' >> /var/log/relaunch.log
    # Hard restart: stop, give the system time to clear old processes, start
    systemctl stop "$each"
    sleep 10
    systemctl start "$each"
  fi
done

# set +x
exit 0
```

What does the above script do? First, this is a Bash script, so be sure Bash is available before trying to set it up. On the command line, enter:


```
[user@localhost ~]$ which bash
```

This should return:


```
/usr/bin/bash
```

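The first line of the script, ‘#!/bin/bash’ (the "shebang"), tells the system which interpreter should run the script, so be sure it points to a Bash binary that exists. On distributions where /bin is a symlink to /usr/bin, ‘#!/bin/bash’ works even though ‘which’ reports /usr/bin/bash.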
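If you would rather check this than eyeball the file, the sketch below reads a script's first line and confirms that the interpreter it names exists and is executable. The function name ‘shebang_ok’ is hypothetical; it is not part of the original article.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the script's first line is a
# shebang ("#!") naming an interpreter that exists and is executable.
shebang_ok() {
  local first interp
  IFS= read -r first < "$1" || return 1   # read the first line verbatim
  [[ $first == '#!'* ]] || return 1       # must start with #!
  read -r interp _ <<< "${first:2}"       # first word after #! (drops arguments)
  [ -n "$interp" ] && [ -x "$interp" ]
}
```

Usage: ‘shebang_ok /usr/local/sbin/relaunch.sh && echo "interpreter found"’.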

Any line starting with a ‘#’ is a comment. The line ‘# set -x’ would turn on debugging, making Bash print each command before running it, but the leading ‘#’ comments it out, so the command is skipped.

Next comes the list of programs the script should monitor. This is a plain indexed array declared with ‘declare -a’, the oldest form of Bash array syntax, so it works in any Bash version you are likely to meet on a server, and this compatibility makes life easy when jumping from one server to another. (The ‘${each^^}’ uppercase expansion used later in the script does require Bash 4.0 or newer.)
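Because the uppercase expansion is the one piece that needs a newer Bash, a script that may land on older servers could check ‘BASH_VERSINFO’ before relying on it. A minimal sketch, reusing the article's ‘PROGRAMS’ array; the ‘upper’ array exists only for illustration.

```bash
#!/bin/bash
# ${var^^} (uppercase expansion) arrived in Bash 4.0, so check first.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "Bash $BASH_VERSION is older than 4.0; \${var^^} is unavailable" >&2
  exit 1
fi

declare -a PROGRAMS=('nginx' 'php-fpm' 'memcached')
echo "monitoring ${#PROGRAMS[@]} programs"

upper=()
for each in "${PROGRAMS[@]}"; do
  upper+=("${each^^}")   # nginx -> NGINX, as used in the log message
done
echo "${upper[*]}"
```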

Next is the timer. ‘$RANDOM’ returns a pseudo-random integer between 0 and 32767 each time it is read, so ‘$RANDOM % 18’ yields a delay of 0 to 17 seconds. This does not notify the kernel or reserve resources; the kernel schedules the script like any other process as soon as it starts.

‘sleep’ then pauses the script for that many seconds. While sleeping, the script uses essentially no CPU, and the random offset staggers the check so it does not always fire at exactly the same second as other cron jobs scheduled on the same interval.
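To see the range for yourself, this sketch evaluates the same expression a couple of thousand times and records the smallest and largest values. Nothing sleeps here; the original script hands the value to ‘sleep’.

```bash
#!/bin/bash
# $RANDOM is a new pseudo-random integer from 0 to 32767 on each read,
# so RANDOM % 18 is always between 0 and 17. Sample it to see the range.
min=17
max=0
for _ in $(seq 1 2000); do
  t=$(( RANDOM % 18 ))
  if (( t < min )); then min=$t; fi
  if (( t > max )); then max=$t; fi
done
echo "observed delays: ${min}..${max} seconds"
```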

The DATE variable holds the current date and time, formatted as month-day-year followed by hours:minutes:seconds. If there is a problem, this timestamp in the log pinpoints when the error occurred and helps track down possible solutions.
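A quick way to see what the format string produces is to print it next to ‘%F %T’ (ISO 8601 style). The ISO variable is only a suggestion, not part of the original script; its advantage is that its timestamps sort chronologically as plain text, which month-first dates do not across years.

```bash
#!/bin/bash
DATE=$(date "+%m-%d-%Y | %T")   # format used by the script, e.g. 04-09-2022 | 14:05:00
ISO_DATE=$(date "+%F %T")        # alternative, e.g. 2022-04-09 14:05:00
echo "$DATE"
echo "$ISO_DATE"
```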

Next is the magic that makes this script possible. The ‘for’ loop performs the actual monitoring and restart of each program in the array built earlier. The array can hold any number of programs; the three listed here suit a typical web server.

‘${PROGRAMS[@]}’ expands to every element of the array, and on each pass the ‘for’ loop assigns the next element to the variable ‘each’. The checks inside the loop then use ‘each’ to test whether that program is running.
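The double quotes around ‘${PROGRAMS[@]}’ matter. The contrived array below contains an element with a space (no real service name does) to show that the quoted form gives one loop pass per element, while the unquoted form splits that element into separate words.

```bash
#!/bin/bash
declare -a ITEMS=('nginx' 'php fpm')   # 'php fpm' is deliberately contrived

quoted=0
for each in "${ITEMS[@]}"; do      # one pass per array element
  quoted=$((quoted + 1))
done

unquoted=0
for each in ${ITEMS[@]}; do        # elements are re-split on whitespace
  unquoted=$((unquoted + 1))
done

echo "quoted: $quoted, unquoted: $unquoted"   # prints: quoted: 2, unquoted: 3
```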

Inside the ‘for’ loop, the first check computes ‘count’ from the process snapshot. ‘ps -ax’ lists every process on the system, ‘grep -o’ prints each occurrence of the program name, and ‘wc -l’ counts them. Because the ‘grep’ command in that pipeline usually appears in the process list itself, a count below 2 suggests the program is not running. The second check assigns ‘running’ to the ‘status’ variable if ‘systemctl status’ reports the program as running. The restart is bypassed if either check shows the program is alive; the restart condition executes only when both checks fail.
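The numbers behind the ‘-lt 2’ threshold are easier to see on a concrete snapshot. The ‘ps’ lines below are made up for illustration: a running nginx master process, one worker, and the ‘grep’ from the pipeline. Because ‘grep -o’ prints every occurrence, paths such as ‘/usr/sbin/nginx’ push the count well above the number of processes. The sketch also shows ‘systemctl is-active --quiet’, which answers "is it running?" directly through its exit status; the ‘service_is_up’ wrapper name is hypothetical.

```bash
#!/bin/bash
# Made-up ps lines: nginx master, one worker, and the pipeline's own grep.
sample_ps='  812 ?        Ss     0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
  813 ?        S      0:03 nginx: worker process
 4101 ?        S      0:00 grep -o nginx'

# What the script measures: every occurrence of the name.
occurrences=$(grep -o "nginx" <<< "$sample_ps" | wc -l)
# Matching lines (one per process), for comparison.
processes=$(grep -c "nginx" <<< "$sample_ps")
echo "occurrences=$occurrences processes=$processes"

# Hypothetical helper: ask systemd directly. With --quiet nothing is
# printed; the exit status is 0 only if the unit is active.
service_is_up() {
  systemctl is-active --quiet "$1"
}
```

Usage on a real server: ‘if ! service_is_up nginx; then echo "nginx is down"; fi’.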

The restart condition is the ‘if’ statement that decides whether the failure is logged and the program restarted. If ‘count’ is less than 2 (‘-lt 2’) and ‘status’ is not ‘running’, the condition is true and the script logs the error and restarts the program.
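Because the two tests are joined with ‘&&’, a restart happens only when both fail. The hypothetical ‘needs_restart’ function below isolates that logic so each combination can be tried without touching real services.

```bash
#!/bin/bash
# Same test as the script's if statement, wrapped in a function.
needs_restart() {
  local count=$1 status=$2
  [ "$count" -lt 2 ] && [[ "$status" != "running" ]]
}

# Try every combination of a low/high count and a missing/present status.
for c in 1 5; do
  for s in "" running; do
    if needs_restart "$c" "$s"; then r=yes; else r=no; fi
    echo "count=$c status=${s:-<none>} restart=$r"
  done
done
```

Only the first combination (count 1, no status) reports ‘restart=yes’.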

The error logging starts by appending a line naming the program that was down and the date/time it was found down. Two newline characters ($'\n\n') add a little separation, and then snapshots of the access and error logs are assigned to their respective variables and appended to ‘/var/log/relaunch.log’. Notice that the logging variables hold only the ‘tail’ end of each program log. The tail can be as long or short as needed to determine the potential cause of the trouble; simply change the number passed to ‘tail -n’. Since this script runs unattended from a cron job and nobody sees the failure as it happens, it’s better to capture a longer tail than a shorter one.
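The three "tail, then echo" pairs follow the same pattern, so they could be folded into one function. This is a sketch, not the article's code: ‘append_log_section’ is a hypothetical name, and it also notes in the log when a source file cannot be read instead of silently appending an empty section.

```bash
#!/bin/bash
# Hypothetical helper: append a titled section holding the last N lines
# of one log file to another file, opening the destination only once.
append_log_section() {
  local title=$1 source=$2 lines=$3 dest=$4
  {
    printf '=========== %s: ===============\n\n' "$title"
    tail -n "$lines" -- "$source" 2>/dev/null \
      || printf '(could not read %s)\n' "$source"
    printf '\n\n'
  } >> "$dest"
}
```

Usage: ‘append_log_section "EXAMPLE NGINX ERROR LOG" /var/log/nginx/example.error.log 300 /var/log/relaunch.log’.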

Once the logs are gathered and appended to relaunch.log, a hard restart is executed. First the program that was not running is stopped, then a 10-second ‘sleep’ gives the server time to clear any leftover processes from memory before the failed program is started again.
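Here is a sketch of the same stop / pause / start sequence that also reports whether the program actually came back. ‘restart_service’, ‘SYSTEMCTL’ and ‘RESTART_PAUSE’ are hypothetical names; the two variables exist only so the logic can be tried with a stand-in command, and by default the function calls the real ‘systemctl’ and pauses 10 seconds like the original.

```bash
#!/bin/bash
# Stop, pause, start, then confirm the unit is active again.
restart_service() {
  local svc=$1
  local ctl=${SYSTEMCTL:-systemctl}   # override only for testing
  "$ctl" stop "$svc"
  sleep "${RESTART_PAUSE:-10}"        # same 10-second pause as the script
  "$ctl" start "$svc"
  "$ctl" is-active --quiet "$svc"     # exit status 0 = back up
}
```

Usage: ‘restart_service nginx || echo "nginx did not come back" >> /var/log/relaunch.log’. If the pause is not needed, ‘systemctl restart "$each"’ performs the stop and start in a single command.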

When the ‘for’ loop reaches ‘done’, it returns to the top and runs its tests on the next program. After the last entry in the array, the loop ends.

‘set +x’ turns off debugging, if it was turned on in the first place, and returns the script to normal output. ‘set -x’ and ‘set +x’ should always be used in pairs: enable tracing with ‘set -x’, then disable it with ‘set +x’.
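Instead of editing the comment in and out, tracing can be switched on from the environment. In this sketch ‘DEBUG’ is a hypothetical variable name, not part of the original script.

```bash
#!/bin/bash
# Turn tracing on only when DEBUG=1 is set in the environment.
if [[ "${DEBUG:-0}" == "1" ]]; then
  set -x      # print each command to stderr before running it
fi

declare -a PROGRAMS=('nginx' 'php-fpm' 'memcached')
checked=${#PROGRAMS[@]}

set +x        # harmless even if tracing was never switched on
echo "checked $checked programs"
```

With this check added to the real script, ‘DEBUG=1 /usr/local/sbin/relaunch.sh’ from a root shell traces one run. Without any changes, ‘sudo bash -x /usr/local/sbin/relaunch.sh’ traces a run of the unmodified script.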

Once all elements of the script have been executed, the script should ‘exit’ with a ‘0’ status. ‘0’ means success with no failures.
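As written, the script always exits with 0, even if a program stays down after the restart. If something consumes that status (a wrapper script or a monitoring agent, for example), a small change makes it meaningful. In this sketch ‘restart_failures’ and ‘record_result’ are hypothetical names; in the real loop, the value passed in would be the status of a check such as ‘systemctl is-active --quiet "$each"’ after the start.

```bash
#!/bin/bash
# Count failed restarts so the final exit status can report them.
restart_failures=0

record_result() {   # $1: exit status of one restart attempt
  if [ "$1" -ne 0 ]; then
    restart_failures=$((restart_failures + 1))
  fi
}

record_result 0     # e.g. the program came back
record_result 1     # e.g. the program stayed down
echo "failed restarts: $restart_failures"
# A monitor built this way would end with:  exit $(( restart_failures > 0 ))
```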

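This small script is run at a chosen interval by a cron job. Because it stops and starts services and writes to /var/log, it needs root privileges, so add the entry to root’s crontab with ‘sudo crontab -e’. Running the script every 5 minutes would look like: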


```
*/5 * * * * /usr/local/sbin/relaunch.sh
```

Be sure to ‘chmod’ the script to 700 so the server can execute it. Mode 700 gives the owner (root) read, write and execute permission and gives everyone else no access at all.


```bash
sudo chmod 700 /usr/local/sbin/relaunch.sh
```
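Before handing the script to cron, it can be worth confirming that it parses and that the mode actually took effect, since a cron job fails silently from your point of view. A minimal sketch: ‘check_script’ is a hypothetical name, ‘bash -n’ parses the file without executing anything, and ‘stat -c’ is the GNU coreutils form found on Linux.

```bash
#!/bin/bash
# Hypothetical pre-flight check for the relaunch script.
check_script() {
  local f=$1
  if [ ! -f "$f" ]; then
    echo "missing: $f" >&2; return 1
  fi
  if ! bash -n "$f"; then                          # parse only, run nothing
    echo "syntax error in: $f" >&2; return 1
  fi
  if [ "$(stat -c '%a' "$f")" != "700" ]; then     # octal permission bits
    echo "mode is not 700: $f" >&2; return 1
  fi
  echo "ok: $f"
}
```

Usage, from a root shell: ‘check_script /usr/local/sbin/relaunch.sh’.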

To all the Bash scripting experts out there: this is the basic framework for what can become a far more complex script. It can be adapted to send emails, exit on failures, log additional failures, start or stop additional programs and further interact with the programs being monitored.

Please add your comments below to help those who are learning how to solve server crashes.


codemilitant


