Project Overview: AI-Powered Shell Script Website Monitor
Goal: Create a script that monitors a list of websites, sends alerts on failure, keeps a log, and uses AI to provide intelligent analysis of downtime.
Learning Objectives:
Shell Scripting Fundamentals: Variables, loops, conditional statements (if/else), functions.
Advanced Shell Scripting: Reading from files, error handling, command-line tools (curl, date, grep), process substitution, sourcing external files.
DevOps Best Practices:
Configuration as Code: Separating configuration (websites.conf) from logic (monitor.sh).
Secrets Management: Using a .env file for API keys and tokens (never hardcode secrets!).
Automation: Scheduling the script with cron.
Logging: Maintaining a clear record of events.
Alerting: Integrating with modern tools like Slack or Telegram.
AI Integration: Using curl to interact with the OpenAI API (ChatGPT) to get human-readable insights into failures.
Prerequisites
A Linux/macOS Environment: A command line is essential.
curl: A command-line tool for transferring data with URLs. (Usually pre-installed).
jq: A lightweight command-line JSON processor. We’ll need this to parse the AI’s response.
Install it with sudo apt-get install jq (Debian/Ubuntu), sudo yum install jq (CentOS/RHEL), or brew install jq (macOS); a quick sanity check follows this list.
(Optional) A Slack Webhook URL or Telegram Bot Token: For sending alerts.
(Optional) An OpenAI API Key: To enable the AI analysis feature. You can get one from the OpenAI Platform.
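Once jq is installed, you can verify it parses the kind of JSON we will get back from the OpenAI API later. The payload below is a made-up miniature response, just for the check:

echo '{"choices":[{"message":{"content":"All good"}}]}' | jq -r '.choices[0].message.content'
# Expected output: All good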
Phase 1: Project Structure
A well-structured project is a DevOps best practice. Let’s create the directory and files.
# Create the main project directory
mkdir website-monitor
cd website-monitor
# Create the core script file
touch monitor.sh
# Create the configuration file for websites
touch websites.conf
# Create a file for our secrets (environment variables)
touch .env
# Create a directory for logs
mkdir logs
# Make our script executable
chmod +x monitor.sh

Our project structure should now look like this:
website-monitor/
├── monitor.sh # The main script
├── websites.conf # List of websites to monitor
├── .env # Secrets (API keys, webhook URLs)
└── logs/ # Directory for log files
Phase 2: The Core Monitoring Logic (v1)
Let’s start by building the basic functionality inside monitor.sh.
Step 2.1: Populate Configuration Files
websites.conf: Add the URLs you want to monitor, one per line. Include one that is likely to fail for testing purposes.
https://google.com
https://github.com
https://httpstat.us/503
https://non-existent-domain-12345.com

The third URL deliberately returns a 503 error and the fourth will fail to resolve, giving us predictable failures for testing. (Avoid inline comments after a URL; the script passes each whole line to curl.)

.env: Add your secret tokens here. IMPORTANT: Add .env to your .gitignore file if you use Git!
# Alerting Configuration
# Choose one: "slack", "telegram", "email", or "log"
ALERT_METHOD="email"

# Email configuration
EMAIL_TO="sriharimalapati@gmail.com"
EMAIL_FROM="srihari9963@INBOOKY1PLUS"

# OpenAI Configuration (Optional)
# OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
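Since the note above says to keep .env out of version control, a minimal .gitignore for the project could look like this (the logs/ and status/ entries are my own suggestions for generated files; status/ is only created in Phase 3):

.env
logs/
status/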
Step 2.2: The Initial monitor.sh Script
Let’s write the first version of our script. This version will read the config, check each site, log the status, and send a basic alert (via email or the log) when a site is down.
#!/bin/bash
# Configuration
CONFIG_FILE="websites.conf"
LOG_FILE="logs/monitor.log"
ENV_FILE=".env" # NEW: Define the .env file
# Load Environment Variables
if [ -f "${ENV_FILE}" ]; then
export $(grep -v '^#' ${ENV_FILE} | xargs)
else
echo "Error: .env file not found at ${ENV_FILE}"
exit 1
fi
# Functions
# Function to log messages with a timestamp
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "${LOG_FILE}"
}
# NEW: Function to send an email alert
send_email_alert() {
local url=$1
local reason=$2
local subject="Website Down Alert: ${url}"
local body="The website ${url} is currently down.
Reason: ${reason}
Timestamp: $(date '+%Y-%m-%d %H:%M:%S')
Please investigate the issue.
"
# Use the 'mail' command to send the email
echo "${body}" | mail -s "${subject}" "${EMAIL_TO}"
log "ALERT: Email alert sent to ${EMAIL_TO} for ${url}."
}
# NEW: Central function to handle sending alerts based on ALERT_METHOD
send_alert() {
local url=$1
local reason=$2
case "${ALERT_METHOD}" in
"email")
send_email_alert "$url" "$reason"
;;
"log")
# For 'log' method, we just log it as a critical failure
log "ALERT_TRIGGERED (method=log): ${url} is DOWN. Reason: ${reason}"
;;
*)
log "WARNING: Unknown ALERT_METHOD '${ALERT_METHOD}'. Defaulting to log."
log "ALERT_TRIGGERED (method=log): ${url} is DOWN. Reason: ${reason}"
;;
esac
}
# Function to check a single website
check_site() {
local url=$1
http_code=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "${url}")
curl_exit_code=$?
if [ $curl_exit_code -ne 0 ]; then
log "ERROR: Failed to connect to ${url}. cURL exit code: ${curl_exit_code}."
# MODIFIED: Call the central alert function
send_alert "${url}" "Connection Error. cURL exit code: ${curl_exit_code}."
echo "Site ${url} is DOWN (Connection Error)."
elif [ "${http_code}" -ge 200 ] && [ "${http_code}" -lt 400 ]; then
log "SUCCESS: ${url} is UP. Status code: ${http_code}."
echo "Site ${url} is UP."
else
log "FAILURE: ${url} is DOWN. Status code: ${http_code}."
# MODIFIED: Call the central alert function
send_alert "${url}" "HTTP Status Code: ${http_code}."
echo "Site ${url} is DOWN (HTTP ${http_code})."
fi
}
# Main Script Logic
# Check if the configuration file exists
if [ ! -f "${CONFIG_FILE}" ]; then
echo "Error: Configuration file not found at ${CONFIG_FILE}"
exit 1
fi
log "Monitor script started"
# Read each line from the config file and check the site
while IFS= read -r site || [[ -n "$site" ]]; do
if [[ -n "$site" ]]; then # Ensure the line is not empty
check_site "$site"
fi
done < "${CONFIG_FILE}"
log "Monitor script finished"Run it:
./monitor.sh

Expected Output (on screen):
Site https://google.com is UP.
Site https://github.com is UP.
Site https://httpstat.us/503 is DOWN (HTTP 503).
Site https://non-existent-domain-12345.com is DOWN (Connection Error).

Check the log file: cat logs/monitor.log will show timestamped details.
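Each entry is written by the log() function, so the file will contain lines roughly like these (timestamps, status codes, and cURL exit codes will vary on your machine):

2024-05-01 10:15:01 - Monitor script started
2024-05-01 10:15:01 - SUCCESS: https://google.com is UP. Status code: 301.
2024-05-01 10:15:02 - SUCCESS: https://github.com is UP. Status code: 200.
2024-05-01 10:15:03 - FAILURE: https://httpstat.us/503 is DOWN. Status code: 503.
2024-05-01 10:15:03 - ALERT: Email alert sent to sriharimalapati@gmail.com for https://httpstat.us/503.
2024-05-01 10:15:08 - ERROR: Failed to connect to https://non-existent-domain-12345.com. cURL exit code: 6.
2024-05-01 10:15:08 - ALERT: Email alert sent to sriharimalapati@gmail.com for https://non-existent-domain-12345.com.
2024-05-01 10:15:08 - Monitor script finished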
Phase 3: Adding Robust Alerting
Now let’s make the alerting robust: the script will remember each site’s last known status, retry failed checks before declaring a site down, and send a recovery notice when a site comes back up. That way you get alerted once when something breaks and once when it recovers, instead of on every run. (Slack delivery itself is wired up in Phase 4; this phase still uses the email and log methods from your .env file.)
Update monitor.sh:
We need a place to store the last known status of each website.
In your project directory, create a new folder called status:
mkdir status

We will create a file inside this directory to track the status. You don’t need to create the file yourself; the script will do it automatically.
This is a significant upgrade to the script’s logic. We will add new functions to manage state and incorporate a retry loop. Carefully replace your existing monitor.sh with this new version.
#!/bin/bash
# Configuration
CONFIG_FILE="websites.conf"
LOG_FILE="logs/monitor.log"
ENV_FILE=".env"
STATE_FILE="status/status.log" # NEW: File to store the last known status
# Constants
MAX_RETRIES=3 # NEW: Number of times to retry a failed check
RETRY_DELAY=5 # NEW: Seconds to wait between retries
# Load Environment Variables
if [ -f "${ENV_FILE}" ]; then
export $(grep -v '^#' ${ENV_FILE} | xargs)
else
echo "Error: .env file not found at ${ENV_FILE}"
exit 1
fi
# Functions
# Function to log messages with a timestamp
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "${LOG_FILE}"
}
# MODIFIED: Function to send an email alert (now handles different subjects)
send_email_alert() {
local url=$1
local reason=$2
local subject=$3 # NEW: Pass in the subject line
local body="Website: ${url}
Reason: ${reason}
Timestamp: $(date '+%Y-%m-%d %H:%M:%S')
"
echo "${body}" | mail -s "${subject}" "${EMAIL_TO}"
log "ALERT: Email alert with subject '${subject}' sent to ${EMAIL_TO} for ${url}."
}
# MODIFIED: Central function to handle sending alerts
send_alert() {
local url=$1
local reason=$2
local status=$3 # NEW: "DOWN" or "RECOVERY"
local subject=""
if [ "$status" == "DOWN" ]; then
subject="Website Down Alert: ${url}"
elif [ "$status" == "RECOVERY" ]; then
subject="Website Recovered: ${url}"
else
return # Do nothing if status is unknown
fi
case "${ALERT_METHOD}" in
"email")
send_email_alert "$url" "$reason" "$subject"
;;
"log")
log "ALERT_TRIGGERED (method=log): Status is ${status} for ${url}. Reason: ${reason}"
;;
*)
log "WARNING: Unknown ALERT_METHOD '${ALERT_METHOD}'. Defaulting to log."
log "ALERT_TRIGGERED (method=log): Status is ${status} for ${url}. Reason: ${reason}"
;;
esac
}
# NEW: Function to get the last known status of a site
get_last_status() {
local url=$1
if [ ! -f "${STATE_FILE}" ]; then
echo "UP" # Assume UP if state file doesn't exist yet
return
fi
# Find the line for the URL and get the status after the '='
grep "^${url}=" "${STATE_FILE}" | cut -d'=' -f2
}
# NEW: Function to update the status of a site in the state file
update_status() {
local url=$1
local status=$2 # "UP" or "DOWN"
# Create a temporary file
local temp_state_file
temp_state_file=$(mktemp)
# If the state file exists, remove the old entry for this URL
if [ -f "${STATE_FILE}" ]; then
grep -v "^${url}=" "${STATE_FILE}" > "${temp_state_file}"
fi
# Add the new status entry
echo "${url}=${status}" >> "${temp_state_file}"
# Replace the old state file with the new one
mv "${temp_state_file}" "${STATE_FILE}"
}
# HEAVILY MODIFIED: Function to check a single website
check_site() {
local url=$1
local current_status="UP" # Assume UP initially
local reason=""
# Retry Loop
for (( i=1; i<=MAX_RETRIES; i++ )); do
http_code=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "${url}")
curl_exit_code=$?
if [ $curl_exit_code -ne 0 ]; then
current_status="DOWN"
reason="Connection Error. cURL exit code: ${curl_exit_code}."
elif [ "${http_code}" -ge 200 ] && [ "${http_code}" -lt 400 ]; then
current_status="UP"
reason="HTTP Status Code: ${http_code}."
break # Success, no need to retry
else
current_status="DOWN"
reason="HTTP Status Code: ${http_code}."
fi
# If we are not on the last retry, wait before trying again
if [ "$current_status" == "DOWN" ] && [ $i -lt $MAX_RETRIES ]; then
log "INFO: Check for ${url} failed (Attempt ${i}/${MAX_RETRIES}). Retrying in ${RETRY_DELAY}s..."
sleep ${RETRY_DELAY}
fi
done
# State Comparison Logic
last_status=$(get_last_status "${url}")
# If no previous status, default to UP to avoid alerts on first run
last_status=${last_status:-"UP"}
echo "Site: ${url} | Current Status: ${current_status} | Last Known Status: ${last_status}"
if [ "${current_status}" == "DOWN" ] && [ "${last_status}" == "UP" ]; then
# It just went down! Send an alert.
log "STATE_CHANGE: ${url} has gone DOWN. Reason: ${reason}"
send_alert "${url}" "${reason}" "DOWN"
update_status "${url}" "DOWN"
elif [ "${current_status}" == "UP" ] && [ "${last_status}" == "DOWN" ]; then
# It just recovered! Send a recovery alert.
log "STATE_CHANGE: ${url} has RECOVERED. Reason: ${reason}"
send_alert "${url}" "${reason}" "RECOVERY"
update_status "${url}" "UP"
elif [ "${current_status}" != "${last_status}" ]; then
# This handles the initial run case where a site is UP and last_status is empty
update_status "${url}" "${current_status}"
else
# Status has not changed, just log it quietly
log "INFO: Status for ${url} remains ${current_status}."
fi
}
# Main Script Logic
# Ensure state and log directories exist
mkdir -p "$(dirname "${LOG_FILE}")"
mkdir -p "$(dirname "${STATE_FILE}")"
if [ ! -f "${CONFIG_FILE}" ]; then
echo "Error: Configuration file not found at ${CONFIG_FILE}"
exit 1
fi
log "Monitor script started"
while IFS= read -r site || [[ -n "$site" ]]; do
if [[ -n "$site" ]]; then
check_site "$site"
fi
done < "${CONFIG_FILE}"
log "Monitor script finished"How to Run and What to Expect
Run the script as before: ./monitor.sh
Expected Screen Output:
Site: https://google.com | Current Status: UP | Last Known Status: UP
Site: https://github.com | Current Status: UP | Last Known Status: UP
Site: https://httpstat.us/503 | Current Status: DOWN | Last Known Status: UP
Site: https://non-existent-domain-12345.com | Current Status: DOWN | Last Known Status: UP

Key Changes:
State tracking: the new get_last_status and update_status functions persist each site’s last known status in status/status.log, so alerts fire only when a site changes state (UP to DOWN, or DOWN to UP), not on every run (see the quick check after this list).
Retry loop: each site is checked up to MAX_RETRIES times with a RETRY_DELAY pause between attempts, which filters out transient network blips before an alert is raised.
Recovery alerts: send_alert now takes a status argument ("DOWN" or "RECOVERY") and picks an appropriate subject line; its case statement still makes it easy to add more alert methods later (like Telegram or Slack).
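To see the state tracking do its job, run the script twice and peek at the state file. These commands are illustrative; the exact URL=STATUS lines depend on which sites were down during your run:

./monitor.sh          # first run: the failing sites trigger DOWN alerts
cat status/status.log # e.g. https://httpstat.us/503=DOWN
./monitor.sh          # second run: unchanged sites only log "remains ...", no repeat alerts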
Phase 4: The AI Integration
This is the “expert” level. When a site is down, we’ll ask an AI what the error code means and what to do about it.
Update .env:
Uncomment and add your OPENAI_API_KEY.
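With Slack as the alert channel for this phase, the relevant portion of .env might now look something like this (the webhook URL is a placeholder; use your own, and keep ALERT_METHOD set to whichever channel you actually use):

ALERT_METHOD="slack"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXXX/XXXX/XXXXXXXX"
OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Before relying on it, you can test an incoming webhook with a one-off curl using Slack’s simple text payload:

curl -s -X POST -H 'Content-type: application/json' --data '{"text":"monitor test"}' "${SLACK_WEBHOOK_URL}"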
Final monitor.sh with AI:
#!/bin/bash
# Load environment variables from .env file
if [ -f .env ]; then
export $(grep -v '^#' .env | xargs)
fi
# Configuration
CONFIG_FILE="websites.conf"
LOG_FILE="logs/monitor.log"
# Set to "true" to enable AI analysis on failures
AI_ANALYSIS_ENABLED="true"
# Functions
log() {
# Log to both console and file
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "${LOG_FILE}"
}
get_ai_suggestion() {
local url=$1
local error_code=$2
local error_type=$3 # "HTTP" or "Connection"
if [ "${AI_ANALYSIS_ENABLED}" != "true" ] || [ -z "${OPENAI_API_KEY}" ]; then
echo "AI analysis disabled or API key not set."
return
fi
log "Querying AI for analysis on ${url} (${error_type} ${error_code})..."
if [ "${error_type}" == "HTTP" ]; then
prompt="A website monitor check for the URL '${url}' failed with HTTP status code ${error_code}. As a DevOps expert, briefly explain what this status code likely means in one sentence and suggest 2-3 immediate, practical troubleshooting steps in a bulleted list."
else
prompt="A website monitor check for the URL '${url}' failed with a cURL connection error (exit code ${error_code}). As a DevOps expert, briefly explain what this type of error likely means (e.g., DNS, firewall) in one sentence and suggest 2-3 immediate, practical troubleshooting steps in a bulleted list."
fi
# Construct the JSON payload for the OpenAI API
json_payload=$(cat <<EOF
{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "${prompt}"}],
"temperature": 0.5,
"max_tokens": 150
}
EOF
)
# Use curl to call the API and jq to parse the response
response=$(curl -s -X POST https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${OPENAI_API_KEY}" \
-d "${json_payload}")
# Extract the content, remove leading/trailing quotes and newlines
suggestion=$(echo "${response}" | jq -r '.choices[0].message.content' | sed 's/\\n/\n/g' | sed 's/^"//' | sed 's/"$//')
if [ -n "${suggestion}" ] && [ "${suggestion}" != "null" ]; then
echo -e "\n\n* AI DevOps Assistant Suggestion:*\n${suggestion}"
else
echo "Could not retrieve AI suggestion. API response: ${response}"
fi
}
send_alert() {
local subject=$1
local body=$2
log "Sending alert: ${subject}"
case "${ALERT_METHOD}" in
"slack")
if [ -z "${SLACK_WEBHOOK_URL}" ]; then log "ERROR: SLACK_WEBHOOK_URL is not set."; return; fi
payload="{\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"*${subject}*\"}},{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"${body}\"}}]}"
curl -s -X POST -H 'Content-type: application/json' --data "${payload}" "${SLACK_WEBHOOK_URL}" > /dev/null
;;
*)
log "ALERT (Logged): Subject: ${subject} - Body: ${body}"
;;
esac
}
check_site() {
local url=$1
http_code=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "${url}")
curl_exit_code=$?
if [ $curl_exit_code -ne 0 ]; then
subject="🚨 Website Down: ${url}"
body="Failed to connect to the site (cURL exit code: ${curl_exit_code}). This could be a DNS or network issue."
ai_suggestion=$(get_ai_suggestion "${url}" "${curl_exit_code}" "Connection")
send_alert "${subject}" "${body}${ai_suggestion}"
elif [ "${http_code}" -ge 400 ]; then
subject="🚨 Website Alert: ${url}"
body="The site is responding with a non-success HTTP status code: *${http_code}*."
ai_suggestion=$(get_ai_suggestion "${url}" "${http_code}" "HTTP")
send_alert "${subject}" "${body}${ai_suggestion}"
else
log "SUCCESS: ${url} is UP. Status code: ${http_code}."
fi
}
# Main Script Logic
if [ ! -f "${CONFIG_FILE}" ]; then log "Error: Configuration file not found at ${CONFIG_FILE}"; exit 1; fi
log "--- Monitor script started ---"
while IFS= read -r site || [[ -n "$site" ]]; do
if [[ -n "$site" ]] && [[ ! "$site" =~ ^# ]]; then
check_site "$site"
fi
done < "${CONFIG_FILE}"
log "--- Monitor script finished ---"Key AI-related Changes:
get_ai_suggestion function: This is the core of the AI feature.
It takes the URL and error code as input.
It crafts a specific, role-based prompt for the AI (“As a DevOps expert…”). This is called prompt engineering.
It uses a heredoc (cat <<EOF…) to create the JSON payload for the OpenAI API, which is a clean way to handle multi-line strings (a jq-based alternative for prompts containing quotes is sketched after this list).
It calls the API using curl.
It uses jq to parse the JSON response and extract the AI’s message.
Integration: The check_site function now calls get_ai_suggestion on failure and appends the AI’s response to the alert body before sending it.
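One caveat with hand-assembled JSON: if the prompt (or, in send_alert, the body) ever contains double quotes, backslashes, or raw newlines, the heredoc payload stops being valid JSON. Since jq is already a prerequisite, one option is to let jq do the escaping. This is a sketch of a drop-in replacement for the heredoc assignment, not part of the original script:

json_payload=$(jq -n \
  --arg content "${prompt}" \
  '{model: "gpt-3.5-turbo",
    messages: [{role: "user", content: $content}],
    temperature: 0.5,
    max_tokens: 150}')

The same trick works for the Slack payload in send_alert, where newlines in the AI suggestion would otherwise break the hand-built blocks JSON.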
Phase 5: Automation with Cron
The final step is to automate the script to run periodically.
Open your crontab for editing: crontab -e
Add a line to run the script. This example runs it every 5 minutes.
IMPORTANT: Use absolute paths in your cron job to avoid issues.
# Run the website monitor every 5 minutes
*/5 * * * * /path/to/your/project/website-monitor/monitor.sh

Find the absolute path by navigating to your website-monitor directory and running pwd.
The cron job runs in the background, typically with your home directory as its working directory. Because the script refers to its config, log, and status files with relative paths, the safest entry changes into the project directory first (see the example below); the script’s output (from tee) and its logs then land in logs/monitor.log as usual.
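A crontab entry along these lines keeps the relative paths working and captures any stray output (the path is illustrative, so substitute the result of pwd; cron.log is just a suggested filename):

*/5 * * * * cd /path/to/your/project/website-monitor && ./monitor.sh >> logs/cron.log 2>&1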
Final Project Review
You now have a complete, end-to-end monitoring solution that demonstrates expert-level shell scripting and modern DevOps principles.
Modular: Functions for logging, alerting, and checking make the code clean and reusable.
Configurable: URLs and secrets are managed outside the script.
Robust: It handles different types of failures (connection vs. HTTP errors).
Automated: cron ensures it runs continuously without manual intervention.
Intelligent: AI integration provides valuable, context-aware troubleshooting advice directly in the alert, saving engineers time during an outage.
This project is an excellent foundation. You can expand it by adding more alert methods, checking for specific content on a page, or measuring website response times.
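As a taste of those extensions, here are two small sketches you could drop into monitor.sh; the function names are my own, not part of the tutorial’s script. The first greps a page for an expected string, the second records how long a request takes using curl’s %{time_total} variable.

# Fail the check if the page no longer contains an expected string
check_content() {
  local url=$1
  local expected=$2
  if curl -s --max-time 10 "${url}" | grep -q "${expected}"; then
    echo "Content check passed for ${url}"
  else
    echo "Content check FAILED for ${url}: '${expected}' not found"
  fi
}

# Report how long the full request took, in seconds
measure_response_time() {
  local url=$1
  local seconds
  seconds=$(curl -s -o /dev/null -w "%{time_total}" --max-time 10 "${url}")
  echo "${url} responded in ${seconds}s"
}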
For more information about job notifications, open-source projects, and DevOps and cloud projects, please stay tuned to the official TechCareerHubs website.








