Monitoring
To view or assign monitoring settings, navigate to the following page in the CloudFS WebUI:
Configuration > Monitoring
The following table describes the monitoring settings you can configure.
| Monitoring Settings | Description |
|---|---|
| Syslog | |
| Syslog Server | Enter the IP address or hostname of the syslog server. Note: Only UDP is supported for syslog messages. |
| Logging Level | Select the minimum level of messages to send to the server. The selected level and all higher levels are sent. For example, if you specify Error, then messages of type Error, Critical, Alert, and Emergency are sent. |
| Trace Logs | Click Add Trace Log to specify the applications to be logged in addition to the standard syslog logging. Select a service and a logging level, and click Add. Add additional entries as needed. |
| SMTP Server | Enter the hostname or IP address of the email server. |
| Sender email address | Enter the email address to appear in the From field for alert notifications. |
| SMTP Port | Enter the port on the email server. |
| Use encryption | Select to encrypt the alert messages using SMTPS. |
| Use authentication | Select to require authentication for the SMTP server. If the SMTP server requires authentication, enable this setting and enter a username and password. |
| Username | Enter the username for access to the SMTP server. |
| Password | Enter the password for the user accessing the SMTP server. |
| Test | Click to test connection. |
| Test email address | Email address of the recipient for the email test. |
| Email Alert Settings | |
| Allow Repeating Email | Enable to send multiple emails for any given alert. If this setting is disabled, only the new alert is sent. |
| Email interval (minutes). Default: 5 minutes | Select the interval for aggregating events before sending an email notification. This setting is independent of the Allow Repeating Email setting. |
| Email Alert Recipients | Click Add Recipient. Enter an email address, then enable or disable the Alert Category and Alert Severity options to control which alerts are sent to the recipient. |
| Edit Recipient | Only the Alert Category and Alert Severity can be changed for the selected recipient. |
| Delete Recipient | Select the recipient and click Delete. Click Confirm in the confirmation message to proceed. |
| SNMP | |
| Read Community String | Enter the community string for read-only communication between the node and the trap receiver. If you specify a custom community string, the public community string is disabled. |
| Recipient IP, Recipient Community String | Enter the hostnames or IP addresses of up to two SNMP trap receivers and the associated community strings. |
| SNMP Trap Threshold Settings | Enter the usage thresholds (percent) that will trigger SNMP trap messages of particular types: CPU usage, memory usage, disk usage, and cloud usage. The node generates SNMP traps of the specified type if the usage meets or exceeds the threshold. |
| SNMP Users | |
| Add User | Click to add an SNMP user. Added users appear in the list and can be deleted from it. |
| Username | Enter a user name for the SNMP user. |
| Authentication Protocol | Select the algorithm to use for authenticating the SNMP user (SHA or MD5). |
| Authentication Password | Enter a password to authenticate the SNMP user. |
| Authentication Data Privacy Method | Select an option for data encryption (DES or AES). |
| Privacy Password, Privacy Password Confirm | Enter and confirm a password for the DES or AES encryption. |
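The Logging Level behavior described in the table can be illustrated with a short sketch. Only Error, Critical, Alert, and Emergency are named in the table's example. The other labels below are assumed to follow the usual syslog severity names, and the function name is illustrative rather than part of CloudFS.

```python
# Syslog severities ordered from least to most severe.
# Labels other than Error, Critical, Alert, and Emergency are assumptions
# based on the standard syslog severity names.
SEVERITIES = ["Debug", "Info", "Notice", "Warning",
              "Error", "Critical", "Alert", "Emergency"]


def levels_sent(minimum: str) -> list[str]:
    """Return the selected level plus every more severe level."""
    start = SEVERITIES.index(minimum)
    return SEVERITIES[start:]


print(levels_sent("Error"))
```

Selecting a lower minimum such as Debug returns every level, which is why a low setting can produce a large volume of messages on the syslog server.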
Simple Network Management Protocol
Simple Network Management Protocol (SNMP) is an Internet Standard protocol for monitoring networked devices through a single tool. Devices that support SNMP include routers, modems, switches, and servers. SNMP monitors values exported by the SNMP agent and also supports push notifications, which SNMP calls traps. SNMP has two components: SNMP agents and SNMP managers. An SNMP agent is software running on a managed device that allows the SNMP manager to communicate with the device using SNMP. The SNMP manager is external software that queries devices, receives their events, and collects their responses. A managed device implements an SNMP interface for node-specific information.
A Management Information Base (MIB) is a hierarchically organized collection of information that is accessed using a protocol such as SNMP. MIBs are created by managed-device vendors (here, Panzura) and are stored on the device. The MIBs must also be provided to the SNMP manager so that it knows how to translate the MIB values from the managed device. MIBs use a hierarchical namespace of object identifiers (OIDs). An OID identifies a variable that can be read using SNMP.
There are two transaction types: polling and traps. When polling occurs, the SNMP manager sends an SNMP request to a managed device at the default polling interval, which is 120 seconds. The managed device then responds with an SNMP response status. The second type of transaction is a trap or push notification. SNMP managers are always ready to receive a trap from a managed device. In order to receive and translate a trap from a managed device, the managed device must be configured to send traps to the manager, and the SNMP manager must be provided with a trap MIB from the managed device.
The following are the three versions of SNMP security:
- Version 1. Plain text authentication.
- Version 2. Improved authentication. Community strings are still transmitted over the wire in clear, plain text.
- Version 3. Provides the following three levels of authentication:
  - NoAuthNoPriv. Users who use this level don't have authentication or privacy when they send and receive messages.
  - AuthNoPriv. This level requires users to authenticate, but does not encrypt sent or received messages.
  - AuthPriv. This level is the most secure. Authentication is required and sent and received messages are encrypted.
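As an illustration of how the three SNMPv3 security levels differ in practice, the sketch below builds the argument list for the Net-SNMP `snmpget` command at each level. It assumes the Net-SNMP tools are installed on the SNMP manager host. The hostname, username, and passwords are hypothetical placeholders, and the helper function is not part of CloudFS. The command is only printed, not executed.

```python
import shlex

LEVELS = ("noAuthNoPriv", "authNoPriv", "authPriv")


def snmpget_v3_command(host, oid, user, level,
                       auth_proto=None, auth_pass=None,
                       priv_proto=None, priv_pass=None):
    """Build the argument list for a Net-SNMP snmpget call over SNMPv3."""
    if level not in LEVELS:
        raise ValueError(f"unknown security level: {level}")
    cmd = ["snmpget", "-v", "3", "-u", user, "-l", level]
    if level in ("authNoPriv", "authPriv"):
        # Both authenticated levels need an algorithm (SHA or MD5) and a password.
        if not (auth_proto and auth_pass):
            raise ValueError("authentication protocol and password are required")
        cmd += ["-a", auth_proto, "-A", auth_pass]
    if level == "authPriv":
        # Only authPriv adds encryption (DES or AES) with its own password.
        if not (priv_proto and priv_pass):
            raise ValueError("privacy protocol and password are required")
        cmd += ["-x", priv_proto, "-X", priv_pass]
    return cmd + [host, oid]


# Hypothetical host, user, and passwords; the OID is ccSysVersion from the MIB.
example = snmpget_v3_command(
    "node.example.com", ".1.3.6.1.4.1.32853.1.4.1.2.0",
    user="monitor", level="authPriv",
    auth_proto="SHA", auth_pass="auth-secret",
    priv_proto="AES", priv_pass="priv-secret",
)
print(shlex.join(example))
```

The protocol and password arguments correspond to the Authentication Protocol, Authentication Password, privacy method, and Privacy Password fields entered when adding an SNMP user.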
You can download the MIB archive (a .tgz file) using the following URL: https://docs.panzura.com/PANZURA_SNMP.tgz
After downloading the MIB archive, decompress it and load the files into your SNMP manager.
This download is also available by selecting Maintenance > System Operations > Download MIB.
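If you prefer to script the decompression step, a minimal sketch using Python's standard `tarfile` module might look like the following. The function name and destination directory are illustrative, and the contents of the archive are not assumed here. The function simply returns whatever file names it finds.

```python
import tarfile
from pathlib import Path


def extract_mib_archive(archive: str, dest: str) -> list[str]:
    """Extract a gzip-compressed tar archive and return the extracted file names."""
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries whose path would land outside the destination directory.
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_path)
        return [m.name for m in members if m.isfile()]


# Example call, assuming the archive was saved in the current directory:
# extract_mib_archive("PANZURA_SNMP.tgz", "mibs")
```

The returned list shows which files to load into the SNMP manager. How they are loaded depends on the manager product.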
To configure the SNMP settings on a Panzura node:
- Log in to your Panzura node.
- Select CONFIGURATION > Monitoring > SNMP Users > SNMP Settings.
- In the Read Community String field, enter the community string for read-only communication between the node and the SNMP manager. If you specify a custom community string, the public community string is disabled.
- In the Recipient IP field, enter the IP address of the SNMP manager. The node sends traps to this address.
- In the Recipient Community String field, enter the community string associated with the SNMP manager.
- In the SNMP Trap Threshold Settings field, enter the usage thresholds (percent) that trigger SNMP trap messages.
To edit a trap:
- Log in to your Panzura node.
- Select CONFIGURATION > Monitoring > SNMP Users > SNMP Settings.
- In the SNMP Trap Thresholds section, select Actions > Edit SNMP Trap.
To add an SNMP user:
- Log in to your Panzura node.
- Select CONFIGURATION > SNMP Users.
- Click the ADD button.
The following table displays the SNMP traps that Panzura supports:
| Name | Trigger Condition | ID |
|---|---|---|
| pzCloudControllerHighCPUUsage | cpu_load > threshold_value | SNMPv2-SMI::enterprises.32853.1.2.1.2.1000 |
| pzCloudControllerHighMemoryUsage | (used_memory * 100 / total_memory) > threshold_value | SNMPv2-SMI::enterprises.32853.1.2.1.2.1001 |
| pzCloudControllerHighDiskUsage | (used_disk * 100 / total_disk) > threshold_value | SNMPv2-SMI::enterprises.32853.1.2.1.2.1002 |
| pzCloudControllerHighCloudUsage | (used_cloud * 100 / total_cloud) > threshold_value | SNMPv2-SMI::enterprises.32853.1.2.1.2.1003 |
| pzTrapMetaSpill | (meta_space_used * 100 / total_meta_ssd) > threshold_value | SNMPv2-SMI::enterprises.32853.1.2.1.2.1004 |
| pzTrapMetaAllocFail | vfs.zfs.metaslab.stats.mg_spill > 100 and still increasing | SNMPv2-SMI::enterprises.32853.1.2.1.2.1005 |
| pzTrapActiveDown | /opt/pixel8/bin/pz_ping <host> fails 3 times, so the host is considered down | SNMPv2-SMI::enterprises.32853.1.2.1.2.1006 |
| pzAutoFailover | AutoFailover occurred | SNMPv2-SMI::enterprises.32853.1.2.1.2.1007 |
| pzRegularFailover | RegularFailover occurred | SNMPv2-SMI::enterprises.32853.1.2.1.2.1008 |
| pzAlertTrap | An alert is shown in the GUI and an email is sent to the customer. Trap 1009 is sent as well. | SNMPv2-SMI::enterprises.32853.1.2.1.2.1009 |
| pzCloudWriteFailureTrap | Sent when cloud write failures exceed the threshold | SNMPv2-SMI::enterprises.32853.1.2.1.2.1011 |
| pzWarnTrap | A warning is shown in the GUI and an email is sent to the customer. Trap 1009 is sent as well. | SNMPv2-SMI::enterprises.32853.1.2.1.2.1012 |
| pzInfoTrap | Info is shown in the GUI and an email is sent to the customer. Trap 1009 is sent as well. | SNMPv2-SMI::enterprises.32853.1.2.1.2.1013 |
| pzSwapUsage | Swap usage is greater than 50% in the /usr/sbin/swapinfo output | SNMPv2-SMI::enterprises.32853.1.2.1.2.1014 |
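The percentage-based trigger conditions in the table can be sketched as follows. The function names are illustrative and not part of CloudFS, and the comparison follows the strict `>` shown in the Trigger Condition column. The conversion helper relies on the standard mapping of SNMPv2-SMI::enterprises to the numeric prefix .1.3.6.1.4.1, which is useful when an SNMP manager reports traps in numeric form.

```python
ENTERPRISES_SYMBOLIC = "SNMPv2-SMI::enterprises"
ENTERPRISES_NUMERIC = ".1.3.6.1.4.1"


def usage_percent(used: float, total: float) -> float:
    """Compute used * 100 / total, as in the Trigger Condition column."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used * 100 / total


def exceeds_threshold(used: float, total: float, threshold_percent: float) -> bool:
    """True when the usage percentage is strictly above the threshold."""
    return usage_percent(used, total) > threshold_percent


def numeric_trap_oid(symbolic: str) -> str:
    """Convert an ID from the table into its fully numeric OID."""
    if not symbolic.startswith(ENTERPRISES_SYMBOLIC):
        raise ValueError(f"unexpected OID prefix: {symbolic}")
    return ENTERPRISES_NUMERIC + symbolic[len(ENTERPRISES_SYMBOLIC):]


# Hypothetical sample values: 850 GB used of 1000 GB with an 80 percent threshold.
print(exceeds_threshold(850, 1000, 80))
print(numeric_trap_oid("SNMPv2-SMI::enterprises.32853.1.2.1.2.1002"))
```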
Syslog Message Categories
When using syslog to monitor for events needing attention, Panzura recommends monitoring for LOG_EMERG, LOG_ALERT, and LOG_CRIT.
- LOG_EMERG and LOG_ALERT messages typically represent conditions requiring immediate attention.
- LOG_CRIT messages represent conditions that can escalate to LOG_EMERG or LOG_ALERT if not addressed.
The following table provides a list of Syslog message categories.
| Syslog Message Category | Description |
|---|---|
| LOG_NOTICE | Conditions that are not error conditions, but should possibly be handled specially. |
| LOG_INFO | Informational messages. |
| LOG_WARNING | Warning messages. |
| LOG_ALERT | A condition that should be corrected immediately, such as a corrupted system database. |
| LOG_ERR | A general error condition. Can occur often, even on a node that is operating normally. |
| LOG_CRIT | Critical conditions, such as hard device errors. |
| LOG_EMERG | A panic condition. This message is normally broadcast to all users. |
| LOG_DEBUG | Messages that contain information normally of use only when debugging. |
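On the syslog server side, the recommended categories can be picked out from the priority value that prefixes each syslog message. The sketch below assumes the standard syslog encoding, in which the number inside the leading angle brackets equals facility * 8 + severity. The sample messages are hypothetical and do not reproduce actual CloudFS log text.

```python
import re

# Standard syslog severities indexed by numeric value (0 is the most severe).
SEVERITY_NAMES = ["LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR",
                  "LOG_WARNING", "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG"]
ATTENTION = {"LOG_EMERG", "LOG_ALERT", "LOG_CRIT"}
_PRI = re.compile(r"^<(\d{1,3})>")


def severity_of(message: str) -> str:
    """Return the severity name encoded in a message's priority prefix."""
    match = _PRI.match(message)
    if not match:
        raise ValueError("message has no priority prefix")
    priority = int(match.group(1))
    return SEVERITY_NAMES[priority % 8]


def needs_attention(message: str) -> bool:
    """True for the categories recommended for monitoring."""
    return severity_of(message) in ATTENTION


# Hypothetical messages: priority 10 is facility 1 with severity 2 (LOG_CRIT),
# and priority 14 is facility 1 with severity 6 (LOG_INFO).
for sample in ("<10>node1 example: disk error", "<14>node1 example: sync done"):
    print(severity_of(sample), needs_attention(sample))
```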
Panzura MIB Objects
The Panzura node MIB provides access to the following types of system information and statistics:
- Node ID, CloudFS version, and hostname
- Visibility into caching:
  - Hot, warm, and cold for automated caching
  - Hot, warm, and cold for pinned files
  - Cache hits/misses for automated caching
  - Cache hits/misses for pinned files
- Cloud statistics
  - Number of drive files uploaded to cloud
  - Number of drive files downloaded
  - Number of upload failures
  - Number of download failures
- SMB users
  - Total number of SMB users currently connected to the node
  - Total number of files locked by SMB users currently connected
- CloudFS local configuration
  - Mode of node: master/subordinate
  - Hostname of master node
- Local system snapshot information
  - Latest system snapshot generated reference #
  - Latest system snapshot uploaded to cloud reference #
  - Date and time when the latest master snapshot was successfully generated
- CloudFS remote configuration
  - Remote node hostnames
  - Latency from node to remote node
  - Remote node up or down
    - Down means there is no communication
- Remote snapshot information from all other nodes
  - Latest snapshot reference # synchronized from the specified remote node
  - Latest snapshot reference # uploaded by the specified remote node
MIB Download*
The following table lists the objects in the Panzura node Management Information Base (MIB).
| Object Name | Object ID (OID) | Type | Description |
|---|---|---|---|
| System Identification | |||
| ccSysCCID | .1.3.6.1.4.1.32853.1.4.1.1.0 | Sensor | The node’s CCID |
| ccSysVersion | .1.3.6.1.4.1.32853.1.4.1.2.0 | Sensor | PFOS version |
| System Usage | |||
| cpuLoad | .1.3.6.1.4.1.32853.1.3.1.1.1 | Sensor | CPU usage averaged over the previous 5 minutes. Measures the number of processes waiting for CPU resources. |
| pzCloudnodeHighCPUUsage | .1.3.6.1.4.1.32853.1.2.1.2.1000 | Trap | CPU usage averaged over the previous 5 minutes. Measures the number of processes waiting for CPU resources. |
| memUsed | .1.3.6.1.4.1.32853.1.3.1.2.1 | Sensor | Memory usage at the time of the measurement in KB. |
| pzCloudnodeHighMemoryUsage | .1.3.6.1.4.1.32853.1.2.1.2.1001 | Trap | Memory usage at the time of the measurement in KB. |
| localHDUsed | .1.3.6.1.4.1.32853.1.3.1.3.1 | Sensor | Disk usage at the time of the measurement in KB. |
| pzCloudnodeHighDiskUsage | .1.3.6.1.4.1.32853.1.2.1.2.1002 | Trap | Disk usage at the time of the measurement in KB. |
| cloudStatsUsed | .1.3.6.1.4.1.32853.1.3.1.4.1 | Sensor | Cloud usage at the time of the measurement in KB. |
| pzCloudnodeHighCloudUsage | .1.3.6.1.4.1.32853.1.2.1.2.1003 | Trap | Cloud usage at the time of the measurement in KB. |
| High Availability | |||
| pzTrapActiveDown | .1.3.6.1.4.1.32853.1.2.1.2.1006 | Trap | HA‐Local active node failure notification. |
| Cache | |||
| ccStatCaHotAutoCache | .1.3.6.1.4.1.32853.1.4.2.1.1.0 | Sensor | The total number of bytes in data cache storage with an Auto Cache Smart Cache rule that were accessed during the last week. |
| ccStatCaHotAutoPinned | .1.3.6.1.4.1.32853.1.4.2.1.2.0 | Sensor | The total number of bytes in data cache storage with a Pinned Smart Cache rule that have been accessed in the last week. |
| ccStatCaWarmAutoCache | .1.3.6.1.4.1.32853.1.4.2.1.3.0 | Sensor | The total number of bytes in data cache storage with an Auto Cache Smart Cache rule that were accessed more than a week ago but less than one month ago. |
| ccStatCaWarmAutoPinned | .1.3.6.1.4.1.32853.1.4.2.1.4.0 | Sensor | The total number of bytes in data cache storage with a Pinned Smart Cache rule that were accessed more than a week ago but less than one month ago. |
| ccStatCaColdAutoCache | .1.3.6.1.4.1.32853.1.4.2.1.5.0 | Sensor | The total number of bytes in data cache storage with an Auto Cache Smart Cache rule that were accessed more than one month ago. |
| ccStatCaColdAutoPinned | .1.3.6.1.4.1.32853.1.4.2.1.6.0 | Sensor | The total number of bytes in data cache storage with a Pinned Smart Cache rule that were accessed more than one month ago. |
| ccStatCaCacheHits | .1.3.6.1.4.1.32853.1.4.2.1.7.0 | Sensor | The total number of cache hit bytes in data cache storage with an Auto_Cache Data Locality rule. |
| ccStatCaPinnedHits | .1.3.6.1.4.1.32853.1.4.2.1.8.0 | Sensor | The total number of cache hit bytes in data cache storage with a Pinned Data Locality rule. |
| ccStatCaCacheMissed | .1.3.6.1.4.1.32853.1.4.2.1.9.0 | Sensor | The total number of cache missed bytes in data cache storage with an Auto_Cache Data Locality rule. |
| ccStatCaPinnedMissed | .1.3.6.1.4.1.32853.1.4.2.1.10.0 | Sensor | The total number of cache missed bytes in data cache storage with a Pinned Data Locality rule. |
| ccStatCaEvited | .1.3.6.1.4.1.32853.1.4.2.1.11.0 | Sensor | The total number of evicted bytes in data cache storage. |
| Drive File Operations | |||
| ccStatClUploads | .1.3.6.1.4.1.32853.1.4.2.2.1.0 | Sensor | The total number of drive files uploaded to cloud storage. |
| ccStatClUploadFails | .1.3.6.1.4.1.32853.1.4.2.2.2.0 | Sensor | The total number of failed attempts to upload a drive file to cloud storage. |
| ccStatClDownloads | .1.3.6.1.4.1.32853.1.4.2.2.3.0 | Sensor | The total number of drive files downloaded from cloud storage. |
| ccStatClDownloadFails | .1.3.6.1.4.1.32853.1.4.2.2.4.0 | Sensor | The total number of failed attempts to download a drive file from cloud storage. |
| SMB Users | |||
| ccStatSmbUsers | .1.3.6.1.4.1.32853.1.4.2.3.1.0 | Sensor | The total number of SMB users currently connected to the node. |
| ccStatSmbLockedFiles | .1.3.6.1.4.1.32853.1.4.2.3.2.0 | Sensor | The total number of files locked by SMB users currently connected to the node. |
| Snapshots | |||
| ccInfoLoSnLastGenSnapNum | .1.3.6.1.4.1.32853.1.4.3.1.1.0 | Sensor | The reference number of the latest snapshot generated by the node. |
| ccInfoLoSnLastUploadSnapNum | .1.3.6.1.4.1.32853.1.4.3.1.2.0 | Sensor | The reference number of the latest snapshot uploaded to cloud storage. |
| ccInfoLoSnLastMasterSnap | .1.3.6.1.4.1.32853.1.4.3.1.3.0 | Sensor | The date when the latest master snapshot was generated successfully. |
| Status of snapshot synchronization from remote nodes | |||
| ccInfoReSnIdx | .1.3.6.1.4.1.32853.1.4.3.2.1.1.1 | Sensor | Index number. |
| ccInfoReSnHostname | .1.3.6.1.4.1.32853.1.4.3.2.1.1.2 | Sensor | The remote node’s hostname. |
| ccInfoReSnLastSyncSnapNum | .1.3.6.1.4.1.32853.1.4.3.2.1.1.3 | Sensor | The reference number of the latest snapshot synchronized from the specified remote node. |
| ccInfoReSnLastUploadSnapNum | .1.3.6.1.4.1.32853.1.4.3.2.1.1.4 | Sensor | The reference number of the latest snapshot uploaded by the specified remote node. |
| CloudFS | |||
| ccInfoCfsCfgIdx | .1.3.6.1.4.1.32853.1.4.3.3.1.1.1 | Sensor | Index number. |
| ccInfoCfsCfgHostname | .1.3.6.1.4.1.32853.1.4.3.3.1.1.2 | Sensor | The hostname of the node. |
| ccInfoCfsCfgFilesystemName | .1.3.6.1.4.1.32853.1.4.3.3.1.1.3 | Sensor | The filesystem name that the node is hosting. |
| ccInfoCfsCfgState | .1.3.6.1.4.1.32853.1.4.3.3.1.1.4 | Sensor | The operational state of the node. |
| ccInfoCfsCfgStatus | .1.3.6.1.4.1.32853.1.4.3.3.1.1.5 | Sensor | The status of the node. |
| Network latency from a node to other nodes in the CloudFS | |||
| ccInfocfsLaIdx | .1.3.6.1.4.1.32853.1.4.3.3.2.1.1 | Sensor | Index number. |
| ccInfocfsLaHostname | .1.3.6.1.4.1.32853.1.4.3.3.2.1.2 | Sensor | The hostname of the node included in the CloudFS. |
| ccInfocfsLaHelloLatency | .1.3.6.1.4.1.32853.1.4.3.3.2.1.3 | Sensor | The network latency in milliseconds from the remote node. |
| ccInfoCfsCfgLoMode | .1.3.6.1.4.1.32853.1.4.3.3.3.1.0 | Sensor | The configuration mode of the node, i.e. master or subordinate. |
| ccInfoCfsCfgLoCfgMaster | .1.3.6.1.4.1.32853.1.4.3.3.3.2.0 | Sensor | The name of the master configuration node. |
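As one example of combining values from the table, the sketch below derives a cache hit ratio from the ccStatCaCacheHits and ccStatCaCacheMissed byte counters. The ratio itself is not a MIB object. It is an illustrative calculation, and the sample counter values are hypothetical. How the values are retrieved from the node (for example, through your SNMP manager) is left out.

```python
# OIDs taken from the table above.
OID_CACHE_HITS = ".1.3.6.1.4.1.32853.1.4.2.1.7.0"    # ccStatCaCacheHits
OID_CACHE_MISSED = ".1.3.6.1.4.1.32853.1.4.2.1.9.0"  # ccStatCaCacheMissed


def cache_hit_ratio(hit_bytes: int, missed_bytes: int) -> float:
    """Fraction of requested bytes served from cache (0.0 when there is no traffic)."""
    total = hit_bytes + missed_bytes
    if total == 0:
        return 0.0
    return hit_bytes / total


# Hypothetical readings keyed by OID, as an SNMP manager might return them.
readings = {OID_CACHE_HITS: 750, OID_CACHE_MISSED: 250}
ratio = cache_hit_ratio(readings[OID_CACHE_HITS], readings[OID_CACHE_MISSED])
print(f"{ratio:.0%}")
```

The same function can be applied to ccStatCaPinnedHits and ccStatCaPinnedMissed to track pinned data separately.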
Monitoring Recommendations for CPU Load
The amount of CPU load placed on a node is measured in terms of the number of processes that are ready to run and in the queue awaiting CPU resources. These are the recommended thresholds by model. (CPU load is measured by OID .1.3.6.1.4.1.32853.1.3.1.1.1.)
| Node Model | Number of Processes in the Queue and Ready To Run | Recommendation |
|---|---|---|
| 28xx models | Above 320 | Monitor closely |
| 28xx models | Above 400 | Look into it |
| 28xx models | Above 800 | Take action |
| 40xx models | Above 640 | Monitor closely |
| 40xx models | Above 800 | Look into it |
| 40xx models | Above 1600 | Take action |
| 5100 model | Above 480 | Monitor closely |
| 5100 model | Above 600 | Look into it |
| 5100 model | Above 1200 | Take action |
| 5300 and 5500 models | Above 960 | Monitor closely |
| 5300 and 5500 models | Above 1200 | Look into it |
| 5300 and 5500 models | Above 2400 | Take action |
| 6xxx models | Above 1200 | Look into it |
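A monitoring script could apply the table as a simple lookup. The sketch below is illustrative: the dictionary keys are shorthand labels chosen here for the model families, and the function is not part of CloudFS. "Above" is treated as a strict comparison.

```python
# Thresholds from the table, listed from most to least severe for each model.
THRESHOLDS = {
    "28xx": [(800, "Take action"), (400, "Look into it"), (320, "Monitor closely")],
    "40xx": [(1600, "Take action"), (800, "Look into it"), (640, "Monitor closely")],
    "5100": [(1200, "Take action"), (600, "Look into it"), (480, "Monitor closely")],
    "5300/5500": [(2400, "Take action"), (1200, "Look into it"), (960, "Monitor closely")],
    "6xxx": [(1200, "Look into it")],
}


def recommendation(model: str, cpu_load: float):
    """Return the most severe matching recommendation, or None below all thresholds."""
    for limit, advice in THRESHOLDS[model]:
        if cpu_load > limit:
            return advice
    return None


print(recommendation("28xx", 450))
print(recommendation("5300/5500", 2500))
```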
VM nodes
For nodes operating in virtualized environments, such as VMware ESXi or Amazon Web Services (AWS), first determine the number of CPU cores assigned to the node. When 4 cores are assigned, use the values given for the 28xx models.
For a larger number of cores, scale the values upward. For example, a node operating within AWS can have 8 cores.
To scale the values, first divide the number of cores by 4 to get 2. Next, multiply this by the values for the 28xx models. The CPU load recommendations become (2*320=640), (2*400=800), and (2*800=1600).
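The scaling rule can be written as a small function. This is a sketch under the assumption that the values scale linearly with the core count, which the 8-core example supports. Behavior for core counts that are not multiples of 4 is an extrapolation.

```python
BASE_CORES = 4
# Values for the 28xx models, used as the 4-core baseline.
BASE_THRESHOLDS = {"Monitor closely": 320, "Look into it": 400, "Take action": 800}


def scaled_thresholds(cores: int) -> dict[str, float]:
    """Scale the 28xx values by the ratio of assigned cores to the 4-core baseline."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    factor = cores / BASE_CORES
    return {advice: value * factor for advice, value in BASE_THRESHOLDS.items()}


# An 8-core node: factor 2, giving 640, 800, and 1600.
print(scaled_thresholds(8))
```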
*Pertains to those running Panzura's version 8 nodes