Monitoring

To view or assign monitoring settings, navigate to the following page in the CloudFS WebUI:

Configuration > Monitoring

The following table describes the monitoring settings you can configure.

Monitoring Settings	Description
Syslog
Syslog Server	Enter the IP address or hostname of the syslog server. Note: Only UDP is supported for syslog messages.
Logging Level	Select the minimum level of messages to send to the server. The selected level and all higher levels are sent. For example, if you specify Error, then messages of type Error, Critical, Alert, and Emergency are sent.
Trace Logs	Click Add Trace Log to specify the applications to be logged in addition to the standard syslog logging. Select a service and a logging level, and click Add. Add additional entries as needed.
Email
SMTP Server	Enter the hostname or IP address of the email server.
Sender email address	Enter the email address to appear in the From field for alert notifications.
SMTP Port	Enter the port on the email server.
Use encryption	Select to encrypt the alert messages using SMTPS.
Use authentication	Select to require authentication for the SMTP server. If the SMTP server requires authentication, enable this setting and enter a username and password.
Username	Enter the username for access to the SMTP server.
Password	Enter the password for the user accessing the SMTP server.
Test	Click to test connection.
Test email address	Email address of the recipient for the email test.
Email Alert Settings
Allow Repeating Email	Enable to send multiple emails for any given alert. If this setting is disabled, only the new alert is sent. Note: If the repeated alerts are not suppressed, multiple notifications for the same event will be sent and all of these will share the timestamp of the initial event. Node reboot and disk offline events are reported only once, even if repeated alerts are not suppressed.
Email interval (minutes) Default: 5 minutes	Select the interval for aggregating events before sending an email notification. This setting is independent of the Allow Repeating Email field.
Email Alert Recipients	Click Add Recipient. Enter an email address, then enable or disable options under Alert Category and Alert Severity to choose which alerts the recipient is notified about. Alert Category: Hardware (to get alerts from this category, the CloudFS node must be deployed on hardware), System, Management, File System, Network, Interdiction, Cloud. Alert Severity: Info, Warning, Critical.
Edit Recipient	Only the Alert Category and Alert Severity can be changed for the selected recipient.
Delete Recipient	Select the recipient and click Delete. Click Confirm in the message to proceed.
SNMP
Read Community String	Enter the community string for read-only communication between the node and the trap receiver. If you specify a custom community string, the public community string is disabled.
Recipient IP Recipient Community String	Enter the hostnames or IP addresses of up to two SNMP trap receivers and the associated community strings.
SNMP Trap Threshold Settings	Enter the usage thresholds (percent) that will trigger SNMP trap messages of particular types: CPU usage, memory usage, disk usage, and cloud usage. The node generates SNMP traps of the specified type if the usage meets or exceeds the threshold.
SNMP Users
Add User	Click to add an SNMP user. Added users are listed. Listed users can be deleted from the list.
Username	Enter a user name for the SNMP user.
Authentication Protocol	Select the algorithm to use for authenticating the SNMP user (SHA or MD5).
Authentication Password	Enter a password to authenticate the SNMP user.
Authentication Data Privacy Method	Select an option for data encryption (DES or AES).
Privacy Password Privacy Password Confirm	Enter a password for the DES or AES encryption.

Simple Network Management Protocol

Simple Network Management Protocol (SNMP) is an Internet Standard protocol that allows customers to monitor networked devices through a single tool. Devices that support SNMP include routers, modems, switches, and servers. SNMP monitors values exported by the SNMP agent and supports push notifications, which are called traps in SNMP terminology. SNMP has two components: SNMP agents and SNMP managers. An SNMP agent is software on a managed device that allows the SNMP manager to communicate with the device using SNMP. The SNMP manager is external software that queries devices, receives events, and gets responses from them. A managed device implements an SNMP interface for node-specific information.

A Management Information Base (MIB) is a collection of information organized hierarchically. Its contents are accessed using a protocol such as SNMP. MIBs are created by managed-device vendors (in this case, Panzura) and are stored on the device. MIBs must be provided to the SNMP manager so that it knows how to translate the MIB values from the managed device. MIBs use a hierarchical namespace that contains object identifiers (OIDs). An OID identifies a variable that can be read using SNMP.

There are two transaction types: polling and traps. When polling occurs, the SNMP manager sends an SNMP request to a managed device at the default polling interval, which is 120 seconds. The managed device then responds with an SNMP response status. The second type of transaction is a trap or push notification. SNMP managers are always ready to receive a trap from a managed device. In order to receive and translate a trap from a managed device, the managed device must be configured to send traps to the manager, and the SNMP manager must be provided with a trap MIB from the managed device.

The following are the three versions of SNMP security:

Version 1. Plain text authentication.
Version 2. Improved authentication. Community strings are still transmitted over the wire in clear, plain text.
Version 3. Provides the following three levels of authentication:
- NoAuthNoPriv. Users who use this level don't have authentication or privacy when they send and receive messages.
- AuthNoPriv. This level requires users to authenticate, but does not encrypt sent or received messages.
- AuthPriv. This level is the most secure. Authentication is required, and sent and received messages are encrypted.
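As a rough illustration of how these versions and security levels differ in practice, the sketch below builds argument lists for the Net-SNMP `snmpget` command-line tool. It assumes Net-SNMP is the SNMP manager in use, which this guide does not state; the function name and the host `node.example.com` are hypothetical. The OID in the usage note is the cpuLoad OID from the MIB table in this guide.

```python
from typing import List, Optional


def build_snmpget_args(
    host: str,
    oid: str,
    *,
    community: Optional[str] = None,
    user: Optional[str] = None,
    auth_protocol: str = "SHA",
    auth_password: Optional[str] = None,
    priv_protocol: str = "AES",
    priv_password: Optional[str] = None,
) -> List[str]:
    """Build a Net-SNMP snmpget argument list (hypothetical helper)."""
    if community is not None:
        # SNMPv2c: the community string travels in clear text.
        return ["snmpget", "-v", "2c", "-c", community, host, oid]
    if user is None:
        raise ValueError("SNMPv3 requires a user name")
    if priv_password is not None and auth_password is None:
        raise ValueError("privacy (encryption) requires authentication")

    # Derive the SNMPv3 security level from the credentials supplied.
    if auth_password is None:
        level = "noAuthNoPriv"
    elif priv_password is None:
        level = "authNoPriv"
    else:
        level = "authPriv"

    args = ["snmpget", "-v", "3", "-l", level, "-u", user]
    if auth_password is not None:
        args += ["-a", auth_protocol, "-A", auth_password]
    if priv_password is not None:
        args += ["-x", priv_protocol, "-X", priv_password]
    return args + [host, oid]
```

For example, `build_snmpget_args("node.example.com", ".1.3.6.1.4.1.32853.1.3.1.1.1", user="monitor", auth_password="...", priv_password="...")` yields an authPriv request; the resulting list can be passed to `subprocess.run` on a machine where Net-SNMP is installed.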

You can download the MIB archive (a gzip-compressed tar file) using the following URL: https://docs.panzura.com/PANZURA_SNMP.tgz

After downloading the MIB archive, decompress it and load files into your SNMP manager.

This download is also available by selecting Maintenance > System Operations > Download MIB.
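If you prefer to script the decompression step, a minimal sketch using Python's standard `tarfile` module might look like the following. The function name and the destination directory are hypothetical, and the archive's internal layout is not documented here, so the sketch simply returns whatever file names it finds.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_mib_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tgz MIB archive and return the names of the files in it."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest_dir.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    return sorted(m.name for m in members if m.isfile())
```

A call such as `extract_mib_archive("PANZURA_SNMP.tgz", "mibs")` would leave the extracted files under `mibs`, ready to load into the SNMP manager.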

To configure the SNMP settings on a Panzura node:

Log in to your Panzura node.
Select CONFIGURATION > Monitoring > SNMP Users > SNMP Settings.
In the Read Community String field, enter the community string for read-only communication between the node and SNMP manager.

If you specify a custom community string, the public community string is disabled.
In the Recipient IP field, enter the IP address of the SNMP manager (the address to which the node sends traps).
In the Recipient Community String field, enter the community string associated with the SNMP manager.
In the SNMP Trap Threshold Settings field, enter the usage thresholds (percent) that will trigger SNMP trap messages.

To edit a trap:

Log in to your Panzura node.
Select CONFIGURATION > Monitoring > SNMP Users > SNMP Settings.
In the SNMP Trap Thresholds section, select Actions > Edit SNMP Trap.

To add an SNMP user:

Log in to your Panzura node.
Select CONFIGURATION > SNMP Users.
Click the ADD button.

The following table displays the SNMP traps that Panzura supports:

Name	Trigger Condition	ID
pzCloudControllerHighCPUUsage	cpu_load > threshold_value	SNMPv2-SMI::enterprises.32853.1.2.1.2.1000
pzCloudControllerHighMemoryUsage	(used_memory * 100 / total_memory) > threshold_value	SNMPv2-SMI::enterprises.32853.1.2.1.2.1001
pzCloudControllerHighDiskUsage	(used_disk * 100 / total_disk) > threshold_value	SNMPv2-SMI::enterprises.32853.1.2.1.2.1002
pzCloudControllerHighCloudUsage	(used_cloud * 100 / total_cloud) > threshold_value	SNMPv2-SMI::enterprises.32853.1.2.1.2.1003
pzTrapMetaSpill	(meta_space_used * 100 / total_meta_ssd) > threshold_value	SNMPv2-SMI::enterprises.32853.1.2.1.2.1004
pzTrapMetaAllocFail	vfs.zfs.metaslab.stats.mg_spill > 100 and it keeps increasing	SNMPv2-SMI::enterprises.32853.1.2.1.2.1005
pzTrapActiveDown	/opt/pixel8/bin/pz_ping <host> fails 3 times; the host is considered down and this trap is sent	SNMPv2-SMI::enterprises.32853.1.2.1.2.1006
pzAutoFailover	AutoFailover occurred	SNMPv2-SMI::enterprises.32853.1.2.1.2.1007
pzRegularFailover	RegularFailover occurred	SNMPv2-SMI::enterprises.32853.1.2.1.2.1008
pzAlertTrap	An alert is shown in the GUI, an email is sent to the customer, and trap 1009 is sent.	SNMPv2-SMI::enterprises.32853.1.2.1.2.1009
pzCloudWriteFailureTrap	Sent when cloud write failures exceed threshold	SNMPv2-SMI::enterprises.32853.1.2.1.2.1011
pzWarnTrap	A warning is shown in the GUI, and an email is sent to the customer. Trap 1009 is sent as well.	SNMPv2-SMI::enterprises.32853.1.2.1.2.1012
pzInfoTrap	Info is shown in the GUI, and an email is sent to the customer. Trap 1009 is sent as well.	SNMPv2-SMI::enterprises.32853.1.2.1.2.1013
pzSwapUsage	If usage is greater than 50% in the /usr/sbin/swapinfo output, this trap is sent.	SNMPv2-SMI::enterprises.32853.1.2.1.2.1014
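The trap IDs above use the symbolic prefix SNMPv2-SMI::enterprises, which corresponds to the numeric OID .1.3.6.1.4.1. A trap receiver script may need to convert between the two forms and look up the trap name. The following is a minimal sketch of that lookup; the function names are hypothetical and the name table is copied from the table above.

```python
from typing import Optional

ENTERPRISES_SYMBOLIC = "SNMPv2-SMI::enterprises"
ENTERPRISES_NUMERIC = ".1.3.6.1.4.1"
PANZURA_TRAP_BASE = ".1.3.6.1.4.1.32853.1.2.1.2."

# Final sub-identifier -> trap name, as listed in the trap table.
PANZURA_TRAPS = {
    1000: "pzCloudControllerHighCPUUsage",
    1001: "pzCloudControllerHighMemoryUsage",
    1002: "pzCloudControllerHighDiskUsage",
    1003: "pzCloudControllerHighCloudUsage",
    1004: "pzTrapMetaSpill",
    1005: "pzTrapMetaAllocFail",
    1006: "pzTrapActiveDown",
    1007: "pzAutoFailover",
    1008: "pzRegularFailover",
    1009: "pzAlertTrap",
    1011: "pzCloudWriteFailureTrap",
    1012: "pzWarnTrap",
    1013: "pzInfoTrap",
    1014: "pzSwapUsage",
}


def to_numeric_oid(oid: str) -> str:
    """Convert a symbolic enterprises OID to dotted numeric form."""
    oid = oid.replace(" ", "")  # tolerate stray spaces from copy and paste
    if oid.startswith(ENTERPRISES_SYMBOLIC):
        return ENTERPRISES_NUMERIC + oid[len(ENTERPRISES_SYMBOLIC):]
    return oid if oid.startswith(".") else "." + oid


def trap_name(oid: str) -> Optional[str]:
    """Return the Panzura trap name for an OID, or None if it is not listed."""
    numeric = to_numeric_oid(oid)
    if not numeric.startswith(PANZURA_TRAP_BASE):
        return None
    suffix = numeric[len(PANZURA_TRAP_BASE):]
    return PANZURA_TRAPS.get(int(suffix)) if suffix.isdigit() else None
```

Both `trap_name("SNMPv2-SMI::enterprises.32853.1.2.1.2.1014")` and `trap_name(".1.3.6.1.4.1.32853.1.2.1.2.1014")` resolve to the same entry, so the receiver can accept either notation.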

Syslog Message Categories

When using syslog to monitor for events needing attention, Panzura recommends monitoring for LOG_EMERG, LOG_ALERT, and LOG_CRIT.

LOG_EMERG and LOG_ALERT messages typically represent conditions requiring immediate attention.
LOG_CRIT events represent events that can become EMERG or ALERT if not addressed.

The following table provides a list of Syslog message categories.

Syslog Message Category	Description
LOG_NOTICE	Conditions that are not error conditions, but should possibly be handled specially.
LOG_INFO	Informational messages.
LOG_WARNING	Warning messages.
LOG_ALERT	A condition that should be corrected immediately, such as a corrupted system database.
LOG_ERR	Indicates a general error for general notification. Can occur often even on a node that is operating normally.
LOG_CRIT	Critical conditions, such as hard device errors.
LOG_EMERG	A panic condition. This message is normally broadcast to all users.
LOG_DEBUG	Messages that contain information normally of use only when debugging.
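The categories in the table follow the standard syslog severity ordering, in which a lower number means a more severe condition (LOG_EMERG is 0 and LOG_DEBUG is 7). The sketch below shows how a minimum logging level selects the messages that are sent, and how a monitoring script could flag the three categories recommended above. The function names are hypothetical.

```python
from typing import List

# Standard syslog severity numbers (lower is more severe).
SEVERITY = {
    "LOG_EMERG": 0,
    "LOG_ALERT": 1,
    "LOG_CRIT": 2,
    "LOG_ERR": 3,
    "LOG_WARNING": 4,
    "LOG_NOTICE": 5,
    "LOG_INFO": 6,
    "LOG_DEBUG": 7,
}

# Categories this guide recommends monitoring.
RECOMMENDED = {"LOG_EMERG", "LOG_ALERT", "LOG_CRIT"}


def levels_sent(minimum: str) -> List[str]:
    """List the categories sent when `minimum` is the selected logging level."""
    cutoff = SEVERITY[minimum]
    return [name for name, number in SEVERITY.items() if number <= cutoff]


def needs_attention(category: str) -> bool:
    """Return True for categories the guide recommends monitoring."""
    return category in RECOMMENDED
```

For instance, `levels_sent("LOG_ERR")` returns the Emergency, Alert, Critical, and Error categories, matching the Logging Level example in the settings table.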

Panzura MIB Objects

The Panzura node MIB provides access to the following types of system information and statistics:

Node ID, CloudFS version, and hostname

Visibility into caching:
- Hot, warm, and cold for automated caching
- Hot, warm, cold for pinned files
- Cache hits/misses for automated caching
- Cache hits/misses for pinned
Cloud statistics
- Number of drive files uploaded to cloud
- Number of drive files downloaded
- Number of upload failures
- Number of download failures
SMB users
- Total number of SMB users currently connected to the node
- Total number of files locked by SMB users currently connected
CloudFS local configuration
- Mode of node: master/subordinate
- Hostname of master node
Local system snapshot information
- Latest system snapshot generated reference #
- Latest system snapshot uploaded to cloud reference #
- Date and time when the latest master snapshot was successfully generated
CloudFS remote configuration
- Remote node hostnames
- Latency from node to remote node
- Remote node up or down
- Down means there is no communication
Remote snapshot information from all other nodes
- Latest snapshot reference # synchronized from the specified remote node
- The latest snapshot reference # uploaded by the specified remote node

MIB Download*

The following table lists the objects in the Panzura node Management Information Base (MIB).

Object Name	Object ID (OID)	Type	Description
System Identification
ccSysCCID	.1.3.6.1.4.1.32853.1.4.1.1.0	Sensor	The node’s CCID
ccSysVersion	.1.3.6.1.4.1.32853.1.4.1.2.0	Sensor	PFOS version
System Usage
cpuLoad	.1.3.6.1.4.1.32853.1.3.1.1.1	Sensor	CPU usage averaged over the previous 5 minutes. Measures the number of processes waiting for CPU resources.
pzCloudnodeHighCPUUsage	.1.3.6.1.4.1.32853.1.2.1.2.1000	Trap	CPU usage averaged over the previous 5 minutes. Measures the number of processes waiting for CPU resources.
memUsed	.1.3.6.1.4.1.32853.1.3.1.2.1	Sensor	Memory usage at the time of the measurement in KB.
pzCloudnodeHighMemoryUsage	.1.3.6.1.4.1.32853.1.2.1.2.1001	Trap	Memory usage at the time of the measurement in KB.
localHDUsed	.1.3.6.1.4.1.32853.1.3.1.3.1	Sensor	Disk usage at the time of the measurement in KB.
pzCloudnodeHighDiskUsage	.1.3.6.1.4.1.32853.1.2.1.2.1002	Trap	Disk usage at the time of the measurement in KB.
cloudStatsUsed	.1.3.6.1.4.1.32853.1.3.1.4.1	Sensor	Cloud usage at the time of the measurement in KB.
pzCloudnodeHighCloudUsage	.1.3.6.1.4.1.32853.1.2.1.2.1003	Trap	Cloud usage at the time of the measurement in KB.
High Availability
pzTrapActiveDown	.1.3.6.1.4.1.32853.1.2.1.2.1006	Trap	HA‐Local active node failure notification.
Cache
ccStatCaHotAutoCache	.1.3.6.1.4.1.32853.1.4.2.1.1.0	Sensor	The total number of bytes in data cache storage with an Auto Cache Smart Cache rule that were accessed during the last week.
ccStatCaHotAutoPinned	.1.3.6.1.4.1.32853.1.4.2.1.2.0	Sensor	The total number of bytes in data cache storage with a Pinned Smart Cache rule that have been accessed in the last week.
ccStatCaWarmAutoCache	.1.3.6.1.4.1.32853.1.4.2.1.3.0	Sensor	The total number of bytes in data cache storage with an Auto Cache Smart Cache rule that were accessed more than a week ago but less than one month ago.
ccStatCaWarmAutoPinned	.1.3.6.1.4.1.32853.1.4.2.1.4.0	Sensor	The total number of bytes in data cache storage with a Pinned Smart Cache rule that were accessed more than a week ago but less than one month ago.
ccStatCaColdAutoCache	.1.3.6.1.4.1.32853.1.4.2.1.5.0	Sensor	The total number of bytes in data cache storage with an Auto Cache Smart Cache rule that were accessed more than one month ago.
ccStatCaColdAutoPinned	.1.3.6.1.4.1.32853.1.4.2.1.6.0	Sensor	The total number of bytes in data cache storage with a Pinned Smart Cache rule that were accessed more than one month ago.
ccStatCaCacheHits	.1.3.6.1.4.1.32853.1.4.2.1.7.0	Sensor	The total number of cache hit bytes in data cache storage with an Auto_Cache Data Locality rule.
ccStatCaPinnedHits	.1.3.6.1.4.1.32853.1.4.2.1.8.0	Sensor	The total number of cache hit bytes in data cache storage with a Pinned Data Locality rule.
ccStatCaCacheMissed	.1.3.6.1.4.1.32853.1.4.2.1.9.0	Sensor	The total number of cache missed bytes in data cache storage with an Auto_Cache Data Locality rule.
ccStatCaPinnedMissed	.1.3.6.1.4.1.32853.1.4.2.1.10.0	Sensor	The total number of cache missed bytes in data cache storage with a Pinned Data Locality rule.
ccStatCaEvited	.1.3.6.1.4.1.32853.1.4.2.1.11.0	Sensor	The total number of evicted bytes in data cache storage.
Drive File Operations
ccStatClUploads	.1.3.6.1.4.1.32853.1.4.2.2.1.0	Sensor	The total number of drive files uploaded to cloud storage.
ccStatClUploadFails	.1.3.6.1.4.1.32853.1.4.2.2.2.0	Sensor	The total number of failed attempts to upload a drive file to cloud storage.
ccStatClDownloads	.1.3.6.1.4.1.32853.1.4.2.2.3.0	Sensor	The total number of drive files downloaded from cloud storage.
ccStatClDownloadFails	.1.3.6.1.4.1.32853.1.4.2.2.4.0	Sensor	The total number of failed attempts to download a drive file from cloud storage.
SMB Users
ccStatSmbUsers	.1.3.6.1.4.1.32853.1.4.2.3.1.0	Sensor	The total number of SMB users currently connected to the node.
ccStatSmbLockedFiles	.1.3.6.1.4.1.32853.1.4.2.3.2.0	Sensor	The total number of files locked by SMB users currently connected to the node.
Snapshots
ccInfoLoSnLastGenSnapNum	.1.3.6.1.4.1.32853.1.4.3.1.1.0	Sensor	The reference number of the latest snapshot generated by the node.
ccInfoLoSnLastUploadSnapNum	.1.3.6.1.4.1.32853.1.4.3.1.2.0	Sensor	The reference number of the latest snapshot uploaded to cloud storage.
ccInfoLoSnLastMasterSnap	.1.3.6.1.4.1.32853.1.4.3.1.3.0	Sensor	The date when the latest master snapshot was generated successfully.
Status of snapshot synchronization from remote nodes
ccInfoReSnIdx	.1.3.6.1.4.1.32853.1.4.3.2.1.1.1	Sensor	Index number.
ccInfoReSnHostname	.1.3.6.1.4.1.32853.1.4.3.2.1.1.2	Sensor	The remote node’s hostname.
ccInfoReSnLastSyncSnapNum	.1.3.6.1.4.1.32853.1.4.3.2.1.1.3	Sensor	The reference number of the latest snapshot synchronized from the specified remote node.
ccInfoReSnLastUploadSnapNum	.1.3.6.1.4.1.32853.1.4.3.2.1.1.4	Sensor	The reference number of the latest snapshot uploaded by the specified remote node.
CloudFS
ccInfoCfsCfgIdx	.1.3.6.1.4.1.32853.1.4.3.3.1.1.1	Sensor	Index number.
ccInfoCfsCfgHostname	.1.3.6.1.4.1.32853.1.4.3.3.1.1.2	Sensor	The hostname of the node.
ccInfoCfsCfgFilesystemName	.1.3.6.1.4.1.32853.1.4.3.3.1.1.3	Sensor	The filesystem name that the node is hosting.
ccInfoCfsCfgState	.1.3.6.1.4.1.32853.1.4.3.3.1.1.4	Sensor	The operational state of the node.
ccInfoCfsCfgStatus	.1.3.6.1.4.1.32853.1.4.3.3.1.1.5	Sensor	The status of the node.
Network latency from a node to other nodes in the CloudFS
ccInfocfsLaIdx	.1.3.6.1.4.1.32853.1.4.3.3.2.1.1	Sensor	Index number.
ccInfocfsLaHostname	.1.3.6.1.4.1.32853.1.4.3.3.2.1.2	Sensor	The hostname of the node included in the CloudFS.
ccInfocfsLaHelloLatency	.1.3.6.1.4.1.32853.1.4.3.3.2.1.3	Sensor	The network latency in milliseconds from the remote node.
ccInfoCfsCfgLoMode	.1.3.6.1.4.1.32853.1.4.3.3.3.1.0	Sensor	The configuration mode of the node, i.e. master or subordinate.
ccInfoCfsCfgLoCfgMaster	.1.3.6.1.4.1.32853.1.4.3.3.3.2.0	Sensor	The name of the master configuration node.

Monitoring Recommendations for CPU Load

The amount of CPU load placed on a node is measured in terms of the number of processes that are ready to run and in the queue awaiting CPU resources. These are the recommended thresholds by model. (CPU load is measured by OID .1.3.6.1.4.1.32853.1.3.1.1.1.)

Node Model	Number of Processes in the Queue and Ready To Run	Recommendation
28xx models	Above 320	Monitor closely
	Above 400	Look into it
	Above 800	Take action
40xx models	Above 640	Monitor closely
	Above 800	Look into it
	Above 1600	Take action
5100 model	Above 480	Monitor closely
	Above 600	Look into it
	Above 1200	Take action
5300 and 5500 models	Above 960	Monitor closely
	Above 1200	Look into it
	Above 2400	Take action
6xxx models	Above 1200	Look into it
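The table above can be turned into a simple lookup for a monitoring script. The following is a sketch only: the model keys and function name are hypothetical, and "Above" is interpreted as strictly greater than the listed value.

```python
from typing import Optional

# Thresholds from the table, highest first, so the most urgent match wins.
CPU_LOAD_THRESHOLDS = {
    "28xx": [(800, "Take action"), (400, "Look into it"), (320, "Monitor closely")],
    "40xx": [(1600, "Take action"), (800, "Look into it"), (640, "Monitor closely")],
    "5100": [(1200, "Take action"), (600, "Look into it"), (480, "Monitor closely")],
    "5300/5500": [(2400, "Take action"), (1200, "Look into it"), (960, "Monitor closely")],
    "6xxx": [(1200, "Look into it")],
}


def cpu_load_recommendation(model: str, cpu_load: float) -> Optional[str]:
    """Return the recommendation for a cpuLoad reading, or None if below all thresholds."""
    for limit, recommendation in CPU_LOAD_THRESHOLDS[model]:
        if cpu_load > limit:
            return recommendation
    return None
```

The `cpu_load` argument is the value read from OID .1.3.6.1.4.1.32853.1.3.1.1.1. A reading of 450 on a 28xx model, for example, maps to "Look into it".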

VM nodes

For nodes operating in virtualized environments, such as VMware ESXi or Amazon Web Services (AWS), first determine the number of CPU cores assigned to the node. When 4 cores are assigned, use the values given for the 28xx models.

For a larger number of cores, scale the values upward. For example, a node operating within AWS can have 8 cores.

To scale the values, first divide the number of cores by 4 to get 2. Next, multiply this by the values for the 28xx models. The CPU load recommendations become (2*320=640), (2*400=800), and (2*800=1600).
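The scaling arithmetic above can be sketched as a small helper. The function name is hypothetical, and because this guide only describes 4 cores and above, the sketch rejects smaller core counts rather than guessing.

```python
from typing import Dict

BASELINE_CORES = 4
# Values for the 28xx models, used as the 4-core baseline.
BASELINE_THRESHOLDS = {
    "Monitor closely": 320,
    "Look into it": 400,
    "Take action": 800,
}


def scaled_cpu_thresholds(cores: int) -> Dict[str, float]:
    """Scale the 28xx CPU load thresholds to a VM with `cores` CPU cores."""
    if cores < BASELINE_CORES:
        raise ValueError("this guide only covers nodes with 4 or more cores")
    factor = cores / BASELINE_CORES
    return {name: value * factor for name, value in BASELINE_THRESHOLDS.items()}
```

With 8 cores the factor is 2, so `scaled_cpu_thresholds(8)` gives 640, 800, and 1600, the same values as the worked example.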

*Pertains to those running Panzura's version 8 nodes