ua_uerf
NAME
ua_uerf - uerf output filter
SYNOPSIS
uerf [-options] | ua_uerf [-options]
DESCRIPTION
The Digital UNIX error reporting utility uerf and the newer dia (DECevent)
produce detailed reports that are impractical to scan for error trend
analysis. The ua_uerf utility is a uerf filter which can summarize errors:
by type;
by day;
by grand total;
by exclusion;
by limited volumes;
by single line summary
in a form suitable for ad hoc reporting, daily email summaries, or weekly
or monthly management summaries.
The ua_uerf utility is not a replacement for dia (or uerf), but it does
provide a quick scanning ability to determine when you need to drill deeper
into hardware errors.
ARGUMENTS
input-file
Input file specification. Defaults to standard input, since
uerf output is typically piped directly into ua_uerf.
OPTIONS
Record Type Options
-all Show all record types (default)
-none Show no record types, implies -total
-boot Show boot|oper records
-hardware Show hardware records
-scsi Show scsi cam records
-unix Show software records
-misc Show any other record types
Filtering Options
nosummary Do not display summary information totals.
-total Show summary totals by day.
-ignore string1[,string2...]
Ignore record types with matching strings.
-keep string1[,string2...]
Keep record types with matching strings.
Used to retain records which matched -ignore.
-limit N
Limit the number of replicated records displayed for a day.
The default is 5; use zero to see only totals.
Other Options
-output output-file
To specify an output file.
-verbose To generate some debugging displays.
-? To display a terse help message.
EXAMPLES
A 132 column display is strongly recommended for ua_uerf.
The following aliases are used in the examples:
sxkac@glacier: alias | grep UERF
UA7UERF='uerf -c err,oper -o full -t s:`ua_date -uerf -7` | /usr/local/sbin/ua_uerf -a'
UAXUERF='uerf -c err,oper -o full -t s:`ua_date -uerf -30` | /usr/local/sbin/ua_uerf -a'
UA_UERF='uerf -c err,oper -o full -t s:`ua_date -uerf -1` | /usr/local/sbin/ua_uerf -a'
ua_date is a date formatting routine with several pre-defined UNIX date
formats and the ability to specify "delta" days, such as "7 days ago" with
UA7UERF above. You can, of course, key in a date in uerf format:
sxkac@nugget: ua_date -uerf -7
03-sep-1997,00:00:00
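Where ua_date is not installed, a timestamp of the same shape can be
produced by other means. The sketch below is a stand-in, not ua_date
itself: the function name uerf_date is hypothetical, and it assumes GNU
date (for the -d option), which is not the date command shipped with
Digital UNIX.

```shell
# Hypothetical stand-in for "ua_date -uerf -N"; requires GNU date.
# Prints midnight N days ago as dd-mon-yyyy,00:00:00 (the shape shown above).
uerf_date() {
    days_ago=${1:-0}
    # LC_ALL=C keeps the month abbreviation in English; tr lowercases it.
    LC_ALL=C date -d "$days_ago days ago" '+%d-%b-%Y,00:00:00' | tr 'A-Z' 'a-z'
}
```

With this in place, uerf_date 7 plays the role of ua_date -uerf -7 in the
aliases above.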
A ua_uerf example for "yesterday" showing only 2 entries for any given type
and ignoring cdisk_rec_status errors:
sxkac@glacier: UA_UERF -l2 -i cdisk_rec_status
#glacier Tue Sep 9 1997
>04:26:41 46 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
>09:10:22 47 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
*09:10:22 48 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
>12:43:07 50 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
>18:44:57 52 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
*23:06:52 54 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
>23:11:44 60 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit Attention
*23:12:24 62 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit Attention
Summary:
Total 1 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
Total 4 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
Total 1 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
Total 16 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
Total 17 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit Attention
Normally, the UA_UERF alias is used "as is" to summarize recent errors when
investigating a problem; the options in the example above just keep down
the size of this man page. Note that you must use dia to determine that the
BBR errors above were soft (correctable) errors.
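The summary lines lend themselves to further roll-up with standard tools.
The sketch below is not part of ua_uerf; the function name
totals_by_routine is hypothetical, and it assumes the "Total" line layout
shown in the example above (count in the second field, reporting routine
after "R="). It adds up the counts per routine across all devices.

```shell
# Sum ua_uerf "Total" lines by reporting routine (the name after "R=").
# Reads saved ua_uerf output from the named files, or from standard input.
totals_by_routine() {
    awk '
        $1 == "Total" && match($0, /R=[A-Za-z0-9_]+/) {
            # Strip the leading "R=" from the matched text.
            routine = substr($0, RSTART + 2, RLENGTH - 2)
            count[routine] += $2
        }
        END {
            for (r in count) printf "%6d %s\n", count[r], r
        }
    ' "$@" | sort -k1,1nr -k2,2
}
```

For example, UA_UERF -l0 | totals_by_routine lists the routines with the
most events first.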
To find specifically when all the BBR errors occurred:
sxkac@glacier: UA_UERF nosum -i cdisk,ctape -k cdisk_bbr -l20
#glacier Tue Sep 9 1997
>09:10:22 47 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
>09:10:22 48 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
>10:41:37 49 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
>12:43:07 50 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
>23:16:13 81 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
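A listing like the one above can be condensed to one line per bad block.
The following sketch is not part of ua_uerf; the function name bbr_blocks
is hypothetical, and it assumes the detail line layout shown in the
examples (a "Bus:" field, an "lu:" field, and the text "bad block number:"
followed by the block).

```shell
# Count cdisk_bbr_done events per (bus, lu, bad block) in ua_uerf detail lines.
# Reads saved ua_uerf output from the named files, or from standard input.
bbr_blocks() {
    awk '
        $1 != "Total" && /R=cdisk_bbr_done/ {
            if (!match($0, /bad block number: [0-9]+/)) next
            block = substr($0, RSTART + 18, RLENGTH - 18)  # skip the 18-character label
            if (!match($0, /Bus:[0-9]+/)) next
            bus = substr($0, RSTART, RLENGTH)
            if (!match($0, /lu: *[0-9]+\.[0-9]+/)) next
            lu = substr($0, RSTART, RLENGTH)
            gsub(/ /, "", lu)                              # "lu: 9.1" becomes "lu:9.1"
            seen[bus " " lu " block " block]++
        }
        END { for (k in seen) printf "%d %s\n", seen[k], k }
    ' "$@" | sort -k1,1nr
}
```

A block that shows up repeatedly on the same unit, as block 3341150 does
above, is a natural candidate for a closer look with dia.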
A ua_uerf weekly summary:
sxkac@spike: UA7UERF -l0
#spike Sun Sep 7 1997
#spike Sun Sep 7 1997 07:53:02 19 301 SYSTEM SHUTDOWN |halted by sxkac: Apply HSZ v3.1-1,-2 patches
#spike Sun Sep 7 1997 08:20:11 0 300 SYSTEM STARTUP
Summary:
Total 1 301 SYSTEM SHUTDOWN
Total 1 300 SYSTEM STARTUP
Total 1 199 Bus:03 lu:26.1 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:03 lu:26.1 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
Total 2 199 Bus:03 lu:26.1 R=cdisk_rec_tur_done:::Event - Unit Attention
Total 1 199 Bus:03 lu:26.2 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:03 lu:26.2 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
Total 2 199 Bus:03 lu:26.2 R=cdisk_rec_tur_done:::Event - Unit Attention
Total 3 199 Bus:03 lu:25.0 R=cdisk_op_spin:::Event - Unit Attention
Total 3 199 Bus:02 lu:18.0 R=cdisk_check_sense:::Event - Unit Attention
Total 3 199 Bus:02 lu:17.0 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:02 lu:20.3 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:02 lu:20.3 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
Total 1 199 Bus:02 lu:20.3 R=cdisk_rec_tur_done:::Event - Unit Attention
A ua_uerf monthly summary for July:
sxkac@glacier: uerf -f /var/adm/binary.errlog.970501 -o full \
> -t s:`ua_date -u 7/1` e:`ua_date -u 7/31` |
> ua_uerf -l0
#glacier Sat Jul 5 1997
#glacier Sat Jul 5 1997 10:28:16 0 110 MACHINE STATE |CONFIGURATION
#glacier Sat Jul 5 1997 10:28:16 1 300 SYSTEM STARTUP
#glacier Sat Jul 5 1997 13:07:13 2 301 SYSTEM SHUTDOWN |halted by root: continue with controller upgrades
#glacier Sat Jul 5 1997 14:03:48 0 110 MACHINE STATE |CONFIGURATION
#glacier Sat Jul 5 1997 14:03:49 1 300 SYSTEM STARTUP
#glacier Sun Jul 6 1997
#glacier Sun Jul 6 1997 09:57:24 3 301 SYSTEM SHUTDOWN |halted by root: install kzpsa*8 and ba660
#glacier Sun Jul 6 1997 12:57:36 0 110 MACHINE STATE |CONFIGURATION
#glacier Sun Jul 6 1997 12:57:36 1 300 SYSTEM STARTUP
#glacier Sun Jul 13 1997
#glacier Sun Jul 13 1997 09:49:24 66 301 SYSTEM SHUTDOWN |halted by sxkac: Move ba660 to piu1
#glacier Sun Jul 13 1997 10:38:50 0 110 MACHINE STATE |CONFIGURATION
#glacier Sun Jul 13 1997 10:38:51 1 300 SYSTEM STARTUP
#glacier Sun Jul 20 1997
#glacier Sun Jul 20 1997 10:11:59 29 301 SYSTEM SHUTDOWN |halted by sxkac: Apply DU v3.2g-002 patches
#glacier Sun Jul 20 1997 11:09:14 0 110 MACHINE STATE |CONFIGURATION
#glacier Sun Jul 20 1997 11:09:14 1 300 SYSTEM STARTUP
#glacier Sun Jul 20 1997 14:26:45 2 301 SYSTEM SHUTDOWN |rebooted by root: Adjust ubc-maxpercent &
#glacier Sun Jul 20 1997 14:31:29 0 110 MACHINE STATE |CONFIGURATION
#glacier Sun Jul 20 1997 14:31:29 1 300 SYSTEM STARTUP
#glacier Wed Jul 30 1997
#glacier Wed Jul 30 1997 10:55:20 247 301 SYSTEM SHUTDOWN |halted by sxdjd: potential power outtage
#glacier Wed Jul 30 1997 12:10:13 0 110 MACHINE STATE |CONFIGURATION
#glacier Wed Jul 30 1997 12:10:13 1 300 SYSTEM STARTUP
Summary:
Total 18 199 Bus:05 lu:41.0 R=ctape_iodone:Soft Error Detected (rec:DEC TZ877:+
Total 1 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 604763
Total 5 199 Bus:05 lu:41.1 R=changer_check_status:::Recovered error
Total 1 199 Bus:07 lu:57.1 R=cdisk_complete:::Retries Exhausted
Total 7 300 SYSTEM STARTUP
Total 6 301 SYSTEM SHUTDOWN
Total 15 199 Bus:10 lu:81.1 R=changer_check_status:::Recovered error
Total 13 199 Bus:07 lu:57.1 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:07 lu:60.2 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:07 lu:60.0 R=cdisk_check_sense:::Event - Unit Attention
Total 8 199 Bus:07 lu:58.0 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:07 lu:59.0 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:07 lu:60.3 R=cdisk_check_sense:::Event - Unit Attention
Total 4 199 Bus:01 lu:10.0 R=cdisk_check_sense:::Event - Unit Attention
Total 4 199 Bus:01 lu: 9.1 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:03 lu:27.0 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:03 lu:28.3 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:03 lu:28.0 R=cdisk_check_sense:::Event - Unit Attention
Total 12 199 Bus:03 lu:25.0 R=cdisk_check_sense:::Event - Unit Attention
Total 12 199 Bus:03 lu:26.1 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:03 lu:28.2 R=cdisk_check_sense:::Event - Unit Attention
Total 2 199 Bus:01 lu: 9.0 R=cdisk_check_sense:::Event - Unit Attention
Total 1 199 Bus:10 lu:81.0 R=ctape_wfm:Soft Error Detected (rec:DEC TZ877:+
Total 2 199 Bus:10 lu:81.1 R=changer_send_ccb:::CCB aborted (timeout), recovering
Total 2 199 Bus:02 lu:16.1 R=changer_send_ccb:::CCB aborted (timeout), recovering
Total 1 199 Bus:02 lu:16.1 R=changer_online:::Device Not Ready
Total 1 199 Bus:03 lu:28.3 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 369753
Total 2 100 CPU EXCEPTION
Total 1 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 608417
Total 101 199 Bus:08 lu:67.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
Total 97 199 Bus:08 lu:67.0 R=cdisk_rec_tur_done:::Event - Unit Attention
Total 19 199 Bus:08 lu:65.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
Total 15 199 Bus:08 lu:65.0 R=cdisk_rec_tur_done:::Event - Unit Attention
Total 3 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
Total 3 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
Total 3 199 Bus:08 lu:66.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
Total 3 199 Bus:08 lu:68.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
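In a long summary such as the one above, the high-volume entries are often
the ones of interest. The sketch below is not part of ua_uerf; the function
name noisy_totals and its default threshold of 10 are assumptions made for
illustration. It keeps only the "Total" lines whose count meets a threshold.

```shell
# Print only the ua_uerf "Total" lines whose count is at least the threshold.
# Usage: ua_uerf ... | noisy_totals [threshold]   (threshold defaults to 10)
noisy_totals() {
    awk -v min="${1:-10}" '$1 == "Total" && $2 + 0 >= min + 0'
}
```

Applied to the July summary with a threshold of 50, this leaves the two
Bus:08 lu:67.0 lines (101 and 97 events).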
The monthly summary above filters down 43,731 lines of uerf output. To
find when the BBR and CPU errors occurred:
sxkac@glacier: uerf -f /var/adm/binary.errlog.970501 -o full \
> -t s:`ua_date -u 7/1` e:`ua_date -u 7/31` > uerf.july
sxkac@glacier: wc -l uerf.july
43731 uerf.july
sxkac@glacier: ua_uerf uerf.july -hardware nosum \
> -scsi -k cdisk_bbr -i cdisk,changer,ctape
#glacier Tue Jul 1 1997
>17:12:48 317 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 604763
#glacier Sat Jul 12 1997
>10:30:29 65 199 Bus:03 lu:28.3 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 369753
#glacier Thu Jul 17 1997
>08:07:35 2 100 CPU EXCEPTION
#glacier Fri Jul 18 1997
>08:11:29 3 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 608417
#glacier Thu Jul 24 1997
>22:54:10 60 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
>22:54:10 61 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
#glacier Sat Jul 26 1997
>01:12:15 78 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
>01:12:15 79 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
>01:16:50 88 199 Bus:08 :R=spo_bus_reset_rspn:::Bus reset request from adapter detected (reason = 0x9)
>01:16:50 89 199 Bus:08 :R=spo_process_ccb:::A SCSI bus reset has been done
#glacier Tue Jul 29 1997
>08:58:55 224 100 CPU EXCEPTION
The example above selects all scsi records, but uses -ignore to filter out
the noise (changer, ctape, and most cdisk errors).
RESTRICTIONS / NOTES
ua_uerf has been tested under DU v3.2g and v4.0b.
As stated above, a 132 column display is recommended.
Suggestions for enhancements or bug reports can be directed to
dutools@ts.sois.alaska.edu.
The ua_uerf utility uses the cci command parser utilized by non-UNIX
operating systems instead of the traditional UNIX getopt() parsing.
Options have generally been defined to "look like" UNIX style options, but
can be spelled out or abbreviated in many cases. For example, '-l' is the
same as '-limit'. In some cases, like 'nosummary', options must be
partially or fully spelled out. The required option length can be found
with ua_uerf -?. Because of this, multiple options must be space separated
and the hyphen is part of the option name.
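The idea of abbreviated option names can be illustrated with a small
sketch. This is not the cci parser: the function name expand_option is
hypothetical, and the rule used here (accept any prefix that matches
exactly one option) is an assumption. The minimum lengths ua_uerf actually
requires are those reported by ua_uerf -?.

```shell
# Illustration only: expand an abbreviation to the single option it prefixes.
# Prints the full option name, or returns non-zero if there is no unique match.
expand_option() {
    abbrev=$1
    matches=""
    for name in -all -none -boot -hardware -scsi -unix -misc nosummary \
                -total -ignore -keep -limit -output -verbose; do
        case $name in
            "$abbrev"*) matches="$matches $name" ;;
        esac
    done
    # Split the collected names into positional parameters to count them.
    set -- $matches
    if [ $# -eq 1 ]; then
        printf '%s\n' "$1"
    else
        return 1
    fi
}
```

Here expand_option -l prints -limit and expand_option nosum prints
nosummary. Because the hyphen is part of the name, -n can only mean -none
and never nosummary.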
From the monthly example above, one must use dia to determine that the two
"CPU EXCEPTION" errors were single-bit correctable memory errors.
Generally, that determination should occur on a daily basis. Sample scripts
for emailing summaries and some sed filters for parsing dia BBR and CPU
errors are included with the ua_uerf distribution. If one uses meaningful
text for the shutdown reason, the binary.errlog files can be used for
longer term tracking of events. Also included with the distribution is a
script to "roll" and preserve binary.errlog every four months.
ACKNOWLEDGEMENTS
The ua_uerf utility was written at the University of Alaska.
RELATED INFORMATION
Commands: uerf(8), dia(8), ua_date(8).