ua_uerf





NAME


  ua_uerf - uerf output	filter


SYNOPSIS


  uerf [-options] | ua_uerf [-options]


DESCRIPTION


  The Digital UNIX error reporting utility uerf and the newer dia (DECevent)
  produce reports that are too detailed to scan for error trend analysis.
  The ua_uerf utility is a uerf filter which can summarize errors:

       by type;
       by day;
       by grand	total;
       by exclusion;
       by limited volumes;
       by single line summary

  in a form suitable for ad hoc reporting, for daily email summaries, or for
  weekly or monthly management summaries.

  The ua_uerf utility is not a replacement for dia (or uerf), but it does
  provide a quick scanning ability to determine	when you need to drill deeper
  into hardware	errors.


ARGUMENTS


  input-file

       Input file specification; defaults to standard input, since
       typically uerf output is piped directly into ua_uerf.


OPTIONS


  Record Type Options

  -all	     Show all record types (default)
  -none	     Show no record types, implies -total
  -boot	     Show boot|oper records
  -hardware  Show hardware records
  -scsi	     Show scsi cam records
  -unix	     Show software records
  -misc	     Show any other record types

  Filtering Options

  nosummary  Do	not display summary information	totals.

  -total     Show summary totals by day.

  -ignore    string1[,string2...]

       Ignore record types with	matching strings.

  -keep	     string1[,string2...]

       Keep record types with matching strings.
       Used to retain records which matched -ignore.

  -limit     N

       Limit the number	of replicated records displayed	for a day.
       The default is 5; use zero to see only totals.

  Other	Options

  -output    output-file

       To specify an output file.

  -verbose   To	generate some debugging	displays.

  -?	     To	display	a terse	help message.


EXAMPLES


  A 132	column display is strongly recommended for ua_uerf.
  The following	aliases	are used in the	examples:

       sxkac@glacier: alias | grep UERF
       UA7UERF='uerf -c	err,oper -o full -t s:`ua_date -uerf -7` | /usr/local/sbin/ua_uerf -a'
       UAXUERF='uerf -c	err,oper -o full -t s:`ua_date -uerf -30` | /usr/local/sbin/ua_uerf -a'
       UA_UERF='uerf -c	err,oper -o full -t s:`ua_date -uerf -1` | /usr/local/sbin/ua_uerf -a'

  ua_date is a date formatting routine with several pre-defined UNIX date
  formats and the ability to specify "delta" days, such as "7 days ago" with
  UA7UERF above.  You can, of course, key in a date in uerf format:

       sxkac@nugget: ua_date -uerf -7
       03-sep-1997,00:00:00
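
  Where ua_date is not available, a timestamp in the same form can be
  produced with a short shell function.  The sketch below is not the ua_date
  implementation: it assumes GNU date (for the -d option), and the function
  name uerf_date and its optional base-date argument are hypothetical.

```shell
#!/bin/sh
# uerf_date DAYS [BASE]: print midnight DAYS days before BASE (default: today)
# in the dd-mon-yyyy,hh:mm:ss form shown above.
# Assumption: GNU date is installed; uerf_date is a hypothetical helper,
# not part of the ua_uerf distribution.
uerf_date() {
    days=${1:-1}
    base=${2:-today}
    # Do the day arithmetic from noon to stay clear of DST edge cases,
    # force English month names, then lowercase the month abbreviation.
    LC_ALL=C date -d "$base 12:00 $days days ago" '+%d-%b-%Y,00:00:00' |
        tr '[:upper:]' '[:lower:]'
}

# Seven days before 10 September 1997, matching the ua_date example above.
uerf_date 7 1997-09-10
```

  The result can be substituted into the uerf -t s: option in the same way
  as the ua_date call in the aliases.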

  A ua_uerf example for	"yesterday" showing only 2 entries for any given type
  and ignoring cdisk_rec_status	errors:

       sxkac@glacier: UA_UERF -l2 -i cdisk_rec_status
       #glacier	 Tue Sep  9 1997
       >04:26:41     46	 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
       >09:10:22     47	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       *09:10:22     48	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >12:43:07     50	 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
       >18:44:57     52	 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit	Attention
       *23:06:52     54	 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit	Attention
       >23:11:44     60	 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit	Attention
       *23:12:24     62	 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit	Attention

       Summary:
	    Total     1	 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
	    Total     4	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
	    Total     1	 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
	    Total    16	 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit	Attention
	    Total    17	 199 Bus:09 lu:75.0 R=cdisk_rec_tur_done:::Event - Unit	Attention

  Normally, the UA_UERF alias is used "as is" to summarize recent errors when
  investigating a problem; the options in the example above just help keep
  down the size of this man page.  Note that you must use dia to determine
  that the BBR errors above were soft (correctable) errors.
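
  The effect of -limit in the example above can be pictured with a small awk
  filter.  This is a sketch only, not the ua_uerf source: the function name
  limit_records is hypothetical, the input is simplified to "time sequence
  text" lines, and the reading of the markers (">" for a displayed record,
  "*" for the last one displayed before the limit cuts in) is inferred from
  the sample output rather than documented.

```shell
#!/bin/sh
# limit_records LIMIT: read "time seq text..." records on stdin, show at most
# LIMIT copies of each distinct text, then print a total for each text.
# Hypothetical helper; marker meanings are inferred from the sample output.
limit_records() {
    awk -v limit="$1" '
    {
        key = $0
        # Drop the time and sequence number so identical events compare equal.
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", key)
        n = ++count[key]
        if (n == 1) order[++nkeys] = key      # remember first-seen order
        if (n < limit)       print ">" $0     # displayed record
        else if (n == limit) print "*" $0     # last record shown for this key
    }
    END {
        print "Summary:"
        for (i = 1; i <= nkeys; i++)
            printf "Total %5d  %s\n", count[order[i]], order[i]
    }'
}

limit_records 2 <<'EOF'
09:10:22 47 199 Bus:07 lu:57.1 R=cdisk_bbr_done
09:10:22 48 199 Bus:07 lu:57.1 R=cdisk_bbr_done
10:41:37 49 199 Bus:07 lu:57.1 R=cdisk_bbr_done
12:43:07 50 199 Bus:03 lu:28.0 R=cdisk_bbr_done
EOF
```

  With a limit of zero the filter prints nothing but the totals, which
  matches the documented behavior of -limit 0.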

  To find specifically when all the BBR errors occurred:

       sxkac@glacier: UA_UERF nosum -i cdisk,ctape -k cdisk_bbr	-l20
       #glacier	 Tue Sep  9 1997
       >09:10:22     47	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >09:10:22     48	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >10:41:37     49	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
       >12:43:07     50	 199 Bus:03 lu:28.0 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 8082899
       >23:16:13     81	 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 3341150
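
  The way -ignore and -keep combine in the example above (drop anything
  matching an ignore string unless it also matches a keep string) can be
  sketched in awk.  The function name filter_records is hypothetical, and
  plain substring matching is an assumption; ua_uerf itself may match
  differently.

```shell
#!/bin/sh
# filter_records IGNORE KEEP: IGNORE and KEEP are comma-separated strings.
# A line is dropped if it contains an IGNORE string, unless it also contains
# a KEEP string.  Hypothetical helper; substring matching is assumed.
filter_records() {
    awk -v ignore="$1" -v keep="$2" '
    # Return 1 if line contains any non-empty element of the comma list.
    function matches(line, list,    parts, n, i) {
        n = split(list, parts, ",")
        for (i = 1; i <= n; i++)
            if (parts[i] != "" && index(line, parts[i]) > 0)
                return 1
        return 0
    }
    !matches($0, ignore) || matches($0, keep)
    '
}

filter_records cdisk,ctape cdisk_bbr <<'EOF'
04:26:41 46 199 Bus:12 lu:98.0 R=ctape_move_tape:::Unexpected CCB status
09:10:22 47 199 Bus:07 lu:57.1 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled
18:44:57 52 199 Bus:09 lu:73.0 R=cdisk_rec_tur_done:::Event - Unit Attention
08:07:35  2 100 CPU EXCEPTION
EOF
```

  Only the cdisk_bbr_done and CPU EXCEPTION lines survive: the first because
  it matches a keep string, the second because it matches no ignore string.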

  A ua_uerf weekly summary:

       sxkac@spike: UA7UERF -l0
       #spike	 Sun Sep  7 1997
       #spike	 Sun Sep  7 1997 07:53:02     19  301 SYSTEM SHUTDOWN	|halted	by sxkac:  Apply HSZ v3.1-1,-2 patches
       #spike	 Sun Sep  7 1997 08:20:11      0  300 SYSTEM STARTUP

       Summary:
	    Total     1	 301 SYSTEM SHUTDOWN
	    Total     1	 300 SYSTEM STARTUP
	    Total     1	 199 Bus:03 lu:26.1 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:03 lu:26.1 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     2	 199 Bus:03 lu:26.1 R=cdisk_rec_tur_done:::Event - Unit	Attention
	    Total     1	 199 Bus:03 lu:26.2 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:03 lu:26.2 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     2	 199 Bus:03 lu:26.2 R=cdisk_rec_tur_done:::Event - Unit	Attention
	    Total     3	 199 Bus:03 lu:25.0 R=cdisk_op_spin:::Event - Unit Attention
	    Total     3	 199 Bus:02 lu:18.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     3	 199 Bus:02 lu:17.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:02 lu:20.3 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:02 lu:20.3 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     1	 199 Bus:02 lu:20.3 R=cdisk_rec_tur_done:::Event - Unit	Attention

  A ua_uerf monthly summary for	July:

       sxkac@glacier: uerf -f /var/adm/binary.errlog.970501 -o full \
       > -t s:`ua_date -u 7/1` e:`ua_date -u 7/31` |
       > ua_uerf -l0
       #glacier	 Sat Jul  5 1997
       #glacier	 Sat Jul  5 1997 10:28:16      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sat Jul  5 1997 10:28:16      1  300 SYSTEM STARTUP
       #glacier	 Sat Jul  5 1997 13:07:13      2  301 SYSTEM SHUTDOWN	|halted	by root:  continue with	controller upgrades
       #glacier	 Sat Jul  5 1997 14:03:48      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sat Jul  5 1997 14:03:49      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul  6 1997
       #glacier	 Sun Jul  6 1997 09:57:24      3  301 SYSTEM SHUTDOWN	|halted	by root:  install kzpsa*8 and ba660
       #glacier	 Sun Jul  6 1997 12:57:36      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul  6 1997 12:57:36      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul 13 1997
       #glacier	 Sun Jul 13 1997 09:49:24     66  301 SYSTEM SHUTDOWN	|halted	by sxkac:  Move	ba660 to piu1
       #glacier	 Sun Jul 13 1997 10:38:50      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul 13 1997 10:38:51      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul 20 1997
       #glacier	 Sun Jul 20 1997 10:11:59     29  301 SYSTEM SHUTDOWN	|halted	by sxkac:  Apply DU v3.2g-002 patches
       #glacier	 Sun Jul 20 1997 11:09:14      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul 20 1997 11:09:14      1  300 SYSTEM STARTUP
       #glacier	 Sun Jul 20 1997 14:26:45      2  301 SYSTEM SHUTDOWN	|rebooted by root:  Adjust ubc-maxpercent &
       #glacier	 Sun Jul 20 1997 14:31:29      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Sun Jul 20 1997 14:31:29      1  300 SYSTEM STARTUP
       #glacier	 Wed Jul 30 1997
       #glacier	 Wed Jul 30 1997 10:55:20    247  301 SYSTEM SHUTDOWN	|halted	by sxdjd:  potential power outtage
       #glacier	 Wed Jul 30 1997 12:10:13      0  110 MACHINE STATE	|CONFIGURATION
       #glacier	 Wed Jul 30 1997 12:10:13      1  300 SYSTEM STARTUP

       Summary:
	    Total    18	 199 Bus:05 lu:41.0 R=ctape_iodone:Soft	Error Detected (rec:DEC	TZ877:+
	    Total     1	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 604763
	    Total     5	 199 Bus:05 lu:41.1 R=changer_check_status:::Recovered error
	    Total     1	 199 Bus:07 lu:57.1 R=cdisk_complete:::Retries Exhausted
	    Total     7	 300 SYSTEM STARTUP
	    Total     6	 301 SYSTEM SHUTDOWN
	    Total    15	 199 Bus:10 lu:81.1 R=changer_check_status:::Recovered error
	    Total    13	 199 Bus:07 lu:57.1 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:07 lu:60.2 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:07 lu:60.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     8	 199 Bus:07 lu:58.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:07 lu:59.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:07 lu:60.3 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     4	 199 Bus:01 lu:10.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     4	 199 Bus:01 lu:	9.1 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:03 lu:27.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:03 lu:28.3 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:03 lu:28.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total    12	 199 Bus:03 lu:25.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total    12	 199 Bus:03 lu:26.1 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:03 lu:28.2 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     2	 199 Bus:01 lu:	9.0 R=cdisk_check_sense:::Event	- Unit Attention
	    Total     1	 199 Bus:10 lu:81.0 R=ctape_wfm:Soft Error Detected (rec:DEC TZ877:+
	    Total     2	 199 Bus:10 lu:81.1 R=changer_send_ccb:::CCB aborted (timeout),	recovering
	    Total     2	 199 Bus:02 lu:16.1 R=changer_send_ccb:::CCB aborted (timeout),	recovering
	    Total     1	 199 Bus:02 lu:16.1 R=changer_online:::Device Not Ready
	    Total     1	 199 Bus:03 lu:28.3 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 369753
	    Total     2	 100 CPU EXCEPTION
	    Total     1	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 608417
	    Total   101	 199 Bus:08 lu:67.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total    97	 199 Bus:08 lu:67.0 R=cdisk_rec_tur_done:::Event - Unit	Attention
	    Total    19	 199 Bus:08 lu:65.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total    15	 199 Bus:08 lu:65.0 R=cdisk_rec_tur_done:::Event - Unit	Attention
	    Total     3	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus	reset request from adapter detected (reason = 0x9)
	    Total     3	 199 Bus:08 :R=spo_process_ccb:::A SCSI	bus reset has been done
	    Total     3	 199 Bus:08 lu:66.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error
	    Total     3	 199 Bus:08 lu:68.0 R=cdisk_rec_status:::Recovery progress event, this is NOT an error

  The monthly summary above filters down 43,731	lines of uerf output.  To
  find when the	BBR and	CPU errors occurred:

       sxkac@glacier: uerf -f /var/adm/binary.errlog.970501 -o full \
       > -t s:`ua_date -u 7/1` e:`ua_date -u 7/31` > uerf.july
       sxkac@glacier: wc -l uerf.july
	    43731 uerf.july
       sxkac@glacier: ua_uerf uerf.july	-hardware nosum	\
       > -scsi -k cdisk_bbr -i cdisk,changer,ctape
       #glacier	 Tue Jul  1 1997
       >17:12:48    317	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 604763
       #glacier	 Sat Jul 12 1997
       >10:30:29     65	 199 Bus:03 lu:28.3 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 369753
       #glacier	 Thu Jul 17 1997
       >08:07:35      2	 100 CPU EXCEPTION
       #glacier	 Fri Jul 18 1997
       >08:11:29      3	 199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled bad block number: 608417
       #glacier	 Thu Jul 24 1997
       >22:54:10     60	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus	reset request from adapter detected (reason = 0x9)
       >22:54:10     61	 199 Bus:08 :R=spo_process_ccb:::A SCSI	bus reset has been done
       #glacier	 Sat Jul 26 1997
       >01:12:15     78	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus	reset request from adapter detected (reason = 0x9)
       >01:12:15     79	 199 Bus:08 :R=spo_process_ccb:::A SCSI	bus reset has been done
       >01:16:50     88	 199 Bus:08 :R=spo_bus_reset_rspn:::Bus	reset request from adapter detected (reason = 0x9)
       >01:16:50     89	 199 Bus:08 :R=spo_process_ccb:::A SCSI	bus reset has been done
       #glacier	 Tue Jul 29 1997
       >08:58:55    224	 100 CPU EXCEPTION

  The example above selects hardware and scsi records, uses -ignore to filter
  out the noise (changer, ctape, and most cdisk errors), and uses -keep to
  retain the cdisk_bbr records.
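
  A long Summary block, such as the one in the monthly example, can be
  condensed further by adding up the totals for each reporting routine (the
  name after "R=").  The awk sketch below is not part of the ua_uerf
  distribution; the function name totals_by_routine is hypothetical, and it
  assumes the "Total count ... R=routine:..." layout shown in the examples.

```shell
#!/bin/sh
# totals_by_routine [FILE...]: sum the "Total" counts of a ua_uerf Summary
# block per routine name and list the busiest routines first.
# Hypothetical helper; assumes the Summary layout shown in the examples.
totals_by_routine() {
    awk '
    # Only Summary lines that carry an R=routine field are counted.
    $1 == "Total" && match($0, /R=[^:]+/) {
        routine = substr($0, RSTART + 2, RLENGTH - 2)
        sum[routine] += $2
    }
    END {
        for (r in sum)
            printf "%6d %s\n", sum[r], r
    }' "$@" | sort -rn
}

totals_by_routine <<'EOF'
Total    13  199 Bus:07 lu:57.1 R=cdisk_check_sense:::Event - Unit Attention
Total     1  199 Bus:07 lu:60.2 R=cdisk_check_sense:::Event - Unit Attention
Total     1  199 Bus:03 lu:28.2 R=cdisk_bbr_done:::cdisk_bbr: BBR disabled
Total     7  300 SYSTEM STARTUP
EOF
```

  Lines without an R= field, such as SYSTEM STARTUP, are skipped by this
  sketch.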


RESTRICTIONS / NOTES


  ua_uerf has been tested under	DU v3.2g and v4.0b.
  As stated above, a 132 column	display	is recommended.
  Suggestions for enhancements or bug reports can be directed to
  dutools@ts.sois.alaska.edu.

  The ua_uerf utility uses the cci command parser utilized by non-UNIX
  operating systems instead of the traditional UNIX getopt() parsing.
  Options have generally been defined to "look like" UNIX style options, but
  can be spelled out or abbreviated in many cases.  For example, '-l' is the
  same as '-limit'.  In some cases, like 'nosummary', options must be
  partially or fully spelled out.  The required option length can be found
  with ua_uerf -?.  Because of this, multiple options must be space separated
  and the hyphen is part of the option name.
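
  The abbreviation rule can be illustrated with a small shell function: an
  argument is accepted when it is a prefix of the full option name and at
  least a minimum number of characters long.  This is a sketch of the idea
  only, not the cci parser; the function name opt_matches and the minimum
  lengths used below are hypothetical (ua_uerf -? reports the real ones).

```shell
#!/bin/sh
# opt_matches ARG FULL MINLEN: succeed if ARG is a prefix of FULL and is at
# least MINLEN characters long.  The hyphen counts as part of the name.
# Hypothetical helper; the minimum lengths below are illustrative only.
opt_matches() {
    arg=$1 full=$2 min=$3
    [ "${#arg}" -ge "$min" ] || return 1
    case $full in
        "$arg"*) return 0 ;;
        *)       return 1 ;;
    esac
}

opt_matches -l    -limit    2 && echo "-l is accepted as -limit"
opt_matches nosum nosummary 5 && echo "nosum is accepted as nosummary"
opt_matches no    nosummary 5 || echo "no is too short for nosummary"
```

  Because matching is done on whole, space-separated words, single-letter
  options cannot be bundled together as they can with getopt().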

  In the monthly example above, one must use dia to determine that the two
  "CPU EXCEPTION" errors were single-bit correctable memory errors.
  Generally, that determination should occur on a daily basis.  Sample
  scripts for emailing summaries and some sed filters for parsing dia BBR and
  CPU errors are included with the ua_uerf distribution.  If one uses
  meaningful text for the shutdown reason, the binary.errlog files can be
  used for longer term tracking of events.  Also included with the
  distribution is a script to "roll" and preserve binary.errlog every four
  months.


ACKNOWLEDGEMENTS


  The ua_uerf utility was written at the University of Alaska.


RELATED INFORMATION


  Commands: uerf(8), dia(8), ua_date(8).