Mass Storage at the PSC
Phil Andrews, Manager, Data Intensive Systems
Pittsburgh Supercomputing Center, 4400 Fifth Ave, Pittsburgh Pa 15213, USA
Other Personnel: Janet Brown, Susan Straub, Bruce Collier, Vidya Dinamani, Rob Pennington.
This talk is online at http://www.psc.edu/~andrews
Until April '97 the Pittsburgh Supercomputing Center was one of four NSF-funded Supercomputing Centers. Unfortunately, it was not selected for continued funding in the PACI recompetition.
The main machines are all Cray products; some of them the first of their kind.
Figure 1 is the Cray C90, 16 processors
Figure 2 is the Cray T3D, 512 processors
Figure 3 is the Cray T3E, 512 processors
Figure 4 shows 3 Cray J90s with 8,8,10 processors
Archival needs: dedicated-machine jobs are common, so large files must move in and out of storage quickly. Shelf operations are unacceptable.
Figure 5 is a schematic of the file server layout
File Server/File Systems:
Golem has 10 CPUs and 11 IOSs available for archive use, with relatively little memory (64 MW, i.e. 512 MB).
Eight identical user file systems because:
Each file system has:
Figure 6 is the eight-way file system layout
Each pair of file systems gets an MSP, and each MSP can grab 2 tape drives.
Secondaries run with 2 disks per SCSI chain, 2 SCSI chains per SI3 adapter, and 2 SI3 adapters per IOS. File systems deliver 26 MB/s.
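The fan-out above can be sanity-checked with a few lines of arithmetic. This is a hedged sketch, not configuration data: the per-level figures come from the text, and the aggregate assumes the 26 MB/s figure is per file system.

```python
# Fan-out figures quoted in the text (2 x 2 x 2 per IOS).
DISKS_PER_SCSI_CHAIN = 2
SCSI_CHAINS_PER_SI3 = 2
SI3_ADAPTERS_PER_IOS = 2

def disks_per_ios():
    """Disks reachable through one IOS under the stated fan-out."""
    return DISKS_PER_SCSI_CHAIN * SCSI_CHAINS_PER_SI3 * SI3_ADAPTERS_PER_IOS

FILE_SYSTEMS = 8          # the eight identical user file systems
MB_PER_SEC_PER_FS = 26    # assumed to be the per-file-system delivered rate

def aggregate_mb_per_sec():
    """Combined delivered rate across all eight user file systems."""
    return FILE_SYSTEMS * MB_PER_SEC_PER_FS
```

Under these assumptions each IOS drives 8 disks, and the eight file systems together deliver around 208 MB/s.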
7 Cabinets, 8 IBM Magstar drives, space for 2400 tapes. Run one drive per SCSI chain.
The two STK silos in the PSC machine room at Westinghouse.
Two connected silos, 6,000 slots each
Running mixed media, both older STK tapes and IBM 3590 tapes in silos. Plus shelf operation for STK tapes.
Conversion is via the IBM C-12 frame, which bolts to the STK silo and holds 4 IBM Magstar drives. No problems so far.
An MPEG animation of the actual drive frame installation.
View after the drive frame installation.
Total automated tape capacity is now 14,400 tapes, approximately 200 TB. Double-length tapes (later this year) would bring this to ~400 TB.
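The quoted totals are easy to check. The arithmetic below assumes 1 TB = 1000 GB, a convention the text does not state; the average is across mixed media, so individual cartridge capacities will differ.

```python
TAPES = 14_400     # total automated slots (both silo complexes plus cabinets)
TOTAL_TB = 200     # approximate capacity quoted in the text

# Average capacity per cartridge across the mixed STK/IBM media mix.
avg_gb_per_tape = TOTAL_TB * 1000 / TAPES   # roughly 13.9 GB per tape

# Double-length tapes double the per-cartridge capacity, tape count unchanged.
doubled_tb = TOTAL_TB * 2                   # ~400 TB, matching the text
```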
The new (Golem-based) archive now totals approximately 34 TB, growing at about 2 TB/month. An almost identical amount of data resides in the pre-1996 archive.
Growth rate of the new archive in total data and files.
Initial user access was by FAR (home grown archiving software) and FTP to Golem file systems.
Each pair of file systems gets a Media Specific Process; DMF counts files and data on an MSP basis.
Load Balance, bytes stored by MSP
Whenever the disk occupancy on an MSP reaches a "warning threshold", DMF performs a "relief" operation: deleting premigrated files and premigrating more files to tape.
Relief from warning threshold
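The relief policy can be sketched as below. This is a toy model of the behavior described in the text, not DMF's actual implementation: the `MSP` class, the threshold values, and the file bookkeeping are all hypothetical.

```python
class MSP:
    """Toy model of one Media Specific Process's disk cache (sizes in GB)."""

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.premigrated = []   # (name, size): on disk AND on tape (dual-state)
        self.disk_only = []     # (name, size): on disk only

    def occupancy(self):
        used = sum(size for _name, size in self.premigrated + self.disk_only)
        return used / self.capacity

    def relieve(self, warning=0.90, target=0.80):
        """DMF-style relief: free dual-state files, then premigrate more."""
        if self.occupancy() < warning:
            return
        # Deleting a premigrated file is cheap: the tape copy already exists,
        # so only the disk copy is released.
        while self.premigrated and self.occupancy() > target:
            self.premigrated.pop(0)
        # Premigrate remaining files so the next relief pass has candidates.
        # Premigration copies data to tape; the disk copy stays resident,
        # so occupancy does not change in this step.
        self.premigrated.extend(self.disk_only)
        self.disk_only = []
```

With a 100 GB cache holding a 50 GB dual-state file and a 45 GB disk-only file, relief drops the dual-state copy (occupancy 0.95 to 0.45) and premigrates the remainder.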
The IBM drives perform hardware compression; we track this on an MSP basis:
Compression ratio by MSP
We attempt to keep the "hit" ratio on file requests as high as possible; the following figures measure it on a daily basis, by both file count and data volume.
"Get" Hit Rate by files
"Get" Hit Rate by data
The daily growth rate is spiky, but without long-term trends.
Daily growth rate, files & data
We graph both the seek times (time to find a tape and insert it into a drive) and the mount times (time until data starts to be read).
Seek and mount times
Distributed File System
We are now running a DFS server on Golem, exporting all 8 Golem user file systems. Clients run on SGI, Sun, J90, and IBM/AIX; Windows NT, Apple Macintosh, and C90 clients are coming. These are early days yet.
DFS performance: all operations are to/from the archival file server (Golem). We present raw data; we have not yet had time for evaluation.
Client Write Comparison
Client Write Comparison(2)
Client Full Read (Complete File) Comparison
Client Partial Read (40 bytes from file center) Comparison
SGI Client R/W 10MB Cache
SGI Client R/W 100MB Cache
SGI Client Full Read Cache Comparison
SGI Client Partial Read Cache Comparison
SGI Client Write Cache Comparison
Sun Client R/W 40MB Cache
J90 Client R/W 64MB cache
DMAPI Application on SGI XFS
We have used the DMAPI (Data Management API) implementation available with the XFS filesystem under IRIX 6.2 to create a user-level DMAPI application that successfully managed a large striped filesystem on an SGI Power Challenge machine. The goal of this work was to build, from currently available standard technologies, a prototype of a high-performance storage management system for the SGI, providing transparent access to the file archiving system resident on the CRI J90 at the PSC (the File Archiver). This is a follow-on to the Multi-Resident AFS work at the PSC, intended to examine the DMAPI technology now becoming available from several vendors (HP/Convex and SGI in particular).
The software prototype had to meet the following requirements:
Performance tests were run early on to ensure that the overhead of using the DMAPI was acceptable. Of particular concern was the rate at which the filesystem could be scanned to locate candidates for migration: if this rate is too low, it would not be possible to keep up with a reasonable rate of creation of new files on the filesystem. The measured rate of roughly 13,000 files/second on the 4-processor R10000 SGI Power Challenge was deemed sufficiently fast for this work.
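A rough version of such a scan-rate measurement is sketched below. Note the hedge: the real test used the XFS/DMAPI bulk-attribute scan, which this generic directory walk only approximates, so absolute numbers will not be comparable.

```python
import os
import time

def scan_rate(root):
    """Count files under `root` and report (count, files/second).

    Uses a plain os.walk(); the actual measurement in the text went through
    the XFS/DMAPI interface, so this is an approximation for illustration.
    """
    t0 = time.monotonic()
    count = 0
    for _dirpath, _subdirs, files in os.walk(root):
        count += len(files)
    elapsed = time.monotonic() - t0
    return count, (count / elapsed if elapsed > 0 else float("inf"))
```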
The prototype software system consisted of four daemons on the SGI: a parent controlling three child processes. The children are an archiver, which moves files between the data stores; a purge daemon, which maintains free space by clearing dual-state files; and an event-loop process, which responds to operations on the files (read/write/truncate/delete).
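The parent-plus-three-children structure can be modeled as below. This is a hedged sketch only: Python threads stand in for the actual IRIX daemons, and the function names and sentinel-queue shutdown protocol are illustrative, not the prototype's real design.

```python
import queue
import threading

def archiver(inbox):
    """Child 1: move files between the data stores (stubbed out here)."""
    for _path in iter(inbox.get, None):
        pass  # a real archiver would copy _path to back-end storage

def purge_daemon(inbox):
    """Child 2: maintain free space by clearing dual-state files (stubbed)."""
    for _path in iter(inbox.get, None):
        pass  # a real purger would release the disk copy of _path

def event_loop(inbox):
    """Child 3: respond to read/write/truncate/delete file events (stubbed)."""
    for _event in iter(inbox.get, None):
        pass  # a real event loop would dispatch on the DMAPI event type

def run_parent():
    """Parent: start the three children, then shut them all down cleanly."""
    inboxes = [queue.Queue() for _ in range(3)]
    kids = [threading.Thread(target=fn, args=(q,))
            for fn, q in zip((archiver, purge_daemon, event_loop), inboxes)]
    for k in kids:
        k.start()
    for q in inboxes:
        q.put(None)   # sentinel: tell each child to exit its loop
    for k in kids:
        k.join()
    return all(not k.is_alive() for k in kids)
```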
We have successfully managed files on a large striped logical volume, maintaining a specified level of free space and using the DMAPI application to handle file events, moving and retrieving files from back-end storage both on a local tape drive and across HiPPI to one or more CRI systems serving as archiving back-ends, including the File Archiver machine.