Mass Storage at the PSC
Phil Andrews, Manager, Data Intensive Systems
Pittsburgh Supercomputing Center, 4400 Fifth Ave, Pittsburgh Pa 15213, USA
Other Personnel: Janet Brown, Susan Straub, Bruce Collier, Vidya Dinamani, Rob Pennington.
This talk is online at http://www.psc.edu/~andrews
Until April '97 the Pittsburgh Supercomputing Center was one of four NSF-funded Supercomputing Centers. Unfortunately, it was not selected for continued funding in the PACI recompetition.
The main machines are all Cray products; some of them the first of their kind.
Figure 1 is the Cray C90, 16 processors
Figure 2 is the Cray T3D, 512 processors
Figure 3 is the Cray T3E, 512 processors
Figure 4 shows 3 Cray J90s with 8,8,10 processors
Archival needs: dedicated-machine jobs are common, so large files must move in and out of storage quickly. Shelf operations are unacceptable.
Figure 5 is a schematic of the file server layout
File Server/File Systems:
Golem has 10 CPUs and 11 IOSs available for archive use, with relatively little memory (64 MW, i.e. 512 MB).
Eight identical user file systems because:
Each file system has:
Figure 6 is the eight-way file system layout
Each pair of file systems gets an MSP, and each MSP can grab 2 tape drives.
Secondaries run with 2 disks per SCSI chain, 2 SCSI chains per SI3 adapter, and 2 SI3 adapters per IOS. File systems deliver 26 MB/s.
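The fan-out above can be sanity-checked with a few lines of arithmetic. This is a hedged sketch, not configuration data: the per-level figures come from the text, and the aggregate assumes the 26 MB/s figure is per file system.

```python
# Fan-out figures quoted in the text (2 x 2 x 2 per IOS).
DISKS_PER_SCSI_CHAIN = 2
SCSI_CHAINS_PER_SI3 = 2
SI3_ADAPTERS_PER_IOS = 2

def disks_per_ios():
    """Disks reachable through one IOS under the stated fan-out."""
    return DISKS_PER_SCSI_CHAIN * SCSI_CHAINS_PER_SI3 * SI3_ADAPTERS_PER_IOS

FILE_SYSTEMS = 8          # the eight identical user file systems
MB_PER_SEC_PER_FS = 26    # assumed to be the per-file-system delivered rate

def aggregate_mb_per_sec():
    """Combined delivered rate across all eight user file systems."""
    return FILE_SYSTEMS * MB_PER_SEC_PER_FS
```

Under these assumptions each IOS drives 8 disks, and the eight file systems together deliver around 208 MB/s.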
7 Cabinets, 8 IBM Magstar drives, space for 2400 tapes. Run one drive per SCSI chain.
The two STK silos in the PSC machine room at Westinghouse.
Two connected silos, 6,000 slots each
Running mixed media, both older STK tapes and IBM 3590 tapes in silos. Plus shelf operation for STK tapes.
Conversion is via the IBM C-12 frame, which bolts to the STK silo and holds 4 IBM Magstar drives. No problems so far.
An MPEG animation of the actual drive frame installation.
View after the drive frame installation.
Total automated tape capacity is now 14,400 tapes, approximately 200 TB. Double-length tapes (later this year) would bring this to ~400 TB.
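The quoted totals are easy to check. The arithmetic below assumes 1 TB = 1000 GB, a convention the text does not state; the average is across mixed media, so individual cartridge capacities will differ.

```python
TAPES = 14_400     # total automated slots (both silo complexes plus cabinets)
TOTAL_TB = 200     # approximate capacity quoted in the text

# Average capacity per cartridge across the mixed STK/IBM media mix.
avg_gb_per_tape = TOTAL_TB * 1000 / TAPES   # roughly 13.9 GB per tape

# Double-length tapes double the per-cartridge capacity, tape count unchanged.
doubled_tb = TOTAL_TB * 2                   # ~400 TB, matching the text
```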
The new (Golem-based) archive now totals approximately 34 TB, growing at about 2 TB/month. An almost identical amount of data resides in the pre-1996 archive.
Growth rate of the new archive in total data and files.
Initial user access was by FAR (home grown archiving software) and FTP to Golem file systems.
Each pair of file systems gets a Media Specific Process; DMF counts files and data on an MSP basis.
Load Balance, bytes stored by MSP
Whenever the disk occupancy on an MSP reaches a "warning threshold", DMF performs a "relief" operation: deleting premigrated files and premigrating more files to tape.
Relief from warning threshold
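The relief policy can be sketched as below. This is a toy model of the behavior described in the text, not DMF's actual implementation: the `MSP` class, the threshold values, and the file bookkeeping are all hypothetical.

```python
class MSP:
    """Toy model of one Media Specific Process's disk cache (sizes in GB)."""

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.premigrated = []   # (name, size): on disk AND on tape (dual-state)
        self.disk_only = []     # (name, size): on disk only

    def occupancy(self):
        used = sum(size for _name, size in self.premigrated + self.disk_only)
        return used / self.capacity

    def relieve(self, warning=0.90, target=0.80):
        """DMF-style relief: free dual-state files, then premigrate more."""
        if self.occupancy() < warning:
            return
        # Deleting a premigrated file is cheap: the tape copy already exists,
        # so only the disk copy is released.
        while self.premigrated and self.occupancy() > target:
            self.premigrated.pop(0)
        # Premigrate remaining files so the next relief pass has candidates.
        # Premigration copies data to tape; the disk copy stays resident,
        # so occupancy does not change in this step.
        self.premigrated.extend(self.disk_only)
        self.disk_only = []
```

With a 100 GB cache holding a 50 GB dual-state file and a 45 GB disk-only file, relief drops the dual-state copy (occupancy 0.95 to 0.45) and premigrates the remainder.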
The IBM drives perform hardware compression; we track this on an MSP basis:
Compression ratio by MSP
We attempt to keep the "hit" ratio on file requests as high as possible; the following figures measure it on a daily basis, by both file count and data volume.
"Get" Hit Rate by files
"Get" Hit Rate by data
The daily growth rate is spiky, but without long-term trends.
Daily growth rate, files & data
We graph both the seek times (time to find a tape and insert it into a drive) and the mount times (time until data starts to be read).
Seek and mount times
Distributed File System
We are now running a DFS server on Golem, exporting all 8 Golem user file systems. Clients run on SGI, Sun, J90, and IBM/AIX; Windows NT, Apple Macintosh, and C90 clients are coming. These are early days yet.
DFS performance: all operations are to/from the archival file server (Golem). We present raw data; we have not yet had time for evaluation.
Client Write Comparison
Client Write Comparison(2)
Client Full Read (Complete File) Comparison
Client Partial Read (40 bytes from file center) Comparison
SGI Client R/W 10MB Cache
SGI Client R/W 100MB Cache
SGI Client Full Read Cache Comparison
SGI Client Partial Read Cache Comparison
SGI Client Write Cache Comparison
Sun Client R/W 40MB Cache
J90 Client R/W 64MB cache
DMAPI Application on SGI XFS
We have used the DMAPI (Data Management API) implementation available with the XFS filesystem under IRIX 6.2 to create a user-level DMAPI application that successfully managed a large striped filesystem on an SGI Power Challenge machine. The goal of this work was to build, from currently available standard technologies, a prototype of a high-performance storage management system for the SGI, providing transparent access to the file archiving system resident on the CRI J90 at the PSC (the File Archiver). This is a follow-on to the Multi-Resident AFS work at the PSC, intended to examine the DMAPI technology now becoming available from several vendors (HP/Convex and SGI in particular).
The software prototype had to meet the following requirements:
Performance tests were run early on to ensure that the overhead of using the DMAPI was acceptable. Of particular concern was the rate at which the filesystem could be scanned to locate candidates for migration: if this rate is too low, it would not be possible to keep up with a reasonable rate of creation of new files on the filesystem. The measured rate of roughly 13,000 files/second on the 4-processor R10000 SGI Power Challenge was deemed sufficiently fast for this work.
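A rough version of such a scan-rate measurement is sketched below. Note the hedge: the real test used the XFS/DMAPI bulk-attribute scan, which this generic directory walk only approximates, so absolute numbers will not be comparable.

```python
import os
import time

def scan_rate(root):
    """Count files under `root` and report (count, files/second).

    Uses a plain os.walk(); the actual measurement in the text went through
    the XFS/DMAPI interface, so this is an approximation for illustration.
    """
    t0 = time.monotonic()
    count = 0
    for _dirpath, _subdirs, files in os.walk(root):
        count += len(files)
    elapsed = time.monotonic() - t0
    return count, (count / elapsed if elapsed > 0 else float("inf"))
```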
The prototype software system consisted of four daemons on the SGI: a parent controlling three child processes. The children are an archiver, which moves files between the data stores; a purge daemon, which maintains free space by clearing dual-state files; and an event-loop process, which responds to operations on the files (read/write/truncate/delete).
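The parent-plus-three-children structure can be modeled as below. This is a hedged sketch only: Python threads stand in for the actual IRIX daemons, and the function names and sentinel-queue shutdown protocol are illustrative, not the prototype's real design.

```python
import queue
import threading

def archiver(inbox):
    """Child 1: move files between the data stores (stubbed out here)."""
    for _path in iter(inbox.get, None):
        pass  # a real archiver would copy _path to back-end storage

def purge_daemon(inbox):
    """Child 2: maintain free space by clearing dual-state files (stubbed)."""
    for _path in iter(inbox.get, None):
        pass  # a real purger would release the disk copy of _path

def event_loop(inbox):
    """Child 3: respond to read/write/truncate/delete file events (stubbed)."""
    for _event in iter(inbox.get, None):
        pass  # a real event loop would dispatch on the DMAPI event type

def run_parent():
    """Parent: start the three children, then shut them all down cleanly."""
    inboxes = [queue.Queue() for _ in range(3)]
    kids = [threading.Thread(target=fn, args=(q,))
            for fn, q in zip((archiver, purge_daemon, event_loop), inboxes)]
    for k in kids:
        k.start()
    for q in inboxes:
        q.put(None)   # sentinel: tell each child to exit its loop
    for k in kids:
        k.join()
    return all(not k.is_alive() for k in kids)
```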
We have successfully managed files on a large striped logical volume, maintaining a specified level of free space and using the DMAPI application to handle file events, moving and retrieving files from back-end storage both on a local tape drive and across HiPPI to one or more CRI systems serving as archiving back-ends, including the File Archiver machine.