Castor: CERN Advanced STORage Manager

Original link: https://castor.web.cern.ch/content/home.html

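The CERN Advanced STORage manager (CASTOR) is a hierarchical storage system built to manage the very large volumes of physics data produced at CERN. The successor of SHIFT, CASTOR uses a component-based architecture with a central database and manages disk and tape storage through five major modules: the Stager (disk pool management), the Name Server (metadata and the file and directory name space), the Tape Infrastructure (automated archiving), the client tools, and Storage Resource Management (SRM) for Grid integration. CASTOR provides data access through protocols such as XROOT and GridFTP, and users work with stored files through command-line tools or the API. By combining fast disk access with cost-effective, high-capacity tape libraries from Oracle and IBM, CASTOR keeps petabyte-scale archives safe. CASTOR was the standard for many years, but its role has gradually passed to the CERN Tape Archive (CTA), which began operating as its successor in 2020.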

Original text

The CERN Advanced STORage manager (CASTOR) is a hierarchical storage management system (that is, one combining disk and tape) developed at CERN for archiving physics data at very large volumes (see the plot on the right). Files can be stored, listed, retrieved and remotely accessed using the CASTOR command-line tools or user applications built on the CASTOR API. CASTOR provides a set of access protocols, including XROOT (the main and recommended protocol) and GridFTP. RFIO (Remote File IO) was supported until 2016.
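
As a rough illustration of remote access over XROOT, the sketch below builds a `root://` URL and the argument list for an `xrdcp` copy. The host name `castor.example.org` and the file path are placeholders, not real CASTOR endpoints, and the helper names `xroot_url` and `xrdcp_command` are hypothetical. The code only assembles the command and does not run it.

```python
from typing import List, Optional


def xroot_url(host: str, path: str, port: Optional[int] = None) -> str:
    """Build a root:// URL for an absolute remote path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    netloc = host if port is None else f"{host}:{port}"
    # XRootD URLs put a double slash between the host and an absolute path.
    return f"root://{netloc}/{path}"


def xrdcp_command(source: str, destination: str) -> List[str]:
    """Return the argument list for copying source to destination with xrdcp."""
    return ["xrdcp", source, destination]


# Placeholder host and path, for illustration only.
url = xroot_url("castor.example.org", "/castor/cern.ch/user/a/alice/run001.root")
print(url)
print(" ".join(xrdcp_command(url, "/tmp/run001.root")))
```

The resulting list could be handed to `subprocess.run` on a machine where the XRootD client is installed and the user is authorised to read the file.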

CASTOR is the successor of SHIFT, the Scalable Heterogeneous Integrated FaciliTy for HEP computing, which was developed and operated in the 1990s. On June 29th 2020, CTA, the CERN Tape Archive, entered operation as the successor of CASTOR and has gradually replaced it. The evolution of total data on tape at CERN since 2001 is displayed on the right, including statistics gathered from CASTOR 1 (1998-2007), CASTOR 2 (2005-2022), and CTA (2020 onwards).

The design is based on a component architecture (Architecture diagram) using a central database in order to safeguard the state changes of the CASTOR components. The access to disk pools is controlled by the Stager; the directory structure is kept by the Name Server. The tape access (writes and recalls) is controlled by the Tape Infrastructure.
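
The division of labour described above can be sketched as a toy model. This is not CASTOR code: the class names (`ToyNameServer`, `ToyTape`, `ToyStager`), the in-memory dictionaries standing in for the central database, and the policy of reclaiming the least recently used files that already have a tape copy are all assumptions made for illustration.

```python
from collections import OrderedDict


class ToyNameServer:
    """Keeps the name space: path -> metadata (size, tape copy flag)."""

    def __init__(self):
        self.entries = {}

    def register(self, path, size):
        self.entries[path] = {"size": size, "on_tape": False}


class ToyTape:
    """Stands in for the tape infrastructure: writes and recalls."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def recall(self, path):
        return self.files[path]


class ToyStager:
    """Disk pool manager: allocates space, reclaims it, serves clients."""

    def __init__(self, capacity, name_server, tape):
        self.capacity = capacity
        self.used = 0
        self.disk = OrderedDict()  # path -> data, least recently used first
        self.ns = name_server
        self.tape = tape

    def _make_room(self, size):
        # Assumed policy: drop least recently used files that are safely on tape.
        for path in list(self.disk):
            if self.used + size <= self.capacity:
                break
            if self.ns.entries[path]["on_tape"]:
                self.used -= len(self.disk.pop(path))
        if self.used + size > self.capacity:
            raise RuntimeError("disk pool full: nothing left to reclaim")

    def put(self, path, data):
        if path in self.ns.entries:
            raise FileExistsError(path)
        self._make_room(len(data))
        self.disk[path] = data
        self.used += len(data)
        self.ns.register(path, len(data))

    def migrate(self, path):
        # Copy a disk-resident file to tape and record that in the name space.
        self.tape.write(path, self.disk[path])
        self.ns.entries[path]["on_tape"] = True

    def get(self, path):
        if path in self.disk:
            self.disk.move_to_end(path)  # mark as recently used
            return self.disk[path], "disk"
        data = self.tape.recall(path)    # slow path: recall from tape
        self._make_room(len(data))
        self.disk[path] = data
        self.used += len(data)
        return data, "tape"


ns, tape = ToyNameServer(), ToyTape()
stager = ToyStager(capacity=10, name_server=ns, tape=tape)
stager.put("/a", b"aaaaaa")
stager.migrate("/a")
stager.put("/b", b"bbbbbb")   # pool too small for both: "/a" is reclaimed
stager.migrate("/b")
print(stager.get("/a")[1])    # served by a tape recall
print(stager.get("/a")[1])    # now served from disk
```

In this sketch a file is only ever dropped from disk once a tape copy exists, which is one simple way to keep data safe while letting the archive grow beyond the disk pool.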

The 5 major functional modules are:

  1. Stager - this disk pool manager allocates and reclaims space; it also controls client access and oversees the disk pool local catalogue
  2. Name Server - this holds the CASTOR name space (files and directories), including the corresponding file metadata (size, dates, checksum, ownership, ACLs (Access Control Lists) and tape copy information). Command-line tools modelled on Unix tools enable manipulation of the name space (e.g. nsls corresponds to ls).
  3. Tape Infrastructure - under certain conditions CASTOR saves files onto tape in order to provide data safety and to manage data volumes larger than the available disk space. At CERN, the high-capacity tape units in use are the Oracle StorageTek T10000C (5 TB, photo) and the IBM TS1140 (4 TB). Cartridges are housed in tape libraries, and access to them is fully automated. The libraries used by CASTOR in production are 4 x Oracle SL8500 and 3 x IBM TS3500. The total tape archive capacity was ~100 PB as of January 2013.

    The CASTOR Volume Manager database contains information about each tape's characteristics, capacity and status. The Name Server database contains information about the files (sometimes referred to as segments) on a tape:

    • ownership
    • permission details
    • file offset location on tape

    User commands are available to display information in both the Name Server and Volume Manager databases.

    The mounting and unmounting of cartridges on tape drives is managed by the Volume Drive Queue Manager (VDQM) in conjunction with library control software specific to each model of tape library.

    The cost of storage per terabyte on tape is much lower than on hard disk, and tape has the advantage of consuming no electricity while it is not being accessed. However, access times on tape are longer, on the order of minutes rather than seconds.

  4. Client - this allows the user to upload, download, access and manage CASTOR data
  5. Storage Resource Management - this allows data access in a computing Grid via the SRM protocol. It interacts with CASTOR on behalf of a user or other services (such as FTS, the File Transfer Service used by the LHC community to export data).
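
The split between the Name Server and Volume Manager databases described under the Tape Infrastructure can be pictured with a few record types. This is a minimal sketch, not the real CASTOR schema: the class and field names, the status value and the sample identifiers are all hypothetical. Only the kinds of information stored (tape capacity and status on one side; ownership, permissions and file offset on tape on the other) come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TapeRecord:
    """Volume Manager side: one cartridge (field names are hypothetical)."""
    vid: str            # volume identifier of the cartridge
    capacity_tb: float  # e.g. 5.0 for a T10000C cartridge
    status: str         # illustrative value such as "FULL"


@dataclass
class Segment:
    """Name Server side: where one tape copy of a file sits."""
    vid: str
    offset: int         # file offset location on the tape


@dataclass
class FileEntry:
    """Name Server side: a file in the name space with its metadata."""
    path: str
    size: int
    checksum: str
    owner: str
    mode: int           # permission details
    segments: List[Segment] = field(default_factory=list)


def locate(entry: FileEntry, tapes: Dict[str, TapeRecord]) -> List[Tuple[TapeRecord, int]]:
    """Join both catalogues: return (tape record, offset) for each tape copy."""
    return [(tapes[seg.vid], seg.offset) for seg in entry.segments]


# Sample data, invented for illustration.
tapes = {"T00001": TapeRecord("T00001", 5.0, "FULL")}
entry = FileEntry(
    path="/castor/cern.ch/user/a/alice/run001.root",
    size=2_000_000_000,
    checksum="0x1a2b3c4d",
    owner="alice",
    mode=0o644,
    segments=[Segment("T00001", 42)],
)
for tape_record, offset in locate(entry, tapes):
    print(tape_record.vid, tape_record.status, offset)
```

Keeping per-tape facts and per-file facts in separate records mirrors the description above, where user commands can query either database independently.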