Data Management of Scientific Applications in a Reinforcement Learning-Based Hierarchical Storage System

44 Pages Posted: 27 Mar 2023

Tianru Zhang

Uppsala University

Ankit Gupta

Uppsala University

Maria Andreina Francisco Rodriguez

Uppsala University

Ola Spjuth

Uppsala University

Andreas Hellander

Uppsala University

Salman Toor

Uppsala University

Abstract

In many areas of data-driven science, large datasets are generated in which the individual data objects are images, matrices, or otherwise clearly structured. However, these objects can be information-sparse, and a challenge is to efficiently find and work with the most interesting data as early as possible in an analysis pipeline. We have recently proposed a new model for big data management in which the internal structure and information of the data are associated with each data object (as opposed to simple metadata). This creates an opportunity for comprehensive data management solutions to account for data-specific internal structure as well as access patterns. In this article, we explore this idea together with our recently proposed hierarchical storage management framework, which uses reinforcement learning (RL) for autonomous and dynamic data placement across the tiers of a storage hierarchy. Our case study is based on four scientific datasets: protein translocation microscopy images, airfoil angle-of-attack meshes, 1000 Genomes sequences, and phenotypic screening images. The presented results highlight that our framework is optimal and can quickly adapt to new data access requirements. Overall, it reduces data processing time, and the proposed autonomous data placement is superior to any static or semi-static data placement policy.

Keywords: Data management, Scientific Application, Hierarchical Storage System, Reinforcement Learning, Large scientific datasets

Suggested Citation

Zhang, Tianru and Gupta, Ankit and Francisco Rodriguez, Maria Andreina and Spjuth, Ola and Hellander, Andreas and Toor, Salman, Data Management of Scientific Applications in a Reinforcement Learning-Based Hierarchical Storage System. Available at SSRN: https://ssrn.com/abstract=4401829 or http://dx.doi.org/10.2139/ssrn.4401829

Tianru Zhang (Contact Author)

Uppsala University

Box 513
Uppsala, 751 20
Sweden

