
Exabyte Scale Data Processing and Analysis System in Heterogeneous Computing Infrastructure.

Name
Alexei
Surname
Klimentov
Scientific organization
Brookhaven National Laboratory, National Research Center "Kurchatov Institute"
Academic degree
Professor
Position
Head of Laboratory
Scientific discipline
Information technologies
Topic
Exabyte Scale Data Processing and Analysis System in Heterogeneous Computing Infrastructure.
Abstract

The processing, management and analysis of data in current mega-science projects require the integration of computing centers of different sizes, capacities and architectures into a single computing environment (cyberinfrastructure). When designing such an environment, one must take into account not only disk and computing resources, but also the bandwidth of the wide-area network and the throughput and data-access latency between the computing centers.
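
As a purely illustrative sketch of the kind of estimate such a design requires (the function, link speed, dataset size and efficiency factor below are our own assumptions, not figures from the project), one can relate dataset size, network bandwidth and replication time:

# Toy estimate: how long does it take to replicate a dataset between
# two centers, given the network bandwidth they share?

def transfer_time_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Time to move `dataset_tb` terabytes over a `link_gbps` Gbit/s link.

    `efficiency` crudely models protocol overhead and competing traffic.
    """
    bits = dataset_tb * 1e12 * 8                    # dataset size in bits
    seconds = bits / (link_gbps * 1e9 * efficiency) # effective transfer rate
    return seconds / 3600

# Replicating 500 TB over a 10 Gbit/s link at 70% efficiency:
print(f"{transfer_time_hours(500, 10):.1f} hours")  # ~158.7 hours, i.e. almost a week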
Keywords
BigData, LHC, NICA, Federated Storage, Workload Management System
Summary

Exabyte Scale Data Processing and Analysis System in Heterogeneous Computing Infrastructure.

"BigData Technologies for mega-science experiments" Laboratory, National Research Center "Kurchatov Institute"

The Large Hadron Collider (LHC), operating at the international CERN laboratory in Geneva, Switzerland, is leading Big Data driven scientific exploration. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and are credited with the discovery of the Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, ATLAS and the other LHC experiments rely on a heterogeneous distributed computing infrastructure. The ATLAS experiment uses the PanDA (Production and Distributed Analysis) workload management system to manage the workflow for all data processing across hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10²) sites, O(10⁵) cores, O(10⁸) jobs per year and O(10³) users, and the ATLAS data volume is O(10¹⁷) bytes.

In 2014 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds, supercomputers and Leadership Computing Facilities (LCF), to address the LHC Run 2 challenges. In 2015 the Large Hadron Collider opened new “Gates of Nature” by reaching instantaneous luminosities exceeding 2·10³⁴ cm⁻² s⁻¹ and a center-of-mass energy of 13 TeV. The physics goals of the experiments include searches for physics beyond the Standard Model and high-precision studies of the Higgs sector. These goals require detailed comparison of the expected physics and detector behavior with data. As of today ATLAS manages more than 200 petabytes of data at more than a hundred computing sites.
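
The central idea, brokering each job to the best-matched resource so that users see a single facility, can be illustrated with a minimal sketch. All names, numbers and the load heuristic below are hypothetical and greatly simplified; this is not the actual PanDA API:

# Minimal sketch of the brokering idea behind a workload management system:
# dispatch each job to the least-loaded site that still has free slots.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Site:
    load: float                      # lower is better; only field used for ordering
    name: str = field(compare=False)
    free_slots: int = field(compare=False)

@dataclass
class Job:
    job_id: int
    input_dataset: str
    slots: int = 1

def broker(jobs: list[Job], sites: list[Site]) -> dict[int, str]:
    """Assign each job to the currently least-loaded site.

    A real broker also weighs data locality, walltime limits and site
    capabilities; this keeps only the load-balancing core of the idea.
    """
    heap = sites[:]
    heapq.heapify(heap)
    assignment: dict[int, str] = {}
    for job in jobs:
        site = heapq.heappop(heap)           # least-loaded site
        assignment[job.job_id] = site.name
        site.free_slots -= job.slots
        site.load += 1 / max(site.free_slots, 1)
        heapq.heappush(heap, site)
    return assignment

# Illustrative mix of grid, cloud and HPC resources (tiny slot counts for clarity):
sites = [Site(0.2, "GRID-RU-1", 4), Site(0.5, "CLOUD-EU", 8), Site(0.1, "HPC-LCF", 10)]
jobs = [Job(i, f"dataset-{i % 3}") for i in range(5)]
print(broker(jobs, sites))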

We have developed, deployed and are operating the next generation of the ATLAS Production System to deal with widely distributed data volumes and to carry out data processing in a heterogeneous computing environment. The system has unique characteristics: it operates on 250 000 computing cores, and more than 1.3 exabytes of data were processed in 2015.
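
One characteristic step in such a production system is decomposing a task over a large dataset into jobs sized for heterogeneous sites. The following sketch is our own illustration of that greedy packing step; the file names, sizes and limit are hypothetical, not taken from the Production System:

# Split a production task's input files into jobs of bounded input size,
# so each job fits the constraints of whatever site it is brokered to.

def split_task(files: list[tuple[str, int]], max_job_gb: float = 10.0) -> list[list[str]]:
    """Greedily pack input files (name, size in bytes) into jobs
    whose total input stays under `max_job_gb` gigabytes."""
    jobs, current, current_bytes = [], [], 0
    limit = max_job_gb * 1e9
    for name, size in files:
        if current and current_bytes + size > limit:
            jobs.append(current)             # close the current job
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        jobs.append(current)
    return jobs

files = [(f"AOD.{i:04d}.root", 3_500_000_000) for i in range(7)]  # 3.5 GB each
for n, job in enumerate(split_task(files)):
    print(f"job {n}: {job}")                 # jobs of 2, 2, 2 and 1 files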

Another important research topic is building a federated computing infrastructure. Aristotle asserted that "the whole is greater than the sum of its parts": the integration of heterogeneous computing centers into a single federated distributed cyberinfrastructure allows more efficient utilization of computing and disk resources for a wide range of scientific applications. To address this, in 2015, within the framework of the “Big Data Technologies for mega-science class projects" Laboratory at NRC "Kurchatov Institute", work began on the creation of a unified federation of disk resources for geographically distributed data centers located in Moscow, St. Petersburg, Dubna, Gatchina and CERN (Geneva), on its integration with existing computing resources, and on providing applications running on both supercomputers and high-throughput distributed computing systems with access to these resources. It was demonstrated that such a cyberinfrastructure can be used efficiently for data processing and analysis by LHC scientific applications as well as by bioinformatics applications.
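
What a single disk-resource federation buys the user can be illustrated as follows: any file is opened through one logical entry point, regardless of which center actually holds it. This sketch assumes the XRootD Python bindings (pyxrootd) are available; the redirector host and file path are hypothetical placeholders, not the project's real endpoints:

# Read a file through the federation's single namespace: the redirector
# locates the center that holds the file and the client is redirected there.

from XRootD import client

# One logical entry point for the whole federation (placeholder URL).
URL = "root://federation-redirector.example.org//atlas/data/AOD.root"

with client.File() as f:
    status, _ = f.open(URL)
    if not status.ok:
        raise IOError(status.message)
    status, first_kb = f.read(offset=0, size=1024)  # read the first kilobyte
    print(f"read {len(first_kb)} bytes via the federation")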