Data Migration between Distributed Repositories for Collaborative Research
Date: Wednesday, June 27, 2009
Time: 2:00pm - 3:00pm
Location: LBNL Bldg. 50F, Room 1647
Speaker:
Mehmet Balman
Department of Computer Science
Louisiana State University
Abstract:
Scientific applications, especially in areas such as physics,
biology, and astronomy, have become more complex and
compute-intensive. Often, such applications require geographically
distributed resources to satisfy their immense computational
requirements. Consequently, these applications also have growing
distributed, data-intensive requirements, dealing with petabytes of
data. The distributed nature of the resources has made data movement
the major bottleneck for end-to-end application performance. Our
approach is to use a dynamic network layer where data placement
middleware needs to adapt to the changing conditions in the
environment. Furthermore, heterogeneous resources and differing data
access and security protocols are among the challenges that data
placement middleware must deal with. Complex middleware is
required to orchestrate the use of these storage and network
resources between collaborating parties, and to manage the
end-to-end distribution of data.
We present a data placement scheduler for mitigating the data
bottleneck in collaborative peta-scale applications. In this talk,
we will give details on recent research in data scheduling, some use
cases for transferring very large data sets into distributed
repositories, and experiments on effective data movement over 1Gbps
and 10Gbps networks. We will also describe advanced features
including aggregation of data placement jobs with small data files,
dynamic tuning of data transfer operations to minimize the effect of
network latency, error detection and classification, and restarting
transfer operations after transfer interruptions.
Host of Seminar:
Arie Shoshani