SI5 & SI6 Grid Replication Framework Code Explanation

The Grid Replication Framework (GRF) provides a uniform interface to a wide range of replica management systems, such as the SRB and the Globus RLS. GRF hides the complexity of interfacing with these systems by providing an abstraction layer on top of their data access interfaces. Applications built on the framework can therefore read data from and write data to different systems transparently, without needing to know where the data is physically located or how it is accessed.

GRF employs a plug-in architecture: support for each replica system is provided by a plug-in. A plug-in is a lightweight software component that the framework loads dynamically. Each plug-in implements the file I/O mechanisms specific to a particular replica system and is identified by an identifier, which is specified in a plug-in configuration file. Currently, GRF supports SRB, Globus RLS, Gfarm and local file I/O (using conventional system calls).

GRF also provides a POSIX-like API with file I/O functions such as open, close, read, write, seek and stat. All of these functions operate on a logical file space: each logical file is mapped to a physical file, which may be stored in any supported replica system. This allows users to change the locations of input and output files without changing any source code.

File mappings are maintained in a mapping file. Each mapping contains two parts separated by a space: on the left-hand side is the logical file name (used by applications) and on the right-hand side is the description of the mapped physical file. Each physical file description contains a plug-in identifier, which tells the framework which plug-in to use at runtime, together with plug-in-specific information, which is parsed by the corresponding plug-in.
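The mapping-file format described above can be illustrated with a minimal Python sketch. It is not GRF's actual parser: the text only states that the logical name and the physical file are separated by a space and that the physical file carries a plug-in identifier plus plug-in-specific information. The `<plugin-id>:<detail>` layout, the names `PhysicalFile` and `parse_mapping`, and the sample entries are all assumptions made for illustration.

```python
from typing import Dict, NamedTuple


class PhysicalFile(NamedTuple):
    plugin_id: str  # identifies the plug-in that should handle the file
    detail: str     # opaque string, left for that plug-in to parse


def parse_mapping(text: str) -> Dict[str, PhysicalFile]:
    """Parse mapping-file text into {logical name: PhysicalFile}."""
    mappings: Dict[str, PhysicalFile] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Left of the first whitespace is the logical name,
        # everything to the right describes the physical file.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"malformed mapping line: {line!r}")
        logical, physical = parts
        # ASSUMPTION: "<plugin-id>:<detail>" is a hypothetical layout,
        # not the syntax GRF actually uses.
        plugin_id, _, detail = physical.partition(":")
        mappings[logical] = PhysicalFile(plugin_id, detail)
    return mappings


# Illustrative entries only; plug-in names and paths are made up.
EXAMPLE = """\
input.dat local:/tmp/input.dat
output.dat gfarm:gfarm:///home/user/output.dat
"""

MAPPINGS = parse_mapping(EXAMPLE)
```

In this sketch, retargeting `input.dat` to another replica system would only require editing its line in the mapping text, which mirrors the point that file locations can change without touching application source code.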
The plug-in-specific information differs between plug-ins: the SRB plug-in requires the SRB host address, port, collection, object ID and resource, while the Gfarm plug-in requires only the Gfarm URL.

GRF supports dynamic replica selection. Rather than choosing the best replica before execution, the framework continuously monitors resource conditions (such as the network and the replica servers) at runtime and chooses the best replica based on real-time metrics. This allows applications to adapt dynamically to changes on the Grid. When a better data source is found, the framework transparently switches to the new source without interrupting program execution. The framework maintains the file offset for each replica and automatically synchronizes the offset when required.

We have also extended GriddLeS to support GRF. Users can therefore run legacy programs (e.g. Unix binaries) on top of GRF, allowing them to source data from a wide range of systems without code modification. In addition to the Bypass Toolkit, we also leverage Parrot, a tool that uses ptrace to trace the file I/O performed by an application. When a system call is intercepted, it is redirected to GriddLeS, which in turn passes the call to GRF. Experimental results show that the overhead is minimal.

More information can be found at: http://www.csse.monash.edu.au/~davida/griddles/references.htm
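The dynamic replica selection and offset synchronization described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not GRF's implementation: `Replica`, `ReplicatedFile` and the `cost` callback are hypothetical names, `io.BytesIO` stands in for a real plug-in connection, and a single logical offset is re-applied before every read instead of tracking a separate offset per replica.

```python
import io
from typing import Callable, List


class Replica:
    """One copy of a file; BytesIO stands in for a real plug-in handle."""

    def __init__(self, name: str, data: bytes, cost: Callable[[], float]):
        self.name = name
        self.handle = io.BytesIO(data)
        self.cost = cost  # hypothetical runtime metric: lower is better


class ReplicatedFile:
    """Reads from whichever replica currently reports the lowest cost."""

    def __init__(self, replicas: List[Replica]):
        self.replicas = replicas
        self.offset = 0               # logical offset shared by all replicas
        self.sources: List[str] = []  # which replica served each read

    def read(self, size: int) -> bytes:
        # Re-evaluate the metric on every call, so a change in
        # conditions can switch the data source mid-stream.
        best = min(self.replicas, key=lambda r: r.cost())
        best.handle.seek(self.offset)  # synchronize offset before reading
        chunk = best.handle.read(size)
        self.offset += len(chunk)
        self.sources.append(best.name)
        return chunk


# Usage: replica "a" starts out best, then degrades between reads.
COSTS = {"a": 1.0, "b": 5.0}
DATA = b"0123456789"
FILE = ReplicatedFile([
    Replica("a", DATA, lambda: COSTS["a"]),
    Replica("b", DATA, lambda: COSTS["b"]),
])
FIRST = FILE.read(4)
COSTS["a"] = 9.0  # simulate replica "a" becoming slow
SECOND = FILE.read(4)
```

The second read is served by replica "b" but continues from where the first read stopped, which is the behaviour that offset synchronization is meant to guarantee when the source changes.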
