A study of approaches to reducing the cost of ownership of exabyte-scale data warehouses, which are widely used to address the problems of storage capacity and data access by federating the resources of a distributed platform.
Currently, most data on the market, including metadata, is stored using network-attached storage (NAS) technology, whose reliable operation requires specialized, expensive equipment. Examples of such equipment are the widely used NetApp FAS8000 and FAS2500 series servers and other NetApp systems. Naturally, building a large company's data warehouse on this type of equipment requires significant financial resources.
Over the last two or three years, the market for distributed file systems has been evolving rapidly. These systems scale to exabyte sizes, are fault tolerant and reliable, and let users deploy clusters on inexpensive servers, which significantly reduces the cost of implementing data warehouses.
The report analyses the technical characteristics of several file systems (GlusterFS, Hadoop, CephFS, MooseFS) under different loads and client-server configurations. For example, for MooseFS:
Supported client connection protocol: FUSE
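The report does not state how its timings were taken. One way to obtain figures of this kind is to time a sequential write of a file of known size through the FUSE mount. The sketch below is an illustration under assumptions, not the procedure used in the study: the helper name measure_write_speed, the block size, the mount point and the convention 1 MB = 2**20 bytes are all hypothetical. It calls os.fsync before stopping the clock so that the page cache does not inflate the result.

```python
import os
import time


def measure_write_speed(path: str, size_mb: int, block_kb: int = 1024) -> tuple[float, float]:
    """Write size_mb megabytes of zeros to path and return (seconds, MB/s).

    Hypothetical helper: path would normally lie inside a FUSE mount
    (for example a MooseFS client mount), but any writable path works.
    Assumes 1 MB = 2**20 bytes.
    """
    block = memoryview(b"\0" * (block_kb * 1024))
    remaining = size_mb * 1024 * 1024
    start = time.perf_counter()
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(len(block), remaining)  # last block may be partial
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # force data out of the page cache before timing stops
    elapsed = time.perf_counter() - start
    speed = size_mb / elapsed if elapsed > 0 else float("inf")
    return elapsed, speed
```

For example, measure_write_speed("/mnt/moosefs/test.bin", 1496) would time a 1496 MB write, where /mnt/moosefs is a placeholder for wherever the client file system is mounted. Running several such calls in separate processes at once would approximate the parallel-write configurations.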
Data transfer tests:
Configuration: 1 client / 1 storage server
File 1496 MB; time 19 s; speed 78.6 MB/s
File 15394 MB; time 172 s; speed 89.5 MB/s
Configuration: 1 client / 2 storage servers
File 1496 MB; time 9 s; speed 166.6 MB/s
File 15394 MB; time 93 s; speed 165.5 MB/s
Configuration: 4 clients (parallel writes) / 1 storage server
Client 1: file 1496 MB; time 45 s; speed 33.2 MB/s
Client 2: file 1496 MB; time 47 s; speed 31.8 MB/s
Client 3: file 1496 MB; time 56 s; speed 26.7 MB/s
Client 4: file 1496 MB; time 57 s; speed 26.2 MB/s
Configuration: 4 clients (parallel writes) / 2 storage servers
Client 1: file 1496 MB; time 29 s; speed 51.6 MB/s
Client 2: file 1496 MB; time 34 s; speed 44.0 MB/s
Client 3: file 1496 MB; time 35 s; speed 42.7 MB/s
Client 4: file 1496 MB; time 33 s; speed 45.3 MB/s
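The figures above can be cross-checked with simple arithmetic: for every run, the file size divided by the reported speed should reproduce the reported time to within rounding, which also confirms that the times are in seconds. For the parallel configurations, a rough aggregate throughput is the total data written divided by the slowest client's time. That estimate assumes all four clients started at the same moment, which the report does not state; the function names below are illustrative.

```python
"""Cross-check the MooseFS test figures and estimate aggregate throughput."""

# (file size in MB, time in s, reported speed in MB/s), transcribed from the results above.
RESULTS = {
    "1 client / 1 server": [(1496, 19, 78.6), (15394, 172, 89.5)],
    "1 client / 2 servers": [(1496, 9, 166.6), (15394, 93, 165.5)],
    "4 clients / 1 server": [(1496, 45, 33.2), (1496, 47, 31.8), (1496, 56, 26.7), (1496, 57, 26.2)],
    "4 clients / 2 servers": [(1496, 29, 51.6), (1496, 34, 44.0), (1496, 35, 42.7), (1496, 33, 45.3)],
}


def times_consistent(rows, tolerance_s=0.5):
    """True if each whole-second time equals size / reported speed within rounding."""
    return all(abs(size / speed - t) <= tolerance_s for size, t, speed in rows)


def aggregate_throughput(rows):
    """Total MB written by parallel clients divided by the slowest client's time.

    Assumes every client started at the same moment (not stated in the report).
    """
    return sum(size for size, _, _ in rows) / max(t for _, t, _ in rows)


if __name__ == "__main__":
    for name, rows in RESULTS.items():
        print(f"{name}: times consistent = {times_consistent(rows)}")
    # Aggregate estimates only make sense for the simultaneous multi-client runs.
    for name in ("4 clients / 1 server", "4 clients / 2 servers"):
        print(f"{name}: ~{aggregate_throughput(RESULTS[name]):.1f} MB/s aggregate")
```

With these figures every configuration passes the consistency check, and the aggregate estimates come to about 105.0 MB/s with one storage server and 171.0 MB/s with two.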
The results of the analysis show that the file system scales, delivering throughput limited only by the network data transfer rate. This in turn makes it possible to abandon expensive server hardware in favor of distributed file systems.