Performance evaluation of MapReduce using full virtualisation on a departmental cloud

Horacio González-Vélez; Maryam Kontagora

International Journal of Applied Mathematics and Computer Science (2011)

Volume: 21, Issue: 2, pages 275-284
ISSN: 1641-876X

Abstract

This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing in a virtualisation environment consisting of 1 + 16 nodes running the VMware Workstation software. A set of experiments using the standard Hadoop benchmarks was designed to determine whether running Hadoop on this virtualisation platform on a departmental cloud yields significant reductions in the execution time of computations. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment hinder the attainment of maximum (peak) performance.
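The abstract's two central quantities, reduction in execution time and distance from peak performance, are conventionally expressed as speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, where E(p) = 1 corresponds to ideal (linear) speedup. The Python sketch below computes both. It is an illustration under stated assumptions: the timings are hypothetical placeholders, not results from the paper, and using a single node as the baseline T(1) is an assumption rather than the authors' stated methodology.

```python
from typing import Dict


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 means peak (linear) speedup."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical wall-clock times in seconds, keyed by number of worker nodes.
# These are made-up values for illustration, NOT measurements from the paper.
timings: Dict[int, float] = {1: 1600.0, 2: 850.0, 4: 460.0, 8: 260.0, 16: 160.0}


if __name__ == "__main__":
    t1 = timings[1]
    for p, tp in sorted(timings.items()):
        print(f"p={p:2d}  S={speedup(t1, tp):5.2f}  E={efficiency(t1, tp, p):4.2f}")
```

With these invented numbers the execution time keeps falling, yet efficiency drops from about 0.94 at two nodes to 0.625 at sixteen: the sub-linear pattern that communication, scheduling and virtualisation overheads typically produce. Actual figures for the Hadoop benchmarks must come from the paper's own measurements.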

How to cite

González-Vélez, Horacio, and Maryam Kontagora. "Performance evaluation of MapReduce using full virtualisation on a departmental cloud." International Journal of Applied Mathematics and Computer Science 21.2 (2011): 275-284. <http://eudml.org/doc/208046>.

@article{HoracioGonzález2011,
abstract = {This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing in a virtualisation environment consisting of 1 + 16 nodes running the VMware Workstation software. A set of experiments using the standard Hadoop benchmarks was designed to determine whether running Hadoop on this virtualisation platform on a departmental cloud yields significant reductions in the execution time of computations. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment hinder the attainment of maximum (peak) performance.},
author = {González-Vélez, Horacio and Kontagora, Maryam},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {MapReduce; server virtualization; cloud computing; algorithmic skeletons; structured parallelism; parallel computing},
language = {eng},
number = {2},
pages = {275--284},
title = {Performance evaluation of MapReduce using full virtualisation on a departmental cloud},
url = {http://eudml.org/doc/208046},
volume = {21},
year = {2011},
}

TY  - JOUR
AU  - González-Vélez, Horacio
AU  - Kontagora, Maryam
TI  - Performance evaluation of MapReduce using full virtualisation on a departmental cloud
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2011
VL  - 21
IS  - 2
SP  - 275
EP  - 284
AB  - This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing in a virtualisation environment consisting of 1 + 16 nodes running the VMware Workstation software. A set of experiments using the standard Hadoop benchmarks was designed to determine whether running Hadoop on this virtualisation platform on a departmental cloud yields significant reductions in the execution time of computations. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment hinder the attainment of maximum (peak) performance.
LA  - eng
KW  - MapReduce; server virtualization; cloud computing; algorithmic skeletons; structured parallelism; parallel computing
UR  - http://eudml.org/doc/208046
ER  - 

Citations in EuDML Documents

Grzegorz Chmaj, Krzysztof Walkowiak, Michał Tarnawski and Michał Kucharzak: Heuristic algorithms for optimization of task allocation and result distribution in peer-to-peer computing systems
