Page 1 of results; 23 digital items found in 0.075 seconds

## Performance results of running parallel applications on the InteGrade

CACERES, E. N.; MONGELLI, H.; LOUREIRO, L.; NISHIBE, C.; SONG, S. W.
Source: JOHN WILEY & SONS LTD Publisher: JOHN WILEY & SONS LTD
Type: Journal Article
Language: Portuguese
Search relevance: 49.65578%
The InteGrade middleware intends to exploit the idle time of computing resources in computer laboratories. In this work we investigate the performance of running parallel applications with communication among processors on the InteGrade grid. As costly communication on a grid can be prohibitive, we explore the so-called systolic or wavefront paradigm to design the parallel algorithms in which no global communication is used. To evaluate the InteGrade middleware we considered three parallel algorithms that solve the matrix chain product problem, the 0-1 Knapsack Problem, and the local sequence alignment problem, respectively. We show that these three applications running under the InteGrade middleware and MPI take slightly more time than the same applications running on a cluster with only LAM-MPI support. The results can be considered promising and the time difference between the two is not substantial. The overhead of the InteGrade middleware is acceptable, in view of the benefits obtained to facilitate the use of grid computing by the user. These benefits include job submission, checkpointing, security, job migration, etc. Copyright (C) 2009 John Wiley & Sons, Ltd.; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2004/08928-3]; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [55.0895/07-8, 30.5362/06-2, 30.2942/04-1, 62.0123/04-4, 48.5460/06-8, 62.0171/06-5]; FUNDECT [41/100.115/2006]
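A minimal sketch of the wavefront (systolic) paradigm the paper applies: a Smith-Waterman local-alignment DP matrix is filled one anti-diagonal at a time, so every cell on a diagonal could be computed in parallel with only neighbour-to-neighbour (no global) communication. The scoring values and sequences below are illustrative, not taken from the paper.

```python
# Wavefront sketch: cells (i, j) with i + j == d form one anti-diagonal
# ("wave"); each wave depends only on the two previous waves, which is
# what lets a systolic parallelization avoid global communication.

def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):           # sweep anti-diagonals
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_wavefront("ACACACTA", "AGCACACA"))
```

In a grid setting, each processor would own a band of rows and forward only its boundary cells of the current wave to its neighbour.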

## A novel strategy for building interoperable MPI environment in heterogeneous high performance systems

Massetto, Francisco Isidro; Sato, Liria Matsumoto; Li, Kuan-Ching
Source: SPRINGER; DORDRECHT Publisher: SPRINGER; DORDRECHT
Type: Journal Article
Language: Portuguese
Search relevance: 50.1882%
Breakthrough advances in microprocessor technology and efficient power management have altered the course of processor development with the emergence of multi-core processor technology, bringing higher levels of processing power. Many-core technology has boosted the computing power provided by clusters of workstations or SMPs, delivering large computational power at an affordable cost using solely commodity components. Different implementations of message-passing libraries and system software (including operating systems) are installed in such cluster and multi-cluster computing systems. To guarantee correct execution of a message-passing parallel application in a computing environment other than the one it was originally developed for, the application code must be reviewed. In this paper, a hybrid communication interfacing strategy is proposed to execute a parallel application across a group of computing nodes belonging to different clusters or multi-clusters (computing systems that may be running different operating systems and MPI implementations), interconnected with public or private IP addresses, and responding interchangeably to user execution requests. Experimental results demonstrate the feasibility of this proposed strategy and its effectiveness...

## Removing inefficiencies from scientific code : the study of the Higgs boson couplings to top quarks

Pereira, André Martins; Onofre, A.; Proença, Alberto José
Type: Conference Paper or Conference Object
Search relevance: 38.759165%
Published in "Computational science and its applications – ICCSA 2014 : proceedings", Series: Lecture notes in computer science, vol. 8582; This paper presents a set of methods and techniques to remove inefficiencies from a data analysis application used in searches by the ATLAS Experiment at the Large Hadron Collider. Profiling the scientific code helped to pinpoint design and runtime inefficiencies, the former due to coding and data structure design. The data analysis code used by groups doing searches in the ATLAS Experiment made it possible to clearly identify some of these inefficiencies and to offer suggestions on how to prevent and overcome such common situations in scientific code, improving the efficient use of available computational resources on a parallel homogeneous platform.; This work is funded by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project PEst-OE/EEI/UI0752/2014, by LIP (Laboratório de Instrumentação e Física Experimental de Partículas), and the SeARCH cluster (REEQ/443/EEI/2005).

## SDT: A Virus Classification Tool Based on Pairwise Sequence Alignment and Identity Calculation

Muhire, Brejnev Muhizi; Varsani, Arvind; Martin, Darren Patrick
Source: Public Library of Science Publisher: Public Library of Science
Type: Journal Article
Search relevance: 48.2376%
The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT)...
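The core measure behind SDT can be sketched in a few lines: identity is computed per pair from an independent pairwise alignment, rather than read off one large multiple alignment (which, as the abstract notes, deflates scores as datasets grow). The helper below is a hypothetical illustration, not SDT's actual code.

```python
# Pairwise identity sketch: fraction of identical characters over the
# aligned columns, ignoring columns where both sequences have a gap.
# Inputs are two already-aligned sequences of equal length.

def pairwise_identity(aligned_a, aligned_b):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)

print(pairwise_identity("AC-GT", "ACCGT"))  # 4 identical of 5 columns -> 0.8
```

A classification tool would compute this for every sequence pair and cluster sequences whose identities fall above a taxon-demarcation threshold.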

## Optimizing a parallel fast Fourier transform

Hu, Richard, 1982-
Source: Massachusetts Institute of Technology Publisher: Massachusetts Institute of Technology
Type: Doctoral Thesis Format: 40 p.; 1477437 bytes; 1479765 bytes; application/pdf
Language: Portuguese
Search relevance: 79.605864%
Parallel computing, especially cluster computing has become more popular and more powerful in recent years. Star-P is a means of harnessing that power by eliminating the difficulties in parallelizing code and by providing the user with a familiar and intuitive interface. This paper presents methods to create a parallel FFT module for Star-P. We find that because calculating a parallel FFT is more communication-intensive than processor-intensive, clever planning and distribution of data is needed to achieve speed-up in a parallel environment.; by Richard Hu.; Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.; Includes bibliographical references (p. 40).
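The serial algorithm a parallel FFT module distributes is the radix-2 Cooley-Tukey recursion sketched below. In a distributed setting like the Star-P module described above, the recursive even/odd split is what forces the all-to-all data redistribution that makes a parallel FFT communication-bound rather than compute-bound.

```python
import cmath

# Radix-2 Cooley-Tukey FFT sketch (input length must be a power of two).
# The even/odd interleaving at each level is the data movement that a
# parallel implementation must plan carefully to achieve speed-up.

def fft(x):
    n = len(x)
    if n == 1:
        return x[:]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k]
                for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])

print([round(abs(v), 6) for v in fft([1, 1, 1, 1, 0, 0, 0, 0])])
```

Each recursion level touches all the data, so with the array spread over p processors the exchanges dominate the O(n log n) arithmetic, which is the thesis's central observation.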

## Analysis and Restoration of Daguerreotypes Using Cluster Computing

Ardis, Paul A. (1983 - ); Messing, Ross (1981 - ); Tang, Xiaoqing; Brown, Christopher M. (1945 - ); Nelson, Randal C.; Ravines, Patrick; Wiegandt, Ralph
Source: University of Rochester. Computer Science Department. Publisher: University of Rochester. Computer Science Department.
Type: Report
Language: Portuguese
Search relevance: 68.8539%
Technical report; The Conservation Laboratory of the George Eastman House International Museum of Photography and Film (GEH) and the Department of Computer Science at the University of Rochester (URCS) are collaborating on the problems of preservation of and access to daguerreotypes. Parallel (cluster) computation provides high-speed image processing to find, classify, and ultimately eliminate defects and artifacts of deterioration. This technical report describes early low-level techniques and applies them to scanner lighting, dust, and scratch defects.

## Establishing Linux Clusters for high-performance computing (HPC) at NPS

Daillidis, Christos
Language: Portuguese
Search relevance: 49.16968%
Approved for public release; distribution is unlimited; Discrete Event Simulation (DES) often involves repeated, independent runs of the same models with different input parameters. A system that can run many replications quickly is more useful than one in which a single monolithic application runs quickly. A loosely coupled parallel system is indicated. Inexpensive commodity hardware, high-speed local area networking, and open source software have created the potential to build just such loosely coupled parallel systems. These systems are constructed from Linux-based computers and are called Beowulf clusters. This thesis presents an analysis of clusters in high-performance computing and establishes a testbed implementation at the MOVES Institute. It describes the steps necessary to create a cluster, factors to consider in selecting hardware and software, and the process of creating applications that can run on the cluster. Monitoring the running cluster and system administration are also addressed.

Inozemtsev, Grigori
Source: Queen's University Publisher: Queen's University
Language: Portuguese
Search relevance: 38.745205%
As the demands of computational science and engineering simulations increase, the size and capabilities of High Performance Computing (HPC) clusters are also expected to grow. Consequently, the software providing the application programming abstractions for the clusters must adapt to meet these demands. Specifically, the increased cost of interprocessor synchronization and communication in larger systems must be accommodated. Non-blocking operations that allow communication latency to be hidden by overlapping it with computation have been proposed to mitigate this problem. In this work, we investigate offloading a portion of the communication processing to dedicated hardware in order to support communication/computation overlap efficiently. We work with the Message Passing Interface (MPI), the de facto standard for parallel programming in HPC environments. We investigate both point-to-point non-blocking communication and collective operations; our work with collectives focuses on the allgather operation. We develop designs for both flat and hierarchical cluster topologies and examine both eager and rendezvous communication protocols. We also develop a generalized primitive operation with the aim of simplifying further research into non-blocking collectives. We propose a new algorithm for the non-blocking allgather collective and implement it using this primitive. The algorithm has constant resource usage even when executing multiple operations simultaneously. We implemented these designs using CORE-Direct offloading support in Mellanox InfiniBand adapters. We present an evaluation of the designs using microbenchmarks and an application kernel that shows that offloaded non-blocking communication operations can provide latency that is comparable to that of their blocking counterparts while allowing most of the duration of the communication to be overlapped with computation and remaining resilient to process arrival and scheduling variations.; Thesis (Master...
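One schedule commonly used for the allgather collective is the ring algorithm, simulated below in plain Python. This is an illustrative sketch of the communication pattern, not the paper's offloaded algorithm; the offloaded non-blocking variant would post such a schedule to the network adapter and return immediately.

```python
# Ring allgather simulation: in each of p-1 steps, every rank forwards
# the block it received in the previous step to its right neighbour,
# so after p-1 steps every rank holds every block.

def ring_allgather(blocks):
    p = len(blocks)                       # one contribution per rank
    out = [[None] * p for _ in range(p)]  # out[r] is rank r's buffer
    for r in range(p):
        out[r][r] = blocks[r]
    for step in range(p - 1):
        for r in range(p):
            src = (r - 1) % p
            idx = (src - step) % p        # block src obtained `step` steps ago
            out[r][idx] = out[src][idx]
    return out

print(ring_allgather(["a", "b", "c", "d"])[0])
```

Each step moves only one block per rank over a nearest-neighbour link, which keeps per-step resource usage constant, the same property the paper's non-blocking allgather aims for even with multiple operations in flight.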

## Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes

Richtárik, Peter; Takáč, Martin; Ahipaşaoğlu, Selin Damla
Type: Journal Article
Search relevance: 58.746416%
Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two different ways (constraint, penalty). Three of our formulations, notably the one with the L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via a natural alternating maximization (AM) method. We show that the AM method is nontrivially equivalent to GPower (Journée et al.; JMLR 11:517-553, 2010) for all our formulations. Besides this, we provide 24 efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster) for each of the 8 problems. Parallelism in the methods is aimed at i) speeding up computations (our GPU code can be 100 times faster than an efficient serial code written in C++), ii) obtaining solutions explaining more variance, and iii) dealing with big data problems (our cluster code is able to solve a 357 GB problem in about a minute).; Comment: 20 pages...
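A single-vector iteration in the spirit of the AM method can be sketched as alternating two easy maximizations: multiply by the covariance matrix, then project onto s-sparse unit vectors (keep the s largest-magnitude entries, zero the rest, renormalize). This is the truncated-power-iteration view of the L0-constrained, L2-variance case; it is an illustration, not the paper's code, and the covariance matrix below is made up.

```python
import math

# Alternating steps for one sparse loading vector:
#   y = cov @ x            (maximize over the auxiliary variable)
#   x = normalize(top_s(y))  (maximize over s-sparse unit vectors)

def sparse_pc(cov, s, iters=100):
    n = len(cov)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
        keep = sorted(range(n), key=lambda i: -abs(y[i]))[:s]
        x = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in x)) or 1.0
        x = [v / norm for v in x]
    return x

# Covariance with a strong 2-variable block and a weak third variable.
cov = [[2.0, 1.9, 0.0],
       [1.9, 2.0, 0.0],
       [0.0, 0.0, 0.5]]
print(sparse_pc(cov, s=2))
```

With s=2 the iteration locks onto the correlated pair and zeroes the weak variable, illustrating how the sparsity constraint selects loadings.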

## WinBioinfTools: Bioinformatics Tools for Windows High Performance Computing Server 2008

Abouelhoda, Mohamed; Mohamed, Hisham
Type: Journal Article
Search relevance: 49.79872%
Open source bioinformatics tools running under MS Windows are rare, and those running under a Windows HPC cluster are almost non-existent. This is despite the fact that Windows is the most popular operating system among life scientists. Therefore, in this initiative we introduce WinBioinfTools, a toolkit containing a number of bioinformatics tools running under Windows High Performance Computing Server 2008. It is an open-source package to which users and developers can contribute. We currently start with three programs from the area of sequence analysis: 1) CoCoNUT for pairwise genome comparison, 2) parallel BLAST for biological database search, and 3) parallel global pairwise sequence alignment. In this report, we focus on technical aspects of how some components of these tools were ported from the Linux/Unix environment to run under Windows. We also show the advantages of using the Windows HPC Cluster 2008. We demonstrate by experiments the performance gain achieved when using a computer cluster rather than a single machine. Furthermore, we show the results of comparing the performance of WinBioinfTools on Windows and Linux clusters.

## Parallel degree computation for solution space of binomial systems with an application to the master space of $\mathcal{N}=1$ gauge theories

Chen, Tianran; Mehta, Dhagash
Type: Journal Article
Language: Portuguese
Search relevance: 58.591934%
The problem of solving a system of polynomial equations is one of the most fundamental problems in applied mathematics. Among these, systems of binomial equations form an important subclass for which specialized techniques exist. For both theoretical and applied purposes, the degree of the solution set of a system of binomial equations often plays an important role in understanding the geometric structure of the solution set. Its computation, however, is computationally intensive. This paper proposes a specialized parallel algorithm for computing the degree on GPUs that takes advantage of the massively parallel nature of GPU devices. The preliminary implementation shows remarkable efficiency and scalability compared to the closest CPU-based counterpart. Applied to the "master space problem of $\mathcal{N}=1$ gauge theories", the GPU-based implementation achieves a nearly 30-fold speedup over its CPU-only counterpart, enabling the discovery of previously unknown results. Equally important is the far superior scalability: with merely 3 GPU devices on a single workstation, the GPU-based implementation shows better performance, on certain problems, than a small cluster totaling 100 CPU cores.; Comment: 27 pages...

## A Parallel Framework for Parametric Maximum Flow Problems in Image Segmentation

Olaru, Vlad; Florea, Mihai; Sminchisescu, Cristian
Type: Journal Article
Language: Portuguese
Search relevance: 48.8835%
This paper presents a framework that supports the implementation of parallel solutions for the widespread parametric maximum flow computational routines used in image segmentation algorithms. The framework is based on supergraphs, a special construction combining several image graphs into a larger one, and works on various architectures (multi-core or GPU), either locally or remotely in a cluster of computing nodes. The framework can also be used for performance evaluation of parallel implementations of maximum flow algorithms. We present the case study of a state-of-the-art image segmentation algorithm based on graph cuts, Constrained Parametric Min-Cut (CPMC), that uses the parallel framework to solve parametric maximum flow problems, based on a GPU implementation of the well-known push-relabel algorithm. Our results indicate that real-time implementations based on the proposed techniques are possible.
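The primitive underlying graph-cut segmentation such as CPMC is a source/sink min-cut, i.e. a maximum flow computation on an image graph. For readability the sketch below uses Edmonds-Karp (BFS augmenting paths) on a tiny adjacency matrix rather than the push-relabel algorithm the paper parallelizes on GPUs; the graph is made up.

```python
from collections import deque

# Max-flow/min-cut sketch (Edmonds-Karp). In segmentation, node 0 (source)
# and node n-1 (sink) represent labels; the min-cut separating them gives
# the pixel labeling.

def max_flow(cap, s, t):
    n = len(cap)
    flow = 0
    residual = [row[:] for row in cap]
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:      # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if residual[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow                        # no augmenting path: done
        bottleneck, v = float("inf"), t        # find the path's bottleneck
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t                                  # apply it along the path
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# 4-node example: source 0, sink 3.
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))
```

A parametric solver, as in the framework above, re-solves this problem for a family of source/sink capacities, which is what makes reusing work across instances (e.g. via supergraphs) attractive.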

## Searching for Globally Optimal Functional Forms for Inter-Atomic Potentials Using Parallel Tempering and Genetic Programming

Slepoy, A.; Thompson, A. P.; Peters, M. D.
Type: Journal Article
Search relevance: 49.12466%
We develop a Genetic Programming-based methodology that enables the discovery of novel functional forms for classical inter-atomic force-fields used in molecular dynamics simulations. Unlike previous efforts in the field, which fit only the parameters of fixed functional forms, we instead use a novel algorithm to search the space of many possible functional forms. While a follow-on practical procedure will use experimental and ab initio data to find an optimal functional form for a force-field, we first validate the approach using a manufactured solution. This validation has the advantage of a well-defined metric of success. We manufactured a training set of atomic coordinate data with an associated set of global energies using the well-known Lennard-Jones inter-atomic potential. We performed an automatic functional form fitting procedure starting with a population of random functions, using a genetic programming functional formulation and a parallel tempering Metropolis-based optimization algorithm. Our massively parallel method independently discovered the Lennard-Jones function after searching for several hours on 100 processors and covering a minuscule portion of the configuration space. We find that the method is suitable for unsupervised discovery of functional forms for inter-atomic potentials/force-fields. We also find that our parallel tempering Metropolis-based approach significantly improves the optimization convergence time...
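The manufactured-solution target in the study above is the standard 12-6 Lennard-Jones pair potential; the GP search is validated by checking it can rediscover exactly this expression from the generated energies.

```python
# 12-6 Lennard-Jones pair potential: V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6).

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The potential crosses zero at r = sigma and has its minimum -epsilon
# at r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones(r_min))  # approximately -1.0
```

Because the target has this known minimum and zero crossing, a rediscovered candidate function can be scored against it with a well-defined error metric.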

## A Hybrid Parallelization of AIM for Multi-Core Clusters: Implementation Details and Benchmark Results on Ranger

Wei, Fangzhou; Yılmaz, Ali E.
Type: Journal Article
Search relevance: 48.68004%
This paper presents implementation details and empirical results for a hybrid message passing and shared memory parallelization of the adaptive integral method (AIM). AIM is implemented on a (near) petaflop supercomputing cluster of quad-core processors, and its accuracy, complexity, and scalability are investigated by solving benchmark scattering problems. The timing and speedup results on up to 1024 processors show that the hybrid MPI/OpenMP parallelization of AIM exhibits better strong scalability (fixed problem size speedup) than pure MPI parallelization when multiple cores are used on each processor.; Comment: 24 pages, 3 tables, 9 figures. Due to space constraints, some implementation details and empirical data are omitted in another paper by the authors (reference [1]), which has been submitted to Parallel Computing. This paper serves as a major reference with the implementation details and comprehensive empirical data.

## Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns

Yokota, Rio; Bardhan, Jaydeep P.; Knepley, Matthew G.; Barba, L. A.; Hamada, Tsuyoshi
Type: Journal Article
Language: Portuguese
Search relevance: 48.650166%
We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to BEM. The hardware acceleration is achieved through graphics processors, GPUs. We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. The applications demonstrated include the electrostatics of protein--drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied in a cluster at the Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Delicate tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256 and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs. We are currently adapting our BEM software to solve the linearized Poisson-Boltzmann equation for dilute ionic solutions...

## Efficient implementation of the overlap operator on multi-GPUs

Alexandru, Andrei; Lujan, Michael; Pelissier, Craig; Gamari, Ben; Lee, Frank X.
Type: Journal Article
Search relevance: 38.849321%
Lattice QCD calculations were one of the first applications to show the potential of GPUs in the area of high performance computing. Our interest is to find ways to effectively use GPUs for lattice calculations using the overlap operator. The large memory footprint of these codes requires the use of multiple GPUs in parallel. In this paper we show the methods we used to implement this operator efficiently. We run our codes both on a GPU cluster and a CPU cluster with similar interconnects. We find that to match performance the CPU cluster requires 20-30 times more CPU cores than GPUs.; Comment: 8 pages with 10 figures; accepted for presentation at the 2011 Symposium on Application Accelerators in High Performance Computing (Knoxville, July 19-20, 2011)

## How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

Cruz, Felipe A.; Layton, Simon K.; Barba, Lorena A.
Type: Journal Article
Language: Portuguese
Search relevance: 38.885034%
Computing on graphics processors is perhaps one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. As then, the opportunity comes with challenges. Formulating scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (the fast multipole method and the fast Gauss transform) and applied algorithmic redesign to attain performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU. The end result has been GPU kernels that run at over 500 gigaflops on one NVIDIA Tesla C1060 card, thereby reaching close to the practical peak. We can confidently say that GPU computing is not just a passing fad; it is truly an irresistible trend in high-performance computing.

## Scaling Datalog for Machine Learning on Big Data

Bu, Yingyi; Borkar, Vinayak; Carey, Michael J.; Rosen, Joshua; Polyzotis, Neoklis; Condie, Tyson; Weimer, Markus; Ramakrishnan, Raghu
Type: Journal Article
Language: Portuguese
Search relevance: 38.776687%
In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for the use of recursive queries to program a variety of machine learning systems. By taking this approach, database query optimization techniques can be utilized to identify effective execution plans, and the resulting runtime plans can be executed on a single unified data-parallel query processing engine. As a proof of concept, we consider two programming models, Pregel and Iterative Map-Reduce-Update, from the machine learning domain, and show how they can be captured in Datalog, tuned for a specific task, and then compiled into an optimized physical plan. Experiments performed on a large computing cluster with real data demonstrate that this declarative approach can provide very good performance while offering both increased generality and programming ease.
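The recursive-query idea can be made concrete with the classic Datalog program for reachability, evaluated by naive fixpoint iteration. The sketch below is illustrative; engines like the one argued for above optimize and parallelize exactly this kind of recursion (e.g. via semi-naive evaluation and distributed joins).

```python
# Datalog program:
#   path(X, Y) :- edge(X, Y).
#   path(X, Y) :- path(X, Z), edge(Z, Y).
# Naive fixpoint: join the current `path` relation with `edge` and add
# new facts until nothing changes.

def transitive_closure(edges):
    path = set(edges)
    while True:
        new = {(x, w) for (x, y) in path for (z, w) in edges if y == z}
        if new <= path:
            return path
        path |= new

edges = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(edges)))
```

Iterative ML algorithms fit the same shape: the "edges" become training data, the joined relation becomes the model state, and the fixpoint loop becomes the convergence loop that a query optimizer can plan.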

## Scheduling different classes of applications in non-dedicated environments considering multicore processors

García Gutiérrez, José Ramón
Source: Bellaterra: Universitat Autònoma de Barcelona Publisher: Bellaterra: Universitat Autònoma de Barcelona
Type: Electronic theses and dissertations; info:eu-repo/semantics/doctoralThesis Format: application/pdf