Engineers who need their programs to run faster or to process larger datasets would prefer that every program use parallelism automatically. However, automatic parallelization remains a subject of basic computer science research. The responsibility for running programs in parallel across multiple cores or computers is therefore shared by the designers of programming languages and environments and by their users.
Parallel computing capabilities
Engineers are primarily concerned with solving complex problems within their technical domains.
Scalability and portability are key requirements for a parallel computing environment because most engineers want parallel applications to use the available resources seamlessly. Engineers use a variety of operating systems and hardware, and they do not want to change code when migrating applications from one operating system to another or from a multicore desktop computer to a large cluster. The need for system-specific knowledge about a cluster is a roadblock for an engineer who wants to use remote cluster hardware. Most engineers prefer that the cluster administrator write system-specific scripts, set environment variables, and manage job queues. Separating user and administrator tasks is therefore an important requirement.
Challenges of specialized technologies
There are a number of parallel computing technologies available to an engineer. Some, such as Intel TBB and Cilk, enable programmers to write parallel programs that use multicore computers. However, the same programs cannot scale up to use remote resources such as clusters. Often they need to be rewritten to use other technologies such as MPI, which are complex and require specialized knowledge. This workflow violates the requirement that the same parallel program scales from workstations to clusters without any recoding.
Specialized technologies such as MPI have the additional drawback of requiring the parallel program user to have some knowledge of the system on which it will be run. This reduces the portability of code and the number of people who can use it.
Scalable parallel computing
MATLAB offers different levels of control to a programmer who wishes to convert a program to run efficiently in parallel. Some programs require no recoding, while others require the use of low-level programming methods. The most commonly used programming techniques involve adding annotations to code. For example, a “for” loop with independent iterations can be annotated as a “parfor” loop. At runtime, the computing environment will attempt to run the loop iterations in parallel across multiple MATLAB “workers,” a worker being an execution engine that runs in the background on a workstation or cluster.
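A minimal sketch of the annotation style described above; `expensiveComputation` is a hypothetical stand-in for the user's own function:

```matlab
% Serial loop: each iteration is independent of the others.
results = zeros(1, 100);
for i = 1:100
    results(i) = expensiveComputation(i);   % hypothetical user function
end

% Parallel version: only the loop keyword changes.
% At runtime, iterations are distributed across the available MATLAB workers.
parfor i = 1:100
    results(i) = expensiveComputation(i);
end
```

Because the annotation leaves the loop body untouched, the same code runs serially when no workers are available.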
An engineer who writes a parallel MATLAB program does not need to know anything about where the program eventually will be run because the MATLAB programming language is separated from the execution environment; the same parallel program can run on multicore workstations as well as on clusters, grids, and clouds. The portability of MATLAB programs across different hardware and operating systems facilitates sharing and deploying parallel programs. For example, an engineer who develops a program on a Windows workstation can run the same program on a Linux cluster or share it with a colleague who uses a Mac.
Scaling up a parallel MATLAB program from workstation to cluster does not require the user to have knowledge of the cluster because MATLAB allows for the roles of user and cluster administrator to be independent of each other. The administrator stores information about the cluster in a configuration file (e.g., how to submit jobs and transfer data) and sends it to cluster users. A user could receive several configurations, one for each remote resource. The user imports the configurations into the MATLAB user interface and selects one of them as the resource on which to run the parallel program.
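A sketch of that workflow, assuming the administrator has shared a profile file named `MyCluster.mlsettings` (the file and profile names are hypothetical):

```matlab
% One-time step: import the cluster profile supplied by the administrator.
profileName = parallel.importProfile('MyCluster.mlsettings');

% Select the imported profile and open a pool of workers on that cluster.
c = parcluster(profileName);
parpool(c);

% From here on, parfor loops and other parallel constructs
% run on the cluster instead of on the local workstation.
```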
The typical workflow of an engineer who wishes to solve a large technical problem in MATLAB is:
1. The user writes a serial program and then parallelizes it by using constructs such as parfor.
2. The user tests and debugs the program with small inputs from a workstation.
3. The user increases the size of inputs to the program, imports a configuration for a remote cluster, and reruns the program on that cluster.
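The three steps above can be condensed into a sketch like the following, where `runSimulation` and the profile name `MyCluster` are hypothetical:

```matlab
% Step 2: test and debug with small inputs on the local workstation.
parpool('local');
smallResult = runSimulation(1:10);      % hypothetical parallelized function

% Step 3: scale up to the cluster with larger inputs.
delete(gcp('nocreate'));                % close the local pool
parpool(parcluster('MyCluster'));       % use the imported cluster profile
largeResult = runSimulation(1:10000);
```

The program itself is unchanged between steps; only the selected resource and the input size differ.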
A real-world story
The Optics group at the University of Bristol performs research on semiconductor vertical-cavity surface-emitting lasers (VCSELs), which are widely used in fiber-optic telecommunication networks. The group develops new generations of VCSELs with photonic crystals (PC-VCSELs). To perform numerical simulations of PC-VCSEL models, the group used MATLAB solvers for 2-D scalar Helmholtz partial differential equations and ordinary differential laser rate equations.
Approximate solution times varied from 10 to 700 minutes for some models and from four to 60 hours for others. Since these models used many input parameters, computing PC-VCSEL characteristics and optimizing them required hundreds of solutions of the equations. Performing the computations on a laboratory workstation would have taken days.
Researchers parallelized the MATLAB program by structuring it as a job that computed parameters of optical modes of PC-VCSELs N times. Therefore, there were N tasks, each of which computed parameters of optical modes. Researchers first tested and debugged the program by using multiple MATLAB workers on a workstation. Once the correctness of the parallel MATLAB program had been established, it was run on a grid system provided by Enabling Grids for E-sciencE (EGEE), the consortium that provides more than 70,000 processor cores to users worldwide. By using a portion of this infrastructure, the time for computation of 300 tasks was reduced from more than five days to just six hours – a speedup of 21 times.
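One way to structure such an N-task job is with the Parallel Computing Toolbox job/task API; `computeModes` and its argument are hypothetical stand-ins for the group's solver:

```matlab
N = 300;
c = parcluster;                 % cluster or grid profile selected by the user
job = createJob(c);

% One task per parameter set; each computes optical-mode parameters.
for k = 1:N
    createTask(job, @computeModes, 1, {k});   % hypothetical solver function
end

submit(job);
wait(job);
results = fetchOutputs(job);    % N-by-1 cell array of task results
```

Because the tasks are independent, the scheduler can spread them across however many workers the grid makes available.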