Parallel computing technologies offer engineers the means to accelerate solving computational problems. The ability to solve large problems by scaling computer programs to run on multicore workstations, clusters, grids, and clouds can help engineers gain research and competitive advantages.

Engineers who need their programs to run faster or to process larger datasets would prefer that every program use parallelism automatically. However, automatic parallelization is still a subject of basic computer science research. The responsibility for using parallelism to run programs across multiple cores or computers is therefore shared by the designers of programming languages and environments and their users.

Parallel computing capabilities
Engineers are primarily concerned with solving complex problems within their technical domains. While some are experienced programmers, most prefer to be shielded from the finer points of parallel programming, such as multithreading, synchronization, and data management across clusters. For this reason, a parallel computing environment needs to make it as easy as possible for engineers to write, use, and maintain parallel programs.

A top view of a PC-VCSEL shows photonic crystal holes with 2 µm diameter and 4 µm center-to-center spacing. (Images courtesy of The MathWorks Inc.)

Scalability and portability are key requirements for a parallel computing environment because most engineers want the parallel applications to seamlessly use the available resources. Engineers use a variety of operating systems and hardware. They do not want to change code when migrating applications from one operating system to another or from a multicore desktop computer to a large cluster. The need to have specific knowledge about a cluster is a roadblock for an engineer who wants to use remote cluster hardware. Most engineers prefer that the cluster administrator write system-specific scripts, set environment variables, and manage job queues. Separating user and administrator tasks is an important requirement.

Challenges of specialized technologies
There are a number of parallel computing technologies available to an engineer. Some, such as Intel TBB and Cilk, enable programmers to write parallel programs that use multicore computers. However, the same programs cannot scale up to use remote resources such as clusters. Often they need to be rewritten to use other technologies such as MPI, which are complex and require specialized knowledge. This workflow violates the requirement that the same parallel program scales from workstations to clusters without any recoding.

Specialized technologies such as MPI have the additional drawback of requiring the parallel program user to have some knowledge of the system on which it will be run. This reduces the portability of code and the number of people who can use it.

Scalable parallel computing
MATLAB offers different levels of control to a programmer who wishes to convert a program to run efficiently in parallel. Some programs require no recoding, while others require the use of low-level programming methods. The most commonly used programming techniques involve adding annotations to code. For example, a “for” loop with independent iterations can be annotated as a “parfor” loop. At runtime, the computing environment will attempt to run the loop iterations in parallel across multiple MATLAB “workers,” a worker being an execution engine that runs in the background on a workstation or cluster.
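As a minimal sketch of this annotation style (the loop body and the function name `expensiveComputation` are illustrative, not from the article), converting an independent-iteration loop requires changing only the loop keyword:

```matlab
% Serial version: each iteration is independent of the others.
results = zeros(1, 100);
for i = 1:100
    results(i) = expensiveComputation(i);   % hypothetical per-iteration work
end

% Parallel version: only "for" becomes "parfor". At runtime, the
% iterations are distributed across the available MATLAB workers.
results = zeros(1, 100);
parfor i = 1:100
    results(i) = expensiveComputation(i);   % hypothetical per-iteration work
end
```

Because the language separates the annotation from the execution environment, the same `parfor` loop runs on local workers or on cluster workers without modification.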
Computation time on one workstation core is compared to time on an EGEE grid.

An engineer who writes a parallel MATLAB program does not need to know anything about where the program eventually will be run because the MATLAB programming language is separated from the execution environment; the same parallel program can run on multicore workstations as well as on clusters, grids, and clouds. The portability of MATLAB programs across different hardware and operating systems facilitates sharing and deploying parallel programs. For example, an engineer who develops a program on a Windows workstation can run the same program on a Linux cluster or share it with a colleague who uses a Mac.

Scaling up a parallel MATLAB program from workstation to cluster does not require the user to have knowledge of the cluster because MATLAB allows for the roles of user and cluster administrator to be independent of each other. The administrator stores information about the cluster in a configuration file (e.g., how to submit jobs and transfer data) and sends it to cluster users. A user could receive several configurations, one for each remote resource. The user imports the configurations into the MATLAB user interface and selects one of them as the resource on which to run the parallel program.

The typical workflow of an engineer who wishes to solve a large technical problem in MATLAB is:
1. The user writes a serial program and then parallelizes it by using constructs such as parfor.
2. The user tests and debugs the program with small inputs on a workstation.
3. The user increases the size of inputs to the program, imports a configuration for a remote cluster, and reruns the program on that cluster.
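Steps 2 and 3 might be sketched as follows. The profile name `bristolCluster`, the pool sizes, and the function `runSimulation` are hypothetical, and the API names vary across MATLAB releases (older releases used `matlabpool` and "configurations"; newer ones use `parpool` and "profiles"):

```matlab
% Step 2: test locally with a small pool of workers on the workstation.
parpool('local', 4);                        % start 4 workers on this machine
smallResult = runSimulation(smallInputs);   % hypothetical user function
delete(gcp);                                % shut down the local pool

% Step 3: rerun the unchanged program on the remote cluster by
% selecting the imported cluster profile instead of 'local'.
parpool('bristolCluster', 64);              % hypothetical imported profile
largeResult = runSimulation(largeInputs);
delete(gcp);
```

The point of the sketch is that only the pool selection changes between the workstation and cluster runs; the simulation code itself is untouched.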

A real-world story
The Optics group at the University of Bristol performs research on semiconductor vertical-cavity surface-emitting lasers (VCSELs), which are widely used in fiber-optic telecommunication networks. The group develops new generations of VCSELs with photonic crystals (PC-VCSELs). To perform numerical simulations of PC-VCSEL models, the group used MATLAB solvers for 2-D scalar Helmholtz partial differential equations and ordinary differential laser rate equations.

The approximate time to solve the model equations varied from 10 to 700 minutes for some models and from 4 to 60 hours for others. Because these models used many input parameters, computing and optimizing the PC-VCSEL characteristics required hundreds of solutions of the equations. Performing the computations on a laboratory workstation would have taken days.

Researchers parallelized the MATLAB program by structuring it as a job of N independent tasks, each of which computed the parameters of the optical modes of a PC-VCSEL. They first tested and debugged the program using multiple MATLAB workers on a workstation. Once the correctness of the parallel MATLAB program had been established, it was run on a grid system provided by Enabling Grids for E-sciencE (EGEE), a consortium that provides more than 70,000 processor cores to users worldwide. By using a portion of this infrastructure, the time to compute 300 tasks was reduced from more than five days to just six hours – a speedup of 21 times.
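A job of N independent tasks like the one described above can be sketched with the Parallel Computing Toolbox job/task API. The profile name `myGridProfile`, the function `computeModes`, and the array `inputParams` are hypothetical stand-ins (the article does not show the group's code), and older releases used `findResource` rather than `parcluster`:

```matlab
% Build one job containing N independent tasks, each computing the
% optical-mode parameters for one set of input parameters.
c = parcluster('myGridProfile');   % hypothetical cluster profile
job = createJob(c);
N = 300;
for k = 1:N
    % computeModes is a hypothetical stand-in for the Helmholtz /
    % rate-equation solver; one output argument is requested per task.
    createTask(job, @computeModes, 1, {inputParams(k)});
end
submit(job);                       % the scheduler farms tasks out to workers
wait(job);                         % block until all N tasks finish
out = fetchOutputs(job);           % N-by-1 cell array of task results
delete(job);
```

Because the tasks are independent, the scheduler is free to run them on as many workers as the grid makes available, which is what produced the reported 21x speedup.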