Engineers who need their computer programs to run faster or to process larger datasets would prefer that every program automatically use parallelism. However, automatic parallelization is still a subject of basic computer science research. The responsibility for using parallelism to run programs across multiple cores or computers is therefore shared by the designers of programming languages/environments and their users.
Parallel computing capabilities
Engineers are primarily concerned with solving complex problems within their technical domains. While some are willing to learn specialized parallel programming technologies, most prefer that the computing environment handle the details of parallel execution for them.
Scalability and portability are key requirements for a parallel computing environment because most engineers want the parallel applications to seamlessly use the available resources. Engineers use a variety of operating systems and hardware. They do not want to change code when migrating applications from one operating system to another or from a multicore desktop computer to a large cluster. The need to have specific knowledge about a cluster is a roadblock for an engineer who wants to use remote cluster hardware. Most engineers prefer that the cluster administrator write system-specific scripts, set environment variables, and manage job queues. Separating user and administrator tasks is an important requirement.
Specialized technologies challenges
There are a number of parallel computing technologies available to an engineer. Some, such as Intel TBB and Cilk, enable programmers to write parallel programs that use multicore computers. However, the same programs cannot scale up to use remote resources such as clusters. Often they need to be rewritten to use other technologies such as MPI, which are complex and require specialized knowledge. This workflow violates the requirement that the same parallel program scales from workstations to clusters without any recoding.
Specialized technologies such as MPI have the additional drawback of requiring the parallel program user to have some knowledge of the system on which it will be run. This reduces the portability of code and the number of people who can use it.
Scalable parallel computing
MATLAB offers different levels of control to a programmer who wishes to convert a program to run efficiently in parallel. Some programs require no recoding, while others require the use of low-level programming methods. The most commonly used programming techniques involve adding annotations to code. For example, a “for” loop with independent iterations can be annotated as a “parfor” loop. At runtime, the computing environment will attempt to run the loop iterations in parallel across multiple MATLAB “workers,” a worker being an execution engine that runs in the background on a workstation or cluster.
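For instance, a loop whose iterations are independent of one another can be parallelized by changing a single keyword. A minimal sketch, in which `some_simulation` is a hypothetical user function:

```matlab
% Serial version: each iteration is independent of the others.
results = zeros(1, 1000);
for i = 1:1000
    results(i) = some_simulation(i);   % hypothetical user function
end

% Parallel version: only the loop keyword changes. At runtime the
% iterations are distributed across the available MATLAB workers.
parfor i = 1:1000
    results(i) = some_simulation(i);
end
```

Because `parfor` falls back to serial execution when no workers are available, the same source file runs unchanged on a single core, a multicore workstation, or a cluster.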
An engineer who writes a parallel MATLAB program does not need to know anything about where the program eventually will be run because the MATLAB programming language is separated from the execution environment; the same parallel program can run on multicore workstations as well as on clusters, grids, and clouds. The portability of MATLAB programs across different hardware and operating systems facilitates sharing and deploying parallel programs. For example, an engineer who develops a program on a Windows workstation can run the same program on a Linux cluster or share it with a colleague who uses a Mac.
Scaling up a parallel MATLAB program from workstation to cluster does not require the user to have knowledge of the cluster because MATLAB allows for the roles of user and cluster administrator to be independent of each other. The administrator stores information about the cluster in a configuration file (e.g., how to submit jobs and transfer data) and sends it to cluster users. A user could receive several configurations, one for each remote resource. The user imports the configurations into the MATLAB user interface and selects one of them as the resource on which to run the parallel program.
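In current MATLAB releases this configuration mechanism is exposed as cluster profiles. A minimal sketch of the user's side, assuming the administrator has supplied a profile file named `HPCCluster.mlsettings` (a hypothetical name):

```matlab
% Import the profile file supplied by the cluster administrator
% (the file name here is hypothetical).
profileName = parallel.importProfile('HPCCluster.mlsettings');

% Select the imported profile and start a pool of workers on that cluster.
c = parcluster(profileName);
pool = parpool(c);

% parfor loops in the program now run on the cluster workers;
% the program itself is unchanged.

delete(pool);   % release the workers when finished
```

The user never sees the scheduler commands, environment variables, or queue settings; those live inside the profile the administrator prepared.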
The typical workflow of an engineer who wishes to solve a large technical problem in MATLAB is:
1. The user writes a serial program and then parallelizes it by using constructs such as parfor.
2. The user tests and debugs the program with small inputs from a workstation.
3. The user increases the size of inputs to the program, imports a configuration for a remote cluster, and reruns the program on that cluster.
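Step 3 can also be sketched as a batch submission, which sends the unchanged script to the cluster and retrieves its results when the job finishes (`HPCCluster` and `my_analysis` are hypothetical names):

```matlab
c = parcluster('HPCCluster');               % imported cluster profile
job = batch(c, 'my_analysis', 'Pool', 31);  % run the script with a 31-worker pool
wait(job);                                  % block until the job completes
load(job);                                  % load the script's variables locally
delete(job);                                % clean up the job on the cluster
```

With `batch`, the workstation is free while the cluster works; the user can close MATLAB and retrieve the results later.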
A real-world story
The Optics group at the University of Bristol performs research on semiconductor vertical-cavity surface-emitting lasers (VCSELs), which are widely used in fiber-optic telecommunication networks. The group develops new generations of VCSELs with photonic crystals (PC-VCSELs). To perform numerical simulations of PC-VCSEL models, the researchers used MATLAB solvers for the 2-D scalar Helmholtz partial differential equation and the ordinary differential laser rate equations.
The approximate solution time of the model equations ranged from 10 to 700 minutes for some models and from 4 to 60 hours for others. Because these models used many input parameters, computing and optimizing PC-VCSEL characteristics required hundreds of solutions of the equations. Performing the computations on a laboratory workstation would have taken days.
The researchers parallelized the MATLAB program by structuring it as a job containing N independent tasks, each of which computed the optical-mode parameters of the PC-VCSEL for one set of inputs. They first tested and debugged the program using multiple MATLAB workers on a workstation. Once the correctness of the parallel MATLAB program had been established, it was run on a grid system provided by Enabling Grids for E-sciencE (EGEE), the consortium that provides more than 70,000 processor cores to users worldwide. By using a portion of this infrastructure, the computation time for 300 tasks was reduced from more than five days to just six hours – a speedup of 21 times.
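The article does not show the researchers' code, but a job with N independent tasks can be sketched with MATLAB's job and task API roughly as follows (`compute_modes` and `params` are hypothetical names):

```matlab
c = parcluster;                 % cluster or grid profile selected by the user
job = createJob(c);
N = 300;
for k = 1:N
    % Each task computes the optical-mode parameters for one
    % combination of input parameters.
    createTask(job, @compute_modes, 1, {params(k)});
end
submit(job);
wait(job);
results = fetchOutputs(job);    % one output per task
```

Because the tasks share no state, the grid scheduler is free to run them on whichever processors are available, which is what makes the near-linear speedup possible.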