AN OpenMP IMPLEMENTATION OF THE MuMM SOLVER FOR POROUS MEDIA FLOW PROBLEMS

Multiscale methods are usually developed for solving second-order elliptic problems whose coefficients are of a heterogeneous, multiscale nature. The Multiscale Mixed Method (MuMM) was introduced aiming at the efficient and accurate approximation of large flow problems in highly heterogeneous porous media. In the MuMM numerical solver, mixed multiscale basis functions are first constructed, and then global domain decomposition iterations are performed to compute the discrete solution of the problem. However, this iterative procedure is a time-consuming step. In this paper, the authors improve the MuMM solver by implementing parallel computations in the global iterative procedure. The parallel version of the solver employs the Open Multi-Processing (OpenMP) application programming interface. As the numerical results indicate, the OpenMP implementation significantly reduces the computational effort required to perform the domain decomposition iterations.


INTRODUCTION
The development of multiscale numerical methods for second-order elliptic differential equations arising in porous media flow problems has attracted the attention of several research groups. The reader is referred to (CHU et al., 2008; EFENDIEV; HOU, 2009; KIPPE; AARNES; LIE, 2008; WHEELER; WILDEY; XUE, 2010) for an overview of related procedures to address these problems.
In this paper, the authors focus on the Multiscale Mixed Method (MuMM), a numerical solver that uses a non-overlapping domain decomposition iterative procedure in which the spatial discretization of local problems uses hybridized mixed finite elements at the fine scale and Robin boundary conditions at the coarse scale (FRANCISCO et al., 2014). Mixed multiscale basis functions are used to compute the discrete solutions of local problems. An analogous use of the concept of multiscale flux basis functions can be found in (GANIS; YOTOV, 2009). The MuMM solver provides fast and accurate approximations for second-order elliptic equations.
The multiscale procedure can take advantage of heterogeneous processing units, which are relatively inexpensive and have great computational power.
The MuMM solver is enhanced here by introducing parallel computing into the algorithm. The parallel algorithm takes advantage of the Open Multi-Processing (OpenMP) application programming interface. The OpenMP implementation significantly reduces the computational effort needed to generate a good-quality numerical approximation of the solution at a very fine scale (CHAPMAN; JOST; PAS, 2008). This paper is organized as follows. Section 2 describes the multiscale procedure for the porous media flow problem and briefly presents the variational formulations involved in the construction of the mixed multiscale basis functions and in the iterative procedure. Section 3 describes the OpenMP parallel implementation of the MuMM solver. Section 4 discusses the numerical results of the implementation. Finally, Section 5 presents the concluding remarks.

THE GLOBAL PROBLEM
The single-phase, incompressible flow in porous media is modeled by the pressure-velocity system of equations on a bounded domain Ω (CHEN; HUAN; MA, 2006), expressed by

u = −K∇p,  ∇·u = q  in Ω,

where K is the permeability divided by the viscosity, u is the Darcy velocity, p is the pressure, and q is the source term. Dirichlet and Neumann boundary conditions on ∂Ω are respectively expressed as

p = p_b on Γ_D,  u·ν = u_b on Γ_N,

where ∂Ω = Γ_D ∪ Γ_N, Γ_D ∩ Γ_N = ∅, and ν is the outward unit normal vector to ∂Ω. Over the domain, fine and coarse rectangular elements Ω_j^γ, j = 1, ..., M_γ, γ = f, c, are constructed such that Ω̄ = ∪_{j=1}^{M_γ} Ω̄_j^γ, with Ω_j^γ ∩ Ω_k^γ = ∅ for j ≠ k. The interface between two adjacent elements is denoted by Γ_jk^γ = ∂Ω_j^γ ∩ ∂Ω_k^γ.

Global Coarse-Mesh Decomposition
First, a global coarse-mesh decomposition is performed. Local problems are posed on the coarse elements Ω_m^c, where the pressure and velocity spaces are defined respectively as W = L²(Ω_m^c) and V = H(div; Ω_m^c). The mixed finite element formulation of a local problem is to find {u_m, p_m} ∈ V × W such that (JR. et al., 1993)

(K⁻¹ u_m, v) − (p_m, ∇·v) + ⟨p_m, v·ν⟩_{∂Ω_m^c} = 0 for all v ∈ V,
(∇·u_m, w) = (q, w) for all w ∈ W,

with the imposition of the generic Robin boundary condition

−a u_m·ν + p_m = g on ∂Ω_m^c,

where a and g are given functions.
Revista Mundi, Engenharia e Gestão, Paranaguá, PR, v. 5, n. 6, p. 284-01–284-14, 2020. DOI: 10.21575/25254782rmetg2020vol5n61333

Mixed Multiscale Basis Functions
Let Γ_mn^c = ∂Ω_m^c ∩ ∂Ω_n^c, n = r, l, b, t, denote the interfaces between an element and its four adjacent neighbors (right, left, bottom, top). For efficiency purposes, the intermediate length scale H̄ (h ≤ H̄ ≤ H) is introduced, and each interface Γ_mn^c is decomposed into line segments I_mn^i, i = 1, ..., r̄, such that r̄ = H/H̄.
Next, divergence-free local problems are posed on the coarse elements Ω_m^c. The mixed finite element formulation of a new local problem is to find {ψ_mn^i, φ_mn^i} ∈ V × W such that

(K⁻¹ ψ_mn^i, v) − (φ_mn^i, ∇·v) + ⟨φ_mn^i, v·ν⟩_{∂Ω_m^c} = 0 for all v ∈ V,
(∇·ψ_mn^i, w) = 0 for all w ∈ W,

with the imposition of the Robin boundary condition

−a ψ_mn^i·ν + φ_mn^i = g_mn^i on ∂Ω_m^c,

where g_mn^i = 1 on I_mn^i and g_mn^i = 0 elsewhere on ∂Ω_m^c. Each solution {ψ_mn^i, φ_mn^i}, determined on the fine elements of Ω_m^c, defines the mixed multiscale basis function tied to the segment I_mn^i.

Global Coarse-Mesh Iteration
To solve Eqs. (3) and (4) numerically, the MuMM solver performs a global iteration in which the current solution on each local problem Ω_m^c is computed by the following linear combination (FRANCISCO et al., 2014):

u_m = Σ_n Σ_i ḡ_mn^i ψ_mn^i,  p_m = Σ_n Σ_i ḡ_mn^i φ_mn^i,

where the coefficients ḡ_mn^i are defined by averaging the Robin boundary condition on each I_mn^i at the previous iteration, that is,

ḡ_mn^i = (1/|I_mn^i|) ∫_{I_mn^i} (−ā_mn^i u·ν + p) ds,

where ā_mn^i is a positive constant on each I_mn^i.

Downscaling Solution
After convergence of the global iteration procedure is achieved, flux conservation holds at the coarse mesh. To obtain flux conservation at the fine mesh, local problems must be solved with the imposition of Neumann boundary conditions, where the boundary values are specified by averaging the velocities on each I_mn^i. With this downscaling procedure, the discrete solution of the global problem at the fine mesh is found.

OpenMP IMPLEMENTATION
In summary, the MuMM solver was developed using a non-overlapping iterative decomposition procedure with Robin boundary conditions at the coarse mesh. Mixed multiscale basis functions were constructed to represent the solutions of local problems at the fine mesh, using hybridized mixed finite elements for the spatial discretization. Figure 2 displays the flowchart of the MuMM solver through its stages and corresponding code functions.
In order to identify the loops with the longest CPU time, a computational analysis is carried out using the Performance Profiler of Visual Studio. The effect of the heterogeneity of the permeability field on the CPU time consumption of the code functions is also checked and shown in Table 1. The loops with the longest CPU time are found in GLOBAL_SOLVER. It is observed that the CPU time consumption of ITERATION_SOLVER is significant when the permeability field is heterogeneous. A parallel algorithm is developed and implemented in the computational code using OpenMP directives. To avoid performing test cases with many construct and clause combinations, the parallel for construct and specific clauses are considered the best choices for GLOBAL_SOLVER and ITERATION_SOLVER. The OpenMP parallel implementations in these code functions are described in Algorithms 1 and 2, respectively.
[Algorithms 1 and 2: OpenMP parallelization of GLOBAL_SOLVER and ITERATION_SOLVER. Step 1: Set the tolerance TOL.]

RESULTS AND DISCUSSIONS
In order to evaluate the OpenMP implementation of the MuMM solver, simulations of the single-phase, incompressible flow in a porous medium with SLAB geometry are performed. The SLAB problem is illustrated in Fig. 3. The physical domain measures 12800 × 12800 m². The left and right boundary conditions are of Dirichlet type with p_b = 1.0 Pa (injection) and p_b = 0 Pa (production), respectively. The top and bottom boundary conditions are of Neumann type with u_b = 0 (no flow). There is no source term (q = 0), and the average permeability is 2.0 × 10⁻¹¹ m². For all the simulations, H̄ = H.

Sequential Algorithm
In order to find the best balance between fine and coarse meshes, simulations of the flow problem in homogeneous porous media are performed. The best multiscale mesh arrangement is the one that presents the smallest global error when compared to a reference solution. For benchmark purposes, the problem is solved by a mixed finite element method (MFEM), which is able to provide accurate velocity fields even for highly heterogeneous porous media (JR. et al., 1993). This numerical experiment uses a sequential algorithm of the MuMM solver, considering a fine mesh of 512 × 128 elements and coarse meshes of 16 × 4, 32 × 8, 64 × 16, 128 × 32 and 256 × 64 elements. For the MFEM solver, a fine mesh of 512 × 128 elements is considered. The CPU time consumption is obtained using the omp_get_wtime function and is measured over the entire code run. Figure 4 shows the global errors of the velocity and pressure fields. As the coarse mesh is refined, the global error of both fields is reduced. The smallest global error is obtained with the coarse mesh of 256 × 64 elements. Another important parameter to observe is the CPU time. Simulations with the coarse meshes of 16 × 4, 32 × 8, 64 × 16, 128 × 32 and 256 × 64 elements have taken respectively 26, 8, 4, 15 and 117 s, while the benchmark MFEM requires a CPU time of 2900 s. Therefore, the best multiscale mesh arrangement is the one with the fine mesh of 512 × 128 elements and the coarse mesh of 256 × 64 elements.

Parallel Algorithm
The goal of this paper is to evaluate the performance of the parallel M uM M in simulations of the flow problem in highly heterogeneous porous media. The simulations are performed using the parallel M uM M with the fine mesh of 512 × 128 elements and the coarse mesh of 256 × 64 elements, considering distinct strengths of heterogeneity in permeability.
It is known that the strength of heterogeneity in permeability brings difficulties for the simulation of the flow problem. In this regard, the global error of the velocity and pressure fields can be analyzed in Fig. 5. The global error is greater as the strength of heterogeneity in permeability increases, for both the velocity and pressure fields.
Performance metrics of the parallel MuMM are presented for distinct strengths of heterogeneity in permeability. Table 2 shows the CPU time and speedup for the following strengths of heterogeneity: 0.1, 1.0, 2.0 and 4.0. The longest CPU time is found with the strength of heterogeneity equal to 4.0; for most thread counts, the greatest speedup is found with 2.0. As the number of threads increases, the CPU time becomes smaller and the speedup greater. The corresponding extreme values appear in bold face. From the information shown in Table 1, we can consider that the fraction of the CPU time spent in sequential computations is about 13%, so that about 87% is available for parallel computations. According to Amdahl's law (AMDAHL, 1967), the maximum speedup with 8 threads would then be about 4.2 for the strength of heterogeneity of 4.0. This bound differs from the measured speedup of 2.2 because a greater fraction of the CPU time is actually spent in sequential computations.
The pressure, permeability, and error fields with the strength of permeability heterogeneity of 4.0 are illustrated in Fig. 6. Comparing the pressure fields, the parallel MuMM solver with the fine mesh of 512 × 128 elements and the coarse mesh of 256 × 64 elements produces a numerical approximation of good quality relative to the benchmark MFEM. The greatest local errors can be noted in zones transverse to the flow that correspond to fronts of reversed permeability gradient in the porous medium.

CONCLUSIONS
In this paper, single-phase, incompressible flow in heterogeneous porous media is simulated. In the search for computational efficiency, the MuMM multiscale method for solving the problem is employed in a parallel version using the OpenMP interface. Numerical experiments are performed in order to verify the advantages of the parallel algorithm.
The best multiscale mesh arrangement is the one with the fine mesh of 512 × 128 elements and the coarse mesh of 256 × 64 elements. This arrangement allows the MuMM solver to accurately compute average values on the interfaces of coarse elements.
The prior analysis using the Performance Profiler of Visual Studio proved effective for finding the time-consuming loops in the MuMM solver. This computational task reveals the parts of the solver that should be implemented in parallel, as well as the directives best suited for them.
The parallel algorithm of the MuMM solver presents reasonable computational performance in good agreement with the benchmark MFEM, despite the relative global error produced in the numerical simulations. The results show that it is plausible to efficiently decrease the CPU time consumption of the algorithm by using the OpenMP interface with a large number of threads.