微波EDA网: witnessing the growth of R&D engineers!

HFSS distributed solve : Fail to get presolved data

Date: 03-30 | Compiled by: 3721RD
I am using HFSS to solve about a hundred parametric variations of a 3D EM simulation at 1 MHz. A single project file is used, and the variations are specified using 'Optimetrics'. The geometry is the same for all variations; only the material properties change. I am using 15 servers as 'Distributed Solvers' to solve all 100 variations.

To reduce the amount of computation, I choose one variation that is approximately the "average" of all 100 variations and create it as a separate project. This "average" variation is solved first. All other variations reuse its mesh and then perform a few further iterations to refine the mesh (I do this because the mesh density depends fairly strongly on the material properties). Once the two project files (the main project and the "average" project) are created and the variations are specified under 'Optimetrics' in the main project, all of this runs automatically: the main project is configured to import the mesh from the "average" project, so when the main project with 100 variations is started, it opens and solves the "average" project first and then solves the 100 variations on the distributed solvers.
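The post does not say how the "average" variation is picked. One simple way to make "approximately the average" concrete, offered only as a sketch under assumptions, is to take the variation whose parameter values lie closest to the per-parameter mean, with each parameter scaled by its range. The parameter names Spar11 to Spar22 come from the log below; the value lists and the helpers `grid_from_sweep` and `pick_representative` are hypothetical, and distance in parameter space is only a proxy for how similar the resulting meshes will be.

```python
from itertools import product

# Hypothetical sweep: parameter names come from the log excerpt, but the value
# lists are illustrative only and are not the poster's actual sweep definition.
SWEEP = {
    "Spar11": [9.999],
    "Spar12": [-0.994, -0.7, -0.25, 0.7],
    "Spar21": [1.25, 3.5, 5.0],
    "Spar22": [-0.994, -0.36, 0.37, 0.996],
}


def grid_from_sweep(sweep):
    """Expand a {name: [values]} sweep into a list of variation dicts (full factorial)."""
    names = list(sweep)
    return [dict(zip(names, combo)) for combo in product(*(sweep[n] for n in names))]


def pick_representative(variations):
    """Return the variation closest to the per-parameter mean of all variations.

    Each parameter's offset from the mean is divided by that parameter's range so
    that parameters with large numeric ranges do not dominate the distance.
    """
    names = list(variations[0])
    count = len(variations)
    mean = {n: sum(v[n] for v in variations) / count for n in names}
    # Range of each parameter; a parameter held constant gets 1.0 to avoid dividing by zero.
    span = {
        n: (max(v[n] for v in variations) - min(v[n] for v in variations)) or 1.0
        for n in names
    }

    def normalized_sq_distance(v):
        return sum(((v[n] - mean[n]) / span[n]) ** 2 for n in names)

    return min(variations, key=normalized_sq_distance)


if __name__ == "__main__":
    variations = grid_from_sweep(SWEEP)
    print(len(variations), "variations")
    print("representative:", pick_representative(variations))
```

For a full factorial grid the distance separates by parameter, so this just picks the value nearest each parameter's mean; the range scaling only matters when the list of variations is irregular.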

All this works fine, but occasionally I get an error message saying that a particular variation could not import the mesh data ('Fail to get presolved data', see the HFSS log excerpt below). The problem is not deterministic: a repeat run usually does not produce this error, or produces it for a different variation, or not at all. I suspect network problems. I have tried isolating all the machines on a single subnet with 1 Gbps Ethernet. This reduced the number of errors, but the problem persists. Using Wireshark, I found that there is a tremendous amount of data exchange going on (several tens of thousands of TCP exchanges per second). Has anyone faced a similar problem, or can anyone shed some light on this? I have been grappling with it for several months now.

Solving each variation takes a couple of GB of memory (10k to 20k mesh elements). All machines run OpenSUSE 11.4 and HFSS version 14.

Code:
Project directory: /home/hfss
-- Message Window --
      Main_project (/home/hfss/)
        HFSSDesign1 (DrivenTerminal)
          [info] Parametric Analysis on ParametricSetup1 has been started. (12:09:11 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.994' Spar21='1.25' Spar22='-0.994') has been requested on machine 192.168.2.19 (12:09:11 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.994' Spar21='1.25' Spar22='-0.36') has been requested on machine 192.168.2.32 (12:09:12 PM  Jul 08, 2015)
[...]
          [info] A variation (Spar11='9.999' Spar12='-0.7' Spar21='5' Spar22='0.37') has been requested on machine 192.168.2.19 (12:18:49 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.7' Spar21='5' Spar22='0.996') has been requested on machine 192.168.2.19 (12:18:49 PM  Jul 08, 2015)
          [error] Fail to get presolved data. -- Simulating on machine: 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.25' Spar21='1.25' Spar22='-0.994') has been requested on machine 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
          [error] Fail to get presolved data. -- Simulating on machine: 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.25' Spar21='1.25' Spar22='-0.36') has been requested on machine 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
[...]
          [info] A variation (Spar11='9.999' Spar12='0.7' Spar21='3.5' Spar22='-0.36') has been requested on machine 192.168.2.19 (12:23:11 PM  Jul 08, 2015)
          [info] Parametric Analysis is done. (12:30:09 PM  Jul 08, 2015)
      Average_project (/home/hfss/)
        HFSSDesign1 (DrivenTerminal)
          [warning] Adaptive Passes did not converge based on specified criteria. -- Simulating on machine: 192.168.2.19 (12:14:31 PM  Jul 08, 2015)
          [info] Normal completion of simulation on server: 192.168.2.19. (12:14:36 PM  Jul 08, 2015)
          [info] Normal completion of simulation on server: 192.168.2.32. (12:18:50 PM  Jul 08, 2015)
[...]
          [info] Normal completion of simulation on server: 192.168.2.14. (12:19:32 PM  Jul 08, 2015)
          [warning] Com Engine non-responsive since 05:30:00, January 01, 1970.   Can be due to CPU intensive processing or network problems.   If persisting for long, manually kill the com engine process and restart analysis. Retrying.....  (12:23:11 PM  Jul 08, 2015)
          [warning] Com Engine has responded to the application at 12:23:11, July 08, 2015. (12:23:11 PM  Jul 08, 2015)
          [info] Normal completion of simulation on server: 192.168.2.18. (12:23:14 PM  Jul 08, 2015)
          [info] Normal completion of simulation on server: 192.168.2.19. (12:23:24 PM  Jul 08, 2015)
[...]
          [info] Normal completion of simulation on server: 192.168.2.17. (12:25:27 PM  Jul 08, 2015)
-- Message Window --
Stopping Batch Run: 12:30:12 PM  Jul 08, 2015
Note: I have replaced several similar lines in the log file with a single line '[...]' in order to save space and make it more readable.
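Because the failures are intermittent, it may help to tally them across several runs and see whether they cluster on particular servers or at particular times (in the excerpt, both errors come from 192.168.2.18 within the same second). Below is a minimal Python sketch of such a tally. The regular expressions are written only against the message formats visible in the excerpt, and the names `summarize` and `SAMPLE_LOG` are hypothetical. Attributing an error to "the last variation requested on that machine" is an assumption, since the error line itself does not name the failed variation.

```python
import re
from collections import Counter, defaultdict

# Patterns written against the message formats visible in the posted excerpt
# (HFSS 14); other versions may word these lines differently.
REQUEST_RE = re.compile(
    r"\[info\] A variation \((?P<variation>[^)]*)\) has been requested on machine "
    r"(?P<machine>[\d.]+) \((?P<time>[^)]*)\)"
)
ERROR_RE = re.compile(
    r"\[error\] Fail to get presolved data\. -- Simulating on machine: "
    r"(?P<machine>[\d.]+) \((?P<time>[^)]*)\)"
)

# Lines copied from the excerpt in the post; in practice, read the batch log file.
SAMPLE_LOG = """\
          [info] A variation (Spar11='9.999' Spar12='-0.7' Spar21='5' Spar22='0.37') has been requested on machine 192.168.2.19 (12:18:49 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.7' Spar21='5' Spar22='0.996') has been requested on machine 192.168.2.19 (12:18:49 PM  Jul 08, 2015)
          [error] Fail to get presolved data. -- Simulating on machine: 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.25' Spar21='1.25' Spar22='-0.994') has been requested on machine 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
          [error] Fail to get presolved data. -- Simulating on machine: 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
          [info] A variation (Spar11='9.999' Spar12='-0.25' Spar21='1.25' Spar22='-0.36') has been requested on machine 192.168.2.18 (12:18:50 PM  Jul 08, 2015)
"""


def summarize(lines):
    """Count presolve failures per machine and guess which variation each one hit.

    The guess is the variation most recently requested on the same machine before
    the error; this is an assumption, because the error line names no variation.
    """
    last_request = {}             # machine -> last variation string requested there
    errors_per_machine = Counter()
    suspects = defaultdict(list)  # machine -> [(error time, guessed variation or None)]
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            last_request[match["machine"]] = match["variation"]
            continue
        match = ERROR_RE.search(line)
        if match:
            machine = match["machine"]
            errors_per_machine[machine] += 1
            suspects[machine].append((match["time"], last_request.get(machine)))
    return errors_per_machine, dict(suspects)


if __name__ == "__main__":
    errors, suspects = summarize(SAMPLE_LOG.splitlines())
    for machine, count in errors.most_common():
        print(f"{machine}: {count} 'Fail to get presolved data' error(s)")
        for when, variation in suspects[machine]:
            print(f"  at {when}; last variation requested there: {variation or 'not in log'}")
```

Running `summarize` on the full logs of several runs and adding the resulting Counters together would show whether one server, or one time window, accounts for most of the failures.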


I would appreciate any thoughts on this.
Thanks and Regards

Copyright © 2017-2020 微波EDA网. All rights reserved.
