[mephi-hpc] Fortran runtime error: File already opened in another unit

anikeev anikeev at ut.mephi.ru
Mon Apr 17 14:26:00 MSK 2017


On Mon, 2017-04-17 at 13:04 +0300, Phil Korneev wrote:
> Good afternoon, the error has occurred again:
> At line 254 of file em2d.f (unit = 10, file = '')
> Fortran runtime error: File already opened in another unit

I am getting multiple segfaults. I will make one more copy and run the
job as your user to rule out side effects.

> For diagnostics without damaging the data, you can either copy the
> directory /mnt/pool/3/phkorneev/magn_2D_TNSA_3a/ to a new one (you
> need to copy "restart", "ipicls", the "timer" file, and the
> "task_basov" script) and run from there, or run from the current
> directory. In the latter case, note that the "timer" file is replaced
> with "timer_0" when a run starts, so to preserve the data it is best
> to keep an eye on it and, if necessary, rename it back to "timer"
> before launching.
> I think the simplest option is to back up the directory and, once the
> problem is solved, delete all the new data.
> I have already saved everything, just in case, in
> "/mnt/pool/3/phkorneev/magn_2D_TNSA_3a_cc"
> Best regards,
> K.
> 
> 
> 2017-04-17 12:06 GMT+03:00 anikeev <anikeev at ut.mephi.ru>:
> > On Sat, 2017-04-15 at 12:37 +0300, Phil Korneev wrote:
> > > Good afternoon,
> > > could you please tell me what this means (the contents of the
> > > error file are below) and how to deal with it?
> > > The job runs on cherenkov. Since this morning the error has
> > > occurred every time; I have tried launching it about 10 times.
> > > Yesterday the job ran fine.
> > 
> > Good afternoon!
> > 
> > The cluster had a storage overflow incident. Please try running the
> > job again. If the error recurs, let me know how I should run the job
> > for further diagnostics without damaging your data.
> > 
> > > Thank you!
> > > K
> > >
> > > _______________________________________________________________________
> > > At line 254 of file em2d.f (unit = 10, file = 'H����*')
> > > Fortran runtime error: File already opened in another unit
> > > --------------------------------------------------------------------------
> > > mpirun has exited due to process rank 0 with PID 15890 on
> > > node n217 exiting improperly. There are two reasons this could occur:
> > >
> > > 1. this process did not call "init" before exiting, but others in
> > > the job did. This can cause a job to hang indefinitely while it waits
> > > for all processes to call "init". By rule, if one process calls "init",
> > > then ALL processes must call "init" prior to termination.
> > >
> > > 2. this process called "init", but exited without calling "finalize".
> > > By rule, all processes that call "init" MUST call "finalize" prior to
> > > exiting or it will be considered an "abnormal termination"
> > >
> > > This may have caused other processes in the application to be
> > > terminated by signals sent by mpirun (as reported here).
> > > --------------------------------------------------------------------------
> > >
> > > _______________________________________________________________________
> > >
> > > -- 
> > > All the best , 
> > > Philipp K
> > > _______________________________________________
> > > hpc mailing list
> > > hpc at lists.mephi.ru
> > > https://lists.mephi.ru/listinfo/hpc
> > --
> > Best regards,
> > postgraduate student, Department 4, MEPhI,
> > engineer, Unix Technologies Department,
> > Artem Anikeev.
> > Tel.: 8 (495) 788-56-99, ext. 8998
-- 
Best regards,
postgraduate student, Department 4, MEPhI,
engineer, Unix Technologies Department,
Artem Anikeev.
Tel.: 8 (495) 788-56-99, ext. 8998
-------------- next part --------------

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.  

The process that invoked fork was:

  Local host:          n203 (PID 17195)
  MPI_COMM_WORLD rank: 125

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2B1EF6AE2407
#1  0x2B1EF6AE2A1E
#2  0x2B1EF778F0DF
#0  0x2B3CD328B407
#1  0x2B3CD328BA1E
#2  0x2B3CD3F380DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#0  0x2B7770B05407
#5  0x404CA9 in MAIN__ at em2d.f:?
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B55B5E36407
#1  0x2B55B5E36A1E
#2  0x2B55B6AE30DF
#0  0x2AAE93B7F407
#1  0x2AAE93B7FA1E
#2  0x2AAE9482C0DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#1  0x2B7770B05A1E
#2  0x2B77717B20DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B04A3147407
#1  0x2B04A3147A1E
#2  0x2B04A3DF40DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B1302901407
#1  0x2B1302901A1E
#2  0x2B13035AE0DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B2459137407
#1  0x2B2459137A1E
#2  0x2B2459DE40DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2AC7D297B407
#1  0x2AC7D297BA1E
#2  0x2AC7D36280DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B1A57CC5407
#1  0x2B1A57CC5A1E
#2  0x2B1A589720DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B27E7A13407
#1  0x2B27E7A13A1E
#2  0x2B27E86C00DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B086B3AA407
#1  0x2B086B3AAA1E
#2  0x2B086C0570DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
#0  0x2B976271E407
#1  0x2B976271EA1E
#2  0x2B97633CB0DF
#3  0x429A54 in ms_solver_dif_x_
#4  0x4207E1 in e_magnetic_
#5  0x404CA9 in MAIN__ at em2d.f:?
--------------------------------------------------------------------------
mpirun noticed that process rank 114 with PID 17183 on node n203 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[n216:04997] 12 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[n216:04997] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

