[mephi-hpc] Problem with a calculation on Basov, MCU
anikeev
anikeev at ut.mephi.ru
Wed Nov 9 11:39:11 MSK 2016
On Mon, 2016-11-07 at 14:49 +0300, Богданович Ринат Бекирович wrote:
> Good afternoon, I called you today about this issue.
Hello!
> Here is the path to the folder with the code: /mnt/pool/2/rynatb/MCUPTR/EXE_LINUX
> I have set up a job that takes a long time to compute (named c2m5).
>
> During the first stage (library preparation) the program uses
> the existing libraries (the MDBPT50 folder).
> The path to them is set in the MCU.INI file.
I have started this job under your user account and am monitoring it. So
far I see that only the CPU is loaded: there is enough RAM, the network
is idle, and there is no load on the disks. Either the bottleneck has not
been reached yet, or the problem is in the algorithm. Performance may be
dropping because the parallel algorithm is scaled out too far.
A detailed report is attached. I will check the metrics again at the end of the day.
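To illustrate how scaling out too far can make a run slower rather than faster, here is a toy strong-scaling model, assuming the run time with p processes is a fixed serial part plus a perfectly divisible part plus an overhead that grows linearly with p (standing in for communication and synchronization). The model, the function names and all parameter values are hypothetical; nothing here is measured from c2m5 or mcu5_mpi_ptr.

```python
# Toy strong-scaling model (an assumption for illustration, not measured data):
#   T(p) = t_serial + t_parallel / p + c_comm * p
# The c_comm * p term is a stand-in for per-process communication/sync overhead.


def run_time(p: int, t_serial: float, t_parallel: float, c_comm: float) -> float:
    """Modelled wall-clock time when the job runs on p processes."""
    return t_serial + t_parallel / p + c_comm * p


def best_process_count(max_p: int, t_serial: float, t_parallel: float, c_comm: float) -> int:
    """Return the p in 1..max_p that gives the smallest modelled run time."""
    return min(range(1, max_p + 1),
               key=lambda p: run_time(p, t_serial, t_parallel, c_comm))


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    t_s, t_p, c = 10.0, 1000.0, 2.0
    for p in (1, 8, 16, 22, 30, 32):
        print(f"p={p:2d}  T={run_time(p, t_s, t_p, c):8.1f}")
    print("best p:", best_process_count(32, t_s, t_p, c))  # -> best p: 22
```

With these made-up numbers the optimum is 22 processes, and running on 30 or 32 processes is slower than on 22; a real job would need timing runs at several process counts to find its own optimum.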
Please subscribe to the mailing list https://lists.mephi.ru/listinfo/hpc
from your current email address; otherwise your messages may be delayed
or lost.
> Best regards,
> Ринат
>
> _______________________________________________
> hpc mailing list
> hpc at lists.mephi.ru
> https://lists.mephi.ru/listinfo/hpc
--
Best regards,
Postgraduate student, Department 4, MEPhI,
Engineer, Unix Technologies Department,
Аникеев Артём.
Tel.: 8 (495) 788-56-99, ext. 8998
-------------- next part --------------
n104.basov ~ # atop
ATOP - n104 2016/11/09 10:54:40 --------- 10s elapsed
PRC | sys 0.69s | user 4m59s | | #proc 383 | #trun 32 | #tslpi 388 | #tslpu 0 | #zombie 0 | clones 0 | | #exit 0 |
CPU | sys 3% | user 3001% | irq 0% | | idle 199% | wait 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu000 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu001 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu002 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu003 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu004 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu005 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu006 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu007 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu008 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu009 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu010 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu011 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu012 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu013 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu014 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu015 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu017 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu018 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu019 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu020 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu021 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu022 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu023 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu024 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu025 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu026 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu027 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu028 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu029 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 2% | user 98% | irq 0% | | idle 0% | cpu016 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 0% | irq 0% | | idle 100% | cpu030 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
cpu | sys 0% | user 0% | irq 0% | | idle 100% | cpu031 w 0% | | steal 0% | guest 0% | curf ?MHz | curscal ?% |
CPL | avg1 29.99 | avg5 23.68 | | avg15 11.85 | | | csw 620453 | intr 310234 | | | numcpu 32 |
MEM | tot 125.9G | free 69.8G | cache 54.7G | dirty 0.0M | buff 341.5M | slab 438.6M | slrec 362.3M | shmem 0.7M | shrss 0.0M | shswp 0.0M | |
SWP | tot 0.0M | free 0.0M | | | | | | | | vmcom 1.7G | vmlim 62.9G |
NET | transport | tcpi 26 | tcpo 104 | udpi 0 | udpo 0 | tcpao 0 | tcppo 0 | tcprs 0 | tcpie 0 | tcpor 0 | udpip 0 |
NET | network | ipi 26 | ipo 48 | ipfrw 0 | deliv 26 | | | | | icmpi 0 | icmpo 0 |
NET | eth0 ---- | pcki 25 | pcko 103 | si 4 Kbps | so 532 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | eth2 ---- | pcki 1 | pcko 1 | si 0 Kbps | so 0 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
PID TID SYSCPU USRCPU VGROW RGROW RUID EUID THR ST EXC S CPU CMD 1/17
12150 - 0.00s 9.99s 0K 0K rynatb rynatb 2 -- - R 100% mcu5_mpi_ptr
12135 - 0.00s 9.99s 0K 0K rynatb rynatb 2 -- - R 100% mcu5_mpi_ptr
12136 - 0.00s 9.99s 0K 0K rynatb rynatb 2 -- - R 100% mcu5_mpi_ptr
12154 - 0.00s 9.99s 0K 0K rynatb rynatb 2 -- - R 100% mcu5_mpi_ptr
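The per-core lines of the atop snapshot above show 30 cores at or near 100% user time and two cores (cpu030, cpu031) idle, consistent with the one-minute load average of 29.99. A minimal sketch for counting busy and idle cores from such a snapshot saved as text; the function name busy_cores and the 90% threshold are assumptions, and the regular expression only targets the per-core "cpu" line layout shown here (atop output may differ between versions).

```python
import re

# Matches atop per-core lines such as
#   "cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu000 w 0% | ..."
# The aggregate "CPU | ..." line is uppercase and is deliberately not matched.
CPU_LINE = re.compile(
    r"^cpu\s*\|\s*sys\s+(\d+)%\s*\|\s*user\s+(\d+)%.*?idle\s+(\d+)%\s*\|\s*(cpu\d+)"
)


def busy_cores(atop_text: str, threshold: int = 90) -> tuple[list[str], list[str]]:
    """Split cores into (busy, idle) by sys% + user% against a threshold."""
    busy, idle = [], []
    for line in atop_text.splitlines():
        m = CPU_LINE.match(line.strip())
        if not m:
            continue  # not a per-core line
        sys_pct, user_pct, name = int(m[1]), int(m[2]), m[4]
        (busy if sys_pct + user_pct >= threshold else idle).append(name)
    return sorted(busy), sorted(idle)


# A few lines copied from the snapshot above (truncated after the core name).
SAMPLE = """CPU | sys 3% | user 3001% | irq 0% | | idle 199% | wait 0% |
cpu | sys 0% | user 100% | irq 0% | | idle 0% | cpu000 w 0% |
cpu | sys 2% | user 98% | irq 0% | | idle 0% | cpu016 w 0% |
cpu | sys 0% | user 0% | irq 0% | | idle 100% | cpu030 w 0% |"""

if __name__ == "__main__":
    busy, idle = busy_cores(SAMPLE)
    print(f"{len(busy)} busy: {busy}")
    print(f"{len(idle)} idle: {idle}")
```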
n104.basov ~ # bwm-ng
bwm-ng v0.6 (probing every 0.500s), press 'h' for help
input: /proc/net/dev type: rate
\ iface Rx Tx Total
==============================================================================
eth0: 0.00 KB/s 0.00 KB/s 0.00 KB/s
eth1: 0.00 KB/s 0.00 KB/s 0.00 KB/s
eth2: 0.14 KB/s 0.28 KB/s 0.43 KB/s
lo: 0.00 KB/s 0.00 KB/s 0.00 KB/s
------------------------------------------------------------------------------
total: 0.14 KB/s 0.28 KB/s 0.43 KB/s
n104.basov ~ # bwm-ng -t 600000
bwm-ng v0.6 (probing every 600.000s), press 'h' for help
input: /proc/net/dev type: rate
\ iface Rx Tx Total
==============================================================================
eth0: 150.65 KB/s 124.79 KB/s 275.44 KB/s
eth1: 0.01 KB/s 0.00 KB/s 0.01 KB/s
eth2: 0.02 KB/s 0.03 KB/s 0.05 KB/s
lo: 0.00 KB/s 0.00 KB/s 0.00 KB/s
------------------------------------------------------------------------------
total: 150.67 KB/s 124.82 KB/s 275.49 KB/s
basov anikeev # iostat -x 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.03 0.00 0.00 99.97
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.03 0.00 0.00 0.00 0.00 99.97
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
1.66 0.00 0.12 0.00 0.00 98.22
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 5.00 0.00 20.00 0.00 100.00 10.00 0.20 9.85 0.00 9.85 0.70 1.40
sdb 0.00 0.00 0.00 32.00 0.00 15480.00 967.50 0.37 11.69 0.00 11.69 0.62 2.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
master.basov anikeev # iostat -x 600
Linux 3.13.6-basov (master.basov.hpc.mephi.ru) 11/09/2016 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.39 0.00 0.10 0.53 0.00 98.98
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.75 1.78 0.58 3.56 14.94 266.77 135.93 0.35 85.58 3.78 99.01 1.12 0.46
sdb 0.04 6.67 0.77 10.85 56.96 3221.58 564.58 0.39 33.98 6.84 35.91 0.65 0.75
sdc 0.00 0.00 0.00 0.00 0.00 0.00 15.42 0.00 11.95 11.95 0.00 11.93 0.00
sde 0.07 0.13 1.87 0.95 230.01 443.49 476.75 0.08 29.10 3.56 79.19 0.94 0.27
sdd 0.00 0.00 0.00 0.00 0.00 0.00 15.42 0.00 1.30 1.30 0.00 1.26 0.00
sdf 0.00 0.00 0.00 0.00 0.01 0.00 9.42 0.00 1.26 1.23 33.00 1.15 0.00
sdg 0.00 0.00 0.00 0.00 0.01 0.00 9.60 0.00 1.25 1.22 28.00 1.13 0.00
sdh 0.00 0.00 0.00 0.00 0.05 0.00 45.63 0.00 2.28 2.16 33.00 1.55 0.00
sdi 1.21 0.27 23.06 0.42 3203.07 128.60 283.87 1.23 52.25 12.05 2276.71 1.41 3.30
avg-cpu: %user %nice %system %iowait %steal %idle
0.04 0.00 0.16 0.03 0.00 99.78
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.56 0.00 2.34 0.00 21.14 18.06 0.01 4.59 0.00 4.59 0.72 0.17
sdb 0.00 3.88 0.01 47.80 0.02 23632.21 988.76 1.74 36.36 11.67 36.36 0.66 3.15
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.07 0.00 0.60 0.00 274.55 915.18 0.04 62.12 0.00 62.12 0.61 0.04
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
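In both iostat runs above, no device reaches more than a few percent %util (the largest value is 3.30, for sdi on master.basov, in the first report of `iostat -x 600`, which iostat computes since boot), which is consistent with the observation that the disks are not loaded. A minimal sketch that extracts the peak %util per device from saved `iostat -x` text; the function name iostat_util is hypothetical, and it assumes the column layout shown here, with a "Device:" header line before each device table.

```python
def iostat_util(report: str) -> dict[str, float]:
    """Return the highest %util seen for each device across all iostat -x tables."""
    util: dict[str, float] = {}
    columns: list[str] = []
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            columns = fields  # header of a new device table
            continue
        # Device rows have exactly as many fields as the header; the avg-cpu
        # header and value lines are shorter and are skipped here.
        if columns and len(fields) == len(columns):
            value = float(fields[columns.index("%util")])
            util[fields[0]] = max(util.get(fields[0], 0.0), value)
    return util


# Two samples copied from the iostat output above (sda/sdb rows only).
SAMPLE = """avg-cpu: %user %nice %system %iowait %steal %idle
1.66 0.00 0.12 0.00 0.00 98.22
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 5.00 0.00 20.00 0.00 100.00 10.00 0.20 9.85 0.00 9.85 0.70 1.40
sdb 0.00 0.00 0.00 32.00 0.00 15480.00 967.50 0.37 11.69 0.00 11.69 0.62 2.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.04 0.00 0.16 0.03 0.00 99.78
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.56 0.00 2.34 0.00 21.14 18.06 0.01 4.59 0.00 4.59 0.72 0.17
sdb 0.00 3.88 0.01 47.80 0.02 23632.21 988.76 1.74 36.36 11.67 36.36 0.66 3.15"""

if __name__ == "__main__":
    peaks = iostat_util(SAMPLE)
    for dev, pct in sorted(peaks.items()):
        print(f"{dev}: peak %util {pct:.2f}")
    print("busiest:", max(peaks, key=peaks.get))
```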