[mephi-hpc] Problem with an MCU calculation on Basov

anikeev anikeev at ut.mephi.ru
Wed Nov 9 11:39:11 MSK 2016


On Mon, 2016-11-07 at 14:49 +0300, Богданович Ринат Бекирович wrote:
> Good afternoon, I called you today about this issue.

Hello!
 
> Here is the path to the folder with the code: /mnt/pool/2/rynatb/MCUPTR/EXE_LINUX
> I have set up a run of a job that takes a long time to compute (named c2m5).
>
> During the first stage (library preparation) the program uses the
> existing libraries (folder MDBPT50).
> The path to them is set in the file MCU.INI.

I started this job under your user account and am monitoring it. So far
only the CPUs are loaded (30 of the 32 cores on n104 are at 100% user
time); there is enough RAM (about 70 GB of 126 GB free), the network is
idle, and there is no disk load. Either the bottleneck has not been
reached yet, or the problem lies in the algorithm itself. Performance may
also be dropping because the parallel algorithm is scaled out over too
many processes.
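The last hypothesis (more processes making the run slower) can be illustrated with a toy strong-scaling model. This is a minimal sketch under assumed numbers, not a model of MCU itself: Amdahl's law with a parallel fraction `p`, plus a hypothetical fixed cost `c` per additional process standing in for communication and synchronization. The function names and the values of `p` and `c` are illustrative assumptions.

```python
"""Toy strong-scaling model: Amdahl's law plus a hypothetical per-process overhead.

Assumptions (illustrative only, not measured for MCU): a fraction p of the
single-process run time parallelizes perfectly, and every extra process adds
a fixed coordination cost c, expressed as a fraction of the single-process time.
"""


def amdahl_speedup(p: float, n: int) -> float:
    """Classic Amdahl speedup on n processes for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)


def speedup_with_overhead(p: float, n: int, c: float) -> float:
    """Speedup when each additional process costs c (hypothetical model)."""
    return 1.0 / ((1.0 - p) + p / n + c * (n - 1))


def best_process_count(p: float, c: float, max_n: int) -> int:
    """Process count in [1, max_n] that maximizes the modelled speedup."""
    return max(range(1, max_n + 1), key=lambda n: speedup_with_overhead(p, n, c))


if __name__ == "__main__":
    p, c = 0.99, 0.005  # hypothetical values chosen for illustration
    for n in (1, 8, 14, 30, 32):
        print(f"n={n:2d}  amdahl={amdahl_speedup(p, n):5.2f}  "
              f"with overhead={speedup_with_overhead(p, n, c):5.2f}")
    print("best n:", best_process_count(p, c, 32))
```

With the assumed p = 0.99 and c = 0.005 the modelled speedup peaks at 14 processes and falls after that, so in this model a 30-process run is slower than a 14-process one. Whether MCU behaves this way can only be checked by timing the same input at several process counts.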

A detailed report is attached below. I will check the figures again at the end of the day.
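Linux exposes cumulative per-core tick counters in /proc/stat, and per-core busy percentages like those in the attached atop report are based on counters of this kind, so a later check can compare two snapshots. A minimal sketch, assuming the standard /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the helper names `parse_proc_stat` and `utilization` are hypothetical.

```python
"""Per-core CPU utilization from two /proc/stat snapshots (sketch)."""

import os
import time
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {core_name: (busy_ticks, total_ticks)} for the per-core lines."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu7 user nice system idle iowait irq softirq steal ...";
        # the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...) are skipped.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:9]]  # user..steal; guest time is already in user
        idle = ticks[3] + ticks[4]             # idle + iowait
        total = sum(ticks)
        result[fields[0]] = (total - idle, total)
    return result


def utilization(before: str, after: str) -> Dict[str, float]:
    """Percent busy per core between two snapshots."""
    b, a = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for cpu, (busy1, total1) in a.items():
        busy0, total0 = b.get(cpu, (0, 0))
        dt = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / dt if dt else 0.0
    return usage


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = f.read()
    usage = utilization(first, second)
    busy = [cpu for cpu, pct in usage.items() if pct > 90.0]
    print(f"{len(busy)} of {len(usage)} cores above 90% busy")
```

Run on a compute node (for example n104), this prints how many cores are saturated, the same picture atop gives in its per-cpu lines.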

Please subscribe to the mailing list https://lists.mephi.ru/listinfo/hpc
from your current e-mail address; otherwise your messages may be delayed
or lost.

> Best regards,
> Ринат
>  
> _______________________________________________
> hpc mailing list
> hpc at lists.mephi.ru
> https://lists.mephi.ru/listinfo/hpc
-- 
Best regards,
PhD student, Department 4, MEPhI,
engineer, Unix Technologies Department,
Аникеев Артём.
Tel.: 8 (495) 788-56-99, ext. 8998
-------------- next part --------------
n104.basov ~ # atop
ATOP - n104                                         2016/11/09  10:54:40                                         ---------                                         10s elapsed
PRC | sys    0.69s |  user   4m59s |              | #proc    383 |  #trun     32 | #tslpi   388 | #tslpu     0  | #zombie    0 | clones     0 |               | #exit      0 |
CPU | sys       3% |  user   3001% | irq       0% |              |  idle    199% | wait      0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu000 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu001 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu002 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu003 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu004 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu005 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu006 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu007 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu008 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu009 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu010 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu011 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu012 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu013 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu014 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu015 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu017 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu018 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu019 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu020 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu021 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu022 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu023 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu024 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu025 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu026 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu027 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu028 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user    100% | irq       0% |              |  idle      0% | cpu029 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       2% |  user     98% | irq       0% |              |  idle      0% | cpu016 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user      0% | irq       0% |              |  idle    100% | cpu030 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
cpu | sys       0% |  user      0% | irq       0% |              |  idle    100% | cpu031 w  0% |               | steal     0% | guest     0% | curf    ?MHz  | curscal   ?% |
CPL | avg1   29.99 |  avg5   23.68 |              | avg15  11.85 |               |              | csw   620453  | intr  310234 |              |               | numcpu    32 |
MEM | tot   125.9G |  free   69.8G | cache  54.7G | dirty   0.0M |  buff  341.5M | slab  438.6M | slrec 362.3M  | shmem   0.7M | shrss   0.0M | shswp   0.0M  |              |
SWP | tot     0.0M |  free    0.0M |              |              |               |              |               |              |              | vmcom   1.7G  | vmlim  62.9G |
NET | transport    |  tcpi      26 | tcpo     104 | udpi       0 |  udpo       0 | tcpao      0 | tcppo      0  | tcprs      0 | tcpie      0 | tcpor      0  | udpip      0 |
NET | network      |  ipi       26 | ipo       48 | ipfrw      0 |  deliv     26 |              |               |              |              | icmpi      0  | icmpo      0 |
NET | eth0    ---- |  pcki      25 | pcko     103 | si    4 Kbps |  so  532 Kbps | coll       0 | mlti       0  | erri       0 | erro       0 | drpi       0  | drpo       0 |
NET | eth2    ---- |  pcki       1 | pcko       1 | si    0 Kbps |  so    0 Kbps | coll       0 | mlti       0  | erri       0 | erro       0 | drpi       0  | drpo       0 |

  PID         TID       SYSCPU        USRCPU        VGROW        RGROW        RUID           EUID             THR       ST       EXC        S        CPU        CMD       1/17
12150           -        0.00s         9.99s           0K           0K        rynatb         rynatb             2       --         -        R       100%        mcu5_mpi_ptr
12135           -        0.00s         9.99s           0K           0K        rynatb         rynatb             2       --         -        R       100%        mcu5_mpi_ptr
12136           -        0.00s         9.99s           0K           0K        rynatb         rynatb             2       --         -        R       100%        mcu5_mpi_ptr
12154           -        0.00s         9.99s           0K           0K        rynatb         rynatb             2       --         -        R       100%        mcu5_mpi_ptr

n104.basov ~ # bwm-ng

  bwm-ng v0.6 (probing every 0.500s), press 'h' for help
  input: /proc/net/dev type: rate
  \         iface                   Rx                   Tx                Total
  ==============================================================================
             eth0:           0.00 KB/s            0.00 KB/s            0.00 KB/s
             eth1:           0.00 KB/s            0.00 KB/s            0.00 KB/s
             eth2:           0.14 KB/s            0.28 KB/s            0.43 KB/s
               lo:           0.00 KB/s            0.00 KB/s            0.00 KB/s
  ------------------------------------------------------------------------------
            total:           0.14 KB/s            0.28 KB/s            0.43 KB/s

n104.basov ~ # bwm-ng -t 600000

  bwm-ng v0.6 (probing every 600.000s), press 'h' for help
  input: /proc/net/dev type: rate
  \         iface                   Rx                   Tx                Total
  ==============================================================================
             eth0:         150.65 KB/s          124.79 KB/s          275.44 KB/s
             eth1:           0.01 KB/s            0.00 KB/s            0.01 KB/s
             eth2:           0.02 KB/s            0.03 KB/s            0.05 KB/s
               lo:           0.00 KB/s            0.00 KB/s            0.00 KB/s
  ------------------------------------------------------------------------------
            total:         150.67 KB/s          124.82 KB/s          275.49 KB/s

basov anikeev # iostat -x 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.03    0.00    0.00   99.97

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.00    0.00    0.00   99.97

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.66    0.00    0.12    0.00    0.00   98.22

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   20.00     0.00   100.00    10.00     0.20    9.85    0.00    9.85   0.70   1.40
sdb               0.00     0.00    0.00   32.00     0.00 15480.00   967.50     0.37   11.69    0.00   11.69   0.62   2.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

master.basov anikeev # iostat -x 600
Linux 3.13.6-basov (master.basov.hpc.mephi.ru)  11/09/2016      _x86_64_        (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.39    0.00    0.10    0.53    0.00   98.98

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.75     1.78    0.58    3.56    14.94   266.77   135.93     0.35   85.58    3.78   99.01   1.12   0.46
sdb               0.04     6.67    0.77   10.85    56.96  3221.58   564.58     0.39   33.98    6.84   35.91   0.65   0.75
sdc               0.00     0.00    0.00    0.00     0.00     0.00    15.42     0.00   11.95   11.95    0.00  11.93   0.00
sde               0.07     0.13    1.87    0.95   230.01   443.49   476.75     0.08   29.10    3.56   79.19   0.94   0.27
sdd               0.00     0.00    0.00    0.00     0.00     0.00    15.42     0.00    1.30    1.30    0.00   1.26   0.00
sdf               0.00     0.00    0.00    0.00     0.01     0.00     9.42     0.00    1.26    1.23   33.00   1.15   0.00
sdg               0.00     0.00    0.00    0.00     0.01     0.00     9.60     0.00    1.25    1.22   28.00   1.13   0.00
sdh               0.00     0.00    0.00    0.00     0.05     0.00    45.63     0.00    2.28    2.16   33.00   1.55   0.00
sdi               1.21     0.27   23.06    0.42  3203.07   128.60   283.87     1.23   52.25   12.05 2276.71   1.41   3.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.04    0.00    0.16    0.03    0.00   99.78

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.56    0.00    2.34     0.00    21.14    18.06     0.01    4.59    0.00    4.59   0.72   0.17
sdb               0.00     3.88    0.01   47.80     0.02 23632.21   988.76     1.74   36.36   11.67   36.36   0.66   3.15
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.07    0.00    0.60     0.00   274.55   915.18     0.04   62.12    0.00   62.12   0.61   0.04
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
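The "no disk load" conclusion rests on the iostat columns above. In the sysstat version used here, avgrq-sz is the average request size in 512-byte sectors and %util is the share of time the device was busy, so both can be recomputed from the rate columns as a sanity check. A minimal sketch; the helper names are hypothetical, and newer sysstat releases may rename avgrq-sz (reporting request sizes in kB) and deprecate svctm, so column names can differ on other systems.

```python
"""Recompute iostat -x derived columns from its rate columns (sketch)."""


def avgrq_sz(r_s: float, w_s: float, rkb_s: float, wkb_s: float) -> float:
    """Average request size in 512-byte sectors: (rkB/s + wkB/s) * 2 / (r/s + w/s)."""
    ios = r_s + w_s
    return (rkb_s + wkb_s) * 2.0 / ios if ios else 0.0


def util_percent(r_s: float, w_s: float, svctm_ms: float) -> float:
    """Busy time in percent: requests per second times service time (ms), over 10."""
    return (r_s + w_s) * svctm_ms / 10.0


if __name__ == "__main__":
    # sdb in the third "iostat -x 1" sample: 32 w/s, 15480 wkB/s, svctm 0.62 ms.
    print(f"avgrq-sz = {avgrq_sz(0.0, 32.0, 0.0, 15480.0):.2f}")
    print(f"%util    = {util_percent(0.0, 32.0, 0.62):.2f}")
```

For that sdb row this gives 967.50 sectors per request and about 1.98% utilization (iostat shows 2.00 because svctm is itself rounded): roughly 15 MB/s of large writes, with the device busy only about 2% of the time.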

