[Computer Architecture] Parallel Processors Notes

daehwi 2023. 3. 21. 16:51

Introduction

Connecting multiple computers to get higher performance

Multiprocessor: a computer with at least two processors
Multicore microprocessor: a single processor (chip) that contains multiple cores
Task-level (process-level) parallelism: multiple independent tasks, each executed on its own processor
Parallel processing program: a single program that runs on multiple processors simultaneously

Hardware and Software

Hardware
- Serial: single core
- Parallel: multicore
Software
- Sequential: can only execute sequentially
- Concurrent: can execute concurrently (in parallel)

Parallel Programming

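Caveats when processing with multiprocessors

Parallel software is hard to program.
It must deliver a significant performance improvement.
⇒ Without a meaningful speed-up, there is no reason to bother with it.
Difficulties
1. Partitioning: dividing the work into sub-tasks of equal size, one per processor.
  Otherwise some processors finish early and sit idle waiting for the others.
2. Coordination: the sub-task results must be combined into the final result, which costs extra time.
3. Communications overhead: processors exchange data over a network (interconnect), which adds overhead.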
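A minimal sketch of the first two difficulties, partitioning and coordination, for a parallel sum. The helper names `split_evenly` and `parallel_sum` are hypothetical, and Python threads are used only to show the structure: in CPython the GIL prevents real speed-up for CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(n, parts):
    """Partitioning: return (start, end) ranges whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more item
        ranges.append((start, start + size))
        start += size
    return ranges

def parallel_sum(data, workers=4):
    ranges = split_evenly(len(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each worker sums its own sub-task
        partials = list(pool.map(lambda r: sum(data[r[0]:r[1]]), ranges))
    # coordination: combine the partial results into the final answer
    return sum(partials)

print(split_evenly(10, 4))                # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(parallel_sum(list(range(1, 101))))  # 5050
```

Keeping part sizes within one element of each other is what avoids the idle-processor problem described in item 1.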

Amdahl’s Law

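Amount of improvement: how much faster the improved part becomes (e.g. n times faster)
Execution time affected by improvement: the portion of execution time that the improvement affects (measured before the improvement)

$$T_{\text{after}} = \frac{T_{\text{affected}}}{\text{Amount of improvement}} + T_{\text{unaffected}}$$

Rearranging this formula in terms of speed-up:

$$\text{Speed-up} = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{1}{(1 - F_{\text{affected}}) + \dfrac{F_{\text{affected}}}{\text{Amount of improvement}}}$$

Here the execution time before the improvement is normalized to 1, so Execution time before − Execution time affected (the unaffected time) can be written as 1 − Fraction time affected ($F_{\text{affected}}$).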
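The speed-up form of Amdahl's Law translates directly into a small function. This is a sketch with a hypothetical function name; execution time before the improvement is normalized to 1, as in the text.

```python
def amdahl_speedup(fraction_affected, improvement):
    """Speed-up = 1 / ((1 - F) + F / improvement), with T_before normalized to 1."""
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be in [0, 1]")
    unaffected = 1.0 - fraction_affected           # 1 - Fraction time affected
    affected_after = fraction_affected / improvement
    return 1.0 / (unaffected + affected_after)

# 90% of the time parallelized over 100 processors
print(round(amdahl_speedup(0.9, 100), 2))  # 9.17
# upper bound as the improvement grows without limit: 1 / (1 - F)
print(round(1 / (1 - 0.9), 2))             # 10.0
```

Even with 100 processors, the 10% serial part caps the speed-up near 10, which is why a significant gain requires nearly all of the program to be parallel.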

Strong vs Weak Scaling

Strong scaling: the hardware (number of processors) increases while the problem size stays fixed
Weak scaling: the problem size grows in proportion to the increase in hardware
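A toy model contrasting the two, assuming perfect load balance and a fixed serial part of 10 operations (both assumptions chosen for illustration; the function names are hypothetical). Under strong scaling the fixed problem runs faster; under weak scaling the run time stays constant while the problem grows.

```python
def run_time(serial_ops, parallel_ops, processors):
    """Time in units of one operation, assuming the parallel work splits perfectly."""
    return serial_ops + parallel_ops / processors

def speedup(serial_ops, parallel_ops, processors):
    return (serial_ops + parallel_ops) / run_time(serial_ops, parallel_ops, processors)

SERIAL = 10  # assumed non-parallelizable work

# Strong scaling: fixed problem of 100 parallel operations, more processors
for p in (10, 100):
    print("strong", p, speedup(SERIAL, 100, p))    # 5.5, then 10.0

# Weak scaling: 100 parallel operations per processor
for p in (10, 100):
    print("weak", p, run_time(SERIAL, 100 * p, p))  # 110.0 both times
```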

Load Balancing Problem

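Summary: the more evenly the work is distributed, the better the performance; the more imbalanced it becomes, the more performance drops, because the most heavily loaded processor determines the total execution time.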
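A sketch of that effect, assuming 10 serial operations plus 10,000 parallel additions on 100 processors (illustrative numbers, hypothetical function names). Parallel time is the serial part plus the load of the slowest processor.

```python
def parallel_time(serial_ops, loads):
    """Serial part plus the most heavily loaded processor's work."""
    return serial_ops + max(loads)

def speedup(serial_ops, loads):
    total = serial_ops + sum(loads)  # time on a single processor
    return total / parallel_time(serial_ops, loads)

P, WORK, SERIAL = 100, 10_000, 10
balanced = [WORK / P] * P
# one processor gets 2% of the work; the other 99 split the rest evenly
skewed = [0.02 * WORK] + [(WORK - 0.02 * WORK) / (P - 1)] * (P - 1)

print(round(speedup(SERIAL, balanced), 1))  # 91.0
print(round(speedup(SERIAL, skewed), 1))
```

Giving a single processor 200 additions instead of 100 roughly halves the speed-up, even though the other 99 processors have slightly less work.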

Instruction and Data Streams

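SISD: Single Instruction Single Data stream (uniprocessor)
SIMD: Single Instruction Multiple Data stream (vector architecture)
MISD: no practical examples exist
MIMD: Multiple Instruction Multiple Data stream (multiprocessor)
SPMD: Single Program Multiple Data (on an MIMD machine)
→ All processors run the same single program, using conditional statements when different processors need to execute different sections of the code.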
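A minimal SPMD sketch using Python threads as stand-ins for processors (an assumption; real SPMD code would typically use separate processes or nodes). Every worker runs the same `program` function, a hypothetical name, and the `rank` value selects which code sections it executes.

```python
import threading

NPROCS = 4
data = list(range(1, 101))          # hypothetical input shared by all workers
partials = [0] * NPROCS
result = {}
barrier = threading.Barrier(NPROCS)

def program(rank):
    """The single program every worker runs; `rank` selects its behaviour."""
    chunk = len(data) // NPROCS
    start = rank * chunk
    # conditional code: the last rank also takes any leftover elements
    end = len(data) if rank == NPROCS - 1 else start + chunk
    partials[rank] = sum(data[start:end])
    barrier.wait()                  # wait until every partial sum is written
    if rank == 0:                   # only rank 0 executes the combine section
        result["total"] = sum(partials)

threads = [threading.Thread(target=program, args=(r,)) for r in range(NPROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result["total"])  # 5050
```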

Vector Architecture (=SIMD)

A single instruction operates on multiple data streams.

Instructions: lv, sv, addv.d, addvs.d (vector + scalar)

⇒ Significantly reduces the instruction-fetch bandwidth required

Example: DAXPY (Double precision a × X Plus Y)
⇒ With a vector architecture the computation needs no loop, which makes it efficient.
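A rough model of why DAXPY benefits: the scalar version issues instructions for every element, while the vector version issues one instruction per whole-vector operation. The per-iteration count of 7 (5 for the work plus 2 assumed loop-overhead instructions) is a simplified assumption, not taken from a real ISA; the comments map each line to the instruction names above, and loading the scalar `a` is not counted. The list comprehensions only emulate element work that hardware would do inside a single vector instruction.

```python
def daxpy_scalar(a, x, y):
    """Y = a*X + Y with a scalar loop; returns (result, instructions issued)."""
    y = list(y)
    issued = 0
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
        # assumed cost per iteration: load x[i], multiply, load y[i], add,
        # store y[i], plus 2 loop-overhead instructions (index update, branch)
        issued += 7
    return y, issued

def daxpy_vector(a, x, y):
    """Y = a*X + Y modeled as whole-vector instructions (one issue each).

    Assumes len(x) fits in one vector register, so no strip-mining loop is needed.
    """
    vx = list(x)                          # lv      : load vector X
    vt = [a * e for e in vx]              # mulvs.d : vector * scalar a
    vy = list(y)                          # lv      : load vector Y
    vr = [p + q for p, q in zip(vt, vy)]  # addv.d  : vector + vector
    result = vr                           # sv      : store result vector
    issued = 5                            # one instruction per line above
    return result, issued

x = [float(i) for i in range(64)]
y = [1.0] * 64
print(daxpy_scalar(2.0, x, y)[1], daxpy_vector(2.0, x, y)[1])  # 448 5
```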
A vector unit can be built from several parallel pipelined functional units (lanes).
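A sketch of how lanes divide the work, assuming an interleaved layout where element i is handled by lane i mod L and ignoring pipeline start-up time (the function names are hypothetical).

```python
import math

def assign_to_lanes(n_elements, lanes):
    """Element i goes to lane i % lanes (assumed interleaved layout)."""
    return [list(range(lane, n_elements, lanes)) for lane in range(lanes)]

def vector_op_cycles(n_elements, lanes):
    """Cycles for one element-wise vector op once pipelines are full (start-up ignored)."""
    return math.ceil(n_elements / lanes)

print(assign_to_lanes(8, 4))                              # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(vector_op_cycles(64, 1), vector_op_cycles(64, 4))  # 64 16
```

Adding lanes shortens the time of each vector instruction without changing the instruction count.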

Vector vs Scalar

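A single vector instruction has the same effect as executing an entire loop.
The instruction fetch and decode bandwidth needed is dramatically reduced, saving both time and energy.
Hazards also shrink: elements within one vector instruction are independent, so the hardware need not check for data hazards among them, and since the loop branch disappears, so do its control hazards.
For problems with data-level parallelism, writing efficient programs for SIMD is easier than for MIMD.
The cost of memory latency is also reduced: scalar code pays it once per element, while a vector load/store pays it once per vector.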
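A back-of-the-envelope model of the last point, assuming a hypothetical 100-cycle memory latency, no overlap between scalar loads, and one vector element arriving per cycle after the first. Real scalar cores hide much of this latency with caches and overlapping, so the gap here is exaggerated on purpose.

```python
import math

def scalar_load_cycles(n, latency):
    """Scalar code: every element pays the full memory latency (no overlap assumed)."""
    return n * latency

def vector_load_cycles(n, latency, elements_per_cycle=1):
    """Vector load: latency paid once, then elements stream in back to back."""
    return latency + math.ceil(n / elements_per_cycle)

N, LATENCY = 64, 100  # hypothetical vector length and memory latency in cycles
print(scalar_load_cycles(N, LATENCY))  # 6400
print(vector_load_cycles(N, LATENCY))  # 164
```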
