To improve the performance of a CUDA application, programmers must optimize the number of active threads and balance their memory resources: the number of registers and threads used per multiprocessor, the global memory bandwidth, and the percentage of memory allocated to each thread.
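The trade-off between registers, threads, and per-multiprocessor limits can be illustrated with a rough occupancy estimate. This is a minimal sketch, not a vendor formula: the function name `estimate_occupancy` and all default hardware limits are illustrative assumptions, and real devices allocate registers and shared memory in coarser units than this model does.

```python
def estimate_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                       max_threads_per_sm=2048, max_blocks_per_sm=32,
                       regs_per_sm=65536, smem_per_sm=65536, warp_size=32):
    """Estimate the fraction of warp slots kept busy on one multiprocessor.

    All default limits are hypothetical placeholders; look up the values
    for the actual target device before relying on the result.
    """
    # Threads are scheduled in whole warps, so round the block size up.
    warps_per_block = -(-threads_per_block // warp_size)
    threads_rounded = warps_per_block * warp_size

    # How many blocks fit under each resource limit.
    limit_threads = max_threads_per_sm // threads_rounded
    if regs_per_thread > 0:
        limit_regs = regs_per_sm // (regs_per_thread * threads_rounded)
    else:
        limit_regs = max_blocks_per_sm
    if smem_per_block > 0:
        limit_smem = smem_per_sm // smem_per_block
    else:
        limit_smem = max_blocks_per_sm

    # The tightest limit decides how many blocks are resident at once.
    blocks = min(max_blocks_per_sm, limit_threads, limit_regs, limit_smem)
    active_warps = blocks * warps_per_block
    max_warps = max_threads_per_sm // warp_size
    return active_warps / max_warps


full = estimate_occupancy(256, 32, 0)    # registers do not limit residency
half = estimate_occupancy(256, 64, 0)    # doubling registers halves it
```

Under these assumed limits, doubling the registers per thread from 32 to 64 halves the number of resident blocks, which is the kind of balance the sentence above describes.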
Ayguade, "Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors," IEEE Computer Architecture Letters, vol.
In Section 4, we present the hardware details of multiprocessor systems, and Section 5 discusses the real-time simulator model.
We consider the mapping of tasks onto a multiprocessor system that consists of n identical processors P.
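One common way to map tasks onto n identical processors is greedy longest-processing-time-first list scheduling. The sketch below is an assumption chosen for illustration, not the method of the quoted source; the function name `map_tasks` and the task durations are hypothetical.

```python
import heapq


def map_tasks(durations, n):
    """Assign tasks to n identical processors, longest task first.

    Returns (assignment, makespan), where assignment maps a processor
    index to the list of task indices placed on it.
    """
    # Min-heap of (current load, processor index).
    heap = [(0, p) for p in range(n)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n)}

    # Place each task, longest first, on the least-loaded processor.
    for task, duration in sorted(enumerate(durations), key=lambda t: -t[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(task)
        heapq.heappush(heap, (load + duration, p))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


assignment, makespan = map_tasks([5, 3, 3, 2, 2], n=2)
```

This heuristic is simple and fast but not guaranteed to be optimal in general; it ignores task dependencies and communication costs.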
Torrellas, "Variation-aware application scheduling and power management for chip multiprocessors," ACM SIGARCH Computer Architecture News, vol.
Voltage selection for time-constrained multiprocessor systems on chip.
PHIL EDMONDS, ELEANOR CHU, AND ALAN GEORGE, Dynamic Programming on a Shared-Memory Multiprocessor, Parallel Comput.
In shared-memory multiprocessors with private caches, large cache blocks may also cause false sharing [Lilja 1993], which occurs when two or more processors wish to access different words within the same cache block and at least one of the accesses is a store.
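The definition of false sharing above can be checked mechanically on an access trace. The following is a minimal sketch under stated assumptions: the function name `find_false_sharing`, the trace format, and the 64-byte default block size are all hypothetical, and addresses are assumed to be word-aligned so that different addresses mean different words.

```python
def find_false_sharing(accesses, block_size=64):
    """Return the cache-block indices that exhibit false sharing.

    accesses: list of (processor, byte_address, is_store) tuples.
    A block is flagged when two different processors touch different
    words in it and at least one of the two accesses is a store.
    """
    # Group accesses by the cache block their address falls into.
    by_block = {}
    for proc, addr, is_store in accesses:
        by_block.setdefault(addr // block_size, []).append((proc, addr, is_store))

    flagged = []
    for block, group in sorted(by_block.items()):
        conflict = any(
            p1 != p2 and a1 != a2 and (s1 or s2)
            for i, (p1, a1, s1) in enumerate(group)
            for (p2, a2, s2) in group[i + 1:]
        )
        if conflict:
            flagged.append(block)
    return flagged


trace = [
    (0, 0, True),     # P0 stores to word at byte 0   (block 0)
    (1, 8, False),    # P1 loads the word at byte 8   (block 0)
    (0, 128, True),   # P0 stores to word at byte 128 (block 2)
    (1, 128, True),   # P1 stores to the same word    (block 2)
]
blocks = find_false_sharing(trace)
```

Block 2 is not flagged because both processors touch the same word there, which is true sharing rather than false sharing.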
The multiprocessor capabilities on our high-end HP 9000 technical servers, combined with paralleled industry-leading software applications, provide chemists with unequalled computing resources."
Coping with memory latency is a fundamental challenge in large-scale shared-memory multiprocessors. Part of the problem is the ever-widening gap between processor and memory speeds--a technology trend that is expected to continue.
Parallel, concurrent, and distributed programming required new ways to mediate access to hardware; for example, user-level threads were devised to ameliorate the problems of programming multiprocessor machines.
This design has been used in mainframe multiprocessors for something like two decades.