Distributed & Parallel Processing

Presenter: Professor George Wells

July – August 2011

Description

This course is aimed at introducing the student to the general area of computer science known as distributed and parallel processing. A broad overview of the subject is taken, describing a variety of distributed, parallel, and client-server options, from formal specifications to practical implementations. The content covered is wide-ranging, and relatively shallow, and is intended to build upon a general undergraduate Computer Science knowledge. A special emphasis is placed on affordable asynchronous processing, currently the most prevalent model, and on models applicable to multicore processors, which are of ever-increasing importance.

This course is based closely on versions given in previous years by Dr. Peter Clayton and Dr. Karen Bradshaw.

Introduction

In the past two to three decades, the field of concurrent programming has been a prolific area of research, spurred on by the realization that concurrent algorithms frequently provide a more naturally expressed solution to many problems. Concurrent programming principles are no longer solely the domain of the implementors of operating systems, but are being applied to an ever-increasing range of applications. Several authors affirm that when a concurrent solution is formulated in a sequential programming language, the mapping of the solution onto the sequential program is unnatural, and therefore error-prone, difficult to maintain, and unreliable. Numerous programming languages have been developed for the expression of concurrent algorithms, and many distributed environments exist in which general programming notations can be used to express concurrent behaviour. In addition to the software engineering benefits that modern concurrent programming tools afford applications in which concurrent behaviour is a fundamental aspect of the problem area, they provide for the direct representation of processes which will execute in parallel on multiprocessor hardware. This enables the software designer to obtain increased performance, in terms of speed or reliability, or both.

Coupled to the concept of concurrent processing (many simultaneous activities) is the opportunity of distributed processing (many simultaneous places). This quickly extends from multicore processors and multiprocessor architectures, to local area networks, and to wide area networks. The basic requirements for distributing applications are simple: to distribute application processes across multiple machines/processors/cores, to locate those processes, and to communicate among them. Of course, other services are required, such as: maintaining security of the data; accommodating different networks, operating systems, and data formats; providing transactional support; and accessing often incompatible data sources. To address these tasks on general networks, a proliferation of application development products identify themselves as middleware.
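The three basic requirements above can be sketched in a few lines of Python, with a server thread standing in for a remote machine: the server process is started separately ("distribute"), found by its host name and port number ("locate"), and exchanged with over a TCP socket ("communicate"). The host and port values here are illustrative assumptions, not part of the course material.

```python
# Distribute, locate, and communicate: a minimal single-machine sketch.
import socket
import threading

HOST, PORT = "127.0.0.1", 0  # port 0: let the OS pick a free port

def serve(listener: socket.socket) -> None:
    """Accept one connection and echo the request back, upper-cased."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# "Distribute": run the server as a separate thread of control.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((HOST, PORT))
listener.listen(1)
port = listener.getsockname()[1]  # "Locate": discover the assigned port
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# "Communicate": the client sends a message and reads the reply.
with socket.create_connection((HOST, port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)
print(reply.decode())
```

In a genuinely distributed setting the server would run on another machine, and the "locate" step would involve a naming or directory service rather than a shared variable.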

Because the degree of complexity rises rapidly in systems that are spread about a network, with a great deal of simultaneous activity, there is a strong need for formal software engineering approaches to the design and implementation of distributed and parallel systems.

Some Terminology

Concurrency and Parallelism: Two entities are said to be executing in parallel if at some instant in time both are actually executing. Entities are described as concurrent if they have the potential for executing in parallel. Therefore, programming languages or run-time environments are described as concurrent, rather than parallel. A concurrent programming language will have more than one thread of control, enabling code segments which could execute in parallel to be directly represented.
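The distinction can be illustrated with a short threaded program (a sketch, not part of the original notes): the two threads below are concurrent by construction, but whether they ever execute in parallel depends on how many processors are available at run time.

```python
# Two concurrent threads: they *may* run in parallel on a multiprocessor,
# but the program is correct either way -- concurrency is a property of
# the program, parallelism a property of a particular execution.
import threading

results = {}

def worker(name: str, n: int) -> None:
    """Each thread is a separate thread of control."""
    results[name] = sum(range(n))

t1 = threading.Thread(target=worker, args=("a", 100))
t2 = threading.Thread(target=worker, args=("b", 200))
t1.start(); t2.start()   # both threads are now executable
t1.join(); t2.join()     # wait for both to finish

print(results["a"], results["b"])  # 4950 19900
```

The same source program runs unchanged on one core or many; only the degree of actual parallelism differs.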

Processes, processors, and tasks: We define a task as an operation or a set of operations to be performed, and a process as an instance of the task (these terms are frequently used interchangeably in the literature). The term processing means the execution of an instance of the task. Processors are the agents that carry out processes. In a computer system this is often taken to be an ALU (Arithmetic Logic Unit) or CPU (Central Processing Unit). However, in most modern systems the ability of a single CPU to time-slice between several processes gives rise to the concept of a virtual machine or abstract processor.

After coming into existence, a process's life-history can be defined in terms of three primary states: executing, executable, and suspended. A process is suspended if it is delayed (most synchronization primitives can lead to delay); this state is also known as blocked. If it is not suspended, then a process is either executing, if there is a processor available, or executable (able to execute, but prohibited from doing so by the lack of a processor). Modern languages usually require their compiler to generate a run-time system that manages the queues of suspended and executable processes, and schedules the executable processes when there are not enough processors for all of them. When moving from running one process to running another, the run-time system must also cater for state changes, which it does using a procedure known as a context switch. This involves storing the volatile environment of the current process and restoring the corresponding environment of the process that is due to run.
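The life-history above can be sketched as a small state machine. The event names below (dispatch, preempt, block, wake) are common textbook conventions assumed here for illustration; they are not tied to any particular run-time system.

```python
# A minimal sketch of the three-state process model.
from enum import Enum

class State(Enum):
    EXECUTABLE = "executable"  # able to run, but no processor available
    EXECUTING = "executing"    # currently running on a processor
    SUSPENDED = "suspended"    # blocked, e.g. on a synchronization primitive

# Legal transitions of the model: (current state, event) -> next state
TRANSITIONS = {
    (State.EXECUTABLE, "dispatch"): State.EXECUTING,  # scheduler picks it
    (State.EXECUTING, "preempt"): State.EXECUTABLE,   # time-slice expires
    (State.EXECUTING, "block"): State.SUSPENDED,      # waits on a primitive
    (State.SUSPENDED, "wake"): State.EXECUTABLE,      # delay is over
}

def step(state: State, event: str) -> State:
    """Apply one event; reject transitions the model does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in {state.name}")

s = State.EXECUTABLE
for event in ("dispatch", "block", "wake", "dispatch", "preempt"):
    s = step(s, event)
print(s.name)  # EXECUTABLE
```

Note that a suspended process does not return directly to execution: waking up only makes it executable again, after which the scheduler must dispatch it.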

Classifying Computer Architectures: A popular method of classifying computer architectures, published by Flynn, considers the potential multiplicity of the instruction and data streams of a computer. The instruction stream is the sequence of instructions performed by a computer; the data stream is a sequence of data used in the execution of the instruction stream. Flynn's taxonomy identifies four classes of computers: SISD (Single Instruction stream, Single Data stream), the conventional uniprocessor; SIMD (Single Instruction stream, Multiple Data streams), such as array and vector processors; MISD (Multiple Instruction streams, Single Data stream), a largely theoretical class with few practical examples; and MIMD (Multiple Instruction streams, Multiple Data streams), the class into which most modern parallel and distributed systems fall.

Distributed and parallel processing: As a rather vague working definition, we shall consider a distributed and parallel processing system as one which involves the simultaneous operation of multiple interconnected processors.

Application Development Environments: ADEs also span a wide array of products and services. ADEs generally provide a high-level development language, and usually include tools that facilitate cross-platform applications by accommodating differences in operating environments and user interfaces. Deployment of applications may require additional services such as network communications, application partitioning and distribution services, component location services, management, and cross-platform deployment services. These services may be an integrated part of the ADE, or the ADE may rely on other middleware and communications products.

Object Development Environments: Object development environments are designed for the development of reusable software components. In a distributed environment, the components (objects) usually interact through an object request broker (ORB). When an application is distributed, the ORB handles the requests that one object makes of another object, and provides the mechanism for locating and interacting with objects across the network. ORBs can also interact with and rely on other forms of middleware for application communication and distributed services.

Data Access: Database management systems traditionally house and manage the consistent access to data for an application. However, distributed applications need to access data from numerous back-end sources, often running on different platforms. Data access products allow developers to view disparate data sources in a consistent way. The vast majority of business logic resides in the client application, and database middleware (data passing) is targeted at providing a solution for two-tier architectures where the dataflow across the network to and from a remote database server is in the form of statements.

MOM — Message Oriented Middleware: MOM is an enabling software layer residing between the business applications and the network infrastructure (or between the applications themselves, depending on the implementation) that supports high-performance interoperability of large-scale distributed applications in heterogeneous environments. It supports multiple communication protocols, languages, applications, and hardware and software platforms. MOM refers to the process of distributing data and control through the exchange of messages. It extends process-to-process communication in a distributed environment by providing message-passing or message-queuing models, supporting both synchronous and asynchronous communication. MOM lends itself to event-driven rather than procedural processing. Time-dependent and time-independent processing, as well as memory- and disk-based systems, are all available.
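The message-queuing model can be sketched as follows, with a thread-safe in-process queue standing in for the middleware's message store (an assumption for illustration; a real MOM product queues messages across machines and can persist them to disk). The "STOP" sentinel is likewise an illustrative convention.

```python
# Message queuing: the sender posts messages and continues (asynchronous);
# the receiver takes messages off the queue when it is ready.
import queue
import threading

mailbox: "queue.Queue[str]" = queue.Queue()
received = []

def consumer() -> None:
    """Drain messages until the shutdown sentinel arrives."""
    while True:
        msg = mailbox.get()
        if msg == "STOP":
            break
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()

# The producer is decoupled in time from the consumer: it does not wait
# for each message to be processed before sending the next one.
for msg in ("order-1", "order-2", "order-3"):
    mailbox.put(msg)
mailbox.put("STOP")
t.join()

print(received)  # ['order-1', 'order-2', 'order-3']
```

The decoupling is the key property: the producer never blocks on the consumer's progress, which is what makes MOM suitable for time-independent, event-driven processing.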


Any enquiries concerning the material on these course pages should be directed to George Wells.

Last updated: 25 September 2011