Web computing is a variant of parallel computing where the idle times of PCs
donated by worldwide distributed users are employed to execute parallel
programs. In this thesis, we consider a web computing variant with two
important properties: first, we support the execution of coupled, massively
parallel algorithms (rather than distributed data processing); second, we
organize the system in a peer-to-peer fashion.

We present the Paderborn University BSP-based Web Computing (PUB-Web) library,
which supports the execution of parallel programs in the bulk-synchronous style
(BSP) in such a web computing setting. In this thesis, we focus on important
technical and algorithmic aspects. In particular, in order to schedule
processes with respect to the currently available computing power, which
continually changes in an unpredictable fashion, we need intelligent load
balancing algorithms and -- as a basic precondition -- the technical ability
to migrate threads at runtime.

To achieve the latter in a way suitable for production use, compatible with
recent Java versions, available for all important platforms, and easy to use
for developers, we develop the PadMig thread migration and checkpointing
library.

To tackle the distributed load balancing problem, we present an algorithm
based on Distributed Heterogeneous Hash-Tables (DHHT). In order to judge
the quality of the schedules produced, we perform extensive experiments to
compare several variants of the DHHT-based load balancer with the well-
established Work Stealing algorithm, using realistic input data obtained by
profiling the utilization of several hundred PCs for a period of several
months.

Finally, besides the available computing power, we also consider network
bandwidth as a secondary criterion for load balancing. For this purpose, we
cluster the PUB-Web network according to bandwidth, employing a novel,
fault-tolerant, adaptive, and scalable distributed clustering algorithm called
DiDiC. In order to judge the quality of the clusterings produced by DiDiC,
we experimentally compare it to the well-established MCL algorithm using a
simulator.