# Graduiertenkolleg: Methods for Discrete Structures

Deutsche Forschungsgemeinschaft

# Monday Lecture and Colloquium

Monday, May 3, 2010

Humboldt-Universität zu Berlin
Institut für Informatik
Rudower Chaussee 25,
12489 Berlin
room 4.112

Lecture - 14:15

### Fast $\ell_p$ Regression in a Data Stream (joint work with David Woodruff, IBM Almaden)

Abstract:
Linear regression analysis is a basic statistical method to study linear dependencies between variables in the presence of noise, arising, for example, from experimental measurements. In the standard setting we have one dependent variable $Y$, which is assumed to be linearly dependent from a set of observed variables $X_0,\dots, X_m$. Typically, one assumes that
$$Y= \beta_0 + \beta_1 X_1 + \dots + \beta_m X_m + \epsilon,$$
where $\epsilon$ is a noise variable. The input of a regression problem is a set of $n$ observations $(b_i, a_i)$ of the pair $(Y,X)$. We collect the $a_i$ into a matrix $A$ whose $i$-th row is $(1,a_i)$, and the $b_i$ into a vector $b$. The problem is then to determine the values of the $\beta_i$ that best explain the data. One standard model is $\ell_2$-regression, where $\epsilon$ is assumed to be Gaussian and one maximizes the log-likelihood of the observed data, i.e., one minimizes $\| A\beta -b\|_2$. Other popular variants of regression analysis include $\ell_p$-regression, where one minimizes $\|A\beta-b\|_p$ for some $p$, $1\le p < 2$, with the special case $p=1$ also being called *least absolute deviation* (LAD) regression.
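The $\ell_2$ baseline described above can be sketched in a few lines of NumPy: build the design matrix $A$ with a leading column of ones and solve the least-squares problem $\min_\beta \|A\beta - b\|_2$. This is an offline illustration of the model, not the streaming algorithm of the talk; the data here are synthetic.

```python
import numpy as np

# Synthetic observations of (Y, X) with one observed variable:
# Y = 2 + 3*X + Gaussian noise.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
b = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix A whose i-th row is (1, a_i).
A = np.column_stack([np.ones(n), x])

# Ordinary least squares: minimize ||A*beta - b||_2.
beta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(beta)  # close to [2, 3]
```

For $p=1$ (LAD regression) there is no closed form; it is typically solved as a linear program, which is one reason sampling and sketching techniques are attractive.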

In this talk we will study the $\ell_p$-regression problem, $1 \le p <2$, in the general turnstile model for data streams, i.e., we assume that the stream consists of a sequence of updates to the matrix $A$ and the vector $b$, where each update specifies an entry in $A$ or $b$ and the amount of the update. We show that the heavily overconstrained version of the problem, i.e., $n \gg m$, can be approximated within a factor of $(1+\epsilon)$ in $\mathrm{poly}(m, 1/\epsilon, \log n)$ space and time per update.
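To make the turnstile model concrete, here is a toy sketch (format of the update triples is my own illustration, not from the talk): each update names an entry of $A$ or $b$ and an additive change, entries may be updated several times, and updates may be negative. A streaming algorithm would of course maintain only a small sketch rather than $A$ and $b$ explicitly, as done here.

```python
import numpy as np

n, m = 4, 2
A = np.zeros((n, m + 1))  # n rows, columns (1, X_1, ..., X_m)
b = np.zeros(n)

# Hypothetical turnstile stream: (target, index, delta).
stream = [
    ('A', (0, 0), 1.0), ('A', (0, 1), 2.5), ('b', 0, 3.0),
    ('A', (1, 1), -1.0), ('b', 1, 0.5),
    ('A', (0, 1), 0.5),   # second additive update to the same entry
]

for target, idx, delta in stream:
    if target == 'A':
        A[idx] += delta
    else:
        b[idx] += delta

print(A[0], b[0])  # entry A[0,1] has accumulated 2.5 + 0.5 = 3.0
```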

Colloquium - 16:00

### SRPT is 1.86-competitive for Completion Time Scheduling

Abstract:
We consider classical multiprocessor scheduling, where preemptible jobs arrive over time and the objective is to minimize the total completion time. It is well known that the SRPT (Shortest Remaining Processing Time) algorithm is 2-competitive, but the bound was not further improved for 15 years. We break the barrier of 2 by showing that SRPT is 1.86-competitive.
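The SRPT rule itself is simple to state: at every moment, run the released, unfinished jobs with the smallest remaining processing time on the available machines, preempting if necessary. A minimal discrete-time simulation (my own illustration, assuming integer release and processing times) computes the total completion time it achieves:

```python
def srpt_total_completion(jobs, machines):
    """SRPT in unit time steps.

    jobs: list of (release_time, processing_time) with integer values.
    At each step, run up to `machines` released jobs with the smallest
    remaining processing time. Returns the sum of completion times.
    """
    remaining = {i: p for i, (r, p) in enumerate(jobs)}
    total, t = 0, 0
    while remaining:
        # Released, unfinished jobs, shortest remaining time first.
        avail = sorted((rem, i) for i, rem in remaining.items()
                       if jobs[i][0] <= t)
        for rem, i in avail[:machines]:
            remaining[i] -= 1
            if remaining[i] == 0:
                del remaining[i]
                total += t + 1  # job i completes at time t+1
        t += 1
    return total

# One machine, jobs (release, length): the length-1 jobs preempt/precede
# the length-3 job, which finishes last.
print(srpt_total_completion([(0, 3), (0, 1), (2, 1)], machines=1))  # → 9
```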
Joint work with Christine Chung and Tim Nonner (SODA 2010).

Last updated: 26.04.2010