# Graduiertenkolleg: Methods for Discrete Structures

Deutsche Forschungsgemeinschaft

# Monday Lecture and Colloquium

Monday, May 3, 2010

Humboldt-Universität zu Berlin
Institut für Informatik
Rudower Chaussee 25,
12489 Berlin
room 4.112

Lecture - 14:15

### Fast $\ell_p$ Regression in a Data Stream (joint work with David Woodruff, IBM Almaden)

Abstract:
Linear regression analysis is a basic statistical method to study linear dependencies between variables in the presence of noise, arising, for example, from experimental measurements. In the standard setting we have one dependent variable $Y$, which is assumed to be linearly dependent from a set of observed variables $X_0,\dots, X_m$. Typically, one assumes that
$$Y= \beta_0 + \beta_1 X_1 + \dots + \beta_m X_m + \epsilon,$$
where $\epsilon$ is a noise variable. The input of a regression problem is a set of $n$ observations $(b_i, a_i)$ of the pair $(Y,X)$. We collect the $a_i$ into a matrix $A$ whose $i$-th row is $(1,a_i)$, and the $b_i$ into a vector $b$. The problem is then to determine the values of the $\beta_i$ that best explain the data. One standard model is $\ell_2$-regression, where $\epsilon$ is assumed to be Gaussian and one maximizes the log-likelihood of the observed data, i.e., one minimizes $\| A\beta -b\|_2$. Other popular variants of regression analysis include $\ell_p$-regression, where one minimizes $\|A\beta-b\|_p$ for some $p$, $1\le p < 2$, with the special case $p=1$ also being called *least absolute deviation* (LAD) regression.
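The $\ell_2$ baseline described above can be sketched in a few lines of NumPy: build the design matrix $A$ with a leading column of ones and solve the least-squares problem $\min_\beta \|A\beta - b\|_2$. This is an offline illustration of the model, not the streaming algorithm of the talk; the data here are synthetic.

```python
import numpy as np

# Synthetic observations of (Y, X) with one observed variable:
# Y = 2 + 3*X + Gaussian noise.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
b = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix A whose i-th row is (1, a_i).
A = np.column_stack([np.ones(n), x])

# Ordinary least squares: minimize ||A*beta - b||_2.
beta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(beta)  # close to [2, 3]
```

For $p=1$ (LAD regression) there is no closed form; it is typically solved as a linear program, which is one reason sampling and sketching techniques are attractive.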

In this talk we will study the $\ell_p$-regression problem, $1 \le p <2$, in the general turnstile model for data streams, i.e., we assume that the stream consists of a sequence of updates to the matrix $A$ and the vector $b$, where each update specifies an entry in $A$ or $b$ and the amount of the update. We show that the heavily overconstrained version of the problem, i.e., $n \gg m$, can be approximated within a factor of $(1+\epsilon)$ in $\mathrm{poly}(m, 1/\epsilon, \log n)$ space and time per update.
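To make the turnstile model concrete, here is a toy sketch (format of the update triples is my own illustration, not from the talk): each update names an entry of $A$ or $b$ and an additive change, entries may be updated several times, and updates may be negative. A streaming algorithm would of course maintain only a small sketch rather than $A$ and $b$ explicitly, as done here.

```python
import numpy as np

n, m = 4, 2
A = np.zeros((n, m + 1))  # n rows, columns (1, X_1, ..., X_m)
b = np.zeros(n)

# Hypothetical turnstile stream: (target, index, delta).
stream = [
    ('A', (0, 0), 1.0), ('A', (0, 1), 2.5), ('b', 0, 3.0),
    ('A', (1, 1), -1.0), ('b', 1, 0.5),
    ('A', (0, 1), 0.5),   # second additive update to the same entry
]

for target, idx, delta in stream:
    if target == 'A':
        A[idx] += delta
    else:
        b[idx] += delta

print(A[0], b[0])  # entry A[0,1] has accumulated 2.5 + 0.5 = 3.0
```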

Colloquium - 16:00

### SRPT is 1.86-competitive for Completion Time Scheduling

Abstract:
We consider classical multiprocessor scheduling, where preemptible jobs arrive over time and the objective is to minimize the total completion time. It is well known that the SRPT (Shortest Remaining Processing Time) algorithm is 2-competitive, but the bound was not further improved for 15 years. We break the barrier of 2 by showing that SRPT is 1.86-competitive.
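The SRPT rule itself is simple to state: at every moment, run the released, unfinished jobs with the smallest remaining processing time on the available machines, preempting if necessary. A minimal discrete-time simulation (my own illustration, assuming integer release and processing times) computes the total completion time it achieves:

```python
def srpt_total_completion(jobs, machines):
    """SRPT in unit time steps.

    jobs: list of (release_time, processing_time) with integer values.
    At each step, run up to `machines` released jobs with the smallest
    remaining processing time. Returns the sum of completion times.
    """
    remaining = {i: p for i, (r, p) in enumerate(jobs)}
    total, t = 0, 0
    while remaining:
        # Released, unfinished jobs, shortest remaining time first.
        avail = sorted((rem, i) for i, rem in remaining.items()
                       if jobs[i][0] <= t)
        for rem, i in avail[:machines]:
            remaining[i] -= 1
            if remaining[i] == 0:
                del remaining[i]
                total += t + 1  # job i completes at time t+1
        t += 1
    return total

# One machine, jobs (release, length): the length-1 jobs preempt/precede
# the length-3 job, which finishes last.
print(srpt_total_completion([(0, 3), (0, 1), (2, 1)], machines=1))  # → 9
```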
Joint work with Christine Chung and Tim Nonner (SODA 2010).

Last updated: 26.04.2010