HPCViz seminar March 20, Don Estep

Don Estep will be visiting the CTL group on March 20th and will give a seminar that day. The topic is “Stochastic Inverse Problems for Parameter Determination”; see the abstract below. The seminar runs from 14:00 to 15:00 in the Visualisation Studio on the fourth floor of the D-building.

Don Estep

Abstract. A mathematical model of a physical system determines a map between the parameter and data values characterizing the properties of a particular system and the output quantities describing the behavior of the system. In many cases, we can make observations of the behavior of the system, but the parameter and data values cannot be observed directly. This raises the inverse problem of determining the possible parameter/data values that correspond to given observations. A defining characteristic of this inverse problem is that the solutions are set-valued or equivalence classes, since in general multiple parameter values can yield the same output value. Moreover, since observation data generally has a stochastic nature, the solution of the inverse problem is described as a probability measure.

We describe recent work on the formulation, solution, and uncertainty quantification for the parameter identification problem. This new approach has two stages: a systematic way to approximate set-valued solutions of inverse problems and the use of measure theoretic techniques for approximating probability distributions in parameter space. We also carry out an error analysis for uncertainty quantification. We will also describe some current work and the relation to other inverse problems such as data assimilation.

Anders Ynnerman’s affiliated professorship renewed

The president of KTH has decided to renew Anders Ynnerman’s affiliated professorship in scientific visualisation at the school of CSC until the end of 2016.


Previously, professor Ynnerman was affiliated with CSC between 2007 and 2011, in the same area. Anders Ynnerman is the director of the Visualization Center C in Linköping and is a professor in scientific visualisation at Linköping University, where he has held a chair since 1999. He is also one of the co-founders of the Center for Medical Image Science and Visualization (CMIV).

HPCViz seminar, November 13

On November 13, at 9:00, there will be another entry in the HPCViz seminar series, once again held in the visualization studio (room 4450, fourth floor of the D-building). Laeeq Ahmed will talk about Parallel Virtual Screening using MapReduce. It is a one-hour lecture.



Drug discovery is the process of screening a large number of chemical libraries to find new medicines. Due to the huge size of chemical libraries, traditional screening is time-consuming and costly. With advancements in computer technology, virtual screening is performed using machine learning techniques to filter large collections of chemical structures. The support vector machine (SVM) is one of the best-known machine learning techniques for classification and regression analysis. In this work we developed a parallel version of SVM-based virtual screening using Spark, an iterative MapReduce programming model, to further reduce the filtering time and thus the cost. I will first introduce Spark and its usage in a cluster environment, and then discuss the case study of parallel SVM-based virtual screening.


HPCViz seminar, Sept 23 (11:00 – 12:00) in D3

On September 23 there will be a guest lecture in the HPCViz seminar series, held in the D3 lecture hall between 11:00-12:00. We will be visited by John Wilkes of Google, who will speak on cluster management at Google.

Cluster management at Google

Cluster management is the term that Google uses to describe how we control the computing infrastructure in our datacenters that supports almost all of our external services. It includes allocating resources to different applications on our fleet of computers, looking after software installations and hardware, monitoring, and many other things. My goal is to present an overview of some of these systems, introduce Omega, the new cluster-manager tool we are building, and present some of the challenges that we’re facing along the way. Many of these challenges represent research opportunities, so I’ll spend the majority of the time discussing those.

John Wilkes

Short bio:
John Wilkes has been at Google since 2008, where he is working on cluster management and infrastructure services. He is interested in far too many aspects of distributed systems, but a recurring theme has been technologies that allow systems to manage themselves. In his spare time he continues, stubbornly, trying to learn how to blow glass.

John will be around for individual discussions after the talk – if interested please inform erwinl@pdc.kth.se

HPCViz seminar, 26 sept (11.00 – 12.00)

Internet of sports – combining sports and computer science into cool services

The talk will be divided into two parts: first, the area of internet of sports and its business opportunities; second, an example of a service, MySkiLab, for movement analysis of cross-country skiing in the field. The service has been developed in close collaboration with the Swedish ski team and leading researchers in integrative physiology and biomechanics. The competences needed to realize such services are machine learning, statistics, user experience, big data analytics, and client and server programming. The application has now been released in a beta version.


Dr Christer Norström

Dr Christer Norström is CEO of SICS, the Swedish Institute of Computer Science. He received his PhD from KTH in 1997. He has extensive experience from industry, both as an engineer and manager within ABB and as an expert consultant for Swedish and international companies.

Today his interest is in internet of things and especially its usage within well-being and sport. Christer loves cross country skiing. He is a youth trainer in cross country skiing, author of the waxing book Vallaguiden and the entrepreneur behind the movement analysis tool MySkiLab.

HPCViz seminar, Sept 25 (11.00 – 12.00)

Spark Streaming: Fault-tolerant Streaming Computation at Scale

Matei Zaharia, UC Berkeley

September 25, 2013, 11:00 – 12:00 in the Visualization Studio

Many “big data” applications need to act on data arriving in real time. Running these applications at ever-larger scales requires parallel execution platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not handle stragglers. We propose a new processing model, discretized streams (D-streams), that overcomes these challenges. D-streams support a parallel recovery mechanism that improves efficiency over the traditional replication and upstream backup schemes in streaming databases, and also handles stragglers. We show that D-streams can support a rich set of streaming operators while attaining high per-node throughput similar to single-node systems, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery. Finally, the D-stream model can seamlessly be composed with batch and interactive query models for clusters (e.g. MapReduce), enabling rich applications that combine these modes. We have implemented D-streams in Spark Streaming, an extension to the Spark cluster computing framework.

Logistic Regression: This is an iterative machine learning algorithm that seeks to find the best hyperplane that separates two sets of points in a multi-dimensional feature space. It can be used to classify messages into spam vs non-spam, for example.
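As a rough illustration of the iterative procedure described above, here is a minimal single-machine sketch in plain Python. It is not the Spark implementation from the talk: the function names, the toy data, the learning rate, and the iteration count are all assumptions chosen for this example. Each iteration computes the gradient of the log-loss over all points and moves the hyperplane parameters a small step against it.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(points, labels, iterations=3000, learning_rate=0.2):
    """Fit hyperplane weights w and bias b by batch gradient descent.

    Hypothetical helper for illustration; labels must be 0 or 1.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(iterations):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(points, labels):
            # Prediction error for this point under the current hyperplane.
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the averaged gradient of the log-loss.
        w = [wi - learning_rate * g / n for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def predict(w, b, x):
    # The sign of w.x + b says which side of the hyperplane x lies on.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


# Toy, linearly separable data (made up for this sketch).
points = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (3.0, 3.0), (3.5, 4.0), (4.0, 3.5)]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(points, labels)
predictions = [predict(w, b, x) for x in points]
```

In a cluster setting the inner loop over points is the part that would be distributed, since each point contributes to the gradient independently.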

Matei Zaharia

Matei Zaharia is finishing his PhD at UC Berkeley, where he worked with Scott Shenker and Ion Stoica on topics in large-scale data processing and cloud computing. After Berkeley, he will be starting an assistant professor position at MIT. During his PhD, Matei has also been an active open source contributor, becoming a committer on the Apache Hadoop project and starting the Mesos and Spark projects.

HPCViz seminar series continues on May 29th at 9:00 in the visualisation studio


Mikael Vejdemo Johansson will give part two of his seminar on “Topological Data Analysis – applying homology in medicine, robotics, sensor networks, and graphics”; see the abstract from part one in the March 20 announcement below.

We will continue to look at the foundations of these generalized applied topology methods in some detail, and see how they have been applied in the past.

Xavi Aguilar will hold his seminar, “Parallel Performance: tools and techniques”.

Abstract: HPC systems are becoming more heterogeneous, with higher levels of concurrency and tighter energy constraints. In such scenarios, the gap between system peak performance and real application performance is widening due to the complexity of the platform and its programmability. Thus, tools and runtime systems play a major role in achieving good application performance and scalability.

In this seminar we will give an overview of the state of the art in performance analysis, together with a success story from the Scalalife project. There, a performance analysis study was used to redesign a chemistry application (DALTON), obtaining a considerable increase in its scalability.


We will also look at different issues that tools face to reach the exascale era and the current research performed within the department in performance analysis.

HPCViz seminar, March 20 (9:00-11:00)

Christopher Peters, HPCViz

Title: Interacting with Virtual Embodied Agents: Computation and Evaluation

Mikael Vejdemo Johansson, CVAP

Title: Topological Data Analysis – applying homology in medicine, robotics, sensor networks, and graphics

Abstract: In recent decades, computation and data analysis techniques have matured to the point where our entire society runs on machine learning and data analysis – the commercials you see are generated from analyses of your shopping behavior, your travel is optimized with data collected from past travel intensities, the medical care you receive is optimized by data-intensive studies of various kinds. Data becomes available at high volume and high speed, and techniques to deal with data grow by leaps and bounds.

In the past decade, various approaches to data analysis rooted in algebraic topology have gained traction as a vital research field.

Already clustering – a powerful family of methods with ubiquitous application – is an essentially topological technique, and generalizations are increasingly useful.

We shall look at the foundations of these generalized applied topology methods in some detail, and see how they have been applied in the past.

Along the journey, we shall meet classifications of breast cancer types, statistics of naturally occurring images, approaches to understanding how languages encode color, and methods for understanding periodicity in complex systems.

HPCViz seminar, December 5, 9.00 – 11.00

The next HPCViz seminar will be held on Wednesday, December 5, starting at 9:00 in the VIC studio. We are happy to announce the following speakers: Rossen Apostolov and Ali Gholami. See below for more information.

See you there!

Rossen Apostolov

Title: Ensemble Computing and Markov State Models in Molecular Dynamics Simulations

Molecular Dynamics (MD) simulations are extensively used for atomistic-level studies of structural changes in molecules, inter-molecular interactions, phase transitions, etc. A single MD simulation follows the time evolution of a stochastic system and is in practice unable to fully explore all possible states of the system, as needed for achieving statistical significance. The system can reach meta-stable states that it cannot escape, or some events of interest may occur on timescales far beyond what is achievable with current methodology. In addition, most supercomputers nowadays have capabilities far beyond the hard-scaling limits of many bio-molecular systems of interest.

Being inherently embarrassingly parallel, ensemble computing presents opportunities for improving the sampling of system states by staging thousands of system replicas to run in parallel. However, extracting meaningful information from the vast amount of resulting trajectory data is not an easy task. Markov State Models (MSM) generate kinetic models based on collected large-scale simulation data. They can help greatly in understanding the evolution of the investigated system, while making it possible to reach timescales larger than those of the individual simulation runs.
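To make the idea of building a kinetic model from many short runs concrete, here is a minimal sketch in plain Python. It assumes the trajectories have already been discretized into integer state labels, which is itself a substantial step not shown here; the function name, the toy trajectories, and the lag time are hypothetical and are not taken from the talk. The sketch counts observed transitions across all replicas and row-normalizes the counts into a transition probability matrix.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from discrete trajectories.

    Hypothetical helper for illustration: counts transitions i -> j observed
    `lag` steps apart in every trajectory, then normalizes each row.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # States that were never left keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Two short replicas over two states (made-up data).
trajectories = [
    [0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
]

T = estimate_transition_matrix(trajectories, n_states=2)
```

Because the counts are pooled over all replicas, many short independent runs contribute to one model, which is what lets the model describe timescales longer than any single run.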


Ali Gholami

Title: Security of the Biobank Cloud


Recent improvements in the cost and throughput of DNA sequencing machines have caused a mismatch between the increasing rate at which they can generate genome data and the ability of existing tools and computational infrastructure to both store and analyze this data. Biobank Cloud, a new project in the context of 7FP, aims to address the shortage of storage and computational resources through cloud computing service models. However, EU directives on data protection hinder users and organizations from exploiting the capabilities of cloud computing. In this seminar, we present the Biobank Cloud security concerns and the factors that need to be addressed at both the interface and internal levels. In addition, we discuss authentication, authorization and auditing mechanisms that are proven in large-scale distributed systems.