PhD Thesis

Networks are a convenient and flexible representation for many datasets. Network data often concerns interactions or relations. These can be relations between people, such as friendship; connections between neurons in the brain; or simply the notion of two things being similar to each other. Interactions between proteins, drugs, etc. can also be considered as a network.

Given a network, there are several machine learning questions that can be asked. What sets these machine learning problems apart from traditional graph theory is that they involve some kind of inference. The goal is to look beyond the network at hand, and discover something about the underlying structure. Of course, this relies on the assumption that there is a certain underlying structure. If this assumption is wrong, then it may be impossible to say anything about the network. Additionally, we can never be certain about the results, and have to settle for results that are very likely.

In this thesis we look at two such machine learning problems on networks.

In part 1 we look at finding clusters in networks. Clustering is a prototypical unsupervised machine learning problem. The goal is to split a dataset into clusters such that the elements within each cluster are similar in some sense. In our setting, the elements of the dataset are the nodes in a network, and similarity is defined by the edges of that network.

In part 2 we look at predicting links in bipartite networks. A bipartite network is one where the nodes can be split into two sets, and the only edges are between nodes in different sets. Many biological networks take this form, in particular the interaction network of drugs and target proteins, which we consider in this part of the thesis.
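
The definition of a bipartite network given above can be made concrete with a small check. The following is only a minimal sketch in Python: the function name `is_valid_bipartition` and the toy drug and protein names are hypothetical, chosen for illustration, and are not taken from the thesis.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_valid_bipartition(
    edges: Iterable[Edge], left: Set[Hashable], right: Set[Hashable]
) -> bool:
    """Return True if every edge joins a node in `left` to a node in `right`."""
    # The two node sets must not share any node.
    if left & right:
        return False
    for u, v in edges:
        crosses = (u in left and v in right) or (u in right and v in left)
        if not crosses:
            return False
    return True


# Hypothetical toy data: two drugs, two target proteins, three interactions.
drugs = {"drug_a", "drug_b"}
targets = {"protein_x", "protein_y"}
interactions = [
    ("drug_a", "protein_x"),
    ("drug_b", "protein_x"),
    ("drug_b", "protein_y"),
]

print(is_valid_bipartition(interactions, drugs, targets))
# Adding an edge between two drugs violates the bipartite structure.
print(is_valid_bipartition(interactions + [("drug_a", "drug_b")], drugs, targets))
```

In this representation, predicting a link amounts to scoring drug and target pairs that are not yet in the edge list; how such scores are computed is outside the scope of this sketch.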