MSc Defences Summer 2024 (2024)

Department of Computer Science

See the list of MSc defences at DIKU this summer (incl. September and October). The list will be continuously updated.

Information about the thesis, supervisor, location of the defence, etc. can be found on the respective events below.

Computer Science

Name of student(s)

Jingyi Zheng and Yaokun Li

Study Programme

Computer Science Thesis

Title

MobileDFL: Embracing Heterogeneity and Dynamism for Decentralized Federated Learning in Mobile Networks

Abstract

The exponential growth of mobile devices has led to abundant data for training AI models, but it has also raised privacy concerns. Decentralized federated learning (DFL) has emerged as a solution that preserves user privacy and avoids single points of failure. However, DFL places higher demands on nodes, requiring increased communication and storage resources compared to conventional federated learning. The heterogeneity of mobile networks introduces nodes with varying capabilities, and some may struggle with resource-intensive requirements. Moreover, the dynamic nature of mobile networks can lead to ineffective communication. To address these inefficiencies, we propose MobileDFL, which integrates the advantages of both Client-Server and Peer-to-Peer architectures, enabling each node to select the optimal strategy in heterogeneous and dynamic mobile networks. We also discuss incentive strategies to reward nodes undertaking greater communication and storage costs. Our experiments with different model complexities show that MobileDFL successfully mitigates the communication and storage overhead of nodes while ensuring that model convergence is not affected by node instability.

Supervisor(s)

Xikun Jiang

External examiner(s)

Hua Lu

Date and time

June 6 2024, 13:00-14:00

Room

UP1 771-01-2-16

Name of student(s)

Qiongyan Wang

Study Programme

Computer Science Thesis

Title

DRL4DPRA: A Deep Reinforcement Learning Framework for Dynamic Public Resource Allocation

Abstract

The goal of allocating public resources, such as billboards, surveillance cameras, base stations, and trash bins, is to cater to a larger population. However, the uneven distribution of people across spatial and temporal domains is influenced by the dynamic patterns of human mobility. To address this, we introduce the Hierarchical Spatial Temporal Network (HSTNet) for dynamic public resource allocation tasks. Based on the reinforcement learning framework, our model can learn from experience without setting complex rules. It operates in two key stages: first, we capture the spatiotemporal characteristics of crowd flow; second, we use a hierarchical selection method to reduce the action space. We evaluate HSTNet’s performance using two real-world crowd flow datasets, demonstrating its superiority over baseline models.

Supervisor(s)

Xikun Jiang

External examiner(s)

Hua Lu

Date and time

June 6 2024, 14:00-15:00

Room

UP1 771-01-2-16

Name of student(s)

He Lyu

Study Programme

Computer Science Thesis

Title

Securing privacy in reinforcement learning through zero-knowledge proofs

Abstract

Reinforcement learning (RL) is widely used in real-world applications, but deploying it in areas that need high privacy and transparency can be challenging. This is due to the difficulty of balancing algorithm verification (which includes understanding the inputs, outputs, and processes used) against maintaining the secrecy of data and algorithm parameters. This study examines how privacy, verifiability, and RL interact, focusing on Markov Decision Processes (MDPs) that use the stochastic gradient ascent (SGA) algorithm. We have developed a new algorithm that uses zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs) to keep data and algorithm parameters private within RL tasks. This helps protect the privacy of data and parameters while ensuring the RL process is transparent and reliable. Our extensive testing shows that this new method keeps data and parameters secure and performs better than traditional RL algorithms in some situations. Additionally, the proofs it generates are small and only require linear time to verify, making it suitable for large-scale use. This development is a major step forward in improving data privacy in complex decision-making and offers a practical solution for privacy-sensitive environments.

Supervisor(s)

Xikun Jiang

External examiner(s)

Hua Lu

Date and time

June 6 2024, 15:00-16:00

Room

UP1 771-01-2-16

Name of student(s)

Jakob Krogh Petersen and Johan Valdemar Licht

Study Programme

Computer Science Thesis

Title

Contrastive Language-Image Pre-Training In Three-Dimensional Space

Abstract

Current state-of-the-art medical image analysis methods require dense data annotated by experts. The scalability of such methods is constrained by the time-consuming and expensive nature of such annotations. In medicine, however, it is a common practice to pair visual artifacts with descriptive textual reports. Recently the use of natural language to supervise 2D visual representation learning with contrastive learning has revolutionized modern vision learning. In this thesis, we propose a contrastive language-image pre-training method for 3-dimensional images, with a focus on retaining the heuristics that lead to impressive results for 2-dimensional images. To the best of our knowledge, this is the first attempt at applying CLIP to 3D medical data. We further investigate how batch sizes affect performance stability on a downstream task. The method is evaluated based on the task of predicting the location of a lesion on brain magnetic resonance imaging (MRI) scans. The results show that natural language can guide contrastive learning of 3D visual representations and an alignment across modalities in a joint embedding space can be learned. This thesis serves as a proof of concept, demonstrating the feasibility of learning a joint embedding space for 3D images and texts.

Supervisor(s)

Mads Nielsen

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 7 2024, 10:00-11:00

Room

ØV 3 (The Pioneer Centre)

Name of student(s)

Aljaz Jazbec

Study Programme

Computer Science Thesis

Title

Targeted Security Analysis of ARYZE's platform

Abstract

In recent years, there has been an increase in the number of successful attacks on smart contracts. Therefore, in this project, we perform a targeted security analysis of ARYZE’s platform. Specifically, we analyze the platform’s smart contract implementation of the ERC20 token eEUR, published on Ethereum Mainnet, in three different ways: manual static analysis, analysis of the output of an automatic vulnerability scanner, and formal verification. The whole analysis is based on the customer’s requirements gathered during the requirements elicitation phase. We use Slither as the automatic vulnerability scanner and the F* proof-oriented language for the formal verification process.

Supervisor(s)

Boris Düdder

External examiner(s)

Pavel Hruby

Date and time

June 7 2024, 14:00-15:00

Room

UP1-2-015

Name of student(s)

Christopher Jung

Study Programme

Computer Science Thesis

Title

C-SPEC: Formal Specification for Blockchain-based Crowdsourcing Systems

Abstract

The digital crowdsourcing paradigm has revolutionized modern outsourcing through its highly systematic approach to task coordination. Researchers in Blockchain technologies are currently investigating the efficacy of Ethereum-based smart contracts as drivers of the crowdsourcing process, emphasizing the analysis of Truth Discovery (TD) techniques, data encryption, and system design. However, a shortage of standardized frameworks impedes the progress of research, as each new implementation necessitates the development of a novel system architecture. To resolve this shortcoming, we introduce a Blockchain-based formal specification known as C-SPEC, which serves as a generalized framework for smart contract-based crowdsourcing systems. This thesis offers a comprehensive overview of the C-SPEC protocol, expanding upon the internal structure, behaviors, and interactivity of all associated system components. Furthermore, it summarizes the use of formal verification techniques to ascertain the protocol’s correctness and desired qualities. Finally, it provides implementation guidelines and an extensive assessment of the protocol’s effectiveness in the context of Blockchain-based crowdsourcing.

Supervisor(s)

Boris Düdder

External examiner(s)

Pavel Hruby

Date and time

June 7 2024, 13:00-14:00

Room

UP1-2-015

Name of student(s)

Sofie Sylvest Aastrup

Study Programme

Computer Science Thesis

Title

Fiber Break Segmentation in Composite Materials

Abstract

Composite materials have many applications and are used in buildings, bridges, and vehicles. We wish to understand how damage propagates in these materials. Fiber breaks form within composite materials when tensile loads are applied to them. Segmentations of these fiber breaks are often needed to study the damage propagation within the material, but such segmentations can be hard to obtain in some materials. In this study, we utilize interactive machine learning to create a pipeline for fiber break segmentation in synchrotron radiation computed tomography images. This method does not provide a single model for fiber break segmentation, but a framework that can be used to obtain new models in new scenarios. We use the open-source IML software RootPainter3D [Smith et al., 2021] and adapt it for composite materials. We propose to diminish the cold start problem by including preliminary segmentations and present an algorithm to obtain preliminary segmentations of fiber breaks. Our method was successful and allowed us to obtain fiber break segmentations with a Dice score of up to 0.95 during the interactive annotation procedure. We perform user tests of the pipeline and get a usability SUS score of 0.85. Current models fail to predict fiber break development accurately. We study the hypothesis that the microstructure of the material influences the development, by using the SRCT in-situ images and the labels obtained with IML to predict fiber breakage at later time steps. We did not find conclusive evidence that microstructure influences fiber break formation.
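
For readers unfamiliar with the Dice score mentioned in this abstract, the following is a minimal sketch of how it is commonly computed for two binary masks. It is an illustration only and not code from the thesis; the function name `dice_score` and the toy masks are assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly by convention.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical data): 2 overlapping voxels, 3 foreground voxels in each mask.
predicted = [0, 1, 1, 1, 0, 0]
annotated = [0, 1, 1, 0, 0, 1]
score = dice_score(predicted, annotated)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the predicted and annotated masks coincide exactly, and 0.0 means they share no foreground voxels.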

Supervisor(s)

Abraham George Smith

External examiner(s)

Melih Kandemir

Date and time

June 10 2024, 10:30-11:30

Room

DIKU UP1-2-0-15

Name of student(s)

Alibek Cholponbaev

Study Programme

Computer Science Thesis

Title

AsyncDB: a database system for asynchronous application architectures

Abstract

Software development is steadily trending towards asynchronous and concurrent programming, focusing on better exploitation of hardware in monolithic architectures and on decoupled systems for better scalability and availability. While some domains are inherently straightforward to decouple without impacting the correctness of the system, domains such as finance and e-commerce, which require transactional logic, cannot enjoy the benefits of concurrency to their full extent. The issue lies in inadequate abstractions for communicating with databases, which do not allow developers to exercise concurrency while maintaining the transactional properties of application logic.
In this project, we introduce AsyncDB, a database system that supports intra-transactional concurrency, or concurrency of operations within transactions. Developers have a flexible API that allows them to submit queries asynchronously and concurrently and wait for the results of one or many queries with a barrier pattern. Application scenarios that suit AsyncDB involve workflow-like business processes, where multiple steps of the process can be parallelized without the risk of data races within a transaction. AsyncDB provides ACID compliance of transactions, provided the developers do not introduce data races within transactions. We modelled our micro-benchmark on a real-life system example, as it is a better fit for the system. In some environments, such as those with high I/O latency in multi-tier architectures with decoupled storage layers, intra-transactional concurrency can offer performance benefits regarding latency and throughput of transactions.
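
The barrier pattern described in this abstract can be illustrated with a small sketch using Python's standard `asyncio` module. This is not the AsyncDB API, which the abstract does not specify; `run_query`, `transaction`, and the step names are hypothetical stand-ins chosen for this example.

```python
import asyncio


async def run_query(name, delay):
    """Hypothetical stand-in for a query submitted asynchronously to a database."""
    await asyncio.sleep(delay)  # simulates I/O latency
    return f"{name}: done"


async def transaction():
    # Submit independent steps concurrently; they must not touch the same data.
    pending = [
        asyncio.create_task(run_query("reserve_stock", 0.02)),
        asyncio.create_task(run_query("charge_payment", 0.01)),
    ]
    # Barrier: wait until every submitted query has finished.
    results = await asyncio.gather(*pending)
    # A dependent step only starts after the barrier.
    results.append(await run_query("confirm_order", 0.0))
    return results


results = asyncio.run(transaction())
```

The two independent steps overlap in time, so the barrier is reached after roughly the slower of the two rather than after their sum, which is where the latency benefit in high I/O latency settings would come from.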

Supervisor(s)

Yongluan Zhou

External examiner(s)

Alceste Scalas

Date and time

June 11 2024, 15:00-16:00

Room

UP1 01-2-10

Name of student(s)

Sigurður Kalman Oddsson

Study Programme

Computer Science Thesis

Title

Mapping Relational Benchmarks to Actor Systems

Abstract

In the last few decades, we have witnessed an unprecedented growth in computing power and a grand rise in accessible cloud infrastructure. The need for an efficient, parallel programming model that can take advantage of these advancements is at an all-time high. The actor model, which is specifically designed for high concurrency and distribution, is an excellent contender. The paradigm has already demonstrated its prowess in a number of fields and has recently caught the interest of database enthusiasts. Despite this, many industry-standard benchmarks for on-line transaction processing (OLTP) systems are tailored to relational schemas. Translating these relational schemas to actor systems, whilst still adhering to relevant specifications, is non-trivial. Questions about the correlation between actor composition and performance remain largely unanswered.
In this thesis, I explore strategies to map the schemas of Transaction Processing Performance Council Benchmark C (TPC-C) to actor systems. I present and reason about twelve possible implementation strategies that I follow in two different actor environments: Microsoft Orleans and Snapper. Finally, I benchmark all implementations, demonstrating the efficacy of coarse data partitioning in actors. I conclude by discussing the results and contrasting them with relevant literature.

Supervisor(s)

Yongluan Zhou

External examiner(s)

Alceste Scalas

Date and time

June 11 2024, 16:00-17:00

Room

UP1 01-2-10

Name of student(s)

Yixuan Chen and Guodong Shi

Study Programme

Computer Science Thesis

Title

Toy with their shapes like a puppeteer - An exploration in the potential of elliptical shape-changing mechanisms

Abstract

Shape-changing devices are increasingly popular for their adaptive and responsive abilities. Geometric shapes like circles benefit from parametric design methods and have been applied in designs that perform shape-changing. But compared with circles, ellipses, which are able to perform non-symmetric shape-changing, are rarely discussed in this field. Therefore, we came up with the idea of stacked ellipses that can simulate objects such as ergonomic grips, VR proxies, etc., and perform parametric shape-changing. We propose two models that can perform such actions: the Shell model and the Tubes model. We have explored the influence of parameters on the shape-changing performance of our two models, both physically and geometrically. We have also verified our theory based on geometric properties of the ellipse, and discussed some real-life application scenarios that are based on our models. Our contribution is a new potential solution for manufacturing shape-changing devices based on elliptical structures, and a demonstration that elliptical shape-changing is potentially useful in real-life scenarios.

Supervisor(s)

Valkyrie Savage

External examiner(s)

Anca-Simona Horvath

Date and time

June 12 2024, 15:00-16:00

Room

Sigurdsgade 41, conference room 0-11

Name of student(s)

Anja Vrecer

Study Programme

Computer Science Thesis

Title

Enhancing reliability of Language Models through minimization of uncertainty

Abstract

In recent years, text-generating AI assistants, such as ChatGPT, have demonstrated remarkable abilities to quickly find information and answer questions. However, despite their proficiency in generating fluent, human-like text, a significant drawback is their inability to express uncertainty. This often leads to syntactically correct but factually imprecise or inaccurate answers, compromising their reliability. In this project, we focus on an aspect of uncertainty that can be resolved through interaction with users, specifically on ambiguity in question-answering tasks. Through a review of related work, we identify one study closely connected to this problem and analyze it through re-implementation. The authors propose a framework consisting of prompting and estimation of entropy over user intents, Intent-Sim, which we reproduce, confirming its usefulness using only open-source LLMs. We identify its limitation of using an oracle prompting method, which makes it impossible to test on unambiguous examples or use in practice. To address this, we propose a fine-tuned LLM that automatically detects ambiguous questions, asks clarifying questions, and responds based on the user’s clarification. We show that our model detects more than 60% of ambiguous questions in the dataset while incorrectly identifying less than 50% of unambiguous questions as ambiguous. Additionally, we compare the fine-tuned model’s answers to the ones generated by the pretrained model and demonstrate an increase in answer recall for both ambiguous and unambiguous questions. Although the quantitative results are promising, a qualitative analysis of our model’s generated text highlights the need for further improvements in detecting LLMs’ ignorance. Our research underscores the importance of LLMs addressing uncertainty and demonstrates their ability to improve their responses when provided with additional context, laying the foundation for the development of more accurate and reliable LLMs.
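
As a rough illustration of what "entropy over user intents" can mean, the sketch below computes the Shannon entropy of a set of intent labels. It is a simplified assumption for this listing, not the Intent-Sim procedure itself, whose details are not given in the abstract; the function name `intent_entropy` and the example labels are hypothetical.

```python
import math
from collections import Counter


def intent_entropy(intents):
    """Shannon entropy (in bits) of the empirical distribution of intent labels."""
    counts = Counter(intents)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical intents inferred for the question "Where is the bank?"
ambiguous = ["river bank", "river bank", "financial bank", "financial bank"]
unambiguous = ["financial bank"] * 4

high = intent_entropy(ambiguous)    # intents split evenly between two readings
low = intent_entropy(unambiguous)   # all intents agree
```

Under this simplified view, higher entropy indicates that a question admits several plausible readings and may warrant a clarifying question.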

Supervisor(s)

Desmond Elliott

External examiner(s)

Claus Witfelt

Date and time

June 12 2024, 16:30-17:30

Room

Hybrid defence (Vibenshuset + Zoom)

Name of student(s)

Zhongxing Ren

Study Programme

Computer Science Thesis

Title

Melody Code: A research study on how to make sound triggered by 3D printed objects have higher information density

Abstract

Information requires a medium for transmission, and sound, as a crucial carrier of information, has been extensively studied by researchers in the field of human-computer interaction, particularly in acoustic interaction. However, non-speech acoustic interaction is often less expressive, partly due to the low information density of non-speech sounds. In this paper, we first define information density in the context of non-speech sound. Based on this definition, we propose Melody Code, a technique that encodes information into melodies for information transmission. We detail the implementation of Melody Code using the music box mechanisms, including the mapping of information to notes to create melodies, the fabrication of sound triggering devices to form a melody, and the decoding of the melody by recognizing notes through a trained machine learning model (CNN). This demonstrates that Melody Code enhances the information density of encoded non-speech sounds. Furthermore, we evaluate our three CNNs that are trained respectively by using MFCC, chroma features, and Tonnetz features, observing that the CNN trained by using MFCC achieves an accuracy of over 85% in recognizing seven different notes, which is the best performance among the three. Finally, we present three use cases of Melody Code: (1) melodic ID card; (2) acoustic logic gate; (3) acoustic encryption and decryption.

Supervisor(s)

Daniel Lee Ashbrook

External examiner(s)

Anca-Simona Horvath

Date and time

June 12 2024, 10:00-11:00

Room

Sigurdsgade 0-11

Name of student(s)

Radu Taraburca

Study Programme

Computer Science Thesis

Title

Objectify: Item recommendation through LLMs and prompt engineering

Abstract

Nowadays, 3D printing has become more common: it can be done in one’s own home, in universities, and in public libraries. However, 3D modeling and using a 3D printer are still difficult. Previous work like Objectify[1] explored an automated calendar-to-print workflow, but it had issues with recommending real objects to print. It did not always suggest physical objects; sometimes it suggested abstract things like “internet connection” or “relaxing music”, and its system ignored data beyond users’ calendars. Other work on prompt engineering explores how to limit the kinds of answers that AIs can give to generate only what is needed for a specific task. We set out to merge these two areas, adding new data streams, exploring prompt engineering techniques, and building a specialized recommendation system that would generate real, physical, and printable 3D objects that would be culturally relevant for users. Moreover, the goal of Objectify[1] was to highlight the ridiculousness of digital fabrication claims rather than to seriously facilitate anything, so our motivation is to investigate what is actually needed to achieve this. We evaluated the recommendation system to see how many of the generated objects are relevant, both in terms of context and in terms of being printable physical objects (0.1% of the objects were ambiguous). Fifteen people participated in the evaluation: 13 spoke Romanian and English, and the remaining 2 spoke English and another language. They evaluated how well the objects matched their cultural conceptions, and how printable they were (that is, whether they were real physical objects that could be printed). Overall, we had satisfactory results, with an average of 3.1 out of 5 across 9 different events (such as Christmas, Halloween or Birthday). Our research may inform future studies on AI recommendation systems, because the more researchers explore the AI-to-physical-objects pipeline, the more relevant this study will become.

Supervisor(s)

Valkyrie Savage

External examiner(s)

Anca-Simona Horvath

Date and time

June 12 2024, 16:00-17:00

Room

Sigurdsgade 41, conference room 0-11

Name of student(s)

Barnabás Baka

Study Programme

Computer Science Thesis

Title

Critical Making

Abstract

This thesis investigates how a critical design artifact can foster conversation about an inclusive learning environment in higher computer science education, particularly focusing on neurodiversity. The study is grounded in theoretical frameworks like Universal Design for Learning and GenderMag and uses Critical Making as the primary methodological approach. A critical design artifact, named Parrot Personas, was created and tested in six workshops, engaging participants interested in computer science and neurodiversity in a discussion about a neuro-inclusive learning environment. During the workshops, qualitative data was collected. This data was analyzed using thematic analysis, revealing the following key themes: Relating to own experiences, Considering multiple perspectives, and Ideating about teaching practices. The findings suggest that Parrot Personas can inspire conversation about neuro-inclusive teaching practices and can effectively introduce diverse needs to the participants. This thesis contributes to the broader discourse about diversity, equity and inclusion in computer science and about the role of critical design in fostering societal change.

Supervisor(s)

Pernille Bjørn

External examiner(s)

Nanna Inie Strømberg-Derczynski

Date and time

June 13 2024, 09:15-10:15

Room

Sigurdsgade 41

Name of student(s)

Yannick Neubert

Study Programme

Computer Science Thesis

Title

Shape Priors and Pose Invariance in Neural SDF

Abstract

This thesis presents an approach to disentangle pose and scale from the latent representation of shapes implicitly defined through signed distance fields as well as enforcing a normalization prior on the latent space. Using shape moments of order up to 3, a normalized pose and scale can be defined for arbitrary shapes, which is then used to learn a latent representation invariant under similarity transforms. The normalized latent codes are then combined with a set of pose parameters to reconstruct shapes of arbitrary pose and scale. Experiments conducted on 2-dimensional shape data produce promising initial results but also highlight some shortcomings of the proposed approach. Some potential solutions are discussed together with the potential for use in downstream tasks. Finally, it is demonstrated that the proposed method can be easily generalized to 3D data. All code is made publicly available on GitHub.

Supervisor(s)

Francois Bernard Lauze

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 14 2024, 09:30-10:30

Room

Image Section Study office

Name of student(s)

Sebastian Paarmann

Study Programme

Computer Science Thesis

Title

A WebGPU backend for Futhark

Abstract

In this thesis project, we create a new backend for the Futhark compiler. Futhark is a functional data-parallel array programming language whose optimizing compiler can generate efficient GPGPU code. Our backend targets the WebGPU API, enabling Futhark programs to be run in web browsers while still taking advantage of GPU compute capability.
As part of the backend, we implement code generation of shaders in the WebGPU Shading Language (WGSL) from the Futhark compiler’s internal kernel representation as well as an appropriate implementation of the host-side runtime system for the WebGPU API. Additionally, we provide tooling to use Futhark’s built-in testing and benchmarking tools with our backend by interacting with a browser to run the programs under test.
WebGPU and WGSL have many restrictions not present in the APIs used by Futhark’s existing GPU backends. We devise and implement workarounds for many but not all of them. As a result, our backend can successfully run some Futhark programs, but some other valid programs are unsupported. We also investigate the remaining limitations and discuss potential future solutions.

Supervisor(s)

Troels Henriksen

External examiner(s)

Willard Þór Rafnsson

Date and time

June 14 2024, 13:45-14:45

Room

DIKU UP1-2-0-04

Name of student(s)

Nick Hauptvogel

Study Programme

Computer Science Thesis

Title

Bayesian vs. PAC-Bayesian Ensembles

Abstract

Diverse ensembles of deep neural networks (deep ensembles) can profit from the cancellation of errors effect. In other words, errors of ensemble members may average out and the ensemble model achieves better predictive performance than each member. Bayesian neural networks, on the other hand, promise improved predictive uncertainty due to the principled incorporation of epistemic uncertainty into the output by learning a posterior distribution over the model’s parameters. Averaging networks sampled from this posterior yields an approximation of the Bayesian model average (BMA), also referred to as Bayesian ensemble. In this work, it is argued that the BMA of Bayesian neural networks is a particularly good way of neither sampling nor weighting members of an ensemble when considering improvement of generalization performance. The Bernstein-von Mises theorem applied to the Bayesian posterior (assuming identifiability of the model) shows that the distribution converges towards the maximum likelihood point estimate with growing dataset size. In the limit, the BMA will eventually concentrate on a single model without exploiting ensemble diversity and therefore typically not leverage the cancellation of errors effect. An experimental evaluation of neural network ensembles on four classification datasets showed that state-of-the-art Bayesian approximate inference performance results can often be matched or exceeded by a uniform deep ensemble relying solely on randomization in model initialization and stochastic training. By optimizing the uniform weighting of individual deep ensemble members using a generalization bound from the Probably-Approximately-Correct (PAC) framework, it was possible to maintain diversity and generalization performance of the deep ensemble, while obtaining non-vacuous PAC guarantees on the ensemble classifier’s performance. Essential to this is a generalization bound that is based on a tandem loss function taking pairwise correlations of members’ predictions into account. The PAC-Bayesian weighting optimization was shown to be especially useful when intermediate snapshots from model training were included, in which case the optimization performed model selection by following a trade-off between single-member performance and ensemble diversity. The price to pay for the improved weighting and the performance guarantees from the PAC-Bayesian generalization bounds is additional hold-out data for the optimization of the weighting.

Supervisor(s)

Christian Igel

External examiner(s)

Jes Frellsen

Date and time

June 14 2024, 11:00-13:00

Room

P1, Øster Voldgade 3

Name of student(s)

Nichlas Udengaard

Study Programme

Computer Science

Title

Computer Science

Abstract

U-Sleep is software for analysing human sleep data. It is a convolutional neural network segmenting input EEG and EMG time series into discrete sleep stages. The goal of the project is to tailor and apply the U-Sleep sleep analysis system to EEG/EMG/NE data from mice. This data is collected from a laboratory at the Division for Glial Disease and Therapeutics. Raw data is converted into a format suitable for modelling with the U-Sleep system. The adapted variant of U-Sleep is shown to perform on par with sDREAMER, a transformer-based model designed for sleep staging of mice. From application of U-Sleep to data of all three modalities, I found that the inclusion of NE had little to no impact on performance.

Supervisor(s)

Christian Igel

External examiner(s)

Kristoffer Hougaard Madsen

Date and time

June 14 2024, 10:30-11:30

Room

@DIKU

Name of student(s)

Louis Marott Normann

Study Programme

Computer Science Thesis

Title

A Deeper Dive: Improving the Partial Evaluation of RL

Abstract

Partial evaluation of RL has been accomplished previously. The purpose of this thesis is to build further on the groundwork done in that earlier work, improving it in both breadth and depth. The thesis shows that a pointwise binding-time analysis can be used to further improve the offline partial evaluator, enabling it to specialize more programs in a non-trivial way. It does this by performing a larger amount of computation at specialization time than previously. The residual programs are further improved by detecting and removing semantically redundant assertions via abstract interpretation. The quality of the partial evaluator is measured in different ways. First quantitatively, by benchmarking the residual programs against the source programs in all possible configurations of binding-time analyses and post-processes. Then qualitatively, by performing the first inversion projection and checking whether the resulting programs are the same as those from the corresponding Futamura projection. The extensions to the RL partial evaluator are all implemented in the Haskell prototype. Keywords: Partial Evaluation, Specialization, Assertion Removal, Inversion Projections, Reversible Computing, Program Inversion, Static Analysis.

Supervisor(s)

Robert Glück

External examiner(s)

Ulrik Pagh Schultz

Date and time

June 17 2024, 13:30-14:30

Room

UP1, 1-0-34

Name of student(s)

Mathias Marott Sundram

Study Programme

Computer Science Thesis

Title

Private synthetic data using public data

Abstract

Synthetic data presents a privacy-conscious approach to publishing accurate data, ensuring a faithful preservation of the data properties while also protecting the privacy of individuals. The advent of big data in numerous areas across society underscores the importance of private, synthetic data, and Differential Privacy is a fundamental measure of privacy that helps quantify the degree to which an algorithm treats data in a privacy-preserving manner. However, there are multiple notions of Differential Privacy. The AIM mechanism generates synthetic data with the goal of preserving the marginals of the original data. We propose two modifications of the AIM mechanism called AIMPub and AIM-GEM which utilize publicly available data to improve utility. We provide analyses demonstrating their privacy guarantees using a measure of privacy called Zero-Concentrated Differential Privacy due to its ease of use and convertibility to (ε, δ)-Differential Privacy. To evaluate the performance of these modifications, we use contemporary data from real households in the U.S. as well as historic data from passengers aboard the Titanic. These evaluations show that while a public dataset may not be helpful for generating useful synthetic data for loose privacy guarantees, it does increase utility when strong privacy guarantees are ensured. The benefit of using public data is further amplified when the public data comes from a distribution similar to the private data or when measuring lower dimensional data. However, at the same time, due to the sheer magnitude of noise added to the private data, too strong privacy guarantees result in AIMPub and AIM-GEM underperforming baseline models that only rely on public data in their training.
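
The convertibility mentioned above can be illustrated with a minimal sketch. It uses the standard results that a Gaussian mechanism with L2 sensitivity Δ and noise scale σ satisfies ρ-zCDP with ρ = Δ²/(2σ²), that ρ values add under composition, and that ρ-zCDP implies (ρ + 2√(ρ ln(1/δ)), δ)-Differential Privacy. The function names are hypothetical and are not taken from AIM, AIMPub, or AIM-GEM.

```python
import math


def gaussian_rho(sensitivity: float, sigma: float) -> float:
    """zCDP parameter of a Gaussian mechanism: rho = sensitivity^2 / (2 sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return sensitivity ** 2 / (2 * sigma ** 2)


def compose(rhos) -> float:
    """zCDP composes additively: the total rho is the sum of the parts."""
    return sum(rhos)


def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """Return an epsilon such that rho-zCDP implies (epsilon, delta)-DP."""
    if rho < 0 or not 0 < delta < 1:
        raise ValueError("need rho >= 0 and 0 < delta < 1")
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


# Illustrative values only: one Gaussian measurement, converted at delta = 1e-5.
rho = gaussian_rho(sensitivity=1.0, sigma=1.0)
epsilon = zcdp_to_approx_dp(rho, delta=1e-5)
```

Tracking a single additive ρ budget and converting once at the end is what makes this notion convenient for mechanisms that take many noisy measurements.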

Supervisor(s)

Rasmus Pagh

External examiner(s)

Martin Aumüller

Date and time

June 17 2024, 11:00-12:00

Room

Store UP1

Name of student(s)

Nikolai Kjær Nielsen

Study Programme

Computer Science Thesis

Title

Semi-supervised multi-modal generative models for structure elucidation of tandem mass spectra

Abstract

Predicting molecule structures from tandem mass spectra is a critical challenge of modern analytical chemistry. Most existing computational methods rely on training the models on fully supervised datasets of molecular structures and mass spectra, which limits the predictive power for these complex modalities. Deep generative modeling is a powerful technique that allows fitting complex joint distributions of multiple data modalities. This study presents C-GF-VAE, a conditional normalizing flow that reconstructs molecule structures given tandem mass spectra as input. We demonstrate that using the normalizing flow model, pretrained on large molecule datasets that do not have recorded spectra, improves the quality of multi-modal spectra-to-molecule reconstruction from 0.2299 to 0.4287 mean molecule fingerprint Tanimoto distance. We also provide a thorough evaluation of the different components of C-GF-VAE, including spectra-to-spectra and molecule-to-molecule reconstruction in supervised and semi-supervised settings. The presented model is an improvement over the previously reported performance of a semi-supervised bi-modal variational autoencoder, raising the mean reconstruction quality from 0.12 to 0.4287 measured in molecule fingerprint Tanimoto distance. The presented model has a broad range of immediate applications in untargeted metabolomics and can also be applied to other multi-modal problems where high-quality reconstruction of complex data is required.
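
For readers unfamiliar with the fingerprint measure named above, here is a minimal sketch of the Tanimoto coefficient on binary fingerprints. It assumes a fingerprint is represented as the set of indices of its "on" bits. The abstract does not specify the fingerprint type, so the example values are purely illustrative.

```python
def tanimoto_similarity(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen here (an assumption): two empty fingerprints are identical.
        return 1.0
    return len(fp_a & fp_b) / union


def tanimoto_distance(fp_a: set, fp_b: set) -> float:
    """The complementary quantity, 1 - similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical fingerprints of a reference molecule and a reconstruction.
reference = {1, 2, 3}
reconstruction = {2, 3, 4}
score = tanimoto_similarity(reference, reconstruction)  # 2 shared / 4 total
```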

Supervisor(s)

Svetlana Kutuzova

External examiner(s)

Jes Frellsen

Date and time

June 17 2024, 10:00-11:00

Room

AI Pioneer Center

Name of student(s)

Sune Skaanning Engtorp

Study Programme

Computer Science Thesis

Title

Implementation of a type-safe generalized syntax-directed editor

Abstract

This thesis investigates the development and implementation of a type-safe, generalized syntax-directed editor. The goal is to create an editor capable of supporting any language, including but not limited to programming languages. The foundation of this work is a proposed generalized editor calculus, which has been encoded in an extended lambda calculus to theoretically establish the capability of building such an editor. This project realizes this calculus in practice by implementing it in the functional programming language Elm, which has already been proven capable of supporting a non-generic structure editor. The report details the implementation process, encompassing the representation of abstract syntax, source code generation, and the handling of editor expressions. The implementation, written in Elm, features a language specification parser and a source code generator. Currently, the generated editor for a given language can perform basic edits on the abstract syntax tree, such as cursor movement and substitution. Practical examples are provided for subsets of different languages, including C, SQL, and LaTeX, demonstrating the editor’s generality. The thesis concludes with proposals for future work, including the implementation of missing editor expressions, handling of context-sensitive syntax, and views of the program being edited. The source code of this project is available at the following public GitHub repository: https://github.com/Sunese/generalized-editor

Supervisor(s)

Hans Hüttel

External examiner(s)

Mads Rosendahl

Date and time

June 18 2024, 14:15-15:15

Room

SCI-DIKU-HCO-01-0-029

Name of student(s)

Thorbjørn Bülow Bringgaard

Study Programme

Computer Science Thesis

Title

Efficient Big Integer Arithmetic Using GPGPU

Abstract

Exact big integer arithmetic is a fundamental component of numerous
scientific fields, and therefore, required to be efficient. One way to
increase efficiency is by acceleration on GPGPU, calling for parallel
arithmetic algorithms. This thesis examines parallel algorithms for
addition, multiplication, and division, with the premise of fitting in a
CUDA block, and consequently, suited for medium-sized big integers.
The algorithms are implemented in the high-level languages C++ and
Futhark. The addition algorithm boils down to a prefix sum, which
runs efficiently in both implementations. The multiplication algorithm
is the classical quadratic method, parallelized by orchestrating the
convolutions in a way that balances the sequential work per thread
and minimizes synchronization. The C++ implementation exhibits
good performance, while the Futhark implementation leaves room for
improvement. The division algorithm is based on finding multiplicative
inverses without leaving the domain of big integers. To do so, a variety
of big integer operators and routines are defined, including shifts,
comparisons, and signed subtraction using the prefix sum approach
of addition. The algorithm parameterizes over the methods involved
for big integer arithmetic, and its efficiency directly mirrors the given
multiplication method. In addition to conveying the algorithm, as well
as adapting it to big integers, supplementary implementations have
been produced. This includes a validating and inefficient sequential
implementation in C, and a partially validating and semi-efficient
parallel implementation in Futhark.
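
The observation that addition boils down to a prefix sum can be sketched as follows. This is a sequential Python illustration under stated assumptions, not the C++ or Futhark code of the thesis: integers are little-endian lists of 32-bit limbs, and all names (`to_limbs`, `combine`, `add_limbs`) are hypothetical. Each limb position is summarised by a (generate, propagate) pair, and the carries come from a scan with an associative operator, which is what allows a GPU to replace `accumulate` with a parallel scan.

```python
from itertools import accumulate

BASE = 2 ** 32  # assumed limb size for this sketch


def to_limbs(n: int, size: int) -> list:
    """Split a non-negative integer into `size` little-endian 32-bit limbs."""
    return [(n >> (32 * i)) & (BASE - 1) for i in range(size)]


def from_limbs(limbs: list) -> int:
    """Inverse of to_limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def combine(left, right):
    """Associative carry operator on (generate, propagate) pairs.

    `left` summarises the lower limbs, `right` the next higher limb.
    """
    g1, p1 = left
    g2, p2 = right
    return (g2 or (p2 and g1), p1 and p2)


def add_limbs(a: list, b: list):
    """Add two equal-length limb lists; returns (result limbs, carry out)."""
    sums = [x + y for x, y in zip(a, b)]
    # generate: the limb overflows by itself; propagate: it overflows iff a carry arrives.
    flags = [(s >= BASE, s == BASE - 1) for s in sums]
    inclusive = list(accumulate(flags, combine))  # inclusive scan over positions
    carry_in = [False] + [g for g, _ in inclusive[:-1]]  # shift by one limb
    result = [(s + int(c)) % BASE for s, c in zip(sums, carry_in)]
    carry_out = inclusive[-1][0] if inclusive else False
    return result, carry_out
```

For example, adding 1 to 2^128 - 1 with four limbs makes every position propagate, so the scan carries through all limbs and the result is four zero limbs with a carry out.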

Supervisor(s)

Cosmin Eugen Oancea

External examiner(s)

Mads Rosendahl

Date and time

June 18 2024, 17:00-18:00

Room

SCI-DIKU-HCO-01-0-029 (PLTC meeting room)

Name of student(s)

Siyi Wu

Study Programme

Computer Science Thesis

Title

SDF-TopoNet: A Hybrid Approach for Enhancing Topological Accuracy in Tubular Structure Segmentation

Abstract

Accurate segmentation of tubular structures such as blood vessels, neurons, and pathways has received increasing attention recently. Preserving the topological features of these structures has become particularly important in various applications. One of the main challenges in this area is to find effective loss functions to handle such data. Although several studies have explored some practical loss functions, they often encounter potential problems such as high training costs and degradation of pixel accuracy. Hu et al. proposed a topological loss based on Betti error and persistent homology. Based on their research, we propose further refinements to improve the segmentation performance and reduce the training cost. Our approach incorporates a pre-training and fine-tuning strategy based on the weighted sum of a pixel-based loss function (e.g., MSE) and the topological loss as the loss function for model training. Specifically, we use the signed distance function (SDF) as a prior task in the pre-training stage to enable the model to learn the topological structure information of the image and use a dynamic threshold layer and topological loss in the fine-tuning stage to ensure the topological accuracy of the segmentation. Since the primary source of training cost is the computation of topology loss, using topology loss only in the fine-tuning phase can significantly reduce the training cost. Evaluations on the DRIVE, CREMI, and Pancreas datasets show that our method performs well, especially on the DRIVE dataset.
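
To make the notion of a Betti error concrete, here is a simplified sketch, assuming only the 0-dimensional Betti number (the count of connected components) on a whole binary mask with 4-connectivity. This is neither the loss of Hu et al. nor the method of the thesis. The helper names are hypothetical.

```python
from collections import deque


def betti0(mask) -> int:
    """Count 4-connected components of foreground (truthy) pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def betti0_error(prediction, truth) -> int:
    """Absolute difference in component counts between two masks."""
    return abs(betti0(prediction) - betti0(truth))


# A vessel broken by one missing pixel: 14 of 15 pixels agree, yet the topology differs.
truth = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
prediction = [[0, 0, 0, 0, 0], [1, 1, 0, 1, 1], [0, 0, 0, 0, 0]]
```

The example shows why a pixel-based loss alone can be misleading for tubular structures: a single wrong pixel splits one vessel into two components.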

Supervisor(s)

Jon Sporring

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 18 2024, 09:15-10:10

Room

SCI-DIKU-UP1-1-1-N116A

Name of student(s)

Asta Feodora Sjöberg Burhenne, Lucas Østergaard Jarmer, and Laufey Karitas Ólafsdóttir

Study Programme

Computer Science Thesis

Title

Root analysis using geometric and topological descriptors

Abstract

Topological data analysis (TDA) offers promising avenues for the classification and analysis of root systems, yet its application and efficacy in the field of root analysis remain unexplored. In this study, we investigate the performance of various topological descriptors in classifying root systems based on persistence diagrams derived from bifurcation points. Our analysis reveals intriguing findings regarding the impact that field of view has on the input data to TDA, noise in persistence diagrams, and the distribution of homology groups employed in descriptors for classification. We observe that models using descriptors incorporating a greater set of points from the persistence diagrams often outperform those that do not. Moreover, our study highlights the resilience of the Log N-histogram descriptor in capturing the unique distribution of bifurcation points for each species; this descriptor demonstrates superior performance across multiple datasets and models. Furthermore, we challenge traditional interpretations of noise in persistence diagrams, suggesting that seemingly noisy points may contain valuable information for root classification. We also investigate discrepancies between real and synthetic data, emphasizing the importance of standardizing segmentation processes to minimize uncertainties and enhance model robustness. Finally, we identify key challenges and propose future research directions to refine topology-based classification methods and broaden their applicability in root analysis. Overall, our study contributes to advancing the understanding and utilization of topology-based techniques for root classification and offers insights for future research in this field.

Supervisor(s)

Jon Sporring

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 18 2024, 10:20-12:15

Room

DIKU, UP1-1-1-N116A

Name of student(s)

Anders Lietzen Holst

Study Programme

Computer Science Thesis

Title

Optimizing Tensor Contractions for GPU Execution in Futhark

Abstract

The tensor contraction, a higher-dimensional analogue of matrix multiplication, is a widely used basic building block that is not only suitable for efficient GPU execution due to its highly parallel nature, but also ripe for locality-of-reference optimizations due to a high degree of data reuse. Futhark, a highly optimizing compiler targeting GPU hardware, generates efficient 2D block/register tiled code for GEMM-like programs, but does not apply the transformation to arbitrary contractions. Taking tensor contraction and GPU code transformation theory as a starting point, we detail how we successfully implemented block/register tiling of arbitrary tensor contractions in the Futhark compiler, using generic LMAD copies to stage input data and a number of other minor optimizations, and describe some of the problems overcome in doing so as well as the roadblocks and limitations which unfortunately remain. Using a small benchmarking plan we examine the practical benefits of the transformation, using a hand-written prototype kernel and a GPU code generator for high-performance tensor contractions as points of reference. The implementation performs well, reaching between 68% and 98% of the performance of the reference programs, but the opportunities for optimization are many. Finally, we present some ideas for future work in both improving and generalizing the implementation.
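
As an illustration of what a tensor contraction computes, the following naive Python sketch contracts a rank-3 tensor with a matrix over one shared index. The index layout `C[a][b][c] = sum_k A[a][k][b] * B[k][c]` is an arbitrary example chosen here, and the code shows only the semantics, not the tiled GPU code the thesis generates.

```python
def contract(A, B):
    """Naive contraction: C[a][b][c] = sum over k of A[a][k][b] * B[k][c].

    A is a nested list of shape (na, nk, nb); B has shape (nk, nc).
    """
    na, nk, nb = len(A), len(A[0]), len(A[0][0])
    nc = len(B[0])
    return [[[sum(A[a][k][b] * B[k][c] for k in range(nk))
              for c in range(nc)]
             for b in range(nb)]
            for a in range(na)]


# Shape (1, 2, 1) contracted with shape (2, 2) gives shape (1, 1, 2).
A = [[[1], [2]]]
B = [[3, 4], [5, 6]]
C = contract(A, B)
```

Every output element reuses a row of `A` and a column of `B` that other output elements also read, which is the data reuse that block and register tiling exploit.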

Supervisor(s)

Cosmin Eugen Oancea

External examiner(s)

Mads Rosendahl

Date and time

June 18 2024, 16:00-17:00

Room

SCI-DIKU-HCO-01-0-029 (PLTC meeting room)

Name of student(s)

Aske Rory Ching and Niklas Joost Borge

Study Programme

Computer Science Thesis

Title

Towards adversarially robust dataset compression

Abstract

As the need for larger datasets in state-of-the-art machine learning increases, so do the costs of storing and using these datasets, both economically and environmentally. This has spawned an interest in dataset compression for faster and cheaper training. These methods create small but information-rich datasets by smart selection or condensation techniques. Many condensation methods have shown promising results in benign settings. However, the adversarial robustness of these datasets has not been well studied. In this project, we take a closer look at adversarial training on compressed datasets. We show that these do not respond well to adversarial training by default. We further investigate ways to improve the robustness potential of state-of-the-art dataset condensation methods. Here, we found that Gradient Matching shows potential for improvements, although at the expense of benign accuracy and an increased computational cost. Alternatively, we propose a method of latent space feature selection to create an adversarially trainable synthetic dataset. Furthermore, we show that coreset selection can be improved by selecting data points in latent space instead of data space. Our results suggest that feature selection in latent space is promising for adversarially trainable dataset compression.

Supervisor(s)

Raghavendra Selvan

External examiner(s)

Lee Herluf Lund Lassen

Date and time

June 18 2024, 10:00-12:00

Room

DIKU UP1-2-0-04

Name of student(s)

Lennart Mischnaewski

Study Programme

Computer Science Thesis

Title

Rating-Aware Sequential Recommendation Systems using Generative Retrieval and Semantic Encoding

Abstract

Given sets of users, items, and their interactions, recommendation systems aim to provide users with personalized selections of items that adhere to criteria such as user preferences or fairness metrics. To create meaningful suggestions, the recommendation engine may use a multitude of signals, including explicit signals such as ratings, implicit signals such as the user’s actions, or information about the users and items such as their country of residence or the item’s country of origin. Such signals provide recommendation models with information about which items the users prefer or dislike and can be used to tailor future recommendations.
Recently, generative models have been used to model data of a sequential nature, such as natural language or time-series data, but have also been applied to the domain of recommendation by encoding individual sessions of users interacting with items as sequences. Given a sequence of items a user has interacted with, the generative model is used to generate the item the user is most likely to interact with next. Further, recent work has investigated the use of semantic encoding for session-based recommendation. Before using generative models, a second model is used to create a semantic encoding for each item that can then be used with standard generative sequential models. We introduce two methods of injecting rating information into such generative sequential models.
We evaluate the effect of making the model rating-aware and study the impact on different ratings. We observe that while the ratings can influence the model, the performance of the rating-aware model decreases compared to a rating-agnostic model. We further investigate the effect of different methods of injecting rating information and find that token-based methods outperform embedding-based methods. Finally, we show that rating-aware methods can be used to influence the model’s recommendations towards items with specific ratings, providing a valuable property for real-world systems.

Supervisor(s)

Christina Lioma

External examiner(s)

Konstantinos Manikas

Date and time

June 19 2024, 10:00-11:00

Room

UP1 room 1.2.26

Name of student(s)

Claudia Ann Hinkle

Study Programme

Computer Science Thesis

Title

Co-Designing Pacing Technologies for People with Energy-Limiting Conditions

Abstract

People with chronic illnesses affecting energy levels such as ME/CFS and
Long Covid need to limit their activity level to avoid “post-exertional malaise”
(PEM), a condition in which those who overexert themselves experience even more intense symptoms for days or weeks after the exertion. This practice of keeping activity levels within certain limits is known as “pacing”. There is an opportunity for technology to help people with this process, but conducting research with this population can be difficult given their limited and unpredictable energy levels. This work explores how co-design of pacing technologies can be conducted responsibly, and what we can learn from co-designing about how these tools should be designed. This is done through a 5-week Asynchronous Remote Community study utilizing various co-design techniques, followed by a design ideation process that explores how the learnings from the study could be put into action in technology design.

Supervisor(s)

Sarah Frances Homewood

External examiner(s)

Signe Louise Yndigegn

Date and time

June 20 2024, 10:15-11:15

Room

2-03 at Sigurdsgade 41, 2200, KBH N

Name of student(s)

Yaqi Zhou

Study Programme

Computer Science Thesis

Title

Optimal external resizable arrays

Abstract

Resizable arrays are crucial in managing big data across various industries. The traditional implementation may leave up to half of the allocated space unused, which represents a significant inefficiency in the context of big data. Tarjan and Zwick’s implementation, which consumes $N + O(N^{1/r})$ memory cells for maintaining a resizable array of $N$ items and temporarily uses $N + O(N^{1-1/r})$ memory cells for any integer $r > 2$, is designed for internal memory. This design assumes that each item and each pointer take one memory cell. Access operations take $O(1)$ worst-case time, and insert and delete operations take $O(r)$ amortized time. They proved the implementation is optimal. In scenarios where the resizable array is too large, external storage is used, which leads to suboptimal performance with a straightforward application of their implementation to the external memory model. Our proposed implementation adapts to the external memory model, consuming $aN + O((abN)^{1/r})$ bits of space to maintain the array and $aN + O((abN)^{1-1/r})$ bits temporarily during some insert operations. Here, $N$ is the current number of items, each item takes $a$ bits, and each pointer $b$ bits. Accessing an item by its index takes $2 + o(r)$ time in the worst case, and the amortized time cost of insert operations is $r/B + o(1)$. We demonstrate that any data structure that consumes $aN + O((abN)^{1/r})$ bits to maintain the array must use $aN + \Omega((abN)^{1-1/r})$ bits at some point. Furthermore, for a wide class of resizable arrays, the amortized time cost of insert operations cannot be asymptotically lower than $\Omega(r/B)$, where the external memory model allows accessing and copying $B$ items at once.
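
The space overhead of the traditional implementation can be seen in a minimal sketch, assuming "traditional" means the textbook strategy of doubling the capacity whenever the array is full. The class below is a hypothetical illustration and is unrelated to the Tarjan and Zwick structure or to the external-memory structure proposed in the thesis.

```python
class DoublingArray:
    """Textbook resizable array: capacity doubles whenever the array is full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.cells = [None]

    def append(self, item):
        if self.size == self.capacity:
            # Allocate twice the space and copy every existing item across.
            self.capacity *= 2
            grown = [None] * self.capacity
            grown[:self.size] = self.cells
            self.cells = grown
        self.cells[self.size] = item
        self.size += 1

    def unused(self) -> int:
        """Number of allocated cells that hold no item."""
        return self.capacity - self.size


# Just after a doubling step, close to half of the allocation is empty.
array = DoublingArray()
for value in range(1025):
    array.append(value)
```

With 1025 items the capacity has just grown to 2048, so 1023 cells are allocated but unused. The bounds above replace this linear overhead with a sublinear term.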

Supervisor(s)

Mingmou Liu

External examiner(s)

Rüdiger Riko Jacob

Date and time

June 20 2024, 13:00-14:00

Room

DIKU UP1-1-1-N116B

Name of student(s)

Oliver Christopher Juhl, Matthias Schultz Busch, Frederik Meyer Møller-Jørgensen, and Jonathan Gram Stenkilde

Study Programme

Computer Science Thesis

Title

Automatic Text Retrieval & Parsing of Digital Herbarium Sheets

Abstract

Organizations such as the Natural History Museum of Denmark (NHMD) specialize in the collection of herbarium sheet specimens for the purpose of documenting their samples for the future, according to a scientifically defined nomenclature. However, the process of digitizing such plant specimens into a database is currently dominated by manual labor, as no fully automatic system exists yet for that task. This calls for a more efficient solution, which would be highly beneficial for history and botanical
museums. Previous research on this problem has mainly been focused on individual parts of a system, but research on a combination of each technique into a fully functional end-to-end pipeline is lacking. Therefore, we present a pipeline that automatically extracts text from the specimen
images. We utilize a pre-trained YOLO model for object detection of institutional and annotation labels in the specimen images. We then make use of pre-trained CRAFT and CRNN models to perform optical character recognition. Lastly, named entity recognition is used with a BERT model to classify the text into categories like SPECIMEN, LOCATION, and more, before a series of post-processing strategies are applied to improve the semantics of the final database. On our own ground truth of 100 machine-written NHMD label images, our pipeline is able to extract the image text with 78.71% accuracy using fuzzy string similarity. Other key findings include that our system struggles the most with extracting legit and
determinant names, and each component performs well enough when isolated, but our overall scores take a hit when the full pipeline performance is tested. With our findings, we conclude that how the pipeline’s components are connected and interact is essential for building the system, rather than focusing on perfecting each individual component. Furthermore, the results also depend on meticulous assumptions and interpretations about how a plant specimen should be processed and stored.
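
Scoring extracted text with a fuzzy string similarity can be sketched as below. The abstract does not say which similarity measure was used, so this example assumes the ratio from Python's standard `difflib.SequenceMatcher`. The label strings are invented for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_similarity(predicted: str, truth: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, predicted, truth).ratio()


def mean_similarity(pairs) -> float:
    """Average similarity over (predicted, truth) pairs; 0.0 for no pairs."""
    scores = [fuzzy_similarity(predicted, truth) for predicted, truth in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical OCR output against hand-typed ground truth.
pairs = [
    ("Rosa canina L.", "Rosa canina L."),  # exact match
    ("Rosa canlna L.", "Rosa canina L."),  # one misread character
]
average = mean_similarity(pairs)
```

A fuzzy measure gives partial credit for near misses such as a single misread character, which an exact-match metric would count as a complete failure.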

Supervisor(s)

Kim Steenstrup Pedersen

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 21 2024, 11:15-12:45

Room

Conference room (Konferencelokalet) at Zoologisk Museum, UP 15

Name of student(s)

Alexis Jean René Dumélié

Study Programme

Computer Science Thesis

Title

Technology Induced Lucid Dreams (TILDs)

Abstract

A lucid dream is a dream where the dreamer is aware they are dreaming.
Lucid Dreaming (LD) has a wide variety of practical applications, such
as its potential as a psychological tool for overcoming phobias, addictions, or nightmares, and as a psychophysiological tool for the refinement of motor skills, and it is the ultimate form of immersive experience. Finding simple and reliable ways to induce LD has been one of the forefront challenges in LD research. We highlight the potential of low-cost wearable technology as a practical tool for facilitating future LD research and human-computer dream interactions on a wider scale. In this autoethnographic work, we demonstrated a way to use a low-cost wearable device to perform Lucid Dreaming Incubation. We designed the Technology Induced Lucid Dream (TILD) protocol to provide mid-sleep tactile stimulation during REM sleep, as estimated by heuristics, and, over three experiments, each spanning five nights, we successfully incubated LDs and increased the incidence of Lucid Dreaming in a non-laboratory environment.

Supervisor(s)

Valkyrie Savage

External examiner(s)

Jakob Eg Larsen

Date and time

June 21 2024, 10:00-11:00

Room

Sigurdsgade 41, room 0-11

Name of student(s)

Thomas Jackson Terry

Study Programme

Computer Science Thesis

Title

Distilling Reliance from Trust: A Survey of Explainable AI Research

Abstract

Explainable artificial intelligence research often targets trust as a goal for XAI. But the case for trust in AI may not be so straightforward. I conducted a survey of 43 recent XAI research articles to determine their stance toward trust. Finding a lack of consensus, I propose that another concept - reliance - could unite research efforts and provide a more achievable goal.

Supervisor(s)

Irina Alex Shklovski

External examiner(s)

Niels van Berkel

Date and time

June 21 2024, 13:00-14:00

Room

https://ucph-ku.zoom.us/j/6141772694

Name of student(s)

Xuanlang Zhao

Study Programme

Computer Science Thesis

Title

Shortest Path in Three Dimensions

Abstract

We discuss the problem of computing shortest obstacle-avoiding paths under an Lp metric (e.g. a Euclidean metric), and we present three algorithms for this problem. Our first algorithm is a fully polynomial approximation algorithm for the problem. The second algorithm
can compute an L1-shortest path between two points on or above a polyhedral terrain. Our third algorithm is a polynomial-time exact algorithm to calculate the shortest path between some stacked flat obstacles.

Supervisor(s)

Mikkel Vind Abrahamsen

External examiner(s)

Nutan Limaye

Date and time

June 25 2024, 09:30-10:30

Room

HCØ Aud. 7

Name of student(s)

Aleksas Prelgauskis

Study Programme

Computer Science Thesis

Title

Investigating the Effect of Outlier Removal by Process Discovery Algorithms

Abstract

Many advanced process discovery algorithms have built-in mechanisms to exclude certain outlier traces from logs in order to simplify the resulting process models. The prevailing justification is that these uncommon traces are merely noise in the data, but this claim lacks concrete evidence. This could raise fairness issues, as when mining processes involve human participants, these outliers may very well represent minorities that, through their removal from the training data, are marginalized by the resulting process models. In this paper, we explore how various process discovery algorithms determine which cases are outliers and how they treat protected groups in practice. We designed an experiment to obtain outlier traces from various process discovery algorithms and examined the overlap of these outliers, including the representation of protected cases among them. Furthermore, we evaluated each model’s outliers split between protected and unprotected traces against the original split in the event logs. Our findings reveal that different algorithms do not consistently
exclude the same traces as outliers. Moreover, we observed biases in different process discovery algorithms, with some favoring and others discriminating against protected groups.

Supervisor(s)

Tijs Slaats

External examiner(s)

Søren Debois

Date and time

June 26 2024, 09:00-10:00

Room

SCI-DIKU-sigurdsgade-0-11

Name of student(s)

Bernard Legay Halfeld Ferrari Alves

Study Programme

Computer Science Thesis

Title

Compiling Hermes to RSSA

Abstract

Reversible programming languages have been a focus of research for more than a decade, mostly due to the work of Glück, Yokoyama, Mogensen, and many others. In this paper, we report on our recent work on compiling code written in the reversible language Hermes to reversible static-single-assignment form (RSSA). We also discuss how we wrote an interpreter for an extended version of RSSA using a type system. Our compiler allows the execution of simple Hermes programs and provides the basis for further optimizations.

Keywords: Reversible computing · Reversible programming languages · Hermes · Reversible static-single-assignment · Encryption

Supervisor(s)

Torben Ægidius Mogensen

External examiner(s)

Morten Rhiger

Date and time

June 26 2024, 10:00-12:00

Room

772-01-0-S29 - PLTC meeting room

Name of student(s)

Stefan Kröll Rasmussen

Study Programme

Computer Science Thesis

Title

Building Mobile Robot Platform for Open Space Interaction

Abstract

This thesis aims to design, develop, and evaluate a mobile robot platform that aims to ensure a seamless Human-Robot Interaction (HRI) in open human-occupied spaces. The project builds on a study conducted by Cornell University in 2023 [1], which used a Wizard-of-Oz experiment to explore Human-Robot Interactions in open spaces. The main goal is to utilize commercially available low-cost components to design and build a reliable and low-cost robotic platform to address the challenge of autonomous navigation and Human-Robot Interaction in unpredictable environments. The hardware includes a hoverboard-based motor system, ODrive motor controllers, a Raspberry Pi 5, and various sensors, including the RealSense D415 camera. The robot’s software stack is built on the Robot Operating System, facilitating robust communication and control mechanisms. Key functionalities such as human detection and following are implemented using computer vision techniques, like Histogram of Oriented Gradients (HOG) with Support Vector Machine (SVM) classifiers, and deep learning models like You Only Look Once version 8 (YoloV8). The robot platform has been thoroughly tested and validated in terms of basic motion, motor precision, durability, and load capacity. Using ODrive Motor Controllers allowed configurable and precise control, and the utilization of hoverboard hardware allows the robot to carry weight in excess of 80
kilograms. Field studies were performed in real-world environments to evaluate Human-Robot Interaction and acceptance, and it was observed that the participants interacted with the robot both functionally and socially. Ethical considerations and safety protocols for public deployment are discussed to ensure responsible integration of robots into human environments. Possible future work includes enhancing user experience through the addition of visual and audio feedback, developing and testing advanced autonomy algorithms, and exploring the use of multi-robot collaboration. All code, configuration files, and a demonstration video can be found on GitHub at: https://github.com/SKroell/TERRA-bot

Supervisor(s)

Hang Yin

External examiner(s)

Jeppe Revall Frisvad

Date and time

June 27 2024, 15:30-16:30

Room

Image Hot Room – Image Section, UP1

Name of student(s)

Emil Høghsgaard Hansen and Bruce Isiah Thomas Esplago

Study Programme

Computer Science Thesis

Title

Leveraging Computer Vision for Housing Price Estimation

Abstract

Automated models have been widely used in house price assessment. Many models have been developed with numerical and categorical features (size, location, etc.) as the primary prediction indicators. In this study, we examine whether and how images in the form of floor plans can be used as a supplement to feature-based models. To this end, we have developed a web scraper that retrieves relevant data and floor plans. Based on the collected data, we have investigated the efficiency and precision of various feature-based models. We have developed and tested multiple CNN models that analyze and predict prices based on floor plans, as well as combined ensemble models that use both types of data. Our experiments show that the feature-based models perform better than the models that only use floor plans. Furthermore, we find that size and location in particular have a large influence on the price. These are exactly the factors that our CNN models find difficult to capture, which is also the primary reason for their inaccuracy. The combination of image data and features can improve predictions, although not to a great extent.

Supervisor(s)

Bulat Ibragimov

External examiner(s)

Veronika Vladimirovna Cheplygina

Date and time

June 28 2024, 11:00-12:00

Room

SCI-DIKU-UP1-1-1-N116B

Name of student(s)

Dong She

Study Programme

Computer Science Thesis

Title

Blood Vessel Mesh Generation

Abstract

This master’s thesis explores vessel mesh generation and Computational Fluid Dynamics (CFD) simulation of blood flow in depth. Accurate modeling of blood vessels is crucial for understanding hemodynamics and diagnosing cardiovascular diseases. The primary objective of this study is to develop robust methods for generating high-quality vessel meshes and to use these meshes for simulating blood flow dynamics. The study begins by reviewing existing research on the 3D reconstruction of blood vessels, highlighting the limitations of current approaches. It then details our approach to surface mesh generation using two methods: convolution surfaces and neural implicit methods. Convolution surfaces provide smooth and continuous vessel representations, while neural implicit methods offer flexibility and precision in complex geometries. Following surface mesh generation, we build the volume mesh through tetrahedralization, ensuring smoothness and high quality through rigorous geometry analysis. CFD experiments are conducted on the vessel mesh to simulate blood flow dynamics, revealing critical insights into flow patterns and potential areas of pathological concern. The results demonstrate the efficacy of our methods in producing accurate and reliable vessel meshes suitable for CFD simulations. Key findings include significant improvements in mesh quality and simulation accuracy compared to traditional methods.
In conclusion, this thesis presents effective techniques for vessel mesh generation and blood flow simulation, contributing valuable tools for medical research and clinical applications. Future work will explore the integration of these methods with real-time imaging data and the development of more advanced simulation models.

Supervisor(s)

Kenny Erleben

External examiner(s)

Jakob Andreas Bærentzen

Date and time

June 28 2024, 13:00-14:30

Room

UP1, 3:2:20

Name of student(s)

Jakob Flinck Sheye and Hristo Atanasov Georgiev

Study Programme

Computer Science Thesis

Title

Generative Models for Children's Head Motion in Resting State Functional Magnetic Resonance Imaging

Abstract

Motion artifacts are a major obstacle in MRI image acquisition, as they could render the clinical analysis of the scans futile. Researchers have developed several approaches to mitigate the impact of motion artifacts on scans. One such approach is retrospective motion correction, often performed by a deep learning model. Those models require large datasets of motion-free and motion-corrupted scans, so they are often challenging to train due to insufficient data. Furthermore, the current state-of-the-art methods of creating artificial motion artifacts rely on random transformations in the scan’s spatial domain. A potential solution to this problem is the use of synthetic data. Given a realistic synthetic motion curve, one could introduce motion artifacts corresponding to actual movement. Our project’s primary goal is to evaluate the ability of several generative models to produce realistic synthetic motion curves. For this purpose, we train the models on the motion curves provided by the ABCD study, a large US study on adolescent brain health and development. We employ various evaluation techniques, including a novel protocol for time-series benchmarking, to gauge the models’ ability to generate realistic sinusoidal waves and motion curves. We find that Fourier Flow does an excellent job of capturing the real distribution’s range, shape, and correlations. However, the synthetic data does not successfully recreate abrupt jumps in the motion curves, which often result from involuntary actions such as coughing. Nonetheless, the results are promising and could lay the foundations for further research in the field.

Supervisor(s)

Melanie Ganz-Benjaminsen

External examiner(s)

Kristoffer Hougaard Madsen

Date and time

June 28 2024, 11:00-13:00

Room

SCI-DIKU-UP1-2-0-04

Name of student(s)

Mads Daugaard and Emil Christoffer Riis-Jacobsen

Study Programme

Computer Science Thesis

Title

Simulating Head Motion in MRI: A Silicone Phantom Approach with Machine Learning Integration

Abstract

Due to long image acquisition times, magnetic resonance imaging (MRI) is prone to image artefacts caused by patient motion. Such motion is especially prevalent in children and can result in unsuccessful diagnoses. As a result, many MRI examinations need to be repeated, occasionally requiring the use of general anaesthesia to limit patient motion, both of which are costly. Consequently, research in motion correction methods has attracted great interest. In this thesis, we propose the use of a 6 degrees of freedom cable-suspended parallel robotics (CSPR) system with machine learning integration for inverse kinematics, used to induce head motion of a silicone-based phantom head inside an MR scanner. The system aims to provide a reliable and reproducible method for motion simulation in MRI to facilitate training and validation of motion correction methods. Through an iterative design and manufacturing process, we develop a method for 3D printing and casting a silicone-based phantom using hard shell moulds, finding that our method shows great promise but that better equipment may be required to eliminate all MR artefacts. A CSPR system for controlling the phantom head inside an MR head coil is developed and shown to be able to achieve the motion ranges of ten children. Using feedforward neural networks to approximate inverse kinematics, we show through analysis that the system is capable of replicating prevalent forms of motion, such as nodding and translation in and out of the head coil. However, performance on low-amplitude motion varied between neural networks due to data sensitivity. With a specific set of cable attachments and the limited number of cables in the system, some large-amplitude motions were found to be difficult to achieve in isolation. Physical system changes and high-frequency tracking data may thus be necessary to enhance simulation accuracy and realism for motion correction methods in MRI.
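
For orientation, the idealised geometric relation that inverse kinematics for a cable-suspended system has to capture can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the thesis, which approximates inverse kinematics with feedforward neural networks. The anchor and attachment coordinates, the function names, and the restriction to a single rotation about the z-axis are all hypothetical simplifications.

```python
import math

def rotate_z(point, yaw):
    """Rotate a 3D point about the z-axis by `yaw` radians."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y, s * x + c * y, z)

def cable_lengths(anchors, attachments, position, yaw=0.0):
    """Return the straight-line length of each cable for a given pose.

    anchors: fixed cable exit points on the frame (hypothetical values).
    attachments: cable attachment points in the phantom's own frame.
    position: translation of the phantom.
    yaw: rotation about z only; a full 6-DoF pose would need a complete
    rotation matrix, which is omitted here for brevity.
    """
    lengths = []
    for anchor, local in zip(anchors, attachments):
        rx, ry, rz = rotate_z(local, yaw)
        world = (rx + position[0], ry + position[1], rz + position[2])
        lengths.append(math.dist(anchor, world))
    return lengths

# Hypothetical four-cable layout around the origin (units arbitrary).
anchors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
attachments = [(0.1, 0, 0), (-0.1, 0, 0), (0, 0.1, 0), (0, -0.1, 0)]

rest = cable_lengths(anchors, attachments, (0, 0, 0))
shifted = cable_lengths(anchors, attachments, (0.1, 0, 0))
```

Moving the phantom towards one anchor shortens that cable and lengthens the opposite one, which is the coupling a learned inverse-kinematics model would have to reproduce.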

Supervisor(s)

Melanie Ganz-Benjaminsen

External examiner(s)

Kristoffer Hougaard Madsen

Date and time

June 28 2024, 08:30-09:30

Room

DIKU-UP1-2-0-04

Name of student(s)

Runfei Wu

Study Programme

Computer Science Thesis

Title

Towards High-Fidelity Simulation of the Human Colon: Modeling a Deformable Tube

Abstract

Colorectal cancer is among the most prevalent cancers worldwide, with colonoscopy essential for early diagnosis. However, a shortage of experienced professionals necessitates effective simulation-based training. High-fidelity simulations for surgical training face challenges, including the need for real-time feedback and the computational demands of accurate simulations. Common challenges involve simulating the elasticity and deformation of colon tissue and managing self-collision. The colon’s unique structural properties, such as the extreme ratio between its thickness and overall size, require complex mesh designs. This research aims to develop a softbody simulator that accurately models the human colon as a hollow, thin, deformable tube. A key focus is devising methods that use a reasonably complex mesh to simulate accurate contact and elasticity. We introduced an embedded mesh technique where the volume mesh handles elasticity, and the embedded surface mesh generates accurate contact points, reducing the need for highly complex meshes. This project also developed techniques such as a dense representation for the Jacobian matrix and a parallel Jacobi contact solver. This project offers valuable insights into softbody and contact simulation, providing a solution for contact simulation of extremely thin structures. It identifies potential issues and suggests directions for future improvements. While immediate practical impacts on medical practices are not quantified, the findings lay a foundation for further research and development.
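
To illustrate why a Jacobi-style solver lends itself to parallel execution, the following minimal sketch applies plain Jacobi iteration to a small linear system. It is an assumption-laden illustration and not the solver developed in the thesis: a contact solver would additionally need to enforce contact constraints, which is omitted here, and the matrix and right-hand side are made-up values.

```python
def jacobi_solve(A, b, iterations=50):
    """Approximate the solution of A x = b with plain Jacobi iteration.

    Every entry of the new iterate depends only on the previous iterate,
    so the inner loop over i could run in parallel without changing the
    result. Convergence is assumed here (A is diagonally dominant).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            # Sum of off-diagonal contributions, using old values only.
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - off_diag) / A[i][i]
        x = x_new
    return x

# Hypothetical diagonally dominant system: 4x + y = 9, 2x + 5y = 12.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 12.0]
solution = jacobi_solve(A, b)
```

A Gauss-Seidel sweep, by contrast, reuses freshly updated entries within the same sweep, which tends to converge faster but introduces an ordering dependency between the updates.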

Keywords: high-fidelity simulation, human colon, deformable tube, biomedical modeling, softbody simulation, embedded mesh

Supervisor(s)

Kenny Erleben

External examiner(s)

Jakob Andreas Bærentzen

Date and time

June 28 2024, 14:30-16:00

Room

UP1 3:2:20

Name of student(s)

Christoph Alexander Prehn

Study Programme

Computer Science Thesis

Title

Flying drones autonomously with Hierarchical Reinforcement Learning. A Study of Hierarchical Reinforcement Learning for autonomous drones

Abstract

Autonomous drone flight is one of the most complex tasks within robotics, while at the same time offering a multitude of real-world applications. From inspecting areas that are difficult or dangerous to access to finding people in avalanche regions, autonomous drones offer the potential to make various tasks safer and more efficient. Even though Reinforcement Learning and Control Theory have shown impressive performance on individual tasks, they require meticulous modelling of the environment or extensive training. At the same time, they struggle to adapt to perturbations in the environment or to transfer to new tasks. Hierarchical Reinforcement Learning enables agents to use temporal abstraction and hierarchical structures, similar to the thought process of humans. In previous research, this has shown improved performance and strengthened robustness against environment perturbations in simple tasks. For complex tasks, like autonomous drone flight, the research is limited.
In this thesis, we apply Hierarchical Reinforcement Learning to the task of autonomous drone flight to examine the performance, adaptability to perturbations, and transferability of Hierarchical Reinforcement Learning on complex tasks. To facilitate the necessary experiments, we create three drone flight tasks extending existing simulation environments and develop a framework that enables learning on multiple tasks simultaneously. Additionally, we provide an up-to-date implementation of the Hierarchical Reinforcement Learning variant of the state-of-the-art Reinforcement Learning algorithm for autonomous drone flight. Our experiments show that a Hierarchical Reinforcement Learning agent can indeed improve adaptability to perturbations in the environment. Furthermore, we demonstrate that the Hierarchical Reinforcement Learning agent can learn multiple tasks simultaneously better than its Reinforcement Learning counterpart. On the other hand, we show that for single tasks without any perturbations, standard Reinforcement Learning outperforms Hierarchical Reinforcement Learning.

Supervisor(s)

Stefan Sommer

External examiner(s)

Dan Witzner Hansen

Date and time

June 28 2024, 13:00-14:00

Room

Meeting room A, Østervoldgade 3

Name of student(s)

Jonas Hagel and Marcus Frostholm

Study Programme

Computer Science Thesis

Title

Ensuring safety using Barrier functions for collision detection and avoidance between multiple drones

Abstract

We construct a barrier function and explain how to use it for collision detection and avoidance in a drone swarm. We focus on modifying trajectories to ensure safe flight through a hoop. The collision potentials are computed by integrating over line segments of different trajectories, and gradients of barrier values are used to modify the trajectories. We test our implementation in the Webots simulator by mimicking the Crazyflie 2.1 platform. The results show success with three drones flying along collision-free trajectories, guaranteeing safety for the drones while completing the task of flying through a hoop. This highlights the use of barrier functions in producing safe and reliable trajectories, showing their potential for collision detection and avoidance. Efforts were made to implement a physical experiment combining software and hardware, setting the foundation for future work.
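
The idea of using barrier gradients to modify trajectories can be sketched in a few lines. The example below is a simplified illustration under stated assumptions and is not the formulation of the thesis: it uses a logarithmic barrier evaluated at single waypoints rather than potentials integrated over line segments, and the function names, safety radius, activation distance, and step size are hypothetical.

```python
import math

def barrier(distance, safe_radius, activation):
    """Log barrier: zero beyond `activation`, growing without bound
    as `distance` approaches `safe_radius` from above."""
    if distance >= activation:
        return 0.0
    if distance <= safe_radius:
        return math.inf
    return -math.log((distance - safe_radius) / (activation - safe_radius))

def barrier_gradient(p, q, safe_radius, activation):
    """Gradient of the barrier with respect to waypoint p (q held fixed).

    Returns zero outside the active band; handling of waypoints that are
    already closer than `safe_radius` is left out of this sketch.
    """
    d = math.dist(p, q)
    if d >= activation or d <= safe_radius:
        return (0.0,) * len(p)
    # d(barrier)/d(distance) = -1 / (d - safe_radius); chain rule adds (p - q) / d.
    scale = -1.0 / (d - safe_radius) / d
    return tuple(scale * (pi - qi) for pi, qi in zip(p, q))

def push_apart(p, q, safe_radius=0.3, activation=1.0, step=0.05, iterations=100):
    """Move waypoint p down the barrier gradient until the barrier is inactive."""
    for _ in range(iterations):
        grad = barrier_gradient(p, q, safe_radius, activation)
        p = tuple(pi - step * gi for pi, gi in zip(p, grad))
    return p

# Two hypothetical waypoints of different drones, 0.5 units apart.
q = (0.0, 0.0, 0.0)
moved = push_apart((0.5, 0.0, 0.0), q)
final_distance = math.dist(moved, q)
```

Because the gradient points towards the other waypoint, stepping against it increases the separation until the barrier value drops to zero at the activation distance.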

Supervisor(s)

Kenny Erleben

External examiner(s)

Jakob Andreas Bærentzen

Date and time

June 28 2024, 10:30-12:00

Room

UP1 3:2:20

Name of student(s)

Marie Elkjær Rødsgaard

Study Programme

Computer Science Thesis

Title

HybridGNet: Exploring medical image segmentation and shapes

Abstract

The HybridGNet is a new segmentation method that explores how to improve anatomical image segmentation in the medical field. The HybridGNet uses landmark-based segmentation to output a segmentation graph, unlike a traditional segmentation U-Net, which outputs a pixel-level segmentation. This project investigates in what sense neural networks such as U-Net act as shape models and how this relates to the HybridGNet method. Several experiments have been carried out with both segmentation methods, using synthetic images of smileys and X-ray datasets of lungs. The results of these experiments show that the HybridGNet is a viable alternative to the U-Net when segmenting images where shape is the defining factor.

Supervisor(s)

Erik Bjørnager Dam

External examiner(s)

Dan Witzner Hansen

Date and time

June 28 2024, 11:00-12:00

Room

The Observatory on Østre Voldgade (Pioneer Centre)

Name of student(s)

Tim Ruschke

Study Programme

Computer Science Thesis

Title

Guided Synthesis of Labeled Brain MRI Data Using Latent Diffusion Models for Segmentation of Enlarged Ventricles

Abstract

Scarcity, inhomogeneity, and privacy are common obstacles for deep learning in a medical context. While synthetic data appears as an ostensibly easy solution, research has shown time and time again that training with synthetic data fails to perform as well as with real data. In the context of ventricular segmentation in brain MRI images, we present a proof of concept for the successful use of synthetic data in training segmentation models. State-of-the-art segmentation models often struggle to accurately segment patients suffering from enlarged ventricles due to afflictions like normal pressure hydrocephalus. We show that synthetic data can serve to address this by customizing the distribution of ventricular volume in the training set. We employ two latent diffusion models, a mask generator and a corresponding SPADE image generator, to create labeled 3D brain MRI images. Conditioning the mask generator on ventricular volume in combination with classifier-free guidance enables us to control the ventricular volume distribution of the generated images. We test the synthetic data using three nnU-Net segmentation models trained on a real, an augmented, and an entirely synthetic dataset respectively, where the synthetic data contains a more uniform spread of ventricular volume compared to the real data. The resulting models are tested on a dataset of patients with enlarged ventricles. We also measure the predicted ventricular volume of binary segmentation masks, where the model trained on real data has a mean absolute error (MAE) of 16.32 ± 12.28 mL, while the models trained on synthetic and augmented data achieve much better results, with 9.67 ± 4.69 mL and 8.89 ± 6.15 mL respectively. Both also outperform the state-of-the-art model SynthSeg, which due to severe outliers has an MAE of 10.49 ± 17.52 mL with a notably high standard deviation. The models trained with synthetic and augmented data also slightly outperform both SynthSeg and the model trained on real data in Dice score. The comparative success of synthetic data in this work may be surprising against a backdrop of research where it consistently performs worse. As expert evaluation shows that our synthetic data, while realistic, does not match the quality of real data, we primarily attribute its success to the more even distribution of ventricular volume in the synthetic data.
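
For readers unfamiliar with the two evaluation measures named above, the following minimal sketch shows how a Dice score and a volume MAE could be computed from binary masks. It is an illustration only, not the evaluation code of the thesis: the flat-list mask representation, the function names, the voxel size, and the example values are all assumptions.

```python
def dice_score(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

def mask_volume_ml(mask, voxel_volume_mm3):
    """Volume of a binary mask in millilitres (1 mL = 1000 cubic mm)."""
    return sum(mask) * voxel_volume_mm3 / 1000.0

def mean_absolute_error(predicted, actual):
    """Mean absolute difference between paired values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Made-up toy masks and volumes, purely for illustration.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
dice = dice_score(pred_mask, true_mask)
volume = mask_volume_ml(pred_mask, voxel_volume_mm3=1.0)
mae = mean_absolute_error([10.0, 20.0], [12.0, 17.0])
```

Dice rewards voxel-wise overlap, whereas the volume MAE only compares total sizes, so the two measures can rank models differently.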

Supervisor(s)

Martin Nørgaard

External examiner(s)

Veronika Vladimirovna Cheplygina

Date and time

June 28 2024, 13:00-14:00

Room

SCI-DIKU-UP1-2-0-06 and SCI-DIKU-UP1-2-0-04

Name of student(s)

Nicklas Boserup

Study Programme

Computer Science Thesis

Title

Score Learning for Parameter Inference in Stochastic Shape Evolutions

Abstract

In the field of computational evolutionary morphometry, inferring the most likely parameters of assumed underlying stochastic processes is of interest, both for gaining insights into the stochastic nature of evolution and for potentially establishing relationships between observations. This work proposes techniques for performing parameter inference through simulated likelihood estimation and illustrates the applicability of the proposed methods for both purposes. Building upon the pioneering work of Heng et al. (2022), this thesis shows how score matching may be used to establish conditional stochastic flows of shapes. The shape domain introduces numerical complexities, which are tackled by deriving a novel numerically stable objective function. It is further shown how samples from approximated conditioned processes may be used as proposals in an importance sampler capable of performing likelihood estimation on shape flows. Through illustrative examples it is shown how the proposed methodology for parameter inference works; it is explored how to successfully apply it; and experiments showcasing the usefulness of the techniques are presented. Along the way, this text aims to serve as a self-contained, reasonably accessible introduction to this highly technical field.
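
The role of an importance sampler in likelihood estimation can be illustrated with a toy model that is far simpler than shape flows. The sketch below is an assumption-laden stand-in, not the method of the thesis: it uses a one-dimensional Gaussian latent variable instead of a stochastic shape process, and a hand-picked Gaussian proposal in place of samples from an approximated conditioned process. All names and numbers are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def estimate_likelihood(y, n_samples=10_000, seed=0):
    """Importance-sampling estimate of p(y) for the toy model
    z ~ N(0, 1), y | z ~ N(z, 1).

    The proposal N(y / 2, 1) roughly mimics the true posterior of z
    given y, standing in for an approximated conditioned process.
    """
    rng = random.Random(seed)
    proposal_mean, proposal_var = y / 2, 1.0
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(proposal_mean, math.sqrt(proposal_var))
        joint = normal_pdf(z, 0.0, 1.0) * normal_pdf(y, z, 1.0)
        # Importance weight: joint density divided by proposal density.
        total += joint / normal_pdf(z, proposal_mean, proposal_var)
    return total / n_samples

# For this toy model the marginal is known in closed form: y ~ N(0, 2).
exact = normal_pdf(1.0, 0.0, 2.0)
estimate = estimate_likelihood(1.0)
```

The closer the proposal is to the true conditioned distribution, the lower the variance of the weights, which is why good approximations of the conditioned process matter for this kind of estimator.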

Supervisor(s)

Stefan Sommer

External examiner(s)

Dan Witzner Hansen

Date and time

June 28 2024, 14:00-15:00

Room

Meeting room A, Østervoldgade 3

Name of student(s)

Ying Liu

Study Programme

Computer Science Thesis

Title

Exploration of Self-Supervised Learning Methods for Longitudinal Image Analysis

Abstract

The advent of self-supervised learning has alleviated the bottleneck of lacking annotated datasets in supervised learning for medical image analysis. This research aims to explore whether two relatively recent self-supervised learning methods, BYOL and SimSiam, can train a 3D model suitable for longitudinal analysis. In this study, the LIDC-IDRI dataset is used to pre-train two 3D ResNet-50 models using BYOL and SimSiam, both of which converge and avoid representation collapse. Notably, SimSiam converges faster and to a lower loss than BYOL. Subsequently, these models are employed to predict the tumor volume of a subject on the final day of radiotherapy treatment using the learned representations on the 4D Lung dataset. The predictions differ significantly from the actual tumor volumes. To further verify whether these models can capture tumor volume-related representations, linear probing predictions are made. The results show that the predictions are inaccurate for all subjects, displaying considerable discrepancies from the actual values and substantial variation between different predictions for the same subject. Moreover, there is no significant correlation between the actual tumor volumes and the linear probing predictions, although the model trained using the SimSiam method demonstrates a marginally better correlation. Therefore, this research concludes that the BYOL and SimSiam methods, as implemented, are not suitable for training 3D models for longitudinal analysis due to their inability to efficiently capture tumor volume-related representations. The research may serve as a reference for future studies that apply self-supervised learning methods akin to BYOL and SimSiam to train 3D models for longitudinal analysis.

Keywords: self-supervised learning; 3D medical images; longitudinal analysis

Supervisor(s)

Jens Petersen

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 28 2024, 09:00-10:00

Room

SCI-DIKU-UP1-1-1-N116A

Bioinformatics

Name of student(s)

Cheng Chen

Study Programme

Bioinformatics

Title

Unsupervised learning of multi-omics and phenotype data in the UK Biobank

Abstract

To generate a deeper understanding of complex phenotypes, we need to move beyond single data modalities and focus on multi-omics and multimodal data integration. The project focuses on applying the MOVE pipeline to integrate anthropometric, biomarker, proteomics, and metabolomics data from the UK Biobank database into latent representations, and on interpreting these latent representations and the important features they represent. We then performed genome-wide association studies (GWAS) on the latent representations to investigate genetic associations related to metabolic syndrome. This approach enhances our understanding of how genetic variations influence a combination of complex phenotypes and provides a potential direction for future research.

Supervisor(s)

Anders Krogh

External examiner(s)

Ole Lund

Date and time

June 11 2024, 15:30 - 16:30

Room

Panum, Mærsk Tower, floor 8, room 145A.

Name of student(s)

Lucas Phillip Krieger

Study Programme

Bioinformatics

Title

Exploring the Capabilities of Protein Language Models at Predicting Glycosylation

Abstract

Glycosylation is the most abundant and diverse form of protein post-translational modification. O-glycosylation biosynthesis starts when a group of twenty partially redundant polypeptides transfer an N-acetylgalactosamine (GalNAc) to a serine or threonine in the Golgi apparatus. Unlike N-glycosylation, O-linked glycosylation occurs on serine or threonine residues that are followed by various amino acids lacking a consensus motif. The lack of a motif complicates the prediction of O-linked glycosylation sites. The main objective is to explore the capabilities of protein embeddings generated by ESM-2 in predicting whether disordered regions will be O-linked glycosylated.

Supervisor(s)

Hiren Joshi, Wouter Boomsma

External examiner(s)

Henrik Nielsen

Date and time

June 12 2024, 14:30 - 15:20

Room

07-10-143a in the Mærsk tower

Name of student(s)

Yifan Sun

Study Programme

Bioinformatics Thesis

Title

Extension and application of a side-chain customization protocol for receptor probe and drug discovery

Abstract

G protein-coupled receptors (GPCRs) are crucial in mediating the actions of approximately two-thirds of human hormones, the majority of which, around 71%, are peptides or proteins. Given their pivotal role in numerous physiological processes, GPCRs represent critical targets in drug discovery. However, designing ligands that effectively target GPCRs is particularly challenging for several reasons, including the structural diversity and heterogeneity of these receptors, the lack of detailed structural information, and the need for high selectivity and specificity.
In this context, this study significantly refines and expands a foundational side-chain selection algorithm based on residue-to-residue interactions. The enhancement targets the design of probes for Class A GPCRs by optimizing both our GPCR-peptide interaction library and the side-chain selection algorithm. This study implements a novel approach, using a comprehensive collection of Class A GPCR-peptide complexes to generate a residue pair interaction library that incorporates information from all available GPCR-peptide structures. Subsequently, the library is used to design peptides by removing and reconstructing side chains on peptide-receptor backbones in order to predict the amino acid sequences of ligands. With the support of different statistical measures, we validate this new method to ensure the reliability and accuracy of its predictions. At the same time, in collaboration with our team’s drug design and protein science experts, we select several structures of interest for prediction based on our novel approach and the latest deep learning-based method, ProteinMPNN. The current results show significant improvements in prediction accuracy compared to the initial version. In the future, we may use molecular dynamics to further confirm the stability of the predicted ligand-receptor binding scaffold. The novel approach in ligand discovery may provide insights for GPCR-targeted drug design, particularly in identifying ligands for GPCRs without an endogenous ligand.

Supervisor(s)

Wouter Boomsma

External examiner(s)

Jes Frellsen

Date and time

June 13 2024, 10:00-11:00

Room

UP1-2-0-04

Name of student(s)

Yan Li

Study Programme

Bioinformatics Thesis

Title

Development of a deep generative model for cancer gene expression and clinical data combined

Abstract

This thesis aims to handle the multi-modality metadata in the TCGA dataset and integrate it with the gene expression data to enhance cancer studies using a deep generative model. The oncology deep generative decoder model in this thesis uses a Gaussian mixture model followed by a two-layer decoder capable of processing multi-modal data. After being trained on gene expression data combined with metadata, the oncology deep generative decoder model effectively captures the latent representations of different cancer types and generates different modalities from its output layer. Our approach not only demonstrates the capability to reconstruct and generate realistic synthetic data but also enhances cancer subtype identification and survival analysis. Experimental results highlight the model’s proficiency in distinguishing cancer subtypes, as evidenced by a strong correspondence between the Gaussian mixture model’s components and external cancer type labels. Furthermore, the model excels in survival analysis, accurately predicting survival probabilities over time for different cancers. The survival analysis revealed distinct survival curves for aggressive cancers like mesothelioma (MESO) and pancreatic adenocarcinoma (PAAD), compared to more indolent cancers such as lung adenocarcinoma (LUAD), breast invasive carcinoma (BRCA), and prostate adenocarcinoma (PRAD), aligning well with clinical observations. This thesis illustrates that the oncology deep generative decoder model addresses the limitations of traditional dataset-based cancer research, which often struggles with multi-modal datasets, and demonstrates potential for a variety of clinical applications.

Keywords: Deep generative model, TCGA dataset, Multimodality, Oncology, Survival analysis

Supervisor(s)

Anders Krogh

External examiner(s)

Jes Frellsen

Date and time

June 20 2024, 14:30-15:30

Room

Panum, meeting room 33.4D

Name of student(s)

Rasmus Alex Buntzen-Frederiksen

Study Programme

Bioinformatics Thesis

Title

Predicting contaminated DNA samples within the class of Insecta from DNA and image embeddings

Abstract

Contaminated DNA samples pose significant challenges in biological research. Current methods for detecting contamination are often time-consuming and lack precision, necessitating the development of more robust, automated approaches. This thesis addresses this need by exploring a multi-modal machine learning model that integrates both image and DNA data to enhance contamination detection accuracy. The primary objective of this work is to develop and evaluate the feasibility and performance of a multi-modal machine learning model for predicting DNA contamination, using specimen images and DNA sequences. The methodology for this project uses deep convolutional neural networks (VGG16, ResNet-34 and EfficientNetB0) to extract image features and large language models (DNABERT2) to extract DNA features. Improvements to the image and DNA tracks are explored to enhance model performance. From the feature embeddings, a multilayer perceptron produces the contamination prediction. The study also explored methods to address overfitting and class imbalance issues.
The initial model achieved a 61% accuracy on unseen training data, indicating the foundational capability of the multi-modal contamination prediction model. Subsequent refinements improved the model’s performance to an accuracy of 85%. EfficientNetB0 showed the highest effectiveness for the image track with a validation species classification accuracy of 74%, while DNABERT2 achieved a 98.5% validation species classification accuracy for the DNA track. This study provides preliminary evidence that a multi-modal machine learning model can effectively predict DNA contamination by leveraging both image and DNA data.

Supervisor(s)

Kim Steenstrup Pedersen

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 21 2024, 10:05-11:05

Room

Conference room at the Zoological Museum, UP 15

IT and Cognition

Name of student(s)

Xiangyu Lu

Study Programme

IT and Cognition Thesis

Title

Fiber Break Prediction Using 3D Generative Models

Abstract

Reliable failure predictions of fiber-reinforced composites (FRCs) are crucial for ensuring the safety of products, reducing costs, and optimizing performance. Investigating the underlying mechanism of tensile failure of individual fibers is an important research direction. In this thesis, we aim to predict potential fiber breaks by synthesizing computed tomography (CT) images of FRCs at higher forces based on CT scans taken under initial force. We developed and evaluated 3D conditional generative adversarial networks (c-GANs) with 3D U-Nets as generators to generate CT images of fiber structures under increased force. Our models obtained MAE and MSE scores of 0.0369 and 0.00278, respectively, on the validation set. The visual quality of the generated images indicates that the models excelled at predicting the movement of the fiber structure under increasing force. However, we observed a weak correlation between the generated and ground truth fiber breaks.

Supervisor(s)

Abraham George Smith

External examiner(s)

Melih Kandemir

Date and time

June 10 2024, 09:30-10:30

Room

DIKU UP1-2-0-15

Name of student(s)

Ben Yao

Study Programme

IT and Cognition Thesis

Title

Self-supervised Pre-training for Quantum Natural Language Processing

Abstract

Quantum computing has been broadly applied to many AI fields and achieved impressive results. However, quantum machine learning models are limited by the inherent linearity of quantum computing architectures, which constrains their capabilities and adaptability.
Inspired by the remarkable success of self-supervised learning and pre-training methods, we integrate these two promising mechanisms into our quantum natural language processing (QNLP) model, attempting to address this limitation and to increase the power of the QNLP model at the representation level. Specifically, we pre-train a hybrid quantum-classical natural language processing model, which consists of a BERT-like text representation encoder, a feature extractor, and a task-related classifier, on large text corpora. In contrast to traditional natural language processing models, we employ quantum architectures within both the feature extractor and the classifier. In the downstream tasks, the QNLP model inherits the pre-trained word representations and quantum encodings of sentences, and is fine-tuned for specific tasks on that basis. Experiments show that the pre-training mechanism brings remarkable improvement over end-to-end quantum models, yielding meaningful prediction results on a variety of downstream text classification datasets. Furthermore, our work significantly enhances language understanding capability compared to the pure QNLP model. As a pioneering work, this study conducts a preliminary exploration of pre-training quantum language models and sheds light on future research in this direction.

Supervisor(s)

Qiuchi Li

External examiner(s)

Troels Andreasen

Date and time

June 28 2024, 10:00-11:00

Room

DIKU UP1-2-0-06

Name of student(s)

Adrianna Helena Klank

Study Programme

IT and Cognition Thesis

Title

Diffusion Models in 3D Medical Image Synthesis and Forecasting Radiotherapy Outcome for Lung Cancer Treatment

Abstract

The aim of this work is to develop a method for forecasting the progress of cancer treatment by employing Generative Artificial Intelligence to synthesize images that capture changes over the course of radiotherapy. We evaluate a method that incorporates an attention-based module for capturing temporal changes [22] as an additional input to a diffusion model. Our findings indicate that this approach is unsuccessful in accurately capturing anatomical structures in both CT lung scans and MRI cardiac cycle datasets. We further investigate the method through an ablation study with various architectural variants. Ultimately, we demonstrate that while one of the models is capable of generating high-quality images, its clinical usefulness requires further evaluation.

Supervisor(s)

Jens Petersen

External examiner(s)

Rasmus Reinhold Paulsen

Date and time

June 28 2024, 10:00-11:00

Room

SCI-DIKU-UP1-1-1-N116A

Health Informatics

Name of student(s)

Alexander Haderup Alsing and Felix Björklund Osmark

Study Programme

Master's Thesis in Health and Informatics (Speciale i Sundhed og Informatik)

Title

Technological Solution for the Supervision Workflow at Vesterbro Lægehus (Teknologisk løsning til arbejdsgangen ved supervision i Vesterbro Lægehus)

Abstract

Background:
General practice is citizens' primary point of access to the Danish healthcare system and deals with both minor and major health problems. With an ageing population and a growing proportion of patients with chronic diseases, there is a need for increased treatment capacity and efficient work organisation in general practice. The 2022 collective agreement supports this by promoting the use of practice staff to relieve the burden on doctors.

Method:
In this study, we use the Double Diamond process model to structure our investigation of how a technological solution can improve the supervision process at Vesterbro Lægehus. We collect empirical data using participant observation to understand workflows, and interviews to understand staff experiences and needs. Through open coding and clustering, we identify patterns in the empirical material that result in a workflow analysis. We use the Septigon model to understand how the findings from the workflow analysis affect one another. We also supplement with a literature search to validate our findings. For the prototype development of the concept, we use the digital design tool Figma and validate the prototype through user tests.

Results:
A technological solution can be used for one of the supervision workflows at Vesterbro Lægehus. The solution aims to reduce inactive waiting time by changing where the waiting takes place. Instead of physically waiting at the supervisor's office, general practice staff can perform other relevant work tasks. The technological solution requires that the supervisor receives a notification with information from the general practice staff about the supervision requested. The solution must also allow the supervisor to respond to inquiries, to reduce the general practice staff's concern about being overlooked.

Conclusion:
The on-call function has the potential to improve the workflow and reduce downtime in general practice. Further research is needed to determine the full efficacy and optimize the system for different supervision needs.

Supervisor(s)

Henriette Mabeck

External examiner(s)

Yutaka Yoshinaka

Date and time

June 18 2024. Alexander Haderup Alsing: 15:15-16:15 & Felix Björklund Osmark: 16:15-17:15

Room

2.3.I.164 at NBB

Details

Time: 5 June - 28 June 2024

Organizer: Department of Computer Science
