MSc Defences Summer 2024 (2024)

Department of Computer Science

See the list of MSc defences at DIKU this summer (incl. September and October). The list will be continuously updated.

Information about the thesis, supervisor, location of the defence, etc. can be found on the respective events below.

Computer Science

6 June: Jingyi Zheng and Yaokun Li

Name of student(s)	Jingyi Zheng and Yaokun Li
Study Programme	Computer Science Thesis
Title	MobileDFL: Embracing Heterogeneity and Dynamism for Decentralized Federated Learning in Mobile Networks
Abstract	The exponential growth of mobile devices has led to abundant data for training AI models, but it has also raised privacy concerns. Decentralized federated learning (DFL) has emerged as a solution to balance user privacy and avoid single points of failure. However, DFL places higher demands on nodes, requiring increased communication and storage resources compared to conventional federated learning. The heterogeneity of mobile networks introduces nodes with varying capabilities, and some may struggle with resource-intensive requirements. Moreover, the dynamic nature of mobile networks can lead to ineffective communication. To address these inefficiencies, we propose MobileDFL, which integrates the advantages of both Client-Server and Peer-to-Peer architectures, enabling each node to select the optimal strategy in heterogeneous and dynamic mobile networks. We also discuss incentive strategies to reward nodes undertaking greater communication and storage costs. Our experiments with different model complexities show that MobileDFL successfully mitigates communication and storage overhead of nodes while ensuring model convergence is not affected by node instability.
Supervisor(s)	Xikun Jiang
External examiner(s)	Hua Lu
Date and time	June 6 2024, 13:00-14:00
Room	UP1 771-01-2-16

6 June: Qiongyan Wang

Name of student(s)	Qiongyan Wang
Study Programme	Computer Science Thesis
Title	DRL4DPRA: A Deep Reinforcement Learning Framework for Dynamic Public Resource Allocation
Abstract	The goal of allocating public resources, such as billboards, surveillance cameras, base stations, and trash bins, is to cater to a larger population. However, the uneven distribution of people across spatial and temporal domains is influenced by the dynamic patterns of human mobility. To address this, we introduce Hierarchical Spatial Temporal Network (HSTNet) for dynamic public resource allocation tasks. Based on the reinforcement learning framework, our model can learn from experience without setting complex rules. It operates in two key stages: first, we capture the spatiotemporal characteristics of crowd flow; second, we use a hierarchical selection method to reduce the action space. We evaluate HSTNet’s performance using two real-world crowd flow datasets, demonstrating its superiority over baseline models.
Supervisor(s)	Xikun Jiang
External examiner(s)	Hua Lu
Date and time	June 6 2024, 14:00-15:00
Room	UP1 771-01-2-16

6 June: He Lyu

Name of student(s)	He Lyu
Study Programme	Computer Science Thesis
Title	Securing privacy in reinforcement learning through zero-knowledge proofs
Abstract	Reinforcement learning (RL) is widely used in real-world applications, but deploying it in areas that need high privacy and transparency can be challenging. This is due to the difficulty of balancing algorithm verification, which includes understanding the inputs, outputs, and processes used, against maintaining the secrecy of data and algorithm parameters. This study examines how privacy, verifiability, and RL interact, focusing on Markov Decision Processes (MDPs) that use the stochastic gradient ascent (SGA) algorithm. We have developed a new algorithm that uses zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs) to keep data and algorithm parameters private within RL tasks. This helps protect the privacy of data and parameters while ensuring the RL process is transparent and reliable. Our extensive testing shows that this new method keeps data and parameters secure and performs better than traditional RL algorithms in some situations. Additionally, the proofs it generates are small and only require linear time to verify, making it suitable for large-scale use. This development is a major step forward in improving data privacy in complex decision-making and offers a practical solution for privacy-sensitive environments.
Supervisor(s)	Xikun Jiang
External examiner(s)	Hua Lu
Date and time	June 6 2024, 15:00-16:00
Room	UP1 771-01-2-16

7 June: Jakob Krogh Petersen and Johan Valdemar Licht

Name of student(s)	Jakob Krogh Petersen and Johan Valdemar Licht
Study Programme	Computer Science Thesis
Title	Contrastive Language-Image Pre-Training In Three-Dimensional Space
Abstract	Current state-of-the-art medical image analysis methods require dense data annotated by experts. The scalability of such methods is constrained by the time-consuming and expensive nature of such annotations. In medicine, however, it is a common practice to pair visual artifacts with descriptive textual reports. Recently the use of natural language to supervise 2D visual representation learning with contrastive learning has revolutionized modern vision learning. In this thesis, we propose a contrastive language-image pre-training method for 3-dimensional images, with a focus on retaining the heuristics that lead to impressive results for 2-dimensional images. To the best of our knowledge, this is the first attempt at applying CLIP to 3D medical data. We further investigate how batch sizes affect performance stability on a downstream task. The method is evaluated based on the task of predicting the location of a lesion on brain magnetic resonance imaging (MRI) scans. The results show that natural language can guide contrastive learning of 3D visual representations and an alignment across modalities in a joint embedding space can be learned. This thesis serves as a proof of concept, demonstrating the feasibility of learning a joint embedding space for 3D images and texts.
Supervisor(s)	Mads Nielsen
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 7 2024, 10:00-11:00
Room	ØV 3 (The Pioneer Centre)

7 June: Aljaz Jazbec

Name of student(s)	Aljaz Jazbec
Study Programme	Computer Science Thesis
Title	Targeted Security Analysis of ARYZE's platform
Abstract	In recent years, there has been an increase in the number of successful attacks on smart contracts. Therefore, in this project, we perform a targeted security analysis of ARYZE’s platform. Specifically, we analyze the platform’s smart contract implementation of the ERC20 token eEUR, published on Ethereum Mainnet, in three different ways: manual static analysis, analysis of the output of an automatic vulnerability scanner, and formal verification. The whole analysis is based on the customer’s requirements gathered during the requirements elicitation phase. We use Slither as the automatic vulnerability scanner and the proof-oriented language F* for the formal verification process.
Supervisor(s)	Boris Düdder
External examiner(s)	Pavel Hruby
Date and time	June 7 2024, 14:00-15:00
Room	UP1-2-015

7 June: Christopher Jung

Name of student(s)	Christopher Jung
Study Programme	Computer Science Thesis
Title	C-SPEC: Formal Specification for Blockchain-based Crowdsourcing Systems
Abstract	The digital crowdsourcing paradigm has revolutionized modern outsourcing through its highly-systematic approach to task coordination. Researchers in Blockchain technologies are currently investigating the efficacy of Ethereum-based smart contracts as drivers of the crowdsourcing process, emphasizing the analysis of Truth Discovery (TD) techniques, data encryption, and system design. However, a shortage of standardized frameworks impedes the progress of research, as each new implementation necessitates the development of a novel system architecture. To resolve this shortcoming, we introduce a Blockchain-based formal specification known as C-SPEC, which serves as a generalized framework for smart contract-based crowdsourcing systems. This thesis offers a comprehensive overview of the C-SPEC protocol, expanding upon the internal structure, behaviors, and interactivity of all associated system components. Furthermore, it summarizes the use of formal verification techniques to ascertain the protocol’s correctness and desired qualities. Finally, it provides implementation guidelines and an extensive assessment of the protocol’s effectiveness in the context of Blockchain-based crowdsourcing.
Supervisor(s)	Boris Düdder
External examiner(s)	Pavel Hruby
Date and time	June 7 2024, 13:00-14:00
Room	UP1-2-015

10 June: Sofie Sylvest Aastrup

Name of student(s)	Sofie Sylvest Aastrup
Study Programme	Computer Science Thesis
Title	Fiber Break Segmentation in Composite Materials
Abstract	Composite materials have many applications and are used in buildings, bridges, and vehicles. We wish to understand how damage propagates in these materials. Fiber breaks form within composite materials when tensile loads are applied to them. Segmentations of these fiber breaks are often needed to study the damage propagation within the material, but such segmentations can be hard to obtain in some materials. In this study, we utilize interactive machine learning to create a pipeline for fiber break segmentation in synchrotron radiation computed tomography images. This method does not provide a single model for fiber break segmentation, but a framework that can be used to obtain new models in new scenarios. We use the open-source IML software RootPainter3D [Smith et al., 2021] and adapt it for composite materials. We propose to diminish the cold start problem by including preliminary segmentations and present an algorithm to obtain preliminary segmentations of fiber breaks. Our method was successful and allowed us to obtain fiber break segmentations with a Dice score of up to 0.95 during the interactive annotation procedure. We perform user tests of the pipeline and get a usability SUS score of 0.85. Current models fail to predict fiber break development accurately. We study the hypothesis that the microstructure of the material influences the development, by using the SRCT in-situ images and the labels obtained with IML to predict fiber breakage at later time steps. We did not find conclusive evidence that microstructure influences fiber break formation.
Supervisor(s)	Abraham George Smith
External examiner(s)	Melih Kandemir
Date and time	June 10 2024, 10:30-11:30
Room	DIKU UP1-2-0-15

11 June: Alibek Cholponbaev

Name of student(s)	Alibek Cholponbaev
Study Programme	Computer Science Thesis
Title	AsyncDB: a database system for asynchronous application architectures
Abstract	Software development is currently trending towards asynchronous and concurrent programming, focusing on better exploitation of hardware in monolithic architectures and on decoupled systems for better scalability and availability. While some domains are inherently straightforward to decouple without impacting the correctness of the system, domains such as finance and e-commerce, which require transactional logic, cannot enjoy the benefits of concurrency to their full extent. The issue lies in inadequate abstractions in the communication approach with databases that do not allow developers to exercise concurrency while maintaining transactional properties of application logic. In this project, we introduce AsyncDB, a database system that supports intra-transactional concurrency, or concurrency of operations within transactions. Developers have a flexible API that allows them to submit queries asynchronously and concurrently and wait for the results of one or many queries with a barrier pattern. Application scenarios that suit AsyncDB involve workflow-like business processes, where multiple steps of the process can be parallelized without the risk of data races within a transaction. AsyncDB provides ACID compliance of transactions, provided the developers do not introduce data races within transactions. We modelled our micro-benchmark on a real-life system example because it is a better fit for the system. In some environments, such as high I/O latency in multi-tier architectures with decoupled storage layers, intra-transactional concurrency can offer performance benefits regarding latency and throughput of transactions.
Supervisor(s)	Yongluan Zhou
External examiner(s)	Alceste Scalas
Date and time	June 11 2024, 15:00-16:00
Room	UP1 01-2-10

11 June: Sigurður Kalman Oddsson

Name of student(s)	Sigurður Kalman Oddsson
Study Programme	Computer Science Thesis
Title	Mapping Relational Benchmarks to Actor Systems
Abstract	In the last few decades, we have witnessed an unprecedented growth in computing power and a grand rise in accessible cloud infrastructure. The need for an efficient, parallel programming model that can take advantage of these advancements is at an all-time high. The actor model, which is specifically designed for high concurrency and distribution, is an excellent contender. The paradigm has already demonstrated its prowess in a number of fields and has recently caught the interest of database enthusiasts. Despite this, many industry-standard benchmarks for on-line transaction processing (OLTP) systems are tailored to relational schemas. Translating these relational schemas to actor systems, whilst still adhering to relevant specifications, is non-trivial. Questions about the correlation between actor composition and performance remain largely unanswered. In this thesis, I explore strategies to map the schemas of Transaction Processing Performance Council Benchmark C (TPC-C) to actor systems. I present and reason about twelve possible implementation strategies that I follow in two different actor environments: Microsoft Orleans and Snapper. Finally, I benchmark all implementations, demonstrating the efficacy of coarse data partitioning in actors. I conclude by discussing the results and contrasting them with relevant literature.
Supervisor(s)	Yongluan Zhou
External examiner(s)	Alceste Scalas
Date and time	June 11 2024, 16:00-17:00
Room	UP1 01-2-10

12 June: Yixuan Chen and Guodong Shi

Name of student(s)	Yixuan Chen and Guodong Shi
Study Programme	Computer Science Thesis
Title	Toy with their shapes like a puppeteer - An exploration in the potential of elliptical shape-changing mechanisms
Abstract	Shape-changing devices are increasingly popular for their adaptive and responsive abilities. Geometric shapes like circles benefit from parametric design methods and have been applied in designs that perform shape-changing. But compared with circles, ellipses, as a geometric shape with the ability to perform non-symmetric shape-changing, are rarely discussed in this field. Therefore, we came up with the idea of stacked ellipses that can simulate objects such as ergonomic grips, VR proxies, etc., and perform parametric shape-changing. We propose two models that can perform such actions: the Shell model and the Tubes model. We have explored the influences of parameters on the shape-changing performance of our two models, both physically and geometrically. We have also verified our theory based on geometric properties of the ellipse, and discussed some real-life scenario applications that are based on our models. Our contribution is a new potential solution for manufacturing shape-changing devices based on elliptical structures, together with evidence that elliptical shape-changing is potentially useful in real-life scenarios.
Supervisor(s)	Valkyrie Savage
External examiner(s)	Anca-Simona Horvath
Date and time	June 12 2024, 15:00-16:00
Room	Sigurdsgade 41, conference room 0-11

12 June: Anja Vrecer

Name of student(s)	Anja Vrecer
Study Programme	Computer Science Thesis
Title	Enhancing reliability of Language Models through minimization of uncertainty
Abstract	In recent years, text-generating AI assistants, such as ChatGPT, have demonstrated remarkable abilities to quickly find information and answer questions. However, despite their proficiency in generating fluent, human-like text, a significant drawback is their inability to express uncertainty. This often leads to syntactically correct but factually imprecise or inaccurate answers, compromising their reliability. In this project, we focus on an aspect of uncertainty that can be resolved through interaction with users, specifically on ambiguity in question-answering tasks. Through a review of related work, we identify one study closely connected to this problem and analyze it through re-implementation. The authors propose a framework consisting of prompting and estimation of entropy over user intents, Intent-Sim, which we reproduce, confirming its usefulness using only open-source LLMs. We identify its limitation of using an oracle prompting method, which makes it impossible to test on unambiguous examples or to use in practice. To address this, we propose a fine-tuned LLM that automatically detects ambiguous questions, asks clarifying questions, and responds based on the user’s clarification. We show that our model detects more than 60% of ambiguous questions in the dataset while incorrectly identifying less than 50% of unambiguous questions as ambiguous. Additionally, we compare the fine-tuned model’s answers to the ones generated by the pretrained model and demonstrate an increase in answer recall for both ambiguous and unambiguous questions. Although the quantitative results are promising, a qualitative analysis of our model’s generated text highlights the need for further improvements in detecting LLMs’ ignorance.
Our research underscores the importance of LLMs addressing uncertainty and demonstrates their ability to improve their responses when provided with additional context, laying the foundation for the development of more accurate and reliable LLMs.
Supervisor(s)	Desmond Elliott
External examiner(s)	Claus Witfelt
Date and time	June 12 2024, 16:30-17:30
Room	Hybrid defence (Vibenshuset + Zoom)

12 June: Zhongxing Ren

Name of student(s)	Zhongxing Ren
Study Programme	Computer Science Thesis
Title	Melody Code: A research study on how to make sound triggered by 3D printed objects have higher information density
Abstract	Information requires a medium for transmission, and sound, as a crucial carrier of information, has been extensively studied by researchers in the field of human-computer interaction, particularly in acoustic interaction. However, non-speech acoustic interaction is often less expressive, partly due to the low information density of non-speech sounds. In this paper, we first define information density in the context of non-speech sound. Based on this definition, we propose Melody Code, a technique that encodes information into melodies for information transmission. We detail the implementation of Melody Code using music box mechanisms, including the mapping of information to notes to create melodies, the fabrication of sound-triggering devices to form a melody, and the decoding of the melody by recognizing notes through a trained machine learning model (CNN). This demonstrates that Melody Code enhances the information density of encoded non-speech sounds. Furthermore, we evaluate three CNNs trained on MFCC, chroma, and Tonnetz features respectively, observing that the CNN trained on MFCC achieves an accuracy of over 85% in recognizing seven different notes, which is the best performance among the three. Finally, we present three use cases of Melody Code: (1) melodic ID card; (2) acoustic logic gate; (3) acoustic encryption and decryption.
Supervisor(s)	Daniel Lee Ashbrook
External examiner(s)	Anca-Simona Horvath
Date and time	June 12 2024, 10:00-11:00
Room	Sigurdsgade 0-11

12 June: Radu Taraburca

Name of student(s)	Radu Taraburca
Study Programme	Computer Science Thesis
Title	Objectify: Item recommendation through LLMs and prompt engineering
Abstract	3D printing has become more common and can now be done at home, in universities, and in public libraries, but 3D modeling and operating a 3D printer remain difficult. Previous work like Objectify[1] explored an automated calendar-to-print workflow, but it had issues with recommending real objects to print. It did not always suggest physical objects; sometimes it suggested abstract things like “internet connection” or “relaxing music”, and its system ignored data beyond users’ calendars. Other work on prompt engineering explores how to limit the kinds of answers that AIs can give to generate only what is needed for a specific task. We set out to merge these two areas, adding new data streams, exploring prompt engineering techniques, and building a specialized recommendation system that would generate real, physical, and printable 3D objects that would be culturally relevant for users. Also, Objectify[1] aims to highlight the ridiculousness of digital fabrication claims rather than to seriously facilitate anything, so our motivation is to investigate what is actually needed to achieve this. We evaluated the recommendation system to see how many of the generated objects are relevant, both in terms of context and in terms of being printable physical objects (0.1% of the objects were ambiguous). Fifteen people participated in the evaluation: 13 speaking Romanian and English and the remaining 2 speaking English and another language. They evaluated how well the objects matched their cultural conceptions and how printable they were (that is, whether they were real physical objects that could be printed). Overall, we had satisfactory results, with an average of 3.1 out of 5 for 9 different events (such as Christmas, Halloween or Birthday).
Our research can have a potential impact concerning future studies on AI recommendation systems because the more researchers explore the AI-to-physical objects pipeline, the more relevant this study will become.
Supervisor(s)	Valkyrie Savage
External examiner(s)	Anca-Simona Horvath
Date and time	June 12 2024, 16:00-17:00
Room	Sigurdsgade 41, conference room 0-11

13 June: Barnabás Baka

Name of student(s)	Barnabás Baka
Study Programme	Computer Science Thesis
Title	Critical Making
Abstract	This thesis investigates how a critical design artifact can foster conversation about an inclusive learning environment in higher computer science education, particularly focusing on neurodiversity. The study is grounded in theoretical frameworks like Universal Design for Learning and GenderMag and uses Critical Making as the primary methodological approach. A critical design artifact, named Parrot Personas, was created and tested in six workshops, engaging participants interested in computer science and neurodiversity in a discussion about a neuro-inclusive learning environment. During the workshops, qualitative data was collected. This data was analyzed using thematic analysis, revealing the following key themes: Relating to own experiences, Considering multiple perspectives, and Ideating about teaching practices. The findings suggest that Parrot Personas can inspire conversation about neuro-inclusive teaching practices and can effectively introduce diverse needs to the participants. This thesis contributes to the broader discourse about diversity, equity and inclusion in computer science and about the role of critical design in fostering societal change.
Supervisor(s)	Pernille Bjørn
External examiner(s)	Nanna Inie Strømberg-Derczynski
Date and time	June 13 2024, 09:15-10:15
Room	Sigurdsgade 41

14 June: Yannick Neubert

Name of student(s)	Yannick Neubert
Study Programme	Computer Science Thesis
Title	Shape Priors and Pose Invariance in Neural SDF
Abstract	This thesis presents an approach to disentangle pose and scale from the latent representation of shapes implicitly defined through signed distance fields as well as enforcing a normalization prior on the latent space. Using shape moments of order up to 3, a normalized pose and scale can be defined for arbitrary shapes, which is then used to learn a latent representation invariant under similarity transforms. The normalized latent codes are then combined with a set of pose parameters to reconstruct shapes of arbitrary pose and scale. Experiments conducted on 2-dimensional shape data produce promising initial results but also highlight some shortcomings of the proposed approach. Some potential solutions are discussed together with the potential for use in downstream tasks. Finally, it is demonstrated that the proposed method can be easily generalized to 3D data. All code is made publicly available on GitHub.
Supervisor(s)	Francois Bernard Lauze
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 14 2024, 09:30-10:30
Room	Image Section Study office

14 June: Sebastian Paarmann

Name of student(s)	Sebastian Paarmann
Study Programme	Computer Science Thesis
Title	A WebGPU backend for Futhark
Abstract	In this thesis project, we create a new backend for the Futhark compiler. Futhark is a functional data-parallel array programming language whose optimizing compiler can generate efficient GPGPU code. Our backend targets the WebGPU API, enabling Futhark programs to be run in web browsers while still taking advantage of GPU compute capability. As part of the backend, we implement code generation of shaders in the WebGPU Shading Language (WGSL) from the Futhark compiler’s internal kernel representation as well as an appropriate implementation of the host-side runtime system for the WebGPU API. Additionally, we provide tooling to use Futhark’s built-in testing and benchmarking tools with our backend by interacting with a browser to run the programs under test. WebGPU and WGSL have many restrictions not present in the APIs used by Futhark’s existing GPU backends. We devise and implement workarounds for many but not all of them. As a result, our backend can successfully run some Futhark programs, but some other valid programs are unsupported. We also investigate the remaining limitations and discuss potential future solutions.
Supervisor(s)	Troels Henriksen
External examiner(s)	Willard Þór Rafnsson
Date and time	June 14 2024, 13:45-14:45
Room	DIKU UP1-2-0-04

14 June: Nick Hauptvogel

Name of student(s)	Nick Hauptvogel
Study Programme	Computer Science Thesis
Title	Bayesian vs. PAC-Bayesian Ensembles
Abstract	Diverse ensembles of deep neural networks (deep ensembles) can profit from the cancellation of errors effect. In other words, errors of ensemble members may average out and the ensemble model achieves better predictive performance than each member. Bayesian neural networks, on the other hand, promise improved predictive uncertainty due to the principled incorporation of epistemic uncertainty into the output by learning a posterior distribution over the model’s parameters. Averaging networks sampled from this posterior yields an approximation of the Bayesian model average (BMA), also referred to as Bayesian ensemble. In this work, it is argued that the BMA of Bayesian neural networks is neither a particularly good way of sampling, nor weighting members of an ensemble when considering improvement of generalization performance. The Bernstein-von Mises theorem applied to the Bayesian posterior (assuming identifiability of the model) shows that the distribution converges towards the maximum likelihood point estimate with growing dataset size. In the limit, the BMA will eventually concentrate on a single model without exploiting ensemble diversity and therefore typically not leverage the cancellation of errors effect. An experimental evaluation of neural network ensembles on four classification datasets showed that state-of-the-art Bayesian approximate inference performance results can often be matched or exceeded by a uniform deep ensemble relying solely on randomization in model initialization and stochastic training. By optimizing the uniform weighting of individual deep ensemble members using a generalization bound from the Probably-Approximately-Correct (PAC) framework, it was possible to maintain diversity and generalization performance of the deep ensemble, while obtaining non-vacuous PAC guarantees on the ensemble classifier’s performance. 
Essential to this is a generalization bound based on a tandem loss function that takes pairwise correlations of members’ predictions into account. The PAC-Bayesian weighting optimization was shown to be especially useful when intermediate snapshots from model training were included, in which case the optimization performed model selection by following a trade-off between single-member performance and ensemble diversity. The price to pay for the improved weighting and the performance guarantees from the PAC-Bayesian generalization bounds is additional hold-out data for the optimization of the weighting.
Supervisor(s)	Christian Igel
External examiner(s)	Jes Frellsen
Date and time	June 14 2024, 11:00-13:00
Room	P1, Øster Voldgade 3

14 June: Nichlas Udengaard

Name of student(s)	Nichlas Udengaard
Study Programme	Computer Science
Title	Computer Science
Abstract	U-Sleep is software for analysing human sleep data. It is a convolutional neural network that segments input EEG and EMG time series into discrete sleep stages. The goal of the project is to tailor and apply the U-Sleep sleep analysis system to EEG/EMG/NE data from mice. This data is collected from a laboratory at the Division for Glial Disease and Therapeutics. Raw data is converted into a format suitable for modelling with the U-Sleep system. The adapted variant of U-Sleep is shown to perform on par with sDREAMER, a transformer-based model designed for sleep staging of mice. From application of U-Sleep to data of all three modalities, I found that the inclusion of NE had little to no impact on performance.
Supervisor(s)	Christian Igel
External examiner(s)	Kristoffer Hougaard Madsen
Date and time	June 14 2024, 10:30-11:30
Room	@DIKU

17 June: Louis Marott Normann

Name of student(s)	Louis Marott Normann
Study Programme	Computer Science Thesis
Title	A Deeper Dive: Improving the Partial Evaluation of RL
Abstract	Partial evaluation of RL has been accomplished previously. The purpose of this thesis is to build further on the groundwork done in that earlier work, improving it in both breadth and depth. The thesis shows that a pointwise binding-time analysis can be used to further improve the offline partial evaluator, enabling it to specialize more programs in a non-trivial way. It does this by performing a larger number of computations at specialization time than previously. The residual programs are further improved by detecting and removing semantically redundant assertions via abstract interpretation. The quality of the partial evaluator is measured in different ways. First quantitatively, by benchmarking the residual programs against the source programs in all possible configurations of binding-time analyses and post-processes. Then qualitatively, by performing the first inversion projection and seeing if the programs are the same as from the corresponding Futamura projection. The extensions to the RL partial evaluator are all implemented in the Haskell prototype. Keywords: Partial Evaluation, Specialization, Assertion Removal, Inversion Projections, Reversible Computing, Program Inversion, Static Analysis.
Supervisor(s)	Robert Glück
External examiner(s)	Ulrik Pagh Schultz
Date and time	June 17 2024, 13:30-14:30
Room	UP1, 1-0-34

17 June: Mathias Marott Sundram

Name of student(s)	Mathias Marott Sundram
Study Programme	Computer Science Thesis
Title	Private synthetic data using public data
Abstract	Synthetic data presents a privacy-conscious approach to publishing accurate data, ensuring a faithful preservation of the data properties while also protecting the privacy of individuals. The advent of big data in numerous areas across society underscores the importance of private, synthetic data, and Differential Privacy is a fundamental measure of privacy that helps quantify the degree to which an algorithm treats data in a privacy-preserving manner. However, there are multiple notions of Differential Privacy. The AIM mechanism generates synthetic data with the goal of preserving the marginals of the original data. We propose two modifications of the AIM mechanism called AIMPub and AIM-GEM which utilize publicly available data to improve utility. We provide analyses demonstrating their privacy guarantees using a measure of privacy called Zero-Concentrated Differential Privacy due to its ease of use and convertibility to (ε, δ)-Differential Privacy. To evaluate the performance of these modifications, we use contemporary data from real households in the U.S. as well as historic data from passengers aboard the Titanic. These evaluations show that while a public dataset may not be helpful for generating useful synthetic data under loose privacy guarantees, it does increase utility when strong privacy guarantees are ensured. The benefit of using public data is further amplified when the public data comes from a distribution similar to the private data or when measuring lower-dimensional data. However, at the same time, due to the sheer magnitude of noise added to the private data, overly strong privacy guarantees result in AIMPub and AIM-GEM underperforming baseline models that only rely on public data in their training.
Supervisor(s)	Rasmus Pagh
External examiner(s)	Martin Aumüller
Date and time	June 17 2024, 11:00-12:00
Room	Store UP1

17 June: Nikolai Kjær Nielsen

Name of student(s)	Nikolai Kjær Nielsen
Study Programme	Computer Science Thesis
Title	Semi-supervised multi-modal generative models for structure elucidation of tandem mass spectra
Abstract	Predicting molecule structures from tandem mass spectra is a critical challenge of modern analytical chemistry. Most existing computational methods rely on training the models on fully supervised datasets of molecular structures and mass spectra, which limits the predictive power for these complex modalities. Deep generative modeling is a powerful technique that makes it possible to fit complex joint distributions of multiple data modalities. This study presents C-GF-VAE, a conditional normalizing flow that reconstructs molecule structures given tandem mass spectra as input. We demonstrate that using the normalizing flow model, pretrained on large molecule datasets that do not have recorded spectra, improves the quality of multi-modal spectra-to-molecule reconstruction from 0.2299 to 0.4287 mean molecule fingerprint Tanimoto distance. We also provide a thorough evaluation of the different components of C-GF-VAE, including spectra-to-spectra and molecule-to-molecule reconstruction in supervised and semi-supervised settings. The presented model is an improvement over the previously reported performance of a semi-supervised bi-modal variational autoencoder, raising the mean reconstruction quality from 0.12 to 0.4287 measured in molecule fingerprint Tanimoto distance. The presented model has a broad range of immediate applications in untargeted metabolomics and can also be applied to other multi-modal problems where high-quality reconstructions of complex data are required.
Supervisor(s)	Svetlana Kutuzova
External examiner(s)	Jes Frellsen
Date and time	June 17 2024, 10:00-11:00
Room	AI Pioneer Center

18 June: Sune Skaanning Engtorp

Name of student(s)	Sune Skaanning Engtorp
Study Programme	Computer Science Thesis
Title	Implementation of a type-safe generalized syntax-directed editor
Abstract	This thesis investigates the development and implementation of a type-safe, generalized syntax-directed editor. The goal is to create an editor capable of supporting any language, including but not limited to programming languages. The foundation of this work is a proposed generalized editor calculus, which has been encoded in an extended lambda calculus to theoretically establish the capability of building such an editor. This project realizes this calculus in practice by implementing it in the functional programming language Elm, which has already been proven capable of supporting a non-generic structure editor. The report details the implementation process, encompassing the representation of abstract syntax, source code generation, and the handling of editor expressions. The implementation, written in Elm, features a language specification parser and a source code generator. Currently, the generated editor for a given language can perform basic edits on the abstract syntax tree, such as cursor movement and substitution. Practical examples are provided for subsets of different languages, including C, SQL, and LaTeX, demonstrating the editor’s generality. The thesis concludes with proposals for future work, including the implementation of missing editor expressions, handling of context-sensitive syntax, and views of the program being edited. The source code of this project is available at the following public GitHub repository: https://github.com/Sunese/generalized-editor
Supervisor(s)	Hans Hüttel
External examiner(s)	Mads Rosendahl
Date and time	June 18 2024, 14:15-15:15
Room	SCI-DIKU-HCO-01-0-029

18 June: Thorbjørn Bülow Bringgaard

Name of student(s)	Thorbjørn Bülow Bringgaard
Study Programme	Computer Science Thesis
Title	Efficient Big Integer Arithmetic Using GPGPU
Abstract	Exact big integer arithmetic is a fundamental component of numerous scientific fields and is therefore required to be efficient. One way to increase efficiency is by acceleration on GPGPU, calling for parallel arithmetic algorithms. This thesis examines parallel algorithms for addition, multiplication, and division, with the premise of fitting in a CUDA block, and consequently, suited for medium-sized big integers. The algorithms are implemented in the high-level languages C++ and Futhark. The addition algorithm boils down to a prefix sum, which runs efficiently in both implementations. The multiplication algorithm is the classical quadratic method, parallelized by orchestrating the convolutions in a way that balances the sequential work per thread and minimizes synchronization. The C++ implementation exhibits good performance, while the Futhark implementation leaves room for improvement. The division algorithm is based on finding multiplicative inverses without leaving the domain of big integers. To do so, a variety of big integer operators and routines are defined, including shifts, comparisons, and signed subtraction using the prefix sum approach of addition. The algorithm parameterizes over the methods involved for big integer arithmetic, and its efficiency directly mirrors the given multiplication method. In addition to conveying the algorithm, as well as adapting it to big integers, supplementary implementations have been produced. This includes a validating and inefficient sequential implementation in C, and a partially validating and semi-efficient parallel implementation in Futhark.
Supervisor(s)	Cosmin Eugen Oancea
External examiner(s)	Mads Rosendahl
Date and time	June 18 2024, 17:00-18:00
Room	SCI-DIKU-HCO-01-0-029 (PLTC meeting room)

18 June: Siyi Wu

Name of student(s)	Siyi Wu
Study Programme	Computer Science Thesis
Title	SDF-TopoNet: A Hybrid Approach for Enhancing Topological Accuracy in Tubular Structure Segmentation
Abstract	Accurate segmentation of tubular structures such as blood vessels, neurons, and pathways has received increasing attention recently. Preserving the topological features of these structures has become particularly important in various applications. One of the main challenges in this area is to find effective loss functions to handle such data. Although several studies have explored some practical loss functions, they often encounter potential problems such as high training costs and degradation of pixel accuracy. Hu et al. proposed a topological loss based on Betti error and persistent homology. Based on their research, we propose further refinements to improve the segmentation performance and reduce the training cost. Our approach incorporates a pre-training and fine-tuning strategy based on the weighted sum of a pixel-based loss function (e.g., MSE) and the topological loss as the loss function for model training. Specifically, we use the signed distance function (SDF) as a prior task in the pre-training stage to enable the model to learn the topological structure information of the image and use a dynamic threshold layer and topological loss in the fine-tuning stage to ensure the topological accuracy of the segmentation. Since the primary source of training cost is the computation of topology loss, using topology loss only in the fine-tuning phase can significantly reduce the training cost. Evaluations on the DRIVE, CREMI, and Pancreas datasets show that our method performs well, especially on the DRIVE dataset.
Supervisor(s)	Jon Sporring
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 18 2024, 09:15-10:10
Room	SCI-DIKU-UP1-1-1-N116A

18 June: Asta Feodora Sjöberg Burhenne, Lucas Østergaard Jarmer, and Laufey Karitas Ólafsdóttir

Name of student(s)	Asta Feodora Sjöberg Burhenne, Lucas Østergaard Jarmer, and Laufey Karitas Ólafsdóttir
Study Programme	Computer Science Thesis
Title	Root analysis using geometric and topological descriptors
Abstract	Topological data analysis (TDA) offers promising avenues for the classification and analysis of root systems, yet its application and efficacy in the field of root analysis remain unexplored. In this study, we investigate the performance of various topological descriptors in classifying root systems based on persistence diagrams derived from bifurcation points. Our analysis reveals intriguing findings regarding the impact that field of view has on the input data to TDA, noise in persistence diagrams, and the distribution of homology groups employed in descriptors for classification. We observe that models using descriptors incorporating a greater set of points from the persistence diagrams often outperform those that do not. Moreover, our study highlights the resilience of the Log N-histogram descriptor in capturing the unique distribution of bifurcation points for each species, with the latter demonstrating superior performance across multiple datasets and models. Furthermore, we challenge traditional interpretations of noise in persistence diagrams, suggesting that seemingly noisy points may contain valuable information for root classification. We also investigate discrepancies between real and synthetic data, emphasizing the importance of standardizing segmentation processes to minimize uncertainties and enhance model robustness. Finally, we identify key challenges and propose future research directions to refine topology-based classification methods and broaden their applicability in root analysis. Overall, our study contributes to advancing the understanding and utilization of topology-based techniques for root classification and offers insights for future research in this field.
Supervisor(s)	Jon Sporring
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 18 2024, 10:20-12:15
Room	DIKU, UP1-1-1-N116A

18 June: Anders Lietzen Holst

Name of student(s)	Anders Lietzen Holst
Study Programme	Computer Science Thesis
Title	Optimizing Tensor Contractions for GPU Execution in Futhark
Abstract	The tensor contraction, a higher-dimensional analogue to the matrix multiplication, is a widely used basic building block that is not only suitable for efficient GPU execution due to its highly parallel nature, but also ripe for locality of reference optimizations due to a high degree of data reuse. Futhark, a highly optimizing compiler targeting GPU hardware, generates efficient 2D block/register tiled code for GEMM-like programs, but does not apply the transformation to arbitrary contractions. Taking tensor contraction and GPU code transformation theory as our starting point, we detail how we successfully implemented block/register tiling of arbitrary tensor contractions in the Futhark compiler, using generic LMAD copies to stage input data and a number of other minor optimizations, and describe some of the problems overcome in doing so as well as the roadblocks and limitations which unfortunately remain. Using a small benchmarking plan we examine the practical benefits of the transformation, using a hand-written prototype kernel and a GPU code generator for high-performance tensor contractions as points of reference. The implementation performs well, reaching between 68% and 98% of the performance of the reference programs, but the opportunities for optimization are many. Finally, we present some ideas for future work in both improving and generalizing the implementation.
Supervisor(s)	Cosmin Eugen Oancea
External examiner(s)	Mads Rosendahl
Date and time	June 18 2024, 16:00-17:00
Room	SCI-DIKU-HCO-01-0-029 (PLTC meeting room)

18 June: Aske Rory Ching and Niklas Joost Borge

Name of student(s)	Aske Rory Ching and Niklas Joost Borge
Study Programme	Computer Science Thesis
Title	Towards adversarially robust dataset compression
Abstract	As the need for larger datasets in state-of-the-art machine learning increases, so do the costs of storing and using these datasets, both economically and environmentally. This has spawned an interest in dataset compression for faster and cheaper training. These methods create small but information-rich datasets by smart selection or condensation techniques. Many condensation methods have shown promising results in benign settings. However, the adversarial robustness of these datasets has not been well studied. In this project, we take a closer look at adversarial training on compressed datasets. We show that these do not respond well to adversarial training by default. We further investigate ways to improve the robustness potential of state-of-the-art dataset condensation methods. Here, we found that Gradient Matching shows potential for improvements, although at the expense of benign accuracy and an increased computational cost. Alternatively, we propose a method of latent space feature selection to create an adversarially trainable synthetic dataset. Furthermore, we show that coreset selection can be improved by selecting data points in latent space instead of data space. Our results suggest that feature selection in latent space is promising for adversarially trainable dataset compression.
Supervisor(s)	Raghavendra Selvan
External examiner(s)	Lee Herluf Lund Lassen
Date and time	June 18 2024, 10:00-12:00
Room	DIKU UP1-2-0-04

19 June: Lennart Mischnaewski

Name of student(s)	Lennart Mischnaewski
Study Programme	Computer Science Thesis
Title	Rating-Aware Sequential Recommendation Systems using Generative Retrieval and Semantic Encoding
Abstract	Given sets of users, items, and their interactions, recommendation systems aim to provide users with personalized selections of items that adhere to criteria such as user preferences or fairness metrics. To create meaningful suggestions, the recommendation engine may use a multitude of signals, including explicit signals such as ratings, implicit signals such as the user’s actions, or information about the users and items such as their country of residence or the item’s country of origin. Such signals provide recommendation models with information about which items the users prefer or dislike and can be used to tailor future recommendations. Recently, generative models have been used to model data of a sequential nature, such as natural language or time-series data, but have also been applied to the domain of recommendation by encoding individual sessions of users interacting with items as sequences. Given a sequence of items a user has interacted with, the generative model is used to generate the item the user is most likely to interact with next. Further, recent work has investigated the use of semantic encoding for session-based recommendation. Before using generative models, a second model is used to create a semantic encoding for each item that can then be used with standard generative sequential models. We introduce two methods of injecting rating information into such generative sequential models. We evaluate the effect of making the model rating-aware and study the impact on different ratings. We observe that while the ratings can influence the model, the performance of the rating-aware model decreases compared to a rating-agnostic model. We further investigate the effect of different methods of injecting rating information and find that token-based methods outperform embedding-based methods. Finally, we show that rating-aware methods can be used to influence the model’s recommendations towards items with specific ratings, providing a valuable property for real-world systems.
Supervisor(s)	Christina Lioma
External examiner(s)	Konstantinos Manikas
Date and time	June 19 2024, 10:00-11:00
Room	UP1 room 1.2.26

20 June: Claudia Ann Hinkle

Name of student(s)	Claudia Ann Hinkle
Study Programme	Computer Science Thesis
Title	Co-Designing Pacing Technologies for People with Energy-Limiting Conditions
Abstract	People with chronic illnesses affecting energy levels such as ME/CFS and Long Covid need to limit their activity level to avoid “post-exertional malaise” (PEM), a condition in which those who overexert themselves experience even more intense symptoms for days or weeks after the exertion. This practice of keeping activity levels within certain limits is known as “pacing”. There is an opportunity for technology to help people with this process, but conducting research with this population can be difficult given their limited and unpredictable energy levels. This work explores how co-design of pacing technologies can be conducted responsibly, and what we can learn from co-designing about how these tools should be designed. This is done through a 5-week Asynchronous Remote Community study utilizing various co-design techniques, followed by a design ideation process that explores how the learnings from the study could be put into action in technology design.
Supervisor(s)	Sarah Frances Homewood
External examiner(s)	Signe Louise Yndigegn
Date and time	June 20 2024, 10:15-11:15
Room	2-03 at Sigurdsgade 41, 2200, KBH N

20 June: Yaqi Zhou

Name of student(s)	Yaqi Zhou
Study Programme	Computer Science Thesis
Title	Optimal external resizable arrays
Abstract	Resizable arrays are crucial in managing big data across various industries. The traditional implementation may leave up to half of the allocated space unused, which represents a significant inefficiency in the context of big data. Tarjan and Zwick’s implementation, which consumes N + O(N^(1/r)) memory cells for maintaining a resizable array of N items and temporarily uses N + O(N^(1−1/r)) memory cells for any integer r > 2, is designed for internal memory. This design assumes that each item and each pointer take one memory cell. Access operations take O(1) worst-case time, and insert and delete operations take O(r) amortized time. They proved the implementation is optimal. In scenarios where the resizable array is too large, external storage is used, which leads to suboptimal performance with a straightforward application of their implementation to the external memory model. Our proposed implementation adapts to the external memory model, consuming aN + O((abN)^(1/r)) bits of space to maintain the array and aN + O((abN)^(1−1/r)) bits temporarily during some insert operations. Here, N is the current number of items, each item takes a bits, and each pointer b bits. Accessing an item by its index takes 2+o(r) time in the worst case, and the amortized time cost of insert operations is r/B + o(1). We demonstrate that any data structure that consumes aN + O((abN)^(1/r)) bits to maintain the array must use aN + Ω((abN)^(1−1/r)) bits at some point. Furthermore, for a wide class of resizable arrays, the amortized time cost of insert operations cannot be asymptotically lower than Ω(r/B), where the external memory model allows accessing and copying B items at once.
Supervisor(s)	Mingmou Liu
External examiner(s)	Rüdiger Riko Jacob
Date and time	June 20 2024, 13:00-14:00
Room	DIKU UP1-1-1-N116B

21 June: Oliver Christopher Juhl, Matthias Schultz Busch, Frederik Meyer Møller-Jørgensen, and Jonathan Gram Stenkilde

Name of student(s)	Oliver Christopher Juhl, Matthias Schultz Busch, Frederik Meyer Møller-Jørgensen, and Jonathan Gram Stenkilde
Study Programme	Computer Science Thesis
Title	Automatic Text Retrieval & Parsing of Digital Herbarium Sheets
Abstract	Organizations such as the Natural History Museum of Denmark (NHMD) specialize in the collection of herbarium sheet specimens for the purpose of documenting their samples for the future, according to a scientifically defined nomenclature. However, the process of digitizing such plant specimens into a database is currently dominated by manual labor, as no fully automatic system exists yet for that task. This calls for a more efficient solution, which would be highly beneficial for history and botanical museums. Previous research on this problem has mainly been focused on individual parts of a system, but research on a combination of each technique into a fully functional end-to-end pipeline is lacking. Therefore, we present a pipeline that automatically extracts text from the specimen images. We utilize a pre-trained YOLO model for object detection of institutional and annotation labels in the specimen images. We then make use of pre-trained CRAFT and CRNN models to perform optical character recognition. Lastly, named entity recognition is used with a BERT model to classify the text into categories like SPECIMEN, LOCATION, and more, before a series of post-processing strategies are applied to improve the semantics of the final database. On our own ground truth of 100 machine-written NHMD label images, our pipeline is able to extract the image text with 78.71% accuracy using fuzzy string similarity. Other key findings include that our system struggles the most with extracting legit and determinant names, and each component performs well enough when isolated, but our overall scores take a hit when the full pipeline performance is tested. With our findings, we conclude that how the pipeline’s components are connected and interact is essential for building the system, rather than focusing on perfecting each individual component. Furthermore, the results also depend on meticulous assumptions and interpretations about how a plant specimen should be processed and stored.
Supervisor(s)	Kim Steenstrup Pedersen
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 21 2024, 11:15-12:45
Room	Conference room (Konferencelokalet) at the Zoological Museum, UP 15

21 June: Alexis Jean René Dumélié

Name of student(s)	Alexis Jean René Dumélié
Study Programme	Computer Science Thesis
Title	Technology Induced Lucid Dreams (TILDs)
Abstract	A lucid dream is a dream where the dreamer is aware they are dreaming. Lucid Dreaming (LD) has a wide variety of practical applications, such as its potential as a psychological tool for overcoming phobias, addictions, or nightmares, as a psychophysiological tool for the refinement of motor skills, and as the ultimate form of immersive experience. Finding simple and reliable ways to induce LD has been one of the forefront challenges in LD research. We highlight the potential of low-cost wearable technology as a practical tool for facilitating future LD research, and human-computer dream interactions, on a wider scale. In this autoethnographic work, we demonstrate a way to use a low-cost wearable device to perform Lucid Dreaming Incubation. We designed the Technology Induced Lucid Dream (TILD) protocol to provide mid-sleep tactile stimulation during REM sleep, as estimated by heuristics, and, over three experiments, each spanning five nights, we successfully incubated LDs and increased the incidence of Lucid Dreaming in a non-laboratory environment.
Supervisor(s)	Valkyrie Savage
External examiner(s)	Jakob Eg Larsen
Date and time	June 21 2024, 10:00-11:00
Room	Sigurdsgade 41, room 0-11

21 June: Thomas Jackson Terry

Name of student(s)	Thomas Jackson Terry
Study Programme	Computer Science Thesis
Title	Distilling Reliance from Trust: A Survey of Explainable AI Research
Abstract	Explainable artificial intelligence research often targets trust as a goal for XAI. But the case for trust in AI may not be so straightforward. I conducted a survey of 43 recent XAI research articles to determine their stance toward trust. Finding a lack of consensus, I propose that another concept, reliance, could unite research efforts and provide a more achievable goal.
Supervisor(s)	Irina Alex Shklovski
External examiner(s)	Niels van Berkel
Date and time	June 21 2024, 13:00-14:00
Room	https://ucph-ku.zoom.us/j/6141772694

25 June: Xuanlang Zhao

Name of student(s)	Xuanlang Zhao
Study Programme	Computer Science Thesis
Title	Shortest Path in Three Dimensions
Abstract	We discuss the problem of computing shortest obstacle-avoiding paths under an Lp metric (e.g. a Euclidean metric), and we present three algorithms for this problem. Our first algorithm is a fully polynomial approximation algorithm for the problem. The second algorithm can compute an L1-shortest path between two points on or above a polyhedral terrain. Our third algorithm is a polynomial-time exact algorithm to calculate the shortest path between some stacked flat obstacles.
Supervisor(s)	Mikkel Vind Abrahamsen
External examiner(s)	Nutan Limaye
Date and time	June 25 2024, 09:30-10:30
Room	HCØ Aud. 7

26 June: Aleksas Prelgauskis

Name of student(s)	Aleksas Prelgauskis
Study Programme	Computer Science Thesis
Title	Investigating the Effect of Outlier Removal by Process Discovery Algorithms
Abstract	Many advanced process discovery algorithms have built-in mechanisms to exclude certain outlier traces from logs in order to simplify the resulting process models. The prevailing justification is that these uncommon traces are merely noise in the data, but this claim lacks concrete evidence. This could raise fairness issues, as when mining processes involve human participants, these outliers may very well represent minorities that, through their removal from the training data, are marginalized by the resulting process models. In this paper, we explore how various process discovery algorithms determine which cases are outliers and how they treat protected groups in practice. We designed an experiment to obtain outlier traces from various process discovery algorithms and examined the overlap of these outliers, including the representation of protected cases among them. Furthermore, we evaluated each model’s outliers split between protected and unprotected traces against the original split in the event logs. Our findings reveal that different algorithms do not consistently exclude the same traces as outliers. Moreover, we observed biases in different process discovery algorithms, with some favoring and others discriminating against protected groups.
Supervisor(s)	Tijs Slaats
External examiner(s)	Søren Debois
Date and time	June 26 2024, 09:00-10:00
Room	SCI-DIKU-sigurdsgade-0-11

26 June: Bernard Legay Halfeld Ferrari Alves

Name of student(s)	Bernard Legay Halfeld Ferrari Alves
Study Programme	Computer Science Thesis
Title	Compiling Hermes to RSSA
Abstract	Reversible programming languages have been a focus of research for more than a decade, mostly due to the work of Glück, Yokoyama, Mogensen, and many others. In this paper, we report on our recent activities to compile code written in the reversible language Hermes to reversible static-single-assignment form RSSA. We will also discuss how we wrote an interpreter for an extended version of RSSA using a type system. Our compiler allows the execution of simple Hermes programs and provides the basis for further optimizations. Keywords: Reversible computing · Reversible programming languages · Hermes · Reversible static-single-assignment · Encryption
Supervisor(s)	Torben Ægidius Mogensen
External examiner(s)	Morten Rhiger
Date and time	June 26 2024, 10:00-12:00
Room	772-01-0-S29 - PLTC meeting room

27 June: Stefan Kröll Rasmussen

Name of student(s)	Stefan Kröll Rasmussen
Study Programme	Computer Science Thesis
Title	Building Mobile Robot Platform for Open Space Interaction
Abstract	This thesis aims to design, develop, and evaluate a mobile robot platform intended to ensure seamless Human-Robot Interaction (HRI) in open human-occupied spaces. The project builds on a study conducted by Cornell University in 2023 [1], which used a Wizard-of-Oz experiment to explore Human-Robot Interactions in open spaces. The main goal is to utilize commercially available low-cost components to design and build a reliable and low-cost robotic platform to address the challenge of autonomous navigation and Human-Robot Interaction in unpredictable environments. The hardware includes a hoverboard-based motor system, ODrive motor controllers, a Raspberry Pi 5, and various sensors, including the RealSense D415 camera. The robot’s software stack is built on the Robot Operating System, facilitating robust communication and control mechanisms. Key functionalities such as human detection and following are implemented using computer vision techniques, like Histogram of Oriented Gradients (HOG) with Support Vector Machine (SVM) classifiers, and deep learning models like You Only Look Once version 8 (YOLOv8). The robot platform has been thoroughly tested and validated in terms of basic motion, motor precision, durability, and load capacity. Using ODrive Motor Controllers allowed configurable and precise control, and the utilization of hoverboard hardware allows the robot to carry weight in excess of 80 kilograms. Field studies were performed in real-world environments to evaluate the Human-Robot Interaction and acceptance, and it was observed that the participants interacted with the robot both functionally and socially. Ethical considerations and safety protocols for public deployment are discussed to ensure responsible integration of robots into human environments. Possible future work includes enhancing user experience through the addition of visual and audio feedback, developing and testing advanced autonomy algorithms, and exploring the use of multi-robot collaboration. All code, configuration files and a demonstration video can be found on GitHub at: https://github.com/SKroell/TERRA-bot
Supervisor(s)	Hang Yin
External examiner(s)	Jeppe Revall Frisvad
Date and time	June 27 2024, 15:30-16:30
Room	Image Hot Room – Image Section, UP1

28 June: Emil Høghsgaard Hansen and Bruce Isiah Thomas Esplago

Name of student(s)	Emil Høghsgaard Hansen and Bruce Isiah Thomas Esplago
Study Programme	Computer Science Thesis
Title	Leveraging Computer Vision for Housing Price Estimation
Abstract	Automated models have been widely used in house price assessment. Many models have been developed with numerical and categorical features (size, location, etc.) as the primary prediction indicators. In this study, we examine whether and how images in the form of floor plans can be used as a supplement to feature-based models. In connection with this, we have developed a web scraper that can retrieve relevant data and floor plans. Based on the collected data, we have investigated the efficiency and precision of various feature-based models. We have developed and tested multiple CNN models that analyze and predict prices based on floor plans, as well as combined ensemble models that use these two types of data. Our experiment shows that the feature-based models perform better than the models that only use floor plans. Furthermore, we find that especially size and location have a huge influence on the price. This is exactly what our CNN models find difficult to map, and is also the primary reason for their inaccuracy. The combination of image data and features can improve predictions, although not to a great extent.
Supervisor(s)	Bulat Ibragimov
External examiner(s)	Veronika Vladimirovna Cheplygina
Date and time	June 28 2024, 11:00-12:00
Room	SCI-DIKU-UP1-1-1-N116B

28 June: Dong She

Name of student(s)	Dong She
Study Programme	Computer Science Thesis
Title	Blood Vessel Mesh Generation
Abstract	This master’s thesis explores vessel mesh generation and blood Computational Fluid Dynamics (CFD) simulation in depth. Accurate modeling of blood vessels is crucial for understanding hemodynamics and diagnosing cardiovascular diseases. The primary objective of this study is to develop robust methods for generating high-quality vessel meshes and to utilize these meshes for simulating blood flow dynamics. The study begins by reviewing existing research on the 3D reconstruction of blood vessels, highlighting the limitations of current approaches. It then details our approach to surface mesh generation using two methods: convolution surfaces and neural implicit methods. Convolution surfaces provide smooth and continuous vessel representations, while neural implicit methods offer flexibility and precision in complex geometries. Following surface mesh generation, we build the volume mesh through tetrahedralization, ensuring smoothness and high quality through rigorous geometry analysis. CFD experiments are conducted on the vessel mesh to simulate blood flow dynamics, revealing critical insights into flow patterns and potential areas of pathological concern. The results demonstrate the efficacy of our methods in producing accurate and reliable vessel meshes suitable for CFD simulations. Key findings include significant improvements in mesh quality and simulation accuracy compared to traditional methods. In conclusion, this thesis presents effective techniques for vessel mesh generation and blood flow simulation, contributing valuable tools for medical research and clinical applications. Future work will explore the integration of these methods with real-time imaging data and the development of more advanced simulation models.
Supervisor(s)	Kenny Erleben
External examiner(s)	Jakob Andreas Bærentzen
Date and time	June 28 2024, 13:00-14:30
Room	UP1, 3:2:20

28 June: Jakob Flinck Sheye and Hristo Atanasov Georgiev

Name of student(s)	Jakob Flinck Sheye and Hristo Atanasov Georgiev
Study Programme	Computer Science Thesis
Title	Generative Models for Children's Head Motion in Resting State Functional Magnetic Resonance Imaging
Abstract	Motion artifacts are a major obstacle in MRI image acquisition, as they could render the clinical analysis of the scans futile. Researchers have developed several approaches to mitigate the impact of motion artifacts on scans. One such approach is retrospective motion correction, often performed by a deep learning model. Those models require large datasets of motion-free and motion-corrupted scans, so they are often challenging to train due to insufficient data. Furthermore, the current state-of-the-art methods of creating artificial motion artifacts rely on random transformations in the scan’s spatial domain. A potential solution to this problem is the employment of synthetic data. Given a realistic synthetic motion curve, one could introduce motion artifacts corresponding to actual movement. Our project’s primary goal is to evaluate the ability of several generative models to produce realistic synthetic motion curves. For this purpose, we train the models on the motion curves provided by the ABCD study - a large US study on adolescent brain health and development. We employ various evaluation techniques, including a novel protocol for time-series benchmarking, to gauge the models’ ability to generate realistic sinusoidal waves and motion curves. We discovered that Fourier Flow does an excellent job of capturing the real distribution’s range, shape, and correlations. However, the synthetic data does not successfully recreate abrupt jumps in the motion curves, often resulting from involuntary actions, such as coughing. Nonetheless, the results are promising and could lay the foundations for further research in the field.
Supervisor(s)	Melanie Ganz-Benjaminsen
External examiner(s)	Kristoffer Hougaard Madsen
Date and time	June 28 2024, 11:00-13:00
Room	SCI-DIKU-UP1-2-0-04

28 June: Mads Daugaard and Emil Christoffer Riis-Jacobsen

Name of student(s)	Mads Daugaard and Emil Christoffer Riis-Jacobsen
Study Programme	Computer Science Thesis
Title	Simulating Head Motion in MRI: A Silicone Phantom Approach with Machine Learning Integration
Abstract	Due to long image acquisition times in magnetic resonance imaging (MRI), it is prone to image artefacts caused by patient motion, which is especially prevalent for children, potentially resulting in unsuccessful diagnoses. As a result of this, many MRI examinations need to be repeated, occasionally requiring the use of general anaesthesia to limit patient motion, both being a costly process. Consequently, research in motion correction methods has become of great interest. In this thesis, we propose the use of a 6 degrees of freedom cable-suspended parallel robotics (CSPR) system with machine learning integration for inverse kinematics, used to induce head motion of a silicone-based phantom head inside an MR scanner. The system aims to provide a reliable and reproducible method for motion simulation in MRI to facilitate training and validation of motion correction methods. Through an iterative design and manufacturing process, we develop a method for 3D printing and casting a silicone-based phantom using hard shell moulds, finding that our method shows great promise but that better equipment may be required to eliminate all MR artefacts. A CSPR system for controlling the phantom head inside an MR head coil is developed and shown to be able to achieve the motion ranges of ten children. Using feedforward neural networks for approximation of inverse kinematics, we show through analysis that the system is capable of replicating prevalent forms of motion, such as nodding and translation in and out of the head coil. However, performance on low-amplitude motion varied between neural networks due to data sensitivity. Using a specific set of cable attachments with the limited cables of the system, some large-amplitude motions were found to be difficult to achieve in isolation. Physical system changes and high-frequency tracking data may thus be necessary to enhance simulation accuracy and realism for motion correction methods in MRI.
Supervisor(s)	Melanie Ganz-Benjaminsen
External examiner(s)	Kristoffer Hougaard Madsen
Date and time	June 28 2024, 08:30-09:30
Room	DIKU-UP1-2-0-04

28 June: Runfei Wu

Name of student(s)	Runfei Wu
Study Programme	Computer Science Thesis
Title	Towards High-Fidelity Simulation of the Human Colon: Modeling A Deformable Tube
Abstract	Colorectal cancer is among the most prevalent cancers worldwide, with colonoscopy essential for early diagnosis. However, a shortage of experienced professionals necessitates effective simulation-based training. High-fidelity simulations for surgical training face challenges, including the need for real-time feedback and the computational demands of accurate simulations. Common challenges involve simulating the elasticity and deformation of colon tissue and managing self-collision. The colon’s unique structural properties, such as the extreme ratio between its thickness and overall size, require complex mesh designs. This research aims to develop a softbody simulator that accurately models the human colon as a hollow, thin, deformable tube. A key focus is devising methods that use a reasonably complex mesh to simulate accurate contact and elasticity. We introduced an embedded mesh technique where the volume mesh handles elasticity, and the embedded surface mesh generates accurate contact points, reducing the need for highly complex meshes. This project also developed techniques such as a dense representation for the Jacobian matrix and a parallel Jacobi contact solver. This project offers valuable insights into softbody and contact simulation, providing a solution for contact simulation of extremely thin structures. It identifies potential issues and suggests directions for future improvements. While immediate practical impacts on medical practices are not quantified, the findings lay a foundation for further research and development. Keywords: high-fidelity simulation, human colon, deformable tube, biomedical modeling, softbody simulation, embedded mesh
Supervisor(s)	Kenny Erleben
External examiner(s)	Jakob Andreas Bærentzen
Date and time	June 28 2024, 14:30-16:00
Room	UP1 3:2:20

28 June: Christoph Alexander Prehn

Name of student(s)	Christoph Alexander Prehn
Study Programme	Computer Science Thesis
Title	Flying drones autonomously with Hierarchical Reinforcement Learning. A Study of Hierarchical Reinforcement Learning for autonomous drones
Abstract	Autonomous drone flight is one of the most complex tasks within robotics, while at the same time offering a multitude of real-world applications. From inspecting areas that are difficult or dangerous to access to finding people in avalanche regions, autonomous drones offer the potential to make various tasks safer and more efficient. Even though Reinforcement Learning and Control Theory have shown impressive performance on individual tasks, they require meticulous modelling of the environment or extensive training. At the same time, they struggle to adapt to perturbations in the environment or to transfer to new tasks. Hierarchical Reinforcement Learning enables agents to utilize temporal abstraction and hierarchical structures, similarly to the thought process of humans. In previous research, this has shown improved performance and strengthened robustness against environment perturbations in simple tasks. For complex tasks, like autonomous drone flight, the research is limited. In this thesis, we apply Hierarchical Reinforcement Learning to the task of autonomous drone flight to examine the performance, adaptability against perturbations, and transferability of Hierarchical Reinforcement Learning on complex tasks. To facilitate the necessary experiments, we create three drone flight tasks extending existing simulation environments and develop a framework which enables learning on multiple tasks simultaneously. Additionally, we provide an up-to-date implementation for the Hierarchical Reinforcement Learning variant of the state-of-the-art Reinforcement Learning algorithm for autonomous drone flight. Our experiments show that a Hierarchical Reinforcement Learning agent can indeed improve the adaptability to perturbations in the environment. Furthermore, we demonstrate that the Hierarchical Reinforcement Learning agent can learn multiple tasks simultaneously better than its Reinforcement Learning counterpart.
On the other hand, we showcase that for single tasks without any perturbations standard Reinforcement Learning outperforms Hierarchical Reinforcement Learning.
Supervisor(s)	Stefan Sommer
External examiner(s)	Dan Witzner Hansen
Date and time	June 28 2024, 13:00-14:00
Room	Meeting room A, Østervoldgade 3

28 June: Jonas Hagel and Marcus Frostholm

Name of student(s)	Jonas Hagel and Marcus Frostholm
Study Programme	Computer Science Thesis
Title	Ensuring safety using Barrier functions for collision detection and avoidance between multiple drones
Abstract	We construct a barrier function and explain how to utilise the barrier function for collision detection and avoidance in a drone swarm. We focus on modifying trajectories to ensure safe flight through a hoop. The collision potentials are computed by integrating line segments of different trajectories, and gradients of barrier values are utilised for modifying trajectories. We test our implementation in the Webots simulator by mimicking the Crazyflie 2.1 platform. The results show success with three drones flying along collision-free trajectories, which guarantees safety for the drones and lets them complete the task of flying through a hoop. This highlights the use of barrier functions in producing safe and reliable trajectories, showing their potential in achieving collision detection and avoidance. Efforts were made to implement a physical experiment, combining software and hardware, setting the foundation for future work.
Supervisor(s)	Kenny Erleben
External examiner(s)	Jakob Andreas Bærentzen
Date and time	June 28 2024, 10:30-12:00
Room	UP1 3:2:20

28 June: Marie Elkjær Rødsgaard

Name of student(s)	Marie Elkjær Rødsgaard
Study Programme	Computer Science Thesis
Title	HybridGNet: Exploring medical image segmentation and shapes
Abstract	The HybridGNet is a new segmentation method that explores how to improve anatomical image segmentation in the medical field. The HybridGNet uses landmark-based segmentation to output a segmentation graph, unlike a traditional segmentation U-Net which outputs a pixel-level segmentation. This project investigates what shape models neural networks such as U-Net are and how this then relates to the HybridGNet method. Several experiments have been done with the segmentation methods. The data consists of synthetic data of smileys and X-ray datasets containing lungs. The results of these experiments show that the HybridGNet is a viable alternative to the U-Net when segmenting images where shape is the defining factor.
Supervisor(s)	Erik Bjørnager Dam
External examiner(s)	Dan Witzner Hansen
Date and time	June 28 2024, 11:00-12:00
Room	The Observatory on Østre Voldgade (Pioneer Centre)

28 June: Tim Ruschke

Name of student(s)	Tim Ruschke
Study Programme	Computer Science Thesis
Title	Guided Synthesis of Labeled Brain MRI Data Using Latent Diffusion Models for Segmentation of Enlarged Ventricles
Abstract	Scarcity, inhomogeneity, and privacy are common obstacles for deep learning in a medical context. While synthetic data appears as an ostensibly easy solution, research has shown time and time again that training with synthetic data fails to perform as well as with real data. In the context of ventricular segmentation in brain MRI images, we present a proof of concept for the successful use of synthetic data in training segmentation models. State of the art segmentation models often struggle to accurately segment patients suffering from enlarged ventricles due to afflictions like normal pressure hydrocephalus. We show that synthetic data can serve to address this by customizing the distribution of ventricular volume in the training set. We employ two latent diffusion models, a mask generator and a corresponding spade image generator, to create labeled 3D brain MRI images. Conditioning the mask generator on ventricular volume in combination with classifier-free guidance enables us to control the ventricular volume distribution of the generated images. We test the synthetic data using three nnU-Net segmentation models trained on a real, augmented and entirely synthetic dataset respectively, where the synthetic data contains a more uniform spread of ventricular volume compared to the real data. The resulting models are tested on a dataset of patients with enlarged ventricles. We also measure predicted ventricular volume of binary segmentation masks, where the model trained on real data has a mean absolute error (MAE) of 16.32 ± 12.28mL, while the models trained on synthetic and augmented data achieve much better results, with 9.67 ± 4.69mL and 8.89 ± 6.15mL respectively. Both also outperform the state of the art model SynthSeg, which due to severe outliers has an MAE of 10.49 ± 17.52mL with notably high standard deviation. The models trained with synthetic and augmented data also slightly outperform both SynthSeg and the model trained on real data in Dice score.
The comparative success of synthetic data in this work may be surprising against a backdrop of research where it consistently performs worse. As expert evaluation shows our synthetic data, while realistic, does not match the quality of real data, we primarily attribute its success to the more even distribution of ventricular volume in the synthetic data.
Supervisor(s)	Martin Nørgaard
External examiner(s)	Veronika Vladimirovna Cheplygina
Date and time	June 28 2024, 13:00-14:00
Room	SCI-DIKU-UP1-2-0-06 and SCI-DIKU-UP1-2-0-04

28 June: Nicklas Boserup

Name of student(s)	Nicklas Boserup
Study Programme	Computer Science Thesis
Title	Score Learning for Parameter Inference in Stochastic Shape Evolutions
Abstract	In the field of computational evolutionary morphometry, inference of most likely parameters of assumed underlying stochastic processes is of interest; for gaining insights into the stochastic nature of evolution, as well as for potentially establishing relationships between observations. This work proposes techniques for performing parameter inference through simulated likelihood estimation, and illustrates the applicability of the proposed methods for both desires. Building upon the pioneering work of Heng et al. (2022), this thesis shows how score matching may be used to establish conditional stochastic flows of shapes. The shape domain introduces numerical complexities which are tackled by deriving a novel numerically stable objective function. It is further shown how samples from approximated conditioned processes may be used as proposals in an importance sampler capable of performing likelihood estimation on shape flows. Through illustrative examples it is shown how the proposed methodology for parameter inference works; it is explored how to successfully apply it; and experiments showcasing the usefulness of the techniques are presented. Along the way, this text aims to serve as a self-contained, reasonably accessible introduction to this highly technical field.
Supervisor(s)	Stefan Sommer
External examiner(s)	Dan Witzner Hansen
Date and time	June 28 2024, 14:00-15:00
Room	Meeting room A, Østervoldgade 3

28 June: Ying Liu

Name of student(s)	Ying Liu
Study Programme	Computer Science Thesis
Title	Exploration of Self-Supervised Learning Methods for Longitudinal Image Analysis
Abstract	The advent of self-supervised learning has alleviated the bottleneck of lacking annotated datasets in supervised learning for medical image analysis. This research aims to explore whether two relatively recent self-supervised learning methods, BYOL and SimSIAM, can train a 3D model suitable for longitudinal analysis. In this study, the LIDC-IDRI dataset is used to pre-train two 3D ResNet-50 models using BYOL and SimSIAM, both of which converge and prevent representation collapse. Notably, SimSIAM converges faster and at a lower loss than BYOL. Subsequently, these models are employed to predict the tumor volume of a subject on the final day of the radiotherapy treatment using the learned representations on the 4D Lung dataset. The predictions differ from the actual tumor volumes significantly. To further verify whether these models can capture tumor volume-related representations, linear probing predictions are made. The results show that the predictions are inaccurate for all subjects, displaying considerable discrepancies from the actual values, and substantial variation between different predictions for the same subject. Moreover, there is no significant correlation between the actual tumor volumes and the linear probing predictions, although the model trained using SimSIAM method demonstrates a marginally better correlation. Therefore, this research concludes that the BYOL and SimSIAM methods, as implemented, are not suitable for training 3D models for longitudinal analysis due to their inability to efficiently capture tumor volume-related representations. The research may serve as a reference for future studies that apply self-supervised learning methods akin to BYOL and SimSIAM to train 3D models for longitudinal analysis. Keywords: self-supervised learning; 3D medical images; longitudinal analysis
Supervisor(s)	Jens Petersen
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 28 2024, 09:00-10:00
Room	SCI-DIKU-UP1-1-1-N116A

Bioinformatics

11 June: Cheng Chen

Name of student(s)	Cheng Chen
Study Programme	Bioinformatics
Title	Unsupervised learning of multi-omics and phenotype data in the UK Biobank
Abstract	To generate deeper understanding of complex phenotypes we need to move beyond single data modalities and focus on multi-omics and multimodal data integration. The project focuses on applying the MOVE pipeline to integrate anthropometric, biomarker, proteomics and metabolomics data from the UK Biobank database into latent representations, and on interpreting these latent representations and the important features they represent. We then performed genome-wide association studies (GWAS) on the latent representations to investigate genetic associations related to metabolic syndrome. This approach enhances our understanding of how genetic variations influence a combination of complex phenotypes and provides a potential direction for future research.
Supervisor(s)	Anders Krogh
External examiner(s)	Ole Lund
Date and time	June 11 2024, 15:30 - 16:30
Room	Panum, Mærsk Tower, floor 8, room 145A.

12 June: Lucas Phillip Krieger

Name of student(s)	Lucas Phillip Krieger
Study Programme	Bioinformatics
Title	Exploring the Capabilities of Protein Language Models at Predicting Glycosylation
Abstract	Glycosylation is the most abundant and diverse form of protein post-translational modification. O-glycosylation biosynthesis starts when a group of twenty partially redundant polypeptides transfer an N-acetylgalactosamine (GalNAc) to a serine or threonine in the Golgi apparatus. Unlike N-glycosylation, O-linked glycosylation occurs on serine or threonine residues that are followed by various amino acids lacking a consensus motif. The lack of a motif complicates the prediction of O-linked glycosylation sites. The main objective is to explore the capabilities of protein embeddings generated by ESM-2 in predicting whether disordered regions will be O-linked glycosylated.
Supervisor(s)	Hiren Joshi, Wouter Boomsma
External examiner(s)	Henrik Nielsen
Date and time	June 12 2024, 14:30 - 15:20
Room	07-10-143a in the Mærsk tower

13 June: Yifan Sun

Name of student(s)	Yifan Sun
Study Programme	Bioinformatics Thesis
Title	Extension and application of a side-chain customization protocol for receptor probe and drug discovery
Abstract	G protein-coupled receptors (GPCRs) are crucial in mediating the actions of approximately two-thirds of human hormones, the majority of which, around 71%, are peptides or proteins. Given their pivotal role in numerous physiological processes, GPCRs represent critical targets in drug discovery. However, designing ligands that effectively target GPCRs is particularly challenging for several reasons, including the structural diversity and heterogeneity of these receptors, the lack of detailed structural information, and the need for high selectivity and specificity. In this context, this study significantly refines and expands a foundational side-chain selection algorithm based on residue-to-residue interactions. The enhancement targets the design of probes for Class A GPCRs by optimizing both our GPCR-peptide interaction library and the side-chain selection algorithm. This study implements a novel approach, utilizing a comprehensive collection of Class A GPCR-peptide complexes to generate a residue pair interaction library that incorporates information from all available GPCR-peptide structures. Subsequently, the library is utilized to design peptides by removing and reconstructing side chains on peptide-receptor backbones in order to predict the amino acid sequences of ligands. With the support of different statistical measures, we validate this new method to ensure the reliability and accuracy of its predictions. At the same time, in collaboration with our team’s drug design and protein science experts, we select several structures of interest for prediction with our novel approach and with the latest deep learning-based method, ProteinMPNN. The current results show significant improvements in prediction accuracy compared to the initial version. In the future, we may further use molecular dynamics to confirm the stability of the predicted ligand-receptor binding scaffold.
The novel approach in ligand discovery may provide insights for GPCR-targeted drug design, particularly in identifying ligands for GPCRs without an endogenous ligand.
Supervisor(s)	Wouter Boomsma
External examiner(s)	Jes Frellsen
Date and time	June 13 2024, 10:00-11:00
Room	UP1-2-0-04

20 June: Yan Li

Name of student(s)	Yan Li
Study Programme	Bioinformatics Thesis
Title	Development of a deep generative model for cancer gene expression and clinical data combined
Abstract	This thesis aims to handle the multi-modality metadata in the TCGA dataset and integrate it with the gene expression data to enhance cancer studies using a deep generative model. The oncology deep generative decoder model in this thesis utilizes a Gaussian mixture model followed by a two-layer decoder capable of processing multi-modal data. After being trained on gene expression data combined with metadata, the oncology deep generative decoder model effectively captures the latent representations of different cancer types and generates different modalities from its output layer. Our approach not only demonstrates the capability to reconstruct and generate realistic synthetic data but also enhances cancer subtype identification and survival analysis. Experimental results highlight the model’s proficiency in distinguishing cancer subtypes, as evidenced by a strong correspondence between the Gaussian mixture model’s components and external cancer type labels. Furthermore, the model excels in survival analysis, accurately predicting survival probabilities over time for different cancers. The survival analysis revealed distinct survival curves for aggressive cancers like mesothelioma (MESO) and pancreatic adenocarcinoma (PAAD), compared to more indolent cancers such as lung adenocarcinoma (LUAD), breast invasive carcinoma (BRCA), and prostate adenocarcinoma (PRAD), aligning well with clinical observations. This thesis illustrates that the oncology deep generative decoder model addresses the limitations of traditional dataset-based cancer research, which often struggles with multi-modal datasets, and demonstrates potential for a variety of clinical applications. Keywords: Deep generative model, TCGA dataset, Multimodality, Oncology, Survival analysis
Supervisor(s)	Anders Krogh
External examiner(s)	Jes Frellsen
Date and time	June 20 2024, 14:30-15:30
Room	Panum, meeting room 33.4D

21 June: Rasmus Alex Buntzen-Frederiksen

Name of student(s)	Rasmus Alex Buntzen-Frederiksen
Study Programme	Bioinformatics Thesis
Title	Predicting contaminated DNA samples within the class of Insecta from DNA and image embeddings
Abstract	Contaminated DNA samples pose significant challenges in biological research. Current methods for detecting contamination are often time-consuming and lack precision, necessitating the development of more robust, automated approaches. This thesis addresses this need by exploring a multi-modal machine learning model that integrates both image and DNA data to enhance contamination detection accuracy. The primary objective of this work is to develop and evaluate the feasibility and performance of a multi-modal machine learning model for predicting DNA contamination, using specimen images and DNA sequences. The methodology for this project uses deep convolutional neural networks (VGG16, ResNet-34 and EfficientNetB0) to extract image features and large language models (DNABERT2) to extract DNA features. Improvements to the image and DNA tracks are explored to enhance model performance. From the feature embeddings, a multilayer perceptron produces the contamination prediction. The study also explored methods to address overfitting and class imbalance issues. The initial model achieved a 61% accuracy on unseen training data, indicating the foundational capability of the multi-modal contamination prediction model. Subsequent refinements improved the model’s performance to an accuracy of 85%. EfficientNetB0 showed the highest effectiveness for the image track with a validation species classification accuracy of 74%, while DNABERT2 achieved a 98.5% validation species classification accuracy for the DNA track. This study provides preliminary evidence that a multi-modal machine learning model can effectively predict DNA contamination by leveraging both image and DNA data.
Supervisor(s)	Kim Steenstrup Pedersen
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 21 2024, 10:05-11:05
Room	The conference room at the Zoological Museum, UP 15

IT and Cognition

10 June: Xiangyu Lu

Name of student(s)	Xiangyu Lu
Study Programme	IT and Cognition Thesis
Title	Fiber Break Prediction Using 3D Generative Models
Abstract	Reliable failure predictions of fiber-reinforced composites (FRCs) are crucial for ensuring the safety of products, reducing costs, and optimizing performance. Investigating the underlying mechanism of tensile failure of individual fibers is an important research direction. In this thesis, we aim to predict potential fiber breaks by synthesizing computed tomography (CT) images of FRCs at higher forces based on CT scans taken under initial force. We developed and evaluated 3D conditional generative adversarial networks (c-GANs) with 3D U-Nets as generators to generate CT images of fiber structures under increased force. Our models obtained MAE and MSE scores of 0.0369 and 0.00278, respectively, on the validation set. The visual quality of the generated images indicates that the models excelled at predicting the movement of the fiber structure under increasing force. However, we observed a weak correlation between the generated and ground truth fiber breaks.
Supervisor(s)	Abraham George Smith
External examiner(s)	Melih Kandemir
Date and time	June 10 2024, 09:30-10:30
Room	DIKU UP1-2-0-15

28 June: Ben Yao

Name of student(s)	Ben Yao
Study Programme	IT and Cognition Thesis
Title	Self-supervised Pre-training for Quantum Natural Language Processing
Abstract	Quantum computing has been broadly applied to many AI fields and achieved impressive results. However, quantum machine learning models are limited by the inherent linearity of quantum computing architectures, which constrains their capabilities and adaptability. Inspired by the remarkable success of self-supervised learning and pre-training methods, we integrate these two promising mechanisms into our quantum natural language processing (QNLP) model, attempting to address this limitation and to increase the power of the QNLP model on the representation level. Specifically, we pre-train a hybrid quantum-classical natural language processing model, which consists of a BERT-like text representation encoder, a feature extractor, and a task-related classifier, on large text corpora. In contrast to traditional natural language processing models, we employ quantum architectures within both the feature extractor and the classifier. In the downstream tasks, the QNLP model inherits the pre-trained word representations and quantum encodings of sentences, and is fine-tuned for specific tasks on that basis. Experiments show that the pre-training mechanism brings remarkable improvement over end-to-end quantum models, yielding meaningful prediction results on a variety of downstream text classification datasets. Furthermore, our work significantly enhances language understanding capability compared with the pure QNLP model. As a pioneering work, this study conducts a preliminary exploration of pre-training quantum language models and sheds light on future research in this direction.
Supervisor(s)	Qiuchi Li
External examiner(s)	Troels Andreasen
Date and time	June 28 2024, 10:00-11:00
Room	DIKU UP1-2-0-06

28 June: Adrianna Helena Klank

Name of student(s)	Adrianna Helena Klank
Study Programme	IT and Cognition Thesis
Title	Diffusion Models in 3D Medical Image Synthesis and Forecasting Radiotherapy Outcome for Lung Cancer Treatment
Abstract	The aim of this work is to develop a method for forecasting the progress of cancer treatment by employing Generative Artificial Intelligence to synthesize images that capture changes across the course of radiotherapy. We evaluate a method that incorporates an attention-based module for capturing temporal changes [22] as an additional input to a diffusion model. Our findings indicate that this approach is unsuccessful in accurately capturing anatomical structures in both CT lung scans and MRI cardiac cycle datasets. We further investigate the method through an ablation study with various architectural variants. Ultimately, we demonstrate that while one of the models is capable of generating high-quality images, its clinical usefulness requires further evaluation.
Supervisor(s)	Jens Petersen
External examiner(s)	Rasmus Reinhold Paulsen
Date and time	June 28 2024, 10:00-11:00
Room	SCI-DIKU-UP1-1-1-N116A

Health Informatics

18 June: Alexander Haderup Alsing and Felix Björklund Osmark

Name of student(s)	Alexander Haderup Alsing and Felix Björklund Osmark
Study Programme	Health and Informatics Thesis
Title	Teknologisk løsning til arbejdsgangen ved supervision i Vesterbro Lægehus
Abstract	Background: General practice constitutes the citizens’ primary access to the Danish healthcare system and deals with both minor and major health problems. With an ageing population and a growing proportion of patients with chronic diseases, there is a need for increased treatment capacity and efficient work organisation in general practice. The 2022 collective agreement supports this by promoting the use of practice staff to relieve the burden on doctors. Method: In this study, we use the Double Diamond process model to structure our study of how a technological solution can improve the supervision process at Vesterbro Lægehus. To investigate this, we collect empirical data using participant observation to understand and observe workflows, as well as interviews to understand staff experiences and needs. Through open coding and clustering, we identify patterns in the empirical material that result in a workflow analysis. We use the Septigon model to understand how the findings from the workflow analysis affect each other. We also supplement with a literature search to validate our findings. For the prototype development of the concept, we use the digital design tool Figma as well as validation through user tests. Results: A technological solution can be used for one of the supervision workflows in Vesterbro Lægehus. The solution aims to reduce the inactive waiting time by restructuring the waiting time location. Instead of physically waiting at the supervisor's office, the general practice staff can perform other relevant work tasks. The technological solution requires that the supervisor receives a notification with information from the general practice staff about the applicable supervision. The solution must provide an opportunity to respond to the inquiries, to reduce the general practice staff’s concern about being overlooked. Conclusion: The on-call function has the potential to improve the workflow and reduce downtime in general practice.
Further research is needed to determine the full efficacy and optimize the system for different supervision needs.
Supervisor(s)	Henriette Mabeck
External examiner(s)	Yutaka Yoshinaka
Date and time	June 18 2024, Alexander Haderup Alsing: 15:15-16:15 & Felix Björklund Osmark: 16:15-17:15
Room	2.3.I.164 at NBB

Details

Time: 5 June - 28 June 2024

Organizer: Department of Computer Science