Research & development - Wageningen
Creating synthetic datasets of biomarker profiles along the GI tract, accelerating research and development of computational models for monitoring and prediction of gut health.
Up to 40% of the population suffers from some form of gastrointestinal (GI) disease. The physical symptoms of GI disorders are often disruptive to daily life, and through the gut-brain axis, mental health can also be affected. This profound societal impact is aggravated by the relative inaccessibility of the GI tract, which complicates the diagnosis, treatment and management of GI disorders. To help solve this problem, at OnePlanet Research Center we are developing personalized models of gut health that can transform multimodal sensor data, from both the GI tract and other relevant biomarkers, into a continuously updated “digital twin” of a person’s GI health status.
A key step in this process is the generation of synthetic datasets. These can be used to train models and AI algorithms, compensate for the scarcity of real, in-vivo measurements, reduce biases (e.g., underrepresentation of certain patient profiles), and enable data to be shared with third parties while safeguarding privacy.
Your task in this project will be to expand and improve an existing statistical model of gut biomarkers, estimate population distributions over the model parameters from an existing dataset, and then use the model to generate new data conditioned on “synthetic individuals” sampled from these distributions. Guided by recent literature on generative AI for data synthesis, you will also design tests to assess the quality and usefulness of the resulting synthetic data. Time permitting, a capstone to the project could be training a deep neural network on a large synthetic dataset to automatically annotate time-series data from the ingestible sensor.
In short, the internship involves:
- Working with unique data from cutting-edge sensors
- Extending an existing statistical model of gut biomarker profiles
- Synthesizing new data by sampling from this model
- Designing tests to evaluate the quality and usefulness of the resulting synthetic data
- Working in an agile setting to deliver timely, effective results
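To give a feel for the modelling steps listed above, here is a minimal Python sketch of the "estimate population distributions, sample synthetic individuals, generate data" loop. It is only an illustration under strong assumptions and not the existing OnePlanet model: the sigmoid-shaped profile, the three parameters (baseline, amplitude, location), the independent normal population distributions, and all numbers in `FITTED_PARAMS` are hypothetical placeholders.

```python
import math
import random
import statistics

# Hypothetical per-individual parameters (baseline, amplitude, location).
# These values are invented for illustration; they are NOT real measurements.
FITTED_PARAMS = [
    (2.0, 4.5, 0.30),
    (1.8, 5.0, 0.35),
    (2.3, 4.2, 0.28),
    (2.1, 4.8, 0.33),
    (1.9, 4.6, 0.31),
]


def estimate_population(params):
    """Estimate an independent normal (mean, stdev) for each parameter.

    Assumption: parameters are uncorrelated and normally distributed.
    """
    columns = list(zip(*params))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_individual(population, rng):
    """Draw one 'synthetic individual' from the population distributions."""
    return tuple(rng.gauss(mu, sigma) for mu, sigma in population)


def biomarker_profile(individual, rng, n_points=50, noise_sd=0.1, width=0.05):
    """Generate a noisy profile along normalized GI position x in [0, 1].

    Assumption: the biomarker follows a sigmoid step centered at 'location'.
    """
    baseline, amplitude, location = individual
    profile = []
    for i in range(n_points):
        x = i / (n_points - 1)
        value = baseline + amplitude / (1.0 + math.exp(-(x - location) / width))
        profile.append(value + rng.gauss(0.0, noise_sd))
    return profile


def synthesize(n_individuals, seed=0):
    """Fit the population, then sample individuals and their profiles."""
    rng = random.Random(seed)
    population = estimate_population(FITTED_PARAMS)
    individuals = [sample_individual(population, rng) for _ in range(n_individuals)]
    profiles = [biomarker_profile(ind, rng) for ind in individuals]
    return population, individuals, profiles


if __name__ == "__main__":
    population, individuals, profiles = synthesize(1000)
    # Very simple quality check: do synthetic parameter means match the fit?
    for k, name in enumerate(("baseline", "amplitude", "location")):
        synth_mean = statistics.mean(ind[k] for ind in individuals)
        print(f"{name}: fitted mean {population[k][0]:.3f}, "
              f"synthetic mean {synth_mean:.3f}")
```

The comparison of means at the end is only a placeholder for the quality tests mentioned above; in practice one would likely compare full distributions and check how well models trained on synthetic data perform on real data.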
The internship work and activities will be organized with a scrum-like methodology: you will maintain the backlog in coordination with your mentors, select prioritized tasks from it, and tackle and evaluate them in two-week iterations. At the end of each iteration, you will showcase and reflect on your progress in a regular meeting of the Human Digital Twin team (in which you will be embedded), which will be a valuable opportunity for broader feedback and collaborative problem-solving. In addition to this team, you will also be able to draw on the expertise of other data scientists and domain experts working at (or connected to) OnePlanet.
The ideal starting date for this 6-month internship is around the beginning of September 2025.
Does this position sound like an interesting next step in your career at imec? Don’t hesitate to submit your application by clicking on ‘APPLY NOW’.
Should you have more questions about the job, you can contact jobs@imec.nl.