👋

Hi, I'm Hanna

@ayukh

📍Zurich, Switzerland

ex-🤗|eth zurich

Updates

November 2025 - Presenting a lecture on adapting LLMs to EEU languages at the Lviv Polytechnic University Data Science Club.

October 2025 - Presented our work on multilingual models for EEU languages at the GIST workshop in Geneva.

September 2025 - Released MamayLM v1.0, a Ukrainian-focused multimodal LLM.

July 2025 - Joined Hugging Face as a Machine Learning Research Engineer Intern to work on data science coding agents.

May 2025 - Finished my master's thesis under the supervision of Prof. Vechev and released the resulting Ukrainian-focused model, MamayLM v0.1.

October 2024 - Our paper "A Synthetic Dataset for Personal Attribute Inference" was accepted to the NeurIPS 2024 Datasets and Benchmarks Track.

Work experience

Hugging Face🤗

Machine Learning Research Engineer Intern

July 2025 - September 2025

◆ Improving data sourcing and training methods for data science coding agents
◆ Testing multi-node multilingual training

ETH Zurich

Research Assistant

May 2025 - July 2025

◆ Research Assistant at Secure, Reliable, and Intelligent Systems Lab (SRI)
◆ Improving multilingual training, alignment and safety for EEU languages

ETH Zurich

Research Assistant

Nov 2023 - May 2024

◆ Project Intern at Secure, Reliable, and Intelligent Systems Lab (SRI)
◆ Documenting capabilities and vulnerabilities of state-of-the-art large language models
◆ Contributing to the LVE (Language Model Vulnerabilities and Exposures) project
◆ SynthPAI: A Synthetic Dataset for Personal Attribute Inference: semester project under the supervision of Prof. Martin Vechev (co-supervised by Robin Staab and Mark Vero)

Fractal Analytics

Junior Data Scientist

Sep 2021 - Jul 2022

◆ Completed a 3-month internship with intensive training in statistics, machine learning techniques, data engineering and cloud (Azure)
◆ Executed an end-to-end Market Mix Modelling project for a segment of one of the world's biggest CPG companies, covering methods research, EDA, and developing and fine-tuning a statistical model

Education

ETH Zurich

Statistics MSc

2022-2025

⭐ Main courses: Natural Language Processing, Large Language Models, Interactive Machine Learning: Visualization & Explainability, Probabilistic AI, Big Data for Engineers, AI4Good.
⭐ Master's thesis "Enhancing Mid-Resource Language Performance in Large Language Models": an end-to-end pipeline recipe for efficient bilingual LLM training and alignment (under the supervision of Prof. Vechev).
⭐ Extracurricular activities:
◆ Statistics representative at Seminar für Statistik (SfS): organizing and leading events for students of the Statistics MSc program
◆ Statistics MSc mentor: mentoring incoming first-year students of the program
◆ Member of VMP (student organization of the ETH D-MATH department)

National Technical University of Ukraine

Economic Cybernetics MSc

2021-2022

⭐ Master's thesis: 'Modeling the investment portfolio of E-commerce companies':
◆ Twitter sentiment analysis of E-commerce stock tickers
◆ Stock prediction using Generative Adversarial Networks (GAN)
◆ Investment portfolio modeling (option hedge fund, stock portfolio prediction)

National Technical University of Ukraine

Economic Cybernetics BSc

2017-2021

⭐ Grade: 94/100
◆ Graduated with honors
◆ Bachelor thesis: 'Modeling an equity investment fund using financial derivative management and hedging strategies'

Projects and publications

MamayLM v1.0

INSAIT Institute

The First Open Multimodal Ukrainian LLM.

Jupyter Agent

Hugging Face 🤗

Multi-step pipeline to generate synthetic Jupyter notebooks with custom scaffolding to finetune efficient data science coding agents.

MamayLM v0.1

INSAIT Institute/ETH Zurich

An efficient bilingual LLM with cutting-edge performance in Ukrainian and English.

SynthPAI: A Synthetic Dataset for Personal Attribute Inference

NeurIPS D&B 2024

An LLM-generated collection of synthetic texts that enables privacy-preserving research on benchmarking personal attribute inference in Large Language Models.

LVE Project

ETH Zurich

An open-source repository of Language Model Vulnerabilities and Exposures (LVEs).

Urban Planning Project

ETH Zurich (Interactive Machine Learning: Visualization & Explainability Course FS23)

A project for the ETH Zurich course 'Interactive Machine Learning: Visualization & Explainability', spring semester 2023.

© 2025 Hanna Yukhymenko.