Peter Rupnik
Verified email at ijs.si
Title
Cited by
Year
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages
M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ...
23rd Annual Conference of the European Association for Machine Translation …, 2022
Cited by: 11, Year: 2022
The GINCO training dataset for web genre identification of documents out in the wild
T Kuzman, P Rupnik, N Ljubešić
arXiv preprint arXiv:2201.03857, 2022
Cited by: 10, Year: 2022
ParlaSpeech-HR - a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus
N Ljubešić, D Koržinek, P Rupnik, IP Jazbec
Proceedings of the workshop ParlaCLARIN III within the 13th language …, 2022
Cited by: 9, Year: 2022
Multilingual comparable corpora of parliamentary debates ParlaMint 3.0
T Erjavec, M Kopp, M Ogrodniczuk, P Osenova, D Fišer, H Pirker, T Wissik, ...
CLARIN ERIC, 2023
Cited by: 5, Year: 2023
The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia
M Mochtak, P Rupnik, N Ljubešić
arXiv preprint arXiv:2206.00929, 2022
Cited by: 5, Year: 2022
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora
T Kuzman, P Rupnik, N Ljubešić
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2023
Cited by: 4, Year: 2023
Slovene-English parallel corpus MaCoCu-sl-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by: 4, Year: 2023
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian
P Rupnik, T Kuzman, N Ljubešić
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2023
Cited by: 2, Year: 2023
Serbian-English parallel corpus MaCoCu-sr-en 1.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by: 2*, Year: 2023
Croatian-English parallel corpus MaCoCu-hr-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by: 2, Year: 2023
The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0
M Mochtak, P Rupnik, N Ljubešić
Jožef Stefan Institute, 2022
Cited by: 2, Year: 2022
The Twitter user dataset for discriminating between Bosnian, Croatian, Montenegrin and Serbian Twitter-HBS 1.0
N Ljubešić, P Rupnik
Jožef Stefan Institute, 2022
Cited by: 2, Year: 2022
Slovene Web genre identification corpus GINCO 1.0
T Kuzman, M Brglez, P Rupnik, N Ljubešić
Jožef Stefan Institute, 2021
Cited by: 2, Year: 2021
Montenegrin web corpus CLASSLA-web.cnr 1.0
N Ljubešić, P Rupnik, T Kuzman
Jožef Stefan Institute, 2024
Cited by: 1, Year: 2024
Macedonian web corpus CLASSLA-web.mk 1.0
N Ljubešić, P Rupnik, T Kuzman
Jožef Stefan Institute, 2024
Cited by: 1*, Year: 2024
Spoken corpus Gos 2.1 (transcriptions)
D Verdonik, A Zwitter Vitez, J Zemljarič Miklavčič, S Krek, M Stabej, ...
Centre for Language Resources and Technologies, University of Ljubljana, 2023
Cited by: 1, Year: 2023
Bulgarian-English parallel corpus MaCoCu-bg-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by: 1, Year: 2023
Improving Effectiveness of a Coaching System Through Preference Learning
M Znidarsic, A Osojnik, P Rupnik, B Zenko
Proceedings of the 14th PErvasive Technologies Related to Assistive …, 2021
Cited by: 1, Year: 2021
"Choice of plausible alternatives" datasets in South Slavic dialects DIALECT-COPA
N Ljubešić, T Kuzman, P Rupnik, S Milosavljević, N Galant, S Benčina, ...
Jožef Stefan Institute, 2024
Year: 2024
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
N Ljubešić, V Suchomel, P Rupnik, T Kuzman, R van Noord
arXiv preprint arXiv:2404.05428, 2024
Year: 2024