Erste (2017), located in Ankara, Türkiye, is a software engineering SME focusing on international R&D projects. Its expertise spans mobile-device management, predictive maintenance, digital twins, Internet of Things (IoT)-based intelligent port management, smart-software-powered manufacturing environments, and blockchain-based ID management. Sample projects Erste has been involved in can be seen at https://www.ersteyazilim.com/en/projects-products-and-services/.
Technical/Scientific Challenge
Although state-of-the-art speech-to-text models work well for English, their performance is considerably lower for most other languages. Fine-tuning these models can be a remedy, and HPC systems are a cost-efficient platform for this purpose. Erste required the NCC's consultancy to get support in training and using large-scale models for speech-to-text transcription in Turkish. The developed tools will be utilized in an international ITEA project called InnoSale (https://www.innosale.eu/), which focuses on advancing state-of-the-art technology in the retail industry.
Solution
Although Erste had experience from a previous Proof of Concept (PoC) study in utilizing HPC resources to train traditional Machine Learning (ML) models, it did not have expertise in large-scale models or how to work with them. EuroCC's training sessions and the TRUBA user documentation proved to be quite informative. The model chosen for fine-tuning was the largest Whisper model. Working with the HPC expert, memory-efficient training approaches, namely Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA), were implemented. With the TRUBA HPC Centre's help, SLURM scripts that run on multiple GPUs were prepared, and the parameters for effective and efficient fine-tuning were set. Significant reductions in the character and word error rates of the pre-trained Whisper model were observed even after only 4-8 hours of training.
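The core idea behind the LoRA approach mentioned above can be illustrated in a few lines: instead of updating a large, frozen weight matrix W during fine-tuning, only two small low-rank factors A and B are trained, and the effective weight becomes W + (alpha/r)·BA. The sketch below is a minimal, self-contained illustration of this idea with assumed dimensions; it is not Erste's actual training code.

```python
import numpy as np

# Assumed (illustrative) dimensions; real Whisper layers differ.
d_out, d_in, rank, alpha = 1024, 1024, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pre-trained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero init: no change at step 0

def effective_weight(W, A, B, alpha, rank):
    """Weight actually used in the forward pass under LoRA."""
    return W + (alpha / rank) * B @ A

full_params = W.size             # what full fine-tuning would train
lora_params = A.size + B.size    # what LoRA actually trains
print(f"full fine-tuning: {full_params:,} trainable parameters")
print(f"LoRA (r={rank}):  {lora_params:,} trainable parameters "
      f"({100 * lora_params / full_params:.2f}% of full)")
```

Because only A and B receive gradients (and optimizer state), the GPU memory footprint drops dramatically, which is what makes fine-tuning the largest Whisper model feasible within short training sessions.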
Figure 1: Word Error Rate and Character Error Rate for the downloaded, pre-trained Whisper model and the model fine-tuned with an additional Turkish dataset.
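The two metrics in Figure 1 are both edit-distance ratios: Word Error Rate (WER) counts the minimum number of word insertions, deletions, and substitutions needed to turn the transcription into the reference, divided by the reference length, and Character Error Rate (CER) does the same at the character level. A minimal sketch (the example sentences are invented, not from the project data):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[n]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)

ref = "merhaba bugün toplantı saat üçte başlayacak"
hyp = "merhaba bugun toplantı saat üçte başladı"
print(f"WER = {wer(ref, hyp):.2f}, CER = {cer(ref, hyp):.2f}")
```

CER is often the more forgiving metric for agglutinative languages like Turkish, where a single wrong suffix counts as a whole-word error under WER but only a few character errors under CER.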
Business impact
Training a large-scale ML model is a highly demanding task, requiring significant computational power and financial resources. Despite numerous research efforts to alleviate these demands, access to computational power remains indispensable. The TRUBA resources provided through the EuroCC-2 project have been instrumental in supplying the necessary computational capabilities to Erste. In addition, drawing on the NCC's expertise on these models enabled an effective training process in less time.
Ermetal (https://www.ermetal.com.tr/en), a large-scale manufacturing company in Türkiye and Erste's use-case partner in InnoSale, shared meeting recordings, which were manually transcribed by the engineers at Erste. This data was moved to TRUBA. The cluster's debug partitions were used first to rectify potential errors and to follow the error/loss behaviour. In addition, public datasets were also used in experiments to see whether they improve the model. After selecting the memory-efficient training approaches, carefully choosing the training parameters, and submitting the job to the main cluster, the resulting Whisper model was tested for its usability in the InnoSale project, and significant improvements were observed. Erste now has expertise in using these scripts and will continue to improve the model with new data.
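The debug-then-main-cluster workflow described above maps naturally onto a SLURM batch script. The sketch below is hypothetical: the partition, module, GPU count, and script names are placeholders, not TRUBA's actual configuration or Erste's actual job script.

```shell
#!/bin/bash
# Hypothetical SLURM job sketch; all names below are placeholders.
#SBATCH --job-name=whisper-finetune
#SBATCH --partition=debug          # short debug runs first; switch to the
                                   # main GPU partition for the full job
#SBATCH --nodes=1
#SBATCH --gres=gpu:4               # request multiple GPUs on one node
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=08:00:00            # matches the 4-8 hour sessions above

module load cuda                   # placeholder module name

# torchrun spawns one process per GPU for multi-GPU fine-tuning;
# finetune_whisper.py is a hypothetical training script name.
torchrun --nproc_per_node=4 finetune_whisper.py --use-lora
```

Running the same script on the debug partition with a reduced `--time` limit lets errors and loss behaviour be checked cheaply before the full job is submitted.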
Benefits
- The training time was significantly reduced by using (1) multiple GPUs and (2) memory-efficient training approaches.
- The error in the transcribed text was significantly reduced, although there is still room for improvement.
- Erste now has expertise in large-scale model training and will leverage it to improve the model further and develop other cutting-edge tools.
Keywords
- Technology: HPC, AI, speech to text, GPUs, LoRA, PEFT
- Industry sector: Manufacturing
Contact: Kamer Kaya, kaya@sabanciuniv.edu