In almost any discourse on artificial intelligence (AI) and natural language processing, the focus tends to fall almost entirely on Large Language Models (LLMs). After all, these are the models with which most people are familiar. However, leveraging Small Language Models (SLMs) could offer a greater strategic advantage to Caribbean countries and other developing regions.
The buzz around Large Language Models (LLMs), such as OpenAI's GPT-4 and Google's Gemini, is deafening. Their ability to generate human-quality text, translate languages, and even write code has captured the imagination and sparked a global race to develop ever-larger and more capable models. However, amidst this excitement, a quieter revolution is brewing: the rise of Small Language Models (SLMs). While they might not grab the headlines in the same way, SLMs hold immense potential, particularly for developing nations like those in the Caribbean region, by offering a more practical and impactful path to leveraging the power of natural language processing.
So, what exactly are Small Language Models?
The primary distinction lies in their size, measured by the number of parameters or variables the model learns during training. LLMs boast billions, even trillions, of parameters, requiring massive datasets, immense computational power for training and inference, and significant energy consumption. SLMs, on the other hand, are trained on smaller datasets and have significantly fewer parameters, often ranging from a few million to a few hundred million. This difference in scale has profound implications.
First, there is computational cost. Training and running LLMs demand expensive hardware, specialised infrastructure, and substantial amounts of energy, which is often prohibitive for many developing countries. SLMs, thanks to their reduced size, can be trained and deployed on more readily available and affordable hardware, even on edge devices like smartphones or embedded systems.
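A rough back-of-the-envelope calculation illustrates this hardware gap. The model sizes below are illustrative assumptions, and the calculation considers only the memory needed to hold the weights (16-bit numbers, two bytes per parameter), ignoring everything else a running model requires:

```python
# Approximate RAM needed just to hold a model's weights,
# assuming 2 bytes per parameter (16-bit floats). Illustrative only.
def weight_memory_gb(num_parameters, bytes_per_param=2):
    return num_parameters * bytes_per_param / 1e9  # gigabytes

models = {
    "SLM, 125 million parameters": 125e6,
    "SLM, 1 billion parameters": 1e9,
    "LLM, 175 billion parameters (GPT-3 scale)": 175e9,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.2f} GB for weights")
```

By this estimate, a 125-million-parameter SLM needs about 0.25 GB for its weights, well within the reach of a mid-range smartphone, while a GPT-3-scale model needs roughly 350 GB, demanding a cluster of specialised accelerators.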
Second, there is a vast difference in data requirements. Although SLMs still require substantial data, they can achieve impressive performance on specific tasks with smaller, more targeted datasets. This is particularly relevant for languages and domains where the massive datasets used to train LLMs are scarce, which is likely to be the case in developing regions.
Third, there is energy consumption and efficiency. The environmental impact of training and deploying massive LLMs is a growing concern. For example, according to the Association of Data Scientists, training OpenAI's GPT-3, which had 175 billion parameters, was estimated to use around 1,287 Megawatt-hours (MWh) of electricity, "…which is roughly equivalent to the energy consumption of an average American household over 120 years." SLMs, on the other hand, are significantly more energy-efficient, aligning with sustainability goals and reducing operational costs. Though a definitive figure could not be readily found, SLMs are estimated to consume between 10% and 20% of the energy needed by an LLM. For example, a model with 7 billion parameters, hence considerably smaller than GPT-3's 175 billion, requires only around 50 MWh to train, or 3.9% of the energy used to train GPT-3 (Source: Association of Data Scientists).
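The 3.9% figure above follows directly from the two reported energy estimates, as a quick calculation confirms:

```python
# Reported training-energy figures (MWh), per the Association of Data Scientists.
gpt3_energy_mwh = 1287  # GPT-3, 175 billion parameters
slm_energy_mwh = 50     # a 7-billion-parameter model

share = slm_energy_mwh / gpt3_energy_mwh
print(f"{share:.1%} of GPT-3's training energy")  # prints "3.9% of GPT-3's training energy"
```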
Fourth, SLMs tend to be more accessible and customisable than LLMs. Their smaller size makes them more accessible for research and development purposes, especially in resource-constrained environments. It is thus possible to run SLMs on edge devices, such as smartphones, IoT devices, and embedded systems, where power is limited and constant cloud connectivity cannot be assumed, which is especially relevant in the Caribbean and other developing regions where quality internet access may still be a challenge. Additionally, SLMs tend to be easier to fine-tune and customise for specific local languages, dialects, and cultural contexts, which again could be a crucial advantage for regions with unique linguistic and cultural landscapes, such as the Caribbean.
Finally, SLMs tend to be more attractive when latency and real-time applications are considered. For applications that require quick responses, such as virtual assistants on low-bandwidth networks or real-time translation in resource-limited settings, SLMs offer lower latency thanks to their simpler architecture and smaller computational footprint. Once again, in environments where internet bandwidth is limited or transmission speeds are slowed by network congestion, SLMs are not as taxing on already scarce resources, making them a very viable option when compared with LLMs.
Prioritising SLMs over LLMs in developing countries
Given the distinctions outlined above, why should Caribbean and other developing countries consider prioritising SLMs over building, or solely relying on, LLMs? SLMs offer a more pragmatic alternative, with the potential to deliver more immediate, local, and impactful solutions:
- SLMs can address local language needs. The linguistic diversity within the Caribbean region, with its various creole languages and dialects, often leaves these languages underrepresented in the massive datasets used to train LLMs. SLMs can be trained on smaller, carefully curated datasets of local languages, leading to more accurate and culturally relevant applications in areas such as education, agriculture, healthcare, customer service, as well as wider public education and awareness campaigns.
- SLMs can bridge the digital divide. The high cost of infrastructure and access to reliable internet can be significant barriers in developing nations. SLMs, with their lower computational demands, can be deployed on existing infrastructure and even offline, thus able to reach underserved communities and bridge the digital divide.
- SLMs can foster local innovation. Focusing on SLMs can empower local researchers, developers, and entrepreneurs to build AI solutions tailored to their specific needs and challenges. This fosters local expertise and reduces reliance on expensive, often foreign-centric LLM-based solutions.
- SLMs can offer cost-effective solutions for key sectors. SLMs can be effectively applied in crucial sectors like agriculture (to provide localised weather information and farming advice), healthcare (to offer basic diagnostic support and health information in local languages), and tourism (to facilitate communication and provide culturally relevant information to visitors), all at a fraction of the cost of deploying LLM-based systems.
- SLMs can address data sovereignty and security. Noting the growing focus on data sovereignty and security in the Caribbean region, training and hosting SLMs locally can give governments and organisations greater control over their data, which can be particularly important for sensitive information within national contexts.
Although this article highlights the benefits of SLMs, that does not mean LLMs have no role to play in developing nations. LLMs remain valuable tools for accessing and processing global knowledge, as many of us already experience and enjoy. However, for building powerful and sustainable AI solutions that directly address local needs, leverage local contexts, and can operate successfully in resource-constrained environments, SLMs offer a more accessible, affordable and, ultimately, more empowering path forward, allowing Caribbean countries and the wider developing world to unlock the transformative potential of natural language processing on their own terms.
Image credit: Pixabay
Michelle, many thanks for positioning SLMs as a good strategic fit for SIDS. I can affirm that, for all the reasons you laid out, SLMs are within reach for SIDS: they are affordable, adaptable, and locally deployable. We should continue to promote this model of AI not only as a practical path to digital empowerment without dependency on global cloud infrastructure (low latency), but also for its versatility in addressing specific tasks – its utilisation in specific use cases and languages makes it a really good choice for implementation as a public good with assured data sovereignty.