Small Language Models are Going to Eat the World.

January 15, 2024

Today, Large Language Models (LLMs) typically run on remote servers and require internet access. As prompt-based applications become ubiquitous, we are likely to see a gradual shift from internet-hosted models to locally hosted ones.

Local models are nothing new. Google often prompts its users to download local models for Google Maps, Google Translate, and text-to-speech. Models are run locally for four primary reasons:

speed
reliability
privacy
cost

Benefits

Speed

Local models have no network latency. Inference runs on the device itself, so requests and data never make a round trip to a remote server, which shortens response times.

Reliability

Local models are self-reliant. They don't require additional computers to operate and don't depend on third-party service providers. They run stand-alone and won't break if internet connectivity is lost.
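
As a minimal sketch of that last point, consider an app that can reach both a hosted model and an on-device one. The names `remote_generate` and `local_generate` are hypothetical stand-ins rather than any real SDK, and the remote call below only simulates a lost connection.

```python
from typing import Callable


def remote_generate(prompt: str) -> str:
    """Hypothetical hosted-model call; here it simulates a lost connection."""
    raise ConnectionError("network unreachable")


def local_generate(prompt: str) -> str:
    """Hypothetical on-device model; a stub that returns a canned reply."""
    return f"[local] {prompt}"


def generate(
    prompt: str,
    remote: Callable[[str], str] = remote_generate,
    local: Callable[[str], str] = local_generate,
) -> str:
    """Try the remote model first and fall back to the local one on network errors."""
    try:
        return remote(prompt)
    except (ConnectionError, TimeoutError):
        # The local model has no network dependency, so it still answers offline.
        return local(prompt)


if __name__ == "__main__":
    print(generate("Summarize my notes"))
```

An app that ships only the local model skips the `try` branch entirely, which is the stand-alone case described above.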

Privacy

Private information is processed locally and never shared with another provider. Prompts passed into these models may contain private or confidential information that should not be handed to an external processor.

Cost

Local models require no hosting. A model may run frequently, and the cost of processing data regularly at scale on hosted infrastructure may become unaffordable. That work may be better absorbed by the local device.
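
To make the scale argument concrete, here is a rough back-of-the-envelope sketch. It assumes a hosted model billed per 1,000 tokens, and every number in it is a placeholder chosen for illustration, not a real provider's price.

```python
def monthly_hosted_cost(
    requests_per_day: int,
    tokens_per_request: int,
    price_per_1k_tokens: float,
    days: int = 30,
) -> float:
    """Estimate a monthly bill for a hosted model billed per 1,000 tokens."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Placeholder figures: 10,000 requests a day, 500 tokens each,
    # at an assumed price of 0.002 per 1,000 tokens.
    cost = monthly_hosted_cost(
        requests_per_day=10_000,
        tokens_per_request=500,
        price_per_1k_tokens=0.002,
    )
    print(f"Estimated hosted cost per month: {cost:.2f}")
```

The bill grows linearly with request volume. The same work on a user's device adds no per-request hosting charge, though it does use that device's compute and battery.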

How can we make local models a reality?

Python is the language of choice for running LLMs. However, embedded devices, mobile apps, and web servers often rely on other languages to run efficiently.

To bridge the gap in SDKs for accessing large language models across platforms, engineers should consider developing and integrating multi-language libraries and frameworks that work in mobile, embedded, and diverse server environments. Innovation and flexibility are critical here, because large language models represent a new technological frontier rather than a mere enhancement of existing tools.