Nvidia’s first cloud service makes AI less vague


Nvidia is trying to simplify AI with a cloud service that makes AI and its many forms of computing less vague and more conversational.

The NeMo LLM service, which Nvidia called its first cloud service, adds a layer of intelligence and interactivity allowing users to seamlessly interact with complex AI models in fields such as biotechnology and medicine.

Some AI models that have been developed or are being researched can be complicated and need to be turned into useful enterprise applications that can adapt to real-world business environments, said Ian Buck, general manager and vice president of accelerated computing at Nvidia.

“We need to adapt these big language models to answer questions in certain ways, to give them context and the domain problem to be solved,” Buck said during a press briefing ahead of the company’s fall GPU Technology Conference, which is being held virtually this week.

Large language models are considered a core technology for simplifying user interaction with AI. The recent DALL-E 2, which has 3.5 billion parameters, can generate images from a natural-language description of a few words, much as one would describe a piece of art.

Main image by Mahendra awale, licensed under CC BY-SA 3.0 via Wikimedia Commons

The NeMo LLM service will make large language models easier to access, so that companies can experiment with these models and deploy them for their specific use cases.

While DALL-E 2 is a simple example of a generic use of a large language model, Nvidia is tuning the NeMo LLM service to add a conversational element to specialized areas such as finance, technology or medicine.

“This service will help bring large language models to all sorts of different use cases – for generating benefit summaries, for product reviews, for creating technical Q&As, for medical use cases,” Buck said.

The cloud service takes existing pre-trained models such as NeMo Megatron (530 billion parameters), GPT-3 (175 billion parameters) or T5 (11 billion parameters) and builds a domain-specific framework around them. The service will help models answer questions in the language best suited to a specific field.

“You don’t have to train the big language model from scratch. We’ve done it before and made it easy for you,” Buck said.

The service is easy to use and doesn’t require a lot of coding. A developer supplies domain-specific prompts, sample questions and examples of the answers they want, whether as text or as summaries. The servers then train the model to answer questions in that particular way. The result is a cloud-based API that lets users interact with the service directly or use it in applications.

Nvidia is also launching the NeMo LLM cloud service with BioNeMo, which gives researchers access to pre-trained chemistry and biology language models. These services will help researchers interact with and manipulate proteins and data for applications such as drug discovery.

“Fortunately, chemistry and biology have their own language – SMILES strings for chemistry, amino acids for proteins, and nucleic acids for DNA and RNA,” said Kimberly Powell, vice president and general manager of healthcare at Nvidia.
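
To illustrate what treating proteins as a language can mean in practice, here is a minimal sketch that turns an amino acid sequence into integer tokens, the way text is tokenized before it reaches a language model. The vocabulary below is simply the 20 standard one-letter amino acid codes; the names `AMINO_ACIDS`, `TOKEN_IDS` and `tokenize_protein` are hypothetical and are not part of BioNeMo, whose models use their own vocabularies and special tokens.

```python
# Illustrative only: a toy vocabulary of the 20 standard amino acids,
# identified by their one-letter codes. Not the BioNeMo tokenizer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_IDS = {residue: index for index, residue in enumerate(AMINO_ACIDS)}


def tokenize_protein(sequence):
    """Map each residue letter in a protein sequence to an integer token id."""
    tokens = []
    for residue in sequence.upper():
        if residue not in TOKEN_IDS:
            # Ambiguous or non-standard letters are rejected in this sketch.
            raise ValueError(f"unknown residue: {residue!r}")
        tokens.append(TOKEN_IDS[residue])
    return tokens


# A short made-up peptide, used only to show the call.
example_tokens = tokenize_protein("MKTAY")
```

Once a sequence is a list of token ids, it can be fed to a sequence model in the same way a sentence would be.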

The first of two BioNeMo protein models, ESM-1, captures or encodes important biological features from large protein databases. The model was originally developed by Meta (the parent company of Facebook), was retrained by Nvidia and is now offered as a service. This model is designed for downstream use in research or enterprise communities.

“Users of the service can enter an amino acid sequence and the model will derive thousands of representations per second. This can be used to train a task-specific model, like predicting the stability or solubility of a protein,” Powell said.

The second BioNeMo service – ESM2 – serves a model developed by the OpenFold Consortium, which predicts 3D protein structure from an amino acid sequence in just a few minutes.

“Otherwise, you have to use experiments to determine 3D structures. And they are very difficult, expensive and can take years,” Powell said.

The OpenFold Consortium, which includes academics, startups, and biotech and pharmaceutical companies, developed the open-source protein language model. Nvidia will serve the model, but will also continue to iterate on and co-develop the models with the consortium. Unlike with ESM-1, Nvidia did not retrain the model.

Users will have early access to BioNeMo next month.

The NeMo LLM cloud service will be deployed in data centers that Nvidia classifies as “AI factories”. Customers can send raw data to the factory and get back a polished end product ready for deployment.

The NeMo LLM is the latest addition to a stable of software machines deployed in Nvidia’s AI factory. Other software products from Nvidia’s AI factory include RIVA, which is a voice AI, and Merlin, which is a recommendation system.

The NeMo LLM will take advantage of the new H100 GPUs based on the latest Hopper architecture, which Nvidia says is now in full production (although full SXM capability awaits the availability of Intel’s Sapphire Rapids processors). Nvidia said eight H100 GPUs can match the output of 64 previous generation A100 GPUs.

Large language models like NeMo in Nvidia’s cloud service are based on the Transformer architecture, which helps AI understand which parts of a sentence, image, or set of disparate data points are related to each other. This is different from convolutional neural networks, which only look at relationships within an immediate neighborhood.
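
The contrast can be sketched in a few lines of plain Python. This is a simplified illustration, not Nvidia's implementation: it leaves out the learned query, key and value projections a real Transformer uses, and the function names and toy data are assumptions made for the example. `self_attention` lets every position weigh every other position by similarity, while `local_average` stands in for a convolution-like operation that only sees nearby neighbors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Each position mixes in every position, weighted by similarity."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


def local_average(vectors, radius=1):
    """Convolution-like baseline: each position only sees nearby neighbors."""
    dim = len(vectors[0])
    outputs = []
    for pos in range(len(vectors)):
        window = vectors[max(0, pos - radius): pos + radius + 1]
        outputs.append([sum(v[i] for v in window) / len(window)
                        for i in range(dim)])
    return outputs


# Toy sequence: the first and last items are alike, the middle ones differ.
sequence = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
attended, weights = self_attention(sequence)
local = local_average(sequence)
```

In this toy sequence the first item gives the distant last item more weight than its immediate neighbor, whereas the local window never sees the last item at all.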

“Transformers can master the most distant relationships and this is important for a whole class of problems. Natural language processing is important because to understand the meaning of a word, you have to look at the whole sentence, and even a paragraph, and the same goes for a number of other areas,” Paresh Kharya, senior director of product management and marketing at Nvidia, told HPCwire.

Transformers allowed Nvidia to capture more distant relationships in language and also to train on unlabeled datasets.

“It has dramatically increased the volume of data. In the case of NLP, it is all the data on the Internet. In the case of genomics and protein sequencing, known structures, behaviors and patterns are the data set we have,” Kharya said.

The Hopper architecture has a Transformer Engine that operates at FP8 precision. Together with the software, Hopper can dynamically adjust to the precision required by the different layers of a model, speeding up training without affecting accuracy.

The new pre-trained models offered by NeMo LLM take advantage of an emerging method called “prompt learning”.

Prompt learning takes a large, already pre-trained language model and adds a few examples of the type of task and the kinds of responses expected when the model is faced with a certain type of question. At the end of the training cycle, the pre-trained main model is unchanged, but a prompt token is emitted, which provides context.

“The next time you ask a question of a similar type, you supply that question with that prompt token. And this token gives the model the context it needs to answer that question more accurately,” Kharya said.
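
The idea can be sketched with a toy model. The code below is an illustration under stated assumptions, not the NeMo LLM service or its API: the "pre-trained model" is a tiny frozen scoring function, questions are stood in for by two made-up numeric features, and the names `predict` and `tune_prompt` are hypothetical. Only the prompt vector is learned, and it is then supplied alongside later inputs to give the frozen model its context.

```python
import math

# Frozen "pre-trained" weights (made-up values). Nothing below updates them.
FROZEN_PROMPT_WEIGHTS = (0.8, -0.5)
FROZEN_INPUT_WEIGHTS = (1.2, -0.7)


def predict(prompt, features):
    """Frozen model: the prompt vector is fed in alongside the input features."""
    score = sum(w * p for w, p in zip(FROZEN_PROMPT_WEIGHTS, prompt))
    score += sum(w * x for w, x in zip(FROZEN_INPUT_WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-score))


def tune_prompt(examples, steps=500, learning_rate=0.5):
    """Learn only the prompt vector from (features, label) examples."""
    prompt = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for features, label in examples:
            # Gradient of the log loss with respect to the model's raw score.
            error = predict(prompt, features) - label
            for i, weight in enumerate(FROZEN_PROMPT_WEIGHTS):
                grad[i] += error * weight
        prompt = [p - learning_rate * g / len(examples)
                  for p, g in zip(prompt, grad)]
    return prompt


# A handful of domain examples: in this made-up domain, only inputs with a
# large first feature should get a "yes" (1); the rest should get "no" (0).
examples = [
    ((0.5, 0.0), 0),
    ((1.0, 0.0), 0),
    ((1.5, 0.0), 0),
    ((2.5, 0.0), 1),
    ((3.0, 0.0), 1),
]

prompt_token = tune_prompt(examples)

# Later queries reuse the learned prompt; the frozen weights are untouched.
answer_without_prompt = predict([0.0, 0.0], (1.0, 0.0))
answer_with_prompt = predict(prompt_token, (1.0, 0.0))
```

Without the learned prompt, the toy model leans toward "yes" for the input `(1.0, 0.0)`; with the prompt supplied, it leans toward "no", which matches the domain examples, even though the model weights never changed.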

The process is called P-tuning, which takes advantage of the Hopper GPU’s new Transformer Engine. The P-tuning process can speed up the deployment of LLMs by up to five times compared to previous-generation A100 GPUs, Kharya said.

Models can be trained or tuned on multiple GPU types in addition to Hopper, but performance improves with the higher bandwidth and connectivity of Hopper’s HBM3 memory and NVLink interconnect.

Nvidia said access to the NeMo LLM service will be direct-to-enterprise starting next month and will not be publicly available.
