
Get Started

Large language models on Node.js.

This project is at an early stage; the Node.js API may change in the future, so use it with caution.


Prerequisites

  • Node.js version 16 or above

  • (Optional) TypeScript: if you want statically typed interfaces

  • (Optional) Python 3: if you need to convert a .pth model to GGML format

  • (Optional) Rust/C++ toolchains: if you need to compile from source

    • Rust for building the Rust Node API

    • CMake for building the llama.cpp project

    • a C++ compiler (Clang, GCC, or MSVC) for compiling the native C/C++ bindings


Compatibility

Currently supported models (all of which must be converted to GGML format):

Supported platforms:

  • darwin-x64
  • darwin-arm64
  • linux-x64-gnu (glibc >= 2.31)
  • linux-x64-musl
  • win32-x64-msvc

Node.js version: >= 16


Installation

  • Install the llama-node npm package

    npm install llama-node

  • Install at least one of the inference backends (each backend maps to an adapter import, as sketched after this list):

    • llama.cpp

      npm install @llama-node/llama-cpp

    • or llm-rs

      npm install @llama-node/core

    • or rwkv.cpp

      npm install @llama-node/rwkv-cpp
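
Each backend package pairs with an adapter class that you pass to the LLM constructor from llama-node. A minimal sketch of the corresponding imports follows; the llama.cpp path is taken from the first example later on this page, while the llm-rs and rwkv.cpp paths are assumptions following the same naming pattern, so verify them against your installed version.

// backends.mjs
import { LLM } from "llama-node";
// llama.cpp backend (@llama-node/llama-cpp) -- path confirmed by the example below
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
// llm-rs backend (@llama-node/core) -- assumed path, check your installation
// import { LLMRS } from "llama-node/dist/llm/llm-rs.js";
// rwkv.cpp backend (@llama-node/rwkv-cpp) -- assumed path, check your installation
// import { Rwkv } from "llama-node/dist/llm/rwkv-cpp.js";

// Pass the adapter class (not an instance) to the LLM wrapper
const llama = new LLM(LLamaCpp);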

Getting Model

  • For LLaMA and its derived models:

    llama-node uses llm-rs/llama.cpp under the hood and works with the model formats (GGML/GGMF/GGJT) derived from llama.cpp. Because the Meta-released model is licensed for research purposes only, this project does not provide model downloads. If you have obtained the original .pth model, please read the document and use the conversion tool provided by llama.cpp.

  • For RWKV models:

    RWKV is an open-source model developed by PENG Bo; all of its weights and training code are open source. Our RWKV backend uses the rwkv.cpp native bindings, which also use the GGML tensor format. You can download a GGML-quantized model from here or convert one yourself by following the document.


First example

This first example uses llama.cpp as the inference backend, so make sure you have installed the @llama-node/llama-cpp package.

// index.mjs
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Path to a GGML-format model file (adjust to wherever your model lives)
const model = path.resolve(process.cwd(), "../ggml-vic7b-q5_1.bin");
const llama = new LLM(LLamaCpp);

const config = {
    modelPath: model,
    enableLogging: true,
    nCtx: 1024,        // context window size in tokens
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0,     // number of layers to offload to the GPU
};

const template = `How are you?`;
const prompt = `A chat between a user and an assistant.
USER: ${template}
ASSISTANT:`;

const run = async () => {
    await llama.load(config);

    await llama.createCompletion(
        {
            nThreads: 4,
            nTokPredict: 2048,  // maximum number of tokens to generate
            topK: 40,
            topP: 0.1,
            temp: 0.2,
            repeatPenalty: 1,
            prompt,
        },
        (response) => {
            // Called once per generated token; stream it to stdout
            process.stdout.write(response.token);
        }
    );
};

run();

To run this example:

node index.mjs
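
If you would rather collect the whole completion into a single string instead of streaming it to stdout, you can accumulate the tokens inside the callback. The sketch below reuses the llama instance, config, and prompt from the first example and relies only on the API shown there; the response object is assumed to carry each generated token in response.token, as above.

// collect.mjs -- sketch: accumulate streamed tokens into one string
const collect = async () => {
    await llama.load(config);

    let output = "";
    await llama.createCompletion(
        {
            nThreads: 4,
            nTokPredict: 256,
            topK: 40,
            topP: 0.1,
            temp: 0.2,
            repeatPenalty: 1,
            prompt,
        },
        (response) => {
            output += response.token; // each callback delivers one decoded token
        }
    );

    console.log(output);
};

collect();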

More examples

Visit our example folder here

Acknowledgments

This library is published under the MIT/Apache-2.0 license. However, we strongly recommend that you cite our work and the work of our dependencies if you wish to reuse code from this library.

Model and inference tool dependencies:

Some source code comes from: