Together AI – The AI Acceleration Cloud - Fast Inference, Fine-Tuning & Training
Excerpt
Run and fine-tune generative AI models with easy-to-use APIs and highly scalable infrastructure. Train & deploy models at scale on our AI Acceleration Cloud and scalable GPU clusters. Optimize performance and cost.
Content
The AI Acceleration Cloud
Train, fine-tune, and run inference on AI models blazing fast, at low cost, and at production scale.
End-to-end platform for the full generative AI lifecycle
Leverage pre-trained models, fine-tune them for your needs, or build custom models from scratch. Whatever your generative AI needs, Together AI offers a seamless continuum of AI compute solutions to support your entire journey.
Inference
The fastest way to launch AI models (see the example call after this list):
✔ Serverless or dedicated endpoints
✔ Deploy in enterprise VPC
✔ SOC 2 and HIPAA compliant
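For the serverless option, a request is a single HTTP call to Together's OpenAI-compatible API. The sketch below is illustrative only: the model name, API key variable, and prompt are placeholders you would swap for your own.

# Minimal serverless inference call (illustrative; model name and key are placeholders)
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3-8b-chat-hf",
    "messages": [{"role": "user", "content": "Summarize LoRA fine-tuning in one sentence."}]
  }'

Dedicated endpoints are typically called the same way, with the deployed model's name in place of the serverless one.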
Fine-Tuning
Tailored customization for your tasks (see the sketch after this list):
✔ Complete model ownership
✔ Fully tune or adapt models
✔ Easy-to-use APIs
Full Fine-Tuning
LoRA Fine-Tuning
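As a rough sketch of how the two modes differ at the command line, building on the `together finetune create` example shown further down this page: full fine-tuning updates every weight of the base model, while LoRA trains small low-rank adapters on top of a frozen base. The `--lora` flag below is an assumption about the CLI and may be named differently in your version; confirm the exact flag in your CLI's help output.

# Full fine-tuning: every weight of the base model is updated
together finetune create --training-file $FILE_ID --model $MODEL_NAME

# LoRA fine-tuning: only low-rank adapter weights are trained
# NOTE: the --lora flag is an assumption; check your CLI's help output for the exact name
together finetune create --training-file $FILE_ID --model $MODEL_NAME --lora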
GPU Clusters
Full control for massive AI workloads
✔ Accelerate large model training
✔ GB200, H200, and H100 GPUs
✔ Pricing from $1.75 / hour
[Benchmark highlights: inference speed relative to vLLM (Llama-3 8B at full precision) and cost relative to GPT-4o]
Control your IP.
Own your AI.
Fine-tune open-source models like Llama on your data and run them on Together Cloud or in a hyperscaler VPC. With no vendor lock-in, your AI remains fully under your control.
together files upload acme_corp_customer_support.jsonl
{
  "filename": "acme_corp_customer_support.jsonl",
  "id": "file-aab9997e-bca8-4b7e-a720-e820e682a10a",
  "object": "file"
}
together finetune create --training-file file-aab9997e-bca8-4b7e-a720-e820e682a10a \
  --model togethercomputer/RedPajama-INCITE-7B-Chat
together finetune create --training-file $FILE_ID \
  --model $MODEL_NAME \
  --wandb-api-key $WANDB_API_KEY \
  --n-epochs 10 \
  --n-checkpoints 5 \
  --batch-size 8 \
  --learning-rate 0.0003
{
  "training_file": "file-aab9997e-bca8-4b7e-a720-e820e682a10a",
  "model_output_name": "username/togethercomputer/llama-2-13b-chat",
  "model_output_path": "s3://together/finetune/63e2b89da6382c4d75d5ef22/username/togethercomputer/llama-2-13b-chat",
  "suffix": "Llama-2-13b 1",
  "model": "togethercomputer/llama-2-13b-chat",
  "n_epochs": 4,
  "batch_size": 128,
  "learning_rate": 1e-06,
  "checkpoint_steps": 2,
  "created_at": 1687982945,
  "updated_at": 1687982945,
  "status": "pending",
  "id": "ft-5bf8990b-841d-4d63-a8a3-5248d73e045f",
  "epochs_completed": 3,
  "events": [
    {
      "object": "fine-tune-event",
      "created_at": 1687982945,
      "message": "Fine tune request created",
      "type": "JOB_PENDING"
    }
  ],
  "queue_depth": 0,
  "wandb_project_name": "Llama-2-13b Fine-tuned 1"
}
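After submission the job starts in the pending state shown above; its progress can be polled using the job ID from the "id" field in the response. A minimal sketch, assuming the job can be retrieved at an OpenAI-style /v1/fine-tunes/{id} route:

# Poll the fine-tune job until "status" moves from "pending" to "completed"
# (the /v1/fine-tunes/{id} path is an assumption modeled on the API's OpenAI-style routes)
curl https://api.together.xyz/v1/fine-tunes/ft-5bf8990b-841d-4d63-a8a3-5248d73e045f \
  -H "Authorization: Bearer $TOGETHER_API_KEY"

Once the job completes, the fine-tuned model named in model_output_name can be deployed and queried like any other model on the platform.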

Pika creates next-gen text-to-video models on Together GPU Clusters

Nexusflow uses Together GPU Clusters to build cybersecurity models


Arcee builds domain-adaptive language models with Together Custom Models
