Unifying 3D Mesh Generation with Language Models
A personal context-agent that learns how you work
Tools for merging pretrained large language models
Build and run agents you can see, understand and trust
Controllable and fast Text-to-Speech for over 7000 languages
Unified Multimodal Understanding and Generation Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video
Educational framework exploring multi-agent orchestration
A lightweight vision library for performing large object detection
This repo contains the code for a 1D tokenizer and generator
Flexible Photo Recrafting While Preserving Your Identity
A SOTA open-source image editing model
Multi-Agent daTa geneRation Infra and eXperimentation framework
Build cross-modal and multimodal applications on the cloud
Powering Amazon custom machine learning chips
Chemcrow
GUI Exploration Lab. One of the best GUI agent solutions
Large-language-model & vision-language-model based on Linear Attention
Chat & pretrained large audio language model proposed by Alibaba Cloud
Deep and online learning with spiking neural networks in Python
Did you say you like data?
Run LLMs locally on Cloud Workstations
An Efficient and Easy-to-use Federated Learning Framework