4M: Massively Multimodal Masked Modeling
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
Code to accompany "A Method for Animating Children's Drawings"
TGMC: TerraGov Marine Corps, an SS13 mod
No-code multi-agent framework to build LLM Agents, workflows
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
High-Fidelity and Controllable Generation of Textured 3D Assets
Unifying 3D Mesh Generation with Language Models
A personal context-agent that learns how you work
Tools for merging pretrained large language models
Controllable and fast Text-to-Speech for over 7000 languages
Unified Multimodal Understanding and Generation Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video
Educational framework exploring multi-agent orchestration
A lightweight vision library for performing large object detection
Python framework for adversarial attacks and data augmentation
Chemcrow
This repo contains the code for a 1D tokenizer and generator
Flexible Photo Recrafting While Preserving Your Identity
A SOTA open-source image editing model
Multi-Agent daTa geneRation Infra and eXperimentation framework
Library providing end-to-end GPU-accelerated recommender systems