AI tool converting video/audio into structured documents instantly
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Library for OCR-related tasks powered by Deep Learning
A security scanner for custom LLM applications
Machine Learning Engineering Open Book
Controllable and fast Text-to-Speech for over 7000 languages
An AI-powered security review GitHub Action using Claude
Renderer for the harmony response format to be used with gpt-oss
Lightweight framework for evaluating large language model performance
Data Lake for Deep Learning. Build, manage, and query datasets
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
State-of-the-art (SoTA) text-to-video pre-trained model
Open Agent Harness with a built-in personal agent, Ohmo
This repository contains code released by Google Research
Block Diffusion for Ultra-Fast Speculative Decoding
A long-running autonomous coding agent powered by the Claude Agent
Self-healing browser harness that enables LLMs to complete any task
Chat & pretrained large audio language model proposed by Alibaba Cloud
Retrieval Augmented Generation (RAG) framework
When LLM Meets Domain Experts
Towards Human-Sounding Speech
Converts text to speech in real time
Run a full local LLM stack with one command using Docker
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
The fastest way to bring multi-agent workflows to production