Towards Understanding Sycophancy in Language Models
Controlled Decoding from Language Models
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Detecting Pretraining Data from Large Language Models
ConvNets Match Vision Transformers at Scale
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Matryoshka Diffusion Models
Dissecting In-Context Learning of Translations in GPTs
Woodpecker: Hallucination Correction for Multimodal Large Language Models
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Localizing and Editing Knowledge in Text-to-Image Generative Models
H2O Open Ecosystem for State-of-the-art Large Language Models
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Teaching Language Models to Self-Improve through Interactive Demonstrations
Think before you speak: Training Language Models With Pause Tokens
Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
Efficient Streaming Language Models with Attention Sinks
Large Language Models Cannot Self-Correct Reasoning Yet
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Enable Language Models to Implicitly Learn Self-Improvement From Data
PixArt-alpha: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
FELM: Benchmarking Factuality Evaluation of Large Language Models
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Decaf: Monocular Deformation Capture for Face and Hand Interactions
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Effective Long-Context Scaling of Foundation Models
Demystifying CLIP Data
Vision Transformers Need Registers
Qwen Technical Report
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Finite Scalar Quantization: VQ-VAE Made Simple
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Aligning Large Multimodal Models with Factually Augmented RLHF
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
CoRF: Colorizing Radiance Fields using Knowledge Distillation
The Cambridge Law Corpus: A Corpus for Legal AI Research
CodePlan: Repository-level Coding using LLMs and Planning
DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion
Parallelizing non-linear sequential models over the sequence length
Fast Feedforward Networks
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Boolformer: Symbolic Regression of Logic Functions with Transformers
FreeU: Free Lunch in Diffusion U-Net
Neurons in Large Language Models: Dead, N-gram, Positional
DreamLLM: Synergistic Multimodal Comprehension and Creation
Kosmos-2.5: A Multimodal Literate Model
End-to-End Speech Recognition Contextualization with Large Language Models
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Graph Neural Networks Use Graphs When They Shouldn't
Large Language Models for Compiler Optimization
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
Baichuan 2: Open Large-scale Language Models
Language Modeling Is Compression
FoleyGen: Visually-Guided Audio Generation
Textbooks Are All You Need II: phi-1.5 technical report
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Large-Scale Automatic Audiobook Creation
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
High-Quality Entity Segmentation
Large Language Models as Optimizers
FLM-101B: An Open LLM and How to Train It with $100K Budget
XGen-7B Technical Report
Tracking Anything with Decoupled Video Segmentation
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
SLiMe: Segment Like Me
Matcha-TTS: A fast TTS architecture with conditional flow matching
Physically Grounded Vision-Language Models for Robotic Manipulation
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
One Wide Feedforward is All You Need
Efficient RLHF: Reducing the Memory Usage of PPO
PromptTTS 2: Describing and Generating Voices with Text Prompt
AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections