Ep. 245 - Part 1 - June 11, 2024

TechcraftingAI Computer Vision · 2024-06-13
37:02

ArXiv Computer Vision research for Tuesday, June 11, 2024.

00:20: Explaining Representation Learning with Perceptual Components

01:28: Optimal Matrix-Mimetic Tensor Algebras via Variable Projection

03:03: Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis

04:24: Neural Visibility Field for Uncertainty-Driven Active Mapping

05:21: Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

06:55: Stepwise Regression and Pre-trained Edge for Robust Stereo Matching

08:38: Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey

10:08: Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples

11:10: Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

12:34: RWKV-CLIP: A Robust Vision-Language Representation Learner

14:01: Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

15:03: Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

16:40: MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

18:34: Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models

19:38: LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection

21:04: RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks

22:49: PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving

24:15: EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

26:25: 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

27:16: DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification

29:09: Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

31:08: Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

32:23: CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation

33:54: RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

35:17: AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

TechcraftingAI Computer Vision

TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.

  • No. of episodes: 315
  • Latest episode: 2024-06-15
  • Category: Technology
