
Ep. 245 - Part 1 - June 11, 2024
arXiv Computer Vision research for Tuesday, June 11, 2024.
00:20: Explaining Representation Learning with Perceptual Components
01:28: Optimal Matrix-Mimetic Tensor Algebras via Variable Projection
03:03: Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis
04:24: Neural Visibility Field for Uncertainty-Driven Active Mapping
05:21: Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection
06:55: Stepwise Regression and Pre-trained Edge for Robust Stereo Matching
08:38: Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey
10:08: Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples
11:10: Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion
12:34: RWKV-CLIP: A Robust Vision-Language Representation Learner
14:01: Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
15:03: Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection
16:40: MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results
18:34: Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
19:38: LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection
21:04: RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks
22:49: PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving
24:15: EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network
26:25: 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
27:16: DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification
29:09: Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments
31:08: Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology
32:23: CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation
33:54: RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents
35:17: AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding
TechcraftingAI Computer Vision
TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.
- No. of episodes: 315
- Latest episode: 2024-06-15
- Category: Technology