Ep. 245 - Part 1 - June 11, 2024 – TechcraftingAI Computer Vision

ArXiv Computer Vision research for Tuesday, June 11, 2024.

00:20: Explaining Representation Learning with Perceptual Components

01:28: Optimal Matrix-Mimetic Tensor Algebras via Variable Projection

03:03: Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis

04:24: Neural Visibility Field for Uncertainty-Driven Active Mapping

05:21: Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

06:55: Stepwise Regression and Pre-trained Edge for Robust Stereo Matching

08:38: Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey

10:08: Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples

11:10: Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

12:34: RWKV-CLIP: A Robust Vision-Language Representation Learner

14:01: Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

15:03: Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

16:40: MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

18:34: Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models

19:38: LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection

21:04: RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks

22:49: PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving

24:15: EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

26:25: 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

27:16: DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification

29:09: Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

31:08: Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

32:23: CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation

33:54: RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

35:17: AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding
