Ep. 244 - Part 2 - June 10, 2024

TechcraftingAI Computer Vision · 2024-06-11
42:07

ArXiv Computer Vision research for Monday, June 10, 2024.

00:20: DualAD: Disentangling the Dynamic and Static World for End-to-End Driving

01:41: NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks

03:22: Vehicle Vectors and Traffic Patterns from Planet Imagery

04:15: A Guide to Stochastic Optimisation for Large-Scale Inverse Problems

05:37: Cascading Unknown Detection with Known Classification for Open Set Recognition

06:42: Latent Directions: A Simple Pathway to Bias Mitigation in Generative AI

07:57: MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

09:32: UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving

10:15: Improving Deep Learning-based Automatic Cranial Defect Reconstruction by Heavy Data Augmentation: From Image Registration to Latent Diffusion Models

11:47: Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

13:12: Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

15:01: FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography

16:18: STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

17:53: Hybrid Video Anomaly Detection for Anomalous Scenarios in Autonomous Driving

18:35: Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

20:24: SYM3D: Learning Symmetric Triplanes for Better 3D-Awareness of GANs

21:48: Spatiotemporal Graph Neural Network Modelling Perfusion MRI

22:57: VCR: Visual Caption Restoration

24:37: AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

26:29: NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

28:09: Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

30:12: Merlin: A Vision Language Foundation Model for 3D Computed Tomography

32:58: Genomics-guided Representation Learning for Pathologic Pan-cancer Tumor Microenvironment Subtype Prediction

34:26: PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction

36:04: NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

37:28: Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

39:08: GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

40:52: IllumiNeRF: 3D Relighting without Inverse Rendering

TechcraftingAI Computer Vision

TechcraftingAI Computer Vision brings you summaries of the latest arXiv research daily. Research is read by your virtual host, Sage. The podcast is produced by Brad Edwards, an AI Engineer from Vancouver, BC, and a graduate student of computer science studying AI at the University of York. Thank you to arXiv for use of its open access interoperability.

  • No. of episodes: 315
  • Latest episode: 2024-06-15
  • Technology
