• Ep. 246 - Part 1 - June 12, 2024

  • 2024/06/13
  • Length: 46 min
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Wednesday, June 12, 2024.


    00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image

    01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification

    04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts

    05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search

    07:00: Small Scale Data-Free Knowledge Distillation

    08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution

    10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

    12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

    14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

    14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

    16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

    18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

    20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding

    21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

    23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation

    24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

    25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

    26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

    28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

    29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review

    31:49: LVBench: An Extreme Long Video Understanding Benchmark

    33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

    34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR

    36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement

    37:29: MWIRSTD: A MWIR Small Target Detection Dataset

    38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

    44:26: Identification of Conversation Partners from Egocentric Video
