Ep. 246 - Part 1 - June 12, 2024
2024/06/13
Duration: 46 min
Podcast

Summary
ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image

01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification

04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts

05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search

07:00: Small Scale Data-Free Knowledge Distillation

08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution

10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding

21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation

24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review

31:49: LVBench: An Extreme Long Video Understanding Benchmark

33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR

36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement

37:29: MWIRSTD: A MWIR Small Target Detection Dataset

38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

44:26: Identification of Conversation Partners from Egocentric Video
