Graduation Project Research (2)

1. GPU specifications and inference time of multimodal (image + point cloud) object detection models<div class="table-wrap"><table data-ke-type="table" data-ke-align="alignLeft" style="width: 100%; height: 126px;" border="1"><tbody><tr style="height: 18px;"><td style="width: 25.7671%; height: 18px; text-align: center;">Multimodal model</td><td style="width: 40.8995%; height: 18px; text-align: center;">GPU</td><td style="width: 33.3333%; height: 18px; text-align: center;">Inference speed</td></tr>
<tr style="height: 18px;"><td style="width: 25.7671%; height: 18px;">F-PointNet (CVPR 2018)</td><td style="width: 40.8995%; height: 18px;">Evaluation GPU: GTX 1080</td><td style="width: 33.3333%; height: 18px;">~5 FPS (200 ms)</td></tr>
<tr style="height: 18px;"><td style="width: 25.7671%; height: 18px;">PointPainting (CVPR 2020)</td><td style="width: 40.8995%; height: 18px;">Not reported</td><td style="width: 33.3333%; height: 18px;">+0.75 ms added latency</td></tr>
<tr style="height: 18px;"><td style="width: 25.7671%; height: 18px;">EPNet (ECCV 2020)</td><td style="width: 40.8995%; height: 18px;">Training GPU: TITAN Xp ×4</td><td style="width: 33.3333%; height: 18px;">Not reported</td></tr>
<tr style="height: 18px;"><td style="width: 25.7671%; height: 18px;">TransFusion (CVPR 2022)</td><td style="width: 40.8995%; height: 18px;">Training GPU: V100 ×8; Inference GPU: Titan V100 + i7 CPU</td><td style="width: 33.3333%; height: 18px;">TransFusion-L: 114.9 ms → 8.7 FPS; TransFusion: 265.9 ms → 3.8 FPS</td></tr>
<tr style="height: 18px;"><td style="width: 25.7671%; height: 18px;">BEVFusion (2022)</td><td style="width: 40.8995%; height: 18px;">Inference GPU: RTX 3090</td><td style="width: 33.3333%; height: 18px;">119.2 ms → 8.4 FPS</td></tr></tbody></table></div>
Reportedly, these models can be run with a batch size of 1 to 2.
- TransFusion needs about 12 to 16 GB of GPU memory.
- BEVFusion needs about 24 GB of GPU memory, so BEVFusion is impractical in our environment.
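The FPS values in the table above are just the reciprocal of the per-frame latency. The following is a minimal sketch of that conversion; the function name `latency_ms_to_fps` and the dictionary are illustrative names chosen here, and the latencies are the ones listed in the table.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Return frames per second for a per-frame latency given in milliseconds."""
    return 1000.0 / latency_ms


# Latencies copied from the summary table (milliseconds per frame).
reported_latency_ms = {
    "F-PointNet": 200.0,
    "TransFusion-L": 114.9,
    "TransFusion": 265.9,
    "BEVFusion": 119.2,
}

for model, latency in reported_latency_ms.items():
    print(f"{model}: {latency} ms = {latency_ms_to_fps(latency):.1f} FPS")
```

Rounded to one decimal place, these reproduce the FPS column of the table. Note that the latencies were measured on different GPUs, so the FPS values are not directly comparable across models.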
&lt;Accuracy&gt;<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/0c643a6bd33a4090f10642960a4e6a62e7d40528_re_1764725502886" class="txc-image" width="600" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/0c643a6bd33a4090f10642960a4e6a62e7d40528_re_1764725502886" data-origin-width="1452" data-origin-height="576"></div>
1) F-PointNet (Frustum PointNet) - Frustum PointNets for 3D Object Detection from RGB-D Data
Paper: <a href="https://arxiv.org/abs/1711.08488" target="_top" class="ke-link">https://arxiv.org/abs/1711.08488</a> (CVPR 2018)
GPU: NVIDIA GTX 1080 and a single CPU core
Inference speed: v1: 1 / 0.088 s = 11.36 FPS, v2: 1 / 0.167 s = 5.99 FPS ≈ 6 FPS <div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/f6be819bd6669531ddc2b947ac4e4b0db6f0adb2_re_1764725502887" class="txc-image" width="400" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/f6be819bd6669531ddc2b947ac4e4b0db6f0adb2_re_1764725502887" data-origin-width="664" data-origin-height="204"></div>
Backbone => v1: PointNet-based, v2: PointNet++
2) PointPainting (CVPR 2020) - PointPainting: Sequential Fusion for 3D Object Detection
Paper: <a href="https://arxiv.org/abs/1911.10150" target="_top" class="ke-link">https://arxiv.org/abs/1911.10150</a>
- Image segmentation → the segmentation scores are "painted" onto the LiDAR points
- An easy way to improve existing LiDAR detectors such as PointPillars and PV-RCNN
GPU: not reported
Inference speed: the paper states that adding PointPainting to an existing model adds 0.75 ms of latency.<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/66ea5134f3c732125e03c24fa4c7469d3713dd89_re_1764725502887" class="txc-image" width="300" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/66ea5134f3c732125e03c24fa4c7469d3713dd89_re_1764725502887" data-origin-width="660" data-origin-height="134"></div>
&lt;Accuracy&gt;<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/76d70c70fea30e689f20ba2260f9a05694e4700e_re_1764725502887" class="txc-image" width="600" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/76d70c70fea30e689f20ba2260f9a05694e4700e_re_1764725502887" data-origin-width="1400" data-origin-height="490"></div>
3) EPNet (ECCV 2020) - EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
Paper: <a href="https://arxiv.org/abs/2007.08856" target="_top" class="ke-link">https://arxiv.org/abs/2007.08856</a>
- Image semantics → injected into the LiDAR point features through attention
- Fusion is also applied at the RoI stage
- Higher accuracy than LiDAR-only
GPU: Titan XP GPU (training)<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/50c255b7613afa8f13191b23a4d4d5216983c38d_re_1764725502886" class="txc-image" width="600" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/50c255b7613afa8f13191b23a4d4d5216983c38d_re_1764725502886" data-origin-width="940" data-origin-height="68"></div>
Inference time: not reported
4) TransFusion (CVPR 2022) - TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
Paper: <a href="https://arxiv.org/abs/2203.11496" target="_top" class="ke-link">https://arxiv.org/abs/2203.11496</a>
- LiDAR BEV + Camera BEV → Transformer cross-attention
- State of the art at the time on nuScenes and other benchmarks
- A strong multi-sensor fusion baseline
GPU: Tesla V100 GPU, 8 GPUs<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/67271d25da44e2b60e292dcc757596630ca6c4f8_re_1764725502886" class="txc-image" width="400" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/67271d25da44e2b60e292dcc757596630ca6c4f8_re_1764725502886" data-origin-width="672" data-origin-height="44"></div>
Inference speed: 114.9 to 265.9 ms (8.7 FPS down to 3.76 FPS) <div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/2997ed0035cad36d6fe51c593e6c093e2f8605e2_re_1764725502886" class="txc-image" width="500" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/2997ed0035cad36d6fe51c593e6c093e2f8605e2_re_1764725502886" data-origin-width="668" data-origin-height="350"></div>
5) BEVFusion (2022) - BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Paper: <a href="https://arxiv.org/abs/2205.13542" target="_top" class="ke-link">https://arxiv.org/abs/2205.13542</a>
- Image → BEV conversion (View Transformer)
- Fused directly with the LiDAR BEV
- A structure reportedly used in real autonomous driving systems such as those of Tesla and XPeng => a core part of next-generation autonomous driving perception
GPU: RTX 3090 (inference)
Inference speed: 8.4 FPS<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/a9a22677a320cd8ae0aadf75de3346ecfd723af0_re_1764725502886" class="txc-image" width="400" height="62" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/a9a22677a320cd8ae0aadf75de3346ecfd723af0_re_1764725502886" data-origin-width="686" data-origin-height="106"></div><div class="figure-img" data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/074a44f82d894799a708461dc93004ed5da2a865_re_1764725502886" class="txc-image" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/074a44f82d894799a708461dc93004ed5da2a865_re_1764725502886" data-origin-width="1394" data-origin-height="572"></div>
2. Occupancy Network - Occupancy Networks: Learning 3D Reconstruction in Function Space
Paper: <a href="https://arxiv.org/abs/1812.03828" target="_blank" class="ke-link">https://arxiv.org/abs/1812.03828</a>
Occupancy Network => a function model that predicts, for every point in 3D space, whether that point lies inside the object.
Occupancy Networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier.
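As a minimal sketch of the "surface as a decision boundary" idea, the snippet below uses a hand-written soft sphere as a stand-in for the learned classifier. Everything here (`occupancy`, `is_inside`, `occupied_fraction`, the radius, the sharpness, the 0.5 threshold) is a hypothetical toy chosen for illustration, not the network from the paper. The point is only that the same function can be queried at any 3D point, so the grid resolution is chosen at query time.

```python
import math


def occupancy(p, radius=0.4, sharpness=40.0):
    """Toy stand-in for a learned occupancy function.

    Returns a value in (0, 1): close to 1 inside a sphere of the given
    radius centred at the origin, close to 0 outside it.
    """
    dist = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    return 1.0 / (1.0 + math.exp(-sharpness * (radius - dist)))


def is_inside(p, threshold=0.5):
    """Classify a point; the surface is where occupancy(p) equals the threshold."""
    return occupancy(p) >= threshold


def occupied_fraction(resolution):
    """Query the function at the cell centres of a resolution^3 grid over
    the cube [-0.5, 0.5]^3 and return the fraction classified as inside."""
    step = 1.0 / resolution
    centres = [-0.5 + (i + 0.5) * step for i in range(resolution)]
    inside = sum(
        1 for x in centres for y in centres for z in centres if is_inside((x, y, z))
    )
    return inside / resolution ** 3


# The same function evaluated at two different resolutions.
for res in (8, 32):
    print(res, occupied_fraction(res))
```

With a finer grid the occupied fraction approaches the volume of the sphere (4/3 · π · 0.4³ ≈ 0.268 of the unit cube), while the function itself is unchanged.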
Unlike existing approaches, Occupancy Networks can describe the 3D output at infinite resolution and do not require additional memory overhead.<div class="figure-img" data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/0c6d71a16c01a4554c3956a090b9f209fbab2b96" class="txc-image" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/0c6d71a16c01a4554c3956a090b9f209fbab2b96" data-origin-width="602" data-origin-height="341"><div class="figcaption">The continuous decision boundary of the classifier (a deep neural network) is treated as the 3D surface -> a 3D mesh can be extracted at any resolution</div></div>
Voxel reconstruction costs more memory as the resolution increases, and point and mesh representations need post-processing to represent shapes properly.
-> Instead of predicting a voxelized representation at a fixed resolution, the complete occupancy function is predicted with a neural network fθ that can be evaluated at arbitrary resolution.
---> Memory usage during training drops sharply.
Given an input x (image, point cloud, voxels, etc.), an Occupancy Network learns a function fθ(p, x) that predicts, for every 3D coordinate p ∈ R³, whether the point is inside the object (occupied) or outside it (empty).
=> The decision boundary of this function is the representation of the 3D object.
Rather than looking only at a limited set of locations such as voxel coordinates, an Occupancy Network predicts occupancy for any point p in 3D space => it models the whole 3D space continuously.<div class="figure-img" data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/247389100e05e3c4102c6582f6525af4eb94a3c8" class="txc-image" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/247389100e05e3c4102c6582f6525af4eb94a3c8" data-origin-width="142" data-origin-height="34"><div class="figcaption">Inside: 1, outside: 0; a neural network approximates this. p: location, x: observation</div></div>
&lt;Training procedure&gt;
1) Sampling: during training, K points p_ij are sampled at random inside the object's 3D bounding box.
2) Label: whether each point p_ij is inside or outside the object -> o_ij (ground-truth occupancy).
3) Loss function: cross entropy.
Where and how the points are sampled matters a great deal for model performance; uniform sampling inside the bounding box with a small padding worked best.
&lt;Inference: mesh extraction&gt;
An Occupancy Network only provides a function, so the surface (mesh) has to be extracted explicitly.
&lt;MISE (Multi-resolution IsoSurface Extraction)&gt;
1. Evaluate fθ(p, x) on a low-resolution grid first.
2. Mark voxels whose corners are a mix of occupied and unoccupied (= voxels on the boundary) as active voxels.
3. Subdivide each active voxel into 8 -> these are likely to contain the object surface, so they are sampled more densely.
4. Evaluate fθ again on the newly created grid points.
5. Repeat until the desired resolution is reached.
6. Finally, apply marching cubes to generate the mesh.
7. Refine the mesh:
       - Fast Quadric Mesh Simplification (removes unnecessary triangles and simplifies the mesh)
       - Smoothing with first- and second-order gradient information (the gradient of the Occupancy Network fθ(p) is used to make the marching cubes mesh smoother and closer to the actual surface) <div class="figure-img" data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img src="https://t1.daumcdn.net/cafeattach/1RgNt/89f6d7a8bb92bfe508c2a528ff7b5d06b9ebc48b" class="txc-image" data-img-src="https://t1.daumcdn.net/cafeattach/1RgNt/89f6d7a8bb92bfe508c2a528ff7b5d06b9ebc48b" data-origin-width="396" data-origin-height="541"></div>
3. Are there approaches that handle SLAM with AI?
Methods that handle SLAM with AI already exist, and the area has recently grown into a research field referred to as AI-SLAM, learning-based SLAM, or neural SLAM.
&lt;Traditional SLAM vs. AI-SLAM comparison table (by component)&gt;<div class="table-wrap"><table data-ke-type="table" data-ke-align="alignLeft" style="width: 100%;" border="1"><tbody><tr><td style="text-align: center;">SLAM component</td><td style="text-align: center;">Traditional SLAM (geometry-based)</td><td style="text-align: center;">AI-SLAM (deep-learning-based)</td><td style="text-align: center;">Representative work</td></tr>
<tr><td>Feature Extraction</td><td>Hand-crafted features such as ORB, SIFT, FAST</td><td>A CNN learns the keypoints directly → robust even with little texture or blur</td><td>SuperPoint</td></tr>
<tr><td>Feature Matching</td><td>KNN matching + RANSAC-based verification</td><td>A GNN matches the features of the two images directly</td><td>SuperGlue</td></tr>
<tr><td>Visual Odometry (VO) (frame-to-frame motion estimation)</td><td>Essential Matrix, PnP, ICP</td><td>An RNN/CNN predicts the relative pose directly</td><td>DeepVO</td></tr>
<tr><td>Pose Estimation + Bundle Adjustment</td><td>Optimization with g2o / Ceres</td><td>A neural optimizer refines the pose directly</td><td>DROID-SLAM</td></tr>
<tr><td>Mapping</td><td>Point cloud map, OctoMap, TSDF</td><td>NeRF-based neural implicit map (high resolution)</td><td>iMAP / NICE-SLAM</td></tr>
<tr><td>Loop Closure (place recognition)</td><td>DBoW2 (BoW-based place recognition)</td><td>CNN-based place recognition, more robust</td><td>NeuralLC</td></tr>
<tr><td>Sensor Fusion (VIO, multi-sensor)</td><td>EKF, Graph-SLAM</td><td>Transformer-based multi-modal fusion</td><td>VIOFormer, DLIO</td></tr>
<tr><td>End-to-End SLAM</td><td>Several modules combined and wired together by hand</td><td>A single NN performs mapping + pose + exploration</td><td>Neural-SLAM / Co-SLAM</td></tr>
<tr><td>Robustness</td><td>Vulnerable to illumination changes, blur, and lack of texture</td><td>Strong thanks to learned features</td><td></td></tr>
<tr><td>Accuracy</td><td>Varies widely with the environment</td><td>Recent work is comparable to or better than ORB-SLAM</td><td></td></tr>
<tr><td>Speed (FPS)</td><td>Typically 20 to 100 FPS (possible on CPU)</td><td>Depends on the model (GPU needed); some run in real time</td><td>Co-SLAM (real-time)</td></tr>
<tr><td>Training required</td><td>No training</td><td>Requires training on data</td><td></td></tr></tbody></table></div>
1) Neural-SLAM (ICLR 2020) - Learning to Explore using Active Neural SLAM
The starting point of one of the best-known AI-SLAM lines of work.
Paper: <a href="https://arxiv.org/abs/2004.05155" target="_top" class="ke-link">https://arxiv.org/abs/2004.05155</a>
- Uses reinforcement learning (RL) so that mapping, localization, and the exploration policy are all learned inside neural networks
- An early work that clearly presented the idea of replacing SLAM with AI
2) DROID-SLAM (NeurIPS 2021) - DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
Paper: <a href="https://arxiv.org/abs/2108.10869" target="_top" class="ke-link">https://arxiv.org/abs/2108.10869</a>
- Strong performance with stereo input (e.g., a ZED camera)
- Tracking, depth, and features are learned, combined with a differentiable dense bundle adjustment (BA) layer
- Runs in real time or near real time
3) Co-SLAM (CVPR 2023) - Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
One of the strongest recent neural SLAM systems.
Paper: <a href="https://arxiv.org/abs/2304.14377" target="_top" class="ke-link">https://arxiv.org/abs/2304.14377</a>
- Neural-field SLAM that combines coordinate and sparse parametric (hash-grid) encodings
- RGB-D camera input
- Produces a detailed dense map very quickly
4. SLAM with a 3D LiDAR
Looking through GitHub, most projects use ROS1, and ROS2 ports are rare.
1) LOAM (Lidar Odometry and Mapping)
- The baseline model for 3D LiDAR SLAM (Odometry + Mapping structure)
- Supports ROS1
2) FLOAM (Fast LOAM)
GitHub: <a href="https://github.com/wh200720041/floam" target="_top" class="ke-link">https://github.com/wh200720041/floam</a>
A faster, lighter version of LOAM with lower CPU load; it can run in real time even on a drone or a Raspberry Pi 4.
- Supports ROS1 only; it would need to be ported to ROS2
3) lidarslam_ros2
<a href="https://github.com/rsasaki0109/lidarslam_ros2" target="_top" class="ke-link">https://github.com/rsasaki0109/lidarslam_ros2</a>
- ROS2-based pure LiDAR SLAM ==> ROS2 native → runs well on Foxy/Humble
- Compatible with Ouster/Velodyne
- Uses a single LiDAR sensor only (no IMU needed)
&lt;Front-end: LiDAR Odometry&gt;<ul style="list-style-type: disc;" data-ke-list-type="disc"><li>NDT (Normal Distributions Transform)<ul style="list-style-type: disc;" data-ke-list-type="disc"><li>Divides the point cloud into a grid (voxel grid) and fits a normal (Gaussian) distribution in each cell to represent the map => space is represented by statistical distributions rather than by individual points</li><li>Stays stable even without distinctive point-cloud features</li><li>Robust to Velodyne/Ouster offset changes</li></ul></li><li>GICP (Generalized ICP)<ul style="list-style-type: disc;" data-ke-list-type="disc"><li>An enhanced version of the ICP (Iterative Closest Point) algorithm, a method for precisely aligning two point clouds => standard ICP is point-to-point, whereas GICP generalizes point-to-point and point-to-plane ICP into covariance-based (plane-to-plane) registration.</li><li>Often more precise than NDT</li><li>Somewhat slower</li></ul></li></ul>
➡ NDT or GICP can be selected via a parameter
&lt;Back-end: Graph-based SLAM&gt;
- Pose graph construction, keyframe registration, loop closure (supported), global map reconstruction
➡ In effect, something like a simple LiDAR-only version of LIO-SAM
&lt;Map generation&gt;<ul style="list-style-type: disc;" data-ke-list-type="disc"><li>Local map (LOAM-style): manages a local map of roughly 80 to 150 m</li><li>Global map (graph optimization): when a loop closure is detected, the whole pose graph is optimized and the global map is updated</li><li>PCL-based 3D map output: viewable directly in RViz2 (the ROS2 visualization tool that shows robot data in 3D in real time).<ul style="list-style-type: disc;" data-ke-list-type="disc"><li>RViz2 can show the 3D point cloud, the SLAM map, the robot position / orientation (odometry), camera images, and the sensor TF frame structure</li></ul></li></ul>