Sicheng Zuo

I am a third-year Ph.D. student in the i-VisionGroup in the Department of Automation, Tsinghua University, advised by Prof. Jiwen Lu. In 2023, I received my B.S. degree from the Department of Automation, Tsinghua University. I am interested in computer vision and deep learning. My current research focuses on autonomous driving and vision foundation models.

Email  /  Google Scholar  /  GitHub

News

  • 2025-09: One paper on 3D occupancy prediction is accepted to NeurIPS 2025.
  • 2025-06: One paper on embodied 3D occupancy prediction is accepted to ICCV 2025.
  • 2025-02: One paper on 3D occupancy prediction is accepted to CVPR 2025.
  • 2024-07: One paper on image representation learning is accepted to ECCV 2024.
Publications

    *Equal contribution    Project leader.

    QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction
    Sicheng Zuo*, Wenzhao Zheng*, Xiaoyong Han*, Longchao Yang, Yong Pan, Jiwen Lu
    The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
    [arXiv] [Code] [Project Page]

    QuadricFormer proposes geometrically expressive superquadrics as scene primitives, enabling efficient and powerful object-centric representation of driving scenes.

    GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
    Sicheng Zuo*, Wenzhao Zheng*, Yuanhui Huang, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
    [arXiv] [Code]

    GaussianWorld reformulates 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input and proposes a Gaussian World Model to exploit the scene evolution for perception.

    EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
    Yuqi Wu*, Wenzhao Zheng*, Sicheng Zuo, Yuanhui Huang, Jie Zhou, Jiwen Lu
    IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
    [arXiv] [Code] [Project Page]

    EmbodiedOcc formulates an embodied 3D occupancy prediction task and employs a Gaussian-based framework to accomplish it.

    SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
    Han Xiao*, Wenzhao Zheng*, Sicheng Zuo, Peng Gao, Jie Zhou, Jiwen Lu
    European Conference on Computer Vision (ECCV), 2024.
    [Paper]

    SpatialFormer proposes an efficient vision transformer architecture with explicit spatial understanding for generalizable image representation learning.

    PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction
    Sicheng Zuo*, Wenzhao Zheng*, Yuanhui Huang, Jie Zhou, Jiwen Lu
    arXiv, 2023.
    [arXiv] [Code] [中文解读 (in Chinese)]

    As the first 2D-projection-based method for 3D semantic occupancy prediction, PointOcc outperforms all other methods by a large margin while running much faster.




    © Sicheng Zuo | Last updated: October 8, 2025.