Event-Triggered Maps of Dynamics: A Framework for Modeling Spatial Motion Patterns in Non-Stationary Environments
Junyi Shi, Qingyun Guo, Tomasz Piotr Kucner
Exciting updates from our research group, including upcoming workshops, paper acceptances, and a newly accepted proposal!
Maryam Kazemi Eskeri, Ville Kyrki, Dominik Baumann, and Tomasz Piotr Kucner
The advancement of socially-aware autonomous vehicles hinges on precise modeling of human behavior. Within this broad paradigm, the specific challenge lies in accurately predicting pedestrians' trajectories and intentions. Traditional methodologies have leaned heavily on historical trajectory data, frequently overlooking vital contextual cues such as pedestrian-specific traits and environmental factors. Furthermore, there is a notable knowledge gap, as trajectory and intention prediction have largely been approached as separate problems despite their mutual dependence. To bridge this gap, we introduce PTINet (Pedestrian Trajectory and Intention Prediction Network), which jointly learns trajectory and intention prediction by combining past trajectory observations, local contextual features (individual pedestrian behaviors), and global features (signs, markings, etc.). The efficacy of our approach is evaluated on widely used public datasets: JAAD, PIE, and TITAN, where it demonstrates superior performance over existing state-of-the-art models in trajectory and intention prediction. The results from our experiments and ablation studies validate PTINet's effectiveness in jointly modeling intention and trajectory prediction for pedestrian behavior, and the experimental evaluation indicates the advantage of using global and local contextual features for both tasks. The effectiveness of PTINet in predicting pedestrian behavior paves the way for the development of automated systems capable of seamlessly interacting with pedestrians in urban settings. Code: https://github.com/munirfarzeen/PTINet.
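As a rough illustration of the joint formulation described above, the sketch below shows a minimal PyTorch model that fuses past-trajectory, local-context, and global-context features and supervises a trajectory head and an intention head together. All module choices, names, and dimensions are illustrative assumptions, not the released PTINet architecture.

```python
# Minimal sketch (not the actual PTINet implementation): fuse past-trajectory,
# local-context, and global-context features, then predict a future trajectory
# and a crossing-intention logit from the shared representation.
import torch
import torch.nn as nn

class JointTrajectoryIntentionNet(nn.Module):
    def __init__(self, traj_dim=2, local_dim=64, global_dim=64,
                 hidden=128, pred_horizon=45):
        super().__init__()
        self.traj_encoder = nn.GRU(traj_dim, hidden, batch_first=True)
        self.fuse = nn.Linear(hidden + local_dim + global_dim, hidden)
        # Trajectory head: (x, y) for each future step.
        self.traj_head = nn.Linear(hidden, pred_horizon * traj_dim)
        # Intention head: binary crossing / not-crossing logit.
        self.intent_head = nn.Linear(hidden, 1)
        self.pred_horizon, self.traj_dim = pred_horizon, traj_dim

    def forward(self, past_traj, local_ctx, global_ctx):
        _, h = self.traj_encoder(past_traj)              # h: (1, B, hidden)
        fused = torch.relu(
            self.fuse(torch.cat([h[-1], local_ctx, global_ctx], dim=-1)))
        traj = self.traj_head(fused).view(-1, self.pred_horizon, self.traj_dim)
        intent_logit = self.intent_head(fused)           # trained jointly with traj
        return traj, intent_logit

# Example: batch of 8 pedestrians, 15 observed (x, y) positions each.
model = JointTrajectoryIntentionNet()
traj, intent = model(torch.randn(8, 15, 2), torch.randn(8, 64), torch.randn(8, 64))
```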
Effective modeling of human behavior is crucial for the safe and reliable coexistence of humans and autonomous vehicles. Traditional deep learning methods have limitations in capturing the complexities of pedestrian behavior, often relying on simplistic representations or indirect inference from visual cues, which hinders their explainability. To address this gap, we introduce PedVLM, a vision-language model that leverages multiple modalities (RGB images, optical flow, and text) to predict pedestrian intentions and also provide explainability for pedestrian behavior. PedVLM comprises a CLIP-based vision encoder and a text-to-text transfer transformer (T5) language model, which together extract and combine visual and text embeddings to predict pedestrian actions and enhance explainability. Furthermore, to complement our PedVLM model and further facilitate research, we also publicly release the corresponding dataset, PedPrompt, which includes prompts in a Question-Answer (QA) template for pedestrian intention prediction. PedVLM is evaluated on the PedPrompt, JAAD, and PIE datasets and demonstrates its efficacy compared to state-of-the-art methods. The dataset and code will be made available at https://github.com/munirfarzeen/PedVLM.
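For intuition, here is a minimal sketch of a CLIP-plus-T5 pipeline in the spirit of the description above: CLIP visual tokens are projected into T5's embedding space, concatenated with the tokenized QA prompt, and supervised against a textual answer. The model names, the linear projection, and the QA strings are illustrative assumptions; the sketch covers only the RGB and text modalities (no optical flow) and does not reproduce the released PedVLM implementation.

```python
# Simplified CLIP + T5 sketch for QA-style intention prediction (illustrative only).
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, T5ForConditionalGeneration, AutoTokenizer

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
tok = AutoTokenizer.from_pretrained("t5-small")
proj = nn.Linear(vision.config.hidden_size, t5.config.d_model)  # e.g. 768 -> 512

def qa_loss(pixel_values, question, answer):
    # CLIP visual tokens, projected into the T5 embedding space.
    vis = proj(vision(pixel_values=pixel_values).last_hidden_state)
    ids = tok(question, return_tensors="pt").input_ids
    txt = t5.encoder.embed_tokens(ids)            # prompt token embeddings
    fused = torch.cat([vis, txt], dim=1)          # fuse by simple concatenation
    labels = tok(answer, return_tensors="pt").input_ids
    return t5(inputs_embeds=fused, labels=labels).loss

# Example: one dummy 224x224 frame and a QA-style intention prompt.
loss = qa_loss(torch.randn(1, 3, 224, 224),
               "Will the pedestrian cross the street?", "yes, crossing")
```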
We are collaborating with researchers from Munich University of Applied Sciences, the University of Gothenburg, and Chalmers University of Technology to organize a workshop at IV 2025. This workshop will bring together experts to discuss key challenges and advancements in Autonomous Driving, fostering interdisciplinary collaboration and knowledge exchange.
Two members of our group, Tomasz Kucner and Farzeen Munir, have been invited to give a talk at the Cyber-Human Lab, University of Cambridge. They will present their research. For more details about the talk, please visit the link.
The project 'Towards fleets of robust and agile mobile robots in harsh environments' is led by Assistant Professor Tomasz Kucner.
The IEEE RAS Summer School 2024, hosted at the Czech Technical University (CTU) in Prague, was an incredible event focused on multi-agent systems and swarm robotics. The program offered in-depth insights into cutting-edge algorithms, coordination strategies, and the future of robotics, covering topics such as coordination in challenging environments, localization and planning, drone platforms, and safety considerations. Dr. Stefano V. Albrecht's lectures on multi-agent reinforcement learning were particularly impactful for me. The highlight of the event was the real-world competition, where teams tackled a complex multi-robot inspection and monitoring task. Collaborating with talented peers from various universities, our team developed a solution for assigning predetermined viewpoints and addressing trajectory planning challenges for two UAVs in a 3D environment with obstacles. Our approach focused on optimizing inspection time while maintaining collision-free paths and respecting dynamic constraints. Competing against 37 international teams in both virtual and real-world challenges, we secured 3rd place!
In recent years, many of the technical and scientific advancements in machine learning and computer vision have led to major innovations in the field of scene understanding. However, due to the limited generalization capabilities of these approaches and the lack of standards, only a small fraction of these promising ideas has been widely adopted by the robotics research community. Instead, the spatial representation research field continues to be largely influenced by algorithms and methods established prior to the deep learning revolution. In this workshop, our objective is twofold. First, we seek to explore the opportunities presented to the field of spatial and semantic representations for robotics by recent innovations within the machine learning community. We will focus the discussion on learning-based models, including large language models and foundation models, and their exceptional capabilities in comprehending and processing semantic knowledge, enabling open-vocabulary navigation and promising increased generalization. Simultaneously, we aim to identify the barriers hindering the widespread adoption of these technologies within our community. Our goal is to establish the groundwork for a machine learning toolkit for semantic spatial representation, specifically designed for the needs of the autonomous mobile robotics community.
The Unite! Seed Fund aims to stimulate and support bottom-up proposals by teachers, researchers and students for collaborative activities.
Our work enhances autonomous vehicle hazard anticipation and decision-making by integrating contextual and spatial environmental representations, inspired by human driving patterns. We introduce a framework utilizing three cameras (left, right, center) and top-down bird's-eye-view data fused via self-attention, with a vision transformer for sequential feature representation. Experimental results show our method reduces displacement error by 0.67 m in open-loop settings on nuScenes and enhances driving performance in CARLA's Town05 Long and Longest6 benchmarks.
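A minimal sketch of the fusion idea, assuming per-view feature sequences of equal dimensionality: the three camera streams and the bird's-eye-view stream are treated as tokens for a self-attention layer, and a small transformer encoder then models the fused sequence over time. Dimensions and module choices below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch: self-attention fusion of left/center/right camera and
# BEV features, followed by a transformer over the fused temporal sequence.
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, dim=256, n_heads=8):
        super().__init__()
        self.view_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, left, center, right, bev):
        # Each input: (B, T, dim). Stack the four sources as attention tokens.
        B, T, D = left.shape
        views = torch.stack([left, center, right, bev], dim=2)   # (B, T, 4, D)
        views = views.view(B * T, 4, D)
        fused, _ = self.view_attn(views, views, views)           # attend across views
        fused = fused.mean(dim=1).view(B, T, D)                  # one token per time step
        return self.temporal(fused)                              # sequential representation

# Example: batch of 2 clips, 4 time steps, 256-d features per view.
model = MultiViewFusion()
out = model(*[torch.randn(2, 4, 256) for _ in range(4)])         # (2, 4, 256)
```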
IV2024 Workshop
Autonomous mobile robots are being deployed in more diverse environments than ever before, including shared spaces where robots and humans have to coexist and cooperate. To ensure that this coexistence is safe and successful, it is necessary to enable robots to learn and utilise information about human motion patterns for improved performance.
The goal of the talk is to introduce listeners to the field and present existing and potential applications of maps of dynamics. Furthermore, the talk will provide insight into open research questions and under-explored research directions.