Spatio-Temporal Graph Convolutional Network for Stochastic Traffic Speed Imputation
The rapid increase of traffic data generated by different sensing systems opens many opportunities to improve transportation services. An important opportunity is to enable high-resolution stochastic routing that computes the arrival time probabilities for each suggested route instead of only the expected travel time. Stochastic routing relies on stochastic speed data that captures the speed distributions of vehicles in a road network. However, traffic datasets typically have many missing values, which prevents the construction of stochastic speeds. To address this limitation, we propose the Stochastic Spatio-Temporal Graph Convolutional Network (SST-GCN) architecture that accurately imputes missing speed distributions in a road network. SST-GCN combines Temporal Convolutional Networks and Graph Convolutional Networks into a single framework to capture both spatial and temporal correlations between road segments and time intervals, thereby providing a highly accurate estimation model for speed distributions. Moreover, to cope with datasets with many missing values, we propose a novel self-adaptive context-aware diffusion process that regulates the information propagated through the network, avoiding the spread of false information. We extensively evaluate the effectiveness of SST-GCN on real-world datasets, showing that it achieves 4.6% to 50% higher accuracy than state-of-the-art baselines across three different evaluation metrics. Furthermore, multiple ablation studies confirm our design choices and the model's scalability to large road networks.
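To make the imputation task concrete, the following is a minimal sketch of one mask-aware propagation step over speed histograms. It is a deliberately simple stand-in (plain neighbor averaging), not SST-GCN or its self-adaptive diffusion process, whose details the abstract does not give. The function name `impute_step`, the dictionary layout, and the segment names are all assumptions made for illustration.

```python
def impute_step(histograms, neighbors):
    """Fill each missing speed histogram with the mean of its observed neighbors.

    histograms: dict mapping a road segment to a list of bin probabilities,
                or to None when the segment has no data (hypothetical layout).
    neighbors:  dict mapping a road segment to its adjacent segments.
    Only observed histograms are propagated, so missing segments never
    pass on values within a single step.
    """
    result = {}
    for segment, hist in histograms.items():
        if hist is not None:
            result[segment] = list(hist)  # observed data is kept unchanged
            continue
        observed = [
            histograms[n]
            for n in neighbors.get(segment, [])
            if histograms.get(n) is not None
        ]
        if not observed:
            result[segment] = None  # nothing trustworthy to propagate yet
            continue
        bins = len(observed[0])
        result[segment] = [
            sum(h[b] for h in observed) / len(observed) for b in range(bins)
        ]
    return result


# Hypothetical three-bin speed histograms (slow, medium, fast).
histograms = {"a": [0.5, 0.5, 0.0], "b": None, "c": [0.0, 0.5, 0.5], "d": None}
neighbors = {"b": ["a", "c"], "d": ["b"]}
imputed = impute_step(histograms, neighbors)
```

Segment `d` stays missing after one step because its only neighbor was itself missing in the input, which loosely mirrors the goal of not spreading unverified information.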
Satellite Image Search in AgoraEO
The growing operational capability of global Earth Observation (EO) creates new opportunities for data-driven approaches to understand and protect our planet. However, the current use of EO archives is very restricted due to the huge archive sizes and the limited exploration capabilities provided by EO platforms. To address this limitation, we have recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a deep hashing network based on metric learning that encodes high-dimensional image features into compact binary hash codes. We use these codes as keys in a hash table to enable real-time nearest neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and Query-by-Example over satellite image repositories. Demo visitors will interact with EarthQube playing the role of different users that search images in a large-scale remote sensing archive by their semantic content and apply other filters.
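The lookup side of this idea, binary codes used as hash-table keys, can be sketched as follows. This is an illustrative sketch only: in MiLaN the codes come from a trained deep hashing network, whereas here they are hand-written bit lists. The class `HashIndex`, the helper `pack_bits`, and the Hamming-radius probing are assumptions, not the system's actual API or search procedure.

```python
from collections import defaultdict
from itertools import combinations


def pack_bits(bits):
    """Pack a sequence of 0/1 values into a single integer key."""
    key = 0
    for b in bits:
        key = (key << 1) | (1 if b else 0)
    return key


class HashIndex:
    """Hash table keyed by binary codes, probed within a small Hamming radius."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.buckets = defaultdict(list)

    def add(self, image_id, bits):
        self.buckets[pack_bits(bits)].append(image_id)

    def query(self, bits, radius=1):
        """Return (image_id, hamming_distance) pairs, nearest buckets first."""
        key = pack_bits(bits)
        results = []
        for r in range(radius + 1):
            # Flip every combination of r bit positions to enumerate
            # all codes at Hamming distance exactly r from the query.
            for positions in combinations(range(self.num_bits), r):
                probe = key
                for p in positions:
                    probe ^= 1 << p
                for image_id in self.buckets.get(probe, []):
                    results.append((image_id, r))
        return results


# Hypothetical 4-bit codes for three archive images.
index = HashIndex(num_bits=4)
index.add("img_a", [1, 0, 1, 1])
index.add("img_b", [1, 0, 1, 0])
index.add("img_c", [0, 1, 0, 0])
matches = index.query([1, 0, 1, 1], radius=1)
```

Each probe is a constant-time dictionary lookup, so the cost depends on the code length and radius rather than on the archive size; the number of probes grows quickly with the radius, which is why small radii are used in this sketch.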
In-Place Updates in Tree-Encoded Bitmaps
The Tree-Encoded Bitmap (TEB) is a novel bitmap compression scheme that provides a high compression ratio and logarithmic read time. It uses a tree-based compression algorithm that maps runs in the bitmap to leaf nodes of a binary tree. Currently, TEBs perform updates using an auxiliary differential data structure. However, consulting an additional data structure at every read introduces both memory and read overheads. To mitigate the shortcomings of differential updates, we propose algorithms to update TEBs in place. To that end, we identified two types of updates that can occur in a TEB: run-forming and run-breaking updates. Run-forming updates correspond to leaf nodes at the lowest level of the binary tree. All other updates are run-breaking. Each type of update requires different handling. Through experimentation with synthetic data, we determined that in-place run-forming updates are 2-3× faster than differential updates, while run-breaking updates cannot be efficiently performed in place. As a result, we propose a hybrid solution that performs run-forming updates in place while storing run-breaking updates in a differential data structure. Our experiments using synthetic data show that our hybrid solution is faster than differential updates as long as run-forming updates occur in a given workload. For instance, when 7% of all updates are run-forming, our hybrid solution is 15% faster than differential updates.
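The hybrid routing of updates can be illustrated with a simplified model. The sketch below uses a pointer-based binary tree in which uniform ranges collapse into a single leaf; it is not the actual TEB encoding. It classifies an update as run-forming when the target bit sits in a lowest-level (single-bit) leaf, following the description above, and routes everything else to a differential set. The names `Node`, `build`, and `HybridBitmap` are hypothetical, and merging of sibling leaves after an in-place flip is omitted for brevity.

```python
class Node:
    def __init__(self, value=None, left=None, right=None):
        self.value = value  # 0 or 1 for a leaf, None for an inner node
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.value is not None


def build(bits):
    """Build a tree in which every uniform range collapses into one leaf."""
    if len(bits) == 1:
        return Node(value=bits[0])
    mid = len(bits) // 2
    left, right = build(bits[:mid]), build(bits[mid:])
    if left.is_leaf() and right.is_leaf() and left.value == right.value:
        return Node(value=left.value)
    return Node(left=left, right=right)


class HybridBitmap:
    def __init__(self, bits):
        n = len(bits)
        if n == 0 or n & (n - 1):
            raise ValueError("length must be a power of two in this sketch")
        self.n = n
        self.root = build(bits)
        self.diff = set()  # positions whose tree value is logically flipped

    def _find_leaf(self, pos):
        """Return the leaf covering pos and the number of bits it spans."""
        node, lo, hi = self.root, 0, self.n
        while not node.is_leaf():
            mid = (lo + hi) // 2
            if pos < mid:
                node, hi = node.left, mid
            else:
                node, lo = node.right, mid
        return node, hi - lo

    def get(self, pos):
        node, _ = self._find_leaf(pos)
        return node.value ^ (pos in self.diff)

    def set(self, pos, bit):
        """Apply an update and report how it was handled."""
        if self.get(pos) == bit:
            return "noop"
        node, span = self._find_leaf(pos)
        if span == 1:
            node.value ^= 1  # lowest-level leaf: flip in place
            return "in-place"
        # The leaf covers a longer run: record the change differentially.
        self.diff.symmetric_difference_update({pos})
        return "differential"
```

For example, with `bm = HybridBitmap([1, 1, 1, 1, 0, 1, 0, 0])`, the call `bm.set(5, 0)` hits a single-bit leaf and is applied in place, while `bm.set(2, 0)` targets the run of four ones and lands in the differential set. Reads consult both structures, which is the overhead the hybrid approach tries to limit to run-breaking updates.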
Efficient Specialized Spreadsheet Parsing for Data Science
Spreadsheets are widely used for data exploration. Since spreadsheet systems have limited capabilities, users often need to load spreadsheets into other data science environments to perform advanced analytics. However, current approaches for spreadsheet loading suffer from either high runtime or high memory usage, which hinders data exploration on commodity systems. To make spreadsheet loading practical on commodity systems, we introduce a novel parser that minimizes memory usage by tightly coupling decompression and parsing. Furthermore, to reduce the runtime, we introduce optimized spreadsheet-specific parsing routines and employ parallelism. To evaluate our approach, we implement a prototype for loading Excel spreadsheets into R environments. Our evaluation shows that our novel approach is up to 3× faster while consuming up to 40× less memory than state-of-the-art approaches.
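The idea of coupling decompression and parsing can be sketched with the Python standard library: the worksheet XML is read as a stream directly out of the ZIP container and parsed incrementally, so the fully decompressed document is never held in memory at once. This is only an illustration of the general idea, not the paper's parser (whose prototype targets R). The function names and the demo workbook are assumptions, and shared strings, cell types, and parallelism are not handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by worksheet parts inside .xlsx files.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def iter_rows(xlsx_file, member="xl/worksheets/sheet1.xml"):
    """Yield each row as a list of raw cell strings, one row at a time."""
    with zipfile.ZipFile(xlsx_file) as archive:
        # archive.open() decompresses lazily as the parser pulls bytes.
        with archive.open(member) as stream:
            for _event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag == NS + "row":
                    yield [cell.findtext(NS + "v") for cell in elem.iter(NS + "c")]
                    elem.clear()  # release the cells of the finished row


def make_demo_workbook():
    """Build a tiny in-memory archive holding one worksheet part (demo only)."""
    xml = (
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<sheetData>"
        '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2.5</v></c></row>'
        '<row r="2"><c r="A2"><v>3</v></c><c r="B2"><v>4</v></c></row>'
        "</sheetData></worksheet>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/worksheets/sheet1.xml", xml)
    buffer.seek(0)
    return buffer


rows = list(iter_rows(make_demo_workbook()))
```

Because `iter_rows` is a generator, a caller can convert and append rows to a target data structure as they arrive instead of materializing the whole sheet first.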