Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments
AuthorsGabriel Matute†**, Wode Ni‡**, Titus Barik, Alvin Cheung†, Sarah E. Chasins†
Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments
AuthorsGabriel Matute†**, Wode Ni‡**, Titus Barik, Alvin Cheung†, Sarah E. Chasins†
Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also present stsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic search even for previously unparsable queries. We show empirically that stsearch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings.
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
January 12, 2026research area Computer Vision, research area Speech and Natural Language Processing
Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. Existing approaches, such as retrieval augmented generation (RAG) methods, search agents, and search equipped MLLMs, often suffer from rigid pipelines, excessive search calls,…
High-Throughput Vector Similarity Search in Knowledge Graphs
April 24, 2023research area Knowledge Bases and Search, research area Methods and Algorithmsconference SIGMOD
There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus…