Toward a Unified Framework for Unsupervised Complex Tabular Reasoning
Published in IEEE ICDE, CCF A, 2022
Recommended citation: Zhenyu Li, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang. "Toward a Unified Framework for Unsupervised Complex Tabular Reasoning," 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 2023, pp. 1691-1704, doi: 10.1109/ICDE55515.2023.00133. https://ieeexplore.ieee.org/document/10184763
Structured tabular data exist across nearly all fields. Reasoning task over these data aims to answer questions or determine the truthiness of hypothesis sentences by understanding the semantic meaning of a table. While previous works have devoted significant efforts to the tabular reasoning task, they always assume there are sufficient labeled data. However, constructing reasoning samples over tables (and related text) is labor-intensive, especially when the reasoning process is complex. When labeled data is insufficient, the performance of models will suffer an unendurable decline. In this paper, we propose a unified framework for unsupervised complex tabular reasoning (UCTR), which generates sufficient and diverse synthetic data with complex logic for tabular reasoning tasks, assuming no human-annotated data at all. Specifically, we first utilize a random sampling strategy to collect diverse programs of different types and execute them on tables based on a “Program-Executor” module. To bridge the gap between the programs and natural language sentences, we design a powerful “NL-Generator” module to generate natural language sentences with complex logic from these programs. Since a table often occurs with its surrounding texts, we further propose novel “Table-to-Text” and “Text-to-Table” operators to handle joint table-text reasoning scenarios. This way, we can adequately exploit the unlabeled table resources to obtain a well-performed reasoning model under an unsupervised setting. Our experiments cover different tasks (question answering and fact verification) and different domains (general and specific), showing that our unsupervised methods can achieve at most 93% performance compared to supervised models. The impressive performance demonstrates that UCTR can significantly reduce the dependence on manual annotation. Moreover, we also find that it can substantially boost the supervised performance in low-resourced domains as a data augmentation technique.