META-GUI - Towards Multi-modal Conversational Agents on Mobile GUI

What is META-GUI?

META-GUI is a dataset for training a Multi-modal convErsaTional Agent on mobile GUIs. It consists of 1,125 dialogues, 4,684 dialogue turns, and 18,337 data points in total. Each data point contains the screenshot history, the action history, the dialogue history, the items that appear on the current screen, and the actions to be performed.
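As a rough illustration of the five components listed above, one data point could be modelled as follows. This is only a sketch: the class names, field names, and action types (`ScreenItem`, `Action`, `DataPoint`, `"click"`, `"input"`) are hypothetical and do not reflect the actual file format of the released dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScreenItem:
    """One UI element visible on the current screen (hypothetical layout)."""
    text: str
    bounds: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels


@dataclass
class Action:
    """One GUI action; the action types used here are illustrative only."""
    kind: str                         # e.g. "click" or "input" (assumed names)
    item_index: Optional[int] = None  # index into DataPoint.screen_items, if any
    text: Optional[str] = None        # typed text for an "input" action, if any


@dataclass
class DataPoint:
    """The five components the README lists for each data point."""
    screenshot_history: List[str]     # paths to earlier screenshots (assumed)
    action_history: List[Action]      # actions already executed in this turn
    dialogue_history: List[str]       # utterances so far
    screen_items: List[ScreenItem]    # items that appear on the current screen
    actions_to_perform: List[Action]  # prediction target


example = DataPoint(
    screenshot_history=["turn0_step0.png"],
    action_history=[Action(kind="click", item_index=0)],
    dialogue_history=["Book a table for two tonight."],
    screen_items=[ScreenItem(text="Search", bounds=(0, 0, 200, 80))],
    actions_to_perform=[Action(kind="input", item_index=0, text="restaurant")],
)
```

Keeping the histories as separate fields mirrors the description above; the files distributed with the dataset may organise the same information differently.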

News

Oct 6, 2022: Our paper has been accepted to EMNLP 2022. The dataset and baseline are listed below.
Dataset: available for download from Hugging Face.
Baseline: available at META-GUI Baseline.

If you have any questions about this dataset, please contact slt19990817@sjtu.edu.cn, galaxychen@sjtu.edu.cn, or chenlusz@sjtu.edu.cn.

Leaderboard

Three metrics are used to evaluate models on the META-GUI test set: Action Completion Rate (Action CR), Turn Completion Rate (Turn CR), and Reply BLEU score. Please refer to the paper for details on the evaluation metrics.
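The two completion rates can be sketched as below. The sketch assumes that Action CR is the percentage of individual actions predicted correctly and that Turn CR is the percentage of turns in which every action is predicted correctly. This reading, the function names, and the string encoding of actions are assumptions for illustration; the paper and the baseline code remain the authoritative definitions. Reply BLEU is left out here.

```python
from typing import List, Sequence


def action_completion_rate(pred_turns: Sequence[Sequence[str]],
                           gold_turns: Sequence[Sequence[str]]) -> float:
    """Percentage of gold actions whose aligned prediction matches exactly."""
    total = sum(len(gold) for gold in gold_turns)
    if total == 0:
        return 0.0
    correct = sum(
        p == g
        for pred, gold in zip(pred_turns, gold_turns)
        for p, g in zip(pred, gold)  # position-wise comparison within a turn
    )
    return 100.0 * correct / total


def turn_completion_rate(pred_turns: Sequence[Sequence[str]],
                         gold_turns: Sequence[Sequence[str]]) -> float:
    """Percentage of turns whose whole action sequence matches the gold one."""
    if not gold_turns:
        return 0.0
    completed = sum(
        list(pred) == list(gold)
        for pred, gold in zip(pred_turns, gold_turns)
    )
    return 100.0 * completed / len(gold_turns)


# Toy data: two turns, with actions encoded as plain strings (hypothetical encoding).
gold: List[List[str]] = [["click:3", "input:hi"], ["click:1"]]
pred: List[List[str]] = [["click:3", "input:hello"], ["click:1"]]

action_cr = action_completion_rate(pred, gold)  # 2 of 3 actions match
turn_cr = turn_completion_rate(pred, gold)      # 1 of 2 turns fully matches
```

Under these assumptions Turn CR is the stricter metric, since one wrong action fails the whole turn. That is consistent with the Turn CR column being lower than the Action CR column for every model on the leaderboard.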

Rank	Date	Model	Affiliation	Reference	Action CR	Turn CR	Reply BLEU score
1	Oct 06, 2022	m-BASH	Shanghai Jiao Tong University	Sun et al., EMNLP'22	82.74	56.88	63.11
2	Oct 06, 2022	BERT	Shanghai Jiao Tong University	Sun et al., EMNLP'22	78.42	52.08	62.19
3	Oct 06, 2022	LayoutLM	Shanghai Jiao Tong University	Sun et al., EMNLP'22	67.76	38.12	50.43
4	Oct 06, 2022	LayoutLMv2	Shanghai Jiao Tong University	Sun et al., EMNLP'22	64.48	36.88	58.20