Author

Lin Xia

Semester

Spring

Date of Graduation

2025

Document Type

Dissertation

Degree Type

DBA

College

Chambers College of Business and Economics

Department

Accounting

Committee Chair

Richard B. Dull

Committee Co-Chair

L. Christian Schaupp

Committee Member

Romina Rakipi

Committee Member

Stephane Collignon

Abstract

This dissertation is composed of two studies, both of which revolve around the application of machine learning to financial statement fraud. The first study, entitled “Complementary Role of Algorithm and Theory for Financial Statement Fraud,” investigates the nature of the relationship between theory-building research and machine learning algorithms. Recent research on financial statement fraud has seen an increase in the use of machine learning techniques. While some machine learning-based studies build on variables from causal inference research, others claim that machine learning algorithms work better with raw accounting variables and thus cast doubt on the role of causal inference research in the realm of machine learning. This study shows that ratio-based models, incorporating a synthesis of ratios from extant research and those computed from raw variables, consistently outperform models based on raw variables. Ratio-based models also outperform models that combine ratios and raw variables. The results provide evidence supporting the complementary role of machine learning algorithms and theory-building research in the realm of financial statement fraud.

The second study, entitled “Undetected Accounting Fraud: Implications for Theory and Machine Learning Predictive Models,” examines the issue of undetected accounting fraud and its implications for theory development and predictive model building. Financial statement fraud, often referred to as accounting fraud, is a rare phenomenon, which makes it challenging to research. Exacerbating the situation is the possibility that a large number of accounting frauds have gone undetected due to resource constraints and shifting priorities at the enforcement agencies. The extant literature on financial statement fraud, whether focused on causal inference or predictive modeling, has largely ignored the undetected fraud issue. The implication is that the data used for research are not clean, in that undetected frauds have been treated as “non-fraud,” resulting in potentially biased estimates, invalid statistical inference, and suboptimal predictive model performance. This study applies several machine learning methods to identify non-fraud observations with high confidence. We show that “cleaning” the data via these methods enhances inference, in that several variables identified in prior literature change from insignificant to significant compared to the baseline of ignoring the undetected fraud issue. Further, “cleaning” the data via these methods significantly improves the performance of predictive models.
