AI to tackle tax fraud
AI • Sep 13, 2024
Summary:
Integrating AI tools made processing 5X faster without code modifications, and further optimization raised the speedup to 20X.
Client:
The Internal Revenue Service (IRS) is the federal agency in the United States tasked with collecting taxes and enforcing tax laws. The IRS conducts audits on taxpayers either randomly or after identifying discrepancies in their tax returns.
Problem Statement:
To tackle tax fraud and identify bad actors, IRS investigators need to sift through decades of data, link individuals to suspicious activities, and trace transactions across multiple layers and steps on a graph.
One IRS data scientist was assigned the task of analyzing over 3 terabytes of data to detect fraud patterns, but the available computing resources were inadequate. Even when run overnight on a large array of CPUs, the job failed to complete. The team tried dividing the datasets across different servers, but then had to combine the resulting data subsets manually, which made the process cumbersome. Despite these efforts, they couldn’t achieve full visibility for real-time fraud detection.
To address these obstacles, the IRS turned to AI tools, machine learning, and advanced fraud detection applications.
Results:
- 20X increase in speed for running data science experiments.
- Workloads ran 5X faster immediately, without any changes to the code.
- 50% reduction in costs for data science and data engineering workflows.
- Lowered costs and improved protection for taxpayers by effectively preventing fraud and identity theft.
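To see how the two speedup figures relate, here is a small arithmetic sketch. It assumes both the 5X and the 20X figures are measured against the same CPU baseline, which the case study does not state explicitly, and the 10-hour baseline is a made-up number used only for illustration.

```python
def hours_after(baseline_hours: float, speedup: float) -> float:
    """Runtime after applying a speedup factor to a baseline runtime."""
    if baseline_hours < 0 or speedup <= 0:
        raise ValueError("baseline must be >= 0 and speedup must be > 0")
    return baseline_hours / speedup


def remaining_speedup(total: float, already_gained: float) -> float:
    """Extra factor needed on top of `already_gained` to reach `total`."""
    if total <= 0 or already_gained <= 0:
        raise ValueError("speedup factors must be > 0")
    return total / already_gained


baseline = 10.0  # hypothetical CPU runtime in hours, not from the case study
print(hours_after(baseline, 5))   # 2.0 hours after the no-code-change step
print(hours_after(baseline, 20))  # 0.5 hours after further optimization
print(remaining_speedup(20, 5))   # 4.0, the share owed to the optimization
```

Under that assumption, the code optimization contributes a further 4X on top of the initial 5X gain.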
AI Solution Overview:
The IRS has started using advanced AI tools, machine learning, and applications designed to quickly detect fraud and identity theft. Integrating powerful computing infrastructure with software solutions allowed the IRS to scale its AI and machine learning operations easily. Running Cloudera on NVIDIA GPUs increased processing speeds by up to 5X without requiring any changes to the code. However, there was still potential for further optimization.
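A speedup "without code changes" usually comes from enabling GPU acceleration through configuration rather than rewriting the job. The sketch below shows one way this might look with the RAPIDS Accelerator for Apache Spark. It is not the IRS or Cloudera setup, which the case study does not describe: the helper names, jar path, script name, and resource amounts are hypothetical, and the property names are the plugin's settings as commonly documented.

```python
# Sketch only: jar path, app name, and resource amounts are hypothetical.
from typing import Dict, List


def rapids_spark_conf(gpus_per_executor: int = 1,
                      tasks_per_gpu: int = 4) -> Dict[str, str]:
    """Return Spark settings that switch on the RAPIDS Accelerator plugin."""
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS Accelerator as a Spark plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operators to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Report operators that stay on the CPU, to guide later tuning.
        "spark.rapids.sql.explain": "NOT_ON_GPU",
        # GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU each task claims, so several tasks share one GPU.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def spark_submit_args(app: str, jar: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list; the application is left untouched."""
    args = ["spark-submit", "--jars", jar]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    conf = rapids_spark_conf()
    print(" ".join(spark_submit_args("fraud_job.py", "rapids-4-spark.jar", conf)))
```

The existing Spark application is passed through unchanged; only the launch configuration differs. The `NOT_ON_GPU` explain setting is one way to spot work that still falls back to the CPU.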
Cloudera enlisted a team of NVIDIA data scientists to review the IRS code. The review found that some tasks involving complex data structures were still being processed on CPUs. NVIDIA developed new code to run these tasks on GPUs and integrated it into Spark’s interface with NVIDIA RAPIDS™, an open library for GPU-accelerated data analytics.
When the IRS deployed the updated code on GPUs in a distributed Spark cluster, the performance improved by a staggering 20X. By utilizing Apache Spark and graph analysis, engineering teams built vast networks of nodes and edges. AI bots and machine learning algorithms then analyzed these graphs, enabling investigators to link individuals to institutions and, ultimately, larger networks over extended periods. These insights revealed patterns that helped identify fraud much more quickly.
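The idea of linking individuals through nodes and edges can be illustrated with a small, single-machine sketch. This is a plain-Python breadth-first search, not the distributed Spark and GPU implementation described above, and the entity names, edge list, and function names are invented for the example.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def build_graph(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_link(graph: Dict[str, Set[str]], start: str, target: str,
                  max_hops: int) -> Optional[List[str]]:
    """Return the shortest chain from start to target within max_hops, or None."""
    if start == target:
        return [start]
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in parents:
                continue  # already reached by a path at least as short
            parents[neighbor] = node
            if neighbor == target:
                # Walk parent pointers back to the start to recover the chain.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((neighbor, hops + 1))
    return None


# Hypothetical records: people, accounts, and an institution.
edges = [
    ("person:A", "account:1"),
    ("account:1", "bank:X"),
    ("bank:X", "account:2"),
    ("account:2", "person:B"),
    ("person:C", "account:3"),
]
graph = build_graph(edges)
print(shortest_link(graph, "person:A", "person:B", max_hops=4))
print(shortest_link(graph, "person:A", "person:C", max_hops=4))
```

The first call prints the four-hop chain from `person:A` to `person:B` through the shared institution, and the second prints `None` because no link exists. At the scale described in this case study, the same traversal would run over a distributed graph rather than an in-memory dictionary.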
Data sets that previously took weeks or months to compile and process are now handled in hours or minutes.
Testing showed a 10X boost in engineering and data science workflows, along with a 50 percent reduction in infrastructure costs.
With its upgraded computing infrastructure and AI deployment, the IRS is lowering costs and enhancing protection for taxpayers by more effectively preventing fraud and identity theft.
Building on these advances in data preparation and analytics, the IRS plans to speed up AI inference tasks and use the Spark-GPU infrastructure to address natural language processing and other analytical challenges.
References:
- Using AI and Accelerated Computing to Root Out Waste, Fraud, and Theft. https://www.nvidia.com/en-us/case-studies/fraud-detection-applications/
Industry: Public Services
Vendor: NVIDIA and Cloudera
Client: Internal Revenue Service (IRS)
Publication Date: 2024