My Projects and Achievements

Python

1. Executive Summary

DataParrot-Home Remittance Analyzer is an AI-powered automation tool built to modernize compliance monitoring of large-scale home remittance transactions. It uses a purpose-built machine learning model to identify business names hidden in remitter and beneficiary fields, automatically detects split transactions, and flags cases in which a single remitter transfers money to the same beneficiary more than five times in a single month.

This capability is essential because, under the most recent TT Charges Reimbursement Scheme announced by the State Bank of Pakistan (SBP), only eligible home-remittance transactions qualify for a fixed rebate of SAR 20 per transaction. To qualify, multiple transfers from the same remitter to the same recipient on the same day or over the course of a month must be consolidated or restricted, and the remittance amount must meet a minimum threshold.

To submit legitimate reimbursement claims under the SBP scheme, financial institutions must ensure that remitter and beneficiary names are correct and that remittance transactions are authentic, not illicitly split or duplicated to get around scheme limits. Streamlining the detection of split or recurring transfers is therefore essential to prevent financial loss, avoid invalid rebate claims, and maintain regulatory compliance.

Before DataParrot was implemented, compliance staff manually investigated more than a million transactions, a process that took one to two weeks, consumed a significant amount of staff time, and carried a high risk of oversight and human error. DataParrot's automation reduces processing time to five to ten minutes, improves accuracy and consistency, and produces audit-ready outputs. DataParrot demonstrates applied AI innovation in banking operations while supporting operational efficiency, regulatory compliance, and financial accuracy.

2. Background and Problem Statement

Home Remittance departments are responsible for monitoring large-scale international inward remittance transactions. The bank's monthly workload surpasses one million transactions, which need to be investigated for regulatory and compliance risks. Key challenges in the existing manual process included:

2.1 Manual Detection Risks

Officers in charge of compliance had to manually verify:

  • Transactions that are split or structured
  • Names of companies or organizations inserted into beneficiary and remitter fields
  • Repeated remittances that exceed the compliance limit (more than 5 transactions per month)

2.2 Operational Bottlenecks

This manual workflow:

  • Required 3 compliance staff
  • Consumed 1–2 weeks per cycle
  • Caused delays in reporting and case escalation
  • Suffered from human errors in fuzzy-matching tasks and name recognition

2.3 Regulatory Challenges

Manual review processes increased the risk of:

  • Missing suspicious patterns
  • Delayed reporting to regulatory bodies
  • Inconsistent findings during audits

To eliminate these challenges, an AI-enabled, automated, and scalable solution was required.

3. Project Objective

The primary goals of DataParrot were:

  • Automate the detection of suspicious or split transactions
  • Build a machine learning model to detect company/organization names
  • Identify repeat remittances exceeding regulatory thresholds
  • Drastically reduce manual effort and processing time
  • Deliver a user-friendly GUI suitable for non-technical staff
  • Improve audit compliance, accuracy, and reporting capabilities

4. Project Scope

In-Scope

  • Ingestion of Excel/CSV remittance data
  • Data preprocessing and standardization
  • AI-based entity (company name) recognition
  • Fuzzy-matching to detect split transactions
  • Frequency analysis for repeated transfers
  • GUI development for operational teams
  • Automated Excel output generation

5. My Role and Responsibilities

Although my official designation is Data Analyst, I performed the responsibilities of an end-to-end solution architect, including:

  • Requirement analysis
  • Data engineering and pipeline design
  • Machine learning model development
  • Statistical modeling and feature engineering
  • Backend algorithm implementation
  • GUI design and user workflow creation
  • Testing and validation
  • Documentation and internal demonstration

This solution was developed independently from scratch.

6. Technology Stack

Programming Language

Python

Libraries / Modules

  • GUI: tkinter
  • Image Handling: PIL
  • Data Processing: pandas, openpyxl, itertools, glob, datetime
  • Machine Learning: scikit-learn (Logistic Regression Classifier)
  • NLP / Fuzzy Matching: rapidfuzz, re
  • Performance: threading, time
  • System Libraries: os, pygame

7. System Architecture Overview

7.1 Data Input Layer

  • Accepts Excel (.xlsx), CSV, or raw remittance data files
  • Handles large datasets (>1,000,000 rows)

7.2 Preprocessing Module

  • Data cleaning and normalization
  • Removal of special characters
  • Reformatting date fields
  • Text standardization
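
A minimal sketch of what the cleaning steps above could look like, using only the standard library. The function names `normalize_name` and `normalize_date`, the list of accepted date formats, and the decision to keep only Latin letters and digits are illustrative assumptions, not the tool's actual code.

```python
import re
from datetime import datetime

# Anything that is not an uppercase Latin letter, a digit, or a space.
_NON_ALNUM = re.compile(r"[^A-Z0-9 ]+")
_SPACES = re.compile(r"\s+")

# Hypothetical set of input date layouts; a real feed may need others.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def normalize_name(raw):
    """Uppercase, replace special characters with spaces, collapse spaces."""
    text = raw.upper()
    text = _NON_ALNUM.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for an unparseable date, rather than raising, lets a caller route such rows to manual review instead of stopping the whole run.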

7.3 AI-Based Company Detection (org_pookie Model)

  • Custom logistic regression classifier
  • Identifies whether remitter/beneficiary fields contain company names
  • Trained on 3 years of internal historical data
  • Additional dataset of ~2 million organization names
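
As a rough illustration of this kind of classifier, the sketch below builds a character n-gram plus logistic regression pipeline with scikit-learn. The training names, labels, n-gram range, and vectorizer settings are made up for the example; the actual org_pookie features and hyperparameters are not described in this document.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set (invented): 1 = organization, 0 = individual.
names = [
    "Al Noor Traders Pvt Ltd", "Karachi Textile Mills", "Global Exports LLC",
    "Sunrise Trading Company", "Muhammad Ali Khan", "Fatima Bibi",
    "Ahmed Raza", "Ayesha Siddiqui",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Character n-grams tolerate spelling variants and abbreviations better
# than whole-word features, which suits noisy name fields.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)

# Probability that each unseen name is an organization (column 1 = class 1).
proba = model.predict_proba(["Lahore Steel Mills Ltd", "Bilal Ahmed"])[:, 1]
```

With a training set this small the probabilities are not meaningful; the point is only the shape of the pipeline, which a production model would fit on the labeled historical data.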

7.4 Split Transaction Detection Engine

  • Uses rapidfuzz fuzzy scoring
  • Regular expressions
  • Pattern recognition
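
The sketch below shows one way fuzzy scoring can surface split-transaction candidates: transactions on the same date are compared pairwise, and a pair is flagged when both the remitter names and the beneficiary names are near-identical. To keep it dependency-free, the standard-library `difflib.SequenceMatcher` stands in for rapidfuzz here. The field names, the threshold of 90, and the same-day grouping rule are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity score from 0 to 100 (difflib stand-in for a fuzzy ratio)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def find_split_candidates(transactions, threshold=90.0):
    """Return id pairs whose remitter and beneficiary names both match."""
    by_date = {}
    for tx in transactions:
        by_date.setdefault(tx["date"], []).append(tx)

    pairs = []
    for same_day in by_date.values():
        # Pairwise comparison is only done inside each same-date group.
        for a, b in combinations(same_day, 2):
            if (similarity(a["remitter"], b["remitter"]) >= threshold
                    and similarity(a["beneficiary"], b["beneficiary"]) >= threshold):
                pairs.append((a["id"], b["id"]))
    return pairs


sample = [
    {"id": 1, "date": "2024-01-05", "remitter": "MUHAMMAD ALI KHAN", "beneficiary": "FATIMA BIBI"},
    {"id": 2, "date": "2024-01-05", "remitter": "MOHAMMAD ALI KHAN", "beneficiary": "FATIMA BIBI"},
    {"id": 3, "date": "2024-01-05", "remitter": "JOHN SMITH", "beneficiary": "FATIMA BIBI"},
    {"id": 4, "date": "2024-01-06", "remitter": "MUHAMMAD ALI KHAN", "beneficiary": "FATIMA BIBI"},
]
split_pairs = find_split_candidates(sample)
```

Pairwise comparison grows quadratically with group size, so at the scale of a million rows some form of grouping before scoring (by date here) is what keeps the work manageable.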

7.5 Repeat Transaction Analysis

  • Identifies remitters sending money to the same beneficiary more than 5 times per month
  • Flags potential structuring or suspicious behavior
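
The frequency rule above can be sketched as a count per (remitter, beneficiary, month) key. This version uses `collections.Counter` from the standard library for brevity; the dictionary field names and the ISO `YYYY-MM-DD` date strings are assumptions about the input, and `flag_repeat_pairs` is a hypothetical helper name.

```python
from collections import Counter


def flag_repeat_pairs(transactions, limit=5):
    """Return {(remitter, beneficiary, month): count} for counts above limit."""
    counts = Counter(
        # The first 7 characters of an ISO date are the YYYY-MM month key.
        (tx["remitter"], tx["beneficiary"], tx["date"][:7])
        for tx in transactions
    )
    return {key: n for key, n in counts.items() if n > limit}
```

Names should already be normalized before counting, otherwise spelling variants of the same remitter are tallied under separate keys.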

7.6 GUI Layer

  • Developed using tkinter
  • User-friendly interface
  • One-click execution
  • Progress tracking and notifications
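
A common way to keep a tkinter window responsive during a long run is to do the work in a background thread and pass progress messages through a queue that the GUI polls. The sketch below shows only the thread and queue side so it runs without a display; the function names, the message tuples, and the placeholder workload are hypothetical.

```python
import queue
import threading


def run_analysis(chunks, progress):
    """Worker: process each chunk and report percent complete."""
    total = len(chunks)
    for done, chunk in enumerate(chunks, start=1):
        sum(chunk)  # placeholder for the real per-chunk analysis
        progress.put(("progress", done * 100 // total))
    progress.put(("done", total))


def start_job(chunks):
    """Start the worker thread and return it with its progress queue."""
    progress = queue.Queue()
    worker = threading.Thread(
        target=run_analysis, args=(chunks, progress), daemon=True
    )
    worker.start()
    return worker, progress


def drain(progress):
    """Collect all messages currently waiting in the queue."""
    events = []
    while True:
        try:
            events.append(progress.get_nowait())
        except queue.Empty:
            return events
```

In a tkinter application, the GUI thread would typically call `drain` from a callback rescheduled with the widget's `after` method and update a progress bar from the messages, so that no widget is touched from the worker thread.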

7.7 Output Layer

  • Generates processed results in Excel format
  • Includes flags, risk indicators, company detection results, and summary sheet

8. Key Features and Functionalities

8.1 Automated Split Transaction Identification

  • Detects name similarities
  • Identifies structured patterns
  • Uses fuzzy ratios and strict thresholds

8.2 Company Name Detection via AI

  • ML model identifies whether a name field contains an organization name
  • Handles spelling variations, abbreviations, and non-standard formatting

8.3 Remitter–Beneficiary Frequency Analysis

  • Finds cases of more than 5 transfers per month
  • Groups by remitter and beneficiary
  • Produces compliance-ready results

8.4 Large Dataset Compatibility

  • Optimized for >2 million rows
  • Uses vectorized operations for speed
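
To illustrate what "vectorized" means in practice, the sketch below flags rows with a grouped column operation in pandas instead of a Python loop over rows. The column names and the sample data are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "remitter": ["A"] * 8,
    "beneficiary": ["B"] * 6 + ["C"] * 2,
    "value_date": pd.to_datetime([
        "2024-01-02", "2024-01-05", "2024-01-09", "2024-01-14",
        "2024-01-20", "2024-01-27", "2024-01-03", "2024-01-18",
    ]),
})

# Derive the month key for every row in one column-wide operation.
df["month"] = df["value_date"].dt.to_period("M")

# Attach each row's group size without an explicit Python loop.
df["pair_count"] = (
    df.groupby(["remitter", "beneficiary", "month"])["value_date"]
    .transform("size")
)

# Row-level flag: more than 5 transfers for the same pair in the same month.
df["repeat_flag"] = df["pair_count"] > 5
```

Because the flag lives on each row, the result can be written out directly alongside the original transaction columns.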

8.5 Intuitive GUI

  • Easy for compliance staff
  • No coding skills required
  • Reduces risk of user error

9. Machine Learning Model Details (org_pookie)

9.1 Model Type

Logistic Regression Classifier (scikit-learn)

9.2 Objective

Classify whether a given name field contains a company/organization.

9.3 Training Data

  • Internal remittance data from past 3 years
  • External dataset of ~2 million business names
  • Manually and semi-automatically labeled dataset

9.4 Preprocessing Techniques

  • Regex cleaning
  • Tokenization
  • Case normalization
  • Noise removal
  • N-gram feature extraction
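
The tokenization and n-gram steps above can be sketched in plain Python as follows. The helper names, the 2 to 3 character n-gram range, and the space padding at word boundaries are illustrative choices, not the model's documented settings.

```python
import re
from collections import Counter


def tokenize(name):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def char_ngrams(token, n_min=2, n_max=3):
    """Character n-grams of one token, padded with spaces at both ends."""
    padded = f" {token} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_features(name):
    """Bag of character n-grams for a whole name field."""
    counts = Counter()
    for token in tokenize(name):
        counts.update(char_ngrams(token))
    return counts
```

Character-level features are a reasonable fit for this task because markers such as "ltd" or "pvt" still produce overlapping n-grams when they are misspelled or run together with other words.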

9.5 Model Performance

  • High precision in identifying organizations
  • Robust with noisy text
  • Outperforms manual identification

10. Challenges and Resolutions

10.1 No structured dataset for training

Resolution: Combined internal data, public business lists, and a curated set of ~2 million organization names.

10.2 Processing extremely large datasets

Resolution: Optimized pandas operations and implemented threading.

10.3 Unstructured and noisy text

Resolution: Applied regex normalization and fuzzy matching.

10.4 Non-technical user adoption

Resolution: Built a clean and guided tkinter interface.

11. Results and Impact

11.1 Operational Efficiency

  • Reduced processing time from 1–2 weeks to 5–10 minutes
  • Fully automated workflow

11.2 Workforce Optimization

  • Previously required 3 staff members
  • Now requires minimal human intervention

11.3 Accuracy and Consistency

  • AI-driven analysis ensures consistent detection
  • Reduces oversight risks

11.4 Compliance & Audit Benefits

  • Faster reporting
  • Standardized logic
  • Audit-friendly outputs

12. Conclusion

DataParrot represents a major step forward in digital transformation, automation, and risk-centered innovation in the banking sector. The system significantly improves efficiency, accuracy, compliance readiness, and operational scalability.
This project demonstrates strong capabilities in AI application development, data engineering, compliance automation, and full-stack Python solution delivery.

Phone:

+923025547429

Email:

msmkhan.afridi1239@gmail.com
