---
title: Email Classification and PII Masking API
emoji: 📧
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: main.py
pinned: false
models:
- Davlan/bert-base-multilingual-cased-ner-hrl
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2
---
# Email Classification and PII Masking API
This FastAPI application provides an API for classifying emails and masking Personally Identifiable Information (PII) in text.
## Features
- PII Detection and Masking
  - Full names
  - Email addresses
  - Phone numbers
  - Dates of birth
  - Aadhar numbers
  - Credit/Debit card numbers
  - CVV numbers
  - Card expiry dates
- Email Classification using an MLP model
- Multilingual support using BERT-based models
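
This README does not say how the structured PII types are detected. Purely as an illustration, pattern-based detection for two of them (email addresses and 12-digit Aadhar numbers) might look like the sketch below. The `find_pii` helper, both regular expressions, and the `aadhar_num` label are assumptions made for this example, not the application's actual code; names and other free-form entities would more plausibly come from the NER model listed in the front matter.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions, not the app's real rules).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 12 digits, optionally grouped 4-4-4. The lookarounds reject matches that
    # are part of a longer digit run, such as a 16-digit card number.
    "aadhar_num": re.compile(
        r"(?<!\d)(?<!\d[ -])\d{4}[ -]?\d{4}[ -]?\d{4}(?![ -]?\d)"
    ),
}


def find_pii(text):
    """Return detected entities sorted by start offset.

    Each entity mirrors the shape used in the API response:
    position is [start, end) in characters, end-exclusive.
    """
    entities = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                {
                    "position": [match.start(), match.end()],
                    "classification": label,
                    "entity": match.group(),
                }
            )
    return sorted(entities, key=lambda e: e["position"][0])
```

Regular expressions like these only cover rigidly formatted values; they are shown here to make the entity format concrete, not as a complete detector.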
## Setup
1. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Download required model files:
   - `label_encoder.pkl`
   - `pca.pkl`
   - `mlp_model.pth`

   Place these files in the same directory as `main.py`.
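
A quick way to confirm the files are in place before starting the server is sketched below. The `missing_model_files` helper is a hypothetical name introduced for this example; only the three file names come from this README.

```python
from pathlib import Path

# File names listed in the setup step above.
REQUIRED_MODEL_FILES = ("label_encoder.pkl", "pca.pkl", "mlp_model.pth")


def missing_model_files(app_dir):
    """Return the required model files that are absent from app_dir."""
    app_dir = Path(app_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (app_dir / name).is_file()]
```

Calling `missing_model_files(".")` from the directory that contains `main.py` should return an empty list once all three files are present.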
## Usage
1. Start the FastAPI server:
```bash
uvicorn main:app --reload --host 0.0.0.0 --port 80
```
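**Note for Hugging Face Spaces:** Binding to `0.0.0.0` makes the server reachable from outside the container. Docker Spaces route traffic to port `7860` by default, so serving on port `80` as shown here requires setting `app_port: 80` in the README front matter. The `--reload` flag is intended for local development and can be dropped in deployment.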
2. The API will be available at the Space's URL (e.g., `https://your-username-your-space-name.hf.space`).
3. API Endpoints:
- **POST `/classify`**: Classify and mask PII in email text
- **Input:** JSON with `input_email_body` field
```json
{
  "input_email_body": "Hello, my name is John Doe and my email is john.doe@example.com. Please help with my billing issue."
}
```
- **Output:** JSON with masked text, detected entities, and classification
```json
{
  "input_email_body": "Hello, my name is John Doe and my email is john.doe@example.com. Please help with my billing issue.",
  "list_of_masked_entities": [
    {
      "position": [18, 26],
      "classification": "full_name",
      "entity": "John Doe"
    },
    {
      "position": [43, 63],
      "classification": "email",
      "entity": "john.doe@example.com"
    }
  ],
  "masked_email": "Hello, my name is [full_name] and my email is [email]. Please help with my billing issue.",
  "category_of_the_email": "Billing Issues"
}
```
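
To show how the fields in this response relate to each other, the sketch below rebuilds a `masked_email` value from `input_email_body` and `list_of_masked_entities`. It assumes `position` holds `[start, end)` character offsets into the original text (end-exclusive). The `mask_text` helper is hypothetical and is not part of the API.

```python
def mask_text(text, entities):
    """Replace each entity span with a [classification] placeholder.

    Assumes each entity has "position": [start, end) as character offsets
    into the original text and that spans do not overlap.
    """
    # Work right to left so that earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["position"][0], reverse=True):
        start, end = entity["position"]
        text = text[:start] + "[" + entity["classification"] + "]" + text[end:]
    return text


body = (
    "Hello, my name is John Doe and my email is john.doe@example.com. "
    "Please help with my billing issue."
)

# Offsets are computed from the text itself rather than hard-coded.
name_start = body.find("John Doe")
email_start = body.find("john.doe@example.com")
entities = [
    {
        "position": [name_start, name_start + len("John Doe")],
        "classification": "full_name",
        "entity": "John Doe",
    },
    {
        "position": [email_start, email_start + len("john.doe@example.com")],
        "classification": "email",
        "entity": "john.doe@example.com",
    },
]

masked = mask_text(body, entities)
```

Replacing spans from the end of the string backwards avoids having to adjust offsets as the text changes length.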
## API Documentation
FastAPI generates Swagger UI at `/docs` and ReDoc at `/redoc` by default. Depending on how the Space is configured, these paths might not be directly accessible on Hugging Face Spaces; in that case, interact with the `/classify` endpoint directly via POST requests.
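
A direct POST request could be sent with Python's standard library as sketched below. The `build_classify_request` and `classify_email` helpers are illustrative names, not part of this project, and the base URL is whatever your own Space exposes.

```python
import json
import urllib.request


def build_classify_request(base_url, email_body):
    """Build a POST request for the /classify endpoint without sending it."""
    payload = json.dumps({"input_email_body": email_body}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_email(base_url, email_body, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_classify_request(base_url, email_body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `classify_email("https://your-username-your-space-name.hf.space", "Please help with my billing issue.")` would return the response dictionary, from which `category_of_the_email` and `masked_email` can be read.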
## Project Structure
```
.
├── README.md
├── requirements.txt
├── main.py             # FastAPI application entry point
├── models.py           # ML model definitions and training logic
├── utils.py            # Utility functions for text processing
├── label_encoder.pkl   # Label encoder for classification
├── pca.pkl             # PCA model for dimensionality reduction
└── mlp_model.pth       # Trained MLP model weights
```
## Deployment on Hugging Face Spaces
To deploy this application on Hugging Face Spaces:
1. **Create a new Space** on [https://huggingface.co/spaces](https://huggingface.co/spaces).
2. Choose a **Space name**, select a **license**, and for **Space Hardware**, the "Free" tier should be sufficient for this type of API.
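3. Under **SDK**, select **"Docker"**, matching `sdk: docker` in the front matter. A "Static" Space only serves static files and cannot run a FastAPI server.
4. Add these files to the Space's repository, for example by pushing them with git or syncing them from your **GitHub repository**.
5. A Docker Space builds its image from a `Dockerfile` at the repository root, so that file needs to install the dependencies from `requirements.txt`; they are not detected automatically.
6. The container then has to start the server, for example with `uvicorn main:app --host 0.0.0.0 --port 80`. Set `app_port` in the front matter to the same port, because Docker Spaces expect `7860` by default.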
Ensure all your model files (`label_encoder.pkl`, `pca.pkl`, `mlp_model.pth`) are present in your repository at the root level or in the same directory as `main.py`.