Spaces:
Sleeping
Sleeping
title: Email Classification and PII Masking API | |
emoji: π§ | |
colorFrom: blue | |
colorTo: purple | |
sdk: docker | |
sdk_version: latest | |
app_file: main.py | |
pinned: false | |
models: | |
- Davlan/bert-base-multilingual-cased-ner-hrl | |
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | |
# Email Classification and PII Masking API | |
This FastAPI application provides an API for classifying emails and masking Personally Identifiable Information (PII) in text. | |
## Features | |
- PII Detection and Masking | |
- Full names | |
- Email addresses | |
- Phone numbers | |
- Dates of birth | |
- Aadhar numbers | |
- Credit/Debit card numbers | |
- CVV numbers | |
- Card expiry dates | |
- Email Classification using MLP model | |
- Multilingual support using BERT-based models | |
## Setup | |
1. Create a virtual environment: | |
```bash | |
python -m venv venv | |
source venv/bin/activate # On Windows: venv\Scripts\activate | |
``` | |
2. Install dependencies: | |
```bash | |
pip install -r requirements.txt | |
``` | |
3. Download required model files: | |
- `label_encoder.pkl` | |
- `pca.pkl` | |
- `mlp_model.pth` | |
Place these files in the same directory as `main.py`. | |
## Usage | |
1. Start the FastAPI server: | |
```bash | |
uvicorn main:app --reload --host 0.0.0.0 --port 80 | |
``` | |
**Note for Hugging Face Spaces:** We explicitly bind to `0.0.0.0` and port `80`, which is typically required by Spaces. | |
2. The API will be available at the Space's URL (e.g., `https://your-username-your-space-name.hf.space`). | |
3. API Endpoints: | |
- **POST `/classify`**: Classify and mask PII in email text | |
- **Input:** JSON with `input_email_body` field | |
```json | |
{ | |
"input_email_body": "Hello, my name is John Doe and my email is john.doe@example.com. Please help with my billing issue." | |
} | |
``` | |
- **Output:** JSON with masked text, detected entities, and classification | |
```json | |
{ | |
"input_email_body": "Hello, my name is John Doe and my email is john.doe@example.com. Please help with my billing issue.", | |
"list_of_masked_entities": [ | |
{ | |
"position": [ | |
16, | |
24 | |
], | |
"classification": "full_name", | |
"entity": "John Doe" | |
}, | |
{ | |
"position": [ | |
39, | |
60 | |
], | |
"classification": "email", | |
"entity": "john.doe@example.com" | |
} | |
], | |
"masked_email": "Hello, my name is [full_name] and my email is [email]. I'm having trouble with my account.", | |
"category_of_the_email": "Billing Issues" | |
} | |
``` | |
## API Documentation | |
Once the server is running on Hugging Face Spaces, the Swagger UI and ReDoc documentation endpoints might not be directly accessible via the standard `/docs` and `/redoc` paths in a "Static" Space setup. You would typically interact with the `/classify` endpoint directly via POST requests. | |
## Project Structure | |
``` | |
. | |
βββ README.md | |
βββ requirements.txt | |
βββ main.py # FastAPI application entry point | |
βββ models.py # ML model definitions and training logic | |
βββ utils.py # Utility functions for text processing | |
βββ label_encoder.pkl # Label encoder for classification | |
βββ pca.pkl # PCA model for dimensionality reduction | |
βββ mlp_model.pth # Trained MLP model weights | |
``` | |
## Deployment on Hugging Face Spaces | |
To deploy this application on Hugging Face Spaces: | |
1. **Create a new Space** on [https://huggingface.co/spaces](https://huggingface.co/spaces). | |
2. Choose a **Space name**, select a **license**, and for **Space Hardware**, the "Free" tier should be sufficient for this type of API. | |
3. Crucially, under **SDK**, select **"Static"**. | |
4. In your Space's settings, link your **GitHub repository** containing these files. | |
5. Hugging Face Spaces will automatically detect the `requirements.txt` and install the dependencies. | |
6. It will then look for an `app_file` specified in the frontmatter (`main.py` in this case) to run. For a "Static" Space running a FastAPI application, it will execute `uvicorn main:app --host 0.0.0.0 --port 80`. | |
Ensure all your model files (`label_encoder.pkl`, `pca.pkl`, `mlp_model.pth`) are present in your repository at the root level or in the same directory as `main.py`. |