bert-dp-second

This model is a fine-tuned version of an unspecified base model on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2321
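
The base model and downstream task are not documented in this card; assuming the checkpoint is a BERT-style masked language model hosted on the Hugging Face Hub, a minimal loading sketch could look like the following (the repo id `bert-dp-second` is a placeholder for the actual Hub path):

```python
# Hypothetical usage sketch: assumes this checkpoint is a masked language model.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "bert-dp-second"  # placeholder: replace with the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask inference with the model's own mask token
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"The capital of France is {tokenizer.mask_token}."))
```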

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 19
  • mixed_precision_training: Native AMP
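
As referenced above, here is a minimal sketch of how these values map onto `transformers.TrainingArguments`, assuming the standard `Trainer` API was used; model, dataset, and data-collator setup are omitted:

```python
# Sketch only: mirrors the hyperparameters listed above (transformers 4.26.x API).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-dp-second",      # placeholder output directory
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=19,
    fp16=True,                        # "Native AMP" mixed-precision training
)
```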

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|--------------:|------:|------:|----------------:|
| 7.3416 | 0.23  | 500   | 6.6532 |
| 6.5752 | 0.47  | 1000  | 6.5275 |
| 6.4866 | 0.7   | 1500  | 6.4720 |
| 6.4273 | 0.93  | 2000  | 6.4540 |
| 6.4036 | 1.17  | 2500  | 6.4236 |
| 6.3779 | 1.4   | 3000  | 6.4018 |
| 6.3528 | 1.63  | 3500  | 6.3768 |
| 6.3258 | 1.87  | 4000  | 6.3679 |
| 6.3009 | 2.1   | 4500  | 6.3305 |
| 6.2646 | 2.33  | 5000  | 6.3142 |
| 6.2583 | 2.57  | 5500  | 6.3004 |
| 6.2223 | 2.8   | 6000  | 6.2605 |
| 6.1941 | 3.03  | 6500  | 6.2353 |
| 6.1382 | 3.27  | 7000  | 6.2095 |
| 6.1301 | 3.5   | 7500  | 6.1774 |
| 6.09   | 3.73  | 8000  | 6.1480 |
| 6.0624 | 3.97  | 8500  | 6.1061 |
| 6.0056 | 4.2   | 9000  | 6.0655 |
| 5.9444 | 4.43  | 9500  | 5.9461 |
| 5.7101 | 4.67  | 10000 | 5.2594 |
| 5.005  | 4.9   | 10500 | 4.7348 |
| 4.6127 | 5.13  | 11000 | 4.4626 |
| 4.3907 | 5.37  | 11500 | 4.2862 |
| 4.241  | 5.6   | 12000 | 4.1701 |
| 4.1286 | 5.83  | 12500 | 4.0673 |
| 4.0151 | 6.07  | 13000 | 3.9967 |
| 3.934  | 6.3   | 13500 | 3.9292 |
| 3.8789 | 6.53  | 14000 | 3.8707 |
| 3.8231 | 6.77  | 14500 | 3.8222 |
| 3.7696 | 7.0   | 15000 | 3.7800 |
| 3.7078 | 7.23  | 15500 | 3.7424 |
| 3.6671 | 7.47  | 16000 | 3.7093 |
| 3.6446 | 7.7   | 16500 | 3.6780 |
| 3.6069 | 7.93  | 17000 | 3.6476 |
| 3.5782 | 8.17  | 17500 | 3.6283 |
| 3.5384 | 8.4   | 18000 | 3.6098 |
| 3.5245 | 8.63  | 18500 | 3.5942 |
| 3.5209 | 8.87  | 19000 | 3.5841 |
| 3.4948 | 9.1   | 19500 | 3.5728 |
| 3.4877 | 9.33  | 20000 | 3.5692 |
| 3.4818 | 9.57  | 20500 | 3.5641 |
| 3.4844 | 9.8   | 21000 | 3.5640 |
| 3.5323 | 10.03 | 21500 | 3.6026 |
| 3.5123 | 10.27 | 22000 | 3.5877 |
| 3.5046 | 10.5  | 22500 | 3.5595 |
| 3.4787 | 10.73 | 23000 | 3.5403 |
| 3.4568 | 10.97 | 23500 | 3.5125 |
| 3.4154 | 11.2  | 24000 | 3.4916 |
| 3.3998 | 11.43 | 24500 | 3.4749 |
| 3.3986 | 11.67 | 25000 | 3.4578 |
| 3.372  | 11.9  | 25500 | 3.4405 |
| 3.3402 | 12.13 | 26000 | 3.4317 |
| 3.3281 | 12.37 | 26500 | 3.4215 |
| 3.322  | 12.6  | 27000 | 3.4093 |
| 3.3198 | 12.83 | 27500 | 3.4026 |
| 3.3039 | 13.07 | 28000 | 3.3971 |
| 3.296  | 13.3  | 28500 | 3.3954 |
| 3.3015 | 13.53 | 29000 | 3.3954 |
| 3.2939 | 13.77 | 29500 | 3.3927 |
| 3.3013 | 14.0  | 30000 | 3.3918 |
| 3.343  | 14.23 | 30500 | 3.4265 |
| 3.3438 | 14.47 | 31000 | 3.4133 |
| 3.3397 | 14.7  | 31500 | 3.3951 |
| 3.3156 | 14.93 | 32000 | 3.3681 |
| 3.2815 | 15.17 | 32500 | 3.3503 |
| 3.2654 | 15.4  | 33000 | 3.3313 |
| 3.2492 | 15.63 | 33500 | 3.3184 |
| 3.2399 | 15.87 | 34000 | 3.2995 |
| 3.2222 | 16.1  | 34500 | 3.2922 |
| 3.2026 | 16.33 | 35000 | 3.2818 |
| 3.191  | 16.57 | 35500 | 3.2723 |
| 3.1825 | 16.8  | 36000 | 3.2640 |
| 3.1691 | 17.03 | 36500 | 3.2530 |
| 3.1656 | 17.27 | 37000 | 3.2487 |
| 3.1487 | 17.5  | 37500 | 3.2419 |
| 3.1635 | 17.73 | 38000 | 3.2411 |
| 3.1675 | 17.97 | 38500 | 3.2330 |
| 3.1422 | 18.2  | 39000 | 3.2344 |
| 3.1443 | 18.43 | 39500 | 3.2331 |
| 3.1425 | 18.67 | 40000 | 3.2348 |
| 3.139  | 18.9  | 40500 | 3.2321 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 1.11.0+cu113
  • Datasets 2.13.0
  • Tokenizers 0.13.3