
Last Commit
Mar. 24, 2019
Feb. 15, 2018


seqeval is a Python framework for sequence labeling evaluation. seqeval can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling and so on.

seqeval is well tested against the Perl script conlleval, which measures the performance of a system that has processed the CoNLL-2000 shared task data.

Supported features

seqeval supports the following formats:

  • IOB1
  • IOB2
  • IOE1
  • IOE2
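The four schemes differ only in how they mark chunk boundaries. The sketch below is an illustration, not part of seqeval: the helper `spans_to_tags` and the span layout `(type, start, end)` are hypothetical names introduced here. It assumes the usual conventions: IOB2 starts every chunk with `B-`, IOB1 uses `B-` only when a chunk directly follows another chunk of the same type, IOE2 ends every chunk with `E-`, and IOE1 uses `E-` only when a chunk is directly followed by another chunk of the same type.

```python
def spans_to_tags(n_tokens, spans, scheme="IOB2"):
    """Render chunks as tags (hypothetical helper, not a seqeval function).

    spans: sorted, non-overlapping (type, start, end) tuples, end inclusive.
    """
    tags = ["O"] * n_tokens
    for idx, (label, start, end) in enumerate(spans):
        # Every token inside a chunk starts out as I-<type>.
        for i in range(start, end + 1):
            tags[i] = "I-" + label
        prev = spans[idx - 1] if idx > 0 else None
        nxt = spans[idx + 1] if idx + 1 < len(spans) else None
        # Is this chunk adjacent to another chunk of the same type?
        follows_same = prev is not None and prev[0] == label and prev[2] + 1 == start
        precedes_same = nxt is not None and nxt[0] == label and end + 1 == nxt[1]
        if scheme == "IOB2" or (scheme == "IOB1" and follows_same):
            tags[start] = "B-" + label
        if scheme == "IOE2" or (scheme == "IOE1" and precedes_same):
            tags[end] = "E-" + label
    return tags


# Two adjacent PER chunks followed by an isolated LOC chunk.
spans = [("PER", 0, 1), ("PER", 2, 2), ("LOC", 4, 4)]
for scheme in ("IOB1", "IOB2", "IOE1", "IOE2"):
    print(scheme, spans_to_tags(5, spans, scheme))
```

The two adjacent PER chunks are the interesting case: IOB1 and IOE1 mark a boundary only there, while IOB2 and IOE2 mark every chunk.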

seqeval also supports the following metrics:

metrics                                           description
accuracy_score(y_true, y_pred)                    Compute the accuracy.
precision_score(y_true, y_pred)                   Compute the precision.
recall_score(y_true, y_pred)                      Compute the recall.
f1_score(y_true, y_pred)                          Compute the F1 score, also known as balanced F-score or F-measure.
classification_report(y_true, y_pred, digits=2)   Build a text report showing the main classification metrics. digits is the number of digits used to format floating-point values in the output; the default is 2.


Behold, the power of seqeval:

>>> from seqeval.metrics import accuracy_score
>>> from seqeval.metrics import classification_report
>>> from seqeval.metrics import f1_score
>>> y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> f1_score(y_true, y_pred)
0.5
>>> accuracy_score(y_true, y_pred)
0.8
>>> print(classification_report(y_true, y_pred))
             precision    recall  f1-score   support

       MISC       0.00      0.00      0.00         1
        PER       1.00      1.00      1.00         1

avg / total       0.50      0.50      0.50         2
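The report above may look surprising: only two tokens are mislabeled, yet MISC scores 0.00. The reason is that precision, recall, and F1 are computed over whole chunks, and a predicted chunk counts only if its type and both boundaries match exactly. The following is a minimal sketch of that idea for IOB2 tags, written in plain Python. It is not seqeval's implementation; `extract_entities` and `chunk_scores` are hypothetical names used only for illustration.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) chunks in one IOB2-tagged sentence."""
    entities = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open chunk
        prefix, _, kind = tag.partition("-")
        # An open chunk ends at "O", at any "B-", or at "I-" of another type.
        if label is not None and (prefix != "I" or kind != label):
            entities.add((label, start, i - 1))
            start, label = None, None
        # "B-" opens a chunk; a stray "I-" with no open chunk is treated leniently.
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, kind
    return entities


def chunk_scores(y_true, y_pred):
    """Chunk-level precision, recall, and F1 over a list of sentences."""
    true_chunks = {(s, *e) for s, tags in enumerate(y_true) for e in extract_entities(tags)}
    pred_chunks = {(s, *e) for s, tags in enumerate(y_pred) for e in extract_entities(tags)}
    correct = len(true_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(true_chunks) if true_chunks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]

precision, recall, f1 = chunk_scores(y_true, y_pred)

# Token-level accuracy, by contrast, compares tags position by position.
pairs = [(t, p) for ts, ps in zip(y_true, y_pred) for t, p in zip(ts, ps)]
accuracy = sum(t == p for t, p in pairs) / len(pairs)
print(precision, recall, f1, accuracy)
```

Here the predicted MISC chunk spans tokens 2 to 5 while the true one spans tokens 3 to 5, so it does not count as a match. Only the PER chunk is correct, giving one correct chunk out of two predicted and two true, while 8 of the 10 tokens carry the right tag.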


To install seqeval, simply run:

$ pip install seqeval


seqeval requires:

  • numpy >= 1.14.0