Spark NLP

Provides accurate NLP annotations for scalable machine learning pipelines with 160+ pretrained models in 20+ languages.

Made by Unknown Author

    What is Spark NLP?

    Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark ML. It delivers high-performance, accurate NLP annotations for scalable machine learning pipelines in a distributed environment. With 160+ pre-trained pipelines and models spanning more than 20 languages, Spark NLP covers a wide range of NLP tasks and applications.

    Highlights

    • Scalable distributed NLP capabilities leveraging Apache Spark ML
    • 160+ pre-trained pipelines and models across 20+ languages
    • Support for a diverse range of NLP tasks and applications
    • Open-source with an active community (3.3K GitHub stars, 663 forks)

    Features

      • Regex Matching

      • Chunking

      • Word Embeddings (GloVe and Word2Vec)

      • Lemmatizer

      • Part-of-speech tagging

      • Text Matching

      • Spell Checker (ML and DL models)

      • Sentence Detector

      • Stemmer

      • Dependency parsing (Labeled/unlabeled)

      • BERT Embeddings

      • Chunk Embeddings

      • Date Matcher

      • Sentiment Detection (ML models)

      • Normalizer

      • NGrams

      • Tokenization

      • Stop Words Removal

      • Universal Sentence Encoder

      • Sentence Embeddings

      • ELMo Embeddings
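
To make a few of the listed features concrete, here is a minimal plain-Python sketch of what Tokenization, Normalizer, Stop Words Removal, and NGrams compute when chained together. It is not Spark NLP's API and does not run on Spark. The function names, the output keys, and the tiny stop-word list are all assumptions made for illustration, and the real annotators are configurable in ways this sketch ignores.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "in", "and"}


def tokenize(text):
    """Split raw text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def normalize(tokens):
    """Lowercase each token and drop any non-alphanumeric characters."""
    cleaned = (re.sub(r"[^a-z0-9]", "", t.lower()) for t in tokens)
    return [t for t in cleaned if t]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n=2):
    """Return all contiguous n-token sequences, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def run_pipeline(text, n=2):
    """Run the stages in order, keeping every intermediate output."""
    tokens = tokenize(text)
    normalized = normalize(tokens)
    filtered = remove_stop_words(normalized)
    return {
        "token": tokens,
        "normalized": normalized,
        "clean": filtered,
        "ngrams": ngrams(filtered, n),
    }


if __name__ == "__main__":
    result = run_pipeline("The quick brown fox jumps over the lazy dog.")
    for stage, output in result.items():
        print(stage, output)
```

Each stage consumes the previous stage's output and adds its own result, which mirrors the general idea of an annotation pipeline where later steps build on earlier ones.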