What is Spark NLP?
Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark ML. It provides fast, accurate NLP annotations for machine learning pipelines that scale in a distributed environment. With 160+ pre-trained pipelines and models covering more than 20 languages, Spark NLP supports a wide range of NLP tasks and applications.
Highlights
- Scalable distributed NLP capabilities leveraging Apache Spark ML
- 160+ pre-trained pipelines and models across 20+ languages
- Support for a diverse range of NLP tasks and applications
- Open-source with an active community (3.3K GitHub stars, 663 forks)
Features
Regex Matching
Chunking
Word Embeddings (GloVe and Word2Vec)
Lemmatizer
Part-of-speech tagging
Text Matching
Spell Checker (ML and DL models)
Sentence Detector
Stemmer
Dependency parsing (Labeled/unlabeled)
BERT Embeddings
Chunk Embeddings
Date Matcher
Sentiment Detection (ML models)
Normalizer
NGrams
Tokenization
Stop Words Removal
Universal Sentence Encoder
Sentence Embeddings
ELMo Embeddings
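To make a few of the features above concrete, here is a minimal plain-Python sketch of what tokenization, stop words removal, NGrams, and regex matching produce when chained as stages. It does not use Spark NLP or Spark at all: the function names, the tiny stop-word set, and the acronym pattern are all assumptions made for illustration, and the real annotators are distributed and far more capable.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "in", "and", "to"}


def tokenize(text: str) -> List[str]:
    """Tokenization: split text into lowercase alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Stop words removal: drop tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """NGrams: join every run of n consecutive tokens with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_acronyms(text: str) -> List[str]:
    """Regex matching: find words made of two or more capital letters."""
    return re.findall(r"\b[A-Z]{2,}\b", text)


def annotate(text: str) -> Dict[str, List[str]]:
    """Run the stages in order and collect each stage's output by name."""
    tokens = tokenize(text)
    clean = remove_stop_words(tokens)
    return {
        "token": tokens,
        "clean": clean,
        "bigram": ngrams(clean, 2),
        "acronym": match_acronyms(text),
    }


result = annotate("Spark NLP builds on the Spark ML pipeline API.")
for name, values in result.items():
    print(name, values)
```

Each stage reads the output of an earlier one and adds its own named result, which is the general shape of an annotation pipeline; in Spark NLP the stages are Spark ML pipeline components that operate on DataFrame columns instead of Python lists.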