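A simple SparkNLP example (MRS-offline)
Prerequisites:
-
Create an MRS 2.1.0 non-secure cluster
-
Sample code
On the node that submits the job (for example master1), the code files are stored in /opt/Bigdata/program
The code of spark_nlp_demo.py is as follows: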
import numpy as np
import sparknlp
from pyspark import SparkConf, SparkContext
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import SparkSession
from sparknlp.base import Finisher

conf = SparkConf().setAppName("Bird")
sc = SparkContext.getOrCreate(conf)
spark = SparkSession(sc)
sparknlp.start()

# Convert the annotation columns into plain string arrays (e.g. finished_token)
finisher = Finisher().setInputCols(["token", "lemmas", "pos"])
# Load the pretrained pipeline that was uploaded to HDFS
loadedDocumentPipeline = PipelineModel.load('hdfs:///sparknlp/explain_document_ml_en_2.4.0_2.4_1580252705962/')
pipeline = Pipeline().setStages([
    loadedDocumentPipeline,
    finisher
])

sentences = [
    ['Hello, this is an example sentence'],
    ['And this is a second sentence.']
]
data = spark.createDataFrame(sentences).toDF("text")
model = pipeline.fit(data)
annotations_finished_df = model.transform(data)
annotations_finished_df.select('finished_token').show(truncate=False)

# RDD sanity check: sum the integers 1..20 across 3 partitions
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])
da = sc.parallelize(a)
rep = da.repartition(3)
total = rep.reduce(lambda a, b: a + b)
print(total)
sc.stop()
-
Upload the model files to the "hdfs:///sparknlp/" directory
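The upload step can be scripted. The following is only a minimal sketch: it assumes the `hdfs` client is on the PATH of the submit node and that the model has already been unpacked locally. The local directory and the helper names `build_upload_commands` and `upload_model` are hypothetical and not part of the original setup.

```python
import subprocess

# Hypothetical local location of the unpacked model; adjust to your node.
LOCAL_MODEL_DIR = "/opt/Bigdata/program/explain_document_ml_en_2.4.0_2.4_1580252705962"
# Matches the hdfs:///sparknlp/ prefix used by PipelineModel.load in the demo.
HDFS_TARGET_DIR = "/sparknlp"


def build_upload_commands(local_dir, hdfs_dir):
    """Return the `hdfs dfs` commands that create the target dir and upload the model."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # -f overwrites an existing copy; the directory lands under hdfs_dir.
        ["hdfs", "dfs", "-put", "-f", local_dir, hdfs_dir],
    ]


def upload_model(local_dir=LOCAL_MODEL_DIR, hdfs_dir=HDFS_TARGET_DIR):
    """Run the upload commands in order, stopping at the first failure."""
    for cmd in build_upload_commands(local_dir, hdfs_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `upload_model()` on the submit node would run both commands; `build_upload_commands` can be printed first to check the paths.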
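Procedure:
- Upload the dependency packages
1.1 Upload sparknlp.zip to the "hdfs:///sparknlp/" directory. Use the sparknlp.zip provided at the link below:
https://bbs.huaweicloud.com/blogs/300155
1.2 Download spark-nlp-spark23-assembly-3.0.3.jar from the link above and place it in /opt/Bigdata/program
- Run the test in cluster mode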
/opt/client/Spark/spark/bin/spark-submit \
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=./sparknlp.zip/sparknlp/bin/python \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./sparknlp.zip/sparknlp/bin/python \
  --master yarn-cluster \
  --archives hdfs:///sparknlp/sparknlp.zip \
  --jars /opt/Bigdata/program/spark-nlp-spark23-assembly-3.0.3.jar \
  /opt/Bigdata/program/spark_nlp_demo.py
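The submit command is long, so it can help to assemble it from named parts before running it. The sketch below only rebuilds the command shown above; `build_submit_command` is a hypothetical helper, and the comment about how YARN exposes the archive reflects my understanding of `--archives` rather than anything stated in the original post.

```python
import shlex

SPARK_SUBMIT = "/opt/client/Spark/spark/bin/spark-submit"
PROGRAM_DIR = "/opt/Bigdata/program"
# Assumption: YARN unpacks the archive in the container working directory
# under its file name, so the bundled interpreter is reachable at this path.
ARCHIVE_PYTHON = "./sparknlp.zip/sparknlp/bin/python"


def build_submit_command():
    """Return the spark-submit argument list used for the cluster-mode test."""
    return [
        SPARK_SUBMIT,
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=" + ARCHIVE_PYTHON,
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + ARCHIVE_PYTHON,
        "--master", "yarn-cluster",
        "--archives", "hdfs:///sparknlp/sparknlp.zip",
        "--jars", PROGRAM_DIR + "/spark-nlp-spark23-assembly-3.0.3.jar",
        PROGRAM_DIR + "/spark_nlp_demo.py",
    ]


# Print the command as a single shell-safe line for review or copy-paste.
print(" ".join(shlex.quote(arg) for arg in build_submit_command()))
```

The list form can also be passed directly to `subprocess.run(..., check=True)` on the submit node, which avoids shell quoting issues.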