
How do I train the Stanford Parser with the Genia Corpus?

itbloger 2020. 8. 31. 07:32


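I am having some problems creating a new model for the Stanford Parser.

I have also downloaded the latest version from Stanford.

And here is the Genia Corpus, which comes in two formats: XML and PTB (Penn Treebank).

The Stanford Parser can be trained on PTB files. Since I want to work with biomedical text, I downloaded the Genia Corpus. (link no longer available) (genia_ptb.tar.gz)

I then have a short Main class to get the dependency representation of a single biomedical sentence.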

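    String treebankPath = "/stanford-parser-2012-05-22/genia_ptb/GENIA_treebank_v1/ptb";

    // Default options; they select EnglishTreebankParserParams, as the log shows.
    Options op = new Options();
    Treebank tr = op.tlpParams.diskTreebank();
    tr.loadPath(treebankPath);  // read the PTB-format trees from disk
    LexicalizedParser lpc = LexicalizedParser.trainFromTreebank(tr, op);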

I have tried other approaches, but I always get the same result.

The error occurs on the last line. This is my output:

Currently Fri Jun 01 15:02:57 CEST 2012
Options parameters:
useUnknownWordSignatures 2
smoothInUnknownsThreshold 100
smartMutation false
useUnicodeType false
unknownSuffixSize 1
unknownPrefixSize 1
flexiTag true
useSignatureForKnownSmoothing false
parserParams edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams
forceCNF false
doPCFG true
doDep false
freeDependencies false
directional true
genStop true
distance true
coarseDistance false
dcTags false
nPrune false
Train parameters: smooth=false PA=true GPA=false selSplit=true (400.0; deleting [VP^SQ, VP^VP, VP^SINV, VP^NP]) mUnary=1 mUnaryTags=false sPPT=false tagPA=true tagSelSplit=false (0.0) rightRec=true leftRec=false collinsPunc=false markov=true mOrd=2 hSelSplit=true (10) compactGrammar=3 postPA=false postGPA=false selPSplit=false (0.0) tagSelPSplit=false (0.0) postSplitWithBase=false fractionBeforeUnseenCounting=0.5 openClassTypesThreshold=50 preTransformer=null taggedFiles=null
Using EnglishTreebankParserParams splitIN=4 sPercent=true sNNP=0 sQuotes=false sSFP=false rbGPA=false j#=false jJJ=false jNounTags=false sPPJJ=false sTRJJ=false sJJCOMP=false sMoreLess=false unaryDT=true unaryRB=true unaryPRP=false reflPRP=false unaryIN=false sCC=1 sNT=false sRB=false sAux=2 vpSubCat=false mDTV=2 sVP=3 sVPNPAgr=false sSTag=0 mVP=false sNP%=0 sNPPRP=false dominatesV=1 dominatesI=false dominatesC=false mCC=0 sSGapped=4 numNP=false sPoss=1 baseNP=1 sNPNNP=0 sTMP=1 sNPADV=1 cTags=true rightPhrasal=false gpaRootVP=false splitSbar=0 mPPTOiIN=0
Binarizing trees...done. Time elapsed: 141 ms
Extracting PCFG...done. Time elapsed: 56 ms
Compiling grammar...done Time elapsed: 1 ms
Extracting Lexicon...Exception in thread "main" edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer
    at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.initializeTraining(
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromTreebank(
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.trainFromTreebank(
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.trainFromTreebank(
    at ABravoDemo.main(
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer
    at edu.stanford.nlp.util.MetaClass.createFactory(
    at edu.stanford.nlp.util.MetaClass.createInstance(
    at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(
    ... 5 more
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer
    at Method)
    at java.lang.ClassLoader.loadClass(
    at sun.misc.Launcher$AppClassLoader.loadClass(
    at java.lang.ClassLoader.loadClass(
    at java.lang.ClassLoader.loadClassInternal(
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(
    at edu.stanford.nlp.util.MetaClass$ClassFactory.construct(
    at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(
    at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(
    at edu.stanford.nlp.util.MetaClass.createFactory(
    ... 7 more

How can I create a new model with this corpus?

As andrucz stated in his comment, the real cause of your problem seems to stem from a missing class.

Check that you imported the Stanford Parser library correctly, and make sure that it contains the class EnglishUnknownWordModelTrainer in the package edu.stanford.nlp.parser.lexparser.

(If you're using Maven, verify that you added the dependency correctly. A quick Google search brought this up: Stanford Parser Maven Repo.)
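One way to confirm whether the class is visible at runtime is a small reflection check. The following is only a minimal sketch: `ClasspathCheck` and `isOnClasspath` are hypothetical names, not part of the Stanford Parser, and the class name is the one reported in the stack trace above. Run it with the same classpath you use for training.

```java
// Hypothetical helper (not part of the Stanford Parser): reports whether a
// class can be located on the classpath this program was started with.
class ClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the ClassNotFoundException in the question.
        String name = "edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

If this prints "NOT found", the jar on your classpath probably does not contain the trainer class, and the download or the dependency declaration is the place to look.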

Did the NLP library install correctly? Check the logs to verify that there are no errors. Most of the time this issue arises when the Stanford NLP library was not installed correctly.

A quick way to check is to run the GUI and try out the parser. If it runs successfully, the library is installed correctly; if it throws errors, the installation is broken.

The Stanford website also mentions this:

If you're new to parsing, you can start by running the GUI to try out the parser. Scripts are included for Linux and for Windows (lexparser-gui.bat). Take a look at the Javadoc lexparser package documentation and LexicalizedParser class documentation. (Point your web browser at the index.html file in the included javadoc directory and navigate to those items.) Look at the parser FAQ for answers to common questions. If none of that helps, please see our email guidelines for instructions on how to reach us for further assistance.

Check that you have imported the library correctly and that it contains the class EnglishUnknownWordModelTrainer. Also make sure that the version you downloaded works with the Genia Corpus.
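To see which jar, if any, actually ships the class, you can scan the jars on the classpath without loading anything. This is a rough sketch under stated assumptions: `JarScan`, `toEntryName` and `jarsContaining` are hypothetical helper names, and only plain `.jar` entries of `java.class.path` are inspected (directories and wildcard entries are ignored).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not part of the Stanford Parser): lists the classpath
// jars that contain the .class file for a given fully qualified class name.
class JarScan {

    /** Converts "a.b.C" into the jar entry name "a/b/C.class". */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every .jar element of the given classpath string that contains the class. */
    static List<String> jarsContaining(String className, String classPath) {
        String entry = toEntryName(className);
        List<String> hits = new ArrayList<>();
        for (String element : classPath.split(File.pathSeparator)) {
            if (!element.endsWith(".jar")) {
                continue; // directories and empty elements are skipped in this sketch
            }
            try (JarFile jar = new JarFile(element)) {
                if (jar.getJarEntry(entry) != null) {
                    hits.add(element);
                }
            } catch (IOException e) {
                // Missing or unreadable jar: ignore it and keep scanning.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Class name copied from the ClassNotFoundException in the question.
        String name = "edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer";
        System.out.println(jarsContaining(name, System.getProperty("java.class.path")));
    }
}
```

An empty list suggests that none of the jars you put on the classpath contains the trainer class, which would match the exception in the question.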

