We provide the following JSON-returning Web Services: language identification, lemmatization, morphological analysis, inflection and hyphenation.
The tools backing these services are mostly not originally our own, but we've wrapped them for your convenience. For specifics, see the details of each service. For general questions about this service, contact eetu.makela@aalto.fi.
Tries to recognize the language of an input. Call with e.g.
/las/identify?text=The+quick+brown+fox+jumps+over+the+lazy+dog
or with a list of possible locales, e.g. /las/identify?text=The+quick+brown+fox+jumps+over+the+lazy+dog&locales=fi&locales=en&locales=sv
Also available using HTTP POST with parameters given either as form-urlencoded or JSON. For intensive use, there is also a JSON-understanding WebSocket-version at /las/identifyWS. All methods are CORS-enabled.
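As a rough illustration of the GET form described above, the following Python sketch builds the request URL with one locales parameter per candidate language. The host in BASE_URL is a placeholder and the helper name identify_url is made up for this example; only the /las/identify path and the text and locales parameters come from the description above.

```python
from urllib.parse import urlencode

# Placeholder host (an assumption): substitute the address of the LAS instance you use.
BASE_URL = "https://example.org"


def identify_url(text, locales=()):
    """Build a GET URL for /las/identify, repeating `locales` once per candidate."""
    params = {"text": text}
    if locales:
        params["locales"] = list(locales)
    # doseq=True expands the list into locales=fi&locales=en&...
    return BASE_URL + "/las/identify?" + urlencode(params, doseq=True)


url = identify_url("The quick brown fox jumps over the lazy dog", ["fi", "en", "sv"])
print(url)
```

The resulting query string has the same shape as the example call above; the URL can then be fetched with any HTTP client.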
Returns results as JSON, e.g.:
{"locale":"en","certainty":0.6803500000000001,"details":{"languageRecognizerResults":{"en":0.1973},"languageDetectorResults":[{"en":1.0}],"hfstAcceptorResults":[{"en":0.84375},{"fi":0.09375},{"sme":0.010416666666666666},{"sv":0.010416666666666666},{"la":0.010416666666666666},{"tr":0.010416666666666666},{"de":0.010416666666666666},{"it":0.010416666666666666}]}}
When called without parameters but with an Accept header other than text/html, returns the supported locales as JSON, e.g.:
{"acceptedLocales":["af","an","ar","ast","be","bg","bn","br","ca","cs","cy","da","de","el","en","es","et","eu","fa","fi","fr","ga","gl","gu","he","hi","hr","ht","hu","id","is","it","ja","km","kn","ko","la","liv","lt","lv","mdf","mhr","mk","ml","mr","mrj","ms","mt","myv","ne","nl","no","oc","pa","pl","pt","ro","ru","sk","sl","sme","so","sq","sr","sv","sw","ta","te","th","tl","tr","udm","uk","ur","vi","yi","zh-CN","zh-TW"]}
Pretty printing is enabled with the boolean parameter pretty.
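A minimal Python sketch of reading the identification response, using the sample result shown earlier in this section. The helper name summarize is hypothetical, and merging hfstAcceptorResults into a single ranking is just one possible way to use the details; it is not something the service prescribes.

```python
import json

# Sample identification response copied from this section.
sample = (
    '{"locale":"en","certainty":0.6803500000000001,"details":{'
    '"languageRecognizerResults":{"en":0.1973},'
    '"languageDetectorResults":[{"en":1.0}],'
    '"hfstAcceptorResults":[{"en":0.84375},{"fi":0.09375},'
    '{"sme":0.010416666666666666},{"sv":0.010416666666666666},'
    '{"la":0.010416666666666666},{"tr":0.010416666666666666},'
    '{"de":0.010416666666666666},{"it":0.010416666666666666}]}}'
)


def summarize(raw):
    """Return (locale, certainty, ranked hfstAcceptorResults) from a raw JSON reply."""
    result = json.loads(raw)
    # hfstAcceptorResults is a list of single-entry objects; merge them into one dict.
    per_locale = {}
    for entry in result["details"].get("hfstAcceptorResults", []):
        per_locale.update(entry)
    ranked = sorted(per_locale.items(), key=lambda item: item[1], reverse=True)
    return result["locale"], result["certainty"], ranked


locale, certainty, ranked = summarize(sample)
print(locale, certainty, ranked[:2])
```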
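In total, the service supports 78 locales, combining results from three sources, which appear in the details of the response as languageRecognizerResults, languageDetectorResults and hfstAcceptorResults.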
Lemmatizes the input into its base form.
Call with e.g. /las/baseform?text=Albert+osti+fagotin+ja+t%C3%B6r%C3%A4ytti+puhkuvan+melodian.&locale=fi
or just /las/baseform?text=The+quick+brown+fox+jumps+over+the+lazy+dog to guess locale.
Also available using HTTP POST with parameters given either as form-urlencoded or JSON. For intensive use, there is also a JSON-understanding WebSocket-version at /las/baseformWS. All methods are CORS-enabled.
Returns results as JSON, e.g. "Albert ostaa fagotti ja töräyttää puhkua melodia." when the locale is given, or {"locale":"en","baseform":"the quick brown fox jump over the lazy dog"} when the locale is guessed.
When called without parameters but with an Accept header other than text/html, returns the 21 supported locales as JSON. A boolean segment parameter can be set to segment compound words with a '#'. The boolean guess parameter controls whether baseforms are guessed for unknown words. An optional depth parameter of 0 or 1 selects less or more in-depth analysis (default 1). Pretty printing is enabled with the boolean parameter pretty.
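The POST variant with a JSON body might be prepared as in the Python sketch below. It assumes the JSON object uses the same names as the query parameters (text, locale, segment, guess, depth, pretty), which the description implies but does not spell out. BASE_URL is a placeholder host and baseform_request is a hypothetical helper.

```python
import json
from urllib.request import Request

# Placeholder host (an assumption): substitute the address of the LAS instance you use.
BASE_URL = "https://example.org"

# Option names taken from the parameter description of the baseform service.
ALLOWED_OPTIONS = {"segment", "guess", "depth", "pretty"}


def baseform_request(text, locale=None, **options):
    """Prepare (but do not send) a JSON POST request for /las/baseform."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError("unsupported options: " + ", ".join(sorted(unknown)))
    if options.get("depth", 1) not in (0, 1):
        raise ValueError("depth must be 0 or 1 for the baseform service")
    payload = {"text": text, **options}
    if locale is not None:
        payload["locale"] = locale  # omit to let the service guess the locale
    return Request(
        BASE_URL + "/las/baseform",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = baseform_request(
    "Albert osti fagotin ja töräytti puhkuvan melodian.", locale="fi", segment=True
)
print(req.get_method(), req.full_url)
```

Sending the prepared request is then a matter of passing it to urllib.request.urlopen and decoding the reply as JSON.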
Uses finite state transducers provided by the HFST, Omorfi and Giellatekno projects where available (locales de, en, fi, fr, it, la, liv, mdf, mhr, mrj, myv, sme, sv, tr, udm). Note that the quality and scope of the lemmatization varies wildly between languages.
Snowball stemmers are used for locales dk, es, nl, no, pt and ru. They are not used for de, en, fi, fr, it or sv, which are covered by the transducers above.
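The split between the two backends can be written down as a small lookup table. This Python sketch is purely illustrative client-side bookkeeping, not the service's implementation; the locale codes are copied as listed above, and the function name lemmatizer_backend is made up.

```python
# Locale codes copied from the lists above.
TRANSDUCER_LOCALES = {
    "de", "en", "fi", "fr", "it", "la", "liv", "mdf",
    "mhr", "mrj", "myv", "sme", "sv", "tr", "udm",
}
SNOWBALL_LOCALES = {"dk", "es", "nl", "no", "pt", "ru"}


def lemmatizer_backend(locale):
    """Return which kind of tool lemmatizes `locale`, or None if it is unsupported."""
    if locale in TRANSDUCER_LOCALES:
        return "transducer"
    if locale in SNOWBALL_LOCALES:
        return "snowball"
    return None


# 15 transducer locales plus 6 Snowball locales give the 21 supported locales.
print(len(TRANSDUCER_LOCALES | SNOWBALL_LOCALES))
print(lemmatizer_backend("fi"), lemmatizer_backend("ru"), lemmatizer_backend("ja"))
```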
Gives a morphological analysis of the text. Call with e.g. /las/analyze?text=Albert+osti&locale=fi&forms=V+N+Nom+Sg&forms=N+Nom+Pl
or just /las/analyze?text=Bier+bitte to guess locale.
Also available using HTTP POST with parameters given either as form-urlencoded or JSON. For intensive use, there is also a JSON-understanding WebSocket-version at /las/analyzeWS. All methods are CORS-enabled.
Returns results as JSON, e.g.:
[ { "word" : "Albert", "analysis" : [ { "weight" : 0.099609375, "wordParts" : [ { "lemma" : "Albert", "tags" : { "SEGMENT" : [ "Albert" ], "KTN" : [ "5" ], "UPOS" : [ "PROPN" ], "NUM" : [ "SG" ], "PROPER" : [ "LAST" ], "BASEFORM_FREQUENCY" : [ "2712" ], "CASE" : [ "NOM" ] } } ], "globalTags" : { "HEAD" : [ "3" ], "FIRST_IN_SENTENCE" : [ "TRUE" ], "DEPREL" : [ "nsubj" ], "POS_MATCH" : [ "TRUE" ], "BEST_MATCH" : [ "TRUE" ], "BASEFORM_FREQUENCY" : [ "2712" ] } }, { "weight" : 0.099609375, "wordParts" : [ { "lemma" : "Albert", "tags" : { "SEGMENT" : [ "Albert" ], "KTN" : [ "5" ], "UPOS" : [ "PROPN" ], "NUM" : [ "SG" ], "SEM" : [ "MALE" ], "PROPER" : [ "FIRST" ], "BASEFORM_FREQUENCY" : [ "2712" ], "CASE" : [ "NOM" ] } } ], "globalTags" : { "HEAD" : [ "3" ], "FIRST_IN_SENTENCE" : [ "TRUE" ], "DEPREL" : [ "nsubj" ], "POS_MATCH" : [ "TRUE" ], "BEST_MATCH" : [ "TRUE" ], "BASEFORM_FREQUENCY" : [ "2712" ] } } ] }, { "word" : " ", "analysis" : [ { "weight" : 1.0, "wordParts" : [ { "lemma" : " ", "tags" : { } } ], "globalTags" : { "WHITESPACE" : [ "TRUE" ], "BEST_MATCH" : [ "TRUE" ] } } ] }, { "word" : "osti", "analysis" : [ { "weight" : 0.099609375, "wordParts" : [ { "lemma" : "ostaa", "tags" : { "TENSE" : [ "PAST" ], "SEGMENT" : [ "ost", "{MB}i" ], "KTN" : [ "53" ], "UPOS" : [ "VERB" ], "MOOD" : [ "INDV" ], "PERS" : [ "SG3" ], "INFLECTED_FORM" : [ "V N Nom Sg" ], "VOICE" : [ "ACT" ], "INFLECTED" : [ "ostaminen" ], "BASEFORM_FREQUENCY" : [ "4034" ] } } ], "globalTags" : { "HEAD" : [ "0" ], "DEPREL" : [ "ROOT" ], "POS_MATCH" : [ "TRUE" ], "BEST_MATCH" : [ "TRUE" ], "BASEFORM_FREQUENCY" : [ "4034" ] } } ] } ]
or
{ "locale" : "de", "analysis" : [ { "word" : "Bier", "analysis" : [ { "weight" : 1.0, "wordParts" : [ { "lemma" : "Bier", "tags" : { "Neut" : [ "Neut" ], "Sg" : [ "Sg" ], "+NN" : [ "+NN" ], "Nom" : [ "Nom" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "Bier", "tags" : { "Neut" : [ "Neut" ], "Sg" : [ "Sg" ], "Dat" : [ "Dat" ], "+NN" : [ "+NN" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "Bier", "tags" : { "Akk" : [ "Akk" ], "Neut" : [ "Neut" ], "Sg" : [ "Sg" ], "+NN" : [ "+NN" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } } ] }, { "word" : " ", "analysis" : [ { "weight" : 1.0, "wordParts" : [ { "lemma" : " ", "tags" : { } } ], "globalTags" : { "WHITESPACE" : [ "TRUE" ] } } ] }, { "word" : "bitte", "analysis" : [ { "weight" : 1.0, "wordParts" : [ { "lemma" : "bitten", "tags" : { "Sg" : [ "Sg" ], "+V" : [ "+V" ], "1" : [ "1" ], "Konj" : [ "Konj" ], "Pres" : [ "Pres" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "bitten", "tags" : { "Sg" : [ "Sg" ], "Ind" : [ "Ind" ], "+V" : [ "+V" ], "1" : [ "1" ], "Pres" : [ "Pres" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "bitten", "tags" : { "Sg" : [ "Sg" ], "+V" : [ "+V" ], "Konj" : [ "Konj" ], "3" : [ "3" ], "Pres" : [ "Pres" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "bitten", "tags" : { "Sg" : [ "Sg" ], "+V" : [ "+V" ], "Imp" : [ "Imp" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "bitte", "tags" : { "+PTKL" : [ "+PTKL" ], "Ant" : [ "Ant" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } }, { "weight" : 1.0, "wordParts" : [ { "lemma" : "bitte", "tags" : { "+ADV" : [ "+ADV" ] } } ], "globalTags" : { "BEST_MATCH" : [ "TRUE" ] } } ] } ] }
When called without parameters but with an Accept header other than text/html, returns the 15 supported locales as JSON, e.g.:
{"acceptedLocales":["de","en","fi","fr","it","la","liv","mdf","mhr","mrj","myv","sme","sv","tr","udm"]}
A boolean segment parameter can be set to segment compound words with a '#'. The boolean guess parameter controls whether baseforms are guessed for unknown words. An optional depth parameter of 0-2 selects less or more in-depth analysis (default 2). Pretty printing is enabled with the boolean parameter pretty.
The analysis web service also supports inflection, using the same parameters as the inflection service.
Uses finite state transducers provided by the HFST, Omorfi and Giellatekno projects. Note that the quality and scope of analysis as well as tags returned vary wildly between languages.
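To make the structure of the analysis result more concrete, the Python sketch below walks a trimmed-down copy of the Finnish example and collects one lemma per word. It assumes that the first analysis tagged BEST_MATCH is the one wanted and that the lemmas of several wordParts can simply be concatenated; both are choices made for this illustration, and the function name best_lemmas is hypothetical. Because tags vary between languages, real code should not rely on any particular tag being present.

```python
def best_lemmas(response):
    """Collect the lemma of the first BEST_MATCH analysis of every non-whitespace word."""
    # When the locale is guessed, the word list is wrapped in {"locale": ..., "analysis": [...]}.
    words = response["analysis"] if isinstance(response, dict) else response
    lemmas = []
    for word in words:
        for analysis in word["analysis"]:
            tags = analysis.get("globalTags", {})
            if "WHITESPACE" in tags:
                break  # skip whitespace tokens entirely
            if tags.get("BEST_MATCH") == ["TRUE"]:
                # Assumption: join the parts of a compound without a separator.
                lemmas.append("".join(part["lemma"] for part in analysis["wordParts"]))
                break
    return lemmas


# Trimmed-down version of the Finnish example response (most tags omitted).
sample = [
    {"word": "Albert", "analysis": [
        {"weight": 0.099609375,
         "wordParts": [{"lemma": "Albert", "tags": {"UPOS": ["PROPN"]}}],
         "globalTags": {"BEST_MATCH": ["TRUE"]}}]},
    {"word": " ", "analysis": [
        {"weight": 1.0,
         "wordParts": [{"lemma": " ", "tags": {}}],
         "globalTags": {"WHITESPACE": ["TRUE"], "BEST_MATCH": ["TRUE"]}}]},
    {"word": "osti", "analysis": [
        {"weight": 0.099609375,
         "wordParts": [{"lemma": "ostaa", "tags": {"UPOS": ["VERB"]}}],
         "globalTags": {"BEST_MATCH": ["TRUE"]}}]},
]

lemmas = best_lemmas(sample)
print(lemmas)
```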
Transforms the text given a set of inflection forms, by default also converting words not matching the inflection forms to their base form. Call with e.g. /las/inflect?text=Albert+osti+fagotin&forms=V+N+Nom+Sg&forms=N+Nom+Pl&segment=true
or /las/inflect?text=Albert+osti+fagotin&forms=V+N+Nom+Sg&forms=N+Nom+Pl
Also available using HTTP POST with parameters given either as form-urlencoded or JSON. For intensive use, there is also a JSON-understanding WebSocket-version at /las/inflectWS. All methods are CORS-enabled.
Returns results as JSON, e.g. "Albert ostaminen fagotit".
When called without parameters but with an Accept header other than text/html, returns the 14 supported locales as JSON, e.g. {"acceptedLocales":["de","en","fi","fr","it","liv","mdf","mhr","mrj","myv","sme","sv","tr","udm"]}.
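Querying the supported locales could look like the following Python sketch. The Accept value application/json is an assumption (the description only requires something other than text/html), BASE_URL is a placeholder host, and the helper names are made up. The sample reply is the inflection locale list quoted above.

```python
import json
from urllib.request import Request

# Placeholder host (an assumption): substitute the address of the LAS instance you use.
BASE_URL = "https://example.org"


def locales_request(service):
    """Prepare a parameterless GET request that asks `service` for its locales."""
    return Request(BASE_URL + "/las/" + service, headers={"Accept": "application/json"})


def supports(raw_response, locale):
    """Check whether `locale` is listed in an acceptedLocales reply."""
    return locale in json.loads(raw_response)["acceptedLocales"]


# Locale list of the inflection service, copied from this section.
sample = ('{"acceptedLocales":["de","en","fi","fr","it","liv","mdf",'
          '"mhr","mrj","myv","sme","sv","tr","udm"]}')

req = locales_request("inflect")
print(req.get_method(), req.full_url)
print(supports(sample, "fi"), supports(sample, "la"))
```

Checking the list first avoids sending text in a locale that a given service does not handle; for example, la is accepted by the analysis service but is absent from the inflection list.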
A boolean segment parameter can be set to segment compound words with a '#'. The boolean guess parameter controls whether baseforms are guessed for unknown words.
The boolean baseform parameter decides whether uninflected words are returned in their baseform or original form. Pretty printing is enabled with the boolean parameter pretty.
Uses finite state transducers provided by the HFST, Omorfi and Giellatekno projects. Note that the inflection form syntaxes differ wildly between languages.
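A form-urlencoded POST for the inflection service might be prepared as sketched below in Python. The forms parameter is repeated once per inflection form, as in the GET examples. BASE_URL is a placeholder host and inflect_request is a hypothetical helper; writing booleans as lowercase true/false follows the segment=true example but is otherwise an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host (an assumption): substitute the address of the LAS instance you use.
BASE_URL = "https://example.org"


def inflect_request(text, forms, **options):
    """Prepare (but do not send) a form-urlencoded POST request for /las/inflect."""
    fields = {"text": text, "forms": list(forms)}
    for name, value in options.items():
        # Assumption: booleans are written as lowercase true/false.
        fields[name] = str(value).lower() if isinstance(value, bool) else value
    body = urlencode(fields, doseq=True)  # doseq repeats forms= once per entry
    return Request(
        BASE_URL + "/las/inflect",
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = inflect_request("Albert osti fagotin", ["V N Nom Sg", "N Nom Pl"], segment=True)
print(req.data.decode("ascii"))
```

Since the inflection form syntax differs between languages, the forms strings here are only valid for the Finnish example.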
Hyphenates the given text. Call with e.g. /las/hyphenate?text=Albert+osti+fagotin+ja+t%C3%B6r%C3%A4ytti+puhkuvan+melodian.&locale=fi
or just /las/hyphenate?text=ein+Bier+bitte to guess locale.
Also available using HTTP POST with parameters given either as form-urlencoded or JSON. For intensive use, there is also a JSON-understanding WebSocket-version at /las/hyphenateWS. All methods are CORS-enabled.
Returns results as JSON, e.g. "al-bert os-ti fa-go-tin ja tö-räyt-ti puh-ku-van me-lo-dian ." when the locale is given, or {"locale":"fi","hyphenation":"ein bier bit-te"} when the locale is guessed.
When called without parameters but with an Accept header other than text/html, returns the 46 supported locales as JSON, e.g.:
{"acceptedLocales":["bg","ca","cop","cs","cy","da","el","es","et","eu","fi","fr","ga","gl","hr","hsb","hu","ia","in","is","it","la","liv","mdf","mhr","mn","mrj","myv","nb","nl","nn","pl","pt","ro","ru","sa","sh","sk","sl","sme","sr","sv","tr","udm","uk","zh"]}
Pretty printing is enabled with the boolean parameter pretty.
Uses finite state transducers provided by the HFST, Omorfi and Giellatekno projects. Those provided by HFST have been automatically translated from the TeX CTAN distribution's hyphenation rulesets.
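On the client side, a hyphenated result can be split into per-word fragments, as in this small Python sketch. It accepts both reply shapes shown in the hyphenation examples (a plain string, or an object with a hyphenation field); the function name split_hyphenation is made up. Note that a word containing a genuine hyphen cannot be told apart from a hyphenation point with this approach.

```python
def split_hyphenation(response):
    """Split a hyphenation reply into one list of fragments per token."""
    # When the locale is guessed, the text is wrapped in {"locale": ..., "hyphenation": ...}.
    text = response["hyphenation"] if isinstance(response, dict) else response
    return [token.split("-") for token in text.split()]


explicit = split_hyphenation("al-bert os-ti fa-go-tin ja tö-räyt-ti puh-ku-van me-lo-dian .")
guessed = split_hyphenation({"locale": "fi", "hyphenation": "ein bier bit-te"})
print(explicit[:3])
print(guessed)
```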