User talk:AndreaWest/WDQS Q and A

From Wikitech

Wikidata Query Questions

Hi @AndreaWest: regarding some of your questions:

Question: Do queries use SPARQL functions such as CONCAT? If so, what functions are used?
Question: What SPARQL functions are used and does their use correlate with timeouts occurring?

  • You can expect pretty much all of the functions defined in section 17 of https://www.w3.org/TR/sparql11-query to have been used for some purpose or another. One way to get some idea of which functions are being used, and for what, is to search the archives at d:WD:RAQ -- e.g. for pages containing occurrences of CONCAT( : [1]
As to whether their use correlates with timeouts, I doubt that the functions themselves are causing timeouts -- but it may be that more ambitious queries are more likely both to use less common parts of the language and to hit the 60-second limit. And I suppose one does expect REGEX() or string functions generally to be somewhat more expensive, and so to reduce the amount that can be achieved in 60 seconds. But most timeouts, I suspect, occur simply because the current solution set has become very large at some point in the query -- for reasons that may or may not be avoidable, depending on what the query is trying to achieve.
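If a dump of submitted queries were available, a rough first pass at the "which functions are used" question could look like the sketch below. It is only an illustration: the function list is a small, non-exhaustive sample of section 17, the names `SPARQL_FUNCTIONS` and `count_functions` are made up here, and plain pattern matching will also count function names that happen to appear inside comments or string literals.

```python
import re
from collections import Counter

# A small, non-exhaustive sample of SPARQL 1.1 section 17 function names
# (chosen for illustration only).
SPARQL_FUNCTIONS = [
    "CONCAT", "REGEX", "STR", "STRDT", "SUBSTR",
    "REPLACE", "LANG", "YEAR", "COALESCE", "IF",
]

# A function name as a whole word, followed by an opening parenthesis.
_FUNC_RE = re.compile(
    r"\b(" + "|".join(SPARQL_FUNCTIONS) + r")\s*\(", re.IGNORECASE
)

def count_functions(queries):
    """Count, per function, how many queries call it at least once."""
    counts = Counter()
    for query in queries:
        # Use a set so a function is counted once per query.
        used = {m.group(1).upper() for m in _FUNC_RE.finditer(query)}
        counts.update(used)
    return counts
```

Joining these per-query results against whether each query timed out would give a first look at any correlation, though it would not by itself separate the cost of the functions from the size of the solution set.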

Question: Do queries ever use the SPARQL forms, CONSTRUCT, INSERT, DELETE and DESCRIBE? (Or only SELECT and ASK)

  • As ordinary users, I don't think we have the write permissions to INSERT and DELETE, so I think these may not be used -- at least, not by regular users.
At least one example of DESCRIBE and one of CONSTRUCT can be found in the RAQ archives. I suspect they are very rarely used by the local community here, given that you can't then INSERT the results. But if you looked at a dump of submitted queries, you might find them used more often by 'external' users, who might e.g. also be working with their own SPARQL installation elsewhere, or might have some particular use for an output of triples.
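For anyone wanting to check this against a dump of submitted queries, here is a minimal sketch of classifying each query by its leading keyword. The helper name `query_form` is hypothetical, and the approach is deliberately naive: it strips IRIs and comments and then takes the first keyword it finds, rather than parsing the query properly, so an unusual prologue could mislead it.

```python
import re

# The four SPARQL query forms, plus the two update keywords asked about.
_KEYWORD_RE = re.compile(
    r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE|INSERT|DELETE)\b", re.IGNORECASE
)

def query_form(query):
    """Return the first form/update keyword in the query, or None."""
    # Drop IRIs first, since they may legitimately contain '#'.
    text = re.sub(r"<[^<>\s]*>", " ", query)
    # Then drop '#' comments up to the end of each line.
    text = re.sub(r"#[^\n]*", " ", text)
    match = _KEYWORD_RE.search(text)
    return match.group(1).upper() if match else None
```

Tallying the results with `collections.Counter` over a query log would then show how rare CONSTRUCT and DESCRIBE really are.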

Question: For all wikidata, what is the count/prevalence of items that are only used as subjects (NOT used as objects)?
Question: Same question as above but for scholarly article items only

  • Interesting questions. But remember, even when items are not currently used as objects, they may have been created for the potential to be used as objects. In particular, I suspect only a very small proportion of scholarly article items are currently used as objects, but it's often been argued that they're good to have here for the potential to use them as objects of references.

GeoSPARQL

  • Not sure if the present limited GeoSPARQL-like functionality was added in-house, or whether WMF paid the Blazegraph dev team to implement it, or whether the Blazegraph team were promising it and this is as far as they got, but that may explain why only a very limited barebones subset of features was implemented. AIUI the full GeoSPARQL standard is pretty vast, with all its requirements for identifying object overlapping etc.
One thing of note may be Blazegraph's capability to allow custom datatypes to be added, and functions to be extended/overloaded for them, beyond the ones defined in section 17.1 of the standard [2]. I believe this may have been used to add geo:wktLiteral as a type, one that STRDT() and STR() can be used to convert from and to strings. It may be that other custom types have been defined too. See [3], in particular the paragraph [4]
One thing that the user community might definitely appreciate would be a more flexible datatype for representing times than the basic xsd:dateTime, which (AFAIK) cannot represent the precision of dates -- so dates with only precision to a year appear in query outputs with "1 January" as a spurious day and month. An extended type with more flexibility would be valuable, so long as all functions like < = > day() month() year() max() min() str() strdt() etc were extended/overloaded to be defined for it.
Hope these thoughts help. d:User:Jheald (talk) 20:00, 28 January 2022 (UTC)
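On the date-precision point above: until something like an extended time type exists, the spurious "1 January" can only be hidden on the client side, by fetching the precision alongside the value and trimming the output. A minimal sketch follows, assuming the precision arrives as Wikidata's integer code (9 = year, 10 = month, 11 = day, as exposed through wikibase:timePrecision); the helper name `format_with_precision` is made up for illustration, and this is a workaround rather than the overloaded datatype described above.

```python
import re

# Leading part of an xsd:dateTime lexical form: optional sign, year, month, day.
_DATE_RE = re.compile(r"^([+-]?\d{4,})-(\d{2})-(\d{2})T")

def format_with_precision(value, precision):
    """Render a dateTime string no more precisely than `precision` allows.

    Only the year (9), month (10) and day (11) precision codes are handled.
    """
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError(f"not an xsd:dateTime value: {value!r}")
    year, month, day = match.groups()
    if precision == 9:
        return year
    if precision == 10:
        return f"{year}-{month}"
    if precision == 11:
        return f"{year}-{month}-{day}"
    raise ValueError(f"unsupported precision code: {precision}")
```

The drawback is that every consumer has to repeat this step, and comparisons or min()/max() inside the query still see the padded value, which is the argument for handling precision in the datatype itself.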

Extending types and overloading operators

Hi @AndreaWest. Under "Other Things to Consider" you write that "Use of custom data types and functions ... would require both a change to Wikidata's ontology and then new SPARQL functions. Such a change would be out-of-scope for this work on Blazegraph alternatives."

I agree that actually implementing any custom data types and functions would be out-of-scope.

However, considering whether it is possible to define custom data types, and to write code that overloads functions to apply to them appropriately, surely should be in scope, to indicate how open the various shortlist candidates may be to such future extensions. (Indeed, some might even have additional types pre-rolled.)

An additional use for such functionality might be to define a "URL + link text" type that a query variable could be cast to (or emitted as, by some function). It would be useful if the GUI could identify such an object and present it as linked text in a single column, saving the screen space taken by the two separate columns for URL and text that are now required.

The functionality could also have other benefits in future, to accommodate new desirable types, in the way that e.g. Blazegraph was able to accommodate WKT objects because it had this extensibility. Jheald (talk) 23:13, 27 June 2022 (UTC)

@Jheald It appears that custom datatypes can be supported by Apache Jena (https://jena.apache.org/documentation/notes/typed-literals.html), RDF4J (extending https://rdf4j.org/javadoc/3.3.0/org/eclipse/rdf4j/rio/DatatypeHandler.html) and Virtuoso (http://docs.openlinksw.com/virtuoso/datatypes/). Similar to custom functions, QLever would require manipulating the code directly and would likely require help from its development team. Andrea Westerinen (talk) 16:04, 28 June 2022 (UTC)
Thanks for checking this out. It's good to know that the option is there, if thought useful down the line. Jheald (talk) 17:48, 28 June 2022 (UTC)