News/Actor storage changes on the Wiki Replicas
The Actor table was introduced to MediaWiki by T167246 and related work on the Actor migration project. This page is about related changes that have been made to the Wiki Replica views to be compatible with the MediaWiki operational database schema. These changes to the Wiki Replicas were first announced on 2019-05-17 via the cloud-announce mailing list and then made live on 2019-06-03.
*_user and *_user_text fields have been removed from the Wiki Replica views. This change was made to keep the replicas ahead of changes being made to the production databases in the near future.
An example error message that could be caused by this change is:
ERROR 1054 (42S22): Unknown column 'ar_user' in 'field list'
The fields that were dropped and their replacements (see T223406 for more details):
|Table||Columns removed||Columns added|
The same changes were made to the associated _userindex views such as
Tables with names ending in _compat will be effectively unchanged, but the hidden joins that preserve the legacy *_user and *_user_text fields will make queries involving those tables slower than direct use of the actor table.
Changes to the
revision_userindex view were made to optimize usage of the
rev_actor field in T221339. These changes have the side effect of reducing the speed of queries using the
What should I do?
The best solution
Every table that had user and user_text type fields has a similar actor field. That field is the id to find in the actor table.
ar_user_text from the
archive table would be replaced with a join on the
actor table using the
ar_actor field as
actor_id or a separate query, which can be faster in some cases, to obtain the needed
actor_name associated with a given archive action.
The list of actor fields follows:
What about a quick solution if the refactor is too time consuming at the moment
Tables affected by the actor change also have a _compat view (for example archive_compat) which have the actor table joins already present in the view's code and will present just like the affected table did before the changes took place in the MediaWiki codebase. The _compat views will suffer a performance penalty because of those same joins. In some cases, where performance isn't the most important factor, this may be the quickest way to fix a tool or application. Use of the _compat views should not be considered a permanent solution for a long lived tool. These views will very likely be removed in the future.
The actor table seems really slow--so does comment
The views for actor and comment are burdened by subqueries against 8 other tables in order to ensure that only the correct rows are available. This is being worked on here T215445, but since that's work toward a long-term solution, WMCS has put up a set of sub-views of these tables to make things faster for individual queries (phab:T224850). Each subview only makes a subquery against a single table. This way, if you are looking for an
actor row from the
logging table, then you can query against the
actor_logging view. This view will be more efficient for that case. The
actor_logging view will, naturally, not have any rows available that might be referenced instead in the
revision table or the
Example query using the new views (in this case actor_logging), provided by:
SELECT * FROM logging_userindex JOIN actor_logging ON actor_id = log_actor WHERE actor_name = 'MusikAnimal'
The specialized views that can be used in this case are:
Advanced use cases of specialized views
There are other more clever ways to exploit the fact that these views include a single subquery against the external table in the second part of the name to remove joins or where clauses from your own queries, but we suggest experimenting to ensure you are getting the same results if you try this. An example of this is, if you were just trying to see if user "Foo" had ever made edits to a wiki, you could reasonably get that from
SELECT 1 FROM actor_revision WHERE actor_name = 'Foo' because any rows with that actor name would be rows that are visible in the revision table. This would likely be faster than a join query for that particular piece of information.