Jump to content

LimeSurveyPerformanceRecs

From Wikitech

LimeSurvey Recommendations

Support for InnoDB

LS exposes the $databasetabletype which can be set to InnoDB, however the schema breaks when using InnoDB for large surveys due to storing all of a users responses in a single row. InnoDB has a row limit of 8k (one page) which includes the first 768 bytes of text columns or blobs, and the full length of most other types. Several WMF surveys were designed to allow responses that exceed this, they can't be converted to InnoDB. There's no limit around designing such surveys, so if using InnoDB, we would end up frustrating users who invested time responding to a long survey, only to have it break on them when their complete responses couldn't be saved.

Session handling

Neither of the available handlers are appropriate for the kind of scale wikipedia can generate. The file handler doesn't scale properly across servers, the db based handler places too great of a write burden on the db if exposed to a high number of unique requests. Memcached based sessions would fix this.

Database efficiency

Due to the use of the ADODB ORM, making major improvements here would be a large undertaking, but effective caching should fully mask the issue. ADODB does itself support memcached, though it doesn't appear that LimeSurvey exposes config options to take advantage of it. I suspect that implementing Memcached in LimeSurvey above the ADODB layer would be more effective however, as well as providing LS developers better control over how caching works.

  • When examining database utilization, loading one page of survey presenting 10 questions (out of approximately 80 total) generated 677 db queries. Many of these, such as:
 SELECT defaultvalue FROM `lime_defaultvalues` WHERE qid=7822 AND scale_id=0 AND language='en' LIMIT 1
 SELECT code, answer FROM `lime_answers` WHERE qid=7465 AND code= AND scale_id=0 AND language='en'

iterated through all 80 of the questions in the survey, even though only 10 were being displayed. This pattern repeated for every page view.

  • There were quite a few duplicate queries as well, this was one executed over 40 times per single page view, all on the same qid:
 SELECT code FROM lime_answers WHERE qid=7465 AND language='en'
  • This query was executed 17 times in a page view:
 SELECT * FROM `lime_assessments` WHERE sid=52291 and language='en' ORDER BY scope,id
  • Use of like/wildcards could be eliminated
 SELECT c.qid, c.scenario, c.cqid, c.cfieldname, c.value,  as type, 0 as sid, 0 as gid, c.method, q.gid as srcgid 
 FROM lime_conditions c, lime_questions q WHERE c.qid=q.qid AND c.qid=7470 AND c.cfieldname LIKE '{%' 
 ORDER BY c.scenario, c.cqid, c.cfieldname

As well as the heavy use of ORDER BY on non-indexed columns. Simple sorting can be done very efficiently in php if data needs to be presented in something other than primary-key order.

Summation

Since survey's are unlikely to be edited much once live, they should be highly cacheable. Changes to ensure that InnoDB is usable for all surveys regardless of size and implementation of memcached for session handling and all survey data objects should be sufficient to operate LimeSurvey at Wikipedia's scale.