How can I speed up aggregation queries?

The following is the aggregation query:

[ 
    { 
    "$match": { 
     "UserId": { 
     "$in": [ 
      5 
     ] 
     }, 
     "WorkflowStartTime": { 
     "$gte": ISODate('2015-04-09T00:00:00.000Z'), 
     "$lte": ISODate('2015-04-16T00:00:00.000Z') 
     } 
    } 
    }, 
    { 
    "$group": { 
     "_id": { 
     "Task": "$TaskId", 
     "WorkflowId": "$WorkflowInstanceId" 
     }, 
     "TaskName": { 
     "$first": "$Task" 
     }, 
     "StartTime": { 
     "$first": "$StartTime" 
     }, 
     "EndTime": { 
     "$last": "$EndTime" 
     }, 
     "LastExecutionTime": { 
     "$last": "$StartTime" 
     }, 
     "WorkflowName": { 
     "$first": "$WorkflowName" 
     } 
    } 
    }, 
    { 
    "$project": { 
     "_id": 1, 
     "LastExecutionTime": 1, 
     "TaskName": 1, 
     "AverageExecutionTime": { 
     "$subtract": [ 
      "$EndTime", 
      "$StartTime" 
     ] 
     }, 
     "WorkflowName": 1 
    } 
    }, 
    { 
    "$group": { 
     "_id": "$_id.Task", 
     "LastExecutionTime": { 
     "$last": "$LastExecutionTime" 
     }, 
     "AverageExecutionTime": { 
     "$avg": "$AverageExecutionTime" 
     }, 
     "TaskName": { 
     "$first": "$TaskName" 
     }, 
     "TotalInstanceCount": { 
     "$sum": 1 
     }, 
     "WorkflowName": { 
     "$first": "$WorkflowName" 
     } 
    } 
    }, 
    { 
    "$project": { 
     "Id": "$_id", 
     "_id": 0, 
     "Name": "$TaskName", 
     "LastExecutionDate": { 
     "$substr": [ 
      "$LastExecutionTime", 
      0, 
      30 
     ] 
     }, 
     "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime", 
     "TotalInstanceCount": "$TotalInstanceCount", 
     "WorkflowName": 1 
    } 
    } 
]

The documents in my collection look like this:

{ 
     "_id" : ObjectId("550ff07ce4b09bf056df4ac1"), 
     "OutputData" : "xyz", 
     "InputData" : null, 
     "Location" : null, 
     "ChannelName" : "XYZ", 
     "UserId" : 5, 
     "TaskId" : 95, 
     "ChannelId" : 5, 
     "Status" : "Success", 
     "TaskTypeId" : 7, 
     "WorkflowId" : 37, 
     "Task" : "XYZ", 
     "WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"), 
     "EndTime" : ISODate("2015-03-23T05:22:44Z"), 
     "StartTime" : ISODate("2015-03-23T05:22:44Z"), 
     "TaskType" : "TRIGGER", 
     "WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce", 
     "StackTrace" : null, 
     "WorkflowName" : "XYZ data workflow" 
}

I have an index on {WorkflowStartTime: 1, UserId: 1, StartTime: 1}.

There are only about 900,000 records in the collection, and even though the date range restricts the query to a subset of the data, it still takes 1.5 to 1.7 seconds. I have used the aggregation framework on other collections with huge amounts of data and the performance was very good. I don't know what is wrong with this query that makes it so slow; I expect it to run in milliseconds, like a real-time analytics query. Any pointers would be appreciated.

Output when {explain: true} is added to the aggregation query:

{ 
    "stages": [ 


     { 
      "$cursor": { 
      "query": { 
       "UserId": { 
       "$in": [ 
        5 
       ] 
       }, 
       "WorkflowStartTime": { 
       "$gte": "ISODate(2015-04-09T00:00:00Z)", 
       "$lte": "ISODate(2015-04-16T00:00:00Z)" 
       } 
      }, 
      "fields": { 
       "EndTime": 1, 
       "StartTime": 1, 
       "Task": 1, 
       "TaskId": 1, 
       "WorkflowInstanceId": 1, 
       "WorkflowName": 1, 
       "_id": 0 
      }, 
      "plan": { 
       "cursor": "BtreeCursor ", 
       "isMultiKey": false, 
       "scanAndOrder": false, 
       "indexBounds": { 
       "WorkflowStartTime": [ 
        [ 
        "ISODate(2015-04-16T00:00:00Z)", 
        "ISODate(2015-04-09T00:00:00Z)" 
        ] 
       ], 
       "UserId": [ 
        [ 
        5, 
        5 
        ] 
       ] 
       }, 
       "allPlans": [ 
       { 
        "cursor": "BtreeCursor ", 
        "isMultiKey": false, 
        "scanAndOrder": false, 
        "indexBounds": { 
        "WorkflowStartTime": [ 
         [ 
         "ISODate(2015-04-16T00:00:00Z)", 
         "ISODate(2015-04-09T00:00:00Z)" 
         ] 
        ], 
        "UserId": [ 
         [ 
         5, 
         5 
         ] 
        ] 
        } 
       } 
       ] 
      } 
      } 
     }, 
     { 
      "$group": { 
      "_id": { 
       "Task": "$TaskId", 
       "WorkflowId": "$WorkflowInstanceId" 
      }, 
      "TaskName": { 
       "$first": "$Task" 
      }, 
      "StartTime": { 
       "$first": "$StartTime" 
      }, 
      "EndTime": { 
       "$last": "$EndTime" 
      }, 
      "LastExecutionTime": { 
       "$last": "$StartTime" 
      }, 
      "WorkflowName": { 
       "$first": "$WorkflowName" 
      } 
      } 
     }, 
     { 
      "$project": { 
      "_id": true, 
      "LastExecutionTime": true, 
      "TaskName": true, 
      "AverageExecutionTime": { 
       "$subtract": [ 
       "$EndTime", 
       "$StartTime" 
       ] 
      }, 
      "WorkflowName": true 
      } 
     }, 
     { 
      "$group": { 
      "_id": "$_id.Task", 
      "LastExecutionTime": { 
       "$last": "$LastExecutionTime" 
      }, 
      "AverageExecutionTime": { 
       "$avg": "$AverageExecutionTime" 
      }, 
      "TaskName": { 
       "$first": "$TaskName" 
      }, 
      "TotalInstanceCount": { 
       "$sum": { 
       "$const": 1 
       } 
      }, 
      "WorkflowName": { 
       "$first": "$WorkflowName" 
      } 
      } 
     }, 
     { 
      "$project": { 
      "_id": false, 
      "Id": "$_id", 
      "Name": "$TaskName", 
      "LastExecutionDate": { 
       "$substr": [ 
       "$LastExecutionTime", 
       { 
        "$const": 0 
       }, 
       { 
        "$const": 30 
       } 
       ] 
      }, 
      "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime", 
      "TotalInstanceCount": "$TotalInstanceCount", 
      "WorkflowName": true 
      } 
     } 
     ], 
     "ok": 1 
    }
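
For readers who want to reproduce the explain output above, the following is a minimal sketch of how it can be requested from the mongo shell. The collection name `taskLogs` is hypothetical (the question does not state it), and only the `$match` stage is repeated here; the remaining stages of the pipeline would follow unchanged. Dates are written with `new Date(...)` so the snippet also loads outside the shell.

```javascript
// Hypothetical collection name; the question does not say what it is called.
var COLLECTION = "taskLogs";

// Only the $match stage is reproduced; append the other stages as needed.
var pipeline = [
  {
    $match: {
      UserId: { $in: [5] },
      WorkflowStartTime: {
        $gte: new Date("2015-04-09T00:00:00.000Z"),
        $lte: new Date("2015-04-16T00:00:00.000Z")
      }
    }
  }
];

// `db` exists only inside the mongo shell; the guard keeps the file loadable elsewhere.
if (typeof db !== "undefined") {
  // The explain option asks the server to describe the plan instead of returning results.
  printjson(db.getCollection(COLLECTION).aggregate(pipeline, { explain: true }));
}
```

The exact shape of the returned document depends on the server version, which may explain why the output differs from what the current documentation shows.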


2015-04-15 Ninad

[What does the explain output say about the query](http://docs.mongodb.org/manual/reference/command/aggregate/)? – Philipp

@Philipp I have added the explain output. This is all the explain output ever shows me, but the docs show something different. Am I missing something? – Ninad

It says it is using a `BtreeCursor`, which should be fine: *A query that uses an index has a cursor of type BtreeCursor* (http://openmymind.net/Speedig-Up-Queries-Understanding-Query-Plans/). Could you run your query without the aggregation framework and tell us how many matching documents it finds? `...` It might also help to flip the index to `{UserId: 1, WorkflowStartTime: 1}`, because your `$match` has `UserId` first and then `WorkflowStartTime`. But it might also bring no benefit. –
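
The check suggested in this comment, counting the matching documents with a plain query, could look roughly like the sketch below. The collection name `taskLogs` is an assumption, and `cursor.count()` is used as it exists in the legacy mongo shell; the filter is copied from the `$match` stage of the question.

```javascript
// Hypothetical collection name; replace it with the real one.
var COLLECTION = "taskLogs";

// Same conditions as the $match stage of the aggregation.
var filter = {
  UserId: { $in: [5] },
  WorkflowStartTime: {
    $gte: new Date("2015-04-09T00:00:00.000Z"),
    $lte: new Date("2015-04-16T00:00:00.000Z")
  }
};

// `db` exists only inside the mongo shell; the guard keeps the file loadable elsewhere.
if (typeof db !== "undefined") {
  // Number of documents the $group stages would have to process.
  print(db.getCollection(COLLECTION).find(filter).count());
}
```

If the count is large, most of the 1.5 to 1.7 seconds may be spent in the two `$group` stages and not in the index lookup.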

The aggregation is not using the index. You need to create a new index:

{UserId:1,WorkflowStartTime:1}
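
As a sketch, the index above could be created from the mongo shell as follows. The collection name `taskLogs` is hypothetical. `createIndex` is the shell helper from MongoDB 3.0 onward; older shells expose the same operation as `ensureIndex`.

```javascript
// Hypothetical collection name; replace it with the real one.
var COLLECTION = "taskLogs";

// Equality field (UserId) first, range field (WorkflowStartTime) second.
var indexSpec = { UserId: 1, WorkflowStartTime: 1 };

// `db` exists only inside the mongo shell; the guard keeps the file loadable elsewhere.
if (typeof db !== "undefined") {
  printjson(db.getCollection(COLLECTION).createIndex(indexSpec));
}
```

Field order matters in a compound index, so this is a different index from the existing `{WorkflowStartTime: 1, UserId: 1, StartTime: 1}` and does not replace it automatically.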

If everything goes well, the explain output of the aggregation should contain this line:

"winningPlan" :...


2015-04-21 04:27:53

My query already uses the index in the `$match` stage, and having WorkflowStartTime before UserId in the index is useful because we run range queries on WorkflowStartTime. – Ninad
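
One way to reason about the field-order question debated in these comments is a toy model in plain JavaScript (runnable with Node.js). It treats an index as a sorted list of key pairs and counts how many entries lie between the first and the last matching key. This is a deliberate simplification and not how MongoDB traverses its B-trees; the data, the `Day` field, and the helper names are all made up for illustration, so the real explain output remains the authority.

```javascript
// Toy model: an "index" is a list of [key1, key2] pairs sorted lexicographically.
function buildIndex(docs, fields) {
  return docs
    .map(function (d) { return fields.map(function (f) { return d[f]; }); })
    .sort(function (a, b) { return a[0] - b[0] || a[1] - b[1]; });
}

// Number of index entries between the first and last matching key, inclusive.
function spanOfMatches(index, matches) {
  var first = -1;
  var last = -1;
  index.forEach(function (key, i) {
    if (matches(key)) {
      if (first < 0) first = i;
      last = i;
    }
  });
  return first < 0 ? 0 : last - first + 1;
}

// Synthetic data: 10 users, one document per user per day for 30 days.
var docs = [];
for (var day = 1; day <= 30; day++) {
  for (var user = 1; user <= 10; user++) {
    docs.push({ UserId: user, Day: day });
  }
}

var rangeFirst = buildIndex(docs, ["Day", "UserId"]);
var equalityFirst = buildIndex(docs, ["UserId", "Day"]);

// Query: UserId == 5 and 9 <= Day <= 16, which matches 8 documents.
var spanRangeFirst = spanOfMatches(rangeFirst, function (k) {
  return k[1] === 5 && k[0] >= 9 && k[0] <= 16;
});
var spanEqualityFirst = spanOfMatches(equalityFirst, function (k) {
  return k[0] === 5 && k[1] >= 9 && k[1] <= 16;
});

console.log(spanRangeFirst, spanEqualityFirst); // 71 8
```

In this model the matching keys are contiguous when the equality field comes first, while the range-first ordering interleaves them with the keys of every other user in the date window.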

Another option is to change the order of the fields in `$match`: `WorkflowStartTime` first and `UserId` second. I believe the index is not being used in the aggregation. –

In the explain output, the value of **cursor** in the **plan** section is **Btree**, which indicates that the index is used. – Ninad
