2015-12-17 3 views
9

Il mio lavoro di scintilla sembra passare molto tempo a ottenere blocchi. A volte lo farà per un'ora o 2. Ho 1 partizione per il mio set di dati, quindi non sono sicuro del motivo per cui sta facendo così tanto mischiare. Qualcuno sa cosa sta succedendo esattamente qui?Cosa sta succedendo quando Spark chiama ShuffleBlockFetcherIterator?

15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:07:46 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:07:46 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 
15/12/16 18:07:47 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks 
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks 
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 

risposta

0

ShuffleBlockFetcherIterator è un Iterator Scala che recupera più blocchi casuale (aka mischiare le uscite mappa) dal BlockManagers locali e remoti.

Consente di eseguire iterazioni su una sequenza di blocchi come coppie (BlockId, InputStream) in modo che un chiamante possa gestire i blocchi shuffle in modo pipeline non appena vengono ricevuti.

Per prestazioni: è necessario regolare le operazioni; o configs.