Saving a Django model from a Scrapy project
I have a Scrapy project and I'm trying to save the output items as instances of a Django model (I'm not using DjangoItem).
I'm importing the Django settings as specified here.
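```python
def setup_django_env(path):
    import imp, os
    from django.core.management import setup_environ

    # Locate <path>/settings.py and load it as the module "settings"
    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)
    # Point Django at that settings module (setup_environ exists only in Django < 1.6)
    setup_environ(project)

setup_django_env(PATH_TO_DJANGO_PROJECT)
```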
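`imp` is deprecated (and removed in Python 3.12), and `setup_environ` was deprecated in Django 1.4 and removed in Django 1.6. For reference, here is a minimal sketch of the same "load `settings.py` from a directory" step using `importlib`. It assumes Python 3, and `load_settings_module` is a hypothetical helper name. On Django 1.7+ the usual route is instead to set `DJANGO_SETTINGS_MODULE` and call `django.setup()`.

```python
import importlib.util
import os
import sys
import types


def load_settings_module(path: str, name: str = "settings") -> types.ModuleType:
    """Load <path>/<name>.py as a module, like imp.find_module + imp.load_module."""
    file_path = os.path.join(path, name + ".py")
    spec = importlib.util.spec_from_file_location(name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %s from %s" % (name, path))
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as imp.load_module did
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Don't leave a half-initialised module behind on failure
        del sys.modules[name]
        raise
    return module
```

Usage would be `settings = load_settings_module(PATH_TO_DJANGO_PROJECT)`. You would then still need to hand that module to Django in whatever way your Django version supports.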
In my Scrapy project I have a pipeline class that processes all the items and, at the end, saves them to the DB:
```python
from my_django_project.apps.my_books.models import Book, Category, Image

class DjangoPipeline(object):
    def process_item(self, item, spider):
        category = Category.objects.get(name='Horror')
        book = Book(name='something', category=category)
        book.save()
        image = Image(name='something', book=book)
        image.save()
        return item
```
However, something strange happens. The first item raises an error (see below), but all the remaining items are saved fine. For example, if I have 7 items to save, the first one fails and the other 6 are saved.
```
Traceback (most recent call last):
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/scrapy/middleware.py", line 54, in _process_chain
    return process_chain(self.methods[methodname], obj, *args)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/scrapy/utils/defer.py", line 65, in process_chain
    d.callback(input)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 243, in callback
    self._startRunCallbacks(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 312, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/Users/ale/djcode/books/lib/scraper/scraper/djangopipeline.py", line 34, in process_item
    selected_category = Category.objects.get(name='Horror')
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/manager.py", line 132, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 333, in get
    clone = self.filter(*args, **kwargs)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 550, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 568, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1131, in add_q
    can_reuse=used_aliases)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1026, in add_filter
    negate=negate, process_extras=process_extras)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1182, in setup_joins
    field, model, direct, m2m = opts.get_field_by_name(name)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 291, in get_field_by_name
    cache = self.init_name_map()
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 321, in init_name_map
    for f, model in self.get_all_related_m2m_objects_with_model():
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 396, in get_all_related_m2m_objects_with_model
    cache = self._fill_related_many_to_many_cache()
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 410, in _fill_related_many_to_many_cache
    for klass in get_models():
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 167, in get_models
    self._populate()
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 61, in _populate
    self.load_app(app_name, True)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 76, in load_app
    app_module = import_module(app_name)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/utils/importlib.py", line 35, in import_module
    __import__(name)
exceptions.ImportError: No module named my_books
```
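The traceback fails on a bare `import my_books`, raised while Django populates its app cache from `INSTALLED_APPS`. The pipeline, by contrast, imports the models as `my_django_project.apps.my_books`. One plausible, unconfirmed cause is that the directory that makes the bare name `my_books` importable is on `sys.path` when the Django project runs, but not inside the Scrapy process. Below is a minimal stdlib-only sketch for checking this. `find_unimportable` and `ensure_on_path` are hypothetical helper names. Which directory to add (for example the project's `apps` folder) is an assumption about your layout.

```python
import importlib
import importlib.util
import os
import sys
from typing import Iterable, List


def ensure_on_path(directory: str) -> None:
    """Prepend directory to sys.path if it is not already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
        importlib.invalidate_caches()


def find_unimportable(app_names: Iterable[str]) -> List[str]:
    """Return the app names that cannot be located on the current sys.path.

    Note: for dotted names, find_spec imports the parent packages.
    """
    missing = []
    for name in app_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:  # a parent package could not be imported
            found = False
        if not found:
            missing.append(name)
    return missing
```

In the Scrapy process you might call `find_unimportable(settings.INSTALLED_APPS)` to see which entries fail. If needed, call `ensure_on_path(...)` with the directory containing `my_books` before Django loads its apps.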
If I do something like this instead, all 7 items are saved:
```python
from my_django_project.apps.my_app.models import Book, Category, Image

class DjangoPipeline(object):
    def process_item(self, item, spider):
        # Each call is simply retried once if the first attempt raises
        try:
            category = Category.objects.get(name='something')
        except:
            category = Category.objects.get(name='something')
        book = Book(name='something', category=category)
        try:
            book.save()
        except:
            book.save()
        image = Image(name='something', book=book)
        try:
            image.save()
        except:
            image.save()
        return item
```
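This may explain why only the first item fails, and why the retry version appears to work. The first model query makes Django import every app in `INSTALLED_APPS`. If Django 1.x's app cache records an app as handled before attempting its import (an assumption here, not confirmed from the source), the failing app is silently skipped on later calls. The error then disappears, but `my_books` is never properly loaded. The class below is a hypothetical toy model of that pattern, not Django's actual code.

```python
import importlib
from types import ModuleType
from typing import Dict, List


class ToyAppRegistry:
    """Toy model (not Django's code) of an app cache that marks an app as
    handled *before* importing it, so a failed import is never retried."""

    def __init__(self, app_names: List[str]) -> None:
        self.app_names = app_names
        self.handled: Dict[str, None] = {}
        self.modules: Dict[str, ModuleType] = {}

    def populate(self) -> None:
        for name in self.app_names:
            if name in self.handled:
                continue  # skipped on later calls, even if it failed before
            self.handled[name] = None  # recorded before the import is attempted
            self.modules[name] = importlib.import_module(name)
```

Calling `populate()` twice on a registry whose app list contains a missing module raises `ImportError` the first time. The second call returns normally, with the missing app absent from `modules`.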
I don't know what I'm doing wrong. Could someone please help me?
Thanks!
When you refer to my_django_project, do you literally use that name, or do you replace it with your project's name, as in from mysite.apps import *? – emschorsch
I'm replacing that reference with the name of my project :) – Alex
Hi Alex, I'm trying to do what you did and I'm having problems. It seems you figured this out, so I was hoping you'd be willing to look at my [question](http://stackoverflow.com/questions/14686223/scrapy-project-cant-find-django-core-management) and offer some advice. Thanks! – GChorn