Our hacks to forcibly set a class-level path and then reuse it in
instances were not working. It seems cleaner to force callers to
explicitly provide the path to the file we are trying to manipulate in
the object, and then that object only handles that path,
explicitly. No more messing around with load-time guessing that
doesn't respect the environment, which should make testing for issue
We also use composition rather than inheritance in the FeedManager
now. The previous method was ambiguous - the manager takes care of
configuration, cache and other data points. When we "add" an
entry - what do we add? Making the storages member objects instead of
parents makes that explicit, at the cost of being a little more
verbose - but that should be Pythonic.
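
A rough sketch of the shape this gives, not the actual feed2exec code:
the `ConfStorage` and `CacheStorage` names, their methods and the
constructor signature are all made up for illustration.

```python
import configparser
import sqlite3


class ConfStorage:
    """Hypothetical config storage: handles only the path it was given."""

    def __init__(self, path):
        self.path = path  # explicit, no class-level or load-time guessing
        self.parser = configparser.ConfigParser()
        self.parser.read(self.path)  # a missing file is silently ignored

    def add(self, name, url):
        self.parser[name] = {'url': url}


class CacheStorage:
    """Hypothetical cache storage backed by a sqlite database."""

    def __init__(self, path):
        self.path = path
        self.conn = sqlite3.connect(self.path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS seen (guid TEXT PRIMARY KEY)')

    def add(self, guid):
        self.conn.execute('INSERT OR IGNORE INTO seen VALUES (?)', (guid,))

    def __contains__(self, guid):
        cur = self.conn.execute('SELECT 1 FROM seen WHERE guid = ?', (guid,))
        return cur.fetchone() is not None


class FeedManager:
    """Storages are members, not parents, so "add" is never ambiguous."""

    def __init__(self, conf_path, db_path):
        self.conf_storage = ConfStorage(conf_path)
        self.cache_storage = CacheStorage(db_path)
```

With this layout a caller writes `manager.conf_storage.add(...)` or
`manager.cache_storage.add(...)`: more verbose, but it says which
storage is touched.
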
This led to all sorts of cleanup: the Feed object doesn't need to be
aware of locking or force, and doesn't handle plugins anymore: it only
parses. That behavior is moved to the FeedManager dispatch command,
which is more logical. This also opens the door to making that parser
more pluggable as well, but it makes the standalone "parse" command a
little less clean: it doesn't work with an empty config anymore, and
indeed, this refactoring makes it impossible to have a FeedManager
that isn't backed by a configuration and database (although you
*could* pass a memory-only database to sqlite and /dev/null as a
config path...)
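
For reference, here is what that workaround relies on, using only the
standard library. Whether the FeedManager constructor accepts these
values as-is is an assumption I have not checked; the table name is
made up.

```python
import configparser
import os
import sqlite3

# ':memory:' is sqlite's special name for a database that lives only in RAM
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE seen (guid TEXT)')  # hypothetical table

# reading the null device yields a configuration with no sections
conf = configparser.ConfigParser()
conf.read(os.devnull)
print(conf.sections())
```
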
The problem this particular change may raise is excessive coupling between
dispatch and Feed - those two are quite tangled now because code has
been bounced back and forth between the two data structures.
We have *one* convenience shortcut between the Manager and config
storage: the "pattern". Because it's actually passed through the
constructor, it seems to me that it makes sense to have it
accessible as a property.
Unfortunately, it's still unclear from the outside what that pattern
applies to in the current API. Reading through the code shows that it
applies only to the config, but that seems rather arbitrary.
We also add __repr__ functions here and there to ease debugging,
especially during tests. This also allows us to log objects easily.
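
The pattern is roughly the following; `ExampleStorage` and its
attributes are placeholders, not the real classes.

```python
class ExampleStorage:
    """Hypothetical storage class; only the __repr__ pattern matters here."""

    def __init__(self, path, pattern=None):
        self.path = path
        self.pattern = pattern

    def __repr__(self):
        # show the class name and the state worth seeing in a log line
        # or in a failed assertion
        return '%s(path=%r, pattern=%r)' % (
            type(self).__name__, self.path, self.pattern)
```

Logging such an object with `logging.debug('%r', storage)` then shows
the path and pattern instead of a bare memory address.
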
Finally, note that plugin *execution* is now done serially, even when
running in parallel. This is a result of splitting plugin execution
out of the parse function, and may lead to performance
degradation. Then again, it may *eventually* improve performance, as we
could execute plugins in parallel with parsing, something we leave for
later.
There could be issues with execution ordering between parsing and
plugin execution: hopefully I am reading the code right and order will
be retained when inspecting that `results` array in fetch, but I could
be mistaken.
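
To illustrate the ordering argument, here is a sketch of the pattern
as I understand it, not the actual fetch code. `parse` and
`run_plugins` are stand-ins, and it uses `ThreadPool` so it runs
anywhere; `multiprocessing.Pool` exposes the same `apply_async`
interface.

```python
from multiprocessing.pool import ThreadPool


def parse(feed):
    """Stand-in for the parsing step, run by a pool worker."""
    return {'name': feed, 'items': [feed + '-item']}


def run_plugins(parsed):
    """Stand-in for plugin execution, run serially by the caller."""
    return 'processed ' + parsed['name']


def fetch(feeds, workers=2):
    with ThreadPool(workers) as pool:
        # handles are appended in submission order...
        results = [pool.apply_async(parse, (feed,)) for feed in feeds]
        # ...and collected in that same order, whichever worker
        # happens to finish first
        parsed = [result.get() for result in results]
    # plugins run one feed at a time, after parsing
    return [run_plugins(p) for p in parsed]
```

Since the list is walked in submission order, the order of the feeds
is retained as long as nothing reorders `results` in between.
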
The main goal of all those changes is to simplify and clean up the test
suite. Now the "db" and "conf" paths are coupled together: one cannot
go without the other, and we directly use the FeedManager object in
the fixture. The fixture is also done per-test, which might slow
things down and hide some bugs, but without this, the tests would just
fail and I want to go green before trying to diagnose or clean things
up any further.
Strangely, there is a performance impact for single-process
runs (~600ms slower per run), but a performance *improvement* in
multiprocessing (~300ms faster per run) - I was expecting the opposite.
This could be considered within the margin of error, however.

Single process
==============

Before
------

In [14]: run -t -N10 -m feed2exec -- fetch -n
IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :      36.67 s,       3.67 s.
  System :       0.83 s,       0.08 s.
Wall time:     173.24 s.

After
-----

IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :      37.30 s,       3.73 s.
  System :       0.90 s,       0.09 s.
Wall time:     179.08 s.

Multi-process
=============

Before
------

In [15]: run -t -N10 -m feed2exec -- fetch -n --parallel
IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :       7.58 s,       0.76 s.
  System :       0.80 s,       0.08 s.
Wall time:     153.56 s.

After
-----

IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :      12.03 s,       1.20 s.
  System :       0.92 s,       0.09 s.
Wall time:     150.33 s.
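
The per-run differences quoted earlier come from the wall times above,
divided by the number of runs. A quick check of that arithmetic, with
the values copied by hand from the timings:

```python
RUNS = 10

# total wall times in seconds for 10 runs, copied from the timings above
single_before, single_after = 173.24, 179.08
multi_before, multi_after = 153.56, 150.33

# positive means slower after the change, negative means faster
single_delta_ms = (single_after - single_before) / RUNS * 1000
multi_delta_ms = (multi_after - multi_before) / RUNS * 1000

print('single process: %+.0f ms per run' % single_delta_ms)
print('multi-process:  %+.0f ms per run' % multi_delta_ms)
```
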