Dogcow Land
NNSync 1.0.6 released and still going
14 Dec 2009 3:27 pm
[ Mood: Excited ][ Currently: Working tar.bzip2'ing the past days' archives ]
NNSync 1.0.6 was developed and tested last evening, while the instance of NNSync mentioned in the earlier post is still running after 4 straight days (including today). The code is definitely getting closer to something that I wouldn't mind releasing to the public. 
I thought it would finish the initial sync of 43,000 newsgroups by last Friday. Instead, it died sometime on Thursday, I didn't get it restarted until later on Friday, and I underestimated the volume of the remaining 30,000 or so newsgroups.
Over two days, the 12th and 13th, NNSync pulled in an average of about 7 GB of posts per day in over 1.2 million requests.
Daily stats so far (total articles / total size):
2009-12-08: 936,000 / 4.4 GB
2009-12-09: 187,000 / 1.4 GB
2009-12-10: 554 / 4.5 MB (this is the day it died)
2009-12-11: 187,000 / 1.4 GB (day I restarted it)
2009-12-12: 1,050,000 / 6.18 GB
2009-12-13: 1,224,000 / 8.34 GB
2009-12-14: 836,000 / 5.26 GB (still in progress, as of now)
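As a quick check on the two-day average quoted above, here is a minimal Python sketch that averages the 12th and 13th from the list. The figures are copied from the list; the dictionary and function names are only illustrative. Note that the list counts articles, which is not necessarily the same thing as the number of requests.

```python
# Daily totals copied from the list above: (articles, size in GB).
daily = {
    "2009-12-12": (1_050_000, 6.18),
    "2009-12-13": (1_224_000, 8.34),
}

def two_day_average(stats):
    """Return (mean articles, mean GB) over the given days."""
    n = len(stats)
    mean_articles = sum(articles for articles, _ in stats.values()) / n
    mean_gb = sum(gb for _, gb in stats.values()) / n
    return mean_articles, mean_gb

avg_articles, avg_gb = two_day_average(daily)
print(f"{avg_articles:,.0f} articles/day, {avg_gb:.2f} GB/day")
```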
NNSync 1.0.6
NNSync 1.0.6 hasn't been put into main production yet (that will happen when the currently running version dies or finishes OK), but it was tested last evening and has a number of advantages over the previous version.
The first major enhancement is a more efficient check for new newsgroups on the NNTP server. On an NNTP server with 43,000 groups, this check used to take over 160 seconds, put a 99% load on one CPU, and required three loops in the code. Now it completes in under 10 seconds with only one loop and less than 3% CPU load.
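The post does not say how the loops were reduced, so the following is only a sketch of one common way to get a single-loop check: put the already-known group names in a hash set, then scan the server's list once. The function name and the sample group lists are made up for illustration and are not NNSync's actual code.

```python
def find_new_groups(server_groups, known_groups):
    """Return names on the server that are not yet tracked locally.

    Building the set takes one pass over known_groups; each membership
    test is then O(1) on average, so the scan itself is a single loop.
    """
    known = set(known_groups)
    return [name for name in server_groups if name not in known]

# Hypothetical sample data, not taken from the post.
server = ["comp.sys.mac.apps", "comp.sys.apple2", "alt.test"]
local = ["comp.sys.apple2"]
new_groups = find_new_groups(server, local)
print(new_groups)
```

Compared with a nested loop that rescans the known list for every server group, this does work proportional to the sum of the two list lengths rather than their product.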
In addition, the NNTP connection manager has had some basic improvements made to handle temporary connection failures. It will pause for 1 second, then attempt to resend a message or re-read a reply if the first attempt failed.
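A minimal sketch of that pause-and-retry behaviour, assuming the send or read step can be wrapped in a callable that raises `OSError` (or a subclass such as `ConnectionResetError`) on a temporary failure. The helper name `with_one_retry` and the `flaky_read` stand-in are hypothetical; NNSync's real connection manager is not shown in the post.

```python
import time

def with_one_retry(operation, pause_seconds=1.0, sleep=time.sleep):
    """Run operation(); on a connection error, pause, then try once more."""
    try:
        return operation()
    except OSError:
        sleep(pause_seconds)
        return operation()  # a second failure propagates to the caller

# Stand-in for a network read that fails on the first call only.
attempts = []

def flaky_read():
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionResetError("temporary failure")
    return "OK"

# The demo passes a no-op sleep so it does not actually wait one second.
reply = with_one_retry(flaky_read, sleep=lambda seconds: None)
print(reply, len(attempts))
```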
Other changes make NNSync use less memory and fewer loops when updating each group's last article and when getting the total article count. This was done by splitting the one array that held both group names and group last articles into two separate arrays.
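To illustrate the array split, here is a small sketch under the assumption that the original structure held name and last-article pairs. The variable names and sample values are invented for the example, and the real NNSync layout may differ.

```python
# Before: one list holding (group name, last article number) pairs.
combined = [("comp.sys.apple2", 1500), ("alt.test", 42)]

# After: two parallel lists that share the same index.
group_names = [name for name, _ in combined]
last_articles = [last for _, last in combined]

def update_last_article(index, new_last):
    """Update one group's last article number without touching its name."""
    last_articles[index] = new_last

update_last_article(1, 57)
print(group_names[1], last_articles[1])
```

With the numbers in their own list, code that only needs the article numbers can work on that list directly instead of unpacking every pair.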
Dog Cow Mayor
 Joined: 11 Dec 2004 5:20 pm Location: USA
Posted: 14 Dec 2009 10:47 pm
Update: it just finished today at 4:46 PM.
Saved 3618640 msgs to disk in 270497 s.
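For scale, the log line above can be converted into an elapsed time and an average rate. This is a small arithmetic sketch using only the two numbers from that line:

```python
from datetime import timedelta

messages = 3_618_640   # "Saved 3618640 msgs to disk"
seconds = 270_497      # "in 270497 s"

elapsed = timedelta(seconds=seconds)
rate = messages / seconds  # average messages saved per second

print(f"elapsed: {elapsed}, average rate: {rate:.1f} msgs/s")
```

That works out to a little over 3 days of run time at roughly 13 messages per second.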
Time to start up 1.0.6. The funny thing is that in the 3 days that it was running, more posts kept coming in, so even this next run will have to play 3 days' worth of catch-up. But then we'll be good.  _________________ Moof!
Posted: 16 Dec 2009 11:38 pm
NNSync 1.0.7 was just put into place this morning and has successfully handled 6 batches so far, each running approximately half an hour.
TOTAL statistics since 15 Nov 09:
Size on disk: 46.55 GB (uncompressed)
Total articles: 7,748,870
Compressed size on disk: 4.25 GB
Not bad, considering that there are probably 1 million duplicate articles, and the oldest ones are from late October.
Posted: 20 Jan 2010 6:46 pm
TOTAL statistics as of 20 Jan 10:
Size on disk: 69.91 GB (uncompressed)
Total articles: 11,886,540
Compressed size on disk: 6.57 GB
These totals do not include a parallel archive which has posts dating back to mid-2006 / early 2007 in most newsgroups; that archive is still being built from a secondary newsserver.
Posted: 29 Jan 2010 7:31 pm
TOTAL statistics as of 29 Jan 10:
Size on disk: 76.55 GB (uncompressed)
Total articles: 13,067,864
Compressed size on disk: 6.8 GB
Again, these stats don't include the parallel archive being built from Big-8 groups on a second newsserver. When that one is finished, its stats will be merged in.
Also note the efficiency of the bzip2 compression program. Since the 20 Jan totals, the archive has grown by about 6.6 GB, but the compressed size grew by only about 0.2 GB.
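To put numbers on the compression, this sketch computes the uncompressed-to-compressed ratio for the three snapshots reported so far. The sizes are copied from the comments; the dates are the posting dates, and the helper name `ratio` is just for illustration.

```python
# (uncompressed GB, compressed GB) as reported in the comments above.
snapshots = {
    "2009-12-16": (46.55, 4.25),
    "2010-01-20": (69.91, 6.57),
    "2010-01-29": (76.55, 6.8),
}

def ratio(uncompressed_gb, compressed_gb):
    """Return how many times smaller the compressed archive is."""
    return uncompressed_gb / compressed_gb

for day, (raw, packed) in snapshots.items():
    print(f"{day}: {ratio(raw, packed):.1f}:1")
```

All three snapshots come out at roughly 11:1. The reported sizes are rounded, so small differences between snapshots should not be read too closely.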
Posted: 14 Feb 2010 8:39 pm
TOTAL statistics as of 14 Feb 10:
Size on disk: 87.51 GB (uncompressed)
Total articles: 15,048,807
Compressed size on disk: 8.58 GB
Posted: 26 Feb 2010 3:16 am
TOTAL statistics as of 25 Feb 10:
Size on disk: 95.28 GB (uncompressed)
Total articles: 16,423,768
Compressed size on disk: 9.47 GB
The total size and article count took about 25 minutes to compute. This archive currently has about 4 months of posts.
Posted: 08 Mar 2010 11:06 pm
TOTAL statistics as of 08 Mar 10:
Size on disk: 102.4 GB (uncompressed)
Total articles: 17,724,691
Compressed size on disk: 10.27 GB
Posted: 22 Mar 2010 10:56 pm
TOTAL statistics as of 22 Mar 10:
Size on disk: 111.75 GB (uncompressed)
Total articles: 19,409,347
Compressed size on disk: 11.32 GB
Posted: 11 Apr 2010 8:40 pm
TOTAL statistics as of 11 Apr 10:
Size on disk: 124.85 GB (uncompressed)
Total articles: 21,715,791
Compressed size on disk: 12.75 GB
Posted: 20 Apr 2010 4:21 am
TOTAL statistics as of 19 Apr 10:
Size on disk: 131.23 GB (uncompressed)
Total articles: 22,884,250
Compressed size on disk: 13.34 GB
(these statistics took over 30 minutes to generate)