[2000-08-05]

Dejafido

 

Introduction

Note: This project is dead. I have no intention of working on it any more. I don't think the source is worth the effort to revise (it is unlikely to run under modern versions of PHP without bugs). You are certainly welcome to inherit the source if you want to; just let me know and I'll release it all under the GPL and you can take it from there.

I'm a member of the message-exchange network FidoNet, and since early 1999 I've been collecting all incoming mail in national areas (region 20) for archival. So far I've only been saving the raw packets, but those aren't very practical when it comes to searching back for a particular post or creating statistics, so I have decided to export (toss) all mail into a SQL-driven database.

 

My goals

I will summarize my project goals in this list:

 

Implementation notes

Some of you might object to how I have modelled the database. I feel that I must mention that this extreme "atomisation" was a conscious design decision; I want the database to be as small as possible. Speed is not very important to me. Not that it is unimportant, it's just a low priority (and I am after all implementing this in a scripting language).
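To illustrate what this kind of atomisation means in practice, here is a minimal sketch. It is not the actual Dejafido schema: it uses Python with SQLite instead of PHP and MySQL so that it runs standalone, and the table names (`users`, `areas`, `msgs`) and helpers (`open_db`, `intern`, `store_message`) are assumptions made up for the example. The idea is that every repeated string is stored once and messages refer to it by ID.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a small normalized layout (hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE areas (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE msgs (
            id      INTEGER PRIMARY KEY,
            area_id INTEGER NOT NULL REFERENCES areas(id),
            from_id INTEGER NOT NULL REFERENCES users(id),
            subject TEXT NOT NULL
        );
    """)
    return db

def intern(db, table, name):
    """Return the ID of name in a lookup table, inserting it only if it is new."""
    if table not in ("users", "areas"):
        raise ValueError("unknown lookup table")
    db.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = db.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    return row[0]

def store_message(db, area, sender, subject):
    """Store a message that references its area and sender by ID only."""
    db.execute(
        "INSERT INTO msgs (area_id, from_id, subject) VALUES (?, ?, ?)",
        (intern(db, "areas", area), intern(db, "users", sender), subject),
    )
```

The trade-off is the one described above: each stored message costs a few extra lookups, but a name or area that appears in thousands of messages takes up space only once.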

 

Current status

Since not everyone will have the patience or even inclination to try this out (and why would you, really?), I took the time to create some screenshots so that you might get a better idea of what this is all about:

Screenshot #1 from 2000-08-05 -- User statistics. (18Kb)
Screenshot #2 from 2000-07-26 -- Result of a search -- all messages of a specific packet, in this case. (21Kb)
Screenshot #3 from 2000-07-28 -- Posting frequency graph -- Snapshot of the newly developed graphing code. (7Kb)
Screenshot #4 from 2000-08-05 -- Address lookup -- Very basic. (15Kb)
Screenshot #5 from 2000-08-05 -- Packet viewer -- Clicking a packet will get you a view like the search result in picture #2 (22Kb)
Screenshot #6 from 2000-08-05 -- Beta message viewer -- Somewhat touched up (color coding of text). Lacks all bells and whistles. (26Kb)

History

2000-08-05: Sat down and 'plinked' a little. Address lookup will now list any connected (i.e. seen) points for nodes, and for points a link will be supplied to quickly go to its boss node. I've also made the message viewer a little better by inserting line-breaks, and linear browsing of the current area is now functional. Wrote some code for posting to the message-base, but the code is still too young to be of much use. Besides, I must write the exportpkt code too. Not that it is hard, only boring. :-)

2000-08-01: The graph plotting is now accessible from the user statistics screen in the form of a function for plotting posting frequencies for a specific user and area. Furthermore, I was able to double the tossing speed by utilizing indexes in a better way. Apart from the problem of finding the highest messagenum described below, much time was also spent searching the kludges. Tested the code for message deletion for the first time; it worked okay. Must look into the whole duplicate message/packet thing, as I might be seeing false packet-dupes down the road.

2000-07-29: Began work on the message-viewer. There's a looong way to go yet, but at least you can read the contents of a message, even though it won't be formatted. Gotta fix the character-set problem sometime too, and a million other small things. Right now, I will focus on the message-view first, getting it to display all the data and to support all the navigation features I have planned.

2000-07-26/28: Renamed 'tosser' to 'importer'. Now displaying elapsed time (per packet and total) in the importer. Imported massive amounts of data to debug the importer. Made the userlist/search use a table. Began implementing code for plotting graphs from the database, returning PNG images.

2000-07-25: The behaviour of strpos() was apparently changed as of v4.0b3 of PHP, which caused my old code to die horribly when importing. I changed the part that died, and hope there are no more occurrences where the change in behaviour breaks my code. Tossing is slooow on my new server (it's only a Pentium 180 or somesuch). Sometime I will have to profile the tossing.

2000-01-19: Had to add a record to the msgs structure to allow for efficient linear browsing through the messages in an area, so the tosser code was adapted to that. Also, I changed a lot of other fields to allow for better indexing, and this prompted many small updates to the database code where storing NULL was no longer allowed. Only The Big Test - tossing a year of mail - will show if I got everything right.

2000-01-15: Wrote the form for and began implementing the engine for handling 'advanced' queries.

1999-12-22: More statistics added, and a nasty redundant db-connection removed, giving a nice increase in responsiveness. Also fixed a nasty SQL-query which was going unindexed through all messages (ouch!). But remember, the db is not finalized yet.

1999-12-16: Added more statistics (a couple of nice Top-lists). It's beginning to look actually useful :-)

1999-12-15: Added the basic user list and user statistics functionality.

1999-12-13: Wrote authentication code. It's now possible to register/login/logout.

1999-12-09: Began working on the web interface. It's now possible to see some statistics, and also to browse around the list of packets that have been tossed into the database.

1999-12-06: Now registering everything in the database, but still some way to go till it can stand on its own. Parsing needs more work, and charset conversion has yet to be implemented. Database layout is not final, but close.

1999-11-24: The memory-leaking problem seems to have disappeared when I upgraded to the latest beta of the PHP4/MySQL packages, thank IPU. Added registration of all kludges but MSGID/REPLY. Those are to go as fields in the message record instead. I hope you agree with me that splitting it like that makes sense (with almost every message having these, and them always changing). Rewrote some code that started issuing warnings after the upgrade, and also implemented a database-based log for information, warnings and errors issued by the tosser, which will aid debugging in a production environment. Not much longer now...

1999-11-23: The parsing has been much improved (especially when dealing with broken messages). I've also added registration of paths and seenbys. I have a major problem with memory leaks, which seem to be coming from MySQL. While this is a problem I must look into, it isn't a showstopper as the memory is reclaimed (it would seem) when MySQL flushes its tables. Maybe I must configure an upper limit (seems like the database is trying to cache an awful lot of requests, even though I do free them in the code!). Anyway, not much left to do really: some more tweaking of the parsing code (adding tearlines to the process) and then I must optimize the database code so as to minimize redundant accesses (I will cache some of the IDs retrieved, since the IDs will be used when registering the main message data too).

1999-11-17: I've written the code to register packets in the db.

1999-11-16: I've begun working against the database. All addresses found are now registered in the database, and even this early, interesting statistics can be extracted (How many points in net 206? Which points have been active under host 233? And so on). I've cleaned up the code, divided it into more sub-classes, and I have also written a number of basic functions needed throughout the project.

1999-11-15: I've begun working on the database. Instead of reading the raw ftscprod file as earlier I have now moved that data into the database.

1999-11-14: I've been working on the base functionality of reading FidoNet type-2+ packets and parsing the necessary information from the messages therein. This step is almost finished.
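Several of the entries above deal with addresses: registering them, looking up the points seen under a node, and linking a point to its boss node. As a minimal sketch of that idea (in Python, not the PHP used by the project), a FidoNet address in the usual zone:net/node.point notation can be parsed as below. The names `FidoAddress` and `parse_address` are hypothetical, and the node number in the usage line is made up.

```python
import re
from typing import NamedTuple

class FidoAddress(NamedTuple):
    zone: int
    net: int
    node: int
    point: int = 0  # point 0 means the node itself

    def is_point(self):
        """True if this address belongs to a point rather than a node."""
        return self.point != 0

    def boss(self):
        """Return the boss node: the same address with the point set to 0."""
        return self._replace(point=0)

    def __str__(self):
        base = f"{self.zone}:{self.net}/{self.node}"
        return f"{base}.{self.point}" if self.point else base

# zone:net/node with an optional .point suffix
_ADDR_RE = re.compile(r"^(\d+):(\d+)/(\d+)(?:\.(\d+))?$")

def parse_address(text):
    """Parse 'zone:net/node[.point]' into a FidoAddress, or raise ValueError."""
    match = _ADDR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a FidoNet address: {text!r}")
    zone, net, node, point = match.groups()
    return FidoAddress(int(zone), int(net), int(node), int(point or 0))
```

For example, `parse_address("2:206/149.5").boss()` gives the node address for a point under net 206. Questions like "how many points in net 206?" then become simple filters on the `net` and `point` fields.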

 

Source

Here is a snapshot (including current database specification) of my current code. Please note that this is development code and it might not even run. You'll need MySQL, PHP4 and a webserver to try this.

Parsing type-2+ packets is a delicate thing, and so I will spend some more time creating a good solid fido_packet2.php class. A number of miscellaneous routines, specifically for character-set conversion, also need to be written. The last stage is to write an HTML/PHP frontend, and possibly a classic FidoNet robot for handling incoming queries over FidoNet.
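To give an idea of what the packet parsing involves, here is a minimal sketch of reading the 58-byte packet header. It is written in Python rather than PHP and is not the code from fido_packet2.php. The field order is my reading of the FTS-0001 and FSC-0048 documents and should be checked against them before being relied on; the function name `parse_packet_header` is hypothetical. The sketch also ignores the capability word, which a real type-2+ reader would check before trusting the zone and point fields.

```python
import struct
from datetime import datetime

# 58-byte little-endian header. Field order is an assumption based on
# FTS-0001 / FSC-0048; verify against the specifications.
_HEADER = struct.Struct("<12H2B8s4H2BH4H4s")

def parse_packet_header(data):
    """Extract addresses, creation time and password from a packet header."""
    if len(data) < _HEADER.size:
        raise ValueError("packet is shorter than the 58-byte header")
    (orig_node, dest_node, year, month, day, hour, minute, second,
     _baud, pkt_type, orig_net, dest_net,
     _prod_lo, _rev_major, password,
     _q_orig_zone, _q_dest_zone, _aux_net, _cw_copy,
     _prod_hi, _rev_minor, _cap_word,
     orig_zone, dest_zone, orig_point, dest_point,
     _prod_data) = _HEADER.unpack_from(data)
    if pkt_type != 2:
        raise ValueError(f"unsupported packet type {pkt_type}")
    return {
        # (zone, net, node, point)
        "orig": (orig_zone, orig_net, orig_node, orig_point),
        "dest": (dest_zone, dest_net, dest_node, dest_point),
        # The month is stored as 0-11; datetime() raises ValueError
        # if a broken packet carries an impossible date.
        "created": datetime(year, month + 1, day, hour, minute, second),
        "password": password.rstrip(b"\0").decode("ascii", "replace"),
    }
```

The packed messages follow the header and need the same defensive treatment, since broken packets do turn up in practice.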

 

Feedback

Please send your feedback to me.

 

©1999 Eddy L O Jansson. All rights reserved. All trademarks acknowledged.