Ceaseless Student

Things I learn while living life as per usual

Saturday, June 9, 2007

OlinDocs database generation (part the last)

All right, so we left off with a partial database in each of the directories. Now it’s time to put everything together and get it all into its final form. If you haven’t been reading, this is not the place to start; try this instead.

This is all actually fairly easy and short. To combine all of the databases in all of the directories I use this little function.

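def combine(path):
_# the main database that everything gets merged into
_mydb = open('/…/OlinDocs/docdb.txt', 'w')

_for location in path:
__# copy each partial database, whole, into the main one
__tmp = open(location + '/part_db.txt', 'r')
__mydb.write(tmp.read())
__tmp.close()

_mydb.close()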

Since path is already a list of all of my locations, this just visits each directory where I made a partial database, reads the whole thing, and writes it into the main database. Just iterate and enjoy the fun. The rest is actually just manipulating text to put it in the form I described in part two for the javascript, so I won’t really go into it. One little trick I will mention is using str() on a list to get text in that exact bracketed format, which can then be written to a text file.
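Here’s a minimal sketch of that str() trick. The sample lists and the js_assignment helper are made up for illustration; they are not the names in my actual generator.

```python
# Hypothetical sample data; in the real program the lists are built
# from the merged database.
names = ['my car', 'my pen']
rows = [['Honda', 'Accord', 'like a ton'],
        ['Pilot', 'G2', 'something in ounces']]


def js_assignment(var_name, value):
    """Return one line of JavaScript that assigns a Python list to a variable."""
    # str() on a list gives bracketed, quoted text. For lists of plain
    # strings, that text also reads as a JavaScript array literal.
    return var_name + '=' + str(value) + ';'


lines = [js_assignment('data_names', names), js_assignment('data', rows)]
print('\n'.join(lines))
```

This leans on Python and JavaScript happening to agree on how a list of simple strings is written. For anything fancier (None, booleans, odd characters), something like json.dumps is probably the sturdier choice.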

This gets me my final database! Now I just chuck the database.js online, along with the index.html and all of the documents, in the home directory on olindocs.com. (I won’t push the html on people, but I encourage the interested to look at the source code of olindocs; it’s mostly iteration and is fairly easy to understand.)

posted by boris at 7:48 pm  

Thursday, June 7, 2007

OlinDocs database generation (part 2)

OK so back to some Python. Here’s the general setup I have (if you’re just tuning in, you’ll want to start here):

I have every document in a series of directories such that its location is …\semester\class\prof\author.

Single Directory Database

This is mostly what we did yesterday (underscores represent tabbing b/c I seem to be having some trouble putting tabs in right in blogger. Stupid html):

def prelim(location):
_import os

_# the last four pieces of the location are the metadata
_params = location.split('/')
_sem = params[-4]
_class_ = params[-3]
_prof = params[-2]
_author = params[-1]

_db = open(location + '/part_db.txt', 'w')

_for fileName in os.listdir(location):
__# part_db.txt is the database itself, not a document
__if fileName == 'part_db.txt':
___continue

__temp = fileName.split('.')
__file_name = temp[0]
__file_type = temp[1]

__…more stuff…
__db.write('\t')
__db.write('\t')
__db.write('{a href="http://olindocs.com/')
__db.write(fileName)
__db.write('"}')
__db.write(file_type)
__db.write('{/a}')
__…more stuff…
_db.close()

# path.txt holds one Windows-style directory per line
path_file = open('/…/OlinDocs/path.txt', 'r')
path = path_file.read().replace('C:', '').replace('\\', '/').split('\n')
for location in path:
_prelim(location)

First it imports the module for looking at directories. Then it gets out all the metadata that we previously embedded in its location by splitting the location at every /. Then I just assign each portion of the location to the variable it defines. You’ll notice that I use negative indices so that I don’t have to worry about how many directories deep I’m looking. This works just as well from C:\\ as it does from C:\\~\~\~\~\~\ (that’s actually a little bit of a lie b/c windows sucks, but we’ll get to that later). I have a little if statement to skip adding part_db.txt to the database; this is a file that I’m using to build up a database and not something that needs to be on OlinDocs. What goes on next is just writing out a text file. I do end up doing some cool things like having the text file have some html (I have <> in the real program where the {} are above) so that the metadata can include links. Then we close the file and we’re done. We have a database that describes all of the documents in the directory. It’s actually not in its final format because I just wrote the data for each file on a new line instead of doing the one line of names and one line of metadata thing. This is really easy to change later so I’m just keeping this step simple.
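To see why the negative indices don’t care about depth, here’s a small sketch. The location_metadata helper and the example paths are made up for illustration; only the semester/class/prof/author ordering comes from my setup.

```python
import os.path


def location_metadata(location):
    """Pull semester, class, professor, and author from the end of a path."""
    parts = location.rstrip('/').split('/')
    # Negative indices count from the end of the list, so it does not
    # matter how many directories come before the last four.
    return {
        'semester': parts[-4],
        'class': parts[-3],
        'prof': parts[-2],
        'author': parts[-1],
    }


# Hypothetical locations: one shallow, one buried a few levels deeper.
shallow = location_metadata('/s07/Physics/Some Prof/Some Author')
deep = location_metadata('/a/b/c/OlinDocs/s07/Physics/Some Prof/Some Author')
print(shallow == deep)

# Not what prelim does, but worth knowing: os.path.splitext splits at the
# last dot, so it copes with file names that have several dots or none.
name, ext = os.path.splitext('final.report.pdf')
print(name, ext)
```

The splitext bit is a possible swap for the split('.') in prelim, which would trip on a file with no extension.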

Cool. Now all of our directories have a file called part_db.txt that tells us about the documents in it, but it’s in the wrong format and scattered everywhere. This is getting long, so we’ll merge these databases and put them in the right format tomorrow.

posted by boris at 3:42 am  

Wednesday, June 6, 2007

OlinDocs database generation (part 1)

All right. I’m finally going to talk a bit more about OlinDocs. (btw, I’m sad that I haven’t gotten, like, anything)

Let’s talk making a database with Python. In order for this site to work, I need to put variables that define the documents and their metadata into an easy-to-access array. My setup is basically one line that gets all the names and one line that gets all the metadata. For example:

data_names=['my car','my pen'];
// brand,name,weight
data=[['Honda','Accord','like a ton'],['Pilot','G2','something in ounces']];

I think I’ll talk about some Python stuff today and put it all together tomorrow. That being said, here’s a link to the text file for my generator. And if you want to get the .py file you can right-click here and save it.

File management in Python:
-First we need to import os. This lets us use all the other commands that we’ll need.
-Now try
location = '/Documents and Settings/'
for name in os.listdir(location):
    print(name)

Nice. This lets us see files. We can pretty much do anything like copy files, remove files, make directories, rename files etc. but I won’t put that all here; that’s what the internet’s for.
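For the curious, here’s a rough sketch of a few of those operations. It uses shutil and tempfile alongside os, and every file and directory name in it is made up; it works in a throwaway directory so nothing real gets touched.

```python
import os
import shutil
import tempfile

# Work in a temporary directory so no real files are affected.
workdir = tempfile.mkdtemp()

# Make a directory and a small file inside it.
docs = os.path.join(workdir, 'docs')
os.mkdir(docs)
original = os.path.join(docs, 'notes.txt')
with open(original, 'w') as handle:
    handle.write('Hello World')

# Copy the file, rename the copy, then remove the original.
copy_path = os.path.join(docs, 'notes_copy.txt')
shutil.copy(original, copy_path)
renamed = os.path.join(docs, 'renamed.txt')
os.rename(copy_path, renamed)
os.remove(original)

# Only the renamed copy should be left.
remaining = sorted(os.listdir(docs))
print(remaining)

# Clean up the whole temporary tree.
shutil.rmtree(workdir)
```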

Making files:
Making text files in python is useful for myriad reasons. For example, they’re persistent (thus useful for saving data) and usable by other programs. The way you make a text file is just by opening it:
new = open(location+'/newfile.txt','w')

The w means open it in write mode. You can also open it in append mode or read mode (a and r respectively).

Now that it’s open we can put stuff in it:
new.write('Hello World')

If you need tabs or new lines use \t or \n. If you need a \ you will have to escape that to \\.

If it were open in read mode, we could do new.read() or new.readline() to get a string that has the entire file or the next line of it.
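Here’s a small sketch that runs through all three modes and the escapes. The temporary file location and the with blocks are my choices for the example (with closes the file for you); the rest is just the calls described above.

```python
import os
import tempfile

# A throwaway file so the example does not touch anything real.
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

# 'w' creates the file (or empties an existing one) for writing.
with open(path, 'w') as new:
    new.write('Hello World\n')

# 'a' keeps what is already there and adds to the end.
with open(path, 'a') as new:
    new.write('tab\there\n')          # \t is a tab, \n is a new line
    new.write('a backslash: \\\n')    # \\ is one literal backslash

# 'r' reads: readline() returns the next line, read() returns the rest.
with open(path, 'r') as new:
    first_line = new.readline()
    rest = new.read()

print(repr(first_line))
print(repr(rest))
```

Printing with repr() shows the escapes instead of the characters they turn into, which makes it easier to see exactly what landed in the file.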

String Tricks
This is likely old hat to a lot of you, but strings can be manipulated in a lot of powerful ways. For example, we can use replace to find a string and swap in another string by using string.replace('s1','s2'). And there’s also one of my favorite things in Python: string.split('s3'). This returns a list of the items that were separated by some marker (e.g. the commas in comma-separated value files [.csv]). These can be chained in-line to give you a lot of firepower for very little real estate. A cute thing that my program does is
path_file=open('/Documents and Settings/bdieseldorff/My Documents/OlinDocs/path.txt','r')
path=path_file.read().replace('C:','').replace('\\','/').split('\n')

This baby takes a text document that has a path with all the directories I need to look at in a form that I can copy and paste from Windows explorer and turns it into a list of locations that python understands.
-First it reads the whole file
-Then it replaces C: with nothing to leave just \dir\subdir\subsub etc.
-Then it changes all of the \ to / (remember \ is escaped)
-Finally it makes a list of locations with every line break defining a new location in the list.
Pretty neat. So much stuff in so little space. Sweet.
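If the one-liner is hard to follow, here’s the same chain pulled apart one step at a time. The sample text is a made-up stand-in for what path.txt might contain.

```python
# Hypothetical contents of path.txt: two Windows-style paths, one per line.
raw = 'C:\\docs\\s07\\Physics\nC:\\docs\\f06\\Design'

no_drive = raw.replace('C:', '')        # drop the drive letter
forward = no_drive.replace('\\', '/')   # '\\' in code is one backslash
locations = forward.split('\n')         # one location per line

# The chained version gives the same list in a single expression.
chained = raw.replace('C:', '').replace('\\', '/').split('\n')

print(locations)
print(chained == locations)
```

One thing to watch: if the file ends with a line break, split('\n') leaves an empty string as the last item, so it may be worth filtering out blanks before using the list.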

Cool. Now my program knows where to look and we know how to write stuff in files. Tune in tomorrow for more on how we get from this to a complete database.

In other news I played a little bit of soccer today. I am incredibly out of shape. I’m gonna start running daily (starting tomorrow evening actually).

posted by boris at 9:40 pm  

Thursday, May 24, 2007

OlinDocs Wiki & a little bit of javascript

If nothing else, at least this website has given me a lot to blog about.

I will get to the internals. Really. You’ll get the juicy python and javascript. But. I just added a feature so maybe I’ll babble about that instead. Or wait. I can do some javascript and my new feature. Hoorah!

All right. So I set up a wiki. I set it up to allow people to describe documents. I originally considered putting something directly on OlinDocs in the results, but, honestly, more than 10 results is a ways to scroll without each result being twice as long by virtue of a description paragraph. Also, now other people can do the work instead of me. Yay!

I’m using Netcipia for my Wiki needs. It can host blogs, pics, and of course wikis. You get 2 GB of space, unlimited users can use your wiki for unlimited time, you get a place name (placename.netcipia.net) and you get a decent amount of control over your place and who can see, edit and comment on it. Hot. Right now I just have it fully open; hopefully I don’t have to change that ever. I’m not sure if you have to be a user of netcipia in general to edit… if so I’ll just make an account for general anonymous OlinDocs use.

Despite Netcipia going all buggy yesterday for a couple of hours when I was testing some new code and making me think I was wrong, I decided that I heart it anyways. (That harrowing experience felt something like working on a circuit for hours only to find that the chip you’re using was dead.) Anyhow, I wrote them for support and they fixed it. It was kinda neat cuz they said you could write in English, Spanish or French; since I know all three to varying degrees, I wrote in all three for them. w00t!

So. Now I had a wiki set up. I used some python magic that I’ll talk about some other day to build myself a database that had links to unique wikis for each document. (This does assume that no two documents with the same name and file extension will coexist, but I’d already made that assumption by putting all the documents in the same directory). So. That rocked. Now all of my documents were linked to wikis that could be used to describe or comment (a different section) on them. Cool.

That’s kinda nice, but I wanted this all to be searchable for the advanced search page. Lucky for me, Netcipia had me covered and all I had to do was point something at olindocs.netcipia.net/xwiki/bin/view/Main/WebSearch?text=$$$ ($$$ represents the search term). Cool.

So. HTML/javascript time: [value="Go" id="wsearch" onclick="window.open(prep_wsearch())" type="button"]. (imagine those are the correct brackets for me ok?) Cool. This makes a button labeled Go.
But what does it do?

Well that’s the javascript part. I have this little text input box next to the button. It has an ID. I can call its ID.value to get what’s in the input box. Then I join that string with the ‘olindocs.netcipia.net/…/WebSearch?text=’ string and, well, a window is opened at that address. w00t! Feel free to look at the site or its source code: advanced search. There’s lots more stuff in there, but I’ll get to that some other day as this post is getting unwieldy.
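The real thing is javascript, but the string joining is easy to sketch in Python. Treat this as an illustration only: the build_search_url name, the http:// prefix, and the escaping of the search term are my assumptions for the example, not what the site’s prep_wsearch() necessarily does.

```python
from urllib.parse import quote

# Search address from above, with an assumed http:// prefix;
# the search term goes right after "text=".
SEARCH_BASE = 'http://olindocs.netcipia.net/xwiki/bin/view/Main/WebSearch?text='


def build_search_url(term):
    """Join the wiki search address with a search term."""
    # quote() escapes spaces and other characters that are not safe in a
    # URL. This step is an addition for the sketch.
    return SEARCH_BASE + quote(term)


url = build_search_url('chaos and music')
print(url)
```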

posted by boris at 10:17 am  

Wednesday, May 23, 2007

OlinDocs!!

OK.

Here’s my website that I was super-excited about: OlinDocs.

This website is for archiving documents made at Olin in a way that’s easy-to-find. The goal is to have an answer to “Remember that expo poster someone made? I think it was about chaos and music or something…”

So you head over to OlinDocs and you type Expo into the search box, you go to refine and type chaos and you’re good to go. Hot.

So. Obviously I need to get some more documents. Right now it’s Meta plus anything worthwhile that I had handy (read: new, b/c the rest is on my external at Olin and I won’t be there for the next two weeks). If any of you have stuff that you’d like me to put up, send an email to boris@students. I need two things for document submissions.

1-The document
2-Some meta-info

I’d like the meta-info in a file titled db_params.txt; it should follow this example:

Class Name
Semester (eg s07 or f06)
Teacher Names (First Last, First Last)
Author Names (First Last, First Last)

Awesome. I hacked together some neat python scripts to put a database together for me, so it’s fairly painless (I’ll probably post about this and the other mechanics of the site later). Oh yeah. If you’re sending in any group documents, could you please run it by the other people first? Sweet.

Edit: If any of the meta-info doesn’t make sense for your document (eg it wasn’t for any classes), just put it down as - (a dash) and include a note in your e-mail. Thanks!
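For anyone wondering how a file like that could be read, here’s a rough sketch. The parse_db_params function and the sample text are made up for illustration and are not my actual scripts; the only things taken from above are the four-line order and the dash for “doesn’t apply”.

```python
def parse_db_params(text):
    """Turn the four lines of a db_params.txt into a dictionary."""
    lines = [line.strip() for line in text.strip().split('\n')]
    if len(lines) != 4:
        raise ValueError('expected 4 lines, got %d' % len(lines))

    def clean(value):
        # A lone dash means the field does not apply to this document.
        return None if value == '-' else value

    def names(value):
        # Names are separated by commas: "First Last, First Last".
        if clean(value) is None:
            return []
        return [name.strip() for name in value.split(',')]

    return {
        'class': clean(lines[0]),
        'semester': clean(lines[1]),
        'teachers': names(lines[2]),
        'authors': names(lines[3]),
    }


# Hypothetical submission that was not written for any class.
sample = '-\ns07\n-\nFirst Last, Other Person'
params = parse_db_params(sample)
print(params)
```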

Oh man. So excited.

Make me happy. Send me stuff. Expo stuff, final deliverables, Capstones, OSS papers… I’m waiting for cool stuff. I might even make the mistake of reading through far more of it than is healthy. Anyways. I’m gonna stop ranting. Sooo excited.

posted by boris at 9:03 am  

Tuesday, May 22, 2007

Website coming…

Oh man. I’m working on this really cool website…

It’s kinda consumed my life a little bit. I’ve worked on it for 6 of the last 20 hours. I was asleep for about 9 of those…

Oh well. It’s gonna rock. I’d really like to share now, but I need some people to give me a thumbs up first…

Sorry for the tease. I’m just really excited and wanted to tell people. Although only kinda. Anyhow, definitely within two days.

posted by boris at 11:28 am  

bdieseldorff