Friday, January 28, 2011

River Waits Patiently

This is a very brief update, but I wanted to make sure everyone knew that I'm still diligently working on the project.

Over the past two weeks, I've had to back off the project and take care of personal matters; however, I've had the time to dive back in over the past couple of days.

While it's not complete yet, I have replicated most of River's functionality again -- only this time with a simple GUI and in the Qt framework (gibberish to most people I know, sorry about that).

In short, River can once again control Pandora, Google Chrome, my computer's Sound, and she can look up information on IMDB.

I have spent much of the day today waiting patiently for two things. In order to have River access Twitter again, I have to get special permission from Twitter... which takes a little while to get (ironic, since the whole point of Twitter is to quickly communicate).

The second thing I'm waiting on is more technical -- in order to access Facebook (and Twitter), I have to add support for "https" webpages -- secure webpages. This requires OpenSSL, which, in short, means I have to "rebuild" the Qt Framework, and that takes about 4 hours to complete. As soon as that finishes, I'll continue to integrate Facebook, and then I'll continue with Twitter once they get back to me.

I'll update more once I'm caught up past the point where I was before I "started over" using Qt.

Monday, January 17, 2011

River gets Rebuilt

I wanted to provide a quick update for anyone following the progress of Project River.
I'm still hard at work on her in most of my free time!

What's Been Accomplished

-------------------------------
Twitter Integration (read tweets, not write them yet)
Facebook Integration (read news feed, not write to it yet)


Currently in Progress

-------------------------------
I am currently in the process of porting River over to the Qt framework. This lets me easily create a graphical user interface for River. So far I have her quietly running in the task bar with a nice little icon.


Some benefits of porting to the Qt Framework

-------------------------------

  1. Easy use of Windows Task Bar and an actual window.
  2. Everything except the Speech component will be Cross-Platform (work on Windows/Mac/Linux)
  3. Everything will be Unicode -- so it will work with Twitter and Facebook much better.
  4. Far fewer "dependencies".
    1. Before I was using all sorts of libraries that would have to be installed on the system. Now, it'll just be Qt stuff.
  5. Far fewer licensing issues because of #4.
    1. If Project River is a success and I wish to commercialize her, I'll only have to buy the commercial license for Qt.
    2. Before, most of the libraries I was using wouldn't have allowed me to ever sell River.

Next Steps

------------------------------
I'm a long way from being done with my rebuild of River. There's a lot of re-writing of code and a lot of new things to learn (I know how to do them, but not how to do them in the Qt Framework).

Specifically, I'm focused on getting the Speech engine to work properly. This is a big jump, because while I'm still using Microsoft Speech, I'm using a very different method of "listening". It will work better once I have it completed -- it won't "hear" better, but the program will work better.

Friday, January 14, 2011

River Goes to the Movies

I've done tons of work on River in the past few days. I spent very long hours learning about regex and strings in C++, and even longer hours applying that knowledge, all just to get basic functionality. The good news is, I can now program similar tasks very quickly, since I finally understand it all.


IMDB Integration

River can now use IMDB to get info about Movies and TV Shows.

Sample Commands:
River, what year did the movie AI come out?
River, when was the movie I, Robot released?
What is the movie The Matrix about?
(In context) When did it come out?
(In context) What is the full plot summary?


One of the features that makes her seem the most "real" is the ability to pull up information in context. She remembers what movie I last asked about, so if I ask for more information, she is able to immediately tell me what I want to know, without having me repeat the name of the movie. It's still a little slower than I'd like, but I believe I can eventually speed that up by parsing the page using only exact string searches rather than regex. For now, however, regex is significantly less code, and much faster to program. Since everything is very modular, I can easily go back and update the class in charge of parsing the information to make it perform more efficiently.
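Here is a minimal sketch of the "in context" idea. It is not River's actual code: the `MovieContext` class is a hypothetical name, and the single regex only covers the sample commands listed above. When a command names a movie, the title is remembered; when it doesn't, the last remembered title is reused.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: remembers the last movie title that was mentioned.
class MovieContext {
public:
    // Returns the title the command refers to. If the command names no movie,
    // the last remembered title is returned (empty if there is none yet).
    std::string resolve(const std::string& command) {
        // Matches "...the movie <title> come out?", "...released?" or "...about?".
        static const std::regex named(
            R"(the movie (.+?) (?:come out|released|about)\?)",
            std::regex::icase);
        std::smatch match;
        if (std::regex_search(command, match, named)) {
            assert(match.size() == 2);    // whole match plus the title group
            lastTitle_ = match[1].str();  // remember for follow-up questions
        }
        return lastTitle_;
    }

private:
    std::string lastTitle_;
};
```

With this, asking "When did it come out?" right after the I, Robot question resolves to "I, Robot", and an empty result means no movie has been mentioned yet.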


New Discoveries

As I was playing around with the IMDB features, I realized just how important it is that I implement Unicode support fairly soon. The web is full of Unicode characters, and until I make all the necessary changes to River, she's going to have problems with them. She won't crash when she encounters them, but she does read them off as Latin-1 letters.
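To illustrate the kind of problem described above, here is a small sketch under one assumption the post doesn't confirm: the page text arrives as UTF-8 but gets treated as Latin-1, one byte per character. The function name `latin1ToUtf8` is made up for this example. Used correctly, it re-encodes real Latin-1 text as UTF-8; fed bytes that are already UTF-8, it reproduces the symptom, turning one accented character into two unrelated letters.

```cpp
#include <cassert>
#include <string>

// Re-encodes a Latin-1 (ISO-8859-1) byte string as UTF-8.
// Each Latin-1 byte is the Unicode code point with the same value, so bytes
// below 0x80 are copied as-is and the rest become two-byte UTF-8 sequences.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));    // lead byte
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));  // continuation
        }
    }
    assert(utf8.size() >= latin1.size());  // output never shrinks
    return utf8;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the UTF-8 pair 0xC3 0xA9. If that pair is then mistaken for Latin-1 again, each byte is treated as its own letter and the result is the four bytes 0xC3 0x83 0xC2 0xA9, which display as "Ã©".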


Next Steps

This weekend I hope to make progress on implementing an ongoing speech recognition training ability. If I say something and River hears me incorrectly, I'll say "Train Speech", and then type what it was I said. She'll then use the recorded audio along with the text to improve her hearing capabilities.

I intend to program the speech recognition training feature slowly. I expect it to have a long learning curve for me, and while it will improve the program significantly, it's not exactly "fun" to work on. As a result, I am also focusing on other simpler, but more fun features. I'll be adding to the IMDB functions, as well as hopefully adding Twitter, Facebook, and Google News integration.


Demonstration Video

You can't see River (there's nothing to see but debug statements), but you can hear her.

Wednesday, January 12, 2011

River Connected to the Ocean

I've learned a LOT in the past 24 hours, and River has a new key skill as well -- she can browse the internet.

The Language Debate

Yesterday started with me debating about switching the programming language I'm using. I considered switching to Python, because development would be considerably faster in some areas (such as internet connectivity). Python has so many freely available libraries that much of my work would be done for me as I progressed. However, after spending several hours getting Python set up and beginning to recreate Project River, I discovered that I was faced with a key limitation.

In the 64 bit versions of Windows 7 (and Vista), the Microsoft Speech Engine has some serious quirks. By default, you can only use 64 bit "voices" for the text-to-speech component. This is a significant problem, because the best voices out there right now are 32 bit. In fact, you're hard pressed to find any 64 bit voices, unless you're shelling out around $1,000.

In C++, I am able to skirt around this restriction, since the method for getting and using the voices on the system ignores the 64 bit requirement. However, in Python (as well as C# and VB), there is no way around this limitation.

Since the believability of the AI is so important to me, and I feel that the voice plays a critical role in this, I decided that if for no other reason, I need to stick with C++ for now.

Implementation of libcURL

It took me a considerable amount of time, but I finally managed to get libcURL compiled properly, and River can now successfully make cURL calls to browse internet pages on the fly.

In the process of trying to set this up, I learned a great deal about static libraries, including both how to create one, and how to use one in my project. As a result, I completely restructured River's file system, so that she follows a much more standardized pattern, with all external headers and library files contained in a single location. I also took a few key classes that I was using and I knew I wasn't likely to change again, and I made them into external libraries.


Overall River has come very far in the past 24 hours. I am confident I chose the correct programming language for the project, and I have significantly improved my knowledge of Visual Studio 2010 as well as some of the best practices in C++.


Next Steps

Getting libcURL set up was merely a precursor to actually using it. Now comes the tedious work of writing and/or implementing a few APIs for her to be able to interact with key websites.

The next large task of setting up constant ongoing training of the Speech Recognition engine is still on the table, but I intend to save that for a weekend, as I believe it will be a very large and time consuming task.

Tuesday, January 11, 2011

Project River is Not Alone

River and I have already discovered alien life!

Thanks to a tip from a coworker today, I discovered Project Jarvis -- created ironically by someone who reportedly works (or worked) for the very same company that I do.

Project Jarvis is about a year old, and already does pretty much everything that I eventually want Project River to do.

Unfortunately, Project Jarvis is written in AppleScript and is therefore restricted to running on a Mac. Because I use Windows exclusively (for a variety of reasons), I am unable to put Project Jarvis to use -- besides the fact that it is not released for others to use and is instead (like River) a personal project of the creator.

The creator of Project Jarvis is creating his first commercial product that will be cross-platform, but I have a hunch this will not be open source, and as a result, not allow me to expand and build my own modules.

I have scoured the internet today for other "Digital Life Assistants" (as Jarvis is dubbed). Everything I found was either abandoned while still in its infancy, or written in an obscure language and not open source (such as this DOS version of J.A.R.V.I.S).

Because I've failed to find any sufficient open-source implementation of something equivalent to Project River, I have decided to continue on developing my project. Because I've found nothing out there, I think I will put up a Sourceforge or Google Code page at some point in the near future. The only thing holding me back from doing so is fear of embarrassment. Because I am still a novice C++ programmer, I fear that my code would only draw criticism and be unusable by anyone else working towards the same goal. For that reason, I will hold off until I get a more advanced grasp on the language, and have a better codebase to share.

However, if anyone would like to collaborate, and knows some C++ and has no quarrels working with someone who is learning as they go, then please feel free to contact me! I would love some help on River, or to share ideas and resources if you are working on something similar.

Monday, January 10, 2011

Project River Speaks Gibberish!

Success! After many hours of work, I have succeeded in integrating the Rebecca AIML chatbot engine into Project River. I can now carry on conversations with River, which already makes her feel significantly more useful and complete.

Because the speech recognition is so buggy (I need to develop a good system for training it), some funny conversations have already taken place. She constantly mishears what I say, so most of our conversations jump from topic to topic even more than the typical conversation with a chatbot would.

The most interesting conversation I have had so far (entirely by accident) went as follows:


Me: Who is the President of the United States?
River: Barack Obama
Me: Is he a good president? (she heard: Is he a woman?)
River: No.


This was an eerie situation because it made it seem as if River had her own opinions about the state of the nation without me having taught her any.


I'm using the voice Bridget from NeoSpeech to give River her voice. As a result, she sounds incredibly lifelike -- the British accent helps her sound more natural. I've noticed that even when my conversations with River are completely ridiculous, it is still believable that she is intelligent, because she speaks so naturally. From this I conclude that using a believable voice interface greatly increases the believability of any artificial intelligence system. I also believe that the absence of any animation or physical form helps increase believability -- there's no "uncanny valley" effect taking place, nor is there any poorly animated avatar to detract from the immersion.



Next Steps
---------------

The next major task I will focus on will be to improve the speech recognition by programming in a training method. If River doesn't hear me properly, I will tell her "That's not what I said.", which will then prompt her to ask me to type exactly what I said -- she will save both items, and use them to train herself to hear better in the background.

The next small task I'll focus on will be to implement a web crawler class. This will let me start to implement functionality such as looking up the release date of a movie or what television shows are airing on a given day.

Sunday, January 9, 2011

Project River Tutorials Index

I realized a few moments ago that one of the tasks I'm spending the most time on is simply finding tutorials and documentation on the various features and APIs that I'm wanting to use in C++.

In order to help others in similar quests, and to provide a lasting resource for me to revisit, I'm going to use this post as an ongoing record of the best tutorials I've found for a wide variety of C++ topics. The list will likely be short at first, but I'll add to it every time I find a new one that I recommend.


Microsoft Speech API 5.4
-----------------------------------
Speech API Overview (SAPI 5.4) - This provides an excellent overview of the SAPI 5.4 API. Definitely a great place to start if you're just jumping into SAPI.

Text-to-Speech Tutorial (SAPI 5.4) - The absolute best "Hello World" tutorial out there for SAPI in C++. I wish I had found this several days ago when I was trying to learn! This is an official Microsoft tutorial.



Sending Key Presses in C++
----------------------------------
SendKeys in C++ - This is more of a resource / C++ class wrapper, but it explains how to use it as well. Doing keypresses in C++ without this is a nightmare. Many thanks to the creator of this (Elias Bachaalany). Without this I would have been lost in the dark.


Connecting to SQL Server 2008 from C++ (using ODBC)
------------------------------------------
Tidy Tutorials - This tutorial is fantastic. Should tell you everything you need to know. One addition to this: make sure your Visual Studio project is set to Multi-Byte. If it's set to Unicode you'll have all kinds of errors. Took me forever to figure that out.


Chatbot Programming in C++
------------------------------------------
Artificial Intelligence Chatbot Tutorial - While I didn't use this tutorial as my code base for my Chatbot, it is a fantastic introduction to how Chatbots work and it really builds your understanding well.


Qt for Visual Studio 2010
------------------------------------------
Installing QT 4.7.x for VS2010 - It doesn't look like I'll end up using Qt just yet (I'm using libcurl instead to accomplish what I need), but this is the only decent tutorial I've found for the advanced methods needed to get Qt to work with Visual Studio 2010. (HEADS UP: The "nmake" command took about 90 minutes to complete on my 64-bit 2.8 GHz dual-core machine w/ 4 GB RAM)


libcURL for Visual Studio 2010
------------------------------------------
Using libcurl with SSH support in Visual Studio 2008 - I spent HOURS trying to figure out how to get libcURL to play nice with Windows. Ironically in the end, I found the tutorial via a link off of the curl website. That'll teach me to trust Google over the actual webpage.

Saturday, January 8, 2011

Paddling Upstream: Project River Update

Status Update:
River can now help control both Pandora radio and the Google Chrome internet browser. It's very jury-rigged and while it works, it's quickly taught me about the areas where I need to focus. River can also connect to a SQL database and easily make use of the C++ regular expression engine.

Immediate Areas of Improvement:
Voice Recognition is the most obvious part of River that needs improvement. While she recognizes what I say with about 75% accuracy so far (which I'm happy with), the current implementation prevents me from using a whole slew of keywords. For the Google Chrome integration I wanted to be able to say "New Tab", and have a new tab open. But the way I'm using it right now, Windows has the "tab" keyword reserved for the Tab keyboard button, so it doesn't work. I believe there's a way around this, and that's my #1 focus.

Interacting with other applications also needs to improve. I'm currently just sending keystrokes, which is fine and probably what I'll have to do for most applications, but I need to figure out how to send keystrokes to specific applications, rather than having to switch to them all the time.



Chatbot Updates:
I'm going to start experimenting with the Rebecca AIML engine, as I believe it'll be a great launching point for me.


Most Recent Idea:
This "Spokeo.com" message thing going around on Facebook just gave me an idea. While the thing is far from being 100% accurate, it would be a great tool for my Project River AI to use to learn about people. When someone new comes over, the AI could do facial recognition, ask their name, and ask where they live. Then she'd be able to find them on Spokeo, and quickly be able to ask them about their family members and hobbies.


New Resources I've Found and Intend on Using:

1) Home Automation: Z-Wave protocol and Vera 2: http://micasaverde.com/vera.php
2) Home Automation: Arduino: http://www.arduino.cc/
3) Chatbot: Rebecca AIML: http://rebecca-aiml.sourceforge.net/
4) Chatbot: WordNet: http://wordnet.princeton.edu/ and http://goo.gl/i9QVB
5) Pandora Control: Pianobar: http://goo.gl/Iw9Td

Friday, January 7, 2011

Commencing Project River

A few days ago I embarked on a new project, something that's long been a dream of mine.

Project River (as I've dubbed "her" for now), is intended to eventually be a complete AI Smart House system, controlling everything from appliances to music to recording television shows and ordering Chinese food.

However, while that's the end goal, I've decided to take very small baby steps for now. I'm programming her in C++, and every feature I add requires me to learn about a new library or function. Thanks to many hours of reading articles on the Internet, I've already made a little bit of progress.

What River can do so far:
1. Speak text aloud.
2. Respond to vocal commands such as "Turn the music up"
3. Access a SQL database
4. Control Pandora by sending keystroke commands to Windows.


Yup, when I said "baby steps", I meant it! But she's coming along pretty quickly in my eyes, considering that she's already useful. It's very convenient to be able to say "I love this song", and have Project River say "I like this song too", and give the song a "thumbs up" in Pandora.
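One way to picture the phrase-to-action step described above is a simple lookup table. This is only a sketch under stated assumptions, not how River is actually built: the `Action` struct, `commandTable`, and `findAction` are hypothetical names, and the key strings ("+" and "volume-up") are placeholder labels rather than real keystroke codes for Pandora or Windows.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// What River should do for one recognized phrase.
struct Action {
    std::string reply;  // spoken back to the user (may be empty)
    std::string keys;   // placeholder label for the keystrokes to send
};

// Hypothetical phrase table; phrases are stored in lowercase.
const std::map<std::string, Action>& commandTable() {
    static const std::map<std::string, Action> table = {
        {"i love this song", {"I like this song too", "+"}},
        {"turn the music up", {"", "volume-up"}},
    };
    return table;
}

// Lowercases ASCII letters so lookups ignore capitalization.
std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Returns the action for a recognized phrase, or nullptr if it is unknown.
const Action* findAction(const std::string& phrase) {
    const auto it = commandTable().find(toLower(phrase));
    if (it == commandTable().end()) {
        return nullptr;
    }
    assert(!it->second.keys.empty());  // every table entry must send something
    return &it->second;
}
```

Calling `findAction("I love this song")` yields the reply "I like this song too" plus the placeholder key to send, while an unrecognized phrase returns `nullptr` so the caller can ignore it or hand it off elsewhere.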


So far I've found two great communities that I plan on tapping for ideas and knowledge throughout what looks to be a very long development cycle. Project Leaf, while written in Lisp, has a bunch of great ideas built into it, and is home to a number of very smart robotics amateurs. I hope to contribute to their project in the future. The other great resource I've already found useful is Chatbots.org. Many very smart Artificial Intelligence hobbyists hang out here, and have already given me a handful of great ideas and tools to use.


I intend to make daily or semi-daily updates as to the status of the project.

I realize this blog post is somewhat short compared to most blogs, but alas, I am a man of few words. I figure it's better to start making short blog posts, rather than never make any because the thought of writing a long one stresses me out.


Until next time... Take me out into the black, tell them I ain't coming back.