Python / SOAP: A First Encounter

Thursday, February 9, 2006 by Scott Karlin

This entry describes my recent foray into creating a Simple Object Access Protocol (SOAP) client. Perhaps some will find the information useful and others can provide some helpful hints in the comments.

The web service I wanted to access exports a method that returns an array of strings, each with a particular structure (this will be important later). The details of this service are not important, so let’s call the method getList. Since this client would be run from a Linux command line and I wanted to brush up on my Python programming skills, I went to work looking for an appropriate SOAP library for Python. A search on Google led me to two primary choices, SOAPpy and ZSI, both part of the Python Web Services project. SOAPpy looked to be the easier to use and so I installed SOAPpy version 0.12.0 and started with this bit of code:

from SOAPpy import WSDL
server = WSDL.Proxy("http://blah.blah.blah/blah?wsdl")

server.soapproxy.config.dumpSOAPOut = 1
server.soapproxy.config.dumpSOAPIn = 1

result = server.getList("param1", "param2")

# Process result...

From everything I read, this should have worked. Unfortunately, I was greeted with a “duplicate attribute” exception from the depths of the xml.sax parser:

Traceback (most recent call last):
  File "soaptest.py", line 7, in ?
    result = server.getList("param1", "param2")
  File "/usr/lib/python2.4/site-packages/SOAPpy/Client.py",
  line 470, in __call__
    return self.__r_call(*args, **kw)
  File "/usr/lib/python2.4/site-packages/SOAPpy/Client.py",
  line 492, in __r_call
    self.__hd, self.__ma)
  File "/usr/lib/python2.4/site-packages/SOAPpy/Client.py",
  line 395, in __call
    p, attrs = parseSOAPRPC(r, attrs = 1)
  File "/usr/lib/python2.4/site-packages/SOAPpy/Parser.py",
  line 1049, in parseSOAPRPC
    t = _parseSOAP(xml_str, rules = rules)
  File "/usr/lib/python2.4/site-packages/SOAPpy/Parser.py",
  line 1032, in _parseSOAP
    raise e
xml.sax._exceptions.SAXParseException: :1:514: duplicate attribute

Armed with this traceback and the fact that the WSDL file (when viewed with a browser) indicated that the server is “Apache Axis version 1.2.1,” I searched the web and found these bug reports:

With the SOAP debugging turned on, I could see that the server was returning the expected list of items—the XML parser was simply preventing me from getting at them. As it turned out, there was indeed a duplicated xsi:type="soapenc:Array" attribute in the returned SOAP message. The bug reports above imply that this issue is fixed in Axis 1.3; however, since the particular Axis SOAP servlet I needed to use was part of a bundled system (that I didn’t have control over), upgrading Axis was not an option.
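
The parser behaviour is easy to reproduce in isolation. The following is a minimal sketch, written for a current Python 3 interpreter rather than the Python 2.4 used above. The sample document and the helper name parse_error are made up for illustration; this is not the actual server response.

```python
import xml.sax

# Hypothetical stand-in for the server's reply: the same attribute appears
# twice on one element, which means the document is not well-formed XML.
BAD_XML = (
    b'<list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    b'xsi:type="soapenc:Array" xsi:type="soapenc:Array"/>'
)


def parse_error(data):
    """Return the parser's error message for data, or None if it parses."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


print(parse_error(BAD_XML))
```

Because the document is rejected as a whole, none of the well-formed content around the offending element is delivered to the caller, which matches what I saw with SOAPpy.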

The Workaround

The workaround (actually a “hack”) that I used was to temporarily redirect the SOAP debugging output sent to stdout to an internal buffer. From there, I could leverage the fact that the returned strings had a nice structure to extract them from the buffer using a regular expression:

from SOAPpy import WSDL
import xml.sax
import sys
import re

class Sniff:
   def __init__(self):
      self.b = ""

   def write(self, s):
      self.b = self.b + s

   def flush(self):
      pass

   def buffer(self):
      return self.b

server = WSDL.Proxy("http://blah.blah.blah/blah?wsdl")

server.soapproxy.config.dumpSOAPOut = 1
server.soapproxy.config.dumpSOAPIn = 1

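# Capture the SOAP debug dump that SOAPpy prints to stdout.
sniff      = Sniff()
sys.stdout = sniff
try:
   try:
      result = server.getList("param1", "param2")
   except xml.sax.SAXParseException, e:
      pass   # expected: the reply contains a duplicate attribute
finally:
   sys.stdout = sys.__stdout__   # always restore the real stdout

# Pull the four fields of each "a:b,c:d" string out of the raw dump.
c = re.compile(r">([a-z0-9]+):([^,:< ]+),([^:<]+):([^<]+)<")
result = re.findall(c, sniff.buffer())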

# Process result...
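
For readers on a current Python 3 interpreter, the same capture-and-scrape idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: fake_get_list is a hypothetical stand-in for the real server.getList call, the sample strings are invented to fit the regular expression, and ValueError stands in for the parser exception.

```python
import contextlib
import io
import re

# Same pattern as in the script above: "a:b,c:d" text between two tags.
ITEM = re.compile(r">([a-z0-9]+):([^,:< ]+),([^:<]+):([^<]+)<")


def fake_get_list():
    """Hypothetical stand-in for server.getList(): dump a reply, then fail."""
    print("<item>id42:alpha,kind:widget</item><item>id43:beta,kind:gadget</item>")
    raise ValueError("duplicate attribute")


def sniff(call):
    """Run call() with stdout captured and return whatever it printed."""
    buf = io.StringIO()
    # redirect_stdout puts the real stdout back even if call() raises.
    with contextlib.redirect_stdout(buf):
        try:
            call()
        except ValueError:
            pass  # the parse failure is expected; the dump is what we want
    return buf.getvalue()


items = ITEM.findall(sniff(fake_get_list))
print(items)
```

Each element of items is a tuple of the four captured fields. The approach is exactly as brittle as the original: it depends on the debug dump format and on the strings keeping their structure.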

I should mention that before I resorted to this hack, I did try using ZSI and even cSOAP (a C library for SOAP). ZSI gave a richer set of controls over the SOAP protocol; however, with so many knobs to fiddle with, I could only elicit from the server a 500 Internal Server Error response at the HTTP layer with a corresponding Server.userException response at the SOAP layer. Even if I could get the server to accept the outgoing SOAP message, I had no assurance that I would avoid the parsing problem I hit with the SOAPpy implementation. With cSOAP, I went straight to my C programming roots. Unfortunately, the installation documentation was a little sparse and I didn’t pursue this very far before the above hack came to mind.

Of course, I would rather not have had to resort to the workaround. My current implementation is more brittle than it should be. For a web service this simple, I would have preferred XML-RPC.

Javascript Lurking in Adobe Reader

Tuesday, January 31, 2006 by Scott Karlin

I’ve been using Adobe Reader 7.0 for Linux for a while now. When I first installed it, I went through the Preferences settings to see what new and interesting features were available. One thing that caught my eye was a setting to enable/disable Acrobat Javascript. Since I couldn’t think of a reason why I would want it enabled, I made sure I disabled it. One of the annoyances of my action is that every time I view a document, I get this request window:

Request Window Grab: Do you want to enable JavaScripts from now on?

No matter how many times I run the program and click “No” the program asks me again. Knowing that I don’t need code to run just to view a PDF, I keep on clicking “No.” It turns out that my instincts were right.

DocBug writes that this represents a privacy hole and that PDF files can be crafted to make an Internet connection to their mother ship whenever the file is opened. This appears to be the business model for Remote Approach: they provide a service where companies who publish PDF documents can determine who is reading a document and how the document travels through the Internet. The Remote Approach website includes a white paper titled Remote Approach and User Privacy. The interesting thing is that they seem to collect a fair amount of information. They claim that none of it is personally identifiable; however, as more bits of data accumulate in databases, the IP address of the computer opening the PDF will likely become personally identifiable, as Jason Hurley points out. For now, I’ll keep clicking “no” and start reconsidering the other (open source) PDF readers for Linux.

Thanks to Ed Felten for the DocBug link.

OS War Over Laptop Program

Monday, January 30, 2006 by Scott Karlin

A recent New York Times article reports that Microsoft has unveiled a mock-up for a “cellphone PC” that would be used to bring laptop computing to the people of developing countries. While this is a great plan, it is clear that it is in response to the brouhaha between Microsoft and Nicholas Negroponte, chairman of the One Laptop per Child (OLPC) initiative, who failed to reach an agreement to use Windows software for these laptop computers. The OLPC initiative will be using the Linux operating system. According to the article, OLPC is using Linux because of its quality and maintainability. Negroponte is reported as saying, “I chose open-source because it’s better. I have 100 million programmers I can rely on.”

Whether or not Microsoft does produce a cellphone PC that is aimed at the same audience as the laptops, this development is probably a good thing for the OLPC initiative. First, it’s free advertising for the OLPC initiative. Second, competition from Microsoft only enhances the legitimacy of the initiative.

Mapping the Internet

Saturday, January 21, 2006 by Scott Karlin

The other day, I attended a talk by Bill “Ches” Cheswick of Lumeta at the Princeton Joint Chapter of the ACM and IEEE Computer Society. If you haven’t seen him give a talk and you have the opportunity to do so, I recommend it. His style is quite entertaining. He co-authored the book Firewalls and Internet Security with Steven Bellovin and Avi Rubin.

He began the talk by discussing Internet security and then followed with a discussion of his Internet mapping work that eventually led to the creation of the Lucent/Bell Labs spin-off company, Lumeta.

For the Internet security portion of the talk, Ches focused on perimeter defense. He showed lots of fun pictures of examples of perimeter defense: everything from cell membranes to castle moats to The Great Wall of China. He also showed real-life failure modes, including breached levees in New Orleans and a castle that had a large portion blown out when the enemy got in and ignited the stored gunpowder.

His point, of course, is that a network connection can be thought of as a perimeter in need of a defense. These days, it is prudent to have multiple layers of defense. For example, a home network can be protected from the Internet with a dedicated firewall device and each computer on that home network can be running its own firewall software or be otherwise locked down. Are two layers overkill for a home network? Probably not—especially if you have “users” on your network who download (right through that firewall device) programs of dubious lineage. Once malicious software is running inside your network, it’s every computer for itself.

Now, it is certainly possible to protect a computer with only one layer of defense. In fact, if you think about it, the firewall device itself is a computer with only one layer of defense. This is okay because the folks who build the firewall spend a lot of time making sure it works correctly. If you’re a bit of a thrill-seeker, you can connect your PC directly to the Internet without a dedicated firewall. Bill Cheswick calls this “skinny-dipping on the Internet” because it’s exciting, dangerous, and (for the right crowd) fun. He admitted to skinny-dipping with NetBSD and Linux based machines. I’ll save the details of locking down a Linux machine for another time.

The Internet mapping portion of the talk covered data acquisition, map generation, and interpretation. Data acquisition is fairly straightforward: lightweight traceroute-like packets are trickled out to destinations on the Internet and the return (or lack of return) is noted. Since Ches has been collecting data daily since 1998, he had some interesting stories about people complaining/inquiring about these probes. My favorite one-liner from this portion of the talk was: “if you want to be a stealthy mapper, pretend you’re an infected machine.” These days, there are enough virus-infected machines generating background noise that a mapper may be able to hide in the crowd.

As Ches explained, the easy part is collecting the data; “the hardest part is converting the data to information.” The fun begins when you try to visualize it. If you haven’t seen his maps, you can find some here. The basic technique is to start adding nodes to the page and have the nodes that are directly connected attract one another and have the nodes that are not directly connected repel one another. I’m sure there is more to it than that; for example, I’d guess that at close range, all nodes repel one another. It sounds as if knowing how to twiddle the attract/repel knobs to produce pretty pictures must be a bit of an art.
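
To get a feel for the attract/repel idea, here is a small sketch of a generic force-directed layout. It is not the algorithm behind the Lumeta maps, whose details I don’t know. In this version every pair of nodes repels (stronger at close range, following my guess above) and directly connected pairs are also pulled together by a spring. The constants, the function names, and the four-node test graph are all assumptions chosen for illustration.

```python
import math
import random

ATTRACT = 1.0   # spring strength between directly connected nodes (assumed)
REPEL = 0.1     # repulsion strength between every pair of nodes (assumed)
STEP = 0.05     # how far a node moves per unit of force (assumed)
MAX_MOVE = 0.1  # cap on movement per iteration, to keep the layout stable


def layout(nodes, edges, steps=500, seed=0):
    """Return {node: (x, y)} after a simple force-directed relaxation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    linked = {frozenset(e) for e in edges}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Positive pull draws a and b together; negative pushes apart.
                pull = -REPEL / dist
                if frozenset((a, b)) in linked:
                    pull += ATTRACT * dist
                fx, fy = pull * dx / dist, pull * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for n in nodes:
            mx, my = STEP * force[n][0], STEP * force[n][1]
            size = math.hypot(mx, my)
            if size > MAX_MOVE:
                mx, my = mx * MAX_MOVE / size, my * MAX_MOVE / size
            pos[n][0] += mx
            pos[n][1] += my
    return {n: tuple(p) for n, p in pos.items()}


def distance(pos, a, b):
    """Euclidean distance between two laid-out nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# A four-node chain: a-b-c-d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
pos = layout(nodes, edges)
print(distance(pos, "a", "b"), distance(pos, "a", "d"))
```

On the chain, neighbours should settle close together while the two ends drift apart. Changing ATTRACT and REPEL changes the picture noticeably, which is presumably where the art comes in.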

Other than pretty art, the utility of the maps is in their interpretation. Two of the examples he gave were (1) remote assessment of bomb damage and (2) what you can learn when you map your own network.

In the first case, he showed a video of Internet maps of Yugoslavia in May 1999. It was very obvious when communication lines and/or power were affected as large parts of the map would change significantly. It’s interesting to compare these maps with a timeline of the war from NPR.

In the second case, he described the benefit of turning the mapper inward and scanning one’s own network. By showing where probes spill from an intranet to the Internet, one can identify firewalls and gateway routers (normal stuff) as well as other router leaks—machines routing packets when they shouldn’t (at best, a forgotten router; at worst, a hacked machine). He wrapped up by describing host leaks and how to detect them. A host leak is a machine that does not route packets yet is simultaneously connected both inside and outside a firewall (when it should only be connected to one side). These machines represent a latent security fault; break into that machine and you create a router leak that bypasses the firewall. To detect a host leak, he explained that one can send a packet to the target host with a spoofed source address. If the initiating host is on one side of the firewall and the spoofed address points to a host on the other side of the firewall, any replies received by the spoofed machine indicate that the target host is connected to both sides of the firewall.
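
The logic of the host-leak test can be captured in a toy model. This sketch sends no packets and is not Lumeta’s tool. It assumes a firewall that drops everything trying to cross it, and it assumes that a target attached to both sides answers through whichever interface faces the spoofed address. All names here are hypothetical.

```python
from dataclasses import dataclass

INSIDE, OUTSIDE = "inside", "outside"


@dataclass(frozen=True)
class Host:
    name: str
    sides: frozenset  # which sides of the firewall this host is attached to


def reply_reaches_spoofed_host(target, prober_side, spoofed_side):
    """Toy model of one probe carrying a spoofed source address.

    The probe starts on prober_side; its source address belongs to a
    host on spoofed_side. Nothing is allowed through the firewall.
    """
    probe_arrives = prober_side in target.sides
    reply_arrives = spoofed_side in target.sides
    return probe_arrives and reply_arrives


def is_host_leak(target):
    """Probe from the inside while claiming an outside source address."""
    return reply_reaches_spoofed_host(target, INSIDE, OUTSIDE)


workstation = Host("workstation", frozenset({INSIDE}))
dual_homed = Host("dual-homed", frozenset({INSIDE, OUTSIDE}))
print(is_host_leak(workstation), is_host_leak(dual_homed))
```

In this model a reply shows up at the spoofed address only when the target has a foot on each side, which is the signal Ches described.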

Ches wrapped up the talk with a Q&A session and gave away 4–5 of his Internet maps. Sadly, I did not get one—maybe next time.

Privacy Implications for iTunes Users

Friday, January 13, 2006 by Scott Karlin

PC Magazine provides some insight into the traffic that is sent back to Apple when users listen to songs in their iTunes collection. This is apparently due to a new “feature” called “MiniStore,” which allows Apple to place advertisements in your iTunes window. In a follow-up posting, the author, Oliver Kaven, comments on how this is different from the accepted web practice of showing advertisements based on your browsing history while visiting an online store:

I think one significant difference between Barnes&Noble or Amazon and iTunes is the fact that the iTunes provides the additional services even though you are not shopping for songs at the time.

To exaggerate a little, imagine Barnes&Noble calling you after you picked up a book on Italian cuisine off your nightstand, telling you that they also have books on French cooking available.

In his weblog, Marc Garrett asks this question:

Why shouldn’t the MiniStore feature be opt-in instead of opt-out?

From a privacy point of view, opt-in is a better default. Even if no personal information is retained at Apple, they could change this policy in the future. From a commerce point of view, opt-out is better. Apple wants to have as many folks see their advertisements as possible. Of course, if Apple alienates their users…

Credit Card Information Found in UK Hotel Dumpster

Thursday, January 12, 2006 by Scott Karlin

The BBC reports that a passer-by discovered hotel registration cards in a skip (dumpster) from the Grand Hotel, Brighton, UK. The cards contained signatures, credit card numbers, and contact information. This news item caught my eye as I happened to be there at last October’s SOSP’05 conference. Fortunately for me, I stayed in the hotel next door and didn’t have occasion to use my credit card at the Grand Hotel.

Weblog is Back Online

Wednesday, January 11, 2006 by Scott Karlin

Since there were no postings of consequence (yet!), folks probably didn’t even notice that this weblog was down for a while. My upgrade to WordPress 2.0 went smoothly. The trouble started when I was having a “Monk moment” and started to fiddle with the backend database so that I would eventually reuse the ID numbers for the test posts I deleted. Naturally, things broke. Fortunately, I came to my senses and decided that it was okay to not use all the available IDs; there are plenty of integers out there. Even more fortunately, I was able to quickly restore the database using the backup I made as part of the WordPress 2.0 upgrade.

