Python Development: Performance Optimization

润物无声 — Sat, 31 Aug 2013 02:45:21 +0000

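Whatever the language, development eventually runs into the same questions: how long the code takes to run, where the performance bottlenecks are, how much memory is used, and whether memory is leaking. Python is no exception. This article introduces several excellent Python tools that can detect these problems by tracing the code dynamically as it runs, so that the trouble spots can be found.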

While it’s not always the case that every Python program you write will require a rigorous performance analysis, it is reassuring to know that there are a wide variety of tools in Python’s ecosystem that one can turn to when the time arises.

Analyzing a program’s performance boils down to answering 4 basic questions:

How fast is it running?
Where are the speed bottlenecks?
How much memory is it using?
Where is memory leaking?

Below, we’ll dive into the details of answering these questions using some awesome tools.

Coarse-grained timing with time (code execution time)

Let’s begin with a quick and dirty method of timing our code: the good old Unix utility time.

$ time python yourprogram.py

real    0m1.028s
user    0m0.001s
sys     0m0.003s

The meanings of the three output measurements are detailed in this Stack Overflow article, but in short:

real - refers to the actual elapsed time
user - refers to the amount of CPU time spent outside of the kernel
sys - refers to the amount of CPU time spent inside kernel-specific functions

You can get a sense of how much CPU time your program used, regardless of other programs running on the system, by adding together the sys and user times.

If the sum of sys and user times is much less than real time, then you can guess that most of your program’s performance issues are related to IO waits.
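The same comparison can be made from inside a script. The sketch below is only an illustration and assumes Python 3.3 or later, where time.perf_counter() gives wall-clock time (comparable to real) and time.process_time() gives the process's user plus sys CPU time. The helper names measure and simulated_io_wait are made up for this example.

```python
import time


def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock time, comparable to "real"
    cpu_start = time.process_time()    # user + sys CPU time of this process
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def simulated_io_wait():
    # time.sleep() stands in for a blocking read or network call
    time.sleep(0.2)


wall, cpu = measure(simulated_io_wait)
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

When the wall-clock figure is much larger than the CPU figure, as it is for the sleeping function here, the time is going into waiting rather than computing.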

Fine-grained timing with a timing context manager

Our next technique involves direct instrumentation of the code to get access to finer grain timing information. Here’s a small snippet I’ve found invaluable for making ad-hoc timing measurements:

timer.py

import time

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000  # millisecs
        if self.verbose:
            print('elapsed time: %f ms' % self.msecs)

In order to use it, wrap blocks of code that you want to time with Python’s with keyword and this Timer context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.

Here’s an example use of the snippet:

from timer import Timer
from redis import Redis
rdb = Redis()

with Timer() as t:
    rdb.lpush("foo", "bar")
print("=> elapsed lpush: %s s" % t.secs)

with Timer() as t:
    rdb.lpop("foo")
print("=> elapsed lpop: %s s" % t.secs)

I’ll often log the outputs of these timers to a file in order to see how my program’s performance evolves over time.

Line-by-line timing and execution frequency with a profiler (per-line run time analysis)

Robert Kern has a nice project called line_profiler which I often use to see how fast and how often each line of code is running in my scripts.

To use it, you’ll need to install the python package via pip:

$ pip install line_profiler

Once installed you’ll have access to a new module called “line_profiler” as well as an executable script “kernprof.py”.

To use this tool, first modify your source code by decorating the function you want to measure with the @profile decorator. Don’t worry, you don’t have to import anything in order to use this decorator. The kernprof.py script automatically injects it into your script’s runtime during execution.

primes.py

@profile
def primes(n): 
    if n==2:
        return [2]
    elif n<2:
        return []
    s=range(3,n+1,2)
    mroot = n ** 0.5
    half=(n+1)/2-1
    i=0
    m=3
    while m <= mroot:
        if s[i]:
            j=(m*m-3)/2
            s[j]=0
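            # The source text is cut off after "while j"; the rest of this
            # function is a reconstruction of the standard odd-number sieve.
            while j<half:
                s[j]=0
                j+=m
        i=i+1
        m=2*i+3
    return [2]+[x for x in s if x]

# Assumed call: n=100 is consistent with the hit counts in the profile output
primes(100)
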
Once you’ve set up your code with the @profile decorator, use kernprof.py to run your script.

$ kernprof.py -l -v primes.py

The -l option tells kernprof to inject the @profile decorator into your script’s builtins, and -v tells kernprof to display timing information once your script finishes. Here’s what the output should look like for the above script:

Wrote profile results to primes.py.lprof
Timer unit: 1e-06 s

File: primes.py
Function: primes at line 2
Total time: 0.00019 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           @profile
     3                                           def primes(n): 
     4         1            2      2.0      1.1      if n==2:
     5                                                   return [2]
     6         1            1      1.0      0.5      elif n<2:
     7                                                   return []
     8         1            4      4.0      2.1      s=range(3,n+1,2)
     9         1           10     10.0      5.3      mroot = n ** 0.5
    10         1            2      2.0      1.1      half=(n+1)/2-1
    11         1            1      1.0      0.5      i=0
    12         1            1      1.0      0.5      m=3
    13         5            7      1.4      3.7      while m <= mroot:
    14         4            4      1.0      2.1          if s[i]:
    15         3            4      1.3      2.1              j=(m*m-3)/2
    16         3            4      1.3      2.1              s[j]=0
    17        31           31      1.0     16.3              while j

Look for lines with a high number of hits or a high time interval. These are the areas where optimizations can yield the greatest improvements.

How much memory does it use? (memory usage)

Now that we have a good grasp on timing our code, let’s move on to figuring out how much memory our programs are using. Fortunately for us, Fabian Pedregosa has implemented a nice memory profiler modeled after Robert Kern’s line_profiler.

First install it via pip:

$ pip install -U memory_profiler
$ pip install psutil

(Installing the psutil package here is recommended because it greatly improves the performance of the memory_profiler.)

Like line_profiler, memory_profiler requires that you decorate your function of interest with a @profile decorator like so:

@profile
def primes(n): 
    ...
    ...

To see how much memory your function uses, run the following:

$ python -m memory_profiler primes.py

You should see output that looks like this once your program exits:

Filename: primes.py

Line #    Mem usage  Increment   Line Contents
==============================================
     2                           @profile
     3    7.9219 MB  0.0000 MB   def primes(n): 
     4    7.9219 MB  0.0000 MB       if n==2:
     5                                   return [2]
     6    7.9219 MB  0.0000 MB       elif n<2:
     7                                   return []
     8    7.9219 MB  0.0000 MB       s=range(3,n+1,2)
     9    7.9258 MB  0.0039 MB       mroot = n ** 0.5
    10    7.9258 MB  0.0000 MB       half=(n+1)/2-1
    11    7.9258 MB  0.0000 MB       i=0
    12    7.9258 MB  0.0000 MB       m=3
    13    7.9297 MB  0.0039 MB       while m <= mroot:
    14    7.9297 MB  0.0000 MB           if s[i]:
    15    7.9297 MB  0.0000 MB               j=(m*m-3)/2
    16    7.9258 MB -0.0039 MB               s[j]=0
    17    7.9297 MB  0.0039 MB               while j

Where’s the memory leak? (memory leaks)

The CPython interpreter uses reference counting as its main method of keeping track of memory. This means that every object contains a counter, which is incremented when a reference to the object is stored somewhere, and decremented when a reference to it is deleted. When the counter reaches zero, the CPython interpreter knows that the object is no longer in use, so it deletes the object and deallocates the occupied memory.

A memory leak can often occur in your program if references to objects are held even though the object is no longer in use.

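The counter can be watched directly with sys.getrefcount(). The sketch below is an illustration for CPython 3 only; the Payload class and the variable names are invented for the example, and the exact numbers reported by getrefcount() are an implementation detail, so only the differences are meaningful.

```python
import sys
import weakref


class Payload(object):
    """Stand-in for any object whose lifetime we want to observe."""


obj = Payload()
baseline = sys.getrefcount(obj)   # includes a temporary reference held by the call

holder = [obj]                    # storing a reference increments the counter
after_store = sys.getrefcount(obj)

del holder                        # dropping it decrements the counter again
after_delete = sys.getrefcount(obj)

probe = weakref.ref(obj)          # a weak reference does not keep obj alive
cache = [obj]                     # a forgotten long-lived container: the typical "leak"
del obj
still_alive = probe() is not None   # the cache still holds a reference

cache.pop()                       # release the last strong reference
freed = probe() is None           # on CPython the object is deallocated right away
```

The cache list plays the role of the forgotten reference: as long as it holds the object, deleting the name obj frees nothing.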
The quickest way to find these “memory leaks” is to use an awesome tool called objgraph, written by Marius Gedminas. This tool allows you to see the number of objects in memory and also locate all the different places in your code that hold references to these objects.

To get started, first install objgraph:

$ pip install objgraph

Once you have this tool installed, insert into your code a statement to invoke the debugger:

import pdb; pdb.set_trace()

Which objects are the most common?

At run time, you can inspect the most prevalent object types in your program by running:

(pdb) import objgraph
(pdb) objgraph.show_most_common_types()

MyBigFatObject             20000
tuple                      16938
function                   4310
dict                       2790
wrapper_descriptor         1181
builtin_function_or_method 934
weakref                    764
list                       634
method_descriptor          507
getset_descriptor          451
type                       439

Which objects have been added or deleted?

We can also see which objects have been added or deleted between two points in time:

(pdb) import objgraph
(pdb) objgraph.show_growth()
.
.
.
(pdb) objgraph.show_growth()   # this only shows objects that have been added or deleted since the last show_growth() call

traceback                4        +2
KeyboardInterrupt        1        +1
frame                   24        +1
list                   667        +1
tuple                16969        +1
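
To get a feel for what these two reports measure, here is a rough approximation that uses only the standard library's gc module. It is a sketch for Python 3 and not objgraph's actual implementation: type_counts and growth are hypothetical helpers, and only objects tracked by the garbage collector are counted.

```python
import gc
from collections import Counter


def type_counts():
    """Count GC-tracked objects by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after):
    """Return {type name: change in count} for types whose count changed."""
    names = set(before) | set(after)
    return {n: after[n] - before[n] for n in names if after[n] != before[n]}


class MyBigFatObject(object):
    pass


before = type_counts()
leaked = [MyBigFatObject() for _ in range(1000)]   # simulated leak
after = type_counts()

changes = growth(before, after)
for name, delta in sorted(changes.items(), key=lambda kv: -kv[1])[:5]:
    print("%-20s %+d" % (name, delta))
```

Taking one snapshot before and one after a suspect operation, then diffing them, is the same idea as calling show_growth() twice.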

What is referencing this leaky object? (reference counting)

Continuing down this route, we can also see where references to any given object are being held. Let’s take as an example the simple program below:

x = [1]
y = [x, [x], {"a":x}]
import pdb; pdb.set_trace()

To see what is holding a reference to the variable x, run the objgraph.show_backrefs() function:

(pdb) import objgraph
(pdb) objgraph.show_backrefs([x], filename="/tmp/backrefs.png")

The output of that command should be a PNG image stored at /tmp/backrefs.png and it should look something like this:

The box at the bottom with red lettering is our object of interest. We can see that it’s referenced by the symbol x once and by the list y three times. If x is the object causing a memory leak, we can use this method to see why it’s not automatically being deallocated by tracking down all of its references.
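
When a picture isn't needed, the same question can be asked in text form with gc.get_referrers() from the standard library. This is a small sketch for Python 3 that mirrors the example program; the names inner and mapping are added here only so that each referrer can be identified.

```python
import gc

x = [1]
inner = [x]
mapping = {"a": x}
y = [x, inner, mapping]

# Every GC-tracked container that holds a direct reference to x
referrers = gc.get_referrers(x)

for ref in referrers:
    print(type(ref).__name__)
```

The list includes y, inner and mapping, plus the namespace dictionary that binds the name x. Walking outward from each referrer in the same way is how a full back-reference graph is built up.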

So to review, objgraph allows us to:

show the top N objects occupying our python program’s memory
show what objects have been deleted or added over a period of time
show all references to a given object in our script

Effort vs precision

In this post, I’ve shown you how to use several tools to analyze a Python program’s performance. Armed with these tools and techniques you should have all the information required to track down most memory leaks as well as identify speed bottlenecks in a Python program.

As with many other topics, running a performance analysis means balancing the tradeoffs between effort and precision. When in doubt, implement the simplest solution that will suit your current needs.

References

stack overflow - time explained
line_profiler
memory_profiler
objgraph

Reference article: http://www.huyng.com/posts/python-performance-analysis/
Python Development: Performance Optimization was first published on 润物无声.

Building a Python-Based Video Server

润物无声 — Sat, 24 Aug 2013 02:57:27 +0000

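This article builds a video server on a Python backend that combines video upload, format conversion, and playback. The backend is based on the Django framework with Amazon S3 for storage, format conversion relies on the Encoding.com online service, and the message queue is RabbitMQ. Once a video has been uploaded and converted, it is played in the browser with HTML5 using Video.js.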

Stickyworld's consultation web app has supported video for a long time but it's been hosted via a YouTube embed. When we started building the new version of the web app we wanted to take control of the video content and also free our users from YouTube's terms of service.

I personally had worked on projects with clients in the past which did video transcoding and it was never something easy to achieve. It takes a lot to accept every video, audio and container format under the sun and output them into various video formats that the web knows and loves.

With that in mind we decided the conversion process would be handled by Encoding.com. They let you encode your first gigabyte of video with them for free and then have a tiered pricing system thereafter.

Throughout the development of the code below I would upload a two-second, 178KB video to test that everything was working. When the exceptions stopped being raised, I tested larger and more exotic files.

Stage 1: The User uploads a video

At the moment the new codebase just has a quick-and-dirty HTML5-based uploading mechanism. This is the CoffeeScript for uploading from the client to the server:

$scope.upload_slide = (upload_slide_form) ->
    file = document.getElementById("slide_file").files[0]
    reader = new FileReader()
    reader.readAsDataURL file
    reader.onload = (event) ->
      result = event.target.result
      fileName = document.getElementById("slide_file").files[0].name
      $.post "/world/upload_slide",
        data: result
        name: fileName
        room_id: $scope.room.id
        (response_data) ->
          if response_data.success? is not yes
            console.error "There was an error uploading the file", response_data
          else
            console.log "Upload successful", response_data
    reader.onloadstart = ->
      console.log "onloadstart"
    reader.onprogress = (event) ->
      console.log "onprogress", event.total, event.loaded, (event.loaded / event.total) * 100
    reader.onabort = ->
      console.log "onabort"
    reader.onerror = ->
      console.log "onerror"
    reader.onloadend = (event) ->
      console.log "onloadend", event

It would be nice to loop through document.getElementById("slide_file").files and upload each file individually instead of just the first file. This is something we'll be addressing soon.

Stage 2: Validate and upload to S3

On the backend we're running Django and RabbitMQ. The key modules we're using are:

$ pip install 'Django>=1.5.2' 'django-celery>=3.0.21' \
    'django-storages>=1.1.8' 'lxml>=3.2.3' 'python-magic>=0.4.3'

I created two models: SlideUploadQueue to store references to all the uploads and SlideVideoMedia to store all the references to the processed videos.

class SlideUploadQueue(models.Model):
    created_by = models.ForeignKey(User)
    created_time = models.DateTimeField(db_index=True)
    original_file = models.FileField(
        upload_to=filename_sanitiser, blank=True, default='')
    media_type = models.ForeignKey(MediaType)
    encoding_com_tracking_code = models.CharField(
        default='', max_length=24, blank=True)

    STATUS_AWAITING_DATA = 0
    STATUS_AWAITING_PROCESSING = 1
    STATUS_PROCESSING = 2
    STATUS_AWAITING_3RD_PARTY_PROCESSING = 5
    STATUS_FINISHED = 3
    STATUS_FAILED = 4

    STATUS_LIST = (
        (STATUS_AWAITING_DATA, 'Awaiting Data'),
        (STATUS_AWAITING_PROCESSING, 'Awaiting processing'),
        (STATUS_PROCESSING, 'Processing'),
        (STATUS_AWAITING_3RD_PARTY_PROCESSING,
            'Awaiting 3rd-party processing'),
        (STATUS_FINISHED, 'Finished'),
        (STATUS_FAILED, 'Failed'),
    )

    status = models.PositiveSmallIntegerField(
        default=STATUS_AWAITING_DATA, choices=STATUS_LIST)

    class Meta:
        verbose_name = 'Slide'
        verbose_name_plural = 'Slide upload queue'

    def save(self, *args, **kwargs):
        if not self.created_time:
            self.created_time = \
                datetime.utcnow().replace(tzinfo=pytz.utc)

        return super(SlideUploadQueue, self).save(*args, **kwargs)

    def __unicode__(self):
        if self.id is None:
            return 'new '
        return ' %d' % self.id

class SlideVideoMedia(models.Model):
    converted_file = models.FileField(
        upload_to=filename_sanitiser, blank=True, default='')

    FORMAT_MP4 = 0
    FORMAT_WEBM = 1
    FORMAT_OGG = 2
    FORMAT_FL9 = 3
    FORMAT_THUMB = 4

    supported_formats = (
        (FORMAT_MP4, 'MPEG 4'),
        (FORMAT_WEBM, 'WebM'),
        (FORMAT_OGG, 'OGG'),
        (FORMAT_FL9, 'Flash 9 Video'),
        (FORMAT_THUMB, 'Thumbnail'),
    )

    mime_types = (
        (FORMAT_MP4, 'video/mp4'),
        (FORMAT_WEBM, 'video/webm'),
        (FORMAT_OGG, 'video/ogg'),
        (FORMAT_FL9, 'video/mp4'),
        (FORMAT_THUMB, 'image/jpeg'),
    )

    format = models.PositiveSmallIntegerField(
        default=FORMAT_MP4, choices=supported_formats)

    class Meta:
        verbose_name = 'Slide video'
        verbose_name_plural = 'Slide videos'

    def __unicode__(self):
        if self.id is None:
            return 'new '
        return ' %d' % self.id

Our models use a filename_sanitiser function in each models.FileField field to automatically rewrite filenames into a model-name/uuid4.extension format. This sanitises each filename and makes sure it is unique. On top of that, we use signed URLs that expire, so we can control who is served our content and for how long.

def filename_sanitiser(instance, filename):
    folder = instance.__class__.__name__.lower()
    ext = 'jpg'

    if '.' in filename:
        t_ext = filename.split('.')[-1].strip().lower()

        if t_ext != '':
            ext = t_ext

    return '%s/%s.%s' % (folder, str(uuid.uuid4()), ext)

A file uploaded as testing.mov would turn into https://our-bucket.s3.amazonaws.com/slideuploadqueue/3fe27193-e87f-4244-9aa2-66409f70ebd3.mov and would be uploaded by Django Storages.
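
To see the naming scheme in action without Django, the function can be exercised on its own. The sketch below is written for Python 3 and repeats the function so that it runs standalone; the plain SlideUploadQueue class here is only a stand-in for the real model, since the function uses nothing but the class name.

```python
import uuid


def filename_sanitiser(instance, filename):
    """Return 'model name/uuid4.extension', defaulting to 'jpg'."""
    folder = instance.__class__.__name__.lower()
    ext = 'jpg'
    if '.' in filename:
        t_ext = filename.split('.')[-1].strip().lower()
        if t_ext != '':
            ext = t_ext
    return '%s/%s.%s' % (folder, uuid.uuid4(), ext)


class SlideUploadQueue(object):
    """Plain stand-in for the Django model; only the class name matters here."""


upload = SlideUploadQueue()
with_ext = filename_sanitiser(upload, 'testing.MOV')    # extension is lower-cased
without_ext = filename_sanitiser(upload, 'README')      # falls back to 'jpg'
trailing_dot = filename_sanitiser(upload, 'clip.')      # empty extension, also 'jpg'

print(with_ext)
```

Because the name comes from uuid4, two uploads of the same file never collide, and nothing from the user-supplied name except the extension reaches the storage backend.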

In the backend endpoint that receives the upload from the browser, we validate the uploaded content using Magic (the python-magic package), which detects what kind of file it is based on its contents:

@verify_auth_token
@return_json
def upload_slide(request):
    file_data = request.POST.get('data', '')
    file_data = base64.b64decode(file_data.split(';base64,')[1])
    description = magic.from_buffer(file_data)

So if description matches something like MPEG v4 system or Apple QuickTime movie then we know it's suitable for transcoding. If it isn't something like the above, we can flag it up with the user.
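
A check along those lines could look like the sketch below. It is written for Python 3 and leaves out the call to magic itself; decode_data_url, is_transcodable and the TRANSCODABLE_HINTS tuple are hypothetical names, and the two hint strings are simply the descriptions mentioned above, so a real list would probably be longer.

```python
import base64

# Substrings taken from the descriptions mentioned in the text (assumption:
# a real deployment would accept more formats than these two)
TRANSCODABLE_HINTS = ('MPEG v4 system', 'Apple QuickTime movie')


def decode_data_url(data_url):
    """Return the raw bytes carried by a 'data:mime;base64,payload' string."""
    marker = ';base64,'
    if marker not in data_url:
        raise ValueError('not a base64 data URL')
    return base64.b64decode(data_url.split(marker, 1)[1])


def is_transcodable(description):
    """True if a libmagic-style description names a format we accept."""
    return any(hint in description for hint in TRANSCODABLE_HINTS)


payload = base64.b64encode(b'fake video bytes').decode('ascii')
raw = decode_data_url('data:video/quicktime;base64,' + payload)

print(is_transcodable('ISO Media, Apple QuickTime movie'))
print(is_transcodable('PNG image data'))
```

Checking for the ';base64,' marker before splitting avoids an IndexError when the browser sends something other than a data URL.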

Next, we'll save the video into our SlideUploadQueue model and send a job off to RabbitMQ. Because we're using Django Storages, it'll be uploaded to Amazon S3 automatically.

slide_upload = SlideUploadQueue()
...
slide_upload.status = SlideUploadQueue.STATUS_AWAITING_PROCESSING
slide_upload.save()
slide_upload.original_file.\
    save('anything.%s' % file_ext, ContentFile(file_data))
slide_upload.save()

task = ConvertRawSlideToSlide()
task.delay(slide_upload)

Stage 3: Send the video to a 3rd party

RabbitMQ will take over and handle the task.delay(slide_upload) call.

Here all we're doing is sending Encoding.com a URL of our video and instructions on what output formats we want. They'll give us a job number in return, which we'll use to check up on the progress of the transcoding at a later point.

class ConvertRawSlideToSlide(Task):
    queue = 'backend_convert_raw_slides'
    ...
    def _handle_video(self, slide_upload):
        mp4 = {
            'output': 'mp4',
            'size': '320x240',
            'bitrate': '256k',
            'audio_bitrate': '64k',
            'audio_channels_number': '2',
            'keep_aspect_ratio': 'yes',
            'video_codec': 'mpeg4',
            'profile': 'main',
            'vcodecparameters': 'no',
            'audio_codec': 'libfaac',
            'two_pass': 'no',
            'cbr': 'no',
            'deinterlacing': 'no',
            'keyframe': '300',
            'audio_volume': '100',
            'file_extension': 'mp4',
            'hint': 'no',
        }

        webm = {
            'output': 'webm',
            'size': '320x240',
            'bitrate': '256k',
            'audio_bitrate': '64k',
            'audio_sample_rate': '44100',
            'audio_channels_number': '2',
            'keep_aspect_ratio': 'yes',
            'video_codec': 'libvpx',
            'profile': 'baseline',
            'vcodecparameters': 'no',
            'audio_codec': 'libvorbis',
            'two_pass': 'no',
            'cbr': 'no',
            'deinterlacing': 'no',
            'keyframe': '300',
            'audio_volume': '100',
            'preset': '6',
            'file_extension': 'webm',
            'acbr': 'no',
        }

        ogg = {
            'output': 'ogg',
            'size': '320x240',
            'bitrate': '256k',
            'audio_bitrate': '64k',
            'audio_sample_rate': '44100',
            'audio_channels_number': '2',
            'keep_aspect_ratio': 'yes',
            'video_codec': 'libtheora',
            'profile': 'baseline',
            'vcodecparameters': 'no',
            'audio_codec': 'libvorbis',
            'two_pass': 'no',
            'cbr': 'no',
            'deinterlacing': 'no',
            'keyframe': '300',
            'audio_volume': '100',
            'file_extension': 'ogg',
            'acbr': 'no',
        }

        flv = {
            'output': 'fl9',
            'size': '320x240',
            'bitrate': '256k',
            'audio_bitrate': '64k',
            'audio_channels_number': '2',
            'keep_aspect_ratio': 'yes',
            'video_codec': 'libx264',
            'profile': 'high',
            'vcodecparameters': 'no',
            'audio_codec': 'libfaac',
            'two_pass': 'no',
            'cbr': 'no',
            'deinterlacing': 'no',
            'keyframe': '300',
            'audio_volume': '100',
            'file_extension': 'mp4',
        }

        thumbnail = {
            'output': 'thumbnail',
            'time': '5',
            'video_codec': 'mjpeg',
            'keep_aspect_ratio': 'yes',
            'file_extension': 'jpg',
        }

        encoder = Encoding(settings.ENCODING_API_USER_ID,
            settings.ENCODING_API_USER_KEY)
        resp = encoder.add_media(source=[slide_upload.original_file.url],
            formats=[mp4, webm, ogg, flv, thumbnail])

        media_id = None

        if resp is not None and resp.get('response') is not None:
            media_id = resp.get('response').get('MediaID')

        if media_id is None:
            slide_upload.status = SlideUploadQueue.STATUS_FAILED
            slide_upload.save()
            log.error('Unable to communicate with encoding.com')
            return False

        slide_upload.encoding_com_tracking_code = media_id
        slide_upload.status = \
            SlideUploadQueue.STATUS_AWAITING_3RD_PARTY_PROCESSING
        slide_upload.save()
        return True

Encoding.com recommended some less-than-ideal Python wrappers for communicating with their service. I added a few fixes to the module but there is still work to be done to get it to a state I'm happy with. Below is what this wrapper currently looks like in our codebase:

import httplib
from lxml import etree
import urllib
from xml.parsers.expat import ExpatError
import xmltodict

ENCODING_API_URL = 'manage.encoding.com:80'

class Encoding(object):

    def __init__(self, userid, userkey, url=ENCODING_API_URL):
        self.url = url
        self.userid = userid
        self.userkey = userkey

    def get_media_info(self, action='GetMediaInfo', ids=[],
        headers={'Content-Type': 'application/x-www-form-urlencoded'}):
        query = etree.Element('query')

        nodes = {
            'userid': self.userid,
            'userkey': self.userkey,
            'action': action,
            'mediaid': ','.join(ids),
        }

        query = self._build_tree(etree.Element('query'), nodes)
        results = self._execute_request(query, headers)

        return self._parse_results(results)

    def get_status(self, action='GetStatus', ids=[], extended='no',
        headers={'Content-Type': 'application/x-www-form-urlencoded'}):
        query = etree.Element('query')

        nodes = {
            'userid': self.userid,
            'userkey': self.userkey,
            'action': action,
            'extended': extended,
            'mediaid': ','.join(ids),
        }

        query = self._build_tree(etree.Element('query'), nodes)
        results = self._execute_request(query, headers)

        return self._parse_results(results)

    def add_media(self, action='AddMedia', source=[], notify='', formats=[],
        instant='no',
        headers={'Content-Type': 'application/x-www-form-urlencoded'}):
        query = etree.Element('query')

        nodes = {
            'userid': self.userid,
            'userkey': self.userkey,
            'action': action,
            'source': source,
            'notify': notify,
            'instant': instant,
        }

        query = self._build_tree(etree.Element('query'), nodes)

        for format in formats:
            format_node = self._build_tree(etree.Element('format'), format)
            query.append(format_node)

        results = self._execute_request(query, headers)
        return self._parse_results(results)

    def _build_tree(self, node, data):
        for k, v in data.items():
            if isinstance(v, list):
                for item in v:
                    element = etree.Element(k)
                    element.text = item
                    node.append(element)
            else:
                element = etree.Element(k)
                element.text = v
                node.append(element)

        return node

    def _execute_request(self, xml, headers, path='', method='POST'):
        params = urllib.urlencode({'xml': etree.tostring(xml)})

        conn = httplib.HTTPConnection(self.url)
        conn.request(method, path, params, headers)
        response = conn.getresponse()
        data = response.read()
        conn.close()
        return data

    def _parse_results(self, results):
        try:
            return xmltodict.parse(results)
        except ExpatError, e:
            print 'Error parsing encoding.com response'
            print e
            return None

Still on the to-do list are HTTPS-only transmission with strict validation of Encoding.com's SSL certificate and some unit tests (tickets, tickets and more tickets).
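
To make the shape of the request concrete, here is a sketch of the tree-building step on its own. It is written for Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml, which is a substitution made so the example has no dependencies; the credentials and the source URL are placeholders.

```python
import xml.etree.ElementTree as ET


def build_tree(node, data):
    """Append one child element per key; list values yield repeated elements."""
    for key, value in data.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(node, key)
            child.text = item
    return node


query = build_tree(ET.Element('query'), {
    'userid': 'USER_ID',       # placeholder credentials
    'userkey': 'USER_KEY',
    'action': 'AddMedia',
    'source': ['https://example.com/video.mov'],   # placeholder URL
})
# Each requested output format becomes its own <format> child of <query>
query.append(build_tree(ET.Element('format'),
                        {'output': 'mp4', 'size': '320x240'}))

xml_payload = ET.tostring(query, encoding='unicode')
print(xml_payload)
```

The resulting string is what gets URL-encoded under the xml key and posted to the API.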

Stage 4: Download all new video formats

We have a periodic job running every 15 seconds via RabbitMQ that checks up on the progress of the video transcoding:

class CheckUpOnThirdParties(PeriodicTask):
    run_every = timedelta(seconds=settings.THIRD_PARTY_CHECK_UP_INTERVAL)
    ...
    def _handle_encoding_com(self, slides):
        format_lookup = {
            'mp4': SlideVideoMedia.FORMAT_MP4,
            'webm': SlideVideoMedia.FORMAT_WEBM,
            'ogg': SlideVideoMedia.FORMAT_OGG,
            'fl9': SlideVideoMedia.FORMAT_FL9,
            'thumbnail': SlideVideoMedia.FORMAT_THUMB,
        }

        encoder = Encoding(settings.ENCODING_API_USER_ID,
            settings.ENCODING_API_USER_KEY)

        job_ids = [item.encoding_com_tracking_code for item in slides]
        resp = encoder.get_status(ids=job_ids)

        if resp is None:
            log.error('Unable to check up on encoding.com')
            return False

We'll go through the response from Encoding.com validating each piece of the response as we go along:

if resp.get('response') is None:
    log.error('Unable to get response node from encoding.com')
    return False

resp_id = resp.get('response').get('id')

if resp_id is None:
    log.error('Unable to get media id from encoding.com')
    return False

slide = SlideUploadQueue.objects.filter(
    status=SlideUploadQueue.STATUS_AWAITING_3RD_PARTY_PROCESSING,
    encoding_com_tracking_code=resp_id)

if len(slide) != 1:
    log.error('Unable to find a single record for %s' % resp_id)
    return False

resp_status = resp.get('response').get('status')

if resp_status is None:
    log.error('Unable to get status from encoding.com')
    return False

if resp_status != u'Finished':
    log.debug("%s isn't finished, will check back later" % resp_id)
    return True

formats = resp.get('response').get('format')

if formats is None:
    log.error("No output formats were found. Something's wrong.")
    return False

for format in formats:
    try:
        assert format.get('status') == u'Finished', \
        "%s is not finished. Something's wrong." % format.get('id')

        output = format.get('output')
        assert output in ('mp4', 'webm', 'ogg', 'fl9',
            'thumbnail'), 'Unknown output format %s' % output

        s3_dest = format.get('s3_destination')
        assert 'http://encoding.com.result.s3.amazonaws.com/'\
            in s3_dest, 'Suspicious S3 url: %s' % s3_dest

        https_link = \
            'https://s3.amazonaws.com/encoding.com.result/%s' %\
            s3_dest.split('/')[-1]
        file_ext = https_link.split('.')[-1].strip()

        assert len(file_ext) > 0,\
            'Unable to get file extension from %s' % https_link

        count = SlideVideoMedia.objects.filter(slide_upload=slide,
            format=format_lookup[output]).count()

        if count != 0:
            print 'There is already a %s file for this slide' % output
            continue

        content = self.download_content(https_link)

        assert content is not None,\
            'There is no content for %s' % format.get('id')
    except AssertionError, e:
        log.error('A format did not pass all assertions: %s' % e)
        continue
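
The URL handling in that loop can be pulled out and tried in isolation. The sketch below is for Python 3 and differs from the code above in two ways that are my own choices: it raises ValueError instead of using assert (assert statements are skipped when Python runs with -O), and it checks that the URL starts with the expected prefix rather than merely containing it. The function name to_https_link is hypothetical.

```python
EXPECTED_PREFIX = 'http://encoding.com.result.s3.amazonaws.com/'


def to_https_link(s3_dest):
    """Validate an Encoding.com result URL and return (https_link, extension)."""
    if not s3_dest.startswith(EXPECTED_PREFIX):
        raise ValueError('Suspicious S3 url: %s' % s3_dest)

    file_name = s3_dest.split('/')[-1]
    file_ext = file_name.rsplit('.', 1)[-1].strip() if '.' in file_name else ''
    if not file_ext:
        raise ValueError('Unable to get file extension from %s' % s3_dest)

    https_link = 'https://s3.amazonaws.com/encoding.com.result/%s' % file_name
    return https_link, file_ext


link, ext = to_https_link(
    'http://encoding.com.result.s3.amazonaws.com/abc123.mp4')
print(link, ext)
```

Rewriting the link to the path-style HTTPS form keeps the download on an encrypted connection even though the service reports a plain HTTP address.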

At this point we've asserted everything is as it should be and we can save each of the videos:

media = SlideVideoMedia()
media.format = format_lookup[output]
media.converted_file.save('blah.%s' % file_ext, ContentFile(content))
media.save()

Stage 5: Video via HTML5

On our frontend we've created a page with an HTML5 video element. We're using video.js to display the video in the best-supported format for each browser.

➫ bower install video.js
bower caching git://github.com/videojs/video.js-component.git
bower cloning git://github.com/videojs/video.js-component.git
bower fetching video.js
bower checking out video.js#v4.0.3
bower copying /home/mark/.bower/cache/video.js/5ab058cd60c5615aa38e8e706cd0f307
bower installing video.js#4.0.3

In our index.jade file we include its dependencies:

!!! 5
html(lang="en", class="no-js")
  head
    meta(http-equiv='Content-Type', content='text/html; charset=UTF-8')
    ...
    link(rel='stylesheet', type='text/css', href='/components/video-js-4.1.0/video-js.css')
    script(type='text/javascript', src='/components/video-js-4.1.0/video.js')

In an Angular.js/Jade-based template we've included a video tag and its child source tags. There is also a poster attribute that shows a static image taken from the first few moments of the video we've transcoded.

#main.span12
    video#example_video_1.video-js.vjs-default-skin(controls, preload="auto", width="640", height="264", poster="{{video_thumbnail}}", data-setup='{"example_option":true}', ng-show="videos")
        source(ng-repeat="video in videos", src="{{video.src}}", type="{{video.type}}")

This will print out every video format we've converted into, each as its own source tag. Video.js will decide which of them to play based on the browser the user is using.

We still have a lot of work to do around fallback support, building unit tests and improving the robustness of our Encoding.com service wrapper. If this sort of work interests you please do get in touch.

Excerpted from: http://techblog.stickyworld.com/video-with-python.html

Building a Python-Based Video Server was first published on 润物无声.

Python Module Distribution Files: setup.py vs requirements.txt

润物无声 — Sun, 04 Aug 2013 09:06:39 +0000

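We all know that publishing a Python module requires preparing several files, including setup.py and requirements.txt. Both files list the dependencies of the module being published, so doesn't that duplicate information and make maintenance awkward? The article below explains that the seemingly duplicated information serves different purposes: the former holds "abstract" dependencies, the latter "concrete" ones. Read on to find out where each applies and what mechanism lies behind it.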

There's a lot of misunderstanding about setup.py and requirements.txt and their roles. A lot of people have felt they contain duplicated information and have even created tools to handle this "duplication".

Python Libraries

A Python library in this context is something that has been developed and released for others to use. You can find a number of them on PyPI that others have made available. A library has a number of pieces of metadata that need to be provided in order to successfully distribute it. These are things such as the Name, Version, Dependencies, etc. The setup.py file gives you the ability to specify this metadata like:

from setuptools import setup

setup(
    name="MyLibrary",
    version="1.0",
    install_requires=[
        "requests",
        "bcrypt",
    ],
    # ...
)

This is simple enough: you have the required pieces of metadata declared. However, something you don't see is a specification of where you'll be getting those dependencies from. There's no URL or filesystem path to fetch them from; it's just "requests" and "bcrypt". This is important, and for lack of a better term I call these "abstract dependencies". They are dependencies which exist only as a name and an optional version specifier. Think of it like duck typing your dependencies: you don't care what specific "requests" you get as long as it looks like "requests".

Python Applications

Here when I speak of a Python application I'm going to typically be speaking about something that you specifically deploy. It may or may not exist on PyPI but it's something that likely does not have much in the way of reusability. An application that does exist on PyPI typically requires a deploy specific configuration file and this section deals with the "deploy specific" side of a Python application.

An application typically has a set of dependencies, oftentimes a very complex set, that it has been tested against. Being a specific instance that has been deployed, it typically does not have a name, nor any of the other packaging-related metadata. This is reflected in the abilities of a pip requirements file. A typical requirements file might look something like:

# This is an implicit value, here for clarity
--index-url https://pypi.python.org/simple/

MyPackage==1.0
requests==1.2.0
bcrypt==1.0.2

Here you have each dependency shown along with an exact version specifier. While a library tends to want to have wide open ended version specifiers an application wants very specific dependencies. It may not have mattered up front what version of requests was installed but you want the same version to install in production as you developed and tested with locally.

At the top of this file you'll also notice an --index-url https://pypi.python.org/simple/ line. A typical requirements.txt won't list this explicitly unless it is not using PyPI, but it is an important part of a requirements.txt. This single line is what turns the abstract dependency of requests==1.2.0 into a "concrete" dependency of "requests 1.2.0 from https://pypi.python.org/simple/". This is not like duck typing; it is the packaging equivalent of an isinstance() check.
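To make the abstract versus concrete distinction tangible, here is a minimal sketch in plain Python. It is a toy model only, not how pip works internally: the AbstractDependency and ConcreteDependency types, the make_concrete helper, and the internal mirror URL are all hypothetical names invented for illustration.

```python
from collections import namedtuple

# Hypothetical toy types: an abstract dependency is just a name plus an
# optional version specifier; a concrete one also says where it comes from.
AbstractDependency = namedtuple("AbstractDependency", "name specifier")
ConcreteDependency = namedtuple("ConcreteDependency", "name specifier index_url")


def make_concrete(dep, index_url):
    """Pin an abstract dependency to one package index."""
    return ConcreteDependency(dep.name, dep.specifier, index_url.rstrip("/") + "/")


requests_dep = AbstractDependency("requests", "==1.2.0")

# The same abstract dependency can be resolved against different indexes.
from_pypi = make_concrete(requests_dep, "https://pypi.python.org/simple/")
from_mirror = make_concrete(requests_dep, "https://pypi.internal.example/simple")  # made-up URL

print(from_pypi)
print(from_mirror)
```

The point of the sketch is that requests_dep never changes; only the index it is combined with does.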

So Why Do Abstract and Concrete Matter?

You've read this far and maybe you've said, ok I know that setup.py is designed for redistributable things and that requirements.txt is designed for non-redistributable things but I already have something that reads a requirements.txt and fills out my install_requires=[...] so why should I care?

This split between abstract and concrete is an important one. It is what allows the PyPI mirroring infrastructure to work. It is what allows a company to host its own private package index. It is even what enables you to fork a library to fix a bug or add a feature and use your own fork. Because an abstract dependency is a name and an optional version specifier, you can install it from PyPI, from Crate.io, or from your own filesystem. You can fork a library, change the code, and as long as it has the right name and version, the library that depends on it will happily go on using it.

A more extreme version of what can happen when you use a concrete requirement where an abstract requirement should be used can be found in the Go language. In Go, the default package manager (go get) allows you to specify your imports via a URL inside the code, which the package manager collects and downloads. This would look something like:

import (
        "github.com/foo/bar"
)

Here you can see that an exact url to a dependency has been specified. Now if I used a library that specified its dependencies this way and I wanted to change the "bar" library because of a bug that was affecting me or a feature I needed, I would not only need to fork the bar library, but I would also need to fork the library that depended on the bar library to update it. Even worse, if the bar library was say, 5 levels deep, then that's a potential of 5 different packages that I would need to fork and modify only to point it at a slightly different "bar".

A Setuptools Misfeature

Setuptools has a feature similar to the Go example. It's called dependency links and it looks like this:

from setuptools import setup

setup(
    # ...
    dependency_links = [
        "http://packages.example.com/snapshots/",
        "http://example2.com/p/bar-1.0.tar.gz",
    ],
)

This "feature" of setuptools removes the abstractness of its dependencies and hardcodes an exact URL from which to fetch the dependency. Now, very similarly to Go, if we want to modify packages, or simply fetch them from a different server, we'll need to go in and edit each package in the dependency chain in order to update the dependency_links.

Developing Reusable Things or How Not to Repeat Yourself

The "Library" and "Application" distinction is all well and good, but whenever you're developing a library, in a way it becomes your application. You want a specific set of dependencies fetched from a specific location, and you know that you should have abstract dependencies in your setup.py and concrete dependencies in your requirements.txt, but you don't want to maintain two separate lists which will inevitably go out of sync. As it turns out, pip requirements files have a construct to handle just such a case. Given a directory with a setup.py inside of it, you can write a requirements file that looks like:

--index-url https://pypi.python.org/simple/

-e .

Now your pip install -r requirements.txt will work just as before. It will first install the library located at the file path . and then move on to its abstract dependencies, combining them with its --index-url option, turning them into concrete dependencies and installing them.

This method grants another powerful ability. Let's say you have two or more libraries that you develop as a unit but release separately, or maybe you've just split out part of a library into its own piece and haven't officially released it yet. If your top level library still depends on just the name then you can install the development version when using the requirements.txt and the release version when not, using a file like:

--index-url https://pypi.python.org/simple/

-e git+https://github.com/foo/bar.git#egg=bar
-e .

This will first install the bar library from https://github.com/foo/bar.git, making it equal to the name "bar", and then will install the local package, again combining its dependencies with the --index-url option and installing them. This time, since the "bar" dependency has already been satisfied, pip will skip it and continue to use the in-development version.

Recognition: This post was inspired by Yehuda Katz's blog post on a similar issue in Ruby with Gemfile and gemspec.

Source article: setup-vs-requirements

"Python Package Release Files: setup.py vs requirements.txt" was first published on 润物无声.

Automatic Inlining of Python Function Calls

润物无声 — Sat, 03 Aug 2013 02:14:02 +0000

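Anyone who has developed in C++ knows about inline functions: getters and setters are usually written as inline functions to improve run-time efficiency and cut the cost of function calls. Those who have developed for Android will also know that one of the Android optimization guidelines is to avoid getters and setters where possible. Because Java does not support C++-style inlining, get and set methods have to be given up. This sacrifices object-oriented access control, but on handheld devices, using limited resources sensibly and improving run-time efficiency comes first.

This article is about how to use inlining in Python. Python is an interpreted language with no separate ahead-of-time compilation step, so how can inlining be implemented in it? After reading the full article you should have a clear picture. The current implementation still has many shortcomings and limitations; hopefully a future Python release will support this natively.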

Calling functions in Python can be expensive. Consider this example: there are two statements that are being timed, the first one calls a function that returns an integer while the second one calls a function that returns the result of a second function call which returns an integer.

tom@toms ~$ python -m timeit -n 10000000  -s "def get_n(): return 1" "get_n()"
10000000 loops, best of 3: 0.145 usec per loop
tom@toms ~$ time python -m timeit -n 10000000 -s "get_n = lambda: 1; get_r_n = lambda: get_n()" "get_r_n()"
10000000 loops, best of 3: 0.335 usec per loop

The additional function call more than doubled the execution time, despite not affecting the output of the function in any way. This got me thinking: how hard would it be to create a Python module that would inline functions, removing the calling overhead from certain functions you specify?
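The same comparison can be reproduced from inside a script with the standard timeit module. This is a small sketch rather than the author's benchmark: the time_call helper is a hypothetical name, and the loop count is reduced so it finishes quickly. Absolute numbers will differ by machine and Python version.

```python
import timeit


def get_n():
    return 1


def get_r_n():
    # Same result as get_n(), but costs one extra function call.
    return get_n()


def time_call(func, number=200000):
    """Return the best-of-3 time per call in microseconds."""
    best = min(timeit.repeat(func, number=number, repeat=3))
    return best / number * 1e6


direct = time_call(get_n)
nested = time_call(get_r_n)
print("get_n():   %.3f usec per call" % direct)
print("get_r_n(): %.3f usec per call" % nested)
```

Taking the minimum of several repeats, as the command-line tool does, filters out most noise from other processes.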

As it turns out, not that hard. Note: This is simply an experiment to see what's possible, don't even think about using this in real Python code (there are some serious limitations explained at the end). Check this out:

from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def call_func_args(num):
    return add_stuff(1, num)

import dis
dis.dis(call_func_args)
# Prints:
# 0 LOAD_CONST               1 (1)
# 3 LOAD_FAST                0 (num)
# 6 BINARY_ADD          
# 7 RETURN_VALUE

The dis function prints out the bytecode operations for a Python function, which shows that the call_func_args function has been modified so that the add_stuff() call never takes place and instead the body of the add_stuff function has been inlined inside the call_func_args function. I've put the code on GitHub, have a look if you like. Below I will explain how it works, for those interested.

Diving in: Import hooks and the AST module

Python is an interpreted language: when you run a Python program, the source code is parsed into an Abstract Syntax Tree, which is then 'compiled' into bytecode. We need a way of modifying the AST of an imported module before it gets compiled, and as luck would have it Python provides powerful hooks into the import mechanism that allow you to write importers that grab code from the internet or restrict packages from being imported. Getting our claws into the import mechanism is as simple as this:

import sys, imp

class Loader(object):
    def __init__(self, module):
        self.module = module

    def load_module(self, fullname):
        return self.module

class Importer(object):
    def find_module(self, fullname, path):
        file, pathname, description = imp.find_module(
            fullname.split(".")[-1], path)
        module_contents = file.read()
        # We can now mess around with the module_contents.
        # and produce a module object
        return Loader(make_module(module_contents))

sys.meta_path.append(Importer())

Now whenever anything is imported our find_module() method will be called. This should return an object with a load_module() function, which returns the final module.
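For readers on Python 3, where the imp module is deprecated and finders implement find_spec() instead of find_module(), the same hook point can be sketched as follows. This is an assumption-laden illustration, not the author's code: LoggingFinder is a hypothetical class that only records module names and then lets the normal machinery do the import.

```python
import importlib.abc
import sys


class LoggingFinder(importlib.abc.MetaPathFinder):
    """Record every module name the import system asks about."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        # Returning None means "not mine": the next finder on
        # sys.meta_path carries on and loads the module as usual.
        return None


finder = LoggingFinder()
# Python 3 already has default finders on sys.meta_path, so the hook has
# to go first to see every import (the Python 2 code above can append).
sys.meta_path.insert(0, finder)

sys.modules.pop("colorsys", None)  # force a fresh import for the demo
import colorsys

sys.meta_path.remove(finder)
print(finder.seen)
```

A real inlining importer would return a module spec with a custom loader at the point where this sketch returns None.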

Modifying the AST

Python provides the ast module for inspecting and modifying abstract syntax trees. So inside our find_module function we can get the source code of the module we are importing, parse it into an AST representation and then modify it before compiling it. You can see this in action here.

First we need to find all functions that are wrapped by our inline decorator, which is pretty simple to do. The AST module provides a NodeVisitor and a NodeTransformer class you can subclass. For each different type of AST node a visit_NAME method will be called, which you can then choose to modify or pass along untouched. The InlineMethodLocator runs through all the function definitions in a tree and stores any that are wrapped by our inline decorator:

class InlineMethodLocator(ast.NodeVisitor):
    def __init__(self):
        self.functions = {}

    def visit_FunctionDef(self, node):
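        # Decorators such as @pkg.inline or @inline(...) are not Name nodes
        # and have no .id, so check the node type first.
        if any(isinstance(d, ast.Name) and d.id == "inline"
               for d in node.decorator_list):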
            func_name = utils.getFunctionName(node)
            self.functions[func_name] = node
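A self-contained variant of the locator can be run against a source string to see what it collects. This is a sketch under a few assumptions: the class is renamed InlineLocator to mark it as an illustration, and it uses node.name directly instead of the author's utils.getFunctionName helper, which is not shown here.

```python
import ast

SOURCE = '''
from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def call_func_args(num):
    return add_stuff(1, num)
'''


class InlineLocator(ast.NodeVisitor):
    """Collect functions decorated with a bare @inline."""

    def __init__(self):
        self.functions = {}

    def visit_FunctionDef(self, node):
        for decorator in node.decorator_list:
            # Only a plain name matches; @pkg.inline or @inline(...) are
            # Attribute/Call nodes and are ignored in this sketch.
            if isinstance(decorator, ast.Name) and decorator.id == "inline":
                self.functions[node.name] = node
        self.generic_visit(node)  # also look at nested functions


locator = InlineLocator()
locator.visit(ast.parse(SOURCE))
print(sorted(locator.functions))
```

Because ast.parse only parses the text, the inliner import inside SOURCE is never executed.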

The next step after we have identified the functions we want to inline is to find where they are called, and then inline them. To do this we need to look for all Call nodes in our module's AST:

class FunctionInliner(ast.NodeTransformer):
    def __init__(self, functions_to_inline):
        self.inline_funcs = functions_to_inline

    def visit_Call(self, node):
        func = node.func
        func_name = utils.getFunctionName(func)
        if func_name in self.inline_funcs:
            func_to_inline = self.inline_funcs[func_name]
            transformer = transformers.getFunctionHandler(func_to_inline)
            if transformer is not None:
                node = transformer.inline(node, func_to_inline)

        return node

This visits all Call nodes, and if we are calling a function we want to inline, we grab a transformer object that is responsible for the actual inlining. I've only written one transformer so far, which works on simple functions (functions with one statement), but more can be added fairly easily. The simple function transformer returns the body of the function and maps the function's parameters to the values passed by the calling function:

class SimpleFunctionHandler(BaseFunctionHandler):
    def inline(self, node, func_to_inline):
        # It's a simple function: a single statement, so we can simply
        # replace the call with the inlined function's body.
        body = func_to_inline.body[0]
        if isinstance(body, ast.Return):
            body = body.value

        return self.replace_params_with_objects(body, func_to_inline, node)
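The helper replace_params_with_objects is not shown in the article, so the following is a minimal end-to-end sketch of what such a substitution could look like, not the author's implementation. SimpleInliner and ParamReplacer are hypothetical names, and the sketch assumes positional arguments only and a function body that is a single return statement.

```python
import ast
import copy

SOURCE = '''
def add_stuff(x, y):
    return x + y

def call_func_args(num):
    return add_stuff(1, num)
'''


class ParamReplacer(ast.NodeTransformer):
    """Swap parameter names for the caller's argument expressions."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class SimpleInliner(ast.NodeTransformer):
    """Inline calls to functions whose body is a single `return <expr>`."""

    def __init__(self, functions):
        self.functions = functions  # name -> ast.FunctionDef

    def visit_Call(self, node):
        self.generic_visit(node)  # handle calls nested in the arguments first
        if not (isinstance(node.func, ast.Name) and node.func.id in self.functions):
            return node
        func = self.functions[node.func.id]
        params = [arg.arg for arg in func.args.args]
        simple_body = (len(func.body) == 1 and isinstance(func.body[0], ast.Return)
                       and func.body[0].value is not None)
        simple_call = (not node.keywords and len(node.args) == len(params)
                       and not any(isinstance(a, ast.Starred) for a in node.args))
        if not (simple_body and simple_call):
            return node  # anything fancier is left as a normal call
        # Caveat: an argument expression is duplicated once per use of its
        # parameter, so arguments with side effects would change behaviour.
        body = ParamReplacer(dict(zip(params, node.args))).visit(
            copy.deepcopy(func.body[0].value))
        for child in ast.walk(body):
            ast.copy_location(child, node)
        return body


tree = ast.parse(SOURCE)
to_inline = {"add_stuff": tree.body[0]}
tree = SimpleInliner(to_inline).visit(tree)
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<inlined>", "exec"), namespace)
print(namespace["call_func_args"](41))  # prints 42
```

After the transformation, call_func_args contains the expression 1 + num directly, so no Call node is left in its body.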

Limitations

There are some serious limitations with this code:

Inlined functions must have a unique name: The AST provides us with no type information (as Python is dynamically typed), only the name of the function we are calling. That means without writing code that attempts to deduce the type of a class instance (no mean feat) then each function call must have a unique name.
Only inlines functions in the same module: To keep things simple only calls in the same module are inlined.
Inlined class functions can't reference any double underscore attributes: Accessing self.__attr is about as 'private' as you can get in Python. The attribute lookup is prefixed with the class name, which we can't easily detect while inlining.
Everything will break: Python is very dynamic, you may wish to replace functions at runtime. Obviously if the functions have been inlined then this won't have any effect.

Source article: automatically-inline-python-function-calls

"Automatic Inlining of Python Function Calls" was first published on 润物无声.

Sharing Python Modules on PyPI

润物无声 — Sat, 03 Aug 2013 01:40:54 +0000

How do you package your own Python module and share it on PyPI? The article below should give you a satisfying answer!

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

Abstract

Few things give me caremads like Python modules I want to use that aren’t on PyPI (pronounced “pie pee eye”, or “cheese shop”, not “pie pie”!). On the other hand, as pydanny points out, the current situation on packaging is rather confusing.

Therefore I want to help everyone who has some great code but feels lost with getting it on PyPI. I will be using my latest opus “pem” as a realistic yet simple example of how to get a pure-Python 2 and 3 module packaged up, tested and uploaded to PyPI. Including the new and shiny binary wheel format that’s faster and allows for binary extensions (read the wheel story if you want to know more)!

I’ll keep it super simple to get everyone started. At the end, I’ll link more complete documentation to show you the way ahead. Sorry, no Windows.

Tools Used

This is not a history lesson, therefore we will use:

pip 1.4+ (if you have no pip yet, bootstrapping is easy),
setuptools 0.9+,
and wheel 0.21+.

$ pip install -U "pip>=1.4" "setuptools>=0.9" "wheel>=0.21"

Please make sure all installs succeed, ancient installations may need some extra manual labor.

A Minimal Glimpse Into The Past

Forget that there ever was ~~distribute~~ (cordially merged into setuptools), ~~easy_install~~ (part of setuptools, supplanted by pip), or ~~distutils2~~ aka ~~packaging~~ (was supposed to be the official thing from Python 3.3 on, didn’t get done in time due to lack of helping hands, got ripped out by a heart-broken Éric and abandoned now).

Be just vaguely aware that there are distutils and distlib somewhere underneath but ideally it shouldn't matter to you at all for now.

setup.py

Nowadays, every project that you want to package needs a setup.py file. Let’s have a look at what pem’s could look like:

from setuptools import setup

setup(
    name='pem',
    version='0.1.0',
    description='Parse and split PEM files painlessly.',
    long_description=(open('README.rst').read() + '\n\n' +
                      open('HISTORY.rst').read() + '\n\n' +
                      open('AUTHORS.rst').read()),
    url='http://github.com/hynek/pem/',
    license='MIT',
    author='Hynek Schlawack',
    author_email='hs@ox.cx',
    py_modules=['pem'],
    include_package_data=True,
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Intended Audience :: Developers',
        'Natural Language :: English',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Topic :: Software Development :: Libraries :: Python Modules',
    ],
)

In the simplest case, you just import setup from setuptools and run it with some keyword arguments.

Before I go into these arguments, I want to point out one of the most neglected and yet most important ones: license. Always set a license! Otherwise nobody can use your module, which would be a pity, right?

I take a few shortcuts here which you may want to copy too:

I love DRY, therefore my long description is just a concatenation of my README, change log and author credits.
I use bumpversion for manipulating my version strings.

An essential field is py_modules which you'll use to denote the relevant code for packaging, if your package is actually just a single module like in pem’s case. If you have a full-blown package (i.e. a directory), you’ll go for:

from setuptools import setup, find_packages

setup(
...
    packages=find_packages(exclude=['tests*']),
...
)

The exclude makes sure that a top-level tests package doesn’t get installed (it’s still part of the source distribution) since that would wreak havoc.

You can also specify them by hand like I used to for doc2dash’s setup.py. However, unless you have good reasons, just use find_packages() and be done.

The classifiers field’s usefulness is openly disputed, nevertheless pick them from here. PyPI will refuse to accept packages with unknown classifiers, hence I like to use "Private :: Do Not Upload" for private packages to protect myself from my own stupidity.

One icky thing is dependencies. Unless you really know what you’re doing, don’t pin them (specifying the minimal version your package requires to work is fine of course) or your users won’t be able to install security updates of your dependencies:

setup(
...
    install_requires=['Django', 'django-annoying'],
...
)

Rule of thumb: requirements.txt should contain only ==, setup.py the rest (>=, !=, <=, …).

Non-Code Files

Every Python project has a “Fix MANIFEST.in” commit. Look it up, it’s true.

But it’s not really that hard: you add all files and directories that are not packaged due to py_modules or packages in your setup.py.

For “pem”, it’s just

include *.rst LICENSE

For more commands, have a look at the MANIFEST.in docs – especially recursive-include is used widely. There’s also a handy tool to validate your MANIFEST.in: check-manifest.

Important: If you want the files and directories from MANIFEST.in to also be installed (e.g. if it’s runtime-relevant data), you will have to set include_package_data=True in your setup() call as in the example above!

Configuration

For our minimal Python-only project, we’ll only need two lines in setup.cfg:

[wheel]
universal = 1

These will make wheel build a universal wheel file (e.g. pem-0.1.0-py2.py3-none-any.whl) and you won’t have to circle through virtual environments of all supported Python versions to build them separately.
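The file name itself records what a wheel is compatible with. As a rough sketch, assuming the naming convention from the wheel specification (name, version, optional build tag, Python tag, ABI tag, platform tag, separated by dashes), a name can be taken apart like this. The parse_wheel_filename helper is a hypothetical name, not part of the wheel package.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:  # an optional build tag sits after the version
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        # Compressed tag sets such as "py2.py3" are separated by dots.
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


info = parse_wheel_filename("pem-0.1.0-py2.py3-none-any.whl")
print(info["python"], info["abi"], info["platform"])
```

For pem, the tags py2.py3, none and any say the wheel works on Python 2 and 3, needs no particular ABI, and runs on any platform, which is what "universal" means here.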

Documentation

As I’ve hopefully established, every open source project needs a license. No excuses.

Additionally, even the simplest package needs a README that tells potential users what they’re looking at. Make it reStructuredText (reST) so PyPI can properly render it on your project page. As a courtesy to your users, also keep a change log so they know what to expect from your releases. And finally it’s good style to credit your contributors in a third file. I like to call them README.rst, HISTORY.rst, and AUTHORS.rst respectively, but there’s no hard rule.

My long project description and thus PyPI text is the concatenation of those three (and I’ve shamelessly stolen it from Kenneth Reitz).

If you host on GitHub, you may want to add a CONTRIBUTING.md that gets displayed when someone wants to open a pull request. Unfortunately it has to be a Markdown file because GitHub doesn’t support reST for this. Have a look at pem’s if you need inspiration.

Let’s Build Finally!

Building a source distribution (which is what is usually used for pure-Python projects) is just a matter of

$ python setup.py sdist

Because it’s 2013 and wheels are awesome, we’ll build one too! Fortunately, it’s just as easy (please note that you’ll need to install wheel as per above for this to work!):

$ python setup.py bdist_wheel

Of course, the wheel package is optional, but your users will thank you for much faster installation times.

Now you should have a new directory called dist containing a source distribution file and a wheel file. For pem, it currently looks like this:

dist
├── pem-0.1.0-py2.py3-none-any.whl
└── pem-0.1.0.tar.gz

You can test whether both install properly before we move on to uploading:

$ rm -rf 27-sdist  # ensure clean state if run repeatedly
$ virtualenv 27-sdist
...
$ 27-sdist/bin/pip install --no-index dist/pem-0.1.0.tar.gz
Ignoring indexes: https://pypi.python.org/simple/
Unpacking ./dist/pem-0.1.0.tar.gz
  Running setup.py egg_info for package from file:///Users/hynek/Projects/pem/dist/pem-0.1.0.tar.gz

Installing collected packages: pem
  Running setup.py install for pem

Successfully installed pem
Cleaning up...
$ 27-sdist/bin/python
Python 2.7.4 (default, Apr 21 2013, 09:35:41)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pem
>>> pem.__version__
'0.1.0'

and

$ rm -rf 27-wheel  # ensure clean state if run repeatedly
$ virtualenv 27-wheel
...
$ 27-wheel/bin/pip install --use-wheel --no-index --find-links dist pem
Ignoring indexes: https://pypi.python.org/simple/
Downloading/unpacking pem
Installing collected packages: pem
Successfully installed pem
Cleaning up...
$ 27-wheel/bin/python
Python 2.7.4 (default, Apr 21 2013, 09:35:41)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pem
>>> pem.__version__
'0.1.0'

Please note the lack of “Running setup.py install for pem” in the second test. Yes, you can finally install packages without executing arbitrary code!

So you’re confident that your package is perfect? Let’s use the Test PyPI server to find out!

The PyPI Staging Server

Again, I’ll be using “pem” as the example project name to avoid placeholders everywhere.

First, sign up on the test server, you will receive a user name and a password. Please note that this is independent from the live servers. Thus you’ll have to re-register both yourself and your packages. It also gets cleaned from time to time so don’t be surprised if it suddenly doesn’t know about you or your projects anymore. Just re-register.

Next, create a ~/.pypirc consisting of:

[distutils]
index-servers=
    test

[test]
repository = https://testpypi.python.org/pypi
username = 
password =

Then use

$ python setup.py register -r test

to register your project with the PyPI test server.

Finally, upload your distributions:

$ python setup.py sdist upload -r test
$ python setup.py bdist_wheel upload -r test

Please note that calling upload without building something in the same command line will give you a:

error: No dist file created in earlier command

Now test your packages again:

$ pip install -i https://testpypi.python.org/pypi pem

Everything dandy? Then let’s tackle the last step: putting it on the real PyPI!

The Final Step

First, register at PyPI, then complete your ~/.pypirc:

[distutils]
index-servers=
    pypi
    test

[test]
repository = https://testpypi.python.org/pypi
username = 
password = 

[pypi]
repository = https://pypi.python.org/pypi
username = 
password =
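The -r argument in the upload commands names one of the sections in this file. A quick way to check which servers a .pypirc defines is to read it with the standard configparser module (Python 3 spelling). This is only a sketch: list_index_servers is a hypothetical helper, the file content is inlined as a string instead of read from ~/.pypirc, and the credentials are dummy values.

```python
import configparser

# Same layout as the ~/.pypirc above; the credentials are dummy values.
PYPIRC = """
[distutils]
index-servers=
    pypi
    test

[test]
repository = https://testpypi.python.org/pypi
username = alice
password = not-a-real-password

[pypi]
repository = https://pypi.python.org/pypi
username = alice
password = not-a-real-password
"""


def list_index_servers(text):
    """Map each server named in [distutils] to its repository URL."""
    # RawConfigParser avoids '%' interpolation, which passwords may contain.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    return {name: parser.get(name, "repository") for name in names}


servers = list_index_servers(PYPIRC)
for name, url in servers.items():
    print("%-5s -> %s" % (name, url))
```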

One last deep breath and let’s rock:

$ python setup.py sdist upload -r pypi
$ python setup.py bdist_wheel upload -r pypi

And thus, your package is only a pip install away for everyone! Congratulations, do more of that!

Next Steps

The information herein will probably get you pretty far but if you get stuck, the current canonical truths for Python packaging are:

Python Packaging User Guide
setuptools
wheel
distutils
and specifically the PyPI chapter in the latter one.

Please don’t let your software rot on GitHub or launchpad! Share!

Thanks

This article has been kindly proof-read by Lynn Root, Donald Stufft, Alex Gaynor, Thomas Heinrichsdobler, and Jannis Leidel. All mistakes are still mine though.

I’d be happy if you have any feedback to make this article even more straight-forward for people new to Python packaging!

Source article: sharing-your-labor-of-love-pypi-quick-and-dirty

"Sharing Python Modules on PyPI" was first published on 润物无声.

Logging in to Baidu Space with a Python Script

润物无声 — Wed, 15 May 2013 13:48:18 +0000

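Over the past six months Baidu Space has changed several times. The WordPress plugin that synchronizes posts to Baidu Space stopped working last year and has not been maintained since. Baidu Space has now largely stabilized, and comparing packet captures shows that the login process is very different from before. The plan is to first simulate the login process with a Python script and then implement it in PHP to update the synchronization plugin.

The login process works roughly as follows:

1. First obtain the login cookie. Without a cookie, Baidu Space login does not work. Request the following URL to get the cookie: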

https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=false

2. Obtain the login token by requesting the same URL again, this time sending the cookie the server returned in step 1:

https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=false

3. Send the account credentials (user name, password, and so on) to the following URL, again sending the cookie the server returned in step 2:

http://passport.baidu.com/v2/api/?login

4. At this point the login process is complete.
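The steps above can be sketched compactly with Python 3's standard library. Treat this as an illustration only: the endpoints are the ones listed above and may no longer exist, only a subset of the login form fields is sent, and the extract_token and login helper names are hypothetical.

```python
import http.cookiejar
import re
import urllib.parse
import urllib.request

API_URL = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=false"
LOGIN_URL = "http://passport.baidu.com/v2/api/?login"
TOKEN_RE = re.compile(r"bdPass\.api\.params\.login_token='(.*?)';")


def extract_token(body):
    """Return the login token embedded in the API response, or None."""
    match = TOKEN_RE.search(body)
    return match.group(1) if match else None


def login(user, password):
    """Run steps 1-3; returns the opener that holds the session cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open(API_URL).close()              # step 1: receive the cookie
    with opener.open(API_URL) as response:    # step 2: cookie sent, token returned
        token = extract_token(response.read().decode("utf-8", "replace"))
    if token is None:
        raise RuntimeError("no login token in response")
    form = urllib.parse.urlencode({"username": user, "password": password,
                                   "token": token, "tpl": "mn"}).encode("ascii")
    opener.open(LOGIN_URL, data=form).close()  # step 3: post the credentials
    return opener
```

The cookie jar attached to the opener is what carries the cookie from one step to the next, so every request has to go through the same opener.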

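Code example:

The sample code below logs in to Baidu Space and then automatically backs up all blog posts to local disk. The login part follows the steps described above.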

#!/usr/bin/python
# coding: utf8

import cookielib, urllib2, urllib
import os, sys, socket, re

# Parse how many pages of blog posts there are
pageStr = r"""var PagerInfo = {\s*allCount\s*:\s*'(\d+)',\s*pageSize\s*:\s*'(\d+)',\s*curPage\s*:\s*'\d+'\s*};"""
pageObj = re.compile(pageStr, re.DOTALL)

# Extract the login token
login_tokenStr = r'''bdPass.api.params.login_token='(.*?)';'''
login_tokenObj = re.compile(login_tokenStr, re.DOTALL)

# Extract each blog post's title and URL
blogStr = r'''.*?
(.*?)'''
blogObj = re.compile(blogStr, re.DOTALL)

class Baidu(object):
    def __init__(self, user='', psw='', blog=''):
        self.user = user
        self.psw = psw
        self.blog = blog
        if not user or not psw or not blog:
            print "Please enter 3 params: user, psw, blog"
            sys.exit(0)
        if not os.path.exists(self.user):
            os.mkdir(self.user)
        self.cookiename = 'baidu%s.coockie' % (self.user)
        self.token = ''
        self.allCount = 0
        self.pageSize = 10
        self.totalpage = 0
        self.logined = False
        self.cj = cookielib.LWPCookieJar()
        try:
            # Reuse a previously saved cookie file if there is one
            self.cj.revert(self.cookiename)
            self.logined = True
            print "OK"
        except Exception, e:
            print e
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj))
        self.opener.addheaders = [('User-agent', 'Opera/9.23')]
        urllib2.install_opener(self.opener)
        socket.setdefaulttimeout(30)

    # Log in to Baidu
    def login(self):
        # If no saved cookie was loaded, simulate the login
        if not self.logined:
            print "logon to baidu ..."
            # First request: only to obtain and save a cookie
            qurl = '''https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=false'''
            r = self.opener.open(qurl)
            self.cj.save(self.cookiename)
            # Second request: fetch the token
            qurl = '''https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=false'''
            r = self.opener.open(qurl)
            rsp = r.read()
            #print rsp
            self.cj.save(self.cookiename)
            # Extract the token with the regular expression
            matched_objs = login_tokenObj.findall(rsp)
            if matched_objs:
                self.token = matched_objs[0]
                print 'token =', self.token
                # Then simulate the login with the token
                post_data = urllib.urlencode({'username':self.user,
                                              'password':self.psw,
                                              'token':self.token,
                                              'charset':'UTF-8',
                                              'callback':'parent.bd12Pass.api.login._postCallback',
                                              'index':'0',
                                              'isPhone':'false',
                                              'mem_pass':'on',
                                              'loginType':'1',
                                              'safeflg':'0',
                                              'staticpage':'https://passport.baidu.com/v2Jump.html',
                                              'tpl':'mn',
                                              'u':'http://www.baidu.com/',
                                              'verifycode':'',
                                            })
                #path = 'http://passport.baidu.com/?login'
                path = 'http://passport.baidu.com/v2/api/?login'
                self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj))
                self.opener.addheaders = [('User-agent','Opera/9.23')]
                urllib2.install_opener(self.opener)
                headers = {
                  "Accept": "image/gif, */*",
                  "Referer": "https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F",
                  "Accept-Language": "zh-cn",
                  "Content-Type": "application/x-www-form-urlencoded",
                  "Accept-Encoding": "gzip, deflate",
                  "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)",
                  "Host": "passport.baidu.com",
                  "Connection": "Keep-Alive",
                  "Cache-Control": "no-cache"
                }
                req = urllib2.Request(path,
                                post_data,
                                headers=headers,
                                )
                rsp = self.opener.open(req).read()
                #print rsp
                self.cj.save(self.cookiename)
                #for login test
                #qurl = '''http://hi.baidu.com/pub/show/createtext'''
                #rsp = self.opener.open(qurl).read()
                #file_object = open('login.txt', 'w')
                #file_object.write(rsp)
                #file_object.close()
            else:
                print "Login Fail"
                sys.exit(0)

    #获取博客一共有多少页，如果有私有博文的话，登陆和不登陆获取的是不一样的

    def getTotalPage(self):

        #获取博客的总页数

        req2 = urllib2.Request(self.blog)

        rsp = urllib2.urlopen(req2).read()

        if rsp:

            rsp = rsp.replace('\r','').replace('\n','').replace('\t','')

            matched_objs = pageObj.findall(rsp)

            if matched_objs:

                obj0,obj1 = matched_objs[0]

                self.allCount = int(obj0)

                self.pageSize = int(obj1)

                self.totalpage = (self.allCount / self.pageSize) + 1

                print 'allCount:%d, pageSize:%d, totalpage:%d' % (self.allCount,self.pageSize,self.totalpage)

    # Collect the blog post links on one list page and download each post
    def fetchPage(self,url):
        req = urllib2.Request(url)
        rsp = urllib2.urlopen(req).read()
        if rsp:
            rsp = rsp.replace('\r','').replace('\n','').replace('\t','')
            matched_objs = blogObj.findall(rsp)
            if matched_objs:
                for obj in matched_objs:
                    # This could be rewritten with multiple threads; a single thread is too slow
                    self.download(obj[0],obj[1])

    def downloadBywinget(self,url,title):
        # Placeholder for a third-party tool such as wget; fill in the arguments yourself
        pass

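    # Download one blog post, retrying up to 5 times
    def download(self,url,title):
        path = '%s/%s.html' % (self.user,title.decode('utf-8'))
        url = 'http://hi.baidu.com%s' % (url)
        print "Download url %s" % (url)
        nFail = 0
        while nFail < 5:
            try:
                sock = urllib.urlopen(url)
                try:
                    htmlSource = sock.read()
                finally:
                    # Close the connection even if the read fails
                    sock.close()
                # Binary mode keeps the downloaded bytes unchanged on Windows
                with open(path,'wb') as myfile:
                    myfile.write(htmlSource)
                return
            except Exception:
                # Count the failure and retry; Ctrl-C still interrupts the loop
                nFail += 1
        print ('download blog fail:%s' % (url))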

    def dlownloadall(self):
        for page in range(1,self.totalpage+1):
            url = "%s?page=%d" % (self.blog,page)
            # This could be rewritten with multiple threads; a single thread is too slow
            self.fetchPage(url)

def main():
    user = 'runsheng2005'       # your Baidu login name
    psw  = 'password'  # your Baidu password; without a user name and password, private posts cannot be fetched
    blog = "http://hi.baidu.com/zhourunsheng" # the URL of your own Baidu blog
    baidu = Baidu(user,psw,blog)
    baidu.login()
    baidu.getTotalPage()
    baidu.dlownloadall()

if __name__ == '__main__':
    main()
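
The comments in the script note that a single thread is too slow and that the downloads could be moved to multiple threads. One way this might look is sketched below. It is written for Python 3 with `concurrent.futures`, which the Python 2 script above does not use, and `download_all`, `fake_fetch` and the job list are hypothetical names made up for this illustration; a real version would pass something like `Baidu.download` as the `fetch` argument.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(jobs, fetch, max_workers=8):
    """Run fetch(url, title) for every (url, title) job on a thread pool.

    Returns the jobs whose fetch raised an exception, so the caller
    can retry or report them.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the job it belongs to.
        futures = {pool.submit(fetch, url, title): (url, title)
                   for url, title in jobs}
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return failed


# Stand-in for the real download method so the sketch runs offline.
def fake_fetch(url, title):
    if "bad" in url:
        raise IOError("simulated network failure")
    return len(title)


jobs = [("/item/1", "first"), ("/item/bad", "second"), ("/item/3", "third")]
failed_jobs = download_all(jobs, fake_fetch, max_workers=2)
print(failed_jobs)
```

A pool with a small, fixed number of workers keeps the number of simultaneous connections to the server bounded, which is usually kinder to the site than one thread per post.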

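For example, my user name is runsheng2005, so the script creates a file named baidurunsheng2005.coockie in the working directory to store the cookie information.

Its contents look like this: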

#LWP-Cookies-2.0
Set-Cookie3: BAIDUID="25B820FB17B13E5F4F7C9836FB465C96:FG=1"; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2043-05-07 14:28:07Z"; version=0
Set-Cookie3: BDUSS=mJjbjFrZmp3WXNNbUhIQUxkWDJIMjFaR2dSZjdLaHdwcnhhRDBRLVNxcjQxcmxSQVFBQUFBJCQAAAAAAAAAAAEAAABv9HAAcnVuc2hlbmcyMDA1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPhJklH4SZJRW; path="/"; domain=".baidu.com"; path_spec; expires="2021-07-31 14:28:08Z"; version=0
Set-Cookie3: HOSUPPORT=1; path="/"; domain=".passport.baidu.com"; path_spec; expires="2021-07-31 14:28:07Z"; httponly=None; version=0
Set-Cookie3: PTOKEN=2bb1ab99373dbeeeec6b69af75e6a4c6; path="/"; domain=".passport.baidu.com"; path_spec; expires="2021-07-31 14:28:08Z"; version=0
Set-Cookie3: SAVEUSERID=a00277ba04dba8956259a5c4dfec4d40; path="/"; domain=".passport.baidu.com"; path_spec; expires="2021-07-31 14:28:08Z"; version=0
Set-Cookie3: STOKEN=1f4790267126b2e7dddb1f735f29074f; path="/"; domain=".passport.baidu.com"; path_spec; expires="2021-07-31 14:28:08Z"; version=0
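
The `#LWP-Cookies-2.0` header marks this as the LWP cookie format, which Python's standard library can read back. The sketch below shows one way to list what such a file contains; it assumes Python 3, where the module is `http.cookiejar` (it is `cookielib` on Python 2), and it uses a made-up sample file with placeholder values rather than the cookies shown above. `list_cookies` is a hypothetical helper name.

```python
import os
import tempfile
from http.cookiejar import LWPCookieJar  # named cookielib on Python 2


def list_cookies(cookie_path):
    """Return sorted (domain, name) pairs stored in an LWP-format cookie file."""
    jar = LWPCookieJar()
    # Keep session and expired cookies too, so the listing mirrors the file.
    jar.load(cookie_path, ignore_discard=True, ignore_expires=True)
    return sorted((cookie.domain, cookie.name) for cookie in jar)


# Hypothetical sample file with placeholder values, not real credentials.
sample = (
    '#LWP-Cookies-2.0\n'
    'Set-Cookie3: BAIDUID="PLACEHOLDER:FG=1"; path="/"; domain=".baidu.com"; '
    'path_spec; domain_dot; expires="2043-05-07 14:28:07Z"; version=0\n'
    'Set-Cookie3: HOSUPPORT=1; path="/"; domain=".passport.baidu.com"; '
    'path_spec; expires="2021-07-31 14:28:07Z"; version=0\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.coockie', delete=False) as handle:
    handle.write(sample)

pairs = list_cookies(handle.name)
os.remove(handle.name)
print(pairs)
```

Because the file holds live session tokens, it is worth treating like a password file: anyone who can read it can reuse the login until the cookies expire.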

All posts are downloaded into a subdirectory named runsheng2005,
for example 打造个人的云端笔记本（CareyDiary）.html.


Obfuscated Code Showcase: Python Programming

润物无声 — Mon, 12 Sep 2011 01:19:07 +0000

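Python code is known for being clean and tidy, yet some experts still manage to write code that is this tangled and still has a rhythm of its own. The obfuscated program below generates a Penrose tiling.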
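
The listing further down stores its real program in a string and unpacks it before calling `exec`: a `reduce` over character codes 0 to 34 turns each `"` into a newline plus one space of indentation, turns each `!` into a space, and deletes every other low character, including the spaces and line breaks that form the picture. A minimal sketch of that unpacking step follows. It is written for Python 3 (hence the `functools` import) and applied to a made-up packed string, not to the real listing; `decode` and `packed` are names chosen here for illustration.

```python
from functools import reduce  # reduce is a built-in on Python 2


def decode(packed):
    """Undo the packing: '"' -> newline + space, '!' -> space,
    every other character with a code below 35 -> removed."""
    return reduce(
        lambda text, code: text.replace(chr(code), "\n "[34 - code:]),
        range(35),
        packed,
    )


# Layout whitespace is ignored, so the packed text can be shaped freely.
packed = 'for!i  !in!\n  range(3)  :"print(i)'
source = decode(packed)
print(repr(source))
exec(source)
```

Since ordinary spaces are stripped, the packed program has to spell every meaningful space as `!` and every line break as `"`, which is what frees the author to arrange the text into a pattern.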

Source code

_                                 =\
                                """if!
                              1:"e,V=100
                            0,(0j-1)**-.2;
                           v,S=.5/  V.real,
                         [(0,0,4      *e,4*e*
                       V)];w=1          -v"def!
                      E(T,A,              B,C):P
                  ,Q,R=B*w+                A*v,B*w+C
            *v,A*w+B*v;retur              n[(1,Q,C,A),(1,P
     ,Q,B),(0,Q,P,A)]*T+[(0,C            ,R,B),(1,R,C,A)]*(1-T)"f
or!i!in!_[:11]:S       =sum([E          (*x)for       !x!in!S],[])"imp
  ort!cair               o!as!O;      s=O.Ima               geSurfac
   e(1,e,e)               ;c=O.Con  text(s);               M,L,G=c.
     move_to                ,c.line_to,c.s                et_sour
       ce_rgb                a"def!z(f,a)                :f(-a.
        imag,a.       real-e-e)"for!T,A,B,C!in[i       !for!i!
          in!S!if!i[""";exec(reduce(lambda x,i:x.replace(chr
           (i),"\n "[34-i:]),   range(   35),_+"""0]]:z(M,A
             );z(L,B);z         (L,C);         c.close_pa
             th()"G             (.4,.3             ,1);c.
             paint(             );G(.7             ,.7,1)
             ;c.fil             l()"fo             r!i!in
             !range             (9):"!             g=1-i/
             8;d=i/          4*g;G(d,d,d,          1-g*.8
             )"!def     !y(f,a):z(f,a+(1+2j)*(     1j**(i
             /2.))*g)"!for!T,A,B,C!in!S:y(M,C);y(L,A);y(M
             ,A);y(L,B)"!c.st            roke()"s.write_t
             o_png('pen                        rose.png')
             """                                       ))

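Result

Environment setup

The script needs Python 2.7 or earlier (the listing calls the built-in reduce, which Python 3 moved to functools); I used Python 2.6.4.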

C:\Users\Carey.RS.Zhou>python
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

It also needs the Pycairo graphics library. On Windows the easiest way to install it is pygtk-all-in-one-2.24.0.win32-py2.6.msi.

Source download

penrose-tiling


Why Every Programmer Should Learn Python or Ruby

润物无声 — Mon, 25 Jul 2011 02:47:13 +0000

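This is a commentary taken from an English-language article. It explains why you should choose Python or Ruby, what advantages they have over other programming languages, and how they can help your future development work.

=========== English original ===========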

If you are a student, you probably know C, C++ and Java. A few know VB, or C# / .NET. At some point you’ve probably built some web pages, so you know HTML, CSS and maybe JavaScript. By and large, it is difficult to find students who have any exposure to languages beyond this. And this is a shame because there are a number of programming languages out there which will make you a better programmer.

In this article, we give some reasons why you must learn Python or Ruby².

Compared to C/C++/Java – Python/Ruby allow you to write the same program with far fewer lines of code. It is estimated that a typical Python or Ruby program will require 5 times fewer lines of code than the corresponding Java program. Why spend that much more time on writing programs unless it is absolutely necessary? Also, someone said that a good programmer can reasonably maintain 20000 lines of code. It does not matter whether those are in assembly, C, or Python/Ruby/PHP/Lisp. So, if you write in Python/Ruby, whatever you do alone would probably need a 5-person team in Java/C/C++.
Compared to VB/PHP – Python/Ruby are much, much better designed languages than PHP/VB. PHP and VB are very popular for writing websites, and desktop applications respectively. The reason they’re popular is that they are very easy to learn and even non-programmers can pick them up quickly. But write any large program in these languages and you’ll start seeing the huge problems with these languages because they’re so badly designed. Friends don’t let friends program in PHP/VB.
Compared to Lisp/Scala/Haskell/Clojure/Erlang – Python/Ruby are still quite “mainstream”. Sure these languages have some really cool features, and for advanced programmers, exposure to these languages can really improve the way they think about programming. But there will be time later in your career to decide whether you want to pick up one or more of these. For now, Python/Ruby do a much better job of balancing the power of the language against commercial applicability.
Compared to Perl¹ – Both Python & Ruby owe a lot to Perl, and Perl was the biggest and best dynamic language before they started gaining prominence. But now, Perl’s popularity is reducing and more and more people are adopting Ruby/Python. I find Perl’s object-orientedness a bit contrived and ugly. In general, I think Perl is a harder language to learn since it has so many different ways of doing things, and the syntax tends to be cryptic and non-intuitive until you get the hang of it. Overall, I feel that Perl is not the best language for a student to pick up, unless there is a very good reason to do so (i.e. if you have lots of regular expression processing, then Perl shines)
Compared to sh/sed/awk/bash – If you have exposure to Linux/Unix, you have probably done some shell programming, and might even have written non-trivial programs. But anything more than a few lines in these languages starts to become a bit painful and it’s much better to do this in Python. Of course, Perl is the best language for this, but Python is a close second. (Ruby is not so great for system shell scripting).

Just do a Google search on ‘Why is X better than Y’ – where you put Python or Ruby for X and put one of the other languages for Y – and you will find a whole bunch of material on why these languages are so good.

If you have the flexibility to choose the programming language for your final year project, then pick Python or Ruby and get it done in half the time it would otherwise have taken (except if it is a mobile app development project, in which case you’ll be forced to use Java or Objective-C).

Here is a cartoon from xkcd which gives an idea of how powerful you feel after having mastered Python:

How to get started? There are many, many websites that give tutorials and classes on Python and Ruby. Here are just a couple of them that we’ve chosen:

Google’s Python Class is a good resource for learning Python
RubyLearning is a great website where you can learn Ruby.

Questions? Ask in the comments below, and we’ll try to answer them.

Footnote:

¹: My post seems to have pissed off a lot of Perl fans, and in retrospect I realized that I was harsher on the language than I should have been. Hence I’ve changed the Perl section. Earlier it read:

Both Python & Ruby owe a lot to Perl, and Perl was the biggest and best dynamic language before they showed up. But Perl is now old. Its object-orientedness is broken. It hasn’t really been updated in a while, and it is losing market share. For new, hot things (like web programming frameworks, web APIs) it is not as up-to-date as Python & Ruby. Basically, Python/Ruby are rising, Perl is setting.

Please keep this in mind when reading the comments of Lars, Torsten and Olaf.

² All the language comparisons made in this article are for the context of students in Indian CS degree programs picking up a new programming language. A statement like “X is better than Y” will never make sense as an absolute statement because all languages that have survived the test of time are obviously better than other languages in some areas, and that is the reason they exist. In other words, there are always scenarios where PHP/Java/C/C++/Perl and others are better languages than Ruby/Python.


English original: http://reliscore.com/why-every-programmer-should-learn-python-or-ruby

Chinese translation: http://www.aqee.net/2011/07/25/why-every-programmer-should-learn-python-or-ruby/


How to Become a Proficient Python Programmer

润物无声 — Thu, 23 Jun 2011 01:14:23 +0000

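This article is a translation of “How to become a proficient Python programmer”.

It is mostly a digest of articles I have collected, since many people more talented than I am have already written plenty of good material on how to become a good Python programmer.

My summary concentrates on four basic topics: functional programming, performance, testing, and coding guidelines.

A programmer who absorbs all four will come away with a great deal, whatever else happens.

Functional programming

The imperative style has become the de facto standard. An imperative program consists of statements that describe state transitions. This style is sometimes very effective, but not always (complexity is one example), and it can be less intuitive than the declarative style.

If you have no idea what I am talking about, that is perfectly normal. Here are some articles that will open your mind. Be warned, though: they are a bit like the red pill in The Matrix. Once you have tried functional programming, you will never go back.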
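
To make the contrast concrete, here is a small sketch that computes the same total twice, once as a loop that updates state step by step and once as a chain of expressions. The order data and both function names are invented for this illustration, and amounts are whole cents so the two results compare exactly.

```python
from functools import reduce

# Hypothetical data: (name, price in cents, quantity).
orders = [("book", 1250, 2), ("pen", 120, 10), ("lamp", 3000, 0)]


def total_imperative(items):
    """Imperative style: a loop mutates an accumulator one step at a time."""
    total = 0
    for _name, price, quantity in items:
        if quantity > 0:
            total += price * quantity
    return total


def total_functional(items):
    """Functional style: the result is described as a pipeline of expressions."""
    in_stock = filter(lambda item: item[2] > 0, items)
    amounts = map(lambda item: item[1] * item[2], in_stock)
    return reduce(lambda acc, amount: acc + amount, amounts, 0)


print(total_imperative(orders), total_functional(orders))
```

In everyday Python the functional version would more often be written as `sum(price * qty for _, price, qty in items if qty > 0)`; the explicit `filter`, `map` and `reduce` calls are used here only to show the building blocks.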

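Performance

You will see a great deal of discussion criticizing how slow these “scripting languages” (Python, Ruby) are, yet it is easy to overlook the fact that the algorithm the programmer chose is often what makes the program perform so poorly.

Here are some very good articles that cover the details of Python’s runtime performance. You will find that this concise and fun language also lets you write high-performance applications. And when your boss questions Python’s performance, remember to tell him that the second-largest search engine in the world is written in Python. It is called YouTube (see Python Quotes).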
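
As a small illustration of the point about algorithms, the sketch below finds the items two lists have in common in two ways. Both functions are hypothetical examples written for this note; they return the same answer, but the first rescans a list for every lookup while the second builds a set once.

```python
import timeit


def common_items_slow(first, second):
    """Quadratic: every `in` test scans the whole second list."""
    return [item for item in first if item in second]


def common_items_fast(first, second):
    """Linear on average: a set answers membership tests in constant time."""
    lookup = set(second)
    return [item for item in first if item in lookup]


first = list(range(0, 2000, 2))
second = list(range(0, 2000, 3))

# Same result either way; only the amount of work differs.
assert common_items_slow(first, second) == common_items_fast(first, second)

slow = timeit.timeit(lambda: common_items_slow(first, second), number=5)
fast = timeit.timeit(lambda: common_items_fast(first, second), number=5)
print("slow: %.4fs  fast: %.4fs" % (slow, fast))
```

The exact timings depend on the machine, so none are quoted here; the gap widens as the lists grow, because the slow version does work proportional to the product of the two lengths.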

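Testing

Testing is probably one of the most confusing topics in computer science today. Some programmers truly understand it and take TDD (test-driven development) and its successor BDD (behavior-driven development) very seriously. Others reject it outright as a waste of time. So let me tell you: if you have never started using TDD/BDD, you have been missing out on a lot of the best things!

It is not just a technique that replaces the primitive release process at your company, where someone tests the application with mindless manual clicking. More importantly, it is a tool that gives you a deep understanding of your own business domain: the way you really need, and really want, to tackle and handle problems. If you have not done so yet, give it a try. The following articles will give you some pointers: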
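
For readers who have not seen the workflow, a test-first session might look roughly like the sketch below, using the standard `unittest` module. In TDD the test class would be written first and fail, and the function would then be filled in until every test passes. `page_count` and its rules are a made-up example, not something taken from the articles referred to above.

```python
import unittest


def page_count(total_posts, page_size):
    """Number of list pages needed to show total_posts entries."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division: a partly filled last page still counts.
    return (total_posts + page_size - 1) // page_size


class PageCountTest(unittest.TestCase):
    def test_exact_multiple_needs_no_extra_page(self):
        self.assertEqual(page_count(20, 10), 2)

    def test_partial_last_page_counts(self):
        self.assertEqual(page_count(21, 10), 3)

    def test_no_posts_means_no_pages(self):
        self.assertEqual(page_count(0, 10), 0)

    def test_rejects_non_positive_page_size(self):
        self.assertRaises(ValueError, page_count, 5, 0)


# Run the suite programmatically so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PageCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test name states one rule of the domain, which is the sense in which writing the tests first forces you to decide what the code is supposed to do.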

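Coding guidelines

Not all code is created equal. Some code can be read and changed by any other good programmer. Some can only be read, and can only be changed by its original author, and even then only within a few hours of being written. Why is that? Because the code was never tested (see above) and proper coding guidelines are missing.

The articles below describe a minimal set of guidelines you should follow. Stick to them and you will write cleaner, more beautiful code. As a side effect, your programs become more readable and easier for you and anyone else to change.

So go and pass these materials around, starting with the person sitting next to you. Perhaps by the next programmers’ meetup or conference you will already have become a proficient Python programmer!

Good luck on your learning journey.

If you like these articles, please share this on Weibo so that others can find out about them too.

Reposted from: http://www.aqee.net/2011/06/23/how-to-become-a-proficient-python-programmer/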
