0%

Fluent Python 笔记

术语解释

doctest: 一种单元测试工具

genxp: generator expression

listcomp: list comprehension

Data Structures

Tuples Are Not Just Immutable Lists

Using * to excess items

1
2
3
4
5
6
7
8
9
10
11
>>> a,b,*rest = range(5)
>>> rest
[2, 3, 4]

>>> a,b,*rest = range(5)
>>> a, b, rest
(0,1,[])

>>> a, *body, c, d = range(5)
>>> a, body, c, d
(0, [1, 2], 3, 4)

Multidimensional Slicing and Ellipsis

Numpy uses ...(not ) as a shortcut when sliceing arrays of many dimensions; for example, if x is a four-dimensional array, x[i, ...] is a shortcut for x[i, :, :, :].


Using + and * with sequences

Both + and * always create a new object, and never change their operands.

Building Lists of Lists

The best way od doing so is with a listcomp.

1
2
3
4
5
6
>>> board = [['_'] * 3 for i in range(3)]
>>> board
[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
>>> board[1][2] = 'X'
>>> board
[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]

A tempting but wrong shortcut is doing it like the following example.

1
2
3
4
5
6
>>> false_board = [['_'] * 3] * 3
>>> false_board
[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
>>> false_board[1][2] = 'X'
>>> false_board
[['_', '_', 'X'], ['_', '_', 'X'], ['_', '_', 'X']]

Augmented Assignment with Sequences

The specical method that makes += work is __iadd__(for “in-place addition”). However, is __iadd__ is not implemented(immutable sequences), Python falls back to call __add__.

Repeated concatenation of immutable sequences is ineffcient.

A += Assignment Puzzler

1
2
t = (1,2,[30,40])
t[2] += [50,60]

The result is

  • t becomes (1,2,[30,40,50,60])
  • TypeError is raised with the message ‘tuple’ object does not support item assignment

There are three steps when s[a] += b is running:

  1. Put the value of s[a] on TOS(Top Of Stack)
  2. Perform TOS += b. This succeeds if TOS is refers to a mutable object.
  3. Assign s[a] = TOS. This fails if s is immutable.

list.sort and the sorted Built-In Function

The list.sort method sorts a list in place–that is, without making a copy. It retuens None to remind us that it changes the target object, and does not create a new list.

In contrast, the built-in function sorted create a new list and retuen it. In fact is accepts any iterable objects, including immutable sequences and generators.

Managing Ordered Sequences with bisect

An interesting application of bisect is to perform table lookups by numeric values–for example, to convert scores to letter grades.

1
2
3
4
5
6
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
... i = bisect.bisect(breakpoints,score)
... return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

When a List Is Not the Answer

Array

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
>>> from array import array
>>> from random import random
>>> floats = array('d',(random() for i in range(10**7)))
>>> floats[-1]
0.652373109695629
>>> fp = open('num.bin', 'wb')
>>> floats.tofile(fp)
>>> fp.close()
>>> floats2 = array('d')
>>> fp = open('num.bin', 'rb')
>>> floats2.fromfile(fp, 10**7)
>>> fp.close()
>>> floats2[-1]
0.652373109695629
>>> floats2 == floats
True

Using array.fromfile is nearly 60 times faster than reading the numbers from a text file. Saving with array.tofile ia about 7 times faster than writing one float per line in a text file.

Memory Views

1
2
3
4
5
6
7
8
9
10
11
12
>>> nums = array.array('h', list(range(-2,3)))
>>> memv = memoryview(nums)
>>> len(memv)
5
>>> memv[0]
-2
>>> memv_oct = memv.cast('B')
>>> memv_oct.tolist()
[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]
>>> memv_oct[5] = 4
>>> nums
array('h', [-2, -1, 1024, 1, 2])

非常底层的内操作,一般来说只有在有高性能需求的时候才会用到吧。

Deques and Other Queues

The .append and .pop methods make a list as a stack or a queue (if you use .pop(0)), you get LIFO behavior. But inserting and removing form the left of a list(the 0-index end) is costly because the entire list must be shifted.

The class collections.deque is a thread-safe double-end queue designed for fast inserting and removing from both ends.And a queue can be bounded, when it is full, it discards items from the opposite end when you append new one.

Note:
Removing items from the middle of a deque is not fast.
THe append and popleft operations are atomic, so deque is safe to use as a LIFO queue in multithreaded aoolications wihtout th need for lock.

Dictionaries and Sets

Generic Mapping Types

What Is Hashabe?

A object is hashable if it has a hash value which never change guring its lifetime (it needs a __hash__() method). and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

The atomic immutable types(str, bytes, numeric types) are all hashable. A frozen Set si aways hashable, because its elements must be hashable by definition. A tuple is hashable only if all its items are hashable.

Various means of building a dict

1
2
3
4
5
6
7
>>> a = dict(one=1, two=2, three=3)
>>> b = {'one': 1, 'two': 2, 'three': 3}
>>> c = dict(zip(['one', 'two', 'three'],[1, 2, 3]))
>>> d = dict([('two', 2),('one', 1), ('three', 3)])
>>> e = dict({'three': 3, 'one': 1, 'two': 2})
>>> a == b == c == d == e
True

Overview of Common Mapping Methods

defaultdict

1
2
3
4
5
6
7
>>> counts = dict()
>>> counts
{}
>>> counts['puppy'] += 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'puppy'

第一次对单词进行统计时,会抛出 KeyError,为此可以通过加判断解决:

1
2
3
4
5
6
7
8
9
10
11
12
strings = ('puppy', 'kitten', 'puppy', 'puppy',
'weasel', 'puppy', 'kitten', 'puppy')
counts = {}

for kw in strings:
if kw not in counts:
counts[kw] = 1
else:
counts[kw] += 1

# counts:
# {'puppy': 5, 'weasel': 1, 'kitten': 2}

也可以通过dict.setdefault()方法来设置默认值:
setdefault(kw, 0)中如果 kw 存在则返回键 kw 所对应的值,若不存在则返回 0

1
2
3
4
5
6
strings = ('puppy', 'kitten', 'puppy', 'puppy',
'weasel', 'puppy', 'kitten', 'puppy')
counts = {}

for kw in strings:
counts[kw] = counts.setdefault(kw, 0) + 1

当然更简洁的的方式是通过使用 defaultdict 解决:
defaultdict()接受一个default_factory()将其返回值作为默认值

1
2
3
4
5
6
7
8
from collections import defaultdict

strings = ('puppy', 'kitten', 'puppy', 'puppy',
'weasel', 'puppy', 'kitten', 'puppy')
counts = defaultdict(lambda: 0) # 使用 lambda 来定义简单的函数

for s in strings:
counts[s] += 1

The default_factory of a defaultdict is only invoked to provide default values for __getitem calls, and not for the other methods. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
>>> dd = collections.defaultdict() 
#if no default_factory is provided, the usual KeyError is raised for missing keys
>>> dd['key']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'key'
>>> dd
defaultdict(None, {})
>>> dd = collections.defaultdict(list)
>>> dd['key']
[]
>>> dd.get('kk') # dd.get() still return None
>>> dd.get('key')
[]

Mapping with Flexible Key Lookup

对于dic = {'2': 'two', '4': 'four'} 同时支持数字和字符串2, '2'索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class StrKeyDict(dict):

def __missing__(self, key):
if isinstance(key, str):
raise KeyError(key)
return self[str(key)]

def get(self, key, default=None):
try:
return self[key]
except KeyError:
# If a KeyError was raised, __missing__ already failed, so we return default
return default

def __contains__(self, key):
return key in self.keys() or str(key) in self.keys()

d = StrKeyDict({'2': 'two', '4': 'four'})
print(d)
print(d[2],d['4'])
print(2 in d)

Why is the test isinstance(key, str) necessary:
没有这个判断的话,会进入 不存在->转换为字符串->查找->不存在 的死循环

Why not check for the key in the usual Pythonic way – key in my_dict:
It will call __contains__ recursively

在 python3 中dict.keys()返回一个类似与setview, 而 check k in my_set是很高效的,而 python2 中其返回 list, 因此效率不高。

Immutable Mapping

1
2
3
from types import MappingProxyType
dic = {1: 'A'}
d_proxy = MappingProxyType(dic)

创建一个只读的代理字典,遂原始对象同步更新,但不可更改

Set Theory

frozenset: the immutable sibling of set

Set elements must be hashable. The set type is not hashable, but frozenset is, so you can have frozenset elements inside a set.

Count occurrences of needles in a haystack:

1
2
3
found = len(set(needle) & set(haystack))

found = len(set(needle).intersection(haystack))

In Python3, the standard string representation of sets always uses the {…} notation, except for the empty set which uses set().

Calling the constructor is slower because, to evaluate it, Python has to look up the set name to fetch the constructor, then build a list, and ginally pass it to the constructor.

中缀操作符要求两边都是 set, 函数只要 iterable 就行。

dict and set Under the Hood

Key ordering depends on insertion order.

dict([(key1,value1), (key2, value2)]) == dict([(key2, value2),(key1,value1)]), but their key ordering may not be the same if the hashes of key1 and key2 collide.

Adding items to a dict may change the oder of existing keys

When you add a new item to a dict, the Python interpreter may decide that the hash table of that dict needs to grow. This entails building a new, bigger hash table, and adding all current items to the new one. During this process, new (but different) hash collisions may happen, with the result that the kays are likely to ordered differently. And you can’t predict when it will happen.

This is why modifying the contents of a dict while iterating through it is a bad idea. If you need to scan and add ites to a dict, do in two steps:

  1. read the dict from start to end and collect the needed additions in a second dict
  2. updata the first one with it.

In Python3, the .keys(), .items(), .values()return views, which behave like set and are dynamic: they do not replicate the contents of dict, and they immediately reflect any changes to the dict.

Text versus Bytes

Character Issues

The Unicode standard explicitly seperates the identity of characters from specific byte representation:

  • The identity of charater – its code points – is a number from 0 to 1114111 (U+10FFFF), shown in Unicode standard as 4 to 6 hexadecimal digits with a “U+” prefix.
  • The actual vytes that represent a character depends on the encoding in use. An encoding is an algorithm that converts code point to byte sequences and vice versa. The code point for A (U+0041) is encoded as the single byte \x41 in the UTF-8 encoding, or as the bytes \x41\x00 in UTF-16LE encoding. As another example, the Euro sign (U+20AC) becomes three bytes in UTF-8 (\xe2\x82\xac) but in UTF-16LE it is encoded as two bytes: \xac\x20.

Byte Essentials

There are two basic built-in types for binary sequences: the immutable bytes type introduced in Python3 and the mutable bytearray added in Python2.6.

Each item in bytes is an int. A slice of a binary sequence always produces a binary sequence, even slices of 1 length 1.

1
2
3
4
5
6
7
8
9
10
11
12
13
>>> cafe = bytes('café', encoding='utf-8')
>>> cafe
b'caf\xc3\xa9'
>>> type(cafe[0])
<class 'int'>
>>> type(cafe[:1])
<class 'bytes'>

>>> cafe_arr =bytearray(cafe)
>>> cafe_arr
bytearray(b'caf\xc3\xa9')
>>> cafe_arr[-1:]
bytearray(b'\xa9')

The only sequence type where s[0] == s[:1] is str type.

1
2
3
>>> cafe_str = cafe.decode('utf-8')
>>> cafe_str[0] == cafe_str[:1]
True

Three differents displays are used, depending on each byte value:

  • For bytes in printable ASCII, the ASCII character itself is used
  • For bytes corresponding to tab, newline, carriage return and \, the escape sequences \t, \n, \r and \\ are used.
  • For other byte value, a hexadecimal escape sequnece is used.

Both bytes and bytearray support every str method except those that do formatting and a few others that depend on Unicode data, including casefold, isdecimal, isidentifier, isnumeric, isprintable, and encode.

Binary sequences have a method that str doesn’t have – fromhex:

1
2
>>> bytes.fromhex('31 4B CE A9')
b'1K\xce\xa9'

Structs and Memory Views

// 内存操作相关,没怎么用到,碰到了再补上。

Understanding Encode/Decode Problems

coping with UnicodeError

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
>>> str = 'café'
>>> str.encode('utf-8')
b'caf\xc3\xa9'
>>> str.encode('gb2312')
b'caf\xa8\xa6'
>>> str.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
>>> str.encode('ascii', errors='ignore')
b'caf'
>>> str.encode('ascii', errors='replace')
b'caf?'
>>> str.encode('ascii', errors='xmlcharrefreplace')
b'caf&#233;'

Handling Text Files

1
2
3
4
5
6
7
8
9
10
>>> open('cafe.txt', 'w', encoding='utf-8').write('café')
4
>>> open('cafe.txt').read()
'caf 茅'
>>> open('cafe.txt', 'w', encoding='utf-8').write('中国')
2
>>> open('cafe.txt').read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 2: illegal multibyte sequence

Normalizing Unicode for Saner Comparisons

Unicode 中有一些附加字符 (combining character), 附加在前一个字符上,但在 print 的时候显示成一个字符,如:
'cafe\u0301''café' 应该输出一样的结果,这成为规范等价 (canonical equivalent).

在 Python 中,二者并不等价,需要 from unicodedata import normalize 对字符进行转换。
// TODO 了解一下,用到再查好了

First-Class Functions

“first-class object” as a program entity that can be:

  • Created at runtime
  • Assigned to a variable or element in a data structure
  • Passed as an argument to a function
  • Returned as the result of a functio

A function that takes a function as argument or returns a function as the result is a higher-order function, like map and sort.

The Seven Flavors of Callable Objects

The Python Data Model documentation lists seven callable types:

  • User-defined functions
    Created with def statements or lambda expressions.

  • Built-in functions
    A function implemented in C (for CPython), like len or time.strftime.

  • Built-in methods
    Methods implemented in C, like dict.get.

  • Methods
    Functions defined in the body of a class.

  • Classes
    When invoked, a class runs its __new__ method to create an instance, then __init__ to initialize it, and finally the instance is returned to the caller. Because there is no new operator in Python, calling a class is like calling a function. (Usually calling a class creates an instance of the same class, but other behaviors are possible by overriding __new__. We’ll see an example of this in “Flexible Object Creation with __new__” on page 592.)

  • Class instances
    If a class defines a __call__ method, then its instances may be invoked as functions. See “User-Defined Callable Types” on page 145.

  • Generator functions
    Functions or methods that use the yield keyword. When called, generator functions return a generator object.

Generator functions are unlike other callables in many respects. Chapter 14 is devoted to them. They can also be used as coroutines, which are covered in Chapter 16.

Not only are Python functions real objects, but arbitrary Python objects may also be made to behave like functions. Implementing a __call__ instance method is all it takes.

Retrieving Information About Parameters

// TODO