Glossary
doctest: a unit-testing tool
genexp: generator expression
listcomp: list comprehension
Data Structures
Tuples Are Not Just Immutable Lists
Using * to Grab Excess Items
a, b, *rest = range(5)
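As a brief illustration of how the starred target behaves, the sketch below unpacks a range with the * in different positions. The variable names are chosen only for this example.

```python
# The starred target collects whatever items are left over, always as a list.
a, b, *rest = range(5)

# The * may appear in any single position of the target list.
first, *body, c, d = range(5)

# If nothing is left over, the starred target is an empty list.
*head, x, y, z = range(3)
```

Here rest is [2, 3, 4], body is [1, 2], and head is [].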
Multidimensional Slicing and Ellipsis
NumPy uses ... (three full stops, not the single … character) as a shortcut when slicing arrays of many dimensions; for example, if x is a four-dimensional array, x[i, ...] is a shortcut for x[i, :, :, :].
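To see what the interpreter actually passes when ... appears inside square brackets, a small probe class can help. This is only a sketch in plain Python (no NumPy needed); ShowIndex is a hypothetical helper that is not part of the notes.

```python
class ShowIndex:
    """Hypothetical helper: returns whatever index object __getitem__ receives."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()
single = probe[...]         # the Ellipsis singleton itself
mixed = probe[1, ...]       # a tuple containing 1 and Ellipsis
sliced = probe[1, :, 2:5]   # a tuple containing an int and two slice objects
```

A library such as NumPy receives exactly these objects in its own __getitem__ and decides how to expand the Ellipsis.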
Using + and * with sequences
Both + and * always create a new object, and never change their operands.
Building Lists of Lists
The best way of doing so is with a listcomp.
board = [['_'] * 3 for i in range(3)]
A tempting but wrong shortcut is doing it like the following example.
false_board = [['_'] * 3] * 3
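The difference between the two approaches shows up as soon as one cell is changed. A minimal sketch, reusing the names board and false_board from above (rows_are_aliased is added for illustration):

```python
# Listcomp: the inner expression runs three times, giving three distinct rows.
board = [['_'] * 3 for i in range(3)]
board[1][2] = 'X'

# Shortcut: the outer list holds three references to the same inner list.
false_board = [['_'] * 3] * 3
false_board[1][2] = 'O'

rows_are_aliased = false_board[0] is false_board[1] is false_board[2]
```

In board only the middle row changes, while in false_board the 'O' appears in every row because all three rows are the same object.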
Augmented Assignment with Sequences
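The special method that makes += work is __iadd__ (for “in-place addition”). However, if __iadd__ is not implemented (as with immutable sequences), Python falls back to calling __add__, so a += b is evaluated as a = a + b and a new object is bound to a.
Repeated concatenation of immutable sequences is inefficient, because instead of appending in place the interpreter has to copy the whole sequence into a new object each time.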
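The fallback can be observed by keeping a second reference to the object before applying +=. A small sketch with illustrative names:

```python
# list implements __iadd__, so += extends the same object in place.
items = [1, 2, 3]
items_alias = items
items += [4, 5]
list_kept_identity = items is items_alias

# tuple has no __iadd__, so += builds a new tuple via __add__ and rebinds the name.
point = (1, 2, 3)
old_point = point
point += (4, 5)
tuple_kept_identity = point is old_point
```

After this runs, items_alias also sees the two new elements, while old_point still refers to the original three-item tuple.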
A += Assignment Puzzler
t = (1, 2, [30, 40])
t[2] += [50, 60]
The result is both of the following:
- t becomes (1, 2, [30, 40, 50, 60])
- TypeError is raised with the message 'tuple' object does not support item assignment
There are three steps when s[a] += b runs:
- Put the value of s[a] on TOS (Top Of Stack).
- Perform TOS += b. This succeeds if TOS refers to a mutable object.
- Assign s[a] = TOS. This fails if s is immutable.
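The three steps above can be checked with a short script that catches the exception and then inspects the tuple. This is a sketch; message is a name introduced here only to capture the error text.

```python
t = (1, 2, [30, 40])

try:
    # Step 2 (in-place extend of the list) succeeds,
    # then step 3 (assigning back into the tuple) fails.
    t[2] += [50, 60]
except TypeError as exc:
    message = str(exc)
```

After the exception, t still holds the mutated list, which is why both outcomes are observed. Writing t[2].extend([50, 60]) instead avoids the failing assignment.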
list.sort and the sorted Built-In Function
The list.sort method sorts a list in place, that is, without making a copy. It returns None to remind us that it changes the target object and does not create a new list.
In contrast, the built-in function sorted creates a new list and returns it. In fact, it accepts any iterable object, including immutable sequences and generators.
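A short comparison of the two, as a sketch with an illustrative list of fruit names:

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']

# sorted returns a new list and leaves its argument untouched.
by_name = sorted(fruits)
by_length = sorted(fruits, key=len, reverse=True)
unchanged = fruits == ['grape', 'raspberry', 'apple', 'banana']

# sorted accepts any iterable, here a generator expression.
lengths = sorted(len(f) for f in fruits)

# list.sort works in place and returns None.
result_of_sort = fruits.sort()
```

Because list.sort returns None, writing fruits = fruits.sort() would silently discard the list.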
Managing Ordered Sequences with bisect
An interesting application of bisect is to perform table lookups by numeric values–for example, to convert scores to letter grades.
def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
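The body of grade is cut off in these notes. One plausible completion, following the well-known example in the bisect module documentation, is sketched below; treat the two-line body as an assumption rather than the original text.

```python
import bisect


def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    # bisect returns the insertion point to the right of any equal breakpoint,
    # so a score of exactly 70 lands in the 'C' band, not the 'D' band.
    i = bisect.bisect(breakpoints, score)
    return grades[i]


results = [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
```

The breakpoints list must be sorted, and grades needs one more entry than breakpoints.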
When a List Is Not the Answer
Array
from array import array
Using array.fromfile is nearly 60 times faster than reading the numbers from a text file. Saving with array.tofile is about 7 times faster than writing one float per line in a text file.
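A minimal round trip with array.tofile and array.fromfile might look like the sketch below. It does not measure the speedups quoted above; the file name floats.bin and the array size are arbitrary choices for illustration.

```python
import os
import tempfile
from array import array
from random import random

# An array of doubles ('d') filled from a generator expression.
floats = array('d', (random() for _ in range(10_000)))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'floats.bin')

    # tofile writes the raw machine values, with no text conversion.
    with open(path, 'wb') as fp:
        floats.tofile(fp)

    # fromfile needs to be told how many items to read.
    loaded = array('d')
    with open(path, 'rb') as fp:
        loaded.fromfile(fp, len(floats))
```

The binary file holds exactly 8 bytes per double, which is part of why it is faster and smaller than a text file.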
Memory Views
nums = array.array('h', list(range(-2,3)))
These are very low-level memory operations; in general they are probably only needed when there is a high-performance requirement.
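As a rough sketch of what a memoryview offers, the example below builds the same nums array and shows that views and their slices share memory with it instead of copying. The names view, as_bytes, and tail are illustrative.

```python
import array

nums = array.array('h', list(range(-2, 3)))  # five signed short integers

view = memoryview(nums)      # no copy: the view shares the array's buffer
as_bytes = view.cast('B')    # same memory, seen as unsigned bytes

view[0] = 7                  # writing through the view changes the array

tail = view[3:]              # slicing a memoryview also creates no copy
tail[0] = 100                # this is nums[3]
```

The byte values seen through as_bytes depend on the platform's byte order, so they are not shown here.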
Deques and Other Queues
The .append and .pop methods make a list usable as a stack or a queue (if you use .append and .pop(0), you get FIFO behavior). But inserting and removing from the left of a list (the 0-index end) is costly because the entire list must be shifted.
The class collections.deque is a thread-safe double-ended queue designed for fast inserting and removing from both ends. A deque can also be bounded (created with a maximum length); when it is full, appending a new item discards an item from the opposite end.
Note:
Removing items from the middle of a deque is not fast.
The append and popleft operations are atomic, so deque is safe to use as a FIFO queue in multithreaded applications without the need for locks.
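The bounded behavior can be seen with a small deque. A sketch, with maxlen=3 and the variable names chosen for illustration:

```python
from collections import deque

recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)        # once full, each append drops the leftmost item
after_appends = list(recent)

recent.appendleft(99)       # appending on the left drops the rightmost item
after_appendleft = list(recent)

oldest = recent.popleft()   # FIFO use: append on the right, pop from the left
```

The maxlen of a deque is fixed when it is created; it is exposed as the read-only attribute recent.maxlen.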
Dictionaries and Sets
Generic Mapping Types
What Is Hashable?
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
The atomic immutable types (str, bytes, numeric types) are all hashable. A frozenset is always hashable, because its elements must be hashable by definition. A tuple is hashable only if all its items are hashable.
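The rule for tuples is easy to check with hash(). In this sketch, is_hashable is a hypothetical helper written for the example:

```python
def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


tuple_of_tuples = is_hashable((1, 2, (30, 40)))
tuple_with_list = is_hashable((1, 2, [30, 40]))
tuple_with_frozenset = is_hashable((1, 2, frozenset([30, 40])))

# Objects that compare equal must hash equal: 1 == 1.0, so their hashes match.
same_hash = hash(1) == hash(1.0)
```

Only the tuple containing a list fails, because the list inside it is mutable and therefore unhashable.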
Various means of building a dict
a = dict(one=1, two=2, three=3)
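Only the first variant survives in these notes. The other common ways of building the same dict, as listed in the standard library documentation for dict, are sketched here; the names b through e are illustrative.

```python
a = dict(one=1, two=2, three=3)                         # keyword arguments
b = {'one': 1, 'two': 2, 'three': 3}                    # literal syntax
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))       # iterable of pairs from zip
d = dict([('two', 2), ('one', 1), ('three', 3)])        # list of (key, value) tuples
e = dict({'three': 3, 'one': 1, 'two': 2})              # copy of another mapping

all_equal = a == b == c == d == e
```

Equality between dicts ignores the order in which the items were inserted.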
Overview of Common Mapping Methods
defaultdict
counts = dict()
The first time a word is counted, a KeyError is raised; this can be solved by adding a membership check:
strings = ('puppy', 'kitten', 'puppy', 'puppy',
A default value can also be supplied with the dict.setdefault() method: setdefault(kw, 0) returns the value for key kw if it exists; otherwise it inserts kw with the value 0 and returns 0.
strings = ('puppy', 'kitten', 'puppy', 'puppy',
Of course, a more concise way is to use defaultdict: defaultdict() accepts a default_factory callable and uses its return value as the default.
from collections import defaultdict
The default_factory of a defaultdict is only invoked to provide default values for __getitem__ calls, and not for the other methods. For example:
dd = collections.defaultdict()
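The three counting approaches described in this section can be compared side by side. The snippets in these notes are truncated, so the word list below is a made-up stand-in and every variable name is an assumption.

```python
from collections import defaultdict

# Hypothetical word list; the original tuple is cut off in the notes.
words = ('puppy', 'kitten', 'puppy', 'puppy',
         'weasel', 'puppy', 'kitten', 'puppy')

# 1. Plain dict with an explicit membership check.
counts_check = {}
for kw in words:
    if kw not in counts_check:
        counts_check[kw] = 0
    counts_check[kw] += 1

# 2. dict.setdefault supplies 0 the first time a key is seen.
counts_setdefault = {}
for kw in words:
    counts_setdefault[kw] = counts_setdefault.setdefault(kw, 0) + 1

# 3. defaultdict calls int() to produce the missing value 0.
counts_dd = defaultdict(int)
for kw in words:
    counts_dd[kw] += 1

# default_factory is used only by __getitem__ (dd[key]), not by get or in.
dd = defaultdict(list)
got = dd.get('missing')
present_after_get = 'missing' in dd
value = dd['missing']
present_after_getitem = 'missing' in dd
```

All three loops produce the same counts; only the lookup dd['missing'] creates the key.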
Mapping with Flexible Key Lookup
For dic = {'2': 'two', '4': 'four'}, support lookup by both the number 2 and the string '2':
class StrKeyDict(dict):
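Why is the test isinstance(key, str) necessary:
Without this test, looking up a missing key would recurse forever: not found -> convert to str -> look up -> not found, and so on.
Why not check for the key in the usual Pythonic way, key in my_dict:
It would call __contains__ recursively.
In Python 3, dict.keys() returns a set-like view, and a k in my_set check is very efficient; in Python 2 it returns a list, so the check is inefficient.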
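The class body is missing from these notes. The sketch below is one way to write it that matches the two questions answered above, based on the usual __missing__ pattern; the method bodies are an assumption, not the original code.

```python
class StrKeyDict(dict):
    """Sketch: a dict that falls back to str(key) when a key is missing."""

    def __missing__(self, key):
        # Without this test, a missing str key would call self[str(key)] forever.
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        # Delegate to __getitem__ so that __missing__ gets a chance to run.
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # Use self.keys() rather than `key in self`, which would recurse here.
        return key in self.keys() or str(key) in self.keys()


dic = StrKeyDict({'2': 'two', '4': 'four'})
by_str = dic['2']
by_int = dic[4]
fallback = dic.get(1, 'N/A')
```

With this sketch, both 2 in dic and '2' in dic are true, and dic[1] raises KeyError.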
Immutable Mapping
from types import MappingProxyType
This creates a read-only proxy of a dict: the proxy stays in sync with the original object but cannot itself be modified.
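A short sketch of that behavior, with illustrative names d and d_proxy:

```python
from types import MappingProxyType

d = {1: 'A'}
d_proxy = MappingProxyType(d)

first = d_proxy[1]            # reads go through to the underlying dict

try:
    d_proxy[2] = 'x'          # writes through the proxy are rejected
    write_allowed = True
except TypeError:
    write_allowed = False

d[2] = 'B'                    # changes to the original are visible via the proxy
second = d_proxy[2]
```

Handing out only the proxy lets other code read the mapping without being able to change it.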
Set Theory
frozenset: the immutable sibling of set
Set elements must be hashable. The set type is not hashable, but frozenset is, so you can have frozenset elements inside a set.
Count occurrences of needles in a haystack:
found = len(set(needle) & set(haystack))
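In Python 3, the standard string representation of sets always uses the {…} notation, except for the empty set, which is shown as set().
Literal set syntax such as {1, 2, 3} is faster than calling the constructor, as in set([1, 2, 3]). Calling the constructor is slower because, to evaluate it, Python has to look up the set name to fetch the constructor, then build a list, and finally pass it to the constructor.
The infix operators require both operands to be sets, whereas the corresponding methods accept any iterable.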
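The points in this section can be tried out in a few lines. A sketch with made-up data; needles, haystack, and nested are illustrative names.

```python
needles = {3, 5, 8}
haystack = [1, 2, 3, 4, 5]        # a list, not a set

# The infix operator needs sets on both sides, so convert first.
found = len(needles & set(haystack))

# The method form accepts any iterable directly.
found_method = len(needles.intersection(haystack))

try:
    needles & haystack
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# A set cannot contain sets, but it can contain frozensets.
nested = {frozenset({1, 2}), frozenset({3})}

empty_repr = repr(set())
```

Note that {} creates an empty dict, not an empty set, which is why the empty set is written set().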
dict and set Under the Hood
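Key ordering depends on insertion order.
dict([(key1, value1), (key2, value2)]) == dict([(key2, value2), (key1, value1)]), but their key ordering may not be the same if the hashes of key1 and key2 collide. (This describes dicts before CPython 3.6; since Python 3.7, a dict is guaranteed to iterate in insertion order.)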
Adding items to a dict may change the order of existing keys
When you add a new item to a dict, the Python interpreter may decide that the hash table of that dict needs to grow. This entails building a new, bigger hash table, and adding all current items to the new one. During this process, new (but different) hash collisions may happen, with the result that the keys are likely to be ordered differently. And you can’t predict when it will happen.
This is why modifying the contents of a dict while iterating through it is a bad idea. If you need to scan and add items to a dict, do it in two steps:
- Read the dict from start to end and collect the needed additions in a second dict.
- Update the first one with it.
In Python 3, the .keys(), .items(), and .values() methods return dictionary views, which behave more like sets than lists and are dynamic: they do not replicate the contents of the dict, and they immediately reflect any changes to the dict.
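Both the two-step update and the dynamic nature of views can be sketched in a few lines. The stock data and the '_backorder' key suffix are invented for this example.

```python
stock = {'apple': 3, 'pear': 0, 'fig': 7}

# Step 1: scan the dict and collect additions in a second dict.
additions = {}
for name, quantity in stock.items():
    if quantity == 0:
        additions[name + '_backorder'] = 10

# Step 2: update the first dict only after the scan has finished.
stock.update(additions)

# A keys view is dynamic: it reflects changes made after it was created.
keys = stock.keys()
stock['kiwi'] = 1
kiwi_visible = 'kiwi' in keys

# A keys view also supports set operations.
common = keys & {'apple', 'melon'}
```

Adding keys directly inside the for loop would instead raise RuntimeError in current Python versions, because the dict changes size during iteration.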
Text versus Bytes
Character Issues
The Unicode standard explicitly separates the identity of characters from specific byte representations:
- The identity of a character, its code point, is a number from 0 to 1114111 (U+10FFFF), shown in the Unicode standard as 4 to 6 hexadecimal digits with a “U+” prefix.
- The actual bytes that represent a character depend on the encoding in use. An encoding is an algorithm that converts code points to byte sequences and vice versa. The code point for A (U+0041) is encoded as the single byte \x41 in the UTF-8 encoding, or as the bytes \x41\x00 in the UTF-16LE encoding. As another example, the Euro sign (U+20AC) becomes three bytes in UTF-8 (\xe2\x82\xac), but in UTF-16LE it is encoded as two bytes: \xac\x20.
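The byte sequences quoted above can be reproduced with str.encode. A small sketch; the variable names are illustrative.

```python
letter_a = '\u0041'   # 'A'
euro = '\u20ac'       # the Euro sign

a_utf8 = letter_a.encode('utf-8')
a_utf16le = letter_a.encode('utf-16le')
euro_utf8 = euro.encode('utf-8')
euro_utf16le = euro.encode('utf-16le')

# ord gives the code point; decoding reverses the encoding step.
euro_code_point = ord(euro)
round_trip = euro_utf8.decode('utf-8')
```

The same code point yields different bytes under each encoding, and decoding with the matching encoding recovers the original character.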
Byte Essentials
There are two basic built-in types for binary sequences: the immutable bytes type introduced in Python 3 and the mutable bytearray added in Python 2.6.
Each item in a bytes or bytearray object is an int from 0 to 255. A slice of a binary sequence always produces a binary sequence of the same type, even slices of length 1.
cafe = bytes('café', encoding='utf-8')
The only sequence type where s[0] == s[:1] is the str type.
cafe_str = cafe.decode('utf-8')
Three different displays are used, depending on each byte value:
- For bytes in the printable ASCII range, the ASCII character itself is used.
- For bytes corresponding to tab, newline, carriage return, and \, the escape sequences \t, \n, \r, and \\ are used.
- For every other byte value, a hexadecimal escape sequence is used.
Both bytes and bytearray support every str method except those that do formatting and a few others that depend on Unicode data, including casefold, isdecimal, isidentifier, isnumeric, isprintable, and encode.
Binary sequences have a class method that str doesn’t have, fromhex:
bytes.fromhex('31 4B CE A9')
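Pulling the statements of this section together, the following sketch shows indexing, slicing, display, and fromhex on the cafe example. Names other than cafe and cafe_str are illustrative.

```python
cafe = bytes('caf\u00e9', encoding='utf-8')   # 'café' encoded as UTF-8
cafe_str = cafe.decode('utf-8')

first_item = cafe[0]        # an int
first_slice = cafe[:1]      # a bytes object of length 1

cafe_arr = bytearray(cafe)
last_slice = cafe_arr[-1:]  # slicing a bytearray gives a bytearray

# For str, indexing and length-1 slicing give equal results.
str_index_equals_slice = cafe_str[0] == cafe_str[:1]

displayed = repr(cafe)      # printable ASCII shown as-is, the rest as \x escapes
from_hex = bytes.fromhex('31 4B CE A9')
```

The é takes two bytes in UTF-8, so cafe has five items even though the text has four characters.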
Structs and Memory Views
// Related to memory operations; I have rarely used this, so I will fill it in when I run into it.
Understanding Encode/Decode Problems
Coping with UnicodeError
>>> str = 'café'
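The rest of this session is cut off. As a sketch of the usual ways to cope with these errors, the example below uses the errors argument of encode and decode on the same word; the variable names are illustrative, and word is used instead of str to avoid shadowing the built-in.

```python
word = 'caf\u00e9'   # 'café'

# ASCII cannot represent é, so strict encoding fails.
try:
    word.encode('ascii')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

ignored = word.encode('ascii', errors='ignore')              # drops the character
replaced = word.encode('ascii', errors='replace')            # substitutes '?'
xml_refs = word.encode('ascii', errors='xmlcharrefreplace')  # XML character reference

# Decoding UTF-8 bytes as ASCII fails in the same way.
raw = word.encode('utf-8')
try:
    raw.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

lenient = raw.decode('ascii', errors='replace')   # U+FFFD for each bad byte
```

Using errors='ignore' silently loses data, so it is usually a last resort.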
Handling Text Files
>>> open('cafe.txt', 'w', encoding='utf-8').write('café')
Normalizing Unicode for Saner Comparisons
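Unicode has combining characters, which attach to the preceding character and are displayed together with it as a single character when printed. For example, 'cafe\u0301' and 'café' should produce the same output; this is called canonical equivalence.
In Python the two strings do not compare equal; use from unicodedata import normalize to convert them to a common form first.
// TODO: just skim this for now and look it up when it is needed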
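A minimal sketch of the comparison problem and its fix with normalize; s1 and s2 are illustrative names.

```python
from unicodedata import normalize

s1 = 'caf\u00e9'    # 'café' with a single precomposed é
s2 = 'cafe\u0301'   # 'e' followed by COMBINING ACUTE ACCENT

lengths = (len(s1), len(s2))
equal_raw = s1 == s2

# NFC composes to the shortest equivalent string; NFD decomposes.
equal_nfc = normalize('NFC', s1) == normalize('NFC', s2)
equal_nfd = normalize('NFD', s1) == normalize('NFD', s2)
```

Normalizing both sides to the same form before comparing is what makes the canonically equivalent strings compare equal.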
First-Class Functions
A “first-class object” is a program entity that can be:
- Created at runtime
- Assigned to a variable or element in a data structure
- Passed as an argument to a function
- Returned as the result of a function
A function that takes a function as an argument or returns a function as the result is a higher-order function, like map and sort.
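Each of the four properties, and both kinds of higher-order function, can be shown in a few lines. This is a sketch; make_adder and the sample data are invented for the example.

```python
fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']

# Passing a function as an argument: len is the sort key.
by_length = sorted(fruits, key=len)
lengths = list(map(len, fruits))


def make_adder(n):
    """Returning a function as a result: the inner function is created at runtime."""
    def add(x):
        return x + n
    return add


# Assigning functions to variables and to elements of a data structure.
add_three = make_adder(3)
operations = {'add_three': add_three, 'length': len}
```

Calling operations['add_three'](4) gives 7, and list.sort accepts the same key argument as sorted.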
The Seven Flavors of Callable Objects
The Python Data Model documentation lists seven callable types:
- User-defined functions: Created with def statements or lambda expressions.
- Built-in functions: A function implemented in C (for CPython), like len or time.strftime.
- Built-in methods: Methods implemented in C, like dict.get.
- Methods: Functions defined in the body of a class.
- Classes: When invoked, a class runs its __new__ method to create an instance, then __init__ to initialize it, and finally the instance is returned to the caller. Because there is no new operator in Python, calling a class is like calling a function. (Usually calling a class creates an instance of the same class, but other behaviors are possible by overriding __new__. We’ll see an example of this in “Flexible Object Creation with __new__” on page 592.)
- Class instances: If a class defines a __call__ method, then its instances may be invoked as functions. See “User-Defined Callable Types” on page 145.
- Generator functions: Functions or methods that use the yield keyword. When called, generator functions return a generator object.
Generator functions are unlike other callables in many respects. Chapter 14 is devoted to them. They can also be used as coroutines, which are covered in Chapter 16.
Not only are Python functions real objects, but arbitrary Python objects may also be made to behave like functions. Implementing a __call__ instance method is all it takes.
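As a minimal sketch of a callable instance, the hypothetical Scaler class below implements __call__, and the built-in callable() is used to check several of the flavors listed above.

```python
class Scaler:
    """Hypothetical example class whose instances can be called like functions."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


def countdown():
    """A generator function: calling it returns a generator object."""
    yield 2
    yield 1


double = Scaler(2)
doubled = double(21)                     # the instance is invoked like a function
mapped = list(map(double, [1, 2, 3]))    # and can be passed wherever a callable is expected

checks = [callable(obj) for obj in (abs, str, double, countdown, 13)]
```

Only the integer 13 is not callable; the built-in function, the class, the instance, and the generator function all are.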
Retrieving Information About Parameters
// TODO